Archive for the ‘Miscellaneous’ Category

Search Engines Cache in the Times of Drug Busts

May 7, 2008

One nice thing about modern search engines is that they give users access to cached pages. These are older versions of pages that reside, often compressed, in a specific section of the engine's architecture.

Unless the owner or administrator of a site instructs search engines (via metadata or a robots.txt file) not to cache a document, old versions will be available to end users via the cache command or via a cache link next to a search result.

This feature comes in handy for those who use search engines for intelligence purposes. A lot of useful information can be found by searching for cached documents. At the same time, once-glorious old pages can become unwanted.

Ask San Diego State University’s Marketing and Communication Department. Out of embarrassment, they just removed the document listed at http://advancement.sdsu.edu/marcomm/features/2006/compact.html, in which they featured a role model student (Kenneth Ciaccio) who was arrested yesterday on charges in connection with an on-campus drug bust operation.

The page is still showing up in Google’s cache and reflects badly on SDSU and its Compact for Success program. To access it in Google, just do a search and click the cache link, or enter cache:url in the query box, where url is the address of the above document.
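
As a minimal sketch of the cache:url form described above, the query text can be assembled as follows. The function names are hypothetical, and the search endpoint with its q parameter is an assumption used only for illustration.

```javascript
// Hypothetical helper: builds the text to type into the query box.
function cacheQuery(url) {
  return "cache:" + url.trim();
}

// Assumption: this endpoint and its "q" parameter are used for illustration only.
function cacheSearchUrl(url) {
  return "https://www.google.com/search?q=" + encodeURIComponent(cacheQuery(url));
}

const page = "http://advancement.sdsu.edu/marcomm/features/2006/compact.html";
console.log(cacheQuery(page));
console.log(cacheSearchUrl(page));
```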

 

Pink Keywords: Optimization of Resumes and Job Applications

May 5, 2008

The current slump in the US and PR economies, with so many local employers handing out pink slips, leads me to think about the importance of pink keywords.

These are keywords one would use to optimize resumes and job applications.

Now more than ever, recruiters, middle management, and HR departments need to look through zillions of resumes, looking for specific clues in the form of pink keywords. This means that resumes and job applications must be optimized for such terms.

http://career-advice.monster.com/resume-writing-basics/Keyword-Challenge/home.aspx

The best way of finding good pink keywords consists of selling employers back their own crappy ads and job offers; that is, scanning employment ads, job offerings, and classifieds relevant to the position one is interested in and then using the target terms in one's own resume. Another thing one can do is expand these with related or contextual terms; of course, using only those that match one's own experience and skills.

I see here an opportunity for ethical SEO companies to provide a valuable and noble service: Pinky Optimization. At the same time I see an opportunity for crooked SEOs and spammers to prey on other people’s misfortune. Since many in the SEO sphere have been disposed of by fat cats and sold(soul)-outs, these folks are also job searching. Life's ironies.

SEOs - Desperately Seeking Clients

April 24, 2008

From time to time I receive unsolicited emails from SEOs offering me their services to list my site in the major search engines and directories. They often send template-like automatic messages (“Dear website owner”) and appear not to even bother to check whether recipients need the service.

These SEOs often look desperate and sound like snakeoil sellers and crooks. They even claim to be better than other SEOs.

They often pitch the same crap:

  • “I recently visited your site” (Really? Why then send this crap?).
  • “you are not listed in the top search engines and directories” (Really? How do they know?).
  • “we can increase your traffic by X astronomical amount” (Really? Could you double X for me, please?).
  • “we can help you get top rankings in Google” (Really? For which keywords?).
  • “our link building program” (Really? Read here link exchange and link spam).
  • “we have proprietary crap, blah, blah, …” (Really? Sell it or get a patent!).

I received one such email just last night, even though my site is known in the IR/SEO spheres, has been listed for many years in the top search engines and directories, and ranks well.

Dear website owner,

I visited your website and noticed that you are not listed in many of the major search engines and directories. If our company can increase your traffic up to 500% by getting you top ranking results on the search engines such as Google would you be interested? We specialize in link building content writing and programming. We have proprietary techniques that work better and are less expensive than any other SEO firm.

Please let me send you a proposal and show you how we can make your website profitable.

Sincerely,

Christian Frank

2060 AVENIDA DE LOS ARBOLES, STE D
THOUSAND OAKS,
CA 91362-1361 - USA

These are the types of companies that give the SEO industry a black eye. If SEOs send you this type of crap, I feel your pain. Stay away from their businesses and from whatever they claim or seem to offer.

A Few Rants: Microsoft, a Conference, and a Database Site

April 11, 2008

I normally don’t rant on this blog about trivial stuff in life, since this blog is about IR and search engine research. Today I feel like making an exception. So let's see how I can tie a few rants about silly everyday things to search engines.

Rant 1

I bought the Home and Student version of Microsoft Office ($122, through Costco). The learning curve started right away. I tried to open its case by just pulling off the red tab as suggested. The red tab detached from the case and still there was no “open sesame”. I then tried different things until I decided to slice the clear seal at the top of the case with a knife, and voilà! Nothing like a Puerto Rican solution for a “Made in Puerto Rico” Windows Vista product! Duh!

So the recipe is: (1) get a knife, (2) slice the seal, and (3) pull the case indentations toward your right with your fingers. The inside case should open.

Out of curiosity I wanted to know if others out there struggled with the design of the case. I ended up googling for how to open windows office case and found this site, which discussed the very same problem and the very same solution. I realized I was not alone.

There are now dozens of sites like this one that show users this dumb “how-to”. Many are complaining about the “brilliant” design of the box, which is just a usability and accessibility nightmare.

Read what others at the aforementioned site are commenting. Some commented that they ended up searching for:

open office 2007 box
open vista box
Office Packaging “how to open”
open microsoft office box
how to open MS office 2007 box

Something on the product design side is wrong when soooo many have to Google just how to open the damn case of a Microsoft product, or of any product for that matter. Something is wrong when Microsoft lab rats have to explain online how to open the annoying case.

Rant 2

There is a local conference on information security I was invited to. Somewhere down the organization pipeline, something is wrong with a conference when its organizers have to chase potential presenters one week before the event. I pass and wish them good luck.

Rant 3

There is a local company that created a database-driven site for the upcoming elections. The problem: how to get politicians and average users to know how to use the technology. Also, the site already needs to be redesigned so it can rank high and gain traffic from search engine users.

All of these kind of belong in the Land of Duh.

Searchmageddon: Microsoft to Buy Yahoo!

February 1, 2008

As we mentioned a few days ago,

http://irthoughts.wordpress.com/2008/01/23/microsofts-black-cloud-on-yahoos-seo-tag-clouds/ ,

Microsoft is finally buying Yahoo! Check here:

http://online.wsj.com/article/SB120186786283735047.html?mod=hpp_us_whats_news

Like Jeremy Zawodny, I predict many old-timers will walk rather than work for Bill Gates, while duplicate positions will be eliminated. With a thousand already being terminated at Yahoo!, more will soon follow.

The Final Search Battle is coming and its name is SEARCHMAGEDDON.

“And I saw from the mouth of the operating system, and the mouth of the database and the mouth of the paid advertisers, three unclean spirits… the spirits of marketers, spammers, and hackers working signs, and they go forth to the kings of the whole network, to gather them to battle against the great day of the Almighty GoOGLE. And Stanford shall gather them together into a place which in digital business is called SEARCHMAGEDDON.” –Internet 2:1, 2008.

Microsoft’s Black Cloud on Yahoo! & SEO Tag Clouds

January 23, 2008

From time to time rumors spread of the black cloud of Microsoft over Yahoo!; i.e., of Microsoft buying Yahoo!. This time things are less cloudy, especially now that Yahoo! is about to cut jobs.

Early this year, Jeremy Zawodny from Yahoo! wrote:

“Sure, there would be cultural problems, integration challenges, and many people who’d likely walk. But at the end of the day, Microsoft would end up with a much larger set of online services, a better advertising network, and people who know how to build, brand, and market web stuff that people actually use.”

Talking about clouds:

A student asked me about some SEOs claiming that text tag clouds are a kind of LSI technology.

Pure nonsense coming from many SEOs, as usual.

These clouds are easy to construct. No LSI is needed:

1. Sort terms from a document or lookup list by frequency.
2. Normalize the frequencies so they fall within the [0, 1] interval.
3. Use the normalized frequencies as parameters to be passed as font sizes.

For pizzazz, store the terms in an array to be sorted or randomized, and/or use some CSS.

We can do the same with hit counts assigned to blog categories, links, etc. No special technology is needed.
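
The three steps can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the function names are hypothetical, normalization is done by min-max scaling (one of several ways to map frequencies to [0, 1]), and the 10 to 32 px font range is arbitrary.

```javascript
// Hypothetical helper: counts how often each term occurs in a text.
function termFrequencies(text) {
  const counts = new Map();
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  for (const w of words) {
    counts.set(w, (counts.get(w) || 0) + 1);
  }
  return counts;
}

// Assumption: font sizes range from minPx to maxPx; these defaults are arbitrary.
function buildTagCloud(counts, minPx = 10, maxPx = 32) {
  // Step 1: sort terms by frequency, highest first.
  const entries = [...counts.entries()].sort((a, b) => b[1] - a[1]);
  if (entries.length === 0) return [];
  const max = entries[0][1];
  const min = entries[entries.length - 1][1];
  const range = max - min;
  return entries.map(([term, freq]) => {
    // Step 2: min-max normalization to the [0, 1] interval.
    const weight = range === 0 ? 1 : (freq - min) / range;
    // Step 3: use the normalized weight to pick a font size.
    const fontSize = Math.round(minPx + weight * (maxPx - minPx));
    return { term, freq, weight, fontSize };
  });
}

// Renders the cloud as inline-styled spans; CSS classes would work as well.
function renderTagCloud(tags) {
  return tags
    .map((t) => `<span style="font-size:${t.fontSize}px">${t.term}</span>`)
    .join(" ");
}

const cloud = buildTagCloud(termFrequencies("pizza pizza pizza pasta pasta salad"));
console.log(renderTagCloud(cloud));
```

The same buildTagCloud function works with any Map of labels to counts, so hit counts per blog category or per link can be passed in place of term frequencies.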

Until Next Year

December 28, 2007

Well, this was an incredible year.

I participated in several international conferences, changed ISPs, and went back to teaching at the graduate school and to conducting academic research; I also made new friends from all over the world.

Next year I have several conferences and activities to take care of, a new graduate course to teach next spring, titled Search Engines Architecture, and a few consulting projects to attend to.

The IRW Newsletter should arrive in subscribers' inboxes early: today.

I’m taking a few days off. Until Next Year.

Cheers,

Dr. E. Garcia

Random Notes

October 3, 2007

1. IRW will run late due to my academic duties. The topic: Genetic Algorithms. It covers recent advances in the field and dozens of videos relevant to GA.

2. My vector analyzer and binary calculator is ready, but I need to double-check its accuracy. It accepts vectors on the order of 10,000 elements. Cool!

3. Thank you to those using the Levenshtein Edit Distance Tool, and for your suggestions.

4. The course on Web Mining and Business Intelligence is getting ready.

5. Current academic duties are limiting my time. Sorry. I need to be away from the web. Stick with me.

Random Notes and LauraMansfield

September 12, 2007

These are some late random notes. Sorry for the delay.

1. I am putting together a research project for a graduate student. The topic is quite interesting: homeland security. While researching the topic I came across the LauraMansfield.com site. Mansfield’s site is a goldmine of information, especially for those interested in co-occurrence and word association research applied to the terrorist knowledge domain.

2. I am reviewing a graduate thesis in which logistic regression is used for data mining medical claims. The thesis topic is quite interesting. The manuscript needs some rework, though.

3. I am reading bits and pieces of an old paper on the non-transitive nature of Jaccard’s Coefficient and a proposed indirect similarity measure.

Random Notes

August 30, 2007

I’m putting the final touches on IR Watch, now in its first year of publication. I started the project a year ago. Thank you for your support.

Tomorrow I will post a sneak preview of the September issue. This one is about research conducted at the Office of Naval Research in the area of search modes. If you are a keyword researcher you need to read this issue.

I’m also researching a large repository of obscure databases, accessible through ftp. If you are a KDD researcher, you will love to know about these.

JavaScript Tips

August 21, 2007

This is not a post about an IR topic, but since at some point IR projects resort to programming, I believe the post is relevant to this blog, especially when many IR tools used in a classroom demonstration setting are written in JavaScript.

I’m going through Douglas Crockford's great video/ppt presentations on JavaScript via http://101out.com/js.php. There are many things average programmers don’t know about JavaScript, the most misunderstood programming language on the planet. For those not familiar with Crockford, a few years ago he pioneered the right way of writing JavaScript. Haven’t heard of JSON?

He gives so many great tips in those videos and ppt slides. Here are some of them:

Tip #1

//Instead of

if(a==null) {...} //which does coercion (it is also true when a is undefined)

//do this:

if(a===null){...}

//Also, instead of != use !==
//Avoid == and != altogether in your code. The === operator does no type coercion: primitives are equal only if they have the same type and value, and objects only if both operands are the same object.
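
The difference between the two operators can be seen in a short sketch. The compare helper and the sample values are hypothetical, chosen only to illustrate coercion.

```javascript
// Hypothetical helper: reports how == and === treat the same pair of values.
function compare(a, b) {
  return { loose: a == b, strict: a === b };
}

console.log(compare(0, ""));           // == coerces "" to the number 0
console.log(compare("1", 1));          // == coerces the string "1" to 1
console.log(compare(null, undefined)); // == treats null and undefined as equal

// Objects are compared by reference with either operator.
const x = { id: 1 };
const y = { id: 1 };
console.log(compare(x, y)); // two distinct objects with the same content
console.log(compare(x, x)); // the very same object
```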

Tip #2

//Instead of

if(a){return a.member;}else{return a;}

//do this, which is shorter:

return a && a.member;
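
A small sketch, with hypothetical function names, showing that the short form returns the same thing as the if/else version for both truthy and falsy inputs:

```javascript
// Long form from the tip.
function memberLong(a) {
  if (a) {
    return a.member;
  } else {
    return a;
  }
}

// Short form: && returns its left operand when that operand is falsy,
// otherwise it returns its right operand.
function memberShort(a) {
  return a && a.member;
}

const inputs = [{ member: 42 }, null, undefined, 0, ""];
for (const input of inputs) {
  console.log(memberLong(input) === memberShort(input));
}
```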

Tip #3

//Use || to set default values
//do this, which requires less typing:

var last=input||nr_items;

//if input is truthy, last is input, otherwise set last to nr_items
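
One caveat worth a sketch: because || tests truthiness, a legitimate falsy input such as 0 is also replaced by the default. The function names below are hypothetical; the second one is an alternative for cases where 0 is a valid value.

```javascript
// Default via ||: any falsy input (undefined, null, 0, "", NaN) falls back to nr_items.
function lastWithOr(input, nr_items) {
  return input || nr_items;
}

// Alternative when 0 must be kept: fall back only if input is undefined.
function lastWithCheck(input, nr_items) {
  return input === undefined ? nr_items : input;
}

console.log(lastWithOr(5, 10));         // input is truthy, so it is kept
console.log(lastWithOr(undefined, 10)); // default is used
console.log(lastWithOr(0, 10));         // 0 is falsy, so the default is used
console.log(lastWithCheck(0, 10));      // 0 is kept
```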

Tip #4

//Statements can have labels. Break statements can refer to labels. Use labels only on do, for, switch, and while.

//do this

loop: for(; ;)
{
//do something
if(…){break loop;}
//do something
}
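
Labels are most useful with nested loops, where a plain break would only leave the inner loop. A minimal sketch, with a hypothetical function name and sample data:

```javascript
// Returns the [row, column] of the first cell equal to target, or null if absent.
function findFirst(matrix, target) {
  let found = null;
  search: for (let i = 0; i < matrix.length; i++) {
    for (let j = 0; j < matrix[i].length; j++) {
      if (matrix[i][j] === target) {
        found = [i, j];
        break search; // leaves both loops at once
      }
    }
  }
  return found;
}

console.log(findFirst([[1, 2], [3, 4]], 3));
console.log(findFirst([[1, 2], [3, 4]], 9));
```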

There are more great tips, but it is better if you assimilate these at your own pace. Time to use literals more often. So,

//it is time to use {} instead of new Object() and [] instead of new Array().

For code conventions for the JavaScript Programming Language, visit
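
Besides being shorter, literals avoid a well-known surprise with the Array constructor, sketched here with hypothetical variable names:

```javascript
// A single numeric argument to the Array constructor sets the length;
// it does not create a one-element array.
const viaConstructor = new Array(3);
const viaLiteral = [3];

console.log(viaConstructor.length); // 3 empty slots
console.log(viaLiteral.length);     // one element, the number 3

// Object literal: properties are declared in place.
const termEntry = { term: "eigenpair", count: 2 };
console.log(termEntry.term, termEntry.count);
```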

http://javascript.crockford.com/code.html

I must agree with him that most JavaScript code on the web is crap.

When Local Relevancy is Irrelevant to Locals

August 10, 2007

When is local relevancy irrelevant to locals? In other words, when is local not important to locals?

That depends on whom you ask. For instance, at times news relevant to a location is not known by locals because of manipulation by media moguls. When globals know more than locals about local news, you know that something is not working right.

(more…)

Minerazzi: What's in a name?

May 22, 2007

I’m building a client-side suite of text mining tools for extracting intelligence from text files, Web pages, and email documents. It comes in four versions: basic, intermediate, advanced, and pro. The basic version provides the following reports:

(more…)

Eigenvectors and Reggaeton Music = Eiggaeton

May 21, 2007

Eigenvectors and eigenvalues come in pairs; that is why we use the term eigenpair. Some have asked me about practical applications of eigenpairs. So here goes this post.

Did you know the connection between eigenvectors and Reggaeton Music (or music in general)? How about eigenvectors and bridges, car designers, speakers, architecture, or oil companies?

(more…)

The New Iteration of Mi Islita

May 15, 2007

Today I uploaded the new iteration of the Mi Islita.com site.

I’ve added or updated the following resource pages:

IR Thoughts Archives - A sample of posts from this blog, powered by a homemade AJAX reader and regexps.

IR Calls - A list of conferences and industry events we recommend you attend.

IR Tutorials - Tutorials on Vector Space and LSI Models, Matrix Algebra, and more.

Educational Links - Graduate theses and research projects referencing Mi Islita.

Marketing Links - Search engine marketing articles referencing Mi Islita.

(more…)

IR Thoughts New Home

April 30, 2007

Welcome to the new home of IR Thoughts, the Mi Islita.com blog about news, papers, and theses relevant to information retrieval, data mining, and search engine technologies. This blog replaces the version over at http://www.miislita.com.

Why a new home for IR Thoughts?

(more…)