
IR Thoughts

~ Thoughts on Information Retrieval, Data Mining, and Search Engines


Category Archives: Human-Computer Interaction

PlaceRaider: A government smartphone spyware?

03 Wednesday Oct 2012

Posted by egarcia in Homeland Security, Human-Computer Interaction


PlaceRaider has been called government spyware for smartphones. Expect copycats soon. Download the PlaceRaider article.

The abstract says:

“As smartphones become more pervasive, they are increasingly targeted by malware. At the same time, each new generation of smartphone features increasingly powerful onboard sensor suites. A new strain of ‘sensor malware’ has been developing that leverages these sensors to steal information from the physical environment; e.g., researchers have recently demonstrated how malware can ‘listen’ for spoken credit card numbers through the microphone, or ‘feel’ keystroke vibrations using the accelerometer. Yet the possibilities of what malware can ‘see’ through a camera have been understudied.”

“This paper introduces a novel ‘visual malware’ called PlaceRaider, which allows remote attackers to engage in remote reconnaissance and what we call ‘virtual theft.’ Through completely opportunistic use of the phone’s camera and other sensors, PlaceRaider constructs rich, three dimensional models of indoor environments. Remote burglars can thus ‘download’ the physical space, study the environment carefully, and steal virtual objects from the environment (such as financial documents, information on computer monitors, and personally identifiable information). Through two human subject studies we demonstrate the effectiveness of using mobile devices as powerful surveillance and virtual theft platforms, and we suggest several possible defenses against visual malware.”

Electronic Drugs and Hackers

24 Tuesday Apr 2012

Posted by egarcia in Hacking, Human-Computer Interaction, Machine Learning


I Doser has been called an addictive electronic drug. It is commonly hyped on social networks but, actually, it is nothing new, just a well-repackaged business.

You can get all kinds of e-drugs, from e-marihuana to e-…anything, just by using earphones. A dangerous mixture if you are driving a car!

Such e-drugs are based on binaural beats, discovered in 1839 by Dove. These are slow modulations perceived when tones of slightly different frequencies are presented separately to each ear. Such auditory beats in the brain can have unexpected results, such as altering consciousness: a virtual LSD?
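To make the setup concrete, here is a minimal sketch, using only Python's standard library, of how a binaural-beat track can be laid out: one pure tone per stereo channel, never mixed together. The function name `binaural_wav`, the 16-bit/44.1 kHz format, and the 0.3 amplitude are assumptions chosen for illustration; this is not how I Doser or any particular product builds its files.

```python
import io
import math
import struct
import wave

def binaural_wav(left_hz, right_hz, seconds=1.0, rate=44100, amplitude=0.3):
    """Return WAV bytes with one pure tone per channel (hypothetical helper).

    The tones are never mixed: each ear receives only its own frequency,
    so any beat at |right_hz - left_hz| arises in the listener's brain.
    """
    n = int(seconds * rate)
    frames = bytearray()
    for i in range(n):
        t = i / rate
        left = amplitude * math.sin(2 * math.pi * left_hz * t)
        right = amplitude * math.sin(2 * math.pi * right_hz * t)
        # 16-bit signed little-endian samples, interleaved left/right
        frames += struct.pack("<hh", int(left * 32767), int(right * 32767))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)   # stereo: left ear, right ear
        w.setsampwidth(2)   # 2 bytes = 16 bits per sample
        w.setframerate(rate)
        w.writeframes(bytes(frames))
    return buf.getvalue()

# Half a second of a 100 Hz tone in the left ear and 110 Hz in the right ear
data = binaural_wav(100, 110, seconds=0.5)
```

Writing to an in-memory buffer keeps the sketch self-contained; saving `data` to a file and playing it through headphones would present the two tones separately, as described above.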

In 1973, Oster reported that humans can detect binaural beats when the carrier tones are below approximately 1000 Hz. According to Lane et al. (see references below):

When two pure auditory signals of similar frequency are mixed together, the phase interference between their waveforms produces a composite signal with a frequency midway between the upper and lower frequencies and an amplitude modulation that occurs with a frequency equal to the difference between the two original frequencies. For example, mixing tones of 100 Hz and 110 Hz yields a signal with a perceived frequency of 105 Hz that rises and falls in amplitude with a frequency of 10 Hz. The amplitude-modulated composite signal is called an auditory beat.

A similar phenomenon occurs when auditory signals of similar frequency are presented separately to the left and right ear through stereo headphones. Although each ear hears only one of the frequencies, the listener perceives the middle frequency and the amplitude modulation, even though the auditory beat does not exist in physical space. This phenomenon, called a “binaural auditory beat,” and described more than 25 years ago (6), is created by the brain’s processing of the two separate auditory signals at the level of the olivary nuclei of the brainstem.
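The arithmetic in the quoted 100 Hz / 110 Hz example follows from the sum-to-product identity sin A + sin B = 2 sin((A+B)/2) cos((A-B)/2): the mixed signal equals a carrier at the mean frequency multiplied by a slowly varying envelope. The short check below, with hypothetical helper names, verifies the identity numerically. Note that the envelope cosine itself runs at 5 Hz, but its magnitude peaks twice per cycle, so the loudness rises and falls at the 10 Hz difference frequency.

```python
import math

def mixed_signal(f1, f2, t):
    """Sum of two unit-amplitude pure tones at f1 and f2 Hz, at time t seconds."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def carrier_and_envelope(f1, f2, t):
    """Sum-to-product form: carrier at the mean frequency, slow cosine envelope."""
    carrier = math.sin(2 * math.pi * (f1 + f2) / 2 * t)
    envelope = 2 * math.cos(2 * math.pi * (f2 - f1) / 2 * t)
    return carrier, envelope

def beat_frequency(f1, f2):
    # |envelope| peaks twice per cosine period, so loudness pulses at |f2 - f1| Hz
    return abs(f2 - f1)

f1, f2 = 100.0, 110.0
for k in range(1000):
    t = k / 8000.0  # sample times spanning 0.125 s
    c, e = carrier_and_envelope(f1, f2, t)
    assert abs(mixed_signal(f1, f2, t) - c * e) < 1e-9
print((f1 + f2) / 2, beat_frequency(f1, f2))  # 105.0 10.0
```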

It was only a matter of time before someone did the 2 + 2 math to make some quick cash; as the Spanish saying goes, hunger met necessity. So now we can see low forms of life looking for an escape from their reality through I Doser.

Hackers may soon be able to misuse these e-brain technologies to cause physical harm. A WMD in the making, or an accident waiting to happen?

References

Binaural Auditory Beats Affect Vigilance Performance and Mood
Auditory Beats in the Brain
Inducing Altered States Auditory and Visual Stimulation
Entraining Tones and Binaural Beats
Research_Frequencies
Audio-Visual Entrainment

Hey, SEOs: On Information Gain, Keyword Wallop, and Relevance

13 Monday Feb 2012

Posted by egarcia in Human-Computer Interaction, Machine Learning, Marketing Research


Which words pack more wallop, are more emphatic, are more beefy or juicy? Whatever you want to call it, if you are an SEO or copywriter, you probably know what I mean.

Well, the answer to such a question depends on what you are trying to accomplish.

According to the family of BM25 algorithms,

http://irthoughts.wordpress.com/2011/08/04/bm25-and-bm25f-implications-to-seo-and-web-design/

a term has more information gain during its first occurrences, especially if these occur earlier in a document. This presumes some kind of relationship between information gain and the position and distribution of words in a document.
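The "first occurrences count most" part can be seen in the standard BM25 term-frequency component. The sketch below assumes the commonly used defaults k1 = 1.2 and b = 0.75 and omits the IDF factor; the function name is illustrative. Plain BM25 does not model term position, so the early-position effect mentioned above is not captured here; only the diminishing gain of each repeated occurrence is.

```python
def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 term-frequency component (IDF factor omitted).

    k1 controls how quickly repeated occurrences saturate;
    b controls how strongly long documents are penalized.
    """
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + norm)

# Marginal contribution of each extra occurrence in an average-length document
scores = [bm25_tf(tf, 100, 100) for tf in range(0, 6)]
gains = [round(later - earlier, 3) for earlier, later in zip(scores, scores[1:])]
print(gains)  # [1.0, 0.375, 0.196, 0.121, 0.082]
```

The first occurrence contributes as much as the next four combined, which is why repeating a keyword (keyword density) buys less and less under this kind of model.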

Journalists and editors understand the concept. That’s why they like to answer the who, what, when, where, why, and how early in their copy, although not necessarily in that order.

And that’s why you see so many press release titles written in a ‘who-what’ way!

That strategy might work with search engines, but if you want to emphasize more specific keywords in a natural way, you probably need a different keyword-positioning strategy, at least if you write in English.

Says who? William Strunk, Jr., in his book The Elements of Style.
Says who else? Joe Carrillo, discussing Strunk; and I quote:

http://josecarilloforum.com/forum/index.php?topic=496.0;prev_next=next

In his original 1918 edition of The Elements of Style (that was long before E. B. White came up with a chapter on style that made him a co-author of the book), William Strunk, Jr. came up with this perplexing prescription in his discussion of the principles of exposition:

“The proper place for the word, or group of words, which the writer desires to make most prominent is usually the end of the sentence…The word or group of words entitled to this position of prominence is usually the logical predicate, that is, the new element in the sentence…”

Strunk gave the following example to illustrate his point:

The modifying phrase at the tail-end of the sentence: “This steel is principally used for making razors, because of its hardness.”

The logical predicate at the tail-end of the sentence: “Because of its hardness, this steel is principally used in making razors.”

And here is the eye-opening point:

For his final words on the subject, however, Strunk made the following provocative—and as I already said, perplexing—prescription:

“The principle that the proper place for what is to be made most prominent is the end applies equally to the words of a sentence, to the sentences of a paragraph, and to the paragraphs of a composition.”

Carrillo’s essay is an excellent one. He later wrote a follow-up post, and I quote:
http://josecarilloforum.com/forum/index.php?topic=627.0

In spoken English, we can emphasize the ideas we want to emphasize by giving them a stronger stress, leveling off our voice when enunciating minor or neutral ones, and downplaying the points that simply don’t support our contention. In writing, however, the process is rarely that simple. We can achieve emphasis only with our choice of words and how we array them into word clusters, into clauses and phrases, and ultimately into sentences and paragraphs. Mechanical devices exist that help, of course, like underlining, boldface type, italics, headlines and subheadlines, and—in today’s savvy word-processing routines—even colors, clip-arts, and emoticons. But as the aspiring writer soon discovers, much of the emphasis we seek has to be built into the very contours of the individual words as they unfold on the page.

There are three basic word-positioning principles we must know for maximum emphasis in writing English sentences: first, the initial and terminal positions of sentences are by nature more emphatic than their middles; second, when we construct a complex sentence, the main clause gets more emphasis than subordinate clauses; and third, when everything is written and done, the last words of the sentence are normally the most emphatic of all. These are structurally inherent in the English language itself, as we will see more clearly when we study them in closer detail.

Carrillo then mentions three important concepts:
1. The initial and terminal positions of sentences are prime.
2. The main clause gets more emphasis than subordinate clauses.
3. The last words of the sentence are normally the most emphatic.

The takeaway

Clearly, all this shows that, although interrelated, information gain, keyword wallop, and relevancy are not the same thing. Relevancy is more along the lines of “aboutness”, “eliteness”, and a few other semantic concepts.

The problem is that there is a relevance perception divide between machines and end-users, a topic we have discussed before. See this link:

http://irthoughts.wordpress.com/2007/06/01/sneak-preview-of-ir-watch-2007-6-issue/

Still thinking in terms of keyword density/spamming crap?

A New Weighting Strategy

27 Tuesday Dec 2011

Posted by egarcia in Data Mining, Human-Computer Interaction, Machine Learning, Marketing Research, Programming, Quack Science, Statistics and Mathematics, Web Mining Course


This morning I received confirmation from the editors of Communications in Statistics: Theory and Methods that they have accepted, and will publish, my peer-reviewed paper on a new model for statistical analysis. It should be out in 2012.

Once it is published, you will understand the SEO (* SEOmoz, I should say) nonsense of computing arithmetic averages of correlation coefficients, and why some meta-analysis studies published in the past (* Hunter-Schmidt; Hedges-Olkin) are flawed and invalid.
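The post does not spell out the Self-Weighting Model, and the sketch below does not implement it. It only illustrates why averaging correlation coefficients is not a trivial operation: r is bounded and not on an additive scale, so the "average" depends on the scale in which the averaging is done. For comparison, the same hypothetical values (equal sample sizes assumed) are averaged both arithmetically and after Fisher's z-transformation, a standard transformation used in meta-analysis, not the author's model.

```python
import math

def arithmetic_mean_r(rs):
    """Plain arithmetic average of correlation coefficients."""
    return sum(rs) / len(rs)

def fisher_z_mean_r(rs):
    """Average in Fisher's z space: z = atanh(r), then back-transform with tanh."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))

rs = [0.2, 0.5, 0.9]  # hypothetical correlations from three equally sized studies
print(round(arithmetic_mean_r(rs), 4))  # 0.5333
print(round(fisher_z_mean_r(rs), 4))
```

The two results differ noticeably for the same inputs, which is the kind of discrepancy that makes a naive arithmetic average of r questionable.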

It took me several meals and research hours to figure it out. I hope that IR researchers, data miners, and colleagues in statistics find new applications for the model.

The model can be applied to many fields, including marketing, business, risk analysis, data mining, signal processing, engineering, clinical trials, and almost any field or knowledge domain that involves the calculation of weighted statistics. I look forward to discussing it online once it gets published.

Happy New Year.

PS. (*) I’ve edited this post to make these points explicit. The issue of arithmetically averaging correlations has now been raised, and killed for good, before the scientific and statistical community.

PS. Just in: Last night (Jan-03-2012) I received news from one of the journal’s editors that the paper was assigned to volume 41, issue 8. Look for its title: The Self-Weighting Model (in Spanish, something like “El Modelo de Autoponderación”). I forgot to mention that this journal is published biweekly, so things are moving fast. What a way of ending 2011 and starting 2012!!!

An Online Crawler for the Masses

11 Monday Apr 2011

Posted by egarcia in Data Mining, Human-Computer Interaction, Machine Learning, Programming, Software


Since we haven’t launched an official blog yet, this post goes…

We are excited to announce several updates to the minerazzi crawler. This is the online version of the indexing crawler used by the minerazzi search engine (beta).

The long-term goal is to turn this version into a multifunctional mining platform and a crawler for the IT masses: a crawler that IR researchers, data miners, webmasters, and developers, and even Web designers and the average public, can use.
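The post does not describe how the minerazzi crawler works internally. As a generic, hypothetical sketch of one step every crawler performs, turning a fetched page into a list of absolute URLs to visit next, the following uses Python's standard `html.parser` and `urllib.parse` modules. The class and function names are illustrative only, and fetching, politeness rules, and robots.txt handling are deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

class LinkExtractor(HTMLParser):
    """Collects absolute, fragment-free http(s) URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, drop "#fragment"
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute.startswith(("http://", "https://")) and absolute not in self.links:
                    self.links.append(absolute)

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links

page = ('<a href="/about">About</a> <a href="https://example.com/x#top">X</a> '
        '<a href="mailto:me@example.com">Mail</a>')
print(extract_links(page, "https://example.com/index.html"))
# ['https://example.com/about', 'https://example.com/x']
```

A full crawler would push these URLs onto a frontier queue, skip ones already visited, and repeat.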

You’re welcome to give it a try; keep in mind that the tool is still in beta. While you are there, feel free to also test the multiple whois domain name tool.

IRW-3-7-2010: Artificial Languages

28 Wednesday Jul 2010

Posted by egarcia in Human-Computer Interaction, Machine Learning, Newsletters



The current issue of IR Watch – The Newsletter should reach subscribers today or, at the latest, tomorrow. IRW reaches research centers in academia and industry. Centers can then forward the newsletter to their members, many of whom elect to have their own subscriptions. That’s a huge reach. And the best part is that IRW is free.

This issue of IRW covers some artificial-language algorithms originally investigated by Claude Shannon in his famous work A Mathematical Theory of Communication.

In that work, Shannon used the frequencies associated with a pool of strings. In his tests, he used the 26 letters of the English alphabet plus the space, and also entire words.

Despite the algorithms’ simplicity, students often have trouble understanding them. In this issue we show how teachers and students can reproduce Shannon’s algorithms. To stay faithful to his experiments, we reproduce his comments and findings.
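For readers who want to experiment before reading the issue, here is a minimal sketch of Shannon-style letter approximations over his 27-symbol alphabet (26 letters plus space): zero order (equiprobable symbols), first order (letter frequencies), and second order (each letter drawn given the previous one). The tiny sample text and the function names are assumptions for illustration; a faithful reproduction would estimate the frequencies from a large body of English text.

```python
import random
import string
from collections import Counter, defaultdict

ALPHABET = string.ascii_uppercase + " "  # 26 letters plus the space

def normalize(text):
    """Keep only A-Z and single spaces; everything else becomes a space."""
    cleaned = "".join(ch if ch in ALPHABET else " " for ch in text.upper())
    return " ".join(cleaned.split())

def zero_order(n, rng):
    """Zero-order approximation: independent, equiprobable symbols."""
    return "".join(rng.choice(ALPHABET) for _ in range(n))

def first_order(text, n, rng):
    """First-order approximation: independent symbols with observed frequencies."""
    counts = Counter(normalize(text))
    symbols, weights = zip(*counts.items())
    return "".join(rng.choices(symbols, weights=weights, k=n))

def second_order(text, n, rng):
    """Digram approximation: each symbol drawn given the previous symbol."""
    corpus = normalize(text)
    followers = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        followers[prev][nxt] += 1
    out = [rng.choice(corpus)]
    while len(out) < n:
        options = followers.get(out[-1])
        if not options:  # dead end: restart from a random corpus symbol
            out.append(rng.choice(corpus))
            continue
        symbols, weights = zip(*options.items())
        out.append(rng.choices(symbols, weights=weights, k=1)[0])
    return "".join(out)

sample = "The quick brown fox jumps over the lazy dog while the cat sleeps."
rng = random.Random(42)  # fixed seed so runs are repeatable
print(zero_order(30, rng))
print(first_order(sample, 30, rng))
print(second_order(sample, 30, rng))
```

Word-level approximations follow the same pattern, with whole words in place of letters as the symbols.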

Enjoy it.

Fractal CSS Design as a Row Primary Technique

06 Wednesday Jan 2010

Posted by egarcia in Human-Computer Interaction


A new article, Fractal CSS Design as a Row Primary Technique, is now available. This one is a follow-up work; its abstract is given below. Enjoy reading it.

“This paper describes fractal CSS design as a row primary technique. It is demonstrated that this approach helps to minimize the user-machine relevance perception mismatch. The impact of design on the parsing of documents is demonstrated. It is shown that replacing table-based layouts with CSS tableless design does not necessarily produce equivalent layouts.”

The Danger of Microsoft: Data Lost

11 Sunday Oct 2009

Posted by egarcia in Human-Computer Interaction


According to a TechCrunch news item from 10-10-09, a crash at Microsoft’s Danger servers resulted in the loss of all users’ personal data, and they don’t have a backup!

The news says:

T-Mobile and Danger, the Microsoft-owned subsidiary that makes the Sidekick, has just announced that they’ve likely lost all user data that was being stored on Microsoft’s servers due to a server failure. That means that any contacts, photos, calendars, or to-do lists that haven’t been locally backed up are gone.

And there is no backup for the data. Really smart, Microsoft people. That says a lot!

This is gonna be in an information security textbook near you. How about in a textbook on Human-Computer No-Interaction?

Search Interfaces and Visual Clues

06 Wednesday Aug 2008

Posted by egarcia in Human-Computer Interaction


This is a continuation of yesterday’s post on Search Interface Usability. This time we want to touch upon visual clues and search interfaces.

Such clues should be obvious; i.e., they should guide users without having to explain anything explicitly. Users often perceive such elements as a friendly environment.

Whoever is in charge of Google’s interface is good at it.

A screenshot of Google’s Book Search results for the query “lsi tutorial” illustrates this:

[Hmm. How convenient a query, you may think, since our tutorials on LSI/SVD are referenced in a book that ranks #1. And you bet you are right :) ]

Nevertheless, back to the post.

Do a search in Google and note that its search interface has great usability clues in the form of anchor text consisting of action terms, such as those listed below.

At the top of the page there is a crumb menu ending with a “more” link, which invites the user to find more options. The arrow next to “more” suggests to the user that it triggers a pull-down menu.

In the far right corner there are two links, “MyLibrary” and “Sign in”, the latter instructing users to sign in, i.e., to register.

This is followed by visual clues like:

“Search Books” in the search button

“Advanced Book Search”

“Google Book Search Help”

“Showing” (next to a form selection menu)

“View all web results for…”

There is also, at the far right, the following action text:

“List view” and “Cover view”

Note that to improve usability we don’t need to reinvent the wheel or mess with what users perceive as a “standard” search interface.


Favorite Sites

  • Mi Islita

Pages

  • About IR Thoughts

Categories

  • AIRWeb Course
  • Conferences
  • Data Mining
  • Dynamics
  • Fractal Geometry
  • Graduate Courses
  • Hacking
  • Homeland Security
  • Human-Computer Interaction
  • Image Compression
  • Internet Engineering
  • IR Quizzes
  • IR Tools
  • IR Tutorials
  • Latent Semantic Indexing
  • Legacy Posts
  • Machine Learning
  • Marketing Research
  • Miscellaneous
  • News
  • Newsletters
  • Programming
  • Quack Science
  • Queries
  • Scripts
  • Search Engines Architecture Course
  • SEO Myths
  • Software
  • Spam
  • Statistics and Mathematics
  • Theses
  • Vector Space Models
  • Web Mining Course

Recent Posts

  • “Powered by” in Spanish
  • Some nice features added to the Image Crawler
  • The Images Crawler
  • A nice service for my locals
  • An update to the Web Crawler
  • New similarity measures
  • The Web Crawler is Back!
  • Tracking Users: An Email Crawler on Steroids
  • The Email Crawler: A Tool for Gathering Emails
  • The Binary Distance Calculator – a tool for comparing binary sets
  • Fractalettes: A Fractal Design Strategy to Color Mining and Learning through Discovery
  • AZZOO and WAZZOO: New Similarity Measures for the 21st Century
  • The Binary Similarity Calculator
  • From Harlem Shake to Link Shake: The Qualified Links Shake
  • Web Vulnerabilities and Search Engines

Archives

  • May 2013
  • April 2013
  • March 2013
  • February 2013
  • January 2013
  • December 2012
  • November 2012
  • October 2012
  • September 2012
  • August 2012
  • July 2012
  • June 2012
  • May 2012
  • April 2012
  • March 2012
  • February 2012
  • January 2012
  • December 2011
  • November 2011
  • October 2011
  • September 2011
  • August 2011
  • July 2011
  • June 2011
  • May 2011
  • April 2011
  • February 2011
  • January 2011
  • December 2010
  • November 2010
  • October 2010
  • September 2010
  • August 2010
  • July 2010
  • June 2010
  • May 2010
  • April 2010
  • March 2010
  • February 2010
  • January 2010
  • December 2009
  • November 2009
  • October 2009
  • September 2009
  • August 2009
  • July 2009
  • June 2009
  • May 2009
  • April 2009
  • March 2009
  • February 2009
  • January 2009
  • December 2008
  • November 2008
  • October 2008
  • September 2008
  • August 2008
  • July 2008
  • June 2008
  • May 2008
  • April 2008
  • March 2008
  • February 2008
  • January 2008
  • December 2007
  • November 2007
  • October 2007
  • September 2007
  • August 2007
  • July 2007
  • June 2007
  • May 2007
  • April 2007


Blog at WordPress.com. Theme: Chateau by Ignacio Ricci.