IR Thoughts

~ Thoughts on Information Retrieval, Data Mining, and Search Engines

Category Archives: Miscellaneous

“Powered by” in Spanish

14 Tuesday May 2013

Posted by egarcia in Marketing Research, Miscellaneous

The Problem:

When it comes to properly expressing “powered by” on Spanish web pages, many Spanish-speaking users don’t seem to agree on how to say it, as can be seen from the following links:

http://www.spanishdict.com/answers/144054/how-do-you-say-powered-by-in-espanol

http://forum.wordreference.com/showthread.php?t=329580

Not even Google and Microsoft seem to have found a sound way of expressing it:

http://es.bab.la/diccionario/ingles-espanol/powered-by

I realized that this is an even worse problem among those Hispanics and second-generation Latinos in the U.S. who are too “americanized” (if that is a valid term).

The Solution:

When we have problems finding terms with equivalent meanings in different languages, the best we can do is to stop forcing translations and start describing what we want to say. This is a kind of descriptive strategy.

For instance, according to http://arl-shipzine.com/issue-2-powered-by , “powered by” implies the notion of a service being “provided by”.

So, following the descriptive strategy, “provisto por” is a more appropriate option than “impulsado por”, “potenciado por”, “generado por”, “auspiciado por”, “producido por” and other expressions that sound a bit ridiculous for the context in question. Of course, that’s just my opinion and you don’t have to agree with me.

“Impulsado por”…really? Are you launching a rocket?

“Potenciado por”…really? Are you a battery or power supply?

“Generado por”…really? Are you an electric generator?

“Auspiciado por”…really? Are you sponsoring something?

“Producido por”…really? Are you in a production business?

I know, I know. I’m being sarcastic. Me bad.

What is important to point out is that the above alternatives are subject to misinterpretation, while “a product or service provided by”, or “un producto o servicio provisto por”, has only one meaning: a product or service provided by someone, in a b2c (business-to-consumer) or b2b (business-to-business) context.

At first glance, the above seems trivial, but it is not. You would be surprised to see the faces of those Latinos who read web content and creatives translated by SEO companies that have no knowledge of Spanish or that use automatic translators. Bad translations can ruin any marketing, press release, or link-building campaign.

A nice service for my locals

08 Monday Apr 2013

Posted by egarcia in Data Mining, IR Tools, Miscellaneous, News, Programming, Software

Puerto Rico Daily News & Image Searches. Driving traffic to Puerto Rico’s best media sites. The fastest way to find news and images relevant to Puerto Rico. Coming soon to http://www.miislita.com

I think this can be applied to many knowledge domains without making the same mistakes from similar services across the Web.  For now, baby steps.

Why is time spent on a site so important?

13 Tuesday Nov 2012

Posted by egarcia in IR Tools, Marketing Research, Miscellaneous, Software

That is a recurrent question being asked by some of my readers. Here is my answer.

Back in 1995, I wrote in the Dedication section of my doctoral thesis:

“If I have a theory, but no experimental results, I may have nothing. And if I have a theory without practical applications, I may have an artifact.”

So, don’t give your visitors hearsay, half-lies, or misrepresentations of facts found across the Web. Give them things that they can really test and use, and that solve a real or urgent problem for them. Don’t waste your time repeating concepts that are interesting, perhaps catchy, but at the end of the day just useless.

In addition to textual and audiovisual content of good quality, give them TOOLS. However, provide tools that make them spend more time interacting with your site and that authoritative pages will recommend or link to.

This is important because the amount of time users spend on a site is directly related to several web metrics/analytics, such as:

  • frequency cap – restriction on the amount of time a specific visitor is shown a particular advertisement.
  • stickiness – the amount of time spent at a site over a given time period.
  • underdelivery – delivery of fewer impressions, visitors, or conversions than contracted for a specified period of time.
  • unique visitors – individuals who have visited a site (or network) at least once during a fixed time frame.
  • bandwidth – how much data (e.g., content, ads, creatives) can be transmitted in a time period over a communication channel, often expressed in kilobits per second (kbps). Data is any alphanumeric content. This includes parameters, variables or any text/pixel-based creative.

Other time-based metrics inherited from traditional media (TV, radio) and that are based on the time spent by users viewing a communication channel can be applied to web channels and sites; among others:

  • average audience – the average number of people who tuned in during the selected time period, expressed in thousands or as a percentage (also known as a Rating) of the total potential audience of the selected demographic. It is also known as a T.A.R.P. (Targeted Audience Rating Point).
  • channel share – the share one channel has of all viewing for a particular time period. The share, expressed as a percentage, is calculated by dividing the channel’s average audience by the average audience of all channels (PUTs) (It is held in higher esteem by networks than media buyers on a day to day basis and is only referred to by the latter group when apportioning budgets and evaluating a programme for sponsorship).
  • cumulative audience or reach – the total number of different people within the selected demographic who tuned into the selected time period for 8 minutes or more (i.e., reached at least once by a specific schedule or advertisement).
  • frequency – the average number of times that a person within the target audience has had the opportunity to see an advertisement over the campaign period.
  • time spent viewing or TSV – how many minutes/hours an audience has viewed a particular channel.

[Sources: WebSiteMagazine, WebMediaSolutions, NielsenMedia].
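
As a rough illustration of the channel share definition above, here is a minimal Python sketch. The function name and the audience figures are hypothetical, made up only for this example; they do not come from any of the cited sources.

```python
def channel_share(channel_avg_audience, all_channels_avg_audience):
    """Channel share (%) = channel's average audience divided by
    the average audience of all channels, times 100."""
    if all_channels_avg_audience <= 0:
        raise ValueError("average audience of all channels must be positive")
    return 100.0 * channel_avg_audience / all_channels_avg_audience


# Hypothetical figures, in thousands of viewers, for one time period.
share = channel_share(250, 1000)
print(f"Channel share: {share:.1f}%")
```

The same ratio could be applied to a web channel by replacing viewers with visitors active during the selected time period.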

So, any tool that helps your visitors to wisely improve their time spent on your site -in an effective manner, of course- cannot hurt you. For this to be true, however, the tool provided must be engaging, useful, effortless, and with a minimum learning curve; otherwise the user experience of your visitors can be frustrating and a waste of time.

Social Networks: New Player in PR Political Debate

20 Monday Aug 2012

Posted by egarcia in Miscellaneous

Yesterday, Puerto Rico’s constitutional amendment referendum was won by the people thanks to the power of intelligent people and the use of social networks, beating money-driven efforts by the current government.

Ha, Ha. More power to the people.

What is the most effective way of writing scientific research articles?

15 Friday Jun 2012

Posted by egarcia in Data Mining, Graduate Courses, Miscellaneous, Newsletters, Theses

Over the years, I’ve been asked about the most effective way of writing peer-reviewed articles for scientific journals.

My response is always the same: Think like a referee/editor. Here is a list of items that they want to see accomplished:

Referees/editors like to see that the content and format of the title, abstract, document body, tables, images, graphics, appendices, and references follow their journal guidelines.

In general, referees/editors like to see on the first page of the printed version of an article:

1. Statement of the problem – what is the problem to be solved.
2. Purpose of the article – how the present research solves the problem.
3. Organization – how the article is organized and what is covered in each section.

This is a general practice across scientific journals. So, whenever possible, I try to accomplish 1 – 3 in the first three paragraphs of the first page of the printed article. To do this, you need to avoid lengthy introductions and wordiness. Be concise and ‘get to the point’.

Referees/editors also like to see the article as a whole semantic unit. So they like to see:

1. Transitional statements; i.e., sections ending as an introduction to the next section.
2. One paragraph, one idea; i.e., each paragraph discussing one main idea.
3. Short paragraphs; i.e., each paragraph of about five sentences or fewer, where sentences are of appropriate length. This provides a natural stop to the reading. In general, short paragraphs and sentences are easier to read than long ones. Use compound sentences with caution.
4. Facts supported by pertinent references.
5. Opinions written as opinions, not as facts.

Of course, there are other tips to think about, but in my opinion, the above can make a difference… well, in my opinion :)

Accessing Wikipedia Today

18 Wednesday Jan 2012

Posted by egarcia in Miscellaneous

I have no problem accessing and navigating the Spanish version of Wikipedia (http://es.wikipedia.org/wiki/Wikipedia:Portada ). Ha. So much for their Internet “block” Dark Day.

 

Intute: Another Scholarly Search Project Closing Operations

31 Tuesday May 2011

Posted by egarcia in Miscellaneous

The fate of any good technology idea disconnected from a business/revenue model is not promising. This is true for commercial and academic projects, or for any project for that matter. Sooner or later, even grant-funded projects will have their reality-check day. Consider the case of Intute, which will be closing by July 2011; i.e., in about a month.

According to their FAQs page at http://www.intute.ac.uk/faq.html,

Why have JISC made this decision?

As stated in the JISC statement about the Intute review, when services “reach the end of their existing funding cycle it is always intended, wherever possible, that they move from being fully funded to being part-funded or fully sustained by other sources”. Unfortunately in the current economic climate no realistic alternative funding model for Intute as it currently stands has been identified.

However, we are working to ensure that the legacy of Intute lives on, and we are working with other organisations in the sector to find a new home for Intute content.

Why can’t Intute continue without JISC funding?

Over the last three years, we have investigated alternative funding models for Intute, including alternative grant funding, subscription and advertising/sponsorship, and we have spoken to librarians, academics and students to find out what they think. Unfortunately, we have been unable to find a model that will sustain Intute in its current form into the future.

and

Can you open up Intute for community updating and contributions? This model may be a better fit now with the rise of social /community web 2.0 ways of working.

We have looked at the possibility of facilitating a community generated resource catalogue, and investigated exporting all of our resources to Delicious. However, in December 2010 reports circulated that Yahoo will be shutting down Delicious. With Intute funding ending in July 2011, the uncertainty surrounding Delicious means that further investigations are unlikely.

Is there any way to save Intute? What about an internet fundraising drive or trying to raise funds from institutions, foundations or advertising?

In principle – maybe, but in practice we have investigated alternative funding models for Intute, including alternative grant funding, subscription and advertising/sponsorship, and we have been unable to find a model that will sustain Intute in its current form into the future. Intute as it stands costs over 1 million a year to run excluding the contributions associated with housing staff at our different partner institutions.

 Intute was created by a consortium of seven universities, working together with a whole host of partners.

The Intute consortium was:

  • University of Birmingham
  • University of Bristol
  • Heriot-Watt University
  • The University of Manchester
  • Manchester Metropolitan University
  • University of Nottingham
  • University of Oxford

Amazing that with so much human resources talent their fate is as described above.

Are we near the end of hardcopy scholarly journals?

14 Monday Feb 2011

Posted by egarcia in Marketing Research, Miscellaneous

According to Lang (2010), we could ask the question whether hardcopy scholarly journals are near the end.  I know, I know. This is kind of an elephant in the middle of a room. 

Lang raises the question based on the following bullet points:

1. Forty-page Articles Are Dead.
2. Survey Articles Are Dead.
3. Journal Issues Are Dead.
4. Page Numbers Are Dead.
5. Copy Editing Is Dead.
6. Peer Reviewing Might Be Dying Too.
7. The Article as a Unit of Publication Is Dead.

Lang then concludes with a question and call to action.

A New Beginning for Scholarly Publishing?

“So let’s abandon all the 20th-century baggage of traditional journals, and move to a more rational model for scholarly publication, with no copy editors, no reviewers, no redundancy, and no unnecessary delays. A concrete step would be to give each ACL member a DOI for a unipaper, and then ask them to non-redundantly populate this with a sequence, or a tree, of numbered paragraphs that consolidate all their work on a topic. Then, to get things moving, the present journal could insist that some proportion of citations be to paragraphs within these unipapers, with hyperlinks embedded right there in the citations. What are we waiting for?”

Feel free to take issue with any of the above points.

My opinion? Lang has very good arguments. However, I would say that due to the changing times (read: smartphones, tablets, blogs, social networks, etc.) many hardcopy scholarly journals are actually evolving, while the weaker ones, or those unfit for change, are dying off, a natural phenomenon observed in online ecosystems. This is not unique to scholarly journals. The same is true for any hardcopy journal, newspaper, magazine, or newsletter.

With more retailers giving discounts and even freebies just for showing a tweet about their products or services at their store, who knows what will be the fate of  flyers, coupons, etc.

Publishers that don’t adjust their business models to the changing times are doomed to become the next LPs, 8-tracks, cassette tapes, etc.

Lang, N. (2010) Are We Near the End of the Journal. Computational Linguistics Volume 36, Number 4.  Retrieved from http://www.mitpressjournals.org/doi/pdf/10.1162/coli_a_00019  

Ho, Ho, Ho: My Tips as Gifts

22 Tuesday Dec 2009

Posted by egarcia in Marketing Research, Miscellaneous

Ho, Ho, Ho: My Tips as Gifts. Santa’s here.

Here are my holiday gifts to you in the form of useful tips.

Headers

Some search engines weight texts in strong, header, and anchor tags. So instead of this:

<h1>keyword 1</h1>

try this:

<h1><strong>keyword 1</strong></h1>

or this:

<h1><a …><strong>keyword 1</strong></a></h1>

CSS-style them to your heart’s content. Try with several keywords and headers (h1, h2, h3, etc).

Element Size Rendering

On most browsers, medium size text is 16 pixels which is rendered as 1em when body is 100%.

Thus, for any body font-size %, X px*(1em/16px)*(body %) = Y em.
When body font-size is set to 100%, converting px into ems reduces to dividing by 16. If no nested em units are used, then

20px*(1em/16px) = 1.25em, rendered as 20px
19px*(1em/16px) = 1.1875em, rendered as 19px
18px*(1em/16px) = 1.125em, rendered as 18px
17px*(1em/16px) = 1.0625em, rendered as 17px
16px*(1em/16px) = 1em, rendered as 16px
15px*(1em/16px) = 0.9375em, rendered as 15px
14px*(1em/16px) = 0.875em, rendered as 14px
13px*(1em/16px) = 0.8125em, rendered as 13px
12px*(1em/16px) = 0.75em, rendered as 12px
11px*(1em/16px) = 0.6875em, rendered as 11px
8px*(1em/16px) = 0.5em, rendered as 8px
1px*(1em/16px) = 0.0625em, rendered as 1px

At this body %, the minimum size is 1px = 0.0625em. To render 1px as a smaller em value, lower the body font-size %; e.g., if it is 80%

1px*(1em/16px)*0.80 = 0.05em
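
The conversions above can be checked with a small Python sketch. It assumes a 16px browser default and no nested ems, as stated in the text; the function names are hypothetical. Note that `px_to_em` returns the em value needed to keep a given pixel size at a given body %, so it divides by the body fraction, as in the note at the end of this post. `rendered_px` goes the other way and shows how a fixed em value shrinks when the body % is lowered.

```python
BROWSER_DEFAULT_PX = 16.0  # assumed "medium" text size on most browsers


def px_to_em(px, body_percent=100.0):
    """Em value that renders as `px` pixels on a direct child of body."""
    one_em_in_px = BROWSER_DEFAULT_PX * body_percent / 100.0
    return px / one_em_in_px


def rendered_px(em, body_percent=100.0):
    """Pixel size produced by `em` on a direct child of body."""
    return em * BROWSER_DEFAULT_PX * body_percent / 100.0


for px in (20, 17, 12, 1):
    print(f"{px}px = {px_to_em(px):g}em at body 100%")

# Keeping 1px as 1px at body 80%, and the 62.5% "divide by 10" shortcut.
print(f"1px = {px_to_em(1, 80):g}em at body 80%")
print(f"12px = {px_to_em(12, 62.5):g}em at body 62.5%")

# The same 0.0625em renders smaller once body is lowered to 80%.
print(f"0.0625em renders as {rendered_px(0.0625, 80):g}px at body 80%")
```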

Nesting elements with ems can also do the trick (*Please, see note).

BTW, watch out for nested ems, as you would need to account for these. For instance, for body 100% all of these are rendered at the same size:

<pre><p style="font-size:1em;">edel</p>
<p style="font-size:16px;">edel</p>
<p style="font-size:2em;"><span style="font-size:0.5em;">edel</span></p></pre>
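
To see why the three lines above come out the same size, here is a simplified Python sketch of how nested font sizes resolve. It is only a model under stated assumptions: body at 100% (16px), only px and em units, and a hypothetical helper name. It is not how any browser is actually implemented.

```python
def resolve_font_size(font_sizes, body_px=16.0):
    """Resolve a chain of nested font-size values (outermost first) to pixels."""
    size = body_px
    for value in font_sizes:
        if value.endswith("px"):
            size = float(value[:-2])   # absolute: ignores the parent size
        elif value.endswith("em"):
            size *= float(value[:-2])  # relative: scales the parent size
        else:
            raise ValueError(f"unsupported unit in {value!r}")
    return size


print(resolve_font_size(["1em"]))           # first paragraph
print(resolve_font_size(["16px"]))          # second paragraph
print(resolve_font_size(["2em", "0.5em"]))  # span nested inside the third
```

In this model each em multiplies the size inherited from its parent, so 2em followed by 0.5em cancels out and lands back at the body size.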

Under the Counter Prescription Medication

Extra Strength Tylenol PM contains  500 mg acetaminophen (pain reliever) plus 25 mg diphenhydramine HCl (antihistamine, nighttime sleep aid).

Regular Tylenol contains 325 mg acetaminophen

Regular Benadryl Allergy contains 25 mg diphenhydramine HCl

Which one would you take and when?

PS. I modified/corrected  some lines.

* Conversely, to retain 1px exactly as 1px at the 80% level, use 1px*(1em/16px)/0.80 = 0.078125em.

* Some authors like to set body to 62.5% and then simply divide by 10 all pixels to get ems which is easier to remember. Your choice. I prefer the 100% mark across all  browsers.

*The pre tags above are there to ensure rendering in the post and are not needed in the actual HTML code. I thought this was obvious.

*You can also try

<h1><strong><a …>keyword</a></strong></h1>.

The above nesting techniques can also  be tried with in-page navigation (jump links) and even with img tags. If you care about W3C conformance, validate your code before putting to use  any nesting technique.

Random Notes Before School Starts

03 Monday Aug 2009

Posted by egarcia in Latent Semantic Indexing, Miscellaneous, Vector Space Models

1. The current issue of IR Watch will be out over the weekend–a bit delayed due to getting ready for school, preparing lessons and research projects. If things go as expected, my academic schedule will be a bit busy between teaching and research at two different universities.

2. I’m researching for a manuscript that deals with affine transformations applied to several IR problems. It expands on Vector Space Theory and allows one to think out of the “term-document” box. Great stuff.

3. Here is a great grad project in ppt format: Semantically Motivated Information Retrieval. I thank its author for referencing my  SVD Fast Track Tutorial.

4. Talking about semantically motivated, sentiment analysis, spam, etc… Funny how some folks in the SEO world like to damage the reputation of others without presenting any evidence. This time the trolls took on Kim Krause Berge ( http://cre8pc.com/archives/1489 ). I have always admired Kim’s work, consider her a usability icon, and had the privilege of meeting her back in 2005. I was surprised to see these folks having a field day at her expense at Rand’s site. Kim, I feel your pain. However, more than one SEO forum/blog has lost credibility by allowing these folks, most of whom think they can be socially “ranked” by attacking whoever is at the “top”. The fact is that most trolls are paper tigers that go into hiding at the first Cease & Desist or defamation lawsuit.
