IR Thoughts

~ Thoughts on Information Retrieval, Data Mining, and Search Engines

Category Archives: Homeland Security

Web Vulnerabilities and Search Engines

04 Monday Mar 2013

Posted by egarcia in Hacking, Homeland Security

Interesting finding:

Vulnerability scans via search engines. Includes Google scans and Bing reflections.

PlaceRaider: A government smartphone spyware?

03 Wednesday Oct 2012

Posted by egarcia in Homeland Security, Human-Computer Interaction

PlaceRaider has been called a government spyware for smartphones. Expect copycats soon. Download the PlaceRaider article.

The abstract says:

“As smartphones become more pervasive, they are increasingly targeted by malware. At the same time, each new generation of smartphone features increasingly powerful onboard sensor suites. A new strain of ‘sensor malware’ has been developing that leverages these sensors to steal information from the physical environment; e.g., researchers have recently demonstrated how malware can ‘listen’ for spoken credit card numbers through the microphone, or ‘feel’ keystroke vibrations using the accelerometer. Yet the possibilities of what malware can ‘see’ through a camera have been understudied.”

“This paper introduces a novel ‘visual malware’ called PlaceRaider, which allows remote attackers to engage in remote reconnaissance and what we call ‘virtual theft.’ Through completely opportunistic use of the phone’s camera and other sensors, PlaceRaider constructs rich, three dimensional models of indoor environments. Remote burglars can thus ‘download’ the physical space, study the environment carefully, and steal virtual objects from the environment (such as financial documents, information on computer monitors, and personally identifiable information). Through two human subject studies we demonstrate the effectiveness of using mobile devices as powerful surveillance and virtual theft platforms, and we suggest several possible defenses against visual malware.”

Minerazzi Crawler and Whois Updates: Email Addresses, Reverse DNS, IPv4 Mapping, Navigation

11 Monday Jul 2011

Posted by egarcia in Data Mining, Homeland Security, IR Quizzes, Machine Learning, Programming, Software

We keep improving the Minerazzi site (http://www.minerazzi.com). We moved all pages to PHP. In addition, here are the recent changelog entries for the Web Crawler (http://www.minerazzi.com/labs/crawlinker.php):

07-05-11: Email address extraction, deduplication, and sorting capabilities added.
07-04-11: Design and copy changes.
07-03-11: Navigation menu restored and bug fixed.
07-03-11: Navigation menu removed to test bug.
07-02-11: Top-bottom quick navigation menu added.
07-02-11: Day/Time Stamp, Reverse DNS, and IPv4 List capabilities added.
07-02-11: Integration to Whois Tool.
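
As an illustration of the email extraction, deduplication, and sorting step listed above, here is a minimal Python sketch. It is not the crawler's actual code: the function name extract_emails and the simplified regular expression are assumptions made for this example, and the pattern does not cover every valid address form.

```python
import re

# Simplified pattern chosen for illustration only (an assumption);
# it will miss some valid addresses and accept some invalid ones.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the unique email addresses found in text, lowercased and sorted."""
    found = EMAIL_PATTERN.findall(text)                # extraction
    unique = {address.lower() for address in found}    # deduplication
    return sorted(unique)                              # sorting


sample = "Contact bob@example.com or Alice@Example.org; bob@example.com is preferred."
print(extract_emails(sample))
```

Lowercasing before deduplication treats Alice@Example.org and alice@example.org as the same address, which is a design choice of this sketch rather than something the changelog states.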

The Whois Database Retriever (http://www.minerazzi.com/labs/whois.php) now features suffix/prefix stripping capabilities. This means that users only need to enter a candidate domain name without any alias or extension and the tool scans multiple registrar databases. We expect to add some additional features to this time-saving application.
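
A rough idea of what suffix/prefix stripping might look like is sketched below in Python. This is a guess at the general technique, not the Whois tool's implementation: the function name strip_domain is hypothetical, only the "www." alias is handled, and only the last extension is dropped, so multi-part extensions such as ".co.uk" are not handled correctly.

```python
def strip_domain(name):
    """Reduce a URL or host name to a bare candidate domain name.

    Hypothetical helper for illustration; it strips the scheme, any path,
    a leading "www." alias, and the final extension only.
    """
    name = name.strip().lower()
    if "://" in name:                  # drop a scheme such as http://
        name = name.split("://", 1)[1]
    name = name.split("/", 1)[0]       # drop any path after the host
    if name.startswith("www."):        # drop the common "www." alias
        name = name[4:]
    if "." in name:                    # drop the last extension, e.g. ".com"
        name = name.rsplit(".", 1)[0]
    return name


print(strip_domain("http://www.minerazzi.com/labs/whois.php"))
```

A bare candidate obtained this way could then be combined with each extension of interest before querying the registrar databases.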

In the meantime, we keep beta testing the engine. Our staff of ‘miners’ are doing just a great job.

Hackers in your Cell Phone, Car, and TV

09 Tuesday Mar 2010

Posted by egarcia in Hacking, Homeland Security

≈ 1 Comment

Thanks to the Internet, hackers are -or soon will be- invading your cell phones, car, and TV.

Cell Phones:

The Energizer DUO Trojan: What You Need to Know, reports that the Energizer USB charger has been infected with a nasty Trojan.

Cars:

Ford Motor Rolls Out New Security Features To Prevent Car-Hacking, reports that Ford is taking steps to prevent hackers from literally car-jacking your vehicle.

TV:

Google, DISH Network in Set-top Tests, reports that Google is moving to provide search services through your TV. With TVs featuring Internet widgets and similar technologies about to hit the market, your TV sessions will soon be subject to hacking.

So, very soon: hackers, spammers, and marketers in your car, phone, and TV.

To secure a job, get certified in Internet Security related technologies. Or how about Multimedia Search Marketing (MSM)? That’s a great new acronym to think about.

Stop the Press: RSA Encryption Cracked!

03 Wednesday Mar 2010

Posted by egarcia in Hacking, Homeland Security

According to this news:

Researchers have found that by playing with the voltage on a device, it is possible to crack the popular RSA encryption keys. Hackers are having a field day with this research.

The article says:

“Researchers at the University of Michigan say they have uncovered a way to circumvent encryption used on many devices.

The research is the work of Valeria Bertacco, Todd Austin and Andrea Pellegrini. According to their paper, entitled ‘Fault-Based Attack of RSA Authentication’ (PDF), the trio demonstrated a way to beat the popular encryption method, which is used in media players, laptop computers, smartphones and other devices. It is also used by retailers to secure customer information online.

The researchers found that by varying the voltage on a device it was possible to get their hands on the ‘private key’ needed to beat the security feature. Using what they described as an inexpensive device specially-built for the experiment, the trio manipulated the voltage and caused the computer to make small mistakes in its communications with other clients. This ultimately revealed small pieces of the private key, which they eventually used to reconstruct the key offline.”

What is Scraping good for?

13 Wednesday Jan 2010

Posted by egarcia in Data Mining, Hacking, Homeland Security, Newsletters

≈ 2 Comments

The current issue of IRW features Web Scraping as a vehicle for conducting Web Mining.

As mentioned in the newsletter, there are many things that can be done with scrapers. For instance, below is a comparison of the number of script tags (<script …>…</script>) and link tags (<link … />) declared in several index pages and extracted with two scrapers mentioned in the IRW article: the Script and Link Tag Scrapers. As expected, pages with a lot of content tend to have more scripts.

Search Engines            Script Tags   Link Tags
Yahoo.com *               15            2
Bing.com                  12            1
Ask.com                   10            0
Google.com                4             0
Gigablast.com             1             0

Socially-oriented Sites   Script Tags   Link Tags
Searchenginewatch.com     38            5
Twitter.com               9             3
Seomoz.org                7             13
Facebook.com              5             4
Wikipedia.com **          1             6

* At the time of the analysis, Yahoo.com redirected to the m.yahoo.com alias, but the same results were obtained.

** Wikipedia.org and Wikipedia.com return the same results.
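
Counts like the ones in the tables above can be approximated with a few lines of Python. The sketch below is not the Script and Link Tag Scrapers from the IRW article; the function name count_tags, the two patterns, and the sample page are assumptions for illustration. It matches opening tags with regular expressions, which is a shortcut rather than a full HTML parser, so tags inside comments or strings would also be counted.

```python
import re

# Match opening <script ...> and <link ...> tags, case-insensitively.
# Closing tags such as </script> start with "</" and are not matched.
SCRIPT_TAG = re.compile(r"<script\b[^>]*>", re.IGNORECASE)
LINK_TAG = re.compile(r"<link\b[^>]*>", re.IGNORECASE)


def count_tags(html):
    """Count opening script and link tags in a page's source code."""
    return {
        "script": len(SCRIPT_TAG.findall(html)),
        "link": len(LINK_TAG.findall(html)),
    }


# Made-up page source used only to exercise the function.
sample_html = """
<html><head>
<link rel="stylesheet" href="a.css" />
<script src="a.js"></script>
<script>var x = 1;</script>
</head><body></body></html>
"""
print(count_tags(sample_html))
```

To reproduce a comparison like the one above, the source of each index page would first have to be downloaded and then passed to count_tags.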

On the other hand, Web Scraping can unveil potential Web vulnerabilities in an architecture, so there is a positive side to the story.

In the right hands, scrapers can do great things. In the wrong ones, they can be a nightmare.

Unfortunately, hackers know well that scrapers can be embedded into malware to get their hands on source code. Ask victims of such scrapers, like Google and other companies (http://www.wired.com/threatlevel/2010/01/google-hack-attack/).

Besides legal issues and an unfriendly landscape (censorship), it appears Google got tired of Chinese hackers picking on it, so it is pulling out of China, or threatening to do so.

http://online.wsj.com/article/SB10001424052748704362004575000440265987982.html?mod=rss_Today’s_Most_Popular

Beaten at their own game: brain power.

IRW:Web Scraping – Web Mining with Scrapers

12 Tuesday Jan 2010

Posted by egarcia in Data Mining, Homeland Security, Marketing Research, Newsletters, Software

The current issue of IR Watch – The Newsletter is out. The featured article’s abstract follows.

“Web Mining is a subfield of Data Mining where patterns are derived from the Web. If scraping tools are used for Web Mining this is referred to as Web Scraping (WS).

A scraper is a program designed to extract information from online documents. Scrapers work by matching document source codes against regular expression libraries.

WS is widely used, in part due to the rising popularity of scripting technologies like Asynchronous JavaScript and XML (AJAX), which allows users to retrieve source codes and manipulate the Document Object Model (DOM). WS is a form of Information Extraction where tools, not necessarily scrapers, and repositories, not necessarily the Web are used.

For the last 10 years we have been developing scrapers to simplify the collection and analysis of intelligence from the Web or local machines. For the last 4 years these were slowly converted to AJAX. In this issue of the newsletter, we want to share with readers our experience using several scrapers.”

Global Terrorism Database

03 Tuesday Nov 2009

Posted by egarcia in Data Mining, Homeland Security

If you are into homeland security oriented data mining, this post is for you.

The University of Maryland has a Global Terrorism Database (GTD; http://www.start.umd.edu/gtd/) with information on over 80,000 terrorist attacks that intelligence researchers can tap into.

GTD is an open-source database including information on terrorist events around the world from 1970 through 2007 (with annual updates planned for the future). Unlike many other event databases, the GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 80,000 cases.

You can search by keywords or browse by region, country, perpetrator, weapon, attack, or target.

It also has advanced search capabilities. To perform an advanced search, select the options you wish to search within each category. If you do not check any options in a category, your search will include all content from that category. For example, selecting Algeria from the “Country” list will restrict your search to incidents in Algeria, while leaving it blank searches all countries.

Incident searches can be restricted to specific years using several pull-down menus.

I tested by querying [puerto rico] and indeed was able to obtain incident records related to Los Macheteros. The answer set, however, included results not relevant to the Island of Puerto Rico.

The database is pretty small, but can come in handy at times. I will definitely use it for one of my next graduate courses on search engine architectures.

Hacking the Cloud: Getting Google’s User Data by Hacking Twitter

17 Friday Jul 2009

Posted by egarcia in Hacking, Homeland Security

A day ago Michael Arrington’s TechCrunch published excerpts from “leaked” documents stolen from the Google Apps account of a Twitter employee, which included over 300 confidential files meant for “internal” Twitter consumption. “Hacker Croll” sent TechCrunch a zip file with 310 private files from inside Twitter.
(http://www.techtree.com/India/News/Leaked_Documents_Twitter_TechCrunch_Faceoff/551-104503-643.html).

It appears HC essentially used a cracker tool of some sort to brute-guess weak passwords. Once inside the first security ring, …

Cloud Programs: A Web Vulnerability Paradise for Hackers

Twitter relies heavily on cloud-based apps (Web-centric programs such as Google Docs or Web-based e-mail), and these services are becoming increasingly interconnected. Even social Web apps are beginning to share data: Facebook Connect and Google Friend Connect, for example, let you log in to multiple sites with a simple Facebook or Google account, raising the vulnerability of your entire online identity.
(http://www.switched.com/2009/07/17/twitter-employee-accounts-hacked-business-documents-leaked/)

The documents released by the hacker seem to be pretty significant. The “problem” is that if you have a Google Apps email account compromised, your shared Calendar, Docs, Contacts, Wikis (Sites), etc. are compromised as well.
(http://www.pcworld.com/article/168572/google_apps_security_questioned_after_twitter_leak.html)

This might be a good case study for students planning to take the AIR Web: Web Spam and Internet Vulnerability course.

IRW-2009-6:Hackers: Taxonomy & Writing Styles

01 Monday Jun 2009

Posted by egarcia in Hacking, Homeland Security, Newsletters

The current issue of IRW should reach subscribers’ inboxes during the day or, at the latest, tomorrow.

In this issue:

  • Featured article: Hackers: Taxonomy and Writing Styles
    Due to the increasing interest in developing Information Retrieval and Data Mining courses at the intersection of Information Security, this issue of the newsletter covers a brief taxonomy on hackers and their writing styles.
  • QA: Excel Matrix Multiplications: How to convert a term-document occurrence matrix into a term-term or document-document co-occurrence matrix?
  • Vacuum Tubes & Transistors Historical
  • Who is Who in IR: Thomas K. Landauer
  • Top CS Departments: Dartmouth College
  • Outstanding Graduate Theses
  • Calls and Events
  • IR Blogs
  • and more…
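
For readers curious about the matrix question in the QA item above, a common approach (not necessarily the answer given in the newsletter) is to multiply the term-document matrix A by its transpose: A·Aᵀ gives a term-term matrix and Aᵀ·A gives a document-document matrix. In Excel this can be done with the MMULT and TRANSPOSE functions. The Python sketch below uses a small made-up matrix with terms as rows and documents as columns; the helper names transpose and matmul are assumptions for illustration.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


# Made-up occurrence matrix: 2 terms (rows) by 3 documents (columns).
A = [
    [1, 0, 1],
    [1, 1, 0],
]

term_term = matmul(A, transpose(A))   # 2 x 2: A times A-transpose
doc_doc = matmul(transpose(A), A)     # 3 x 3: A-transpose times A

print(term_term)
print(doc_doc)
```

With a binary occurrence matrix, each off-diagonal entry of term_term counts the documents in which both terms occur, and each off-diagonal entry of doc_doc counts the terms two documents share.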