Archive for the ‘Homeland Security’ Category

Global Terrorism Database

November 3, 2009

If you are into homeland-security-oriented data mining, this post is for you.

The University of Maryland has a Global Terrorism Database (GTD; http://www.start.umd.edu/gtd/) with information on over 80,000 terrorist attacks that intelligence researchers can tap into.

GTD is an open-source database including information on terrorist events around the world from 1970 through 2007 (with annual updates planned for the future). Unlike many other event databases, the GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 80,000 cases.

You can search by keywords or browse by region, country, perpetrator, weapon, attack, or target.

It also has advanced search capabilities. To perform an advanced search, select the options you want within each category. If you do not check any options in a category, your search will include all content from that category. For example, selecting Algeria from the “Country” list restricts your search to incidents in Algeria, while leaving the list blank searches all countries.

Incident searches can be restricted to specific years using several pull-down menus.
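
The filter behavior described above (an empty category places no restriction, and years narrow the range) can be sketched in a few lines of Python. This is only an illustration of the semantics, not the GTD implementation: the Incident record, the advanced_search function, and the sample rows are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Hypothetical record layout; the real GTD schema is not shown here."""
    year: int
    country: str
    weapon: str
    summary: str


def advanced_search(incidents, countries=(), weapons=(), year_from=None, year_to=None):
    """Return the incidents that satisfy every non-empty filter.

    An empty selection for a category means "do not restrict on it",
    which mirrors leaving a list blank in the search form.
    """
    results = []
    for inc in incidents:
        if countries and inc.country not in countries:
            continue
        if weapons and inc.weapon not in weapons:
            continue
        if year_from is not None and inc.year < year_from:
            continue
        if year_to is not None and inc.year > year_to:
            continue
        results.append(inc)
    return results


# Made-up sample rows, for illustration only.
SAMPLE = [
    Incident(1975, "Algeria", "Explosives", "sample record A"),
    Incident(1992, "Algeria", "Firearms", "sample record B"),
    Incident(1992, "Colombia", "Explosives", "sample record C"),
]

everything = advanced_search(SAMPLE)
algeria_only = advanced_search(SAMPLE, countries={"Algeria"})
algeria_1990s = advanced_search(SAMPLE, countries={"Algeria"}, year_from=1990, year_to=1999)
```

Calling advanced_search(SAMPLE) with no filters returns all three sample rows, just as leaving every list blank searches everything.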

I tested it by querying [puerto rico] and was indeed able to obtain incident records related to Los Macheteros. The result set, however, included results not relevant to the island of Puerto Rico.

The database is pretty small, but it can come in handy at times. I will definitely use it in one of my next graduate courses on search engine architectures.

Hacking the Cloud: Getting Google’s User Data by Hacking Twitter

July 17, 2009

A day ago, Michael Arrington’s TechCrunch published excerpts from “leaked” documents stolen from the Google Apps account of a Twitter employee, which included over 300 confidential files meant for “internal” Twitter consumption. “Hacker Croll” sent TechCrunch a zip file with 310 private files from inside Twitter.
(http://www.techtree.com/India/News/Leaked_Documents_Twitter_TechCrunch_Faceoff/551-104503-643.html).

It appears HC essentially used a cracker tool of some sort to brute-guess weak passwords. Once inside the first security ring, …

Cloud Programs: A Web Vulnerability Paradise for Hackers

Twitter relies heavily on cloud-based apps (Web-centric programs such as Google Docs or Web-based e-mail), and these services are becoming increasingly interconnected. Even social Web apps are beginning to share data: Facebook Connect and Google Friend Connect, for example, let you log in to multiple sites with a simple Facebook or Google account, raising the vulnerability of your entire online identity.
(http://www.switched.com/2009/07/17/twitter-employee-accounts-hacked-business-documents-leaked/)

The documents released by the hacker seem to be pretty significant. The “problem” is that if your Google Apps email account is compromised, so are your shared Calendar, Docs, Contacts, Wikis (Sites), etc.
(http://www.pcworld.com/article/168572/google_apps_security_questioned_after_twitter_leak.html)

This might be a good case study for students planning to take the AIR Web: Web Spam and Internet Vulnerability course.

IRW-2009-6:Hackers: Taxonomy & Writing Styles

June 1, 2009


The current issue of IRW should reach subscribers’ inboxes during the day or, at the latest, tomorrow.

In this issue:

  • Featured article: Hackers: Taxonomy and Writing Styles
    Due to the increasing interest in developing Information Retrieval and Data Mining courses at the intersection with Information Security, this issue of the newsletter covers a brief taxonomy of hackers and their writing styles.
  • QA: Excel Matrix Multiplications: How to convert a term-document occurrence matrix into a term-term or document-document co-occurrence matrix?
  • Vacuum Tubes & Transistors Historical
  • Who is Who in IR: Thomas K. Landauer
  • Top CS Departments: Dartmouth College
  • Outstanding Graduate Theses
  • Calls and Events
  • IR Blogs
  • and more…
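
For readers curious about the QA item, here is a minimal sketch of the underlying matrix algebra, written in plain Python rather than Excel. It assumes a binary term-document occurrence matrix A with terms as rows and documents as columns; the sample matrix is made up. Multiplying A by its transpose gives the term-term matrix, and multiplying the transpose by A gives the document-document matrix. In Excel, the same products can be obtained with the MMULT and TRANSPOSE worksheet functions.

```python
def transpose(m):
    """Swap rows and columns of a matrix stored as a list of lists."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


# Hypothetical occurrence matrix: rows are terms, columns are documents.
A = [
    [1, 1, 0],  # term t1 occurs in d1 and d2
    [0, 1, 1],  # term t2 occurs in d2 and d3
    [1, 1, 1],  # term t3 occurs in every document
]

# Entry (i, j) counts the documents in which terms i and j both occur.
term_term = matmul(A, transpose(A))

# Entry (i, j) counts the terms that documents i and j share.
doc_doc = matmul(transpose(A), A)
```

The diagonal of term_term holds each term's document frequency, and the diagonal of doc_doc holds the number of distinct terms in each document.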

NSA/DHS Designates PUPR as a CAE

May 5, 2009

As blogged yesterday, the current issue of IRW should reach subscribers’ inboxes today. The Top CS Departments column features Polytechnic University of Puerto Rico, where I teach graduate courses. As mentioned a few days ago, PUPR has been designated a CAE. This is great news that is making a splash across academic centers within the U.S., the Caribbean region, and Latin America whose mission includes research relevant to homeland security.

Associate Director for Computer Science, Dr. Alfredo Cruz, sent me an official announcement, which I am reproducing.

Polytechnic University of Puerto Rico (PUPR) is Designated National Center of Academic Excellence in Information Assurance Education by NSA and DHS. PUPR was recently designated as a National Center of Academic Excellence in Information Assurance Education (CAE/IAE) by the National Security Agency (NSA) and the Department of Homeland Security (DHS) on April 22, 2009. The goal of these centers is to reduce the vulnerability of the national information infrastructure by promoting higher education and research in Information Assurance (IA) and Security through the development of a growing number of professionals with IA expertise in various related disciplines. PUPR will be recognized as the first institution in Puerto Rico to be designated as a CAE/IAE on June 3, 2009 in Seattle, Washington. Dr. Alfredo Cruz from the Department of Electrical & Computer Engineering and Computer Science will be present to receive the designation. He is the Director of the Center of Information Assurance for Research and Education (CIARE) at PUPR. Dr. Cruz is the person responsible for this designation. PUPR is one of the very few Hispanic-serving institutions (HSIs) in the Nation to receive this designation, and to become one of the first 100 institutions nationwide; this is a very special recognition. This designation requires that the President of the United States send the Governor of Puerto Rico a certification that should be handed to the president of PUPR designating the Institution as a CAE/IAE at a National level. The Congress and all the respective Congressional Committees are also notified.

Some of the benefits of the CAE/IAE designation are:
• PUPR will receive formal recognition from the U.S. Government as well as opportunities for prestige and publicity for our role in securing the Nation’s information systems.
• This designation increases collaboration opportunities between designated and aspiring institutions at local and national levels. This includes internships, faculty and student exchange, research, and publications, among other activities.
• With this designation as a CAE/IAE PUPR can obtain scholarships that can help outstanding students to pursue graduate studies in IA, enabling them to work with the Federal Government or other federal institutions and agencies.
• PUPR can compete and benefit from proposal calls (RFP) that are specifically for designated CAE/IAE institutions. These proposals offer millions of dollars from the DoD, NSF, NSA and “Homeland Security”, among others, for research and infrastructure.
• Student scholarships offered under the NSF’s Scholarship for Service (SFS) program. The SFS scholarship offers the following:
–2-year scholarship, includes 8K stipend (12K for graduate students), plus tuition and nominal room and board expenses.
–Paid summer internship in a federal agency.
–Placement in federal government at the end of the scholarship period.

SIDIM XXIV Conference

March 5, 2009

I am presenting at the Seminario Interuniversitario de Investigación en Ciencias Matemáticas (Interuniversity Seminar on Mathematical Sciences Research, SIDIM).

This is one of the most important activities held in Puerto Rico for the promotion of Mathematics research. (http://sidim2009.uprr.pr/)

This year SIDIM will be held at the University of Puerto Rico, Rio Piedras, on March 6-7, 2009. The SIDIM program and book of abstracts are available at http://sidim.uprh.edu/libroSIDIM2009.pdf

I will be presenting new research work on IDF and a new model for the conditional specificity of terms. If you have followed previous posts on the topic of inverse document frequency, now you will understand why I have dissected the topic several times. Thank you all for your private comments and feedback on the topic.

My abstract follows:

Scaled Inverse Document Frequency: A Model for the Evaluation of the Conditional Specificity of Query Terms in Search Engine Collections

Edel Garcia, Internet Business Development Center, Interamerican University of Puerto Rico, Metropolitan Campus

Inverse document frequency (IDF) is a measure of the specificity of query terms over a collection of D documents that has been successfully incorporated into numerous vector space information retrieval models. Since these models assume term independence, the specificity of a given term, present in different queries, is assumed to be unique and independent from other query terms. To the best of our knowledge, there are no known models that condition the specificity of terms to the presence of other terms in a query.

This paper proposes a new measure called scaled inverse document frequency (SIDF) which evaluates the conditional specificity of query terms over a subset S of D and without making any assumption about term independence. S can be estimated from search results, OR searches, or computed from inverted index data. We have evaluated SIDF values from commercial search engines by submitting queries relevant to the financial investment domain. Results compare favorably across search engines and queries. Our approach has practical applications for “real-world” scenarios such as Web Mining, Homeland Security, and keyword-driven marketing research. SIDF can be incorporated into a variety of information retrieval models as a global weight scoring system.

Keywords: inverse document frequency, conditional term specificity, web mining, search engines
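
The abstract does not give the SIDF formula, so the sketch below does not attempt to reproduce it. It only contrasts the classic IDF, log(D/d), computed over the whole collection with the same ratio computed over a subset S of documents. Everything here is an assumption made for illustration: the toy inverted index, the collection size, the choice of S as the union of the query terms' postings (one possible reading of "OR searches"), and the function names.

```python
import math


def idf(term, index, n_docs):
    """Classic IDF: log10 of collection size over the term's document frequency."""
    return math.log10(n_docs / len(index[term]))


def subset_idf(term, query_terms, index):
    """The same ratio restricted to a subset S of the collection.

    Assumption: S is the union of the postings of all query terms,
    i.e. the documents an OR search for the query would return.
    This is NOT the published SIDF definition.
    """
    if term not in query_terms:
        raise ValueError("term must be one of the query terms")
    subset = set().union(*(index[t] for t in query_terms))
    return math.log10(len(subset) / len(index[term] & subset))


# Hypothetical inverted index: term -> set of document ids.
INDEX = {
    "stock": {1, 2, 3, 4},
    "bond": {3, 4, 5},
}
N_DOCS = 100  # assumed collection size

global_stock = idf("stock", INDEX, N_DOCS)
local_stock = subset_idf("stock", ["stock", "bond"], INDEX)
local_bond = subset_idf("bond", ["stock", "bond"], INDEX)
```

In this toy index the value for "stock" changes once it is measured against the documents matched by the whole query, which is the kind of query-dependent specificity the abstract refers to.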

Coming Soon: Data Mining MP3 Players

January 27, 2009

MP3 Confidentials: This morning I saw a technology news story on CNN about how military records, including the names, SSNs, phone numbers, etc. of soldiers, were discovered stored on an MP3 player. According to the story:

Chris Ogle of New Zealand was in Oklahoma about a year ago when he bought a used MP3 player from a thrift store for $9. A few weeks ago, he plugged it into his computer to download a song, and he instead discovered confidential U.S. military files.

“The more I look at it, the more I see, and the less I think I should be,” Ogle said with a nervous laugh in an interview with TVNZ.

The files included the home addresses, Social Security numbers and cell phone numbers of U.S. soldiers. The player also included what appeared to be mission briefings and lists of equipment deployed to hot spots in Afghanistan and Iraq.

Pentagon officials told CNN that they are aware of the MP3 player, but can’t talk about it until investigators confirm that the information came from the U.S. Department of Defense. 

“The government isn’t doing a good job of protecting the information that it collects,” said Marc Rotenberg of the Electronic Privacy Information Center in Washington.

Despite government efforts to protect sensitive information, this is a growing problem, privacy experts say.

Two years ago, the Department of Veterans Affairs lost track of a laptop with the personal information of millions of soldiers. And computer hard drives with classified military information have been found for sale at street markets in Afghanistan.

“When you can identify American personnel, when you have their names, their home address, their cell phone numbers, you put people in a dangerous position,” Rotenberg said.

It might be time to cover data mining of MP3 Players.

More SSNs Compromised

December 5, 2008

As mentioned in recent posts, the current issue of IRW features an article covering incidents where social security numbers (SSNs) have been leaked to the Web. Along the same lines, a cardinal rule in Web security is to never provide a connection between an intranet and the Internet. Once such a connection is established (hardware-based or via links), chances are that you no longer have an intranet. So, why take the risk?

In addition, never place sensitive information on a test server with access to the Web. Unfortunately, the first offenders are frequently government agencies and universities. Stubborn IT administrators never get it!

For instance, adding insult to injury, here is a report from the Orlando Sentinel in which 250,000+ user accounts containing SSNs were compromised:

http://blogs.orlandosentinel.com/news_politics/2008/12/state-agency-pu.html

According to the report:

“The state Agency for Workforce Innovation blamed a “security breach” Wednesday for why it accidentally placed the names and Social Security numbers of 250,000 job-seekers on a “test server” that could have been accessed online.”

“The names and information were online for 19 days and removed in late October after the state Department of Revenue came across it during “routine work,” officials said. The only common denominator among the names placed online was that they all got services over the last six years from one of the 81 Florida “career centers” that provide job-training and resources around the state.”

The breach is giving bad publicity to the Agency for Workforce Innovation (AWI). According to http://infosecurity.us/?p=4041, the Liberty Coalition asked AWI the following questions:

  1. Why did the Agency for Workforce Innovation store sensitive Excel files on a server at all?
  2. Why was this website left open to the public for more than a month, undetected by AWI’s IT department?
  3. Why were the files on the server not behind a firewall, password protected or encrypted?
  4. How many other servers store sensitive personal information, and how many of those are available to the public right now?
  5. How many AWI employees have access to clients’ social security numbers, and do they all need access?
  6. How do you plan to train employees to appropriately handle sensitive personal information?
  7. Do you have a regular schedule of scanning your internal networks and external servers for personal information? If so, why was this breach not discovered?
  8. Does the Agency for Workforce Innovation intend to pay for identity theft protection services for the victims of this breach?
  9. Will the Agency notify victims by mail?

Infosecurity states that the Liberty Coalition has raised the following issues:

  1. AWI has not offered to protect victims with identity theft protection services.
  2. AWI relied on public search engines and a member of the public 800 miles away to discover the breach.
  3. The Agency should destroy the information, not just restrict access.
  4. It is unknown how many other AWI servers are currently exposing personal information.
  5. It is unclear why AWI needs to collect minors’ social security numbers.
  6. AWI has not indicated how many employees have access to clients’ social security numbers, and whether these employees require access to fulfil their job descriptions.
  7. AWI does not appear to regularly scan its networks for sensitive personal information.

To play PR damage control after the fact, and after such gross incompetence, the FloridaJobs.org site published the following:

“The Agency for Workforce Innovation is continuing to take action to address a security breach that recently occurred on a test server. Upon discovery, the Agency immediately contacted the appropriate law enforcement agencies, began a thorough investigation and promptly coordinated with all major external search engine companies to ensure the information was no longer accessible to the public. The Agency has no reason to believe any personal information has been accessed for unlawful purposes.”
http://www.floridajobs.org/publications/news_rel/securityBreach.html

They have “no reason to believe any personal information has been accessed for unlawful purposes.” Good PR try. How do they know that? After their comedy of errors, why would anyone want to submit resumes to their databases? The rest of their PR excuses are a smokescreen.

Note also how quickly they contacted the search engines, just in case these had indexed the documents. At least they are realizing the power of search engines. Chances are the search engines have cached copies of these documents.

Search Engines and SSNs

December 3, 2008

In the current issue of IRW we explain why exposing social security numbers (SSNs) online is an enabling crime, one that is relevant to Homeland Security (1). We show that, ironically, government agencies and universities are the leading sources of exposed SSNs on the Web.

We examined how crafting smart queries in Google and other search engines allows users to find incidents wherein SSNs have been released for the entire world to see online. Although nothing new, it is a widespread problem across the Web. It is a shame when administrators at the two kinds of offenders mentioned above (government agencies and universities) ignore the problem or justify it in the name of what is practical.

We show why the common practice of disclosing the last four digits of an SSN is a very bad idea. With SSN allocation tables, we can map the first three digits to the region wherein the SSN application was filed, by U.S. state and territory. If the last four digits are also known, only the middle two digits need to be guessed. Identity thieves and stalkers might be having a field day.
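
The arithmetic behind that warning is easy to check. The sketch below only counts possibilities; it does not generate any numbers. It assumes the 3-2-4 digit layout described above (area, group, serial) and treats every digit combination as possible, which slightly overstates the real search space. The area_candidates parameter is a hypothetical stand-in for however many three-digit prefixes an allocation table associates with a region.

```python
AREA_DIGITS, GROUP_DIGITS, SERIAL_DIGITS = 3, 2, 4


def combinations_for(unknown_digits: int) -> int:
    """Number of digit strings of the given length (upper bound on guesses)."""
    return 10 ** unknown_digits


def remaining_guesses(area_candidates: int = 1) -> int:
    """Guesses left when the serial is known and the area is narrowed
    down to `area_candidates` prefixes (hypothetical parameter)."""
    return area_candidates * combinations_for(GROUP_DIGITS)


# Nothing known: all nine digits must be guessed.
nothing_known = combinations_for(AREA_DIGITS + GROUP_DIGITS + SERIAL_DIGITS)

# Last four digits known, first three pinned to a single prefix.
last_four_and_area_known = remaining_guesses(area_candidates=1)

# How much smaller the search space has become.
reduction_factor = nothing_known // last_four_and_area_known
```

Even if a region maps to several candidate prefixes, say five, remaining_guesses(5) still leaves only a few hundred possibilities.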

There is still hope, though. We cover how Northern Michigan University (2) and Johns Hopkins University (3) are proactively becoming part of the solution and not part of the problem. In the case of NMU, they have published a one-year case study outlining the full eradication of SSNs as identifiers from the NMU campus.

 References

1. The Homeland Security and Terrorism Threat: From Document Fraud, Identity Theft and Social Security Number Misuse
http://finance.senate.gov/hearings/testimony/2003test/091003pctest.pdf
2. Full Eradication of Social Security Number as an Identifier
http://net.educause.edu/ir/library/pdf/EDU04144.pdf
3. Policy on Social Security Number Protection and Use
http://education.jhu.edu/catalog/academic-policies/policy-on-ssn-protection-and-use/

IRW Sneak Preview: Identity Thefts through Search Engines

December 1, 2008


This is a sneak preview of IR Watch. In this issue, the main article, Identity Thefts through Search Engines, covers fairly old and well-known incidents wherein social security numbers have been released for the entire world to see. These are accessible through search engines.

Although not a new problem, disclosing an SSN, even a portion of it, has been labeled an enabling crime. This is a must-read topic for those conducting data mining and web mining at the intersection of information assurance and homeland security.

Ironically, the biggest offenders are government agencies and universities.

During the week, we will blog on other sections of the newsletter.

On Online Hackers, Marketers, and Criminals

August 19, 2008

Hackers that market themselves are fully getting into the crime scene.

We have seen marketers getting into hacking and vice versa: hackers getting into marketing. Designing web pages that rank high in the search engines for the sole purpose of using these to spread malicious resources and tools is one example. We call them hacketers = hackers + marketers.

Now hackers are getting physical.

Back in March 2008, it was reported that hackers were causing harm to folks suffering from epilepsy. Some usability and accessibility marketers are using those incidents to better promote their own services a la your-problem-is-my-opportunity.

Other marketers create reputation management problems and then ‘go back through the kitchen’ to market “reputation management” solutions. It is a scam no different from the click fraud scam promoted by marketers who are part of a mob organization. Hah, Hah.

Now, we have the news of a hacker allegedly kidnapping and torturing another alleged hacker.

These probably are the first cases of hackers physically hurting others.

What is next? Google worse than ISP Snooping? –as AT&T claims.

Sometimes controlling information is worse than physically controlling others.

Ah, the many faces of opportunism.

Verizon, FCC, and the C Block Competition

March 24, 2008

Now that the B and C blocks of the spectrum have been allotted by the FCC, things are ready-set-go for open mobile broadband U.S. networks, broadband IR, and, yes, a whole new hacking space. It’s a matter of time. The C Block hacking competition is coming. Never ignore what can be done with such a new playground.

I wonder how the FCC is going to enforce regulations on the 22-MHz portion of the spectrum, already handed to Verizon. http://www.pcworld.com/article/id,143705-c,industrynews/article.html

Meanwhile, IR research centered around open broadband networks is needed, and so are search engines.

Search Engines for Penetration Testing

February 21, 2008

Well, I’m getting ready for my talk this afternoon at the University of Turabo. I’ve organized the talk in three parts:

Part 1: Spam and Fraud through Search Engines

Part 2: Gathering Intelligence through Search Engines

Part 3: Identity Theft through Search Engines

A disclaimer will be necessary to indicate that the information to be presented is for educational purposes, only.

This is gonna be a nice one. I hope to see old friends.

Web Mining, Search Engines, and Information Security

February 15, 2008

This Thursday the 21st I’ll be presenting the following talk before the faculty of the University of Turabo, Gurabo, PR:

Web Mining, Search Engines, and Information Security

I hope to see old friends there. Here is the abstract of my talk:

Web Mining is a research area of Data Mining wherein the Web is the “database” and search engines are the “user interface”. End users can resort to search engines for all sorts of things. For instance, marketers can use search engines to gain traffic by getting Web pages to rank high for specific queries, hence enhancing the online presence of businesses, products, and services (search engine optimization, SEO). Spammers can inundate search engine indexes to deceive searchers (spamdexing). Hackers can attempt to rank high documents that lead to security risks (hacketers, hacketering) or use all forms of injections (links, forms, scripts, redirections, etc.). Terrorists and criminals can use search engines to commit all sorts of crime-enabling activities, for instance, stealing private information like SSNs, passwords, and student and user IDs, gaining access to “private” documentation, stalking people, etc.

This talk covers these and other aspects of search engines: the Good, the Bad, and the Ugly. The speaker will then talk about his own research projects in the area of Web Mining, Search Engines, and Intelligence. A disclaimer will be necessary to indicate that the information to be presented is for educational purposes only.

Web Mining Week 9

January 28, 2008

Week 9 Agenda

Intelligence Searching for Penetration Testers (PPT Presentation)
Searching for Terrorist Threats and Identity Thefts, the SSN Way (PPT Presentation)
Mining VIN numbers, Email Headers, and other Undocumented Commands (PPT Presentation)

Required Reading Material

Provided during lecture.

Thesis: DNIDS Using the CSI-KNN Algorithm

January 4, 2008

Here is a great 2007 MS thesis by Liwei (Vivian) Kuang of the School of Computing, Queen’s University, Kingston, Ontario, Canada: DNIDS: A Dependable Network Intrusion Detection System Using the CSI-KNN Algorithm.

I’m happy she quoted my Cosine Similarity Tutorial.

Part of the abstract states: “In this thesis, we propose a Dependable Network Intrusion Detection System (DNIDS) based on the Combined Strangeness and Isolation measure K-Nearest Neighbor (CSI-KNN) algorithm. The DNIDS can effectively detect network intrusions while providing continued service even under attacks. The intrusion detection algorithm analyzes different characteristics of network data by employing two measures: strangeness and isolation. Based on these measures, a correlation unit raises intrusion alerts with associated confidence estimates. In the DNIDS, multiple CSI-KNN classifiers work in parallel to deal with different types of network traffic. An intrusion-tolerant mechanism monitors the classifiers and the hosts on which the classifiers reside and enables the IDS to survive component failure due to intrusions. As soon as a failed IDS component is discovered, a copy of the component is installed to replace it and the detection service continues.”

“We evaluate our detection approach over the KDD’99 benchmark dataset. The experimental results show that the performance of our approach is better than the best result of the KDD’99 contest winner. In addition, the intrusion alerts generated by our algorithm provide graded confidence that offers some insight into the reliability of the intrusion detection. To verify the survivability of the DNIDS, we test the prototype in simulated attack scenarios. In addition, we evaluate the performance of the intrusion-tolerant mechanism and analyze the system reliability.”

Resources on the Dark Web

December 14, 2007

A few days ago I reported on the Dark Web Project.

There is one section of that paper that reads (emphasis added):

“IV. Presentations in Seminars or Conferences (PowerPoint) – Password protected; please send request via email and provide a brief explanation of your interest.”

Clicking on the links that follow that statement triggers a history.go(-1) JavaScript call, which sends the browser back one page in its history. Looking at the source of the document shows a JavaScript prompt asking for the password (which is given as “ailab”) and the following partial paths to the documents:

publications/conf/WriteprintsandInkBlots.pdf
publications/conf/data%20mining%20webometric%20analysis%203aug05.pdf
publications/conf/SeminarGroupAuthorship.pdf
publications/conf/comparative_03_25_05.pdf
publications/conf/Dark%20Web%20200502.pdf
publications/conf/AASlidesMod.pdf
publications/conf/WebForum0712_2007.ppt
publications/conf/SpecializedContent_2007.ppt
publications/conf/ClearGuidance_2006.ppt

Other than stepping back through the end user’s browser history, I’m not sure why they added this “password protected” feature, since simply prepending http://ai.arizona.edu/research/terror/ to the above paths allows one to access and download the documents anyway.
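
To make the point concrete, here is a minimal sketch of how a base URL and the partial paths combine into full addresses. It uses Python’s standard urllib.parse.urljoin and only builds strings; nothing is fetched. The base URL and the two paths are copied from above, while the helper name absolute_urls is made up for the example.

```python
from urllib.parse import urljoin

# Base address and two of the partial paths quoted in the post.
BASE = "http://ai.arizona.edu/research/terror/"
PARTIAL_PATHS = [
    "publications/conf/WriteprintsandInkBlots.pdf",
    "publications/conf/Dark%20Web%20200502.pdf",
]


def absolute_urls(base, relative_paths):
    """Resolve each relative path against the base URL (no network access)."""
    return [urljoin(base, path) for path in relative_paths]


full_urls = absolute_urls(BASE, PARTIAL_PATHS)
```

Because the relative paths sit in the page source, any client-side password prompt is cosmetic: anyone who reads the source can resolve the addresses the same way.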

The article also points to the following great resources:

Reid, E. and Chen, H., “Mapping the Contemporary Terrorism Research Domain.” International Journal of Human-Computer Studies, 65, Pages 42-56, 2007.

Qin, J., Zhou, Y., Reid, E., Lai, G., Chen, H., “Analyzing Terror Campaigns on the Internet: Technical Sophistication, Content Richness, and Web Interactivity,” International Journal of Human-Computer Studies, 65, Pages 71-84, 2007.

H. Chen and F. Wang, “Artificial Intelligence for Homeland Security,” IEEE Intelligent Systems, Special Issue on Artificial Intelligence for National and Homeland Security, pp. 12-16, September/October 2005.

A. Abbasi and H. Chen, “Applying Authorship Analysis to Extremist-Group Web Forum Messages,” IEEE Intelligent Systems, Special Issue on Artificial Intelligence for National and Homeland Security, pp. 67-75, September/October 2005.

Zhou, Y., Reid, E., Qin, J., Lai, G., Chen, H., “U.S. Domestic Extremist Groups on the Web: Link and Content Analysis,” IEEE Intelligent Systems, Special Issue on Artificial Intelligence for National and Homeland Security, pp. 44-51, September/October 2005.

A. Abbasi and H. Chen, “Visualizing Authorship for Identification,” In Proceedings of the Intelligence and Security Informatics: IEEE International Conference on Intelligence and Security Informatics (ISI 2006), San Diego, CA, USA, May 23-24, 2006.

J. Wang, T. Fu, H. Lin, and H. Chen, “A Framework for Exploring Gray Web Forums: Analysis of Forum-Based Communities in Taiwan,” In Proceedings of the Intelligence and Security Informatics: IEEE International Conference on Intelligence and Security Informatics (ISI 2006), San Diego, CA, USA, May 23-24, 2006.

Y. Zhou, J. Qin, G. Lai, E. Reid, and H. Chen, “Exploring the Dark Side of the Web: Collection and Analysis of U.S. Extremist Online Forums,” In Proceedings of the Intelligence and Security Informatics: IEEE International Conference on Intelligence and Security Informatics (ISI 2006), San Diego, CA, USA, May 23-24, 2006.

A. Salem, E. Reid, and H. Chen, “Content Analysis of Jihadi Extremist Groups’ Videos,” In Proceedings of the Intelligence and Security Informatics: IEEE International Conference on Intelligence and Security Informatics (ISI 2006), San Diego, CA, USA, May 23-24, 2006.

J. Xu, H. Chen, Y. Zhou, and J. Qin, “On the Topology of the Dark Web of Terrorist Groups,” In Proceedings of the Intelligence and Security Informatics: IEEE International Conference on Intelligence and Security Informatics (ISI 2006), San Diego, CA, USA, May 23-24, 2006.

Zhou, Y., Qin, J., Lai, G., Reid, E. and Chen, H., “Building Knowledge Management System for Researching Terrorist Groups on the Web,” Proceedings of the AIS Americas Conference on Information Systems (AMCIS 2005), Omaha, NE, USA, August 11-14, 2005.

“Mapping the Contemporary Terrorism Research Domain: Researchers, Publications, and Institutions Analysis,” ISI Conference 2005, Atlanta, GA, May, 2005.

Reid, E., Qin, J., Zhou, Y., Lai, G., Sageman, M., Weimann, G., and Chen, H., “Collecting and Analyzing the Presence of Terrorists on the Web: A Case Study of Jihad Websites,” IEEE International Conference on Intelligence and Security (ISI 2005), Atlanta, Georgia, 2005.

Chen, H., Qin, J., Reid, E., Chung, W., Zhou, Y., Xi, W., Lai, G., Bonillas, A. and Sageman, M., “The Dark Web Portal: Collecting and Analyzing the Presence of Domestic and International Terrorist Groups on the Web,” Proceedings of the 7th International Conference on Intelligent Transportation Systems (ITSC), Washington D.C., October 3-6, 2004.

IRSeek, Polymorphic JavaScript, and Hacketers

December 6, 2007

According to a DarkReading report, IRSeek is a start-up designed to target hackers and their anonymous IRC chat activities. Hacking the hackers?

The report states:

“Hackers favor IRC because it allows them to protect their identities and cover their tracks. But a new search engine startup called IRSeek is now calling those features into question…”

“This could all be bad news for hackers, who don’t want their conversations indexed or searchable by nickname. While they could partially beat the system by simply changing their nicknames frequently, hackers may eventually feel that IRSeek threatens their anonymity, and ultimately, their privacy.”

Here is more on the topic.

Well, this can be fun to watch/test for those that conduct Web Mining for security purposes.

Meanwhile, according to a CNN report, search engine-based hacking attacks are on the rise and becoming a preferred targeting method. These include link-based spam and polymorphic JavaScript scripts (also referred to as “Polyscripts”), sometimes combined with dark marketing practices. Here is a Top 10 List to watch.

1. Phishing
2. Malicious link injections through forums and blogs to rank high in search engines.
3. Attackers use Web’s ‘weakest links’ to launch attacks.
4. Compromised Web sites will surpass number of created malicious sites.
5. Cross-platform Web attacks.
6. Web 2.0-based attacks.
7. Polymorphic JavaScripts, designed to evade anti-virus scanners.
8. Data concealment methods.
9. Key hacker groups.
10. Vishing and voice spam.

Hackers + Spammers + Crook marketers/SEOs = What A Killer Combination. Compromised sites ranking high means trapping more users in the mess. I wonder how many of the folks from the SEO sphere are involved and making a few bucks. The usual suspects?

Perhaps not all are real SEOs, but as we say in Spanish: “Ante la duda, saluda.”

Here is a nice one: Hacking Duke University to rank high via link injection

And somewhat related: how about cracking passwords with Google?

Welcome to a new breed on the rise:

Hacketers = Hackers + Marketers

PS. I coined the name after noticing with the Levenshtein Edit Distance Calculator that it takes only two edits to turn hacketers into marketers.

http://www.miislita.com/searchito/levenshtein-edit-distance.html
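
For readers who want to check the edit count without the online calculator, here is a minimal sketch of the standard dynamic-programming Levenshtein distance. The function name `levenshtein` is my own choice for illustration, and this is not necessarily how the calculator linked above is implemented.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # previous[j] = distance between the processed prefix of a and b[:j].
    # Start with the empty prefix of a: reaching b[:j] takes j insertions.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


# Two substitutions: h -> m and c -> r.
print(levenshtein("hacketers", "marketers"))
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second word rather than with the product of both lengths.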

Heh, heh. Apparently “peer” pressure forced IRSeek to shut down. Nevertheless, it is still a great concept. I wonder how many of these mole projects are in place all over the Web. Check the whole deadpool story here:

http://www.techcrunch.com/2007/12/03/fastest-deadpool-ever-irseek-shuts-down/#comment-1813205

http://www.irseek.com/blog/

Dark Web Project and Web Mining

November 13, 2007

Prof. Chen at the University of Arizona has a fascinating project on Web mining applied to homeland security called the Dark Web Project, over at http://ai.arizona.edu/research/terror/index.htm

The project is funded by NSF, DHS, CNRI, and the Library of Congress.

From their site:

“The AI Lab Dark Web project is a long-term scientific research program that aims to study and understand the international terrorism (Jihadist) phenomena via a computational, data-centric approach. We aim to collect “ALL” web content generated by international terrorist groups, including web sites, forums, chat rooms, blogs, social networking sites, videos, virtual world, etc. “

“We have developed various multilingual data mining, text mining, and web mining techniques to perform link analysis, content analysis,  web metrics (technical sophistication) analysis, sentiment analysis, authorship analysis, and video analysis in our research.”

“The approaches and methods developed in this project contribute to advancing the field of Intelligence and Security Informatics (ISI). Such advances will help related stakeholders to perform terrorism research and facilitate international security and peace. “

“It is our belief that we (US and allies) are facing the dire danger of losing the “The War on Terror” in cyberspace (especially when many young people are being recruited, incited, infected, and radicalized on the web) and we would like to help in our small (computational) way.”

Random Notes and LauraMansfield

September 12, 2007

These are some late random notes. Sorry for the delay.

1. I am putting together a research project for a graduate student. The topic is quite interesting: homeland security. While researching the topic I came across the LauraMansfield.com site. Mansfield’s site is a goldmine of information, especially for those interested in co-occurrence and word association research applied to the terrorist knowledge domain.

2. I am reviewing a graduate thesis in which logistic regression is used for data mining medical claims. The thesis topic is quite interesting. The manuscript needs some rework, though.

3. I am reading bits and pieces of an old paper on the non-transitive nature of Jaccard’s Coefficient and a proposed indirect similarity measure.
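
To make the non-transitivity in note 3 concrete, here is a small sketch using the usual set-based definition of Jaccard's Coefficient. The three term sets are made up for illustration, and the indirect similarity measure proposed in the paper is not reproduced here.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard's Coefficient: size of the intersection over size of the union."""
    if not a and not b:
        # Convention assumed here: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical term sets for three documents (illustrative only).
doc_a = {"web", "mining"}
doc_b = {"mining", "security"}
doc_c = {"security", "terrorism"}

print(jaccard(doc_a, doc_b))  # a and b share one of three terms
print(jaccard(doc_b, doc_c))  # b and c share one of three terms
print(jaccard(doc_a, doc_c))  # a and c share nothing
```

Document b overlaps with both a and c, yet a and c have no terms in common, so similarity between a and b and between b and c says nothing about a and c. That gap is what an indirect similarity measure tries to address.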

Data Mining and Reports on Terrorism

August 22, 2007

I’m researching the topic of Data Mining (KDD) and Terrorism Information Awareness (TIA) for a graduate course and came across a great old resource:

Data Mining

It is an oldie, but the important part is the references.

It may interest IR researchers conducting similar work.

Here is another great resource:

Data Mining and Homeland Security

DARPA Agent Markup Language (DAML)

August 1, 2007

The DARPA Agent Markup Language (DAML) site has tons of tools and resources that CS/IR graduate students and SEM/SEO practitioners with some IR knowledge can use for data mining purposes. These can help with nice experiments, from ontology-based keyword discovery to the construction of crawlers (or at least with learning how these actually work).

Here is a list of resources.

(more…)