The URL Query Parser is our most recent tool for mining URLs. It is available at

http://www.minerazzi.com/tools/url-query/parser.php

What is a URL query?

A URL query is the trailing text after the question mark (?) found in a URL. It consists of attribute-value pairs delimited by ampersands (&). These are also called name-value, key-value, or field-value pairs.
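
To make the definition concrete, here is a minimal sketch in Python, using only the standard library, of how the name-value pairs of a single URL can be extracted. It is not the code behind our tool; the function name `extract_pairs` and the example URL are assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

def extract_pairs(url):
    """Return the (name, value) pairs found in the query part of a URL."""
    # urlsplit isolates the text after '?' (and before any '#fragment').
    query = urlsplit(url).query
    # parse_qsl splits on '&', separates names from values at '=',
    # and decodes percent-escapes and '+' signs.
    return parse_qsl(query, keep_blank_values=True)

# Hypothetical example URL.
pairs = extract_pairs("http://example.com/search?q=url+mining&page=2&lang=en")
print(pairs)  # [('q', 'url mining'), ('page', '2'), ('lang', 'en')]
```

A URL without a question mark simply yields an empty list.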

What this tool does

This tool parses URL queries and extracts their name-value pairs.

The tool helps users identify and filter URL queries from a collection or build collections consisting exclusively of URL queries.

With minor modifications, the tool can be converted into a bulk URL cleaner. We are currently building a separate tool that does precisely this. It may allow us to clean up URLs found in Google and Bing search result pages and use them safely in data mining studies.
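
As a rough idea of what URL cleaning could look like, the sketch below removes query parameters whose names match a small list of tracking names and rebuilds the URL. This is only one possible reading of "cleaning", not a description of the tool we are building; the name lists and the function `clean_url` are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative assumptions: a real cleaner would need a much longer list.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"gclid", "fbclid"}

def clean_url(url):
    """Return the URL with tracking-style name-value pairs removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_NAMES and not name.startswith(TRACKING_PREFIXES)
    ]
    # Rebuild the URL from its components with the filtered query.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )

# Hypothetical example URL.
print(clean_url("http://example.com/page?id=7&utm_source=news&gclid=abc"))
# http://example.com/page?id=7
```

If every pair is removed, the rebuilt URL carries no question mark at all.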

What is computed

  • Up to 5,000 URLs can be parsed. If no query is found in a URL, that record is ignored.
  • We have arbitrarily imposed the 5,000 limit for several reasons: to (a) provide fast responses, (b) minimize browser crashes, and (c) minimize abuses.
  • Users can choose between two query result modes:
    • individual results (useful for comparing individual URL queries).
    • combined results (useful for comparing specific name-value pairs).

    The latter is the default mode. Because results in this mode are sorted alphabetically, users can easily identify the most common or popular name-value pairs.
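
The two modes described above can be sketched as follows. This is an approximation written for illustration, not the tool's actual code: the function names, the choice to truncate input at the limit rather than reject it, and the use of a counter to tally repeated pairs are all assumptions.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

MAX_URLS = 5000  # the limit stated above; truncating here is an assumption

def individual_results(urls):
    """One record per URL: (url, list of name-value pairs)."""
    results = []
    for url in urls[:MAX_URLS]:
        pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
        if pairs:  # URLs without a query are ignored
            results.append((url, pairs))
    return results

def combined_results(urls):
    """All pairs pooled together, sorted alphabetically, with their counts."""
    counts = Counter()
    for _, pairs in individual_results(urls):
        counts.update(f"{name}={value}" for name, value in pairs)
    return sorted(counts.items())

# Hypothetical input.
sample = [
    "http://a.example/?q=x&page=1",
    "http://b.example/about",
    "http://c.example/?q=x",
]
print(combined_results(sample))  # [('page=1', 1), ('q=x', 2)]
```

In the combined output, a pair that occurs in many URLs stands out by its count, while the alphabetical order groups pairs that share a name.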

Implications for Web Security

This tool can be used by those interested in mining URL queries or conducting studies relevant to Web Security. Why? Please keep reading.

URL queries are used to transmit small pieces of data in the form of name-value pairs. The transmission can be of three types: (a) between web pages, (b) between a web page and a database, or (c) between databases. Real-world applications include access to web services, social profiling, and cloud computing, among others (Kantarcioglu, 2013).

In addition, URL queries are frequently used as vehicles for transmitting session parameters, form data, tracking mechanisms, user names, email addresses, and other data considered sensitive by users.
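
A simple way to look for such data in a collection is to flag pairs whose name suggests sensitive content or whose value looks like an email address. The sketch below is a naive heuristic written for illustration; the name list, the regular expression, and the function `flag_sensitive` are assumptions, and the approach will produce both false positives and misses.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Illustrative assumptions: names that often carry sensitive data.
SENSITIVE_NAMES = {
    "email", "user", "username", "password", "passwd",
    "sessionid", "sid", "token",
}
# Deliberately loose pattern: something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def flag_sensitive(url):
    """Return the name-value pairs of a URL that look sensitive."""
    flagged = []
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if name.lower() in SENSITIVE_NAMES or EMAIL_RE.match(value):
            flagged.append((name, value))
    return flagged

# Hypothetical example URL.
url = "http://example.com/login?user=alice&next=%2Fhome&contact=bob%40example.org"
print(flag_sensitive(url))  # [('user', 'alice'), ('contact', 'bob@example.org')]
```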

In a 2014 study, West & Aviv, from Verisign and the US Naval Academy, analyzed over 892 million user-submitted URLs containing 1.3 billion name-value pairs. They found over a quarter-billion plain text pairs involving referral tracking, with more than 10 million pairs potentially revealing some form of demographic, identity-based, or geographical information. Extreme cases involved the facilitation of password authentication credentials, email addresses, and user names (West & Aviv, 2014).

Thus, the development of tools designed for mining URL queries is relevant to Web Security.

Suggested Exercises

  • Do a search in several search engines or public databases. Collect a set of URL queries and submit this set to our tool. Compare results.
  • Analyze a set of URL queries obtained from a public forum, social networks, or groups (e.g. Google Groups).
  • For this exercise you need to install a browser add-on to facilitate the collection of URLs. In Firefox, for instance, you can install an add-on that lets you select multiple links and copy their URLs. Do a search in Google or a similar search engine and use the add-on to collect the search result URLs. Submit these URLs to our tool. Compare results. This is a nice way of gathering intelligence from URL queries relevant to specific search terms. In addition, since Google lets you do advanced field-specific searches (e.g., inurl, intitle, etc.), this is a nice way of mining URL queries driven by advanced searches.

References
