One way of mining web feeds consists in adopting the following two-step strategy:

Step 1: Convert a web feed into a multidimensional associative array, A.

Step 2: Flatten A into a one-dimensional array, B.
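As a rough illustration of the two steps, here is a minimal Python sketch. It is not the code behind the tool; the helper names (strip_ns, to_nested, flatten), the path-style keys, and the sample feed are all assumptions made for this example, and XML attributes are ignored for brevity.

```python
import xml.etree.ElementTree as ET

def strip_ns(tag):
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]

def to_nested(elem):
    """Step 1: convert an XML element into nested dicts (the array A)."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    node = {}
    for child in children:
        key = strip_ns(child.tag)
        value = to_nested(child)
        if key in node:
            # Repeated tags (e.g. several <item> elements) become a list.
            if not isinstance(node[key], list):
                node[key] = [node[key]]
            node[key].append(value)
        else:
            node[key] = value
    return node

def flatten(node, prefix=""):
    """Step 2: flatten A into a one-dimensional dict B keyed by path."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}/{key}" if prefix else key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}[{index}]"))
    else:
        flat[prefix] = node
    return flat

# Hypothetical sample feed, used only for illustration.
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>First</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>"""

root = ET.fromstring(SAMPLE)
A = {strip_ns(root.tag): to_nested(root)}
B = flatten(A)

for key, value in B.items():
    print(key, "=>", value)
```

Each key of B records the full path to a value in A, so the structure of the feed can still be read off the flat array.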

The rest is a matter of reading and manipulating the key-value pairs of A and B and using that information for other text mining purposes, such as document validation or the design of a feed parser capable of discriminating between feed formats (i.e., Atom, RSS, and RDF).
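One simple way to discriminate between feed formats is to look at the root element of the document. The sketch below assumes that an rss root means RSS, a feed root means Atom, and an RDF root means RDF (RSS 1.0). The function name detect_feed_format is hypothetical, and this is not necessarily how the tool itself makes the decision.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from root element name (lowercased, namespace removed)
# to feed format.
ROOT_TO_FORMAT = {"rss": "RSS", "feed": "Atom", "rdf": "RDF"}

def detect_feed_format(xml_text):
    """Guess the feed format from the root element of an XML string."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not well-formed XML, so it cannot be classified.
        return "invalid"
    tag = root.tag.rsplit("}", 1)[-1].lower()
    return ROOT_TO_FORMAT.get(tag, "unknown")

print(detect_feed_format('<rss version="2.0"><channel/></rss>'))
print(detect_feed_format('<feed xmlns="http://www.w3.org/2005/Atom"/>'))
```

The same check doubles as a basic validation step, since a document that is not well-formed XML is reported as invalid rather than being passed on to the flattening stage.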

We have developed a tool for that purpose: the Web Feed Flattener.

http://www.minerazzi.com/tools/flattener/feed-flattener.php

The tool was tested with several news sites that offer web feeds, like MIT News (http://news.mit.edu/rss), and with several blog feeds.

Since then we have found other interesting uses for it.

The tool works fairly well with local and remote feeds, but it might fail to convert a feed properly if access to the feed is blocked or if the feed is not a valid XML document.
