This is amazing research: Latent Simplex Position Model: High Dimensional Multi-view Clustering with Uncertainty Quantification, by Prof. Leo Duan of the Department of Statistics, University of Florida, Gainesville, FL.
Back in 2017, Stepanyan & Petoukhov reported that long nucleotide sequences can be modeled as binary fractals by means of Kronecker exponentiation of matrices.
The abstract reads in part:
“This method uses a set of symmetries of biochemical attributes of nucleotides. It also uses the possibility of presentation of every whole set of N-mers as one of the members of a Kronecker family of genetic matrices. With this method, a long nucleotide sequence can be visually represented as an individual fractal-like mosaic or another regular mosaic of binary type.”
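To make the idea of Kronecker exponentiation concrete, here is a minimal sketch in plain Python. The 2x2 binary seed matrix is an arbitrary choice for illustration only; it is not one of the genetic matrices from the article, and the function names `kron` and `kron_power` are hypothetical helpers, not part of any tool mentioned here.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def kron_power(matrix, n):
    """Kronecker product of `matrix` with itself, n times (n >= 0)."""
    result = [[1]]  # 1x1 identity element for the Kronecker product
    for _ in range(n):
        result = kron(result, matrix)
    return result


# Illustrative seed only (an assumption, not a matrix from the article).
SEED = [[1, 1],
        [1, 0]]

mosaic = kron_power(SEED, 4)  # 16 x 16 binary matrix

# Print the matrix as a black-and-white mosaic.
for row in mosaic:
    print("".join("#" if cell else "." for cell in row))
```

Each exponentiation step doubles the side of the matrix, so the self-similar structure of the mosaic becomes visible after only a few steps.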
We added the fractal resembling the pattern of the nucleotide sequence Homo sapiens chromosome 22 genomic scaffold into our Fractal Studio tool at
Researchers can reproduce its binary mosaic, shown above, by selecting the Homo Sapiens Mosaic option from the tool selection menu. Compare the results with Figures 4 and 8 of the Stepanyan & Petoukhov article. Compare also some of the multifractals that the tool generates with the genetic mosaics described in the article.
Multidisciplinary research is a beautiful thing.
Two of our tools, Web Feed Flattener and Feed URLs Extractor, were updated and now accept files with the .xml extension, so we changed their names to indicate this. These tools are available at
These updates take the tools to a whole new level. Now you can flatten the tree structure of files like sitemaps.xml and similar files and extract URLs. Just submit a target web address and you are good to go.
I know there are tools out there that can scrape .xml files in order to extract specific pieces of data like URLs, but I found them too cumbersome. A major drawback of these design alternatives is that one frequently must know in advance how the document tree was constructed, with all of its tags and nuances, before coding a tool. To top it off, if the author of the file changes or edits tags, the tool probably won’t work as expected.
Our approach is different and very flexible. The key here is the flattening of the document tree structure embedded in XML files without even having to know how it was designed or edited. Document tree flattening will unveil this information before you can say: “Give me some soup!”
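A minimal sketch of what document tree flattening can look like, using only Python's standard library. This is an illustration under stated assumptions, not the code behind our tools: the `flatten` and `local_name` helpers and the sample document are hypothetical. The point is that the walker never needs to know the tag names in advance.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from a tag or attribute name."""
    return tag.rsplit("}", 1)[-1]


def flatten(element, prefix=""):
    """Walk the tree and return (path, value) rows for every text node
    and attribute, without assuming any particular tag names."""
    path = f"{prefix}/{local_name(element.tag)}"
    rows = []
    text = (element.text or "").strip()
    if text:
        rows.append((path, text))
    for name, value in element.attrib.items():
        rows.append((f"{path}/@{local_name(name)}", value))
    for child in element:
        rows.extend(flatten(child, path))
    return rows


# Hypothetical sample document, used only for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/a</loc><lastmod>2020-01-01</lastmod></url>
  <url><loc>http://example.com/b</loc></url>
</urlset>"""

rows = flatten(ET.fromstring(SAMPLE))
for path, value in rows:
    print(path, value)
```

The flattened rows reveal the structure of the file (here, `/urlset/url/loc` and `/urlset/url/lastmod`) as a by-product of the walk.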
Of course, we assume that the document tree has no orphan or broken tags (and, better yet, passes validation), which is something to be expected from trusted sources. If it is not valid, well, there are ways of fixing it or ignoring the offenders.
With the proposed technique, we can mine all sorts of .xml files and build customized tools on top of the flattened results, like derivative tools for mining sitemaps, inventories, raw data, recipes, etc. There is no need to know anything about the document tree in advance, resort to additional scripting technologies or software, or reinvent the wheel.
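As an example of a derivative tool, the sketch below pulls URLs out of any well-formed XML document by inspecting every text node and attribute value, whatever the tags are called. It is a hedged illustration, not our extractor: `extract_urls` and the sample feed are hypothetical, and it assumes that only absolute http/https URLs are of interest.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def extract_urls(xml_text):
    """Return the unique http(s) URLs found in text nodes and attribute
    values, in document order, without relying on tag names."""
    urls, seen = [], set()
    for element in ET.fromstring(xml_text).iter():
        candidates = [element.text or ""] + list(element.attrib.values())
        for candidate in candidates:
            value = candidate.strip()
            try:
                parts = urlparse(value)
            except ValueError:
                continue  # malformed value, skip it
            if parts.scheme in ("http", "https") and parts.netloc and value not in seen:
                seen.add(value)
                urls.append(value)
    return urls


# Hypothetical feed, used only for illustration.
SAMPLE_FEED = (
    "<rss><channel><title>Demo</title><link>http://example.com/</link>"
    "<item><title>Post</title><link>http://example.com/post</link>"
    '<enclosure url="http://example.com/a.mp3"/></item></channel></rss>'
)

for url in extract_urls(SAMPLE_FEED):
    print(url)
```

Because the function never names a tag, the same code works on sitemaps, feeds, and other .xml files, and keeps working if the author of a file renames its tags.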
Right now we can mine sitemaps all over the Web, including sitemaps hosted at Google, W3C, company sites, etc., and then recrawl the output to grow a microindex. See the “Suggested Exercises” sections of the tools for interesting examples. This is a value-added approach for our ongoing Maps2Miners project.
Considering that there are government agencies and organizations providing data in .xml format for developers to mine, flattening .xml files and building on top of the results is one of those “ah-ha!” ideas.
The Chaos Game Explorer is our most recent tool. It was developed to help users replicate many of the patterns found in the fractal geometry literature. The tool is available at
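For readers unfamiliar with the chaos game, here is a minimal sketch of the classic version that produces the Sierpinski triangle. It is not the code behind the Chaos Game Explorer; the function names, the jump ratio of 0.5, and the text-mode rendering are all assumptions chosen for illustration.

```python
import random


def chaos_game(vertices, steps, ratio=0.5, seed=0):
    """Start at the first vertex, then repeatedly jump a fixed fraction
    of the way toward a randomly chosen vertex, recording each point."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    x, y = vertices[0]
    points = []
    for _ in range(steps):
        vx, vy = rng.choice(vertices)
        x += (vx - x) * ratio
        y += (vy - y) * ratio
        points.append((x, y))
    return points


def render(points, width=48, height=24):
    """Plot points from the unit square on a character grid."""
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = min(int(x * (width - 1)), width - 1)
        row = min(int((1 - y) * (height - 1)), height - 1)
        grid[row][col] = "*"
    return "\n".join("".join(line) for line in grid)


# Three vertices with ratio 0.5 give the Sierpinski triangle.
TRIANGLE = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
points = chaos_game(TRIANGLE, 5000)
print(render(points))
```

Changing the number of vertices or the jump ratio yields other patterns described in the fractal geometry literature.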
The following complementary collections were reindexed and updated:
Information Retrieval, http://www.minerazzi.com/irc
Data Structures & Algorithms, http://www.minerazzi.com/dsac
Both include RSS news channels for Bing, Google, MIT, and arXiv so users can easily find news relevant to these collections.