# Dataset for Sequential Sentence Classification

**10**
*Wednesday*
Apr 2019

A research group at the CSIE department of National Taiwan University (NTU), supervised by Prof. Shou-De Lin, is currently working on releasing a new dataset for sequential sentence classification and type classification of paper abstracts on arXiv.

This annotation and classification project is available at https://mslab.csie.ntu.edu.tw/~labeler/abstract.php

Annotations are made by the article authors upon being invited or contacted by the research group. I’m happy to see that they included my “Local Term Weights Models from Power Transformations | Development of BM25IR” paper (https://arxiv.org/ftp/arxiv/papers/1608/1608.01573.pdf) in the dataset.

**01**
*Monday*
Apr 2019

**Tags**

Apache, BM25, BM25 Tutorial, BM25F, BM25F Tutorial, minerazzi, open source, PHP

Here is a nice article from the International Journal of Computer Applications Technology and Research, https://ijcat.com/archieve/volume7/issue10/ijcatr07101003.pdf, on text mining digital libraries with OKAPI BM25. The authors successfully used a PHP implementation (XAMPP bundle).

I’m happy to see that reference 22 of the article cites one of my tutorials, on the BM25F implementation. It is available at http://www.minerazzi.com/tutorials/bm25f-model-tutorial.pdf (and through researchgate.net).

The authors generously included some PHP snippets, so it should be easy for those who can code to follow along.
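For readers who want to experiment before reading the article’s PHP snippets, here is a minimal Python sketch of Okapi BM25 scoring over a tiny in-memory corpus. It is not the authors’ implementation: the defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant log(1 + (N − n + 0.5)/(n + 0.5)) are assumptions (common choices), and the helper name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    Assumes the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)).
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization for this document relative to the average length.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in set(query):  # each distinct query term counts once
            f = tf[term]
            if f == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * f * (k1 + 1) / (f + norm)
        scores.append(score)
    return scores


docs = [["the", "cat", "sat"], ["the", "dog"], ["cat", "cat", "dog"]]
print(bm25_scores(["cat"], docs))
```

The third document scores highest because it has the same length as the first but twice the term frequency, while the second document, which lacks the term, scores zero.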

**20**
*Thursday*
Dec 2018

Posted in Algorithms, calculators, chemical mining, chemistry, Chemometrics, Data Mining, Programming, Software

This tool computes bond orders of diatomic species having up to 20 electrons, without using Molecular Orbital Theory! It is available at

http://www.minerazzi.com/tools/bond-order/calculator.php

We developed the tool inspired by Dr. Arijit Das’s set of innovative, time-economic formulae for chemical education. His methodologies are suitable for computer-based learning (CBL) activities or for writing computer programs that solve chemistry problems.

Unlike other bond order calculators, ours does not require you to write Lewis structures or electron configurations, or to count electrons, bonds, orbitals, and atoms. Just enter a chemical formula and the tool will do the rest for you.
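As a rough illustration of how little input such a calculator needs, the Python sketch below counts the electrons in a diatomic formula and maps the count to a bond order with a piecewise formula. The ranges used (1–2, 2–6, 6–14, and 14–20 electrons) are our own assumption, chosen so the results match the standard bond orders for these species; they are not necessarily Dr. Das’s exact formulae or the tool’s code. The names `count_electrons` and `bond_order` are hypothetical.

```python
import re

# Atomic numbers of H through Ne: enough for diatomic species with up to 20 electrons.
ATOMIC_NUMBER = {"H": 1, "He": 2, "Li": 3, "Be": 4, "B": 5,
                 "C": 6, "N": 7, "O": 8, "F": 9, "Ne": 10}


def count_electrons(formula, charge=0):
    """Count electrons in a diatomic formula such as 'O2', 'NO', or 'CN' with a net charge."""
    pairs = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    if "".join(sym + num for sym, num in pairs) != formula:
        raise ValueError(f"cannot parse formula {formula!r}")
    atoms = 0
    protons = 0
    for sym, num in pairs:
        if sym not in ATOMIC_NUMBER:
            raise ValueError(f"unsupported element {sym!r}")
        count = int(num) if num else 1
        atoms += count
        protons += ATOMIC_NUMBER[sym] * count
    if atoms != 2:
        raise ValueError("only diatomic species are supported")
    # A positive charge removes electrons; a negative charge adds them.
    return protons - charge


def bond_order(electrons):
    """Piecewise bond order for 1 to 20 electrons (assumed ranges, see lead-in)."""
    if not 1 <= electrons <= 20:
        raise ValueError("electron count must be between 1 and 20")
    if electrons <= 2:
        return electrons / 2
    if electrons <= 6:
        return abs(4 - electrons) / 2
    if electrons <= 14:
        return abs(8 - electrons) / 2
    return (20 - electrons) / 2


for formula, charge in [("H2", 0), ("He2", 0), ("N2", 0), ("O2", 1), ("CN", -1), ("F2", 0)]:
    n = count_electrons(formula, charge)
    print(formula, charge, n, bond_order(n))
```

Passing the charge separately keeps the parser simple; a full tool would also accept charges written inside the formula.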

In my opinion, students who know how to write programs for solving chemistry problems have an edge when taking quantitative courses such as analytical chemistry, instrumental analysis, chemometrics, and computational chemistry. I think they might be better prepared for multidisciplinary research work than those who cannot code.

Developing this tool was really gratifying, as the work inspired us to derive an algorithm for predicting the number of unpaired electrons and the magnetic properties of single atoms, diatomic species, and their ions. Hopefully, this algorithm will be available early next year in the form of a new chemistry calculator.

We are also developing a tool that computes bond orders of all kinds of species, including polyatomic ones.

We are sincerely indebted to Dr. Arijit Das from Ramthakur College, Agartala, West Tripura, India for encouraging us to develop this tool for educators, scholars, and chemistry students.

Note:

This tool, like our Hydrocarbons Parser (http://www.minerazzi.com/tools/hydrocarbons/parser.php), is listed in the City College Chemistry Web Resources Guide at CUNY. You can find both in the guide’s Computational Chemistry category (http://libguides.ccny.cuny.edu/chemistry/computational).

**14**
*Friday*
Sep 2018

We have updated and improved our Regression & Correlation Calculator to demonstrate, as shown in the above figure, that a Spearman’s Correlation Coefficient is just a Pearson’s Correlation Coefficient computed from ranks.

The tool uses an algorithm that converts values to ranks and averages any ties before calculating the correlations. This comes in handy when we need to compute a Spearman’s Correlation Coefficient from ranks with a large number of ties.

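We have explained in the “What is Computed?” section of the tool’s page that, as the number of ties increases, the classic textbook formula for computing Spearman’s correlations,

$$r_{s} = 1 - \frac{6\sum_{i=1}^{n} d_{i}^{2}}{n(n^{2}-1)},$$

where $d_{i}$ is the difference between the ranks of the $i$-th pair and $n$ is the number of pairs, increasingly overestimates the results, even if ties are averaged.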

By contrast, computing a Spearman’s as a Pearson’s always works, whether or not ties are present.

To illustrate the above, consider the following two sets:

X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Y = [1, 1, 1, 1, 1, 1, 1, 1, 1, 2]

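Averaging the ranks of the nine tied values in Y gives each of them a rank of 5, so $\sum d_{i}^{2} = 60$ and Spearman’s classic equation yields $r_{s} = 1 - 360/990 = 0.6364 \approx 0.64$.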

By contrast, $r_{s} = 0.5222 \approx 0.52$ when computed as a Pearson coefficient from the ranks. This is a non-trivial difference.
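The Python sketch below reproduces both numbers for the two sets above. It averages tied ranks, then computes $r_{s}$ with the classic formula and as a Pearson coefficient of the ranks. It is an independent illustration, not the calculator’s source code, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_classic(x, y):
    """Textbook formula r_s = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)) on averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def spearman_as_pearson(x, y):
    """Spearman's coefficient computed as Pearson's coefficient of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [1, 1, 1, 1, 1, 1, 1, 1, 1, 2]
print(round(spearman_classic(X, Y), 4))     # 0.6364
print(round(spearman_as_pearson(X, Y), 4))  # 0.5222
```

Without ties, both functions agree; the gap appears only once ties are present, which is the point made above.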

Accordingly, we can make a case for ditching Spearman’s classic formula for good.

We also demonstrate on the tool’s page why we should never arithmetically add or average Spearman’s correlation coefficients. The same goes for Pearson’s.
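One quick way to see the problem is to compare the average of two subset correlations with the correlation of the pooled data. In this hypothetical example, each subset is perfectly linear (r = 1), so their arithmetic mean is 1, yet the pooled data are not perfectly linear. This Python sketch only illustrates non-additivity; it is not the demonstration used in the tool.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Two hypothetical subsets, each perfectly linear on its own.
xa, ya = [1, 2, 3], [1, 2, 3]
xb, yb = [4, 5, 6], [10, 20, 30]

r_a, r_b = pearson(xa, ya), pearson(xb, yb)
mean_of_rs = (r_a + r_b) / 2          # arithmetic average of the two coefficients
pooled_r = pearson(xa + xb, ya + yb)  # coefficient of the combined data
print(mean_of_rs, round(pooled_r, 4))  # mean_of_rs is 1.0; pooled_r is about 0.94
```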

Early articles on correlation coefficient theory failed to recognize the non-additivity of Pearson’s and Spearman’s Correlation Coefficients.

Sadly, this is sometimes reflected in current research articles, textbooks, and online publications. The worst offenders are some marketers and teachers who, in order to protect their failing models, resist considering up-to-date research on the topic.

PS. Updated on 09-14-2018 to include the numerical example and to rewrite some lines.

**05**
*Wednesday*
Sep 2018

This is a great tool from Google (Dataset Search):

https://www.blog.google/products/search/making-it-easier-discover-datasets/

I will explore how our Feed Flattener can benefit from it.

http://www.minerazzi.com/tools/flattener/feed-flattener.php

**12**
*Sunday*
Aug 2018

Posted in Algorithms, Data Mining, Fractal Geometry, Fractal Patterns, Mathematics, miner

I’m reading with great interest biographical notes on Peter Scholze and his work. At the age of 30, he is one of the youngest Fields Medal laureates, and he has already won most of the top awards in mathematics.

He is currently a director at the Max Planck Institute for Mathematics and holds a Hausdorff Chair at the University of Bonn. Super impressive!

Scholze’s key innovation, introduced in his 2011 PhD thesis, is a class of fractal structures that he calls perfectoid spaces, which have far-reaching ramifications in the field of Arithmetic Geometry.

To help others learn about his awesome research work, the following links were indexed in the Math Bios (http://www.minerazzi.com/mathbios) miner.

https://en.wikipedia.org/wiki/Peter_Scholze

https://www.wired.com/2016/07/the-oracle-of-arithmetic/

https://www.scopus.com/authid/detail.uri?authorId=45561671200

https://www.mpim-bonn.mpg.de/node/8461

http://www.hcm.uni-bonn.de/uploads/tx_bzdstaffdirectory/d2ff97eae17a476570c2107b25d84778.pdf

http://www.math.uni-bonn.de/people/scholze/PerfectoidSpaces.pdf (2011 PhD thesis).

A dedicated miner focused on perfectoid spaces is more than deserved, I believe. Don’t you think so?

PS. Here is an introductory note by Jared Weinstein on perfectoid spaces:

https://www.msri.org/system/cms/files/83/files/original/141109_Emissary-Fall-2014-Web.pdf

Here are some notes on his achievements:

https://www.mathunion.org/fileadmin/IMU/Prizes/Fields/2018/scholze-final.pdf

And here is a chat on perfectoid spaces:

https://mathoverflow.net/questions/65729/what-are-perfectoid-spaces

I decided to go ahead and build the perfectoid spaces miner. It should be ready pretty soon.

PS. The Perfectoid Spaces Miner is now available

https://irthoughts.wordpress.com/2018/08/16/perfectoid-spaces-miner/

Read also about its implications for Quantum Theory at

Additional links are listed there.

**28**
*Saturday*
Jul 2018

This is the second part of a previous post.

**20**
*Friday*
Jul 2018

Posted in Algorithms, Chaos, Data Mining, Fractal Geometry, Fractal Patterns, Mathematics, News, Programming, Software

The figure was generated with our Chaos Game Explorer tool, using the algorithm described at

http://www.minerazzi.com/tools/chaos-game/chaos-game.php

and as presented in Barnsley’s books (Fractals Everywhere, 1988; The Desktop Fractal Design Handbook, 1989).

The game was played for N = 100,000 iterations, starting from a point placed at random within an n-gon (a polygon with n vertices), using different combinations of vertex counts (n) and scale ratios (r), and coloring the emerging patterns in white. Some combinations produce patterns that somewhat resemble calendars, medallions, rings, and similar artifacts from different ancient cultures.

For the above figure, I used n = 12 and r = 0.30.
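The chaos game described above can be sketched in a few lines of Python. This version assumes the n vertices lie on the unit circle and that the scale ratio r is the contraction factor of each map, so each jump lands at v + r(p − v) for a randomly chosen vertex v and current point p; the Chaos Game Explorer may define r differently. It renders a coarse ASCII grid instead of white pixels, and the function names are hypothetical.

```python
import math
import random


def chaos_game(n=12, r=0.30, iterations=100_000, burn_in=20, seed=0):
    """Play the chaos game on a regular n-gon and return the visited points.

    Assumes each move maps point p toward a random vertex v as v + r * (p - v).
    """
    rng = random.Random(seed)
    vertices = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
                for k in range(n)]
    x, y = 0.0, 0.0  # start at the center of the polygon
    points = []
    for step in range(iterations + burn_in):
        vx, vy = rng.choice(vertices)
        x, y = vx + r * (x - vx), vy + r * (y - vy)
        if step >= burn_in:  # skip the first few transient points
            points.append((x, y))
    return points


def render_ascii(points, width=60, height=30):
    """Mark every grid cell hit by at least one point with '#'."""
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = min(width - 1, int((x + 1) / 2 * width))
        row = min(height - 1, int((1 - y) / 2 * height))
        grid[row][col] = "#"
    return "\n".join("".join(row) for row in grid)


print(render_ascii(chaos_game(n=12, r=0.30)))
```

Because every jump is a convex combination of the current point and a vertex, all points stay inside the n-gon; changing n and r changes how the scaled copies of the polygon overlap.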

Running the algorithm with the pixels coded in different colors reveals that the patterns are simply the result of the same n-gon partially overlapping itself across many scales of observation. Did ancient cultures know about this technique?

Just for fun, you may want to try other values, then search Google Images for ancient calendars, medallions, rings, etc., and compare the results. Share your images and let me know if you find something interesting. I’m documenting the results.

**16**
*Wednesday*
May 2018

Posted in Algorithms, Data Mining, miner, minerazzi, Theoretical Physics

The Theoretical Physics Miner, available at

http://minerazzi.com/theoretical-physics/

is our most recent search solution.

Use its recrawling capabilities under a given search result to start building your own curated collection.

Use its news section at

http://www.minerazzi.com/theoretical-physics/spp.php

to access all arXiv and MIT news feeds relevant to theoretical and experimental physics.

The figure below is for illustration purposes only. It was generated through affine transformations that include reflection operations within an n-gon. Any resemblance to a black hole at its center is pure coincidence.

**22**
*Sunday*
Apr 2018

**Tags**

Algorithms, Linear Algebra, Mathematics, Matrices, Polynomial Regression, Regression, tools, tutorials

If you are a chemist, biodesigner, or a researcher working in other fields, eventually you may need to fit a paired data set to a polynomial regression model. You could use software to do that, or build your own solution. This tutorial is aimed at those interested in the latter. Access it now at

http://www.minerazzi.com/tutorials/polynomial-regression-tutorial.pdf

Three different methods for implementing polynomial regression are described. Teachers and students might benefit from the tutorial, since the calculations can be done with spreadsheet software such as Excel, by writing a computer program, or with a programmable calculator.
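To give a flavor of the calculations, here is a minimal Python sketch that fits a polynomial by least squares through the normal equations, solved with Gaussian elimination. It is one common approach and may or may not match any of the three methods in the tutorial; the function name `polyfit_normal` is hypothetical. The normal equations can become ill-conditioned for high degrees, so treat this as a small teaching example.

```python
def polyfit_normal(x, y, degree):
    """Least-squares polynomial fit; returns coefficients [c0, c1, ..., c_degree]."""
    m = degree + 1
    # Power sums S_k = sum(x_i^k) and moment sums T_j = sum(y_i * x_i^j).
    s = [sum(xi ** k for xi in x) for k in range(2 * m - 1)]
    t = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(m)]
    # Augmented normal-equations matrix: A[j][k] = S_{j+k}, right-hand side T_j.
    a = [[s[j + k] for k in range(m)] + [t[j]] for j in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if a[col][col] == 0:
            raise ValueError("singular system: need more distinct x values")
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m + 1):
                a[row][k] -= factor * a[col][k]
    # Back substitution.
    coeffs = [0.0] * m
    for row in range(m - 1, -1, -1):
        acc = a[row][m] - sum(a[row][k] * coeffs[k] for k in range(row + 1, m))
        coeffs[row] = acc / a[row][row]
    return coeffs


x = [0, 1, 2, 3, 4]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]  # exact quadratic y = 1 + 2x + 3x^2
print([round(c, 6) for c in polyfit_normal(x, y, 2)])  # [1.0, 2.0, 3.0]
```

The same power sums and moment sums can be tabulated in spreadsheet columns, which is why this route also suits Excel or a programmable calculator.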