I am intrigued by the subject of data mining poetry. This is an interesting topic for a grad student thesis because:

An exact-phrase search for “data mining poetry” in search engines returns a small answer set.

Unlike other types of content, all words, including those normally considered stopwords, might matter; i.e., they must be counted because they might act as content-bearing terms. Thus, there is no such thing as a stopword in poetry.

Word statistics (e.g., word counts per line) and specific tokens matter, unless we are talking about so-called free-style poetry (free verse).

Meter makes poetry suitable for building language-specific and writing-style-specific parsers.
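
To make the points about stopwords and per-line word statistics concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function names (`tokenize`, `line_statistics`), the tokenization regex, and the four-line sample text are all made up for this example and are not taken from any existing poetry-mining tool. The one deliberate design choice is that no stopword list is applied, so every token is counted.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, optionally with internal apostrophes
# (e.g. "o'er"). Real poetry may need a richer, language-specific tokenizer.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def tokenize(line):
    """Lowercase a line and split it into word tokens; nothing is filtered out."""
    return TOKEN_RE.findall(line.lower())


def line_statistics(poem):
    """Return (word counts per non-empty line, term frequencies over the poem)."""
    lines = [ln for ln in poem.splitlines() if ln.strip()]
    words_per_line = [len(tokenize(ln)) for ln in lines]
    # No stopword removal: "the", "is", "and" are counted like any other term.
    term_counts = Counter(tok for ln in lines for tok in tokenize(ln))
    return words_per_line, term_counts


# Hypothetical sample text, used only to exercise the functions.
SAMPLE = """The rose is red
The violet is blue
Sugar is sweet
And so are you"""

words_per_line, term_counts = line_statistics(SAMPLE)
print(words_per_line)            # [4, 4, 3, 4]
print(term_counts.most_common(2))  # [('is', 3), ('the', 2)]
```

In this toy sample the most frequent terms are exactly the ones a conventional stopword filter would discard, and the regular word counts per line are the kind of signal that would be absent in free verse.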

Any help will be greatly appreciated. Meanwhile, here are some relevant links:

