It looks like I’ll be teaching a graduate course on Data Mining (DM) for CS and Business students this Fall. I often find myself explaining across disciplines that DM is the Discipline of Knowledge (DK), and that there is nothing unusual about someone with a background in chemistry, biology, or business crossing departmental lines to take computer engineering courses in search of data mining or knowledge discovery in databases (KDD). This might explain why search engine companies hire PhDs from all disciplines.

Some university administrators still don’t get it, since they are happy sitting in their obscure little ivory towers, making unrealistic decisions.

Fortunately, time is our best friend and now many of these old farts are discovering the importance of DM.

So here goes this post.

Let me start by answering the question: Why is DM so important in all disciplines?

Well, this question is addressed in Dagstuhl Seminar Proceedings 04292, in the 2005 workshop “Data Mining: The Next Generation”:

“In recent years, research has tended to be fragmented into several distinct pockets without a comprehensive framework. Researchers have continued to work largely within the parameters of their parent disciplines, building upon existing and distinct research methodologies. Even when they address a common problem (for example, how to cluster a dataset) they apply different techniques, different perspectives on what the important issues are, and different evaluation criteria. While different approaches can be complementary, and such a diversity is ultimately a strength of the field, better communication across disciplines is required if Data Mining is to forge a distinct identity with a core set of principles, perspectives, and challenges that differentiate it from each of the parent disciplines. Further, while the amount and complexity of data continues to grow rapidly, and the task of distilling useful insight continues to be central, serious concerns have emerged about social implications of data mining. Addressing these concerns will require advances in our theoretical understanding of the principles that underlie Data Mining algorithms, as well as an integrated approach to security and privacy in all phases of data management and analysis….”

“…Data Mining is applicable to practically any application where the rate of data collection exceeds the ability of manual analysis, there is an interest in understanding the underlying nature of the application, including unexpected insights, and there is potentially a benefit to be obtained in doing so. We identified a number of applications that can benefit from data mining. These include: Life Sciences (LS), Customer Relationship Management (CRM), Web Applications, Manufacturing, Competitive Intelligence, Retail/Finance/Banking, Computer/Network/Security, Monitoring/Surveillance applications, Teaching Support, Climate modeling, Astronomy, and Behavioral Ecology. Indeed, most scientific disciplines are becoming data-intensive and turning to data mining as a tool.”

As a sample of the importance of KDD, the workshop mentions:

“In our discussion, we distinguish medical data from molecular biology data for several reasons:

Typically, medical data is directly connected to a person, while molecular biology data is not.

In molecular biology research many data sets, e.g., the genomes of several species, are publicly available; therefore, data mining applications in LS often use data produced elsewhere. This is rarely the case in medical applications.

Molecular biology research is often publicly funded and mostly devoted to basic research. In contrast, medical research is often financed by pharmaceutical companies, and often directly addresses questions regarding drug development or drug improvement.”

Some applications of KDD across disciplines include:

1. Finding Transcription Factor Binding Sites
2. Functional Annotation of Genes
3. Detection of Evolutionary Differences between Various Species
4. Revealing Protein-Protein Interactions Using Text Mining
5. Analysis of Data Mining in Life Sciences
6. Customer Relationship Management (CRM)
7. Analysis of Data Mining in CRM

This might explain why some universities now have in-house data mining specialists (much like grant-writing specialists) whose primary role is to act as support staff for researchers drawn from dissimilar fields.

Others go as far as offering certificates in data mining for business intelligence, and still others are moving toward an Institute for Knowledge Services designed to interface between a School of Business, a School of Computer Engineering, and community outreach/technology transfer offices.
