  San Diego Professional Chapter Association for Computing Machinery

Past Meeting - November 20, 2003

Learn about Data Mining Tools and Techniques and a Respectable Open Source Implementation

Thursday, October 23, 2003
6:30 P.M. - 8:00 P.M.

Sun Microsystems
Building Sun SAN05
9540 Towne Centre Drive
San Diego, CA 92121

Summary of the meeting

Drs. Balac and Sipes gave an excellent overview of the problems addressed by data mining techniques, then delved into some simple techniques, and ultimately into Weka, an excellent open source general data mining tool. We found out that data mining is only partially about digging into a database and finding correlations. Much of the work is in preparing the data so that it's as uniform and complete as possible. Only after that's done can tools like Weka be used to scan and evaluate it.

The story doesn't end there, though. It also takes a thorough understanding of the kinds of computations a tool like Weka can make. Knowing what Weka is doing, when to use particular Weka tools, and what the results mean requires advanced statistics, advanced mathematics, and advanced computer skills. Recommendation: while there is a lot one can find out by using data mining disciplines, don't try this without a thorough grounding in all of its aspects, preferably from one of Drs. Balac's and Sipes' UCSD Extension classes!

Abstract

Data is being generated rapidly and in abundance. Intelligent software tools are increasingly needed to process and filter the data, detect new patterns and similarities, and learn the information lying hidden in the data. Large databases of information create great opportunities for the application of data mining methods. Conventional computer science algorithms are useful, but not powerful enough to solve many of the knowledge discovery and pattern extraction problems. Data mining approaches (such as decision trees, regression trees, clustering, association rules and neural networks) are ideally suited for domains characterized by the presence of large amounts of noisy data and the absence of general theories or hypotheses about the data. The fundamental idea behind these approaches is to learn automatically from the data, creating a theory, hypothesis or model through a process of inference, model fitting, or learning from examples.
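As a rough illustration of "learning from examples" (not taken from the talk), the sketch below fits a one-level decision tree, often called a decision stump, to a handful of labeled values. The function names `fit_stump` and `predict` and the toy data are assumptions made up for this example; a real tool such as Weka builds far richer models.

```python
from typing import List, Tuple

Stump = Tuple[float, int, int]  # (threshold, label_if_below, label_if_above)


def fit_stump(xs: List[float], ys: List[int]) -> Stump:
    """Pick the threshold and labeling that misclassify the fewest examples."""
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        for below, above in ((0, 1), (1, 0)):
            errors = sum(
                (below if x <= t else above) != y for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, t, below, above)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, t, below, above = best
    return t, below, above


def predict(stump: Stump, x: float) -> int:
    """Classify a new value with the learned threshold."""
    t, below, above = stump
    return below if x <= t else above


# Hypothetical toy data: one numeric attribute and a binary class label.
xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 1, 1, 1]
stump = fit_stump(xs, ys)
print(stump, predict(stump, 2.5), predict(stump, 8.2))
```

The model here is nothing more than a threshold inferred from the data, which is the same idea, in miniature, as the tree-based methods named above.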

This talk introduces data mining and gives an overview of the basic data mining tools and techniques, followed by a presentation of Weka, a respectable open source data mining tool. We describe Weka and compare it to several other tools. We conclude with what Weka, and data mining in general, can accomplish, and why data mining has rightly become a topic of so much interest.

Presenter Bios

Natasha Balac, Ph.D., received her Ph.D. in Computer Science from Vanderbilt University, with emphasis in Artificial Intelligence, Data Mining and Robotics. She has developed a novel planning and learning system for a mobile robot, using action models produced by the data mining technique she introduced, a multivariate regression tree induction method. Currently, Natasha works at the San Diego Supercomputer Center and teaches Data Mining courses at the University of California San Diego Extension.

Tamara Sipes, Ph.D., is a Data Mining Specialist at Alodar Systems, Inc., a consulting company offering solutions in Bioinformatics, Predictive Modeling, and Enterprise Application Integration. Dr. Sipes uses her data mining expertise to analyze data, select meaningful attributes, discard outlying and redundant information, and build predictive models that discover significant trends and relationships. Her work has led to patent awards for clients in Biotechnology and other industries and to published research in the areas of data mining and learning technologies. Dr. Sipes was awarded her doctorate in Artificial Intelligence/Data Mining at Vanderbilt University.