Testing of Inductive Preprocessing Algorithm. Miroslav Cepek, Miroslav Snorek, Pavel Kordik

Abstract. Data preprocessing is a very important part of the knowledge discovery process. Data mining systems contain tens of preprocessing methods (for example, methods for missing data imputation, data reduction, discretization, and data enrichment), and it is usually not clear which of them to use. Selecting the preprocessing methods appropriate for a particular dataset requires considerable experience and a lot of experimentation. In this paper we test our extension of the inductive approach to data preprocessing. We developed an inductive preprocessing method that uses a genetic algorithm to compose, from scratch, a sequence of preprocessing methods that fits the data and allows a successful model to be created. To test our automatic preprocessing, we use several real-world datasets available from the UCI Machine Learning Repository. To extend our experiments, we selected three common dataset problems (missing data, imbalanced classes, and noisy data) and introduced them into the data. With these experiments we demonstrate the abilities of the inductive preprocessing method.

Keywords. Inductive preprocessing, UCI.
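
The abstract mentions introducing three problems into otherwise clean datasets: missing data, imbalanced classes, and noise. The following is a minimal Python sketch of how such degradation could be done; it is not the authors' implementation. The function names (inject_missing, inject_noise, imbalance), the parameters, and the choice of Gaussian noise are assumptions made for illustration only.

```python
import random


def inject_missing(rows, fraction, rng):
    """Replace each feature value with None with probability `fraction`."""
    return [[None if rng.random() < fraction else v for v in row]
            for row in rows]


def inject_noise(rows, sigma, rng):
    """Add zero-mean Gaussian noise (std. dev. `sigma`) to every value.

    Assumes all feature values are numeric; Gaussian noise is an
    illustrative choice, not something stated in the abstract.
    """
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in rows]


def imbalance(rows, labels, minority, keep, rng):
    """Keep each instance of class `minority` with probability `keep`.

    Instances of all other classes are always kept.
    """
    kept = [(r, y) for r, y in zip(rows, labels)
            if y != minority or rng.random() < keep]
    return [r for r, _ in kept], [y for _, y in kept]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the degradation is repeatable
    rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    labels = ["a", "a", "b", "b"]
    print(inject_missing(rows, 0.25, rng))
    print(inject_noise(rows, 0.1, rng))
    print(imbalance(rows, labels, "b", 0.5, rng))
```

Passing an explicit random.Random instance keeps each degraded dataset reproducible, which matters when different preprocessing sequences are to be compared on the same corrupted data.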


