Dataset visualization based on a simulation of intermolecular forces. Jan Drchal, Pavel Kordik, Miroslav Snorek. IWIM, Prague, 2007.

 Article (in pdf)

Abstract. Visualization is an important technique used in many stages of the data mining process. This article deals mostly with visualization for preprocessing purposes. The aim of our approach is to visualize distances (Euclidean or others) between data samples. This can help in getting a picture of how the data cluster. In classification tasks it can be used to select outliers for removal. In this paper we present a novel way of such visualization which is based on a physical system simulation. It is inspired by intermolecular forces and employs overall energy minimization. This minimization is done via known numerical methods for unconstrained optimization such as Steepest Descent, Conjugate Gradient or Quasi-Newton. The proposed algorithm was originally designed for, and proved useful in, interpreting diversity in evolutionary algorithms. Here, we show its properties on the well-known Iris and Ecoli datasets.
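The idea of laying out samples by energy minimization can be illustrated with a minimal sketch. The abstract does not give the actual intermolecular potential, so the sketch assumes a simpler stand-in energy: the sum of squared differences between the distances in the 2-D picture and the original distances. Steepest Descent with a step-halving rule is used as the optimizer. All function names (`pairwise_distances`, `energy`, `gradient`, `layout`) are hypothetical and not taken from the article.

```python
import math
import random


def pairwise_distances(samples):
    """Euclidean distance matrix of the original (high-dimensional) samples."""
    n = len(samples)
    return [[math.dist(samples[i], samples[j]) for j in range(n)] for i in range(n)]


def energy(points, target):
    """Assumed energy: squared mismatch between 2-D and original distances."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += (math.dist(points[i], points[j]) - target[i][j]) ** 2
    return total


def gradient(points, target):
    """Gradient of the assumed energy with respect to every 2-D coordinate."""
    n = len(points)
    grad = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            dist = math.hypot(dx, dy) or 1e-12  # avoid division by zero
            coeff = 2.0 * (dist - target[i][j]) / dist
            grad[i][0] += coeff * dx
            grad[i][1] += coeff * dy
            grad[j][0] -= coeff * dx
            grad[j][1] -= coeff * dy
    return grad


def layout(samples, steps=200, seed=0):
    """Steepest Descent from a random 2-D start; returns (points, final energy)."""
    rng = random.Random(seed)
    target = pairwise_distances(samples)
    points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in samples]
    current = energy(points, target)
    for _ in range(steps):
        grad = gradient(points, target)
        rate = 1.0
        while rate > 1e-10:
            # Try a step against the gradient; halve it until the energy drops.
            trial = [[p[0] - rate * g[0], p[1] - rate * g[1]]
                     for p, g in zip(points, grad)]
            trial_energy = energy(trial, target)
            if trial_energy < current:
                points, current = trial, trial_energy
                break
            rate *= 0.5
        else:
            break  # no improving step found: treat as converged
    return points, current
```

Calling `layout(samples)` with a list of feature vectors (for Iris, 150 four-dimensional vectors) returns 2-D coordinates that can be plotted directly. Samples that end up far from every group are candidates for outlier inspection. Replacing the step rule with Conjugate Gradient or a Quasi-Newton update changes only the optimizer, not the energy.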

Keywords. Data mining, visualization, optimization.

References.

  1. PAL: Phylogenetic analysis library.  http://www.cebl.auckland.ac.nz/pal-project/index.html.
  1. J.F. Bonnans, J.C. Gilbert, C. Lemaréchal, and C.A. Sagastizábal. Numerical Optimization: Theoretical and Practical Aspects. Springer-Verlag, Berlin Heidelberg, Germany, 2003.
  1. D.J. Newman, S. Hettich, C.L. Blake, and C.J. Merz. UCI repository of machine learning databases, 1998.  http://www.ics.uci.edu/~mlearn/MLRepository.html.
  1. R. Fletcher. Practical Methods of Optimization Vol.1: Unconstrained Optimization. John Wiley & Sons, New York, USA, 1980.
  1. J. Drchal and M. Šnorek. Diversity visualization in evolutionary algorithms. In J. Štefan, editor, Proceedings of 41th Spring International Conference MOSIS 07, Modelling and Simulation of Systems, pages 77–84. Ostrava: MARQ, 2007.
  1. J. W. Sammon Jr. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18(5):401–409, 1969.
  1. W.J. Moore. Physical Chemistry. Prentice Hall, New York, USA, 1972.
  1. J. Nocedal and S.J. Wright. Numerical Optimization. Springer-Verlag, New York, USA, 1999.
  1. W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C: the art of scientific computing 2nd ed. Cambridge University Press, Cambridge, 1992.
  1. R.B. Schnabel, J.E. Koontz, and B.E. Weiss. A modular system of algorithms for unconstrained minimization. ACM Transactions on Mathematical Software, 11(4):419–440, December 1985.