ELKI, the data mining framework I use for all my research, is coming along nicely, and will see continued progress in 2013. The next release is scheduled for SIGMOD 2013, where we will be presenting the novel 3D parallel coordinates visualization we recently developed. This release will bear the version number 0.6.0.

Version 0.5.5 of ELKI has been in Debian unstable since December (version 0.5.0 will be in the next stable release) and is also in Ubuntu raring. The packaged installation can share its dependencies with other Debian packages, so it is smaller than the download from the ELKI web site.

If you are developing cluster analysis or outlier detection algorithms, I would love to see them contributed to ELKI. If I get clean and well-integrated code by mid-June, your algorithm could be included in the next release, too. Publishing your algorithms as source code in a larger framework such as ELKI will often give you more citations, because it is then easier for others to compare with your algorithm and to try it on new problems. And, well, citation counts are a measure that administration loves to judge researchers by …

So what else is happening with ELKI:

  • The new book “Outlier Analysis” by C. C. Aggarwal mentions ELKI for visual evaluation of outlier results as well as in the “Resources for the Practitioner” section, and cites around 10 publications closely related to ELKI.
  • Some classes for color feature extraction of ELKI have been contributed to jFeatureLib, a Java library for feature detection in image data.
  • I’d love to participate in the Google Summer of Code, but I need a contact at Google to “vouch” for the project, otherwise it is hard to get in. I’ve sent a couple of emails, but so far have not heard back much.
  • As the performance of SVG/Batik is not too good, I’d like to see more OpenGL-based visualizations. This could also lead to an Android-based version for use on tablets.
  • As I’m not a UI guy, I would love to have someone make a fancier UI that still exposes all the rich functions we have. The current UI is essentially an automatically generated command line builder, which is nice, as new functionality shows up without the need to modify UI code. It’s good for experienced users like me, but it makes it hard for beginners to get started.
  • I’d love to see integration of ELKI with e.g. OpenRefine / Google Refine to make it easier to do appropriate data cleaning and preprocessing.
  • There is work underway for a distributed version running on Hadoop/YARN.