ELKI, the data mining framework I use for all
my research, is coming along nicely, and will see continued progress in 2013.
The next release is scheduled for
SIGMOD 2013,
where we will be presenting the novel 3D parallel coordinates visualization we recently developed.
This release will bear the version number 0.6.0.
Version 0.5.5 of ELKI has been in Debian unstable since December (version 0.5.0 will be in the next
stable release) and is also in Ubuntu raring. The packaged installation can share its dependencies with other
Debian packages, so the packages are smaller than the download from the ELKI web site.
If you are developing cluster analysis or outlier detection algorithms, I would love to see
them contributed to ELKI. If I get clean and well-integrated code by mid-June, your
algorithm could be included in the next release, too. Publishing your algorithms as source
code in a larger framework such as ELKI will often give you more citations, because it then
becomes easier for others to compare with your algorithm and to try it on new problems. And, well, citation
counts are a measure that administrations love to use to judge researchers ...
So what else is happening with ELKI:
- The new book "Outlier Analysis" by C. C. Aggarwal mentions ELKI for visual evaluation of
outlier results as well as in the "Resources for the Practitioner" section and cites around
10 publications closely related to ELKI.
- Some of ELKI's classes for color feature extraction have been contributed to
jFeatureLib, a Java library for feature
detection in image data.
- I'd love to participate in the Google Summer of Code, but I need a contact at Google to
"vouch" for the project, otherwise it is hard to get in. I've sent a couple of
emails, but so far I have not heard back much.
- As the performance of SVG/Batik is not too good, I'd like to see more OpenGL-based visualizations.
This could also lead to an Android-based version for use on tablets.
- As I'm not a UI guy, I would love to have someone make a fancier UI that still exposes all
the rich functions we have. The current UI is essentially an automatically generated command line
builder - which is nice, as new functionality shows up without the need to modify UI code.
It works well for experienced users like me, but makes it hard for beginners to get started.
- I'd love to see integration of ELKI with e.g.
OpenRefine / Google Refine
to make it easier to do appropriate data cleaning and preprocessing.
- There is work underway for a distributed version running on Hadoop/YARN.