I wonder if we should create an RDF representation of Debian packages, especially one that includes dependency data.

That way we could use RDF reasoners to query our data and perhaps extract some interesting information. Conversely, people interested in RDF could use our real-world data for their experiments, and maybe we would get something back from them.
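To make the idea a bit more concrete, here is a minimal sketch of package dependencies as RDF-style triples, serialized as N-Triples. The namespace URIs and package names are made up for illustration; there is no such Debian ontology yet. If the "depends" property were declared transitive (e.g. as an owl:TransitiveProperty), a reasoner could infer indirect dependencies; the sketch simply computes that closure directly.

```python
# Hypothetical namespace: Debian does not publish such an ontology.
PKG = "http://example.org/debian/package/"
DEP = "http://example.org/debian/ontology#depends"


def pkg(name: str) -> str:
    """Build a (hypothetical) URI for a binary package."""
    return PKG + name


# Each triple is (subject, predicate, object); package names are made up.
triples = {
    (pkg("foo"), DEP, pkg("libfoo1")),
    (pkg("libfoo1"), DEP, pkg("libc6")),
    (pkg("libfoo1"), DEP, pkg("libbar2")),
    (pkg("libbar2"), DEP, pkg("libc6")),
}


def to_ntriples(graph) -> str:
    """Serialize the triples as N-Triples (all terms here are URIs)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(graph))


def transitive_depends(graph, start: str) -> set:
    """Follow 'depends' edges transitively, as a reasoner would for a
    transitive property; returns every package reachable from start."""
    seen, stack = set(), [start]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == DEP and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


if __name__ == "__main__":
    print(to_ntriples(triples))
    print(sorted(transitive_depends(triples, pkg("foo"))))
```

Here `foo` only depends on `libfoo1` directly, but the closure also yields `libbar2` and `libc6`, which is the kind of derived information a reasoner could provide.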

There are several pieces of package metadata that we currently do not track in the archive itself but keep in other places, including licensing information (debian/copyright), the homepage location (on packages.qa.debian.org) and the download location (debian/watch). It would be nice to aggregate these into an RDF store and export them in some form.
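As a rough sketch of such an aggregation, the snippet below merges the three kinds of metadata into N-Triples for one source package. The dictionaries stand in for data parsed from debian/copyright, debian/watch and packages.qa.debian.org (the parsing itself is not shown), and the package name and URLs are hypothetical. It reuses the DOAP properties `doap:homepage` and `doap:download-page`; mapping the watch location to `download-page` is a judgment call, and the license property lives in a made-up namespace.

```python
DOAP = "http://usefulinc.com/ns/doap#"
EX = "http://example.org/debian/ontology#"  # hypothetical namespace
SRC = "http://example.org/debian/source/"   # hypothetical URIs for source packages

# Stand-ins for data gathered from different places (contents made up).
copyright_licenses = {"foo": "GPL-2+"}                         # debian/copyright
watch_locations = {"foo": "http://example.org/foo/releases/"}  # debian/watch
homepages = {"foo": "http://example.org/foo/"}                 # packages.qa.debian.org


def literal(text: str) -> str:
    """Quote a plain string as an N-Triples literal (backslash first)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def aggregate(source: str) -> list:
    """Collect whatever metadata is known for one source package as N-Triples lines."""
    subject = f"<{SRC}{source}>"
    lines = []
    if source in homepages:
        lines.append(f"{subject} <{DOAP}homepage> <{homepages[source]}> .")
    if source in watch_locations:
        lines.append(f"{subject} <{DOAP}download-page> <{watch_locations[source]}> .")
    if source in copyright_licenses:
        lines.append(f"{subject} <{EX}license> {literal(copyright_licenses[source])} .")
    return lines


if __name__ == "__main__":
    print("\n".join(aggregate("foo")))
```

Packages with no known metadata simply produce no triples, so each source can be filled in independently.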

For most of the package information (especially dependency information), we’ll have to write our own ontology. (I wonder whether we can map version numbers to some standard rule language, or whether applications will need an external reasoner to process them.) For some things we can reuse the FOAF (Friend of a Friend) or DOAP (Description of a Project) ontologies. FOAF is widely used for describing people and both people-to-people and people-to-thing relationships; DOAP was designed for describing open source projects, but won’t be directly applicable to the packages of a project.
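The version number question matters because Debian versions are not ordered lexically or numerically: there is an epoch, an upstream version and a Debian revision, and `~` sorts before everything, even the end of the string. A generic reasoner cannot evaluate a constraint like `(>= 2.3)` without that ordering. The sketch below re-implements the comparison rules from Debian Policy, modelled on dpkg's algorithm; it assumes well-formed version strings and is not a replacement for `dpkg --compare-versions`.

```python
def _order(c: str) -> int:
    """Sort weight of one character in a non-digit run: '~' sorts first,
    letters before other characters; digits (and end of string) weigh 0."""
    if c == "~":
        return -1
    if c.isdigit():
        return 0
    if c.isalpha():
        return ord(c)
    return ord(c) + 256


def _verrevcmp(a: str, b: str) -> int:
    """Compare upstream versions or revisions as alternating non-digit/digit runs."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit runs are compared character by character.
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            ac = _order(a[i]) if i < len(a) else 0
            bc = _order(b[j]) if j < len(b) else 0
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit runs are compared numerically, ignoring leading zeros.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while i < len(a) and a[i].isdigit() and j < len(b) and b[j].isdigit():
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and a[i].isdigit():
            return 1  # a's number has more digits
        if j < len(b) and b[j].isdigit():
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str):
    """Split into (epoch, upstream, revision); a missing revision becomes ''."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 as version a sorts before, equal to, or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    c = _verrevcmp(ua, ub) or _verrevcmp(ra, rb)
    return (c > 0) - (c < 0)


_OPS = {
    "<<": lambda c: c < 0,
    "<=": lambda c: c <= 0,
    "=": lambda c: c == 0,
    ">=": lambda c: c >= 0,
    ">>": lambda c: c > 0,
}


def satisfies(version: str, op: str, reference: str) -> bool:
    """Check a versioned dependency such as 'foo (>= 2.3)'."""
    return _OPS[op](compare_versions(version, reference))
```

For example, `compare_versions("1.0~rc1", "1.0")` is -1 and `satisfies("2.3-1", ">=", "2.3")` is True. An RDF export could either ship such constraints as structured data for applications to evaluate like this, or precompute which versions satisfy them.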

I’ve blogged about my RDF export of Debtags data before; the natural first step would actually have been to export the package data and enrich it with the data collected by Debtags…

Note that RDF is designed so that one site can provide metadata about resources described by another site. For example, the Debtags RDF export contains “category” information for Debian packages, but does not contain, e.g., the descriptions of the packages it refers to by URI. So from an RDF point of view there is nothing wrong with keeping, e.g., the licensing, watch or homepage data separate.
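To illustrate why keeping the data separate is harmless, here is a small sketch that parses two independently published N-Triples documents and merges them. Because both talk about the same package URI, the merge is just a set union. The URIs and contents are hypothetical (only the Dublin Core `description` property is a real vocabulary term), and the parser only handles URIs and plain literals, not language tags, datatypes or blank nodes.

```python
import re

# Matches "<s> <p> <o> ." or "<s> <p> "literal" ." (simplified N-Triples subset).
TRIPLE = re.compile(r'^(<[^>]*>)\s+(<[^>]*>)\s+(<[^>]*>|"(?:[^"\\]|\\.)*")\s*\.\s*$')


def parse_ntriples(text: str) -> set:
    """Parse the supported N-Triples subset into a set of (s, p, o) strings."""
    graph = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line}")
        graph.add(match.groups())
    return graph


# Two documents published by different sites (contents hypothetical).
debtags_doc = """
<http://example.org/debian/package/foo> <http://example.org/debtags#tag> <http://example.org/debtags/tag/use::editing> .
"""
descriptions_doc = """
<http://example.org/debian/package/foo> <http://purl.org/dc/elements/1.1/description> "an example text editor" .
"""

# Merging RDF graphs that share URIs is a plain union of their triples.
merged = parse_ntriples(debtags_doc) | parse_ntriples(descriptions_doc)


def describe(graph: set, subject: str) -> list:
    """List all (predicate, object) pairs known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


if __name__ == "__main__":
    for pred, obj in describe(merged, "<http://example.org/debian/package/foo>"):
        print(pred, obj)
```

Neither document needs to know about the other; a consumer that loads both gets the tag and the description for the same package.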

For the Google Summer of Code, there was a proposal that included a “collaborative repository of meta-informations about source packages (CRMI)”; but the first part of the proposal, the “distribution wide tracker tool (DWTT)”, turned out to be a bigger task than expected.

But maybe we’ll still see CRMI at some point, and maybe it could provide an RDF export of its data (perhaps a semantic wiki would be a good starting point for CRMI?).

[P.S. This blog post probably belongs in the en/linux/debian category. But only the xml category is also syndicated on planet.XMLhack, and I want this post to go there to reach more RDF users. I really need to switch my blog to software that supports tagging…]