Why blog software should be XML-based

Simple answer: because RSS and Atom are XML formats. So please generate valid output. Adhere to the standards! The RSS feeds of Planet Gnome and Planet Ubuntu, for example, are currently broken. (Planet Gnome due to a bug in Planetplanet AFAICT, Planet Ubuntu due to a broken RSS feed from Wordpress… Yes, you must escape single < characters in XML…)

Templating systems such as the default templating of the Django framework or the popular Clearsilver templating engine are unfortunately not suited for XML output. Oh, and please never ever write XML using ‘print’ commands either. Use a proper XML writer, which can handle charset issues and escaping properly.
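To illustrate what I mean by a proper XML writer, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name build_rss_item and the example URL are made up for illustration, and a real feed would of course need the surrounding rss and channel elements as well.

```python
import xml.etree.ElementTree as ET


def build_rss_item(title, link, description_html):
    """Build one RSS <item> and return it serialized as UTF-8 bytes.

    Hypothetical helper: the writer, not the caller, is responsible
    for escaping '<', '>' and '&' and for encoding the characters.
    """
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # The HTML is stored as plain text, so it ends up escaped in the feed.
    ET.SubElement(item, "description").text = description_html
    return ET.tostring(item, encoding="utf-8")


item_xml = build_rss_item(
    "Caf\u00e9 maths: 1 < 2 & 3 > 2",
    "http://example.org/posts/1",  # placeholder URL
    "<p>Hello</p>",
)
```

The title contains a lone < and an &, which is exactly the kind of input that breaks print-based output; here the serializer escapes them without any extra code.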

Good examples of XML-enabled templating engines include TAL and METAL, used by Zope (but available for a variety of languages), and KID (again Python, used by the TurboGears framework).

Closely related is the “HTML fragments” issue, which is basically why I want XML to be used internally, too (you could also store only StructuredText and convert it to XML just for the transformation to the output):

HTML fragments in my blog should be valid slices of an XHTML file, to avoid issues both when generating the web pages and when integrating the feed into other pages. With RSS, the HTML code is AFAICT escaped in one big data chunk, so it doesn’t matter there. It probably does for Atom.
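The difference between the two feed formats can be sketched roughly like this, again with xml.etree.ElementTree. The helper names rss_description and atom_content are hypothetical. The Atom variant relies on the type="xhtml" form of the content element, which wraps the fragment in a single XHTML div, so the fragment has to be well-formed there.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def rss_description(fragment):
    """RSS style: the fragment is stored as escaped text."""
    description = ET.Element("description")
    description.text = fragment  # escaped on serialization, never parsed
    return description


def atom_content(fragment):
    """Atom style: the fragment becomes real XML inside an XHTML div.

    Raises ET.ParseError if the fragment is not well-formed.
    """
    content = ET.Element("{%s}content" % ATOM_NS, {"type": "xhtml"})
    wrapped = '<div xmlns="%s">%s</div>' % (XHTML_NS, fragment)
    content.append(ET.fromstring(wrapped))
    return content
```

A fragment such as an unclosed paragraph passes through rss_description untouched, but atom_content refuses it with a parse error, which is why sloppy fragments only hurt once they are embedded as real markup.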

So what a good blog tool (including a rewrite of planetplanet, which has way too many bugs) needs to do is to parse its input data (plain text, HTML provided by a web browser’s WYSIWYG edit component, mail, structured text, XML, …) and either reject broken entries or try to guess what’s intended, but guarantee that the output is valid XML. Then generate proper feeds and output from that.
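A very rough sketch of that "check, then fall back" step could look like the following. The function to_valid_fragment is a made-up name, and the fallback used here (escaping the whole entry as plain text) is just the simplest possible guess at what was intended; a real tool might try an HTML-repairing parser first. Note that named HTML entities such as &nbsp; are not defined in plain XML, so entries using them count as broken in this sketch.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"


def to_valid_fragment(raw):
    """Return (fragment, was_well_formed) for one piece of input.

    Hypothetical helper: the input is wrapped in a div and parsed.
    If that fails, the markup characters are escaped so the entry
    is shown as plain text instead of breaking the page or feed.
    """
    wrapped = '<div xmlns="%s">%s</div>' % (XHTML_NS, raw)
    try:
        ET.fromstring(wrapped)
    except ET.ParseError:
        # Simplest fallback: treat the whole entry as text.
        return escape(raw), False
    return raw, True


good, good_ok = to_valid_fragment("<p>ok</p>")
bad, bad_ok = to_valid_fragment("<p>1 < 2</p>")
```

A stricter tool would reject the entry when the second value is false instead of publishing the escaped text.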

Thanks to all those who already replied to my previous posting. One thing I was pointed at, and which I’ll probably look into, is Apache Forrest, although I will likely use KID instead if I happen to write my own tool after, or despite, my upcoming final exams.