Recently, I was working on a script that took some text file, parsed it and built an XML tree. Unfortunately, it was limited by the lack of power of the MSXML suite.
Ideally, I would have used LibXML2 because it is a really powerful library. Unfortunately, there is no ActiveX component that wraps LibXML2, and writing one myself was not worth it (because of ActiveX, not LibXML2).
So I had to look for alternatives: languages I know that have strong XML support of some kind. These came to mind (in no particular order)[^1]:
- Mono/.NET
- PHP
- Python
- Perl
- C
Two of them were obvious candidates for the Trash[^2]; a third (PHP) was also quickly dropped because it lacks a decent XML parser and because of a general bad gut feeling about using a web development language for a desktop application.
Of the remaining two (Mono and .NET are the same, basically), I chose Python. Why? There are several reasons:
- I know more about Python coding than about C# or any CLR language
- LibXML2 has excellent Python bindings
- It’s about time to learn a new programming language
- Python and LibXML2 are both cross-platform
- Python has WebDAV support that I understand and can leverage
(The last point is particularly important with respect to the project I’m working on)
Building the libraries
Because Python exists for the Mac (OS X) as well, it was obvious that I would do the main development on my yummy little Aluminium beast, and not on the big, noisy PC I got from my boss. (Not that I don’t value my boss’s efforts to provide me with the best hardware I can dream of, but I love my command line.)
A glance at the LibXML2 web site showed that there are pre-compiled binaries available as [LibXML2 for Mac OS X]. Obviously, there are no [Python bindings] for the version “2.6.8” of the [LibXML2 for Mac OS X]. Bummer, I had to build them myself.
To make it short: the only thing you must pay attention to when you build LibXML2 and LibXSLT (including the Python bindings) is making sure that LibXSLT and the bindings find the right version of the library. Panther already has LibXML2 2.5.4 installed, and if you don’t pay attention, either one can fail.
The `configure` script of LibXSLT required an option whose `DIR` argument points to the location where you installed LibXML2 (I chose `/usr/local/` for all that stuff). For the Python bindings, the easiest way is to modify `setup.py`. Just search for the following piece of code in `setup.py` and comment out the part where it includes `/usr/include` (the other paths might differ, depending on your system configuration):

    # those are examined to find
    # - libxml2/libxml/tree.h
    # - iconv.h
    # - libxslt/xsltconfig.h
    includes_dir = [
    "/usr/include",
    "/usr/local/include",
    "/opt/include",
    os.path.join(ROOT,'include'),
    HOME
    ];

It should look like this (more or less):

    # those are examined to find
    # - libxml2/libxml/tree.h
    # - iconv.h
    # - libxslt/xsltconfig.h
    includes_dir = [
    # "/usr/include",
    "/usr/local/include",
    "/opt/include",
    os.path.join(ROOT,'include'),
    HOME
    ];

Once you’ve made these changes, `sudo python setup.py install` should run smoothly.
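Why does commenting out one path help? The sketch below shows the idea, assuming `setup.py` walks the list in order and settles on the first directory that contains `libxml2/libxml/tree.h`. The names `find_include_dir` and `libxml_include_candidates` are made up for illustration and are not part of the real `setup.py`.

```python
import os

# Header named in the setup.py comment; used to recognise a libxml2 install.
LIBXML_HEADER = os.path.join("libxml2", "libxml", "tree.h")


def find_include_dir(candidates, header=LIBXML_HEADER):
    """Return the first directory in `candidates` containing `header`, else None."""
    for directory in candidates:
        if os.path.isfile(os.path.join(directory, header)):
            return directory
    return None


def libxml_include_candidates(skip=("/usr/include",)):
    """Hypothetical helper: the candidate list with the system directory left out."""
    candidates = ["/usr/include", "/usr/local/include", "/opt/include"]
    return [d for d in candidates if d not in skip]


if __name__ == "__main__":
    # With /usr/include skipped, an older system-wide libxml2 cannot win
    # over the copy installed under /usr/local.
    print(find_include_dir(libxml_include_candidates()))
```

Because the first match wins under that assumption, leaving `/usr/include` in the list would let the older system headers shadow the freshly built ones.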
Using the libraries
Well, I didn’t want to build them, I wanted to use them. There’s a nifty [Python/XML] tutorial over at [Kimbro Staken’s blog] that covers the basics, and that was enough for me.
There is, btw, an awful lot of nifty stuff in Kimbro’s [XML category] over at his site. Check it out!
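To give a rough idea of what building an XML tree from a text file looks like in Python, here is a minimal sketch. It is not the script described in this post: the input format (`key: value` lines), the function name `text_to_xml` and the element names are invented for illustration, and it uses the standard library’s `xml.etree.ElementTree` in place of the libxml2 bindings so that it runs without building anything.

```python
import xml.etree.ElementTree as ET


def text_to_xml(lines, root_tag="records"):
    """Turn 'key: value' lines into <entry name="key" line="N">value</entry> elements."""
    root = ET.Element(root_tag)
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        # Skip blank lines and lines without a separator.
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        entry = ET.SubElement(root, "entry", name=key.strip(), line=str(number))
        entry.text = value.strip()
    return root


if __name__ == "__main__":
    sample = ["title: Hello", "", "author: Me & you"]
    # Serialising takes care of escaping characters such as '&'.
    print(ET.tostring(text_to_xml(sample), encoding="unicode"))
```

The shape would be much the same with the libxml2 bindings: create a document, hang child nodes off the root, serialise at the end.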
I had this done in approx. 6 hrs. Adding the XML output was a matter of maybe 1 hr (mainly due to my lack of knowledge of Python and libxml programming), and so, after less than one working day, I have a working prototype of what shall become a Really Useful Technology® for my company.
The toughest part that remains so far is to add support for the WebDAV stuff I mentioned above. This will occupy me some more days I guess, but I am confident that the tools I use allow me to achieve what I want.
After all, Python has a very rigid syntax because it is so strict with respect to whitespace. It turned out that this is more of an advantage than a hassle. It results in clean code without any special effort: You just have to structure your code, otherwise it won’t work. Sweet.
Oh, and yes, I might’ve been bored lately … :)
[^1]: Java isn’t on the list because it’s not my language of choice. I see its power, but for rapid prototyping it’s just not suitable, and I never really got the hang of its syntax.
[^2]: No, not .NET and Mono. It’s Perl and C, because I never understood Perl’s syntax and because C is just too clumsy for what I wanted to do. (And C isn’t a real rapid prototyping language either.)
[LibXML2 for Mac OS X]: http://www.zveno.com/open_source/libxml2xslt.html
[Python bindings]: http://xmlsoft.org/python.html
[Python/XML]: http://www.xmldatabases.org/WK/blog/215_XML_Document_Construction_With_Python_and_libxml2.item
[Kimbro Staken’s blog]: http://www.xmldatabases.org/WK/blog/
[XML category]: http://www.xmldatabases.org/WK/blog?t=category&a=XML