Extending Anthologize: Part 1

Anthologize has seen two version releases since its initial launch in August. Much of the progress since then (aside from some smallish – but needed – features and bugfixes) has been centered on the development of a plugin architecture for Anthologize, a system that will allow individuals to build custom output formats for their Anthologize content in a relatively straightforward way. Anthologize development team member (and metadata badass) Patrick Murray-John has been hard at work on this project: creating prototype plugins, writing blog posts about the plugin process, and of course building large parts of API itself. In this series of posts, I’d like to augment some of Patrick’s specific musings with a general explanation of how Anthologize works, and how the plugin API lets you tap into it. Think of it as an introduction to Anthologize plugin building.

First, a brief peek under the hood. Anthologize’s tagline is “Use the power of WordPress to transform online content into an electronic book.” How does Anthologize take you from point A – your WordPress content – to point B – your ebook? When you press the Export button on the final Export Project screen, you set into motion a two-step process.

Step 1: WordPress to TEI(ish)

WordPress content is stored in the wp_posts database table, and is typically accessed using WP’s “loop”, which is a set of iterative and indexical functions that make it easy to get and display post data in whichever way you’d like. When you hit the final Export button, Anthologize catches the form submit and hands things off to the format translator, which will get the content out of the database. We’ll use PDF as our example. The main translator file is base.php. That file acts like a top-level manager for all the things that have to be done in order for PDFs to be generated. The first important thing that any export format has to do is to call up the content of the project, which it does by instantiating the TeiDom class.

In many ways, TeiDom is the workhorse of Anthologize. Using the session data passed to it from the PDF base.php file – data that includes the project id, the desired page size, the content of the dedication, stuff like that – it uses a variation on The Loop to collect each part and item from an Anthologize project. Those objects, along with their metadata, are then fed into an empty TEI template file before getting handed back to the individual export format translator.

Why the middleman? Early in the development process, the team made a few decisions about the way that Anthologize ought to operate. First, though it’s currently being developed as a WordPress plugin, we should anticipate a time when much of Anthologize could be ported to another CMS or to a standalone application. If format translators like PDF had to dig directly into the WP database, or were forced to use The Loop to get project data, then they’d have to be refactored when the data lived somewhere other that WP. TEI is a platform-independent format, and since format translators like the PDF generator communicate only with the TeiDom (not directly with WordPress), they should be fairly platform-independent as well. Second, if we were going to have a middleman, we wanted it to be one with extremely broad expressive power, and one for which standards and translation techniques already exist. By choosing TEI, we open the door for armies of archivists, librarians, and other such format wizards, armed with XSLT ninjitsu. (In fact, that’s how Anthologize dev team member Patrick Rashleigh built the Anthologize epub generator!)

In the header for this section, I say TEI(ish) rather than TEI. That’s because the Anthologize middleman TEI layer is not the kind of TEI that your local text-encoder might expect. In particular, the content of your WordPress blog posts, which is already marked up with HTML, is untouched in the export process. A true TEI markup of your text would mean lots of manual encoding, so we just pass it along as-is. This untouched HTML post content, however, is embedded in a larger TEI framework for holding the metadata and generally explaining the structure of the project document. It’s not necessarily the kind of document you could use to build a richly marked-up text visualization, but it works well for the purposes of simple presentation.

Step 2: TEI to your format

Once the format generator (remember our friend templates/pdf/base.php?) gets the TEI document back from the TeiDom workhorse – which is essentially shared by all Anthologize export formats – the format-specific work can begin. PDF uses its own custom TEI-to-PDF class, along with an included PDF generation library, to parse the TeiDom object and turn it into something that, when delivered to your browser, is understood as a PDF. This, of course, is the hard part of building a translator, and is very format-specific.

The cool part about this part of the process, though, is the amount of flexibility that is emerging from the Anthologize architecture. Different format translators can deal with data in different ways; to wit:

  • The built-in PDF generator uses XPaths and some basic PHP loopage to format the final document. It’s also got some custom helper methods (eg get_book_title()) that it uses to make the the rendering code a bit easier to use.
  • The built-in ePub generator uses XSL transformations to move from the TEI document to the HTML-esque ePub output.
  • Patrick MJ has been working on a set of theming functions that will allow plugin authors to construct a loop very similar to the WordPress post Loop for the display of their data.
  • Because of the nature of PHP applications, export formats could always bypass Anthologize’s TEI and other API options and head directly for the WP database, using some embedded WP_Query/have_posts() loops.

Once your parser has turned Anthologize TEI data into the format necessary for your chosen format, you’ll need to deliver it to the browser by sending the write headers (here’s how ePub does it as an example).

In my next blog post, I’ll use an example to show how a plugin can register itself with Anthologize to take advantage of all these goodies.

1 thought on “Extending Anthologize: Part 1

  1. Pingback: Teleogistic / Extending Anthologize: Part 2

Leave a Reply

Your email address will not be published. Required fields are marked *