Handling LaTeX in WordPress and React.js

I’m in the midst of building a WordPress plugin that interfaces with WeBWorK, a web application for math and science homework. The plugin features a JavaScript app powered by React and Redux, which lives inside of a WordPress page template and communicates with WP via custom API endpoints. The project is for City Tech OpenLab and it’s pretty cool. I’ll write up some more general details when it’s closer to release.

The tool is designed for use in undergraduate math courses, so LaTeX-formatted content features heavily. LaTeX is beautiful when used on its own, but it’s hard to handle in the context of web applications. A few lessons and strategies are described below.

Delimiters

I’m dealing with “mixed” content: plain text that is interspersed with LaTeX. Some of this content is coming directly from WeBWorK, which formats LaTeX to be rendered with MathJax. WeBWorK is sending markup that, when simplified, looks like this:

This is plain text.
Here is an inline equation: <script type="math/tex">E = mc^2</script>
And here is the same equation as a standalone block:
<script type="math/tex; mode=display">E = mc^2</script>


It’s also possible for users to enter LaTeX from the front-end of the WP application. In this case, we’ve settled on the verbose delimiters \begin{math} and \end{math}, but we also support a shorthand delimiter borrowed from Jetpack’s LaTeX renderer:

This is plain text.
Here is an inline equation: \begin{math}E = mc^2\end{math}
And here is the same equation as a standalone block:
$latex E = mc^2$


Some of these delimiter formats are more fragile than others, for various reasons – script tags that cannot be escaped, false positives due to use of literal dollar signs, etc – so I standardized on some invented delimiters for data storage. Before saving any LaTeX content to the database, I’m converting all delimiters to the following formats (of my own invention, which explains their Great Beauty):

This is plain text.
Here is an inline equation: {{{LATEX_DELIM_INLINE_OPEN}}}E = mc^2{{{LATEX_DELIM_INLINE_CLOSE}}}
And here is the same equation as a standalone block:
{{{LATEX_DELIM_DISPLAY_OPEN}}}E = mc^2{{{LATEX_DELIM_DISPLAY_CLOSE}}}


This way, the delimiters are unique and easy to process on display.

Slashes

I’m using WordPress custom post types to store data. The lowest-level function available for saving post data is wp_insert_post(). Thanks to a glorious accident of history, this function expects data to be “slashed”, as in addslashes() and magic quotes. If you’re looking for a fun way to spend a few hours, read through this WordPress ticket and figure out a solution, please.

LaTeX uses \ as its escape character, which makes LaTeX-formatted content look “slashed” to WordPress. As such, attempting to save chunks of LaTeX using wp_insert_post() will result in slashes being removed and formatting being lost. So I had two choices: don’t use wp_insert_post(), or don’t use slashes.

I chose the latter. Before saving any content to the database, I identify text that appears between LaTeX delimiters, and I replace all slashes with a custom character. In all its glory:

public function swap_latex_escape_characters( $text ) {$regex = ';(\{\{\{LATEX_DELIM_((?:DISPLAY)|(?:INLINE))_OPEN\}\}\})(.*?)(\{\{\{LATEX_DELIM_\2_CLOSE\}\}\});s';
$text = preg_replace_callback($regex, function( $matches ) {$tex = str_replace( '\\', '{{{LATEX_ESCAPE_CHARACTER}}}', $matches[3] ); return$matches[1] . $tex .$matches[4];
}, $text ); return$text;
}


So a chunk of LaTeX like E = \frac{mc^2}{\sqrt{1-\frac{v^2}{c^2}}} goes into the database as

E = {{{LATEX_ESCAPE_CHARACTER}}}frac{mc^2}{{{{LATEX_ESCAPE_CHARACTER}}}sqrt{1-{{{LATEX_ESCAPE_CHARACTER}}}frac{v^2}{c^2}}}


They get converted back to slashes before being served via the API endpoints.

Rendering in a React component

I decided to stick with MathJax for rendering the LaTeX. MathJax has some preprocessors that allow you to configure custom delimiters, but I had a hard time making them work inside of my larger JavaScript framework. So I decided to skip the preprocessors and generate the <script type="math/tex"> tags myself.

React escapes output pretty aggressively. This is generally a good thing. But it makes it hard to print script tags and unescaped LaTeX characters. To make it work, I needed to live dangerously. But I didn’t want to print any more raw content than necessary. So I have two components for rendering text that may contain LaTeX. Click the links for the full source code; here’s a summary of what’s going on:

1. FormattedProblem performs a regular expression to find pieces of text that are set off by my custom delimiters. Text within delimiters gets passed to a LaTeX component. Text between LaTeX chunks becomes a span.
2. LaTeX puts content to be formatted as LaTeX into a script tag, as expected by MathJax. Once the component is rendered by React, I then tell MathJax to queue it for (re)processing. See how updateTex() is called both in componentDidMount() and componentDidUpdate(). Getting the MathJax queue logic correct here was hard: I spent a lot of time dealing with unrendered or double-rendered TeX, as well as slow performance when a page contains dozens of LaTeX chunks.

A reminder that React’s dangerouslySetInnerHTML is dangerous. I’m running LaTeX content through WP’s esc_html() before delivering it to the endpoint. And because I control both the endpoint and the client, I can trust that the proper escaping is happening somewhere in the chain.

I use a very slightly modified technique to provide a live preview of formatted LaTeX. Here it is in action:

Pretty cool TeXnology, huh? ROFLMAO

WordCamp Chicago 2016 slides

I just finished giving a talk at WordCamp Chicago titled “Backward Compatibility as a Design Principle”, in which I discussed WordPress’s approach to backward compatibility, how it’s evolved over the years, and its costs and benefits when compared to the alternatives. I’m not sure that the slides are very helpful in isolation, but someone asked me to post them, and I am not one to disappoint my Adoring Fans. Embedded below.

BuddyPress Docs 1.9.0 and Folder support

BuddyPress Docs is one of my more popular WordPress plugins. For years, one of the most popular feature requests has been the ability to sort Docs into folders. Docs 1.9.0, released earlier this week, finally introduced folder functionality.

The feature is pretty cool. When editing a Doc within the context of a group, you can select an existing folder, or create a new one, in which the Doc should appear. Folders can be nested arbitrarily. Breadcrumbs at the top of each Doc and each directory help to orient the reader. And a powerful, AJAX-powered directory interface makes it easy to drill down through the folder hierarchy. (Folders are currently limited to groups, which simplifies the question of where a given folder “lives”. An experimental plugin allows individual users to use folders to organize their personal Docs.)

I’ve got a couple reasons for drawing attention to this release. First, the Folders feature was developed as part of contract work I did for University of Florida Health. They use WordPress and BuddyPress for some of their internal workspaces, and the improvements to BuddyPress Docs have helped them to build a platform customized to their users’ specific needs. My partnership with UF Health is a great instance of a client commissioning a feature that then gets rolled into a publicly available tool – the type of patronage that demonstrates the best parts of free software development as well as IT in the public sector.

A bonus side note: UF Health Web Services is currently hiring a full-time web developer. If you know PHP, and want a chance to work with cool people on cool projects – including WordPress and BuddyPress – check out the job listing.

The other fun thing about this release is that it’s the first major release of Docs where I’ve worked closely with David Cavins, master luthier and BuddyPress maven. He’s a longtime contributor to Docs, and has done huge amounts of excellent work to bring 1.9.0 to fruition. Many thanks to David for his work on the release!

WordCamp NYC talk on the history of the WordPress taxonomy component

At WordCamp NYC 2015, I was pleased to present on the history of the WordPress taxonomy component. Of all the WordCamp talks I’ve given, this one was the most fun to prepare. I spent days reading through old Trac tickets, the wp-hackers archives, and interview transcripts. The jokes are mostly mediocre and the Photoshopping is (mostly intentionally) lousy, but I think the talk turned out OK. Check it out below, or on wordpress.tv.

How to cherry-pick comments using Subversion (for WordPress at least)

I generally use git-svn for my work on WordPress and related projects. On occasion, I’m forced to touch svn directly. This occurs most often when merging commits from trunk to a stable branch: it’s best to do this in a way that preserves svn:mergeinfo, and git-svn doesn’t do it properly. Nearly every time I have to do these merges, I have to relearn the process. Here’s a reminder for Future Me.

1. Commit as you normally do to trunk. Make note of the revision number. (Say, 36040.)
2. Create a new svn checkout of the branch, or svn up your existing one. Eg: $svn co https://develop.svn.wordpress.org/branches/4.4 /path/to/wp44/ 3. From within the branch checkout: $ svn merge -c 36040 ^/trunk  In other words: merge the specific commit (or comma-separated commits) from trunk of the same repo.
4. This will leave you with a dirty index (or whatever svn calls it). Use svn status to verify. Run unit tests, etc.

One of the suckers

At the beginning of the 4.2 dev cycle, I was made a permanent committer for WordPress. (Cool!) Here’s how Andrew Nacin summed up the promotion:

Suckers

Any maintainer of a large free software project will recognize the aptness of the “suckers” comment. I’ve spent more than five years as a leader on the BuddyPress project, and during that time I’ve developed a devotion to that project that sometimes feels like more like servitude than volunteerism. I feel a personal responsibility to the users of BuddyPress, which manifests itself as a pang of guilt about every bug, every missing feature, every unsatisfied customer. This guilt is part (not all, but not an insignificant portion) of what keeps me committed to the project. The same guilt sometimes makes me feel like a sucker.

With BuddyPress, the “sucker” aspects have been consistently counterbalanced by (a) the belief that my work is having a broad positive impact, and (b) the positive effect that contribution has on my reputation and my marketability. Point (b) is one that I’ve blathered on about on multiple occasions. My ability to make money as a consultant is directly tied to the reputation that I’ve earned doing free software work.

Since I started investing lots of energy into WordPress itself about nine months ago, I’ve been forced to reassess the “reputation cycle” somewhat. WordPress powers tens of millions of sites, as compared to BuddyPress’s tens of thousands. By this metric, the free software work I’ve done since September has had a far wider impact than any work before then.

You’d think that this increased impact would translate directly into a corresponding increase in reputation. Yet it hasn’t, at least not if you measure reputation by job offers. This time last year, I was turning away probably three or four freelance gigs per week, and probably one or two full-time jobs per month. I don’t get a fifth of that amount today.

This is not to complain – I am doing just fine, thankyouverymuch. But it’s worth thinking about the possible reasons why increased impact and productivity don’t always translate to more work. I think there are two big ones.

First: The further down you get in the technology stack, the more it feels like plumbing. I’ve built a number of major user-facing features for BuddyPress over the years, the kind of things someone might point to and say, “Boone built that”. My work on WordPress has been a lot less visible, and thus a lot harder to brag about. This is the truth in Nacin’s “suckers” comment. No one praises the plumber when the toilet flushes properly, but boy do people complain when it does not.

Second: I joined the WP core team around the time I stopped using Twitter. The time I used to waste looking at Twitter is time I now sink into doing meaningful work on projects like WordPress, which is to say that I’m more productive now than I used to be. But I’m talking less, and talking less about myself. If the Boone brand is in decline, it’s a publicity problem, not a quality problem. This is another side of the “sucker” coin: the greater the percentage of your time you devote to being productive, the less time you have to talk about how productive you are.

The number one reason to start using git: interactive staging

Git is an excellent tool for collaborative software development in a number of ways. In my opinion, Git’s most valuable feature – which also happens to be one of its most underappreciated – is interactive staging: git add -i and git add -p. If you are a software development who does not use Git, or if you use Git but do not use this feature, you should not write another line of code until you learn it.

When working on a large project, there is no principle more important than that changesets are atomic. Combining more than one distinct change into a commit is highly detrimental to a project’s history. Muddled changesets are hell for future developers on the project, including Future You. The usefulness of log, blame, bisect, and (not to get hyperbolic or anything) version control itself, depends on commits being logically small units.

But, of course, software is not built linearly. You’re writing cool new feature X, but you notice that in order to complete X, you need to fix bug Y. The problem is that you’re already halfway finished with X. You could: generate a diff file, stash/reset your changes, fix Y, commit the fix to Y, reapply the X diff, and continue work on X. This is a pain. Or you could: commit a large changeset that contains X and Y. But this is lousy for the reasons described above. Ideally, you would be able to fix Y, continue with X, and then sort out the changesets just before commit.

This is what Git’s interactive staging lets you do. Start with a clean index. Begin hacking on X. Stumble on Y. Fix Y. Complete X. Then use Git’s stage to commit Y and X separately.

Here’s a real example. (What follows is a pretty basic situation. git add -p is a subset of git add -i, which is full of whiz-bang goodies.) Say I’m working on fixing some poor localization in BuddyPress. While doing so, I notice that some PHPDoc is missing. So I fix both issues, and git diff shows the following:

Now, I know that Changesets Should Be Atomic, so I want to commit the documentation changes separately from the bug fix. So, instead of git commit -a or git add . – which blindly stage all changes – I jump into patch mode with git add -p:

Git has determined what it considers to be the first “hunk” of changes. (Hunks: it takes one to know one.) I’m then asked whether I want to “Stage this hunk”. Normally I’ll just type y or n, but in this case I see that Git hasn’t made the hunks small enough, so I choose s, for “split”:

The hunk is now split into two. The first sub-hunk is the documentation fix. Let’s stage it with y. The second sub-hunk is the code fix, which we’ll skip for now with n. Rinse and repeat with the second big hunk. git status now shows the following:

The “changes ready to be committed” are the documentation fixes. These can be viewed with git diff --cached. The “changes not staged for commit” are the code changes. These can be viewed with git diff. I’m now ready to commit the first set of changes:

(I typed that commit message in my \$EDITORvim – which you can’t see in this screenshot.) Then I’ll repeat the routine for the next set of changes, this time staging them all:

Git’s interactive staging is relatively simple, but will completely change the way you work, in ways that will result in meaningful improvements to your projects over time. This is, IMHO, the #1 reason to use Git, and if you’re not using it – because you’re an SVN user, because you didn’t know about it, because your GUI client doesn’t support it – it’s important enough that you ought to think about changing your toolkit.