Tag Archives: comments

Safely delete spam comments across a large WP network

I’m currently working on a university WordPress network that’s been running for four or five years (an MU veteran!) and has almost 5000 blogs, most of which are defunct (because they’re from previous semesters). Akismet is activated across the network, so there’s not much of a public spam problem. However, even spam comments are stored in the database, and some of the blogs have tens of thousands of spam comments sitting in their tables. I’m going to implement a couple of tricks to keep this from happening in the future (a lightweight honeypot for non-logged-in users, and telling Akismet to auto-delete spam comments on old posts). But for now, I’ve got to clean up this mess, because the very large comments and commentmeta tables are causing resource issues.

I wrote a simple script that gradually cycles through all the blogs on the network and deletes comments that have been marked as spam by Akismet. Here it is, with some comments afterward:

Notes:

  • The number of blogs is hardcoded (4980)
  • The ‘qw_delete_in_progress’ key is a throttle, ensuring that only one of these routines is running at a time. You might call this the poor man’s poor man’s cron.
  • I’ve limited it to 10 comments per pageload, but you could change that if you wanted
  • Put it in an mu-plugins file. When it’s finished running (check the ‘qw_delete_next_blog’ flag in the wp_sitemeta table – it’s done if it’s greater than the total number of blogs on the system), be sure to remove it, or at least comment out the register_shutdown_function line.
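
The script itself did not survive in this copy of the post. What follows is only a minimal sketch of the routine the notes describe, not the original code. To keep it runnable outside WordPress, the network is simulated with plain arrays: $state stands in for the two wp_sitemeta keys, and $spam_by_blog stands in for each blog's comments table. The function name qw_delete_spam_batch, the constants, and the array shapes are assumptions. In a real mu-plugin the reads and writes would presumably go through get_site_option() / update_site_option(), switch_to_blog(), and wp_delete_comment(), with the function registered via register_shutdown_function as the notes mention.

```php
<?php
// Sketch only: arrays stand in for wp_sitemeta and the per-blog comment tables.
const QW_TOTAL_BLOGS = 3;   // hardcoded, like the 4980 in the notes
const QW_BATCH_SIZE  = 10;  // comments deleted per pageload

/**
 * Run one throttled batch. Returns the number of spam comments deleted.
 *
 * $state keys mirror the sitemeta keys named in the notes:
 *   qw_delete_in_progress - lock so only one routine runs at a time
 *   qw_delete_next_blog   - ID of the blog to clean next
 */
function qw_delete_spam_batch( array &$state, array &$spam_by_blog ): int {
	if ( ! empty( $state['qw_delete_in_progress'] ) ) {
		return 0; // another pageload is already working
	}
	$blog_id = $state['qw_delete_next_blog'] ?? 1;
	if ( $blog_id > QW_TOTAL_BLOGS ) {
		return 0; // finished: time to remove the plugin
	}

	$state['qw_delete_in_progress'] = true;

	// In WordPress: switch_to_blog( $blog_id ), then select up to QW_BATCH_SIZE
	// comment IDs where comment_approved = 'spam'.
	$spam  = $spam_by_blog[ $blog_id ] ?? array();
	$batch = array_slice( $spam, 0, QW_BATCH_SIZE );

	// In WordPress: wp_delete_comment( $id, true ) for each ID in $batch,
	// where true skips the trash. Here we just drop them from the array.
	$spam_by_blog[ $blog_id ] = array_slice( $spam, count( $batch ) );

	// Move on to the next blog only once this one has no spam left.
	if ( count( $spam_by_blog[ $blog_id ] ) === 0 ) {
		$state['qw_delete_next_blog'] = $blog_id + 1;
	} else {
		$state['qw_delete_next_blog'] = $blog_id;
	}

	$state['qw_delete_in_progress'] = false;
	return count( $batch );
}

// Simulated network: blog 1 has 25 spam comments, blog 2 none, blog 3 has 4.
$state        = array();
$spam_by_blog = array( 1 => range( 1, 25 ), 2 => array(), 3 => range( 1, 4 ) );

$pageloads = 0;
while ( ( $state['qw_delete_next_blog'] ?? 1 ) <= QW_TOTAL_BLOGS ) {
	qw_delete_spam_batch( $state, $spam_by_blog );
	$pageloads++;
}
echo "Done after {$pageloads} pageloads\n";
```

Each call does at most one small batch, so a busy blog takes several pageloads before the cursor advances. That is the point of the throttle: the cleanup is spread thinly across normal traffic instead of hammering the database in one go.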

Use at your own risk – I’m posting here primarily for my own records :)

New BuddyPress plugin: BP Include Non-Member Comments

I wrote a plugin this afternoon that solves a small but potentially annoying limitation of BuddyPress: its inability to show comments from non-members in the sitewide activity stream. In a streak of extreme creativity, I dubbed the plugin “BP Include Non-Member Comments”. Read more about it, and download it for your own use, here.

BP Include Non-Member Comments

By default, BuddyPress does not include comments from non-members (or non-logged-in users more generally) in the sitewide activity stream. This plugin records activity items for those comments.

The plugin has been tested on version 1.1.3 of BP, as well as the 1.2 release candidate. If you want to use the plugin for 1.1.3 or lower, you will need to uncomment the first few add_action and add_filter lines in the plugin file.

Technical caveat: Non-logged-in commenters have BP user_id 0. When BP creates the activity stream, it decides whether or not to show the Delete button by checking to see whether the user_id for the currently logged in user is the same as the user_id of the person to whom the comment belongs. Presumably, though, you don’t want non-logged-in viewers of the activity stream to be able to delete items from the activity stream at all. BP’s core code is not currently set up to make it easy to remove these buttons, so I employed an ugly fix. If you have changed your theme significantly from the default, you might have to adjust the filter bp_nonmember_comment_content (near the end of the plugin) to remove the button properly.
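
To see why the button shows up, here is a minimal sketch of the ownership check described above. The function names are hypothetical and this is not BuddyPress's actual code; it only illustrates how comparing two user IDs goes wrong when both are 0.

```php
<?php
// Hypothetical stand-in for the check described above, not BuddyPress code.
function naive_can_delete( int $viewer_id, int $item_user_id ): bool {
	return $viewer_id === $item_user_id;
}

// Same check, but a logged-out viewer (ID 0) never owns anything.
function guarded_can_delete( int $viewer_id, int $item_user_id ): bool {
	return $viewer_id !== 0 && $viewer_id === $item_user_id;
}

// A logged-out visitor (0) viewing a non-member comment (also 0):
var_dump( naive_can_delete( 0, 0 ) );   // bool(true)
var_dump( guarded_can_delete( 0, 0 ) ); // bool(false)
```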

Download the plugin here.

BP Include Non-Member Comments has been downloaded 1,446 times. Are you using this plugin? Consider a donation.

Version history

1.2.1 – April 15, 2010
  • Added checks for spam status
  • Fixed bug that made approved comments from site members appear twice (Thanks, Andrius!)

1.2 – April 6, 2010
  • Adapted to BuddyPress’s new comment activity recording method
  • Comment approval now posts to the activity stream as well

1.1 – February 22, 2010
  • Normalized file structure to latest BP standards (bp_init)
  • Fixed problem with deprecated bp_post_get_permalink

1.0 – February 7, 2010
  • Initial release

True cross-platform comment syncing with Disqus and WordPress

FeedWordPress works well if you want to syndicate content from various sources into a single WordPress blog. Syndicating comments is, of course, more difficult. I’m finishing up a job for a client who wanted real-time synced comments, and suggested that Disqus might do the trick. I quickly discovered that Disqus is clearly not made to do what I wanted it to do. But, being the cool guy that I am, I hacked something together that is more or less functional.

Here were the requirements: Comments on a blog post needed to be synchronized between the source blogs and the hub blog. Readers had to be able to comment in both places and have the comments sync. While I’d be using WordPress to create the hub blog, the source blogs would be hosted on various platforms: Tumblr, Typepad, Blogger, self-hosted WordPress. (The distributed requirement is especially important. If the blogs were all on the same installation of WPMU, the job would be trivial and would not require a third-party solution like Disqus.) Because bloggers would be coming from different platforms, I not only had to be able to accommodate those platforms, but I also had to make sure that the system would work with the platforms’ stock configuration. That is, since I (and, generally speaking, the bloggers) don’t have access to the platform code, all custom modifications need to happen at the hub blog.

I don’t particularly recommend that anyone try to replicate what I’ve done here. But hopefully it will point the way toward what might be a viable third-party system for true comment syncing.

The details

Here’s my strategy with regard to Disqus. If all the source blogs were registered to the same Disqus Comments account (i.e. corresponding to a single shortname), then they’d all have the same forum_key, which is to say they’d be accessible by the same API request. Thus the strategy is to make Disqus unable to distinguish between API calls from the source blogs (which are, recall, making stock API calls to Disqus) and API calls from the corresponding posts on the hub blog.

I installed the Disqus Comment System plugin for the WordPress hub blog and registered with the same credentials that would be given to the source blogs. When feeds started syndicating to the hub blog, however, I found that the comment sections on the source posts weren’t matching the comment sections on the hub posts. The URL for each comment thread’s RSS feed showed me why: Disqus indexes a forum’s comment thread based on some post information that it gets from the client platform, and each platform was formatting the information in a different way.

First problem: The WordPress Disqus plugin uses a post variable called $thread_meta, which is set in disqus-comment-system/lib/api.php thus:

$thread_meta = $post->ID . ' ' . $post->guid;

Disqus would then create a comment thread based on this string. The problem is that $post->ID is the post ID number for the hub blog, and has nothing to do with the source blog (which, depending on platform, does not include post ids in its API request at all). So the source blog’s thread would be identified as test_post (for example) while the hub blog would be 34_test_post. I replaced the code above with

$thread_meta = $post->guid;

which manages to stay pretty consistent across platforms. (NB: The same change has to be made on the source blog version of the Disqus plugin, if the source blog is running a self-hosted installation of WordPress.)

Second problem: Getting a stable and unique identifier for each post thread is only the first step. You also need to make sure that the identifier is concatenated correctly when the actual API request is made. Disqus comment sections work by loading a piece of JavaScript from disqus.com, with the thread identifier concatenated into the request URL; the script then finds the comment section on the post page and replaces the native comment code with the code returned from disqus.com. But I found (again, by looking at the URL for the RSS feeds) that each platform was making the request a little bit differently. At the end of disqus-comment-system/comments.php, the stock WP plugin reads

<script type="text/javascript" charset="utf-8" src="http://<?php echo strtolower(get_option('disqus_forum_url')); ?>.<?php echo DISQUS_DOMAIN; ?>/disqus.js?v=2.0&amp;slug=<?php echo $dsq_response['thread_slug']; ?>&amp;pver=<?php echo $dsq_version; ?>"></script>

Through a fair degree of trial and error, I replaced it with a big block of code that figures out (via some metadata created by FeedWordPress) which platform that particular post came from, and then modifies the JavaScript accordingly:

<?php
// FeedWordPress stores the source permalink and the source blog name as post meta.
$ok   = get_post_meta( $post->ID, 'syndication_permalink' );
$name = get_post_meta( $post->ID, 'syndication_source' );
$name = str_replace( ' ', '_', $name[0] );

// Start from the slug Disqus returned and strip the dash entity residue ("8211_").
$theslug = str_replace( '8211_', '', $dsq_response['thread_slug'] );

// Strip the "_NN" suffix Disqus appends to a second thread with the same title.
// Working on $theslug (not the raw slug) keeps the dash fix from being overwritten.
if ( preg_match( '/_[0-9]{2}$/', $theslug ) ) {
	$theslug = substr( $theslug, 0, -3 );
}
?>
<?php if ( strpos( $ok[0], 'typepad' ) ) : ?>
	<script type="text/javascript" charset="utf-8" src="http://<?php echo strtolower(get_option('disqus_forum_url')); ?>.<?php echo DISQUS_DOMAIN; ?>/disqus.js?v=2.0&amp;slug=<?php echo $dsq_response['thread_slug'],'_',strtolower($name); ?>&amp;pname=wordpress&amp;pver=<?php echo $dsq_version; ?>"></script>
<?php elseif ( strpos( $ok[0], 'tumblr' ) ) : ?>
	<script type="text/javascript" charset="utf-8" src="http://<?php echo strtolower(get_option('disqus_forum_url')); ?>.<?php echo DISQUS_DOMAIN; ?>/disqus.js?v=2.0&amp;slug=<?php echo strtolower($name),'_',$dsq_response['thread_slug']; ?>&amp;pname=wordpress&amp;pver=<?php echo $dsq_version; ?>"></script>
<?php else : ?>
	<script type="text/javascript" charset="utf-8" src="http://<?php echo strtolower(get_option('disqus_forum_url')); ?>.<?php echo DISQUS_DOMAIN; ?>/disqus.js?v=2.0&amp;slug=<?php echo $theslug; ?>&amp;pver=<?php echo $dsq_version; ?>"></script>
<?php endif; ?>

In the first few lines, I do a bit of string manipulation to standardize the post title (i.e. the unique post identifier from Disqus’s point of view). Then I do some very ugly stuff. WordPress was converting the dash (which was all over the client’s blog) into the numeric HTML entity &#8211;, which left a stray 8211_ in the post identifier, so I just str_replaced it out. (Strictly speaking, 8211 is the en dash; the em dash is 8212.) The next part (with the preg_match) is a bit tricky: in some cases, when Disqus receives requests from two blog posts with the same title (as is the case with the source blog post and the hub blog post), it differentiates between the two by appending an apparently random two-digit number to the second request it gets. Since the syndicated Disqus request will generally be sent after the source blog’s Disqus request (by virtue of its being syndicated), and will therefore be the one that gets the two-digit number, I figured I could just look for ‘_xx’ (where xx is a two-digit number) at the end of the post title and strip it off. Ugly ugly ugly, but it works. The rest of the code just rearranges the JavaScript according to which platform the original post comes from, which in some cases requires adding the source blog’s name.
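
The slug handling described above can be pulled into one small function, which makes it easier to test outside the template. This is a sketch rather than the code that ran on the client's site: hub_disqus_slug and its parameters are names made up for illustration, and the platform rules are simply the ones from the template above (blog name appended for TypePad, prepended for Tumblr, cleaned slug otherwise). The example URLs are placeholders.

```php
<?php
/**
 * Build the slug sent to Disqus for a syndicated post.
 *
 * $thread_slug - slug returned by Disqus for the hub post
 * $permalink   - permalink of the source post (used to detect the platform)
 * $source_name - name of the source blog
 */
function hub_disqus_slug( string $thread_slug, string $permalink, string $source_name ): string {
	$name = strtolower( str_replace( ' ', '_', $source_name ) );

	// As in the template, TypePad and Tumblr use the raw slug plus the blog name.
	if ( strpos( $permalink, 'typepad' ) !== false ) {
		return $thread_slug . '_' . $name;
	}
	if ( strpos( $permalink, 'tumblr' ) !== false ) {
		return $name . '_' . $thread_slug;
	}

	// Everything else: drop the dash residue, then any trailing "_NN".
	$slug = str_replace( '8211_', '', $thread_slug );
	return preg_replace( '/_[0-9]{2}$/', '', $slug );
}

echo hub_disqus_slug( 'test_post_42', 'http://example.com/test-post/', 'My Blog' ), "\n";
echo hub_disqus_slug( 'test_post', 'http://myblog.tumblr.com/post/1', 'My Blog' ), "\n";
echo hub_disqus_slug( 'test_post', 'http://myblog.typepad.com/test-post.html', 'My Blog' ), "\n";
```

One caveat the regex shares with the original: a title that genuinely ends in two digits (a slug like top_10, say) gets trimmed too, so this only holds up while post titles avoid that pattern.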

With all this in place, I’ve got the following: A blogger posts on, say, his Tumblr blog, where Disqus is enabled. The post is fetched by FeedWordPress on the hub blog, where Disqus is also enabled, with the same user credentials. Then the hacks listed above trick Disqus into thinking that the syndicated post is the very same as the source post, so that the very same comments section is sent to each post. Kind of like magic, when it actually works.

Clearly, though, it would be much, much easier with a system that is built to do what I’m trying to do. That means, in part, having a single system for identifying posts across platforms (some appropriate htmlization of the post name, I presume) and then a single, unified system for making API requests.
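
As a rough illustration of what a single identification rule might look like, here is a sketch that derives an identifier from the post title alone. The function name canonical_thread_id and the normalization rule are my own assumptions, not anything Disqus or the platforms actually do, and the rule is ASCII-only for simplicity.

```php
<?php
// Hypothetical: one identifier rule that every platform could apply to a post title.
function canonical_thread_id( string $title ): string {
	// Decode entities such as &#8211; first so they do not leak digits into the ID.
	$text = html_entity_decode( $title, ENT_QUOTES | ENT_HTML5, 'UTF-8' );
	$text = strtolower( $text );
	// Collapse every run of bytes other than a-z and 0-9 into a single underscore.
	$text = preg_replace( '/[^a-z0-9]+/', '_', $text );
	return trim( $text, '_' );
}

// The same title, with and without WordPress's dash entity, yields the same ID.
echo canonical_thread_id( 'Test Post &#8211; Part 2' ), "\n";
echo canonical_thread_id( 'Test Post - Part 2' ), "\n";
```

Because the identifier depends only on the title, two posts with the same title would still collide, so a real scheme would need something extra (the source URL, for instance) to tell them apart.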

Removing previous comment edits from BuddyPress activity – a plugin

Another BuddyPress plugin for you. This one makes sure that you don’t get multiple versions of the same comment in your activity streams when a comment is edited. Sounds like a small thing, but it was kind of a bear to program. Anyway, check it out at the CUNY Academic Commons Dev Blog.