The Summer of Jeff

Roll-your-own blogging software

Posted in programming by Jeff on July 28, 2010

A few years ago, I moved my GMAT Hacks website off of WordPress.  I wrote the code for a basic blogging platform using Python, and since then, I’ve built it out a little more.  A content management system (CMS) does not have to be complicated.  And as Blogger, WordPress and others have shown, the platform is generic; I’ve used almost exactly the same code to drive GMAT Hacks, GRE HQ, and the College Splits blog.

I’m not going to share any code, but I will walk through the process.  It’s very intuitive in Python, and I’m sure it’s similarly straightforward in many other languages.

The various blogging platforms offer much of what you’ll ever need, and they are generally easy to use and modify.  That’s why this very post is on a WordPress blog.  But especially in the case of my GMAT site, I needed more flexibility to automatically update special types of pages and create customized sidebars and footers.

The basics

A do-it-yourself CMS can consist of as few as three files:

  1. A database of some sort that, for each post, stores title, body text, date, and other information, possibly including category, tag(s), and anything else you can dream up.  I think this is simple enough not to require further explanation.
  2. A simple script to add items to the database and edit items already in the database.
  3. A script that uses the site template to generate pages for each post using the database.
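For illustration only (this is not the code behind the sites mentioned above), the database in item 1 could be as small as a single SQLite table.  The table name, column names, and helper functions below are all assumptions made for this sketch.

```python
import sqlite3


def create_posts_table(conn):
    """Create a hypothetical 'posts' table if it does not already exist."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS posts (
            post_id   TEXT PRIMARY KEY,  -- e.g. a date-based ID
            title     TEXT NOT NULL,
            body      TEXT NOT NULL,     -- HTML body of the post
            post_date TEXT NOT NULL,     -- ISO date, YYYY-MM-DD
            category  TEXT NOT NULL,
            tags      TEXT DEFAULT ''    -- comma-separated, optional
        )
        """
    )
    conn.commit()


def add_post(conn, post_id, title, body, post_date, category, tags=""):
    """Insert one post; parameters are bound, never pasted into the SQL."""
    conn.execute(
        "INSERT INTO posts (post_id, title, body, post_date, category, tags) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (post_id, title, body, post_date, category, tags),
    )
    conn.commit()


def posts_by_date(conn):
    """Return all posts as tuples, oldest first."""
    return conn.execute(
        "SELECT post_id, title, body, post_date, category, tags "
        "FROM posts ORDER BY post_date, post_id"
    ).fetchall()
```

A flat file (CSV, JSON, even a pickled list) would do the same job; SQLite is used here only because it ships with Python.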

Let’s look at the last two in a little more detail.

Add and edit items

This is also pretty simple.  The one aspect worth mentioning is that it’s important to validate everything going in.  If you’re ambitious, you may even try to validate the HTML in the posts themselves.  I limit myself to checking that a new post’s category already exists and that the post’s date is valid.  (On some of my sites, I use YYYYMMDDX as a post ID, where X is an index to differentiate multiple posts from the same day.)
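Here is a minimal sketch of those two checks.  It assumes that X in the YYYYMMDDX post ID is a single digit, which the description above does not actually specify, and the function name and arguments are made up for the example.

```python
from datetime import datetime


def validate_post(post_id, category, known_categories):
    """Return a list of problems; an empty list means the post passed."""
    problems = []

    # Assumption: the ID is exactly 8 date digits plus a 1-digit index.
    if len(post_id) != 9 or not (post_id.isascii() and post_id.isdigit()):
        problems.append("post ID must be 9 digits (YYYYMMDDX)")
    else:
        try:
            # strptime rejects impossible dates such as February 30.
            datetime.strptime(post_id[:8], "%Y%m%d")
        except ValueError:
            problems.append("post ID does not start with a valid date")

    # The category must already exist; new ones are never created silently.
    if category not in known_categories:
        problems.append(f"unknown category: {category!r}")

    return problems
```

Returning a list of problems rather than raising on the first one makes it possible to report everything wrong with a post in one go.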

Generate the site

Depending on how thorough you want to be, this script can get fairly complex.  (Mine is currently a bit longer than 400 lines of code.)

At its most basic, it’s just a matter of creating a page for each post and uploading each one.  Here are a few more things it can do:

  1. Uploading some pages to multiple locations.  For example, you might want your most recent post to be the front page on your site.  So the page “category/recent-post.html” might also be uploaded as “index.html.”
  2. Creating tables of contents.  On my GMAT site, I have a chronological TOC, a site-wide TOC with posts sorted by category, and an individual TOC page for each category.  I also have a “recent posts” page with a chronological list of the last 10 posts.  The script creates each one every time I update the site.
  3. Creating an XML feed.  You might include the last 5 or 10 posts, and you have the flexibility to include all, some, or none of the body of the post.
  4. Updating pages outside of the blog hierarchy.  The first page of my GMAT site does not contain a blog post, but the script creates it, so that it always links to the most recent post.
  5. Varying sidebar and footer content.  My footers are generally predictable–they link to the previous post, as well as a category-specific table of contents.  But I also include an ad for one of my books.  (For some posts, I randomly rotate the ads with each site update.)  With full control over the script, I can put an ad for my math book on math-related pages and my verbal book on verbal-related pages.  I also have a few different sidebars for different purposes.  In a few cases, I even drop the footer content altogether.
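To make a few of these concrete, here is a rough sketch of the per-category TOC, the “recent posts” list, and a titles-only feed.  It is not the script described in this post.  It assumes each post is a dict with "title", "date" (an ISO string such as "2010-07-28"), "category", and "path" keys, and it picks RSS 2.0 as the feed format; all of those are choices made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def recent_posts(posts, count=10):
    """Return the newest `count` posts, newest first."""
    # ISO date strings sort correctly as plain text.
    return sorted(posts, key=lambda p: p["date"], reverse=True)[:count]


def category_toc(posts):
    """Group posts by category, newest first within each category."""
    toc = defaultdict(list)
    for post in sorted(posts, key=lambda p: p["date"], reverse=True):
        toc[post["category"]].append(post)
    return dict(toc)


def build_feed(posts, site_title, site_url, count=10):
    """Return an RSS 2.0 document (as a string) for the newest posts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = site_title
    for post in recent_posts(posts, count):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = site_url + post["path"]
        # Titles only; adding a <description> element here would
        # include some or all of the post body in the feed.
    return ET.tostring(rss, encoding="unicode")
```

The same grouping idea extends to the footers: a lookup from category to ad is enough to put the math book on math pages and the verbal book on verbal pages.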

Unlike on a typical WordPress site, every single page on all of my blogs is a flat HTML page.  This ensures that the pages are very fast to load regardless of traffic level.  It takes a little more time to generate and upload the site.  For instance, my GMAT site now consists of over 300 pages, and most of them have a ‘recent posts’ box on the sidebar, so they must be updated each time I add a new post.  But with a decent connection, that only takes a couple of minutes.

The way my script works, it sorts the database by date, then goes through the list twice.  The first time, it creates the various TOCs, the XML feed, and the list of recent posts that I use in the sidebar.  The second time, it creates the individual pages.

If you have questions about the process, feel free to post them in the comments.

