Oct 12 2012
 

This is the second in a series of articles on my migration to WordPress. In this post, I’ll talk about why I decided to go with RSS Importer and outline the first steps I took to get my posts into WordPress. These steps did not complete the migration. Rather, they were a proof of concept which led to other tasks to import yet more data into WordPress.

In these posts, I will use the terms post and article to refer to stuff I have written.

The solution I am taking is not suitable for non-technical people. There will be programming to be done and code to be written. You can, of course, hand this set of articles to someone else to do all this.

The FreeBSD Diary: the website I’m importing

The website I want to import into WordPress is The FreeBSD Diary. I’ve been writing about my work with FreeBSD (a freely-available Unix-like operating system) since 1998. It started as a log of what I was doing. The purpose was to keep a record so if a problem arose and I went to seek help, I could accurately tell others what I had done. Eventually, I got better at what I was doing and was able to refer others to my logs when they went to do similar tasks. I enjoyed the writing and got great pleasure from seeing that it was helping others.

The website had always been simple in structure, but as the number of posts grew, I needed a database to avoid updating the main page, the index, and the list of posts by topic. Not only did I have fun setting up that database and the code behind it, in the end, it reduced my workload. At the time, no such package existed to do what I wanted. However, over the years, I grew to like WordPress and was using it on a regular basis for this website and for FreshPorts News. So much so that I was no longer writing much on The FreeBSD Diary. All these factors combined to make me consider a migration to WordPress.

Proof of concept: RSS Importer

The biggest issue: how do I migrate 644 posts from PHP & HTML into WordPress and avoid massive amounts of copy/paste or editing? I looked around at the various import tools and found nothing that would do what I wanted. I started to consider writing my own code to import from my website into WordPress. Then I reconsidered RSS Importer. My website already had an RSS feed. Why not try it?

It is important to note that my RSS feed did not include the article contents. So for the purposes of this exercise, I will create fake content, manually.

If you click on Tools | Import, you will see a list of options for importing from various formats, one of which is ‘RSS – Install the RSS importer to import posts from an RSS feed’. You will need to install that plugin first. How to install plugins is outside the scope of this article.

Directory permissions

When using the RSS Importer, you will need to ensure that the directory wp-content/uploads exists and can be written to by your web server. In my case, that directory looks like this:

$ ls -ld uploads
drwxr-xr-x  4 www  www  512 Oct 10 11:58 uploads

This operating system is FreeBSD. Creating the directory and setting the appropriate permissions is outside the scope of this article.

The RSS file layout

After installing the RSS Importer plugin, I started looking at the plugin code. You’ll find it at wp-content/plugins/rss-importer/rss-importer.php. The code uses the preg_match function to parse the RSS input file. Here are the key uses of that function:

$ grep preg_match rss-importer.php
                preg_match_all('|<item>(.*?)</item>|is', $importdata, $this->posts);
                        preg_match('|<title>(.*?)</title>|is', $post, $post_title);
                        preg_match('|<pubdate>(.*?)</pubdate>|is', $post, $post_date_gmt);
                                preg_match('|<dc:date>(.*?)</dc:date>|is', $post, $post_date_gmt);
                        preg_match_all('|<category>(.*?)</category>|is', $post, $categories);
                                preg_match_all('|<dc:subject>(.*?)</dc:subject>|is', $post, $categories);
                        preg_match('|<guid.*?>(.*?)</guid>|is', $post, $guid);
                        preg_match('|<content:encoded>(.*?)</content:encoded>|is', $post, $post_content);
                                preg_match('|<description>(.*?)</description>|is', $post, $post_content);

From this list, you can identify the key tags from the RSS file which are handled by RSS Importer. Specifically:

  1. item
  2. title
  3. pubdate
  4. dc:date
  5. category
  6. dc:subject
  7. guid
  8. content:encoded
  9. description

Closer reading of the code shows that some tags are aliases for other tags and you don’t want to have both present. I settled on using just these tags:

  1. item
  2. title
  3. dc:date
  4. category
  5. content:encoded
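To make the idea of aliased tags more concrete, here is a rough Python sketch of this style of pattern-based parsing. It is not the plugin's PHP code. The function names are hypothetical, and the order of preference within each pair (pubdate before dc:date, category before dc:subject, content:encoded before description) is only my reading of the indentation in the grep output. The guid tag is left out to keep the sketch short.

```python
import re

# Counterpart of the "is" modifiers on the PHP patterns:
# case-insensitive, and "." also matches newlines.
FLAGS = re.IGNORECASE | re.DOTALL


def first_match(tags, text):
    """Return the trimmed content of the first tag in `tags` found in `text`."""
    for tag in tags:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, FLAGS)
        if match:
            return match.group(1).strip()
    return None


def all_matches(tags, text):
    """Return every value for the first tag in `tags` that occurs at all."""
    for tag in tags:
        found = re.findall(rf"<{tag}>(.*?)</{tag}>", text, FLAGS)
        if found:
            return [value.strip() for value in found]
    return []


def parse_items(feed_text):
    """Split a feed into items and pull out the fields listed above."""
    posts = []
    for item in re.findall(r"<item>(.*?)</item>", feed_text, FLAGS):
        posts.append({
            "title": first_match(["title"], item),
            # Assumed preference order; the second tag is only a fallback.
            "date": first_match(["pubdate", "dc:date"], item),
            "categories": all_matches(["category", "dc:subject"], item),
            "content": first_match(["content:encoded", "description"], item),
        })
    return posts


sample = """
<item>
  <title>Hello</title>
  <dc:date>2012-10-12 12:34:56</dc:date>
  <content:encoded>Body text.</content:encoded>
  <category>FreeBSD</category>
</item>
"""
print(parse_items(sample))
```

With a parser like this, supplying both tags of a pair gains nothing, because only one of them is ever used. That is why I kept one tag from each pair.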

I saved the output from my RSS feed into a file and started to modify the contents. An RSS file also contains some preamble (e.g. rss version, channel, link), which isn’t strictly necessary for this import. Thus, my sample RSS file for this first, proof-of-concept import is:

<item>
  <title>This is the article title</title>
  <dc:date>2012-10-12 12:34:56</dc:date>
  <content:encoded>
This is the article content.
  </content:encoded>
  <category>FreeBSD</category>
</item>
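With 644 posts to move, a file like this would eventually have to be generated rather than typed. The following is a minimal Python sketch of that idea, not the code I actually used. The names build_item and build_feed and the dictionary keys are hypothetical. It assumes the values contain nothing that would confuse a pattern-based parser, such as a literal closing tag.

```python
# Template mirroring the layout of the sample <item> above.
ITEM_TEMPLATE = """<item>
  <title>{title}</title>
  <dc:date>{date}</dc:date>
  <content:encoded>
{content}
  </content:encoded>
  <category>{category}</category>
</item>
"""


def build_item(title, date, content, category):
    """Render one post as an <item> block."""
    return ITEM_TEMPLATE.format(
        title=title, date=date, content=content, category=category
    )


def build_feed(posts):
    """Concatenate the <item> blocks for a list of post dictionaries."""
    return "".join(build_item(**post) for post in posts)


posts = [
    {
        "title": "This is the article title",
        "date": "2012-10-12 12:34:56",
        "content": "This is the article content.",
        "category": "FreeBSD",
    },
]

# Print the result; redirect it to a file to get something uploadable.
print(build_feed(posts), end="")
```

Run as is, this prints the same single item shown above. Each extra dictionary in the list adds one more item.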

To import this RSS file, I clicked on Tools | Import | RSS. I clicked on ‘Choose File’, selected the file into which I had saved the above content, then clicked on ‘Upload file and import’. The result was: 1. Importing post…Done!

The results of that import can be seen here. As you can see, it’s pretty basic. But that’s the point. We start small. First, we prove that it can be done, and then build upon what we know.

You should notice that the date on the import was: 2012-10-12 12:34:56. Compare that with the date/time on the article. Why are they different?

Time zones. The server time is set to UTC (also known as GMT). Thus, to get the time you want, add a timezone offset. In my case, I am 5 hours west of UTC, so I would use: 2012-10-12 12:34:56-05:00
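The arithmetic behind that offset can be checked with a few lines of Python. This sketch only shows which UTC instant a timestamp with an offset refers to. It says nothing about how the importer itself parses the value, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# 5 hours west of UTC, as in my case.
LOCAL_OFFSET = timezone(timedelta(hours=-5))


def with_offset(naive_text, tz=LOCAL_OFFSET):
    """Attach a UTC offset to a 'YYYY-MM-DD HH:MM:SS' local timestamp."""
    local = datetime.strptime(naive_text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return local.isoformat(sep=" ")


def as_utc(stamped_text):
    """Return the UTC time that an offset-qualified timestamp refers to."""
    return datetime.fromisoformat(stamped_text).astimezone(timezone.utc)


stamped = with_offset("2012-10-12 12:34:56")
print(stamped)          # 2012-10-12 12:34:56-05:00
print(as_utc(stamped))  # 2012-10-12 17:34:56+00:00
```

So 12:34:56 at five hours west of UTC is 17:34:56 UTC. Without the offset, the bare 12:34:56 is taken as UTC, which explains the difference seen on the imported article.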

From here, I’ll go on to stripping my existing website of extraneous material (headers, footers, side bars, etc) and then try importing a real article.
