WordPress Migration: Getting all the authors

This is the fifth in a series of articles on my migration to WordPress. In this post, I’ll talk about how I imported the authors from my website into the WordPress database. This step was important to me because I am not the only contributor to The FreeBSD Diary.

The WordPress Structure

First, let’s look at the WordPress structure to see what is available to us. WordPress uses MySQL as a database. Although MySQL was one of the first open source databases I used, it is not my database of preference. I so wish WordPress had native support for PostgreSQL. That said, here is the table structure for the users table in WordPress 3.4.2:

mysql> show columns from wp_users;
+---------------------+---------------------+------+-----+---------------------+----------------+
| Field               | Type                | Null | Key | Default             | Extra          |
+---------------------+---------------------+------+-----+---------------------+----------------+
| ID                  | bigint(20) unsigned | NO   | PRI | NULL                | auto_increment |
| user_login          | varchar(60)         | NO   | UNI |                     |                |
| user_pass           | varchar(64)         | NO   |     |                     |                |
| user_nicename       | varchar(50)         | NO   | MUL |                     |                |
| user_email          | varchar(100)        | NO   |     |                     |                |
| user_url            | varchar(100)        | NO   |     |                     |                |
| user_registered     | datetime            | NO   |     | 0000-00-00 00:00:00 |                |
| user_activation_key | varchar(60)         | NO   |     |                     |                |
| user_status         | int(11)             | NO   |     | 0                   |                |
| display_name        | varchar(250)        | NO   |     |                     |                |
+---------------------+---------------------+------+-----+---------------------+----------------+
10 rows in set (0.00 sec)

mysql> 

What’s in my database?

If you compare the above to what my old website contains:

freebsddiary.org=# select distinct(author) from articles;        author        ----------------------
 
 Gerard Samuel
 Randall S. Ehren
 Niklas Saers
 Mike Miller
 
 Rick Oliver
 John J. Rushford Jr
 Marc Fonvieille
 rtroy@springsips.com
 Murat Balaban
 Leon Dang
 Liz
 Kim Scarborough
 Will Andrews
 James
 Chris Silva
 Roy Miller
 Eric Anderson
 Lou Rinaldi
 Harald Schmalzbauer
 Jim Weeks
 Daniel Bretoi
(23 rows)

freebsddiary.org=# 

It’s quite clear now… all I have to supply is display_name. Everything else will have to be made up from that. So… here goes.

I first tried this SQL statement for an author

INSERT INTO wp_users 
   (user_login, user_pass, user_nicename, user_email, user_url, user_activation_key, user_status, display_name)
VALUES
   (author, '', author, '', '', '', 0, author);

What is user_status you might ask? It is a deprecated field, no longer used, so I’m setting it to zero, the value I found for other rows in this table.

What about those empty values?

You’ll see that in the above list of authors, some of the rows are empty. Let’s take a closer look at that.

freebsddiary.org=# select distinct('"' || author || '"') as author, count(*) from articles group by 1;
         author         | count 
------------------------+-------
 "John J. Rushford Jr"  |     1
 "Marc Fonvieille"      |     1
 "Harald Schmalzbauer"  |     1
                        |   476
 "Will Andrews"         |     1
 "Jim Weeks"            |     1
 "Kim Scarborough"      |     1
 "rtroy@springsips.com" |     1
 "Chris Silva"          |     6
 "Roy Miller"           |     1
 "Gerard Samuel"        |     1
 "Leon Dang"            |     1
 "Liz"                  |     2
 "James"                |     1
 "Niklas Saers"         |     1
 ""                     |   148
 "Randall S. Ehren"     |     1
 "Lou Rinaldi"          |     1
 "Rick Oliver"          |     1
 "Eric Anderson"        |     1
 "Mike Miller"          |     1
 "Murat Balaban"        |     1
 "Daniel Bretoi"        |     1
(23 rows)

freebsddiary.org=#

From this, you can see that 476 entries are NULL (line 7), and 148 entries are blank (line 19). These articles were [most likely] written by myself. The following SQL updates those rows. The final results are shown below that…

freebsddiary.org=# update articles set author = 'Dan Langille' where author is null;
UPDATE 476
freebsddiary.org=# update articles set author = 'Dan Langille' where author = '';
UPDATE 148
freebsddiary.org=# select distinct('"' || author || '"') as author, count(*) from articles group by 1;
         author         | count 
------------------------+-------
 "John J. Rushford Jr"  |     1
 "Marc Fonvieille"      |     1
 "Harald Schmalzbauer"  |     1
 "Will Andrews"         |     1
 "Jim Weeks"            |     1
 "Kim Scarborough"      |     1
 "rtroy@springsips.com" |     1
 "Chris Silva"          |     6
 "Roy Miller"           |     1
 "Gerard Samuel"        |     1
 "Leon Dang"            |     1
 "Liz"                  |     2
 "James"                |     1
 "Niklas Saers"         |     1
 "Randall S. Ehren"     |     1
 "Lou Rinaldi"          |     1
 "Dan Langille"         |   624
 "Rick Oliver"          |     1
 "Eric Anderson"        |     1
 "Mike Miller"          |     1
 "Murat Balaban"        |     1
 "Daniel Bretoi"        |     1
(22 rows)

freebsddiary.org=# 

OK, now we have something we can work with.

Creating the SQL for importing into MySQL

Let’s try this query for creating the MySQL SQL.

freebsddiary.org=# select distinct('(''' || author || ''', '''', ''' || author || ''', '''', '''',  '''', 0, ''' || author || '''),') from articles where author != 'Dan Langille';
                                           ?column?                                            
-----------------------------------------------------------------------------------------------
 ('Daniel Bretoi', '', 'Daniel Bretoi', '', '',  '', 0, 'Daniel Bretoi'),
 ('Rick Oliver', '', 'Rick Oliver', '', '',  '', 0, 'Rick Oliver'),
 ('Leon Dang', '', 'Leon Dang', '', '',  '', 0, 'Leon Dang'),
 ('Mike Miller', '', 'Mike Miller', '', '',  '', 0, 'Mike Miller'),
 ('Kim Scarborough', '', 'Kim Scarborough', '', '',  '', 0, 'Kim Scarborough'),
 ('Roy Miller', '', 'Roy Miller', '', '',  '', 0, 'Roy Miller'),
 ('Marc Fonvieille', '', 'Marc Fonvieille', '', '',  '', 0, 'Marc Fonvieille'),
 ('Gerard Samuel', '', 'Gerard Samuel', '', '',  '', 0, 'Gerard Samuel'),
 ('Jim Weeks', '', 'Jim Weeks', '', '',  '', 0, 'Jim Weeks'),
 ('Chris Silva', '', 'Chris Silva', '', '',  '', 0, 'Chris Silva'),
 ('Liz', '', 'Liz', '', '',  '', 0, 'Liz'),
 ('Randall S. Ehren', '', 'Randall S. Ehren', '', '',  '', 0, 'Randall S. Ehren'),
 ('Harald Schmalzbauer', '', 'Harald Schmalzbauer', '', '',  '', 0, 'Harald Schmalzbauer'),
 ('Eric Anderson', '', 'Eric Anderson', '', '',  '', 0, 'Eric Anderson'),
 ('James', '', 'James', '', '',  '', 0, 'James'),
 ('Will Andrews', '', 'Will Andrews', '', '',  '', 0, 'Will Andrews'),
 ('Lou Rinaldi', '', 'Lou Rinaldi', '', '',  '', 0, 'Lou Rinaldi'),
 ('Niklas Saers', '', 'Niklas Saers', '', '',  '', 0, 'Niklas Saers'),
 ('Murat Balaban', '', 'Murat Balaban', '', '',  '', 0, 'Murat Balaban'),
 ('John J. Rushford Jr', '', 'John J. Rushford Jr', '', '',  '', 0, 'John J. Rushford Jr'),
 ('rtroy@springsips.com', '', 'rtroy@springsips.com', '', '',  '', 0, 'rtroy@springsips.com'),
(21 rows)

freebsddiary.org=# 

You will see that I am not importing my own name into the database. It’s already there.

Now, combing that with the SQL I showed you earlier, yields this insert statement, which you issue directly within MySQL. It will load the authors into your WordPress database.

INSERT INTO wp_users 
   (user_login, user_pass, user_nicename, user_email, user_url, user_activation_key, user_status, display_name)
VALUES
 ('Daniel Bretoi', '', 'Daniel Bretoi', '', '',  '', 0, 'Daniel Bretoi'),
 ('Rick Oliver', '', 'Rick Oliver', '', '',  '', 0, 'Rick Oliver'),
 ('Leon Dang', '', 'Leon Dang', '', '',  '', 0, 'Leon Dang'),
 ('Mike Miller', '', 'Mike Miller', '', '',  '', 0, 'Mike Miller'),
 ('Kim Scarborough', '', 'Kim Scarborough', '', '',  '', 0, 'Kim Scarborough'),
 ('Roy Miller', '', 'Roy Miller', '', '',  '', 0, 'Roy Miller'),
 ('Marc Fonvieille', '', 'Marc Fonvieille', '', '',  '', 0, 'Marc Fonvieille'),
 ('Gerard Samuel', '', 'Gerard Samuel', '', '',  '', 0, 'Gerard Samuel'),
 ('Jim Weeks', '', 'Jim Weeks', '', '',  '', 0, 'Jim Weeks'),
 ('Chris Silva', '', 'Chris Silva', '', '',  '', 0, 'Chris Silva'),
 ('Liz', '', 'Liz', '', '',  '', 0, 'Liz'),
 ('Randall S. Ehren', '', 'Randall S. Ehren', '', '',  '', 0, 'Randall S. Ehren'),
 ('Harald Schmalzbauer', '', 'Harald Schmalzbauer', '', '',  '', 0, 'Harald Schmalzbauer'),
 ('Eric Anderson', '', 'Eric Anderson', '', '',  '', 0, 'Eric Anderson'),
 ('James', '', 'James', '', '',  '', 0, 'James'),
 ('Will Andrews', '', 'Will Andrews', '', '',  '', 0, 'Will Andrews'),
 ('Lou Rinaldi', '', 'Lou Rinaldi', '', '',  '', 0, 'Lou Rinaldi'),
 ('Niklas Saers', '', 'Niklas Saers', '', '',  '', 0, 'Niklas Saers'),
 ('Murat Balaban', '', 'Murat Balaban', '', '',  '', 0, 'Murat Balaban'),
 ('John J. Rushford Jr', '', 'John J. Rushford Jr', '', '',  '', 0, 'John J. Rushford Jr'),
 ('rtroy@springsips.com', '', 'rtroy@springsips.com', '', '',  '', 0, 'rtroy@springsips.com');

After the import, if you look at the WordPress authors, you’ll see they have no roles.

Altering the RSS generation script

Now that WordPress contains the authors, let’s modify the script which exports the website. We will add the author to the output. Here’s the patch (well, it’s not a valid patch, but you can see the changes:

-   $sql = "SELECT description, name, filename 
+   $sql = "SELECT id, description, name, filename, actual_date as date, author

+      $HTML .= '    <author>' .  $myrow["author"] . '</author>' . "\n";

After running the script, we get the new file:

$ fetch http://example.freebsddiary.org/news.php
fetch: http://example.freebsddiary.org/news.php: size of remote file is not known
news.php                                              4355 kB 2406 kBps

Let’s count up the authors in there:

$ grep '<author>' news.php | sort | uniq | wc -l
      22

Yes, that’s the right number… Compare it to the SQL used to extract the list of authors from the original website.

Getting author connected to the post

Now we have to modify the rss-importer script. First, let me show you the code changes I made. This file can be found at wp-content/plugins/rss-importer/rss-importer.php.

$ diff -ruN rss-importer.php.ORIGINAL rss-importer.php
--- rss-importer.php.ORIGINAL	2012-10-28 18:19:32.000000000 +0000
+++ rss-importer.php	2012-10-28 17:58:05.000000000 +0000
@@ -44,6 +44,16 @@
 	var $posts = array ();
 	var $file;
 
+function get_all_authors() {
+	global $wpdb;
+
+	$query = "SELECT ID, user_login FROM $wpdb->users";
+
+	return $wpdb->get_results( $wpdb->prepare($query, $this->authors ));
+
+	return 0;
+}
+
 	function header() {
 		echo '<div class="wrap">';
 		screen_icon();
@@ -72,6 +82,16 @@
 		$datalines = file($this->file); // Read the file into an array
 		$importdata = implode('', $datalines); // squish it
 		$importdata = str_replace(array ("\r\n", "\r"), "\n", $importdata);
+		
+		$authors = $this->get_all_authors();
+		echo ' the authors are <pre?' . print_r($authors, true) . '</pre>';
+		$authorLookkup = array();
+		foreach($authors as $author)
+		{
+		        $authorLookup[$author->user_login] = $author->ID;
+                }
+		echo ' the authors are <pre?' . print_r($authorLookup, true) . '</pre>';
+                
 
 		preg_match_all('|<item>(.*?)</item>|is', $importdata, $this->posts);
 		$this->posts = $this->posts[1];
@@ -129,7 +149,22 @@
 			$post_content = str_replace('<br>', '<br />', $post_content);
 			$post_content = str_replace('<hr>', '<hr />', $post_content);
 
+                        // by default, all posts belong to the user id = 1.
 			$post_author = 1;
+
+			preg_match('|<author.*?>(.*?)</author>|is', $post, $authorName);
+			if ($authorName)
+			{
+				$authorName = $wpdb->escape(trim($authorName[1]));
+				if (isset($authorLookup[$authorName]))
+				{
+				        $post_author = $authorLookup[$authorName];
+				}
+			}
+			else
+				$authorName = '';
+
+
 			$post_status = 'publish';
 			$this->posts[$index] = compact('post_author', 'post_date', 'post_date_gmt', 'post_content', 'post_title', 'post_status', 'guid', 'categories');
 			$index++;

Let me explain a few things about that.

  1. Line 11 gets a list of the authors already in the system, and their IDs. Later on, we convert this to a lookup array
  2. Lines 29-33 convert that list of authors to array, which we use later to lookup a given author and find their ID
  3. Line 45 looks at the current post being processed, and extracts the contents of the author tag. This code was derived directly from that used to process the guid field, found elsewhere in rss-importer.php.
  4. Line 51 grabs the author ID and places it into $post_author. By default, this variable contains 1. We change it to the right value. From here, the rss-importer script continues on its merry way.

OK, that’s the authors imported

OK, that’s the authors imported. You can see the work in progress at http://wp.freebsddiary.org/. NOTE: eventually, that URL will redirect to http://www.freebsddiary.org/ so don’t get worried if you see that happening now.

The next step: importing all the comments for each article. That may prove more interesting that previously expected. And by interesting, I mean difficult.

Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive

Leave a Comment

Scroll to Top