Mark S. Rasmussen improve.dk
Apr 21
2014

Having just migrated from Wordpress to Hexo, I quickly realized I forgot something. I forgot to redirect my old permalinks to the new ones…

A permalink ought to live for the duration of your content, and most importantly, never change. However, having been through a number of different blog engines, not all of them support the same permalink structures, and might not even support redirecting old ones. As such, throgh the years my posts have ended up with multiple permalinks:

As you can see, I’ve dropped both the /archive/ and the /blog/ prefixes, as well as the dates. Redirecting old incoming links to the new ones was easy enough when I ran Wordpress on Apache. All it requires were a couple of lines in the .htaccess file:

# Redirect old permalink structure
<IfModule mod_rewrite.c>
	RewriteEngine On
	RewriteRule ^archive/([0-9]{4})/([0-9]{2})/([0-9]{2})/([^\.]+)\.aspx$ http://improve.dk/$4/ [NC,R=301,L]
	RewriteRule ^blog/([0-9]{4})/([0-9]{2})/([0-9]{2})/([^\.]+)$ http://improve.dk/$4/ [NC,R=301,L]
</IfModule>

Static Woes

Since I’ve migrated to Hexo it’s not as simple, unfortunately. I no longer host my site on Apache, but on GitHub Pages. GitHub Pages only allow static files to be served, so I’m no longer able to utilize the .htaccess rewriting rules. There’s also no server-side functionality available, so I can’t even manually send out a 302-redirect, needed to preserve my incoming links SEO value.

What I ended up doing was to write a small script that would parse my Wordpress backup file and then recreate the /blog/ and /archive/ directories, as if the posts were actually stored there:

string template = @"layout: false
---
<!DOCTYPE html>
<html>
	<head>
		<title>Redirecting to [Title]</title>
  		<link rel=""canonical"" href=""[Permalink]""/>
		<meta http-equiv=""content-type"" content=""text/html; charset=utf-8"" />
		<meta http-equiv=""refresh"" content=""0;url=[Permalink]"" />
	</head>
	<body>
		Redirecting to <a href=""[Permalink]"">[Title]</a>...
	</body>
</html>";

void Main()
{
	var outputPath = @"D:\Projects\improve.dk (GIT)\source\";
	var xmlPath = @"D:\Projects\improve.dk (GIT)\marksrasmussen-blog.wordpress.2014-03-08.xml";
	var xml = File.ReadAllText(xmlPath);
	var xd = new XmlDocument();
	xd.LoadXml(xml);
	
	var nsmgr = new XmlNamespaceManager(xd.NameTable);
	nsmgr.AddNamespace("content", "http://purl.org/rss/1.0/modules/content/");
	nsmgr.AddNamespace("wp", "http://wordpress.org/export/1.2/");
	
	foreach (XmlNode item in xd.SelectNodes("//item"))
	{
		var title = item.SelectSingleNode("title").InnerText;
		var date = Convert.ToDateTime(item.SelectSingleNode("pubDate").InnerText);
		var slug = item.SelectSingleNode("wp:post_name", nsmgr).InnerText;
		
		var indexHtml = template
			.Replace("[Permalink]", "http://improve.dk/" + slug + "/")
			.Replace("[Title]", HttpUtility.HtmlEncode(title));
		
		// First create the /archive/ entry
		var outputFolder = Path.Combine(outputPath, "archive", date.Year.ToString(), date.Month.ToString().PadLeft(2, '0'), date.Day.ToString().PadLeft(2, '0'), slug + ".aspx");
		var indexPath = Path.Combine(outputFolder, "index.html");
		Directory.CreateDirectory(outputFolder);
		File.WriteAllText(indexPath, indexHtml);
		
		// Then the /blog/ entry
		outputFolder = Path.Combine(outputPath, "blog", date.Year.ToString(), date.Month.ToString().PadLeft(2, '0'), date.Day.ToString().PadLeft(2, '0'), slug);
		indexPath = Path.Combine(outputFolder, "index.html");
		Directory.CreateDirectory(outputFolder);
		File.WriteAllText(indexPath, indexHtml);
	}
}

Now the post stored directly in the root, while placeholders have been put in place in the old /blog/ and /archive/ directories. The placeholder code is very simple:

layout: false
---
<!DOCTYPE html>
<html>
	<head>
		<title>Redirecting to TxF presentation materials</title>
  		<link rel="canonical" href="http://improve.dk/txf-presentation-materials/"/>
		<meta http-equiv="content-type" content="text/html; charset=utf-8" />
		<meta http-equiv="refresh" content="0;url=http://improve.dk/txf-presentation-materials/" />
	</head>
	<body>
		Redirecting to <a href="http://improve.dk/txf-presentation-materials/">TxF presentation materials</a>...
	</body>
</html>

It’s simply a small script that contains a meta refresh tag that sends the user on to the new URL. By utilizing the ´rel=”canonical”` meta tag, I ensure that this retains the SEO value as if I had performed a 302 redirect.

Going Forward

Creating the placeholder files is a one-off task, seeing as I’ll only ever need to redirect posts that precede the time when I changed my URL structure to contain neither the /blog/ and /archive/ prefixes, nor the dates. All posts from the beginning of 2013 were published using the current URL scheme, which I intend to keep for the foreseeable future.

Mark S. Rasmussen
I'm the CTO at iPaper where I cuddle with databases, mold code and maintain the overall technical & team responsibility. I'm an avid speaker at user groups & conferences. I love life, motorcycles, photography and all things technical. Say hi on Twitter, write me an email or look me up on LinkedIn.