I recently migrated this blog from Blogger, which was publishing via SFTP to a shared hosting provider, to a self-managed Typo installation. One of the biggest reasons I didn’t do this sooner was that I didn’t want all of my old URLs (and the links that point to them) to stop working. After all, what is the point of a permalink if it stops working one day? It isn’t that I have an especially high PageRank with Google, but I didn’t want to start over from scratch.
I wanted to be sure that if someone either typed in a URL that used to work, or followed a link from an outside page, they would end up viewing the content they expected to land on. I considered adding some kind of logging, so I would know when people use these old links and where they are coming from. I also considered some mechanism for alerting the user that the link they followed is old.
I explored a couple of options to accomplish this goal. My first thought was to have a Perl script act as a 404 handler and deal with any broken links, possibly responding with HTTP redirect codes. That would have the advantage of very flexible logging (since I would write the whole thing). I could perhaps add a parameter to the query string telling the destination page to display a message, though that part might be tricky.
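For comparison, here is a minimal sketch of that 404-handler idea. It is written in Python (standing in for Perl) using only the standard library’s WSGI tools; the REDIRECTS table and the not_found_handler name are hypothetical, and wiring the handler into Apache as the 404 handler is not shown. It answers known old paths with a 301 redirect and logs the referring page, which is the flexible logging mentioned above.

```python
import logging
from wsgiref.simple_server import make_server

# Hypothetical old-to-new table; a real handler would load it from a file.
REDIRECTS = {
    "/archive/2006_04_01_archive.html": "/articles/2006/04",
    "/archive/2006_05_01_archive.html": "/articles/2006/05",
}

log = logging.getLogger("old-links")


def not_found_handler(environ, start_response):
    """Redirect known old paths with a 301; otherwise return a plain 404."""
    path = environ.get("PATH_INFO", "")
    target = REDIRECTS.get(path)
    if target is not None:
        # Record which old link was used and where the visitor came from.
        log.info("old link %s -> %s (referer: %s)",
                 path, target, environ.get("HTTP_REFERER", "-"))
        start_response("301 Moved Permanently",
                       [("Location", target), ("Content-Type", "text/plain")])
        return [b"Moved Permanently\n"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found\n"]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Serve on port 8000 for local experimentation only.
    with make_server("", 8000, not_found_handler) as server:
        server.serve_forever()
```

Unlike the RewriteMap approach, this makes the browser switch to the new URL, since the client receives an actual redirect.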
I thought about it some more, and came to the conclusion that I should not alert the user that they are using an old link. After all, if my redirect thing works, why is the link they used any more or less valid?
I did a little more digging and discovered the awesome power of Apache’s mod_rewrite. As it turns out, Apache is already set up to do exactly what I need, with its RewriteMap directive. With RewriteMap, you essentially point Apache to a file containing the mappings (or to a program that produces them) and Apache handles the rest. For example, here is the relevant snippet from my Apache vhost conf file:
RewriteEngine On
RewriteMap migration txt:/home/pkaeding/blog.kaeding.name/redirect.map
RewriteCond ${migration:$1} /.+
RewriteRule ^(/.*) ${migration:$1}
The first line enables the rewrite engine, which is what allows the following lines to have any effect. The second line initializes my mapping file (which I nicknamed ‘migration’ so I can refer to it in the next couple of lines). Read on to see a snippet from this mapping file.
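The third line defines the condition under which the mapping will be used. Essentially, it says: take the requested resource and give it as input to the ‘migration’ RewriteMap. If the output from that lookup matches the regex pattern ‘/.+’, continue with the rewrite. (The $1 here refers to the group captured by the RewriteRule pattern on the next line; Apache matches a rule’s pattern first and only then evaluates its conditions. A key that is not in the map looks up as an empty string, which fails the condition, so requests for any other URL pass through untouched.)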
The fourth line instructs Apache to again take the requested resource, captured by the parentheses in the first argument (in this case, everything from the leading slash onward), and feed it to the ‘migration’ map. The output then replaces the requested path.
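To make the interplay between the condition and the rule concrete, here is a small Python model of what those four lines do for a single request. It is only an illustration, not how Apache is implemented: the in-memory MIGRATION dict stands in for the map file, and the rewrite function name is hypothetical.

```python
import re

# Hypothetical in-memory stand-in for the 'migration' RewriteMap.
MIGRATION = {
    "/archive/2006_04_01_archive.html": "/articles/2006/04",
    "/archive/2006_05_01_archive.html": "/articles/2006/05",
}

RULE = re.compile(r"^(/.*)")  # the RewriteRule pattern
COND = re.compile(r"/.+")     # the RewriteCond pattern (unanchored, as in Apache)


def rewrite(path):
    """Return the rewritten path, or the original path if the rule does not apply."""
    match = RULE.match(path)
    if match is None:
        return path
    # ${migration:$1}: a key missing from the map yields the empty string.
    mapped = MIGRATION.get(match.group(1), "")
    # The condition passes only when the lookup produced something like '/x'.
    if not COND.search(mapped):
        return path
    # The substitution replaces the whole requested path with the mapped value.
    return mapped
```

Calling rewrite("/archive/2006_04_01_archive.html") yields "/articles/2006/04", while a path with no entry, such as a new-style URL, comes back unchanged.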
The syntax of the mapping file is very simple:
/archive/2006_04_01_archive.html /articles/2006/04
/archive/2006_05_01_archive.html /articles/2006/05
/archive/2006_06_01_archive.html /articles/2006/06
/archive/2007_02_01_archive.html /articles/2007/02
/archive/2008_03_01_archive.html /articles/2008/03
Just put the old URL, followed by some whitespace (I used a tab), followed by the new URL. Lines starting with ‘#’ are comments.
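RewriteMap can also use a prg: program instead of a txt: file. Apache starts the program once, writes each lookup key to its standard input as a newline-terminated line, and expects one newline-terminated reply on standard output, or the string NULL when there is no match. The Python sketch below is a hypothetical stand-in that reads a file in the format above and answers those lookups. The parse_map, answer and serve names are illustrative, and it assumes the map file lives at the path from the config snippet.

```python
#!/usr/bin/env python3
import sys

# Assumed location: the map file referenced in the vhost config.
MAP_FILE = "/home/pkaeding/blog.kaeding.name/redirect.map"


def parse_map(text):
    """Parse 'old-path new-path' lines into a dict, skipping blanks and comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # any run of tabs or spaces separates the fields
        if len(fields) >= 2:
            # Only the first two fields are used; anything after them is ignored.
            mapping[fields[0]] = fields[1]
    return mapping


def answer(mapping, key):
    """Reply the way a prg: map must: the mapped value, or NULL if none."""
    return mapping.get(key, "NULL")


def serve(mapping, stdin, stdout):
    # One key in, one reply out; flush so Apache is never left waiting.
    while True:
        line = stdin.readline()
        if not line:  # EOF: Apache has closed the pipe
            break
        stdout.write(answer(mapping, line.strip()) + "\n")
        stdout.flush()


if __name__ == "__main__":
    with open(MAP_FILE, encoding="utf-8") as f:
        serve(parse_map(f.read()), sys.stdin, sys.stdout)
```

Using it would mean pointing RewriteMap at the executable script with the prg: prefix instead of txt:. For a static list like mine, the plain txt: file is simpler; a program only pays off when the mappings have to be computed.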
As for my logging requirement, I will just use Google Analytics. To be honest, analyzing an additional set of logs to see which URLs people are using is just something I don’t have time for. And, with this approach, no HTTP redirects are issued (so if someone goes to an old URL, the browser will still display that old URL). But, since I decided that there was no such thing as ‘old’ URLs, any URL is as good as any other, and this is fine.
For more information on mod_rewrite, be sure to read the Apache docs.
If you have tried other approaches, I’d be curious to hear about them in the comments.