I launched this blog in January 2003 and have posted more than 1,100 entries since then. It’s been a learning experience. I’ve changed the blog’s file structure a few times. If you blog with a content management system like [Movable Type](http://www.movabletype.org), you know that each entry has a “permalink”: the URL where the entry lives on your site’s server, organized according to a particular archive scheme. I’ve changed that scheme three times now. That means a lot of permalinks out there point to pages on this site that aren’t where they used to be. In other cases, multiple copies of the same page were stored, and a visitor who lands on a copy that’s no longer linked to the database can’t leave comments.

Erik Barzeski came up with a handy little [search script](http://nslog.com/archives/2005/11/14/improving_the_404_search.php) that helps him redirect lost visitors to the page they meant to find. We were talking via IM and I mentioned that I could use that script, since I knew this site had a big problem with 404s (“file not found” errors) from older links.

The script as posted on his site didn’t work right off the bat here, but through trial and error we got it working. Now, instead of getting a boring “Error 404” that leads nowhere, visitors looking for a page that no longer exists are automatically sent to my [search page](http://www.momathome.com/mt32/mt-search.cgi) with the name of the page they were looking for already filled in the search field. If there’s only one result, they’re sent straight to that page. Even better, the script emails me at a separate address I set up, letting me know which page wasn’t found. When I see the same terms over and over again, I add a .htaccess redirect to make the path from A to B easier for visitors.
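To make the “name of the page filled in the search field” step concrete, here is a minimal sketch in Python. It is not Erik’s script (that one is PHP and is linked further down); it only illustrates the idea. The function names and the `search` query parameter are my assumptions, so check them against what your own search page expects.

```python
import posixpath
import re
from urllib.parse import urlencode, urlsplit

# Search page from this post. The "search" parameter name below is an assumption.
SEARCH_URL = "http://www.momathome.com/mt32/mt-search.cgi"


def search_term_from_path(request_uri: str) -> str:
    """Turn the last segment of a missing URL into plain search words."""
    path = urlsplit(request_uri).path
    basename = posixpath.basename(path.rstrip("/"))
    stem, _ext = posixpath.splitext(basename)  # drop ".php", ".html", etc.
    # Underscores and hyphens stand in for spaces in file names.
    return re.sub(r"[_\-]+", " ", stem).strip()


def search_redirect_url(request_uri: str) -> str:
    """Build the search URL a lost visitor would be redirected to."""
    term = search_term_from_path(request_uri)
    return SEARCH_URL + "?" + urlencode({"search": term})


if __name__ == "__main__":
    # Hypothetical missing page, used only for illustration.
    print(search_redirect_url("/viewfromhome/archives/2005/11/macworld_mag.php"))
```

A real 404 handler would then send this URL back in a `Location` header; the sketch stops at building it.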

If you’re hosted by [Dreamhost](http://www.dreamhost.com/rewards.cgi?momathome) you have to do one more thing to make this work, as outlined on [this page](http://rephrase.net/days/05/01/php-cgi-no-input-file-specified-fix). Dreamhost prefers users to run PHP as CGI for security reasons, which breaks custom 404 pages. To get around this, just add these lines to your .htaccess file:

    RewriteEngine on
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
    RewriteRule \.php$ /path/to/custom/404.php
    ErrorDocument 404 /path/to/custom/404.php

If you don’t know what a .htaccess file is or how to edit one, then don’t do it.

With Erik’s permission to post, [here’s the modified script that works on this site.](http://www.momathome.com/viewfromhome/images/404.phps) Very cool. Thanks again, [Erik.](http://www.nslog.com) I’m more than happy to help you troubleshoot your pages in the demon browser (IE 6) anytime!


One response

  1. More code can be found here for those who are curious.

    Judi had some language I wanted to clarify…

    1) My blog used to use the old entry ID style URLs (http://nslog.com/archives/000183.php). So, my NSLog code (see here) checks to see if the 404 URL ends with a number, and if so, forwards the user to that article. Failing that, it does a search (title, entry body, etc. – standard MT search) to find the search term(s) in the entries. If only one entry is found in the resulting Web page, which is fetched with the file() call, the user is forwarded. Failing a single match on either of those two, the search results are displayed.

    2) Judi uses Dreamhost and doesn’t have access to file(), so that whole block of code is commented out. She also only cared about entries that had been previously “dirified” to 15 characters – that limit is now 30 for her. So, her search looks not for the entry ID, but for a basename that matches. It then “dirifies” the title of the entry, chops it to 30 letters, appends the file extension and her archive format (yyyy/mm) and sends the user onward. If that fails, the user is redirected to the search page itself, whether it generates one result or 20 (or zero).

    My way lets me hit “http://nslog.com/103” to view entry 103 while Judi’s method lets her hit “http://momathome.com/macworld_mag” to view the entry that has “macworld_mag” as part of the basename.

    So, be aware of these differences if you’re looking to use the code…
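
For anyone trying to picture the two approaches Erik describes in his comment, here is a rough Python sketch of each. It is neither Erik’s PHP nor the modified script linked above. The `dirify` function is a simplified stand-in for Movable Type’s filter of the same name, and the entry list, the `.php` extension, and the bare `/yyyy/mm/` path prefix are hypothetical values chosen for illustration.

```python
import re
from typing import List, Optional, Tuple

Entry = Tuple[str, int, int]  # (title, year, month): a hypothetical shape


def entry_id_from_path(path: str) -> Optional[int]:
    """NSLog style: if the missing URL ends with a number, treat it as an entry ID."""
    match = re.search(r"(\d+)(?:\.php)?/?$", path)
    return int(match.group(1)) if match else None


def dirify(title: str, max_len: int = 30) -> str:
    """Simplified dirify: lowercase, drop punctuation, underscores for spaces, then chop."""
    text = re.sub(r"[^\w\s]", "", title.lower())
    text = re.sub(r"\s+", "_", text.strip())
    return text[:max_len]


def permalink(title: str, year: int, month: int, ext: str = ".php") -> str:
    """Basename style: yyyy/mm archive path plus the dirified title and extension."""
    return f"/{year:04d}/{month:02d}/{dirify(title)}{ext}"


def find_by_basename(fragment: str, entries: List[Entry]) -> Optional[str]:
    """Return a permalink only when exactly one entry's basename contains the fragment."""
    matches = [e for e in entries if fragment in dirify(e[0])]
    return permalink(*matches[0]) if len(matches) == 1 else None


if __name__ == "__main__":
    entries = [("Macworld Magazine Arrives!", 2005, 11), ("Snow Day", 2005, 12)]
    print(entry_id_from_path("/archives/000183.php"))
    print(find_by_basename("macworld_mag", entries))
```

When either lookup returns `None`, the fallback in both setups is the search page, as described above.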