
SEO-friendly redirects in Interchange

In the past, I've had a few Interchange clients who wanted their site to do an SEO-friendly 301 redirect to a new page, for different reasons. Sometimes a product had gone out of stock and wasn't going to return; sometimes they had completely reworked their URL structure to be more SEO friendly and wanted the link juice to transfer to the new URLs. The usual way to handle this kind of request is to set up a bunch of Apache rewrite rules.

There were a few issues with going that route. The main one is that adding or removing rules means restarting or reloading Apache every time a change is made. The clients don't normally have the access to do this, so they would have to contact me every time. They also don't have the access to modify the Apache virtual host file to add and remove rules, so again they would have to contact me. To avoid the editing issue, we could have put the rules in a .htaccess file and let them modify it that way, but that presents its own challenges, because some text editors and FTP clients don't handle hidden files very well. Finally, even though basic rewrite rules are pretty easy to copy, paste, and reuse, they can still have nasty side effects if not done properly and can be difficult to troubleshoot. So I devised a way to let clients manage their 301 redirects themselves, using a simple database table and Interchange's Autoload directive.
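
For comparison, the rewrite-rule approach looks something like this (the paths here are just illustrative):

# mod_alias: one 301 redirect per retired URL
Redirect permanent /old-product.html /new-product.html

# or the rough mod_rewrite equivalent
RewriteEngine On
RewriteRule ^/old-product\.html$ /new-product.html [R=301,L]

Every change to rules like these in the virtual host means another Apache reload.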

The database table is a very simple table with two fields. I called them old_url and new_url, with old_url as the primary key. The Autoload directive accepts a list of subroutines as its argument, so we need to create two different GlobalSubs: one to actually do the redirect, and one to check the database to see if we need to redirect. The redirect sub is really straightforward and looks like this:

sub redirect {
    my ($url, $status) = @_;
    # Default to a 302 (temporary) redirect unless a status is passed in.
    $status ||= 302;
    # Set the HTTP status and Location headers for the redirect.
    $Vend::StatusLine = qq|Status: $status moved\nLocation: $url\n|;
    # The download pragma tells Interchange to send the response as-is.
    $::Pragma->{download} = 1;
    # Send an empty body and mark the response as already sent.
    my $body = '';
    ::response($body);
    $Vend::Sent = 1;
    return 1;
}

The sub that checks to see whether we need to redirect looks like this:

sub redirect_old_links {
    my $db = Vend::Data::database_exists_ref('page_redirects');
    my $dbh = $db->dbh();
    # The URI of the current request, e.g. /cgi-bin/vlink/redirect_test.html
    my $current_url = $::Tag->env({ arg => "REQUEST_URI" });
    my $normal_server = $::Variable->{NORMAL_SERVER};
    # On the first page view of a session, load the whole table into the
    # session's scratch space; later page views read from this hash instead
    # of hitting the database again.
    if ( ! exists $::Scratch->{redirects} ) {
        my $sth = $dbh->prepare(q{select * from page_redirects});
        my $rc  = $sth->execute();
        # Assumes the columns come back in (old_url, new_url) order.
        while ( my ($old,$new) = $sth->fetchrow_array() ) {
            $::Scratch->{redirects}{"$old"} = $new;
        }
        $sth->finish();
    }
    if ( exists $::Scratch->{redirects} ) {
        if ( exists $::Scratch->{redirects}{"$current_url"} ) {
            # Found a match: build the full URL and do a 301 redirect.
            my $path = $normal_server.$::Scratch->{redirects}{"$current_url"};
            my $Sub = Vend::Subs->new;
            $Sub->redirect($path, '301');
            return;
        } else {
            return;
        }
    }
}

We normally create these as two different files and put them into our own directory structure under the Interchange directory, called custom/GlobalSub, and then add this line to the interchange.cfg file to make sure they get loaded when Interchange restarts:

include custom/GlobalSub/*.sub

After those files are loaded, you'll need to tell the catalog that you want it to Autoload this subroutine, and to do that you use the Autoload directive in your catalog.cfg file like this:

Autoload redirect_old_links
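
Each .sub file is just a fragment of Interchange configuration. As a sketch, custom/GlobalSub/redirect.sub wrapping the first sub above might look like this (the EOS heredoc marker is arbitrary):

GlobalSub <<EOS
sub redirect {
    my ($url, $status) = @_;
    $status ||= 302;
    $Vend::StatusLine = qq|Status: $status moved\nLocation: $url\n|;
    $::Pragma->{download} = 1;
    my $body = '';
    ::response($body);
    $Vend::Sent = 1;
    return 1;
}
EOS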

After modifying your catalog.cfg file, you will need to reload your catalog to ensure the change takes effect. Once these things are in place, you should be able to add data to the page_redirects table, start a new session, and be redirected properly. When I was working on the system, I created an entry that redirected /cgi-bin/vlink/redirect_test.html to /cgi-bin/vlink/index.html so I could make sure it was redirecting me correctly.
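
As a concrete sketch (assuming MySQL; the column sizes are just a guess), the table and that test entry might look like this:

create table page_redirects (
    old_url varchar(255) not null primary key,
    new_url varchar(255) not null
);

insert into page_redirects (old_url, new_url)
values ('/cgi-bin/vlink/redirect_test.html', '/cgi-bin/vlink/index.html');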

9 comments:

Jon Jensen said...

That makes sense. Just to be clear, your table is not actually storing full URLs with http://hostname/ etc., but just the URI starting with /path/to/wherever, right?

A couple of questions about the specific implementation:

What's the reason for interpolating the strings of the hash keys such as $hashref->{"$old"} instead of simply using the string as a hash key directly, as $hashref->{$old}?

And wouldn't it be more efficient to have your SELECT statement fetch only the relevant URI for $current_url instead of the whole table which you then search via a hash? You'd then be reading at most 1 record instead of the whole table each time.

I once did something similar in mod_perl and had 10,000 or more mappings so I definitely wouldn't have wanted to read the whole table on each request.

Ron Phipps said...

Hey Jon,

I think that Richard is reading the entire table in on the first access and then on future reads he's looking up the values from a hash in scratch.

Richard Templet said...

Jon,

Correct, the data in the table is only the URI. It's been a while since I wrote the original, but I think the reason for using the quotes in the hash key was that URIs with hyphens were getting subtracted and not working properly. Ron is correct on the database query section: on the first page load it creates $Scratch->{redirects}, then each page load after that reads from the hashref instead of hitting the database each time.

Jon Jensen said...

Ron, actually, now I notice that the table's being read into $Scratch, which gets stored in the session -- each individual session. That actually seems worse to me! Richard, what's your line of reasoning? If you've only got a handful of redirects it probably doesn't matter much, but with very many this doesn't seem like a good approach.

Jon Jensen said...

On the quoting in Perl, there is no difference between $hashref->{"$someval"} and $hashref->{$someval} except that the quoted one is slower. It definitely would not do anything to hyphens. That's easy to confirm in a little test script.
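
For example, a quick sketch of such a test:

#!/usr/bin/perl
use strict;
use warnings;

# Quoted and unquoted keys land in the same hash slot, hyphens included.
my $key = 'some-uri-with-hyphens';
my %h;
$h{$key}   = 1;
$h{"$key"} = 2;
print scalar(keys %h), "\n";              # prints 1 -- only one key exists
print $h{'some-uri-with-hyphens'}, "\n";  # prints 2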

Richard Templet said...

The reasoning behind using the session was to limit the number of database queries. I figured a trade-off of some session bloat versus another database query per page view was worth it. The biggest user of this only has about 300 records in their table, so I didn't feel the bloat was so bad that it would cripple Interchange. I know Interchange has the ability to create 'memory' tables and that might be a better way to do it, but I've never used those on a production system so I don't know if they are good or bad.

Jon Jensen said...

This is a good case for using Interchange's memory tables. They get loaded up once, and all the reads are cheap after that. You wouldn't want to use them with a huge table, but only 300 records like this would be fine.

The only problem in either case is having it pick up updates to the table.

Richard Templet said...

Right, because any changes to the table would require a restart of Interchange to take effect, correct?

Jon Jensen said...

Yeah, and the way you're doing it now the list is frozen in each session, so you'd have to purge that before they'd get any changes.

Probably better to use memcached or something lightweight for this than anything built into Interchange, actually.
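
A minimal sketch of that idea with the CPAN Cache::Memcached module (the server address is assumed):

use Cache::Memcached;

my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

# Store and look up a redirect without touching the session or the database.
$memd->set('redirect:/cgi-bin/vlink/redirect_test.html', '/cgi-bin/vlink/index.html');
my $new_url = $memd->get('redirect:/cgi-bin/vlink/redirect_test.html');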