How to create URL aliases in Drupal without the Path module

Creating pretty urls or permanent links in Drupal is easy. Really easy. This functionality comes out of the box with the Path module. And by adding the contributed Pathauto module you can make your life easier by letting Drupal generate the pretty urls automatically based on some properties of your post (like the title).

But there's another way of doing this in Drupal. Drupal provides a mechanism in code by means of the custom_url_rewrite_inbound and custom_url_rewrite_outbound functions. Using these wisely may give you some performance gain. Let's see how you can use these.

How does it work?

The custom_url_rewrite_inbound and custom_url_rewrite_outbound functions are regular PHP functions. They're not Drupal hooks. You can look at them as being translation/mapping functions. Each time Drupal is looking for an alias of a certain path, it calls custom_url_rewrite_outbound and passes in the path. Each time Drupal is looking for the original path for a certain alias, it calls custom_url_rewrite_inbound. You can implement these functions by putting them in your settings.php file.
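
To make the two directions concrete, here is a minimal sketch of both functions as they could sit in settings.php. It assumes the Drupal 6 signatures used throughout this post; the single hard-coded mapping between "about" and "node/1" is hypothetical and only there for illustration.

```php
<?php
// Hypothetical one-entry mapping, for illustration only.

// Alias -> internal path. $result holds the path Drupal has resolved so far;
// overwrite it to route the request somewhere else.
function custom_url_rewrite_inbound(&$result, $path, $path_language) {
  if ($path == 'about') {
    $result = 'node/1';
  }
}

// Internal path -> alias. $path is the path about to be printed in a link;
// overwrite it to output the alias instead.
function custom_url_rewrite_outbound(&$path, &$options, $original_path) {
  if ($path == 'node/1') {
    $path = 'about';
  }
}
```

Both functions work by reference: they return nothing and change the first argument in place.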

An example

Suppose I want to rewrite all node urls (node/1, node/2, ...) to include the title for SEO reasons (content/1/this-is-the-title-of-the-node, content/2/this-is-the-title-of-the-other-node). I'll explain later on why you have to include the node id.

Now when I see a url that starts with "content" and is followed by a slash and number, I will assume this is a node and the number is the node id.

When I see a url that starts with "node" and is followed by a slash and number, I will translate this to a url starting with "content/" and followed by the node id.

In code this translates to:

function custom_url_rewrite_inbound(&$result, $path, $path_language) {
  if (arg(0, $path) == 'content' && is_numeric(arg(1, $path)) && is_slug(arg(2, $path))) {
    $result = sprintf('node/%d', arg(1, $path));
  }
}

function custom_url_rewrite_outbound(&$path, &$options, $original_path) {
  if (arg(0, $path) == 'node' && is_numeric(arg(1, $path)) && !arg(2, $path)) {
    $title = db_result(db_query("SELECT title FROM {node} WHERE nid = %d", arg(1, $path)));
    if ($title) {
      $path = sprintf('content/%d/%s', arg(1, $path), slug($title));
    }
  }
}

The slug and is_slug functions are included below. slug converts a string to a new string usable in a url (by replacing spaces with dashes, removing non-alphanumeric characters, ...). is_slug checks whether a string is already in that form.

A better example

All our node urls now start with "content". But most sites have multiple node types, and we may want urls of story nodes to start with "stories/", pages with "pages/", news items with "news/". The code below handles stories and pages; further types can be added to the two lists in the same way.

function custom_url_rewrite_inbound(&$result, $path, $path_language) {
  if (in_array(arg(0, $path), array('stories', 'pages')) && is_numeric(arg(1, $path)) && is_slug(arg(2, $path))) {
    $result = sprintf('node/%d', arg(1, $path));
  }
}

function custom_url_rewrite_outbound(&$path, &$options, $original_path) {
  if (arg(0, $path) == 'node' && is_numeric(arg(1, $path)) && !arg(2, $path)) {
    $node = db_fetch_object(db_query("SELECT title, type FROM {node} WHERE nid = %d", arg(1, $path)));
    if ($node) {
      $prefix_mapping = array(
        'story' => 'stories',
        'page' => 'pages',
      );
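      // Only rewrite node types we have a prefix for; isset avoids a PHP notice for other types.
      if (isset($prefix_mapping[$node->type])) {
        $path = sprintf('%s/%d/%s', $prefix_mapping[$node->type], arg(1, $path), slug($node->title));
      }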
    }
  }
}

Performance gain

When you enable the path module and specify some aliases, you generate an extra database query for each call to drupal_get_path_alias (and drupal_get_normal_path). This function is called on each call to url, which in turn is run on each call to l. So every link/url on your page might generate a query. Most pages have a lot of links to nodes, views pages, module urls (hook_menu), ... For nodes you'll probably have an alias, but for views, module urls, etc. you probably won't. So that's a lot of extra database queries just to conclude there is no alias.

The method specified in the example only generates a database query once we've decided we're dealing with a node url (in case of custom_url_rewrite_outbound). So on pages where we have some urls to non-node pages we're saving some queries compared to path module.

In case of custom_url_rewrite_inbound we have no database queries because we only look at the node id and don't bother with the slug.
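
The query-free inbound check can be sketched in plain PHP, outside of Drupal. The helper name example_alias_to_node_path and its $prefixes parameter are hypothetical; unlike the examples above, this sketch ignores the slug completely and only looks at the prefix and the id.

```php
<?php
// Hypothetical helper: map "prefix/<nid>/<slug>" to "node/<nid>" using
// string operations only, so no database query is needed.
function example_alias_to_node_path($path, array $prefixes = array('content')) {
  $parts = explode('/', $path);
  // Expect exactly three parts: a known prefix, a numeric id and a slug.
  if (count($parts) == 3 && in_array($parts[0], $prefixes, TRUE) && ctype_digit($parts[1])) {
    return 'node/' . $parts[1];
  }
  // Not one of our aliases: leave the path alone.
  return NULL;
}
```

A path such as admin/build/views falls through to NULL without touching the database, which is the case the path module would spend a query on.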

So do we have a lot of performance gain? Well, it depends on the case. But it's always good to consider this strategy.

Caveats

Including node ids in the alias

It is not easy to create aliases this way without including the (node) id. The id is there to prevent naming clashes. Suppose the titles of node/1 and node/2 are both "test"; then both nodes would get the alias content/test. Since we are not storing all aliases like the path module does, we have no means to track clashes and generate aliases like content/test-1, content/test-2, ...
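
A small standalone sketch of the clash follows. The example_simple_slug function is a simplified, hypothetical stand-in for the slug function from the appendix, and the two titles are made up.

```php
<?php
// Simplified, hypothetical stand-in for slug(): lowercase the title,
// turn runs of other characters into a dash, trim dashes at both ends.
function example_simple_slug($title) {
  $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($title));
  return trim($slug, '-');
}

// Two made-up nodes whose titles collapse to the same slug.
$titles = array(1 => 'Test', 2 => 'test!');

$without_id = array();
$with_id = array();
foreach ($titles as $nid => $title) {
  $without_id[$nid] = 'content/' . example_simple_slug($title);
  $with_id[$nid] = 'content/' . $nid . '/' . example_simple_slug($title);
}
// Both entries of $without_id are now identical, so the alias alone cannot
// tell us which node was meant. The entries of $with_id stay distinct.
```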

Duplicate content

In this specific example, some SEO people might worry about duplicate content issues. Suppose you have an alias stories/1/test for node/1. Every url of the form stories/1/some-other-slug will give you the same page, because only the node id is used to find the node. If you're worried about this, you can add an extra check in custom_url_rewrite_inbound, which of course costs an extra database query.
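
One possible shape for that extra check, sketched in plain PHP: compare the requested alias with the canonical one and report where to redirect if they differ. The function name example_redirect_target is hypothetical, and the canonical slug is passed in as a parameter; inside Drupal you would have to load the node title and run it through slug, which is where the extra query comes from.

```php
<?php
// Hypothetical helper: given a requested alias like "stories/1/whatever"
// and the canonical slug for that node, return the canonical alias to
// redirect to, or NULL when the request is already canonical.
function example_redirect_target($path, $canonical_slug) {
  $parts = explode('/', $path);
  if (count($parts) != 3) {
    // Not in "prefix/<nid>/<slug>" form: nothing to check.
    return NULL;
  }
  $canonical = $parts[0] . '/' . $parts[1] . '/' . $canonical_slug;
  return $path === $canonical ? NULL : $canonical;
}
```

Whether to redirect, return a 404 or simply accept the duplicate is a design choice; the sketch only detects the mismatch.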

Write lightweight functions

When writing your own implementation, keep in mind that custom_url_rewrite_inbound and custom_url_rewrite_outbound might be called hundreds of times on pages with a lot of urls. So keep these functions as lightweight as possible.
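
One way to keep repeated calls cheap is to remember the alias per node id for the duration of the request, so the same node is only looked up once even if it is linked many times. This is a sketch, not something the examples above do: example_cached_alias and the $loader callback are hypothetical, and in Drupal the loader would wrap the title query and the slug call.

```php
<?php
// Hypothetical helper: compute the alias for a node id at most once per
// request. $loader is any callable that takes a node id and returns the alias.
function example_cached_alias($nid, $loader) {
  static $aliases = array();
  // array_key_exists also remembers NULL results, so a node without an
  // alias is not looked up again either.
  if (!array_key_exists($nid, $aliases)) {
    $aliases[$nid] = call_user_func($loader, $nid);
  }
  return $aliases[$nid];
}
```

The cache is keyed on the node id only, so this assumes the same loader is used for every call.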

Appendix: Implementation of slug helper functions

/**
 * Calculate a slug with a maximum length for a string.
 *
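 * @param $string
 *   The string you want to calculate a slug for.
 * @param $length
 *   The maximum length the slug can have, or -1 for no limit.
 * @param $separator
 *   The character used to replace runs of non-alphanumeric characters.
 * @return
 *   A string representing the slug.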
 */
function slug($string, $length = -1, $separator = '-') {
  // transliterate
  $string = transliterate($string);

  // lowercase
  $string = strtolower($string);

  // replace non-alphanumeric characters (including underscores) by separator
  $string = preg_replace('/[^a-z0-9]/i', $separator, $string);

  // replace multiple occurrences of separator by one instance
  $string = preg_replace('/'. preg_quote($separator) .'['. preg_quote($separator) .']*/', $separator, $string);

  // cut off to maximum length
  if ($length > -1 && strlen($string) > $length) {
    $string = substr($string, 0, $length);
  }

  // remove separator from start and end of string
  $string = preg_replace('/'. preg_quote($separator) .'$/', '', $string);
  $string = preg_replace('/^'. preg_quote($separator) .'/', '', $string);

  return $string;
}

/**
 * Transliterate a given string.
 *
 * @param $string
 *   The string you want to transliterate.
 * @return
 *   A string representing the transliterated version of the input string.
 */
function transliterate($string) {
  static $charmap;
  if (!$charmap) {
    $charmap = array(
      // Decompositions for Latin-1 Supplement
      chr(195) . chr(128) => 'A', chr(195) . chr(129) => 'A',
      chr(195) . chr(130) => 'A', chr(195) . chr(131) => 'A',
      chr(195) . chr(132) => 'A', chr(195) . chr(133) => 'A',
      chr(195) . chr(135) => 'C', chr(195) . chr(136) => 'E',
      chr(195) . chr(137) => 'E', chr(195) . chr(138) => 'E',
      chr(195) . chr(139) => 'E', chr(195) . chr(140) => 'I',
      chr(195) . chr(141) => 'I', chr(195) . chr(142) => 'I',
      chr(195) . chr(143) => 'I', chr(195) . chr(145) => 'N',
      chr(195) . chr(146) => 'O', chr(195) . chr(147) => 'O',
      chr(195) . chr(148) => 'O', chr(195) . chr(149) => 'O',
      chr(195) . chr(150) => 'O', chr(195) . chr(153) => 'U',
      chr(195) . chr(154) => 'U', chr(195) . chr(155) => 'U',
      chr(195) . chr(156) => 'U', chr(195) . chr(157) => 'Y',
      chr(195) . chr(159) => 's', chr(195) . chr(160) => 'a',
      chr(195) . chr(161) => 'a', chr(195) . chr(162) => 'a',
      chr(195) . chr(163) => 'a', chr(195) . chr(164) => 'a',
      chr(195) . chr(165) => 'a', chr(195) . chr(167) => 'c',
      chr(195) . chr(168) => 'e', chr(195) . chr(169) => 'e',
      chr(195) . chr(170) => 'e', chr(195) . chr(171) => 'e',
      chr(195) . chr(172) => 'i', chr(195) . chr(173) => 'i',
      chr(195) . chr(174) => 'i', chr(195) . chr(175) => 'i',
      chr(195) . chr(177) => 'n', chr(195) . chr(178) => 'o',
      chr(195) . chr(179) => 'o', chr(195) . chr(180) => 'o',
      chr(195) . chr(181) => 'o', chr(195) . chr(182) => 'o',
      chr(195) . chr(184) => 'o', chr(195) . chr(185) => 'u',
      chr(195) . chr(186) => 'u', chr(195) . chr(187) => 'u',
      chr(195) . chr(188) => 'u', chr(195) . chr(189) => 'y',
      chr(195) . chr(191) => 'y',
      // Decompositions for Latin Extended-A
      chr(196) . chr(128) => 'A', chr(196) . chr(129) => 'a',
      chr(196) . chr(130) => 'A', chr(196) . chr(131) => 'a',
      chr(196) . chr(132) => 'A', chr(196) . chr(133) => 'a',
      chr(196) . chr(134) => 'C', chr(196) . chr(135) => 'c',
      chr(196) . chr(136) => 'C', chr(196) . chr(137) => 'c',
      chr(196) . chr(138) => 'C', chr(196) . chr(139) => 'c',
      chr(196) . chr(140) => 'C', chr(196) . chr(141) => 'c',
      chr(196) . chr(142) => 'D', chr(196) . chr(143) => 'd',
      chr(196) . chr(144) => 'D', chr(196) . chr(145) => 'd',
      chr(196) . chr(146) => 'E', chr(196) . chr(147) => 'e',
      chr(196) . chr(148) => 'E', chr(196) . chr(149) => 'e',
      chr(196) . chr(150) => 'E', chr(196) . chr(151) => 'e',
      chr(196) . chr(152) => 'E', chr(196) . chr(153) => 'e',
      chr(196) . chr(154) => 'E', chr(196) . chr(155) => 'e',
      chr(196) . chr(156) => 'G', chr(196) . chr(157) => 'g',
      chr(196) . chr(158) => 'G', chr(196) . chr(159) => 'g',
      chr(196) . chr(160) => 'G', chr(196) . chr(161) => 'g',
      chr(196) . chr(162) => 'G', chr(196) . chr(163) => 'g',
      chr(196) . chr(164) => 'H', chr(196) . chr(165) => 'h',
      chr(196) . chr(166) => 'H', chr(196) . chr(167) => 'h',
      chr(196) . chr(168) => 'I', chr(196) . chr(169) => 'i',
      chr(196) . chr(170) => 'I', chr(196) . chr(171) => 'i',
      chr(196) . chr(172) => 'I', chr(196) . chr(173) => 'i',
      chr(196) . chr(174) => 'I', chr(196) . chr(175) => 'i',
      chr(196) . chr(176) => 'I', chr(196) . chr(177) => 'i',
      chr(196) . chr(178) => 'IJ', chr(196) . chr(179) => 'ij',
      chr(196) . chr(180) => 'J', chr(196) . chr(181) => 'j',
      chr(196) . chr(182) => 'K', chr(196) . chr(183) => 'k',
      chr(196) . chr(184) => 'k', chr(196) . chr(185) => 'L',
      chr(196) . chr(186) => 'l', chr(196) . chr(187) => 'L',
      chr(196) . chr(188) => 'l', chr(196) . chr(189) => 'L',
      chr(196) . chr(190) => 'l', chr(196) . chr(191) => 'L',
      chr(197) . chr(128) => 'l', chr(197) . chr(129) => 'L',
      chr(197) . chr(130) => 'l', chr(197) . chr(131) => 'N',
      chr(197) . chr(132) => 'n', chr(197) . chr(133) => 'N',
      chr(197) . chr(134) => 'n', chr(197) . chr(135) => 'N',
      chr(197) . chr(136) => 'n', chr(197) . chr(137) => 'N',
      chr(197) . chr(138) => 'n', chr(197) . chr(139) => 'N',
      chr(197) . chr(140) => 'O', chr(197) . chr(141) => 'o',
      chr(197) . chr(142) => 'O', chr(197) . chr(143) => 'o',
      chr(197) . chr(144) => 'O', chr(197) . chr(145) => 'o',
      chr(197) . chr(146) => 'OE', chr(197) . chr(147) => 'oe',
      chr(197) . chr(148) => 'R', chr(197) . chr(149) => 'r',
      chr(197) . chr(150) => 'R', chr(197) . chr(151) => 'r',
      chr(197) . chr(152) => 'R', chr(197) . chr(153) => 'r',
      chr(197) . chr(154) => 'S', chr(197) . chr(155) => 's',
      chr(197) . chr(156) => 'S', chr(197) . chr(157) => 's',
      chr(197) . chr(158) => 'S', chr(197) . chr(159) => 's',
      chr(197) . chr(160) => 'S', chr(197) . chr(161) => 's',
      chr(197) . chr(162) => 'T', chr(197) . chr(163) => 't',
      chr(197) . chr(164) => 'T', chr(197) . chr(165) => 't',
      chr(197) . chr(166) => 'T', chr(197) . chr(167) => 't',
      chr(197) . chr(168) => 'U', chr(197) . chr(169) => 'u',
      chr(197) . chr(170) => 'U', chr(197) . chr(171) => 'u',
      chr(197) . chr(172) => 'U', chr(197) . chr(173) => 'u',
      chr(197) . chr(174) => 'U', chr(197) . chr(175) => 'u',
      chr(197) . chr(176) => 'U', chr(197) . chr(177) => 'u',
      chr(197) . chr(178) => 'U', chr(197) . chr(179) => 'u',
      chr(197) . chr(180) => 'W', chr(197) . chr(181) => 'w',
      chr(197) . chr(182) => 'Y', chr(197) . chr(183) => 'y',
      chr(197) . chr(184) => 'Y', chr(197) . chr(185) => 'Z',
      chr(197) . chr(186) => 'z', chr(197) . chr(187) => 'Z',
      chr(197) . chr(188) => 'z', chr(197) . chr(189) => 'Z',
      chr(197) . chr(190) => 'z', chr(197) . chr(191) => 's',
      // Euro Sign
      chr(226) . chr(130) . chr(172) => 'E'
    );
  }

  // transliterate
  return strtr($string, $charmap);
}

function is_slug($str) {
  return $str == slug($str);
}

Written on February 02, 2010 at 14:04, tagged as Drupal, performance, url rewriting

Comments

"... For nodes you'll probably have an alias, but for views, module urls etc, ... you probably won't. So that's a lot of extra database queries just to conclude there is no alias."

Pressflow 6 and Drupal 7 both contain an alias whitelist. For example, aliases are only searched for taxonomy URLs if you have at least one taxonomy URL with an alias. It uses arg(0) as the key, so if you don't alias any /foo/* path then aliases won't be searched for on these paths. This eliminates your performance gain.

If you are looking this closely at performance you might just want to install Pressflow.

custom_url_rewrite_inbound() or hook_url_inbound_alter() is run on DRUPAL_BOOTSTRAP_PATH (before most modules are loaded).

custom_url_rewrite_outbound() or hook_url_outbound_alter() is always run with DRUPAL_BOOTSTRAP_FULL (all modules loaded).

The above applies if you are using url_alter or not. The trick is if you want to implement the inbound hooks, you have to make sure your module implements hook_boot() (can be just an empty function) so that it's loaded prior to the full bootstrap.

Nice post. Didn't know that :)

There's an issue with that one though. Not too many people will have this, but I've had to deal with it once.

In _drupal_bootstrap the path system is loaded in DRUPAL_BOOTSTRAP_PATH. This occurs before DRUPAL_BOOTSTRAP_FULL.

If you use the url_alter module, the custom url rewriting functions are only added in the DRUPAL_BOOTSTRAP_FULL step. This might be too late in some cases.

Nice writeup!

You might also consider using the http://drupal.org/project/url_alter module if you want multiple modules to react on incoming / outgoing URLs, or want to add your logic through the admin interface (although that's probably not the best approach)


About

drupalcoder.com is a blog on all things Drupal in specific and LAMP on OS X in general. It is maintained by Davy Van Den Bremt, a Belgian (Drupal) web developer and designer living in Ghent. The goal of this blog is to log all interesting things that have crossed the writer's path while developing Drupal sites. You can read all about Davy's professional activities on his LinkedIn profile. If you want to get in touch, use the contact form.