Drupal coder

Improving search results when working with node references in Drupal

In some cases we want to use Drupal's node reference module to normalize our data. This has implications, though, because the information ends up split across different nodes. It influences our search results, because the search indexer only looks at each node individually. To explain the problem we'll start with an example and then solve 3 problems that occur when spreading information over different nodes (connected via node references).

An example: trainings

In our example we're going to publish information about the trainings our company gives. Our company gives trainings on different topics and hosts them at different locations. More specifically, a training on a certain topic might be given multiple times and at different locations. The description of the training topic is always the same, though, and there's also a set of recurring locations.

When our editors enter trainings, we don't want them to retype the same topic description and address information every time. We'll "normalize" our data and store it in 3 node types: "Training", "Training topic" and "Training location".

For the training topics, we'll store the name of the topic and a description of it.
For the training locations, we'll store the venue name and its address.

A training then only consists of the date the training is given, the training topic and the training location. To store the training topic and location we'll use node reference fields.
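
To make the normalized model concrete, here is a rough sketch of what the three node objects could look like once loaded in Drupal 6 with CCK. The reference field names match the ones used in the code further down; the nids, titles, body texts and the field_training_date field name are made up for illustration.

```php
// Hypothetical "Training topic" node: name and description live here only.
$topic = new stdClass();
$topic->nid = 12;
$topic->type = 'training_topic';
$topic->title = 'Drupal theming';
$topic->body = 'A two-day introduction to theming.';

// Hypothetical "Training location" node: venue name and address live here only.
$location = new stdClass();
$location->nid = 34;
$location->type = 'training_location';
$location->title = 'Example venue';
$location->body = 'Example street 1, Example city';

// The training itself only holds a date and two node references.
// CCK stores a node reference value as array(delta => array('nid' => ...)).
$training = new stdClass();
$training->nid = 56;
$training->type = 'training';
$training->field_training_date = array(array('value' => '2010-04-08T09:00:00'));
$training->field_training_topic = array(array('nid' => $topic->nid));
$training->field_training_location = array(array('nid' => $location->nid));
```

Note that the training node contains no topic or venue text of its own, which is exactly why it won't match a search for those words.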

This data model should not be visible to the visitors of our site. We only want them to deal with "trainings". So to the public we'll only provide training pages.

The problem

When we now search through our site (using Drupal's core search, Search Lucene API, Apache Solr Search Integration or any other search module) using a certain venue as keyword, our search will only return "Training location" nodes. If we search for a certain training topic, it will only return "Training topic" nodes. We don't want this. We always want the "Training" node to pop up.

How can we do this?

The solution

Remark: A better solution has been posted in a follow-up article.

Drupal allows us to add extra content that needs to be indexed with each node. It does so through the 'update index' operation of hook_nodeapi. We'll use that construct to add the content of our referenced nodes when indexing the training nodes. To build an indexable text version of our referenced node, we'll use the same technique Drupal uses to index each node itself (see _node_index_node).

/**
 * Implementation of hook_nodeapi().
 */
function my_module_nodeapi(&$node, $op, $a3 = NULL, $page = FALSE) {
  // Add the full content of the referenced nodes to the index for trainings.
  if ($op == 'update index' && $node->type == 'training') {
    $text = '';

    $fields = array('field_training_topic', 'field_training_location');
    foreach ($fields as $field) {
      // Only the first value of each reference field is used. Empty fields
      // and references to nodes that no longer exist are skipped.
      if (!empty($node->{$field}[0]['nid']) && ($ref_node = node_load($node->{$field}[0]['nid']))) {
        // Render the referenced node the same way the search indexer would.
        $ref_node->build_mode = NODE_BUILD_SEARCH_INDEX;
        $ref_node = node_build_content($ref_node, FALSE, FALSE);
        $ref_node->body = drupal_render($ref_node->content);
        $text .= '<h2>'. check_plain($ref_node->title) .'</h2>'. $ref_node->body;
      }
    }

    // The returned text is indexed together with the training node itself.
    return $text;
  }
}

That's it. Our training nodes are now indexed with their location and topic information.
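
The hook above only reads the first value of each reference field. If one of the fields ever allows multiple values, a helper along these lines could collect every referenced nid before loading and rendering the nodes. This is only a sketch: my_module_referenced_nids() is a hypothetical name, not something Drupal or CCK provides.

```php
/**
 * Collects the nids stored in the given node reference fields of a node.
 *
 * Hypothetical helper, not part of Drupal or CCK.
 */
function my_module_referenced_nids($node, array $fields) {
  $nids = array();
  foreach ($fields as $field) {
    // A field that is missing or empty on the node is simply skipped.
    if (empty($node->{$field}) || !is_array($node->{$field})) {
      continue;
    }
    foreach ($node->{$field} as $item) {
      if (!empty($item['nid'])) {
        // Keyed by nid so a node referenced twice is only returned once.
        $nids[(int) $item['nid']] = (int) $item['nid'];
      }
    }
  }
  return array_values($nids);
}
```

Inside the 'update index' branch we could then loop over the returned nids instead of looking at delta 0 only.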

What happens when the referenced node's content changes?

Good question. Whenever one of the referenced nodes is updated, the indexed version of our parent node becomes invalid. To solve this we'll reindex the parent node as soon as the referenced node is updated or deleted. Drupal has a handy function for this called search_touch_node. This function will trigger the search indexer to reindex a particular node.

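/**
 * Implementation of hook_nodeapi().
 *
 * PHP cannot declare my_module_nodeapi() twice, so when combining this with
 * the 'update index' code above, put both if-blocks in the same function.
 */
function my_module_nodeapi(&$node, $op, $a3 = NULL, $page = FALSE) {
  // Reindex the trainings that reference a topic or location when it changes.
  // search_touch_node() only exists while the search module is enabled.
  if (in_array($node->type, array('training_topic', 'training_location')) && in_array($op, array('update', 'delete')) && module_exists('search')) {
    // Assumes both reference fields are single-valued and therefore stored
    // in the content type's own CCK table.
    $result = db_query("SELECT nid FROM {content_type_training} WHERE field_training_topic_nid = %d OR field_training_location_nid = %d", $node->nid, $node->nid);
    while ($row = db_fetch_object($result)) {
      search_touch_node($row->nid);
    }
  }
}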

The finishing touch

We still have one problem, though. Our relevant "training" nodes now pop up in the search results when we search for a certain topic, but the "training topic" nodes themselves appear as well. As mentioned before, we don't want the user to see these.

To hide these from our search results, we can use the Search restrict module. If you're using Search Lucene API or Apache Solr Search Integration, you don't even need this module as it's already possible in those modules to restrict certain content types from being indexed.
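
If you'd rather not install an extra module for core search, the same effect can probably be had with a few lines of custom code. The sketch below is untested and rests on an assumption: that Drupal 6's node search collects its extra query conditions by invoking hook_db_rewrite_sql() with an empty query string, 'n' as primary table and 'nid' as primary field. Verify this against your Drupal version before relying on it.

```php
/**
 * Implementation of hook_db_rewrite_sql().
 *
 * Sketch only: hides topic and location nodes from core search results.
 */
function my_module_db_rewrite_sql($query, $primary_table, $primary_field, $args) {
  // Assumption: only the node search asks for conditions with an empty
  // query string; all other node listings pass their SQL in $query.
  if ($query == '' && $primary_table == 'n' && $primary_field == 'nid' && empty($args)) {
    return array('where' => "n.type NOT IN ('training_topic', 'training_location')");
  }
}
```

The nodes are still indexed this way; they are only filtered out when the search query runs, so the training nodes keep their topic and location text.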

April 08, 2010 - Drupal, search

Comments

Thanks a lot for this post as well as the code shots - much appreciated!
~ Jim Summer

Thanks Davy! This will come in handy some day

Davy, great article, thank you very much for this. It proves your expertise and openness to the Drupal community.

Thanks for writing this down, very useful!
