In some cases we want to use Drupal's node reference module to normalize our data. This has implications, though, because the information ends up split across different nodes. It influences our search results, since the search indexer only looks at each node individually. To explain the problem we'll start with an example and then solve three problems that occur when information is spread over different nodes connected via node references.
In our example we're going to publish information about the trainings our company gives. The company gives trainings on different topics and hosts them at different locations. More specifically, a training on a certain topic might be given multiple times and at different locations. The description of a training topic is always the same, though, and there's also a set of recurring locations.
When our editors enter trainings, we don't want them to retype the same topic description and address information every time. We'll "normalize" our data and store it in 3 node types: "Training", "Training topic" and "Training location".
For the training topics, we'll store the name of the topic and a description of it.
For the training locations, we'll store the venue name and its address.
A training then only consists of the date the training is given, the training topic and the training location. To store the training topic and location we'll use node reference fields.
This data model should not be visible to the visitors of our site. We only want them to deal with "trainings", so to the public we'll only provide training pages.
When we now search through our site (using Drupal's core search, Search Lucene API, Apache Solr Search Integration or any other search module) with a certain venue as the keyword, our search will only return "Training location" nodes. If we search for a certain training topic, it will only return "Training topic" nodes. We don't want this. We always want the "Training" node to pop up.
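To make the data model concrete, here is roughly what the three nodes look like to PHP code once they are loaded. This is only a sketch, assuming CCK's Drupal 6 layout for field values; the node ids, the titles and the `field_training_date` field name are invented for illustration, while the two node reference field names are the ones used in the code further down.

```php
<?php
// Hypothetical referenced nodes (ids and titles are invented).
$topic = new stdClass();
$topic->nid = 10;
$topic->type = 'training_topic';
$topic->title = 'Drupal theming';

$location = new stdClass();
$location->nid = 20;
$location->type = 'training_location';
$location->title = 'Main office';

// A training only stores a date plus two references (nids). The topic
// description and the address live on the referenced nodes.
$training = new stdClass();
$training->nid = 30;
$training->type = 'training';
// Hypothetical date field, shown only to complete the picture.
$training->field_training_date = array(array('value' => '2010-05-01T09:00:00'));
$training->field_training_topic = array(array('nid' => $topic->nid));
$training->field_training_location = array(array('nid' => $location->nid));
```

Because the training node itself holds nothing but nids, an indexer that looks at this node alone never sees the topic description or the venue address.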
How can we do this?
Drupal allows us to add extra content that needs to be indexed with each node. It does so through the 'update index' operation of hook_nodeapi. We'll use that operation to add the content of our referenced nodes when the training nodes are indexed. To build an indexable text version of a referenced node, we'll use the same technique Drupal uses to index each node itself (see _node_index_node).
/**
 * Implementation of hook_nodeapi().
 */
function my_module_nodeapi(&$node, $op, $a3 = NULL, $page = FALSE) {
  // Add the full content of the referenced nodes to the index for trainings.
  if ($op == 'update index' && $node->type == 'training') {
    $text = '';
    $fields = array('field_training_topic', 'field_training_location');
    foreach ($fields as $field) {
      // Only the first value of each node reference field is used.
      $nid = empty($node->{$field}[0]['nid']) ? 0 : $node->{$field}[0]['nid'];
      // Skip empty references and references to nodes that no longer exist.
      if ($nid && ($ref_node = node_load($nid))) {
        // Build the referenced node the same way _node_index_node() does.
        $ref_node->build_mode = NODE_BUILD_SEARCH_INDEX;
        $ref_node = node_build_content($ref_node, FALSE, FALSE);
        $ref_node->body = drupal_render($ref_node->content);
        $text .= '<h2>'. check_plain($ref_node->title) .'</h2>'. $ref_node->body;
      }
    }
    // The returned text is indexed together with the training node itself.
    return $text;
  }
}
That's it. Our training nodes are now indexed with their location and topic information.
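The hook above only reads the first value of each node reference field. If one of the fields allowed multiple values, the referenced nids could be collected first and then loaded one by one. A minimal sketch of that idea follows, assuming the CCK Drupal 6 value layout (`$node->field_name[delta]['nid']`); `my_module_referenced_nids()` is a hypothetical helper, not part of CCK or Drupal core.

```php
<?php
/**
 * Collects the nids referenced by the given node reference fields.
 *
 * Hypothetical helper for illustration only.
 */
function my_module_referenced_nids($node, array $fields) {
  $nids = array();
  foreach ($fields as $field) {
    // Skip fields that are missing or empty on this node.
    if (empty($node->{$field}) || !is_array($node->{$field})) {
      continue;
    }
    foreach ($node->{$field} as $item) {
      if (!empty($item['nid'])) {
        // Keyed by nid so a node referenced twice is only listed once.
        $nids[(int) $item['nid']] = (int) $item['nid'];
      }
    }
  }
  return array_values($nids);
}
```

Inside the 'update index' branch, the returned list could then replace the hard-coded `[0]` lookup, with each nid going through the same node_load() and rendering steps.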
But what happens when one of the referenced nodes changes? Whenever a referenced node is updated, the indexed version of our parent node becomes stale. To solve this we'll reindex the parent node as soon as the referenced node is updated or deleted. Drupal's search module has a handy function for this called search_touch_node. It marks a particular node so that the search indexer reindexes it on its next run.
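/**
 * Implementation of hook_nodeapi().
 *
 * PHP allows only one definition of a function, so in a real module this
 * branch goes into the same my_module_nodeapi() as the 'update index' branch.
 */
function my_module_nodeapi(&$node, $op, $a3 = NULL, $page = FALSE) {
  // Reindex trainings when a referenced topic or location changes.
  if (in_array($node->type, array('training_topic', 'training_location')) && in_array($op, array('update', 'delete'))) {
    // Assumes CCK stores both fields in the training type's own table.
    // DISTINCT avoids touching the same training once per revision.
    $result = db_query("SELECT DISTINCT nid FROM {content_type_training} WHERE field_training_topic_nid = %d OR field_training_location_nid = %d", $node->nid, $node->nid);
    while ($row = db_fetch_object($result)) {
      search_touch_node($row->nid);
    }
  }
}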
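Both snippets implement the same hook, so they have to end up in a single function. One possible way to organize that is sketched below: one my_module_nodeapi() that dispatches on `$op` and hands the work to two helpers. The helper names `my_module_training_index_text()` and `my_module_touch_trainings()` are hypothetical, and the query still assumes that CCK keeps both reference columns in the {content_type_training} table.

```php
<?php
/**
 * Implementation of hook_nodeapi().
 *
 * Sketch: one hook that dispatches to two hypothetical helper functions.
 */
function my_module_nodeapi(&$node, $op, $a3 = NULL, $page = FALSE) {
  switch ($op) {
    case 'update index':
      if ($node->type == 'training') {
        return my_module_training_index_text($node);
      }
      break;

    case 'update':
    case 'delete':
      if (in_array($node->type, array('training_topic', 'training_location'))) {
        my_module_touch_trainings($node->nid);
      }
      break;
  }
}

/**
 * Hypothetical helper: renders the referenced topic and location as text.
 */
function my_module_training_index_text($node) {
  $text = '';
  foreach (array('field_training_topic', 'field_training_location') as $field) {
    $nid = empty($node->{$field}[0]['nid']) ? 0 : $node->{$field}[0]['nid'];
    // Skip empty references and references to deleted nodes.
    if ($nid && ($ref_node = node_load($nid))) {
      $ref_node->build_mode = NODE_BUILD_SEARCH_INDEX;
      $ref_node = node_build_content($ref_node, FALSE, FALSE);
      $text .= '<h2>'. check_plain($ref_node->title) .'</h2>'. drupal_render($ref_node->content);
    }
  }
  return $text;
}

/**
 * Hypothetical helper: marks every training that references $nid for reindexing.
 */
function my_module_touch_trainings($nid) {
  $result = db_query("SELECT DISTINCT nid FROM {content_type_training} WHERE field_training_topic_nid = %d OR field_training_location_nid = %d", $nid, $nid);
  while ($row = db_fetch_object($result)) {
    search_touch_node($row->nid);
  }
}
```

The functions only do something useful inside a Drupal 6 site, since they rely on core, search and CCK. Splitting the hook this way keeps each branch short and makes it easier to add further operations later.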
We still have one problem, though. Our relevant "training" nodes now pop up in the search results when we search for a certain topic, but the "training topic" nodes themselves show up as well, and the same goes for "training location" nodes. As mentioned before, we don't want the user to see these.
To hide them from our search results, we can use the Search restrict module. If you're using Search Lucene API or Apache Solr Search Integration, you don't even need this module, as those modules can already exclude certain content types from being indexed.
Comments
Thanks a lot for this post as well as the code shots - much appreciated!
~ Jim Summer
Thanks Davy! This will come in handy some day
Davy, great article, thank you very much for this. It proves your expertise and openness to the Drupal community.
Thanks for writing this down, very useful!