search

PHP filter and search index woes

PHP filter is a great filter. It gives you a lot of flexibility. But it has a lot of disadvantages too. One of them is that it makes your code harder to debug.

One of those hard debug sessions occurred lately when the following happened. Cron always stopped running when it was going to index this one node. It also redirected to some page. Weird.

What was happening? The node was a page. The body was having the PHP filter on and the following code was in that body:

<?php drupal_goto("node/add/story"); ?>

What does this have to do with search indexing you ask? Well, when the search indexer is going to index a node, it first runs all filters on your body, and it's the result of that that is getting indexed. So if you're writing PHP code, that code is run when the indexer comes along. Even "non-text generating" code, like the redirect in this case.

Written on April 23, 2009 at 11:56, tagged as caveats, Drupal, PHP, PHP filter, search

Setting the date result of search results to the node creation date

The date Drupal shows on search results is not the date the node was created. Instead it is the date the node was indexed by the search indexer.
Most of the time this is the wanted behavior. Your nodes get indexed when cron is run and if you run cron each 5 or 15 minutes this date is within 15 minutes of the creation date.

But the date is far off when import your nodes from another system. Sometimes this date has been created years ago, and then the time between creation and indexing is long.

As always there's an easy fix for this in Drupal. The search results are themed via theme_search_item and we all know how to override a theme function.

We just have to change

  if ($item['date']) {
    $info[] = format_date($item['date'], 'small');
  }

to

  if(isset($item['node'])) {
  	$info[] = format_date($item['node']->created, 'small');
  } elseif ($item['date']) {
    $info[] = format_date($item['date'], 'small');
  }

So we get:

function phptemplate_search_item($item, $type) {
  $output = ' <h3><a href="'. check_url($item['link']) .'">'. check_plain($item['title']) .'</a></h3>';
  $info = array();
  if ($item['type']) {
    $info[] = check_plain($item['type']);
  }
  if ($item['user']) {
    $info[] = $item['user'];
  }
  /*if ($item['date']) {
    $info[] = format_date($item['date'], 'small');
  }*/
  if(isset($item['node'])) {
  	$info[] = format_date($item['node']->created, 'small');
  } elseif ($item['date']) {
    $info[] = format_date($item['date'], 'small');
  }
  if (is_array($item['extra'])) {
    $info = array_merge($info, $item['extra']);
  }
  $output .= ' <div>'. ($item['snippet'] ? '<p>'. $item['snippet'] .'</p>' : '') .'<p class="search-info">'. implode(' - ', $info) .'</p></div>';
  return $output;
}
Written on October 12, 2008 at 09:09, tagged as Drupal, search, things you might have overlooked

Drupal script to automatically add terms to nodes

I had a client recently who had a vocabulary of all car brands and thousands of news items on cars. He wanted to tag all news items (implemented as nodes) with the card brand(s) they were about. No Drupal module available (yet) to do that, so I've written a script myself.

I think the script (which you will find further on) is very easy to understand. I iterate over all terms in the vocabulary and for each term I do a Drupal search. I then iterate over all search results an see if the score of the result is high enough to mark it as a good search result for this term.

Easy! Right? Well, yes. But there were a few things to take in account.

I had to modify the default Drupal do_search function because it was written for paged search result display for one search operation per page request.

To make the function display all search results was fairly easy. I just had to remove the pager_query function line and replace it with a db_query one.

  // Do actual search query
  //$result = pager_query("SELECT * FROM temp_search_results", 10, 0, $count_query); // This is the line that got removed from the original function implementation.
  $result = db_query("SELECT * FROM temp_search_results"); // This is what we'll use now. We don't want any paging. We want it all!

Allowing the function to make more search operations in one script run was a bit trickier. By default, the function makes use of a temporary table to build its results. This temporary table is automatically cleared after the request. But we want to clear it earlier. So I added a few lines at the end of the function.

  // Drop the temporary table to allow multiple searches in one run
  if (($count = db_result(db_query('SELECT COUNT(*) FROM temp_search_results'))) > 0) {
    db_query_temporary("DROP table temp_search_results");    
  }
  if (($count = db_result(db_query('SELECT COUNT(*) FROM temp_search_sids'))) > 0) {
    db_query_temporary("DROP table temp_search_sids");    
  }

Below you find the complete script. You can also download it. The script defines a few parameters. You can specify

  • the vocabulary from which the terms should be taken
  • the minimum score a result has to have to be taken into account
  • a run mode (verbose/run)
  • the node type you want to run the script upon

To run the script, put it in the root of your Drupal installation. To get the best results, play with the MIN_SCORE parameter.

Please make sure all your nodes are indexed by Drupal's search before you run this script. Otherwise you won't get any results.

<?php

define('VOCABULARY', 1); // Vocabulary from which the terms should be taken
define('MIN_SCORE', 0.3); // Minimum score for the search index. Results lower then this are not taken into account

define('VERBOSE', TRUE); // TRUE = Verbose run : Display what nodes will get what terms
define('RUN', TRUE); // TRUE = Fill in the terms in Drupal / FALSE = Don't fill them in, we want to see the verbose information first

define('NODE_TYPE', 'news'); // Only nodes of this node type will be taken into account

// Bootstrapping Drupal
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

// Logging in as admin user
global $user;
$user->uid = 1;

// Getting all terms from our vocabulary
$sql_terms = "SELECT tid, name FROM {term_data} WHERE vid = %d";
$result_terms = db_query($sql_terms, VOCABULARY);

// Iterating over all terms
while($term = db_fetch_object($result_terms)) {
  
  // Do a search in Drupal
  $search_results = custom_do_search($term->name, 'node');
  
  // Iterating over all results
  foreach($search_results as $item) {
    
    // We only want results with a decent score
    if($item->score > MIN_SCORE) {
      
      // Fetch the node info
      $sql_node = "SELECT nid, type, title FROM {node} WHERE nid = %d";
      $result_node = db_query($sql_node, $item->sid);
      $node = db_fetch_object($result_node);
      
      // Is the node of the correct type
      if($node->type == NODE_TYPE) {
        
        // If we're in verbose mode, print term, node title and search score
        if(VERBOSE) {
          print $term->name . " | " . $node->title . " | " . $item->score . "\n";
        }
        
        // Fill in the terms in Drupal if we need to
        if(RUN) {
          db_query("INSERT INTO {term_node} (nid, tid) VALUES (%d, %d)", $item->sid, $term->tid);
        }
      }
    }
  }
}

function custom_do_search($keywords, $type, $join1 = '', $where1 = '1', $arguments1 = array(), $select2 = 'i.relevance AS score', $join2 = '', $arguments2 = array(), $sort_parameters = 'ORDER BY score DESC') {
  $query = search_parse_query($keywords);

  if ($query[2] == '') {
    form_set_error('keys', t('You must include at least one positive keyword with @count characters or more.', array('@count' => variable_get('minimum_word_size', 3))));
  }
  if ($query === NULL || $query[0] == '' || $query[2] == '') {
    return array();
  }

  // First pass: select all possible matching sids, doing a simple index-based OR matching on the keywords.
  // 'matches' is used to reject those items that cannot possibly match the query.
  $conditions = $where1 .' AND ('. $query[2] .") AND i.type = '%s'";
  $arguments = array_merge($arguments1, $query[3], array($type, $query[4]));
  $result = db_query_temporary("SELECT i.type, i.sid, SUM(i.score * t.count) AS relevance, COUNT(*) AS matches FROM {search_index} i INNER JOIN {search_total} t ON i.word = t.word $join1 WHERE $conditions GROUP BY i.type, i.sid HAVING COUNT(*) >= %d", $arguments, 'temp_search_sids');

  // Calculate maximum relevance, to normalize it
  $normalize = db_result(db_query('SELECT MAX(relevance) FROM temp_search_sids'));
  if (!$normalize) {
    return array();
  }
  $select2 = str_replace('i.relevance', '('. (1.0 / $normalize) .' * i.relevance)', $select2);

  // Second pass: only keep items that match the complicated keywords conditions (phrase search, negative keywords, ...)
  $conditions = '('. $query[0] .')';
  $arguments = array_merge($arguments2, $query[1]);
  $result = db_query_temporary("SELECT i.type, i.sid, $select2 FROM temp_search_sids i INNER JOIN {search_dataset} d ON i.sid = d.sid AND i.type = d.type $join2 WHERE $conditions $sort_parameters", $arguments, 'temp_search_results');
  if (($count = db_result(db_query('SELECT COUNT(*) FROM temp_search_results'))) == 0) {
    return array();
  }
  $count_query = "SELECT $count";

  // Do actual search query
  //$result = pager_query("SELECT * FROM temp_search_results", 10, 0, $count_query); // This is the line that got removed from the original function implementation.
  $result = db_query("SELECT * FROM temp_search_results"); // This is what we'll use now. We don't want any paging. We want it all!
  $results = array();
  while ($item = db_fetch_object($result)) {
    $results[] = $item;
  }

  // Drop the temporary table to allow multiple searches in one run
  if (($count = db_result(db_query('SELECT COUNT(*) FROM temp_search_results'))) > 0) {
    db_query_temporary("DROP table temp_search_results");    
  }
  if (($count = db_result(db_query('SELECT COUNT(*) FROM temp_search_sids'))) > 0) {
    db_query_temporary("DROP table temp_search_sids");    
  }

  return $results;
}

?>
Written on September 29, 2008 at 10:58, tagged as auto tagging, Drupal, scripts, search, taxonomy

About

drupalcoder.com is a blog on all things Drupal in specific and LAMP on OS X in general. It is maintained by Davy Van Den Bremt, a Belgian (Drupal) web developer and designer living in Ghent. The goal of this blog is to log all interesting things that have crossed the writer's path while developing Drupal sites. You can read all about Davy's professional activities on his LinkedIn profile. If you want to get in touch, use the contact form.