I had a client recently who had a vocabulary of all car brands and thousands of news items on cars. He wanted to tag all news items (implemented as nodes) with the card brand(s) they were about. No Drupal module available (yet) to do that, so I've written a script myself.
I think the script (which you will find further on) is very easy to understand. I iterate over all terms in the vocabulary and for each term I do a Drupal search. I then iterate over all search results an see if the score of the result is high enough to mark it as a good search result for this term.
Easy! Right? Well, yes. But there were a few things to take in account.
I had to modify the default Drupal do_search function because it was written for paged search result display for one search operation per page request.
To make the function display all search results was fairly easy. I just had to remove the pager_query function line and replace it with a db_query one.
//$result = pager_query("SELECT * FROM temp_search_results", 10, 0, $count_query); // This is the line that got removed from the original function implementation.
$result = db_query("SELECT * FROM temp_search_results"); // This is what we'll use now. We don't want any paging. We want it all!
Allowing the function to make more search operations in one script run was a bit trickier. By default, the function makes use of a temporary table to build its results. This temporary table is automatically cleared after the request. But we want to clear it earlier. So I added a few lines at the end of the function.
if (($count = db_result(db_query('SELECT COUNT(*) FROM temp_search_results'))) > 0) {
db_query_temporary("DROP table temp_search_results");
}
if (($count = db_result(db_query('SELECT COUNT(*) FROM temp_search_sids'))) > 0) {
db_query_temporary("DROP table temp_search_sids");
}
Below you find the complete script. You can also download it. The script defines a few parameters. You can specify
- the vocabulary from which the terms should be taken
- the minimum score a result has to have to be taken into account
- a run mode (verbose/run)
- the node type you want to run the script upon
To run the script, put it in the root of your Drupal installation. To get the best results, play with the MIN_SCORE parameter.
Please make sure all your nodes are indexed by Drupal's search before you run this script. Otherwise you won't get any results.
define('VOCABULARY', 1); // Vocabulary from which the terms should be taken
define('MIN_SCORE', 0.3); // Minimum score for the search index. Results lower then this are not taken into account
define('VERBOSE', TRUE); // TRUE = Verbose run : Display what nodes will get what terms
define('RUN', TRUE); // TRUE = Fill in the terms in Drupal / FALSE = Don't fill them in, we want to see the verbose information first
define('NODE_TYPE', 'news'); // Only nodes of this node type will be taken into account
// Bootstrapping Drupal
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
// Logging in as admin user
global $user;
$user->uid = 1;
// Getting all terms from our vocabulary
$sql_terms = "SELECT tid, name FROM {term_data} WHERE vid = %d";
$result_terms = db_query($sql_terms, VOCABULARY);
// Iterating over all terms
while($term = db_fetch_object($result_terms)) {
// Do a search in Drupal
$search_results = custom_do_search($term->name, 'node');
// Iterating over all results
foreach($search_results as $item) {
// We only want results with a decent score
if($item->score > MIN_SCORE) {
// Fetch the node info
$sql_node = "SELECT nid, type, title FROM {node} WHERE nid = %d";
$result_node = db_query($sql_node, $item->sid);
$node = db_fetch_object($result_node);
// Is the node of the correct type
if($node->type == NODE_TYPE) {
// If we're in verbose mode, print term, node title and search score
if(VERBOSE) {
print $term->name . " | " . $node->title . " | " . $item->score . "\n";
}
// Fill in the terms in Drupal if we need to
if(RUN) {
db_query("INSERT INTO {term_node} (nid, tid) VALUES (%d, %d)", $item->sid, $term->tid);
}
}
}
}
}
function custom_do_search($keywords, $type, $join1 = '', $where1 = '1', $arguments1 = array(), $select2 = 'i.relevance AS score', $join2 = '', $arguments2 = array(), $sort_parameters = 'ORDER BY score DESC') {
$query = search_parse_query($keywords);
if ($query[2] == '') {
form_set_error('keys', t('You must include at least one positive keyword with @count characters or more.', array('@count' => variable_get('minimum_word_size', 3))));
}
if ($query === NULL || $query[0] == '' || $query[2] == '') {
return array();
}
// First pass: select all possible matching sids, doing a simple index-based OR matching on the keywords.
// 'matches' is used to reject those items that cannot possibly match the query.
$conditions = $where1 .' AND ('. $query[2] .") AND i.type = '%s'";
$arguments = array_merge($arguments1, $query[3], array($type, $query[4]));
$result = db_query_temporary("SELECT i.type, i.sid, SUM(i.score * t.count) AS relevance, COUNT(*) AS matches FROM {search_index} i INNER JOIN {search_total} t ON i.word = t.word $join1 WHERE $conditions GROUP BY i.type, i.sid HAVING COUNT(*) >= %d", $arguments, 'temp_search_sids');
// Calculate maximum relevance, to normalize it
$normalize = db_result(db_query('SELECT MAX(relevance) FROM temp_search_sids'));
if (!$normalize) {
return array();
}
$select2 = str_replace('i.relevance', '('. (1.0 / $normalize) .' * i.relevance)', $select2);
// Second pass: only keep items that match the complicated keywords conditions (phrase search, negative keywords, ...)
$conditions = '('. $query[0] .')';
$arguments = array_merge($arguments2, $query[1]);
$result = db_query_temporary("SELECT i.type, i.sid, $select2 FROM temp_search_sids i INNER JOIN {search_dataset} d ON i.sid = d.sid AND i.type = d.type $join2 WHERE $conditions $sort_parameters", $arguments, 'temp_search_results');
if (($count = db_result(db_query('SELECT COUNT(*) FROM temp_search_results'))) == 0) {
return array();
}
$count_query = "SELECT $count";
// Do actual search query
//$result = pager_query("SELECT * FROM temp_search_results", 10, 0, $count_query); // This is the line that got removed from the original function implementation.
$result = db_query("SELECT * FROM temp_search_results"); // This is what we'll use now. We don't want any paging. We want it all!
$results = array();
while ($item = db_fetch_object($result)) {
$results[] = $item;
}
// Drop the temporary table to allow multiple searches in one run
if (($count = db_result(db_query('SELECT COUNT(*) FROM temp_search_results'))) > 0) {
db_query_temporary("DROP table temp_search_results");
}
if (($count = db_result(db_query('SELECT COUNT(*) FROM temp_search_sids'))) > 0) {
db_query_temporary("DROP table temp_search_sids");
}
return $results;
}
?>
| Attachment | Size |
|---|---|
| drupalcoder-autotag.php.zip | 2.46 KB |






Comments
Post new comment