Drupal coder

Apache Solr

Three things we learned from indexing a Drupal site with millions of nodes in Apache SOLR

For one of our clients, we are running a Drupal site with millions of nodes. Before launch, those nodes are imported from another database and then indexed into Apache SOLR. The total time to index all of these nodes into an empty SOLR instance is measured in days rather than hours or minutes.

That is a bit too long to run this import regularly. So my (Xdebug) profiler and I delved into the Apache SOLR module code to see where we could shave a few hours or days off the execution time.

It turned out that, in our case, three components were responsible for a large share of the execution time. Let's have a look.

BTW, we are using the latest dev build of version 2 of the Apache SOLR module.

September 06, 2010 | Apache Solr, Drupal, performance, search

Performance tip: disable Drupal's core search indexer when using Apache Solr

Here's a quick tip for people using the (simply awesome) Apache Solr Search Integration module.

The Apache Solr module depends on Drupal's core Search module, so the Search module is enabled as well when the Solr module is installed. As soon as the core Search module is enabled, it starts indexing all your nodes too. This not only takes time to run, but also fills up your database (the search_dataset, search_index, ... tables).

Most of the time when you install the Apache Solr Search module, you don't need Drupal's core search form, so you replace it with the Apache Solr one using the module setting "Make Apache Solr Search the default". That disables the core Search module's form, but not its indexing. To disable the indexing and save some CPU cycles and database space, go to your site's search settings page (admin/settings/search) and set "Number of items to index per cron run" to 0.

search-index-limit.png
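
If you prefer the command line, the same setting can probably be changed with Drush. This is only a sketch under a few assumptions that are not part of the tip above: Drush is installed, the commands are run from the site root, and the form field is stored in the search_cron_limit variable (the name Drupal 6 core's Search module uses for its cron limit).

```shell
# Assumption: Drush is available and this is run from the Drupal site root.
# Assumption: the "Number of items to index per cron run" field is stored
# in the search_cron_limit variable, as in Drupal 6 core.

# Stop the core Search module from indexing anything on cron.
drush vset search_cron_limit 0

# Check that the value was stored.
drush vget search_cron_limit
```

Changing the value on the settings page has the same effect, so use whichever fits your deployment workflow.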

Installing Apache Solr in Tomcat for Drupal on Snow Leopard

There is quite a lot of information available on how to install Apache Solr for your Drupal website. One of the best places to start is the Apache Solr Search Integration module documentation page. In this post I will gather all the bits and pieces for installing Solr in Tomcat on one specific platform: Snow Leopard. This is the platform I develop Drupal sites on, and the great thing is that it has all the needed Java stuff built in, so it's quite easy to install Solr and Tomcat. This method might also work on other systems that have Java 1.6 (probably with some minor adjustments), but I have not tested this.

February 15, 2010 | Apache Solr, Drupal, OS X, search, Tomcat

Book review (from a Drupal point of view): Solr 1.4 Enterprise Search Server

Search is ubiquitous. It's available on all sites, desktop applications, ... A good search engine is essential for letting your users find what they want. There are a lot of factors that define what makes a good search engine: speed, accuracy of the results, ...

Drupal already has a plethora of solutions to these problems. There is a search module in core. We also have integration with Google Search Appliance, Custom Search, Lucene, ... But most recently Apache Solr is hotter than hot and seems to be becoming the standard replacement for Drupal's core search solution. Even more so now that Acquia uses it as the core of one of its flagship products, Acquia Search.