One thing becomes clear if you use Drupal for a while: the built-in search is poor. This isn't unexpected, as search is a hard problem to solve. And the best way to get around hard problems is to find someone who has solved them well and use their work.
There are a number of products out there, including Google, Sphinx, and Thunderstone, some of which already have Drupal integration to one extent or another. Another product that has gotten a lot of interest recently is Apache Solr, a search server that sits on top of Apache's Lucene search library. Solr's advantage for Drupal is that it indexes nodes, not pages. This means it has access to attributes of the node that are not readily parsable from the rendered page, and these attributes can be used to filter the results.
Want to filter the search results by a CCK content type? Just expose the filter block and away you go. But that sort of filtering comes out of the box. The fun part happens when you want to add your own items to the index and make them available as facets.
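Under the hood, filtering on an indexed attribute comes down to Solr request parameters: the keywords go in q, each filter becomes an fq (filter query), and facet.field asks Solr to return a count for each value of a field. The sketch below only assembles such a query string in plain PHP to show the shape of the request; the Apache Solr module builds these requests for you. The helper name, the type field, and the story value are assumptions for illustration.

```php
<?php
// Hypothetical helper: builds the query string for a Solr select request.
// q holds the keywords, each filter becomes its own fq (filter query)
// parameter, and facet.field asks Solr to count results per field value.
// NOTE: values are not escaped for Solr query syntax; real code must
// quote or escape them.
function mymodule_sketch_solr_query($keywords, array $filters, array $facet_fields)
{
  $params = array('q=' . rawurlencode($keywords));
  foreach ($filters as $field => $value) {
    // Solr expects repeated fq parameters, not PHP-style fq[0]=... arrays,
    // which is why http_build_query() is not used here.
    $params[] = 'fq=' . rawurlencode($field . ':' . $value);
  }
  $params[] = 'facet=true';
  // Leave out facet values that match no documents.
  $params[] = 'facet.mincount=1';
  foreach ($facet_fields as $field) {
    $params[] = 'facet.field=' . rawurlencode($field);
  }
  return implode('&', $params);
}

// Keyword search narrowed to one content type, with per-type counts.
// The field name "type" and the value "story" are assumptions.
echo mymodule_sketch_solr_query('drupal', array('type' => 'story'), array('type')), "\n";
```

Keeping each filter in its own fq, rather than folding it into q, also lets Solr cache the filter results separately from the keyword query.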
On a recent project, we captured a lot of content with geographic data using the Location module (Country, State/Province, City, Postal Code), and we wanted to expose this data to Solr. The first part is pretty easy: you just put it into the search index. To do that, you implement hook_apachesolr_update_index(), which has the following signature:
hook_apachesolr_update_index(&$doc, $node)
It's pretty self-explanatory: it takes the document being sent to the search index and the node being indexed. To add things to the search index, just add them as attributes of the document, as below:
function mymodule_apachesolr_update_index(&$doc, $node)
{
  // Only add location fields when the node actually has a saved location.
  if (!empty($node->location['lid']))
  {
    $doc->sm_name = $node->location['name'];
    $doc->sm_street = $node->location['street'];
    $doc->sm_city = $node->location['city'];
    $doc->sm_province = $node->location['province'];
    $doc->sm_postal = $node->location['postal'];
    $doc->sm_country = $node->location['country'];
    $doc->fm_lat = $node->location['latitude'];
    $doc->fm_lng = $node->location['longitude'];
    $doc->sm_province_name = $node->location['province_name'];
    $doc->sm_country_name = $node->location['country_name'];
  }
}
You'll note that each of the document attributes is prefixed with either "sm_" or "fm_". The prefix matches the dynamic field definitions in the module's Solr schema: the first letter gives the type ("s" for string, "f" for float) and the "m" marks the field as multi-valued. In fact, the code above handles only a single lid because that was all we were capturing, but it could readily be adapted to multiple locations. Now, whenever a node with a location is indexed, those attributes are indexed along with it. Great. But now we need to actually retrieve this information and use it to filter results.
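Since the "m" prefixes already allow multiple values, adapting the code to several locations mostly means collecting arrays instead of single values. Here is a minimal sketch in plain PHP. It assumes, without verifying it here, that the Location module puts every location in $node->locations with the same keys as $node->location, and that assigning an array to a document attribute produces a multi-valued Solr field. The helper name is hypothetical.

```php
<?php
// Hypothetical helper: gather values from every location on a node into
// multi-valued fields. ASSUMPTIONS: $node->locations is an array of
// location arrays keyed like $node->location above, and assigning a PHP
// array to a document attribute yields a multi-valued Solr field.
function mymodule_sketch_location_fields($node)
{
  // Document attribute => key in each location array.
  $map = array(
    'sm_city' => 'city',
    'sm_province' => 'province',
    'sm_country' => 'country',
    'fm_lat' => 'latitude',
    'fm_lng' => 'longitude',
  );
  $fields = array();
  if (empty($node->locations)) {
    return $fields;
  }
  foreach ($node->locations as $location) {
    // Skip empty location slots, as the single-location code does.
    if (empty($location['lid'])) {
      continue;
    }
    foreach ($map as $field => $key) {
      if (!isset($location[$key]) || $location[$key] === '') {
        continue;
      }
      // fm_ attributes hold floats; sm_ attributes hold strings.
      $fields[$field][] = (strpos($field, 'fm_') === 0)
        ? (float) $location[$key]
        : (string) $location[$key];
    }
  }
  return $fields;
}

// Inside mymodule_apachesolr_update_index(), each entry would then be
// copied onto the document:
//   foreach (mymodule_sketch_location_fields($node) as $field => $values) {
//     $doc->{$field} = $values;
//   }
```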
To filter results, we have to tell the Apache Solr module which facets our module exposes. That is done by implementing hook_apachesolr_facets() and returning an array of facets like the following:
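function mymodule_apachesolr_facets() {
  $facets = array();
  $keys = _mymodule_location_key();
  // One facet (and later one block) per location level: Country, Province, City.
  foreach ($keys as $id => $key)
  {
    $facets[$key] = array(
      'info' => t('Apache Solr Location: Filter by Location: @type', array('@type' => $id)),
      'facet_field' => $key,
      // e.g. mymodule_country_name(), which turns a stored code into a label.
      'display_callback' => 'mymodule_' . strtolower($id) . '_name',
    );
  }
  return $facets;
}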
function _mymodule_location_key()
{
  // apachesolr_index_key() turns each definition into the indexed field
  // name, e.g. 'sm_country' for a multi-valued string named 'country'.
  $keys = array();
  $keys['Country'] = apachesolr_index_key(array(
    'name' => 'country',
    'multiple' => TRUE,
    'index_type' => 'string',
  ));
  $keys['Province'] = apachesolr_index_key(array(
    'name' => 'province',
    'multiple' => TRUE,
    'index_type' => 'string',
  ));
  $keys['City'] = apachesolr_index_key(array(
    'name' => 'city',
    'multiple' => TRUE,
    'index_type' => 'string',
  ));
  return $keys;
}
This returns an array of facets. Each one has an appropriate title, a 'facet_field' that matches what we declared in hook_apachesolr_update_index (e.g. sm_city, sm_province), and a display callback that returns the text shown to the user for the facet. The callback matters because we index the province and country codes rather than their names (state abbreviations such as VA, DC, and MD in the U.S., and ISO 3166 country codes such as us), and you probably want to give the user a better experience than that.
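The display callbacks themselves aren't shown here. A minimal sketch of the country one might look like this, using the name the facet code above generates (mymodule_country_name). The three-entry lookup table is a stand-in for a full ISO 3166 list, and the callback signature (facet value first, optional options array second) is an assumption, so check how your version of the Apache Solr module invokes display callbacks.

```php
<?php
// Sketch of the Country display callback. ASSUMPTION: the module passes
// the raw facet value as the first argument; the optional $options
// argument is accepted but unused here.
function mymodule_country_name($facet, $options = array())
{
  // Stand-in table; a real site would use a complete ISO 3166 list.
  static $names = array(
    'ca' => 'Canada',
    'mx' => 'Mexico',
    'us' => 'United States',
  );
  $code = strtolower($facet);
  // Fall back to the raw code so an unknown value still displays.
  return isset($names[$code]) ? $names[$code] : $facet;
}
```

Falling back to the raw code keeps the facet usable even when the table is incomplete, at the cost of occasionally showing a bare code.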
The next part is to actually implement the blocks that show the facets. This is done through your standard hook_block:
function mymodule_block($op = 'list', $delta = 0, $edit = array()) {
  switch ($op) {
    case 'list':
      $enabled_facets = apachesolr_get_enabled_facets('mymodule');
      $facets = mymodule_apachesolr_facets();
      // Add a block for each enabled facet that this module exposes.
      $blocks = array();
      foreach ($enabled_facets as $delta => $facet_field)
      {
        if (isset($facets[$delta])) {
          $blocks[$delta] = $facets[$delta] + array('cache' => BLOCK_CACHE_PER_PAGE);
        }
      }
      return $blocks;
    case 'view':
      if (apachesolr_has_searched()) {
        $keys = _mymodule_location_key();
        $response = apachesolr_static_response_cache();
        $query = apachesolr_current_query();
        if (empty($response))
        {
          return;
        }
        // Initialize here so the call below never sees an undefined variable.
        $facets = array();
        if (isset($response->facet_counts->facet_fields->{$delta}) && is_object($response->facet_counts->facet_fields->{$delta}))
        {
          switch ($delta)
          {
            case $keys['Country']:
              $facets = _mymodule_country_list($keys, $query, $response);
              break;
            case $keys['Province']:
              // TODO: Add default value for country in configuration
              $facets = _mymodule_province_list($keys, $query, $response, 'us');
              break;
            case $keys['City']:
              $facets = _mymodule_city_list($keys, $query, $response);
              break;
          }
        }
        $items = apachesolr_search_nested_facet_items($query, $facets, $response->response->numFound);
        if ($items && ($response->response->numFound > 0))
        {
          // Display limit saved by the configure form below. These are the
          // variable names used by the Apache Solr 6.x module for its own
          // facet blocks; verify them against your version.
          $initial_limits = variable_get('apachesolr_facet_query_initial_limits', array());
          $limit_default = variable_get('apachesolr_facet_query_initial_limit_default', 10);
          $limit = isset($initial_limits['mymodule'][$delta]) ? $initial_limits['mymodule'][$delta] : $limit_default;
          return array(
            'subject' => t('Filter by @type', array('@type' => array_search($delta, $keys))),
            'content' => theme('apachesolr_facet_list', $items, $limit),
          );
        }
      }
      break;
    case 'configure':
      return apachesolr_facetcount_form('mymodule', $delta);
    case 'save':
      apachesolr_facetcount_save($edit);
      break;
  }
}
Let's tackle the easy parts first. Configure and save just call the standard Apache Solr facet form functions, passing the module name and $delta for configure and $edit for save. List is almost as simple: it gets the list of enabled facets, compares it to the facets exposed by the module, and returns a block for each one that matches.
The meat of the functionality is in the view case and the functions it calls. The first thing we do is get the current Solr query and response using the Apache Solr API functions. We can use these to determine which facets should be shown and how to add a facet to the URL. The facet functions themselves are pretty similar, so here is the country implementation as an example:
function _mymodule_country_list($keys, $query, $response)
{
  $key = $keys['Country'];
  $facets = array();
  foreach ($response->facet_counts->facet_fields->{$key} as $country => $count)
  {
    // '_empty_' stands for documents without a country; skip it.
    if ($country != '_empty_')
    {
      $active = $query->has_filter($key, $country);
      $facets[$country] = array(
        '#name' => $key,
        '#value' => $country,
        '#exclude' => FALSE,
        '#count' => $count,
        '#parent' => 0,
        '#children' => array(),
        '#has_children' => FALSE,
        '#active' => $active,
      );
      // If this country is already selected, nest its provinces below it.
      if ($active)
      {
        $provinces = _mymodule_province_list($keys, $query, $response, $country);
        if (!empty($provinces))
        {
          $facets[$country]['#has_children'] = TRUE;
          $facets[$country]['#children'] = $provinces;
        }
      }
    }
  }
  return $facets;
}
The Solr response contains the list of facets, with the count of results for each one, in $response->facet_counts->facet_fields. For example, $response->facet_counts->facet_fields->sm_country holds every country value present in the current Solr response, each paired with its result count (which is why the loop above reads it as $country => $count). Since we don't want to lead a user to a dead end, we're only concerned with countries that have results. In addition, we want the user to be able to drill down from country to province and then to city to find more specific results. So as we loop through the facets from the response, we check whether each one is currently active, and if so we fetch the available province facets and attach them as children. The province and city lists work pretty much the same way, just with different facets. Below is a screenshot showing it all in action:

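The province and city functions follow the same pattern as the country one. Here is a generalized sketch of that pattern in plain PHP, with the Drupal and Solr objects replaced by plain values so it runs on its own: $counts mimics one entry of $response->facet_counts->facet_fields (value => count), $active lists the values the current query filters on (standing in for $query->has_filter()), and $children is a hypothetical callback that builds the next level down. The function name and parameters are illustrative, not part of the Apache Solr module, and it keeps only a subset of the item keys used above.

```php
<?php
// Generalized sketch (hypothetical helper): build one level of nested
// facet items. $counts: value => count, like one facet_fields entry.
// $active: values the current query filters on.
// $children: optional callable returning the next level for an active value.
function mymodule_sketch_facet_level($field, $counts, array $active, $children = NULL)
{
  $items = array();
  foreach ($counts as $value => $count) {
    $value = (string) $value;
    // Skip the bucket for documents without a value, and any zero counts,
    // so the user is never offered a dead end.
    if ($value === '_empty_' || $count < 1) {
      continue;
    }
    $is_active = in_array($value, $active, TRUE);
    $item = array(
      '#name' => $field,
      '#value' => $value,
      '#count' => (int) $count,
      '#active' => $is_active,
      '#children' => array(),
      '#has_children' => FALSE,
    );
    // Only drill down below a value the user has already selected.
    if ($is_active && $children !== NULL) {
      $item['#children'] = call_user_func($children, $value);
      $item['#has_children'] = !empty($item['#children']);
    }
    $items[$value] = $item;
  }
  return $items;
}

// Example: countries with "us" selected and provinces nested below it.
$countries = json_decode('{"us": 12, "ca": 3, "_empty_": 5}');
$provinces = json_decode('{"VA": 7, "MD": 5, "ON": 0}');
$tree = mymodule_sketch_facet_level('sm_country', $countries, array('us'),
  function ($country) use ($provinces) {
    // A real callback would use $country to limit the provinces.
    return mymodule_sketch_facet_level('sm_province', $provinces, array());
  });
```

Passing the next level in as a callback keeps one function for all three levels; the city level simply gets no callback.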
The great thing about Solr is that it can also integrate with Views. With Views 3, a view can query data sources other than the Drupal database, including Solr. The Apache Solr Views integration is still a little fragile and has a number of bugs, so if you want to go that route, read through its issue queue carefully. But it lets you do some great things that we weren't able to do before.
Forum One News
William was born in Bangkok, Thailand, and grew up in India, Panama, and Morocco. Now he delves into languages and cultures that are more of the programming variety. He has been developing on the...
Comments
Displaying in search results
Hello,
I'm wondering how to display these fields, e.g. "sm_postal", in search results.
I managed to get the data indexed, but it is not included in the search results by default.
Best,
Konrad
Displaying search results
I believe you can use hook_apachesolr_search_result to modify the search result and include values in the output.
thanks
Hey, I just wanted to say thanks for this post, one of the most useful posts on tweaking the Drupal Apache Solr module that I've come across. I am successfully adding more fields to the Solr index and adding custom facet blocks based on your example above.
Many thanks!
Rivulet ES is an open source
Rivulet ES is an open source enterprise search server based on the Lucene Java search library and Solr. Like Solr, Rivulet ES offers XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Tomcat. In addition, Rivulet ES adds a visual management and control platform, so most of Solr's functions can be accessed and used through the web. It can collect data from sources including file systems, network file systems, CMS and ECM platforms, and databases, and it can also collect from IBM's content management system and EMC Documentum. Rivulet ES's most important feature is the ability to customize how data from different sources is displayed.