Let the battle begin!

Search API vs Apachesolr

Drupal Dev Days

Drupal Search and Solr Wizardry

 

Matthias Hutterer
Nick Veenhof

 Introduction 

Nick Veenhof

@nick_vh

http://www.nickveenhof.be

Nick_vh @ Drupal.org

Matthias Hutterer

mh86 @ Drupal.org

  • Drupal contributions including Email Field and Taxonomy Manager
  • Working at epiqo
    • Building powerful searches for job portals
    • Main motivation for a new Search API

 Search API 

About

  • Framework for easily creating searches
  • Abstracts from data sources and backend implementations
  • Large ecosystem with extensions, e.g. backends
  • Facet API integration
  • Heavily based on Entity API
    • Provides metadata
    • Used for index and server configurations

Basic Structure

Index any data

  • Different datasources
  • One datasource: entities
  • Based on Entity API:
    • Each property can be indexed
    • Properties of related entities can be indexed

How to configure your index - Fields

Search API Views

  • Full Views support
  • Display any property of an entity
  • Use any indexed field as filter, argument or sort
  • Most code based on Entity API's views integration
  • By default: data retrieved via entity load
    • Can be bypassed ("Retrieve data from Solr" setting in server)
  • Alternative: Search API pages

Extensions

  • Backends
    • Apache Solr
    • Database
    • Fuzzy Search
    • Xapian
    • Elasticsearch (But we need you!)
    • ...

Extensions

  • Features
    • Search API Autocomplete
    • Attachments
    • Saved Searches
    • Location
    • Pretty Facets Paths
    • Slider (in progress)
    • ...
  • Current dev: Search API Statistics (GSOC Project)

 Search API Recipes 

API Overview

  • CRUD hooks for indexes and servers
  • Hooks for adding
    • data sources
    • backends
    • data alterations
    • processors
  • Hook fired when indexing items
  • Hook fired when executing a search

Implementing a Backend - Part I


/**
 * Implements hook_search_api_service_info().
 */
function search_api_solr_search_api_service_info() {
  $services['search_api_solr_service'] = array(
    'name' => t('Solr service'),
    'class' => 'SearchApiSolrService',
    'description' => t('Index items using an Apache Solr search server.'),
  );
  return $services;
}

            

Implementing a Backend - Part II

/**
 * Search service class using Solr server.
 */
class SearchApiSolrService extends SearchApiAbstractService {
  public function configurationForm(array $form, array &$form_state) {}
  public function supportsFeature($feature) {}
  public function addIndex(SearchApiIndex $index) {}
  public function removeIndex($index) {}
  public function indexItems(SearchApiIndex $index, array $items) {}
  public function deleteItems($ids = 'all', SearchApiIndex $index = NULL) {}
  public function search(SearchApiQueryInterface $query) {}
}
            

How to index custom fields

/**
 * Implements hook_entity_property_info_alter().
 */
function example_entity_property_info_alter(&$info) {
  $info['node']['properties']['example_random_number'] = array(
    'type' => 'integer',
    'label' => t('Random number'),
    'computed' => TRUE,
    'getter callback' => 'example_property_random_number_getter_callback',
  );
}

/**
 * Getter callback for a random number between 1 and 100.
 */
function example_property_random_number_getter_callback($item) {
  return mt_rand(1, 100);
}
          

Search API Query Alter

/**
 * Implements hook_search_api_query_alter().
 */
function example_search_api_query_alter(SearchApiQueryInterface $query) {
  $index = $query->getIndex();
  if ($index->machine_name == "my_search") {
    $query->condition('my_field', 'condition');
  }
}
          

Search API Solr Query Alter

/**
 * Implements hook_search_api_solr_query_alter().
 */
function example_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {
  // Switch to the Extended DisMax (edismax) query parser.
  $call_args['params']['defType'] = 'edismax';

  // Wrap search words with wildcards for partial matches.
  // Note: this is only an excerpt of the full code.
  $key_array = array();
  foreach (explode(' ', $query->getOriginalKeys()) as $k) {
    $key_array[] = "*$k*";
  }
  $call_args['query'] = implode(' ', $key_array);
}
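
The wildcard wrapping in the excerpt above can be pulled into a small standalone function, which makes it easier to test outside of Drupal. This is only a sketch: the function name is hypothetical and not part of Search API, and it does not escape Solr special characters. Keep in mind that leading wildcards tend to be expensive on large indexes.

```php
<?php

/**
 * Wraps each search word in wildcards, e.g. "robin hood" => "*robin* *hood*".
 *
 * Hypothetical helper for illustration; not part of Search API.
 */
function example_wrap_keys_with_wildcards($keys) {
  // Split on any run of whitespace and drop empty tokens.
  $words = preg_split('/\s+/', trim($keys), -1, PREG_SPLIT_NO_EMPTY);
  $wrapped = array();
  foreach ($words as $word) {
    // Strip wildcards the user already typed so we never emit "**word**".
    $word = trim($word, '*');
    if ($word !== '') {
      $wrapped[] = '*' . $word . '*';
    }
  }
  return implode(' ', $wrapped);
}

// Prints: *robin* *hood*
print example_wrap_keys_with_wildcards("  robin   hood ") . "\n";
```

Inside the query alter hook, the result could then be assigned to $call_args['query'].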
          

 Apachesolr Recipes 

Apache Solr

  • Open Source Enterprise Search Platform
  • Apache Foundation
  • Full-text search, highlighting, faceted search, clustering, rich document handling
  • Distributed
  • Replication/scalable
  • Java
  • REST-like HTTP interface, responses in XML, JSON and some other formats
  • Not Relational

How to create a Range/Slider facet

  • Facet API
  • Facet Api Slider
  • Works for any numeric type
  • Code for date sliders is pending
  • Quirk: when selecting the Slider widget, manually set the facet's query type to Numeric Range

How to index locations

  • Apachesolr geo, Apachesolr location
  • Theme your results to have maps based on those locations.
  • TIP: Add custom code to get nice range facets

How to index attachments

  • Apachesolr Attachments
    • Refactored
    • Recognized as an entity linked to the node entity
    • Local or remote extractions
    • Tip: Attachments to other entities? Easy to link them in custom code
    • Quirk: No media support, we need help! Someone? Anyone? :)

How to have a common index between Drupal 6 and 7

  • Apachesolr Multisitesearch
    • Refactored
    • Small module, most of it is in the core module.
    • 7.x-1.x API and schema = 6.x-3.x API and schema, meaning easier maintenance of custom code
    • Tip: Theme the results so you know where they are coming from

How can I reduce the number of HTTP requests when using thumbnails?

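/**
 * Implements hook_apachesolr_index_document_build().
 */
function HOOK_apachesolr_index_document_build(ApacheSolrDocument $document, $entity, $env_id) {
  // Warning: simplified, you need more checks!
  // Encode the image to base64. base64_encode_image() is not a PHP built-in;
  // it stands for a helper you have to provide yourself.
  $img64 = base64_encode_image($entity->field_image['filename']);
  // Add it to the binary dynamic index field.
  $document->setMultiValue('xm_field_img64', $img64);
}

/**
 * Implements hook_apachesolr_query_alter().
 */
function HOOK_apachesolr_query_alter(DrupalSolrQueryInterface $query) {
  // Return the encoded image with each result, so no extra image request is needed.
  $query->addParam('fl', 'xm_field_img64');
}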
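
The image encoding step used in the indexing hook above relies on a helper that PHP does not ship. A minimal sketch of such a helper could look like this. The function name, the size limit and its default value are assumptions made for illustration. In Drupal 7 the path would typically come from the image field item's 'uri' value.

```php
<?php

/**
 * Returns the base64-encoded contents of an image file, or NULL on failure.
 *
 * Hypothetical stand-in for the image encoding helper used on the slide.
 * The $max_bytes default is an arbitrary assumption.
 */
function example_base64_encode_image($path, $max_bytes = 51200) {
  if (!is_file($path) || !is_readable($path)) {
    return NULL;
  }
  // Skip large files: base64 grows the data by about a third, and
  // everything ends up stored in the index.
  $size = filesize($path);
  if ($size === FALSE || $size > $max_bytes) {
    return NULL;
  }
  $data = file_get_contents($path);
  return $data === FALSE ? NULL : base64_encode($data);
}

// Demo with a temporary file instead of a real thumbnail.
$tmp = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($tmp, 'abc');
// Prints: YWJj
print example_base64_encode_image($tmp) . "\n";
unlink($tmp);
```

Returning NULL instead of throwing lets the indexing hook simply skip the field when the file is missing or too large.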
          

How can I create my own Query object?

Creating a new query object, running a query and using fq/filters:
$solr = apachesolr_get_solr();
$query = apachesolr_drupal_query("custom", array('q' => 'mykeys'), 'sort_label asc', 'search/path');
$query->setSolrsort('sort_name', 'desc');
$query->addFilter('bundle', '(article OR page)');
$query->removeFilter('bundle');
$query->addParam('fq', "bundle:(article OR page)");
$query->addParam('fq', "field_date:[1970-12-31T23:59:59Z TO NOW]");
$resp = $query->search();
            

Solr internals

What do all these fq, fl, ... params mean?

  • Query (q)
    select/?q=Superhero
    http://lucene.apache.org/core/3_6_0/queryparsersyntax.html
  • sort, start, rows
    select/?q=Superhero&start=0&rows=10&sort=sort_name+asc
    Integers can be sorted directly, but for strings you need to use sort_yourfield
  • Filter Query (fq)
    select/?q=Superhero&fq=bundle:person&fq=attribute:cape
  • Fields (to return) (fl)
    select/?q=Superhero&fl=id,entity_id,name,attribute,score,...
  • Highlighting (hl, hl.q, hl.fl)
    select/?q=Superhero&hl=true&hl.q=super&hl.fl=name,content,comments
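
To experiment with the parameters above, it can help to build the select path from an array. The following is a sketch in plain PHP. The function name is hypothetical and not part of either module. Array values are turned into repeated parameters, because Solr expects fq=a&fq=b and not the fq[0]=a&fq[1]=b form that http_build_query() produces.

```php
<?php

/**
 * Builds a Solr select path from a parameter array.
 *
 * Hypothetical helper for illustration. Array values become repeated
 * parameters, e.g. two fq values give "fq=...&fq=...".
 */
function example_solr_select_path(array $params) {
  $pairs = array();
  foreach ($params as $name => $values) {
    foreach ((array) $values as $value) {
      $pairs[] = rawurlencode((string) $name) . '=' . rawurlencode((string) $value);
    }
  }
  return 'select/?' . implode('&', $pairs);
}

print example_solr_select_path(array(
  'q' => 'Superhero',
  'fq' => array('bundle:person', 'attribute:cape'),
  'fl' => 'id,entity_id,name,score',
  'start' => 0,
  'rows' => 10,
  'sort' => 'sort_name asc',
)) . "\n";
```

The values are URL-encoded, so the colon in bundle:person appears as %3A in the resulting path.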

Wait, I've seen many others? DisMax/eDisMax

  • defType
    select/?q=Superhero AND evil&defType=edismax
  • Alternative Query (q.alt)
    select/?q.alt=bundle:person
    Can only be used if your q param is empty or not specified.
  • Query fields (qf)
    select/?q=Superhero&qf=teaser^2.0
    Field type boosting.
  • Phrase Fields (pf)
    select/?q=Robin Hood&pf=name^10
    Document type boosting.

How to use Elevate.xml

<elevate>
<query text="Superman"><doc id="HASH/node/248813" /></query>
</elevate>
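
If many elevations have to be maintained, the file can be generated instead of edited by hand. The sketch below renders the same structure from a PHP array. The function name is hypothetical, and it assumes the document ids match the ids in your index and that the query elevation component is enabled in solrconfig.xml.

```php
<?php

/**
 * Renders elevate.xml content from a map of query text => document ids.
 *
 * Hypothetical helper for illustration; not part of either module.
 */
function example_elevate_xml(array $elevations) {
  $lines = array('<elevate>');
  foreach ($elevations as $text => $doc_ids) {
    // Escape attribute values so characters like & or " keep the XML valid.
    $lines[] = '  <query text="' . htmlspecialchars((string) $text, ENT_QUOTES, 'UTF-8') . '">';
    foreach ($doc_ids as $doc_id) {
      $lines[] = '    <doc id="' . htmlspecialchars((string) $doc_id, ENT_QUOTES, 'UTF-8') . '" />';
    }
    $lines[] = '  </query>';
  }
  $lines[] = '</elevate>';
  return implode("\n", $lines) . "\n";
}

print example_elevate_xml(array(
  'Superman' => array('HASH/node/248813'),
));
```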
          

Why are the schema and solrconfig.xml different in Search Api Solr and Apachesolr?


http://drupal.org/sandbox/cpliakas/1600962

How can I debug Solr?


select/?q=Robin Hood&somecomplexthing&debugQuery=on&debug=on
select/?q=Robin Hood&somecomplexthing&indent=true
admin/analysis.jsp?highlight=on
Jetty console (live query log when starting with "java -jar start.jar")

How can I enable replication in Solr?


#solrcore.properties file
enable.master=false
enable.slave=true
poll_time=00:02:00
master_core_url=http://localhost:8983/solr/MYMASTERCORE
          
Support for this file is not yet committed to both projects, but the initiative is making sure it will be.

How can I monitor my Solr?


New Relic + mbeans (nagios, ...)

Performance & Drupal

Performance testing Acquia Search

  • MergePolicies
    • LogByteSizeMergePolicy (1.4)
    • LogDocMergePolicy (1.4)
    • TieredMergePolicy (3.x)
  • Jmeter/Apache Access Logs
  • CentOS 5 - 2.6.18-xenU-ec2-v1.0 vs Ubuntu 10.04.4 - 2.6.32-341-ec2

Specifications of the Master

Large Instance (M1.large)
7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)

Specifications of the Slave

High-CPU Medium Instance (C1.medium)
1.7 GB of memory
5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)

Performance Graphs (Buytaert.net)

Performance Graphs

Performance conclusions

  • Keep LogByteSizeMergePolicy with merge factor 4
  • TieredMergePolicy very interesting. Completely different
  • Solr 3.5 faster than Solr 1.4.1
  • Don't rely on default settings
  • Set Lucene version explicitly

 Future of Solr Search 

Joint forces


  • Apache Solr Common Configurations
  • Acquia Search for Search API
  • No plans for Drupal 8 core
  • GSOC Project Search API Statistics
  • More geo and mapping integration
  • Search API with Apachesolr backend?
  • Ideas?

 Questions? 

Thanks for your attention!