Let the battle begin!
Search API vs Apachesolr

Drupal Dev Days
Drupal Search and Solr Wizardry


Matthias Hutterer
Nick Veenhof
Introduction

Nick Veenhof
Matthias Hutterer
- Drupal contributions including Email Field and Taxonomy Manager
- Working at epiqo
- Building powerful searches for job portals
- Main motivation for a new Search API
Search API
About
- Framework for easily creating searches
- Abstracts from data sources and backend implementations
- Large ecosystem with extensions, e.g. backends
- Facet API integration
- Heavily based on Entity API
  - Provides metadata
  - Used for index and server configurations
Basic Structure

Index any data
- Different datasources
- One datasource: entities
- Based on Entity API:
  - Each property can be indexed
  - Properties of related entities can be indexed
How to configure your index - Fields

Search API Views
- Full Views support
- Display any property of an entity
- Use any indexed field as filter, argument or sort
- Most code based on Entity API's views integration
- By default: data retrieved via entity load
- Can be bypassed ("Retrieve data from Solr" setting in server)
- Alternative: Search API pages
Extensions
- Backends
  - Apache Solr
  - Database
  - Fuzzy Search
  - Xapian
  - Elasticsearch (But we need you!)
  - ...
Extensions
- Features
  - Search API Autocomplete
  - Attachments
  - Saved Searches
  - Location
  - Pretty Facets Paths
  - Slider (in progress)
  - ...
- Current dev: Search API Statistics (GSOC Project)
Search API Recipes
API Overview
- CRUD hooks for indexes and servers
- Hooks for adding
  - data sources
  - backends
  - data alterations
  - processors
- Hook fired when indexing items
- Hook fired when executing a search
Implementing a Backend - Part I
/**
 * Implements hook_search_api_service_info().
 */
function search_api_solr_search_api_service_info() {
  $services['search_api_solr_service'] = array(
    'name' => t('Solr service'),
    'class' => 'SearchApiSolrService',
    'description' => t('Index items using an Apache Solr search server.'),
  );
  return $services;
}
Implementing a Backend - Part II
/**
 * Search service class using Solr server.
 */
class SearchApiSolrService extends SearchApiAbstractService {
  public function configurationForm(array $form, array &$form_state) {}
  public function supportsFeature($feature) {}
  public function addIndex(SearchApiIndex $index) {}
  public function removeIndex($index) {}
  public function indexItems(SearchApiIndex $index, array $items) {}
  public function deleteItems($ids = 'all', SearchApiIndex $index = NULL) {}
  public function search(SearchApiQueryInterface $query) {}
}
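To make the role of these methods concrete, here is a toy in-memory backend. It is a sketch in Python, not PHP, and it is not the Search API class hierarchy: the class name `InMemoryService`, the snake_case method names and the substring matching are all assumptions made for illustration. It only mirrors the add index / index items / delete items / search responsibilities listed above.

```python
class InMemoryService:
    """Toy backend: stores items per index and does substring matching."""

    def __init__(self):
        # index name -> {item id: indexed text}
        self.indexes = {}

    def add_index(self, index_name):
        self.indexes.setdefault(index_name, {})

    def remove_index(self, index_name):
        self.indexes.pop(index_name, None)

    def index_items(self, index_name, items):
        """Store items keyed by id and return the ids that were stored."""
        self.indexes[index_name].update(items)
        return list(items)

    def delete_items(self, index_name, ids="all"):
        """Delete the given ids, or everything when ids is 'all'."""
        if ids == "all":
            self.indexes[index_name].clear()
        else:
            for item_id in ids:
                self.indexes[index_name].pop(item_id, None)

    def search(self, index_name, keys):
        """Return ids of items whose text contains the keys (case-insensitive)."""
        needle = keys.lower()
        return [
            item_id
            for item_id, text in self.indexes[index_name].items()
            if needle in text.lower()
        ]


service = InMemoryService()
service.add_index("my_search")
service.index_items("my_search", {1: "Superhero with cape", 2: "Evil villain"})
print(service.search("my_search", "cape"))
```

A real backend would translate the query object into the native query language of its server instead of scanning a dictionary.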
How to index custom fields
/**
 * Implements hook_entity_property_info_alter().
 */
function example_entity_property_info_alter(&$info) {
  $info['node']['properties']['example_random_number'] = array(
    'type' => 'integer',
    'label' => t('Random number'),
    'computed' => TRUE,
    'getter callback' => 'example_property_random_number_getter_callback',
  );
}

/**
 * Getter callback for a random number between 1 and 100.
 */
function example_property_random_number_getter_callback($item) {
  return mt_rand(1, 100);
}
Search API Query Alter
/**
 * Implements hook_search_api_query_alter().
 */
function example_search_api_query_alter(SearchApiQueryInterface $query) {
  $index = $query->getIndex();
  if ($index->machine_name == "my_search") {
    $query->condition('my_field', 'condition');
  }
}
Search API Solr Query Alter
/**
 * Implements hook_search_api_solr_query_alter().
 */
function example_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {
  // Switch the Solr query parser to edismax.
  $call_args['params']['defType'] = 'edismax';
  // Wrap search words with wildcards for partial matches.
  // Note: only excerpt of full code.
  $key_array = array();
  foreach (explode(' ', $query->getOriginalKeys()) as $k) {
    $key_array[] = "*$k*";
  }
  $call_args['query'] = implode(' ', $key_array);
}
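The wildcard wrapping in this hook can be tried outside Drupal. A minimal Python sketch, assuming whitespace-separated keywords; `wrap_keys` is a hypothetical helper name, not part of Search API or Solr:

```python
def wrap_keys(keys: str) -> str:
    """Wrap each whitespace-separated keyword in '*' for partial matching."""
    # split() without arguments drops the empty tokens caused by repeated
    # spaces; splitting on a single space would turn them into '**'.
    return " ".join(f"*{word}*" for word in keys.split())


print(wrap_keys("robin hood"))
```

Leading wildcards can be expensive on large indexes, so this kind of rewrite is worth measuring before it goes live.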
Apachesolr Recipes
Apache Solr
- Open Source Enterprise Search Platform
- Apache Foundation
- Full-text search, highlighting, faceted search, clustering, rich document handling
- Distributed
- Replication/scalable
- Java
- REST-like HTTP interface; answers in XML, JSON and some other formats
- Not Relational
How to create a Range/Slider facet
- Facet API
- Facet API Slider
- Works for any numeric type
- Code for date sliders is pending
- Quirk: in the facet settings, manually set the query type to Numeric Range when selecting the Slider widget

How to index locations
- Apachesolr geo, Apachesolr location
- Theme your results to have maps based on those locations.
- TIP: Add custom code to get nice range facets
How to index attachments
- Apachesolr Attachments
- Refactored
- Recognized as an entity linked to the node entity
- Local or remote extractions
- Tip: Attachments to other entities? Easy to link them in custom code
- Quirk: No media support, we need help! Someone? Anyone? :)
How to have a common index between Drupal 6 and 7
- Apachesolr Multisitesearch
- Refactored
- Small module, most of it is in the core module.
- 7.x-1.x API and schema = 6.x-3.x API and schema, meaning easier maintenance of custom code
- Tip: Theme the results so you know where they are coming from
How can I reduce the number of HTTP requests when using thumbnails?
function HOOK_apachesolr_index_document_build(ApacheSolrDocument $document, $entity, $env_id) {
  // Warning: simplified, you need more checks!
  // Encode the image to base64. base64_encode_image() stands for your own
  // helper; it is not a PHP or Apachesolr function.
  $img64 = base64_encode_image($entity->field_image['filename']);
  // Add it to the binary dynamic index field.
  $document->setMultiValue('xm_field_img64', $img64);
}

function HOOK_apachesolr_query_alter(DrupalSolrQueryInterface $query) {
  // Ask Solr to return the stored image with each result.
  $query->addParam('fl', 'xm_field_img64');
}
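The idea is to store the encoded thumbnail in the index so that a result page does not need one extra request per image. The encoding step, plus one possible way to display the stored value, can be sketched in Python. Rendering the value as a data URI is an assumption of this sketch, and `to_data_uri` is a hypothetical helper name:

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URI usable in an <img src="...">."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# A tiny stand-in payload; a real thumbnail would be read from disk.
uri = to_data_uri(b"thumbnail-bytes")
print(uri)
```

Base64 grows the payload by roughly a third, so this only pays off for small thumbnails.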
How can I create my own Query object?
New query object, how to run a query, and how to use fq filters:

$solr = apachesolr_get_solr();
$query = apachesolr_drupal_query("custom", array('q' => 'mykeys'), 'sort_label asc', 'search/path');
$query->setSolrsort('sort_name', 'desc');
// Filter on a field; the value must be passed as a string.
$query->addFilter('bundle', '(article OR page)');
$query->removeFilter('bundle');
// The same filter expressed as a raw fq parameter.
$query->addParam('fq', "bundle:(article OR page)");
$query->addParam('fq', "field_date:[1970-12-31T23:59:59Z TO NOW]");
$resp = $query->search();
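The two fq strings used here follow simple patterns that are easy to generate. A minimal Python sketch, where `any_of` and `date_range` are hypothetical helper names and the values are assumed to need no escaping:

```python
def any_of(field, values):
    """Build a filter query matching any of the given values."""
    return f"{field}:({' OR '.join(values)})"


def date_range(field, start, end="NOW"):
    """Build an inclusive range filter on a date field."""
    return f"{field}:[{start} TO {end}]"


filters = [
    any_of("bundle", ["article", "page"]),
    date_range("field_date", "1970-12-31T23:59:59Z"),
]
print(filters)
```

Values containing spaces or query syntax characters would need quoting or escaping first, which this sketch leaves out.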
Solr internals
What do all these fq, fl, ... params mean?
- Query (q)
  select/?q=Superhero
  http://lucene.apache.org/core/3_6_0/queryparsersyntax.html
- sort, start, rows
  select/?q=Superhero&start=0&rows=10&sort=sort_name+asc
  Can sort on integers, but for strings you need to use sort_yourfield
- Filter Query (fq)
  select/?q=Superhero&fq=bundle:person&fq=attribute:cape
- Fields (to return) (fl)
  select/?q=Superhero&fl=id,entity_id,name,attribute,score,...
- Highlighting (hl, hl.q, hl.fl)
  select/?q=Superhero&hl=true&hl.q=super&hl.fl=name,content,comments
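The parameters above combine into one request URL. A minimal Python sketch that assembles it with proper URL encoding; `build_select_url` is a hypothetical helper, and the base URL assumes a local Solr on port 8983:

```python
from urllib.parse import urlencode


def build_select_url(base, q, filters=(), fields=(), sort=None, start=0, rows=10):
    """Build a Solr select URL from q, fq, fl, sort, start and rows."""
    params = [("q", q)]
    # Each filter query is sent as its own fq parameter.
    params += [("fq", fq) for fq in filters]
    if fields:
        params.append(("fl", ",".join(fields)))
    if sort:
        params.append(("sort", sort))
    params += [("start", start), ("rows", rows)]
    return f"{base}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr",  # assumed local instance
    "Superhero",
    filters=["bundle:person", "attribute:cape"],
    fields=["id", "entity_id", "name", "score"],
    sort="sort_name asc",
)
print(url)
```

Keeping fq values as separate parameters matters because Solr caches each filter query independently of the main query.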
Wait, I've seen many others..? Dismax/Edismax
- defType
  select/?q=Superhero AND evil&defType=edismax
- Alternative Query (q.alt)
  select/?q.alt=bundle:person
  Can only be used if your q param is empty or not specified.
- Query fields (qf)
  select/?q=Superhero&qf=teaser^2.0
  Field boosting: matches in the listed field count more.
- Phrase Fields (pf)
  select/?q=Robin Hood&pf=name^10
  Phrase boosting: documents where the terms appear together in the listed field rank higher.
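The qf and pf values are just space-separated `field^boost` pairs. A minimal Python sketch for building them; `edismax_params` is a hypothetical helper and the field names and boosts are example values:

```python
from urllib.parse import urlencode


def edismax_params(keys, query_fields, phrase_fields=None):
    """Return edismax request parameters with per-field boosts."""

    def boosts(fields):
        # {"teaser": 2.0} becomes "teaser^2.0"
        return " ".join(f"{name}^{boost}" for name, boost in fields.items())

    params = {"defType": "edismax", "q": keys, "qf": boosts(query_fields)}
    if phrase_fields:
        params["pf"] = boosts(phrase_fields)
    return params


params = edismax_params("Robin Hood", {"teaser": 2.0, "name": 1.0}, {"name": 10})
print(urlencode(params))
```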
How to use Elevate.xml
<elevate>
  <query text="Superman">
    <doc id="HASH/node/248813" />
  </query>
</elevate>
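If the elevated documents come from a list rather than being typed by hand, the file can be generated. A minimal Python sketch using the standard library; `build_elevate` is a hypothetical helper, and the document id is the placeholder from the slide:

```python
import xml.etree.ElementTree as ET


def build_elevate(rules):
    """Build an <elevate> element from {query text: [document ids]}."""
    root = ET.Element("elevate")
    for text, doc_ids in rules.items():
        query = ET.SubElement(root, "query", {"text": text})
        for doc_id in doc_ids:
            ET.SubElement(query, "doc", {"id": doc_id})
    return root


elevate = build_elevate({"Superman": ["HASH/node/248813"]})
print(ET.tostring(elevate, encoding="unicode"))
```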
Why are the schema and solrconfig.xml different in Search API Solr and Apachesolr?

http://drupal.org/sandbox/cpliakas/1600962
How can I debug Solr?
select/?q=Robin Hood&somecomplexthing&debugQuery=on&debug=on
select/?q=Robin Hood&somecomplexthing&indent=true
admin/analysis.jsp?highlight=on
Jetty console (live query log when starting with "java -jar start.jar")
How can I enable replication in Solr?

# solrcore.properties file
enable.master=false
enable.slave=true
poll_time=00:02:00
master_core_url=http://localhost:8983/solr/MYMASTERCORE

This file, or support for it, is not yet committed to both projects, but the initiative is making sure it will be.
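A quick way to sanity-check such a file before deploying it is to parse it and look at the values. A minimal Python sketch, assuming plain key=value lines; `parse_properties` and `poll_seconds` are hypothetical helpers, not part of Solr:

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def poll_seconds(hms):
    """Convert an HH:MM:SS interval to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


props = parse_properties("""
# solrcore.properties
enable.master=false
enable.slave=true
poll_time=00:02:00
master_core_url=http://localhost:8983/solr/MYMASTERCORE
""")
print(poll_seconds(props["poll_time"]))
```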
How can I monitor my Solr?

New Relic + MBeans (Nagios, ...)
Performance & Drupal
Performance testing Acquia Search
- MergePolicies
  - LogByteSizeMergePolicy (1.4)
  - LogDocMergePolicy (1.4)
  - TieredMergePolicy (3.x)
- JMeter / Apache access logs
- CentOS 5 - 2.6.18-xenU-ec2-v1.0 vs Ubuntu 10.04.4 - 2.6.32-341-ec2
Specifications of the Master
Large Instance (M1.large)
7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
Specifications of the Slave
High-CPU Medium Instance (C1.medium)
1.7 GB of memory
5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)
Performance Graphs (Buytaert.net)
Performance Graphs
Performance conclusions
- Keep LogByteSizeMergePolicy with factor 4
- TieredMergePolicy is very interesting, but completely different
- Solr 3.5 faster than Solr 1.4.1
- Don't rely on default settings
- Set Lucene version explicitly
Future of Solr Search
Joint forces

- Apache Solr Common Configurations
- Acquia Search for Search API
- No plans for Drupal 8 core
- GSOC Project Search API Statistics
- More geo and mapping integration
- Search API with an Apachesolr backend?
- Ideas?