Important

You are browsing the documentation for version 4.1 of OroCommerce, OroCRM and OroPlatform, which is no longer maintained. Read version 5.1 (the latest LTS version) of the Oro documentation to get up-to-date information.

See our Release Process documentation for more information on the currently supported and upcoming releases.

Website ElasticSearch Search Engine

Data Storage

The Elasticsearch search engine uses multiple indexes, one per website and entity, whose names are prefixed with a value from the parameters.yml file. The structure of these indexes is defined in the index mapping, which lists the indexes, their types (one per index), the fields and their declarations, and additional index settings. Each index contains custom settings, including the analyzer and tokenizer, and the mapping settings for its entity.
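Index names combine this prefix with the entity alias, in which a WEBSITE_ID placeholder is replaced by the website ID (compare the alias oro_product_WEBSITE_ID in the example configuration with the index oro_website_search_oro_product_1 in the generated mappings). The Python sketch below illustrates that naming scheme. It assumes, based only on the example index names on this page, that the prefix is oro_website_search and that it is joined to the expanded alias with an underscore; the helpers expand_alias and index_name are hypothetical.

```python
# Sketch of per-website index and type naming.
# Assumptions (inferred from the example names, not from Oro's code):
# the prefix is "oro_website_search" and it is joined with "_".

WEBSITE_ID_PLACEHOLDER = "WEBSITE_ID"


def expand_alias(alias_template: str, website_id: int) -> str:
    """Replace the placeholder: oro_product_WEBSITE_ID -> oro_product_1."""
    return alias_template.replace(WEBSITE_ID_PLACEHOLDER, str(website_id))


def index_name(prefix: str, alias_template: str, website_id: int) -> str:
    """Build the index name as <prefix>_<expanded alias>."""
    return f"{prefix}_{expand_alias(alias_template, website_id)}"


for website_id in (1, 2):
    # Prints the index name followed by the type name for each website.
    print(index_name("oro_website_search", "oro_product_WEBSITE_ID", website_id),
          expand_alias("oro_product_WEBSITE_ID", website_id))
```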

The WebsiteElasticSearchBundle reads the mapping configuration that defines the search indexes from website_search.yml files.

Example configuration:

    Oro\Bundle\ProductBundle\Entity\Product:
        alias: oro_product_WEBSITE_ID
        fields:
            -
                name: sku
                type: text
            -
                name: names_LOCALIZATION_ID
                type: text
            -
                name: all_text_LOCALIZATION_ID
                type: text
                store: false
            -
                name: all_text
                type: text
                default_search_field: true
                store: false

If your deployment hosts two websites with IDs 1 and 2, the following index settings and mappings are built automatically from the configuration above:

    {
      "oro_website_search_oro_product_1" : {
        "settings" : {
          "index" : {
            "mapping" : {
              "total_fields" : {
                "limit" : "10000000"
              }
            },
            "query" : {
              "default_field" : "all_text"
            },
            "max_result_window" : "10000000",
            "analysis" : {
              "filter" : {
                "substring" : {
                  "type" : "nGram",
                  "min_gram" : "1",
                  "max_gram" : "100"
                }
              },
              "analyzer" : {
                "fulltext_search_analyzer" : {
                  "filter" : [
                    "lowercase",
                    "unique"
                  ],
                  "tokenizer" : "whitespace"
                },
                "fulltext_index_analyzer" : {
                  "filter" : [
                    "lowercase",
                    "substring",
                    "unique"
                  ],
                  "char_filter" : [
                    "html_strip"
                  ],
                  "tokenizer" : "whitespace"
                }
              }
            }
          }
        },
        "mappings" : {
          "oro_product_1" : {
            "_all" : {
              "enabled" : false
            },
            "dynamic_templates" : [
              {
                "all_text_LOCALIZATION_ID" : {
                  "match" : "^all_text_[^_]+$",
                  "match_mapping_type" : "string",
                  "match_pattern" : "regex",
                  "mapping" : {
                    "fields" : {
                      "analyzed" : {
                        "type" : "text",
                        "search_analyzer" : "fulltext_search_analyzer",
                        "analyzer" : "fulltext_index_analyzer"
                      }
                    },
                    "store" : false,
                    "type" : "keyword"
                  }
                }
              },
              {
                "names_LOCALIZATION_ID" : {
                  "match" : "^names_[^_]+$",
                  "match_mapping_type" : "string",
                  "match_pattern" : "regex",
                  "mapping" : {
                    "fields" : {
                      "analyzed" : {
                        "type" : "text",
                        "search_analyzer" : "fulltext_search_analyzer",
                        "analyzer" : "fulltext_index_analyzer"
                      }
                    },
                    "store" : true,
                    "type" : "keyword"
                  }
                }
              }
            ],
            "properties" : {
              "all_text" : {
                "type" : "keyword",
                "fields" : {
                  "analyzed" : {
                    "type" : "text",
                    "analyzer" : "fulltext_index_analyzer",
                    "search_analyzer" : "fulltext_search_analyzer"
                  }
                }
              },
              "sku" : {
                "type" : "keyword",
                "store" : true,
                "fields" : {
                  "analyzed" : {
                    "type" : "text",
                    "analyzer" : "fulltext_index_analyzer",
                    "search_analyzer" : "fulltext_search_analyzer"
                  }
                }
              }
            }
          }
        }
      },
      "oro_website_search_oro_product_2" : {
        "settings" : {
            ...
        },
        "mappings" : {
          "oro_product_2" : {
            "_all" : {
              "enabled" : false
            },
            "dynamic_templates" : [
                ...
            ],
            "properties" : {
                ...
            }
          }
        }
      }
    }
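The analysis settings above pair an index-time analyzer with a search-time analyzer. The Python sketch below approximates both in memory: document text is split on whitespace, lowercased, expanded into every substring of 1 to 100 characters (the substring nGram filter) and de-duplicated, while query text is only split, lowercased and de-duplicated, so a query term matches any part of an indexed word. This is a simplified model, not Elasticsearch's implementation; the regex-based html_strip stand-in, the OR semantics of matches(), and all function names are assumptions.

```python
import re

# Simplified, in-memory model of the two analyzers in the settings above.
# Assumption: html_strip is approximated by replacing tags with a space.


def html_strip(text: str) -> str:
    """Crude stand-in for the html_strip character filter."""
    return re.sub(r"<[^>]*>", " ", text)


def substring_ngrams(token: str, min_gram: int = 1, max_gram: int = 100) -> list:
    """All substrings of min_gram..max_gram characters (the "substring" filter)."""
    return [
        token[start:start + size]
        for start in range(len(token))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(token)
    ]


def unique(tokens) -> list:
    """Drop duplicate tokens, keeping the first occurrence (the "unique" filter)."""
    return list(dict.fromkeys(tokens))


def fulltext_index_analyzer(text: str) -> list:
    tokens = html_strip(text).split()      # whitespace tokenizer
    tokens = [t.lower() for t in tokens]   # lowercase filter
    grams = [g for t in tokens for g in substring_ngrams(t)]
    return unique(grams)


def fulltext_search_analyzer(text: str) -> list:
    return unique(t.lower() for t in text.split())


def matches(query: str, document: str) -> bool:
    """True if any query term equals an indexed token (assumed OR semantics)."""
    indexed = set(fulltext_index_analyzer(document))
    return any(term in indexed for term in fulltext_search_analyzer(query))


print(matches("light", "<p>LED Flashlight</p>"))  # True: "light" is inside "flashlight"
print(matches("lamp", "<p>LED Flashlight</p>"))   # False
```

The trade-off this illustrates: expanding every word into n-grams at index time makes substring search cheap at query time, at the cost of a larger index.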

The two product indexes (oro_website_search_oro_product_1 and oro_website_search_oro_product_2), with one type each (oro_product_1 and oro_product_2), contain product information for the corresponding website (WEBSITE_ID 1 and 2, respectively). The index and type names are built automatically from the oro_product_WEBSITE_ID alias. The mapping of each index defines the following:

  • Dynamic templates for placeholder fields, such as names_LOCALIZATION_ID, descriptions_LOCALIZATION_ID and all_text_LOCALIZATION_ID, automatically set the mapping for every field whose name matches the template's regular expression (for example, names_1 matches ^names_[^_]+$).

  • Explicit (plain) mappings are defined for the sku, all_text and tmp_alias fields. The tmp_alias field is a special service field used during indexation.

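  • The default analyzer and tokenizer configuration: fulltext_index_analyzer (html_strip character filter, whitespace tokenizer, and the lowercase, substring (nGram, 1 to 100 characters) and unique filters) is applied when documents are indexed, and fulltext_search_analyzer (whitespace tokenizer with the lowercase and unique filters) is applied to query text. Both are used by the analyzed subfield of each field.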

  • By default, all fields are stored, but you can disable storing for individual fields with store: false. A stored field can be read and returned by the server, not only queried. Disabling storage for fields that are only searched, such as all_text, saves storage space.

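  • The default field for queries is set with the default_field index setting (all_text in the example), which corresponds to the field declared with default_search_field: true.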
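The following Python sketch shows how a field name resolves against the example mapping: explicitly mapped fields use their properties entry, and any other field takes the first dynamic template whose regular expression matches its full name. The function name mapping_source is hypothetical, and the lookup only models Elasticsearch's behavior for these simple patterns (Python's re and Elasticsearch's regex syntax agree on them).

```python
import re
from typing import Optional

# Dynamic templates from the example mapping, in declaration order:
# (template name, regular expression the full field name must match).
DYNAMIC_TEMPLATES = [
    ("all_text_LOCALIZATION_ID", r"^all_text_[^_]+$"),
    ("names_LOCALIZATION_ID", r"^names_[^_]+$"),
]

# Fields with an explicit (plain) mapping, per the list above.
EXPLICIT_FIELDS = {"sku", "all_text", "tmp_alias"}


def mapping_source(field_name: str) -> Optional[str]:
    """Return "properties" for explicitly mapped fields, otherwise the name
    of the first dynamic template whose pattern matches, or None."""
    if field_name in EXPLICIT_FIELDS:
        return "properties"
    for template_name, pattern in DYNAMIC_TEMPLATES:
        if re.match(pattern, field_name):
            return template_name
    return None


for name in ("sku", "names_1", "all_text_2", "all_text", "names_1_extra"):
    print(name, "->", mapping_source(name))
```

Note that all_text itself does not match ^all_text_[^_]+$, which is why it needs its own explicit mapping.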

Aggregations

Suppose you need to count products per SKU. To do so, run the previous query again with the following aggregation configuration added:

    AGGREGATE text.sku COUNT AS skuCounts

The Elasticsearch engine converts it to a request similar to the following:

    curl -XGET '181.1.24.34:9200/oro_website_search_oro_product_1/oro_product_1/_search?_source=sku,names_2,shortDescriptions_2' -H 'Content-Type: application/json' -d '
    {
        "query":{
            "match":{
                "all_text_2.analyzed":"light"
            }
        },
        "from":0,
        "size":25,
        "aggregations":{
            "skuCounts":{
                "terms":{
                    "field":"sku",
                    "size":100000000
                }
            }
        }
    }'

Note the large size value in the terms aggregation. It is set explicitly because, by default, an Elasticsearch terms aggregation returns only the top 10 buckets. With this value, the response includes a count for every SKU, even if there are more than 10 of them.
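A minimal Python sketch of this behavior follows. The hypothetical parse_aggregate and build_terms_aggregation helpers turn the AGGREGATE expression into a clause shaped like the aggregations part of the request above, and terms_buckets models how a terms aggregation counts documents per value and keeps only the top size buckets. This is a simplified in-memory model, not how Oro or Elasticsearch implement it; the tie-breaking order is an assumption.

```python
from collections import Counter


def parse_aggregate(expression: str) -> tuple:
    """Parse 'AGGREGATE <type>.<field> COUNT AS <alias>' into (alias, field).

    Hypothetical helper: only the COUNT form shown above is supported.
    """
    keyword, typed_field, function, as_keyword, alias = expression.split()
    if keyword != "AGGREGATE" or function != "COUNT" or as_keyword != "AS":
        raise ValueError(f"unsupported expression: {expression!r}")
    field = typed_field.split(".", 1)[1]  # "text.sku" -> "sku"
    return alias, field


def build_terms_aggregation(alias: str, field: str, size: int = 100_000_000) -> dict:
    """Aggregation clause shaped like the one in the request above."""
    return {alias: {"terms": {"field": field, "size": size}}}


def terms_buckets(docs: list, field: str, size: int = 10) -> list:
    """Count documents per field value and keep the top `size` buckets
    (descending count; ties broken by ascending key, as an assumption)."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


alias, field = parse_aggregate("AGGREGATE text.sku COUNT AS skuCounts")
print(build_terms_aggregation(alias, field))
docs = [{"sku": f"SKU-{i:02d}"} for i in range(12)]        # 12 distinct SKUs
print(len(terms_buckets(docs, "sku")))                     # 10: default bucket count
print(len(terms_buckets(docs, "sku", size=100_000_000)))   # 12: all SKUs
```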

Indexation

Indexation in Elasticsearch is straightforward. The data is collected with the standard WebsiteSearchBundle functionality and saved to the index according to the specified mappings.

The only notable part of this engine is how outdated entities are removed from the index. During indexation, each entity gets an additional service field, tmp_alias, which stores the name of the temporary alias assigned to the entity during the current indexation. When indexation finishes, the engine removes all entities whose alias differs from the alias of the current indexation; these are outdated entities that must no longer be present in the search index.
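The cleanup step can be modeled in Python with an in-memory dictionary standing in for the index. The function names, the alias values such as run_1, and the idea of sending a must_not/term query to Elasticsearch's _delete_by_query API are illustrative assumptions, not details confirmed by this page.

```python
def reindex(index: dict, entities: dict, run_alias: str) -> list:
    """Index all current entities under run_alias, then delete outdated ones.

    `index` is a plain dict standing in for the Elasticsearch index
    (document id -> document). Returns the ids of removed documents.
    """
    for entity_id, fields in entities.items():
        # Every document written in this run is tagged with the run's alias.
        index[entity_id] = {**fields, "tmp_alias": run_alias}
    # Documents still carrying another alias were not re-indexed: remove them.
    outdated = [doc_id for doc_id, doc in index.items()
                if doc.get("tmp_alias") != run_alias]
    for doc_id in outdated:
        del index[doc_id]
    return outdated


def outdated_documents_query(run_alias: str) -> dict:
    """Query DSL selecting documents whose tmp_alias differs from run_alias,
    e.g. as a body for Elasticsearch's _delete_by_query API (assumption)."""
    return {"query": {"bool": {"must_not": {"term": {"tmp_alias": run_alias}}}}}


index = {}
reindex(index, {1: {"sku": "A"}, 2: {"sku": "B"}}, "run_1")
print(reindex(index, {1: {"sku": "A"}}, "run_2"))  # [2]: product 2 was removed
print(index)  # {1: {'sku': 'A', 'tmp_alias': 'run_2'}}
```

Tagging documents with the run's alias lets the engine drop stale entries in one pass after indexation, without tracking deletions while entities are being collected.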