Website ElasticSearch Search Engine

Data Storage

The Elasticsearch search engine uses multiple indexes (one per each website and entity) prefixed with DSN path from the website_search_engine_dsn container parameter. The Elasticsearch indexes structure is defined in the index mapping. The mapping contains list of indexes, types (one per index), fields, their declarations and additional index settings. The index contains custom settings, including analyzer and tokenizer, and mapping settings for all entities.

The WebsiteElasticSearchBundle reads mapping configuration, defining the search index configuration, from website_search.yml files.

Example configuration:

Oro\Bundle\ProductBundle\Entity\Product:
    alias: oro_product_WEBSITE_ID
    fields:
        -
            name: sku
            type: text
        -
            name: names_LOCALIZATION_ID
            type: text
        -
            name: all_text_LOCALIZATION_ID
            type: text
            store: false
            default_search_field: true

If your deployment hosts two websites with IDs 1 and 2, the following search index mappings are built automatically, based on the above configuration:

{
  "oro_website_search_oro_product_1" : {
    "settings" : {
      "index" : {
        "mapping" : {
          "total_fields" : {
            "limit" : "10000000"
          }
        },
        "query" : {
          "default_field" : "all_text_LOCALIZATION_ID"
        },
        "max_result_window" : "10000000",
        "analysis" : {
          "filter" : {
            "substring" : {
              "type" : "nGram",
              "min_gram" : "1",
              "max_gram" : "100"
            }
          },
          "analyzer" : {
            "fulltext_search_analyzer" : {
              "filter" : [
                "lowercase",
                "unique"
              ],
              "tokenizer" : "whitespace"
            },
            "fulltext_index_analyzer" : {
              "filter" : [
                "lowercase",
                "substring",
                "unique"
              ],
              "char_filter" : [
                "html_strip"
              ],
              "tokenizer" : "whitespace"
            }
          }
        }
      }
    },
    "mappings" : {
      "oro_product_1" : {
        "_all" : {
          "enabled" : false
        },
        "dynamic_templates" : [
          {
            "all_text_LOCALIZATION_ID" : {
              "match" : "^all_text_[^_]+$",
              "match_mapping_type" : "string",
              "match_pattern" : "regex",
              "mapping" : {
                "fields" : {
                  "analyzed" : {
                    "type" : "text",
                    "search_analyzer" : "fulltext_search_analyzer",
                    "analyzer" : "fulltext_index_analyzer"
                  }
                },
                "store" : false,
                "type" : "keyword"
              }
            }
          },
          {
            "names_LOCALIZATION_ID" : {
              "match" : "^names_[^_]+$",
              "match_mapping_type" : "string",
              "match_pattern" : "regex",
              "mapping" : {
                "fields" : {
                  "analyzed" : {
                    "type" : "text",
                    "search_analyzer" : "fulltext_search_analyzer",
                    "analyzer" : "fulltext_index_analyzer"
                  }
                },
                "store" : true,
                "type" : "keyword"
              }
            }
          }
        ],
        "properties" : {
          "sku" : {
            "type" : "keyword",
            "store" : true,
            "fields" : {
              "analyzed" : {
                "type" : "text",
                "analyzer" : "fulltext_index_analyzer",
                "search_analyzer" : "fulltext_search_analyzer"
              }
            }
          }
        }
      }
    }
  },
  "oro_website_search_oro_product_2" : {
    "settings" : {
        ...
    }
    "mappings" : {
      "oro_product_2" : {
        "_all" : {
          "enabled" : false
        },
        "dynamic_templates" : [
            ...
        ],
        "properties" : {
            ...
        }
      }
    }
  }
}

Two product indexes (oro_website_search_oro_product_1 and oro_website_search_oro_product_2) with one type in each (oro_product_1 and oro_product_2) contain product information for the appropriate website (with WEBSITE_ID 1 and 2 respectively). Names of these product indexes and types are built automatically based on the oro_product_WEBSITE_ID placeholder. Product information contains the following:

The dynamic fields mapping with names_LOCALIZATION_ID, descriptions_LOCALIZATION_ID and all_text_LOCALIZATION_ID placeholders in these types are used to automatically set mapping for the fields that match provided patterns.
The plain mapping is defined for sku and tmp_alias fields. A tmp_alias is a special field used during the indexation.
The default configuration for analyzer and tokenizer.
By default, all fields are stored, but you may configure some to be not. Storing fields means that, apart from being queried, it is possible to read and return them from the server. Disabling storing of some fields can save some storage space.
The default field for querying is specified via the default_field.

Search

The Elasticsearch provides high quality search capabilities with relevant search results and speedy response. Elasticsearch is a document-based storage that is fast and easy to scale, but does not support execution of the relation-based queries.

The WebsiteElasticSearchBundle reuses functionality from ElasticSearchBundle and thus uses the same approach to build request: it converts the Oro\Bundle\SearchBundle\Query\Query object to the request data using request builders. Each of the builders implements Oro\Bundle\ElasticSearchBundle\RequestBuilder\RequestBuilderInterface and converts some part of the object to an appropriate part of the Elasticsearch query.

Let’s assume that you need to execute following query:

SELECT
    text.sku,
    text.names_LOCALIZATION_ID,
    text.shortDescriptions_LOCALIZATION_ID
FROM
    oro_product_WEBSITE_ID
WHERE
    text.all_text_LOCALIZATION_ID ~ "light"
LIMIT 25

Elasticsearch engine converts it to the request similar to the following one:

curl -XGET '181.1.24.34:9200/oro_website_search_oro_product_1/oro_product_1/_search?_source=sku,names_2,shortDescriptions_2' -H 'Content-Type: application/json' -d '
{
    "query":{
        "match":{
            "all_text_2.analyzed":"light"
        }
    },
    "from":0,
    "size":25
}'

where the Elasticsearch engine is running on the 181.1.24.34 host via the 9200 port.

Aggregations

Another example: we need to calculate count of products per SKU. Lets update previous query to have calculated count of products. For that we will execute previous query and should have next configuration for aggregations:

AGGREGATE text.sku COUNT AS skuCounts

Elasticsearch engine converts it to the request similar to the following one:

curl -XGET '181.1.24.34:9200/oro_website_search_oro_product_1/oro_product_1/_search?_source=sku,names_2,shortDescriptions_2' -H 'Content-Type: application/json' -d '
{
    "query":{
        "match":{
            "all_text_2.analyzed":"light"
        }
    },
    "from":0,
    "size":25,
    "aggregations":{
        "skuCounts":{
            "terms":{
                "field":"sku",
                "size":100000000
            }
        }
    }
}'

As you see in request we have size parameter with big value. It is added manually, because Elasticsearch by default will return only 10 record. With these value we will have all records even if there are more than 10 of them.

Indexation

Indexation in the Elasticsearch is pretty simple. The data is collected using the standard WebsiteSearchBundle functionality and data is saved to the index according to the specified mappings.

The only interesting part in this engine is how unused entities are removed from index. To do that during the indexation, each entity has one more service field tmp_alias which is used to store name of the temporary alias of an entity assigned to it during the indexation. After indexation is finished engine removes all entities with alias not equal to an alias of the current indexation (which are outdated entities that must not be present in search index any longer).