The cornerstones of data retrieval in search technology have been relational databases and keyword search engines. Keyword search operates on a principle of literal matching, pairing query terms with those in an indexed database, with techniques like Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) being the most common for ranking relevancy.
The TF-IDF value, for example, increases in proportion to how often a word occurs in a document and is offset by how many documents in the corpus contain that word, which adjusts for the fact that some words are simply more common in general. This matching does not consider the meaning or context of the query; it can only identify exact matches between tokens or words. For this technology to work most effectively, two things must be true:
- The retailer data set must be uniform, consistent, and complete.
- The language, vocabulary, or jargon used to query must match that of the index.
That sounds simple. However, more often than not, retailers’ data is not uniform, consistent, or complete. In addition, users prefer to search in colloquial language, which often does not match the retailer’s data. Without some inference of meaning, matching the user’s words to the index can be impossible.
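The literal-matching behavior described above can be illustrated with a minimal sketch. It assumes whitespace tokenization and one common TF-IDF variant (term frequency normalized by document length, with a log IDF plus one); the function names and the three-item catalog are hypothetical and not taken from any particular search engine.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real engines do much more.
    return text.lower().split()


def tf_idf_scores(query, docs):
    """Score each document against the query using a simple TF-IDF variant."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term in tf:  # only exact token matches contribute
                idf = math.log(n_docs / df[term]) + 1.0
                score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores


# Hypothetical catalog titles, for illustration only.
catalog = [
    "slim fit cotton pants",
    "cotton shirt with pockets",
    "leather belt",
]

pants_scores = tf_idf_scores("pants", catalog)
trousers_scores = tf_idf_scores("trousers", catalog)
```

In this sketch, the query "pants" scores only the first title, while "trousers" scores nothing at all, even though a shopper would consider the two queries equivalent.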
Keyword search configuration is a laborious task. It requires an understanding of the website’s unique user search behavior, knowledge of the corpus of data fields and tags and their importance to a product, and a technical understanding of how normalization, tokenization, and weighting should be applied to fields within the search index. However much configuration is done, the approach still relies on token matching. In some cases, tokenization methods implemented to solve one problem or use case create new problems for other queries.
Often, weights or “boosts” are applied to specific categories or groups of products to combat poor results. This can backfire, disappointing both the retailer and the consumer when the boosted products rank high in the results for a seemingly irrelevant query.
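A small sketch may make the side effect of boosts more concrete. The field weights, the category boost, and the two products below are all hypothetical values chosen for illustration; the scoring is a plain weighted count of matching tokens, not any specific engine's formula.

```python
# Hypothetical configuration: title matches count three times as much as
# description matches, and everything in the "cameras" category is boosted.
FIELD_WEIGHTS = {"title": 3.0, "description": 1.0}
CATEGORY_BOOSTS = {"cameras": 2.0}


def score(query, product):
    """Weighted count of query-token matches, multiplied by a category boost."""
    terms = query.lower().split()
    base = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = product.get(field, "").lower().split()
        base += weight * sum(tokens.count(t) for t in terms)
    return base * CATEGORY_BOOSTS.get(product.get("category"), 1.0)


products = [
    {
        "title": "camera bag",
        "description": "padded bag for a camera body and lenses",
        "category": "bags",
    },
    {
        "title": "digital camera",
        "description": "compact camera with bag sold separately",
        "category": "cameras",
    },
]

ranked = sorted(products, key=lambda p: score("camera bag", p), reverse=True)
```

Under these assumed settings, the query "camera bag" ranks the digital camera above the actual camera bag, because the category boost outweighs the better title match.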
Because keyword search relies on matching tokens, it cannot grasp the nuanced meaning, context, or intent inherent in a user’s query.
Synonym libraries are the most common tool to circumvent this problem. For example, “pants” and “trousers” are synonyms. However, even that simple relationship must be manually mapped in keyword search. The same applies to more complex relationships between words that are not synonyms but share meaning in a given context. For example, “lightweight”, “portable”, “small”, “mini”, and “travel” might all share the same meaning when referring to a “camera tripod”. These same words, nevertheless, might not share meaning when referring to other items in the same retailer catalog. Terms such as “travel bag” and “mini bag” might refer to very different items, one designed with many compartments for a long trip and the other compact and miniature.
The extent to which synonyms must be manually configured for keyword search is often greatly underestimated. For words that share meaning only in specific contexts, a synonym library will confuse the queries in which that meaning isn’t shared.
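The context problem with synonym libraries can be sketched as follows. The synonym table and the matching rule (every query token, or one of its synonyms, must appear in the product title) are assumptions made for illustration, not a description of any particular product.

```python
# Hypothetical, manually maintained synonym library.
SYNONYMS = {
    "pants": {"trousers"},
    "trousers": {"pants"},
    # Added to help tripod queries, but applied to every query.
    "travel": {"mini", "portable", "lightweight"},
}


def expand_query(query):
    """Return one set of acceptable tokens per query token."""
    return [{tok} | SYNONYMS.get(tok, set()) for tok in query.lower().split()]


def matches(query, title):
    """True if every query token (or a synonym of it) appears in the title."""
    title_tokens = set(title.lower().split())
    return all(group & title_tokens for group in expand_query(query))


helpful = matches("trousers", "slim fit cotton pants")
intended = matches("travel tripod", "mini camera tripod")
unintended = matches("travel bag", "mini bag")
```

All three calls return `True`. The first two are the intended effect of the library; the third shows the same mapping matching a "travel bag" query to a "mini bag", where the words no longer share meaning.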
URL redirects are used to manage explicit results for poor keyword matches, directing users to a listing or landing page related to their query. This practice can be a good quick fix when a retailer is struggling to accurately match and rank search results for a specific keyword. However, in almost all cases, it does not provide a list of results ranked by relevance to the query, but rather a static page of specific products that requires the user to filter further.
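As a rough sketch, a redirect rule is little more than a lookup from a normalized query to a fixed page. The queries and paths below are hypothetical, and the normalization (lowercasing and collapsing whitespace) is an assumption.

```python
# Hypothetical redirect table maintained by hand.
REDIRECTS = {
    "travel tripod": "/category/tripods",
    "gift card": "/gift-cards",
}


def resolve(query):
    """Return a redirect to a static page if one is configured, else fall back to search."""
    key = " ".join(query.lower().split())  # normalize case and spacing
    if key in REDIRECTS:
        return ("redirect", REDIRECTS[key])
    return ("search", key)


configured = resolve("  Travel   Tripod ")
not_configured = resolve("mini tripod")
```

Here `configured` sends the user to the static tripod listing, while the closely related query "mini tripod" falls through to regular keyword search unless someone adds another rule for it.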
URL redirects and synonym libraries help fix many of the underlying relevancy problems in keyword search today. Often, these tools are applied to keywords that return no results or to the top-ranking keywords on a website. However, managing them manually for an extensive catalog across thousands of search keywords quickly becomes impractical.
Improving eCommerce site search is an ongoing process of implementing data-driven changes to core search algorithms. Site search generates valuable data that merchants can utilize to learn and find the best algorithms and configurations to match those search queries with brand products to maximize conversion rates.