Matchers module

Matcher functions to determine similarity between two strings

Each function takes two strings and, using a different algorithm, returns a value between 0 and 1, where 0 means the strings are completely different and 1 represents complete equality (as in “hello world” and “hello world”).

search.matchers.intersect_token_ratio(query, string)

Perform a match utilizing the intersection method.
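The intersection method is not described further here. As a rough illustration only, one plausible reading is the share of query tokens that also appear in the string. The function name intersect_token_ratio_sketch and the whitespace tokenization are assumptions made for this example, not the module's actual behaviour.

```python
def intersect_token_ratio_sketch(query, string):
    """Hypothetical sketch: fraction of query tokens also found in string."""
    # Assumption: tokens are lowercased, whitespace-separated words.
    query_tokens = set(query.lower().split())
    string_tokens = set(string.lower().split())
    if not query_tokens:
        return 0.0
    # Size of the intersection, normalized by the number of query tokens,
    # so the result always falls between 0 and 1.
    return len(query_tokens & string_tokens) / len(query_tokens)
```

Under this reading, identical strings score 1 and strings with no token in common score 0.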

search.matchers.position_similarity(el1, el2, seq1, seq2)

Get the normalized, inverted movement cost between el1 and el2 on the seq2 iterable.

The function is used to get a value describing how far apart two words are in a phrase (given as a list, as produced by string.split(' ') or, in our case, by search.utils.tokenize()).

Moves are relative to el1 on seq1, which should be the longer of the two sequences for the function to work properly.

Warning

The function is currently broken and always returns 1, making the position inside the matching string irrelevant.

Note

Each given element MUST be present in its corresponding list.

Parameters:
  • el1 (any) – element of the seq1 iterable
  • el2 (any) – element of the seq2 iterable
  • seq1 (iterable) – iterable that supports the index method and contains the el1 element.
  • seq2 (iterable) – iterable that supports the index method and contains the el2 element.
Returns:

value between 0 and 1 describing how far apart the two words are, where 1 means the closest (same position) and values tending to 0 mean the farthest, relative to the maximum number of moves possible on seq1.

Return type:

float
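The intended behaviour described above can be sketched as follows. This is a minimal illustration, not the module's code: the name position_similarity_sketch is hypothetical, and dividing by len(seq1) (so the result tends to 0 without reaching it) is an assumption about what "maximum available moves" means.

```python
def position_similarity_sketch(el1, el2, seq1, seq2):
    """Hypothetical sketch: 1.0 for the same position, tending to 0.0 as positions diverge."""
    # Both elements must be present; list.index raises ValueError otherwise.
    pos1 = seq1.index(el1)
    pos2 = seq2.index(el2)
    # Assumption: the number of available moves is bounded by the length of seq1.
    max_moves = len(seq1)
    # Clamp the distance so a longer seq2 cannot push the cost above 1.
    cost = min(abs(pos1 - pos2), max_moves) / max_moves
    return 1.0 - cost


seq1 = ["the", "quick", "brown", "fox"]
seq2 = ["quick", "fox"]
same_position = position_similarity_sketch("the", "quick", seq1, seq2)
one_move = position_similarity_sketch("quick", "quick", seq1, seq2)
two_moves = position_similarity_sketch("fox", "fox", seq1, seq2)
```

With these inputs the three values are 1.0, 0.75 and 0.5, decreasing as the positions move apart.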

search.matchers.similarity(query, string)

Calculate the match for the given query and string.

The match is calculated using the Jaro-Winkler similarity for each pair in the (query x string) matrix and takes into account the difference in position within the strings.

Parameters:
  • query (str) – search query
  • string (str) – string to test against
Returns:

normalized value indicating the probability of a match, where 0 means completely dissimilar and 1 means equal.

Return type:

float
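As a rough sketch of the idea, the example below scores every (query token, string token) pair with a textbook Jaro-Winkler implementation, scales each score by a position weight, keeps the best score per query token, and averages. The names jaro, jaro_winkler and similarity_sketch, the whitespace tokenization, the position weight formula and the averaging step are all assumptions made for illustration; the module's own combination rule is not documented here.

```python
def jaro(s1, s2):
    """Standard Jaro similarity between two strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    # Characters match only if they are equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, scaling=0.1):
    """Jaro similarity boosted by a shared prefix of up to four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * scaling * (1 - score)


def similarity_sketch(query, string):
    """Hypothetical sketch: position-weighted best Jaro-Winkler score per query token."""
    q_tokens = query.lower().split()
    s_tokens = string.lower().split()
    if not q_tokens or not s_tokens:
        return 0.0
    longest = max(len(q_tokens), len(s_tokens))
    total = 0.0
    for qi, q in enumerate(q_tokens):
        best = 0.0
        for si, s in enumerate(s_tokens):
            # Assumption: tokens further apart in position count for less.
            weight = 1.0 - abs(qi - si) / longest
            best = max(best, jaro_winkler(q, s) * weight)
        total += best
    return total / len(q_tokens)
```

Identical inputs score 1 under this sketch, and an empty query or string scores 0.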

search.matchers.token_sort_ratio(query, string)

Generate tokens from query and string, then for each query token find the best partial ratio on the string and return the average value.
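The steps above can be sketched roughly as follows. The names partial_ratio and token_sort_ratio_sketch are hypothetical, tokenization by whitespace is an assumption, and the partial ratio here is a stand-in built on the standard library's difflib.SequenceMatcher (best ratio of the shorter string against same-length windows of the longer one), not necessarily the scorer the module uses.

```python
from difflib import SequenceMatcher


def partial_ratio(token, text):
    """Best difflib ratio of token against any same-length window of text."""
    if not token or not text:
        return 0.0
    if len(token) >= len(text):
        return SequenceMatcher(None, token, text).ratio()
    width = len(token)
    return max(
        SequenceMatcher(None, token, text[i:i + width]).ratio()
        for i in range(len(text) - width + 1)
    )


def token_sort_ratio_sketch(query, string):
    """Hypothetical sketch: average of the best partial ratio per query token."""
    # Assumption: tokens are lowercased, whitespace-separated words.
    query_tokens = query.lower().split()
    string_tokens = string.lower().split()
    if not query_tokens or not string_tokens:
        return 0.0
    best_scores = [
        max(partial_ratio(q, s) for s in string_tokens)
        for q in query_tokens
    ]
    return sum(best_scores) / len(best_scores)
```

Because each query token is scored against its best-matching window, a partial query such as "wor" still scores 1 against "hello world" in this sketch.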