./lib/search.ts annotated source
Back to indexlibsearch is the core text search algorithm that I've polished and reused over the years across many of my personal projects for fast and simple full-text search, packaged into a small single-purpose JavaScript library.
For how libsearch works, how to import and use in your own project, and canonical documentation, check out the GitHub repository page.
9
To turn every potential query into a regular expression, we need to be able to escape characters that are significant in RegExp.
12function escapeForRegExp(text: string): string {13 return text.replace(/[.*+?^${}[\]()|\\]/g, '\\$1');14}
15
Utility function for sorting an array by some predicate, rather than a
comparator function. This implementation assumes by(it) is very cheap.
18function sortBy<T>(items: T[], by: (_it: T) => any): T[] {19 return items.sort((a, b) => {20 const aby = by(a);
21 const bby = by(b);
22 if (aby < bby) {23 return 1;
24 }
25 if (bby < aby) {26 return -1;
27 }
28 return 0;
29 });
30}
31
The search function takes:
items, the list of items to searchquery, the search query textby, which is a predicate function that takes an item from the items list and returns the string that should be matched with the queryoptions, a dictionary of options:
Options include
caseSensitive, which is self-explanatorymode: which is 'word', 'prefix', or 'autocomplete' ('autocomplete' by default), determining the way in which partial matches are processed
43export function search<T>(items: T[], query: string, by: (_it: T) => string = x => String(x), {44 caseSensitive = false,
45 mode = 'autocomplete',
46}: {47 caseSensitive?: boolean;
48 mode?: 'word' | 'prefix' | 'autocomplete';
49} = {}) {countMatches counts the number of times regexp occurs in the string
s. We need this information for ranking, where documents that mention
the keyword more times (relative to the total word count of the
document) are ranked higher.
54 function countMatches(s: string, regexp: RegExp): number {55 let i = 0;
56 while (regexp.exec(s) !== null) {57 i ++;
58 }
59 return i;
60 }
61
We chunk up the query string into a list of "words", each of which will become a regular expression filter.
64 const words = query
65 .trim()
66 .split(/\s+/)
67 .filter(s => s !== '');
68
Short-circuit if the search query is empty -- return the original list. This is a sensible default because in most apps this corresponds to the "home view" of the list, where a search has not been performed.
72 if (words.length === 0) {73 return items;
74 }
75
For every word in the search query, we're going to keep track of every document's TF-IDF value in this map, and aggregate them together by the end for sorting.
79 const tfidf = new Map<T, number>();
80
Iterate through every word in the query and progressively filter down
items to just the documents that match every query word.
83 const results = words.reduce((results, word, i) => {84 const isLastWord = i + 1 === words.length;
85 const regexp = new RegExp(
86 '(^|\\W)'
87 + escapeForRegExp(word)
88 + ((mode === 'autocomplete' && isLastWord) || mode === 'prefix' ? '' : '($|\\W)'),
The 'u' flag for Unicode used to be used here, but was removed because it was (1) across-the-board too slow, and removing it made a statistically significant speed improvement, and (2) caused at least Chrome to have strange performance cliffs in unpredictable ways where certain RegExp operations would take 10s of ms.
95 caseSensitive ? 'mg' : 'img'
96 );
97 return results.filter(result => {98 const text = by(result);
99 const count = countMatches(text, regexp);
100 if (count === 0) {101 return false;
102 }
Compute the TF-IDF value for this word, and add it to this
result's TF-IDF value so far.
105 tfidf.set(
106 result,
107 (tfidf.get(result) || 0)
108 + (count / text.length * Math.log(items.length / results.length))
109 );
110 return true;
111 })
112 }, items);
113
Sort the results list by our ranking metric, TF-IDF.
115 return sortBy(results, result => tfidf.get(result));
116}
117
118