./lib/search.ts annotated source
Back to indexlibsearch is the core text search algorithm that I've polished and reused over the years across many of my personal projects for fast and simple full-text search, packaged into a small single-purpose JavaScript library.
For how libsearch works, how to import and use in your own project, and canonical documentation, check out the GitHub repository page.
9
To turn every potential query into a regular expression, we need to be able to escape characters that are significant in RegExp.
12function escapeForRegExp(text: string): string {
13 return text.replace(/[.*+?^${}[\]()|\\]/g, '\\$1');
14}
15
Utility function for sorting an array by some predicate, rather than a
comparator function. This implementation assumes by(it)
is very cheap.
18function sortBy<T>(items: T[], by: (_it: T) => any): T[] {
19 return items.sort((a, b) => {
20 const aby = by(a);
21 const bby = by(b);
22 if (aby < bby) {
23 return 1;
24 }
25 if (bby < aby) {
26 return -1;
27 }
28 return 0;
29 });
30}
31
The search function takes:
items
, the list of items to searchquery
, the search query textby
, which is a predicate function that takes an item from the items list and returns the string that should be matched with the queryoptions
, a dictionary of options:
Options include
caseSensitive
, which is self-explanatorymode
: which is 'word', 'prefix', or 'autocomplete' ('autocomplete' by default), determining the way in which partial matches are processed
43export function search<T>(items: T[], query: string, by: (_it: T) => string = x => String(x), {
44 caseSensitive = false,
45 mode = 'autocomplete',
46}: {
47 caseSensitive?: boolean;
48 mode?: 'word' | 'prefix' | 'autocomplete';
49} = {}) {
countMatches
counts the number of times regexp
occurs in the string
s
. We need this information for ranking, where documents that mention
the keyword more times (relative to the total word count of the
document) are ranked higher.
54 function countMatches(s: string, regexp: RegExp): number {
55 let i = 0;
56 while (regexp.exec(s) !== null) {
57 i ++;
58 }
59 return i;
60 }
61
We chunk up the query string into a list of "words", each of which will become a regular expression filter.
64 const words = query
65 .trim()
66 .split(/\s+/)
67 .filter(s => s !== '');
68
Short-circuit if the search query is empty -- return the original list. This is a sensible default because in most apps this corresponds to the "home view" of the list, where a search has not been performed.
72 if (words.length === 0) {
73 return items;
74 }
75
For every word in the search query, we're going to keep track of every document's TF-IDF value in this map, and aggregate them together by the end for sorting.
79 const tfidf = new Map<T, number>();
80
Iterate through every word in the query and progressively filter down
items
to just the documents that match every query word.
83 const results = words.reduce((results, word, i) => {
84 const isLastWord = i + 1 === words.length;
85 const regexp = new RegExp(
86 '(^|\\W)'
87 + escapeForRegExp(word)
88 + ((mode === 'autocomplete' && isLastWord) || mode === 'prefix' ? '' : '($|\\W)'),
The 'u' flag for Unicode used to be used here, but was removed because it was (1) across-the-board too slow, and removing it made a statistically significant speed improvement, and (2) caused at least Chrome to have strange performance cliffs in unpredictable ways where certain RegExp operations would take 10s of ms.
95 caseSensitive ? 'mg' : 'img'
96 );
97 return results.filter(result => {
98 const text = by(result);
99 const count = countMatches(text, regexp);
100 if (count === 0) {
101 return false;
102 }
Compute the TF-IDF value for this word
, and add it to this
result's TF-IDF value so far.
105 tfidf.set(
106 result,
107 (tfidf.get(result) || 0)
108 + (count / text.length * Math.log(items.length / results.length))
109 );
110 return true;
111 })
112 }, items);
113
Sort the results list by our ranking metric, TF-IDF.
115 return sortBy(results, result => tfidf.get(result));
116}
117
118