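I'm trying to build a simple search engine in Node.js that finds matching sentences in text files. I'd like to improve it so that it also returns similar text, not just exact matches. Any suggestions on how I could do that?
Here is my code: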
const fs = require("fs");
const path = require("path");

// Directory containing the text files to search.
const folder = "./movies/data";

// Returns a summary of how many files contain the exact search term.
function search(params) {
  const list = [];
  fs.readdirSync(folder).forEach((file) => {
    // Build the path from `folder` so the directory is defined in one place.
    const data = fs.readFileSync(path.join(folder, file), {
      encoding: "utf8",
      flag: "r",
    });
    if (data.includes(params)) {
      list.push(data);
    }
  });
  const message = `Foram encontradas ${list.length} ocorrências pelo termo ${params}.`;
  console.log(message);
  return message;
}

const args = process.argv.slice(2);
search(args.join(" "));

module.exports = search;
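Before implementing anything, you need to choose a text comparison algorithm.
One of the best options is Levenshtein distance:
https://en.wikipedia.org/wiki/Levenshtein_Distance
Here is a link to a Levenshtein distance implementation in JS:
https://www.tutorialspoint.com/Levenshtein-distance-in-javascript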
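
To make the suggestion concrete, here is a minimal sketch of how Levenshtein distance could stand in for the exact `data.includes(params)` check. The helper names `tokenize` and `fuzzyIncludes` and the default threshold `maxDistance = 1` are my own assumptions for illustration, not part of any library; the matching is word by word, which is only one of several ways to apply the distance.

```javascript
// Standard dynamic-programming Levenshtein distance, keeping two rows in memory.
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: lowercase the text and split it into words.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Hypothetical helper: true when every word of the query has at least one
// word in the text within `maxDistance` edits (an assumed threshold).
function fuzzyIncludes(text, query, maxDistance = 1) {
  const words = tokenize(text);
  return tokenize(query).every((q) =>
    words.some((w) => levenshtein(w, q) <= maxDistance)
  );
}
```

In the `search` function from the question, `data.includes(params)` could then be swapped for `fuzzyIncludes(data, params)`. A threshold of 1 tolerates a single typo per word; raising it finds more matches but also more false positives, especially on short words, so it is worth tuning against your own data.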