I take a Show HN to be a solicitation for feedback.
It took me a few tries to get this to run. I tried passing -model_path, but it complained that it couldn't find config.json. I then made a config.json in the pwd (which contained the executable), but it couldn't expand "~". I then tried searching *.txt, but it couldn't find anything unless I specified just one file.
As a grep-like experience, it's a lot slower than grep. It's not quite as slow with the SLIM file, but I wonder if there's some low-hanging fruit for optimization. Perhaps caching similarity for input tokens?
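To make the caching idea concrete, here is a minimal Python sketch. It assumes the tool scores each input token against the query word with cosine similarity over word embeddings. I haven't checked that this is how the program actually works, and every name here (`SimilarityCache`, `matching_lines`, the toy vectors) is made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SimilarityCache:
    """Remembers the query-vs-token similarity so each distinct token is scored once."""

    def __init__(self, embeddings, query):
        self.embeddings = embeddings      # hypothetical: dict of token -> vector
        self.query_vec = embeddings[query]
        self.cache = {}                   # token -> similarity, or None if unknown
        self.computed = 0                 # cache misses, i.e. tokens actually scored

    def similarity(self, token):
        if token not in self.cache:
            vec = self.embeddings.get(token)
            self.cache[token] = None if vec is None else cosine(self.query_vec, vec)
            self.computed += 1
        return self.cache[token]


def matching_lines(lines, cache, threshold=0.7):
    """Yield lines containing at least one token similar enough to the query."""
    for line in lines:
        for token in line.split():
            score = cache.similarity(token)
            if score is not None and score >= threshold:
                yield line
                break
```

Real text repeats the same few thousand words constantly, so after a warm-up most lookups would hit the dict instead of redoing the vector math. Whether that is the actual bottleneck is a guess; profiling would say.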
The configuration thing is unclear to me. I think that "current directory" means "same directory as the binary", but it could mean pwd.
Neither of those is good: configuration doesn't belong where the binaries go, and it's obviously wrong to look for configs in the working directory.
I suggest checking $XDG_CONFIG_HOME, and defaulting to `~/.config/sgrep/config.toml`.
That extension is not a typo, btw. JSON is unpleasant to edit for configuration purposes; TOML is not.
Or you could use an environment variable directly. If the only thing that needs configuring is the model's location, that would be fine as well.
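A minimal sketch of that lookup order, in Python for brevity. The variable name `SGREP_MODEL_PATH` and both function names are hypothetical, not something the program supports today. The fallback rules follow the XDG Base Directory spec: an unset, empty, or relative `$XDG_CONFIG_HOME` means `$HOME/.config`.

```python
import os
from pathlib import Path


def config_path(env=None, app="sgrep", filename="config.toml"):
    """Return where the config file should live, per the XDG base directory rules."""
    env = os.environ if env is None else env
    xdg = env.get("XDG_CONFIG_HOME", "")
    if xdg and os.path.isabs(xdg):
        base = Path(xdg)
    else:
        # Unset, empty, or relative values are ignored in favor of ~/.config.
        base = Path(env.get("HOME") or Path.home()) / ".config"
    return base / app / filename


def model_path(env=None):
    """Return the model location from a (hypothetical) environment variable, if set."""
    env = os.environ if env is None else env
    override = env.get("SGREP_MODEL_PATH")  # hypothetical variable name
    if override:
        # Expand a leading "~" explicitly; a shell does not always do it for you.
        return Path(os.path.expanduser(override))
    return None
```

With `XDG_CONFIG_HOME` unset, `config_path()` resolves to `~/.config/sgrep/config.toml`, with the tilde already expanded to a real home directory.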
If setup had been that simple, I'd be giving feedback on the program itself instead. I do think it's a clever idea and I'd like to try it out.