Show HN: PaperGrep – Find academic papers referenced in production code
papergrep.dev
Around 9 years ago I wrote a blog post looking for scientific papers in OpenJDK [0]. Back then I simply grepped the source code for PDFs and didn't even know what a DOI was.
Since then, whenever I entered a new domain or worked in a new codebase, I wished I could see the papers referenced in the source. For example, PyTorch has great papers [1] describing implementation details of compilation and parallelization techniques. Reading those papers + the code that implements them is incredibly helpful for understanding both the domain and the codebase.
I finally decided to build PaperGrep as a simple tool for this. The biggest challenge isn't parsing citations (though that's hard); it's organizing everything in a useful way, which I'm still figuring out.
So far, the process is semi-automated: most of the tedious parts, such as parsing, background jobs, and metadata search, are automated, but there is still a lot of manual work to review and curate the papers that come from ambiguous or unclear citations.
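To give a rough idea of what the parsing step involves, here is a minimal sketch in Python. It is not PaperGrep's actual parser: it assumes citations show up in comments as DOIs or new-style arXiv identifiers, and the names `find_citations`, `DOI_RE` and `ARXIV_RE` are made up for illustration. The DOI pattern follows the regex Crossref suggests for modern DOIs.

```python
import re

# Pattern Crossref suggests for modern DOIs, matched case-insensitively.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)

# New-style arXiv identifiers, written either as "arXiv:NNNN.NNNNN"
# or as an arxiv.org/abs/ or arxiv.org/pdf/ URL; a version suffix is ignored.
ARXIV_RE = re.compile(
    r"arxiv(?:\.org/(?:abs|pdf)/|:\s*)(\d{4}\.\d{4,5})(?:v\d+)?",
    re.IGNORECASE,
)


def find_citations(text):
    """Return sorted, de-duplicated (kind, identifier) pairs found in text."""
    found = set()
    for match in DOI_RE.finditer(text):
        # Assumption: trailing punctuation belongs to the surrounding
        # sentence, not to the DOI. This can truncate the rare DOI that
        # really does end in ")" or ".".
        found.add(("doi", match.group(0).rstrip(".,;:)")))
    for match in ARXIV_RE.finditer(text):
        found.add(("arxiv", match.group(1)))
    return sorted(found)
```

Running this over every source file gives a list of candidate identifiers to look up in a metadata service. Citations written as free text ("see Smith et al., 2004") are not caught by a sketch like this, and those are exactly the ambiguous cases that still need manual review.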
Still, I've already found some interesting papers to read through, so the effort was definitely worth it! The current selection of repos is biased toward my interests - what domains/repos am I missing?
[0] https://news.ycombinator.com/item?id=13022649
[1] https://papergrep.dev/repository/pytorch/pytorch