About kb:preprints

Knowledge Browser: Preprints (or just kb:preprints) supports the dissection of new scientific discoveries.

What is the goal of the project?

A few years ago there were only two preprint servers - arXiv and bioRxiv - but today that number has risen to a few dozen, with new ones added each year. kb:preprints was created out of a simple need: one place for browsing all preprints, without the disadvantages of existing solutions.

How does it work?

kb:preprints collects and dissects metadata about new preprints. Using NLP solutions and algorithms crafted specifically for scientific texts, it builds an index that users can search. A query entered into the search engine is also processed with NLP so that it matches various forms of the same thing (for example, "gravity waves" matches "gravitational waves"). Every NLP modification leaves a trace in the form of relative relevancy, which is ultimately expressed on a scale from 1 to 10. This score is the basis for presenting the results: preprints scoring from 4 to 10 are treated as relevant, those scoring 3 as somewhat relevant, and those scoring below 3 as usually not relevant. These three groups are shown separately, and each is sorted by the time of posting to a preprint server. It is also possible to discard date-based sorting and rely on relevance only.
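The grouping and sorting described above can be illustrated with a minimal Python sketch. It is not the actual kb:preprints code: the names `Preprint`, `tier`, and `order_results` are hypothetical, the score is assumed to be an integer, and "sorted by the time of posting" is assumed to mean newest first.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Preprint:
    title: str
    relevancy: int  # assumed: integer score on the 1-10 scale
    posted: date    # date of posting to the preprint server


def tier(p: Preprint) -> int:
    """Return 0 (relevant), 1 (somewhat relevant) or 2 (usually not relevant)."""
    if p.relevancy >= 4:
        return 0
    if p.relevancy == 3:
        return 1
    return 2


def order_results(results, by_date=True):
    """Order search results for display (hypothetical helper)."""
    if by_date:
        # Keep the three groups apart, then sort each group by posting date.
        # Newest first is an assumption; the text does not state the direction.
        return sorted(results, key=lambda p: (tier(p), -p.posted.toordinal()))
    # Relevance only: highest score first, posting date ignored.
    return sorted(results, key=lambda p: -p.relevancy)
```

With `by_date=True`, a preprint scoring 4 that was posted yesterday is listed before one scoring 9 that was posted a month ago, because both fall into the same "relevant" group. With `by_date=False`, the one scoring 9 comes first.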

What is the future of the project?

The search engine is the central feature and will be continually improved to provide the best possible experience. Work is also in progress on discovery-specific tools, such as alerts and browsable feeds. The overall idea (and name) of Knowledge Browser refers broadly to general knowledge, and it is possible that the project will evolve into a more ambitious effort (it would be neat to discover not only new preprints but also all new publications!).

How do you cooperate with preprint servers?

Crawling and indexing algorithms are executed in a very gentle way. Essentially, the whole service uses far fewer resources than an average user, as the only information downloaded is metadata about publications added in the last few days.

Who is behind the project?

The website is developed and maintained by Rafał Grochala. You can get in touch here: @Ivegot99introns (DMs are open)