Machine learning project predicts scientific paper categories

Dan Saattrup Nielsen

27 February 2020

Dan Saattrup Nielsen recently finished a machine learning project that takes the title and abstract of a scientific paper and predicts which of the 148 possible arXiv categories it belongs to.

PhD student Dan Saattrup Nielsen's project can be viewed via the website, which contains a demonstration of his recently trained machine learning model. Given a paper's title and abstract, the model predicts all the arXiv categories the paper belongs to. It correctly identifies 93% of the 6 main categories, such as Maths and Physics, and detects 65% of the 148 fine-grained categories; 'classical' models do not get beyond roughly 45% on the fine-grained task. The model itself is a recurrent neural network with attention mechanisms, trained on the entirety of arXiv (about 1.3 million papers).
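Because a paper can belong to several categories at once, this is a multi-label prediction task. The sketch below illustrates one common way such a setup works; it is not the project's actual code. It assumes the network outputs one raw score per category, that each score is turned into a probability with a sigmoid, and that a category is predicted when its probability reaches a threshold of 0.5. The function names, the example scores and the threshold are all hypothetical, and the "fraction detected" measure is only one plausible reading of the percentages quoted above.

```python
import math


def sigmoid(x):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_categories(scores, labels, threshold=0.5):
    """Return every label whose probability reaches the threshold.

    Each category is decided independently, so a paper can receive
    zero, one or several categories (multi-label prediction).
    The threshold of 0.5 is an assumption, not taken from the project.
    """
    return [label for label, s in zip(labels, scores) if sigmoid(s) >= threshold]


def fraction_detected(true_sets, predicted_sets):
    """Share of all true categories that appear among the predictions."""
    found = sum(len(set(t) & set(p)) for t, p in zip(true_sets, predicted_sets))
    total = sum(len(set(t)) for t in true_sets)
    return found / total if total else 0.0


# Four real arXiv category names with made-up scores for one paper.
labels = ["math.LO", "cs.LG", "stat.ML", "physics.optics"]
scores = [-2.0, 1.5, 0.3, -0.7]

predicted = predict_categories(scores, labels)
true_categories = ["math.LO", "cs.LG", "stat.ML"]

print(predicted)
print(fraction_detected([true_categories], [predicted]))
```

In this made-up example the paper has three true categories and two of them are predicted, so two thirds of its categories are detected.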

