Cookies on this website
We use cookies to ensure that we give you the best experience on our website. If you click 'Continue' we'll assume that you are happy to receive all cookies and you won't see this message again. Click 'Find out more' for information on how to change your cookie settings.

Career Column: For Nicholas DeVito, Georgia Richards and Peter Inglesby, custom webscrapers have driven their research — and their collaborations. In research, time and resources are precious. Automating common tasks, such as data collection, can make a project efficient and repeatable, leading in turn to increased productivity and output. You will end up with a shareble and reproducible method for data collection that can be verified, used and expanded on by others — in other words, a computationally reproducible data-collection workflow. In a current project, we are analysing coroners’ reports to help to prevent future deaths. It has required downloading more than 3,000 PDFs to search for opioid-related deaths, a huge data-collection task. In discussion with the larger team, we decided that this task was a good candidate for automation. With a few days of work, we were able to write a computer program that could quickly, efficiently and reproducibly collect all the PDFs and create a spreadsheet that documented each case.

More information

Type

Journal article

Journal

Nature Research

Publisher

Nature

Publication Date

08/09/2020

Addresses

DPhil student: Georgia Richards