Pillbox is one of the largest free databases of prescription and over-the-counter drug information and images, combining data from pharmaceutical companies, the Food and Drug Administration, the National Institutes of Health, and the Department of Veterans Affairs.
Pillbox for Developers is a resource for getting open access to the data processing code, understanding the methodology, and contributing to the project.
Pillbox’s primary data source (FDA drug labels) is complex and does not organize information by individual pill. A Python data process downloads the source data and produces an easy-to-use, “pill-focused” dataset. This improved process is a first step toward giving developers greater flexibility in understanding the methodology and accessing the data.
The scripts can be run on your local machine once Python and the required packages are installed. Follow this setup guide and the steps for running the scripts to start processing on your own machine.
Source data is available from DailyMed in XML format. DailyMed is a service of the National Library of Medicine (NLM).
Python scripts download the DailyMed XML files and process them into a JSON API and a CSV file.
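To give a feel for what "pill-focused" processing means, here is a minimal sketch that flattens one label into one row per pill and writes the rows out as CSV and JSON. It is not the Pillbox code: the XML sample, its element and attribute names, the FIELDS list, and the helper functions are all hypothetical, and real DailyMed XML is much more deeply nested.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a drug label. Real DailyMed XML
# uses a different, far more complex structure and element names.
SAMPLE_LABEL = """\
<label setid="example-001">
  <product name="Example Tablet">
    <pill color="WHITE" shape="ROUND" imprint="EX 10"/>
    <pill color="BLUE" shape="OVAL" imprint="EX 20"/>
  </product>
</label>
"""

# Hypothetical output columns, chosen only for this illustration.
FIELDS = ["setid", "product", "color", "shape", "imprint"]


def label_to_rows(xml_text):
    """Flatten one label into a list of dicts, one dict per pill."""
    root = ET.fromstring(xml_text)
    rows = []
    for product in root.iter("product"):
        for pill in product.iter("pill"):
            rows.append({
                "setid": root.get("setid"),
                "product": product.get("name"),
                "color": pill.get("color"),
                "shape": pill.get("shape"),
                "imprint": pill.get("imprint"),
            })
    return rows


def rows_to_csv(rows):
    """Serialize the pill rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def rows_to_json(rows):
    """Serialize the same pill rows as a JSON array."""
    return json.dumps(rows, indent=2)


rows = label_to_rows(SAMPLE_LABEL)
print(rows_to_csv(rows))
print(rows_to_json(rows))
```

The point of the sketch is the shape of the transformation: a label-oriented document goes in, and a flat table with one record per pill comes out, which is what makes the data easy to search and download.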
Pillbox for Developers uses GitHub to enable anyone to join the process and help further development of the project. You can contribute to the project by flagging errors or writing code to improve the process.
This space will be used to track development and enable open access to the data processing code. Future development will include improved data output formats and greater flexibility in data access. Access the repository at github.com/HHS/pillbox-data-process.
Use the open issues queue to file bugs, or to help tackle existing issues.
Find documentation in the repository.
You can add features or improve existing code by starting work on a new branch. After you've completed your update, submit a pull request for Pillbox to review.
Pillbox data is currently provided in two forms: raw CSV and an API. Download the data or register to get an API key to access the data.
Download the raw data in CSV format
Get access to the API to search the data with custom queries