New Directories and File Packages for Bulk Retrieval of the PMC Article Datasets

PubMed Central (PMC) has made significant improvements to the bulk retrieval of two of the PMC Article Datasets from our FTP service. The improvements were made to the bulk packages, which include metadata and full text files of articles in XML or plain text formats, for the PMC Open Access (OA) Subset and the Author Manuscript Dataset, which combined encompass more than half of the 7 million articles in PMC.

To improve the usability of these two datasets, PMC has redesigned the bulk download directory structure and file packages on our FTP service. The new structure includes: baseline packages that contain all articles available in PMC as of the baseline date for each respective dataset or grouping; and daily incremental packages for each respective dataset or grouping that only contain articles that are new to the dataset or that have been updated since the baseline or previous incremental file was created.

The PMC Open Access Subset bulk packages have been divided into three groups based on available license terms: Commercial Use Allowed - CC0, CC BY, CC BY-SA, CC BY-ND licenses; Non-Commercial Use Only - CC BY-NC, CC BY-NC-SA, CC BY-NC-ND licenses; and Other - no machine-readable Creative Commons license, no license, or a custom license.

The baseline packages for each of these PMC Open Access Subset usage groups and for the Author Manuscript Dataset have been further divided by PMCID range (e.g., a package with PMC004XXXXXX in its name means that any appropriate articles with PMCIDs falli...
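As a rough illustration of the two ideas described above (the three license groups, and a baseline kept current by daily incremental packages), a minimal Python sketch might look like the following. The function names, the license strings used as keys, and the representation of a package as a dictionary keyed by PMCID are all assumptions made for illustration; the announcement does not specify how a client should read the packages, and the sketch does not touch the FTP service or the real file formats.

```python
from typing import Dict, Iterable, Optional

# License groups as listed in the announcement. The exact strings are an
# assumption; real article metadata may spell licenses differently.
COMMERCIAL = {"CC0", "CC BY", "CC BY-SA", "CC BY-ND"}
NON_COMMERCIAL = {"CC BY-NC", "CC BY-NC-SA", "CC BY-NC-ND"}


def license_group(license_code: Optional[str]) -> str:
    """Map a license code to one of the three usage groups (hypothetical helper)."""
    code = (license_code or "").strip().upper()
    if code in COMMERCIAL:
        return "Commercial Use Allowed"
    if code in NON_COMMERCIAL:
        return "Non-Commercial Use Only"
    # No machine-readable CC license, no license, or a custom license.
    return "Other"


def apply_incrementals(
    baseline: Dict[str, str],
    incrementals: Iterable[Dict[str, str]],
) -> Dict[str, str]:
    """Overlay incremental packages on a baseline, oldest first (hypothetical helper).

    Each package is modeled as {PMCID: article content}. An incremental
    only carries new or updated articles, so later entries replace
    earlier ones with the same PMCID.
    """
    current = dict(baseline)  # copy so the caller's baseline is not mutated
    for package in incrementals:
        current.update(package)
    return current


if __name__ == "__main__":
    print(license_group("CC BY-NC-SA"))
    merged = apply_incrementals(
        {"PMC1": "v1", "PMC2": "v1"},
        [{"PMC2": "v2"}, {"PMC3": "v1"}],
    )
    print(sorted(merged.items()))
```

The overlay only works if the incrementals are applied in the order they were created, since each one is defined relative to the baseline or the previous incremental file.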
Source: PubMed Central News - Category: Databases & Libraries - Source Type: news