Course Syllabus
Description: Over the last several years, developers have created a number of R packages for accessing the scholarly literature, among them rcrossref, rorcid, and roadoi. These packages use the APIs of the underlying services to let users run specific queries and pull the resulting structured data into R, where it can be reshaped, merged with other data, and analyzed.
This session will assume no experience with R, and so will begin with a general introduction to R and the RStudio environment, based partly on my “Introduction to R for Libraries” ALCTS webinar series: http://www.ala.org/alcts/confevents/upcoming/webinar/IntrotoR. This introduction will cover reading data into R, installing packages, some functions for cleaning and restructuring data, and the basics of visualizing data. Participants will receive prewritten R scripts as well as explanatory handouts for each section of the course. These materials will not only help them ease into using R but also serve as a reference in the future.
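As a rough illustration of the kind of introductory workflow described above (reading data, cleaning it, and making a basic plot), here is a minimal sketch in base R. The toy data, column names, and object names are hypothetical and are not taken from the course scripts.

```r
# Hypothetical toy data, written to a temporary CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "title,year,checkouts",
  "Book A,2015,12",
  "Book B,2016,",
  "Book C,2016,7"
), csv_path)

# Read the data into R; the blank numeric field is read as NA
books <- read.csv(csv_path, stringsAsFactors = FALSE)

# Clean: drop rows with a missing checkout count
books_clean <- books[!is.na(books$checkouts), ]

# Restructure: total checkouts per year
by_year <- aggregate(checkouts ~ year, data = books_clean, FUN = sum)

# Visualize: a simple bar chart of checkouts by year
barplot(by_year$checkouts, names.arg = by_year$year,
        xlab = "Year", ylab = "Checkouts")

# Additional packages are installed once with install.packages(), for example:
# install.packages("dplyr")
```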
As participants become more comfortable using R, we will introduce some of the packages that allow us to access the scholarly literature. rcrossref interfaces with the Crossref API, allowing users to pull article metadata based on DOIs, keywords, funders, authors, and other criteria. This can be immensely powerful for collecting citation data, conducting literature reviews, and creating bibliographies. rorcid interfaces with the ORCID API, allowing users to pull publication data based on a specific ORCID iD, or to input names and other identifying information to find a specific individual’s identifier. Finally, roadoi interfaces with Unpaywall, allowing users to input a set of DOIs and return publication information along with potential locations of open access versions. As we work through these packages, participants will continue to learn R functions for working with data, including functions from the dplyr, purrr, and tidyr packages.
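To give a sense of how these packages fit together, the sketch below wraps a few of their core calls in two helper functions. It assumes rcrossref, roadoi, rorcid, and dplyr are installed; the helper names fetch_article_metadata() and fetch_orcid_works() are hypothetical, and the column names selected are assumptions about what the services return, which is why any_of() is used.

```r
# Sketch only: assumes rcrossref, roadoi, rorcid, and dplyr are installed.
# fetch_article_metadata() and fetch_orcid_works() are hypothetical helpers.

fetch_article_metadata <- function(dois, email) {
  # Crossref: cr_works() returns a list; the article metadata is in $data
  crossref <- rcrossref::cr_works(dois = dois)$data

  # Unpaywall: oadoi_fetch() requires an email address with each request
  unpaywall <- roadoi::oadoi_fetch(dois = dois, email = email)

  # Keep a few columns; any_of() silently skips names that are not present
  crossref_small <- dplyr::select(
    crossref, dplyr::any_of(c("doi", "title", "container.title"))
  )
  unpaywall_small <- dplyr::select(
    unpaywall, dplyr::any_of(c("doi", "is_oa"))
  )

  # Merge Crossref metadata with Unpaywall open access status by DOI
  dplyr::left_join(crossref_small, unpaywall_small, by = "doi")
}

fetch_orcid_works <- function(orcid_id) {
  # The ORCID API requires authentication; see rorcid::orcid_auth()
  rorcid::orcid_works(orcid = orcid_id)
}

# Example call (needs network access; the DOI and email are placeholders):
# fetch_article_metadata("10.1038/ng.3260", email = "you@example.org")
```

The two services may not format DOIs identically, so lowercasing the doi column on both sides before joining may be necessary in practice.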
By the conclusion of the session, participants will be able to work with and analyze data in R, and will be familiar with the major functions in each of the listed packages. On a deeper level, they will have more powerful tools for gathering subsets of the scholarly literature in clean and structured formats based on specific parameters.
Intended Audience: Some experience working in R will be helpful, but this session will assume no prior knowledge of R. Preparatory work is not required, but participants may find it useful to download R and RStudio, install the swirl package (https://swirlstats.com/students.html), and work through its exercises to reach a baseline level of preparedness.
Requirements: Students will be required to bring a laptop with R, RStudio, and a few packages installed. I will provide detailed installation instructions.