Do researchers have to pay a licensing fee for SAS in the Researcher Workbench?
SAS must be licensed for use, so many researchers are used to paying for a license. At this time, registered researchers do not need to pay for a SAS license while using the SAS Studio application on the Researcher Workbench. The software is provided at no additional cost to researchers through the Researcher Workbench. Users will still incur computational costs in the cloud, as with all other analyses.
Can I use my personal or institutional SAS account to analyze the All of Us dataset?
You cannot use your personal or institutional SAS login or SAS software to analyze the All of Us dataset. You must use SAS Studio within the Researcher Workbench to analyze All of Us data. To use SAS Studio, log in to your Researcher Workbench account and click the “SAS Studio” button on the right-hand side of the workspace.
All of Us data can only be analyzed within the Researcher Workbench platform. You cannot download data to use in other software, including SAS.
How is the All of Us Research Program different from other longitudinal cohort studies?
Unlike many research studies that focus on a specific disease or population, the All of Us Research Program will provide a national research resource to inform thousands of research questions, covering a wide variety of health conditions. A diverse cohort of 1 million or more participants will contribute data from electronic health records (EHRs), biospecimens, surveys, and other measures to build a comprehensive set of biological, environmental, and behavioral data. The data platform will be open to researchers all over the world.
What is the composition of the All of Us cohort?
All of Us aims to engage a cohort of one million or more participants that reflects the rich diversity of the United States and its territories, including populations that have historically been underrepresented in biomedical research. The depth and breadth of data captured from this large, diverse cohort will enable research on a range of health topics and conditions.
The cohort is large and growing, with participants from all 50 states. Of participants with data available in the Researcher Workbench, about 84% self-identify as members of communities underrepresented in biomedical research, including about 43% who self-identify as members of racial and ethnic minority groups.
For more information about the data available, visit our Data Browser.
How does All of Us assess diversity? What communities does All of Us consider “underrepresented in biomedical research” (UBR)?
All of Us is committed to engaging a cohort that is demographically, geographically, and medically diverse.
Specifically, these are the populations the program considers underrepresented in biomedical research:
- Ancestry:
  - Race: People who select a single race other than White (e.g., Asian), or who select more than one race
  - Ethnicity: People who select an ethnicity other than those listed under the race of White (e.g., Japanese)
- Age: People who are 65 years of age or older at the time of primary consent
- Sexual and gender minorities:
  - Sex assigned at birth: People who select intersex as their sex at birth
  - Sexual orientation: People who select any sexual orientation choice other than straight (e.g., gay, lesbian, bisexual, queer, asexual, etc.)
  - Gender identity: People who select any gender identity choice other than man or woman (e.g., non-binary, transgender, genderfluid, questioning, etc.) or whose gender identity is different from their sex assigned at birth
- Income: People with an annual household income at or below 200% of the Federal Poverty Level (FPL) based on residency (defined as the 48 contiguous states, Alaska, or Hawaii)* and household size
  - *For participants not residing in the 48 contiguous states, Alaska, or Hawaii, the FPL for the contiguous 48 will be used
- Educational attainment: People without a high school diploma or GED
- Geography: Residents of established rural and non-metropolitan ZIP codes, based on the HRSA Federal Office of Rural Health Policy data files
- Disability: People with a physical, functional, cognitive, or other condition that substantially limits one or more life activities
- Health care access and utilization: People with inadequate access to health care, such as lacking health insurance, having no source of primary care, or being unable to obtain needed medical care within the past 12 months due to barriers
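Several of the criteria above are simple thresholds, so they can be expressed as checks on a participant record. The following is a minimal sketch for three of them (age, income, and educational attainment). The `Participant` class, its field names, and the dollar amounts are hypothetical and are not part of any All of Us data model. The sketch also assumes that meeting any single criterion is enough to count as UBR.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # All field names are hypothetical, chosen only for this illustration.
    age_at_consent: int           # age in years at the time of primary consent
    has_hs_diploma_or_ged: bool   # educational attainment
    household_income: float       # annual household income
    poverty_guideline: float      # FPL for this residency and household size


def ubr_flags(p: Participant) -> dict[str, bool]:
    """Evaluate three of the listed UBR criteria for one participant."""
    return {
        "age": p.age_at_consent >= 65,
        # "At or below 200% of the Federal Poverty Level"
        "income": p.household_income <= 2.0 * p.poverty_guideline,
        "education": not p.has_hs_diploma_or_ged,
    }


def is_ubr(p: Participant) -> bool:
    """Assumption: any single criterion is sufficient."""
    return any(ubr_flags(p).values())


# Placeholder values, not real poverty guidelines.
example = Participant(
    age_at_consent=70,
    has_hs_diploma_or_ged=True,
    household_income=100_000.0,
    poverty_guideline=20_000.0,
)
print(ubr_flags(example))  # {'age': True, 'income': False, 'education': False}
print(is_ubr(example))     # True
```

The remaining criteria (ancestry, sexual and gender minorities, geography, disability, and health care access) depend on survey responses and external reference files, so they are left out of this sketch.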
Will the All of Us cohort offer a representative sample of U.S. citizens?
No. The All of Us participant community will reflect the diversity of the United States, but cannot be described as a representative sample. Participants are not recruited via probability sampling; the research program is open to all.
How are participants recruited, and what does participation entail?
Many participants are invited to enroll by one of our partner health care provider organizations, which include large academic medical centers, VA medical centers, and community health centers across the country. Participants can also enroll directly through our website, JoinAllofUs.org, or at certain All of Us events.
All of Us participants are able to share different kinds of information by completing surveys, providing access to their electronic health records (EHRs), and syncing Fitbit devices within the All of Us participant portal. Some participants are invited to visit partner sites to have physical measurements and blood and urine samples taken. The program will stay in touch with participants over time about new opportunities to share data through additional surveys, new research studies, and new electronic tools, including apps.
What data are available for analysis?
Within the cloud-based environment of the Researcher Workbench, registered researchers use R, Python, or SAS to link and analyze a variety of data types, including surveys, physical measurements, electronic health records (EHRs), wearables, and genomics, to conduct a wide range of studies.
How are you gathering and curating information from electronic health records?
The All of Us Research Program employs Observational Medical Outcomes Partnership (OMOP) Common Data Model Version 5 infrastructure to ensure feasibility and standardization across electronic health record (EHR) data for researchers. The All of Us data set comprises EHR data from 14 OMOP tables: Person, Visit Occurrence, Condition Occurrence, Drug Exposure, Measurement, Procedure Occurrence, Observation, Location, Provider, Device Exposure, Death, Care Site, Fact Relationship, and Specimen.
Within the context of the Research Hub, EHR data will be presented at the highest level of granularity, which is EHR Domain. Domains include Demographics, Conditions, Procedures, Drugs, Measurements, and Visits.
What additional data will the program add in the future?
The breadth of data types collected continues to expand. In the near future, All of Us will begin analyzing biological and genomic assays on participants’ biospecimens. Upcoming surveys may address physical activity, diet, medications, environmental exposures, and more. Participants will also be able to contribute data from additional fitness trackers, mobile apps, and other digital health technology.
Are you collaborating with other cohort programs?
Yes. Our advisory panel has included representatives from large cohort studies in the United States and abroad, and All of Us leadership meets regularly with many U.S. cohorts as well as an international consortium of large cohort programs to share best practices.
Will there be funding opportunities?
The National Institutes of Health (NIH) may issue funding announcements in the future to support research studies using All of Us data. For updates, visit AllofUs.nih.gov and subscribe.
To learn more about NIH funding opportunities generally, visit https://grants.nih.gov/grants/oer.htm
The Researcher Workbench features several tools to support data analysis:
- Workspace: A workspace is the place to store and analyze data for a specific project. Each Workspace has a dedicated space for file storage that can be shared with other users, allowing view-only or edit access.
- Cohort Builder: Within the workspace, the Cohort Builder’s guided user interface allows researchers to create, review, and annotate cohorts through a user-friendly point-and-click interface.
- Dataset Builder: The Dataset Builder provides users with the ability to select specific medical concepts and variables to build a data set for analysis.
- Analysis Tools: Through built-in applications like Jupyter Notebook (Python and R), RStudio, and SAS Studio, researchers can perform comprehensive analyses using programming languages R, Python, or SAS. Teams of researchers with various areas of expertise can work together on data cleaning and transformation, statistical modeling, machine learning, and more.
We offer training materials and Help Desk support for researchers who need assistance using these tools.
Additional tools may be added over time.
What is the Survey Explorer?
The Survey Explorer is a tool that allows you to browse the questions that the All of Us Program surveys ask and to see the source information for each of these questions.
How can I view full surveys?
Click the links below each survey title to view the full survey. Surveys are available in both English and Spanish.
Most survey questions used in the All of Us Program were sourced from other validated survey instruments. When you click ‘Explore Source Information’ you can click through each survey question to see where this question was originally used, a description of the source survey, the source year, and the source URL.
How does the program choose whether to create a question from scratch or use one from an existing survey?
For each survey topic, a task force of experts works together to create the survey. They start with questions that have already been used in other surveys (source instruments), such as the National Health Interview Survey developed by the Centers for Disease Control and Prevention. If no publicly available survey questions address the topic of interest, the task force creates its own.
What cross tabulations are available in the Data Browser?
In the Data Browser, you can perform simple cross tabulations between a single variable, such as a diagnosis of diabetes in electronic health record data, and either sex assigned at birth or age. To find these cross tabulations, search for a keyword, like “diabetes,” and click on the relevant results. The section will then open to display a cross tabulation bar graph with sex assigned at birth. You can select “age” to see the bar graph for specific age ranges.
What is genetic ancestry?
The Data Browser includes calculated genetic ancestry associations of variants. Genetic ancestry shows the part of the world where an individual’s ancestors may have lived. People whose ancestors lived in the same region of the world have similar patterns in their DNA. By comparing an individual’s DNA to the DNA of others whose ancestry we know, we can estimate where an individual’s ancestors may have lived.
Genetic ancestry is not the same as race and ethnicity. Race and ethnicity are concepts created by humans and are not determined by DNA. They are usually based on physical features, such as skin color, or shared language and culture. People of the same race or ethnicity may share the same genetic ancestry, but this is not always the case.
All of Us carries out an analysis that clusters individuals into groups based on the shared patterns in their DNA. This allows us to infer their genetic ancestry. The genetic ancestry category labels correspond to geographic locations where the individuals’ ancestors might have lived hundreds of years ago. Some individuals may not neatly fit the patterns of any of the genetic ancestry groups that we have displayed here. They may cluster with a different genetic ancestry group. Or they may not cluster fully with any group displayed here.
Genetic ancestry is more complex than what is included in the Data Browser. The available data is intended to provide a broad overview of genetic variation by ancestry. Genetic ancestry is linked to migration over time among populations. Individuals may have a blend of multiple ancestries. The specific details and categories aren’t captured by the Variant Search.
What is the purpose of the Data Browser?
The Data Browser is an interactive tool that allows you to learn more about the data collected as part of the All of Us Research Program. You can explore the survey questions and answers and physical measurements taken at the time of participant enrollment. You can also learn more about the electronic health record (EHR) data. The Data Browser will allow you to see how many of the All of Us participants have certain conditions, survey responses, demographics, and more.
The Data Browser was built with researchers in mind but also provides value to other users, including program participants, funders, the media and other stakeholders. Researchers may find information that allows them to develop hypotheses or assess the feasibility of the data set for their studies. Participants might be interested in comparing their survey responses with those of the group or exploring how many other participants have diseases relevant to themselves or a family member. Finally, the media, funders, and other stakeholders might be interested in learning about the participant group as a whole, including exploring the prevalence of specific conditions or drug exposures, or learning about response rates for the surveys.
How does the Data Browser protect participant privacy?
Participant privacy is protected in multiple ways. Personally identifiable information (PII) is any data that could potentially identify a specific individual. All PII, such as names and addresses, is removed from participant records made available to the public and researchers. In addition, all participant counts are rounded up to a multiple of 20. For example, if only 8 participants have a particular medical condition, the count is displayed as 20.
It is not possible to view individual data records on the Data Browser. The Data Browser shows aggregate data for groups of de-identified participants.
All of Us program data are stored on a secure, encrypted platform that receives routine updates.
How does the Data Browser search electronic health record (EHR) data?
When enrolling in the All of Us Research Program, participants can consent to provide the program with access to their electronic health record (EHR) data. When a participant consents, the enrolling Health Provider Organization submits the EHR to the Data and Research Center. The Data Browser uses keywords to retrieve EHR information from the Data and Research Center, including diagnoses, procedures, medications, and measurements.
Why do the counts in the Data Browser differ from the current number of participants?
There may be a delay of several months between the time a participant consents and the time their record is included in the All of Us data that is available in the Data Browser. The delay is a result of the time it takes for participant data to be collected, transferred to the Data and Research Center and curated. As a result, the overall participant counts within the Data Browser are lower than the overall enrollment numbers for the program.
Why do the counts in the Data Browser differ from the counts on the Data Snapshots dashboard?
The Snapshots dataset includes those recently enrolled and the latest All of Us Research Program updates. The Data Browser counts may differ from Data Snapshots counts due to a delay of several months between the time a participant consents and the time their record is included in the All of Us data that is visible in the Data Browser. The delay is a result of the time it takes for participant data to be collected, transferred to the Data and Research Center, and curated. Both datasets are considered valid by the All of Us Research Program for their intended purpose. Please use the appropriate dataset when estimating the statistic of interest, as statistics may vary between the Snapshots and Data Browser datasets. When referencing these data, please name the dataset (Snapshots or Browser) and the date the statistics were estimated.
Why do the total counts in the Sex Assigned at Birth, Age, Sources, and Values graphs differ from the total participants count?
One of the steps All of Us takes to protect participant privacy in the Data Browser is to round all participant counts up to the next multiple of 20. This is especially important for medical concepts, survey answers, and demographic breakdowns that have relatively few participants. For example, participant counts of 0 to 20 are all displayed as 20, and a participant count of 426 is displayed as 440. Because of this privacy methodology, the counts on the Sex Assigned at Birth, Age, Sources, and Values graphs may add up to more than the total participants count.
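The examples given here (counts of 0 to 20 shown as 20, and 426 shown as 440) are consistent with rounding each count up to the next multiple of 20, with 20 as the minimum. The sketch below assumes that rule and shows why the rounded subgroup counts can sum to more than the rounded total. The function name and the subgroup counts are made up for illustration and are not taken from the Data Browser.

```python
BIN_SIZE = 20  # counts are displayed in multiples of 20


def displayed_count(true_count: int) -> int:
    """Round a count up to the next multiple of 20, never showing less than 20."""
    if true_count < 0:
        raise ValueError("counts cannot be negative")
    rounded_up = ((true_count + BIN_SIZE - 1) // BIN_SIZE) * BIN_SIZE
    return max(BIN_SIZE, rounded_up)


print(displayed_count(8))    # 20
print(displayed_count(426))  # 440

# Hypothetical subgroup breakdown of the same 426 participants.
subgroups = {"Group A": 205, "Group B": 213, "Group C": 8}
shown = {name: displayed_count(n) for name, n in subgroups.items()}
print(shown)                # {'Group A': 220, 'Group B': 220, 'Group C': 20}
print(sum(shown.values()))  # 460, which exceeds the displayed total of 440
```

Because each subgroup is rounded independently, every bar can gain up to 19 participants, so the gap grows with the number of subgroups shown.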
How are the Sex Assigned at Birth and Age Percentage (%s) calculated?
For EHR Domains – Sex assigned at birth percentages are calculated as [Number of participants of each sex with this medical concept mentioned in their EHR] / [Total number of participants of that sex with EHR data in this domain]
Age percentages are calculated as [Number of participants in each age group with this medical concept mentioned in their EHR] / [Total number of participants in that age group with EHR data in this domain]
For Surveys – Sex assigned at birth percentages are calculated as [Number of participants of each sex that selected this answer] / [Total number of participants of that sex who answered this question (excluding skip codes)]
Age percentages are calculated as [Number of participants in each age group that selected this answer] / [Total number of participants in that age group who answered this question (excluding skip codes)]
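As a rough illustration of these ratios, the sketch below computes a percentage for each group, reading each denominator as the number of participants in that same group (for example, all participants of one sex with EHR data in the domain). That reading, the function name, and the counts are assumptions made for this example.

```python
def group_percentages(with_concept: dict[str, int],
                      with_domain_data: dict[str, int]) -> dict[str, float]:
    """Percentage of each group that has the concept (or selected the answer).

    Groups with a zero or missing denominator are omitted.
    """
    result = {}
    for group, numerator in with_concept.items():
        denominator = with_domain_data.get(group, 0)
        if denominator > 0:
            result[group] = round(100 * numerator / denominator, 1)
    return result


# Made-up counts for one medical concept, broken down by sex assigned at birth.
with_concept = {"Female": 440, "Male": 330}
with_domain_data = {"Female": 8800, "Male": 4400}

print(group_percentages(with_concept, with_domain_data))
# {'Female': 5.0, 'Male': 7.5}
```

The same function applies to age groups and to survey answers; for surveys, the denominators would exclude participants who skipped the question.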
Where does the data come from?
The data in the All of Us Data Browser comes from participant electronic health records and from survey answers and physical measurements taken at the time the participant enrolls in the All of Us program.
Have participants consented to share this data?
What are medical concepts?
Medical concepts are similar to medical terms; they describe information in a patient’s medical record, such as a condition they have, a doctor’s diagnosis, a prescription they are taking, or a procedure or measurement the doctor performed. In the Data Browser we refer to conditions, procedures, drugs, and measurements as electronic health record (EHR) domains. For example, a patient’s weight (measurement) is often taken during a routine medical examination (procedure) or a patient may be diagnosed with type II diabetes (condition) and prescribed metformin (drug) to treat the condition.
What are vocabularies?
A patient’s electronic health record (EHR) may contain medical information that means the same thing but may have been recorded in many different ways. For example, the condition type II diabetes may be recorded as ICD9 code 250.00 at one doctor’s office or ICD10 code E11 at another. When All of Us receives a participant’s EHR, all of the codes (called source codes) are re-assigned a standard vocabulary code (e.g., for type II diabetes SNOMED 44054006). By changing or mapping all of the source codes to standard codes, the EHR can be more easily categorized and searched by researchers.
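The mapping step can be pictured as a lookup from (source vocabulary, source code) to a standard code. The sketch below uses only the type II diabetes codes mentioned above. The table, the record fields, and the site names are illustrative assumptions and do not reflect how All of Us actually stores or maps its vocabularies.

```python
from collections import defaultdict

# Lookup built only from the example codes in the text above.
# Vocabulary labels follow this document's wording; this is not a real vocabulary table.
SOURCE_TO_STANDARD = {
    ("ICD9", "250.00"): ("SNOMED", "44054006"),
    ("ICD10", "E11"): ("SNOMED", "44054006"),
}


def map_record(record: dict) -> dict:
    """Return a copy of the record with standard fields added; source fields are kept."""
    key = (record["source_vocabulary"], record["source_code"])
    standard_vocabulary, standard_code = SOURCE_TO_STANDARD.get(key, (None, None))
    mapped = dict(record)
    mapped["standard_vocabulary"] = standard_vocabulary
    mapped["standard_code"] = standard_code
    return mapped


# Two hypothetical records describing the same condition with different source codes.
records = [
    {"site": "Office A", "source_vocabulary": "ICD9", "source_code": "250.00"},
    {"site": "Office B", "source_vocabulary": "ICD10", "source_code": "E11"},
]

by_standard = defaultdict(list)
for rec in map(map_record, records):
    by_standard[rec["standard_code"]].append(rec["site"])

print(dict(by_standard))  # {'44054006': ['Office A', 'Office B']}
```

Once both records carry the same standard code, a single search on that code finds them together, while the original source codes remain available on each record.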
What do “source” and “standard” mean?
SOURCE – electronic health record (EHR) data enters our system with terms and codes for conditions, drugs, and procedures using “source vocabularies”. Source vocabularies are the original methods of classifying conditions, diagnoses and procedures (e.g. ICD9 and ICD10CM codes) and will be “mapped” to the new standard vocabularies. However, the source vocabularies are retained after the mapping and data can still be searched using the original terminology or codes.
STANDARD – Translating clinical findings, symptoms, diagnoses, procedures, etc. from traditional methods of coding and classification into what is referred to as a “standard vocabulary” allows EHRs to be more readily categorized and searched. Examples of standard vocabularies include SNOMED, LOINC, and RxNorm.
How often are the data updated?
Data are updated periodically.
What is SNOMED?
SNOMED stands for Systematized Nomenclature of Medicine. SNOMED connects the various terminology, medical codes, synonyms, and definitions used among different electronic health record (EHR) systems. For example, one EHR system might use ICD9 codes while another uses ICD10 codes. SNOMED allows the same data point from multiple EHR systems to be matched up.
What is LOINC?
LOINC stands for Logical Observation Identifiers Names and Codes. LOINC is used by health provider organizations to code laboratory test orders and results. For example, 2345-7 is the code used for the amount of glucose measured in your blood during a blood test.
What are ICD codes?
ICD stands for International Classification of Diseases. ICD codes are used in the United States to classify diseases, illnesses or injuries. There are various revisions of the codes, including ICD9 (Ninth Revision) and ICD10 (Tenth Revision).
What are CPT codes?
CPT stands for Current Procedural Terminology. CPT codes are a list of descriptive terms and identifying numeric codes used by physicians and health care professionals for billing of medical services and procedures.
What is RxNorm?
RxNorm is a naming system for all medications available in the U.S. market. The name of each drug is a compilation of its active ingredients, strength and form. Each combination, therefore, has a unique RxNorm name.
Is the data from the All of Us serology study, highlighted in Clinical Infectious Diseases, available to researchers?
Within the Researcher Workbench, a series of five tables enables All of Us researchers to replicate the analysis described in the journal article.1 At this time, the data is not linked to individual participant records.
For more information about replicating this research, please see the user support resources in the Workbench or contact support@researchallofus.org.
1. Althoff, K., Schlueter, D.J., Anton-Culver, H., Cherry, J., Denny, J., Thomsen, I., … Schully, S. (2021). Antibodies to SARS-CoV-2 in All of Us Research Program participants, January 2 – March 18, 2020. Clinical Infectious Diseases, ciab519, https://doi.org/10.1093/cid/ciab519
What COVID-19 data are available in the Researcher Workbench now?
The All of Us Data Dictionary provides researchers with the most robust description of data elements available within the Researcher Workbench.
Between May 2020 and February 2021, participants were invited to complete a series of six COVID-19 Participant Experience (COPE) surveys. The COPE Survey data can be readily linked to other data within the Researcher Workbench, including electronic health records, physical measurements, and wearables data, enabling researchers to get a more holistic view of program participants’ COVID-19 experiences. Within the Controlled Tier, more granular data are available, including vaccination status and COVID-related symptoms (through April 1, 2021).
What is All of Us doing to support COVID-19 research?
All of Us is pursuing three discrete activities to support COVID-19 research:
- COVID-19 Participant Experience (COPE) Survey
- Antibody Testing: All of Us is supporting antibody testing on samples from recently enrolled participants to help researchers better understand the origins and spread of COVID-19 in the United States.
- EHR Data Standardization and Collection: We are accelerating the collection of EHR information about COVID-19 to help researchers learn more about symptoms and associated health problems, as well as the effects of different medicines and treatments. The data will be added to the Researcher Workbench as soon as possible and support national research about individual symptomaticity, morbidity, mortality, and more.
What is the COPE Survey?
The All of Us Research Program has developed a survey designed for All of Us participants to contribute information about how COVID-19 is impacting their physical and mental health. This survey is referred to as the COVID-19 Participant Experience, or COPE survey. The first COPE survey was released on May 7, 2020. Additional surveys went out in June and July. A shorter version of the survey is available for November 2020, December 2020, and January 2021.
Is there a cost to use the All of Us genomic data?
There is no cost for researchers to register with the All of Us Research Program and to begin working within the dataset. Researchers will incur costs for computation and data storage, however.
The All of Us Research Program provides $300 in initial credits for each registered Researcher Workbench user. Additional charges must be covered by the researcher through their billing accounts. Resources to help researchers estimate costs are provided within the Researcher Workbench itself, on the User Support Hub. Researchers can find examples of how much genomic data can cost to analyze in the User Support Hub (login required).
How can I access All of Us genomic data?
All of Us genomic data are only available through the Controlled Tier of the Researcher Workbench.
Currently, only registered researchers whose institutions have Data Use and Registration Agreements in place with All of Us that include the Controlled Tier can access genomic data. Visit the Institutional Agreements page to check your institution’s access.
If your institution has access, you can follow the steps on our Register page to become an All of Us researcher. If your institution does not have a Data Use and Registration Agreement (DURA) in place with All of Us, or if your institution’s current DURA does not yet allow for Controlled Tier access, you can initiate the process here.
How much genomic data does All of Us have?
The Researcher Workbench’s Controlled Tier includes data from more than 447,000 participants with genotyping arrays, more than 414,000 with short-read whole genome sequences (WGS), more than 97,000 with structural variants, and more than 2,700 with long-read WGS. To learn more about these data, please visit the Data Browser.
What if I have trouble signing in to the Researcher Workbench?
The All of Us Researcher Workbench uses Google sign in for all accounts. This requires users to authenticate their account with Google and set cookies in the browser. If you are having trouble signing in, these suggestions may help:
- Confirm that you are using Google Chrome. Other browsers are not supported at this time.
- Check that you are signing into the workbench using your @researchallofus.org account. If you have forgotten your username, refer to the workbench welcome email.
- If you have cookie-blockers enabled, either disable them, or whitelist “https://workbench.researchallofus.org” and “https://accounts.google.com”. (More information can be found here.)
- If you have Chrome extensions installed:
  - Try creating a new Chrome profile without any extensions. (We recommend setting up a separate Chrome profile for your @researchallofus.org account). Confirm if this resolves the issue by refreshing the page.
  - If you continue using your Chrome extensions on this new profile, you are required to disable or reconfigure the specific extensions. For example, some privacy extensions, such as Privacy Badger, may need to be configured to allow cross-site cookies on accounts.google.com.
If you are still unable to sign in after following these steps, please contact support@researchallofus.org.
How do I access the research data?
Accessing the Researcher Workbench data is easy and takes only a few steps. If you are interested in applying for Researcher Workbench access, please visit the Register page for information on the steps you will need to complete.
I don’t see my institution listed on the Researcher Workbench registration page. Can I still access the data?
For you to access the Registered Tier and Controlled Tier data, your institution will need to have signed a Data Use and Registration Agreement with the All of Us Research Program. If your institution is not listed, that means your institution does not have an agreement with the program yet. You can help initiate one by submitting a request. Note that it may take some time to initiate the agreement. In the meantime, you can view the public All of Us Data Snapshots, Data Browser, and Survey Explorer.
What information about me is displayed publicly and why?
The Research Projects Directory will display your name, institution, and role. This information will be displayed along with the Research Purpose Description you provided for each of your workspaces (and for your shared access workspaces). This provides All of Us participants information about who is using their data and the research the data are enabling. The All of Us Research Program also makes this information publicly available on AllofUs.nih.gov to comply with the 21st Century Cures Act.
Why is my research project information shared publicly?
The All of Us Research Program is committed to being transparent with its research participants about the purpose of the research that uses their data. Any participant or member of the public can request that the All of Us Resource Access Board (RAB) review a research purpose description if they have concerns that your research projects may stigmatize All of Us participants or violate the Data User Code of Conduct in some other way. The RAB will review the request and contact you if action is needed to address concerns.
What is the Resource Access Board (RAB)?
The Resource Access Board (RAB) is charged with reviewing and auditing research projects to determine whether they may potentially stigmatize research participants or violate the Data User Code of Conduct in any other way. The RAB is composed of experts in human subjects research, research ethics, and privacy and security, as well as participant representatives.
Can I request a review of my own research project?
Yes. When you create a workspace, you will be prompted to request a Resource Access Board (RAB) review of your research purpose if you are concerned about potential stigmatization of research participants. If you request a RAB review, you can expect a response within 5 business days. In the meantime, you can continue with your research.
What happens if someone else requests a review of my research project?
The requester will fill out a form describing their specific concerns. This form is sent to the Resource Access Board (RAB) for review. If more information or remediation is needed, the RAB will contact the workspace owner.
What if the Resource Access Board (RAB) finds concerns for stigmatization with a research project?
The RAB will contact the workspace owner.
Am I obligated to share any publications with the program?
Yes. As a condition of your data access, you must inform the program of any upcoming publications resulting from access to All of Us Research Program data at least 2 weeks before the date of publication or presentation. This includes peer-reviewed manuscripts, conference abstracts, and/or presentations. You can do this by contacting User Support in your Researcher Workbench account.
Your manuscript will not go through program review. The information will only be used to help the program prepare for any media coverage or communication surrounding the upcoming publication. Embargoes will be honored.
Additionally, users must submit an electronic version of a final, peer-reviewed manuscript to PubMed Central immediately upon acceptance for publication, to be made publicly available immediately without any embargo period once published.
How do I cite or acknowledge the All of Us Research Program in publications?
Work that uses All of Us data must honor the contribution that All of Us participants make to the research project. This includes acknowledgement in all oral and written presentations, disclosures, and publications resulting from any analyses of the data. Learn more and find the citation language on the Data Access Tiers page.
What data are available for download in the Researcher Workbench?
The Researcher Workbench protects participant data by enabling researchers to analyze All of Us data within the Researcher Workbench without taking the participant-level data out of the secure cloud environment. You must not download, copy, or take screenshots of individual participant-level (or row-level) data and remove it from the All of Us Research Program environment.
What are the rules/policies on import of external data, codes, or files into my workspace?
You may upload or import external data, codes, or files into your workspace for the sole purpose of the research that you have described. You may not link Registered or Controlled Tier All of Us Research Program data at the participant level with participant-level data from other sources without the explicit, documented permission of the All of Us Research Program. You may apply for such permission from the RAB by emailing aouresourceaccess@od.nih.gov.
You are responsible for ensuring that you have the appropriate rights to anything you upload into the system and that you have removed all of the personally identifiable information (PII) from any data or files you upload. Guidance on removing PII from data is available on the User Support Hub.
For further details on policies related to the import of external content into your workspace, refer to the All of Us Terms of Use and Data User Code of Conduct.
Is there a cost to use the Researcher Workbench?
There is no cost to access the Researcher Workbench. Computation costs for analyses, however, may be incurred through Google Cloud Platform. The All of Us Research Program provides $300 in initial credits for each registered Researcher Workbench user. These credits will help pay for preliminary storage and initial computational needs as researchers get started using the Researcher Workbench. Once the initial credits have been used, researchers can link a billing account to their Researcher Workbench account.
What is a cohort?
A cohort is a group of participants whom researchers are interested in studying. Researchers can create cohorts by adding inclusion or exclusion criteria.
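As a minimal sketch of how inclusion and exclusion criteria define a cohort, the Python example below filters a small made-up list of participants. The records, field names, and the `build_cohort` helper are hypothetical illustrations; they are not the Researcher Workbench's cohort-building tool or its API.

```python
# Hypothetical toy records; not real All of Us data or field names.
participants = [
    {"person_id": 1, "age": 34, "conditions": {"asthma"}},
    {"person_id": 2, "age": 61, "conditions": {"asthma", "copd"}},
    {"person_id": 3, "age": 47, "conditions": {"hypertension"}},
    {"person_id": 4, "age": 52, "conditions": {"asthma"}},
]


def build_cohort(people, include, exclude):
    """Keep people who meet every inclusion rule and no exclusion rule."""
    return [
        p for p in people
        if all(rule(p) for rule in include)
        and not any(rule(p) for rule in exclude)
    ]


# Inclusion: has asthma AND is at least 40 years old.
include = [lambda p: "asthma" in p["conditions"], lambda p: p["age"] >= 40]
# Exclusion: has COPD.
exclude = [lambda p: "copd" in p["conditions"]]

cohort = build_cohort(participants, include, exclude)
cohort_ids = [p["person_id"] for p in cohort]
print(cohort_ids)
```

In this sketch, inclusion criteria are combined with AND and any matching exclusion criterion removes a participant; that combination rule is an assumption made for the example.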
What are analysis files?
Analysis files are where researchers can perform comprehensive analyses on cohorts and datasets using the programming languages R, Python, or SAS.
What is a concept set?
Concepts describe information in a patient’s medical record, such as a condition, a prescription they are taking, or their vital signs. Subject areas such as conditions, drugs, measurements, etc. are called “domains”. Users can search for and save collections of concepts from a particular domain as a “concept set” and then use concept sets and cohorts to create a dataset, which can be used for analysis.
What is a dataset?
Datasets are analysis-ready tables that can be exported to analysis tools such as Jupyter Notebook, RStudio, and SAS Studio. Users can build and preview a dataset for one or more cohorts by selecting the desired concept sets and values for the cohorts.
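The relationship between cohorts, concept sets, and datasets can be sketched in a few lines of Python. This is an illustration under stated assumptions only: the rows, the placeholder concept names, and the `build_dataset` helper are hypothetical and do not reflect the Researcher Workbench's actual tables or tooling.

```python
# Hypothetical inputs: a cohort (set of participant IDs) and a concept set
# (a saved collection of concepts from one domain, here "measurements").
cohort_person_ids = {2, 4}
concept_set = {"blood_pressure", "heart_rate"}  # placeholder concept names

# Hypothetical measurement rows; not real All of Us data.
measurements = [
    {"person_id": 1, "concept": "heart_rate", "value": 72},
    {"person_id": 2, "concept": "heart_rate", "value": 80},
    {"person_id": 2, "concept": "body_weight", "value": 70},
    {"person_id": 4, "concept": "blood_pressure", "value": 120},
]


def build_dataset(rows, person_ids, concepts):
    """Keep rows that belong to the cohort and whose concept is in the concept set."""
    return [
        r for r in rows
        if r["person_id"] in person_ids and r["concept"] in concepts
    ]


dataset = build_dataset(measurements, cohort_person_ids, concept_set)
for row in dataset:
    print(row)
```

The cohort decides whose rows are kept and the concept set decides which kinds of rows are kept; the result is the table that would then be handed to an analysis tool.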
What is OMOP?
The All of Us Research Program employs Observational Medical Outcomes Partnership (OMOP) Common Data Model Version 5 infrastructure to ensure feasibility and standardization across all program data types (physical measurements, electronic health records, and participant-provided information). Data coming from disparate sources are standardized (see What do “source” and “standard” mean? below) and stored in a set of formally described tables with defined relationships. This allows data to be accessed and connected in many different ways by researchers. Learn more about the OHDSI OMOP CDM initiative here.
What resources are available for researchers interested in survey data?
Participants in the All of Us Research Program respond to surveys spanning a variety of topics, including demographics, health care, and lifestyle. Each survey has been tested for readability and accessibility through cognitive interviews and quantitative testing. This testing process included populations from different educational backgrounds and geographic locations to capture a sample reflective of the U.S. population. You can preview the survey questions on the Survey Explorer. Previewing the available questions can help you prepare your research questions and approach. The All of Us Researcher Workbench provides researchers with a variety of supportive materials for conducting survey research with the All of Us dataset.
- The All of Us Research Program is very careful to protect the privacy of our participants. We follow privacy and data security rules to ensure the protection of participant data. This includes removing all personally identifiable information (PII) from participant records as well as withholding and/or generalizing data that might be considered “at risk” for participant re-identification. Because these methods affect what data are available for analysis, we provide all registered researchers with multiple sources of documentation on our participant privacy protection methodology, outlining the data removals, transformations, and generalizations made. This information can be found within our “Documentation” category in the User Support Hub (under “Resources for Survey Data Research”).
- Within the Researcher Workbench, three resources are available to help you search for and understand variables of interest: survey codebooks (pdfs), links to the All of Us Registered Tier CDR Data Dictionary (online spreadsheet), and Athena (a searchable database). Athena links survey questions and answers to their corresponding source as well as “standard concept IDs.”
For example, let’s say you want your research to include data that reflects how often the participants in your cohort smoke cigarettes. The “Lifestyle Survey” includes questions about the cigarette smoking habits of participants (e.g., “Do you now smoke cigarettes every day, some days, or not at all?”). If you are interested in including only those participants who smoke every day, you can look up the concept ID (SmokeFrequency_EveryDay) and the standard concept ID (45881677) for that specific answer in our survey codebook. Then, when you are ready to analyze your data, you can make sure to extract the data that carry that concept ID. You could also log in to Athena and search for that information by typing the concept ID in the search bar (make sure to check “PPI” under the vocabulary drop-down menu). Athena provides the concept ID as well as additional contextual information that you might find useful (e.g., it will show that the concept ID is the answer to the question “Do you now smoke cigarettes every day, some days, or not at all?”, which falls under the parent code of “Smoking Frequency”).
- The All of Us Survey Codebook and Frequency Distribution Guide is a featured workspace in the Researcher Workbench that all users can access immediately. This workspace provides detailed instructions on how to extract survey data from the All of Us data repository and visualize the data in both tables and graphs. Researchers can copy this workspace and use it as a template.
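To make the smoking-frequency example above concrete, here is a minimal Python sketch of selecting participants by an answer's standard concept ID. Only the ID 45881677 comes from the text above; the row layout, the field names `person_id` and `answer_concept_id`, the participant IDs, and the second answer ID are placeholders assumed for illustration and may not match the actual survey tables.

```python
# Standard concept ID for the "every day" answer, as given in the example above.
EVERY_DAY_ANSWER_CONCEPT_ID = 45881677

# Hypothetical survey rows; field names and values are assumptions.
survey_rows = [
    {"person_id": 101, "answer_concept_id": 45881677},
    {"person_id": 102, "answer_concept_id": 999001},  # placeholder for another answer
    {"person_id": 103, "answer_concept_id": 45881677},
]

# Collect the participants whose answer matches the concept ID of interest.
daily_smokers = sorted(
    {
        row["person_id"]
        for row in survey_rows
        if row["answer_concept_id"] == EVERY_DAY_ANSWER_CONCEPT_ID
    }
)
print(daily_smokers)
```

Filtering on the concept ID rather than on the answer text avoids depending on the exact wording of the answer.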
Do I need my project reviewed by the All of Us Institutional Review Board (IRB) in order to access this data using the Researcher Workbench?
No. As noted in the All of Us Responsible Conduct of Research training, the Researcher Workbench employs a data passport model, through which authorized users do not need IRB review for each research project. Most authorized users will not be conducting human subjects research with All of Us data for two reasons: (1) The research will not directly involve participants, only their data; and (2) the data available in the Researcher Workbench has been carefully checked and altered to remove identifying information while preserving its scientific utility. Nevertheless, we encourage anyone using All of Us data to apply the ethical principles of research with human participants to their work.
You can read more in the letter confirming the All of Us Institutional Review Board’s regulatory opinion.
Do I need Institutional Review Board (IRB) approval from my own institution in order to access this data through the Researcher Workbench?
Researchers should always check with their local institutional review board to ensure compliance with local requirements for conduct of research. We have provided the template language below as a resource to use for local IRB applications.
“The Registered Tier and Controlled Tier data available on the Research Hub contains data from participants who have consented to be involved in the All of Us Research Program, including data from electronic health records (EHRs), surveys, and physical measurements. All data available to researchers has had direct identifiers removed and has been further modified to minimize re-identification risks. This includes removing all explicit identifiers in both EHRs and participant provided information, all free-text fields, geolocation data smaller than U.S. state level, living situations, race and ethnicity subcategories, active duty military status, cause of death, and diagnosis codes subject to public knowledge. Additionally, the following demographic fields are generalized: race and ethnicity, education, employment, and information regarding sex at birth, gender identity, and sexual orientation. Also, all dates are systematically shifted backwards by a random number between 1 and 365, and data from participants over the age of 89 are removed. The All of Us Research Program data will be accessed for research strictly using the Researcher Workbench (researchallofus.org). External data can be brought into this secure environment; however, researchers are restricted from importing any individually identifiable information and from row-level linkage of the external data. Data searches, cohort building, and analysis will solely take place on the Researcher Workbench, a secure cloud-based resource with statistical analysis software available for use with All of Us data. 
Researchers are granted access to the Researcher Workbench after their affiliated institution signs a Data Use and Registration Agreement, and they create an account, including setting up two-factor authentication, verify their identity through Login.gov or ID.me, complete the All of Us Responsible Conduct of Research training, and sign a Data User Code of Conduct, which prohibits any re-identification of All of Us participants. For more information, please visit researchallofus.org.”
What are my responsibilities when relying on the All of Us Institutional Review Board (IRB)?
You are responsible for adhering to the decisions and review processes of the All of Us IRB just as you would your own local IRB.
Additionally, you must meet the requirements of your institution and must check with your local Human Research Protection Program (HRPP) regarding local submission and reporting requirements.
What do “source” and “standard” mean?
SOURCE – Electronic health record (EHR) data enters our system with terms and codes for conditions, drugs, and procedures using “source vocabularies”. Source vocabularies are the original methods of classifying conditions, diagnoses, and procedures (e.g., ICD9 and ICD10CM codes) and are “mapped” to the new standard vocabularies. However, the source vocabularies are retained after the mapping, and data can still be searched using the original terminology or codes.
STANDARD – Translation of clinical findings, symptoms, diagnoses, procedures, etc. from traditional methods of coding and classification into what is referred to as a “standard vocabulary” allows EHRs to be more readily categorized and searched. Examples of standard vocabularies include SNOMED, LOINC, and RxNorm.
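The idea of mapping source codes to a standard vocabulary while retaining the source codes can be sketched as follows. The mapping table, the record layout, and the `standardize` helper are simplified assumptions for illustration; they should not be read as the program's actual vocabulary mappings or table structure.

```python
# Simplified, illustrative mapping:
# (source vocabulary, source code) -> (standard vocabulary, standard code).
SOURCE_TO_STANDARD = {
    ("ICD9CM", "250.00"): ("SNOMED", "44054006"),
    ("ICD10CM", "E11.9"): ("SNOMED", "44054006"),
}


def standardize(record):
    """Return a copy of the record with standard fields added; source fields are kept."""
    key = (record["source_vocabulary"], record["source_code"])
    std_vocab, std_code = SOURCE_TO_STANDARD[key]
    return {**record, "standard_vocabulary": std_vocab, "standard_code": std_code}


# Hypothetical EHR rows coded in two different source vocabularies.
ehr_rows = [
    {"person_id": 1, "source_vocabulary": "ICD9CM", "source_code": "250.00"},
    {"person_id": 2, "source_vocabulary": "ICD10CM", "source_code": "E11.9"},
]
standardized = [standardize(r) for r in ehr_rows]

# Searching by the standard code finds rows from both source vocabularies.
by_standard = [r["person_id"] for r in standardized if r["standard_code"] == "44054006"]
# Searching by the original source code still works after mapping.
by_source = [r["person_id"] for r in standardized if r["source_code"] == "E11.9"]
print(by_standard, by_source)
```

In this sketch, two rows recorded under different source vocabularies become searchable together through one standard code, while each row can still be found by its original code.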