Frequently Asked Questions

Everything you need to know about the Research Hub. Choose a topic to learn more.

Researcher FAQs

How is the All of Us Research Program different from other longitudinal cohort studies?

Unlike many research studies that focus on a specific disease or population, the All of Us Research Program will provide a national research resource to inform thousands of research questions, covering a wide variety of health conditions. A diverse cohort of 1 million or more participants will contribute data from electronic health records (EHRs), biospecimens, surveys, and other measures to build a comprehensive set of biological, environmental, and behavioral data. The data platform will be open to researchers all over the world.

What is the composition of the All of Us cohort?

All of Us aims to engage a cohort of 1 million or more participants that reflects the rich diversity of America, including populations that have historically been underrepresented in biomedical research. The depth and breadth of data captured from this large, diverse cohort will enable research on a range of health topics and conditions, including common and uncommon conditions.

The cohort is large and growing, with participants from all 50 states. Of our participants who have completed all of the steps of the initial protocol, more than 75% are from underrepresented populations, including 50% from racial and ethnic minority groups. The program is currently enrolling pregnant women and women who become pregnant during the study. We have plans to include children and other special populations in the future.

For more information about the All of Us participant community, visit our Data Browser or Data Snapshots.

How does All of Us assess diversity? What communities does All of Us consider “underrepresented in biomedical research”?

All of Us is committed to engaging a cohort that is demographically, geographically, and medically diverse.

Specifically, these are the populations the program considers underrepresented in biomedical research, across different diversity categories:

  • Ancestry (Race and Ethnicity):
    • Race: People who select a single race other than white (e.g., Asian), or who select more than one race
    • Ethnicity: People who select an ethnicity other than those listed under the race of white (e.g., Japanese)
  • Age: Young people under 18 years old and older adults 65 and above
  • Sex: People who self-report intersex as their sex at birth
  • Sexual and Gender Minorities:
    • Sexual orientation: People who select any additional sexual orientation choice other than straight (e.g., gay, lesbian, bisexual, queer, asexual, etc.)
    • Gender identity: People who select any additional gender identity choice other than man or woman (e.g., non-binary, transgender, genderfluid, questioning, etc.)
  • Income: People who earn less than $25,000 a year
  • Educational attainment: People without a high school diploma or G.E.D.
  • Access to care: People who currently need a medical visit, or have needed one in the past 12 months, but cannot readily use the health care system or pay for needed care
  • Geography: Residents of established rural and non-metropolitan zip codes, based on U.S. Census data
  • Disability: People with a physical or cognitive disability, as defined by the Americans with Disabilities Act

Will the All of Us cohort offer a representative sample of U.S. citizens?

No. The All of Us participant community will reflect the diversity of the United States, but cannot be described as a representative sample. Participants are not recruited via probability sampling; the research program is open to all.

How are participants recruited, and what does participation entail?

Many participants are invited to enroll by one of our partner health care provider organizations, which include large academic medical centers, VA medical centers, and community health centers across the country. Participants can also enroll directly through our website, JoinAllofUs.org, or at certain All of Us events.

All of Us participants are able to share different kinds of information by completing surveys, providing access to their electronic health records, and syncing Fitbit devices within the All of Us participant portal. Some participants are invited to visit partner sites to have physical measurements and blood and urine samples taken. The program will stay in touch with participants over time about new opportunities to share data, through additional surveys, new research studies, and new electronic tools, including apps.

What data is available for analysis?

The Research Hub will house an array of data collected by the All of Us Research Program. Data types expected in our 2019 release include survey responses provided by participants, physical measurements, and electronic health records.

Current survey subjects include basic demographics, overall health, and lifestyle. Physical measurements include blood pressure, heart rate, height, weight, waist circumference, and hip circumference.

How are you gathering and curating information from electronic health records?

The All of Us Research Program employs Observational Medical Outcomes Partnership (OMOP) Common Data Model Version 5 infrastructure to ensure feasibility and standardization across EHR data for researchers. The All of Us data set comprises EHR data from 14 OMOP tables: Person, Visit Occurrence, Condition Occurrence, Drug Exposure, Measurement, Procedure Occurrence, Observation, Location, Provider, Device Exposure, Death, Care Site, Fact Relationship, and Specimen.

Within the context of the Research Hub, EHR data will be presented at the highest level of granularity, which is EHR Domain. Domains include Demographics, Conditions, Procedures, Drugs, Measurements, and Visits.
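For readers unfamiliar with OMOP, the sketch below shows how two of the tables named above relate to each other. It is a toy illustration only: the table and column names (person_id, condition_concept_id) follow the public OMOP Common Data Model, but the rows, the concept IDs 1001 and 2002, and the in-memory SQLite database are made up for this example and do not reflect how All of Us stores or serves data.

```python
import sqlite3

# Toy in-memory database; all rows and concept IDs are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        year_of_birth INTEGER
    );
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(person_id),
        condition_concept_id INTEGER
    );
    INSERT INTO person VALUES (1, 1960), (2, 1985), (3, 1972);
    INSERT INTO condition_occurrence VALUES
        (10, 1, 1001), (11, 1, 1001), (12, 3, 1001), (13, 2, 2002);
""")

# Count distinct people with at least one record of the hypothetical
# concept 1001. Person 1 has two records but is counted once.
row = conn.execute(
    "SELECT COUNT(DISTINCT person_id) "
    "FROM condition_occurrence WHERE condition_concept_id = ?",
    (1001,),
).fetchone()
participants_with_condition = row[0]
print(participants_with_condition)
```

Each clinical table links back to Person through person_id, which is why counts in a domain are counts of people rather than counts of records.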

What additional data will the program add in the future?

The breadth of data types collected continues to expand. In the near future, All of Us will begin analyzing biological and genomic assays on participants’ biospecimens. Upcoming surveys may address physical activity, diet, medications, environmental exposures, and more. Participants will also be able to contribute data from additional fitness trackers, mobile apps, and other digital health technology.

Are you collaborating with other cohort programs?

Yes. Our advisory panel has included representatives from large cohort studies in the United States and abroad, and All of Us leadership meets regularly with many U.S. cohorts as well as an international consortium of large cohort programs to share best practices.

Will there be funding opportunities?

The National Institutes of Health (NIH) may issue funding announcements in the future to support research studies using All of Us data. For updates, visit AllofUs.nih.gov and subscribe.

To learn more about NIH funding opportunities generally, visit https://grants.nih.gov/grants/oer.htm

What is the All of Us Research Program’s scientific vision? Are there specific priority areas?

The program aims to enable research that will:

  • Increase wellness and resilience, and promote healthy living
  • Reduce health disparities and improve health equity in populations that are historically underrepresented in research
  • Develop improved risk assessment and prevention strategies to preempt disease
  • Provide earlier and more accurate diagnosis to decrease illness burden
  • Improve health outcomes and reduce disease burden through improved treatment and development of precision interventions

What tools will be available for me to analyze the data?

The Research Hub’s Workbench features several tools to support data analysis:

  • Workspace: A Workspace is the place to store and analyze data for a specific project. Each Workspace has a dedicated space for file storage that can be shared with other users, allowing view-only or edit access.
  • Cohort Builder: Within the Workspace, the Cohort Builder’s guided, point-and-click interface allows researchers to create, review, and annotate cohorts.
  • Data Set Selector: The Data Set Selector provides users with the ability to select specific medical concepts and variables to build a data set for analysis.
  • Notebooks: Through the built-in application Jupyter Notebook, users can perform comprehensive analyses on cohorts and data sets using programming languages R or Python. Teams of researchers with various areas of expertise can work together on data cleaning and transformation, statistical modeling, machine learning, and more.

We offer training materials and Help Desk support for researchers who need assistance using these tools.

Additional tools may be added over time.

How can I give feedback about the program or make suggestions?

For more information or to offer general feedback on the All of Us Research Program, email help@JoinAllofUs.org. To make specific suggestions about the Research Hub, log in and click the Help Desk tab to open a ticket. Select “Feedback,” and provide your name, email address, the tool about which you’re providing feedback, and your suggestion.

Survey Explorer FAQs

What is the Survey Explorer?

The Survey Explorer is a tool that allows you to browse the questions that the All of Us Program surveys ask and to see the source information for each of these questions.

How can I view full surveys?

Click the links below each survey title to view the full survey. Surveys are available in both English and Spanish.

What is source information?

Most survey questions used in the All of Us Program were sourced from other validated survey instruments. When you click “Explore Source Information,” you can click through each survey question to see where the question was originally used, a description of the source survey, the source year, and the source URL.

How does the program choose whether to create a question from scratch or use one from an existing survey?

For each survey topic, a task force of experts works together to create the survey. They start with questions that have already been used in other surveys (source instruments), such as from the National Health Interview Survey developed by the Centers for Disease Control and Prevention. If there are no publicly available survey questions that address the topic of interest, then the task force will create their own.

Data Browser FAQs

What is the purpose of the Data Browser?

The Data Browser is an interactive tool that allows you to learn more about the data collected as part of the All of Us Research Program. You can explore the survey questions and answers and physical measurements taken at the time of participant enrollment. You can also learn more about the electronic health record (EHR) data. The Data Browser will allow you to see how many of the All of Us participants have certain conditions, survey responses, demographics, and more.

The Data Browser was built with researchers in mind but also provides value to other users, including program participants, funders, the media and other stakeholders. Researchers may find information that allows them to develop hypotheses or assess the feasibility of the data set for their studies. Participants might be interested in comparing their survey responses with those of the group or exploring how many other participants have diseases relevant to themselves or a family member. Finally, the media, funders and other stakeholders might be interested in learning about the participant group as a whole, including exploring the prevalence of specific conditions or drug exposures or learning about response rates for the surveys.

How does the Data Browser protect participant privacy?

Participant privacy is protected in multiple ways. Personally identifiable information (PII) is any data that could potentially identify a specific individual. All PII, such as names and addresses, is removed from participant records made available to the public and researchers. In addition, all participant counts are rounded up in increments of 20. For example, if only 8 participants have a particular medical condition, the count will be displayed as ≤ 20.

It is not possible to view individual data records on the Data Browser. The Data Browser shows aggregate data for groups of de-identified participants.

All of Us program data is stored on a secure, encrypted platform that receives routine updates.  

How does the Data Browser search electronic health record (EHR) data?

When enrolling in the All of Us Research Program, participants can consent to provide the program with access to their electronic health record (EHR) data. When a participant consents, the enrolling Health Provider Organization submits the EHR to the Data and Research Center. The Data Browser uses keywords to retrieve EHR information from the Data and Research Center, including diagnoses, procedures, medications, and measurements.

Why do the counts in the Data Browser differ from the current number of participants?

There may be a delay of several months between the time a participant consents and the time their record is included in the All of Us data that is available in the Data Browser. The delay is a result of the time it takes for participant data to be collected, transferred to the Data and Research Center and curated. As a result, the overall participant counts within the Data Browser are lower than the overall enrollment numbers for the program.

Why do the counts in the Data Browser differ from the counts on the Data Snapshots dashboard?

Data Browser counts may differ from Data Snapshot counts due to a delay of several months between the time a participant consents and the time their record is included in the All of Us data that is visible in the Data Browser. The delay is a result of the time it takes for participant data to be collected, transferred to the Data and Research Center and curated.

Why do the total counts in the Sex Assigned at Birth, Age, Sources and Values graphs differ from the total participants count?

One of the steps All of Us takes to protect participant privacy in the Data Browser is to round all participant counts up to the next multiple of 20. This is especially important for medical concepts, survey answers and demographic breakdowns that have relatively few participants. For example, participant counts of 0 – 20 are all rounded to 20, a participant count of 426 is displayed as 440, and so on. Because each count is rounded up separately, the counts on the Sex Assigned at Birth, Age, Sources and Values graphs may add up to more than the total participants count.
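The effect can be seen in a small sketch. It assumes, based on the examples in this answer (0 – 20 shown as 20, 426 shown as 440), that each count is rounded up to a multiple of 20 with a minimum of 20. The function name and the three-group breakdown are hypothetical and are not taken from the Data Browser itself.

```python
import math

BIN_SIZE = 20  # rounding increment described in this FAQ

def displayed_count(true_count: int) -> int:
    """Round a count up to a multiple of 20, never showing less than 20."""
    return max(BIN_SIZE, math.ceil(true_count / BIN_SIZE) * BIN_SIZE)

# Hypothetical breakdown of 426 participants into three groups.
breakdown = {"group_a": 250, "group_b": 170, "group_c": 6}

displayed_total = displayed_count(sum(breakdown.values()))
displayed_parts = {name: displayed_count(n) for name, n in breakdown.items()}
sum_of_parts = sum(displayed_parts.values())

print(displayed_total)   # total after rounding
print(displayed_parts)   # each group after rounding
print(sum_of_parts)      # sum of the rounded groups
```

Because every group is rounded up on its own, the rounded groups in this made-up example sum to more than the rounded total, which is the discrepancy described above.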

Where does the data come from?

The data in the All of Us Data Browser comes from participant electronic health records and from survey answers and physical measurements taken at the time the participant enrolls in the All of Us program.

Have participants consented to share this data?

Yes, participants have consented to share the data found in the Data Browser.

What are medical concepts?

Medical concepts are similar to medical terms; they describe information in a patient’s medical record, such as a condition they have, a doctor’s diagnosis, a prescription they are taking, or a procedure or measurement the doctor performed. In the Data Browser we refer to conditions, procedures, drugs and measurements as electronic health record (EHR) domains. For example, a patient’s weight (measurement) is often taken during a routine medical examination (procedure) or a patient may be diagnosed with type II diabetes (condition) and prescribed metformin (drug) to treat the condition.

What are vocabularies?

A patient’s electronic health record (EHR) may contain medical information that means the same thing but has been recorded in different ways. For example, the condition type II diabetes may be recorded as ICD9 code 250.00 at one doctor’s office and as ICD10 code E11 at another. When All of Us receives a participant’s EHR, all of the codes (called source codes) are re-assigned a standard vocabulary code (e.g., SNOMED 44054006 for type II diabetes). By changing, or mapping, all of the source codes to standard codes, the EHR can be more easily categorized and searched by researchers.
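As a rough sketch of what mapping means, the example below uses only the three codes mentioned in this answer. The lookup table, function name, and record layout are hypothetical; the program's actual mapping process is not described here and is certainly more involved than a single dictionary.

```python
# (source vocabulary, source code) -> (standard vocabulary, standard code).
# Only the example codes from this FAQ are included.
SOURCE_TO_STANDARD = {
    ("ICD9", "250.00"): ("SNOMED", "44054006"),
    ("ICD10", "E11"): ("SNOMED", "44054006"),
}

def map_record(record: dict) -> dict:
    """Attach a standard code to a record while keeping its source code."""
    key = (record["vocabulary"], record["code"])
    standard = SOURCE_TO_STANDARD.get(key)  # None if no mapping is known
    return {**record, "standard": standard}

# Two hypothetical records from two different doctors' offices.
records = [
    {"office": "office_1", "vocabulary": "ICD9", "code": "250.00"},
    {"office": "office_2", "vocabulary": "ICD10", "code": "E11"},
]

mapped = [map_record(r) for r in records]
standard_codes = {m["standard"] for m in mapped}
print(standard_codes)
```

Both records end up with the same standard code, so a single search on that code finds them both, while the original source code stays on each record.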

What do “source” and “standard” mean?

SOURCE – Electronic health record (EHR) data enters our system with terms and codes for conditions, drugs and procedures from “source vocabularies.” Source vocabularies are the original methods of classifying conditions, diagnoses and procedures (e.g., ICD9 and ICD10CM codes), and they are “mapped” to the standard vocabularies. However, the source vocabularies are retained after the mapping, and data can still be searched using the original terminology or codes.

STANDARD – Translating clinical findings, symptoms, diagnoses, procedures, etc. from traditional methods of coding and classification into what is referred to as a “standard vocabulary” allows EHRs to be more readily categorized and searched. Examples of standard vocabularies include SNOMED, LOINC, and RxNorm.

How often is the data updated?

Data is updated periodically.

How do I access the research data?

If you are a physician, graduate student, researcher, data scientist or otherwise have a need for building research cohorts and analyzing data, you will soon be able to apply for access to the All of Us Researcher Workbench.

Why can’t I see cross tabulations?

Cross tabulation is a method to analyze the relationship between two or more variables. For example, a researcher may want to know if people who smoke cigarettes (variable 1) are more likely to have lung cancer (variable 2) than those who do not smoke. The Data Browser allows you to explore participant data, but it does not support cross tabulation. This restriction helps protect participant privacy and prevent stigmatization of a specific group of people.
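To make the term concrete, here is a minimal sketch of a cross tabulation of two variables. The records are invented for illustration and are not All of Us data, and the function name is hypothetical. The Data Browser does not offer this kind of analysis.

```python
from collections import Counter

# Made-up records for illustration only; not All of Us data.
records = [
    {"smokes": True, "lung_cancer": True},
    {"smokes": True, "lung_cancer": False},
    {"smokes": True, "lung_cancer": False},
    {"smokes": False, "lung_cancer": False},
    {"smokes": False, "lung_cancer": False},
    {"smokes": False, "lung_cancer": True},
]

def cross_tabulate(rows, var1, var2):
    """Count how many rows fall into each (var1, var2) combination."""
    return Counter((row[var1], row[var2]) for row in rows)

table = cross_tabulate(records, "smokes", "lung_cancer")
for (smokes, cancer), n in sorted(table.items()):
    print(f"smokes={smokes!s:5} lung_cancer={cancer!s:5} count={n}")
```

Each cell of the table is a count for one combination of the two variables. Small cells like these are what could single out individuals or a specific group, which is why the Data Browser shows each variable only on its own.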

What is SNOMED?

SNOMED stands for Systematized Nomenclature of Medicine. SNOMED connects the various terminology, medical codes, synonyms and definitions used among different electronic health record (EHR) systems. For example, one EHR system might use ICD9 codes while another uses ICD10 codes. SNOMED allows the same data point from multiple EHR systems to be matched up.

What is LOINC?

LOINC stands for Logical Observation Identifiers Names and Codes. LOINC is used by health provider organizations to code laboratory test orders and results. For example, 2345-7 is the code used for the amount of glucose measured in your blood during a blood test.

What are ICD codes?

ICD stands for International Classification of Diseases. ICD codes are used in the United States to classify diseases, illnesses or injuries. There are various revisions of the codes, including ICD9 (Ninth Revision) and ICD10 (Tenth Revision).

What are CPT codes?

CPT stands for Current Procedural Terminology. CPT codes are a list of descriptive terms and identifying numeric codes used by physicians and healthcare professionals for billing of medical services and procedures.

What is RxNorm?

RxNorm is a naming system for all medications available in the U.S. market. The name of each drug is a compilation of its active ingredients, strength and form. Each combination, therefore, has a unique RxNorm name.