The first blog in our “Intro to Data” series introduced the idea of “big data” encompassing volume, velocity, and variety. Our second blog provided some examples of big data applications within the Military Health System (MHS). In this third blog, we’re going to talk about the difference between data, information, knowledge, and wisdom, illustrating these concepts with some examples from our work at the Psychological Health Center of Excellence (PHCoE).
Data are just bits and pieces of concepts representing entities or ideas. Data could be a list of books, dates entered on a piece of paper, or a collection of numbers. Raw data combined in relation to other raw data become information. Perhaps the list of dates combined with the list of numbers represents a person’s blood pressure at various times. Knowledge emerges when meaning is added to the information. Was the person taking blood pressure medication, and if so, how did that affect the blood pressure over time? Wisdom, sitting atop data, information, and knowledge in the “DIKW pyramid,” is the ability to add insight to knowledge: to analyze information, to see beyond a mere collection of facts, and to make use of it in meaningful ways. Wisdom leads to better-informed decisions and often leads to further questions.
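To make the first three levels concrete, here is a minimal Python sketch of the blood pressure example. Every value in it is invented for illustration, including the dates, the readings, and the medication start date; none of it comes from a real record.

```python
from datetime import date
from statistics import mean

# Data: two lists that mean little on their own (all values are made up).
dates = [date(2016, 1, 4), date(2016, 1, 18), date(2016, 2, 1), date(2016, 2, 15)]
systolic = [150, 148, 136, 130]

# Information: relate the two lists, so each date has a blood pressure reading.
readings = list(zip(dates, systolic))

# Knowledge: add context. Here we assume a hypothetical medication start date
# and compare the average reading before and after it.
medication_start = date(2016, 1, 25)
before = [bp for d, bp in readings if d < medication_start]
after = [bp for d, bp in readings if d >= medication_start]

mean_before = mean(before)
mean_after = mean(after)
print(f"Mean systolic before medication: {mean_before}")
print(f"Mean systolic after medication:  {mean_after}")
```

The code stops at knowledge. Deciding whether the drop is clinically meaningful, or whether something else changed at the same time, is where wisdom comes in.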
The MHS Data Repository (MDR) is a data warehouse that stores data collected at health care encounters and entered into electronic medical records. Within MDR, there are numerous datasets. Each dataset contains a discrete collection of specific pieces of information. There are separate datasets for inpatient encounters versus outpatient encounters, direct care versus purchased care, and pharmacy and enrollment information. These datasets are all just information. Knowledge can be gained by summarizing the data in order to answer specific questions.
A simple question could be: “How many patients did provider X see in 2016?” Or: “How many patients received Y procedure?” More complicated questions require programming effort to abstract the appropriate data fields from the various datasets, and perhaps to create new data points (variables) based on definitions the analyst specifies. For example, let’s think about how we might determine the prevalence of PTSD in the MHS over the past five years. The MDR datasets containing diagnosis information include multiple variables associated with the diagnoses coded at each encounter. Summarizing diagnosis information from these datasets requires deciding how many diagnosis variables to include. Is the first-position diagnosis variable sufficient to identify patients with PTSD? Or should we include diagnoses from positions 1 and 2, or more? Also, which diagnosis codes do we want to use to define a visit related to PTSD? Do we want to know about prevalence of acute versus chronic PTSD separately? Determining the change in prevalence of a certain diagnosis over time is a more complicated query because it requires defining the methodology to use for abstracting diagnosis codes.
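A short Python sketch shows how much the answer can depend on these choices. It is not how MDR is actually queried. The record layout, the patient identifiers, the enrollment counts, and the function names are all hypothetical, and the code list (ICD-9 309.81 plus ICD-10 F43.10, F43.11, and F43.12) is one possible case definition rather than an official one.

```python
from collections import defaultdict

# One possible PTSD case definition (an assumption for this sketch):
# ICD-9 309.81 and ICD-10 F43.10 (unspecified), F43.11 (acute), F43.12 (chronic).
PTSD_CODES = {"309.81", "F43.10", "F43.11", "F43.12"}

# Hypothetical encounters: (patient_id, year, diagnosis codes in coded order).
ENCOUNTERS = [
    ("A", 2015, ["309.81", "311"]),
    ("B", 2015, ["311", "309.81"]),
    ("A", 2016, ["F43.12"]),
    ("C", 2016, ["F32.9", "F41.1", "F43.10"]),
    ("D", 2016, ["F41.1"]),
]

# Hypothetical number of enrolled beneficiaries per year (the denominator).
ENROLLED = {2015: 4, 2016: 5}


def ptsd_patients_by_year(encounters, max_position):
    """Collect distinct patients with a PTSD code in diagnosis positions 1..max_position."""
    patients = defaultdict(set)
    for patient_id, year, dx_codes in encounters:
        if PTSD_CODES.intersection(dx_codes[:max_position]):
            patients[year].add(patient_id)
    return patients


def prevalence_by_year(encounters, enrolled, max_position):
    """Share of enrolled beneficiaries with at least one qualifying encounter, per year."""
    patients = ptsd_patients_by_year(encounters, max_position)
    return {year: len(patients[year]) / n for year, n in sorted(enrolled.items())}


for positions in (1, 2, 3):
    print(positions, prevalence_by_year(ENCOUNTERS, ENROLLED, positions))
```

With these made-up records, widening the search from the first diagnosis position to the first three raises the estimate in both years. The methodology therefore has to be fixed before any trend over time can be interpreted.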
When we ask questions of the data, the best possible answers are still summary numbers – usually frequencies or counts and percentages. From this summary format, it takes wisdom to interpret the results, to discern their validity, to assign meaning to them, and to suggest further inquiry. This wisdom stems from knowledge of the program or practice that feeds the data into the system, as well as knowledge of external factors that could influence the quality of the data and how well they represent the underlying truth.
Questions that we frequently ask ourselves and our stakeholders are: Does a particular variable have the same meaning over time? Do different providers assign different codes to the same event/visit/procedure? Was the definition used to categorize an event correct? How many variables have missing values and how does that impact our confidence in the results? What assumptions were made about the data? Were correct statistical techniques used in the analysis and were the underlying assumptions of the statistical tests met? For best results, analysts, subject matter experts, and other key stakeholders need to work closely together so that all these considerations are taken into account.
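Some of these checks are easy to automate. As one example, the Python sketch below reports the share of missing values for each variable. The records and field names are hypothetical, and treating both None and an empty string as missing is an assumption that would need to be confirmed for any real dataset.

```python
# Hypothetical encounter records; None or "" is treated as a missing value.
RECORDS = [
    {"patient_id": "A", "dx1": "F43.10", "provider_id": "P1"},
    {"patient_id": "B", "dx1": None, "provider_id": "P2"},
    {"patient_id": "C", "dx1": "F41.1", "provider_id": ""},
    {"patient_id": "D", "dx1": "", "provider_id": "P1"},
]


def missing_share(records):
    """Return the fraction of records with a missing value, for each field."""
    fields = records[0].keys()
    total = len(records)
    return {
        field: sum(1 for r in records if r.get(field) in (None, "")) / total
        for field in fields
    }


for field, share in missing_share(RECORDS).items():
    print(f"{field}: {share:.0%} missing")
```

A count like this only flags where to look. Whether the missing values undermine confidence in a result is still a judgment for the analysts and subject matter experts.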
In summary, within the MDR (and other MHS administrative data portals), there’s a vast collection of data, but it’s what we do with the data – which questions we choose to answer and the context we add to those answers – that allows us to move from many bits of information to evidence-based program planning and evaluation. At PHCoE, we’re excited to dive into the data to explore some of these relevant topics:
Mental health outcomes (e.g., incidence and prevalence) among occupational sub-groups
Staffing, utilization, and workload among mental health care providers at mental health outpatient clinics
Follow-up mental health care post-hospitalization for specific mental health conditions
Pathways of care for transgender beneficiaries
Cost of care for survivors of sexual assault
What else would be useful for you to know at the DoD population level that MHS administrative data could help answer? Questions and comments are welcome below, and could spark ideas for future blog or website posts.
Ruth Quah is a data analyst with the Psychological Health Performance and Analytics team at the Psychological Health Center of Excellence. She has a Master of Public Health degree in biometry.
The views expressed in Clinician's Corner blogs are solely those of the author and do not necessarily reflect the opinion of the Psychological Health Center of Excellence or Department of Defense.