-
The CDC’s Behavioral Risk Factor Surveillance System (BRFSS) collects ongoing, large-scale telephone survey data from adults in all 50 states and territories. Respondents self-report information on health-related behaviors, chronic conditions, and preventive care. This national survey is the original source of all heart-health data used in the project.
-
In 2017, a subset of BRFSS data is extracted specifically for Alabama. This subset includes selected variables related to heart disease risk (such as chronic conditions, behaviors, and use of preventive services) and is exported from the BRFSS system into a CSV file.
-
At the new site, the CSV file is imported into the SQL database. During this process, character encoding issues cause some rows to be unreadable by the system, so those rows fail to import. The resulting SQL table contains fewer records than the original CSV, but this mismatch is not immediately documented, creating a hidden data quality problem.
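The kind of encoding failure described here can be caught before import by scanning the raw file for rows that do not decode cleanly. The following is a minimal sketch, not the method used in the project: it assumes the database expects UTF-8, and the sample bytes, column names, and the function name `find_undecodable_rows` are all hypothetical.

```python
def find_undecodable_rows(raw: bytes, encoding: str = "utf-8"):
    """Return (line_number, raw_line) pairs for lines that fail to decode.

    Assumes one record per physical line; CSV fields containing embedded
    newlines would need a more careful parser.
    """
    bad = []
    for line_number, raw_line in enumerate(raw.splitlines(), start=1):
        try:
            raw_line.decode(encoding)
        except UnicodeDecodeError:
            bad.append((line_number, raw_line))
    return bad


# Hypothetical sample: a header, a clean row, and a row containing a
# Latin-1 byte (0xE9) that is not valid UTF-8 in this position.
sample = (
    b"id,county,condition\n"
    b"1,Mobile,diabetes\n"
    b"2,Caf\xe9 County,hypertension\n"
)

bad_rows = find_undecodable_rows(sample)
for line_number, raw_line in bad_rows:
    print(f"line {line_number} is not valid utf-8: {raw_line!r}")
```

Logging the flagged line numbers alongside the import would document the mismatch at the moment it occurs instead of leaving it hidden.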
-
In June 2018, a newly hired internal data analyst is tasked with re-reviewing the “master” dataset to identify the top three heart-health concerns for the new facility. While comparing data sources, the analyst discovers that the record count in SQL does not match the original CSV. This discovery reveals that data was lost during migration and prompts the need for a detailed review of data provenance and management practices.
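The analyst's comparison amounts to a record-count reconciliation between the source file and the loaded table. A rough sketch of that check follows, using an in-memory SQLite database as a stand-in for the organization's SQL system, which the text does not name. The table name `brfss_al`, the columns, and the sample rows are illustrative assumptions, not project data.

```python
import csv
import io
import sqlite3


def count_csv_records(text: str) -> int:
    """Count data rows in CSV text, skipping the header and blank lines."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    return sum(1 for row in reader if row)


def count_table_records(conn: sqlite3.Connection, table: str) -> int:
    """Count rows in a table. The name is interpolated, so pass trusted names only."""
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


# Hypothetical source file with three records.
csv_text = "id,condition\n1,diabetes\n2,hypertension\n3,stroke\n"

# Hypothetical target table in which record 2 failed to import.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE brfss_al (id INTEGER, condition TEXT)")
conn.executemany(
    "INSERT INTO brfss_al VALUES (?, ?)",
    [(1, "diabetes"), (3, "stroke")],
)

csv_count = count_csv_records(csv_text)
sql_count = count_table_records(conn, "brfss_al")
missing = csv_count - sql_count
print(f"CSV records: {csv_count}, SQL records: {sql_count}, missing: {missing}")
```

A count check only shows that records are missing. Identifying which ones requires comparing a key column between the two sources.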
-
An IT system administrator copies the CSV file and codebook from the secure hard drive onto a flash drive. The flash drive is physically transported to the new facility, where it is used as the source media for loading the data into the SQL database.
-
As the organization expands, it receives funding to implement a new SQL database to manage and analyze data for the new facility. Leadership decides that the existing BRFSS Alabama CSV will be migrated into this SQL environment so it can serve as a central data source.