-
Source: CDC's Behavioral Risk Factor Surveillance System (BRFSS)
Method: Health-related telephone surveys across all 50 states, DC, and 3 US territories
Scope: 400,000+ adult interviews annually
Data Type: Quantitative survey data (health behaviors, chronic conditions, preventive services)
Personnel: CDC survey administrators and data collectors -
Reason: Business expansion to new location
Method: Data and codebook copied to flash drive
Transport: Physical delivery to new facility
Security: Data physically transported by IT system administrator
Personnel: IT system administrator
Provenance Impact: Data moved through multiple physical locations and handlers -
Target System: New SQL database at expanded location
Funding: New location received funding for SQL database infrastructure
Action: Immediate import of CSV file into SQL database
Format Change: CSV → SQL database tables
Personnel: IT system administrator and database team -
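The CSV-to-SQL import step above could be paired with a row-count check at load time. The sketch below is a minimal illustration, not the procedure actually used: it assumes SQLite (via Python's standard library) as a stand-in for the unnamed SQL database, and the table name `brfss_subset` and the function `import_csv` are hypothetical.

```python
import csv
import io
import sqlite3


def import_csv(conn, csv_text, table="brfss_subset"):
    """Load CSV text into a new table; return (rows_in_csv, rows_in_table).

    Assumption: the first CSV line is a header and every row has the
    same number of fields as the header.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)

    # Quote identifiers so header names with unusual characters are accepted.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()

    # Count what actually landed in the table, for comparison with the source.
    loaded = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    return len(rows), loaded
```

Comparing the two returned counts immediately after import would flag a shortfall before any analysis begins.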
Action: Data retrieved from BRFSS database
Format: Converted to CSV format
Storage: Saved on secure hard drive in locked room
Subset: Alabama data with sampled risk factor columns
Analysis: Initial review of medical care access issues in Alabama
Personnel: Original facility data analysts -
Problem: Character encoding incompatibility
Cause: CSV file contained characters in an encoding the SQL database could not read
Result: Some data rows failed to import properly
Impact: Imported row count did not match the row count of the original CSV file
Data Quality: Validity compromised due to incomplete dataset
Discovery: Issue not immediately detected -
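One way to locate rows affected by an encoding problem like the one described above is to decode the raw file line by line and record the lines that fail. This is a sketch under assumptions: the target encoding is taken to be UTF-8 (the notes do not name it), the function name `find_undecodable_lines` is hypothetical, and line numbers will not map one-to-one to records if quoted fields contain embedded newlines.

```python
def find_undecodable_lines(raw, encoding="utf-8"):
    """Return 1-based numbers of lines in `raw` (bytes) that fail to decode.

    Assumption: `encoding` is what the database expects; UTF-8 is a guess.
    """
    bad = []
    for number, line in enumerate(raw.splitlines(), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError:
            bad.append(number)
    return bad
```

For example, a line holding the Latin-1 byte `0xE9` (for "é") is not valid UTF-8, so it would be reported, while the same file checked with `encoding="latin-1"` would report nothing.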
Status: Missing data identified, seeking external consultation
Challenge: Insufficient data for complete analysis
Need: Identify and recover missing information
Stakeholders: Client, internal analyst, external consultant (you)
Next Steps: Data validation, gap analysis, and potential re-extraction from source -
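The gap analysis listed under Next Steps could be approached by comparing record identifiers in the original CSV against those present in the database. The sketch below assumes each row carries a unique key; the column name `record_id` and the function `missing_rows` are hypothetical and would need to be replaced with whatever identifier the extract actually contains.

```python
import csv
import io


def missing_rows(csv_text, imported_keys, key_column="record_id"):
    """Return CSV rows whose key is absent from the imported key set.

    Assumption: `key_column` uniquely identifies each row in the CSV.
    """
    imported = set(imported_keys)
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[key_column] not in imported]
```

The returned rows are the candidates for re-import or re-extraction from the source.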
Personnel: Newly hired data analyst
Task: Re-review the master dataset to identify the top 3 health concerns affecting heart health
Objective: Determine focus areas for new facility operations
Discovery: Data analyst identified discrepancy between original and imported data