-
Period: to
Phase One
Context and Scope
-Data familiarization
-Define objectives
-Data validation -
Data Familiarization
Load the dataset into a data analysis framework. Explore the data to understand the variable metadata. Construct a variable dependency graph. -
Define objectives
Establish measurement criteria for progress and success. -
Data validation
Implement error handling for missing or indeterminate values, then validate and correct data anomalies. -
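A minimal sketch of the validation step, using only the Python standard library. It assumes that "don't know" and "refused" answers are stored as sentinel codes such as 7777 and 9999; the real codes for each variable should be taken from the dataset's codebook. The names SENTINELS, clean_value, and clean_record are hypothetical.

```python
from typing import Dict, Optional

# Assumed sentinel codes per variable; confirm against the dataset's codebook.
SENTINELS = {
    "WEIGHT2": {7777, 9999},
    "HEIGHT3": {7777, 9999},
}


def clean_value(variable: str, raw: Optional[str]) -> Optional[int]:
    """Return the field as an int, or None if it is missing or indeterminate."""
    if raw is None or raw.strip() == "":
        return None  # blank field
    try:
        value = int(float(raw))
    except (ValueError, OverflowError):
        return None  # non-numeric text, NaN, or infinity is treated as missing
    if value in SENTINELS.get(variable, set()):
        return None  # "don't know" / "refused" style codes
    return value


def clean_record(record: Dict[str, Optional[str]]) -> Dict[str, Optional[int]]:
    """Apply clean_value to every field of one survey record."""
    return {name: clean_value(name, raw) for name, raw in record.items()}
```

Mapping every indeterminate value to None keeps later steps simple, because they only need to skip one kind of missing marker.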
Period: to
Phase Two
Data cleaning and validation
-Conduct data normalization or standardization
-Construct Computed Variables -
Conduct data normalization or standardization
Use Python to combine the individual date fields (day, month, year) into a single standardized datetime object. -
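A minimal sketch of the date standardization, assuming the day, month, and year arrive as separate text fields. The function name build_datetime is hypothetical.

```python
from datetime import datetime
from typing import Optional


def build_datetime(day: Optional[str], month: Optional[str], year: Optional[str]) -> Optional[datetime]:
    """Combine separate day, month, and year fields into one datetime.

    Returns None when a component is missing, non-numeric, or the
    combination is not a real calendar date (for example 31 February).
    """
    try:
        return datetime(int(year), int(month), int(day))
    except (TypeError, ValueError):
        return None
```

Returning None for an invalid combination, rather than raising, lets the caller count and review bad dates during validation.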
Construct Computed Variables
Derive BMI from the provided WEIGHT2 and HEIGHT3 measurements. Determine the time interval between the current date and the last checkup (CHECKUP1). -
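A sketch of the BMI derivation, where BMI is weight in kilograms divided by the square of height in metres. The decoding rules are assumptions and must be checked against the codebook: WEIGHT2 values 50 to 999 are taken as pounds and 9000 to 9998 as kilograms with a leading 9, while HEIGHT3 values 200 to 711 are taken as feet and inches (504 meaning 5 ft 4 in) and 9000 to 9998 as centimetres with a leading 9. The function names are hypothetical.

```python
from typing import Optional

KG_PER_POUND = 0.45359237
M_PER_INCH = 0.0254


def weight_kg(weight2: int) -> Optional[float]:
    """Decode a WEIGHT2 code to kilograms (coding assumed, see lead-in)."""
    if 50 <= weight2 <= 999:        # assumed: pounds
        return weight2 * KG_PER_POUND
    if 9000 <= weight2 <= 9998:     # assumed: leading 9 marks kilograms
        return float(weight2 - 9000)
    return None                     # sentinel or out-of-range code


def height_m(height3: int) -> Optional[float]:
    """Decode a HEIGHT3 code to metres (coding assumed, see lead-in)."""
    if 200 <= height3 <= 711:       # assumed: feet and inches, 504 = 5 ft 4 in
        feet, inches = divmod(height3, 100)
        if inches >= 12:
            return None             # not a valid feet/inches combination
        return (feet * 12 + inches) * M_PER_INCH
    if 9000 <= height3 <= 9998:     # assumed: leading 9 marks centimetres
        return (height3 - 9000) / 100
    return None


def bmi(weight2: int, height3: int) -> Optional[float]:
    """BMI = kg / m^2, rounded to one decimal; None if either input is unusable."""
    w, h = weight_kg(weight2), height_m(height3)
    if w is None or h is None or h == 0:
        return None
    return round(w / h ** 2, 1)
```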
Period: to
Phase Three
Preliminary Data Analysis
-Statistical Summaries
-Visual Data Analysis
-Dependency Analysis -
Statistical Summaries
Use Python to calculate descriptive statistics for the numeric variables NUMADULT, WEIGHT2, and HEIGHT3. Tabulate category frequencies for the categorical variables SMOKE100 and HLTHPLN1. -
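A standard-library sketch of both kinds of summary. It assumes missing values have already been replaced with None; the function names describe and frequencies are hypothetical.

```python
import statistics
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional


def describe(values: Iterable[Optional[float]]) -> Dict[str, float]:
    """Descriptive statistics for one numeric variable, ignoring None."""
    data = sorted(v for v in values if v is not None)
    if not data:
        return {"count": 0}
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # Sample standard deviation needs at least two observations.
        "stdev": statistics.stdev(data) if len(data) > 1 else 0.0,
        "min": data[0],
        "max": data[-1],
    }


def frequencies(values: Iterable[Optional[Hashable]]) -> Dict[Hashable, float]:
    """Proportion of each category in one categorical variable, ignoring None."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}
```

For example, describe would be applied to the NUMADULT column and frequencies to the SMOKE100 column.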
Visual data analysis
Categorical data visualization: Use bar charts to show the frequency or proportion of each category of a categorical variable, such as BPHIGH4 status, across gender groups. Continuous data visualization: Use boxplots or histograms to assess the central tendency, variability, and distribution shape of continuous variables, such as WEIGHT2. -
Dependency analysis
Assess the statistical significance of the relationship between EXEROFT1 and BPHIGH4. -
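One possible approach, not necessarily the one intended by the plan, is a Pearson chi-square test of independence. The sketch below assumes both variables are treated as categorical, so EXEROFT1 would first have to be binned into groups. The function name chi_square is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def chi_square(pairs: Iterable[Tuple[Hashable, Hashable]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom.

    Each pair is (category of variable A, category of variable B)
    for one respondent, with missing values already removed.
    """
    observed = Counter(pairs)
    total = sum(observed.values())
    if total == 0:
        raise ValueError("no observations")

    row_totals: Counter = Counter()
    col_totals: Counter = Counter()
    for (row, col), n in observed.items():
        row_totals[row] += n
        col_totals[col] += n

    statistic = 0.0
    for row, r in row_totals.items():
        for col, c in col_totals.items():
            expected = r * c / total  # count expected under independence
            statistic += (observed[(row, col)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof
```

The statistic is compared with the chi-square distribution for the returned degrees of freedom. With one degree of freedom (a 2x2 table), a value above 3.841 is significant at the 5% level.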
Period: to
Phase Four
Prescriptive Analytics
-Pattern Analysis
-Stratification
-Statistical Hypothesis Testing -
Pattern analysis
Assess temporal health trends based on interview dates. Conduct demographic comparisons to identify health variations across different population subgroups. -
Stratification
Partition the dataset by categorical variables like gender, smoking status, and health plan coverage. -
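A minimal sketch of the partitioning, assuming each respondent is held as a dictionary. SMOKE100 and HLTHPLN1 come from the plan; the gender column name SEX used in the usage note is an assumption, and the function name stratify is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratify(records: List[dict], keys: Sequence[str]) -> Dict[Tuple, List[dict]]:
    """Partition records into strata keyed by the values of the given variables."""
    strata = defaultdict(list)
    for record in records:
        # Records missing a key fall into a stratum with None in that position.
        strata[tuple(record.get(k) for k in keys)].append(record)
    return dict(strata)
```

Calling stratify(records, ["SEX", "SMOKE100", "HLTHPLN1"]) returns one list of records per observed combination of the three variables, so each stratum can be summarized separately.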
Statistical hypothesis testing
Use statistical inference on the sample to test whether smoking habits are related to general health. -
Period: to
Phase Five
Research Synthesis and Visualization
-Collation of Insights
-Devise Actionable Strategies
-Present findings -
Collation of insights
Visualize key findings. -
Devise actionable strategies
Formulate strategies derived from the analysis. -
Present findings
Use Tableau to build a clear, concise report that visualizes the analytical findings.