This blog post shows how to extract population estimates reported by the Florida Department of Health and prepare them for analysis, specifically for exploring trends in demographic growth between 2006 and 2020.
The initial extract of the data was obtained from www.flhealthcharts.com, a reporting tool for population counts estimated by the Florida Department of Health. The figure below shows the modifications to the default query that produce the data product used in this demonstration:
The tool gives the option to save the product of the query as an Excel workbook (`.xls`). However, importing this format into R has been problematic, so I manually converted the file (via "Save As") into a more modern Excel format, `.xlsx`. This file is the raw source for the current report and can be downloaded for closer inspection here.
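As a minimal sketch of the import step, assuming the `readxl` package is installed and the converted file is saved locally, the raw sheet could be read without treating any row as a header. The file name and the helper `import_raw()` below are hypothetical and do not come from the original report.

```r
# Hypothetical file name: the post does not state where the .xlsx is stored.
path_input <- "florida-population-2006-2020.xlsx"

# Hypothetical helper: returns the raw sheet as a data frame,
# or NULL when the file or the readxl package is unavailable.
import_raw <- function(path) {
  if (!file.exists(path) || !requireNamespace("readxl", quietly = TRUE)) {
    return(NULL)
  }
  # col_names = FALSE keeps the header rows as data, so the layout
  # (unnamed columns, merged cells) can be inspected before tidying
  as.data.frame(readxl::read_excel(path, col_names = FALSE))
}

ds_raw <- import_raw(path_input)
if (!is.null(ds_raw)) str(ds_raw)
```

Reading with `col_names = FALSE` is a design choice for inspection only; once the header row is identified, the columns can be renamed explicitly.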
The structure of the Excel file requires some tidying to enable a nimble analytic flow.
We can identify several problems to address:
- Not all columns have names
- Values of `age_group` are stored in merged spreadsheet cells
- Sums for categories are recorded in rows as observations
- Some values of `age_group` are misinterpreted as dates
- `age_group` does not bin the population evenly
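The first four problems can be illustrated on a toy data frame. This is a sketch in base R under stated assumptions, not the script used for the report: the column names `sex` and `count`, the `Total` label, the date-like labels `4-Jan` and `9-May`, and all counts are invented for illustration.

```r
# Toy stand-in for the imported sheet (all values are hypothetical).
# Merged cells typically arrive as a value followed by NAs.
ds_raw <- data.frame(
  X1 = c("<1", NA, "4-Jan", NA, "9-May", NA, "Total", NA),
  X2 = c("Female", "Male", "Female", "Male", "Female", "Male", "Female", "Male"),
  X3 = c(10, 11, 40, 42, 50, 53, 100, 106),
  stringsAsFactors = FALSE
)

# 1. Give every column a name
ds <- ds_raw
names(ds) <- c("age_group", "sex", "count")

# 2. Fill merged cells downward: carry the last non-missing value forward
fill_down <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x)) # position of last non-missing value
  idx[idx == 0] <- NA                     # leading NAs stay missing
  x[idx]
}
ds$age_group <- fill_down(ds$age_group)

# 3. Drop rows that store category sums rather than observations
ds <- ds[ds$age_group != "Total", ]

# 4. Restore age groups that the spreadsheet turned into dates
#    (assumed mapping; check the labels in the actual file)
date_fix <- c("4-Jan" = "1-4", "9-May" = "5-9")
is_date <- ds$age_group %in% names(date_fix)
ds$age_group[is_date] <- unname(date_fix[ds$age_group[is_date]])

print(ds)
```

The uneven binning of `age_group` is not handled here, because collapsing bins depends on which groups the real file contains.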
The cleaned version of this dataset is available for download here.
We will consider several questions in order to demonstrate plotting from this cleaned dataset.
- How does the total population of Florida change between 2006 and 2020?
- How does each ethnic group change during this time?
- Which ethnic group is most dissimilar from the other three in its dynamics?
- What would it look like if we used the original