This project investigates the effect of healthcare access on vote share in the 2024 presidential election, focusing on county-level variation in Colorado. Healthcare access is approximated by the number of hospitals in each county, and a spatial model accounts for geographic correlation between nearby counties. The goal is to clarify the relationship between healthcare access and electoral outcomes, and so contribute to a broader understanding of how socioeconomic factors shape political preferences at the local level.
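library(readxl) # read_excel()
library(fields) # bubblePlot(), US(), spatialProcess(), surface(), Krig()
hospitals <- read.csv('./Hospital_General_Information.csv', na.strings = c('Not Available'))
votes <- read_excel('./ColoradoVoteShare.xlsx', sheet = 3, skip = 2)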
## New names:
## • `` -> `...4`
## • `` -> `...6`
## • `` -> `...8`
## • `` -> `...10`
## • `` -> `...12`
## • `` -> `...14`
## • `` -> `...16`
## • `` -> `...18`
county_locs <- read.csv('Colorado_County_Boundaries.csv')
# Keep Colorado hospitals only
hospitals <- hospitals[hospitals$State == 'CO',]
# Keep the county name and the two major-party tickets
votes <- votes[, c('Precinct', 'Kamala D. Harris / Tim Walz', 'Donald J. Trump / JD Vance')]
colnames(votes) <- c('Precinct', 'Harris', 'Trump')
votes$Harris <- as.numeric(votes$Harris)
votes$Trump <- as.numeric(votes$Trump)
# Harris share of the two-party vote
votes$share_harris <- votes$Harris / (votes$Harris + votes$Trump)
# Remove last row (vote totals)
votes <- head(votes, -1)
head(hospitals)[c(2, 4, 5, 7)]
## Facility.Name City.Town State County.Parish
## 698 BANNER NORTH COLORADO MEDICAL CENTER GREELEY CO WELD
## 699 LONGMONT UNITED HOSPITAL LONGMONT CO BOULDER
## 700 INTERMOUNTAIN HEALTH PLATTE VALLEY HOSPITAL BRIGHTON CO ADAMS
## 701 MONTROSE REGIONAL HEALTH MONTROSE CO MONTROSE
## 702 SAN LUIS VALLEY REGIONAL MEDICAL CENTER ALAMOSA CO ALAMOSA
## 703 LUTHERAN MEDICAL CENTER WHEAT RIDGE CO JEFFERSON
head(votes)
## # A tibble: 6 × 4
## Precinct Harris Trump share_harris
## <chr> <dbl> <dbl> <dbl>
## 1 Adams 98727 78876 0.556
## 2 Alamosa 3213 3998 0.446
## 3 Arapahoe 138815 88114 0.612
## 4 Archuleta 3856 5158 0.428
## 5 Baca 271 1662 0.140
## 6 Bent 633 1470 0.301
# Combine data: attach each county's hospital count and centroid coordinates
M <- nrow(votes)
hospital_count <- rep(NA, M)
central_lat <- rep(NA, M)
central_long <- rep(NA, M)
for (i in seq_len(M)) {
  county <- toupper(votes$Precinct[i])
  # Number of hospitals in this county (sum over a logical vector)
  hospital_count[i] <- sum(hospitals$County.Parish == county, na.rm = TRUE)
  # Columns 7 and 8 of county_locs are used as the central latitude and longitude
  central_lat[i] <- county_locs[county_locs$COUNTY == county, 7]
  central_long[i] <- county_locs[county_locs$COUNTY == county, 8]
}
votes$hospital_count <- hospital_count
votes$central_lat <- central_lat
votes$central_long <- central_long
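The loop above can also be written without explicit iteration. The following is a minimal sketch on made-up data (the `toy_votes` and `toy_hospitals` data frames and their values are assumptions for illustration, not the project's data), using base R's `table()` to count hospitals per county and `match()` to align the counts with the vote table:

```r
# Toy stand-ins for the real tables (values are illustrative only)
toy_votes <- data.frame(Precinct = c("Adams", "Baca", "Weld"),
                        stringsAsFactors = FALSE)
toy_hospitals <- data.frame(County.Parish = c("WELD", "ADAMS", "WELD"),
                            stringsAsFactors = FALSE)

# Count hospitals per county name
counts <- table(toy_hospitals$County.Parish)

# Look up each county's count; counties absent from the table give NA
idx <- match(toupper(toy_votes$Precinct), names(counts))
toy_votes$hospital_count <- as.integer(counts[idx])

# Counties with no hospitals should be 0 rather than NA
toy_votes$hospital_count[is.na(toy_votes$hospital_count)] <- 0L

print(toy_votes)
```

The explicit zero-fill matters here because counties without any listed hospital would otherwise carry `NA` into the model covariate.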
# colorRampPalette() returns a function; call it to get a vector of 256 colors
bubblePlot(votes$central_long, votes$central_lat, votes$share_harris, col = colorRampPalette(c('red', 'blue'))(256), size = 2)
US(add=TRUE)
obj<- spatialProcess( cbind(central_long, central_lat), votes$share_harris, profileLambda=TRUE, profileARange=TRUE)
# Red-to-blue color ramp (named 'shade' to avoid masking base::c)
shade <- seq(from = 0, to = 1, by = 0.01)
surface(obj, col = rgb(1 - shade, 0, shade)) #, xlim=c(-108, -103), ylim=c(37.5, 40.5)
US(add=TRUE)
points(-104.9903, 39.7392, lwd=5, col='black')
text(-105.15, 40, labels='Denver', cex= 1, col='white')
# Evaluate vote share as adjusted by hospital access
hospitalFit <- spatialProcess( cbind(central_long, central_lat), votes$share_harris, Z=votes$hospital_count, profileLambda=TRUE, profileARange=TRUE)
plot(hospitalFit, which=2)
summary(hospitalFit)
## CALL:
## spatialProcess(x = cbind(central_long, central_lat), y = votes$share_harris,
## Z = votes$hospital_count, profileLambda = TRUE, profileARange = TRUE)
##
## SUMMARY OF MODEL FIT:
##
## Number of Observations: 64
## Degree of polynomial in fixed part: 1
## Total number of parameters in fixed part: 4
## Number of additional covariates (Z) 1
## sigma Process stan. dev: 0.1377
## tau Nugget stan. dev: 0.08853
## lambda tau^2/sigma^2: 0.4133
## aRange parameter (in units of distance): 0.7622
## Approx. degrees of freedom for curve 33.49
## Standard Error of df estimate: 1.387
## log Likelihood: 38.7006859817464
## log Likelihood REML: 31.8852543097218
##
## ESTIMATED COEFFICIENTS FOR FIXED PART:
##
## estimate SE pValue
## d1 -1.27800 2.832000 0.65170
## d2 -0.02582 0.022680 0.25500
## d3 -0.02907 0.033700 0.38840
## d4 0.03151 0.009617 0.00105
##
## COVARIANCE MODEL: stationary.cov
## Covariance function: Matern
## Non-default covariance arguments and their values
## Covariance :
## [1] "Matern"
## smoothness :
## [1] 1
## aRange :
## [1] 0.7622114
## onlyUpper :
## [1] FALSE
## distMat :
## [1] NA
## Nonzero entries in covariance matrix 4096
##
## SUMMARY FROM Max. Likelihood ESTIMATION:
## Parameters found from optim:
## lambda aRange
## 0.4132703 0.7622114
## Approx. confidence intervals for MLE(s)
## lower95% upper95%
## lambda 0.1473149 1.159369
## aRange 0.3579380 1.623092
##
## Note: MLEs for tau and sigma found analytically from lambda
##
## Summary from estimation:
## lnProfileLike.FULL lnProfileREML.FULL lnLike.FULL lnREML.FULL
## 38.70068598 31.88525431 NA NA
## lambda tau sigma2 aRange
## 0.41327027 0.08853004 0.01896475 0.76221144
## eff.df GCV
## 33.49366207 0.01886888
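The summary reports `lambda` as the ratio of the nugget variance to the process variance, tau^2/sigma^2. As a quick arithmetic check, the sketch below recomputes it from the `tau` and `sigma2` values printed above (copied by hand, so small rounding differences are expected). The nugget share at the end is an additional illustrative quantity, not something the summary reports.

```r
# Values copied from the summary output above
tau <- 0.08853004     # nugget standard deviation
sigma2 <- 0.01896475  # process variance (sigma^2)

# lambda is defined in the summary as tau^2 / sigma^2
lambda_check <- tau^2 / sigma2
print(round(lambda_check, 4))

# Fraction of the total (process + nugget) variance attributed to the nugget
nugget_share <- tau^2 / (tau^2 + sigma2)
print(round(nugget_share, 3))
```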
initial_predictions <- data.frame(
  Red_Area = c(FALSE, FALSE, TRUE, TRUE),
  Healthcare_Access = c("HIGH", "LOW", "HIGH", "LOW"),
  Share_Harris = c(
    predict(hospitalFit, x = cbind(-104.9903, 39.7392), Z = max(votes$hospital_count)),
    predict(hospitalFit, x = cbind(-104.9903, 39.7392), Z = min(votes$hospital_count)),
    predict(hospitalFit, x = cbind(-109, 40.5), Z = max(votes$hospital_count)),
    predict(hospitalFit, x = cbind(-109, 40.5), Z = min(votes$hospital_count))
  )
)
knitr::kable(initial_predictions, format = "html")
Red_Area | Healthcare_Access | Share_Harris |
---|---|---|
FALSE | HIGH | 0.7737465 |
FALSE | LOW | 0.5216585 |
TRUE | HIGH | 0.4796519 |
TRUE | LOW | 0.2275639 |
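Because `hospital_count` enters the model as a linear covariate, the predicted gap between the HIGH and LOW rows should be the same at both locations, and should equal the covariate coefficient times the range of hospital counts. The check below uses only numbers copied by hand from the table and the summary above, and assumes that `d4` (the last of the four fixed-part terms) is the coefficient on `Z`:

```r
# Predicted Harris shares, in table order (copied from the table above)
share <- c(0.7737465, 0.5216585, 0.4796519, 0.2275639)

gap_blue <- share[1] - share[2]  # HIGH minus LOW at the Denver location
gap_red  <- share[3] - share[4]  # HIGH minus LOW at the second location

d4 <- 0.03151                    # assumed coefficient on hospital_count
implied_range <- gap_blue / d4   # implied max(count) - min(count)

print(c(gap_blue = gap_blue, gap_red = gap_red, implied_range = implied_range))
```

The two gaps agree, and the implied range is very close to a whole number, which is consistent with an integer-valued hospital count.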
These predictions show a large effect associated with the number of hospitals in a county: at either location, moving from the minimum to the maximum hospital count raises the predicted Harris share by about 25 percentage points. While some of this is plausibly a direct result of increased healthcare access, some portion is likely due to confounding with the county's level of urbanization. To investigate this further, I divide the number of voters in each county by the number of hospitals in that county to obtain a ‘voters per hospital’ metric, which I use as the covariate in a second model. To handle counties without any hospitals, I add 1 to both the numerator and the denominator of this calculation.
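To make the add-one adjustment concrete, here is a small sketch with made-up counties (the `toy` data frame and all of its numbers are hypothetical). Note that a larger value of the metric means more voters sharing each hospital.

```r
# Hypothetical counties: one with no hospital, two with equal voters
toy <- data.frame(
  county = c("A", "B", "C"),
  voters = c(1000, 50000, 50000),
  hospitals = c(0, 1, 9)
)

# Add 1 to numerator and denominator so zero-hospital counties stay finite
toy$voters_per_hospital <- (toy$voters + 1) / (toy$hospitals + 1)

print(toy)
```

County A has no hospitals but still receives a finite value, and counties B and C show how the metric falls as hospitals are added for a fixed number of voters.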
votes$votersPerHospital <- (votes$Harris + votes$Trump + 1) / (votes$hospital_count + 1)
adjHospitalFit <- spatialProcess( cbind(central_long, central_lat), votes$share_harris, Z=votes$votersPerHospital, profileLambda=TRUE, profileARange=TRUE)
summary(adjHospitalFit)
## CALL:
## spatialProcess(x = cbind(central_long, central_lat), y = votes$share_harris,
## Z = votes$votersPerHospital, profileLambda = TRUE, profileARange = TRUE)
##
## SUMMARY OF MODEL FIT:
##
## Number of Observations: 64
## Degree of polynomial in fixed part: 1
## Total number of parameters in fixed part: 4
## Number of additional covariates (Z) 1
## sigma Process stan. dev: 0.1252
## tau Nugget stan. dev: 0.09732
## lambda tau^2/sigma^2: 0.6046
## aRange parameter (in units of distance): 0.6812
## Approx. degrees of freedom for curve 27.65
## Standard Error of df estimate: 1.279
## log Likelihood: 36.5499049977484
## log Likelihood REML: 21.0876087098734
##
## ESTIMATED COEFFICIENTS FOR FIXED PART:
##
## estimate SE pValue
## d1 -1.739e+00 2.562e+00 0.49720
## d2 -2.889e-02 2.031e-02 0.15490
## d3 -2.485e-02 3.088e-02 0.42100
## d4 3.944e-06 1.632e-06 0.01568
##
## COVARIANCE MODEL: stationary.cov
## Covariance function: Matern
## Non-default covariance arguments and their values
## Covariance :
## [1] "Matern"
## smoothness :
## [1] 1
## aRange :
## [1] 0.6811639
## onlyUpper :
## [1] FALSE
## distMat :
## [1] NA
## Nonzero entries in covariance matrix 4096
##
## SUMMARY FROM Max. Likelihood ESTIMATION:
## Parameters found from optim:
## lambda aRange
## 0.6045613 0.6811639
## Approx. confidence intervals for MLE(s)
## lower95% upper95%
## lambda 0.2010903 1.817564
## aRange 0.2767761 1.676389
##
## Note: MLEs for tau and sigma found analytically from lambda
##
## Summary from estimation:
## lnProfileLike.FULL lnProfileREML.FULL lnLike.FULL lnREML.FULL
## 36.54990500 21.08760871 NA NA
## lambda tau sigma2 aRange
## 0.60456132 0.09732305 0.01566719 0.68116395
## eff.df GCV
## 27.65204259 0.01759462
Krig( cbind(central_long, central_lat), votes$share_harris, Z=votes$votersPerHospital)
## Call:
## Krig(x = cbind(central_long, central_lat), Y = votes$share_harris,
## Z = votes$votersPerHospital)
##
## Number of Observations: 64
## Number of parameters in the null space 4
## Parameters for fixed spatial drift 3
## Model degrees of freedom: 39.3
## Residual degrees of freedom: 24.7
## GCV estimate for tau: 0.08284
## MLE for tau: 0.08021
## MLE for sigma: 0.01927
## lambda 0.33
## User supplied sigma NA
## User supplied tau^2 NA
## Summary of estimates:
## lambda trA GCV tauHat -lnLike Prof converge
## GCV 0.5847082 31.86456 0.01755998 0.09389969 -34.47357 1
## GCV.model NA NA NA NA NA NA
## GCV.one 0.5847082 31.86456 0.01755998 0.09389969 NA 1
## RMSE NA NA NA NA NA NA
## pure error NA NA NA NA NA NA
## REML 0.3339460 39.32637 0.01779910 0.08283722 -34.69420 3
predictions <- data.frame(
  Red_Area = c(FALSE, FALSE, TRUE, TRUE),
  Healthcare_Access = c("HIGH", "LOW", "HIGH", "LOW"),
  Share_Harris = c(
    predict(adjHospitalFit, x = cbind(-104.9903, 39.7392), Z = max(votes$votersPerHospital)),
    predict(adjHospitalFit, x = cbind(-104.9903, 39.7392), Z = min(votes$votersPerHospital)),
    predict(adjHospitalFit, x = cbind(-109, 40.5), Z = max(votes$votersPerHospital)),
    predict(adjHospitalFit, x = cbind(-109, 40.5), Z = min(votes$votersPerHospital))
  )
)
knitr::kable(predictions, format = "html")
Red_Area | Healthcare_Access | Share_Harris |
---|---|---|
FALSE | HIGH | 0.7854835 |
FALSE | LOW | 0.5181335 |
TRUE | HIGH | 0.5700806 |
TRUE | LOW | 0.3027306 |
We find three clear effects that are useful in interpreting election results:
p < 0.01.
As this is an observational study, it remains vulnerable to confounding variables that could affect the relationship between healthcare access and vote share. While the voters-per-hospital metric partly adjusts for county population, other factors such as average income, education levels, and employment could also influence both healthcare access and voting behavior. If, for instance, counties with higher average incomes tend to have more healthcare resources per capita and are also more likely to support Harris, this could account for some of the effect attributed to healthcare access. To address this, a future analysis could use a matched design, in which similar counties are compared over time to see whether voting trends change as healthcare projects are completed. A longitudinal study of vote share over multiple election cycles as a function of hospital count would provide further evidence.
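As a rough illustration of the matching idea, the sketch below pairs each high-access county with the low-access county closest in income and compares their vote shares. It is a simplified cross-sectional version, not the longitudinal design described above, and every name and number in it is invented for illustration.

```r
# Hypothetical county-level data (all values invented)
counties <- data.frame(
  county = c("A", "B", "C", "D", "E", "F"),
  high_access = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  income = c(52, 61, 75, 50, 64, 73),  # median income, in $1000s
  share = c(0.48, 0.55, 0.66, 0.41, 0.52, 0.60),
  stringsAsFactors = FALSE
)
treated <- counties[counties$high_access, ]
control <- counties[!counties$high_access, ]

# For each high-access county, find the low-access county with the
# closest income (single covariate, matching with replacement)
nearest <- sapply(treated$income,
                  function(x) which.min(abs(control$income - x)))

matched <- data.frame(
  treated = treated$county,
  control = control$county[nearest],
  share_diff = treated$share - control$share[nearest],
  stringsAsFactors = FALSE
)
print(matched)

# Average within-pair difference in vote share
mean_diff <- mean(matched$share_diff)
print(mean_diff)
```

A real analysis would match on several covariates at once and would need far more counties than this toy table contains.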
This analysis could be refined further. The number of hospitals per county was used as a proxy for overall healthcare access, even though it captures only one dimension of access. A future analysis could consider other metrics, such as county health spending or health insurance coverage rates. It could also examine elections and party preferences beyond the presidential race, and attempt to model voter turnout. Finally, it may be useful to investigate these effects at a finer spatial resolution than the county level.