Project Topic

This project investigates the effect of healthcare access on vote share in the 2024 presidential election, focusing on county-level variation in Colorado. Using a spatial model, the study explores how access to healthcare facilities, approximated by the number of hospitals in each county, relates to voting behavior. The aim is to provide insight into the relationship between healthcare access and electoral outcomes, contributing to a broader understanding of how socioeconomic factors shape political preferences at the local level.

Data Description

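library(readxl)  # read_excel()
library(fields)  # bubblePlot(), spatialProcess(), surface(), US(), Krig()

hospitals <- read.csv('./Hospital_General_Information.csv', na.strings=c('Not Available'))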
votes <- read_excel('./ColoradoVoteShare.xlsx', sheet = 3, skip=2)
## New names:
## • `` -> `...4`
## • `` -> `...6`
## • `` -> `...8`
## • `` -> `...10`
## • `` -> `...12`
## • `` -> `...14`
## • `` -> `...16`
## • `` -> `...18`
county_locs <- read.csv('Colorado_County_Boundaries.csv')

Data Cleaning

hospitals <- hospitals[hospitals$State == 'CO',]

votes <- votes[, c('Precinct', 'Kamala D. Harris / Tim Walz', 'Donald J. Trump / JD Vance')]
colnames(votes) <- c('Precinct', 'Harris', 'Trump')

votes$Harris <- as.numeric(votes$Harris)
votes$Trump <- as.numeric(votes$Trump)

votes$share_harris <- votes$Harris / (votes$Harris + votes$Trump)

# Remove last row (vote totals)
votes <- votes[1:(dim(votes)[1]-1),]

head(hospitals)[c(2, 4, 5, 7)]
##                                   Facility.Name   City.Town State County.Parish
## 698        BANNER NORTH COLORADO MEDICAL CENTER     GREELEY    CO          WELD
## 699                    LONGMONT UNITED HOSPITAL    LONGMONT    CO       BOULDER
## 700 INTERMOUNTAIN HEALTH PLATTE VALLEY HOSPITAL    BRIGHTON    CO         ADAMS
## 701                    MONTROSE REGIONAL HEALTH    MONTROSE    CO      MONTROSE
## 702    SAN LUIS VALLEY REGIONAL  MEDICAL CENTER     ALAMOSA    CO       ALAMOSA
## 703                     LUTHERAN MEDICAL CENTER WHEAT RIDGE    CO     JEFFERSON
head(votes)
## # A tibble: 6 × 4
##   Precinct  Harris Trump share_harris
##   <chr>      <dbl> <dbl>        <dbl>
## 1 Adams      98727 78876        0.556
## 2 Alamosa     3213  3998        0.446
## 3 Arapahoe  138815 88114        0.612
## 4 Archuleta   3856  5158        0.428
## 5 Baca         271  1662        0.140
## 6 Bent         633  1470        0.301

Find Hospital Count by County

# Combine data
M <- length(votes$Precinct)

hospital_count <- rep(NA, M)
central_lat <- rep(NA, M)
central_long <- rep(NA, M)

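for (i in seq_len(M)) {
  county <- toupper(votes$Precinct[i])

  # Number of hospitals located in this county (0 if there are none)
  hospital_count[i] <- sum(hospitals$County.Parish == county, na.rm = TRUE)

  # Columns 7 and 8 of county_locs are used as the county's central latitude and longitude
  central_lat[i] <- county_locs[county_locs$COUNTY == county, 7]
  central_long[i] <- county_locs[county_locs$COUNTY == county, 8]
}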
votes$hospital_count <- hospital_count

votes$central_lat <- central_lat
votes$central_long <- central_long
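
The per-county hospital count can also be computed without a loop. The following is a minimal sketch in base R, not the method used in this report; `hospitals_toy` and `votes_toy` are made-up stand-ins for the real data frames, and the approach assumes each county appears only once in the vote table.

```r
# Toy stand-ins for the `hospitals` and `votes` data frames (illustrative values only)
hospitals_toy <- data.frame(County.Parish = c("WELD", "BOULDER", "ADAMS", "ADAMS"))
votes_toy <- data.frame(Precinct = c("Adams", "Baca", "Boulder", "Weld"))

# Fix the factor levels to the counties in the vote table, so that counties
# without any hospital are tabulated with an explicit count of 0
county_key <- toupper(votes_toy$Precinct)
counts <- table(factor(hospitals_toy$County.Parish, levels = county_key))

# table() returns counts in level order, which matches the row order of votes_toy
votes_toy$hospital_count <- as.integer(counts)
print(votes_toy)
```

Fixing the factor levels is what keeps counties with zero hospitals in the result; a plain `table()` call would drop them.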

Data Visualizations

bubblePlot(votes$central_long, votes$central_lat, votes$share_harris, col = colorRampPalette(c('red', 'blue')), size = 2)
US(add=TRUE)

obj<- spatialProcess( cbind(central_long, central_lat), votes$share_harris, profileLambda=TRUE, profileARange=TRUE)

# Red-to-blue color ramp; named `shade` to avoid masking the base function c()
shade <- seq(from=0, to=1, by=0.01)
surface(obj, col = rgb(1 - shade, 0, shade)) #, xlim=c(-108, -103), ylim=c(37.5, 40.5)
US(add=TRUE)
points(-104.9903, 39.7392, lwd=5, col='black')
text(-105.15, 40, labels='Denver', cex= 1, col='white')

# Evaluate vote share as adjusted by hospital access

hospitalFit <- spatialProcess( cbind(central_long, central_lat), votes$share_harris, Z=votes$hospital_count, profileLambda=TRUE, profileARange=TRUE)
plot(hospitalFit, which=2)

summary(hospitalFit)
## CALL:
## spatialProcess(x = cbind(central_long, central_lat), y = votes$share_harris, 
##     Z = votes$hospital_count, profileLambda = TRUE, profileARange = TRUE)
## 
##  SUMMARY OF MODEL FIT:
##                                                             
##  Number of Observations:                    64              
##  Degree of polynomial in fixed part:        1               
##  Total number of parameters in fixed part:  4               
##  Number of additional covariates (Z)        1               
##  sigma Process stan. dev:                   0.1377          
##  tau  Nugget stan. dev:                     0.08853         
##  lambda   tau^2/sigma^2:                    0.4133          
##  aRange parameter (in units of distance):   0.7622          
##  Approx.  degrees of freedom for curve      33.49           
##     Standard Error of df estimate:          1.387           
##  log Likelihood:                            38.7006859817464
##  log Likelihood REML:                       31.8852543097218
## 
##  ESTIMATED COEFFICIENTS FOR FIXED PART:
## 
##    estimate       SE  pValue
## d1 -1.27800 2.832000 0.65170
## d2 -0.02582 0.022680 0.25500
## d3 -0.02907 0.033700 0.38840
## d4  0.03151 0.009617 0.00105
## 
##  COVARIANCE MODEL: stationary.cov
##   Covariance function:  Matern
##    Non-default covariance arguments and their values 
## Covariance :
## [1] "Matern"
## smoothness :
## [1] 1
## aRange :
## [1] 0.7622114
## onlyUpper :
## [1] FALSE
## distMat :
## [1] NA
## Nonzero entries in covariance matrix  4096
## 
## SUMMARY FROM Max. Likelihood ESTIMATION:
## Parameters found from optim: 
##    lambda    aRange 
## 0.4132703 0.7622114 
## Approx. confidence intervals for MLE(s) 
##         lower95% upper95%
## lambda 0.1473149 1.159369
## aRange 0.3579380 1.623092
## 
##  Note: MLEs for  tau and sigma found analytically from lambda
## 
## Summary from estimation:
## lnProfileLike.FULL lnProfileREML.FULL        lnLike.FULL        lnREML.FULL 
##        38.70068598        31.88525431                 NA                 NA 
##             lambda                tau             sigma2             aRange 
##         0.41327027         0.08853004         0.01896475         0.76221144 
##             eff.df                GCV 
##        33.49366207         0.01886888

initial_predictions <- data.frame(
  Red_Area = c(FALSE, FALSE, TRUE, TRUE),
  Healthcare_Access = c("HIGH", "LOW", "HIGH", "LOW"),
  Share_Harris = c(
    predict(hospitalFit, x = cbind(-104.9903, 39.7392), Z = max(votes$hospital_count)), 
    predict(hospitalFit, x = cbind(-104.9903, 39.7392), Z = min(votes$hospital_count)),
    predict(hospitalFit, x = cbind(-109, 40.5), Z = max(votes$hospital_count)),
    predict(hospitalFit, x = cbind(-109, 40.5), Z = min(votes$hospital_count))
  )
)
knitr::kable(initial_predictions, format = "html")
Red_Area Healthcare_Access Share_Harris
FALSE HIGH 0.7737465
FALSE LOW 0.5216585
TRUE HIGH 0.4796519
TRUE LOW 0.2275639
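
As a quick consistency check on the table above, the gap between the HIGH and LOW predictions at each location should equal the hospital-count coefficient multiplied by the range of hospital counts, because the covariate enters the model linearly. The sketch below uses only the rounded numbers printed in this report, so the implied range is approximate.

```r
# Values copied from the model summary and the prediction table above
d4 <- 0.03151                           # estimated coefficient on hospital_count
pred_high <- c(0.7737465, 0.4796519)    # Z = max(hospital_count) at each location
pred_low  <- c(0.5216585, 0.2275639)    # Z = min(hospital_count) at each location

# The spatial part cancels at a fixed location, leaving only the covariate term
gap <- pred_high - pred_low

# Implied difference between the largest and smallest hospital counts
implied_range <- gap / d4

print(round(gap, 4))
print(round(implied_range, 1))
```

The gap is the same at both locations, which is what a model with an additive covariate term implies: the covariate shifts the prediction by the same amount everywhere.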

This shows a large effect associated with the number of hospitals in a county. While some of this is plausibly a direct result of increased healthcare access, some portion is likely due to confounding by a county's level of urbanization, since more populous counties tend to have more hospitals. To investigate this further, I will divide the number of voters in each county by the number of hospitals in that county to get a ‘voters per hospital’ metric, which I will use as the covariate in a second model. Note that a larger value of this metric means that more voters share each hospital. To handle counties without any hospitals, I will add 1 to both the numerator and the denominator of this calculation.
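
To make the behavior of this metric concrete, here is a small sketch using hypothetical counties. The county names and numbers are made up for illustration and do not come from the data.

```r
# Hypothetical counties: total two-party votes and hospital counts (made-up numbers)
toy <- data.frame(
  county = c("urban", "rural_one_hospital", "rural_no_hospital"),
  total_votes = c(200000, 5000, 2000),
  hospital_count = c(8, 1, 0)
)

# Same formula as in the analysis: add 1 to numerator and denominator
# so that counties with zero hospitals do not cause division by zero
toy$votersPerHospital <- (toy$total_votes + 1) / (toy$hospital_count + 1)
print(toy)
```

In this toy example the urban county has the most hospitals but also the largest voters-per-hospital value, so the metric can still rise with population even after dividing by hospital count.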

votes$votersPerHospital <- (votes$Harris + votes$Trump + 1) / (votes$hospital_count + 1)
adjHospitalFit <- spatialProcess( cbind(central_long, central_lat), votes$share_harris, Z=votes$votersPerHospital, profileLambda=TRUE, profileARange=TRUE)
summary(adjHospitalFit)
## CALL:
## spatialProcess(x = cbind(central_long, central_lat), y = votes$share_harris, 
##     Z = votes$votersPerHospital, profileLambda = TRUE, profileARange = TRUE)
## 
##  SUMMARY OF MODEL FIT:
##                                                             
##  Number of Observations:                    64              
##  Degree of polynomial in fixed part:        1               
##  Total number of parameters in fixed part:  4               
##  Number of additional covariates (Z)        1               
##  sigma Process stan. dev:                   0.1252          
##  tau  Nugget stan. dev:                     0.09732         
##  lambda   tau^2/sigma^2:                    0.6046          
##  aRange parameter (in units of distance):   0.6812          
##  Approx.  degrees of freedom for curve      27.65           
##     Standard Error of df estimate:          1.279           
##  log Likelihood:                            36.5499049977484
##  log Likelihood REML:                       21.0876087098734
## 
##  ESTIMATED COEFFICIENTS FOR FIXED PART:
## 
##      estimate        SE  pValue
## d1 -1.739e+00 2.562e+00 0.49720
## d2 -2.889e-02 2.031e-02 0.15490
## d3 -2.485e-02 3.088e-02 0.42100
## d4  3.944e-06 1.632e-06 0.01568
## 
##  COVARIANCE MODEL: stationary.cov
##   Covariance function:  Matern
##    Non-default covariance arguments and their values 
## Covariance :
## [1] "Matern"
## smoothness :
## [1] 1
## aRange :
## [1] 0.6811639
## onlyUpper :
## [1] FALSE
## distMat :
## [1] NA
## Nonzero entries in covariance matrix  4096
## 
## SUMMARY FROM Max. Likelihood ESTIMATION:
## Parameters found from optim: 
##    lambda    aRange 
## 0.6045613 0.6811639 
## Approx. confidence intervals for MLE(s) 
##         lower95% upper95%
## lambda 0.2010903 1.817564
## aRange 0.2767761 1.676389
## 
##  Note: MLEs for  tau and sigma found analytically from lambda
## 
## Summary from estimation:
## lnProfileLike.FULL lnProfileREML.FULL        lnLike.FULL        lnREML.FULL 
##        36.54990500        21.08760871                 NA                 NA 
##             lambda                tau             sigma2             aRange 
##         0.60456132         0.09732305         0.01566719         0.68116395 
##             eff.df                GCV 
##        27.65204259         0.01759462
Krig( cbind(central_long, central_lat), votes$share_harris, Z=votes$votersPerHospital)
## Call:
## Krig(x = cbind(central_long, central_lat), Y = votes$share_harris, 
##     Z = votes$votersPerHospital)
##                                                
##  Number of Observations:                64     
##  Number of parameters in the null space 4      
##  Parameters for fixed spatial drift     3      
##  Model degrees of freedom:              39.3   
##  Residual degrees of freedom:           24.7   
##  GCV estimate for tau:                  0.08284
##  MLE for tau:                           0.08021
##  MLE for sigma:                         0.01927
##  lambda                                 0.33   
##  User supplied sigma                    NA     
##  User supplied tau^2                    NA     
## Summary of estimates: 
##               lambda      trA        GCV     tauHat -lnLike Prof converge
## GCV        0.5847082 31.86456 0.01755998 0.09389969    -34.47357        1
## GCV.model         NA       NA         NA         NA           NA       NA
## GCV.one    0.5847082 31.86456 0.01755998 0.09389969           NA        1
## RMSE              NA       NA         NA         NA           NA       NA
## pure error        NA       NA         NA         NA           NA       NA
## REML       0.3339460 39.32637 0.01779910 0.08283722    -34.69420        3
# Note: max(votersPerHospital) is the county with the MOST voters per hospital,
# i.e. the fewest hospitals per voter, so the rows are labeled by the metric itself
predictions <- data.frame(
  Red_Area = c(FALSE, FALSE, TRUE, TRUE),
  Voters_Per_Hospital = c("MAX", "MIN", "MAX", "MIN"),
  Share_Harris = c(
    predict(adjHospitalFit, x = cbind(-104.9903, 39.7392), Z = max(votes$votersPerHospital)), 
    predict(adjHospitalFit, x = cbind(-104.9903, 39.7392), Z = min(votes$votersPerHospital)),
    predict(adjHospitalFit, x = cbind(-109, 40.5), Z = max(votes$votersPerHospital)),
    predict(adjHospitalFit, x = cbind(-109, 40.5), Z = min(votes$votersPerHospital))
  )
)
knitr::kable(predictions, format = "html")
Red_Area Voters_Per_Hospital Share_Harris
FALSE MAX 0.7854835
FALSE MIN 0.5181335
TRUE MAX 0.5700806
TRUE MIN 0.3027306

Results

We find three effects that are useful in interpreting election results:

  1. There is a sizable spatial effect involved in determining the voter share.
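    • Note: Although a spatial effect is clearly present, the linear trend terms in longitude and latitude (d2 and d3) are not statistically significant (p = 0.26 and p = 0.39 in the first model). This is consistent with a nonlinear spatial pattern, which the model captures through the spatial process rather than through the linear trend.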
  2. An increase in health resources is associated with an increase in Harris’ voter share.
    • This is reflected in our initial model, which relates hospital count to vote share. The estimated vote share for Harris rises by 3.15 percentage points for every additional hospital in a county (p ≈ 0.001).
  3. A significant association remains after a basic adjustment for the number of voters, though its direction calls for careful interpretation.
    • When Harris’ vote share is modeled as the sum of the spatial effect and the number of voters per hospital, the covariate remains statistically significant (estimate 3.94e-06, p ≈ 0.016). The coefficient is positive, however, so counties with more voters per hospital, and therefore fewer hospitals per voter, have a higher estimated Harris share. This suggests that the adjusted metric still largely reflects county population rather than isolating healthcare access, so the result should not be read as evidence of an independent effect of access on voting behavior. It also does not rule out other confounders.
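
The significance levels quoted above can be checked from the estimates and standard errors printed in the two model summaries. The sketch below assumes a two-sided normal (Wald) test. That assumption is inferred from the fact that it reproduces the reported p-values up to rounding, and is not a statement about how the fitting software computes them.

```r
# Covariate estimates and standard errors (row d4) copied from the two summaries
est <- c(hospital_count = 0.03151, votersPerHospital = 3.944e-06)
se  <- c(hospital_count = 0.009617, votersPerHospital = 1.632e-06)

# Wald z-statistics and two-sided p-values under a normal approximation
z <- est / se
p <- 2 * pnorm(-abs(z))

print(round(z, 2))
print(signif(p, 3))
```

The resulting p-values are close to the reported 0.00105 and 0.01568. The small differences are expected because the inputs here are already rounded.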

Areas for Further Research

As this is an observational study, it remains vulnerable to confounding variables that could affect the relationship between healthcare access and vote share. While I adjusted for the number of voters per hospital, other factors such as average income, education levels, and employment could also influence both healthcare access and voting behavior. If, for instance, counties with higher average incomes tend to have more healthcare resources per capita and are also more likely to support Harris, this may account for some of the effect attributed to healthcare access. To address this, future work could use a matched design, in which similar counties are compared over time to see whether the trend changes as healthcare projects are completed. A broader longitudinal study could also examine trends in vote share over multiple election cycles as a function of hospital count.

This analysis could likely be refined further. The number of hospitals per county was used as a proxy for overall healthcare access, even though it is only one measure of access. A future analysis could look at other metrics, such as county health spending or rates of health insurance coverage. Further analysis could also look at election and party preferences outside of the presidential election, and could attempt to model voter turnout. Finally, it may be useful to investigate these effects at a higher spatial resolution instead of limiting the analysis to the county level.