It is hard to know exactly how many people take the drug Truvada (TDF/FTC) to prevent rather than treat HIV. Word is spreading through communities about this daily pill to reduce the risk of infection, and as people become informed the number of users is increasing. Can Google help us predict this uptake?
Building on the suspicion in my previous post that the trend in Google searches for “HIV prep” matches the rate of Truvada uptake almost perfectly, here we explore whether Google Trends can validly be used to predict future uptake of the drug. After comparing the fit of different combinations of parameters, this simple linear regression model seemed sufficient:
$$N(t) = \beta_0 + \beta_1\,\mathrm{Google}(t) + \beta_2\,t + \beta_3\,t^2 + \beta_4\,\mathrm{Google}(t)\cdot t + \varepsilon_t$$
Here the outcome N(t) is the cumulative number of people who have initiated PrEP by month t, and Google(t) is the popularity score of Google searches in that month. To assess the validity of using Google Trends to predict the future number of people on PrEP, we can fit the model to users from 2012-2015 (orange) or 2012-2016 (red) and see how close the predictions come to the actual number of users in 2016 and 2017.
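# I() is needed: inside an R formula a bare time^2 is read as formula syntax and reduces to time
fit <- lm(prep ~ trend + time + I(time^2) + trend*time, data = data)
# Fitted values with 95% prediction intervals (columns fit, lwr, upr)
hat <- predict(fit, newdata = data, interval = "prediction")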
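The holdout comparison described above (fit on 2012-2015, then check against 2016 and 2017) could be sketched roughly as follows. This is only an illustration under assumptions: the `year` column, the `train`/`test` split, and the error summaries `mape` and `coverage` are hypothetical names, and the data frame is simulated placeholder data, not real PrEP or Google Trends figures.

```r
# Simulated placeholder data (NOT real PrEP or Google Trends numbers)
set.seed(1)
data <- data.frame(time = 1:72)                   # months, Jan 2012 to Dec 2017
data$year  <- 2012 + (data$time - 1) %/% 12       # hypothetical calendar-year column
data$trend <- 10 + 0.8 * data$time + rnorm(72, sd = 3)
data$prep  <- 50 + 2 * data$time^2 + 5 * data$trend + rnorm(72, sd = 100)

train <- subset(data, year <= 2015)   # months used to fit the model
test  <- subset(data, year >= 2016)   # held-out months used to judge predictions

# Same model form as in the post, fitted on the training months only
fit <- lm(prep ~ trend + time + I(time^2) + trend*time, data = train)
hat <- predict(fit, newdata = test, interval = "prediction")

# How far off are the predictions, and do the intervals cover the truth?
err      <- hat[, "fit"] - test$prep
mape     <- mean(abs(err) / test$prep) * 100                        # mean absolute % error
coverage <- mean(test$prep >= hat[, "lwr"] & test$prep <= hat[, "upr"])
```

Moving the cutoff from 2015 to 2016 in the two `subset()` calls would give the second (red) fit.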
Next time
Next time I try this I will spend a little more time fitting a Poisson generalized linear model, since the outcome is a count of persons and so cannot fall below zero. What would really make this awesome is monthly data from Gilead to use as the outcome instead of the yearly totals pulled from the abstract of Mera et al.
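As a starting point for that, a Poisson model can be fitted with `glm()` from base R. The sketch below is a minimal example, not a worked-out analysis: the data are simulated placeholders (not real uptake numbers), the names `pfit` and `phat` are made up for illustration, and the set of predictors is an assumption rather than a tested choice.

```r
# Simulated placeholder counts (NOT real PrEP or Google Trends numbers)
set.seed(2)
data <- data.frame(time = 1:48)
data$trend <- 10 + 0.8 * data$time + rnorm(48, sd = 3)
data$prep  <- rpois(48, lambda = exp(4 + 0.05 * data$time + 0.01 * data$trend))

# Poisson regression with a log link: the expected count is exp(linear predictor)
pfit <- glm(prep ~ trend + time, family = poisson(link = "log"), data = data)

# type = "response" returns expected counts on the original scale, which are always positive
phat <- predict(pfit, newdata = data, type = "response")
```

Because of the log link, coefficients are read as multiplicative effects on the expected count rather than additive ones.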