Java Mailing List Archive

http://www.r-help.com/

Home » Home (12/2007) » R Help for Statistical Computing »

[R] Boostrap p-value in regression [indirectly related to R]

Viechtbauer Wolfgang (STAT)

2007-05-21


Hello All,

Despite my preference for reporting confidence intervals, I need to
obtain a p-value for a hypothesis test in the context of regression
using bootstrapping. I have read John Fox's chapter on bootstrapping
regression models and have consulted Efron & Tibshirani's "An
Introduction to the Bootstrap" but I just wanted to ask the experts here
for some feedback to make sure that I am not doing something wrong.

Let's take a simplified example where the model includes one independent
variable and the idea is to test H0: beta1 = 0 versus Ha: beta1 != 0.

########################################################

### generate some sample data

n <- 50
xi <- runif(n, min=1, max=5)
yi <- 0 + 0.2 * xi + rnorm(n, mean=0, sd=1)

### fit simple regression model

mod <- lm(yi ~ xi)
summary(mod)
b1 <- coef(mod)[2]
t1 <- coef(mod)[2] / coef(summary(mod))[2,2]

### 1000 bootstrap replications using (X,Y)-pair resampling

t1.star <- rep(NA,1000)

for (i in 1:1000) {

ids     <- sample(1:n, replace=TRUE)
newyi    <- yi[ids]
newxi    <- xi[ids]
mod     <- lm(newyi ~ newxi)
t1.star[i] <- ( coef(mod)[2] - b1) / coef(summary(mod))[2,2]

}

### get bootstrap p-value

hist(t1.star, nclass=40)
abline(v=t1, lwd=3)
abline(v=-1*t1, lwd=3)
2 * mean( t1.star > abs(t1) )

########################################################

As suggested in the chapter on bootstrapping regression models by John
Fox, the bootstrap p-value is 2 times the proportion of bootstrap
t-values (with b1 subtracted so that we get the distribution under H0)
larger than the absolute value of the actual t-value observed in the
data.

Doesn't this assume that the bootstrap sampling distribution is
symmetric? And if yes, would it then not be more reasonable to
calculate:

mean( abs(t1.star) > abs(t1) )

or in words: the number of bootstrap t-values that are more extreme on
either side of the bootstrap distribution than the actual t-value
observed?

Any suggestions or comments would be appreciated!

--
Wolfgang Viechtbauer
Department of Methodology and Statistics
University of Maastricht, The Netherlands
http://www.wvbauer.com

______________________________________________
R-help@(protected)
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
©2008 r-help.com - Jax Systems, LLC, U.S.A.