9.6 Linear regression after MI
Use with()
to fit a regression model on a multiply imputed dataset (mids
) object to produce a multiple imputation repeated analysis (mira
) object. Afterwards, use pool()
to create a multiple imputation pooled results (mipo
) object. Returning to Example 9.1, the following code fits a linear regression model after MI.
<- with(imp.nhanes.new,
fit.imp.lm lm(LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR +
+ race_eth + income)) RIAGENDR
with()
fits the regression model separately for each complete dataset. The following extracts a list
of regression results for each imputation.
$analyses fit.imp.lm
For example, the results for the first analysis are:
# Full summary
# summary(fit.imp.lm$analyses[[1]])
# Just the coefficients
round(summary(fit.imp.lm$analyses[[1]])$coef, 5)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.11043 0.00355 -31.13280 0.00000
## BMXWAIST 0.00029 0.00003 9.78624 0.00000
## smokerPast 0.00140 0.00117 1.18960 0.23449
## smokerCurrent 0.00070 0.00142 0.49553 0.62034
## RIDAGEYR 0.00031 0.00003 9.96333 0.00000
## RIAGENDRFemale -0.00492 0.00099 -4.98353 0.00000
## race_ethNon-Hispanic White -0.00358 0.00133 -2.69919 0.00707
## race_ethNon-Hispanic Black -0.00265 0.00185 -1.42922 0.15326
## race_ethNon-Hispanic Other -0.00205 0.00196 -1.04731 0.29521
## income$25,000 to < $55,000 0.00103 0.00152 0.67989 0.49674
## income$55,000+ -0.00001 0.00136 -0.00387 0.99692
Use summary(pool())
to apply Rubin’s rules to obtain the final regression coefficients and 95% confidence intervals.
summary(pool(fit.imp.lm), conf.int=T)
The numbers in the resulting table may have a lot of digits, and unfortunately the round()
function does not work directly on a mipo.summary
object. To round the results, use the function round.summary()
(found in Functions_rmph.R
which you loaded at the beginning of this chapter) which takes a mira
object, pools the results, and rounds the output to a specified number of digits
.
round.summary(fit.imp.lm, digits = 4)
## estimate std.error statistic df p.value 2.5 % 97.5 %
## (Intercept) -0.1113 0.0036 -30.8461 872.6 0.0000 -0.1183 -0.1042
## BMXWAIST 0.0003 0.0000 9.8759 869.4 0.0000 0.0002 0.0004
## smokerPast 0.0014 0.0012 1.1607 978.1 0.2460 -0.0009 0.0037
## smokerCurrent 0.0007 0.0014 0.4877 972.4 0.6258 -0.0021 0.0035
## RIDAGEYR 0.0003 0.0000 9.9618 974.8 0.0000 0.0002 0.0004
## RIAGENDRFemale -0.0048 0.0010 -4.8932 977.1 0.0000 -0.0068 -0.0029
## race_ethNon-Hispanic White -0.0038 0.0013 -2.8099 941.7 0.0051 -0.0064 -0.0011
## race_ethNon-Hispanic Black -0.0028 0.0018 -1.4936 981.1 0.1356 -0.0064 0.0009
## race_ethNon-Hispanic Other -0.0020 0.0020 -1.0094 975.2 0.3131 -0.0058 0.0019
## income$25,000 to < $55,000 0.0011 0.0016 0.6474 363.4 0.5178 -0.0022 0.0043
## income$55,000+ 0.0002 0.0015 0.1470 389.6 0.8832 -0.0027 0.0031
9.6.1 Multiple degree of freedom tests
mi.anova()
(Grund, Lüdtke, and Robitzsch 2016) from the miceadds
package (Robitzsch, Grund, and Henke 2023) carries out multiple degree of freedom Type 3 tests pooled over imputations, and also computes the pooled \(R^2\) value.
mi.anova(imp.nhanes.new,
"LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR + RIAGENDR +
race_eth + income",
type = 3)
## Univariate ANOVA for Multiply Imputed Data (Type 3)
##
## lm Formula: LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR + RIAGENDR +
## race_eth + income
## R^2=0.1926
## ..........................................................................
## ANOVA Table
## SSQ df1 df2 F value Pr(>F) eta2 partial.eta2
## BMXWAIST 0.02376 1 7323.6 96.9034 0.00000 0.08262 0.09283
## smoker 0.00033 2 202970.1 0.6860 0.50360 0.00114 0.00141
## RIDAGEYR 0.02348 1 97744.1 98.8674 0.00000 0.08166 0.09185
## RIAGENDR 0.00566 1 258576.2 23.9319 0.00000 0.01968 0.02380
## race_eth 0.00195 3 26100.3 2.6892 0.04466 0.00678 0.00832
## income 0.00022 2 664.7 0.2536 0.77607 0.00076 0.00094
## Residual 0.23219 NA NA NA NA NA NA
mi.anova()
only works, however, for linear regression. A more generally applicable method, which works for linear and other forms of regression, is D1()
which carries out a Wald test comparing two nested models. To get a multiple degree of freedom test for a categorical variable with more than two levels, fit a reduced model that omits that variable, and use D1()
to compare the full and reduced models. The first time you run D1()
, you will be prompted to install the mitml
package (Grund, Robitzsch, and Luedtke 2023) if you have not already installed it.
For example, to get a Type 3 test for smoker
in Example 9.1:
# Reduced model omitting smoker
<- with(imp.nhanes.new,
fit.imp.lm.smoker lm(LBDGLUSI_trans ~ BMXWAIST + RIDAGEYR + RIAGENDR + race_eth + income))
# Compare full and reduced model
summary(D1(fit.imp.lm, fit.imp.lm.smoker))
##
## Models:
## model formula
## 1 LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR + RIAGENDR + race_eth + income
## 2 LBDGLUSI_trans ~ BMXWAIST + RIDAGEYR + RIAGENDR + race_eth + income
##
## Comparisons:
## test statistic df1 df2 dfcom p.value riv
## 1 ~~ 2 0.6892 2 984.7 989 0.5022 0.006685
##
## Number of imputations: 14 Method D1
To carry out a likelihood ratio test instead, use D3()
.
summary(D3(fit.imp.lm, fit.imp.lm.smoker))
##
## Models:
## model formula
## 1 LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR + RIAGENDR + race_eth + income
## 2 LBDGLUSI_trans ~ BMXWAIST + RIDAGEYR + RIAGENDR + race_eth + income
##
## Comparisons:
## test statistic df1 df2 dfcom p.value riv
## 1 ~~ 2 0.6992 2 766256 989 0.497 0.004973
##
## Number of imputations: 14 Method D3
9.6.2 Predictions
To obtain predictions following MI, predict within each analysis and pool the results over imputations using Rubin’s rules. First, create a data.frame
containing the values at which you want to predict. Second, compute the prediction (and its standard error) for each analysis. Since the mira
object, fit.imp.lm
, contains the results of each analysis in a list (fit.imp.lm$analyses
), use lapply()
to apply the predict()
function to each analysis (each element of the list). Finally, extract the predictions and their variances and use Rubin’s rules to pool the results via pool.scalar()
.
Example 9.1 (continued): Predict (transformed) fasting glucose for an individual with a waist circumference of 130 cm who is a current smoker, age 50 years, male, non-Hispanic Black, and earns at least $55,000 annually.
# Prediction data.frame
<- data.frame(
NEWDAT BMXWAIST = 130,
smoker = "Current",
RIDAGEYR = 50,
RIAGENDR = "Male",
race_eth = "Non-Hispanic Black",
income = "$55,000+")
# The code below assumes NEWDAT has only 1 row.
# To make predictions at different values,
# run again with a different NEWDAT.
# Get prediction for each analysis
<- lapply(fit.imp.lm$analyses, predict, newdata=NEWDAT, se.fit=T)
PREDLIST
# Compute the mean and variance in each analysis
<- VAR <- rep(NA, length(fit.imp.lm$analyses))
MEAN for(i in 1:length(MEAN)) {
<- PREDLIST[[i]]$fit
MEAN[i] <- (PREDLIST[[i]]$se.fit)^2
VAR[ i]
}
# Apply Rubin's rules
<- pool.scalar(MEAN,
PRED.POOLED
VAR,nrow(imp.nhanes.new$data))
# Extract the pooled mean
$qbar PRED.POOLED
## [1] -0.05951
# Extract the pooled standard error of the mean
sqrt(PRED.POOLED$t)
## [1] 0.002238
If the outcome was transformed and you would like the prediction on the original scale, then back-transform (this works for the mean, but not the standard error). In Example 9.1, a Box-Cox transformation with \(\lambda\) = -1.5 was used.
-1*PRED.POOLED$qbar)^(1/-1.5) (
## [1] 6.561
9.6.3 Polynomial predictor transformations
If you are going to include non-linear functions of variables in the regression model (e.g., Box-Cox, log, quadratic), your imputation model should include those non-linearities. This ensures that your imputation model includes the same level of complexity as your regression model. Otherwise, you will be imputing values under the assumption of linearity.
For functions that convert \(X\) into a single term \(f(x)\) (e.g., \(log(X)\)), simply create a transformed version of the variable prior to imputation. This was done for the outcome fasting glucose in Example 9.1. What about a transformation that converts a predictor \(X\) into multiple terms, such as a quadratic transformation which includes both \(X\) and \(X^2\)? Should the square term be created prior to imputation (transform-then-impute) or after (impute-then-transform)? It turns out that transform-then-impute is the better method and leads to unbiased results (Hippel 2009). Create all the terms you need for your regression model during pre-processing and include them in the imputation model.
NOTES:
- When using a polynomial transformation, avoid collinearity issues in the imputation model by first centering (e.g. \(cX = X -\) mean of \(X\)) and then creating higher order terms as powers of the centered variable (e.g., \(cX2 = (cX)^2, cX3 = (cX)^3\), etc.) prior to imputation.
- In both the imputation and regression model, include higher order terms using the derived version (e.g., include the square term as \(cX2\), not \(I(cX^2)\)).
- In both the imputation and regression model, include all the polynomial terms (e.g., centered linear, quadratic, cubic, etc.) in your polynomial, not just the highest order term (e.g., for a quadratic transformation, include both \(cX\) and \(cX2\), not just \(cX2\)).
Example 9.1 (continued): Regress (transformed) fasting glucose on waist circumference, adjusted for smoking status, age, gender, race/ethnicity, and income, and include a quadratic transformation for waist circumference.
First, repeat the pre-processing steps but this time create a centered version of waist circumference and a squared version of that centered variable.
# Compute the mean waist circumference for centering
mean(nhanes_adult_fast_sub$BMXWAIST, na.rm=T)
## [1] 100.5
<- nhanes_adult_fast_sub %>%
nhanesb mutate(
# (1) Collapse a sparse categorical variable
race_eth = fct_collapse(RIDRETH3,
"Hispanic" = c("Mexican American", "Other Hispanic"),
"Non-Hispanic Other" = c("Non-Hispanic Asian", "Other/Multi")),
# (2) Box-Cox outcome transformation for our regression
LBDGLUSI_trans = -1*LBDGLUSI^(-1.5),
# (3) Centered, quadratic version of waist circumference
cBMXWAIST = BMXWAIST - 100,
cBMXWAIST2 = cBMXWAIST^2
%>%
) # Select the variables to be included in the imputation model.
# If including a quadratic, also include the linear term.
select(LBDGLUSI_trans, cBMXWAIST, cBMXWAIST2,
smoker, RIDAGEYR, RIAGENDR, race_eth, income)
Next, fit the imputation model.
<- mice(nhanesb,
imp.quad seed = 3,
m = nimpute(nhanesb),
print = F)
Finally, fit the regression model with both the centered linear and quadratic waist circumference terms, with the quadratic term included as cBMXWAIST2
not I(cBMXWAIST^2)
.
<- with(imp.quad,
fit.imp.quad lm(LBDGLUSI_trans ~ cBMXWAIST + cBMXWAIST2 + smoker + RIDAGEYR +
+ race_eth + income))
RIAGENDR round.summary(fit.imp.quad, digits = 8)[, c("estimate", "2.5 %",
"97.5 %", "p.value")]
## estimate 2.5 % 97.5 % p.value
## (Intercept) -0.08232213 -0.08685128 -0.07779298 0.00000000
## cBMXWAIST 0.00028876 0.00022254 0.00035498 0.00000000
## cBMXWAIST2 0.00000069 -0.00000174 0.00000312 0.57879512
## smokerPast 0.00135526 -0.00094790 0.00365843 0.24847865
## smokerCurrent 0.00076638 -0.00202906 0.00356181 0.59070100
## RIDAGEYR 0.00031138 0.00024918 0.00037358 0.00000000
## RIAGENDRFemale -0.00487109 -0.00681069 -0.00293150 0.00000097
## race_ethNon-Hispanic White -0.00391980 -0.00655731 -0.00128228 0.00362140
## race_ethNon-Hispanic Black -0.00290628 -0.00654603 0.00073348 0.11745374
## race_ethNon-Hispanic Other -0.00212235 -0.00597177 0.00172706 0.27954052
## income$25,000 to < $55,000 0.00121704 -0.00190836 0.00434243 0.44474015
## income$55,000+ 0.00051304 -0.00228454 0.00331063 0.71890279
9.6.4 Interactions
If you want to include an interaction in your regression model, make sure to include the interaction in your imputation model, as well. This ensures that your imputation model includes at least the same level of complexity as your regression model. Otherwise, you will be imputing values under the assumption of no interaction.
For an interaction involving a categorical variable, this can be done by stratifying the dataset by the categorical variable, fitting the imputation model separately in each stratum, and then combining the results. This will only work if the categorical variable has no missing values (or, if there are not many missing values, if you first exclude cases with missing values on the categorical variable). An alternative, which is more generally applicable, is to use transform-then-impute; create the interaction term as a distinct variable in your pre-processing step and include it in the imputation model (Hippel 2009).
The stratification and transform-then-impute methods will not, in general, lead to the same imputed values. The stratification method leads to a more complex imputation model because it effectively includes an interaction between the categorical variable and every other variable, whereas the transform-then-impute method only includes whatever interactions you explicitly include. However, transform-then-impute is more generally applicable; if the categorical variable has missing data, or if both terms in the interaction are continuous, the stratification method is not possible.
Example 9.1 (continued): Regress (transformed) fasting glucose on waist circumference, adjusted for smoking status, age, gender, race/ethnicity, and income, and include an interaction to see if the waist circumference effect depends on gender. Use MI to handle missing data.
9.6.4.1 Interaction via stratification
First, check to see if gender has any missing values. If yes, and the number is very small, then remove the cases with missing gender from the dataset. If the number is large, or you do not want to lose even a small number of cases, use the interaction via transform-then-impute method described in the next section.
table(nhanes$RIAGENDR, exclude = NULL)
##
## Male Female
## 457 543
Gender has no missing values. To use stratification to include an interaction in the imputation model, do the following: split the dataset into separate datasets for each level of gender, use mice()
on each dataset, and combine the resulting mids
objects using rbind()
. Make sure to use the same number of imputations for all strata, and base the number of imputations on the full dataset, prior to stratification.
# Split data by gender
<- nhanes %>%
nhanes_F filter(RIAGENDR == "Female")
<- nhanes %>%
nhanes_M filter(RIAGENDR == "Male")
# Impute separately by gender
<- mice(nhanes_F,
imp_F seed = 3,
m = nimpute(nhanes),
print = F)
<- mice(nhanes_M,
imp_M seed = 3,
m = nimpute(nhanes),
print = F)
# Combine the imputations across strata
<- rbind(imp_F, imp_M) imp.int
Just to check, table gender on the first imputation. It should have all its values and no missing values. If it does not, you may have stratified on a variable with missing values (which you should not do).
# Checking the first imputation
# (should have all levels of gender and no missing values)
table(complete(imp.int)$RIAGENDR, exclude=NULL)
##
## Male Female
## 457 543
Next, fit the model with the interaction and pool the results.
<- with(imp.int,
fit.imp.int lm(LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR +
+ race_eth + income +
RIAGENDR :RIAGENDR))
BMXWAISTround.summary(fit.imp.int, digits = 5)[, c("estimate", "2.5 %",
"97.5 %", "p.value")]
## estimate 2.5 % 97.5 % p.value
## (Intercept) -0.11338 -0.12322 -0.10354 0.0000
## BMXWAIST 0.00032 0.00022 0.00041 0.0000
## smokerPast 0.00139 -0.00092 0.00370 0.2393
## smokerCurrent 0.00083 -0.00198 0.00363 0.5636
## RIDAGEYR 0.00031 0.00024 0.00037 0.0000
## RIAGENDRFemale -0.00117 -0.01336 0.01102 0.8511
## race_ethNon-Hispanic White -0.00387 -0.00650 -0.00124 0.0040
## race_ethNon-Hispanic Black -0.00283 -0.00647 0.00081 0.1279
## race_ethNon-Hispanic Other -0.00206 -0.00591 0.00179 0.2934
## income$25,000 to < $55,000 0.00088 -0.00237 0.00413 0.5958
## income$55,000+ 0.00026 -0.00268 0.00320 0.8632
## BMXWAIST:RIAGENDRFemale -0.00004 -0.00016 0.00008 0.5493
In this example, the interaction is only 1 term, but if there were multiple terms, use D1()
to get a multiple degree of freedom Wald test of significance by comparing the model with an interaction to the model without (or D3()
to get a likelihood ratio test).
# Fit reduced model with no interaction
<- with(imp.int,
fit.imp.noint lm(LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR +
+ race_eth + income))
RIAGENDR
# Type 3 Wald test of interaction
summary(D1(fit.imp.int, fit.imp.noint))
##
## Models:
## model
## 1
## 2
## formula
## LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR + RIAGENDR + race_eth + income + BMXWAIST:RIAGENDR
## LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR + RIAGENDR + race_eth + income
##
## Comparisons:
## test statistic df1 df2 dfcom p.value riv
## 1 ~~ 2 0.3589 1 702.3 988 0.5493 0.05453
##
## Number of imputations: 14 Method D1
9.6.4.2 Interaction via transform-then-impute
A more general approach to including an interaction, which works even when a categorical variable in the interaction has missing values or when both variables are continuous, is to explicitly compute the interaction terms during pre-processing followed by imputation including the computed terms. If both terms in the interaction are continuous, simply create (during pre-processing) a variable that is their product. If either are categorical, the process involves a few additional steps.
First, fit the regression model, including the interaction, to the original pre-imputation dataset to confirm what terms comprise the interaction.
<- lm(LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR +
fit.pre + race_eth + income +
RIAGENDR :RIAGENDR, data = nhanes)
BMXWAISTnames(coef(fit.pre))
## [1] "(Intercept)" "BMXWAIST" "smokerPast"
## [4] "smokerCurrent" "RIDAGEYR" "RIAGENDRFemale"
## [7] "race_ethNon-Hispanic White" "race_ethNon-Hispanic Black" "race_ethNon-Hispanic Other"
## [10] "income$25,000 to < $55,000" "income$55,000+" "BMXWAIST:RIAGENDRFemale"
There is one interaction term – BMXWAIST:RIAGENDRFemale
. To explicitly compute this term, derive a variable that is BMXWAIST
multiplied by an indicator variable corresponding to RIAGENDER == Female
(1 when this condition is true, 0 when false). In general, first create an indicator variable for each non-reference level of the categorical variable and compute the product of each indicator variable and the other term in the interaction.
NOTE: Make sure to drop the old categorical variable from the dataset.
<- nhanes %>%
nhanes_int mutate(RIAGENDRFemale = as.numeric(RIAGENDR == "Female"),
BMXWAIST_RIAGENDRFemale = BMXWAIST*RIAGENDRFemale) %>%
select(-RIAGENDR)
# Checking
table(nhanes$RIAGENDR, nhanes_int$RIAGENDRFemale, exclude=NULL)
##
## 0 1
## Male 457 0
## Female 0 543
<- nhanes$RIAGENDR == "Female"
SUB # Should all be 0
summary(nhanes_int$BMXWAIST[SUB] -
$BMXWAIST_RIAGENDRFemale[SUB]) nhanes_int
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 0 0 0 0 0 21
Next, fit the imputation model.
<- mice(nhanes_int,
imp.int seed = 3,
m = nimpute(nhanes_int),
print = F)
imp.int
## Class: mids
## Number of multiple imputations: 14
## Imputation methods:
## LBDGLUSI_trans BMXWAIST smoker RIDAGEYR
## "" "pmm" "" ""
## race_eth income RIAGENDRFemale BMXWAIST_RIAGENDRFemale
## "" "polyreg" "" "pmm"
## PredictorMatrix:
## LBDGLUSI_trans BMXWAIST smoker RIDAGEYR race_eth income RIAGENDRFemale
## LBDGLUSI_trans 0 1 1 1 1 1 1
## BMXWAIST 1 0 1 1 1 1 1
## smoker 1 1 0 1 1 1 1
## RIDAGEYR 1 1 1 0 1 1 1
## race_eth 1 1 1 1 0 1 1
## income 1 1 1 1 1 0 1
## BMXWAIST_RIAGENDRFemale
## LBDGLUSI_trans 1
## BMXWAIST 1
## smoker 1
## RIDAGEYR 1
## race_eth 1
## income 1
Finally, fit the regression model using with()
, including the new gender indicator variable and the derived interaction term (which is entered just like any other term in the model, not with a :
).
<- with(imp.int,
fit.imp.int lm(LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR +
+ race_eth + income +
RIAGENDRFemale
BMXWAIST_RIAGENDRFemale))round.summary(fit.imp.int, digits = 5)
## estimate std.error statistic df p.value 2.5 % 97.5 %
## (Intercept) -0.11338 0.00490 -23.1489 893.7 0.00000 -0.12299 -0.10377
## BMXWAIST 0.00032 0.00005 6.8965 865.8 0.00000 0.00023 0.00041
## smokerPast 0.00139 0.00118 1.1765 975.4 0.23968 -0.00093 0.00370
## smokerCurrent 0.00074 0.00143 0.5173 973.6 0.60509 -0.00206 0.00354
## RIDAGEYR 0.00031 0.00003 9.8340 977.1 0.00000 0.00024 0.00037
## RIAGENDRFemale -0.00140 0.00612 -0.2282 921.4 0.81954 -0.01340 0.01061
## race_ethNon-Hispanic White -0.00390 0.00133 -2.9217 970.9 0.00356 -0.00651 -0.00128
## race_ethNon-Hispanic Black -0.00287 0.00185 -1.5481 978.0 0.12193 -0.00650 0.00077
## race_ethNon-Hispanic Other -0.00200 0.00196 -1.0209 973.4 0.30757 -0.00585 0.00185
## income$25,000 to < $55,000 0.00094 0.00167 0.5649 288.1 0.57259 -0.00235 0.00424
## income$55,000+ 0.00031 0.00147 0.2100 434.7 0.83375 -0.00259 0.00321
## BMXWAIST_RIAGENDRFemale -0.00003 0.00006 -0.5527 918.8 0.58060 -0.00015 0.00008
9.6.4.3 Estimate the effect of one variable at levels of the other
To estimate the effect of one variable at levels of the other, use gmodels::estimable()
on each imputation separately and pool the results using Rubin’s rules via pool.scalar()
(with some additional code to compute p-values).
# Compute mean and variance from each analysis
# Initialize vector for level 1 (Male)
<- VAR1 <- rep(NA, length(fit.imp.int$analyses))
MEAN1 # Initialize vector for level 2 (Female)
<- VAR2 <- rep(NA, length(fit.imp.int$analyses))
MEAN2
for(i in 1:length(MEAN1)) {
<- gmodels::estimable(fit.imp.int$analyses[[i]],
EST1 c("BMXWAIST" = 1))
<- gmodels::estimable(fit.imp.int$analyses[[i]],
EST2 c("BMXWAIST" = 1,
"BMXWAIST_RIAGENDRFemale" = 1))
<- as.numeric(EST1["Estimate"])
MEAN1[i] <- (as.numeric(EST1["Std. Error"])^2)
VAR1[ i] <- as.numeric(EST2["Estimate"])
MEAN2[i] <- (as.numeric(EST2["Std. Error"])^2)
VAR2[ i]
}
# Apply Rubin's rules
<- pool.scalar(MEAN1,
POOLED1
VAR1,nrow(imp.int$data))
<- pool.scalar(MEAN2,
POOLED2
VAR2,nrow(imp.int$data))
# Extract the pooled mean and standard error
<- data.frame(gender = c("Male", "Female"),
DF est = c(POOLED1$qbar, POOLED2$qbar),
se = sqrt(c(POOLED1$t, POOLED2$t)),
df = c(POOLED1$df, POOLED2$df))
# Compute test statistic and p-value
$t <- DF$est/DF$se
DF$p <- 2*(pt(abs(DF$t), DF$df, lower.tail = F))
DF DF
## gender est se df t p
## 1 Male 0.0003162 0.00004585 874.6 6.896 0.0000000000102137
## 2 Female 0.0002831 0.00003895 879.0 7.270 0.0000000000007958