summary for section 2

User

utjscq

Subject

Mathematics

Description

I need a summary of Section 2 (Ordinal Superiority Measures for Normal Linear Models), which is on p. 215, especially the confidence interval in that section.

Unformatted Attachment Preview

Biometrics 73, 214–219, March 2017
DOI: 10.1111/biom.12565

Ordinal Probability Effect Measures for Group Comparisons in Multinomial Cumulative Link Models

Alan Agresti (1,*) and Maria Kateri (2,**)
(1) Department of Statistics, University of Florida, Gainesville, Florida 32605, U.S.A.
(2) Institute of Statistics, RWTH Aachen University, D-52056 Aachen, Germany
(*) email: aa@stat.ufl.edu
(**) email: maria.kateri@rwth-aachen.de

Summary. We consider simple ordinal model-based probability effect measures for comparing distributions of two groups, adjusted for explanatory variables. An "ordinal superiority" measure summarizes the probability that an observation from one distribution falls above an independent observation from the other distribution, adjusted for explanatory variables in a model. The measure applies directly to normal linear models and to a normal latent variable model for ordinal response variables. It equals $\Phi(\beta/\sqrt{2})$ for the corresponding ordinal model that applies a probit link function to cumulative multinomial probabilities, for standard normal cdf $\Phi$ and effect $\beta$ that is the coefficient of the group indicator variable. For the more general latent variable model for ordinal responses that corresponds to a linear model with other possible error distributions and corresponding link functions for cumulative multinomial probabilities, the ordinal superiority measure equals $\exp(\beta)/[1 + \exp(\beta)]$ with the log–log link and equals approximately $\exp(\beta/\sqrt{2})/[1 + \exp(\beta/\sqrt{2})]$ with the logit link, where $\beta$ is the group effect. Another ordinal superiority measure generalizes the difference of proportions from binary to ordinal responses. We also present related measures directly for ordinal models for the observed response that need not assume corresponding latent response models. We present confidence intervals for the measures and illustrate with an example.
Key words: Cumulative logit model; Cumulative probit model; Mann–Whitney statistic; Ordinal multinomial models; Proportional odds; Stochastic ordering.

1. Introduction

This article considers simple ordinal effect summaries for model-based comparison of two groups on an ordinal categorical response variable, while adjusting for other explanatory variables. Unlike standard summaries using nonlinear measures such as probits and odds ratios that can be difficult for practitioners to interpret, the proposed measures are based merely on probabilities and their differences. The summary measures generalize two "ordinal superiority" measures that compare two groups without supplementary explanatory variables. Let $y_1$ and $y_2$ denote independent random variables from groups denoted by A and B, for a quantitative or ordinal categorical scale. The measure

$$\Delta = P(y_1 > y_2) - P(y_2 > y_1) \qquad (1)$$

summarizes their relative size. For binary responses with outcomes (0, 1), this simplifies to the difference of proportions, $P(y_1 = 1) - P(y_2 = 1)$. If $y_1$ and $y_2$ are identically distributed, then $\Delta = 0$. For discrete response variables, such as ordinal categorical responses, a related measure that has null value equal to 0.50 rather than 0 is

$$\gamma = P(y_1 > y_2) + \tfrac{1}{2} P(y_1 = y_2) \qquad (2)$$

(Klotz, 1966). The correction factor adjusts for ties to generate a null value of 0.50. The measures are functionally related, $\gamma = (\Delta + 1)/2$, $\Delta = 2\gamma - 1$, with $\gamma$ and $\Delta$ having ranges [0, 1] and [−1, 1], respectively. They are most meaningful when the groups are stochastically ordered, such as when they differ by a location shift on some scale. For details for ordinal categorical response scales, see Agresti (2010, Chap. 2). The measures relate directly to the information used in the Mann–Whitney statistic. For example, issue 4 of Volume 25 of Statistics in Medicine in 2006, which is devoted to that statistic and its uses and extensions, contains several articles that use such measures.
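As a minimal sketch of how measures (1) and (2) are evaluated on an ordinal scale, the R code below computes both from two sets of category probabilities. The probabilities `p1` and `p2` are hypothetical values chosen only for illustration; they do not come from the article, and the variable names are this sketch's own.

```r
# Hypothetical category probabilities on a 4-category ordinal scale
# (illustrative values only, not taken from the article).
p1 <- c(0.10, 0.20, 0.30, 0.40)  # group A: P(y1 = j), j = 1..4
p2 <- c(0.40, 0.30, 0.20, 0.10)  # group B: P(y2 = k), k = 1..4

# Under independence, joint[j, k] = P(y1 = j) * P(y2 = k).
joint <- outer(p1, p2)

p_above <- sum(joint[lower.tri(joint)])  # row index j > column index k: P(y1 > y2)
p_below <- sum(joint[upper.tri(joint)])  # k > j: P(y2 > y1)
p_tie   <- sum(diag(joint))              # j = k: P(y1 = y2)

delta <- p_above - p_below       # measure (1)
gamma <- p_above + 0.5 * p_tie   # measure (2), with the tie correction

c(delta = delta, gamma = gamma)

# The functional relation gamma = (delta + 1) / 2 should hold.
all.equal(gamma, (delta + 1) / 2)
```

Because the three probabilities sum to 1, the final check follows directly from the definitions; it is included only as a sanity test of the indexing.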
The ordinal effect measures discussed in this article use such probabilities in the context of modeling ordinal response variables while adjusting for explanatory variables. Section 2 introduces the measures for normal linear models that contain an indicator term for the groups, because linear models serve as latent variable models for ordinal response data. Section 3 presents related measures for a standard model for an ordinal response variable that applies a link function such as the probit or logit to cumulative probabilities, utilizing its connection with the latent variable model for various error distributions. Section 4 presents an example, also showing how to use R software to easily construct confidence intervals for the measures. Section 5 presents related ordinal effect measures for cumulative link models in terms of the observed response, instead of a latent response. Section 6 discusses the applicability of the measures and suggests extensions for other models.

© 2016, The International Biometric Society

2. Ordinal Superiority Measures for Normal Linear Models

We first consider normal linear models that have explanatory variables in addition to a binary group indicator variable. At explanatory variable values $x = (x_1, \ldots, x_p)^T$, let $y_1$ denote the response variable for an observation in group A and let $y_2$ denote an independent response for an observation in group B. Using the model-based conditional distributions on $y$ for the two groups at $x$, let

$$\Delta = P(y_1 > y_2; x) - P(y_2 > y_1; x).$$

With no explanatory variables other than the group indicator, this simplifies to (1). An analog of the ordinal superiority measure (2) is

$$\gamma = (\Delta + 1)/2,$$

which is merely $P(y_1 > y_2; x)$ when the response is continuous. The measures are useful summaries when no substantive interaction occurs between the group variable and the explanatory variables. Let $z$ be a group indicator for an observation, where $z = 1$ for group A and $z = 0$ for group B.

These ordinal measures have simple form for the ordinary normal linear model

$$y = \beta_0 + \beta z + x^T \beta_x + \epsilon,$$

with $\beta_x = (\beta_1, \ldots, \beta_p)^T$ and $\epsilon \sim N(0, \sigma^2)$. For this model, the difference between the conditional means of $y_1$ and $y_2$ at $x$ is $\beta$, and

$$\gamma = P(y_1 > y_2; x) = P\left[\frac{(y_1 - y_2) - \beta}{\sqrt{2}\,\sigma} > \frac{-\beta}{\sqrt{2}\,\sigma}\right] = \Phi\left(\frac{\beta}{\sqrt{2}\,\sigma}\right).$$

This formula applies regardless of the values $x$ of the explanatory variables. Likewise, $\Delta = 2\Phi(\beta/\sqrt{2}\sigma) - 1$. Differences between the normal conditional standardized means for the two groups taking values $\beta/\sigma$ equal to 0, 0.5, 1, 2, 3, correspond to $\gamma$ equal to 0.50, 0.64, 0.76, 0.92, 0.98, respectively. Analogous measures apply when interaction occurs between the group indicator and an explanatory variable, or when the variance is allowed to be nonconstant, but then the values of the measures depend on the value of that explanatory variable.

The standardized difference $\beta/\sigma$ has seen longtime use in the literature for comparing two groups (e.g., Lehmann, 1975, p. 71). The corresponding ordinal superiority measures have also been used in a general regression context (e.g., Brumback et al., 2006, Thas et al., 2012). In practice, with least squares estimate $\hat\beta$ in the linear model and residual standard deviation $s$, we can estimate the ordinal group comparisons by $\hat\gamma = \Phi(\hat\beta/\sqrt{2}s)$ and $\hat\Delta = 2\Phi(\hat\beta/\sqrt{2}s) - 1$.

A confidence interval $(L, U)$ for the standardized difference $\beta/\sigma$ in the normal linear model yields a corresponding confidence interval $(\Phi(L/\sqrt{2}), \Phi(U/\sqrt{2}))$ for $\gamma$, which then also yields one for $\Delta$. For the model matrix $X$ for the linear model, let $v$ denote the element in the row and column of $(X^T X)^{-1}$ corresponding to the effect parameter $\beta$ for comparing the two groups. For testing $H_0: \beta = 0$ using the usual $t$ statistic, $t = \hat\beta/(s\sqrt{v})$, consider the noncentrality parameter

$$\lambda = \frac{\beta}{\sigma\sqrt{v}}.$$

Let $(\hat\lambda_L, \hat\lambda_U)$ denote the standard confidence interval for $\lambda$ for this test (Lehmann, 1986, p. 352). Then, since $\lambda = (\beta/\sqrt{2}\sigma)(\sqrt{2/v})$, it follows that the confidence interval $(L, U)$ for $\beta/\sqrt{2}\sigma$ is $\sqrt{v/2}\,(\hat\lambda_L, \hat\lambda_U)$. Applying $\Phi$ to these endpoints yields the confidence interval for $\gamma$. Hayter (2012) presented more general confidence intervals, and Tian (2008) presented confidence intervals for group comparisons when the groups have different variances.

3. Ordinal Superiority Measures for Ordinal Latent Variable Models

When $y$ is a $c$-category ordinal response variable, the most popular models are special cases of the cumulative link model

$$\text{link}[P(y \le j)] = \alpha_j - \beta z - x^T \beta_x, \quad j = 1, \ldots, c - 1, \qquad (3)$$

for link functions such as the logit, probit, or log–log and complementary log–log (McCullagh, 1980). It is often sensible to regard an ordinal categorical variable as necessarily crude measurement of a continuous latent variable $y^*$ that, if we could observe it, would be the response variable in an ordinary linear model. The cumulative link model is implied by a model in which a latent response has conditional distribution with cdf given by the inverse of the link function and with mean $\beta z + x^T \beta_x$ (Anderson and Philips, 1981).

The normal latent variable model with $y^* \sim N(\beta z + x^T \beta_x, 1)$ implies the cumulative probit model

$$\Phi^{-1}[P(y \le j)] = \alpha_j - \beta z - x^T \beta_x,$$

with $\{\alpha_j\}$ being cutpoints on the underlying scale and $\Phi$ being the standard normal cdf. The ordinal superiority measures apply directly to this latent variable model. Let $y_1^*$ and $y_2^*$ denote independent underlying latent variables at $x$ when $z = 1$ and when $z = 0$, respectively. For this model,

$$\gamma = P(y_1^* > y_2^*; x) = P\left[\frac{(y_1^* - y_2^*) - \beta}{\sqrt{2}} > \frac{-\beta}{\sqrt{2}}\right] = \Phi\left(\frac{\beta}{\sqrt{2}}\right),$$

regardless of $x$ values, and $\Delta = 2\Phi(\beta/\sqrt{2}) - 1$. The logit link and corresponding cumulative logit model relate to underlying logistic distributions, for which such a simple expression does not occur.
However, because of the very close similarity of logit and probit model fits, estimates of the corresponding measures for that logistic latent variable model are very similar to estimates for the normal latent variable model. For a cumulative logit model with proportional odds structure and maximum likelihood estimate $\hat\beta$ of the group effect, we can use numerical integration or simulate pairs of observations from the relevant logistic distributions to closely approximate the maximum likelihood estimate of the probability for the difference of latent logistic random variables. In practice, though, it is adequate to approximate the distribution of $y_1^* - y_2^*$ by a logistic distribution with parameter $\beta$ and scale parameter $\sqrt{2}$, for which

$$\gamma \approx \frac{\exp(\beta/\sqrt{2})}{1 + \exp(\beta/\sqrt{2})},$$

or to fit the corresponding cumulative probit model and use the closed-form results for it.

For ordinal responses, log–log and complementary log–log links are appropriate when we expect underlying latent variables to have extreme-value distributions. If in the latent variable model, the errors are independent extreme-value random variables (i.e., the standard Gumbel cdf $F(\epsilon) = \exp[-\exp(-\epsilon)]$), then their difference has the standard logistic distribution (McFadden, 1974). For a model with log–log link and coefficient $\beta$ for the group indicator, it follows that

$$\gamma = P(y_1^* > y_2^*; x) = \frac{\exp(\beta)}{1 + \exp(\beta)},$$

when the scale parameter of the underlying extreme-value distributions is 1.

For $\gamma$ and $\Delta$ for the latent variable model with an ordinal response variable, simple confidence intervals result directly from ordinary confidence intervals for $\beta$ for the corresponding ordinal cumulative link model. For example, if $[\hat\beta_L, \hat\beta_U]$ is a profile-likelihood or Wald confidence interval for $\beta$ in the cumulative probit model based on a multinomial likelihood, the corresponding confidence interval for $\gamma$ is $[\Phi(\hat\beta_L/\sqrt{2}), \Phi(\hat\beta_U/\sqrt{2})]$.

4. Example for Cumulative Link Models

We illustrate the ordinal superiority measures with an example from Agresti (2015, Section 6.3.3) on a study of mental health. It relates a four-category response variable measuring mental impairment (1 = well, 2 = mild symptom formation, 3 = moderate symptom formation, 4 = impaired) to a binary indicator of socioeconomic status (SES: 1 = high, 0 = low) and a quantitative life-events (LE) index taking values on the nonnegative integers between 0 and 9 with mean 4.3 and standard deviation 2.7. The n = 40 observations are available at www.stat.ufl.edu/~aa/glm/data. For the cumulative probit model corresponding to a normal latent variable model, the maximum likelihood fit is

$$\Phi^{-1}[\hat{P}(y \le j)] = \hat\alpha_j + 0.68336(\text{SES}) - 0.19535(\text{LE}).$$

To compare the two levels of SES using $\hat\beta_1 = -0.68336$, we can use $\hat\gamma = \Phi(\hat\beta_1/\sqrt{2}) = 0.314$ and $\hat\Delta = -0.371$. The ordinal superiority measure $\hat\gamma$ has the interpretation that at any particular value for life events, there is about a 1/3 chance of lower mental impairment at low SES than at high SES. The 95% profile likelihood confidence interval for $\beta_1$ yields confidence intervals (0.161, 0.507) for $\gamma$ and (−0.678, 0.015) for $\Delta$. Table 1 shows how simple it is to use software such as R to obtain a confidence interval for $\gamma$ for the SES effect. Here, we fitted the cumulative probit model using the clm function of the R package ordinal (Christensen, 2011). Similarly, we can use these measures to compare two levels of the life events measure. For the highest and lowest levels (0 and 9), $\hat\gamma = \Phi(9\hat\beta_2/\sqrt{2}) = 0.893$, with 95% profile likelihood confidence interval (0.653, 0.983), suggesting a very strong effect.

Table 1. R code and output (edited) for finding confidence interval for ordinal superiority measure γ for SES effect in cumulative probit model with mental impairment data

    > Mental <- [...]
    > Mental
    1 impair ses life
    2 1 1 1
    3 1 1 9
    ...
    40 4 0 9
    > attach(Mental)
    > library(ordinal)   # library(ordinal) requires response to be a factor
    > impair.f <- [...]
    > probit.m <- [...]
    > summary(probit.m)  # we don't show cutpoint parameter estimates
         Estimate Std. Error z value Pr(>|z|)
    ses  -0.68336    0.36411  -1.877  0.06055 .
    life  0.19535    0.06887   2.837  0.00456
    > Like.CI.b1 <- [...]
    > Like.CI.gamma <- [...]

Table 2. R code and output (edited) for finding confidence interval for ordinal superiority measure γ for SES effect in normal linear model with mental impairment data

    > summary(lm(impair ~ ses + life))
    Coefficients:
                   Estimate Std. Error  t value   Pr(>|t|)
    (Intercept)  1.91973788 0.33785712  5.68210 1.6924e-06
    ses         -0.64500836 0.32915094 -1.95961  0.0576069 .
    life         0.17778169 0.06060938  2.93324  0.0057285
    ---
    Residual standard error: 1.02696 on 37 degrees of freedom
    > library(MBESS)
    > conf.limits.nct(-0.64501/0.32915, df=37, conf.level=0.95)
    $Lower.Limit
    [1] -3.956887716
    $Upper.Limit
    [1] 0.06278409435
    > v <- [...]
    > pnorm(sqrt(v/2)*(-3.956887)); pnorm(sqrt(v/2)*(0.062784))
    [1] 0.1849219812
    [1] 0.5056763573

[...]

$$\Delta(x_0) = \sum_{j>k} \pi_{1j}(x_0)\pi_{2k}(x_0) - \sum_{k>j} \pi_{1j}(x_0)\pi_{2k}(x_0), \qquad (4)$$

and

$$\gamma(x_0) = \sum_{j>k} \pi_{1j}(x_0)\pi_{2k}(x_0) + \frac{1}{2}\sum_{j} \pi_{1j}(x_0)\pi_{2j}(x_0). \qquad (5)$$

Corresponding sample values $\hat\Delta(x_0)$ and $\hat\gamma(x_0)$ replace the probabilities in (4) and (5) by the corresponding fitted values $\{\hat\pi_{1j}(x_0)\}$ and $\{\hat\pi_{2j}(x_0)\}$ for the model. Unlike the measures for the latent variable models, these measures have values depending on $x_0$. In practice, we could report them and their confidence intervals at a representative $x_0$ value, such as the overall mean $\bar{x}$. Or, if the sample $x$ values are representative of the population of interest, a summary approach estimates the measures at the $x$ value for each observation, and then averages them. Let $x_i$ denote the explanatory component vector for observation $i$, and let $\pi^{\ell}_{ij} = \pi_{\ell j}(x_i)$, for $\ell = 1, 2$, $i = 1, \ldots, n$, and $j = 1, \ldots, c$.
Summary ordinal superiority measures are

$$\Delta^* = \frac{1}{n}\sum_i \Delta_i \quad \text{and} \quad \gamma^* = \frac{1}{n}\sum_i \gamma_i, \qquad (6)$$

with components $\Delta_i = \Delta(x_i)$ and $\gamma_i = \gamma(x_i)$, given by (4) and (5), respectively. The expressions of $\Delta_i$ and $\gamma_i$ in terms of the parameters of the cumulative link model (3) are given in web appendix A. We obtain the model-based estimates $\hat\Delta^*$ and $\hat\gamma^*$ by replacing the parameter values by the corresponding estimated values.

To construct confidence intervals for these measures, we can obtain large-sample standard errors using the delta method, based on an estimated covariance matrix of the ML model parameter estimates that are generated from the usual multinomial sampling scheme. From results for the simple case without explanatory variables, it is more sensible to apply the delta method to a transform such as the logit of the measure (Ryu and Agresti, 2008) rather than to the measure itself. Web appendix A contains the technical details. An R function for constructing estimates and confidence intervals for $\gamma^*$ and $\Delta^*$, based on the cumulative logit or probit model, is available in web appendix B.

We illustrate for the mental impairment data that Section 4 used to illustrate the measures for the ordinal latent variable models. For comparing the two SES levels with cumulative probit and cumulative logit models, Table 3 shows $\gamma(x)$ and $\Delta(x)$ at the life events index values $x = 0, \ldots, 9$, and at the sample mean value $\bar{x} = 4.3$. Although the estimates vary according to the life events value, they are quite stable. As we would expect, because of the similarity of logit and probit models, summary results are similar for the two cumulative links. For the summary measures averaged over the 40 observations, we obtain $\hat\gamma^* = 0.337$ and $\hat\Delta^* = -0.325$ for the probit model, and we obtain $\hat\gamma^* = 0.341$ and $\hat\Delta^* = -0.319$ for the logit model. Table 4 shows 95% confidence intervals for the population values, using the observed information matrix. All these analyses indicate a range from essentially no effect to a relatively large one in the direction of poorer mental health at the lower SES level.

Table 3. Estimates of the ordinal superiority measures comparing the two SES levels for the mental impairment data at the different levels of the life-events index and its sample mean, based on the cumulative probit and cumulative logit models

    Life events    γ̂ (Probit)   γ̂ (Logit)   Δ̂ (Probit)   Δ̂ (Logit)
    0              0.355        0.357       −0.291       −0.286
    1              0.345        0.348       −0.310       −0.305
    2              0.338        0.341       −0.325       −0.318
    3              0.333        0.337       −0.334       −0.326
    4              0.330        0.335       −0.340       −0.330
    5              0.329        0.334       −0.342       −0.333
    6              0.330        0.334       −0.339       −0.332
    7              0.334        0.336       −0.333       −0.327
    8              0.339        0.341       −0.321       −0.317
    9              0.348        0.350       −0.305       −0.301
    x̄ = 4.3        0.330        0.334       −0.341       −0.331

6. Discussion and Extensions

The measures introduced here supplement measures previously proposed to summarize effects in models for ordinal categorical responses, such as Ryu and Agresti (2008) and Thas et al. (2012). For other ordinal effect measures, see Cheng (2009), Lu et al. (2014), Lu et al. (2015), and Volfovsky et al. (2015). An advantage of the ordinal superiority measures is simplicity of interpretation for ordinal categorical models in which researchers often find probits and odds ratios difficult to interpret. For models with nonlinear link functions, such as cumulative link models, the natural model-based effect measures are not easy to understand.
For the typical medical researcher or practitioner, for instance, reading that at any values of explanatory variables the estimated probability that a response to drug ($z = 1$) is better than a response to placebo ($z = 0$) is $\hat\gamma = 0.66$ would have greater meaning than reading that (i) an estimated cumulative odds for drug is $\exp(\hat\beta) = 2.7$ times the estimated cumulative odds for placebo (i.e., from (3) with the logit link), or (ii) estimated cumulative probits differ by $\hat\beta = 0.5$ or an underlying mean for drug is $\hat\beta = 0.5$ standard deviations better than for placebo (i.e., from (3) with the probit link), or (iii) the estimated probability that the response for drug is worse than a particular outcome category is the power $\exp(\hat\beta) = 1.7$ of the estimated probability that the response for placebo is worse than that category (i.e., from (3) with the complementary log–log link).

The ordinal superiority measures extend directly to summary comparisons of multiple groups, based on more general models that have multiple indicator variables for the groups. For example, suppose a cumulative probit model contains terms $\beta_{(a)} z_a + \beta_{(b)} z_b$ in the linear predictor for groups $a$ and $b$, where $z_j = 1$ for observations from group $j$ and $z_j = 0$ otherwise. Then, an analog of $\gamma$ for comparing those groups is $\Phi[(\beta_{(a)} - \beta_{(b)})/\sqrt{2}]$. Inference can use Bonferroni adjustments. With a large number $g$ of groups, it may be useful to model the $g(g - 1)/2$ comparison measures in terms of fewer parameters, such as is done with the Bradley–Terry model and is discussed in a simpler context by Bergsma et al. (2009, p. 11).

The proposed measures in Section 5 that are not connected with a linear latent variable model apply directly to other ordinal models, such as continuation-ratio logit models and adjacent-category logit models that have proportional odds structure (Agresti, 2010, Chapter 4).
When the explanatory variables are solely categorical, the data form a contingency table, and (3) for the logit link is the response model analog of association models for cumulative odds ratios, while other ordinal response models correspond to association models for alternative types of ordinal odds ratios (see Sections 8.3.2–8.3.4 of Kateri, 2014). Some of these models, such as those expressed in terms of local odds ratios, have approximate connections with underlying normal models. The measures extend also to more general ordinal-response models than those having linear predictors, such as generalized additive models for ordinal responses (e.g., Yee and Wild, 1996), although obtaining confidence intervals is then more challenging.

Table 4. 95% confidence intervals for the ordinal superiority measures comparing the two SES levels for the mental impairment data at the sample mean of the life-events index and summarized over life-events values, based on the cumulative probit and cumulative logit models

    Life events    γ (Probit)      γ (Logit)       Δ (Probit)        Δ (Logit)
    x̄ = 4.3        (0.19, 0.51)    (0.20, 0.51)    (−0.63, 0.03)     (−0.61, 0.02)
    Summary        (0.21, 0.49)    (0.21, 0.50)    (−0.57, −0.02)    (−0.57, −0.01)

7. Supplementary Materials

Web Appendices A and B, referenced in Section 5, are available with this article at the Biometrics website on Wiley Online Library. Web appendix A contains the technical details for deriving the large-sample confidence intervals for $\gamma^*$ and $\Delta^*$, while web appendix B provides the R function for computing $\hat\gamma^*$ and $\hat\Delta^*$, along with the associated confidence intervals.

Acknowledgements

The authors appreciate helpful comments about an earlier draft from Wicher Bergsma, Leonardo Grilli, Carla Rampichini, and Euijung Ryu.

References

Agresti, A. (2010). Analysis of Ordinal Categorical Data, 2nd ed. Hoboken, NJ: Wiley.
Agresti, A. (2015). Foundations of Linear and Generalized Linear Models. Hoboken, NJ: Wiley.
Anderson, J. A. and Philips, P. R. (1981). Regression, discrimination, and measurement models for ordered categorical variables. Applied Statistics 30, 22–31.
Bergsma, W., Croon, M. A., and Hagenaars, J. A. (2009). Marginal Models for Dependent, Clustered, and Longitudinal Categorical Data. New York, NY: Springer.
Brumback, L. C., Pepe, M. S., and Alonzo, T. A. (2006). Using the ROC curve for gauging treatment effect in clinical trials. Statistics in Medicine 25, 575–590.
Cheng, J. (2009). Estimation and inference for the causal effect of receiving treatment on a multinomial outcome. Biometrics 65, 96–103.
Christensen, R. H. B. (2011). Analysis of ordinal data with cumulative link models estimation with the ordinal package. R package version 2011.09-13.
Hayter, A. J. (2012). Win-probabilities for regression models. Statistical Methodology 9, 520–527.
Kateri, M. (2014). Contingency Table Analysis: Methods and Implementation Using R. New York: Birkhäuser/Springer.
Kelley, K. (2007). Confidence intervals for standardized effect sizes: theory, application, and implementation. Journal of Statistical Software 20, 1–24.
Klotz, J. H. (1966). The Wilcoxon, ties, and the computer. Journal of the American Statistical Association 61, 772–787.
Lehmann, E. L. (1975). Nonparametrics: Statistical Methods Based on Ranks. San Francisco, CA: Holden-Day.
Lehmann, E. L. (1986). Testing Statistical Hypotheses, 2nd edition. New York, NY: Springer-Verlag.
Lu, J., Ding, P., and Dasgupta, T. (2015). Sharp bounds of causal effects on ordinal outcomes. http://arxiv.org/abs/1507.01542v1
Lu, T.-Y., Poon, W.-Y., and Cheung, S. H. (2014). A unified framework for the comparison of treatments with ordinal responses. Psychometrika 79, 605–620.
McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical Society, Series B 42, 109–142.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics, P. Zarembka (ed), 105–142. New York: Academic Press.
Ryu, E. and Agresti, A. (2008). Modeling and inference for an ordinal effect size measure. Statistics in Medicine 27, 1703–1717.
Thas, O., De Neve, J., Clement, L., and Ottoy, J.-P. (2012). Probabilistic index models. Journal of the Royal Statistical Society, Series B 74, 623–671.
Tian, L. (2008). Confidence intervals for P(Y1 > Y2) with normal outcomes in linear models. Statistics in Medicine 27, 4221–4237.
Volfovsky, A., Airoldi, E. M., and Rubin, D. B. (2015). Causal inference for ordinal outcomes. http://arxiv:1501.01234v1
Yee, T. W. and Wild, C. J. (1996). Vector generalized additive models. Journal of the Royal Statistical Society, Series B 58, 481–493.

Received February 2016. Revised June 2016. Accepted June 2016.

Tags: math stat


Explanation & Answer

Hello, check the assignment

Confidence interval of delta (∆)
In practice, with least squares estimate β̂ in the linear model and residual standard deviation s, it
is possible to estim...
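The interval construction in Section 2 of the attached article can be traced numerically with the values printed in its Table 2. The R sketch below is a rough illustration under stated assumptions, not the authors' code: the line defining `v` is garbled in the preview, so here `v` is recovered from the printed output through the identity SE(β̂) = s√v, and the noncentrality limits are copied from the `conf.limits.nct()` output rather than recomputed. The variable names are this sketch's own.

```r
# Values copied from the lm() and conf.limits.nct() output in Table 2.
se_beta <- 0.32915094   # standard error of the ses coefficient
s       <- 1.02696      # residual standard error (37 df)
lambda  <- c(lower = -3.956887716, upper = 0.06278409435)  # 95% limits for lambda

# Assumption of this sketch: v, the element of (X'X)^{-1} for the group effect,
# is recovered from SE(beta-hat) = s * sqrt(v); the article's own line is lost.
v <- (se_beta / s)^2

# Interval for beta / (sqrt(2) * sigma) is sqrt(v / 2) * (lambda_L, lambda_U);
# applying the standard normal cdf gives the interval for gamma.
ci_gamma <- pnorm(sqrt(v / 2) * lambda)

# Delta = 2 * gamma - 1 is increasing in gamma, so the endpoints map directly.
ci_delta <- 2 * ci_gamma - 1

ci_gamma
ci_delta
```

With `v` recovered this way, the limits for γ should agree with the two `pnorm()` values printed at the end of Table 2 (about 0.185 and 0.506), which gives a check on the reconstruction.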