1 Inference about variation.- 1.1 Imperfection and variation.- 1.2 Educational measurement and testing.- 1.3 Statistical context.
- 1.3.1 Statistical objects.- 1.3.2 Estimation.- 1.3.3 Correlation structure and similarity.
- 1.3.4 Notation.- 2 Reliability of essay rating.- 2.1 Introduction.- 2.2 Models.
- 2.3 Estimation.- 2.4 Extensions.- 2.5 Diagnostic procedures.- 2.6 Examples.
- 2.6.1 Advanced Placement tests.- 2.7 Standard errors.- 2.7.1 Simulations.
- 2.8 Summary.- 2.9 Literature review.- 3 Adjusting subjectively rated scores.- 3.1 Introduction.
- 3.2 Estimating severity.- 3.3 Examinee-specific shrinkage.- 3.3.1 Rating in a single session.
- 3.3.2 Shrinking to the rater's mean.- 3.4 General scheme.- 3.4.1 Sensitivity and robustness.- 3.5 More diagnostics.
- 3.6 Examples.- 3.6.1 Advanced Placement tests.- 3.7 Estimating linear combinations of true scores.
- 3.7.1 Optimal linear combinations.- 3.8 Summary.- Appendix. Derivation of MSE for the general adjustment scheme.- 4 Rating several essays.
- 4.1 Introduction.- 4.2 Models.- 4.3 Estimation.- 4.4 Application.
- 4.4.1 Itemwise analyses.- 4.4.2 Simultaneous analysis.- 4.5 Choice of essay topics.
- 4.5.1 Modelling choice.- 4.5.2 Simulations.- 4.6 Summary.- 5 Summarizing item-level properties.
- 5.1 Introduction.- 5.2 Differential item functioning.- 5.3 DIF variance.- 5.4 Estimation.
- 5.5 Examples.- 5.5.1 National Teachers' Examination.- 5.5.2 GRE Verbal test.
- 5.6 Shrinkage estimation of DIF coefficients.- 5.7 Model criticism and diagnostics.- 5.8 Multiple administrations.
- 5.8.1 Estimation.- 5.8.2 Examples.- 5.8.3 Other applications.
- 5.9 Conclusion.- 6 Equating and equivalence of tests.- 6.1 Introduction.- 6.2 Equivalent scores.
- 6.2.1 Equating test forms.- 6.2.2 Half-forms.- 6.2.3 Linear true-score equating.
- 6.3 Estimation.- 6.4 Application.- 6.4.1 Data and analysis.
- 6.4.2 Comparing validity.- 6.4.3 Model criticism.- 6.5 Summary.- 7 Inference from surveys with complex sampling design.
- 7.1 Introduction.- 7.2 Sampling design.- 7.2.1 The realized sampling design.
- 7.2.2 The 'model' sampling design.- 7.2.3 Sampling weights and non-response.- 7.3 Proficiency scores.
- 7.3.1 Imputed values.- 7.4 Jackknife.- 7.5 Model-based method.- 7.5.1 Stratification and clustering.
- 7.5.2 Sampling variance of the ratio estimator.- 7.5.3 Within-cluster variance.
- 7.5.4 Between-cluster variance.- 7.5.5 Multivariate outcomes.- 7.6 Examples.
- 7.6.1 Subpopulation means.- 7.6.2 How much do weights matter?.- 7.7 Estimating proportions.
- 7.7.1 Percentiles.- 7.8 Regression with survey data.- 7.9 Estimating many subpopulation means.- 7.10 Jackknife and model-based estimators.
- 7.11 Summary.- 8 Small-area estimation.- 8.1 Introduction.- 8.2 Shrinkage estimation.- 8.3 Regression with survey data.
- 8.4 Fitting two-level regression.- 8.4.1 Restricted maximum likelihood.- 8.4.2 Sampling weights.
- 8.5 Small-area mean prediction.- 8.6 Selection of covariates.- 8.7 Application.
- 8.7.1 No adjustment.- 8.7.2 Adjustment for covariates.- 8.7.3 Prediction and cross-validation.
- 8.7.4 Refinement.- 8.8 Summary and literature review.- 9 Cut scores for pass/fail decisions.- 9.1 Introduction.
- 9.2 Models.- 9.3 Fitting logistic regression.- 9.3.1 Generalized linear models.
- 9.3.2 Random coefficients.- 9.3.3 Cut score estimation.- 9.4 Examples.
- 9.4.1 PPST Writing test.- 9.4.2 Physical Education.- 9.5 Summary.- 10 Incomplete longitudinal data.
- 10.1 Introduction.- 10.2 Informative missingness.- 10.3 Longitudinal analysis.- 10.4 EM algorithm.
- 10.5 Application.- 10.6 Estimation.- 10.6.1 Variation in growth.
- 10.6.2 Covariate adjustment.- 10.6.3 Missing covariate data.- 10.6.4 Standard errors.
- 10.6.5 Clustering.- 10.7 Summary.- References.