The Problem of Zero Cells in the Analysis of Contingency Tables
DOI: https://doi.org/10.15678/ZNUEK.2015.0941.0504
Keywords: zero cell, categorical data analysis, contingency tables, log-linear analysis
Abstract
Log-linear analysis is a statistical tool used to analyse the independence of categorical variables in contingency tables. With this method, any number of nominal or ordinal variables can be analysed: interactions can be included in the model, various types of association can be examined, and the analysis yields a formal model equation. Although log-linear analysis is a versatile statistical method, zero cells limit its use. Zero cells in contingency tables are of two types: fixed (structural) zeros and sampling zeros. Fixed zeros occur when it is impossible to observe values for certain combinations of the variables. Sampling zeros arise from sampling variation and from a sample that is small relative to the large number of cells. The paper presents several options for dealing with zero cells in a table. All calculations are carried out in R.
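The abstract does not say which remedies the paper uses, and the paper's own calculations are in R. Purely as an illustration, the Python sketch below (standard library only; all counts are hypothetical, not data from the paper; the function names `log_odds_ratio`, `ipf_fit` and `g_squared` are invented for this example) shows three textbook points about zero cells. First, a sampling zero leaves the log odds ratio, which determines the interaction term of the saturated 2×2 log-linear model, undefined unless a small constant such as 0.5 is added to every cell (a common ad hoc remedy, not necessarily one the paper recommends). Second, structural zeros can be handled by fitting the quasi-independence model with iterative proportional fitting (IPF): those cells start at 0 and, because IPF only rescales, stay at 0. Third, sampling zeros contribute nothing to the likelihood-ratio statistic G² under the convention 0·log 0 = 0.

```python
import math


def log_odds_ratio(table, delta=0.0):
    """Log odds ratio of a 2x2 table after adding `delta` to every cell.

    In the saturated 2x2 log-linear model (effect coding) the interaction
    term is a quarter of this quantity, so a sampling zero with delta=0
    leaves it undefined.
    """
    (a, b), (c, d) = [[x + delta for x in row] for row in table]
    if min(a, b, c, d) <= 0:
        return None  # log(0) or division by zero: not estimable
    return math.log((a * d) / (b * c))


def ipf_fit(table, structural=frozenset(), max_iter=1000, tol=1e-10):
    """Fit the (quasi-)independence model by iterative proportional fitting.

    Cells in `structural` (a set of (row, col) pairs) are fixed zeros: they
    start at 0 and IPF only rescales, so they remain 0 in the fitted table.
    With no structural zeros this reproduces the ordinary independence fit.
    """
    nrow, ncol = len(table), len(table[0])
    fit = [[0.0 if (i, j) in structural else 1.0 for j in range(ncol)]
           for i in range(nrow)]
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(nrow)) for j in range(ncol)]
    for _ in range(max_iter):
        # Step 1: rescale each row to its observed row total.
        for i in range(nrow):
            s = sum(fit[i])
            if s > 0:
                fit[i] = [x * row_tot[i] / s for x in fit[i]]
        # Step 2: rescale each column to its observed column total.
        for j in range(ncol):
            s = sum(fit[i][j] for i in range(nrow))
            if s > 0:
                for i in range(nrow):
                    fit[i][j] *= col_tot[j] / s
        # Columns now match exactly; stop once the rows match as well.
        gap = max(abs(sum(fit[i]) - row_tot[i]) for i in range(nrow))
        if gap < tol:
            break
    return fit


def g_squared(table, fit, structural=frozenset()):
    """Likelihood-ratio statistic G^2 = 2 * sum n * log(n / m).

    Structural zeros are excluded; sampling zeros (n = 0) contribute 0
    under the usual convention 0 * log(0) = 0.
    """
    total = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if (i, j) in structural or n == 0:
                continue
            total += n * math.log(n / fit[i][j])
    return 2.0 * total


if __name__ == "__main__":
    # Hypothetical 2x2 table with one sampling zero.
    small = [[0, 5], [3, 4]]
    print(log_odds_ratio(small))        # None: not estimable
    print(log_odds_ratio(small, 0.5))   # finite after adding 0.5

    # Hypothetical 3x3 table whose diagonal cells are structural zeros.
    table = [[0, 12, 8], [10, 0, 6], [7, 9, 0]]
    diag = {(0, 0), (1, 1), (2, 2)}
    fit = ipf_fit(table, diag)
    df = (3 - 1) * (3 - 1) - len(diag)  # one df lost per structural zero
    print(round(g_squared(table, fit, diag), 3), "on", df, "df")
```

Under the independence model a sampling zero still receives a positive fitted count (row total times column total divided by n) as long as its margins are positive, which is why sampling zeros mainly threaten higher-order and saturated terms. For the 3×3 example, fixing the three diagonal cells at zero leaves the quasi-independence model with (3 − 1)(3 − 1) − 3 = 1 degree of freedom.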
Copyright (c) 2015 Cracow Review of Economics and Management
This work is licensed under a Creative Commons Attribution 4.0 International License.