Principal Component Analysis of Standard and Spherical Covariances from the Population and Random Samples to Real and Simulated Data

Inge Koch; Lyron Winderbaum; Kanta Naito

doi:10.11648/j.ajtas.20221104.13

Peer-Reviewed

Published in American Journal of Theoretical and Applied Statistics (Volume 11, Issue 4)

Received: 12 August 2022 Accepted: 5 September 2022 Published: 26 September 2022

Abstract

Principal component analysis (PCA) is the tool of choice for summarising multivariate and high-dimensional data as features in a lower-dimensional space. PCA works well for Gaussian data, but may perform less well for high-dimensional, skewed or heavy-tailed data, or for data with outliers, as encountered in practice. The growing availability of complex data has made these shortcomings more apparent and has increased the demand for PC approaches that perform well for such data. The purpose of this paper is to critically appraise a class of interpretable PC candidates which can respond to this demand and to compare their performance to that of standard PCA. Among the large variety of nonlinear PCA methods, we concentrate on the subclass that is based on spherical covariance matrices. This subclass includes the spatial sign, spatial rank, and Kendall’s τ covariance matrices. We focus on three key aspects: population concepts and their properties; sample-based estimators; and actual practice based on the analysis of real and simulated data. At the population level we consider relationships between the standard covariance matrix and spherical covariance matrices. For the random sample we consider natural estimators of the population eigenvectors, look at appropriate distributional models, highlight relationships between different estimators, and relate properties of estimators to those of their population analogues. We complement the theory with new analyses of multivariate and high-dimensional real data, as well as simulated data from diverse distributions, which elucidate behaviour patterns of spherical PCA for elliptic and non-elliptic distributions. The latter are not captured in the theoretical framework, and their inclusion therefore offers fresh insight into the performance of spherical PCA. Together, the theory and the new analyses provide evidence that PCA of rank-based covariances clearly outperforms PCA based on the potentially unstable spatial sign covariance matrix. Further, the overall good performance of rank-based PCA and its superior properties for data for which the sample covariance matrix has been known to perform poorly make rank-based PCA not only a desirable addition to standard PCA, but render it a serious competitor for dimension reduction and feature selection while retaining features valued in PCA.
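
As a rough illustration of one member of the spherical class named in the abstract, the sketch below computes a sample spatial sign covariance matrix, that is, the average outer product of the unit vectors (x - c)/||x - c||, together with its leading eigenvector. It is not the authors' code. The function names, the coordinatewise median as a default centre, and the use of power iteration are all assumptions made for this example; the paper's own estimators and choice of location may differ.

```python
import math


def coordinatewise_median(rows):
    """A simple robust centre (assumed here; the spatial median is another common choice)."""
    centre = []
    for col in zip(*rows):
        s = sorted(col)
        m = len(s) // 2
        centre.append(s[m] if len(s) % 2 else 0.5 * (s[m - 1] + s[m]))
    return centre


def spatial_sign_covariance(rows, centre=None):
    """Average of u u^T over observations, where u = (x - centre) / ||x - centre||."""
    if centre is None:
        centre = coordinatewise_median(rows)
    d = len(centre)
    S = [[0.0] * d for _ in range(d)]
    used = 0
    for x in rows:
        diff = [xi - ci for xi, ci in zip(x, centre)]
        norm = math.sqrt(sum(v * v for v in diff))
        if norm == 0.0:
            continue  # the spatial sign of a point at the centre is undefined; skip it
        u = [v / norm for v in diff]
        for i in range(d):
            for j in range(d):
                S[i][j] += u[i] * u[j]
        used += 1
    if used == 0:
        raise ValueError("all observations coincide with the centre")
    return [[v / used for v in row] for row in S]


def leading_eigenvector(S, iters=200):
    """Power iteration from a fixed start; assumes the start is not orthogonal to the target."""
    d = len(S)
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data: ten points on the line y = x plus one gross outlier off that line.
rows = [(float(t), float(t)) for t in range(-5, 6) if t != 0] + [(1000.0, -1000.0)]
S = spatial_sign_covariance(rows, centre=(0.0, 0.0))
print(leading_eigenvector(S))
```

Because every observation is projected onto the unit sphere before averaging, the outlier in this toy example carries the same weight as any other point, and the first direction stays along (1, 1)/√2. The trace of the matrix is always 1, so its eigenvalues describe shape only, not scale.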

Published in	American Journal of Theoretical and Applied Statistics (Volume 11, Issue 4)
DOI	10.11648/j.ajtas.20221104.13
Page(s)	122-139
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2022. Published by Science Publishing Group

Keywords

Multivariate Ranks, Multivariate Spatial Signs, Nonlinear Covariance Matrices, Performance of Nonlinear PCA, Spherical PCA

Cite This Article

APA Style

Inge Koch, Lyron Winderbaum, Kanta Naito. (2022). Principal Component Analysis of Standard and Spherical Covariances from the Population and Random Samples to Real and Simulated Data. American Journal of Theoretical and Applied Statistics, 11(4), 122-139. https://doi.org/10.11648/j.ajtas.20221104.13

ACS Style

Inge Koch; Lyron Winderbaum; Kanta Naito. Principal Component Analysis of Standard and Spherical Covariances from the Population and Random Samples to Real and Simulated Data. Am. J. Theor. Appl. Stat. 2022, 11(4), 122-139. doi: 10.11648/j.ajtas.20221104.13

AMA Style

Inge Koch, Lyron Winderbaum, Kanta Naito. Principal Component Analysis of Standard and Spherical Covariances from the Population and Random Samples to Real and Simulated Data. Am J Theor Appl Stat. 2022;11(4):122-139. doi: 10.11648/j.ajtas.20221104.13

Author Information

Inge Koch
Department of Mathematics and Statistics, The University of Western Australia, Crawley, Australia

Lyron Winderbaum
Department of Mathematical Sciences, The University of South Australia, Adelaide, Australia

Kanta Naito
Department of Mathematics, Chiba University, Chiba, Japan
