Peer-Reviewed

On the Coefficient of Determination Ratio for Detecting Influential Outliers in Linear Regression Analysis

Received: 1 January 2022    Accepted: 26 January 2022    Published: 9 February 2022
Abstract

The Coefficient of Determination Ratio (CDR) was initially proposed for detecting outliers in a linear regression model fitted to centred data; it declares an observation an outlier if its CDR value deviates from unity. Although the method performs very well and identifies the requisite outliers more precisely than other well-known detection measures, its cut-off rule is a source of subjectivity, and the data structure for which it was designed is restrictive. This study therefore outlines a more rigorous cut-off rule for identifying influential observations, based on an updated CDR that covers the more general case of non-centred data. The cut-off rule involves the ratio of quantile values of the Beta distribution. An automated implementation of the procedure is presented and applied to datasets from the literature and to data simulated under various conditions of sample size and of the number and distribution of explanatory variables. The updated method is more general in application, and more objective and reliable as a detection measure, than the initial proposal. It therefore provides an appreciable improvement in the explanatory power of linear regression models when the identified outliers are deleted from the data.
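
To make the idea of a CDR value that "deviates from unity" concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the formula, so the sketch assumes the ratio takes the leave-one-out form R²(-i) / R², where R²(-i) is the coefficient of determination after deleting observation i. The function names `r_squared` and `cdr_values` and the toy data are hypothetical. The Beta-quantile cut-off rule is deliberately omitted because its parameters are not given here; the sketch only ranks observations by their distance from 1.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit of y on X with an intercept (non-centred data)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def cdr_values(X, y):
    """Assumed form of the ratio: R^2 without observation i over full-data R^2."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    full = r_squared(X, y)
    keep = np.ones(len(y), dtype=bool)
    ratios = np.empty(len(y))
    for i in range(len(y)):
        keep[i] = False  # temporarily delete observation i
        ratios[i] = r_squared(X[keep], y[keep]) / full
        keep[i] = True
    return ratios


# Toy data (hypothetical): a linear trend with one response shifted upward.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=x.size)
y[12] += 8.0  # planted outlier

cdr = cdr_values(x, y)
# Index of the observation whose ratio lies farthest from unity.
most_influential = int(np.argmax(np.abs(cdr - 1.0)))
```

On this toy data, deleting the planted observation raises R², so its ratio exceeds 1, while the ratios of the remaining observations stay close to 1. A formal decision would replace the simple ranking with the Beta-quantile cut-off described in the article.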

Published in American Journal of Theoretical and Applied Statistics (Volume 11, Issue 1)
DOI 10.11648/j.ajtas.20221101.14
Page(s) 27-35
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Outliers, Coefficient of Determination Ratio, Linear Regression, Regression Diagnostics, Influential Observation

Cite This Article
  • APA Style

    Arimiyaw Zakaria, Benony Kwaku Gordor, Bismark Kwao Nkansah. (2022). On the Coefficient of Determination Ratio for Detecting Influential Outliers in Linear Regression Analysis. American Journal of Theoretical and Applied Statistics, 11(1), 27-35. https://doi.org/10.11648/j.ajtas.20221101.14

    ACS Style

    Arimiyaw Zakaria; Benony Kwaku Gordor; Bismark Kwao Nkansah. On the Coefficient of Determination Ratio for Detecting Influential Outliers in Linear Regression Analysis. Am. J. Theor. Appl. Stat. 2022, 11(1), 27-35. doi: 10.11648/j.ajtas.20221101.14

    AMA Style

    Arimiyaw Zakaria, Benony Kwaku Gordor, Bismark Kwao Nkansah. On the Coefficient of Determination Ratio for Detecting Influential Outliers in Linear Regression Analysis. Am J Theor Appl Stat. 2022;11(1):27-35. doi: 10.11648/j.ajtas.20221101.14

  • @article{10.11648/j.ajtas.20221101.14,
      author = {Arimiyaw Zakaria and Benony Kwaku Gordor and Bismark Kwao Nkansah},
      title = {On the Coefficient of Determination Ratio for Detecting Influential Outliers in Linear Regression Analysis},
      journal = {American Journal of Theoretical and Applied Statistics},
      volume = {11},
      number = {1},
      pages = {27-35},
      doi = {10.11648/j.ajtas.20221101.14},
      url = {https://doi.org/10.11648/j.ajtas.20221101.14},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20221101.14},
     year = {2022}
    }
    

  • TY  - JOUR
    T1  - On the Coefficient of Determination Ratio for Detecting Influential Outliers in Linear Regression Analysis
    AU  - Arimiyaw Zakaria
    AU  - Benony Kwaku Gordor
    AU  - Bismark Kwao Nkansah
    Y1  - 2022/02/09
    PY  - 2022
    N1  - https://doi.org/10.11648/j.ajtas.20221101.14
    DO  - 10.11648/j.ajtas.20221101.14
    T2  - American Journal of Theoretical and Applied Statistics
    JF  - American Journal of Theoretical and Applied Statistics
    JO  - American Journal of Theoretical and Applied Statistics
    SP  - 27
    EP  - 35
    PB  - Science Publishing Group
    SN  - 2326-9006
    UR  - https://doi.org/10.11648/j.ajtas.20221101.14
    VL  - 11
    IS  - 1
    ER  - 

Author Information
  • Department of Statistics, University of Cape Coast, Cape Coast, Ghana

  • Department of Computer Science and Information Systems, Ashesi University, Berekuso, Ghana

  • Department of Statistics, University of Cape Coast, Cape Coast, Ghana
