#### Date of Award

Summer 1989

#### Document Type

Dissertation

#### Degree Name

Doctor of Philosophy (PhD)

#### Department

Mathematics and Statistics

#### Program/Concentration

Computational and Applied Mathematics

#### Committee Director

Dayanand N. Naik

#### Committee Member

Ram C. Dahiya

#### Committee Member

Michael J. Doviak

#### Committee Member

Edward P. Markowski

#### Abstract

Observations arising from a linear regression model may lead one to believe that a particular observation, or a set of observations, is aberrant relative to the rest of the data. Such observations may arise in several ways: for example, from incorrect or faulty measurements or from gross errors in either the response or the explanatory variables. Sometimes the model may inadequately describe the systematic structure of the data, or the data may be better analyzed on another scale. When diagnostics indicate the presence of anomalous data, either these data are indeed unusual and hence helpful, or they are contaminated and therefore in need of modification or deletion.

Therefore, it is desirable to develop a technique that can identify unusual observations and determine how they influence the response variate. A large number of statistics are used in the literature to detect outliers and influential observations in linear regression models. Two kinds of comparison studies to determine an optimal statistic are carried out in this dissertation: (i) a comparison using several data sets studied by different authors, and (ii) a detailed simulation study. Various choices of the design matrix of the regression model are considered in order to study the performance of these statistics under multicollinearity and in other situations. Calibration points based on the exact distributions and Bonferroni's inequality are given for each statistic. The results show that, in general, a set of two or three statistics is needed to detect outliers, and a different set of statistics is needed to detect influential observations.
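As a rough illustration of the kind of statistics discussed above, the following sketch computes three standard diagnostics for a simple linear regression: leverage, externally studentized residuals, and Cook's distance. The abstract does not name the statistics that were compared, so this choice is an assumption. The function name `simple_regression_diagnostics` and the toy data are hypothetical and are not taken from the dissertation. The Bonferroni calibration point is described only in a comment, because the Python standard library has no t-distribution quantile function.

```python
import math


def simple_regression_diagnostics(x, y):
    """Per-observation diagnostics for the model y = b0 + b1*x + error.

    Returns one dict per observation with the leverage, the externally
    studentized residual ("t"), and Cook's distance ("cook").
    """
    n = len(x)
    p = 2  # number of fitted coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Ordinary least squares fit and residuals.
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    s2 = sse / (n - p)  # residual variance estimate from all observations

    rows = []
    for xi, e in zip(x, resid):
        # Leverage: diagonal element of the hat matrix.
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx
        # Residual variance estimate with this observation deleted.
        s2_del = (sse - e * e / (1.0 - h)) / (n - p - 1)
        # Externally studentized residual. Under normal errors it follows
        # a t distribution with n - p - 1 degrees of freedom, so a
        # Bonferroni cutoff for the largest |t| would be the upper
        # alpha / (2n) quantile of that distribution (not computed here).
        t = e / math.sqrt(s2_del * (1.0 - h))
        # Cook's distance: influence of this observation on the fit.
        cook = (e * e * h) / (p * s2 * (1.0 - h) ** 2)
        rows.append({"leverage": h, "t": t, "cook": cook})
    return rows


# Toy data, made up for illustration: roughly y = 2x, with the sixth
# response deliberately shifted upward to act as an outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 18.0, 14.1, 16.0]

rows = simple_regression_diagnostics(x, y)
for i, row in enumerate(rows):
    print(i, round(row["leverage"], 3), round(row["t"], 2), round(row["cook"], 3))
```

In this toy example the shifted observation has the largest studentized residual in absolute value, while the end points of the design have the highest leverage. The measures flag different observations, which is in keeping with the abstract's conclusion that different statistics are needed for outliers and for influential observations.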

Various measures have been proposed that emphasize different aspects of influence on the linear regression model. Many of the existing measures for detecting influential observations in linear regression models have natural extensions to multivariate regression. The measures of influence are generalized to the multivariate regression model and to multivariate analysis of variance models. Several data sets are considered to illustrate the methods. Regression models with autocorrelated errors are also studied in order to develop diagnostic statistics based on intervention analysis.

#### DOI

10.25777/gte7-c039

#### Recommended Citation

Hossain, Anwar M.
"Detection of Outliers and Influential Observations in Regression Models"
(1989). Doctor of Philosophy (PhD), Dissertation, Mathematics and Statistics, Old Dominion University, DOI: 10.25777/gte7-c039

https://digitalcommons.odu.edu/mathstat_etds/80