Date of Award

Summer 1989

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Mathematics and Statistics

Program/Concentration

Computational and Applied Mathematics

Committee Director

Dayanand N. Naik

Committee Member

Ram C. Dahiya

Committee Member

Michael J. Doviak

Committee Member

Edward P. Markowski

Abstract

When data are fitted by a linear regression model, a particular observation or a set of observations may appear aberrant relative to the rest of the data. Such observations may arise in several ways: for example, from incorrect or faulty measurements or from gross errors in either the response or the explanatory variables. Sometimes the model may inadequately describe the systematic structure of the data, or the data may be better analyzed on another scale. When diagnostics indicate the presence of anomalous data, these data are either genuinely unusual, and hence informative, or contaminated, and therefore in need of modification or deletion.

Therefore, it is desirable to develop a technique that can identify unusual observations and determine how they influence the response variate. A large number of statistics are used in the literature to detect outliers and influential observations in linear regression models. Two kinds of comparison studies to determine an optimal statistic are carried out in this dissertation: (i) a study using several data sets analyzed by different authors, and (ii) a detailed simulation study. Various choices of the design matrix of the regression model are considered in order to study the performance of these statistics under multicollinearity and in other situations. Calibration points based on the exact distributions and Bonferroni's inequality are given for each statistic. The results show that, in general, a set of two or three statistics is needed to detect outliers, and a different set of statistics is needed to detect influential observations.
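As a rough illustration of the kind of statistics compared above, the following sketch computes three widely used diagnostics for a one-predictor regression: leverage, the externally studentized residual, and Cook's distance. It is not taken from the dissertation. The function name `simple_regression_diagnostics` and the small data set are hypothetical, chosen only to show how an outlier statistic and an influence statistic are obtained from the same fit. A Bonferroni-type calibration point would compare each |t| with the upper alpha/(2n) quantile of a t distribution on n - p - 1 degrees of freedom; that quantile is not computed here.

```python
from math import sqrt


def simple_regression_diagnostics(x, y):
    """Leverage, externally studentized residual, and Cook's distance
    for the one-predictor model y = b0 + b1 * x + error."""
    n = len(x)
    p = 2  # estimated coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    # Ordinary least squares fit.
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual mean square

    rows = []
    for xi, e in zip(x, resid):
        # Diagonal element of the hat matrix (closed form for one predictor).
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx
        # Residual mean square with this observation deleted.
        s2_del = ((n - p) * s2 - e * e / (1.0 - h)) / (n - p - 1)
        t = e / sqrt(s2_del * (1.0 - h))
        cook = e * e * h / (p * s2 * (1.0 - h) ** 2)
        rows.append({"leverage": h, "t": t, "cook": cook})
    return rows


# Hypothetical data: roughly y = 2x + 1, with the fifth response shifted upward.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.9, 7.2, 8.8, 16.0, 13.1, 14.9, 17.2, 18.8, 21.1]
diagnostics = simple_regression_diagnostics(x, y)

# Index of the observation with the largest absolute studentized residual.
most_outlying = max(range(len(x)), key=lambda i: abs(diagnostics[i]["t"]))
# Index of the observation with the largest Cook's distance.
most_influential = max(range(len(x)), key=lambda i: diagnostics[i]["cook"])
```

In this toy data set the same observation leads on both statistics. In general an outlier need not be influential and an influential point need not be an outlier, which is consistent with the finding above that different sets of statistics are needed for the two tasks.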

Various measures have been proposed that emphasize different aspects of influence on the linear regression model. Many of the existing measures for detecting influential observations in linear regression models have natural extensions to multivariate regression. The measures of influence are generalized to the multivariate regression model and to multivariate analysis of variance models. Several data sets are considered to illustrate the methods. Regression models with autocorrelated errors are also studied, and diagnostic statistics based on intervention analysis are developed for them.

DOI

10.25777/gte7-c039
