Ali Hadi's Research Activities
Areas of Research Interest
I am interested in solving practical problems in statistics and related fields (e.g., applied probability, computer science, mathematics, and engineering). My publications include four books and more than seventy articles. My methods for outlier detection, the influence measure, and the Potential-Residual Plot have been implemented in several statistical packages (e.g., Data Desk, Stata, and SYSTAT).
My areas of research interest include:
- Probability and Statistical Science:
Robust Statistics and Outlier Detection
Although it is customary to assume that data are homogeneous, in fact
they often contain outliers or subgroups. Scientists and philosophers have
recognized for at least 380 years that real data are not homogeneous and
that the identification of outliers is an important step in the progress
of scientific understanding. Methods that deal with robust estimation and
outlier detection are presented in the following articles:
Robust Regression Methods:
- Billor, N., Chatterjee, S., and Hadi, A. S. (2004), "A Re-Weighted Least Squares Method for Robust Regression Estimation," American Journal of Mathematical and Management Sciences (in press). Contact me for computer programs.
- Castillo, E., Hadi, A. S., Lacruz, B., and Sarabia, J. M. (2001), "Regression
Diagnostics for the Least Absolute Value and the Minimax Methods," Communications
in Statistics: Theory and Methods, 30, 6, 1197-1225.
- Hadi, A. S. and Luceño, A. (1997), "Maximum Trimmed Likelihood Estimators: A Unified Approach, Examples, and Algorithms," Computational Statistics & Data Analysis, 25, 251-272.
- Hadi, A. S. and Simonoff, J. S. (1994), "Improving the Estimation and Outlier Identification Properties of the Least Median of Squares and Minimum Volume Ellipsoid Estimators," Parisankhyan Samikkha, 1, 61-70.
Detection of Outliers in Large Data Sets:
- Billor, N., Hadi, A. S., and Velleman, P. F. (2000), "BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators," Computational Statistics & Data Analysis, 34, 279-298.
Contact me for a copy of the paper and computer programs.
Detection of Outliers in Multivariate Data:
- Hadi, A. S. and Nyquist, H. (1999), "Frechet Distance as a Tool
for Diagnosing Elliptically Symmetric Multivariate Data,"
Linear Algebra and Its Applications, 289, 183-201.
- Hadi, A. S. (1994), "A Modification of a Method for the Detection of Outliers in Multivariate Samples," Journal of the Royal Statistical Society, Series (B), 56, 393-396. This method has been implemented in Stata (hadimvo) and in SYSTAT. S-PLUS code for it is given under Software Available below.
- Gould, W. and Hadi, A. S. (1993), "Identifying Multivariate Outliers," Stata Technical Bulletin, 11, 2-5. (A Stata implementation of the method in the paper above.)
- Hadi, A. S. (1992), "Identifying Multiple Outliers in Multivariate Data," Journal of the Royal Statistical Society, Series (B), 54, 761-771.
Detection of Outliers in Regression Data:
- Billor, N., Chatterjee, S., and Hadi, A. S. (2004), "A Re-Weighted Least Squares Method for Robust Regression Estimation," American Journal of Mathematical and Management Sciences (in press). Contact me for computer programs.
- Hadi, A. S. and Simonoff, J. S. (1997), "A More Robust Outlier Identifier for Regression Data," Bulletin of the International Statistical Institute, 281-282. Contact me or Jeff Simonoff for computer programs.
- Hadi, A. S. and Simonoff, J. S. (1993), "Procedures for the Identification of Multiple Outliers in Linear Models," Journal of the American Statistical Association, 88, 1264-1272. (Code available.)
- Hadi, A. S. (1992), "A New Measure of Overall Potential Influence in Linear Regression," Computational Statistics & Data Analysis, 14, 1-27. The proposed influence measure has been implemented in Data Desk.
Graphical Methods for the Detection of Outliers:
- Billor, N., Chatterjee, S., and Hadi, A. S. (2004), "A Re-Weighted Least Squares Method for Robust Regression Estimation," American Journal of Mathematical and Management Sciences (in press). Contact me for computer programs.
- Dodge, Y. and Hadi, A. S. (1999), "Simple Graphs and Bounds for
the Elements of the Hat Matrix," Journal of Applied Statistics, 26, 817-823.
- Hadi, A. S. (1992), "A New Measure of
Overall Potential Influence in Linear Regression," Computational
Statistics & Data Analysis, 14, 1-27. The proposed Potential-Residual Plot has been implemented
in Data Desk.
- Hadi, A. S. (1993), "Graphical Methods for Linear Models,"
Chapter 23 in Handbook of Statistics: Computational Statistics, (C. R. Rao,
Ed.), Vol. 9, North-Holland Publishing Company, 775-802.
- Hadi, A. S. (1990), "Two Graphical Displays for the Detection
of Potentially Influential Subsets in Regression," Journal of
Applied Statistics, 17, 313-327.
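The influence measure and the Potential-Residual Plot cited above can be illustrated for the simplest case. The Python sketch below assumes the commonly quoted form of the measure, H_i = k/(1 - h_ii) * d_i^2/(1 - d_i^2) + h_ii/(1 - h_ii). Here h_ii is the i-th diagonal element of the hat matrix, d_i = e_i/sqrt(SSE) is the normalized residual, and k is the number of regression coefficients. The first term is the residual function and the second is the potential function, which are the two coordinates of the P-R plot. The sketch covers only a straight-line fit with intercept (k = 2), and the name hadi_influence is hypothetical.

```python
def hadi_influence(x, y):
    """Sketch (assumed form of Hadi's measure) for the fit y = b0 + b1*x.

    Returns one (potential, residual_term, H) triple per observation, where
    potential = h/(1-h) and residual_term = k/(1-h) * d2/(1-d2), with k = 2.
    """
    n, k = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # ordinary residuals
    sse = sum(ei ** 2 for ei in e)
    out = []
    for xi, ei in zip(x, e):
        h = 1 / n + (xi - xbar) ** 2 / sxx    # leverage: diagonal of the hat matrix
        d2 = ei ** 2 / sse                    # squared normalized residual
        potential = h / (1 - h)
        residual = k / (1 - h) * d2 / (1 - d2)
        out.append((potential, residual, potential + residual))
    return out
```

Plotting the potential values against the residual values gives the P-R plot. Points that are far out on either axis are high-leverage or poorly fitted, respectively.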
Parameter and Quantile Estimation
- Castillo, E., Hadi, A. S., Lacruz, B., and Sarabia, J. M. (2001), "Constrained Mixture Distributions," Metrika, 55, 247-269.
- Castillo, E., Hadi, A. S., and Sarabia, J. M. (1998), "A Method
for Estimating Lorenz Curves," Communications in Statistics,
Theory and Methods, 27, 2037-2063.
- Castillo, E. and Hadi, A. S. (1997), "Fitting the Generalized
Pareto Distribution to Data," Journal of the American Statistical
Association, 92, 1609-1620.
- Hadi, A. S. and Luceño, A. (1997), "Maximum Trimmed Likelihood Estimators: A Unified Approach, Examples, and Algorithms," Computational Statistics & Data Analysis, 25, 251-272.
- Castillo, E., Hadi, A. S., and Sarabia, J. M. (1997), "Fitting
Continuous Bivariate Distributions to Data," The Statistician,
46, 355-369.
- Castillo, E. and Hadi, A. S. (1995), "A Method for Estimating
Parameters and Quantiles of Continuous Distributions of Random
Variables," Computational Statistics & Data Analysis, 20, 421-439.
- Castillo, E. and Hadi, A. S. (1994), "Parameter and Quantile
Estimation for the Generalized Extreme-Value Distribution," Environmetrics, 5, 417-432.
- Castillo, E. and Hadi, A. S. (1994), "Parameters and Quantiles
Estimation for Continuously Distributed Random Variables,"
Proceedings of the Statistical Computing Section, American Statistical Association,
284-289.
Fatigue and Lifetime Data Analysis
- Castillo, E., Fernández-Canteli, A., and Hadi, A. S. (1999), "On Fitting a Fatigue Model to Data," International Journal of Fatigue, 21, 97-106.
- Castillo, E. and Hadi, A. S. (1995), "Modeling Life-Time Data with
Application to Fatigue Models," Journal of
the American Statistical Association, 90, 1041-1054.
Extreme Value Distributions
- Castillo, E. and Hadi, A. S. (1997), "Fitting the Generalized
Pareto Distribution to Data," Journal of the American Statistical
Association, 92, 1609-1620.
- Castillo, E. and Hadi, A. S. (1994), "Parameter and Quantile Estimation for the Generalized Extreme-Value Distribution," Environmetrics, 5, 417-432.
- "Point Estimation of the Parameters of Two Families of
Generalized Logistic Distributions," Manuscript in preparation.
Perturbed Eigenvalue Problem
- "Modified Matrix Eigenvalue Problem for Real Symmetric
Matrices With Applications in Statistics," Manuscript under review.
- Hadi, A. S. and Wells, M. T. (1990), "Assessing the Effects of Multiple Rows on the Condition Number of a Matrix," Journal of the American Statistical Association, 85, 786-792.
- Hadi, A. S. (1988), "Diagnosing Collinearity-Influential Observations," Computational Statistics & Data Analysis, 7, 143-159.
- Hadi, A. S. and Velleman, P. F. (1987), "Diagnosing Near Collinearities in Least Squares Regression," discussion of "Collinearity and Least Squares Regression," by G. W. Stewart, Statistical Science, 2, 93-98.
- Hadi, A. S. (1987), "The Influence of a Single Row on the Eigenstructure of a Matrix," Proceedings of the Statistical Computing Section, American Statistical Association, 85-90.
Generalized Inverses
- Hadi, A. S. and Wells, M. T. (1991), "Minimum Distance Method of Estimation and Testing When Statistics Have Limiting Singular Multivariate Normal Distribution," Sankhya, Vol. 53, Series B, Part 2, 257-267.
- Hadi, A. S. and Wells, M. T. (1990), "A Note on Generalized Wald's
Test," Metrika, 37, 309-315.
Statistical Analysis of Employment Discrimination Data
- Hadi, A. S. and Jersky, B. (1990), "How Fair Can Employers Be?" Communications in Statistics: Theory and Methods, A19, 12, 4545-4558.
Probability
- Castillo, E. and Hadi, A. S. (2000), "Some Probability Concepts for Engineers," in Handbook of Industrial Automation, (E. L. Hall and R. L. Shell, Eds.), New York: Marcel Dekker, 1-32.
Neural and Functional Networks
- Castillo, E. and Hadi, A. S. (2004), "Functional Networks," in Encyclopedia of
Statistical Sciences, (N. Balakrishnan, C. Read, S. Kotz, and B. Vidakovic, eds.), (in press).
- Castillo, E., Gutiérrez, J. M., Hadi, A. S., and Lacruz, B. (2001), "Some Applications of Functional Networks in Statistics and Engineering," Technometrics, 43, 10-24.
This paper received the 2001 Technometrics Invited Paper Award and was presented as such at the Joint Annual Meetings of five statistical societies in Atlanta, GA (August 7, 2001).
- Castillo, E., Hadi, A. S., and Lacruz, B. (2001), "Optimal Transformations in Multiple Linear Regression Using Functional Networks," Proceedings of the International Work-Conference on Artificial and Natural Neural Networks (IWANN 2001), Lecture Notes in Computer Science 2084, Part I, 316-324.
- Castillo, E., Cobo, A., Gómez Nesterkín, and Hadi, A. S. (1999), "A General Framework for Functional Networks," Networks, 35, 70-82.
Bayesian and Markov Networks
- Castillo, E. and Hadi, A. S. (2004), "Bayesian Networks," in Encyclopedia of
Statistical Sciences, (N. Balakrishnan, C. Read, S. Kotz, and B. Vidakovic, eds.), (in press).
- Castillo, E. and Hadi, A. S. (2004), "Markov Networks," in Encyclopedia of
Statistical Sciences, (N. Balakrishnan, C. Read, S. Kotz, and B. Vidakovic, eds.), (in press).
Software Available
- To see which statistical computer codes are available, examine the individual articles in the list of publications.
- To download software related to the expert
systems and artificial intelligence areas, visit the AI Research Group Site.
- Below is S-PLUS code for detecting outliers in multivariate data with the method of Hadi (1994).
S-PLUS Code:
hadi94 <- function(X) {
# -----------------------------------------------------------------
# Hadi, Ali S. (1994), "A Modification of a Method for the
# Detection of Outliers in Multivariate Samples," Journal of the
# Royal Statistical Society (B), 56, 393-396.
# X: n x p numeric matrix (the name hadi94 is illustrative).
# -----------------------------------------------------------------
n <- dim(X)[1]
p <- dim(X)[2]
h <- trunc((n + p + 1)/2)   # initial size of the basic subset
id <- 1:n
r <- p                      # Step 1 grows the subset from p + 1 to h
out <- 0
# correction factor for the chi-square cutoff (requires n > 3p + 1)
cf <- (1 + ((p + 1)/(n - p)) + (2/(n - 1 - (3*p))) )^2
# cf <- (1 + ((p + 1)/(n - p)) + (1/(n - p - h)) )^2
alpha <- 0.05
tol <- max(10^-(p+5), 10^-12)   # near-singularity threshold for det(S)
# -----------------------------------------------------------------
# ** Compute Mahalanobis distance
# -----------------------------------------------------------------
C <- apply(X, 2, mean)
S <- var(X)
if (det(S) < tol) stop("the covariance matrix of X is (nearly) singular")
D <- mahalanobis(X, C, S)
cv <- qchisq(1-(alpha/n), p)
mah.out <- (1:n)[D >= cv]   # cases flagged by the classical distances
mah <- sqrt(D)
Xbar <- C
Covariance <- S
# ----------------------------------------------------------------
# ** Step 0
# ----------------------------------------------------------------
# ** Compute Di(Cm, Sm)
C <- apply(X, 2, median)
C <- t(array(C, dim = c(p, n)))   # n x p matrix whose rows all equal the median vector
Y <- X - C
S <- ((n - 1)^-1)*(t(Y) %*% Y)
D <- mahalanobis(X, C[1, ], S)
Z <- sort.list(D)
# ----------------------------------------------------------------
# ** Compute Di(Cv, Sv)
repeat {
Y <- X[Z[1:h], ]
C <- apply(Y, 2, mean)
S <- var(Y)
if (det(S) > tol) {
D <- mahalanobis(X, C, S)
Z <- sort.list(D); break }
else h <- h + 1
}
# ----------------------------------------------------------------
# ** Step 1
# ----------------------------------------------------------------
repeat {
r <- r + 1
if ( h < r) break
Y <- X[Z[1:r],]
C <- apply(Y, 2, mean)
S <- var(Y)
if (det(S) > tol) {
D <- mahalanobis(X, C, S)
Z <- sort.list(D) }
}
# ----------------------------------------------------------------
# ** Step 3
# ----------------------------------------------------------------
# ** Compute Di(Cb, Sb)
repeat {
Y <- X[Z[1:h],]
C <- apply(Y, 2, mean)
S <- var(Y)
if (det(S) > tol) {
D <- mahalanobis(X, C, S)
Z <- sort.list(D)
if (D[Z[h + 1]] >= (cf*qchisq(1-(alpha/n), p))) {
out <- Z[(h + 1) : n]
break }
else { h <- h + 1
if (n <= h) break }
}
else { h <- h + 1
if (n <= h) break }
}
D <- sqrt(D/cf)
dst <- cbind(id, mah, D)
Outliers <- out
Cb <- C;
Sb <- S
Distances <- dst
return(list(Xbar = Xbar, Covariance = Covariance, mah.out = mah.out,
  Outliers = Outliers, Cb = Cb, Sb = Sb, Distances = Distances))
}
# ----------------------------------------------------------------
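For readers without S-PLUS, the function above can also be read as the following Python sketch. It follows the same steps: a median-based start, growth of the basic subset from p + 1 to h, then the corrected chi-square test on the (h + 1)-th smallest distance. It is an illustrative port, not the published program, and it rests on several assumptions. All names (hadi94_sketch, refit, inv_det, and the other helpers) are hypothetical. Chi-square quantiles come from the Wilson-Hilferty approximation rather than an exact qchisq. The classical screen (mah.out) is omitted. Finally, n must exceed 3p + 1 so that the correction factor is defined. Only the Python standard library is used.

```python
import math
from statistics import NormalDist, median


def chi2_quantile(q, df):
    # Wilson-Hilferty approximation (assumption: stands in for the exact qchisq).
    z = NormalDist().inv_cdf(q)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def col_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows, center):
    # Cross-products about `center`, divided by (number of rows - 1).
    p, m = len(center), len(rows)
    return [[sum((r[a] - center[a]) * (r[b] - center[b]) for r in rows) / (m - 1)
             for b in range(p)] for a in range(p)]


def inv_det(S):
    # Gauss-Jordan elimination with partial pivoting: (inverse, determinant),
    # or (None, 0.0) if a zero pivot shows the matrix is singular.
    p = len(S)
    A = [list(row) + [float(i == j) for j in range(p)] for i, row in enumerate(S)]
    det = 1.0
    for k in range(p):
        piv = max(range(k, p), key=lambda i: abs(A[i][k]))
        if A[piv][k] == 0.0:
            return None, 0.0
        if piv != k:
            A[k], A[piv] = A[piv], A[k]
            det = -det
        pivot = A[k][k]
        det *= pivot
        A[k] = [v / pivot for v in A[k]]
        for i in range(p):
            if i != k:
                f = A[i][k]
                A[i] = [v - f * w for v, w in zip(A[i], A[k])]
    return [row[p:] for row in A], det


def mahal(X, center, inv):
    # Squared Mahalanobis distance of every row of X from `center`.
    out = []
    for x in X:
        d = [a - b for a, b in zip(x, center)]
        out.append(sum(d[i] * inv[i][j] * d[j]
                       for i in range(len(d)) for j in range(len(d))))
    return out


def hadi94_sketch(X, alpha=0.05):
    n, p = len(X), len(X[0])
    h = (n + p + 1) // 2                          # initial basic-subset size
    tol = max(10.0 ** -(p + 5), 1e-12)            # same threshold as the S-PLUS code
    cf = (1 + (p + 1) / (n - p) + 2 / (n - 1 - 3 * p)) ** 2   # correction factor
    cutoff = cf * chi2_quantile(1 - alpha / n, p)

    # Step 0: order cases by distance about the coordinatewise median.
    med = [median(col) for col in zip(*X)]
    inv, _ = inv_det(scatter(X, med))
    if inv is None:
        raise ValueError("scatter matrix about the median is singular")
    D = mahal(X, med, inv)
    order = sorted(range(n), key=D.__getitem__)

    def refit(m):
        # Mean/covariance of the m closest cases -> new distances and ordering.
        # Returns False, leaving D and order unchanged, if det <= tol.
        nonlocal D, order
        Y = [X[i] for i in order[:m]]
        C = col_means(Y)
        inv_m, det = inv_det(scatter(Y, C))
        if det <= tol:
            return False
        D = mahal(X, C, inv_m)
        order = sorted(range(n), key=D.__getitem__)
        return True

    while not refit(h) and h < n:                 # enlarge h until nonsingular
        h += 1
    for r in range(p + 1, h + 1):                 # Step 1: grow from p + 1 to h
        refit(r)
    outliers = []
    while h < n:                                  # Step 3: test the (h + 1)-th distance
        if refit(h) and D[order[h]] >= cutoff:
            outliers = order[h:]
            break
        h += 1
    return outliers, [math.sqrt(d / cf) for d in D]
```

hadi94_sketch(X) takes a list of equal-length numeric rows and returns two things. The first is the 0-based indices of the flagged cases, in order of increasing distance. The second is the scaled distances sqrt(D/cf), which are the analogue of the third column of Distances in the S-PLUS output.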