Ali Hadi's Research Activities

Areas of Research Interests
Visit the AI Research Group
Books
Recently Published Articles
Manuscripts Under Review
Software Available

Areas of Research Interests

I am interested in solving practical problems in statistics and related fields (e.g., applied probability, computer science, mathematics, and engineering). My publications include four books and more than seventy articles. The outlier detection methods, the influence measure, and the Potential-Residual Plot that I proposed have been implemented in several statistical packages (e.g., Data Desk, Stata, and SYSTAT). My areas of research interest include:

Probability and Statistical Science:

Computer Science:

Mathematics:

Interdisciplinary:
- Data Mining
- Neural and Functional Networks

Return to Home

Robust Statistics and Outlier Detection

Although it is customary to assume that data are homogeneous, in fact they often contain outliers or subgroups. Scientists and philosophers have recognized for at least 380 years that real data are not homogeneous and that the identification of outliers is an important step in the progress of scientific understanding. Methods that deal with robust estimation and outlier detection are presented in the following articles:

Robust Regression Methods:

Billor, N., Chatterjee, S., and Hadi, A. S. (2004), "A Re-Weighted Least Squares Method for Robust Regression Estimation," American Journal of Mathematical and Management Sciences, (in press). Contact me for computer programs.
Castillo, E., Hadi, A. S., Lacruz, B., and Sarabia, J. M. (2001), "Regression Diagnostics for the Least Absolute Value and the Minimax Methods," Communications in Statistics: Theory and Methods, 30, 6, 1197-1225.
Hadi, A. S. and Luceño, A. (1997), "Maximum Trimmed Likelihood Estimators: A Unified Approach, Examples, and Algorithms," Computational Statistics & Data Analysis, 25, 251-272.
Hadi, A. S. and Simonoff, J. S. (1994), "Improving the Estimation and Outlier Identification Properties of the Least Median of Squares and Minimum Volume Ellipsoid Estimators," Parisankhyan Samikkha, 1, 61-70.

Detection of Outliers in Large Data Sets:

Billor, N., Hadi, A. S., and Velleman, P. F. (2000), "BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators," Computational Statistics & Data Analysis, 34, 279-298. Contact me for a copy of the paper and computer programs.

Detection of Outliers in Multivariate Data:

Hadi, A. S. and Nyquist, H. (1999), "Frechet Distance as a Tool for Diagnosing Elliptically Symmetric Multivariate Data," Linear Algebra and Its Applications, 289, 183-201.
Hadi, A. S. (1994), "A Modification of a Method for the Detection of Outliers in Multivariate Samples," Journal of the Royal Statistical Society, Series (B), 56, 393-396. This method has been implemented in Stata (hadimvo) and in SYSTAT; S-PLUS code is listed under Software Available below.
Gould, W. and Hadi, A. S. (1993), "Identifying Multivariate Outliers," Stata Technical Bulletin, 11, 2-5. (An implementation in Stata of the method in the above paper).
Hadi, A. S. (1992), "Identifying Multiple Outliers in Multivariate Data," Journal of the Royal Statistical Society, Series (B), 54, 761-771.
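The multivariate papers above deal with several outliers at once, where a cluster of outliers can inflate the classical mean and covariance enough to hide itself (masking). The R sketch below (R is a dialect of S) is only an illustration under made-up data: it compares classical Mahalanobis distances with distances computed from the clean rows, which are known here solely because the data were constructed that way. It is not an implementation of any of the listed methods.

```r
# Masking illustration with made-up data (an assumption for this sketch):
# 16 "good" points on a regular 4 x 4 grid plus 4 identical outliers at (8, 8).
g <- c(-1.5, -0.5, 0.5, 1.5)
good <- as.matrix(expand.grid(g, g))
X <- rbind(good, matrix(8, nrow = 4, ncol = 2))
n <- nrow(X)
p <- ncol(X)

# Same Bonferroni-type cutoff as the S-PLUS code under Software Available
cutoff <- qchisq(1 - 0.05 / n, p)

# Classical estimates computed from all n rows, outliers included
d.classical <- mahalanobis(X, colMeans(X), var(X))

# Estimates from the 16 clean rows (known only because we built the data)
clean <- 1:16
d.clean <- mahalanobis(X, colMeans(X[clean, ]), var(X[clean, ]))

which(d.classical >= cutoff)   # integer(0): the outlier cluster masks itself
which(d.clean >= cutoff)       # 17 18 19 20
```

With all rows included, the four outliers reach a squared distance of only about 3.6, far below the cutoff of about 12; measured against the clean rows they are at about 96.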

Detection of Outliers in Regression Data:

Billor, N., Chatterjee, S., and Hadi, A. S., "A Re-Weighted Least Squares Method for Robust Regression Estimation," Manuscript under review. Contact me for computer programs.
Hadi, A. S. and Simonoff, J. S. (1997), "A More Robust Outlier Identifier for Regression Data," Bulletin of the International Statistical Institute, 281-282. Contact me or Jeff Simonoff for computer programs.
Hadi, A. S. and Simonoff, J. S. (1993), "Procedures for the Identification of Multiple Outliers in Linear Models," Journal of the American Statistical Association, 88, 1264-1272. (Code available)
Hadi, A. S. (1992), "A New Measure of Overall Potential Influence in Linear Regression," Computational Statistics & Data Analysis, 14, 1-27. The proposed influence measure has been implemented in Data Desk.

Graphical Methods for the Detection of Outliers:

Billor, N., Chatterjee, S., and Hadi, A. S. (2004), "A Re-Weighted Least Squares Method for Robust Regression Estimation," American Journal of Mathematical and Management Sciences, (in press). Contact me for computer programs.
Dodge, Y. and Hadi, A. S. (1999), "Simple Graphs and Bounds for the Elements of the Hat Matrix," Journal of Applied Statistics, 26, 817-823.
Hadi, A. S. (1992), "A New Measure of Overall Potential Influence in Linear Regression," Computational Statistics & Data Analysis, 14, 1-27. The proposed Potential-Residual Plot has been implemented in Data Desk.
Hadi, A. S. (1993), "Graphical Methods for Linear Models," Chapter 23 in Handbook of Statistics: Computational Statistics, (C. R. Rao, Ed.), Vol. 9, North-Holland Publishing Company, 775-802.
Hadi, A. S. (1990), "Two Graphical Displays for the Detection of Potentially Influential Subsets in Regression," Journal of Applied Statistics, 17, 313-327.
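To make the influence measure and the Potential-Residual (P-R) plot concrete, the R sketch below assumes the form in which the measure is usually stated, H_i = k/(1 - p_ii) * d_i^2/(1 - d_i^2) + p_ii/(1 - p_ii). Here p_ii is the i-th diagonal element of the hat matrix, d_i = e_i/sqrt(SSE) is the normalized residual, and k is the number of regression coefficients. The first term is the residual component and the second is the potential component. The function name hadi.influence and the example data are hypothetical, and this is not the Data Desk implementation.

```r
# Sketch of the potential and residual components of the influence measure,
# ASSUMING the commonly stated form
#   H_i = k / (1 - p_ii) * d_i^2 / (1 - d_i^2) + p_ii / (1 - p_ii).
# The function name and the example data are hypothetical.
hadi.influence <- function(X, y) {
  X1 <- cbind(1, X)                 # design matrix with an intercept column
  k <- ncol(X1)                     # number of regression coefficients
  q <- qr(X1)
  pii <- rowSums(qr.Q(q)^2)         # hat-matrix diagonal (X1 of full rank)
  e <- qr.resid(q, y)               # least squares residuals
  d2 <- e^2 / sum(e^2)              # squared normalized residuals
  potential <- pii / (1 - pii)
  residual <- (k / (1 - pii)) * d2 / (1 - d2)
  list(potential = potential, residual = residual, H = potential + residual)
}

set.seed(1)
x <- c(1:19, 40)                    # observation 20 is remote in x
y <- 2 + 0.5 * x + rnorm(20, sd = 0.5)
y[20] <- 0                          # and far from the line of the others
h <- hadi.influence(x, y)
which.max(h$H)                      # 20
# P-R plot: plot(h$residual, h$potential, xlab = "Residual", ylab = "Potential")
```

Keeping the two components separate shows why an observation is influential: a point high on the potential axis is remote in the predictor space, while a point far out on the residual axis is poorly fitted.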

Return to Home or Research activities

Parameter and Quantile Estimation

Castillo, E., Hadi, A. S., Lacruz, B., and Sarabia, J. M. (2001), "Constrained Mixture Distributions," Metrika, 55, 247-269.
Castillo, E., Hadi, A. S., and Sarabia, J. M. (1998), "A Method for Estimating Lorenz Curves," Communications in Statistics: Theory and Methods, 27, 2037-2063.
Castillo, E. and Hadi, A. S. (1997), "Fitting the Generalized Pareto Distribution to Data," Journal of the American Statistical Association, 92, 1609-1620.
Hadi, A. S. and Luceño, A. (1997), "Maximum Trimmed Likelihood Estimators: A Unified Approach, Examples, and Algorithms," Computational Statistics & Data Analysis, 25, 251-272.
Castillo, E., Hadi, A. S., and Sarabia, J. M. (1997), "Fitting Continuous Bivariate Distributions to Data," The Statistician, 46, 355-369.
Castillo, E. and Hadi, A. S. (1995), "A Method for Estimating Parameters and Quantiles of Continuous Distributions of Random Variables," Computational Statistics & Data Analysis, 20, 421-439.
Castillo, E. and Hadi, A. S. (1994), "Parameter and Quantile Estimation for the Generalized Extreme-Value Distribution," Environmetrics, 5, 417-432.
Castillo, E. and Hadi, A. S. (1994), "Parameters and Quantiles Estimation for Continuously Distributed Random Variables," Proceedings of the Statistical Computing Section, American Statistical Association, 284-289.

Fatigue and Lifetime Data Analysis

Castillo, E., Fernández-Canteli, A., and Hadi, A. S. (1999), "On Fitting a Fatigue Model to Data," International Journal of Fatigue, 21, 97-106.
Castillo, E. and Hadi, A. S. (1995), "Modeling Life-Time Data with Application to Fatigue Models," Journal of the American Statistical Association, 90, 1041-1054.

Extreme Value Distributions

Castillo, E. and Hadi, A. S. (1997), "Fitting the Generalized Pareto Distribution to Data," Journal of the American Statistical Association, 92, 1609-1620.
Castillo, E. and Hadi, A. S. (1994), "Parameter and Quantile Estimation for the Generalized Extreme-Value Distribution," Environmetrics, 5, 417-432.
"Point Estimation of the Parameters of Two Families of Generalized Logistic Distributions," Manuscript in preparation.

Perturbed Eigenvalue Problem

"Modified Matrix Eigenvalue Problem for Real Symmetric Matrices With Applications in Statistics," Manuscript under review.
Hadi, A. S. and Wells, M. T. (1990), "Assessing the Effects of Multiple Rows on the Condition Number of a Matrix," Journal of the American Statistical Association, 85, 786-792.
Hadi, A. S. (1988), "Diagnosing Collinearity-Influential Observations," Computational Statistics & Data Analysis, 7, 143-159.
Hadi, A. S. and Velleman, P. F. (1987), "Diagnosing Near Collinearities in Least Squares Regression," A Discussion of "Collinearity and Least Squares Regression", by G. W. Stewart, Statistical Science, 2, 93-98.
Hadi, A. S. (1987), "The Influence of a Single Row on the Eigenstructure of a Matrix," Proceedings of the Statistical Computing Section, American Statistical Association, 85-90.
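These papers study how a few rows can change the conditioning of a matrix. The R sketch below is only a brute-force illustration of that question on made-up data. It deletes one row at a time, recomputes the 2-norm condition number (largest over smallest singular value), and reports the relative change. The helper names and this summary are assumptions, not the measures proposed in the papers.

```r
# Brute-force row-deletion illustration; helper names and data are hypothetical.
cond.number <- function(X) {
  d <- svd(X)$d            # singular values of X
  max(d) / min(d)          # 2-norm condition number
}

# Relative change in the condition number when each row is deleted in turn
row.deletion.kappa <- function(X) {
  kappa.full <- cond.number(X)
  kappa.del <- sapply(seq_len(nrow(X)),
                      function(i) cond.number(X[-i, , drop = FALSE]))
  (kappa.del - kappa.full) / kappa.full
}

set.seed(2)
x1 <- rnorm(15)
x2 <- x1 + rnorm(15, sd = 0.01)   # x2 is nearly collinear with x1
X <- cbind(x1, x2)
X[15, ] <- c(3, -3)               # one row that hides the near collinearity
delta <- row.deletion.kappa(X)
which.max(delta)                  # 15: deleting row 15 exposes the collinearity
```

A single row can therefore make a nearly collinear design look well conditioned, which is why diagnostics for collinearity-influential observations are needed.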

Generalized Inverses

Hadi, A. S. and Wells, M. T. (1991), "Minimum Distance Method of Estimation and Testing When Statistics Have Limiting Singular Multivariate Normal Distribution," Sankhya, Vol. 53, Series B, Part 2, 257-267.
Hadi, A. S. and Wells, M. T. (1990), "A Note on Generalized Wald's Test," Metrika, 37, 309-315.

Statistical Analysis of Employment Discrimination Data

Hadi, A. S. and Jersky, B. (1990), "How Fair Can Employers be?," Communications in Statistics: Theory and Methods, A19, 12, 4545-558.

Probability

Castillo, E. and Hadi, A. S. (2000), "Some Probability Concepts for Engineers," in Handbook of Industrial Automation, (E. L. Hall and R. L. Shell, Eds.), New York: Marcel Dekker, 1-32.

Neural and Functional Networks

Castillo, E. and Hadi, A. S. (2004), "Functional Networks," in Encyclopedia of Statistical Sciences, (N. Balakrishnan, C. Read, S. Kotz, and B. Vidakovic, eds.), (in press).
Castillo, E., Gutiérrez, J. M., Hadi, A. S., and Lacruz, B. (2001), "Some Applications of Functional Networks in Statistics and Engineering," Technometrics, 43, 10-24.
This paper received the "2001 Technometrics Invited Paper Award" and was presented as such at the Joint Annual Meetings of five statistical societies in Atlanta, GA (August 7, 2001).
Castillo, E., Hadi, A. S., and Lacruz, B. (2001), "Optimal Transformations in Multiple Linear Regression Using Functional Networks," Proceedings of the International Work-Conference on Artificial and Natural Neural Networks, IWANN 2001, in Lecture Notes in Computer Science 2084, Part I, 316-324.
Castillo, E., Cobo, A., Gómez Nesterkín, and Hadi, A. S. (1999), "A General Framework for Functional Networks," Networks, 35, 70-82.

Bayesian and Markov Networks

Castillo, E. and Hadi, A. S. (2004), "Bayesian Networks," in Encyclopedia of Statistical Sciences, (N. Balakrishnan, C. Read, S. Kotz, and B. Vidakovic, eds.), (in press).
Castillo, E. and Hadi, A. S. (2004), "Markov Networks," in Encyclopedia of Statistical Sciences, (N. Balakrishnan, C. Read, S. Kotz, and B. Vidakovic, eds.), (in press).

Software Available

To see which statistical computer programs are available, examine the individual articles in the list of publications.
To download software related to the expert systems and artificial intelligence areas, visit the AI Research Group Site.
Below is S-PLUS code for detecting outliers in multivariate data using the method of Hadi (1994).

S-PLUS Code:

function(X) {
# -----------------------------------------------------------------
#  Hadi, Ali S. (1994), "A Modification of a Method for the
#  Detection of Outliers in Multivariate Samples," Journal of the
#  Royal Statistical Society (B), 56(2), 393-396.
# -----------------------------------------------------------------
  n <- dim(X) [1]
  p <- dim(X) [2]
  h <- trunc((n + p + 1)/2)    # initial size of the basic subset
  id <- 1:n
  r <- p
  out <- 0
  cf <- (1 + ((p + 1)/(n - p)) + (2/(n - 1 - (3*p))) )^2
# cf <- (1 + ((p + 1)/(n - p)) + (1/(n - p - h)) )^2
  alpha <- 0.05
  tol <- max(10^-(p+5), 10^-12)
# -----------------------------------------------------------------
# **  Compute Mahalanobis distance
# -----------------------------------------------------------------
  C <- apply(X, 2, mean)
  S <- var(X)
  if (det(S) < tol) stop("sample covariance matrix is (nearly) singular")
  D <- mahalanobis(X, C, S)
  cv <- qchisq(1-(alpha/n), p)
  mah.out <- (1:n)[D >= cv]    # observations flagged by the classical distances
  mah <- sqrt(D)
  Xbar <- C
  Covariance <- S
# ----------------------------------------------------------------
#  **  Step 0
# ----------------------------------------------------------------
#  **  Compute Di(Cm, Sm)
  C <- apply(X, 2, median)
  C <- t(array(C, dim = c(p, n)))   # n x p matrix; every row is the vector of medians
  Y <- X - C
  S <- ((n - 1)^-1)*(t(Y) %*% Y)
  D <- mahalanobis(X, C[1, ], S)
  Z <- sort.list(D)
# ----------------------------------------------------------------
#  **  Compute Di(Cv, Sv)
  repeat {
    Y <- X[Z[1:h], ]
    C <- apply(Y, 2, mean)
    S <- var(Y)
    if (det(S) > tol) {
       D <- mahalanobis(X, C, S)
       Z <- sort.list(D); break }
    else h <- h + 1
    }
# ----------------------------------------------------------------
#  **  Step 1
# ----------------------------------------------------------------
  repeat {
    r <- r + 1
    if ( h < r) break
    Y <- X[Z[1:r],]
    C <- apply(Y, 2, mean)
    S <- var(Y)
    if (det(S) > tol) {
       D <- mahalanobis(X, C, S)
       Z <- sort.list(D) }
    }
# ----------------------------------------------------------------
#  **  Step 3
# ----------------------------------------------------------------
#  **  Compute Di(Cb, Sb)
  repeat {
    Y <- X[Z[1:h],]
    C <- apply(Y, 2, mean)
    S <- var(Y)
    if (det(S) > tol) {
       D <- mahalanobis(X, C, S)
       Z <- sort.list(D)
       if (D[Z[h + 1]] >= (cf*qchisq(1-(alpha/n), p))) {
            out <- Z[(h + 1) : n]
            break }
       else { h <- h + 1
              if (n <= h) break }
       }
    else { h <- h + 1
          if (n <= h) break }
    }
  D <- sqrt(D/cf)
  dst <- cbind(id, mah, D)
  Outliers <- out
  Cb <- C
  Sb <- S
  Distances <- dst
  return(list(Xbar = Xbar, Covariance = Covariance, mah.out = mah.out,
              Outliers = Outliers, Cb = Cb, Sb = Sb, Distances = Distances))
}
# ----------------------------------------------------------------
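For reference, the stopping rule applied in the final loop of the function above can be written using only quantities that appear in the code. Here `mahalanobis` returns the squared distance D_i, `cf` is the correction factor c_np, and `qchisq(1-(alpha/n), p)` is the (1 - alpha/n) quantile of the chi-square distribution with p degrees of freedom. C_b and S_b are the mean and covariance matrix of the current basic subset of h observations, D_(1) <= ... <= D_(n) are the ordered distances, and h starts at floor((n + p + 1)/2).

```latex
\begin{aligned}
D_i(C, S) &= (x_i - C)^\top S^{-1} (x_i - C), \qquad i = 1, \dots, n, \\
c_{np} &= \left(1 + \frac{p + 1}{n - p} + \frac{2}{n - 1 - 3p}\right)^2, \\
\text{stop if } \; D_{(h+1)}(C_b, S_b) &\ge c_{np}\, \chi^2_{p,\,1-\alpha/n}.
\end{aligned}
```

When the rule holds, the n - h observations with the largest distances are declared outliers. Otherwise h increases by one, and the loop ends without outliers once h reaches n. The distances the function reports are sqrt(D_i / c_np), with alpha = 0.05.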
