A mathematician once told me his opinion about the difference between mathematics and newer fields like statistics and econometrics. He told me that mathematics was old enough to be mature and that various problems in nomenclature had been worked out long ago. For example, different cultures used to use different numbering systems in math research, but now we all use a standard place-value numbering system and one common set of numerals.
Statistics (this mathematician said) is relatively new, and there are still competing terms that refer to the same thing, or different things that are referred to by one term. Here is one of several posts by Andrew Gelman about confusion over what "fixed effects" and "random effects" are (he discusses five totally different definitions). I have talked to several successful academic researchers who don't really understand the precise definitions of fixed and random effects. For the record, I think we should stick to the definitions based on the linear algebra behind them. I outlined some of the algebra here, based on the explanation in Wooldridge's textbook.
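To make that concrete, here is a sketch of the Wooldridge-style distinction (the notation here is my own shorthand, not copied from the post linked above): in a panel model with an unobserved unit-level effect, the random effects estimator assumes that effect is uncorrelated with the regressors, while the fixed effects estimator makes no such assumption and instead eliminates the effect by demeaning.

```latex
% Panel model with unobserved unit effect c_i:
y_{it} = \mathbf{x}_{it}\boldsymbol{\beta} + c_i + u_{it}

% Random effects: assume the unobserved effect is uncorrelated
% with the regressors in every period,
\operatorname{Cov}(\mathbf{x}_{it}, c_i) = 0 .

% Fixed effects: allow c_i to be arbitrarily correlated with
% \mathbf{x}_{it}, and remove it by within-unit demeaning:
y_{it} - \bar{y}_i = (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)\boldsymbol{\beta} + (u_{it} - \bar{u}_i) .
```

On this view, the two terms name two different identifying assumptions about the same unobserved effect, which is why purely verbal definitions tend to blur together.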
Other examples of confusing or inconsistent terminology in statistics and econometrics:
- (very minor) Heteroskedasticity-robust standard errors are also called Huber-White, Eicker-White, or Eicker-Huber-White standard errors.
- (also very minor) People talk about the "Wilcoxon test," and it's not usually clear whether they mean the Wilcoxon signed-rank test or the Wilcoxon rank sum test (the latter itself also known as the Mann-Whitney U test).
- (more serious) The right-hand terms in a regression equation are variously called covariates, predictors, regressors, independent variables, explanatory variables, or even features or other things. Similarly, the left-hand term is variously called the outcome variable, the response variable, the dependent variable, and other things. The error term is sometimes called the error and sometimes the residual. Each of these terms represents a slightly different philosophical idea about what is happening in regression and why.
- (also serious) This paper, which is otherwise quite good, makes what I think is another mistake on page 190: saying that "shrinkage" and "partial pooling" are the same thing. Partial pooling in a multilevel model does indeed shrink coefficients compared to a fixed effects model, but I think that the term shrinkage refers to (or should refer to) specifically the effect of including a regression coefficient in a loss function, as in ridge regression (another thing with many names). Even if that is not exactly what is meant by the term shrinkage, I don't think that shrinkage and partial pooling are synonyms.
- (maybe the most significant) People are rarely, if ever, able to give a precise definition of "endogeneity." They usually mean something about a problem with causal inference or omitted variables. An econometrics-based definition (that I prefer) is simply that the error terms are correlated with the regressors.
(Email me if you think of any others; I know there are plenty that I missed.)
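On the shrinkage point above: one way to pin down the narrower meaning is the ridge regression objective, where an explicit penalty term pulls the coefficient estimates toward zero. This is the standard textbook formulation, not something taken from the paper in question:

```latex
% Ridge regression: least squares plus an L2 penalty on the coefficients
\hat{\boldsymbol{\beta}}_{\text{ridge}}
  = \arg\min_{\boldsymbol{\beta}}
    \sum_{i=1}^{n} \bigl( y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta} \bigr)^2
    + \lambda \sum_{j=1}^{p} \beta_j^2

% A larger \lambda shrinks the estimates further toward zero;
% \lambda = 0 recovers ordinary least squares.
```

Partial pooling in a multilevel model produces a shrinkage-like effect on the group-level estimates, but the mechanism (a hierarchical prior on the groups) is not the same thing as a penalty in the loss function.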
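And the econometrics-based definition of endogeneity from the last item fits in one line (standard notation, my own phrasing):

```latex
% In the linear model
y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i ,

% a regressor x_{ij} is endogenous when it is correlated with the error:
\operatorname{Cov}(x_{ij}, \varepsilon_i) \neq 0 ,

% whereas exogeneity requires E[\varepsilon_i \mid \mathbf{x}_i] = 0.
```

Omitted variables, measurement error, and simultaneity are then just different stories about how that correlation arises.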
It is possible that as time goes on, the confusion about these terms will recede and we will live in a world where one term refers to one idea. However, I doubt it. Each competing term represents a person's training, philosophy, field, and even region. Maybe these competing and confusing terms add richness and variety to the linguistic landscape. Or maybe they don't, but since science is non-centralized and (in some ways) conservative, I predict that terminological confusion/diversity will persist for decades to come.
One last point: even mathematics has plenty of terminology issues, despite its supposed maturity. For example, even today, hundreds of years after calculus was invented, there are several competing notations for differentiation (Leibniz's dy/dx, Lagrange's f'(x), and Newton's dot notation, among others). Maybe maturity lies not only in simple terminology, but also in the ability to tolerate and thrive with confusing terminology.