Ignoring whether a variable is additive or non-additive can invalidate any statistical treatment. This is part of a research paper on the Self-Weighting Model (SWM) that I’m writing. I presented a sneak preview of it during a recent seminar before Fundación de Investigación, a local research company dedicated to clinical trials.

In general, if x is a non-additive variable, we cannot:

- obtain a mean (arithmetic average) x value from a set of x’s.
- calculate a standard deviation from a set of x’s.
- mean-center a set of x’s by subtracting a mean score from each x.
- standardize a set of x’s by subtracting a mean score from each x and dividing each result by a standard deviation score (i.e., convert each x into a z score).
- compute the L1-norm (Manhattan, taxicab distance) from a vector whose elements are a set of x’s.

[Added on 6-21-2012] In addition, for such a variable we cannot:

- take the difference between any two x values.
- compute a coefficient of variation (standard deviation/mean) from a set of x values.
- compute a mean difference from any two sets of x values.
- compute a pooled standard deviation from any two sets of x values.
- compute a Cohen’s d (mean difference/pooled standard deviation) from any two sets of x values.

In a math/statistics scenario, the following are non-additive: slopes, cosines, sines, tangents, and ratios (e.g., standard deviations, correlation coefficients, beta coefficients, coefficients of variation, Cohen’s d, etc.).
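As a rough illustration of one item in that list, consider standard deviations. The sketch below assumes two independent variables A and B, in which case it is the variances that add, not the standard deviations. The function name and the numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def sd_of_sum(sd_a, sd_b):
    """Standard deviation of A + B, ASSUMING A and B are independent.

    Under that assumption the variances add, so the standard
    deviation of the sum is the square root of the summed variances.
    """
    return math.sqrt(sd_a ** 2 + sd_b ** 2)


# Hypothetical standard deviations, for illustration only.
sd_a, sd_b = 3.0, 4.0

naive = sd_a + sd_b               # treats standard deviations as additive
combined = sd_of_sum(sd_a, sd_b)  # adds variances instead

print("naive sum of SDs:", naive)
print("SD of the sum   :", combined)
```

The two results differ, which is the point: adding the standard deviations directly gives a different number than combining them through their variances.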

In an experimental sciences scenario, the following are non-additive: any intensive property, any ratio of extensive properties (e.g., density = mass/volume), any dissimilar ratio, etc.
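The density case can be sketched in a few lines. The example below uses made-up masses and volumes for two samples and compares the arithmetic mean of the individual densities with the density of the combined material (total mass over total volume). The function names and values are hypothetical.

```python
def combined_density(masses, volumes):
    """Density of the combined samples: total mass / total volume."""
    return sum(masses) / sum(volumes)


def mean_of_densities(masses, volumes):
    """Arithmetic mean of the individual densities (the questionable step)."""
    densities = [m / v for m, v in zip(masses, volumes)]
    return sum(densities) / len(densities)


# Hypothetical samples: mass in grams, volume in millilitres.
masses = [8.0, 2.0]
volumes = [1.0, 4.0]

print("mean of densities:", mean_of_densities(masses, volumes))
print("combined density :", combined_density(masses, volumes))
```

With these numbers the individual densities are 8.0 and 0.5 g/mL, so their arithmetic mean is 4.25 g/mL, while the combined material has a density of 10/5 = 2.0 g/mL. The two agree only in special cases, such as when both samples have the same volume.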

The information presented in the aforementioned seminar is applicable to many dissimilar fields, including web analytics, data mining, information retrieval, and almost any research field that requires numerical analysis of experimental variables.

Unfortunately, from time to time we see published research articles in which the additive or non-additive nature of the variables is ignored and data crunching and analysis are carried out arbitrarily. The result: sloppy approximations, invalid models, and erroneous forecasts.

PS. There might be counterexamples to the notion of classifying properties as intensive or extensive. Very few properties are neither one. There are also some cases wherein instrument readings (not properties) are subtracted in order to compute signal responses, signal-to-noise ratios, derivatives, numerical analyses, etc., but for the most part the above holds in the physical world.