125. Normal Distribution of Molecular Velocities.—A particularly perfect application of the normal law of error in more than one dimension is afforded by the movements of the molecules in a homogeneous gas. A general idea of the role played by probabilities in the explanation of these movements may be obtained without entering into the more complicated and controverted parts of the subject, without going beyond the initial very abstract supposition of perfectly elastic equal spheres. For convenience of enunciation we may confine ourselves to two dimensions. Let us imagine, then, an enormous billiard-table with perfectly elastic cushions and a frictionless cloth on which millions of perfectly elastic balls rush hither and thither at random—colliding with each other—a homogeneous chaos, with that sort of uniformity in the midst of diversity which is characteristic of probabilities. Upon this hypothesis, if we fix attention on any n balls taken at random—they need not be, according to some they ought not to be, contiguous—if n is very large, the average properties will be approximately the same as those of the total mixture. In particular the average energy of the n balls may be equated to the average energy of the total number of balls, say T/N, if T is the total energy and N the total number of the balls. Now if we watch any one of the n specimen balls long enough for it to undergo a great number of collisions, we observe that either of its velocity-components, say that in the direction of x, viz. u, receives accessions from an immense number of independent causes in random fashion. We may presume, therefore, that these will be distributed (among the n balls) according to the law of error. The law will not be of the type which was first supposed, where the "spread" continually increases as the number of the elements is increased.²⁰ Nor will it be of the type which was afterwards mentioned,²¹ where the spread diminishes as the number of the elements is increased. The linear function by which the elements are aggregated is here of an intermediate type; such that the mean square of deviation corresponding to the velocity remains constant.
The method of composition might be illustrated by the process of taking r digits at random from mathematical tables, adding the differences between each digit and 4.5 (the mean value of digits), and dividing the sum by √r. Here are some figures obtained by taking at random batches of sixteen digits from the expansion of π, subtracting 16 × 4.5 from the sum of each batch, and dividing the remainder by √16:
14 Cf. above, par. 102.
15 Cf. Galton's enthusiasm, Natural Inheritance, p. 66.
16 A lucid statement of the methods and results of probabilities applied to gunnery is given in the Official Textbook of Gunnery (1902).
17 Venn, Journ. Stat. Soc. (1891), p. 443.
18 Ed. Rev. (1850), xcii. 23.
19 Cf. Galton, Phil. Mag. (1875), xlix. 44.
20 Above, par. 112.
21 Ibid.
+1.25, +0.75, −1, −1, +5.5, −2.75, +0.75, −2,
+1.75, +3.25, +0.25, −2.75, −2.25, −0.5, +4.75, +0.25.
If, instead of sixteen, a million digits went to each batch, the general character of the series would be much the same; the aggregate figures would continue to hover about zero with a standard deviation of √8.25, a probable error of nearly 2. Here for instance are seven aggregates formed by recombining 252 out of the 256 digits above utilized into batches of 36 according to the prescribed rule: viz. subtracting 36 × 4.5 from the sum of each batch of 36 and dividing the remainder by √36:
−0.5, +3.3, +2.6, −0.6, +1.5, −2, +1.
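The batch rule just illustrated lends itself to a numerical check. The sketch below is a hedged illustration in Python: pseudo-random digits stand in for digits of π (an assumption of convenience, since any stream of equally likely digits 0–9 has the same mean 4.5 and variance 8.25), and the aggregates are confirmed to hover about zero with standard deviation near √8.25 ≈ 2.87, i.e. a probable error of nearly 2.

```python
import random

random.seed(1)

def batch_aggregate(digits):
    """Aggregate a batch by the article's rule: subtract
    len(digits) * 4.5 from the sum, then divide by the square
    root of the batch size."""
    n = len(digits)
    return (sum(digits) - n * 4.5) / n ** 0.5

# Random digits stand in for digits of pi: each digit 0-9 equally
# likely, mean 4.5, variance 8.25.
aggregates = [batch_aggregate([random.randrange(10) for _ in range(16)])
              for _ in range(20000)]

mean = sum(aggregates) / len(aggregates)
sd = (sum(a * a for a in aggregates) / len(aggregates) - mean ** 2) ** 0.5

# The aggregates hover about zero; their standard deviation stays near
# sqrt(8.25) ~ 2.87 whatever the batch size, because of the 1/sqrt(r)
# scaling -- the "intermediate type" of aggregation described above.
print(mean, sd)
```

The same scaling with batches of a million digits would give much the same spread, which is the point of the passage.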
The illustration brings into view the circumstance that though the system of molecules may start with a distribution of velocities other than the normal, yet by repeated collisions the normal distribution will be superinduced. If both the velocities u and v are distributed according to the law of error for one dimension, we may presume that the joint values of u and v conform to the normal surface. Or we may reason directly that as the pair of velocities u and v is made up of a great number of elementary pairs (the coordinates in each of which need not, initially at least, be supposed uncorrelated) the law of frequency for concurrent values of u and v must be of the normal form which may be written
z = (1/(2π√(km(1 − r²)))) exp −[x²/k − 2rxy/√(km) + y²/m]/2(1 − r²).
It may be presumed that r, the coefficient of correlation, is zero, for, owing to the symmetry of the influences by which the molecular chaos is brought about, it is not to be supposed that there is any connexion or repugnance between one direction of u, say south to north, and one direction of v, say west to east. For a like reason k must be supposed equal to m. Thus the mean square of velocity = 2k; which, multiplied by half the mass of a sphere, is to be equated to the average energy T/N. The reasoning may be extended with confidence to three dimensions, and with caution to contiguous molecules.
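The presumptions of this paragraph can be imitated numerically. In the minimal sketch below (an illustration, not a collision simulation) each velocity component is built, as the text supposes, from many independent elementary impulses, scaled so that the mean square stays constant at k; the resulting u and v then show no correlation, and the mean of u² + v² stays near k + m = 2k.

```python
import random

random.seed(2)

def component(n_kicks, k):
    """A velocity component as the scaled sum of many independent
    impulses; the 1/sqrt(n_kicks) scaling keeps the mean square equal
    to k however many elements are aggregated."""
    s = sum(random.choice((-1.0, 1.0)) for _ in range(n_kicks))
    return s * (k / n_kicks) ** 0.5

k = 1.0  # common mean square for u and v (the article's k = m)
pairs = [(component(100, k), component(100, k)) for _ in range(10000)]

mean_sq_u = sum(u * u for u, v in pairs) / len(pairs)
mean_uv = sum(u * v for u, v in pairs) / len(pairs)
mean_energy = sum(u * u + v * v for u, v in pairs) / len(pairs)

# Symmetry leaves u and v uncorrelated (r ~ 0), and the mean of
# u^2 + v^2 stays near k + m = 2k, as the text argues.
print(mean_sq_u, mean_uv, mean_energy)
```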
126. Normal Correlation in Biology.—Correlation cannot be ignored in another application of the many-dimensioned law of error, its use in biological inquiries to investigate the relations between different generations. It was found by Galton that the heights and other measurable attributes of children of the same parents range about a mean which is not that of the parental heights, but nearer the average of the general population. The amount of this "regression" is simply proportional to the distance of the "mid-parent's" height from the general average. This is a case of a very general law which governs the relations not only between members of the same family, but also between members of the same organism, and generally between two (or more) coexistent or in any way co-ordinated observations, each belonging to a normal group. Let x and y be the measurements of a pair thus constituted. Then² it may be expected that the conjunction of particular values for x and y will approximately obey the two-dimensioned normal law which has been already exhibited¹ (see par. 114).
127. Regression-lines.—In the expression above given, put l/√(km) = r, and the equation for the frequency of pairs having assigned values of the attributes under measurement becomes

z = (1/(2π√(km(1 − r²)))) exp −[(x − a)²/k − 2r(x − a)(y − b)/√(km) + (y − b)²/m]/2(1 − r²).
This formula is of very general application.³ If two sets of measurements were made on the height, or other measurable feature, of the proverbial "Goodwin Sands" and "Tenterden Steeple," and the first measurement of one set was coupled with the first of the other set, the second with the second, and so on, the pairs of magnitudes thus presented would doubtless vary according to the above-written law, only in that case r would presumably be zero; the expression for z would reduce to the product of the two independent probabilities that particular values of x and y should concur. But slight interdependences between things supposed to be totally unconnected would often be discovered by this law of error in two or more dimensions.⁴ It may be put in a more convenient form by substituting
ξ for (x − a)/√k and η for (y − b)/√m. The equation of the surface then becomes z = (1/(2π√(1 − r²))) exp −[ξ² − 2rξη + η²]/2(1 − r²). If the frequency of observations in the vicinity of a point is represented by the number of dots in a small increment of area, when r = 0 the dots will be distributed uniformly about the origin, the curves of equal probability will be circles. When r is different from zero
1 Above, par. 114, and below, par. 127.
2 Some plurality of independent causes is presumable.
3 Herschel's a priori proposition concerning the law of error in two dimensions (above, par. 99) might still be defended either as generally true, so many phenomena showing no trace of interdependence, or on the principle which justifies our putting ½ for a probability that is unknown (above, par. 6), or 5 for a decimal place that is neglected; correlation being equally likely to be positive or negative. The latter sort of explanation may be offered for the less serious contrast between the a priori and the empirical proof of the law of error in one dimension (below, par. 158).
4 Cf. above, par. 115.

the dots will be distributed so that the majority will be massed in two quadrants: in those for which ξ and η are both positive or both negative when r is positive, in those for which ξ and η have opposite signs when r is negative. In the limiting case, when r = 1 the whole host will be massed along the line η = ξ, every deviation ξ being attended with an equal deviation η. In general, to any deviation ξ′ of one of the variables there corresponds a set or "array" (Pearson) of values of the other variable, for which the frequency is given by substituting ξ′ for ξ in the general equation. The section thus obtained proves to be a normal probability-curve with standard deviation √(1 − r²). The most probable value of η corresponding to the assigned value ξ′ is rξ′. The equation η = rξ, or rather what it becomes when translated back to our original co-ordinates, (y − b)/σ₂ = r(x − a)/σ₁, where σ₁, σ₂ are our √k, √m respectively,⁵ is often called a regression-equation. A verification is to hand in the above-cited statistics which Weldon obtained by casting batches of dice. If the dice were perfect, r (= l/√(km)) would equal ½; and as the dice proved not to be very imperfect, the coefficient is doubtless approximately = ½. Accordingly, we may expect that, if axes x and y are drawn through the point of maximum frequency at the centre of the compartment containing 244 observations, corresponding to any value of x, say 2νi (where i is the side of each square compartment), the most probable value of y should be νi, and corresponding to y = 2νi the most probable value of x should be νi. And in fact these regression-equations are fairly well fulfilled for the integer values of ν (more than which could not be expected from discrete observations): e.g. when x = +4i, the value of y for which the frequency (25) is a maximum is, as it ought to be, +2i; when x = −2i the maximum (119) is at y = −i; when x = −4i the maximum (16) is at y = −2i; when y = +2i the maximum (138) is at x = +i; when y = −2i the maximum (117) is at x = −i; and in the two cases (x = +2i and y = +4i), where the fulfilment is not exact, the failure is not very serious.
128. Analogous statements hold good for the case of three or more dimensions of error.⁶ The normal law of error for any number of variables, x₁, x₂, x₃, ..., may be put in the form

z = (1/((2π)^(n/2)√Δ)) exp −[R₁₁x₁² + R₂₂x₂² + &c. + 2R₁₂x₁x₂ + &c.]/2Δ,

where Δ is the determinant

| 1    r₁₂  r₁₃  . . |
| r₂₁  1    r₂₃  . . |
| r₃₁  r₃₂  1    . . |
each r, e.g. r₂₃ (= r₃₂), is the coefficient of correlation between two of the variables, e.g. x₂, x₃; R₁₁ is the first minor of the determinant formed by omitting the first row and first column; R₂₂ is the first minor formed by omitting the second row and the second column, and so on; R₁₂ (= R₂₁) is the first minor formed by omitting the first column and second row (or vice versa). The principle of correlation plays an important role in natural history. It has replaced the notion that there is a simple proportion between the size of organs by the appropriate conception that there are simple proportions existing between the deviation from the average of one organ and the most probable value for the coexistent deviation of the other organ from its average.⁷ Attributes favoured by "natural" or other selection are found to be correlated with other attributes which are not directly selected. The extent to which the attributes of an individual depend upon those of his ancestors is measured by correlation.⁸ The principle is instrumental to most of the important "mathematical contributions" which Professor Pearson has made to the theory of evolution.⁹ In social inquiries, also, the principle promises a rich harvest. Where numerous fluctuating causes go to produce a result like pauperism or immunity from small-pox, the ideal method of eliminating chance would be to construct "regression-equations" of the following type: "Change % in pauperism [in the decade 1871–1881] in rural districts = −27.07 %, +0.299 (change % in out-relief ratio), +0.271 (change % in proportion of old), +0.064 (change % in population)."¹⁰
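The many-variable form can be checked against the two-variable surface of par. 127 by specializing to n = 2. In the sketch below (which assumes, as is usual, that the "first minors" R are read with cofactor signs, so that R₁₂ = −r), the determinant Δ reduces to 1 − r², R₁₁ = R₂₂ = 1, and the two expressions for z agree at every point.

```python
import math

def bivariate_standardized(xi, eta, r):
    """The two-dimensional normal surface in standardized
    coordinates (par. 127)."""
    q = 1.0 - r * r
    quad = xi * xi - 2 * r * xi * eta + eta * eta
    return math.exp(-quad / (2 * q)) / (2 * math.pi * math.sqrt(q))

def determinant_form(x1, x2, r):
    """The n-variable form specialized to n = 2: D = 1 - r^2,
    R11 = R22 = 1, and R12 = -r (minor taken with cofactor sign)."""
    d = 1.0 - r * r
    R11, R22, R12 = 1.0, 1.0, -r
    quad = R11 * x1 * x1 + R22 * x2 * x2 + 2 * R12 * x1 * x2
    return math.exp(-quad / (2 * d)) / ((2 * math.pi) ** (2 / 2) * math.sqrt(d))

# The two expressions agree at every point, for any admissible r.
vals = [(0.3, -1.2), (1.0, 1.0), (-0.7, 0.4)]
checks = [abs(bivariate_standardized(a, b, 0.5) - determinant_form(a, b, 0.5))
          for a, b in vals]
print(max(checks))
```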
129. Determination of Constants by Inverse Method.—In order to determine the best values of the coefficients involved in the law of error, and to test the worth of the results obtained by using any values, recourse must be had to inverse probability.

130. The simplest problem under this head is the case where the quaesitum is a single real object and the data consist of a large number of observations, x₁, x₂, ... x_n, such that if the number were indefinitely increased, the completed series would form a normal probability-curve with the true point as its centre, and having a given modulus c. It is as if we had observed the position of the dints made by the fragments
5 Cf. note to par. 98, above.
6 Phil. Mag. (1892), p. 200 seq.; (1896), p. 211; Pearson, Trans. Roy. Soc. (1896), 187, p. 302; Burbury, Phil. Mag. (1894), p. 145.
7 Pearson, "On the Reconstruction of Prehistoric Races," Trans. Roy. Soc. (1898), A, p. 174 seq.; Proc. Roy. Soc. (1898), p. 418.
8 Pearson, "The Law of Ancestral Heredity," Trans. Roy. Soc.; Proc. Roy. Soc. (1898).
9 Papers in the Royal Society since 1895.
10 An example instructively discussed by Yule, Journ. Stat. Soc. (1899).
of an exploding shell so far as to know the distance of each mark measured (from an origin) along a right line, say the line of an extended fortification, and it was known that the shell was fired perpendicular to the fortification from a distant ridge parallel to the fortification, and that the shell was of a kind of which the fragments are scattered according to a normal law¹ with a known coefficient of dispersion; the question is at what position on the distant ridge was the enemy's gun probably placed? By received principles the probability, say P, that the given set of observations should have resulted from measuring (or aiming at) an object of which the real position was between x and x + Δx is
P = J exp −[(x − x₁)² + (x − x₂)² + &c.]/c² Δx,

where J is a constant obtained by equating to unity ∫₋∞^(+∞) P dx
(since the given set of observations must have resulted from some position on the axis of x). The value of x from which the given set of observations most probably resulted is obtained by making P a maximum. Putting dP/dx = 0, we have for the maximum (d²P/dx² being negative for this value) the arithmetic mean of the given observations. The accuracy of the determination is measured by a probability-curve with modulus c/√n. Thus in the course of a very long siege, if every case in which the given group of shell-marks x₁, x₂, ... x_n was presented could be investigated, it would be found that the enemy's cannon was fired from the position x′, the (point right opposite to the) arithmetic mean of x₁, x₂, &c., x_n, with a frequency assigned by the equation
z = (√n/(√π c)) exp −n(x − x′)²/c².
The reasoning is applicable without material modification to the case in which the data and the quaesitum are not absolute quantities, but proportions; for instance, given the percentage of white balls in several large batches drawn at random from an immense urn containing black and white balls, to find the percentage of white balls in the urn—the inverse problem associated with the name of Bayes.
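The shell-mark illustration can be rehearsed by simulation. The sketch below is a minimal Monte Carlo check, with invented numbers (gun position, modulus, fragment count are all hypothetical): for each "bombardment" the arithmetic mean of the n marks estimates the gun's position, and over many repetitions the estimates range under a probability-curve of modulus c/√n, as the text states. Note that a curve of modulus c has standard deviation c/√2.

```python
import random

random.seed(4)

true_position = 10.0   # the unknown gun position (hypothetical units)
c = 2.0                # modulus of the scattering law; SD = c / sqrt(2)
n = 25                 # fragment marks observed per bombardment

def estimate():
    """Arithmetic mean of n fragment positions: the most probable
    position by the inverse argument in the text."""
    marks = [random.gauss(true_position, c / 2 ** 0.5) for _ in range(n)]
    return sum(marks) / n

estimates = [estimate() for _ in range(20000)]
m = sum(estimates) / len(estimates)
sd = (sum((e - m) ** 2 for e in estimates) / len(estimates)) ** 0.5
modulus_of_mean = sd * 2 ** 0.5

# The estimates cluster round the true position with modulus c/sqrt(n).
print(m, modulus_of_mean, c / n ** 0.5)
```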
131. Simple as this solution is, it is not the one which has most recommended itself to Laplace. He envisages the quaesitum not so much as that point which is most probably the real one, as that point which may most advantageously be put for the real one. In our illustration it is as if it were required to discover from a number of shot-marks not the point² which in the course of a long siege would be most frequently the position of the cannon which had scattered the observed fragments, but the point which it would be best to treat as that position—to fire at, say, with a view of silencing the enemy's gun—having regard not so much to the frequency with which the direction adopted is right, as to the extent to which it is wrong in the long run. As the measure of the detriment of error, Laplace³ takes "la valeur moyenne de l'erreur à craindre," the mean first power of the errors taken positively on each side of the real point. The mean square of errors is proposed by Gauss as the criterion.⁴ Any mean power indeed, the integral of any function which increases in absolute magnitude with the increase of its variable, taken as the measure of the detriment, will lead to the same conclusion, if the normal law prevails.⁵
132. Yet another speculative difficulty occurs in the simplest, and recurs in the more complicated, inverse problem. In putting P as the probability, deduced from the observations, that the real point for which they stand is x (between x and x + Δx), it is tacitly assumed that prior to observation one value of x is as probable as another. In our illustration it must be assumed that the enemy's gun was as likely to be at one point as another of (a certain tract of) the ridge from which it was fired. If, apart from the evidence of the shell-marks, there was any reason for thinking that the gun was situated at one point rather than another, the formula would require to be modified. This a priori probability is sometimes grounded on our ignorance; according to another view, the procedure is justified by a rough general knowledge that over a tract of x for which P is sensible one value of x occurs about as often as another.⁶
1 If normally in any direction indifferently according to the two- or three-dimensioned law of error, then normally in one dimension when collected and distributed in belts perpendicular to a horizontal right line, as in the example cited below, par. 155.
2 Or small interval (cf. preceding section).
3 "Toute erreur soit positive soit négative doit être considérée comme un désavantage ou une perte réelle à un jeu quelconque," Théorie analytique, art. 20 seq., especially art. 25. As to which it is acutely remarked by Bravais (op. cit. p. 258), "Cette règle simple laisse à désirer une démonstration rigoureuse, car l'analogie du cas actuel avec celui des jeux de hasard est loin d'être complète."
4 Theoria combinationis, pt. i. § 6. Simon Newcomb is conspicuous by walking in the way of Laplace and Gauss in his preference of the most advantageous to the most probable determinations. With Gauss he postulates that "the evil of an error is proportioned to the square of its magnitude" (American Journal of Mathematics, vol. viii. No. 4).
5 As argued by the present writer, Camb. Phil. Trans. (1885), vol. xiv. pt. ii. p. 161. Cf. Glaisher, Mem. Astronom. Soc. xxxix. 108.
6 The view taken by the present writer on the "Philosophy of Chance," in Mind (1880); approved by Professor Pearson, Grammar
133. Subject to similar speculative difficulties, the solution which has been obtained may be extended to the analogous problem in which the quaesitum is not the real value of an observed magnitude, but the mean to which a series of statistics indefinitely prolonged converges.⁷
134. Inverse Method of Least Squares.—Next, let the modulus, still supposed given, not be the same for all the observations, but c₁ for x₁, c₂ for x₂, &c. Then P becomes proportional to
exp −[(x − x₁)²/c₁² + (x − x₂)²/c₂² + &c.].
And the value of x which is both the most probable and the "most advantageous" is (x₁/c₁² + x₂/c₂² + &c.)/(1/c₁² + 1/c₂² + &c.);
each observation being weighted with the inverse mean square of observations made under similar conditions.⁸ This is the rule prescribed by the "method of least squares"; but as the rule in this case has been deduced by genuine inverse probability, the problem does not exemplify what is most characteristic in that method, namely, that a rule deducible from the hypothesis that the errors of observations obey the normal law of error is employed in cases where the normal law is not known, or even is known not, to hold good. For example, let the curve of error for each observation be of the form
z = (1/(√π c)) exp −[x²/c² + 2j(x/c − 2x³/3c³)],

where j is a small fraction, so that z may equally well be equated to (1/(√π c))[1 − 2j(x/c − 2x³/3c³)] exp −x²/c², a law which is actually very prevalent. Then, according to the genuine inverse method, the most probable value of x is given by the quadratic equation d log P/dx = 0, where log P = const. − Σ(x − x_r)²/c_r² − Σ2j[(x − x_r)/c_r − 2(x − x_r)³/3c_r³], Σ denoting summation over all the observations. According to the "method of least squares," the solution is the weighted arithmetic mean of the observations, the weight of any observation being inversely proportional to the corresponding mean square, i.e. c_r²/2 (the terms of the integral which involve j vanishing), which would be the solution if the j's were all zero. We put for the solution of the given case what is known to be the solution of an essentially different case. How can this paradox be justified?
135. Many of the answers which have been given to this question seem to come to this. When the data are unmanageable, it is legitimate to attend to a part thereof, and to determine the most probable (or the "most advantageous") value of the quaesitum, and the degree of its accuracy, from the selected portion of the data as if it formed the whole. This throwing overboard of part of the data in order to utilize the remainder has often to be resorted to in the rough course of applied probabilities. Thus an insurance office only takes account of the age and some other simple attributes of its customers, though a better bargain might be made in particular cases by taking into account all available details. The nature of the method is particularly clear in the case where the given set of observations consists of several batches, the observations in any batch ranging under the same law of frequency with mean x′_r and mean square of error k_r, the function and the constants differing from batch to batch; then if we confine our attention to those parts of the data which are of the type x′_r and k_r—ignoring what else may be given as to the laws of error—we may treat the x′_r's as so many observations, each ranging under the normal law of error with its coefficient of dispersion; and apply the rules proper to the normal law. Those rules applied to the data, considered as a set of derivative observations (each formed by a batch of the original observations averaged), give as the most probable (and also the most advantageous) combination of the observations the arithmetic mean weighted according to the inverse mean square pertaining to each observation, and for the law of the error to which the determination is liable the normal law with standard deviation⁹ √(k/n)—the very rules that are prescribed by the method of least squares.
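The weighting rule of par. 134 can be verified by experiment. The sketch below (hypothetical moduli; the conversion σ_r = c_r/√2 from modulus to standard deviation is assumed throughout) combines three observations of unequal precision both by the plain arithmetic mean and by the inverse-mean-square weighting, and shows that the weighted combination is the more precise, as the inverse argument promises.

```python
import random

random.seed(5)

moduli = [1.0, 2.0, 4.0]   # c_r for three observations of unequal worth

def weighted_mean(obs):
    """Weight each observation by 1/c_r^2, the prescription for the
    'most probable' (and 'most advantageous') combination."""
    w = [1.0 / c ** 2 for c in moduli]
    return sum(wi * xi for wi, xi in zip(w, obs)) / sum(w)

true_value = 0.0
plain, weighted = [], []
for _ in range(20000):
    obs = [random.gauss(true_value, c / 2 ** 0.5) for c in moduli]
    plain.append(sum(obs) / len(obs))
    weighted.append(weighted_mean(obs))

def sd(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

# The weighted combination is strictly more precise than the plain
# mean whenever the moduli differ.
print(sd(weighted), sd(plain))
```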
136. The principle involved might be illustrated by the proposal to make the economy of data a little less rigid: to utilize, not indeed all, but a little more of our materials—not only the mean square of error for each batch, but also the mean cube of error. To begin with the simple case of a single homogeneous batch: suppose that in our example the fragments of the shell are no longer scattered according to the normal law. By the method of least squares it would still be proper to put the arithmetic mean of the given observations for the true point required, and to measure the accuracy of that determination by a probability-curve of which the modulus is √(2k), where k is the mean square of deviation (of fragments from their mean). If it is thought desirable to utilize more of the data, there is available the proposition that the arithmetic mean of a
of Science, 2nd ed. p. 146). See also "A priori Probabilities," Phil. Mag. (Sept. 1884), and Camb. Phil. Trans. (1885), vol. xiv. pt. ii. p. 147 seq.
7 Above, pars. 6, 7.
8 The mean square ∫₋∞^(+∞) (x²/(√π c)) exp −x²/c² dx = c²/2.
9 The standard deviation pertaining to a set of (n/r) composite observations, each derived from the original n observations by averaging a batch thereof numbering r, is √(k/r)/√(n/r) = √(k/n), when the given observations are all of the same weight; mutatis mutandis when the weights differ.
numerous set of observations, say x₁, x₂, ... x_n (taken as a sample from an indefinitely large group obeying one and the same law of frequency), varies from set to set approximately according to the following law (to be established later)

z = (√n/(√π c)) exp −n[x²/c² + 2j(x/c − 2x³/3c³)],

where c²/2 is the mean square of deviation, j the mean cube of deviation, and j/c³, say J, is small. Then, by abstraction analogous to that which has just been attributed to the method of least squares, we may regard the datum as a single observation, the arithmetic mean (of a sample batch of observations) subject to the law of error z = f(x). The most probable value of the quaesitum is therefore given by the equation f′(x − x′) = 0, where x′ is the arithmetic mean of the given observations. From the resulting quadratic equation, putting x = x′ + e, and recollecting that e is small, we have e = Jc. That is the correction due to the utilization of the mean cube of error. The most advantageous solution cannot now be determined,¹ f(x) being unsymmetrical, without assuming a particular form for the function of detriment. This method of least squares plus cubes may easily be extended to the case of several batches.
137. This application of probabilities not to the actual data but to a selected part thereof, this economy of the inverse method, is widely practised in miscellaneous statistics, where the object is to determine whether the discrepancy between two sets of observations is accidental or significant of a real difference.² For instance, let the data be ages at death of individuals of two classes (e.g. temperate or not so, urban or rural, &c.) who have been under observation since the age of, say, 20. Granted that the ages at death conform to Gompertz's law, the determination of the modal age at death, that age at which the proportion of the total observed dying (per unit of time) is a maximum for each class, would most perfectly be effected by the genuine inverse method. That method will also enable us to determine the probability that the two modes should have differed to the observed extent by mere accident.³ According to the abridged method it suffices to proceed as if our data consisted of two observations x′ and y′, the average ages at death of the two classes, each average obeying the normal law of error, with respective moduli c₁ = √{2[(x′ − x₁)² + (x′ − x₂)² + &c.]}/n, c₂ = √{2[(y′ − y₁)² + (y′ − y₂)² + &c.]}/n′, where x₁, x₂, &c., y₁, y₂, &c., are the respective sets of observed ages at death; as follows from the law of error, whatever the law of distribution of the given observations. According to a well-known property of the normal law, the difference between the averages of n and n′ observations respectively will range under a probability-curve with modulus √(c₁² + c₂²), say c. Whence for the probability that a difference as great as the observed one, say e, should have occurred by chance we have 2[1 − θ(τ)], where τ = e/c, and θ(x) is the integral (1/√π)∫₋∞^x exp(−t²) dt, given in many treatises.
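The abridged significance test reduces to a few lines of arithmetic. In the sketch below (the two age lists and both function names are invented for illustration), each sample's average is given its modulus, the moduli are compounded, and the two-sided tail probability is computed with `math.erfc`, which is the chance that so great a gap between the averages would arise by mere accident on the normal hypothesis.

```python
import math

def mean_modulus(xs):
    """Modulus of the *average* of a sample: sqrt(2 * sum of squared
    deviations) / n, i.e. the single-observation modulus over sqrt(n)."""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(2.0 * sum((x - m) ** 2 for x in xs)) / n

def chance_of_difference(xs, ys):
    """Probability that the observed gap between two averages would be
    equalled or exceeded by mere accident, on the normal hypothesis."""
    e = abs(sum(xs) / len(xs) - sum(ys) / len(ys))
    c = math.sqrt(mean_modulus(xs) ** 2 + mean_modulus(ys) ** 2)
    # For a deviation ranging under a probability-curve of modulus c,
    # the two-sided tail beyond e is erfc(e / c).
    return math.erfc(e / c)

# Hypothetical ages at death for two observed classes.
class_a = [62, 71, 68, 74, 65, 70, 69, 73, 66, 72]
class_b = [61, 64, 66, 63, 67, 62, 65, 64, 63, 66]
p = chance_of_difference(class_a, class_b)
print(p)
```

With these made-up figures the gap of nearly five years is far beyond what accident would plausibly produce, so p comes out very small.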
138. Abridged Methods.—This sort of abridgment may be extended to other kinds of average besides the arithmetic, in particular the median (that point which has as many of the given observations above as below it). By simple induction we know that the median of a large sample of observations is a probable value for the true median; how probable is determined as follows from a selection of our data. First suppose that all the observations are of the same weight. If x′ were the true median, the probability that as many as ½n + r of the observations should fall on either side of that point is given by the normal law for which the exponent is −2r²/n.⁴ This probability that the observed median will differ from the true one by a certain number of observations is connected with the probability that they will differ by a certain extent of the abscissa, by the proposition that the number of observations contained between the true and apparent median is equal to the small difference between them multiplied by the density of observations at the median—in the case of normal and generally symmetrical curves, the greatest ordinate. This is the second datum we require to select. In the case of the normal curve it may be calculated from the modulus, itself determined by induction from a selection of data. If the observations are not all of the same worth, weight may be assigned by counting one observation as if it occurred oftener than another. This is the essence of Laplace's Method of Situation.⁵
1 The use of the cubes is also contrasted with that of the squares (only) in this respect: that it is no longer a matter of indifference how many of the original observations we assign to the batch of which the mean constitutes the single (compound) observation.
2 The object of the writer's paper on "Methods of Statistics" in the Jubilee number of the Journ. Stat. Soc. (1885).
3 See, on the use of the inverse method to determine the mode of a group, the present writer's paper on "Probable Errors" in the Journ. Stat. Soc. (Sept. 1908).
4 Above, par. 103.
5 Théorie analytique, 2nd supp. p. 164. Mécanique céleste, bk. iii. art. 40; on which see the note in Bowditch's translation. The method may be extended to other percentiles. See Czuber, Beobachtungsfehler, § 58. Cf. Phil. Mag. (1886), p. 375; and Sheppard,
139. In its simplest form, where all the given observations are of equal weight, this method is of wide applicability. Compared with the genuine inverse method, it is always more convenient, seldom much less accurate, sometimes even more accurate. If the given observations obey the normal law, the precision of the median is less than the precision of the arithmetic mean by only some 25 %, a discrepancy not very serious where only a rough estimate of the worth of an average is required. If the observations do not obey the normal law—especially if the extremities are abnormally divergent—the precision of the median may be greater than that of the arithmetic mean.⁶
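The "some 25 %" can be exhibited by sampling. The minimal sketch below (sample size and trial count are arbitrary choices) draws many normal samples and compares the scatter of the sample median with that of the sample mean; for normal observations the ratio approaches √(π/2) ≈ 1.25.

```python
import random

random.seed(6)

def middle(xs):
    """Sample median for an odd-length sample."""
    return sorted(xs)[len(xs) // 2]

n, trials = 101, 4000
means, medians = [], []
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    means.append(sum(xs) / n)
    medians.append(middle(xs))

def sd(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

ratio = sd(medians) / sd(means)
# For normal samples the median's error exceeds the mean's by the
# factor sqrt(pi/2) ~ 1.25: the "some 25 %" of the text.
print(ratio)
```

For heavy-tailed observations the same experiment, run with a divergent law in place of `random.gauss`, would show the ratio falling below 1, as the closing sentence of the paragraph asserts.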
140. Determination of Frequency-Constants.—Yet another instance of the contrast between genuine and abridged inversion is afforded by the problem to determine the modulus as well as the mean for a set of observations known to obey the normal law; what the first problem⁷ becomes when the coefficient of dispersion is not given. By inverse probability we ought in that case, in addition to the equation dP/dx = 0, to put dP/dc = 0. Whence
c² = 2[(x′ − x₁)² + (x′ − x₂)² + &c. + (x′ − x_n)²]/n, and x′ = (x₁ + x₂ + &c. + x_n)/n. This solution differs from that which is often given in the text-books⁸ in that there, in the expression for c², (n − 1) occurs in the denominator instead of n. The difference is explained by the fact that the authorities referred to determine c, not by genuine inversion, but by ordinary induction, by a condition which certainly would be fulfilled in the long run, but does not express the whole of our data; a condition in this respect like the equation of c to √π(Σe)/n, where e is the difference (taken positively, without regard to its sign) between any observation and the arithmetic mean of all the observations.⁹
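The n versus (n − 1) point is easily exhibited. In the sketch below (sample size and trial count are arbitrary), the mean square of deviation computed with n in the denominator is averaged over many small samples; in the long run it falls short of the true mean square by the factor (n − 1)/n, which is exactly what the text-book denominator compensates for.

```python
import random

random.seed(7)

sigma, n, trials = 1.0, 5, 40000
biased = []
for _ in range(trials):
    xs = [random.gauss(0.0, sigma) for _ in range(n)]
    m = sum(xs) / n
    # mean square of deviation with n in the denominator
    biased.append(sum((x - m) ** 2 for x in xs) / n)

avg = sum(biased) / trials
# Dividing by n understates the true mean square (here 1.0) by the
# factor (n - 1)/n = 0.8; dividing by n - 1 would remove the shortfall.
print(avg, (n - 1) / n * sigma ** 2)
```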
141. Of course the determination of the most probable value is subject to the speculative difficulties proper to a priori probability; which are particularly striking in this case, as it appears equally natural to take as that constant, of which the values are a priori equally probable, k (= c²/2), or even¹⁰ h (= 1/c²), the measure of weight, as in fact Laplace has done;¹¹ yet no two of these assumptions can be exactly true.¹²
142. A more convenient determination is obtained from simple induction, by equating the modulus to some datum of the observed group to which it would be equal if the group were complete—in particular to the distance from the median of some percentile (or point which marks off a certain percentage, e.g. 25, of the given observations), multiplied by a factor corresponding to the percentile obtainable from a familiar table. Mr Sheppard has given an interesting proof¹³ that we cannot by way of percentiles obtain such good¹⁴ results for the frequency-constants as by the use of "the average and average square" [the method prescribed by inverse probability].
143. [Entangled Measurements.] The same philosophical subtleties, with greater mathematical complications, meet us when we pass on to the case of two or more quaesita. The problem under this head which mainly exercised the older writers was to determine a number of unknown quantities, given a larger number, n, of equations involving them.
144. Supposing the true values approximately known, by substituting the approximate values in the given equations and expanding according to Taylor's theorem, there will be obtained for the corrections, say x, y …, n linear equations of the form

a₁x + b₁y + … = f₁, a₂x + b₂y + … = f₂, …,

where each a and b is a known coefficient, and each f is a fallible observation. Suppose that the error to which each f is liable obeys the normal law, and that the modulus pertaining to each observation is the same—which latter condition can be secured by multiplying each equation by a proper factor—then if x′ and y′ are the true values of the quaesita, the frequency with which (a₁x′ + b₁y′ − f₁) assumes different values is given by the equation z = (1/√π c₁) exp −(a₁x′ + b₁y′ − f₁)²/c₁², where c₁ is a constant which,
⁵ Trans. Roy. Soc. (1899), A, 192, p. 135, ante, where the error incident to this kind of determination is ascertained with much precision.
⁶ Cf. Phil. Mag. (1887), xxiv. 269 seq., where the median is prescribed in case of "discordant" (heterogeneous) observations. If the more drastic remedy of rejecting part of the data is resorted to, Sheppard's method of performing that operation may be recommended (Proc. Lond. Math. Soc. vol. 31). He prescribes for cases to which the median may not be appropriate, namely, the determination of other frequency-constants besides the mean of the observations.
⁷ Above, par. 134.
⁸ E.g. Airy, Theory of Errors, art. 60.
⁹ It is a nice point that the expression for c², which has (n − 1) instead of n for denominator, though not the more probable, may yet be the more advantageous (supposing that there were any sensible difference between the two). Cf. Camb. Phil. Trans. (1885), vol. xiv. pt. ii. p. 165; and "Probable Errors," Journ. Stat. Soc. (June 1908).
¹⁰ Above, par. 96, note.
¹¹ Théorie analytique, 2nd supp., ed. 1847, p. 578.
¹² See the matter discussed in Camb. Phil. Trans., loc. cit.
¹³ Trans. Roy. Soc. (1899), A, cxcii. 135.
¹⁴ Good as tested by a comparison of the mean squares of errors in the frequency-constant determined by the compared methods.
if not known beforehand, may be inferred, as in the simpler case, from a set of observations. Similar statements holding for the other equations, the probability that the given set of observations f₁, f₂, &c., should have resulted from a particular system of values for x, y … is P = J exp −{(a₁x + b₁y − f₁)²/c₁² + (a₂x + b₂y − f₂)²/c₂² + &c.}, where J is a constant determined on the same principle as in the analogous simpler cases.¹ The condition that P should be a maximum gives as many linear equations for the determination of x′, y′ … as there are unknown quantities.
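The maximum condition of par. 144 yields the ordinary normal equations of the Method of Least Squares; a minimal sketch for two quaesita (the sample equations are invented for illustration):

```python
def solve_two_quaesita(eqs):
    """Minimize sum (a*x + b*y - f)^2 over fallible equations (a, b, f),
    i.e. solve the two linear normal equations for x and y."""
    Saa = sum(a * a for a, b, f in eqs)
    Sab = sum(a * b for a, b, f in eqs)
    Sbb = sum(b * b for a, b, f in eqs)
    Saf = sum(a * f for a, b, f in eqs)
    Sbf = sum(b * f for a, b, f in eqs)
    det = Saa * Sbb - Sab * Sab
    return (Saf * Sbb - Sbf * Sab) / det, (Sbf * Saa - Saf * Sab) / det

# Three observations of x + y, x - y and 2x + y, roughly consistent with x = 2, y = 1:
eqs = [(1, 1, 3.02), (1, -1, 0.98), (2, 1, 5.01)]
x, y = solve_two_quaesita(eqs)
```

With more equations than unknowns no system of values satisfies all exactly; the solution above is the most probable compromise on the stated hypothesis.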
145. The solution proper to the case where the observations are known to arrange according to the normal law may be extended to numerous observations ranging under any law, on the principles which justify the use of the Method of Least Squares in the case of a single quaesitum.
146. As in that simple case, the principle of economy will now justify the use of the median, e.g. in the case of two quaesita, putting for the true values of x and y that point for which the sum of the perpendiculars let fall from it on each of a set of lines representing the given equations (properly weighted) is a minimum.²
147. [Normal Correlation.] The older writers have expressed the error in the determination of one of the variables without reference to the error in the other. But the error of one variable may be regarded as correlated with that of another; that is, if the system x′, y′ … forms the solution of the given equations, while x′ + ξ, y′ + η … is the real system, the (small) values of ξ, η … which will concur in the long run of systems from which the given set of observations result are normally correlated. From this point of view Bravais, in 1846, was led to several theorems which are applicable to the now more important case of correlation in which ξ and η are given (not in general small) deviations from the means of two or more correlated members (organs or attributes) forming a normal group.
148. To determine the frequency-constants of such a group it is proper to proceed on the analogy of the simple case of one-dimensioned error. In the case of two dimensions, for instance, the probability p₁ that a given pair of observations (x₁, y₁) should have resulted from a normal group of which the means are x′, y′ respectively, the standard deviations σ₁ and σ₂ and the coefficient of correlation r, may be written

ΔxΔyΔσ₁Δσ₂Δr [1/(2πσ₁σ₂√(1 − r²))] exp −E²/2(1 − r²),

where E² = (x′ − x₁)²/σ₁² − 2r(x′ − x₁)(y′ − y₁)/σ₁σ₂ + (y′ − y₁)²/σ₂².
A similar statement holds for each other pair of observations (x₂, y₂), (x₃, y₃), …; with analogous expressions for p₂, p₃, …. Whence, as in the simpler case, we have p₁ × p₂ × &c. × pₙ/J (a constant) for P, the a posteriori probability that the given observations should have resulted from an assigned system of the frequency-constants. The most probable system is determined by making P a maximum, and accordingly equating to zero each of the following expressions:

dP/dx′, dP/dy′, dP/dσ₁, dP/dσ₂, dP/dr.

The values of the arithmetic mean and of the standard deviation for each variable are what have been obtained in the simple case of one dimension. The value of r is Σ(x′ − x₁)(y′ − y₁)/nσ₁σ₂.³ The probable error of the determination is assigned on the assumption that the errors to which it is liable are small.⁴ Such coefficients have already been calculated for a great number of interesting cases. For instance, the coefficient of correlation between the human stature and femur is 0.8, between the right and left femur is 0.96, between the statures of husbands and wives is 0.28.⁵
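The constants of par. 148 can be computed directly; a sketch (the five paired observations are invented, and the n-denominator follows the inversion of the text rather than the textbook n − 1):

```python
import math

def frequency_constants(pairs):
    """Most probable constants of par. 148: means, standard deviations
    (n, not n - 1, in the denominator), and
    r = sum((x' - x_i)(y' - y_i)) / (n * s1 * s2)."""
    n = len(pairs)
    mx = sum(x for x, y in pairs) / n
    my = sum(y for x, y in pairs) / n
    s1 = math.sqrt(sum((x - mx) ** 2 for x, y in pairs) / n)
    s2 = math.sqrt(sum((y - my) ** 2 for x, y in pairs) / n)
    r = sum((x - mx) * (y - my) for x, y in pairs) / (n * s1 * s2)
    return mx, my, s1, s2, r

pairs = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]
mx, my, s1, s2, r = frequency_constants(pairs)
```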
149. This application of inverse probability to determine correlation-coefficients and the error to which the determination is liable has been largely employed by Professor Pearson⁶ and other recent writers. The use of the normal formula to measure the probable—and improbable—errors incident to such determinations is justified by reasoning akin to that which has been employed in the general proof of the law of error.⁷ Professor Pearson has pointed out a circumstance which seems to be of great importance in the theory of evolution: that the errors incident to the determination of different frequency-coefficients are apt to be mutually correlated. Thus if a random selection be made from a certain population, the correlation-coefficient which fits the organs of that set is apt to differ from the coefficient proper to the complete group in the same sense as some other frequency-coefficients.
150. The last remark applies also to the determination of the coefficients, in particular those of correlation, by abridged methods, on principles explained with reference to the simple case; for instance by the formula r = Ση/Σξ, where Σξ is the sum of (some or all) the
¹ Above, par. 130.
² See Phil. Mag. (1888), "On a New Method of Reducing Observations"; where a comparison in respect of convenience and accuracy with the received method is attempted.
³ Corresponding to the k/elm of pars. 14, 127 above.
⁴ Pearson, Trans. Roy. Soc., A, 191, p. 234.
⁵ Pearson, Grammar of Science, 2nd ed. pp. 402, 431.
⁶ Trans. Roy. Soc. (1898), A, vol. 191; Biometrika, ii. 273.
⁷ Above, par. 107. Compare the proof of the "Subsidiary Law of Error," as the law in this connexion may be called, in the paper on "Probable Errors," Journ. Stat. Soc. (June 1908).
positive (or the negative) deviations of the values for one organ or attribute, each measured by the modulus pertaining to that member, and Ση is the sum of the values of the other member which are associated with the constituents of Σξ. This variety of the method is certainly much less troublesome, and is perhaps not much less accurate, than the method prescribed by genuine inversion.
151. A method of rejecting data analogous to the use of percentiles in one dimension is practised when, given the frequency of observations for each increment of area, e.g. each ΔxΔy, we utilize only the frequency for integral areas. Mr Sheppard has given an elegant solution of the problem: to find the correlation between two attributes, given the medians L₁ and M₁ of a normal group for each attribute and the distribution of the total group, as thus:⁸
              Below L₁   Above L₁
  Below M₁       P          R
  Above M₁       R          P
If cos D is put for r, the coefficient of correlation, it is found that D = πR/(P + R). For example, let the group of statistics relating to dice already cited⁹ from Professor Weldon be arranged in four quadrants by a horizontal and a vertical line, each of which separates the total group into two halves: lines of which the equations prove to be respectively y = 6.11 and x = 6.156. For R we have 1360.5, and for P 687.5 roughly. Whence D = π × 0.66; r = cos 0.66π = −½ nearly, as it ought; the negative sign being required by the circumstance that the lower part of Mr Sheppard's diagram shown in fig. 12 corresponds to the upper part of Professor Weldon's diagram shown in par. 115.
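The quadrant calculation just described can be reproduced in a few lines (a sketch; the function name is mine, the figures those of the text):

```python
import math

def median_quadrant_r(P, R):
    """Sheppard's relation of par. 151: r = cos D with D = pi*R/(P + R),
    P and R being the frequencies in the quadrants of the median table."""
    return math.cos(math.pi * R / (P + R))

# The dice figures quoted in the text:
r = median_quadrant_r(687.5, 1360.5)
```

The result is close to −1/2, agreeing (apart from the sign convention noted above) with the correlation proper to two dozens of dice sharing half their constituents.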
152. Necessity rather than convenience is sometimes the motive for resort to percentiles. Professor Pearson has applied the median method to determine the correlation between husbands and wives in respect of the darkness of eye-colour, a character which does not admit of exact graduation: "our numbers merely refer to certain groupings, arranged, it is true, in increasing darkness of colour, but in no way corresponding to equal increases in colour-intensity."¹⁰ From data of this sort, having ascertained the number of husbands with eye-colours above the median tint who marry wives with eye-colour above the median tint, Professor Pearson finds for r the coefficient of correlation +0.1. A general method for determining the frequency-constants when the data are, or are taken to be, of the integral sort has been given by Professor Pearson.¹¹ Attention should also be called to Mr Yule's treatment of the problem by a sort of logical calculus on the lines of Boole and Jevons.¹²
153. [Abnormal Correlation.] In the cases of correlation which have been so far considered, it has been presupposed that the things correlated range according to the normal law of error. But now suppose the law of distribution to be no longer normal: for instance, that the dots on the plane of xy,¹³ representing each a pair of members, are no longer grouped in elliptic (or circular) rings of equal frequency, that the locus of the maximum y deviation, corresponding to an assigned x deviation, is no longer a right line. How is the interdependence of these deviations to be formulated? It is submitted that such data may be treated as if they were normal: by an extension of the Method of Least Squares, in two or more dimensions.¹⁴ Thus when the amount of pauperism together with the amount of outdoor relief is plotted in several unions there is obtained a distribution far from normal. Nevertheless if the average pauperism and average outdoor relief are taken for aggregates—say quintettes or decades—of unions taken at random, it may be expected that these means will conform to the normal law, with coefficients obtained from the original data, according to the rule which is proper to the case of the normal law.¹⁵ By obtaining averages conforming to the normal law, as by the simple application of the method of least squares, we should not indeed have utilized the whole of our data, but we shall put a part of it in a very useful
⁸ Trans. Roy. Soc. (1899), A, 192, p. 141.
⁹ Above, par. 115.
¹⁰ Grammar of Science, p. 432.
¹¹ Trans. Roy. Soc., A, vol. 195. In this connexion reference should also be made to Pearson's theory of "Contingency" in his thirteenth contribution to the "Mathematical Theory of Evolution" (Drapers' Company Research Memoirs).
¹² Trans. Roy. Soc. (1900), A, 194, p. 257; (1901), A, 197, p. 91.
¹³ Above, par. 127.
¹⁴ Above, par. 116.
¹⁵ If from the given set of n observations (each corresponding to a point on the plane xy) there is derived a set of n/s observations, each obtained by averaging a batch numbering s of the original observations, the coefficient of correlation for the derived system is the same as that which pertains to the original system. As to the standard deviation for the new system see note to par. 135.
[Fig. 13: frequency-table of the twofold element.]
shape. Although the regression-equations obtained would not accurately fit the original material, yet they would have a certain correspondence thereto. What sort of correspondence may be illustrated by an example in games of chance, which Professor Weldon kindly supplied. Three half-dozen of dice having been thrown, the number of dice with more than three points in that dozen which is made up of the first and the second half-dozen is taken for y; the number of sixes in the dozen made up of the first and the third half-dozen is taken for x. Thus each twofold observation (x, y) is the sum of six twofold elements, each of which is subject to a law of frequency represented in fig. 13; where the figures outside denote the number of successes of each kind, for the ordinate the number of dice with more than three points (out of a cast of two dice), for the abscissa the number of sixes (out of a similar cast). From this law of frequency σ₁ = √(5/18); σ₂ = 1/√2; r = 1/√20. Whence for the regression-equation which gives the value of the ordinate most probably associated with an assigned value of the abscissa we have y = x × rσ₂/σ₁ = 0.3x; and for the other regression-equation, x = y/6. Accordingly, in Professor Weldon's statistics, which are reproduced in the annexed diagram, when x = 3 the
y\x     0     1     2     3     4     5     6     7     8     9    10    11    12
12      1
11      4     3     3     3     1
10      3    17    15    13    10     4     3     1
 9     12    51    59    61    36    14     5     3
 8     36   135   154   150    64    21     5     2
 7     74   195   260   179   112    35     5     1
 6     90   248   254   170    75    26     3
 5     93   220   230   124    51     8     2
 4     86   162   127    75    19     4     1
 3     37    86    56    17     6     2
 2     14    23    23     4     3
 1      2     4
 0
most probable value of y ought to be 1. And in fact this expectation is verified, x and y being measured along lines drawn through the centre of the compartment which ought to have the maximum of content, representing the concurrence of one dozen with two sixes and another dozen with six dice having each more than three points, the compartment which in fact contains 254 (almost the maximum content). In the absence of observations at x = ±3 or y = ±6, the regression-equations cannot be further verified. At least they have begun to be verified by batches composed of six elements, whereas they are not verifiable at all for the simple elements. The normal formula describes the given statistics as they behave, not when by themselves, but when massed in crowds: the regression-equation does not tell us that if x′ is the magnitude of one member the most probable magnitude of the other member associated therewith is rx′, but that if x′ is the average of several samples of the first member, then rx′ is the most probable average for the specimens of the other member associated with those samples. Mr Yule's proposal to construct regression-equations according to the normal rule "without troubling to investigate the normality of the distribution"² admits of this among other explanations.³ Mr Yule's own view of the subject is well worthy of attention.
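The constants σ₁ = √(5/18), σ₂ = 1/√2, r = 1/√20 and the two regression slopes can be checked by exact arithmetic on the element probabilities (a sketch; the variable names are mine):

```python
from fractions import Fraction

# Per die: P(six) = 1/6, P(more than three points) = 1/2; a six always
# shows more than three points, so the joint probability is 1/6.
p_six, p_high, p_joint = Fraction(1, 6), Fraction(1, 2), Fraction(1, 6)

# x = sixes in one dozen, y = high dice in another dozen, the two dozens
# sharing six dice (the common half-dozen).
var_x = 12 * p_six * (1 - p_six)            # 5/3
var_y = 12 * p_high * (1 - p_high)          # 3
cov_xy = 6 * (p_joint - p_six * p_high)     # 6 * 1/12 = 1/2

slope_y_on_x = cov_xy / var_x               # 3/10, i.e. y = 0.3 x
slope_x_on_y = cov_xy / var_y               # 1/6,  i.e. x = y/6
r_squared = cov_xy ** 2 / (var_x * var_y)   # r = 1/sqrt(20)
```

The product of the two regression slopes is r², as it should be for any correlated pair.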
¹ Cf. above, par. 115.  ² Proc. Roy. Soc., vol. 60, p. 477.  ³ Below, par. 168.
154. [Sheppard's Corrections.] In the determination of the standard-deviation proper to the law of error (and other constants proper to other laws of frequency) it commonly happens that besides the inaccuracy which has been estimated, due to the paucity of the data, there is an inaccuracy due to their discrete character: the circumstance that measurements, e.g. of human heights, are given in comparatively large units, e.g. inches, while the real objects are more perfectly graduated. Mr Sheppard has prescribed a remedy for this imperfection. For the square of the standard deviation let μ₂ be the rough value obtained on the supposition that the observations are massed at intervals of unit length (not spread out continuously, as ideal measurements would be); then the proper value, the mean integral of deviation squared, say (μ₂), is μ₂ − 1/12 h², where h is the size of a unit, e.g. an inch. It is not to be objected to this correction that it becomes nugatory when it is less than the probable error to which the measurement is liable on account of the paucity of observations. For, as the correction is always in one direction, that of subtraction, it tends in the long run to be advantageous even though masked in particular instances by larger fluctuating errors.⁴
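Sheppard's correction may be illustrated on a normal group of known variance, grouped into unit bins (a sketch with invented constants; the exact grouped moment is obtained here from the normal integral rather than from data):

```python
import math

def norm_cdf(z):
    """Cumulative normal probability via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

sigma, h = 2.0, 1.0   # true variance 4; measurements rounded to whole units
# Second moment when each observation is massed at the centre of its unit bin:
mu2_grouped = sum(
    (i ** 2) * (norm_cdf((i + 0.5) / sigma) - norm_cdf((i - 0.5) / sigma))
    for i in range(-40, 41)
)
mu2_corrected = mu2_grouped - h ** 2 / 12.0   # Sheppard's correction
```

The grouped value overshoots the true variance by almost exactly h²/12, which the subtraction removes.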
155. [Pearson's Criterion of Empirical Verification.] Professor Pearson has given a beautiful application of the theory of correlation to test the empirical evidence that a given group conforms to a proposed formula, e.g. the normal law of error.⁵

Supposing the constants of the proposed function to be known—in the case of the normal law the arithmetic mean and modulus—we could determine the position of any percentile, e.g. the median, say a. Now the probability that, if any sample numbering n were taken at random from the complete group, the median of the sample, a′, would lie at such a distance from a that there should be as many as r observations between a and a′ is

(2/√(2πn)) ∫_r^∞ exp(−2t²/n) dt.⁶
If, then, any observed set has an excess which makes the above-written integral very small, the set has probably not been formed by a random selection from the supposed given complete group. To extend this method to the case of two, or generally n, percentiles, forming (n + 1) compartments, it must be observed that the excesses, say e and e′, are not independent but correlated. To measure the probability of obtaining a pair of excesses respectively as large as e and e′, we have now (corresponding to the extremity of the probability-curve in the simple case) the solid content of a certain probability-surface outside the curve of equal probability which passes through the points on the plane xy assigned by e, e′ (and the other data). This double, or in general multiple, integral, say P, is expressed by Professor Pearson with great elegance in terms of the quadratic factor, called by him χ², which forms the exponent of the expression for the probability that a particular system of the values of the correlated e, e′, &c., should concur:
P = √(2/π) ∫_χ^∞ e^(−½χ²) dχ + √(2/π) e^(−½χ²) [χ/1 + χ³/(1·3) + χ⁵/(1·3·5) + … + χ^(n−2)/(1·3·…·(n−2))]

when n is odd; with an expression different in form, but nearly coincident in result, when it is even. The practical rule derived from this general theorem may thus be stated. Find from the given observations the probable values of the coefficients pertaining to the formula which is supposed to represent the observations. Calculate from the coefficients a certain number, say n, of percentiles; thereby dividing the given set into n + 1 sections, any of which, according to calculation, ought to contain say m of the observations, while in fact it contains m′. Put e for m′ − m; then χ² = Σe²/m. Professor Pearson has given in an appended table the values of P corresponding to values of n + 1 up to 20, and values of χ² up to 70. He does not conceal that there is some laxity involved in the circumstance that the coefficients employed are not known exactly, only inferred with probability.⁷
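The odd-n case of the formula is easily evaluated (a sketch using the standard complementary error function; the test values below are mine, not Pearson's):

```python
import math

def pearson_P(chi2, n):
    """P of par. 155 for odd n: the tail integral plus the finite series
    chi/1 + chi^3/(1*3) + ... + chi^(n-2)/(1*3*...*(n-2))."""
    chi = math.sqrt(chi2)
    tail = math.erfc(chi / math.sqrt(2.0))   # = sqrt(2/pi) * integral from chi
    series, term = 0.0, chi
    for j in range(1, n, 2):                 # exponents 1, 3, ..., n - 2
        series += term
        term *= chi2 / (j + 2)
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-chi2 / 2.0) * series
```

For n = 3 and χ² = 7.815, or n = 5 and χ² = 11.07, this gives P very nearly 0.05, the familiar five-per-cent level.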
156. Here is one of Professor Pearson's illustrations. The table on next page gives the distribution of 1000 shots fired at a line in a target, the hits being arranged in belts drawn on the target parallel to the line. The "normal distribution" is obtained from a normal curve, of which the coefficients are determined from the observations. From the value of χ², viz. 45.8, and of (n + 1), viz. 11, we deduce, with sufficient accuracy from Professor Pearson's table, or more exactly from the formula on which the table is based, that P = .000,001,5 … "In other words, if shots are distributed on a target according to the normal law, then such a distribution as that cited could only be expected to occur on an average some 15 or 16 times in 10,000,000 times."
157. [The Criterion Criticized.] "Such a distribution" in this argument must be interpreted as a distribution for which it is claimed that the observations are all independent of each other. Suppose that there were only 500 independent observations, the remainder being merely duplicates of these 500. Then in the above
⁴ Just as the removal of a tax tends to be in the long run beneficial to the consumer, though the benefit on any particular occasion may be masked by fluctuations of price due to other causes.
⁵ Phil. Mag. (July 1900).
⁶ As shown above, par. 103.
⁷ Loc. cit. p. 166.
table the columns for the normal distribution and for the discrepancy e should each be halved; and accordingly the column for e²/m should be halved. Thus, Σe²/m being reduced to 22.9, P as found from Professor Pearson's table is between .01 and .02. That is, such a distribution might be expected to occur on an average some once or twice in a hundred times. If actual duplication of this sort is not common in statistics, yet in all such applications of the
Belt.   Observed Frequency.   Normal Distribution.     e.      e²/m.
  1            1                      1                 0       0
  2            4                      6                −2       0.667
  3           10                     27               −17      10.704
  4           89                     67               +22       7.224
  5          190                    162               +28       4.839
  6          212                    242               −30       3.719
  7          204                    240               −36       5.400
  8          193                    157               +36       8.255
  9           79                     70                +9       1.157
 10           16                     26               −10       3.846
 11            2                      2                 0       0
            1000                   1000                —       45.811
Pearsonian criterion—and in other calculations involving the number of observations, in particular the determination of probable errors—a good margin is to be left for the possibility that the n observations are not perfectly independent: e.g. the accidents of wind or nerve which affected one shot may have affected other shots immediately before or after.
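The χ² of par. 156, and the effect of the halving argument of par. 157, can be checked directly from the tabulated frequencies (a sketch; the lists transcribe the table):

```python
observed = [1, 4, 10, 89, 190, 212, 204, 193, 79, 16, 2]
expected = [1, 6, 27, 67, 162, 242, 240, 157, 70, 26, 2]   # normal distribution

# chi^2 = sum e^2/m with e = m' - m:
chi2 = sum((o - m) ** 2 / m for o, m in zip(observed, expected))
# Halving every compartment, as in the duplication argument:
chi2_halved = sum(((o - m) / 2) ** 2 / (m / 2) for o, m in zip(observed, expected))
```

Halving each compartment simply halves χ², reducing 45.8 to 22.9 as stated.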
158. (2) The Generalized Law of Error.—That the normal law of error should not be exactly fulfilled is not disconcerting to those who ground the law upon the plurality of independent causes. On that view the normal law would only be exact when the number of elements from which it is generated is very great. In general, when that number is large, but not indefinitely great,¹ there is required a correction owing to one or other of the following imperfections: that the elements do not fluctuate according to the normal law of frequency; that their fluctuations are not independent of each other; that the function whereby they are aggregated is not linear. The correction is formed by a series of terms descending in the order of magnitude.²
159. [Second and Third Approximations.] The first term of this series may be written

−2(k₁/c³)[x/c − 2x³/3c³];

where c²/2 is the mean square of deviation for the compound and also the sum of the mean squares of deviation for the component elements, k₁ is the mean cube of deviations for the compound and the sum of the mean cubes for the components, and the elements are supposed to be such and so numerous that k₁/c³ is of the order 1/√n. This second approximation, first given by Poisson, was rediscovered by De Forest.³ The present writer has obtained it⁴ by a variety of methods. By a further extension of these methods a third and further approximations may be found. The corrected normal law is then of the form⁵
z = (1/√π c)(exp −x²/c²) {1 − 2k(x/c − 2x³/3c³) + k₂(½ − 2x²/c² + 2x⁴/3c⁴) + k²(−5/3 + 10x²/c² − 20x⁴/3c⁴ + 8x⁶/9c⁶)},
where k = k₁/c³ and, in the formula, k₂ stands for k₂/c⁴; k₁ and c are defined as above, and k₂ is the sum of the respective differences for each element between its mean fourth power of error and thrice the square of its mean second power of error,⁶ and also the corresponding difference for the compound. The formula may be verified by the case of the binomial, considered as a simple case of the law of great numbers. Here
c² = 2npq, k₁ = npq(q − p), k₂ = npq(1 − 6pq).
¹ It is frequent in the statistics of wages.
² See on this subject, in addition to the paper on the "Law of Error" already cited (Camb. Phil. Trans., 1905), another paper by the present writer, on "The Generalized Law of Error," in the Journ. Stat. Soc. (September 1906).
³ The Analyst (Iowa), vol. ix.
⁴ Phil. Mag. (Feb. 1896) and Camb. Phil. Trans. (1905).
⁵ The part of the third approximation affected with k₂ may be found by proceeding to another step in the method described (Phil. Mag., 1896, p. 96). The remaining part of the third approximation is found by the same method (or the variant on p. 97) from the partial differential equation dz/dk₂ = (1/24)(d⁴z/dx⁴), where k₂ is the difference between the actual mean fourth power of deviation and what it would be if the normal law held good. Further approximations may be obtained on the same principle.
⁶ μ₄ − 3μ₂², in the notation which Professor Pearson has made familiar.
⁷ Cf. Pearson, Trans. Roy. Soc. (1895), A, clxxxvi. 347.
These values being substituted for the coefficients in the general formula, there results an expression which may be obtained directly by continuing⁸ to expand the expression for a term of the binomial.
In virtue of the second approximation a set of observations is not to be excluded from affinity to the normal curve because, like the curve of barometric heights,⁹ it is slightly asymmetrical. In virtue of the third approximation it is not excluded because, like the group of shot-marks above examined, it is, though almost perfectly symmetrical, in other respects apparently somewhat abnormal.
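The binomial verification of par. 159 can be tried numerically. The sketch below (parameters invented for illustration) writes the first correcting term in the equivalent standardized form (k₁/6σ³)(t³ − 3t), and checks that the second approximation fits a skew binomial better, in total absolute error, than the bare normal curve:

```python
import math

n, p = 50, 0.2            # an illustrative skew binomial
q = 1.0 - p
mean = n * p
sigma = math.sqrt(n * p * q)
k1 = n * p * q * (q - p)  # mean cube of deviations

def binom_pmf(x):
    return math.comb(n, x) * p ** x * q ** (n - x)

def normal_term(x):
    t = (x - mean) / sigma
    return math.exp(-t * t / 2.0) / (sigma * math.sqrt(2.0 * math.pi))

def second_approx(x):
    t = (x - mean) / sigma
    # first correcting term, equivalent to -2(k1/c^3)(x/c - 2x^3/3c^3):
    return normal_term(x) * (1.0 + k1 / (6.0 * sigma ** 3) * (t ** 3 - 3.0 * t))

err_normal = sum(abs(normal_term(x) - binom_pmf(x)) for x in range(n + 1))
err_second = sum(abs(second_approx(x) - binom_pmf(x)) for x in range(n + 1))
```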
160. [Further Approximations.] If the third approximation is not satisfactory there is still available a fourth, or a still higher degree of approximation.¹⁰ The general expression for y which (multiplied by Δx) represents the probability that an error will occur at a particular point (within a particular small interval) may be written

y = [1 − (k₁/3!)(d/dx)³ + (k₂/4!)(d/dx)⁴ − … + (−1)ᵗ(kₜ/(t + 2)!)(d/dx)^(t+2) + …] y₀,

where y₀ is (the normal error-function) (1/√(2πk)) e^(−x²/2k), k is the mean square of deviation; k₁, k₂, &c., are coefficients formed from the mean powers of deviation according to the rule that kₜ is the difference between the (t + 2)th mean power as it actually is and what it would be if the t-th approximation were perfectly correct. Thus k₁ is the difference between the actual mean third power and what the third power would be if the first approximation, the normal law, were perfectly correct; that is, the difference between the actual mean third power, often written μ₃, and zero, that is μ₃. Similarly k₂ is the difference between the actual mean fourth power of deviation, say μ₄, and what that mean power would be if the second approximation were perfectly correct, viz. 3k². Thus k₂ = μ₄ − 3k². The series k₁, k₃, k₅, &c., and k, k₂, k₄, &c., form each a succession of terms descending in the order of magnitude, when each kₜ has been divided by the corresponding power, i.e. the power (t + 2), of the parameter or modulus c (= √(2k)), which division is secured by the successive differentiations of y₀ with which each k is associated, e.g. k₂ with (d/dx)⁴ y₀. Moreover, the first term of the odd series of k's, when divided by the proper power of the parameter, viz. c³, is small in comparison with the first term of the even series, viz. k, properly referred, i.e. divided by c² (= 2k).
161. [Character of the Approximation.] Whatever the degree of approximation employed, it is to be remembered that the law in general is only applicable to a certain range of the compound magnitude here represented by the abscissa x.¹¹ The curve of error, even when generalized as here proposed, coincides only with the central portion—the body, as distinguished from the extremities—of the actual locus, a greater or less proportion of the whole.

162. [More Dimensions.] The law thus generalized may be extended, with similar reservations, to two or more dimensions. For example, the second approximation in two dimensions may be written

z = z₀ {1 − (1/3!)[3,0k (d/dx)³ + 3·2,1k (d/dx)²(d/dy) + 3·1,2k (d/dx)(d/dy)² + 0,3k (d/dy)³]},

where z₀ is (the normal error-function)

[1/(π√(1 − r²))] exp −(x² − 2rxy + y²)/(1 − r²);

x and y are (as before) coordinates measured from the centre of gravity of the group as origin, each referred to (divided by) its proper modulus; r is the ordinary coefficient of regression; 3,0k is the mean value of the cubes x³, 2,1k is the mean value of the products x²y, and so on; all these k's being quantities of an order less than unity. This form lends itself readily to the determination of a second approximation to the regression-curve, which is the locus of that y which is the most probable value of the ordinate corresponding to an assigned value of x. Form the logarithm of the above-written expression (for the frequency-surface); and differentiate that logarithm with respect to y. The required locus is given by equating this differential to zero (the second differential being always negative). The resulting equation is of the form

y − rx − τ − αx² − 2βxy − γy² = 0,

where τ, α, β, γ are all small, linear functions of the k's. As y is nearly equal to rx, it is legitimate to substitute rx for y where y is multiplied by a small coefficient. The curve of regression thus reduces to a parabola with equation of the form

y − τ = rx − qx²,

where q is a linear function of the third mean powers and moments of the given group.

⁸ Above, par. 103, referring to Todhunter, History, art. 993. The third (or second additional term of) approximation for the binomial, given explicitly by Professor Pearson, Trans. Roy. Soc. (1895), A, footnote of p. 347, will be found to agree with the general formula above given, when it is observed that the correction affecting the absolute term, his y₀, disappears in his formula by division.
⁹ Journ. Stat. Soc. (1899), p. 550, referring to Pearson, Trans. Roy. Soc. (1898), A.
¹⁰ Practically no doubt the law is not available beyond the third or fourth approximation, for a reason given by Pearson, with reference to his generalized probability-curve, that the probable error incident to the determination of the higher moments becomes very great.
¹¹ This consideration does not prevent the determination of the true moments from the complete set of observations, if homogeneous, according as the system of elements fulfils more or less perfectly certain conditions.
163. Dissection of certain Heterogeneous Groups.—Under the head of law of error may be placed the case in which statistics relating to two (or more) different types, each separately conforming to the normal law, are mixed together; for instance, the measurements of human heights in a country comprising two distinct races.
In this case the quaesita are the constants in a curve of the form

y = α(1/√π c₁) exp −(x − a)²/c₁² + β(1/√π c₂) exp −(x − b)²/c₂²,

where α and β are the proportionate sizes of the two groups (α + β = 1); a and b are the respective centres of gravity; and c₁, c₂ the respective moduli. The data are measurements each of which relates to one or other of these component curves. A splendid solution of this difficult problem has been given by Professor Pearson. The five unknown quantities are connected by him with the centre of gravity of the given observations, and the mean second, third, fourth and fifth powers of their deviations from that centre of gravity, by certain rational algebraic equations, which reduce to an equation in one variable of the ninth dimension. In an example worked by Professor Pearson this fundamental equation had three possible roots, two of which gave very fair solutions of the problem, while the third suggested that there might be a negative solution, importing that the given system would be obtained by subtracting one of the normal groups from the other; but the coefficients for the negative solution proved to be imaginary. "In the case of crabs' foreheads, therefore, we cannot represent the frequency curve for their forehead length as the difference of two normal curves." In another case, which prima facie seemed normal, Professor Pearson found that "all nine roots of the fundamental nonic lead to imaginary solutions of the problem. The best and most accurate representation is the normal curve."
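The curve of par. 163, and the moments which Pearson's method equates to those of the data, can be sketched as follows (the constants are invented for illustration; only the first two moments are checked here, against their exact values):

```python
import math

def mixture(x, alpha, a, c1, beta, b, c2):
    """y = alpha/(sqrt(pi)c1) exp(-(x-a)^2/c1^2) + beta/(sqrt(pi)c2) exp(-(x-b)^2/c2^2)."""
    g1 = alpha / (math.sqrt(math.pi) * c1) * math.exp(-((x - a) / c1) ** 2)
    g2 = beta / (math.sqrt(math.pi) * c2) * math.exp(-((x - b) / c2) ** 2)
    return g1 + g2

alpha, a, c1 = 0.6, 0.0, 1.0    # invented constants, with alpha + beta = 1
beta, b, c2 = 0.4, 3.0, 2.0
step = 0.01
xs = [i * step for i in range(-1000, 1201)]
mass = sum(mixture(x, alpha, a, c1, beta, b, c2) * step for x in xs)
mean = sum(x * mixture(x, alpha, a, c1, beta, b, c2) * step for x in xs)
# Total mass should be 1, and the mean alpha*a + beta*b = 1.2.
```

Pearson's actual solution proceeds in the reverse direction, from five observed moments back to the five constants, through the nonic equation described in the text.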
164. This laborious method of separation seems best suited to cases in which it is known beforehand that the statistics are a mixture of two normal groups, or at least this is strongly suggested by the two-headed character of the given group. Otherwise the less troublesome generalized law of error may be preferable, as it is appropriate both to the mixture of two—not very widely different—normal groups, and also the other cases of composition. Even when a group of statistics can be broken up into two or three frequency curves of the normal—or not very abnormal—type, the group may yet be adequately represented by a single curve of the " generalized " type, provided that the heterogeneity is not very great, not great enough to prevent the constants k₁, k₂, k₃, &c., from being small. Thus, suppose the given group to consist of two normal curves each having the same modulus c, and that the distance between the centres is considerable, so considerable as just to cause the central portion of the total group to become saddle-backed. This phenomenon sets in when the distance between the centre of gravity of the system and the centre of either component = c/√2.¹ Even in this case k₂ is only −0.125; k₄ is 0.25 (the odd k's are zero).
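The saddleback criterion admits a simple numerical check. The sketch below is a modern Python illustration (not from the article): since the modulus c = σ√2, taking c = √2 gives unit standard deviation, so the stated distance c/√2 equals 1, and an equal mixture should be single-humped when the component centres lie within that distance of the centre of gravity and saddle-backed when they lie beyond it.

```python
import math

# Equal-weight mixture of two normals with common modulus c (sigma = c/sqrt(2)),
# centres at distance +/- d from the centre of gravity (taken as the origin).
def f(x, d, c):
    k = 1.0 / (c * math.sqrt(math.pi))
    return 0.5 * k * (math.exp(-((x - d) / c) ** 2) + math.exp(-((x + d) / c) ** 2))

c = math.sqrt(2.0)   # modulus sqrt(2) -> sigma = 1, so the threshold is d = c/sqrt(2) = 1
below = [f(i * 0.01, 0.5, c) for i in range(200)]   # d = 0.5 < 1: one hump at the centre
above = [f(i * 0.01, 1.5, c) for i in range(200)]   # d = 1.5 > 1: saddle at the centre
unimodal = below[0] == max(below)   # maximum sits at the centre of gravity
saddle = above[0] < max(above)      # centre is no longer the maximum
```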
Section II.—Laws of Frequency.
165. A formula much more comprehensive than the corrected normal law is proposed by Professor Pearson under the designation of the " generalized probability-curve." The ground and scope of the new law cannot be better stated than in the words of the author: " The slope of the normal curve is given by a relation of the form
(1/y)(dy/dx) = −x/c₁.
The slope of the curve correlated to the skew binomial, as the normal curve to the symmetrical binomial, is given by a relation of the form
(1/y)(dy/dx) = −x/(c₁ + c₂x).
Finally, the slope of the curve correlated to the hypergeometrical series (which expresses a probability distribution in which the contributory causes are not independent, and not equally likely to give equal deviations in excess and defect), as the above curves to their respective binomials, is given by a relation of the form
(1/y)(dy/dx) = −x/(c₁ + c₂x + c₃x²).
¹ Cf. Journ. Stat. Soc. (1899), lxii. 131. A similar substitution of the generalized law of error may be recommended in preference to the method of translating a normal law of error (putting x = f(z), where z obeys the normal law of error) suggested by the present writer (Journ. Stat. Soc., 1898), and independently by Professor J. C. Kapteyn (Skew Frequency Curves, 1903).
This latter curve comprises the two others as special cases, and, so far as my investigations have yet gone, practically covers all homogeneous statistics that I have had to deal with. Something still more general may be conceivable, but I have found no necessity for it." ² The " hypergeometrical series," it should be explained, had appeared as representative of the distribution of black balls³ in the following case. " Take n balls in a bag, of which pn are black and qn are white, and let r balls be drawn and the number of black be recorded. If r > pn, the range of black balls will lie between 0 and pn; the resulting frequency-polygon is given by a hypergeometrical series."
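Pearson's bag-of-balls illustration can be reproduced directly. The sketch below is a modern Python rendering; the counts (n = 50, pn = 10 black, r = 15 drawn) are illustrative. The hypergeometrical frequencies sum to unity, the count of black ranges over 0 to pn, and the resulting frequency-polygon is unimodal.

```python
from math import comb

# Frequencies of the number of black balls when r are drawn from a bag of
# n balls containing `black` black ones (the hypergeometrical series).
def hypergeom_pmf(n, black, r):
    total = comb(n, r)
    return [comb(black, k) * comb(n - black, r - k) / total
            for k in range(0, min(black, r) + 1)]

# n = 50 balls with p = 1/5, so pn = 10 black; draw r = 15 (> pn):
# the count of black then ranges over 0..10.
pmf = hypergeom_pmf(50, 10, 15)
peak = pmf.index(max(pmf))   # position of the single hump
```

The mean number of black drawn is r·p = 3, and the hump of the polygon sits there.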
Further reasons in favour of his construction are given by Professor Pearson in a later paper.⁴ " The immense majority, if not the totality, of frequency distributions in homogeneous material show, when the frequency is indefinitely increased, a tendency to give a smooth curve characterized by the following properties. (i.) The frequency starts from zero, increases slowly or rapidly to a maximum and then falls again to zero—probably at a quite different rate—as the character for which the frequency is measured is steadily increased. This is the almost universal unimodal distribution of the frequency of homogeneous series. . . . (ii.) In the next place there is generally contact of the frequency-curve at the extremities of the range. These characteristics at once suggest the following form of frequency curve, if yδx measure the frequency falling between x and x + δx:
dy/dx = y(x + a)/F(x).
Now let us assume that F(x) can be expanded by Maclaurin's theorem. Then our differential equation to the frequency will be
(1/y)(dy/dx) = (x + a)/(b₀ + b₁x + b₂x² + . . .).
Experience shows that the form [" keeping b₀, b₁, b₂ only "] suffices for certainly the great bulk of frequency distributions."
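The quoted slope relation determines the curve once the constants are fixed. As a check, the following sketch (modern Python, not from the article) integrates the relation numerically by a Runge-Kutta step with a = 0, b₁ = b₂ = 0 and b₀ = −1, for which the solution should be the normal curve exp(−x²/2) when y(0) = 1.

```python
import math

# Pearson's slope relation (1/y) dy/dx = (x + a)/(b0 + b1*x + b2*x^2).
# With a = 0, b1 = b2 = 0, b0 = -1 this is (1/y)dy/dx = -x, the normal curve.
def slope(x, y, a=0.0, b0=-1.0, b1=0.0, b2=0.0):
    return y * (x + a) / (b0 + b1 * x + b2 * x * x)

# Classical fourth-order Runge-Kutta integration of dy/dx = slope(x, y).
def rk4(x0, y0, x1, steps=1000):
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = slope(x, y)
        k2 = slope(x + h / 2, y + h * k1 / 2)
        k3 = slope(x + h / 2, y + h * k2 / 2)
        k4 = slope(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

approx = rk4(0.0, 1.0, 2.0)
exact = math.exp(-2.0)   # exp(-x^2/2) at x = 2
```

Other choices of the b's give the skew and hypergeometrical members of the family in the same way.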
166. The " generalized probabilitycurve " presents two main
forms 6.
y =yo(I +x/a1)rat) I —x/a2)va2, 1 —e tarilx/a.
and y=yo(1+x2/a2)me
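The second of these forms, y = y₀(1 + x²/a²)^(−m) e^(−ν tan⁻¹(x/a)), can be explored numerically. Setting its logarithmic derivative to zero places the single mode at x = −νa/(2m); the sketch below (modern Python, with illustrative constants) confirms this by direct search.

```python
import math

# Pearson's second main form: a skew curve with unlimited range.
# y0, a, m, nu below are illustrative values, not taken from any data set.
def second_form(x, y0=1.0, a=1.0, m=2.0, nu=1.0):
    return y0 * (1.0 + (x / a) ** 2) ** (-m) * math.exp(-nu * math.atan(x / a))

# Locate the mode on a fine grid and compare with -nu*a/(2m).
xs = [i * 0.001 - 3.0 for i in range(6001)]
mode = max(xs, key=second_form)
predicted = -1.0 * 1.0 / (2 * 2.0)   # -nu*a/(2m) = -0.25
```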
When a₁, a₂, ν are all finite and positive, the first form represents, in general, a skew curve, with limited range in both directions; in the particular case, when a₁ = a₂, a symmetrical curve, with range limited in both directions. If a₂ = ∞, the curve reduces to
y = y₀(1 + x/a₁)^(νa₁) e^(−νx),
representing an asymmetrical binomial, with ν = 2µ₂/µ₃, and a₁ = 2µ₂²/µ₃ − ½µ₃/µ₂; µ₂ and µ₃ being respectively the mean second and mean third power of deviation measured from the centre of gravity. In the particular case, when µ₃ is small, this form reduces to what is above called the " quasi-normal " curve; and when µ₃ is zero, a₁ becoming infinite, to the simple normal curve. The pregnant general form yields two less familiar shapes apt to represent curves of the character shown in figs. 14 and 15—the one occurring in a
good number of instances, such as infant deaths, the values of houses, the number of petals in certain flowers; the other less familiarly illustrated by Consumptivity and Cloudiness.⁷ The second solution represents a skew curve with unlimited range in both directions.⁸ Professor Pearson has successfully applied these formulae to a number of beautiful specimens culled in the most diverse fields of statistics. The flexibility with which the generalized probability-curve adapts itself to every variety of existing groups no doubt gives it a great advantage over the normal curve, even in its extended form. It is only in respect of a priori evidence that the latter can claim precedence.⁹
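The moment relations quoted for the limiting form y₀(1 + x/a₁)^(νa₁)e^(−νx) can be verified against a distribution of that shape. Identifying the curve with a gamma distribution of shape k and scale θ (an identification assumed here purely for the check, with µ₂ = kθ² and µ₃ = 2kθ³), the relations should return ν = 1/θ and a₁ = (k − 1)θ, the distance from the mode to the lower terminal:

```python
# Consistency check on the stated moment relations, using the exact
# second and third mean powers of a gamma distribution (shape k, scale theta).
k, theta = 5.0, 0.7          # illustrative values
mu2 = k * theta ** 2         # mean second power of deviation
mu3 = 2 * k * theta ** 3     # mean third power of deviation
nu = 2 * mu2 / mu3                            # article: nu = 2*mu2/mu3
a1 = 2 * mu2 ** 2 / mu3 - mu3 / (2 * mu2)     # article: a1 = 2*mu2^2/mu3 - (1/2)*mu3/mu2
```

Both relations reduce to the expected values, which supports the form of the constants as printed above.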
167. Skew Correlation.—Professor Pearson has extended his
² Trans. Roy. Soc. (1895), A, p. 381. ³ Ibid. p. 360.
4 " Mathematical Contributions to the Theory of Evolution "
(Drapers' Company Research Memoirs, Biometric Series II.), xiv. 4. p. 7, loc. cit. ' Ibid. p. 367.
7 Pearson, loc. cit., p. 364, and Proc. Roy. Soc.
⁸ A lucid exposition of Professor Pearson's various methods is given by W. Palin Elderton in Frequency-curves and Correlation (1906).
9 Journ. Stat. Soc. (1895), p. 506.
method to frequency-loci of two dimensions,¹ constructing for the curve of regression (as a substitute for the normal right line), in the case of " skew correlation," a parabola,² with constants based on the higher moments of the given group.
168. In this connexion reference may again be made to Mr Yule's method of treating skew surfaces as if they were normal. It is certainly remarkable that the correlation should be so well represented by a line—the property of a normal surface—in cases of which normality cannot be predicated: for instance, the statistics of the number of husbands (or wives) living at each age who have wives (or husbands) living at different ages.³ It may be suggested that though in this case there is one dominant cause, the continual decrease of the population, inconsistent with the plurality of causes postulated for the law of error, yet there is a sufficient degree of accidental variation to realize one property at least of the normal locus.
169. Relations between Frequency and Probability.—There is possibly an extensive class of phenomena of which frequency depends largely on fortuitous causes, yet not so completely as to present the genuine law of error.⁴ This mixed class of phenomena might be amenable to a kind of law of frequency that would be different from, yet have some affinity to, the law of error. The double character may be taken as the definition of the laws proper to the present section. The definition of the class is more distinct than its extent. Consider for example the statistics which represent the numbers out of a million born that die in each year of age after thirty or forty—the latter part of the column in a life-table. These are well represented by a species of Professor Pearson's " generalized probability-curve," his type iii.,⁵ of the form
y = y₀(1 + x/a₁)^(νa₁) e^(−νx).
The statistics also lend themselves to the Gompertz-Makeham formula for the number living at the age x:
lₓ = k sˣ g^(cˣ).
The former law, the simplest species of the " generalized probability-curve," may well be attributed in part to the operation of a plexus of causes such as that which is apt to generate the law of error. In fact, a high authority, Professor Lexis, has seen in these statistics—or continental statistics in pari materia—a fulfilment of the normal law of error.⁶ They at least fulfil tolerably the generalized law of error above described. But the Gompertz-Makeham formula is not thus to be accounted for; at least it is not thus that it was regarded by its discoverers. Gompertz justifies his law⁷ by a " hypothetical deduction congruous with many natural effects," such as the exhaustion of air by a pump; and Makeham follows⁸ in the same track of explanation by way of natural laws. Of course it is not denied that mortality is subject to accident. But the Gompertz-Makeham law purports to be fulfilled in spite of, not by reason of, fortuitous agencies. The formula is accounted for not by the interaction of fleeting causes which is characteristic of probability, but by causes of that ordinary kind of which the investigation constitutes the greater part of natural science. Laws of frequency thus conceived do not belong to the theory of Probabilities.
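The Gompertz-Makeham formula lₓ = k sˣ g^(cˣ) is easily evaluated. The sketch below (modern Python; the constants are purely illustrative, not drawn from any life-table) computes the survivors lₓ and checks that the force of mortality −d(log lₓ)/dx has the Makeham shape A + Bcˣ, a constant " accident " term plus a geometrically growing term.

```python
import math

# Gompertz-Makeham law for the number living at age x: l_x = k * s^x * g^(c^x).
# The parameter values below are invented for illustration only.
def survivors(x, k=100000.0, s=0.999, g=0.9997, c=1.10):
    return k * (s ** x) * (g ** (c ** x))

# Since log l_x = log k + x*log s + (c^x)*log g, the force of mortality
# mu_x = -d(log l_x)/dx equals A + B*c^x with the constants below.
A = -math.log(0.999)                       # -log s: the constant "accident" term
B = -math.log(0.9997) * math.log(1.10)     # -log g * log c: the geometric term
def force(x, c=1.10):
    return A + B * (c ** x)
```

The geometric term embodies Gompertz's " exhaustion " of the power of resisting death; Makeham's addition is the age-independent term A.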
' " Contributions," No xiv. (above cited).
² Not the same parabola as that proposed at par. 162.
³ Census of England and Wales, General Report (Cd. 2174), p. 226. Cf. p. 70, as to the rationale of the phenomenon.
⁴ A good example of the suggested blend between law and chance is presented by an hypothesis which Benini (in a passage referred to above, par. 97) has proposed to account for Pareto's income-curve.
" Contributions,' No. ii., Phil. Trans. (1895), vol. 186, A.
⁶ Lexis, Massenerscheinungen, § 46. Cf. Venn, cited above, par. 124.
⁷ Phil. Trans. (1825).
8 Assurance Magazine (1866), xi. 315.
" Generating Functions." Not all parts of the book are as rewarding as the Introduction (published separately as Essai philosophique des probabilites) and the fourth and subsequent chapters of the second book. Among numerous general treatises E. Czuber's Wahrscheinlichkeitstheorie (1899) may be noticed as terse, lucid and abounding in references. Other authorities may be mentioned in relation to the different parts of the subject as above divided. First principles are discussed with remarkable acumen by J. Venn in Logic of Chance (1st ed., 1876, 3rd ed., 1888) and by J. v. Kries in Principien der Wahrscheinlichkeitsrechnung (1886). As a repertory of neat problems involving the calculation of probability and expectation W. A. Whitworth's Choice and Chance (5th ed., 1901), and DCC. Exercises ... in Choice and Chance (1897) deserve mention. But this advantage is afforded in nearly as great perfection by more comprehensive works. Bertrand's Calcul des probabilites (1889) abounds in choice examples, while it excels in almost every other branch of the subject. Special mention is also deserved by H. Poincare's Calcul des probabilites (icons professes, 1893–1894). On local or geometrical probability Professor Morgan Crofton is one of the highest authorities. His paper on " Local Probability " in Phil. Trans. (1868), and on " Geometrical Theorems," Proc. Lend. Math. Soc. (1887), viii., should be read in connexion with the section on " Local Probability " in his article on " Probability " in the 9th edition of the Ency. Brit., from which section several paragraphs have been transferred en bloc to the section on Geometrical Applications in the present article. The topic is treated exhaustively by Czuber in Geometrische Wahrscheinlichkeiten and Mittelworten (1884). Czuber is also to be mentioned as the author of Theorie der Beobachtungsfehler, in which he has reproduced, often with improvement, or referred to, almost everything of importance in the work of his predecessors. A. 
L. Bowley's Elements of Statistics, pt. 2 (2nd ed., 1902), forms an introduction to the law of error which leads the beginner easily, yet far. References to other writers are given in Section I. of Part II. above. A list of writings on the cognate topic, the method of least squares, has been given by Merriman (Connecticut Trans., vol. iv.). On laws of frequency, as above defined, Professor Karl Pearson is the highest authority. His " Contributions to the Mathematical Theory of Evolution," of which twelve have appeared in the Trans. Roy. Soc. (1894–1903) and others are being published by the Drapers' Company, teem with new theories in Probabilities. (F. Y. E.)