The Mathematics of Man Pt. 1: Heritability & the Basics

Counting away the blank slate.

Behavioral genetics is THE theory of the social sciences of this century. Robert Plomin is a genius; the blank slate is soaked in the blood of its last intelligent adherents. Sociology, anthropology, and history are jokes until they recognize the fundamental law of man: P = G + E. This post is about the math behind that law.

We start with sets: collections of numbers. [1, 2, 3] is a set. From here we get probability. If X is a random variable, i.e. a value drawn at random from the set, what is the probability that X = 3? In this case, P(X = 3) = 1/3. This is basic: P(X = n) simply equals the number of times n occurs in the set divided by the size of the set.

We now go to a concept called expected value, denoted as E(X). For a set of N values x_1, ..., x_N, it is

$$E(X) = \frac{1}{N}\sum_{i=1}^{N} x_i$$

If you’re a smart little math genius you already see that this is equivalent to the formula

$$E(X) = \sum_{x} x \cdot P(X = x)$$

For our set above, (1 + 2 + 3)/3 = 1 * 1/3 + 2 * 1/3 + 3 * 1/3 = 2. The “average”, $\bar{x}$, which you are probably familiar with, is just the expected value of a finite sample. This isn’t very useful for our small uniform distribution, but it’s very useful in other types of distributions. To gain an intuition, let’s write out a bell-shaped set that roughly resembles a normal distribution (strictly speaking it is triangular, not normal, but it is close enough for intuition): [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 11, 11, 12].

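E(X) = 7 for this bell-shaped set. As you can hopefully see, the expected value here is also the most likely value to occur. That is a property of symmetric, single-peaked distributions like this one, not of expected values in general: our first set has E(X) = 2, yet 1, 2, and 3 are all equally likely. Furthermore, in this distribution, the next most likely values are adjacent to the expected value.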
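The two ways of computing an expected value can be checked against each other in a few lines. This is a minimal sketch, not anything from the original post: the function names are made up for illustration, and `fractions.Fraction` is used only so the arithmetic stays exact.

```python
from collections import Counter
from fractions import Fraction

def expected_value_mean(values):
    """E(X) as the plain average: sum of the values divided by their count."""
    return Fraction(sum(values), len(values))

def expected_value_weighted(values):
    """E(X) as the sum of x * P(X = x), where P(X = x) = count(x) / N."""
    n = len(values)
    counts = Counter(values)
    return sum(Fraction(x) * Fraction(c, n) for x, c in counts.items())

uniform = [1, 2, 3]

# Rebuild the 36-element bell-shaped set from the text:
# the value 2 appears once, 3 twice, ..., 7 six times, ..., 12 once.
multiplicities = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
bell = [v for v, k in zip(range(2, 13), multiplicities) for _ in range(k)]

for name, data in [("uniform", uniform), ("bell", bell)]:
    most_common_value = Counter(data).most_common(1)[0][0]
    print(name,
          "mean form:", expected_value_mean(data),
          "weighted form:", expected_value_weighted(data),
          "a most common value:", most_common_value)
```

Both forms agree on each set. For the uniform set every value is tied for most common, so the expected value being the mode is a feature of the bell-shaped set, not a general rule.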

Very different distributions can have the same expected value. Variance helps account for some of these differences. Take the set [7, 7, 7] for instance. It has a mean, median, and mode of 7, just like the bell-shaped set above. But its variance should be zero. Here’s why.

Variance is the average squared difference between each data point and the expected value. In other words,

$$\operatorname{Var}(X) = E\big[(X - E(X))^2\big]$$

This is also written as

$$\operatorname{Var}(X) = E(X^2) - E(X)^2$$

As you can see, the set [7, 7, 7] has a variance of 0. The bell-shaped set, on the other hand, has a variance of 5.83.
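Here is a short sketch that computes the variance both ways, as the average squared deviation and as E(X²) minus E(X)², for the two sets just mentioned. The helper names are illustrative, and this is the population variance (dividing by N), which is what the numbers in the text correspond to.

```python
def mean(values):
    return sum(values) / len(values)

def variance_definition(values):
    """Average squared difference from the expected value."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def variance_shortcut(values):
    """E(X^2) - E(X)^2, algebraically the same quantity."""
    return mean([x * x for x in values]) - mean(values) ** 2

constant = [7, 7, 7]

# The 36-element bell-shaped set from the text (2 once, 3 twice, ..., 12 once).
multiplicities = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
bell = [v for v, k in zip(range(2, 13), multiplicities) for _ in range(k)]

print(variance_definition(constant), variance_shortcut(constant))  # 0.0 0.0
print(round(variance_definition(bell), 2))                         # 5.83
print(round(variance_shortcut(bell), 2))                           # 5.83
```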

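Last of the basic concepts are covariance and correlation.

$$\operatorname{Cov}(X, Y) = E\big[(X - E(X))(Y - E(Y))\big]$$

$$\operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\operatorname{STD}(X)\,\operatorname{STD}(Y)}, \qquad \operatorname{STD}(X) = \sqrt{\operatorname{Var}(X)}$$

These concepts should be pretty intuitive, but here’s an example. Take the matrix [[1, 3], [2, 6], [3, 9]], where the first column is X and the second is Y. Var(X) = 2/3 and Var(Y) = 6, so STD(X) = sqrt(2/3) and STD(Y) = sqrt(6), meaning STD(X) * STD(Y) = sqrt(4) = 2. Cov(X, Y) = ((-1)(-3) + (0)(0) + (1)(3))/3 = 2, so Corr(X, Y) = 2/2 = 1. The correlation is one because Y is exactly 3X: every change in X is matched by a proportional change in Y, and there is no unaccounted variance. As an aside, correlation is always between -1 and 1 due to the Cauchy-Schwarz inequality.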
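The worked example can be reproduced directly. This is a small sketch with illustrative function names, using population (divide-by-N) covariance and standard deviation to match the arithmetic in the text.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Average product of deviations from each variable's mean."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def std(values):
    # The variance of X is just the covariance of X with itself.
    return math.sqrt(covariance(values, values))

def correlation(xs, ys):
    return covariance(xs, ys) / (std(xs) * std(ys))

rows = [[1, 3], [2, 6], [3, 9]]
xs = [row[0] for row in rows]
ys = [row[1] for row in rows]

print("Var(X) =", covariance(xs, xs))       # 2/3
print("Var(Y) =", covariance(ys, ys))       # 6.0
print("Cov(X, Y) =", covariance(xs, ys))    # 2.0
print("Corr(X, Y) =", round(correlation(xs, ys), 6))
```

Reversing one column, for example `ys = [9, 6, 3]`, flips the sign of the covariance and gives a correlation of -1.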

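Okay, that’s all for the basics. If you understood these things, you are now ready for the law of quantitative genetics: P = G + E,¹ where P is phenotype, G is genotype, and E is environment. With this we can understand heritability. Since P = G + E,²

$$\operatorname{Var}(P) = \operatorname{Var}(G) + \operatorname{Var}(E)$$

The heritability equation is:

$$h^2 = \frac{\operatorname{Var}(G)}{\operatorname{Var}(P)}$$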

Finding P and Var(P) for a trait like IQ or height is easy. The interesting part is finding Var(G). This is usually done via twin study, although now molecular methods allow us to estimate Var(G) directly. Due to an incomplete list of significant SNPs (and probably other factors), the direct method currently underestimates heritability severely. So how do we estimate Var(G) using twins?

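There are two common formulas:³

$$h^2 = r_{mza}$$

$$h^2 = 2\,(r_{mzt} - r_{dzt})$$

Here r(mza), r(mzt), and r(dzt) are the twin correlations for monozygotic twins reared apart, monozygotic twins reared together, and dizygotic twins reared together.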

The first is slightly simpler to explain. r(mza) is the intraclass correlation coefficient (ICC) of monozygotic twins reared apart (MZA) for some trait. What this means is that researchers take, say, 1000 pairs of twins (i.e. 2000 individuals) and measure everyone’s IQ. Using a formula similar to the correlation formula seen above, they can estimate how much of the variance is within twin pairs and how much is between pairs. The ICC is the proportion of variance that is between pairs. When twins’ scores are very similar, Cov(T1, T2) ≈ Var(T1) ≈ Var(T2), so you get an ICC near 1. When they are dissimilar, Cov(T1, T2) is much smaller than Var(T1) or Var(T2), so you get a low ICC. Here is the important part: estimating the proportion of variance that is between MZA pairs is the same as estimating the proportion of phenotypic variance that is genetic in a population. This is because the twins have the same genes, but, on average, different environments. The degree to which MZA pairs “vary together” in the face of individualized or randomized environmental factors is the degree to which genetics cause variance in a random sample.
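A quick simulation makes the “important part” concrete. The sketch below is an illustration under loud assumptions, not real data: each pair shares one genetic value G and gets two independent environmental values E, the variances (180 and 45, so the true Var(G)/Var(P) is 0.8) are made up, and the ordinary Pearson correlation between twin 1 and twin 2 stands in for the ICC.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation, used here as a stand-in for the ICC."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# Hypothetical variance components, chosen only for illustration.
VAR_G, VAR_E = 180.0, 45.0
N_PAIRS = 1000  # "say, 1000 pairs of twins"

rng = random.Random(0)  # fixed seed so the run is repeatable
twin1, twin2 = [], []
for _ in range(N_PAIRS):
    g = rng.gauss(0, math.sqrt(VAR_G))                        # same genes for both twins
    twin1.append(100 + g + rng.gauss(0, math.sqrt(VAR_E)))    # separate environment
    twin2.append(100 + g + rng.gauss(0, math.sqrt(VAR_E)))    # separate environment

true_h2 = VAR_G / (VAR_G + VAR_E)
r_mza = pearson(twin1, twin2)
print("true Var(G)/Var(P):", true_h2)
print("simulated twin correlation:", round(r_mza, 3))
```

The simulated correlation should land close to 0.8, because the only thing the two twins in a pair have in common in this setup is G.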

The math I used above is only a rough approximation with the basics we already know. Let’s work through an example with the real math so we can gain a deeper intuition. Say we have the following 10 × 2 matrix, one row per twin pair: [[110, 113], [94, 91], [111, 112], [98, 104], [117, 120], [83, 79], [93, 92], [101, 103], [119, 117], [121, 125]]. We will need the pair means (midpoints), so let’s write those out: [111.5, 92.5, 111.5, 101, 118.5, 81, 92.5, 102, 118, 123].

We have 20 IQ measurements. The first step is to express these measurements in an unusual way:

$$P_{ij} = u + G_i + E_{ij}$$

P is the phenotype (IQ in this instance), u is the overall mean (105.15), G_i is the mean of pair i minus the overall mean, and E_ij is the remaining random effect for twin j of pair i. Example: 94 = 105.15 - 12.65 + 1.5. Next, we assume that G and E do not covary, so

$$\operatorname{Var}(P) = \operatorname{Var}(G) + \operatorname{Var}(E)$$

The set of Gi is [6.35, -12.65, 6.35, -4.15, 13.35, -24.15, -12.65, -3.15, 12.85, 17.85]. Its variance is 167.3. The set of Eij is [-1.5, 1.5, -0.5, -3, -1.5, 2, 0.5, -1, 1, -2, 1.5, -1.5, 0.5, 3, 1.5, -2, -0.5, 1, -1, 2]. Its variance is 2.625. Thus Var(P) = Var(G) + Var(E) ≈ 169.93. We can verify that this is true by calculating the (population) variance of the set of all 20 IQs directly.

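The ICC is simply

$$r = \frac{\operatorname{Var}(G)}{\operatorname{Var}(G) + \operatorname{Var}(E)} = \frac{\operatorname{Var}(G)}{\operatorname{Var}(P)}$$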

In this case, r = 0.98. The set of E has much less spread than the set of G; in other words, individual effects add very little. Individual scores cluster tightly around their pair averages.
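The whole worked example fits in a short script. This sketch takes the ICC to be Var(G) / (Var(G) + Var(E)), which reproduces the 0.98 quoted above. The function name is illustrative, and all variances are population variances (dividing by the number of values).

```python
def mean(values):
    return sum(values) / len(values)

def pop_variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def decompose_pairs(pairs):
    """Split each score into overall mean + pair effect G_i + individual effect E_ij."""
    scores = [s for pair in pairs for s in pair]
    u = mean(scores)
    pair_means = [mean(pair) for pair in pairs]
    g = [m - u for m in pair_means]                                   # one G per pair
    e = [s - m for pair, m in zip(pairs, pair_means) for s in pair]   # one E per person
    var_g, var_e = pop_variance(g), pop_variance(e)
    return {
        "u": u,
        "var_g": var_g,
        "var_e": var_e,
        "var_p_direct": pop_variance(scores),  # variance of all scores, computed directly
        "icc": var_g / (var_g + var_e),
    }

twin_pairs = [[110, 113], [94, 91], [111, 112], [98, 104], [117, 120],
              [83, 79], [93, 92], [101, 103], [119, 117], [121, 125]]

result = decompose_pairs(twin_pairs)
for key, value in result.items():
    print(key, round(value, 4))
```

Because each pair's individual effects sum to zero, Var(G) + Var(E) matches the directly computed variance of all 20 scores. The same function accepts any list of pairs, so it can be pointed at randomly paired scores to see how far the ratio drops.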

Let’s see what happens with a set of scores that are paired randomly: [[110, 101], [118, 106], [96, 118], [109, 101], [108, 118], [92, 110], [91, 138], [104, 107], [82, 113], [113, 98]]. Gi = [-1.15, 5.35, 0.35, -1.65, 6.35, -5.65, 7.85, -1.15, -9.15, -1.15]. Here Var(G) ≈ 25.3 and Var(E) ≈ 115.0, so r ≈ 0.18. That’s very low for such a small sample. It’s hard to interpret this result literally, since the pairs were generated randomly, but that’s a topic for another time. In general, the math works.

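So why does r(mza) allow us to measure heritability? It is an empirical matter. Because we empirically assume that MZAs share 100% of their genes and, being reared apart, no shared environment, we write that

$$\operatorname{Var}(P) = \operatorname{Var}(G) + \operatorname{Var}(c) + \operatorname{Var}(e), \qquad r_{mza} = \frac{\operatorname{Var}(G)}{\operatorname{Var}(P)} = h^2$$

where c is shared environment and e is unshared environment. Below, c² = Var(c)/Var(P) and e² = Var(e)/Var(P) denote their shares of the phenotypic variance, so h² + c² + e² = 1. As far as I am aware, there is no “proof” of this, and this statement is at the center of the “controversy” surrounding the legitimacy of twin studies. That aside, using this structure we can write the following and see the reasoning behind the second method:

$$r_{mzt} = h^2 + c^2, \qquad r_{dzt} = \tfrac{1}{2}h^2 + c^2$$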

Since twins raised together share their family (“shared”) environment, all of that variance is going to contribute to between-pairs variation. For monozygotic twins reared together (MZTs), all of the genetic variance does too. Since dizygotic twins reared together (DZTs) share, on average, about 50% of their segregating genes, about half of their genetic variance will contribute to between-pairs variation, and the other half to individual variation.
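The between-pairs reasoning can be checked with a simulation. The sketch below is illustrative only: phenotypes are standardized so Var(P) = 1, the values h² = 0.6 and c² = 0.2 are made up, the Pearson correlation stands in for the ICC, and the last line applies the standard MZT-DZT estimate (Falconer’s formula), h² = 2(r(mzt) - r(dzt)).

```python
import math
import random

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

def twin_correlation(n_pairs, h2, c2, shared_genetic_fraction, seed):
    """Simulate twins reared together and return the correlation between co-twins.

    shared_genetic_fraction is 1.0 for MZ twins and 0.5 for DZ twins: that share
    of the genetic variance is common to the pair, the rest is individual.
    """
    rng = random.Random(seed)
    e2 = 1.0 - h2 - c2  # unshared environment takes the remaining variance
    first, second = [], []
    for _ in range(n_pairs):
        shared_g = rng.gauss(0, math.sqrt(h2 * shared_genetic_fraction))
        shared_c = rng.gauss(0, math.sqrt(c2))  # family environment, common to both
        twins = []
        for _ in range(2):
            own_g = rng.gauss(0, math.sqrt(h2 * (1 - shared_genetic_fraction)))
            own_e = rng.gauss(0, math.sqrt(e2))
            twins.append(shared_g + own_g + shared_c + own_e)
        first.append(twins[0])
        second.append(twins[1])
    return pearson(first, second)

H2, C2 = 0.6, 0.2  # hypothetical values, for illustration only
r_mzt = twin_correlation(20000, H2, C2, shared_genetic_fraction=1.0, seed=1)
r_dzt = twin_correlation(20000, H2, C2, shared_genetic_fraction=0.5, seed=2)
h2_estimate = 2 * (r_mzt - r_dzt)

print("r(mzt):", round(r_mzt, 3), "expected about", H2 + C2)
print("r(dzt):", round(r_dzt, 3), "expected about", H2 / 2 + C2)
print("2 * (r(mzt) - r(dzt)):", round(h2_estimate, 3), "expected about", H2)
```

The shared-environment term appears in both correlations, so it cancels in the difference, which is why doubling the difference recovers h² under these assumptions.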

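The following algebra derives the second heritability equation from our assumptions above:

$$r_{mzt} - r_{dzt} = (h^2 + c^2) - \left(\tfrac{1}{2}h^2 + c^2\right) = \tfrac{1}{2}h^2$$

$$h^2 = 2\,(r_{mzt} - r_{dzt})$$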

Thus ends our foray into the theory of heritability. I find the assumptions here to be correct, and further discussion of them is beyond the purpose of this post. Going forward, I would like to provide similar explanations for IQ/g and the Big 5 personality traits. Then I will write critically on the validity of these theories. But for now I hope you have learned something. While you’re here, why don’t you subscribe on whatever platform you’re accessing this on? If you don’t, you might miss out…

1

Why does this work? We can quantify the genetic contribution as what is known as a “polygenic score” or PGS. Since we can quantify G (which is the basic axiom of “quantitative genetics,” the field from which these things derive) and since we can measure P, E is deducible.

2

Technically, Var(G+E) = Var(G) + Var(E) + 2Cov(G,E).

3

I am neglecting the difference between narrow sense and broad sense heritability here. The former is denoted with a lower case h and the latter with an upper case H. Narrow sense conceptually only involves “additive” genetic effects, whereas broad sense includes all effects (the other two are epistasis and dominance effects). Because only additive effects are efficiently passed on, the MZA method is thought to estimate broad sense heritability, while the MZT-DZT method is thought to estimate narrow sense heritability. Consequently, for IQ, the MZA method usually estimates heritability > 80% while the latter method usually yields a result in between 60% and 75%.