Tuesday, July 29, 2008

Math is hard, so is science journalism...

So the NYTimes reports on a new study that shows no difference in average scores between boys and girls. They open, as one would expect, with the standard laugh at how this just proves once again how stupid, wrong and sexist old Larry Summers (youngest tenured professor in Harvard history, the idiot) was for suggesting that intrinsic aptitude might be at all responsible for the vastly greater percentage of men than women at top math and math-related faculties.

But, of course, Larry wasn't saying that men and women don't have equal average aptitude. His point was that all the studies of the issue show that men and women have equal averages but very unequal variances. The variance is a measure of how "spread out" a distribution is. For a distribution centered about its mean, like a normal distribution, a greater variance means that a larger share of the population sits far from the mean, and that excess grows the further from the mean you look.
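To make "same average, different variance" concrete, here is a toy sketch using Python's standard `statistics` module. The two score lists are made up purely for illustration; they are not data from any study.

```python
from statistics import mean, pstdev, pvariance

# Two made-up groups of scores: same average, different spread.
narrow = [-1.0, 0.0, 1.0]
wide = [-2.0, 0.0, 2.0]

for name, scores in (("narrow", narrow), ("wide", wide)):
    # pvariance is the mean squared distance from the mean;
    # pstdev is its square root, in the same units as the scores.
    print(name, mean(scores), pvariance(scores), pstdev(scores))
```

Both groups average zero, but doubling every distance from the mean quadruples the variance and doubles the standard deviation.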

This means that when you're looking way, way out in the tails ("tails" refers to the far reaches, both left and right, of a distribution), if boys have a greater variance than girls, you will see a higher and higher ratio of boys to girls the further out you go.

To put it in simple terms: if boys and girls have the same average but boys have a greater variance, you will expect to see 50-50 boys/girls at the average. As you go above the average, that ratio starts to skew towards the boys, say 52/48 at 1 SD (standard deviation, the square root of the variance, used as the unit for measuring distance from the mean) above average (yes, I could do the math to get the actual differences with an implied variance, no I am not going to. Maybe later, if I get bored. BTW, I've never been that bored in my life so don't hold your breath.).

Now when you're talking SDs, the percentage of people you're talking about gets small, fast. About 16% of a population lies more than 1 SD above the mean; that drops to about 2% more than 2 SDs above (roughly, here, roughly), and only about 0.1% of a population lies more than 3 SDs above. (This one is near and dear to my heart for IQ-related reasons...)
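Those percentages come straight from the normal distribution. A minimal Python sketch, assuming a perfectly normal population, computes them with `math.erfc`; `fraction_above` is just an illustrative helper name.

```python
import math

def fraction_above(k):
    """Share of a normal population lying more than k SDs above the mean.

    Uses the identity P(Z > k) = erfc(k / sqrt(2)) / 2 for a standard normal Z.
    """
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (1, 2, 3, 4):
    p = fraction_above(k)
    print(f"above {k} SD: {p:.4%}  (about 1 in {1 / p:,.0f})")
```

Run as-is, it gives about 15.9%, 2.3%, 0.13% and 0.003% (roughly 1 in 31,600 at 4 SDs).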

At 3 SDs, or even further, say 4 SDs (now you're talking roughly 1-in-30,000, or, say, like a Harvard math PhD's math ability level), the skew of boys to girls gets even more pronounced if the boys have the higher variance. It might even get all the way to, say, 85-to-15, which is the observed ratio.
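For the skeptical, a brute-force Python sketch: simulate two groups with equal means, give the boys a standard deviation 10% larger (an assumed figure, the same one used in the update below), and count who lands above each cutoff. The sample size and random seed are arbitrary choices.

```python
import random

random.seed(2008)  # fixed seed so reruns give the same counts
N = 200_000        # simulated kids per group (arbitrary)
SD_RATIO = 1.1     # assumed: boys' SD is 10% larger than girls'; same mean

girls = [random.gauss(0.0, 1.0) for _ in range(N)]
boys = [random.gauss(0.0, SD_RATIO) for _ in range(N)]

for k in (0, 1, 2, 3):
    g = sum(x > k for x in girls)  # girls above k girl-SDs
    b = sum(x > k for x in boys)   # boys above the same cutoff
    print(f"above {k} SD: {100 * b / (b + g):.0f}% boys ({b} boys, {g} girls)")
```

The split starts near 50-50 at the mean and tilts further toward boys at each step out. Past about 3 SDs only a few hundred simulated kids land in the tail, so the counts get noisy and an exact calculation is the better tool out there.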

So how did Larry and all the other studies get it wrong and this one get it right? Simple: this study agrees with Larry and all the others, but tries to play the difference down by suggesting that the variance they observed would account for a 75-25 ratio, and only sexism could get it to the observed 85-15. Maybe it's sexism, or maybe they're underestimating the variance, or underestimating how far out in the tails math faculty really are, or maybe there are other attributes whose variance differs between men and women and reinforces the ratio, or maybe it's lots of things.
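Whether a given variance "accounts for" 75-25 or 85-15 depends heavily on how far into the tail the pool sits. A Python sketch, assuming equal means and normal distributions, bisects for the cutoff (in girl-SDs) at which a target share of boys is reached. The 1.1 SD ratio is illustrative, not the study's figure, and all function names are hypothetical.

```python
import math

def upper_tail(x):
    """P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def boy_share_above(t, sd_ratio):
    """Fraction of boys among everyone above t girl-SDs (equal means assumed)."""
    boys = upper_tail(t / sd_ratio)  # t girl-SDs is t / sd_ratio boy-SDs
    girls = upper_tail(t)
    return boys / (boys + girls)

def cutoff_for_share(target, sd_ratio, lo=0.0, hi=10.0, steps=100):
    """Bisect for the cutoff t where the boy share above t reaches target.

    Assumes sd_ratio > 1, so the share rises steadily from 0.5 as t grows.
    """
    for _ in range(steps):
        mid = (lo + hi) / 2
        if boy_share_above(mid, sd_ratio) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for target in (0.75, 0.85):
    print(target, round(cutoff_for_share(target, 1.1), 2))
```

With the assumed 1.1 SD ratio, the 75% cutoff lands between 3 and 4 girl-SDs and the 85% cutoff lands beyond 4, so modest misjudgments of either the variance or the selectivity of the pool move the predicted split a lot.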

Point is, the nice lady at the NY Times either didn't understand what she was reporting or just decided to lie about what it said, because it made a better story to laugh at dumb old Larry again than to point out yet another study supporting his hypothesis.

Regardless, I'm sure the world will be a better place when we force extremely high-end math faculties to be evenly split between men and women. There's no way this will just result in either a) men being forced out as the departments shrink 'til only as many men as there are interested and capable women (a much smaller number) are allowed to do it, or b) standards being lowered until there is just no such thing as a capable and qualified math faculty.

After all, that's not at all what happened to college sports when this logic was applied to it... oh wait...



UPDATE: What do you know, I actually got that bored. Also I wanted to put up some pretty pictures to go with this post, so I did.


Here they are. Maybe I'll try to work them into the text, though the fact that it was written without pictures complicates that somewhat, so I might not.


At any rate.

Here is a graph showing two normal probability density functions. Both are normal with mean zero. I chose the standard deviation of the "boys" function to be 10% higher than that of the "girls" function. This gives a variance ratio (boys/girls) of 1.1² = 1.21, which is within the range seen in the latest study, though I just picked it arbitrarily as an example.


The x-axis is in "girl" SDs. As you can see, the girls' graph is taller and skinnier than the boys' graph. This is because, as I said above, the higher variance "spreads out" the graph.
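The "taller and skinnier" shape, and the point where the boys' curve overtakes the girls', can be computed directly. A Python sketch under the same assumptions as the graph (mean zero, boys' SD 1.1 times the girls'); `normal_pdf` is a hand-written helper, not a library function.

```python
import math

SD_RATIO = 1.1  # boys' SD relative to girls', as in the example above

def normal_pdf(x, sd):
    """Density of a mean-zero normal distribution with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Peak heights at the shared mean: the wider curve is lower.
print("girls' peak:", normal_pdf(0, 1.0))
print("boys' peak: ", normal_pdf(0, SD_RATIO))

# Setting the two densities equal and solving for x gives
# x^2 = 2 s^2 ln(s) / (s^2 - 1), with s the SD ratio.
s = SD_RATIO
crossing = math.sqrt(2 * s**2 * math.log(s) / (s**2 - 1))
print("curves cross at +/-", round(crossing, 3), "girl-SDs")
```

The curves cross near 1.05 girl-SDs on either side of the mean; beyond that distance the boys' density is the higher one, which is where the tail effect begins.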


But, as I said, the real fun begins when you get into the tails. That's where the relatively small difference in variance starts to have a big effect.


Here are two images of the left tail, that is, the "bad-at-math" side of the distribution. The first shows the tail from 4 to 2 SDs below the mean and is, in effect, a zoom-in on the left part of the graph above. The second shows the same thing, but only from 4 to 3 SDs below the mean; it is a zoom-in on the left half of the first zoom-in, if you will.

The thing to note is that while it looked above as if the difference in the tails was vanishingly small, that is mostly because the tails contain such small numbers. When you zoom in, you see that the variance is in fact making the ratio of boys to girls grow the further out you go.
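The growing ratio in the zoomed-in tails can be checked numerically. A Python sketch, assuming the same model as the graphs (equal means, boys' SD 1.1 times the girls'), divides the boys' density by the girls' density at a few points; `density_ratio` is a hypothetical helper name.

```python
import math

SD_RATIO = 1.1  # assumed boys' SD relative to girls', as in the graphs

def density_ratio(x, s=SD_RATIO):
    """Boys' density divided by girls' density at x girl-SDs from the mean.

    The normalising constants cancel except for 1/s, leaving
    exp((x^2 / 2) * (1 - 1/s^2)) / s.
    """
    return math.exp(0.5 * x**2 * (1 - 1 / s**2)) / s

for x in (-2, -3, -4):
    print(f"{x} SD: {density_ratio(x):.2f} boys per girl")
```

Under these assumptions the ratio is roughly 1.3 at 2 SDs, 2.0 at 3 SDs and 3.6 at 4 SDs below the mean. That is the ratio exactly at each point; counting everyone at or beyond a cutoff gives a somewhat more lopsided number, because it also includes the even more skewed region further out.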

Finally, here is a graph showing how many boys versus girls you would expect out of 100 across the entire left half of the probability distribution. As you can see, the further you get into the "worse at math" side of the spectrum, the more lopsidedly male a random group of 100 kids at that level or below becomes. At the mean it's 50-50, as expected since both groups have the same average ability; at 4 girl-SDs below the mean (roughly the 1-in-30,000 bad-at-math kid, so the really, really bad-at-math kid) you're looking at 19 girls to 81 boys on average. Not because of sexism, but because there are more really dumb boys than girls in this model of math ability.

(Note that, because the normal distribution is symmetric about the mean, the "good-at-math" side would show the same tendency in mirror image: out of a group of 100 random boys and girls, you would expect more and more boys as you get into "better" math territory, and at 4 girl-SDs above the mean you would expect 81 boys to 19 girls. Note that this is the roughly 1-in-30,000 "good-at-math" crowd, or, in other words, the population you are drawing from when selecting professors for the most elite universities.)
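The boys-per-100 numbers, and the mirror-image claim for the right tail, can be reproduced with a few lines of Python. This sketch assumes the same model as the graphs (equal means, normal distributions, boys' SD 1.1 times the girls'); the function names are hypothetical.

```python
import math

SD_RATIO = 1.1  # assumed: boys' SD is 1.1 times the girls', as in the graphs

def normal_cdf(x):
    """P(Z <= x) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def normal_sf(x):
    """P(Z > x), computed directly so far-right tail values keep their precision."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def boys_per_100_below(t, s=SD_RATIO):
    """Expected boys per 100 kids drawn from everyone at or below t girl-SDs."""
    boys = normal_cdf(t / s)  # t girl-SDs is t/s in boys' own SD units
    girls = normal_cdf(t)
    return 100 * boys / (boys + girls)

def boys_per_100_above(t, s=SD_RATIO):
    """Expected boys per 100 kids drawn from everyone at or above t girl-SDs."""
    boys = normal_sf(t / s)
    girls = normal_sf(t)
    return 100 * boys / (boys + girls)

for t in (0, -1, -2, -3, -4):
    print(f"at or below {t} SD: {boys_per_100_below(t):.0f} boys per 100")
print(f"at or above 4 SD: {boys_per_100_above(4):.0f} boys per 100")
```

Run as-is, it gives 50 boys per 100 at the mean and 81 per 100 at 4 girl-SDs, in both directions.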
