On page 62, to explain methods for random sampling, the authors describe how to draw a sample of 50 students from a population of 150.
When they explain stratified random sampling, they say: “your population list may be divided into two lists of seventy-five males and seventy-five females and you sample each list randomly until you have twenty-five of each.”
The assumption that a population of 150 students is split equally between the genders is, in general, not correct. You might think it doesn’t really matter whether the two lists have exactly the same length, and that the method remains valid. But this is not the case: if there are, say, 90 males and 60 females, a sample of 25 males and 25 females would obviously not represent the population. A representative sample of 50 would instead take 30 males and 20 females, in proportion to the sizes of the two lists.
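Here is a minimal Python sketch of proportional allocation, the usual way to keep a stratified sample representative: each stratum contributes to the sample in proportion to its size. The function name and the largest-remainder rounding rule are illustrative choices, not something from the book, and the 90/60 split is the hypothetical one above.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Illustrative helper (hypothetical name): split sample_size across strata
    in proportion to each stratum's size, using largest-remainder rounding
    so that the counts add up exactly to sample_size."""
    total = sum(stratum_sizes.values())
    exact = {k: sample_size * n / total for k, n in stratum_sizes.items()}
    counts = {k: int(v) for k, v in exact.items()}  # floor, since values are non-negative
    leftover = sample_size - sum(counts.values())
    # Give the remaining slots to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts

print(proportional_allocation({"male": 90, "female": 60}, 50))
# {'male': 30, 'female': 20}
```

With the book’s equal lists of 75 and 75 the same function returns 25 of each, which is why their example only works when the genders happen to be perfectly balanced.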
Now, rather than studying a sample that represents the whole population, you might want to investigate differences between male and female students, regardless of how many of each gender are present in the population. In that case, it would make sense to do the split and pick equal numbers of students from the two lists.
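For that comparative design, a sketch of equal allocation might look like this, assuming each stratum is simply a list of student IDs (the IDs and the function name are hypothetical). It relies on Python’s `random.Random.sample`, which draws without replacement.

```python
import random

def stratified_equal_sample(strata, per_stratum, seed=None):
    """Illustrative helper (hypothetical name): draw the same number of members
    at random, without replacement, from every stratum, whatever its size."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

# Hypothetical population: 90 male and 60 female student IDs.
population = {
    "male": [f"M{i}" for i in range(90)],
    "female": [f"F{i}" for i in range(60)],
}
sample = stratified_equal_sample(population, 25, seed=1)
print({k: len(v) for k, v in sample.items()})  # {'male': 25, 'female': 25}
```

With 90 males and 60 females this samples 25 of 60 females but only 25 of 90 males, so any estimate for the student body as a whole would need to be weighted accordingly.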
But the way they put it, they are definitely wrong. And they keep doing it. Here is how the text continues: “You then subdivide the two lists into age groups to ensure you have sufficient numbers of, say under- and over-thirties (assuming that age is relevant to your research question). As you can see, if you are going to subdivide your sample into particular demographic subgroups, your subgroups will become smaller and smaller the more categories you include, until the sample size for these subgroups cannot be seen as reliably representative.”
So far so good, but now comes the blunder: “For instance twenty-five females, split equally into under- and over-thirties will give you only twelve or thirteen people in each group.”
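How can you split 25 students equally on the basis of age, if the age threshold is defined in advance? How many of them fall on each side of the threshold depends on the ages of the students actually drawn, not on the researcher’s choice. It’s plainly wrong.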
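To make the point concrete, here is a minimal Python sketch, assuming simple random sampling without replacement of 25 people from the book’s list of 75 females. The number of under-thirties drawn then follows a hypergeometric distribution with mean 25·K/75, where K is the number of under-thirties in the list. The values of K below are hypothetical: with 36 under-thirties the expected count is 12, with 60 it is 20, and in neither case is a 12/13 split guaranteed.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Illustrative helper (hypothetical name): probability of exactly k
    under-thirties among n people drawn without replacement from a list
    of N people, K of whom are under thirty."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, n = 75, 25  # list of 75 females, 25 drawn, as in the book's example
for K in (36, 60):  # hypothetical numbers of under-thirties in the list
    expected = n * K / N
    p_12_or_13 = hypergeom_pmf(12, N, K, n) + hypergeom_pmf(13, N, K, n)
    print(f"K={K}: expected under-thirties {expected}, P(12 or 13) = {p_12_or_13:.3f}")
```

The split is a random outcome of the draw; it is never “equal” by construction.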
And they sink deeper and deeper into nonsense: “If you want to look at four age categories, you will get only six or so people in each age and gender subgroup.”
You might think that adding a couple of “on average” in the crucial places would make everything right. But that is not the case, because their explanation would still imply that the age distribution of students is flat, which in general it is not.
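A minimal sketch of the expected subgroup sizes, assuming hypothetical age shares of 1/2, 1/4, 1/8 and 1/8 among the 25 sampled females (these figures are invented for illustration, not taken from the book). The expected count in each band is 25 times that band’s share; the book’s “six or so” everywhere only comes out if all four bands are equally represented.

```python
def expected_subgroup_sizes(stratum_sample, age_shares):
    """Illustrative helper (hypothetical name): expected number of sampled
    people per age band, given each band's share of the stratum."""
    if abs(sum(age_shares.values()) - 1) > 1e-9:
        raise ValueError("age shares must sum to 1")
    return {band: stratum_sample * share for band, share in age_shares.items()}

# Hypothetical, skewed age shares among female students (illustration only).
skewed = {"<25": 0.5, "25-34": 0.25, "35-44": 0.125, "45+": 0.125}
# The flat distribution implicitly assumed by the "six or so" figure.
flat = {band: 0.25 for band in skewed}

print(expected_subgroup_sizes(25, skewed))
# {'<25': 12.5, '25-34': 6.25, '35-44': 3.125, '45+': 3.125}
print(expected_subgroup_sizes(25, flat))
# {'<25': 6.25, '25-34': 6.25, '35-44': 6.25, '45+': 6.25}
```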
It’s sad to see such mistakes in
academic textbooks...