Last Updated on: 29th August 2022, 08:16 am
Population and sample are important concepts to understand when doing research. While people sometimes use the two terms interchangeably, they each have specific meanings. If you understand the difference from the start, you will save yourself time, headaches, and maybe even snide comments. You’ll move forward faster and thus save money. All in all, it’s worth taking five minutes to understand the difference between population vs sample.
Population vs Sample
A population is the entire group that you intend to draw conclusions about. A sample is the specific group that you actually collect data from: a subgroup of the population.
Depending on your study’s design, populations may reflect a group of people, a set of organizations, a set of documents, archived data, etc.
For example, if you’re looking at high school teachers and how they interact with students, then your population is all high school teachers. If you’re limiting it by high school teachers in Minnesota, then your population is all high school teachers in Minnesota.
100 Minnesota teachers who fill out a survey or are interviewed for your study would be a sample of this population.
Here’s another way of looking at population vs sample: the population is the whole pie, and the sample is the piece you eat. (But please don’t eat your study participants without getting IRB approval.)
Why Use Samples, not Populations, in Research?
Looking at the difference between population vs samples, it’s easy to assume that studying the entire population would be preferable. The reason we use samples is to save time and resources, and in many cases, make a study feasible.
A researcher can’t reasonably ask all high school teachers in Minnesota to respond to a questionnaire or, for qualitative research, participate in an interview. So we try to get a sample that is representative of the population.
The Census
One instance of trying to collect data on an entire population is the US Census. While nobody involved in the process will say that we actually get the data from everybody, by law, we must try to do it. This is one of the rare examples of research conducted (or at least attempted) on an entire population.
The US Census is a massive undertaking, and the 2020 census was projected to cost $15.6 billion. Only an organization as powerful and well-funded as the US Government could take on such an ambitious project, and it only does so every 10 years. In fact, people involved in the census have asked to transition to using a sample (vs a population) approach, but that request, thus far, has been denied. There are concerns that it would be difficult to get a truly representative sample if we don’t know the actual composition of the entire population.
Using a sample costs a lot less money, time, and resources than trying to study an entire population. Political polls are samples. We can’t ask everybody who they’re going to vote for, but a polling company might ask 1,000 people.
Representative Samples
The key, of course, is getting a sample that is actually representative of the population. In elections, is the population eligible voters, or is it eligible voters most likely to actually vote in the upcoming election? Not making that distinction has caused pollsters to make famously incorrect predictions in the past.
Your sample must mimic the population as closely as possible, meaning that you will need to know the make-up of your population first and then create parameters for your sample that represent these characteristics. For the Minnesota teacher survey, you’d want to be sure that you don’t simply survey teachers from one school, or teachers of just one demographic group. Any answers from that survey would not represent the diversity of the population of all Minnesota teachers.
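Here is a minimal sketch of turning a known population make-up into sample quotas. The teacher counts and the "school setting" grouping are made up for illustration, and proportional allocation is just one reasonable way to set the parameters.

```python
from math import floor

def proportional_allocation(population_counts, sample_size):
    """Split sample_size across groups in proportion to their population share."""
    total = sum(population_counts.values())
    exact = {g: sample_size * c / total for g, c in population_counts.items()}
    quotas = {g: floor(x) for g, x in exact.items()}
    # Rounding down can leave a few slots unassigned; give them to the
    # groups with the largest fractional remainders.
    leftover = sample_size - sum(quotas.values())
    by_remainder = sorted(exact, key=lambda g: exact[g] - quotas[g], reverse=True)
    for g in by_remainder[:leftover]:
        quotas[g] += 1
    return quotas

# Hypothetical counts of Minnesota high school teachers by school setting.
teachers_by_setting = {"urban": 9000, "suburban": 12000, "rural": 6000}
quotas = proportional_allocation(teachers_by_setting, 100)
print(quotas)
```

With these made-up counts, a 100-person sample would aim for roughly a third urban, a bit under half suburban, and the rest rural teachers.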
Using Samples in Quantitative Research
If we’re doing quantitative work, we can try to extend, extrapolate, and generalize the results to the population, assuming our sample accurately represents the population as a whole.
Statistics are run on a sample in order to infer, or generalize, to the population as best you can. That’s why you see phrases like “with a 95% confidence level” or “a p-value of .05.” These figures describe how much uncertainty remains when you use a sample to estimate something about the population. They assume that the sample is like a mini-version of the population; they cannot tell you whether it actually is.
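To make the “95% confidence level” idea concrete, here is a rough sketch that computes a confidence interval for a sample mean. The scores are invented, and the calculation uses a normal approximation; for small samples a t-distribution is the more usual choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    n = len(values)
    m = mean(values)
    standard_error = stdev(values) / sqrt(n)
    # z is about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m - z * standard_error, m + z * standard_error

# Hypothetical job-satisfaction scores (1 to 10) from 30 sampled teachers.
scores = [6, 7, 5, 8, 7, 6, 9, 5, 7, 6, 8, 7, 6, 7, 5,
          8, 6, 7, 9, 6, 7, 8, 5, 6, 7, 7, 6, 8, 7, 6]
low, high = mean_confidence_interval(scores)
print(f"sample mean {mean(scores):.2f}, 95% CI {low:.2f} to {high:.2f}")
```

The interval is a statement about the population mean, and it only holds up if the sample was drawn in a way that represents the population.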
The trick with sample size is to get a big enough sample that your results can be generalized to the population. This is why many people run a power analysis, often with the free G*Power software, before selecting their sample. A power analysis tells you how big a sample you need in order to have a good chance of detecting an effect of a given size.
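As an illustration of the kind of sample size calculation described above, here is a simplified sketch for comparing the means of two groups. It uses a textbook normal-approximation formula, not the exact method dedicated software uses, so expect dedicated tools to return a slightly larger number. The effect size of 0.5 is an assumed value.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided,
    two-group comparison of means (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # cutoff for the significance level
    z_power = z.inv_cdf(power)          # cutoff for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Assumed medium effect size (Cohen's d = 0.5).
print(n_per_group(0.5))  # 63
```

Notice how quickly the requirement grows as the expected effect gets smaller: halving the effect size roughly quadruples the sample you need.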
Using Samples in Qualitative Research
In qualitative research you can’t generalize to the population, your sample size is much smaller, and you’re asking open-ended questions. Your sample may be around 20-25 respondents, or even fewer, depending on the type of qualitative study. This is because gathering extensive data from one person (or case, etc.) takes a lot of time.
I highly recommend the article titled “Are We There Yet? Data Saturation in Qualitative Research” by Patricia Fusch and Lawrence Ness (2015). It talks about sufficient sample sizes for qualitative research, that is, data saturation. From the Abstract:
“Failure to reach data saturation has an impact on the quality of the research conducted and hampers content validity. The aim of a study should include what determines when data saturation is achieved, for a small study will reach saturation more rapidly than a larger study. Data saturation is reached when there is enough information to replicate the study when the ability to obtain additional new information has been attained, and when further coding is no longer feasible.”
So, when doing qualitative research, you’re doing a deeper dive into the data on a small sample, rather than gathering information on a broader sample.
Sampling Error
No sample is going to perfectly describe the population. So you will be off a little bit, depending on how well you’ve collected your sample. Going back to political polls, they’ll say “a candidate is up by 6 points, with a margin of error of plus or minus three points.” That margin indicates how far the sample’s result could plausibly be from the true value in the population.
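The “plus or minus three points” in a poll comes from a formula. Here is a minimal sketch, assuming a simple random sample and the worst-case split of 50/50 (both are assumptions; real polls use more complicated designs).

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sample_size, proportion=0.5, confidence=0.95):
    """Margin of error for a proportion from a simple random sample."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(proportion * (1 - proportion) / sample_size)

moe = margin_of_error(1000)
print(f"{moe * 100:.1f} points")  # 3.1 points
```

That is why a poll of about 1,000 people is usually reported with a margin of error of around three points.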
Your sample can be biased, which will then make it different from your population. If you only sample people who have landline phones in a political poll, for example, that’s a different set of people than those who are actually going to vote.
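A quick sketch with invented numbers shows how much a biased sample can mislead. The group shares and support levels below are purely hypothetical.

```python
# Hypothetical electorate: each group's share of voters and its support
# for a candidate. These numbers are made up for illustration.
groups = {
    "landline": {"share": 0.40, "support": 0.60},
    "mobile_only": {"share": 0.60, "support": 0.45},
}

def overall_support(groups):
    """Support in the whole population, weighting each group by its share."""
    return sum(g["share"] * g["support"] for g in groups.values())

true_support = overall_support(groups)
landline_only_estimate = groups["landline"]["support"]
bias = landline_only_estimate - true_support

print(f"true {true_support:.2f}, landline-only {landline_only_estimate:.2f}, bias {bias:+.2f}")
```

In this made-up case, a landline-only poll would overstate the candidate’s support by about nine points, no matter how many landline owners you call.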
Types of Sampling
There are several different types of sampling. We’ll cover a few of the most common here.
Random Sampling
A random sample means it’s equally likely that any one person in the population will be part of the sample. Sometimes, when you’re looking to sample from a population randomly, you can use a random number generator. This is a computer program that essentially assigns a number to each person (or item) in the population and then randomly chooses numbers to include in the sample.
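The numbering-and-drawing process just described can be sketched in a few lines. The teacher list here is a stand-in for a real roster, and the function name is just for illustration.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Number every member of the population, then draw numbers at random
    without replacement so each member is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    numbered = dict(enumerate(population, start=1))
    chosen_numbers = rng.sample(sorted(numbered), sample_size)
    return [numbered[number] for number in chosen_numbers]

# Hypothetical roster of 5,000 teachers.
teachers = [f"teacher_{i:04d}" for i in range(1, 5001)]
sample = simple_random_sample(teachers, 100, seed=42)
print(sample[:5])
```

Recording the seed is a small habit worth keeping, since it lets you show later exactly how the sample was drawn.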
Random sampling is often considered the best way to sample. However, it’s not always possible. If your population is all teachers in the United States, you may not have access to those in Alaska or Hawaii, for example. In that case, you’ll either need to narrow your population or try a different sampling method.
Purposive Sampling
Expediency is the biggest reason to go with a purposive sample. In purposive sampling, you essentially say, “I’m not going to choose people at random, I’m going to include people I specifically choose.” A purposive sample can be easier to come up with than a random sample. However, for valid research, you always want to make sure that it’s as close a representation of the population as possible.
You can try to make it as random as possible within the purposive structure and not get people who have the same characteristics. If you know the demographic makeup of the population, you can also try to have a similar percentage of different groups represented in your sample.
Snowball Sampling
This can only be done in the winter. Just kidding. In snowball sampling, you get somebody to participate and ask them, “Do you know of anybody else who meets these criteria?” and you include those people in your sample. You then ask them the same question, and so on. There’s no guarantee that this will fully represent the population. However, this can be a way to include people in populations that are difficult to find through other means, such as unhoused people or people with drug addiction.
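The referral chain can be sketched as a walk through a network of who-knows-whom. The names and referrals below are invented, and the function is only an illustration of the idea, not a recruitment tool.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Start from the seed participants and follow referrals, wave by wave,
    until nobody new is named or the sample reaches max_size."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:
                seen.add(referred)
                queue.append(referred)
    return sample

# Hypothetical referrals: each person names others who meet the criteria.
referrals = {
    "Ana": ["Ben", "Cy"],
    "Ben": ["Dee"],
    "Cy": ["Dee", "Eli"],
    "Dee": [],
    "Eli": ["Ana"],
    "Flo": ["Gus"],  # nobody in Ana's network names Flo or Gus
}
sample = snowball_sample(referrals, ["Ana"], max_size=10)
print(sample)  # ['Ana', 'Ben', 'Cy', 'Dee', 'Eli']
```

Flo and Gus never make it into the sample because nobody reachable from the first participant knows them, which is exactly the representativeness risk with this method.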
In general, regardless of type, the more people you sample, the more likely it is that the sample will be representative of the population. A large sample reduces random sampling error, but it cannot make up for a biased sampling method.
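A small simulation can show the effect of sample size on random error. The population here is simulated, and the sample sizes and number of repeats are arbitrary choices for illustration.

```python
import random
from statistics import mean, pstdev

def spread_of_sample_means(population, sample_size, repeats, rng):
    """Draw many random samples of one size and measure how much
    their means vary from sample to sample."""
    means = [mean(rng.sample(population, sample_size)) for _ in range(repeats)]
    return pstdev(means)

rng = random.Random(0)  # fixed seed so the run is reproducible
# Simulated population of 10,000 scores centered near 50.
population = [rng.gauss(50, 10) for _ in range(10_000)]

small = spread_of_sample_means(population, 25, 200, rng)
large = spread_of_sample_means(population, 400, 200, rng)
print(f"spread with n=25: {small:.2f}, spread with n=400: {large:.2f}")
```

Larger random samples land closer to the population mean on average. This only covers random error, though; if the way you recruit is skewed, a bigger sample just gives you a more precise wrong answer.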
Get Help Thinking It Through
Thinking through all the ways your sample could not be representative of the population you intend to generalize to can prevent future problems. This might take some digging into personal assumptions and biases.
For example, while it’s fairly easy to send out a survey via SurveyMonkey, if your population includes people who don’t have access to computers, you’ll be missing that part of the population in your sample, thus skewing your results.
The best way to think this through is to ask people to help find the flaws in the study. Your friend might mention that he does not answer the phone on the Sabbath, so if you’re doing a telephone survey you may miss the segment of the population that observes this custom. A committee member may share that she once made the mistake of assuming everyone could read English, but she later realized that almost 20% of her population couldn’t, and she had no one in her sample to represent that segment.
Studies need to be thought through with others at every stage because it’s difficult for us to know what we don’t know. Critical reviewers can help us see when our sample might not be representative of the population; when a random sample won’t be as useful as snowball sampling; how qualitative data may be generalized; etc.
The next time you pick up the phone and agree to answer a survey, ask yourself what population the researchers might be sampling and how choosing you might be more or less representative of their intended population. You now know a lot more about whether or not their data will be reliable and valid.