# Sampling Primer

For a marketing research study to be accurate and valuable, the information gathered needs to be representative of the whole population. The population is the entire group that the researcher is interested in learning more about. Surveying the whole population would be a census, and that would be ideal. In most cases, however, a survey of the whole population is too large and expensive to field and analyze, so inferential statistics is needed. Inferential statistics involves using a sample, or subset, of the population to gather information that is representative of that population and can be used to estimate its characteristics.

The first step in sampling is defining your sample. The sample unit is the basic level of the population that the researcher wants to measure. For example, the sample unit for a research study on customers' satisfaction with their Internet service would be the person who purchased that Internet service. Another example of a sample unit is an employee of Company X, if Company X is doing a study on its employees' satisfaction.

Once the sample is correctly defined, the researcher needs to obtain a sample frame. A sample frame is a complete list of the population from which the sample is selected. Many times in marketing research, this complete list is not accessible, which leads to sample frame error. Sample frame error occurs when the sample frame doesn't contain the entire population, or when it contains people who are not in the population. This is often a result of not having the most up-to-date list of the population. Researchers use the term incidence rate to refer to the percentage of people in the sample frame who are actual members of the defined population. The higher the incidence rate, the lower the chance that sample frame error has occurred.

The next step is determining the sample size needed. The sample size directly affects how accurate the findings are. The larger the sample size, the more accurate the findings, but the more expensive the study. So, how do we determine sample size? The most accurate method is the confidence interval method, because it uses the statistical concepts of variability, sample error and confidence intervals. Variability is the dissimilarity of the respondents' answers to a question. To allow for the most variability, researchers usually set this to 50% / 50%. The level of sample error that researchers most commonly accept is ±5%, usually paired with a 95% level of confidence. That means that if the study were repeated many times, about 95% of the samples would produce findings within 5 percentage points of the true population value. Other methods used are an arbitrary size (e.g. a percentage of the population), a conventional size (a size believed to be right), a size based on the statistical techniques used for analysis, and a size based on the cost of the research (available budget).
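The confidence interval method is commonly written as n = z² × p × q / e², where p and q are the variability split, e is the acceptable sample error, and z is the standard score for the chosen level of confidence (1.96 for 95%). The sketch below is a minimal illustration of that standard formula; the function name `sample_size` and its defaults are assumptions chosen for this example, and it ignores any correction for small populations.

```python
import math


def sample_size(p=0.5, error=0.05, z=1.96):
    """Confidence interval method: n = z^2 * p * q / e^2, rounded up.

    p     -- assumed variability (0.5 is the 50% / 50% worst case)
    error -- acceptable sample error as a proportion (0.05 = +/-5%)
    z     -- standard score for the confidence level (1.96 for 95%)
    """
    q = 1 - p
    return math.ceil(z ** 2 * p * q / error ** 2)


# 50% / 50% variability, +/-5% sample error, 95% level of confidence
n_default = sample_size()

# Tightening the sample error to +/-3% raises the required sample size
n_tighter = sample_size(error=0.03)

print(n_default, n_tighter)
```

With the defaults this works out to 385 respondents, and tightening the error to ±3% raises it to 1,068, which shows how quickly cost grows as the accepted error shrinks.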

Next is determining what method to use in pulling the sample. There are two different sample designs to choose from: probability and nonprobability. Probability samples refer to the methods that ensure that the probability of a member of the population being chosen can be calculated. Nonprobability methods are more subjective, and the probability cannot be calculated.

## There are four different probability methods:

1. Simple Random Sampling: a random selection procedure to ensure that each respondent has the same chance of being selected into the sample (e.g. random digit dialing, plus-one dialing, or random number selection from the sample frame database).

2. Systematic Sampling: a random starting point in the sample frame is chosen, and then a constant skip interval is used to select each respondent for the sample. This is more efficient than simple random sampling. The formula often used to compute the skip interval is population size / sample size.

3. Cluster Sampling: the population is divided into clusters, or subgroups, that are very similar to each other, and then one of two approaches is used: perform a census on one or a few of the clusters, or randomly select more clusters and take a sample from each of them. Area sampling, in which the clusters are geographic locations, is an often-used form of cluster sampling. There is a danger of error in cluster sampling if the clusters are not actually homogeneous.

4. Stratified Sampling: identify strata, or subpopulations (e.g. by income or gender), perform simple random sampling within each stratum, and then apply weights to estimate the findings for the population. Stratified sampling is best used if a population is not normally distributed or has a skewed distribution.
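Three of the probability methods above can be sketched in a few lines using Python's standard `random` module. This is an illustration under stated assumptions, not a production sampler: the frame of 1,000 customer IDs, the "basic" and "premium" strata, and the function names are all hypothetical, and the stratified version assumes proportional allocation (each stratum sampled in proportion to its share of the frame), which is only one of several possible allocation schemes.

```python
import random


def simple_random_sample(frame, n, rng):
    """Every unit in the frame has the same chance of selection."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Random starting point, then a constant skip interval."""
    skip = len(frame) // n        # skip interval = population size / sample size
    start = rng.randrange(skip)   # random start within the first interval
    return [frame[start + i * skip] for i in range(n)]


def stratified_sample(frame, stratum_of, n, rng):
    """Simple random sample within each stratum (proportional allocation assumed)."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    result = {}
    for name, members in strata.items():
        k = max(1, round(n * len(members) / len(frame)))
        result[name] = rng.sample(members, min(k, len(members)))
    return result


# Hypothetical frame: 1,000 customer IDs, the first 700 on a "basic" plan.
frame = list(range(1000))
rng = random.Random(42)  # fixed seed so the sketch is repeatable

srs = simple_random_sample(frame, 50, rng)
sys_sample = systematic_sample(frame, 50, rng)
strat = stratified_sample(
    frame, lambda u: "basic" if u < 700 else "premium", 100, rng
)
```

In the systematic sample the skip interval is 1,000 / 50 = 20, so consecutive selections are always 20 positions apart. If the strata were sampled disproportionately instead, weights would be needed to bring the estimate back in line with the population.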

## There are also four different nonprobability methods:

1. Convenience Sampling: use a high-traffic area (e.g. a mall) to recruit participants. The error that occurs is that members of the population who do not visit the chosen area have no chance of being included in the sample.

2. Judgment Sampling: someone who is knowledgeable about the population uses judgment, or an educated guess, to decide who should be in the sample. This is highly subjective, so there is likely to be error. This method is often used for recruiting focus groups.

3. Referral Sampling: ask respondents to identify people like themselves to participate in the survey. This is most often used when there is a very small sample frame, and it is sometimes called "snowball sampling." The error that occurs is that members of the population who are not well known, who are disliked, or who hold different opinions from the person being asked are not included in the sample.

4. Quota Sampling: identify quota characteristics, such as demographics, and set a quota of respondents for each one to make up the sample. The quotas are defined by the research objectives.
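Quota sampling can be pictured as accepting respondents in the order they arrive until every quota is filled. The sketch below is one possible way to express that idea; the function name `fill_quotas`, the respondent tuples and the quota of two per gender are hypothetical values invented for illustration.

```python
def fill_quotas(respondents, quotas, cell_of):
    """Accept respondents in arrival order until each quota cell is full.

    respondents -- iterable of respondent records
    quotas      -- dict mapping a quota cell to the number of completes needed
    cell_of     -- function returning the quota cell for a respondent
    """
    remaining = dict(quotas)  # copy so the caller's quotas are not modified
    accepted = []
    for person in respondents:
        cell = cell_of(person)
        if remaining.get(cell, 0) > 0:
            accepted.append(person)
            remaining[cell] -= 1
        if not any(remaining.values()):
            break  # every quota has been met
    return accepted, remaining


# Hypothetical arrivals: (respondent id, gender)
arrivals = [("a", "F"), ("b", "M"), ("c", "F"), ("d", "F"), ("e", "M"), ("f", "M")]
accepted, remaining = fill_quotas(arrivals, {"F": 2, "M": 2}, lambda r: r[1])
```

Here respondent "d" is turned away because the quota for that group is already full, and respondent "f" is never reached. Because acceptance depends on who happens to arrive first, the probability of selection cannot be calculated, which is what makes this a nonprobability method.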

Online sampling methods are a bit different. One approach is random online intercept sampling, which relies on random selection of visitors to a website. A common method is invitation online sampling, in which each person in the sample is emailed an invitation to take a survey. The link to the survey is provided in the invitation, and sometimes a special key or password is provided as well. Another common method is online panel sampling, in which the sample is drawn from a panel of people who are willing to participate in surveys. Online panels are a fast and convenient way to obtain sample. There are also many ways to cut the panel or set its parameters, so panels are a very flexible source of sample. (Note: panels are often used in other research methodologies as well.)

The final step is the assessment of the sample. This can take many different forms. One is a sanity check of the sample plan or the sample process in its entirety. Another is sample validation, which is not always possible. Sample validation is the process of checking that the sample is truly representative of the population. This cannot be done if there is no prior knowledge of the population's demographic profile. If the assessment fails, the researcher can sometimes weight or otherwise adjust the sample so that it becomes representative of the desired population. When this is not possible, resampling is necessary. This requires adding more sample until it reaches an adequate level of validation. Once that level is reached, it is on to the field for the data collection process.
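Sample validation and weighting can be illustrated by comparing the sample's demographic mix with a known population profile. The sketch below is a minimal example of one common weighting approach (each group's weight is its population share divided by its sample share); the age groups, counts and population shares are made-up numbers, and the function names are assumptions for this illustration only.

```python
def validation_gaps(sample_counts, population_shares):
    """Sample share minus population share for each group."""
    total = sum(sample_counts.values())
    return {
        group: sample_counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


def group_weights(sample_counts, population_shares):
    """Weight = population share / sample share for each group."""
    total = sum(sample_counts.values())
    weights = {}
    for group, share in population_shares.items():
        count = sample_counts.get(group, 0)
        if count == 0:
            # Weighting cannot repair a group with no respondents;
            # this is the case where resampling is needed.
            raise ValueError(f"no respondents in group {group!r}")
        weights[group] = share / (count / total)
    return weights


# Hypothetical completes by age group, and a known population profile
sample_counts = {"18-34": 100, "35-54": 200, "55+": 100}
population_shares = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}

gaps = validation_gaps(sample_counts, population_shares)
weights = group_weights(sample_counts, population_shares)
weighted_total = sum(sample_counts[g] * weights[g] for g in sample_counts)
```

In this example the 35-54 group is over-represented (50% of the sample against 40% of the population), so its respondents receive a weight of about 0.8, while the other two groups receive about 1.2. The weighted total still equals the 400 original completes.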