Generate a Sampling Distribution



The concept of a sampling distribution is important in statistics. The idea is that you have defined some statistic for a random sample. Now you sample again and again, keeping the sample size the same and keeping track of this statistic for each sample. The values of the statistic vary, of course, and fall into a distribution. A histogram of these values represents the sampling distribution of the statistic. (More precisely, the histogram approximates the sampling distribution, which is the distribution you would obtain from all possible samples.)

 

_img457

Some statistics, especially those commonly used in statistical practice, are known to have sampling distributions that can be approximated, under certain conditions, by a normal or other probability distribution. For many statistics, however, the only way to come up with the sampling distribution is to use a computer to actually carry out the repeated sampling procedure.
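
As a rough illustration of the first case, the sample mean is one statistic whose sampling distribution has a well-known approximation: for samples of size n drawn with replacement, its standard deviation is close to the population standard deviation divided by the square root of n. The Python sketch below, which sits outside the program described here, compares that formula with a simulated result. The population is made-up data (500 uniform random values), not the census ages, and the names `population`, `theoretical_se`, and `simulated_se` are illustrative only.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 500 values (illustrative, not census data)
population = [random.uniform(0, 90) for _ in range(500)]
sigma = statistics.pstdev(population)  # population standard deviation
n = 10                                 # sample size

# Theoretical standard error of the mean for sampling with replacement
theoretical_se = sigma / math.sqrt(n)

# Simulated sampling distribution of the mean: 5000 repeated samples
means = [statistics.fmean(random.choices(population, k=n)) for _ in range(5000)]
simulated_se = statistics.stdev(means)

print(f"theoretical SE: {theoretical_se:.2f}")
print(f"simulated SE:   {simulated_se:.2f}")
```

The two printed values should be close. For a statistic such as the median, no equally simple formula is available, which is why the steps that follow rely on repeated sampling.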

In this example, we create the sampling distribution of the median age of people from a certain collection of individuals from the 2000 census. We consider the collection of 500 ages to be the population. The distribution of ages in the population is shown here. We’ll draw simple random samples of 10 ages and compute the median age for each sample.

1. Start with a collection. (You don’t have to start with census microdata, but if you want to, see Import U.S. Census Microdata from IPUMS for instructions.)
2. With the population collection selected, choose Collection | Sample Cases.

This creates a new collection with 10 cases chosen randomly with replacement. If you want to change the number of cases per sample, change the sampling controls in the Sample panel of the sample collection’s inspector.

_img458

3. Double-click the sample collection to show its inspector, then choose the Measures tab.
4. Define a new measure, as shown, that computes the median age of the people in the sample. (You can define as many measures as you want.)

_img459

 

5. With the sample collection selected, choose Collection | Collect Measures.

You now have a measures collection containing five cases, each of which has one value for the median age of people in a sample. (If you defined more than one measure, each measure becomes an attribute in the measures collection.)

6. Double-click the measures collection to show its inspector.
7. If necessary, go to the Collect Measures panel.
8. Change the number of measures collected from the default of 5 to 95.

_img460

9. Click Collect More Measures.

The measures collection now contains 100 values of the median age: the original 5 plus the 95 you just collected.

You can graph the data in a measures collection, just as you do with any collection. (Go to the Cases panel, and drop the attribute on a graph.)

A histogram of these values is an approximation of the sampling distribution for the statistic.
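
For readers who want to see the same procedure outside the program, the steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not how the software works internally: the population is 500 made-up ages rather than the census data, and the function names `sample_cases` and `collect_measures` are hypothetical, chosen only to echo the menu commands.

```python
import random
import statistics
from collections import Counter

random.seed(2000)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the population of 500 ages (not the census data)
population = [random.randint(0, 90) for _ in range(500)]

def sample_cases(collection, n=10):
    """Draw n cases at random with replacement."""
    return random.choices(collection, k=n)

def collect_measures(collection, how_many, n=10):
    """Draw how_many fresh samples and record the median age of each."""
    return [statistics.median(sample_cases(collection, n)) for _ in range(how_many)]

measures = collect_measures(population, 5)     # the first 5 measures
measures += collect_measures(population, 95)   # collect 95 more
print(len(measures))                           # 100 medians in total

# Crude text histogram of the medians, using 10-year bins
bins = Counter(int(m // 10) * 10 for m in measures)
for start in sorted(bins):
    print(f"{start:2d}-{start + 9:2d} {'*' * bins[start]}")
```

Each row of asterisks counts the sample medians that fall in one 10-year bin, so the printout plays the role of the histogram. Raising the number of measures collected gives a smoother approximation of the sampling distribution.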