Simulation—Polling Voters
In this tutorial, we simulate a population of voters, a certain proportion of whom will vote in favor of a particular proposition. We investigate the question of how accurately a random sample of voters can predict the outcome of an election.

The city of Freeport has a rent control initiative, Proposition A, on the ballot. The local newspaper is going to conduct a poll three weeks before the election to gauge public sentiment. Staff members need to know how big a sample to poll. Our job is to set up a simulation they can use to determine, for any given sample size, the accuracy they should expect.
Modeling the Population of Voters

We start by creating a model for the population: the people who will vote in the election for or against the proposition. The model will consist of a single number, the proportion of voters who will vote yes.
A proportion can only lie between 0 and 1, so we should adjust the slider scale. We could do this by dragging on the scale until it’s close enough, but there’s another way to control axes.
5. Make a new collection named Sample of Voters.
6. Add 100 cases to the collection (choose Collection | New Cases).
7. Double-click the collection. The inspector now shows properties of the collection. There is only one inspector window; you change what it inspects by double-clicking the desired object.
8. In the inspector, create an attribute called vote by clicking <new> and typing the attribute name.

We want the values for this attribute to be “yes” and “no.” The values will be drawn from an infinite population whose proportion of yeses is set by the slider.
11. Close the formula editor.

For each case, Fathom will generate a random number between 0 and 1 and test it against the slider’s value. If the number is less than the slider’s value, Fathom will give the case the value “yes”; otherwise, Fathom will give the case the value “no.” (The function random( ) has a minimum of 0 and a maximum of 1, unless you specify otherwise.)

It’s always good to check your simulations.

12. Make a case table for the collection, and check that you have a roughly even mix of yeses and noes.
13. Delete the case table.
14. Graph the vote attribute. You should now have the three objects shown at right (and the inspector).
15. Choose Collection | Rerandomize several times. Each time you rerandomize, the bars in the graph change, reflecting the results of a new sample.
16. Drag the slider’s thumb to change its value to somewhere around 0.80. The vote is no longer close; it is a slam dunk for the proposition.
17. Move the slider’s thumb back to somewhere near 0.50 to model a close election.
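For readers who want to see the same idea outside Fathom, here is a minimal Python sketch of one simulated sample. It is not Fathom's formula language; the names sample_votes and p_yes are our own, with p_yes standing in for the slider's value.

```python
import random

def sample_votes(p_yes, n, rng):
    """Return a list of n simulated votes, each "yes" or "no"."""
    # rng.random() returns a float in [0.0, 1.0); a draw below p_yes counts as "yes".
    return ["yes" if rng.random() < p_yes else "no" for _ in range(n)]

rng = random.Random(1)  # fixed seed so the run can be repeated exactly
votes = sample_votes(0.5, 100, rng)  # p_yes plays the role of the slider
print(votes.count("yes"), "yes,", votes.count("no"), "no")
```

Calling sample_votes again with the same rng corresponds roughly to choosing Collection | Rerandomize, and changing 0.5 to 0.8 corresponds to dragging the slider.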
Simulating Repeated Surveys

We now have a population whose “true vote” is controlled by a slider, and we want to see how accurately a sample of 100 people predicts election results (we’ll compare other sample sizes later). We need to run the simulation many times to see how well the sampling does in the long run. We could simply rerandomize many times, recording the proportion of yeses for each run (our sample statistic of interest), or we can have Fathom do this grunt work for us. In Fathom, this is called collecting measures. First, we need to define measures to collect.

18. In the collection’s inspector, go to the Measures panel by clicking its tab. This looks much like the Cases panel, in that there’s a prompt for creating and naming measures and a Formula column for defining how each measure is computed.

The interface for working with measures is similar to that of working with attributes, but measures themselves are different. Whereas an attribute has a distinct value for each case, a measure has one value for the collection as a whole.
Now that we have defined a measure, we want to collect many values of it in a new collection (to sample our population repeatedly). Although there’s a command that will do this, let’s use the drag-and-drop method.

22. Make a new, empty collection, putting it to the right of the existing objects.
Let’s look at this collection.
25. Graph these data. We need to do more surveys. The controls for the measures collection are—where else?—in its inspector.
You might want to make a summary table of the collection of samples, showing the proportion of yeses, and run the simulation a few more times. You could change the slider’s value, change the collect measures control to replace existing cases, and see the results of repeated polling when the race isn’t close. But we were investigating sample size, so, when you’re finished experimenting, let’s return to that.
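The idea of collecting measures can also be sketched in Python: draw a fresh sample, compute one number for it (the proportion of yeses), and repeat. This is only an illustration under our own assumptions, not how Fathom works internally; proportion_yes and collect_measures are hypothetical helper names.

```python
import random

def proportion_yes(p_yes, n, rng):
    """Poll n simulated voters and return the proportion who vote yes."""
    yes = sum(rng.random() < p_yes for _ in range(n))  # True counts as 1
    return yes / n

def collect_measures(p_yes, n, num_surveys, rng):
    """Repeat the poll num_surveys times, keeping one proportion per poll."""
    return [proportion_yes(p_yes, n, rng) for _ in range(num_surveys)]

rng = random.Random(2)  # fixed seed so the run can be repeated exactly
measures = collect_measures(0.5, 100, 50, rng)
print("lowest:", min(measures), "highest:", max(measures))
```

Each entry in measures corresponds to one case in the measures collection. Rerunning with a different first argument shows what repeated polling looks like when the race isn’t close.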
Changing Sample Size

The trouble is, we aren’t collecting the sample size. How do we do that? We need another measure, which we need to define for the Sample of Voters collection.

30. Double-click the collection of voters.
31. In the Measures panel, define a second measure, SampleSize, giving it the formula count(). In Fathom, the count function without any arguments returns the number of cases in the collection.

We need to start the simulation over, because we’re now collecting the sample sizes.

32. Double-click the measures collection, and check Replace existing cases in the Collect Measures panel.
33. Collect 50 measures. Notice that the case table first empties, then fills, with two attributes this time, one for each of the measures defined.
34. Uncheck Replace existing cases. You want to keep these measures after you change the sample size, not throw them away.
35. Select the collection of sample voters, and add 300 new cases (choose Collection | New Cases).
37. Click this button to collect 50 measures of samples of 400 each.

We need to add this information to our measures graph.

38. Drag SampleSize from the measures collection’s case table and drop it on the vertical axis of the measures graph. You get a scatter plot, which isn’t what you want. You need SampleSize to be treated as a categorical attribute, not a numeric one.
We now have a tool to use with the newspaper staff to show them the effects of sample size on polling results. For example, the results from smaller samples have more spread than those from larger samples.
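The comparison of sample sizes can be sketched in Python as well. This sketch assumes a true proportion of 0.5 and the two sample sizes used above (100 and 400); collect_measures is a hypothetical helper of ours. For reference it also prints the standard deviation that probability theory predicts for a sample proportion, sqrt(p(1 - p)/n), which is a standard result and not part of the Fathom tutorial.

```python
import math
import random
import statistics

def collect_measures(p_yes, n, num_surveys, rng):
    """Return (SampleSize, proportion of yeses) pairs for repeated polls."""
    results = []
    for _ in range(num_surveys):
        yes = sum(rng.random() < p_yes for _ in range(n))
        results.append((n, yes / n))
    return results

rng = random.Random(3)  # fixed seed so the run can be repeated exactly
p_yes = 0.5             # assumed true proportion (the slider's value)

measures = []
for n in (100, 400):
    # Extend rather than replace, like leaving "Replace existing cases" unchecked.
    measures.extend(collect_measures(p_yes, n, 50, rng))

spread = {}
for n in (100, 400):
    props = [prop for size, prop in measures if size == n]
    spread[n] = statistics.stdev(props)  # observed spread of the 50 polls
    expected = math.sqrt(p_yes * (1 - p_yes) / n)
    print(f"n={n}: observed SD {spread[n]:.3f}, theoretical SD {expected:.3f}")
```

Grouping the proportions by sample size before summarizing them plays the same role as treating SampleSize as a categorical attribute in the graph.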