Goodness of Fit (Chi-Square) Test from Raw Data


When your data include a categorical attribute, you can use a goodness of fit test to determine whether the distribution of values among the categories conforms to what was expected. The most common situation is that all categories are expected to be equally likely.

1. Create a new test by dragging one from the shelf or by choosing Object | New | Hypothesis Test.
2. From the test’s pop-up menu, choose Goodness of Fit.
3. Drop a categorical attribute on the prompt at the top of the object.

As an example, suppose you have rolled a particular die 100 times, with the results shown in this table. Just by inspection, you suspect that the die is loaded in favor of 6. But couldn’t this happen by chance?

The result shows that, yes, it could happen by chance, but only about two times out of a thousand; therefore, you are justified in claiming that the die is loaded.
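For readers who want to see the arithmetic behind this kind of result, here is a minimal sketch in Python of an equal-likelihood test computed from raw values. It is not how the test object itself is implemented. The function names and the roll counts are invented for illustration (they are not the counts in the table above), so the p-value differs from the one quoted.

```python
import math
from collections import Counter


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from the closed form for df = 2 (even) or df = 1 (odd) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # ... then step up two degrees of freedom at a time.
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def equal_likelihood_test(values, categories):
    """Chi-square goodness of fit against 'all categories equally likely'."""
    counts = Counter(values)
    n = sum(counts[c] for c in categories)
    expected = n / len(categories)
    statistic = sum((counts[c] - expected) ** 2 / expected for c in categories)
    df = len(categories) - 1
    return statistic, df, chi_square_sf(statistic, df)


# Hypothetical raw data: 100 rolls with an excess of sixes (made-up counts).
rolls = [1] * 14 + [2] * 15 + [3] * 13 + [4] * 14 + [5] * 12 + [6] * 32
statistic, df, p_value = equal_likelihood_test(rolls, categories=range(1, 7))
print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_value:.4f}")
```

A small p-value here plays the same role as in the die example: it says how often counts at least this uneven would arise from a fair die.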

Not every situation tests for equal probability in each category. For example, suppose you are told what the proportions of the different colors of M&M candies are supposed to be. You want to see, given a random sample of 120, whether those proportions are reasonable.

_img162

 

 

After making the test and dropping the attribute in the top pane of the test, click “are not equally likely” to bring up a pop-up menu. From that menu, choose “have probabilities given above.” This adds a second column to the table that initially holds equal probabilities for all categories. Edit these values to match the probabilities you are testing against.

In the example at right, the p-value of 0.2 tells us that it would not be unusual to get a set of counts this different (or more different) when sampling from a population with the given proportions for each color.

_img502
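One way to see what such a p-value means is to simulate the sampling directly. The sketch below, in Python, draws many samples from a population with the stated proportions and reports how often the simulated counts differ from expectation at least as much as the observed counts do. The color proportions, the observed counts, and the function names are all hypothetical; they are not the values used in the example pictured above.

```python
import random


def chi_square_statistic(observed, probabilities):
    """Sum of (observed - expected)^2 / expected over all categories."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probabilities))


def simulated_p_value(observed, probabilities, trials=5000, seed=1):
    """Fraction of simulated samples at least as far from expectation as the data."""
    rng = random.Random(seed)
    n = sum(observed)
    categories = list(range(len(probabilities)))
    actual = chi_square_statistic(observed, probabilities)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from the hypothesized population.
        counts = [0] * len(probabilities)
        for c in rng.choices(categories, weights=probabilities, k=n):
            counts[c] += 1
        if chi_square_statistic(counts, probabilities) >= actual:
            hits += 1
    return hits / trials


# Made-up proportions for six colors and made-up counts for a sample of 120.
probabilities = [0.30, 0.20, 0.20, 0.10, 0.10, 0.10]
observed = [30, 28, 20, 15, 16, 11]

statistic = chi_square_statistic(observed, probabilities)
p_value = simulated_p_value(observed, probabilities)
print(f"chi-square = {statistic:.2f}, simulated p = {p_value:.3f}")
```

The simulated fraction only approximates the p-value from the chi-square distribution, but it makes the interpretation concrete: a large fraction means counts this different are not unusual under the given proportions.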

Choose Test | Show Test Statistic Distribution to bring up a graph of the chi-square distribution. The shaded area corresponds to the probability of getting a chi-square value at least as great as the one observed if the null hypothesis were true.

This plot corresponds to the M&M situation discussed above. The shaded area is 20% of the total area under the curve, which matches the p-value of 0.2.

_img503
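The shaded area can be checked numerically by integrating the chi-square density from the observed value out into the right tail. The Python sketch below does this with a simple trapezoid rule. It assumes six color categories (5 degrees of freedom) and an observed statistic of about 7.29; neither number is stated in the text, and both were chosen only because they give a tail area near 20%. The function names, the cutoff of 200, and the step count are likewise illustrative choices.

```python
import math


def chi_square_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 0.0
    half = df / 2.0
    log_density = (half - 1.0) * math.log(x) - x / 2.0 \
        - half * math.log(2.0) - math.lgamma(half)
    return math.exp(log_density)


def upper_tail_area(observed, df, upper=200.0, steps=100_000):
    """Trapezoid-rule area under the density from `observed` to `upper`.

    Assumes 0 < observed < upper; the area beyond `upper` is negligible
    for small df.
    """
    width = (upper - observed) / steps
    total = 0.5 * (chi_square_pdf(observed, df) + chi_square_pdf(upper, df))
    for i in range(1, steps):
        total += chi_square_pdf(observed + i * width, df)
    return total * width


# Assumed values: 6 categories give df = 5; 7.289 is an illustrative statistic.
area = upper_tail_area(7.289, df=5)
print(f"shaded area = {area:.3f}")
```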