Importing US Census Microdata
Since 1790, the U.S. Census Bureau has conducted a thorough survey of the American population once every ten years. The first censuses were primarily concerned with the number of people, so the federal government could make decisions about representation and taxation. Today, the U.S. Census Bureau collects a variety of data, including age, sex, race, national origin, marital status, and education.

The most detailed information published by the U.S. Census Bureau is called microdata, or data about individuals. Using Fathom, you can import samples of census microdata from 1850 to 2000. The attributes you get depend on the questions asked in a particular year. You can use these data to explore many characteristics of the American people. This tutorial focuses first on racial diversity, then on school attendance.

Note: For this tutorial to work, your computer must be connected to the Internet. If you’ve done the normal installation of Fathom, you will have the files you need in place. (If not, make sure the Fathom application folder contains the Helpers folder, which contains the ImportSpecs folder, which contains the file IPUMS_USA_InterfaceSpec.xml.)
Getting to Know the Data

1. In a new Fathom document, choose File | Import | U.S. Census Data.
4. Click Year and Location, and read the attributes.
5. Move your cursor over Urban or Rural. The status bar at the lower left of the Fathom window describes the attribute and lists the years for which it is available.

Skim through the list of attributes, clicking on headings of interest in the left pane, and reading about the attributes in the right pane. For now, we’ll keep the default request but add one more attribute.

6. In Education, check School Attendance.
7. Click Download Data.

Fathom connects to the Internet and submits your request to IPUMS (Integrated Public Use Microdata Series, at the University of Minnesota), which has a searchable database of census microdata samples. Fathom decodes and imports the results into a collection. (If left coded, all data would be in the form of numbers, rather than, for example, “male” and “female.”)
To compare racial diversity, we’d like to quantify it. We’ll do this in two ways. First, we’ll look at the proportion of the majority race (the smaller that proportion is, the more racially diverse an area is), and then we’ll look at how many different racial groups live in an area. We’ll use a summary table.

10. Drag a summary table from the shelf.
Right now, we’re investigating the majority race, so we’re looking at the proportion of whites. We can make a new summary table to calculate only this proportion.
The collection has been connected to the summary table, so it “knows” about the collection and its attributes. We can edit the formula to calculate only the proportion of cases whose race is white.

15. Double-click the formula to show the formula editor, and delete the existing formula. We could type the formula we want, but, instead, we’ll use the attribute and function list in the formula editor itself.
Now we want Fathom to calculate how many distinct racial types are in this sample.

20. Choose Summary | Add Formula (the menu won’t appear unless the summary table is selected).
Your numbers may be a bit different from those shown here. When IPUMS has more cases available than we are asking for, we get a simple random sample. Try downloading data several times to see how much the numbers change.

We now have a rough idea of the racial diversity for the United States as a whole. We want to look at how the diversity varies around the country. When we ask for different data, our data will be replaced, so we need a record of the values we got for the country as a whole. We can make a picture of this table and keep it for future reference.

22. Make the summary table a good size: as small as possible but still showing all the information.
23. Select the summary table and choose Edit | Copy As Picture.
24. Click in a blank place in the document to deselect the table.
25. Choose Edit | Paste Picture.

This isn’t a live summary table and won’t change when we change the data. (You might also want to have a picture of the graph of race.)
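Outside Fathom, the two diversity measures used in this section (the share of the largest group and the number of distinct groups) can be sketched in a few lines of Python. This is only an illustration: the population below is made up, not census data, and the function names are hypothetical rather than Fathom functions. Drawing several random samples also shows why the numbers shift a little from one download to the next.

```python
from collections import Counter
import random


def majority_proportion(races):
    """Share of cases that belong to the most common category."""
    counts = Counter(races)
    return max(counts.values()) / len(races)


def distinct_categories(races):
    """Number of different categories that appear at least once."""
    return len(set(races))


# Made-up population for illustration only (not census figures).
population = ["White"] * 750 + ["Black"] * 120 + ["Asian"] * 50 + ["Other"] * 80

random.seed(0)
for trial in range(3):
    # Each trial mimics one download: a simple random sample of 200 cases.
    sample = random.sample(population, 200)
    print(trial, round(majority_proportion(sample), 3), distinct_categories(sample))
```

A smaller majority proportion or a larger count of distinct categories both point toward a more diverse sample, which matches the reasoning used with the summary table.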
Changing the Cases Requested

Now we will change the request from its default of the whole country to a single state.
School Attendance

You could continue the exploration of racial diversity, looking at different states and metropolitan areas to find the most and least racially diverse places in the United States. But let’s move on and look at some of the other attributes.
31. Go to the Microdata panel of the inspector.
32. Click the Years heading in the Choosing Cases list.
33. Check the boxes for 1850, 1900, 1940, 1970, and 2000, and submit the request.
34. When the data come in, graph Census_year.

Notice that you don’t get a dot plot; you get a bar chart instead. Fathom is treating this attribute as categorical. (To learn more about why, see Fathom Help: Attributes with Category Sets.) It would be nice to see the years in chronological order.
39. Drop School_attendance in the middle of the ribbon chart of year.

You now have a time series display. The vertical bands are the census years, and the legend patterns show, for each census year, the proportions of the population that are in school or not in school.
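The ribbon chart is, in effect, a picture of a table of proportions: for each census year, the share of cases in each school-attendance category. A minimal Python sketch of that table follows. The cases and the category labels "Yes" and "No" are invented for illustration, and proportions_by_group is a hypothetical helper, not part of Fathom or IPUMS.

```python
from collections import Counter, defaultdict


def proportions_by_group(cases, group_key, value_key):
    """For each group, return the share of cases falling in each value."""
    counts = defaultdict(Counter)
    for case in cases:
        counts[case[group_key]][case[value_key]] += 1
    return {
        group: {value: n / sum(c.values()) for value, n in c.items()}
        for group, c in sorted(counts.items())  # groups in chronological order
    }


# Invented cases for illustration only (not census data).
cases = [
    {"Census_year": 2000, "School_attendance": "Yes"},
    {"Census_year": 2000, "School_attendance": "No"},
    {"Census_year": 1850, "School_attendance": "Yes"},
    {"Census_year": 1850, "School_attendance": "No"},
    {"Census_year": 1850, "School_attendance": "No"},
    {"Census_year": 1850, "School_attendance": "No"},
]

table = proportions_by_group(cases, "Census_year", "School_attendance")
for year, shares in table.items():
    print(year, shares)
```

Sorting the groups puts the years in chronological order, which is the same ordering the tutorial wants for the horizontal axis.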
Adult Education

Our histogram showed something about the age range of those in school. We can use filtering to look more closely at the schooling of children or of adults. First let’s look at just the children.

42. Select the ribbon chart and choose Object | Add Filter. The formula editor appears. The formula entered for the filter tells Fathom what cases to keep in
44. Select the collection and choose Object | Add Filter.
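A filter can be thought of as a true/false test applied to every case, keeping only the cases for which the test is true. The Python sketch below illustrates that idea only. The attribute name Age, the cutoff of 18, and the sample cases are all assumptions made for the example; they are not the filter formula used in this tutorial.

```python
# Invented cases for illustration only; "Age" is an assumed attribute name.
cases = [
    {"Age": 7, "School_attendance": "Yes"},
    {"Age": 15, "School_attendance": "No"},
    {"Age": 34, "School_attendance": "No"},
    {"Age": 52, "School_attendance": "Yes"},
]


def is_child(case, cutoff=18):
    """Filter test: True means the case is kept. The cutoff is hypothetical."""
    return case["Age"] < cutoff


# Keeping the cases that pass the test gives the children;
# keeping the cases that fail it gives the adults.
children = [case for case in cases if is_child(case)]
adults = [case for case in cases if not is_child(case)]

print(len(children), "children;", len(adults), "adults")
```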
Going Further

• Try downloading data for California in the years 1850 and 2000. Make a ribbon chart with Census_year on the horizontal axis and Sex in the middle. What do you see? What’s going on here? (Hint: Who came to California before 1850? Note: The 1850 census did not count Native Americans.) Verify your hypothesis by getting more data, such as Occupation. How long did it take for the sexes to even out? (Find out by getting some years in between.)
• Explore the idea that people now move around more than did people in the past. (Make an attribute with a formula that compares people’s current state with their birthplace, such as if(Birthplace_General = State_FIPS_code), “Same State”, “Moved”.)
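The comparison in the last suggestion can be sketched in Python as follows. This is an illustration, not Fathom syntax: it assumes the two attributes hold directly comparable values (for example, both decoded to state names), and the sample cases are invented.

```python
def mobility(case):
    """Label a case by whether the birthplace matches the current state.

    Assumes both attributes are coded the same way, so equality is meaningful.
    """
    if case["Birthplace_General"] == case["State_FIPS_code"]:
        return "Same State"
    return "Moved"


# Invented cases for illustration only (not census data).
cases = [
    {"Birthplace_General": "California", "State_FIPS_code": "California"},
    {"Birthplace_General": "Ohio", "State_FIPS_code": "California"},
    {"Birthplace_General": "Ireland", "State_FIPS_code": "California"},
]

labels = [mobility(case) for case in cases]
moved_share = labels.count("Moved") / len(labels)
print(labels, round(moved_share, 2))
```

Computing the share labeled "Moved" separately for each census year would give one way to compare mobility across time.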