May 12, 2009

Minorities, Immigrants and Homeownership

Appendix B: Data Sources and Methodology

This report uses data from a number of sources. Trends in homeownership are based on the analysis of Current Population Survey (CPS) data. The analysis of higher-priced loans utilizes data collected under the Home Mortgage Disclosure Act (HMDA). Foreclosure rates for the nation and U.S. counties were provided by RealtyTrac®. The statistical model that examines the relationship between foreclosure rates and the demographic and economic characteristics of counties combines data from RealtyTrac®, the American Community Survey (ACS), HMDA, the Bureau of Labor Statistics (BLS) and the Federal Housing Finance Agency (FHFA).

Homeownership

The CPS is a monthly survey of approximately 55,000 households conducted by the U.S. Census Bureau for the Bureau of Labor Statistics. The homeownership status of the householder is noted in the survey each month. However, the microdata files released for public use by the Census Bureau do not contain that information. The Census Bureau instead releases the homeownership data on its website a few months after the fact. The Pew Hispanic Center collected the monthly homeownership data from January 1995 to June 2008 and appended those to the CPS public use microdata files.

The study reports trends in homeownership on an annual basis. These are derived by combining the 12 monthly CPS files into a single annual file. The CPS sample design calls for a household to be interviewed for four consecutive months, dropped from the sample for the next eight months, and then interviewed for four more consecutive months. This means that there can be multiple records for the same household within any calendar year. To avoid duplicate records within an annual file, only the records of households in their fourth and eighth months in the sample were retained (in CPS terminology, the annual file consists of outgoing rotation months only).
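The rotation rule can be checked with a small simulation. The sketch below builds each household's interview schedule under the four-in, eight-out, four-in design and counts how many of its records fall in one calendar year, with and without the outgoing-rotation restriction. The month indexing and the enumerated entry months are illustrative assumptions, not CPS variables.

```python
# Sketch of the CPS 4-8-4 rotation pattern. Month indices are consecutive
# integers (0 = January of an arbitrary base year); entry months are
# hypothetical and enumerated only to cover every possible case.

def months_in_sample(entry_month: int) -> dict[int, int]:
    """Map calendar-month index -> month-in-sample (1-8) for one household."""
    schedule = {}
    for mis in range(1, 5):
        # First spell: four consecutive monthly interviews.
        schedule[entry_month + mis - 1] = mis
    for mis in range(5, 9):
        # Eight months out, so month-in-sample 5 falls 12 months after month 1.
        schedule[entry_month + 12 + (mis - 5)] = mis
    return schedule


def records_in_year(entry_month: int, year: int, outgoing_only: bool) -> list[tuple[int, int]]:
    """Interviews of one household that fall in a given calendar year."""
    first, last = 12 * year, 12 * year + 11
    return [
        (month, mis)
        for month, mis in months_in_sample(entry_month).items()
        if first <= month <= last and (not outgoing_only or mis in (4, 8))
    ]


if __name__ == "__main__":
    # Entry months 0-35 include every household that can touch calendar year 2.
    entries = range(0, 36)
    print("max records per household, all months:",
          max(len(records_in_year(e, 2, False)) for e in entries))
    print("max records per household, outgoing only:",
          max(len(records_in_year(e, 2, True)) for e in entries))
```

Run as a script, it reports at most four records per household per calendar year when all months are kept, and at most one when only months-in-sample 4 and 8 are kept, because those two interviews are exactly 12 months apart.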

The typical annual CPS file constructed in that manner consisted of about 160,000 households. There are two exceptions. First, the homeownership variable was not available for March 2001 and December 2003; therefore, the annual files for 2001 and 2003 are 11-month files consisting of about 150,000 households each. Second, the estimates for 2008 are based on a six-month file, from January through June, of about 81,000 households.
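As a rough consistency check on these file sizes, the sketch below assumes a constant sample of about 55,000 households per month, of which two of the eight rotation groups (one-quarter) are in an outgoing rotation month. Both figures are simplifying assumptions; actual monthly samples vary.

```python
# Back-of-envelope check of outgoing-rotation file sizes. Assumptions:
# a constant ~55,000 interviewed households per month, of which 2 of the
# 8 rotation groups (months-in-sample 4 and 8) are outgoing.
MONTHLY_HOUSEHOLDS = 55_000
OUTGOING_SHARE = 2 / 8


def expected_households(months: int) -> float:
    """Approximate size of an annual file built from `months` monthly files."""
    return MONTHLY_HOUSEHOLDS * OUTGOING_SHARE * months


if __name__ == "__main__":
    for label, months in [("12-month file", 12),
                          ("11-month file (2001, 2003)", 11),
                          ("6-month file (2008)", 6)]:
        print(f"{label}: about {expected_households(months):,.0f} households")
```

This gives about 165,000, 151,250 and 82,500 households for 12-, 11- and 6-month files, in line with the approximate counts reported above.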

Information on people’s nativity was not collected on a regular basis in the CPS until 1995. Therefore, the analysis in this study begins in 1995. There have been several revisions of the CPS since 1995, but they are not believed to have had an impact on the principal variable of interest—homeownership. One study (Masnick, McArdle and Belsky, 1999) suggests that revisions made to the CPS in 1994 affect the comparability of homeownership data from 1994 onwards with earlier years. In particular, the study argues that measured increases in homeownership between 1993 and 1996 are exaggerated by revisions of the CPS. That is not an issue for this study because the analysis begins in 1995.

Higher-Priced Loans

Data on the number and characteristics of higher-priced loans come from records collected under the Home Mortgage Disclosure Act. The data, tabulations from the data and additional information are available online. Under the terms of the act, mortgage lenders in metropolitan areas report information on their lending activity and on the major characteristics of their borrowers to the U.S. government. HMDA data encompass about 80% of all home-related lending in the U.S.

The 2007 HMDA data contain information on more than 21 million applications for home loans. Those consist of applications for home purchase (about 7 million), refinance (about 12 million) and home improvement (about 2 million).

This study is limited to conventional home purchase loans for owner-occupancy of one- to four-family homes, first liens only. Also, loans that are missing an applicant’s gender, ethnicity or other key information are excluded. That limits the sample to about 4 million loan applications and 3 million loan originations.
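The sample restrictions amount to a row filter. The sketch below uses a hypothetical, simplified record type; its field names and string values are illustrative and do not reproduce the official HMDA file layout or codes. Treating race as one of the "other key information" fields is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Application:
    """Hypothetical, simplified HMDA-style record (not the official layout)."""
    loan_type: str            # e.g. "conventional", "FHA", "VA"
    purpose: str              # "purchase", "refinance" or "improvement"
    property_type: str        # e.g. "1-4 family", "multifamily"
    owner_occupied: bool
    lien: str                 # "first", "subordinate", ...
    originated: bool
    sex: Optional[str]        # None means not reported
    ethnicity: Optional[str]
    race: Optional[str]       # assumed to count as "other key information"


def in_study_sample(app: Application) -> bool:
    """Conventional, owner-occupied, first-lien purchase loans on 1-4 family
    homes, with the applicant's sex, ethnicity and race all reported."""
    return (
        app.loan_type == "conventional"
        and app.purpose == "purchase"
        and app.property_type == "1-4 family"
        and app.owner_occupied
        and app.lien == "first"
        and None not in (app.sex, app.ethnicity, app.race)
    )


if __name__ == "__main__":
    apps = [
        Application("conventional", "purchase", "1-4 family", True, "first", True, "F", "Hispanic", "White"),
        Application("conventional", "refinance", "1-4 family", True, "first", True, "M", "Not Hispanic", "Black"),
        Application("conventional", "purchase", "1-4 family", True, "first", False, None, "Not Hispanic", "White"),
        Application("conventional", "purchase", "1-4 family", True, "first", False, "M", "Not Hispanic", "White"),
    ]
    sample = [a for a in apps if in_study_sample(a)]
    originations = [a for a in sample if a.originated]
    print(len(sample), "applications,", len(originations), "originations")
```

Applications and originations are counted from the same filtered set, which mirrors how the study reports both an application count and a smaller origination count.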

HMDA data for 2006 were used in the analysis of differences in foreclosure rates across U.S. counties. Loan data were grouped by county to compute the following two variables: the county average of the loan amount as a percent of income and the percent of higher-priced loans to Hispanics, blacks and whites in a county.
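A minimal sketch of the county aggregation follows. It reads "percent of higher-priced loans to Hispanics, blacks and whites" as the share of each group's loans in a county that were higher-priced, and the county average as the mean of per-loan loan-to-income percentages; both readings, the tuple layout and the sample values are assumptions.

```python
from collections import defaultdict


def county_variables(loans):
    """loans: iterable of (county, group, loan_amount, income, higher_priced).

    Returns (average loan-to-income percent by county,
             percent of loans that are higher-priced by (county, group)).
    Loan amount and income are assumed to be in the same units.
    """
    ratios = defaultdict(list)
    counts = defaultdict(lambda: [0, 0])   # (county, group) -> [higher-priced, all]
    for county, group, amount, income, higher_priced in loans:
        if income > 0:                       # skip loans with no usable income
            ratios[county].append(100 * amount / income)
        tally = counts[(county, group)]
        tally[0] += 1 if higher_priced else 0
        tally[1] += 1
    avg_lti = {c: sum(r) / len(r) for c, r in ratios.items()}
    pct_higher_priced = {k: 100 * hp / n for k, (hp, n) in counts.items()}
    return avg_lti, pct_higher_priced


if __name__ == "__main__":
    loans = [
        ("County A", "Hispanic", 200, 50, True),
        ("County A", "white", 150, 75, False),
        ("County B", "black", 100, 50, True),
        ("County B", "black", 100, 100, False),
    ]
    avg_lti, pct_hp = county_variables(loans)
    print(avg_lti)
    print(pct_hp)
```

A ratio of county totals (total loan amount divided by total income) is a reasonable alternative definition; it weights large loans more heavily than the mean of per-loan ratios used here.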

Foreclosure Rates and the Characteristics of U.S. Counties

Data on foreclosure rates in U.S. counties were provided by RealtyTrac®. Those data were available for all 3,141 U.S. counties. Data from other sources were matched to the foreclosure data to analyze the relationship between foreclosure rates and counties' economic and demographic characteristics.

Demographic characteristics of U.S. counties were tabulated from the American Community Survey, Public Use Microdata Sample, 2005-07. That file is a three-year sample of the ACS consisting of about 3.5 million household records and describes the average characteristics of the U.S. population from 2005 to 2007.

The ACS includes geographic identifiers for areas with populations of 100,000 or more, known as Public Use Microdata Areas (PUMAs). Using a program developed by Jeffrey S. Passel of the Center, it was possible to map data for PUMAs into 3,140 counties. When a PUMA was matched into a group of counties, the same characteristics were assigned to all counties within the PUMA.
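The sketch below illustrates the PUMA-to-county assignment with a hypothetical crosswalk; it is not the program used for this report. A county covered by a single PUMA that spans several counties inherits that PUMA's characteristics, so all counties in the PUMA receive identical values. Pooling the records of several PUMAs for a large county that contains more than one is an additional assumption not described in the text.

```python
from collections import defaultdict


def county_homeownership(records, county_to_pumas):
    """records: iterable of (puma, household_weight, is_owner).
    county_to_pumas: hypothetical crosswalk, county -> list of PUMAs.

    Returns the weighted homeownership rate (%) for each county.
    """
    by_puma = defaultdict(lambda: [0.0, 0.0])   # puma -> [owner weight, all weight]
    for puma, weight, is_owner in records:
        by_puma[puma][0] += weight if is_owner else 0.0
        by_puma[puma][1] += weight
    rates = {}
    for county, pumas in county_to_pumas.items():
        owners = sum(by_puma[p][0] for p in pumas)
        total = sum(by_puma[p][1] for p in pumas)
        rates[county] = 100 * owners / total if total else None
    return rates


if __name__ == "__main__":
    records = [("P1", 100, True), ("P1", 100, False),
               ("P2", 50, True), ("P3", 150, True), ("P3", 50, False)]
    crosswalk = {"Rural X": ["P1"], "Rural Y": ["P1"], "Big City": ["P2", "P3"]}
    print(county_homeownership(records, crosswalk))
```

Here "Rural X" and "Rural Y" share one PUMA and therefore receive the same rate (50%), while "Big City" pools two PUMAs (80%).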

The specific demographic characteristics of counties computed from the ACS file were as follows: the race, ethnicity and nativity of the householder population in a county; the homeownership rate by the race, ethnicity and nativity of householders in a county; the race, ethnicity and nativity of homeowners in a county; and the race, ethnicity and nativity of mortgage holders in a county.
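The four sets of characteristics differ only in which households form the denominator. A minimal sketch for one county follows, assuming weighted household records, hypothetical group labels, and "mortgage holders" meaning homeowners with a mortgage.

```python
from collections import defaultdict


def shares(totals):
    """Percent distribution of a dict of weighted totals."""
    grand = sum(totals.values())
    return {g: 100 * v / grand for g, v in totals.items()} if grand else {}


def county_profile(households):
    """households: iterable of (group, weight, is_owner, has_mortgage)."""
    householders = defaultdict(float)
    owners = defaultdict(float)
    mortgaged = defaultdict(float)
    for group, weight, is_owner, has_mortgage in households:
        householders[group] += weight
        if is_owner:
            owners[group] += weight
            if has_mortgage:
                mortgaged[group] += weight
    return {
        # Composition of all householders, homeowners and mortgage holders.
        "householders_by_group": shares(householders),
        "owners_by_group": shares(owners),
        "mortgage_holders_by_group": shares(mortgaged),
        # Homeownership rate within each group.
        "ownership_rate_by_group": {
            g: 100 * owners.get(g, 0.0) / w for g, w in householders.items() if w
        },
    }


if __name__ == "__main__":
    profile = county_profile([
        ("white", 300, True, True), ("white", 100, True, False),
        ("white", 100, False, False),
        ("Hispanic immigrant", 200, True, True),
        ("Hispanic immigrant", 300, False, False),
    ])
    for name, values in profile.items():
        print(name, values)
```

In this toy county the two groups are equally represented among householders (50% each), but the homeownership rates differ (80% versus 40%), so the groups' shares of homeowners and of mortgage holders differ from their shares of householders.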

The unemployment rate in a county was determined from the local area unemployment database of the Bureau of Labor Statistics. The county unemployment rate used in the statistical models is an average of the monthly, not seasonally adjusted, unemployment rates from January through November 2008.
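The averaging step is a simple mean over 11 months. In the sketch below, the dictionary input and the rate values are hypothetical; a December value is included only to show that it is excluded.

```python
from statistics import mean


def average_unemployment_jan_nov(monthly_rates):
    """monthly_rates: dict mapping month number (1-12) to a county's
    not seasonally adjusted 2008 unemployment rate (%)."""
    months = range(1, 12)  # January (1) through November (11)
    missing = [m for m in months if m not in monthly_rates]
    if missing:
        raise ValueError(f"missing monthly rates for months {missing}")
    return mean(monthly_rates[m] for m in months)


if __name__ == "__main__":
    # Hypothetical rates; December (12) is present but deliberately ignored.
    rates = {1: 5.0, 2: 5.0, 3: 5.0, 4: 5.0, 5: 6.0, 6: 6.0, 7: 6.0, 8: 6.0,
             9: 7.0, 10: 7.0, 11: 7.0, 12: 12.0}
    print(round(average_unemployment_jan_nov(rates), 2))
```

Raising an error on a missing month, rather than averaging whatever is present, keeps every county's average on the same 11-month basis.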

Home price appreciation, or depreciation, in a county is measured by the change in the House Price Index (HPI) from the fourth quarter of 2005 to the fourth quarter of 2007. The HPI is estimated by the Federal Housing Finance Agency (FHFA) for all metropolitan statistical areas (MSAs) in the U.S. MSA-level estimates were assigned to all counties within a specific MSA. As a result, HPI estimates for a total of 1,086 metropolitan counties were obtained for the analysis of differences in foreclosure rates across counties.
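The sketch below computes the change for each MSA and copies it to the MSA's counties. It assumes the change is expressed as a percent change in the index level; the MSA names, index values and county mapping are hypothetical, and no particular FHFA file format is implied.

```python
def percent_change(start, end):
    """Percent change in an index level between two quarters."""
    return 100 * (end / start - 1)


def county_hpi_change(msa_hpi, county_to_msa):
    """msa_hpi: MSA -> (HPI 2005Q4, HPI 2007Q4).
    county_to_msa: county -> MSA, or None for nonmetropolitan counties.

    Every county in an MSA receives that MSA's change; counties outside
    an MSA (or in an MSA without an estimate) get no value.
    """
    changes = {}
    for county, msa in county_to_msa.items():
        if msa in msa_hpi:
            q4_2005, q4_2007 = msa_hpi[msa]
            changes[county] = percent_change(q4_2005, q4_2007)
    return changes


if __name__ == "__main__":
    msa_hpi = {"MSA A": (200.0, 250.0), "MSA B": (200.0, 180.0)}
    county_to_msa = {"county 1": "MSA A", "county 2": "MSA A",
                     "county 3": "MSA B", "county 4": None}
    print(county_hpi_change(msa_hpi, county_to_msa))
```

Counties 1 and 2 both receive MSA A's 25% appreciation, county 3 receives MSA B's 10% depreciation, and nonmetropolitan county 4 drops out, which is why the HPI variable covers only metropolitan counties.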

The estimates presented in the study, specifically in Tables A4 and A5, are representative of the results obtained from a number of different statistical models. One variant that was estimated excluded counties from California, Florida, Arizona and Nevada from the sample. That resulted in a somewhat weaker, but still statistically significant, relationship between foreclosure rates and the shares of immigrants in a county’s population. Another variant included higher-priced loans for both home purchase and refinance in the analysis. There was no notable change in the resulting estimates.

In other variants of the statistical model, the foreclosure rate was altered to align it more closely with the population of homeowners. First, the foreclosure rate in a county was divided by the county's homeownership rate. The result is an estimate of the share of owner-occupied housing units in a county that entered into foreclosure, rather than the share of all housing units in the county that entered into foreclosure.
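Written as a formula, with symbols of our own choosing, let F_c be a county's foreclosure rate (foreclosures as a share of all housing units) and H_c its homeownership rate. The numbers in the example are illustrative only.

$$
\tilde{F}_c = \frac{F_c}{H_c},
\qquad \text{e.g. } F_c = 0.02,\; H_c = 0.60 \;\Rightarrow\; \tilde{F}_c = \frac{0.02}{0.60} \approx 0.033 .
$$

Because the homeownership rate is measured relative to occupied housing units (households) rather than all housing units, the ratio approximates, rather than exactly equals, foreclosures per owner-occupied unit.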

Second, the foreclosure rate was divided by the share of homeowners in a county with a mortgage, because only homeowners with mortgages face the risk of foreclosure. The resulting foreclosure rate is an estimate of the share of homeowners with mortgages who entered into foreclosure. In both variants, the list of regression variables was modified accordingly to align with the newly defined foreclosure rate. Results from the estimation of these alternative models are available upon request.
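One reading that yields the stated result, a share of homeowners with mortgages, applies the second division to the owner-based rate. With F_c and H_c as before and M_c the share of a county's homeowners who have a mortgage (symbols and example values are ours), the adjustment becomes the expression below. The text does not say explicitly whether the division by M_c starts from the owner-based rate, so this formula is an interpretation.

$$
\hat{F}_c = \frac{F_c}{H_c \, M_c},
\qquad \text{e.g. } \frac{0.02}{0.60 \times 0.70} = \frac{0.02}{0.42} \approx 0.048 .
$$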