Appendix C: Methodology

Overview

The estimates presented in this report for the unauthorized immigrant population are based on a residual estimation methodology that compares a demographic estimate of the number of immigrants residing legally in the country with the total number of immigrants as measured by a survey, either the American Community Survey or the March Supplement to the Current Population Survey. The difference is assumed to be the number of unauthorized immigrants in the survey, a number that later is adjusted for omissions from the survey (see below). The basic estimate is:

Unauthorized immigrants in the survey (U) = Survey total of the foreign-born population (F) − Estimated legal foreign-born population (L)

The legal resident immigrant population is estimated by applying demographic methods to counts of legal admissions covering the period since 1980 obtained from the Department of Homeland Security’s Office of Immigration Statistics (U.S. Department of Homeland Security, 2012) and its predecessor at the Immigration and Naturalization Service, with projections to current years, when necessary. Initial estimates here are calculated separately for age-gender groups in six states (California, Florida, Illinois, New Jersey, New York and Texas) and the balance of the country; within these areas the estimates are further subdivided into immigrant populations from 35 countries or groups of countries by period of arrival in the United States. Variants of the residual method have been widely used and are generally accepted as the best current estimates (Baker and Rytina, 2013; Warren and Warren, 2013). See also Passel, Cohn and Gonzalez-Barrera (2013), Passel and Cohn (2008), Passel (2007) and Passel et al. (2004) for more details.

The overall estimates for unauthorized immigrants build on these residuals by adjusting for survey omissions for these six states and the balance of the country, subdivided for Mexican immigrants and other groups of immigrants (balance of Latin America, South and East Asia, rest of world) depending on sample size and state.

Once the residual estimates have been produced, individual foreign-born respondents in the survey are assigned a specific status (one option being unauthorized immigrant) based on the individual’s demographic, social, economic, geographic and family characteristics in numbers that agree with the initial residual estimates for the estimated legal immigrant and unauthorized immigrant populations. These status assignments are the basis for the characteristics reported here (including, for example, specific countries of birth, detailed state estimates and labor force participation). A final step in the weighting-estimation process involves developing final state-level estimates that take into account trends over time in the estimates.

Comparability with Previous Estimates

The estimates presented here for 1990-2012 are internally consistent and comparable across years and states. The 2005-2012 estimates are based on the American Community Survey (ACS); those for 1995 and 2000, on the March Current Population Survey (CPS); and for 1990, on the 1990 Census (produced by Warren and Warren, 2013). The Pew Research Center published estimates for these same dates from essentially these same sources in two previous reports issued since September 2013 (Passel et al., 2014; Passel, Cohn and Gonzalez-Barrera, 2013) and related graphics. These earlier reports also included estimates for 1996-1999, 2001-2004 and 2013 based on March Current Population Surveys—estimates that are also consistent with estimates published here.⁵

The estimates in this report and the previous two reports are based on survey data consistent with the censuses of 1990, 2000 and 2010. For the 1995-2009 surveys, special weights were developed to align with both the preceding and subsequent censuses (see below). As such, population figures for these years are not identical to those published from the original surveys. Moreover, these new estimates of unauthorized immigrants differ from previous estimates published before 2013, even from earlier estimates based on the same surveys. Although differences at the national level are not generally very large, some state-level differences may be relatively greater. The estimates in this report supersede all Pew Research estimates published before September 2013 and the ACS-based estimates in this report supersede earlier estimates for the same date (i.e., 2012) based on the CPS.

The ACS has a much larger sample size than the CPS (see below). As such, state-level estimates of unauthorized immigrants and those for countries of birth are much more precise (i.e., have smaller margins of error) from the ACS than from the CPS. The larger sample sizes also permit more detailed analyses of the characteristics of unauthorized immigrants at the state level and for individual countries of birth.

Rounding of Estimates

All estimates for unauthorized immigrant populations are presented as rounded numbers to avoid the appearance of unwarranted precision in the estimates. The rounding conventions for unauthorized immigrant estimates, dependent somewhat on data sources, are:

Estimates for 1990 are based on the 1990 Census and use ACS-based rounding conventions. These same conventions are used to round the limits of the 90% confidence intervals, presented as “Range (+ or -),” with one exception: limits that round to less than 5,000 are shown as 2,000. For state- and national-level data on the total population or total labor force, figures are rounded to the nearest 10,000.

Unrounded numbers are used for significance tests, for plotting charts and for computations of differences and percentages. Where differences are reported, they are computed from unrounded estimates and then rounded separately. Because each figure is rounded separately, the rounded estimates may not add to rounded totals. Similarly, percentages computed from rounded numbers may differ from the percentages shown in this report.
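As a rough illustration of why separately rounded figures may not add to rounded totals, the sketch below rounds three made-up component values and their sum. The rounding unit of 5,000, the values and the helper name `round_to` are assumptions chosen for illustration; they are not the report's rounding conventions or data.

```python
def round_to(value, unit):
    """Round a non-negative value to the nearest multiple of `unit` (halves round up)."""
    return int((value + unit / 2) // unit) * unit

UNIT = 5_000                       # hypothetical rounding unit
parts = [12_400, 7_400, 17_400]    # hypothetical unrounded estimates

rounded_parts = [round_to(p, UNIT) for p in parts]   # each figure rounded separately
rounded_total = round_to(sum(parts), UNIT)           # total rounded from the unrounded sum

# Share of the first component, computed both ways.
share_unrounded = parts[0] / sum(parts)
share_rounded = rounded_parts[0] / rounded_total

print(rounded_parts, sum(rounded_parts), rounded_total)
print(round(100 * share_unrounded, 1), round(100 * share_rounded, 1))
```

Here the rounded components sum to 30,000 while the rounded total is 35,000, and the share computed from rounded figures (about 28.6%) differs from the share computed from unrounded figures (about 33.3%).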

Status Assignments—Legal and Unauthorized Immigrants

Individual survey respondents are assigned a status as a legal or unauthorized immigrant based on the individual’s demographic, social, economic and geographic characteristics so the resulting number of immigrants in various categories agrees with the totals from the residual estimates. The assignment procedure employs a variety of methods, assumptions and data sources.

First, all immigrants entering the U.S. before 1980 are assumed to be legal immigrants. Then, the ACS and CPS data are corrected for known over-reporting of naturalized citizenship on the part of recently arrived immigrants (Passel et al. 1997). Specifically, immigrants who have been in the U.S. fewer than six years are not eligible to naturalize unless they are married to a U.S. citizen, in which case they can naturalize after three years. Immigrants reporting as naturalized who fail to meet these requirements are moved into the non-citizen category. All remaining naturalized citizens from countries other than Mexico and those in Central America are assigned as legal. Persons entering the U.S. as refugees are identified on the basis of country of birth and year of immigration to align with known admissions of refugees and asylees (persons granted asylum). Then, individuals holding certain kinds of temporary visas (including students, diplomats and “high-tech guest workers”) are identified in the survey and each is assigned a specific legal temporary migration status using information on country of birth, date of entry, occupation, education and certain family characteristics. The specific visa types identified and supporting variables are:

Diplomats and embassy employees (A visa)
Foreign students (F, M visa)
Visiting scholars (J visa)
Physicians (J visa)
Registered nurses (H-1A visas)
Intracompany transfers (L visas)
“High-tech” guest workers (H-1B visas)
International organizations (G visas)
Religious workers (R visas)
Exchange visitors (J visas)
Athletes, artists and entertainers (O, P visas)
Spouses and children within the various categories
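The correction for over-reported naturalization described earlier in this section can be sketched as a simple edit rule. This is a minimal illustration that uses only the thresholds stated in the text (six years, or three years when married to a U.S. citizen). The function names, arguments and status labels are hypothetical and are not the variable names used in the ACS or CPS files.

```python
def meets_residency_requirement(years_in_us, married_to_citizen):
    """Apply the residency thresholds stated in the text."""
    required_years = 3 if married_to_citizen else 6
    return years_in_us >= required_years

def edit_citizenship(reported_status, years_in_us, married_to_citizen):
    """Move implausible 'naturalized' reports into the non-citizen category."""
    if reported_status == "naturalized" and not meets_residency_requirement(
        years_in_us, married_to_citizen
    ):
        return "non-citizen"
    return reported_status

# Hypothetical respondents: (reported status, years in U.S., married to a citizen)
examples = [
    ("naturalized", 4, False),   # too recent: recoded
    ("naturalized", 4, True),    # eligible through marriage: kept
    ("naturalized", 10, False),  # eligible: kept
]
for status, years, married in examples:
    print(status, years, married, "->", edit_citizenship(status, years, married))
```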

Finally, immigrants are screened on the basis of occupations, participation in public programs and relationships with natives and legal immigrants. Some individuals are assigned as legal immigrants because of these characteristics:

Refugees and naturalized citizens
Legal temporary immigrants
Persons working for the government or the Armed Forces
Veterans or members of the Armed Forces
Participants in government programs not open to unauthorized immigrants: Supplemental Security Income (SSI), Temporary Assistance for Needy Families (TANF), Medicare, Medicaid and Food Stamps
Persons entering the U.S. before 1980
Persons with certain occupations that require legal status or government licensing (e.g., police officers and other law enforcement occupations, lawyers, health care professionals)
Children of citizens and legal temporary migrants
Most immediate relatives of U.S. citizens
Other family members, especially those entering the U.S. before legal residents

As a result of these steps, the foreign-born population is divided between individuals with “definitely legal” status (including long-term residents, naturalized citizens, refugees and asylees, legal temporary migrants and some legal permanent residents) and a group of “potentially unauthorized” migrants. (See Passel, 2007 and Passel et al., 2004 for additional detail.)

The number of potentially unauthorized migrants typically exceeds the estimated number of unauthorized migrants (from the residual estimates) by 20-35% nationally. So, to have a result consistent with the residual estimate of legal and unauthorized immigrants, probabilistic methods are employed to assign legal or unauthorized status to these potentially unauthorized individuals. The base probability for each assignment is the ratio of the residual estimate to the number of potentially unauthorized immigrants. These initial probabilities are first adjusted separately for parents living with their children and all others (to ensure that an appropriate number of unauthorized children are selected) and then by broad occupation categories.

After this last step in the probabilistic assignment process, there is a check to ensure that the legal statuses of family members are consistent; for example, all family members entering the country at the same time are assumed to have the same legal status. The resulting populations for unauthorized immigrants are compared with the residual estimates; if they disagree, the assignment probabilities are adjusted and the random assignments are repeated. The entire process requires several iterations to produce estimates that agree with the demographically-derived population totals. At the end, the final estimates agree with the residual estimates for the six individual states noted earlier and for the balance of the country; for Mexican-born and other legal and unauthorized immigrants in each area; and for children, working-age men and working-age women within each category. Finally, the survey weights for the foreign-born are adjusted upward for survey omissions (undercount) so the tabulated figures agree with the adjusted analytic, demographic estimates of the total number of legal and unauthorized migrants developed in the very first step.
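The iterative part of this procedure can be illustrated with a deliberately simplified sketch. It starts from the base probability (residual estimate divided by the weighted pool of potentially unauthorized records), makes random assignments, compares the weighted result with the target and, if they disagree, adjusts the probability and repeats. The function name, tolerance, seed and toy data are assumptions for illustration. The sketch omits the adjustments for parents and occupation, the family-consistency check and the separate controls by state, origin and age-sex group described above.

```python
import random

def assign_unauthorized(weights, residual_total, tolerance=0.01, max_iter=50, seed=0):
    """Randomly flag records until the weighted total is within
    `tolerance` (relative) of the residual estimate."""
    rng = random.Random(seed)
    # Base probability: residual estimate / weighted potentially unauthorized pool.
    prob = min(1.0, residual_total / sum(weights))
    for _ in range(max_iter):
        flags = [rng.random() < prob for _ in weights]
        assigned = sum(w for w, flagged in zip(weights, flags) if flagged)
        if abs(assigned - residual_total) <= tolerance * residual_total:
            return flags, prob
        # Totals disagree: scale the probability toward the target and redraw.
        if assigned > 0:
            prob = min(1.0, prob * residual_total / assigned)
    raise RuntimeError("assignment did not converge; loosen tolerance or add iterations")

# Toy pool: 20,000 records of weight 50 (1,000,000 people), 25% above the target.
weights = [50.0] * 20_000
residual_total = 800_000.0
flags, final_prob = assign_unauthorized(weights, residual_total)
assigned_total = sum(w for w, flagged in zip(weights, flags) if flagged)
print(round(final_prob, 3), assigned_total)
```

A single scalar probability keeps the example short; a fuller version would hold one probability per control cell and adjust each against its own residual total.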

Data Sources and Survey Weights

The American Community Survey is an ongoing survey conducted by the U.S. Census Bureau. The survey collects detailed information on a broad range of topics, including country of birth, year of immigration and citizenship, which is the information required for the residual estimates. The ACS has a continuous collection design with monthly samples of about 250,000; the nominal annual sample size was about 2.9 million households for 2005-2009 with about 1.9 million included in the final sample. The initial sample was expanded to almost 3.3 million addresses for 2011 and over 3.5 million for 2012; the final sample included more than 2.1 million addresses in 2011 and almost 2.4 million in 2012 (http://www.census.gov/acs/www/methodology/sample_size_data/index.php).

For this report, public-use samples of individual survey records from the ACS are tabulated to provide the data used in the estimation process. The public-use file is a representative 1% sample of the entire U.S. (including about 3 million individual records for each year 2005-2012) obtained from the Integrated Public-Use Microdata Series or IPUMS (Ruggles et al., 2010). The ACS began full-scale operation in 2005 covering only the household population; since 2006 it has covered the entire U.S. population. ACS data are released by the Census Bureau in September for the previous year.

The other survey data source used for residual estimates comes from March Supplements to the Current Population Survey. The CPS is currently a monthly survey of about 55,000 households conducted jointly by the U.S. Bureau of Labor Statistics and the Census Bureau. Since 2001, the March supplement sample has been expanded to about 80,000 households; before then, the expanded March Supplement sample included about 50,000 households (U.S. Census Bureau, 2006). The CPS universe covers the civilian noninstitutional population. The CPS was redesigned in 1994 and, for the first time, included the information required for the residual estimates (i.e., country of birth, date of immigration and citizenship). Some limitations of the initial March Supplement of the redesigned CPS, in 1994, preclude its use in making these estimates, so the first CPS-based estimates are for March 1995. CPS data are released by the Census Bureau in September for the previous March.

Population figures from both the ACS and CPS are based on the Census Bureau’s official population estimates for the nation, states and smaller areas through a weighting process that ensures the survey figures agree with pre-specified national population totals by age, sex, race and Hispanic origin. At the sub-national level, the two surveys differ in their target populations. The March CPS data agree with state-level totals by age, sex and race and are based on a process that imposes other conditions on weights for couples (U.S. Census Bureau, 2006). The ACS weights use estimates for much smaller geographic areas that are summed to state totals (U.S. Census Bureau, 2014, especially Chapter 11).

The population estimates for the surveys are based on the latest available figures at the time the survey weights are estimated. This process produces the best estimates available at the time of the survey, but it does not guarantee that a time series produced across multiple surveys is consistent or accurate. Significant discontinuities can be introduced when the Census Bureau changes its population estimation methods, as it did several times early in the 2000s and in 2007 and 2008 (Passel and Cohn, 2010), or when the entire estimates series is recalibrated to take into account the results of a new census.

The estimates shown for unauthorized immigrants and the underlying survey data are derived from ACS IPUMS 1% samples for 2005-2012 and March CPS public-use files for 1995 and 2000, which have been reweighted to take into account population estimates consistent with the 1990 Census, the 2000 Census, the 2010 Census and the most recent population estimates. The population estimates used to reweight the ACS for 2005 through 2009 are the Census Bureau’s intercensal population estimates for the 2000s (http://www.census.gov/popest/data/intercensal/index.html); these population estimates use demographic components of population change for 2000-2010 and are consistent with both the 2000 and 2010 censuses. Similarly, the population estimates used to reweight the CPS for March 1995 and March 2000 are the intercensal population estimates for the 1990s (U.S. Census Bureau, 2013), which are consistent with the 1990 and 2000 censuses. The ACS data for 2010-2012 do not require reweighting as they are weighted to recent population estimates based on the 2010 Census. The original 2005 ACS covered the household population, but not the population living in group quarters (about 8 million people). For Pew Research Center analyses, we augmented the 2005 ACS with group quarters records from the 2006 ACS but weighted to agree with the 2005 population estimates. The reweighting methodology for both the ACS and CPS follows, to the extent possible, the methods used by the Census Bureau in producing the sample weights that equal the population totals. See Passel, Cohn and Gonzalez-Barrera 2013 for more details on weighting and adjustments for survey undercoverage.
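The general idea of reweighting survey records to agree with external population totals can be illustrated with a simple ratio adjustment within cells. This is only a sketch under stated assumptions: the Census Bureau's weighting procedures (and the reweighting described above) involve multiple dimensions and additional constraints, and the cell labels, weights and control totals below are invented for illustration.

```python
from collections import defaultdict

def reweight_to_controls(records, controls):
    """Scale weights within each cell so they sum to that cell's control total.

    records  -- list of (cell, weight) pairs
    controls -- dict mapping cell -> target population total
    """
    current = defaultdict(float)
    for cell, weight in records:
        current[cell] += weight
    return [(cell, weight * controls[cell] / current[cell]) for cell, weight in records]

# Hypothetical cells (e.g., an age-sex group within a state) and control totals.
records = [("A", 100.0), ("A", 300.0), ("B", 200.0)]
controls = {"A": 440.0, "B": 190.0}

reweighted = reweight_to_controls(records, controls)
new_totals = defaultdict(float)
for cell, weight in reweighted:
    new_totals[cell] += weight
print(dict(new_totals))
```

After the adjustment, the weights in each cell sum to the control total for that cell while the relative weights of records within a cell are unchanged.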

Because of the much larger sample size in the ACS (3.1 million sample cases in 2012 including more than 350,000 foreign-born cases) than the March CPS (203,000 sample cases in 2012 with about 26,000 foreign-born), the ACS-based estimates should be considered more accurate than the CPS-based estimates. In this publication, we have replaced the previous CPS-based estimate for 2012 with the new ACS-based estimate.

Other Methodological Issues

Adjustment for Undercount

Adjustments for omissions from the surveys (also referred to as adjustments for undercount) are introduced into the estimation process at several points. The initial comparisons with the survey (based on the equation shown above) take the difference between the immigrants in the survey and the estimated legal population. Since the comparison is with people appearing in the survey, the estimated legal population must be discounted slightly because some legal immigrants are missed by the survey. This initial estimate represents unauthorized immigrants included in the survey. To estimate the total number of unauthorized immigrants in the country, it must be adjusted for those left out. Similarly, the estimated number of legal immigrants appearing in the survey must also be adjusted for undercount to arrive at the total foreign-born population.
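The sequence of adjustments in the preceding paragraph can be worked through with made-up numbers. The sketch assumes that an undercount rate is the share of the true population missed by the survey, so that an in-survey count is inflated by dividing by one minus the rate. That definition, the function name and all figures are assumptions for illustration only. In practice the adjustments are applied separately by age, sex, country of birth and year of arrival.

```python
def residual_with_undercount(survey_foreign_born, legal_total,
                             legal_undercount, unauthorized_undercount):
    """Residual estimate with coverage adjustments, following the steps in the text."""
    # 1. Discount the legal population to those expected to appear in the survey.
    legal_in_survey = legal_total * (1.0 - legal_undercount)
    # 2. Residual: unauthorized immigrants included in the survey.
    unauthorized_in_survey = survey_foreign_born - legal_in_survey
    # 3. Adjust the residual upward for unauthorized immigrants left out.
    unauthorized_total = unauthorized_in_survey / (1.0 - unauthorized_undercount)
    # 4. Total foreign-born population after both adjustments.
    return {
        "legal_in_survey": legal_in_survey,
        "unauthorized_in_survey": unauthorized_in_survey,
        "unauthorized_total": unauthorized_total,
        "foreign_born_total": legal_total + unauthorized_total,
    }

# Hypothetical inputs, not figures from this report.
result = residual_with_undercount(
    survey_foreign_born=1_000_000,
    legal_total=700_000,
    legal_undercount=0.02,
    unauthorized_undercount=0.10,
)
for name, value in result.items():
    print(name, round(value))
```

With these inputs, 686,000 legal immigrants are expected in the survey, the in-survey residual is 314,000, and the adjusted unauthorized total is about 349,000.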

These various coverage adjustments are done separately for groups based on age, sex, country of birth and year of arrival. The patterns and levels of adjustments are based on Census Bureau studies of overall census coverage (see U.S. Census Bureau, 2012 for links to evaluation studies of the 1980, 1990, 2000 and 2010 Censuses; also Passel, 2001) that are adjusted up or down to reflect the results of a number of specialized studies that focus on immigrants. Census Bureau undercount estimates have generally been subdivided by race/Hispanic origin, age and sex. Accordingly, the adjustments to the Pew Research Center data use rates for each country of birth based on the predominant race or ethnicity of immigrants from that country: Hispanic, or non-Hispanic white, black or Asian. Undercount rates for children do not differ by gender, but for younger adults (ages 18-29 and 30-49) the undercount rates for males tend to be higher, and for some groups much higher, than those for females. At older ages, the undercount rates are lower than for younger adults with no strong patterns of gender differences (and with some estimated overcounts).

The basic information on specific coverage patterns of immigrants is drawn principally from comparisons with Mexican data, U.S. mortality data and specialized surveys conducted at the time of the 2000 Census (Van Hook et al., 2014; Bean et al., 1998; Capps et al., 2002; Marcelli and Ong, 2002). In these studies, unauthorized immigrants generally have significantly higher undercount rates than legal immigrants who, in turn, tend to have higher undercounts than U.S. natives. More recent immigrants are more likely than longer-term residents to be missed. The most recent study (Van Hook et al., 2014) finds marked improvements in coverage of Mexicans in the ACS and CPS between the late 1990s and the 2000s. This and earlier work suggest very serious coverage problems with immigrants in the data collected before the 2000 Census but fewer issues in the 2000 Census and subsequent data sets. This whole pattern of assumptions leads to adjustments of 10% to 20% for the estimates of unauthorized immigrants in the 1995-2000 CPS, with slightly larger adjustments for unauthorized Mexicans in those years. (Note that this means even larger coverage adjustments, sometimes exceeding 30% for adult men younger than age 40.)

After 2000, the coverage adjustments build in steady improvements in overall coverage and improvements specifically for Mexican immigrants. The improvements are even greater than noted in the research comparing Mexico and U.S. sources because the reweighted ACS and CPS data provide even greater improvements in reducing undercounts, since they incorporate results of the 2010 Census (Passel and Cohn, 2012). With all of these factors, coverage adjustments increase the estimate of the unauthorized immigrant population by 8% to 13% for 2000-2009 and by 5% to 7% for 2010-2012. For the overall immigrant population, coverage adjustments hovered slightly below 5% during the 1990s and trended downward to around 2% to 3% by 2012. Since the population estimates used in weighting the ACS and the CPS come from the same sources, the coverage adjustments tend to be similar.

Margins of Error

Estimates of the unauthorized immigrant population are computed as the difference between a deterministic, administratively based estimate (i.e., the legal foreign-born population, or “L” in the equation above) and a sample-based estimate (i.e., the survey total of the foreign-born population, or “F”). Consequently, the margin of error (or variance) for the estimated unauthorized population is the margin of error for “F,” the sample-based estimate of the foreign-born population in the estimates for the U.S. and the six largest states. Thus, for these areas, the margins of error are based on the variance of the foreign-born population entering since 1980. For other states, for countries and regions of birth and for characteristics other than the total number of unauthorized immigrants (e.g., numbers in the labor force), the margins of error are based on the estimated populations themselves and not the larger number of foreign-born who entered since 1980.

For all ACS data, variances were computed with replicate weights supplied for the ACS by the Census Bureau through IPUMS (Ruggles et al., 2010; documentation of the weights at U.S. Census Bureau, 2014, especially Chapter 12); for earlier CPS data, generalized variance formulas supplied in Census Bureau documentation were used to compute margins of error (U.S. Census Bureau, 2012b, especially Appendix G).

The ranges reported represent a 90% confidence interval around the estimates. They take into account the sampling error associated with the survey-based estimate. Other sources of potential error—including the variability associated with the random assignment of legal statuses, potential errors in the status assignment process and non-sampling error in the surveys—are not represented in the reported margins of error. For this report, statistical tests rely on a 90% confidence level.
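For readers unfamiliar with replicate-weight variance estimation, the sketch below shows the calculation in its usual form for the ACS: the estimate is recomputed with each of the 80 replicate weights, the variance is 4/80 times the sum of squared deviations from the full-sample estimate, and a 90% margin of error is 1.645 times the standard error. The function names and the toy replicate estimates are assumptions for illustration; this is not the code used to produce the ranges in this report.

```python
import math

def replicate_standard_error(full_estimate, replicate_estimates):
    """Standard error from ACS-style replicate estimates (factor 4 / number of replicates)."""
    n = len(replicate_estimates)
    variance = (4.0 / n) * sum((r - full_estimate) ** 2 for r in replicate_estimates)
    return math.sqrt(variance)

def margin_of_error_90(standard_error):
    """Half-width of a 90% confidence interval."""
    return 1.645 * standard_error

# Toy data: a full-sample estimate and 80 replicate estimates.
full_estimate = 1_000.0
replicate_estimates = [1_010.0] * 40 + [990.0] * 40

se = replicate_standard_error(full_estimate, replicate_estimates)
moe = margin_of_error_90(se)
print(se, round(moe, 1), (full_estimate - moe, full_estimate + moe))
```

In this toy case the standard error is 20 and the 90% margin of error is 32.9. Because the legal population is treated as deterministic, the same calculation applied to the survey-based foreign-born total gives the margin of error for the residual.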

Countries and Regions of Birth

Some modifications in the original CPS countries of birth were introduced to ensure that all foreign-born respondents could be assigned to a specific country or region of birth. See Passel and Cohn (2008) for a detailed treatment of how persons with unknown country of birth were assigned to specific countries.

Defining regions of the world and, in some cases, specific countries using the various data sources requires grouping areas into identifiable units and “drawing lines” on the world map. In the historical data used to construct the legal foreign-born population, it is not possible to differentiate the individual republics within the former Soviet Union. Thus, for analytic purposes in this report, the former republics are grouped together and considered to be part of Europe. For this report, China, Hong Kong and Taiwan are combined and reported as “China” because of potential inconsistencies between the administrative data sources and the surveys and because of concerns over consistency of reporting on the part of respondents. South and East Asia is defined to include Afghanistan, Pakistan and countries east of them. The Middle East includes Southwest Asia from Iran and westward plus countries in North Africa. Data for North and South Korea are not generally separated in the survey data used for the estimates. Thus, data reported for persons born in Korea cover both North and South Koreans; the vast majority of Korean immigrants in the U.S. are from South Korea.

⁵ One exception is the published estimates for 2012 based on the March 2012 Current Population Survey. These have been superseded by the ACS-based estimates in this report and the previous one (Passel et al., 2014).