Cape Area Panel Study 2002-2009, Waves 1-5 Secure Data
The Cape Area Panel Study (CAPS) is a longitudinal study of the lives of youths in metropolitan Cape Town, South Africa. The first wave of the study collected interviews from about 4800 randomly selected young people aged 14-22 in August-December 2002. Wave 1 also collected information on all members of these young people’s households, as well as on a random sample of households that did not have members aged 14-22. A third of the youth sample was re-interviewed in 2003 (Wave 2a) and the remaining two thirds were re-visited in 2004 (Wave 2b). The full youth sample was then re-interviewed in 2005 (Wave 3), 2006 (Wave 4) and 2009 (Wave 5). Wave 3 includes interviews with approximately 2000 co-resident parents of young adults, while Wave 4 also includes interviews with a sample of older adults (all individuals from the original 2002 households who were born on or before 1 January 1956) and all children born to the female young adults. Wave 5 comprises all respondents interviewed in any of Waves 2a, 3 or 4. In 2010, telephonic follow-up or proxy interviews were conducted to try to reach respondents who had not been successfully interviewed during the 2009 fieldwork.
The study covers a wide range of outcomes, including schooling, employment, health, family formation, and intergenerational support systems.
CAPS began in 2002 as a collaborative project of the Population Studies Center in the Institute for Social Research at the University of Michigan and the Centre for Social Science Research at the University of Cape Town (UCT). Other units involved in subsequent waves include UCT’s Southern African Labour and Development Research Unit and the Research Program in Development Studies at Princeton University.
The secure version of CAPS 2002-2009 includes date of birth, location (EA number, placename), job and school names and locations, as well as variables used in the processing of the data. The secure version does not include the information available in the public release dataset, so researchers will have to merge the secure data with the publicly available data when doing their analyses.
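The merge described above can be sketched in a few lines. This is only an illustration under stated assumptions: the identifier name `personid`, the field name `ea_number` and all values shown are hypothetical and are not taken from the CAPS files, so the actual linking variable should be checked in the technical documentation.

```python
def merge_secure_with_public(public_rows, secure_rows, key="personid"):
    """Left-join secure-release fields onto public-release records.

    `key` is a hypothetical respondent identifier; substitute the real
    linking variable documented for the CAPS data sets.
    """
    secure_by_key = {}
    for row in secure_rows:
        if row[key] in secure_by_key:
            raise ValueError(f"duplicate {key} in secure data: {row[key]}")
        secure_by_key[row[key]] = row

    merged = []
    for row in public_rows:
        combined = dict(row)  # copy so the public record is not mutated
        for name, value in secure_by_key.get(row[key], {}).items():
            if name != key:
                combined[name] = value
        merged.append(combined)
    return merged


# Invented example records (not real CAPS values).
public = [
    {"personid": 1, "weightyr": 1.8},
    {"personid": 2, "weightyr": 0.9},
]
secure = [
    {"personid": 1, "ea_number": "EA-001"},
]
merged = merge_secure_with_public(public, secure)
```

Every public record is kept, and secure fields are attached only where the identifier matches. Raising on duplicate identifiers is a design choice that keeps a one-to-one link from silently becoming one-to-many.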
Kind of Data
Sample survey data [ssd]
Unit of Analysis
The unit of analysis for this survey is individuals.
Version 1: Edited, partially anonymised data, available for use through the Secure Data Service
The study covers a wide range of topics on youths in Metropolitan Cape Town, including schooling, employment, health, family formation, and intergenerational support systems.
The lowest level of geographic aggregation for the data is enumeration area.
The survey covered youths in Metropolitan Cape Town, South Africa.
Producers and sponsors
University of Cape Town
University of Michigan
Andrew W. Mellon Foundation
Fogarty International Center
Health Economics and HIV/AIDS Research Division, UKZN
National Institutes of Health
Office of AIDS Research
The CAPS household sample was drawn through a two-stage process. First, the 'enumeration areas' (EAs) used for the 1996 Population Census were divided into three strata according to whether the population of each was predominantly African, predominantly coloured or predominantly white. A sample of primary sampling units (PSUs) was selected within each stratum with probability proportional to size. Within each PSU a sample of 25 screener households was drawn. The Overview and Technical Documentation for Waves 1-2-3-4-5 provides a more detailed discussion of the sampling design. Data users should take the stratification and clustering into account for all analyses. Strata and PSUs are identified by the majpop and cluster variables respectively.
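As a rough illustration of what taking stratification and clustering into account involves, the sketch below computes a weighted mean and a linearized (Taylor series) standard error, treating PSUs as sampled with replacement within strata. This is a common survey approximation and is not confirmed as the method recommended in the CAPS documentation. The variables majpop, cluster and weightyr are named in this record; the outcome name `y` and all values are invented for the example.

```python
from collections import defaultdict
from math import sqrt


def design_based_mean(records, y="y", weight="weightyr",
                      stratum="majpop", psu="cluster"):
    """Weighted mean with a stratified, cluster-robust standard error."""
    total_w = sum(r[weight] for r in records)
    mean = sum(r[weight] * r[y] for r in records) / total_w

    # Sum the linearized scores within each PSU, grouped by stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r in records:
        score = r[weight] * (r[y] - mean) / total_w
        psu_totals[r[stratum]][r[psu]] += score

    # Between-PSU variance within each stratum, summed over strata.
    variance = 0.0
    for totals in psu_totals.values():
        n = len(totals)
        if n < 2:
            raise ValueError("each stratum needs at least two PSUs")
        centre = sum(totals.values()) / n
        variance += n / (n - 1) * sum((t - centre) ** 2 for t in totals.values())
    return mean, sqrt(variance)


# Invented example: two strata with two PSUs each.
records = [
    {"majpop": 1, "cluster": 1, "weightyr": 1.0, "y": 1},
    {"majpop": 1, "cluster": 1, "weightyr": 1.0, "y": 0},
    {"majpop": 1, "cluster": 2, "weightyr": 1.0, "y": 1},
    {"majpop": 2, "cluster": 3, "weightyr": 2.0, "y": 0},
    {"majpop": 2, "cluster": 4, "weightyr": 2.0, "y": 1},
]
mean, se = design_based_mean(records)
```

Ignoring the cluster structure would typically understate the standard error, because respondents within the same PSU tend to resemble one another.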
Response rates for the survey are covered in Section 5 on non-response and attrition in the document "The Cape Area Panel Study: Overview and technical documentation: Waves 1-2-3-4-5 (2002-2009)."
The public release data include sample weights that should be used to adjust for the sample design. Three sample weights for Wave 1 are included in the data, each of which deals with specific issues.
The first of these weights, weightsd, adjusts for three critical elements of the sample design: 1) the intentional oversampling of African and white households; 2) the intentional differential sampling of households with and without young adult household members; and 3) the addition of secondary households (backyard shacks) into the sample of screener households in the field. This weight is incorporated into the other two sample weights. The second, weighthr, begins from the first weight and adds adjustments for unit non-response at the level of PSUs. The third sample weight, weightyr, is an individual young adult weight that adds a further adjustment for individual non-response. This adjustment is made by calculating response rates for each combination of single years of age, sex, and population group (8x2x3=48 cells) using the information provided on the household questionnaire.
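The cell-based adjustment behind weightyr can be illustrated with a minimal sketch. It assumes that response rates are simple unweighted proportions within each age-by-sex-by-population-group cell and that the adjusted weight is the household weight divided by the cell response rate. Neither detail is confirmed in this record, and the field names `id`, `age`, `sex`, `popgroup` and `responded` are hypothetical.

```python
from collections import defaultdict


def cell_nonresponse_weights(young_adults, base_weight="weighthr"):
    """Divide each respondent's base weight by the response rate of their cell."""
    listed = defaultdict(int)      # young adults listed on the household roster
    responded = defaultdict(int)   # those who completed the interview
    for person in young_adults:
        cell = (person["age"], person["sex"], person["popgroup"])
        listed[cell] += 1
        if person["responded"]:
            responded[cell] += 1

    weights = {}
    for person in young_adults:
        if not person["responded"]:
            continue  # non-respondents receive no individual weight
        cell = (person["age"], person["sex"], person["popgroup"])
        rate = responded[cell] / listed[cell]
        weights[person["id"]] = person[base_weight] / rate
    return weights


# Invented example: one cell in which two of four listed young adults responded.
roster = [
    {"id": 1, "age": 17, "sex": "F", "popgroup": "A", "weighthr": 1.5, "responded": True},
    {"id": 2, "age": 17, "sex": "F", "popgroup": "A", "weighthr": 1.5, "responded": True},
    {"id": 3, "age": 17, "sex": "F", "popgroup": "A", "weighthr": 1.5, "responded": False},
    {"id": 4, "age": 17, "sex": "F", "popgroup": "A", "weighthr": 1.5, "responded": False},
]
adjusted = cell_nonresponse_weights(roster)
```

In this example each respondent stands in for one non-respondent from the same cell, so the weight doubles from 1.5 to 3.0.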
In addition to the three sample design weights, the Waves 1-2-3-4-5 public release data sets include weights that adjust for individual young adult non-response in Waves 2, 3, 4 and 5. Since Wave 2 is composed of two sub-waves (Waves 2a and 2b) with different modules asked of different sub-samples, there are three Wave 2 attrition weights. The weight w2a_weightyr corresponds to the Wave 2a sub-sample (approximately one-third of the total CAPS young adult sample), the weight w2b_weightyr corresponds to the Wave 2b sub-sample (approximately two-thirds of the total CAPS young adult sample), and the weight w2y_weightyr corresponds to the combined "total" Wave 2 sample. All of these weights are individual young adult weights that add an adjustment for individual young adult non-response in Wave 2a, 2b or 2 "total" to the weight weightyr, which adjusts for the sample design and Wave 1 non-response.
Similarly, the weights w3y_weightyr, w4y_weightyr and w5y_weightyr are individual young adult weights that add an adjustment for individual young adult non-response in Waves 3, 4 and 5 to the weight weightyr. The adjustment for Wave 2a, Wave 2b, Wave 2 "total", Wave 3, Wave 4 or Wave 5 young adult non-response is made by estimating separate probit models of the probability that the respondent completed a Wave 2a, Wave 2b, either Wave 2, Wave 3, Wave 4 or Wave 5 young adult questionnaire. Information given in Wave 1 on age, sex and population group was included in the model. As in the construction of the original weight weightyr, the small number of individuals classified as Indian and other were merged with the Coloured group. From the estimation, the predicted probability was inverted and then capped at the 99th percentile to obtain the non-response adjustment.
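The final step, inverting the predicted probability and capping it, can be sketched as follows. The sketch assumes the predicted response probabilities have already been obtained from a fitted probit model, that the cap is applied to the inverted probabilities, and that the resulting factor multiplies weightyr. The percentile rule used here (linear interpolation between order statistics) is also an assumption, and the function names are illustrative only.

```python
def percentile(values, q):
    """q-th percentile (0-100), interpolating linearly between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence")
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def attrition_adjusted_weights(base_weights, response_probs, cap_q=99):
    """Multiply each base weight by the capped inverse response probability."""
    inverse = [1.0 / p for p in response_probs]
    cap = percentile(inverse, cap_q)
    factors = [min(f, cap) for f in inverse]
    return [w * f for w, f in zip(base_weights, factors)]


# Invented example: 99 respondents with probability 0.5 and one with 0.01.
probs = [0.5] * 99 + [0.01]
base = [1.0] * 100
wave_weights = attrition_adjusted_weights(base, probs)
```

Capping limits the influence of respondents with very low predicted response probabilities, whose uncapped adjustment factors (100 in this example) would otherwise dominate weighted estimates.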
Dates of Data Collection
Data Collection Mode
• Wave 1 (2002) included a household questionnaire, a young adult questionnaire and a literacy and numeracy evaluation questionnaire
• Wave 2a (2003) and 2b (2004) both included young adult questionnaires only
• Wave 3 (2005) included a household questionnaire, a parent questionnaire and a young adult questionnaire
• Wave 4 (2006) included a household questionnaire, an older adult questionnaire, a young adult questionnaire, a young adult proxy questionnaire and a child questionnaire
• Wave 5 (2009) included a young adult questionnaire, young adult telephonic questionnaire and a young adult proxy questionnaire
The questionnaires and technical documentation for use with the secure version of CAPS 2002-2009 should be downloaded from the link to the public access dataset.
University of Cape Town and University of Michigan. Cape Area Panel Study 2002-2009, Waves 1-5 Secure Data [dataset]. Version 1. Cape Town and Ann Arbor: University of Cape Town and University of Michigan [producers], 2015. Cape Town: DataFirst [distributor], 2015. DOI: https://doi.org/10.25828/8y6d-1153