The Community Survey is a nationally representative, large-scale household survey which is designed to provide information on the extent of poor households in South Africa, their access to services, and levels of unemployment, at national, provincial and municipal levels. The main objectives of the survey are:
1. To fill data gaps between national population and housing censuses
2. To provide estimates at lower geographical levels than existing household surveys
3. To build capacities for the next census round
4. To provide inputs to the mid-year population projections.
Kind of Data
Sample survey data [ssd]
Unit of Analysis
v1: Edited, anonymised dataset for public distribution.
The scope of the Community Survey includes:
Particulars of dwelling
Location and description of dwelling unit. This section was completed by the interviewer.
Questions on demographics, migration, general health and functioning, parental survival, education, employment, income and social grants, and fertility. This section was completed for all household members and visitors who were present on the night of 6/7 March 2016.
Housing, household goods, services and crime, and agricultural activities
Perception questions on satisfaction with basic service, questions on housing, household goods and services, crime, agricultural activities and food security
Mortality: Questions on sex, age, year and month of death and maternal mortality for each member of the household who passed away 12 months prior to the reference night of the survey
However, not all the data collected has been published in the 2017 release of the Community Survey 2016 dataset. The release was meant to comprise four data files: households, persons, mortality, and emigration. The emigration data collected included sex, age, country of residence and year moved for each member of the household who has emigrated to another country since March 2006 and is still residing there. The emigration file is currently not available, and Statistics SA has not provided an explanation for the missing file.
The Community Survey 2016 is also missing employment and income data. Data on employment type and employment status were collected with questions 3.7.6 - 18.104.22.168 of the questionnaire. Income data were collected with questions 3.7.7. - 22.214.171.124. According to Statistics SA, the data from these questions were not released because changes in collection methodologies meant they were not comparable with the employment and income data in the Quarterly Labour Force Survey.
morbidity and mortality [14.4]
social conditions and indicators [13.8]
economic conditions and indicators [1.2]
health care and medical treatment [8.5]
specific social services: use and provision [15.3]
The survey covered the whole of South Africa.
The lowest level of geographic aggregation of the data is local municipality.
The Community Survey covered all de jure household members (usual residents) in South Africa. The survey excluded collective living quarters (institutions) and some households in EAs classified as recreational areas or institutions.
Producers and sponsors
Statistics South Africa
The sampling procedure adopted for the Community Survey was a two-stage stratified random sampling process. Stage one involved the selection of enumeration areas (EAs), and stage two was the selection of dwelling units (DUs). Since the data are required for each local municipality, each municipality was treated as an explicit stratum. The stratification was done for those municipalities classified as category B municipalities (local municipalities) and category A municipalities (metropolitan areas) as proclaimed at the time of Census 2001. However, the newly proclaimed boundaries, as well as any higher level of geography such as province or district municipality, were treated like any other domain variable, based on their link to the smallest geographic unit, the enumeration area.
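The two-stage design described above can be illustrated with a minimal Python sketch. It is not Statistics South Africa's procedure: the frame layout, the function name `two_stage_sample`, the fixed sample sizes and the use of simple random selection at both stages are all assumptions made for illustration, since the selection probabilities are documented in the technical report rather than here.

```python
import random
from collections import defaultdict


def two_stage_sample(frame, eas_per_stratum, dus_per_ea, seed=0):
    """Draw a two-stage stratified sample from a hypothetical frame.

    frame: iterable of (municipality, ea_id, du_id) tuples (assumed layout).
    Each municipality is treated as an explicit stratum.
    Simple random selection at both stages is an assumption of this sketch.
    """
    rng = random.Random(seed)

    # Group dwelling units by stratum (municipality) and enumeration area.
    strata = defaultdict(lambda: defaultdict(list))
    for municipality, ea_id, du_id in frame:
        strata[municipality][ea_id].append(du_id)

    sample = []
    for municipality in sorted(strata):
        eas = strata[municipality]
        # Stage one: select enumeration areas within the stratum.
        chosen_eas = rng.sample(sorted(eas), min(eas_per_stratum, len(eas)))
        for ea_id in chosen_eas:
            dus = eas[ea_id]
            # Stage two: select dwelling units within each selected EA.
            chosen_dus = rng.sample(sorted(dus), min(dus_per_ea, len(dus)))
            sample.extend((municipality, ea_id, du) for du in chosen_dus)
    return sample
```

Because every municipality is its own stratum, each one is guaranteed to contribute sampled dwelling units, which is what allows direct estimates at municipal level.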
The main goal of CS 2016 is to produce estimates of key indicators at local municipality level. The sample was designed such that direct survey estimates for these indicators could be produced at municipal level. The weighting approach is based on the sample design. Information on weighting can be found in the technical report.
Dates of Data Collection
Data Collection Mode
Statistics South Africa
The CS 2016 questionnaire consisted of six main sections, 11 sub-sections and a total of 225 questions. A first draft of the paper questionnaire was developed in February 2015 and various versions were reviewed and updated thereafter based on discussions with stakeholders. The target population of the survey was all persons in the sampled dwelling who were present on the reference night (i.e. the night between 6 and 7 March 2016). The final CAPI questionnaire was made up of three person rosters. One roster was utilised for the person information, one roster for emigration and one roster for mortality.
Data processing refers to a class of programmes that organise and manipulate usually large amounts of numeric data. For this survey, it covered the handling of completed questionnaires: information received from questionnaires collected during fieldwork was converted into data represented by numbers or characters. The two methods used for this conversion were manual capturing (key-entry) and scanning. Scanning was the main method, and the key-entry application was used for questionnaires that were damaged and could not be scanned.
In general, the high-level processes covered the following activities:
Boxes were received and questionnaires were checked to ensure that:
1) they belonged to the box; and
2) were not damaged.
Data were then captured and converted into electronic format through scanning or Key-from-Paper (KFP). Thereafter, an account of all sampled dwelling units was prepared and data were balanced to verify whether the data collected for each household contained the four sections – General, Persons, Mortality, and Household.
Data were then checked for consistency and prepared for final output based on the tabulation plan.
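The balancing step in the process above, which verifies that each sampled dwelling unit has data for all four sections, can be sketched as follows. This is an illustrative sketch only: the function name `find_unbalanced` and the dictionary-based record layout are assumptions, not the structure of the actual processing system.

```python
# Section names as listed in the text; the data layout below is hypothetical.
REQUIRED_SECTIONS = ("General", "Persons", "Mortality", "Household")


def find_unbalanced(captured, sampled_dus):
    """Return {du_id: [missing sections]} for sampled dwelling units with gaps.

    captured: dict mapping du_id -> set of section names that were captured.
    sampled_dus: iterable of all dwelling unit ids in the sample.
    A sampled unit with no captured data at all is reported as missing
    every section.
    """
    problems = {}
    for du_id in sampled_dus:
        present = captured.get(du_id, set())
        missing = [s for s in REQUIRED_SECTIONS if s not in present]
        if missing:
            problems[du_id] = missing
    return problems
```

Iterating over the sampled dwelling units, rather than over the captured records, is what makes this an account of the whole sample: units that never arrived are flagged as well as units that arrived incomplete.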
Two methods were used for capturing the data, namely scanning and manual capturing (key-entry).
The scanning process proceeded as follows:
The data processor scanned the box number, and then entered the estimated number of pages in each batch. At this stage, the batches were ready to be scanned. One box at a time was given to each of the six Scanning Operators to avoid scanning the questionnaires twice. The batches were then taken out of the box and placed next to the tray on the scanner. The box number was then scanned using the small hand-held scanner and the number of pages per batch was entered into the Input Station.
A visual check was performed on the scanning to ensure that the images were clear of any noise and that the data were clear and readable. The barcode as well as the actual data on the questionnaire was checked. In the case where the image was either too light or too dark, parameters were adjusted and the batch was rescanned. Validations were automatically executed to confirm scanning parameters and image quality. Questionnaires that could not be scanned were de-activated from their boxes and assigned to a new box. Images were transferred to the server and their barcodes were tracked. These questionnaires were then sent to Key-from-Paper.
Manual Capturing (Key-from-Paper)
Key-from-Paper (KFP) is an application for manual data capturing. The application was developed to capture questionnaires that were not suitable for scanning. Such questionnaires included those that were torn, those where pencil entries were too faint for the scanner to interpret, and those in otherwise bad condition. A duplicate application was created for quality assurance purposes: the questionnaires captured in application one were also captured in application two, and the two captures of each questionnaire were compared field by field. Validation checks were not implemented in the applications; data processors used them to capture information exactly as it was reflected on the questionnaires. EA and DU numbers were checked against a look-up table to validate them against the sample frame. In cases where an EA or DU was found to be invalid, the EA Summary Book was used for corrections.
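The double-capture comparison described above can be sketched in Python as a field-by-field comparison of two independent key-entry passes. The function name `compare_double_entry` and the nested-dictionary layout are hypothetical; the source does not describe how the KFP applications stored or compared their data.

```python
def compare_double_entry(first_pass, second_pass):
    """Compare two independent captures of the same questionnaires.

    Each pass is a dict: questionnaire_id -> {field_name: captured_value}
    (an assumed layout). Returns a list of
    (questionnaire_id, field_name, value_in_first, value_in_second)
    for every field where the two captures disagree. A questionnaire or
    field missing from one pass is reported with None for that pass.
    """
    mismatches = []
    for qid in sorted(set(first_pass) | set(second_pass)):
        rec1 = first_pass.get(qid, {})
        rec2 = second_pass.get(qid, {})
        for field in sorted(set(rec1) | set(rec2)):
            v1, v2 = rec1.get(field), rec2.get(field)
            if v1 != v2:
                mismatches.append((qid, field, v1, v2))
    return mismatches
```

Since the applications themselves ran no validation checks, a comparison of this kind only detects keying disagreements between the two passes; it cannot catch an error that both operators reproduced from the paper form.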
Coding of Open-Ended Questions
Coding is the process of assigning numerical values to responses to facilitate data capturing and processing in general. The variables coded were occupation, industry, and place names. The code lists for occupation and industry were based on the International Standard Classifications, applied to the five-digit level.
University of Cape Town
Public use files, accessible to all.
Statistics South Africa. Community Survey 2016 [dataset]. Version 1. Pretoria: Statistics South Africa [producer], 2017. Cape Town: DataFirst [distributor], 2017. 10.25828/12sy-yj26