Survey ID Number
zaf-statssa-lfs-2000-sep-v2.1
Title
Labour Force Survey 2000, September
Version Notes
The South African September 2000 LFS dataset was originally released in 2001 as 4 data files (household, worker, person and stratum\_psu). DataFirst downloaded a second version from the Statistics South Africa website in March 2006. This version differed slightly from the original release. Most notably, weights were recast to reflect population estimates released in February 2005. This version was also benchmarked to the 2001 South African census (the original release had been benchmarked to the 1996 South African census). As a result, the weight variables in each data file differ between versions 1.0 and 2.0. The second version (version 2.0) also has several extra observations, the source of which is unclear. Specifically:
1) 31 extra observations are in the household data file
2) 1 extra observation is in the person data file
3) 18 extra observations are in the worker data file
A third version (version 2.1) was downloaded by DataFirst on 11 August 2011 as 3 data files (the stratum\_psu data file, originally released separately, was subsumed into the other three data files). It differs slightly from version 2.0 in the following ways:
1) The suffix "\_Sep2000" no longer appears on all variable names
2) Year and Month variables were added
3) Variable labels were altered. Previously, all variable labels were literal questions. Now the variable labels describe the variables.
4) A number of variables were renamed (beyond dropping the suffix "\_Sep2000"). For example, Q710Lght\_Sep2000 in version 2.0 is now Q710Ligh in version 2.1. This could prove confusing when comparing current and previous versions of the data file(s).
5) Version 2.1 and 2.0 also have some substantive differences:
The most significant of these is an apparent switch of variable names and labels in the household/general data files. Question 7.25 in the LFS Household questionnaire pertains to proximity to transport. Version 1.1 has entries for question 7.25a that relate to respondents' proximity to trains, but in version 2.0 this variable relates to proximity to taxis. The converse is true for question 7.25b. In short, the data on respondents' proximity to trains and to taxis have been swapped in version 2.0. This mistake appears to be confined to version 2.0, as version 2.1 agrees with the original (version 1.0).
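The naming differences in items 1 and 4, together with the train/taxi switch described above, can be handled with a small name-mapping step before comparing files. The following is a minimal sketch in Python, not DataFirst's or Statistics South Africa's procedure. Only the pair Q710Lght/Q710Ligh comes from these notes; the names `Q725a` and `Q725b` are hypothetical placeholders for the question 7.25a and 7.25b variables and must be replaced with the actual names in the data file.

```python
SUFFIX = "_Sep2000"

# Renames beyond the dropped suffix. Only this pair is documented in the
# version notes; extend the mapping after comparing the two files.
MANUAL_RENAMES = {"Q710Lght": "Q710Ligh"}

# HYPOTHETICAL names for questions 7.25a and 7.25b (assumption, not from
# the data file). Version 2.0 appears to have these two switched.
SWAPPED_PAIRS = [("Q725a", "Q725b")]


def harmonise_name(v20_name):
    """Map a version 2.0 variable name to its assumed version 2.1 name."""
    # Step 1: drop the "_Sep2000" suffix if present.
    if v20_name.endswith(SUFFIX):
        base = v20_name[: -len(SUFFIX)]
    else:
        base = v20_name
    # Step 2: apply documented renames.
    base = MANUAL_RENAMES.get(base, base)
    # Step 3: undo the apparent switch within each swapped pair.
    for first, second in SWAPPED_PAIRS:
        if base == first:
            return second
        if base == second:
            return first
    return base


if __name__ == "__main__":
    for name in ["Q710Lght_Sep2000", "Q725a_Sep2000", "Q725b_Sep2000"]:
        print(name, "->", harmonise_name(name))
```

Keeping the three steps separate makes it easy to disable the swap correction if inspection of the data shows that only the labels, and not the names, were switched.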
Furthermore, the variable reflecting the way in which the household receives mail (in the household data file) has no value labels in version 2.0, and its values do not match those in version 2.1. Version 2.1 aggregates the values of version 2.0 in groups of 10. For example, version 2.0 contains the values 11, 12 and 19, which are grouped into a single decade in version 2.1, so entries of 11, 12 and 19 all take on the value 1 in the later version. In version 2.0, the number of observations with values in the decade 10 to 19 equals the number of observations equal to 1 in version 2.1 (value label: "Delivered to the dwelling"). The source of the finer distinction in version 2.0 is unclear (it may be arbitrary, for example an artifact of ASCII to Stata conversion).
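The grouping into decades described above amounts to integer division by 10, and the claim that counts agree between versions can be checked by tabulating the collapsed codes. The sketch below illustrates this in Python under that assumption; the function names are illustrative and the sample codes are invented for demonstration, not taken from the data files.

```python
from collections import Counter


def collapse_to_decade(code):
    """Collapse a version 2.0 mail code to its decade, e.g. 11, 12, 19 -> 1."""
    return code // 10


def decade_counts(v20_codes):
    """Tabulate version 2.0 codes after collapsing them to decades."""
    return Counter(collapse_to_decade(code) for code in v20_codes)


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    v20_sample = [11, 12, 19, 11, 21, 29]
    v21_sample = [1, 1, 1, 1, 2, 2]
    # If version 2.1 is a decade aggregation of version 2.0, the two
    # tabulations should be identical.
    print(decade_counts(v20_sample) == Counter(v21_sample))
```

Applied to the real mail variable from each version, a mismatch in any decade would indicate that the two versions differ by more than this aggregation.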