Census 2001: 10% Sample of unit records (Version 1)

 

Thank you for choosing the Census 2001 10% sample of unit records.

We hope that you will find this product useful.

 

1.            GENERAL INFORMATION

 

This file sets out the information that you will need to access the data provided.

 

To process and analyse the data, users need software that can handle very large datasets. Users may contact Stats SA for advice on suitable software if necessary.

 

The files essential for accessing the data are provided in text format. Other files with relevant information are provided in Word 2000 format. If you do not have software that can read this format, please contact Stats SA and paper copies will be sent to you. This and additional documentation is also available on the Stats SA website at www.statssa.gov.za.

 

 

2.            CONTENTS OF CDs

           

2.1        Data

 

This directory contains the data files in zipped format:

 

·         Households.zip

·         Persons.zip

·         Mortality.zip

·         Household imputation flags.zip 

·         Person imputation flags.zip

·         Geography.zip

 

2.2               Metadata

 

This directory contains the metadata for all the variables, in the following files:

 

·         Introduction.doc

·         Households.doc

·         Persons.doc

·         Mortality.doc

·         Imputation flags for households and persons.doc

·         Geography.doc

 

2.3               Code lists

 

This directory contains code lists for:

 

·         Country of birth and citizenship

·         Religion

·         Occupation

·         Industry

 

All code lists are contained in the metadata on the Stats SA website; the four code lists above are also supplied separately for the convenience of users.

 

Users of the variables on migration and place of work should consult the main-place or sub-place code list on the website.

 

2.4               Questionnaires

 

This directory contains the following questionnaires in .pdf format:

 

·         Questionnaire A (for persons in households)

·         Questionnaire B (for persons in institutions)

·         Questionnaire C (for institutions)

 

2.5               Record layouts

 

This directory contains the record layouts of the files described in 2.1 above.

 

2.6               Definitions

 

This directory contains the concepts and definitions used in the data.

 

2.7               Adjustment factors

 

This directory contains an Excel file with four worksheets showing the adjustment factors for persons and households at municipal and provincial level. These factors can be used to calculate the universe (the full population of persons or households).

 

On request, Stats SA can calculate standard errors (SE) for each variable.

 

 

3.            REQUIREMENTS

 

The following minimum hard drive space is required:

 

·         Data: 122 MB

·         Metadata: 12 MB

·         Other: 1 MB

 

 

 

4.            DESIGN OF THE SAMPLE

 

This sample is a 10% unit-level sample drawn from Census 2001 as follows:

 

4.1            Households

 

·         A 10% sample of households in housing units, and

·         A 10% sample of collective living quarters (both institutional and non-institutional) and the homeless.

 

4.2            Persons

 

·         A sample consisting of all persons in the households and collective living quarters, and the homeless, drawn in the samples described in 4.1 above.

 

4.3            Mortality

 

·         A sample consisting of all mortality information for the households in housing units drawn in the 10% sample of households.

 

 

5.            WEIGHTING FACTORS

 

Both the 10% household and person sample files contain a weight variable. This weight is the undercount adjustment factor (for households or persons, as appropriate) multiplied by 10, so that the 10% samples are inflated to the relevant population. In the person records, aggregated totals for sparsely populated codes, such as very old ages, may differ substantially from the true totals owing to sampling fluctuations; the weights were not scaled to correct for this. In the household records, aggregated totals will be approximately equal to the true totals. Mortality was not adjusted for undercount, so the Mortality file has no weight variable.
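As a minimal sketch of how the weights are meant to be used, the Python below builds each weight from an undercount adjustment factor and then estimates population totals by summing weights within categories. The field names `age_group` and `weight` and all values are hypothetical; the real variable names are given in the record layouts.

```python
from collections import defaultdict


def weight_from_factor(adjustment_factor):
    # Weight = undercount adjustment factor multiplied by 10,
    # i.e. by the inverse of the 10% sampling fraction.
    return adjustment_factor * 10.0


def weighted_totals(records, key):
    # Sum the weights per category to estimate population totals.
    totals = defaultdict(float)
    for rec in records:
        totals[key(rec)] += rec["weight"]
    return dict(totals)


# Hypothetical person records; field names and factors are illustrative only.
persons = [
    {"age_group": "0-4", "weight": weight_from_factor(1.12)},
    {"age_group": "0-4", "weight": weight_from_factor(1.08)},
    {"age_group": "85+", "weight": weight_from_factor(1.05)},
]

print(weighted_totals(persons, key=lambda r: r["age_group"]))
```

In this toy example the single `85+` record stands for roughly ten people, which illustrates why weighted totals for sparsely populated codes can fluctuate.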

 

 

6.            STRATIFICATION AND ORDERING OF THE RECORDS

 

The census household records were implicitly stratified by municipality, geographic type and EA number, where the EA number is the unique eight-digit census Enumerator Area number.

 

The following geographic types were used:

 

·         Urban formal

·         Urban informal

·         Tribal

·         Rural formal
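Implicit stratification is commonly implemented by sorting the frame on the stratification variables and then drawing a systematic sample (a random start, then every k-th record). This section does not state how the 10% was actually selected, so the Python below is only a hypothetical sketch under that assumption; the field names and codes are made up.

```python
import random


def implicit_stratified_sample(records, keys, k=10, seed=None):
    # Sort the frame on the stratification variables (here municipality,
    # geographic type and EA number) so that similar records are adjacent.
    ordered = sorted(records, key=lambda r: tuple(r[name] for name in keys))
    # ASSUMPTION: systematic selection, used here for illustration only.
    # Pick a random start in [0, k) and then take every k-th record.
    rng = random.Random(seed)
    start = rng.randrange(k)
    return ordered[start::k]


# Hypothetical household frame with illustrative field names and codes.
frame = [
    {"serial": i, "municipality": 100 + i % 3, "geotype": i % 4, "ea": 10000000 + i % 7}
    for i in range(100)
]
sample = implicit_stratified_sample(frame, ["municipality", "geotype", "ea"], seed=1)
print(len(sample))  # 10
```

Sorting first spreads the sample across every municipality, geographic type and EA in proportion to their size, which is the point of stratifying implicitly.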

 

 

7.            VARIABLES INCLUDED IN THE 10% SAMPLE

 

All questionnaire variables are included in the 10% sample, together with derived variables and imputation flags.

 

EA numbers are excluded to preserve confidentiality.

 

Geographic type is excluded from the final sample. Instead, two additional geographical variables are supplied:

 

·         Urban and rural – Census ’96 classification

·         Size and density of locality

 

 

8.            GEOGRAPHY

 

The geographical structure of the 10% sample consists of the following entities, arranged in hierarchical levels:

 

South Africa

Province

District council (DC - Category C) or Metropolitan area (Category A)

Magisterial district (MD)

Local municipality (Category B), or District management area (DMA)

 

Although the structure is intended to be hierarchical, South Africa’s geography includes cross-boundary entities that complicate the picture; for example, eight municipalities straddle provincial boundaries. Users should bear this in mind when choosing the appropriate hierarchy: for the City of Tshwane, which lies in two provinces, the provincial hierarchy would not be used.

 

Because of these cross-boundary entities, there are five distinct geographical hierarchies.
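To see which units do not nest cleanly within a given hierarchy, one can check whether a unit appears under more than one parent area. The Python below is a minimal sketch that assumes hypothetical field names (`municipality`, `province`) and made-up codes; the actual variables are documented in Geography.doc.

```python
from collections import defaultdict


def cross_boundary_units(records, unit_key, parent_key):
    # Collect the set of parent areas (e.g. provinces) each unit appears in.
    parents = defaultdict(set)
    for rec in records:
        parents[rec[unit_key]].add(rec[parent_key])
    # Units found under more than one parent do not nest in that hierarchy.
    return {unit: sorted(ps) for unit, ps in parents.items() if len(ps) > 1}


# Hypothetical geography records; codes are illustrative only.
geography = [
    {"serial": 1, "municipality": "M1", "province": "P1"},
    {"serial": 2, "municipality": "M1", "province": "P2"},
    {"serial": 3, "municipality": "M2", "province": "P1"},
]
print(cross_boundary_units(geography, "municipality", "province"))
# {'M1': ['P1', 'P2']}
```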

 

 

9.            MERGING THE DATASETS

 

Number of records in the datasets:

 

·         Households: 948 592

·         Persons: 3 725 655

·         Mortality: 36 267

·         Household imputation flags: 948 592

·         Person imputation flags: 3 725 655

·         Geography: 948 592

 

 

Serial number is common to all the files listed in 2.1 above. Together with Person number, it can be used to merge the Persons file with the Person imputation flags file.
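A minimal sketch of this merge in Python, assuming hypothetical field names `serial` and `person_no` for Serial number and Person number (see the record layouts for the real names). Because the Persons and Person imputation flags files each contain 3 725 655 records, a clean one-to-one merge should return the same number of records; the sketch raises an error on duplicate or unmatched keys rather than silently dropping records.

```python
def merge_one_to_one(left, right, keys):
    """Merge two lists of records one-to-one on a composite key."""
    # Index the right-hand file on the composite key, rejecting duplicates.
    index = {}
    for rec in right:
        key = tuple(rec[name] for name in keys)
        if key in index:
            raise ValueError(f"duplicate key in right-hand file: {key}")
        index[key] = rec
    merged = []
    for rec in left:
        key = tuple(rec[name] for name in keys)
        if key not in index:
            raise KeyError(f"no matching record for key: {key}")
        # Combine the fields; right-hand values win on name clashes.
        merged.append({**rec, **index[key]})
    return merged


# Hypothetical records: field names and values are illustrative only.
persons = [
    {"serial": 1, "person_no": 1, "age": 34},
    {"serial": 1, "person_no": 2, "age": 5},
]
person_flags = [
    {"serial": 1, "person_no": 2, "age_imputed": 1},
    {"serial": 1, "person_no": 1, "age_imputed": 0},
]

merged = merge_one_to_one(persons, person_flags, keys=("serial", "person_no"))
print(len(merged) == len(persons))  # True
```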

 

Serial number can also be used to link any of these files to the different geographical hierarchies in Geography.zip.
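The Geography file has the same number of records as the Households file (948 592), which suggests one geography record per serial number; under that assumption, person records can be linked to it many-to-one. The Python below is a minimal sketch with hypothetical field names (`serial`, `person_no`, `province`, `municipality`) and made-up codes.

```python
def attach_geography(records, geography, serial_key="serial"):
    """Attach geography fields to each record via its serial number (many-to-one)."""
    # ASSUMPTION: each serial number occurs at most once in the geography file.
    geo_by_serial = {}
    for geo in geography:
        if geo[serial_key] in geo_by_serial:
            raise ValueError(f"duplicate serial number in geography: {geo[serial_key]}")
        geo_by_serial[geo[serial_key]] = geo
    out = []
    for rec in records:
        geo = geo_by_serial.get(rec[serial_key])
        if geo is None:
            raise KeyError(f"serial number not found in geography: {rec[serial_key]}")
        out.append({**rec, **geo})
    return out


# Hypothetical records; field names and codes are illustrative only.
geography = [
    {"serial": 1, "province": "P1", "municipality": "M1"},
    {"serial": 2, "province": "P2", "municipality": "M2"},
]
persons = [
    {"serial": 1, "person_no": 1},
    {"serial": 1, "person_no": 2},
    {"serial": 2, "person_no": 1},
]
linked = attach_geography(persons, geography)
print([(r["serial"], r["person_no"], r["municipality"]) for r in linked])
# [(1, 1, 'M1'), (1, 2, 'M1'), (2, 1, 'M2')]
```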

 

The variable Type of living quarters (comprehensive) is included in both the Households and Persons files to assist with the analysis of the data.

 

 

10.            INTERPRETING THE DATA

 

10.1            Confidentiality

 

To preserve confidentiality, the lowest geographical level to which unit records can be linked is the municipality.

 

As a further safeguard for confidentiality, municipalities with 200 or fewer households are logically grouped with adjacent municipalities.

 

The following municipalities are grouped:

 

Municipality code      Grouped with

193                    114

292                    218

491                    415

591                    592

691                    605
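Users combining the sample with other municipal-level data (such as published census tables) may need to apply the same grouping. This section does not say whether such sources already use the grouped codes, so the Python below is only a sketch: it maps each code in the table above to the code it is grouped with and re-aggregates totals. The input totals are made-up numbers.

```python
from collections import Counter

# Grouping of municipality codes, copied from the table above.
GROUPED_MUNICIPALITIES = {193: 114, 292: 218, 491: 415, 591: 592, 691: 605}


def reporting_code(code):
    """Return the code under which a municipality is reported (itself if ungrouped)."""
    return GROUPED_MUNICIPALITIES.get(code, code)


def regroup_totals(totals_by_code):
    """Re-aggregate municipal totals so that grouped municipalities are combined."""
    out = Counter()
    for code, value in totals_by_code.items():
        out[reporting_code(code)] += value
    return dict(out)


# Hypothetical household totals keyed by municipality code.
print(regroup_totals({114: 5000, 193: 150, 605: 800}))
# {114: 5150, 605: 800}
```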

 

10.2      Extract from the Report of the Census Sub-Committee to the South African Statistics Council on Census 2001

 

“Preliminary investigations indicate that the 2001 census probably resulted in:

·         an underestimate of the number of children below age five*

·         an overestimate of the number of teenagers aged between 10 and 20

·         an underestimate of the number of men relative to the number of women*

·         an underestimate of the number in the white population

·         higher-than-expected numbers of people aged 80 and older in the African population

·         an underestimate of the number of foreign-born, since some identified themselves incorrectly as being South African-born

·         age misstatement in the range 60-74

·         an overestimate of the extent of unemployment

·         an underestimate of those who were employed for only a few hours per week

·         an underestimate of household income

·         an overestimate of the number of paternal orphans and the number of fathers missing from the household.

 

* This is a common feature of censuses, particularly in developing countries.

 

In addition:

·         Scanning problems caused some births to be recorded in the wrong province. The number of cases is relatively small and should not lead to too much distortion for most purposes for which these data are used; however, it does produce obviously erroneous results when one tries to estimate the extent of inter-provincial migration of those born since the previous census.

·         The fertility data (numbers of children ever born, children surviving) are problematic.”

 

For further details of these investigations, see the full report of the Census Sub-Committee.

 

 

11.            COPYRIGHT NOTICE AND DISCLAIMER

 

© Copyright, Statistics South Africa, 2003.

 

The information products and services of Stats SA are protected in terms of the Copyright Act, 1978 (Act 98 of 1978). As the State President is the holder of State copyright, all organs of State enjoy unhindered use of the Department’s information products and services, without a need for further permission to copy in terms of that copyright.

 

Where a copy of the information is made available to any third party outside the State, the third party must be made aware of the existence of State copyright and ownership of the information by the State.

 

The State (through Statistics SA) retains the full ownership of its information, products and services at all times; access to information does not give ownership of the information to the client. The use of any data is subject to acknowledgement of Stats SA as the supplier and owner of copyright.

 

Statistics South Africa (Stats SA) will not be liable for any damages or losses, except to the extent that such losses or damages are attributable to a breach by Stats SA of its obligations in terms of an existing agreement or to the negligence or willful acts or omissions of Stats SA, its servants or agents, arising out of the supply of data and/or digital products in terms of that agreement. The user indemnifies Stats SA against any claims of whatsoever nature (including legal costs) by third parties arising from the reformatting, restructuring, reprocessing and/or addition of the data by the user.

 

The data were gathered in October 2001. Since then, there have been demographic changes in South Africa associated, inter alia, with internal and external migration, and population growth. This means that population profiles may have changed at differing geographic levels. Stats SA is not responsible for any damages or losses, arising directly or consequently, which might result from the application or use of these data.

 

 

12.            CONTACT DETAILS

 

Please do not hesitate to contact Stats SA User Information Services for additional information or queries:

 

Tel:                   +27 (12) 310-8600

Fax:                 +27 (12) 310-8500

E-mail:              info@statssa.gov.za

Stats SA website: www.statssa.gov.za