
Chapter 5

Final statistical adjustments for undercount in Census ’96

Estimates of the final undercount and the adjustment factors could not be based solely on the matching process, because a proportion of cases remained unresolved (78% of cases were successfully matched). Nor could they rely solely on the respondent saying whether or not a particular person had been counted, because respondents may not have known where each person was on census night, or may have forgotten. Instead, the estimates relied on a combination of these methods, as described below. This chapter is very technical.

Imputation

A brief description of the way in which unresolved cases were handled to arrive at adjustment factors, and an outline of the way in which these adjustment factors were calculated, are given below. For a more detailed description, the reader is referred to Calculating the undercount in Census ’96 (Statistics South Africa, 1998).

Each person in the PES was allocated a probability of having been counted. Some people were found to be definitely enumerated during the matching process (probability of 1), whilst others were clearly missed (probability of 0). Among the unresolved cases, however, a probability of having been enumerated or missed had to be calculated.

The characteristics of the resolved cases (those definitely enumerated or definitely missed, but excluding those who said they were counted elsewhere) were used to impute, for each unresolved case, the probability of having been counted. The statistical technique selected, Chi-squared Automatic Interaction Detection (CHAID), was applied province by province by a local Stats SA consultant, Professor D.J. Stoker. This multivariate technique analyses the relationships between a categorical dependent variable (in this case, whether or not the person was actually counted) and a set of categorical independent variables (as listed below).

The predictors (independent variables) in each CHAID were:

  • Whether or not the respondent said the person had been counted.

  • EA type (formal or informal urban areas, tribal areas, commercial farms or other rural areas).

  • Population group.

  • Gender.

  • Age group, grouped in such a way as to minimise the effects of age heaping.

  • Household size.

The predictors in the CHAID analysis create a number of hierarchical branches, depending on the strength of their predictive value, and the interactions between them. The best or most significant predictor of whether or not a person had actually been counted (as confirmed in the matching process), across all provinces, was whether or not the respondent said they had been counted. For example, in Eastern Cape, 92% of those cases where respondents said they were counted had actually been matched.
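The idea of ranking predictors by the strength of their association with the outcome can be illustrated with a minimal sketch. It computes a plain Pearson chi-square statistic for each predictor and picks the largest. This is a simplification and not the procedure used by Stats SA: a full CHAID also merges categories and compares adjusted significance levels. The counts and the helper name chi_square are made up for illustration.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if predictor and outcome were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Made-up counts of resolved cases (assumption, not census data).
# Rows: categories of the predictor; columns: [counted, missed].
tables = {
    "said_counted": [[92, 8], [30, 70]],
    "gender": [[60, 40], [62, 38]],
}

scores = {name: chi_square(table) for name, table in tables.items()}
# The predictor with the largest statistic would form the first split.
best = max(scores, key=scores.get)
```

In this toy data the respondent's statement separates counted from missed cases far more sharply than gender does, so it would be chosen for the first branch.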

Further predictors varied by province. The dendrograms produced by the CHAID analysis showed different splits and branches. For example, in Eastern Cape, amongst Africans and coloureds living in two-person households who said they were counted, the probability of having actually been counted was 95%. Also in Eastern Cape, amongst those living in informal settlements in households containing three to five people who said they were counted, the probability of having actually been counted was 88% (all outputs of the CHAID analyses are available from Stats SA on request).

Once the CHAID analysis was completed, it was possible to impute a probability of having been counted among the unresolved cases.
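The imputation step can be sketched as follows, assuming the final groups of the tree are already known. Each unresolved case receives the share of resolved cases in its group that were actually counted. The records, the two grouping variables and the function names are hypothetical; the actual groups came from the CHAID output for each province.

```python
from collections import defaultdict

# Hypothetical resolved PES cases: (said_counted, household_size_band, matched).
# matched = 1 if definitely enumerated, 0 if definitely missed.
resolved = [
    (True, "2", 1), (True, "2", 1), (True, "2", 1), (True, "2", 0),
    (False, "2", 0), (False, "2", 1),
    (True, "3-5", 1), (True, "3-5", 0),
]

def cell_rates(cases):
    """Share of resolved cases that were actually counted, per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [counted, all cases]
    for said_counted, size_band, matched in cases:
        group = (said_counted, size_band)
        totals[group][0] += matched
        totals[group][1] += 1
    return {group: counted / n for group, (counted, n) in totals.items()}

def impute(group, rates):
    """Probability of having been counted for an unresolved case in `group`."""
    return rates[group]

rates = cell_rates(resolved)
# An unresolved person in a two-person household whose respondent said
# they were counted: 3 of the 4 matching resolved cases were counted.
p_unresolved = impute((True, "2"), rates)
```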

The possibility of overcount and of people being missed in both the PES and the census

In any census, some people are counted more than once. For example, if they were visiting a relative on census night, they could have been counted at the place where they were visiting as well as at the place where they usually live.

In Census ’96, a number of people who said in the PES that they were counted elsewhere were actually counted (as indicated in the matching process) at the PES address. These people may have been counted more than once. Allowance was made in the calculations for the possibility that these people were counted twice: the probability that they were counted elsewhere (imputed using the statistical method described in the previous section) was added to their probability of having been counted at the PES address (1 was added if they were matched at the PES dwelling). Thus their probability of having been counted exceeded 1.
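The arithmetic of this allowance can be shown in a short sketch. The function name and the example probability of 0.3 are illustrative assumptions. Adding 0 for a person who was not matched at the PES dwelling is also an assumption; the text only states what happens for matched people.

```python
def probability_counted(matched_at_pes_dwelling, p_counted_elsewhere):
    """Combined probability for a person who said they were counted elsewhere.

    1 is added when the person was matched at the PES dwelling. Adding 0
    otherwise is an assumption made for this sketch.
    """
    at_pes_dwelling = 1.0 if matched_at_pes_dwelling else 0.0
    return at_pes_dwelling + p_counted_elsewhere

# Matched at the PES dwelling, with a hypothetical imputed probability of 0.3
# of also having been counted elsewhere: the result exceeds 1.
p = probability_counted(True, 0.3)
```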

There may, of course, be people who were reached by neither the PES nor the census, because they were missed in demarcation. The controls intended to prevent this were the monitoring team and the supervision of ground staff by their seniors. Such omissions were investigated, aided by a computerised geographical information system (GIS) as described in Chapter 6.

Weighting

Once a probability of having been counted was allocated to each person in the PES, these probabilities had to be applied as weighting factors to each person in the census to adjust for undercount. The technique used, called XAID, is another multivariate technique. As distinct from the CHAID, XAID makes use of a continuous dependent variable, in this case, the probability of having been counted in the census.

In the XAID analysis, the same set of predictors was used as for the CHAID, except for one predictor that was no longer applicable, namely whether or not the respondent said the person had been counted. XAID analyses were done separately for each province. Two variables, namely household size and age group, featured prominently in all analyses.

The outcomes of the XAID analysis were used to impute a probability of having been enumerated for every census record, i.e., every person counted in the census. The reciprocals of these imputed values were taken as weights, and the subclasses defined by the categories in the XAID branches were taken as weighting classes. A weighting matrix, based on these subclasses, was developed for each province.

For example, in Eastern Cape, an African aged 0-1 years, living in a household containing two to five people, in a formal urban area, a commercial farm or a non-tribal rural area, was given a weight of 1,1549. This means that, to adjust for undercount, every baby meeting this description in Eastern Cape represented 1,1549 babies in the final count.
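The link between an imputed probability and its weight can be checked with a small sketch. The function names and the figure of 1 000 census records are illustrative assumptions; the weight 1.1549 is the Eastern Cape example above, written with a decimal point instead of the decimal comma used in the text.

```python
def weight_from_probability(p):
    """Weight is the reciprocal of the probability of having been enumerated."""
    if p <= 0:
        raise ValueError("probability must be positive")
    return 1.0 / p

def probability_from_weight(w):
    """Inverse of the above: the probability implied by a given weight."""
    return 1.0 / w

weight = 1.1549
implied_probability = probability_from_weight(weight)  # roughly 0.866

# Hypothetical: 1 000 census records in this weighting class.
records_in_class = 1000
adjusted_count = records_in_class * weight  # about 1 154.9 after adjustment
```

A weight of 1.1549 therefore corresponds to an estimated chance of roughly 87% that a baby meeting this description was enumerated.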

Also in Eastern Cape, those living in informal settlements in households containing two to five people were given a weight of 1,2067, irrespective of population group and age. (For a more detailed, technical description of the CHAID and XAID imputations and the resulting matrices of weights, the reader is referred to Calculating the undercount in Census ’96 (Statistics South Africa, 1998).)

Adjustments applied to households

For the preliminary estimates, no estimates of the extent of undercount of households were made.

In the final estimates, the CHAID and XAID techniques were applied separately to both households and individuals. A weight was therefore added to each household to take undercount of households into account.

The following variables were included in the CHAID household imputations:

  • Whether or not the respondent said the household was visited.

  • Household size.

  • EA type.

  • Gender of the head of the household.

  • Population group of the head of the household.

On the basis of the XAID, a weighting matrix for households was developed. For example, in Eastern Cape, a household of six or more people in formal urban, informal urban or traditional, non-urban areas with an African head of household obtained a weight of 1,0269. This means that, to adjust for undercount, every household meeting this description in Eastern Cape represented 1,0269 households in the final count.

Taking hostel-dwellers and institutions into account

Originally, hostels were included in the 1% PES sample. During the matching process, however, it was found that matching hostel-dwellers was not possible. Hostel-dwellers are highly mobile, particularly towards the end of the year, when the PES was conducted: people were returning home for the holidays or because their contracts had ended.

When final estimates were first calculated, hostel-dwellers and other institutional dwellers were given a weight of 1, which means they remained unadjusted by the PES. This method could, however, have led to an underestimate of young males. Instead, a simplified set of weighting matrices was developed for the non-institutional population by province, population group, gender and age. These were checked against the XAID weights for consistency, and the simplified weight for the appropriate demographic category was then applied to each hostel or institutional dweller.
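A minimal sketch of how such a lookup might work is given below. The weights in the table, the age bands, the record layout and the function name are all hypothetical; only the four lookup dimensions (province, population group, gender and age) come from the text.

```python
# Hypothetical simplified weights, keyed by
# (province, population group, gender, age band). Values are made up.
SIMPLIFIED_WEIGHTS = {
    ("Eastern Cape", "African", "male", "20-29"): 1.15,
    ("Eastern Cape", "African", "female", "20-29"): 1.10,
}

def person_weight(record, xaid_weight):
    """Return the weight to apply to one census record.

    Hostel and institutional dwellers take the simplified weight for their
    demographic category; everyone else keeps the XAID weight.
    """
    if record["institutional"]:
        key = (
            record["province"],
            record["population_group"],
            record["gender"],
            record["age_band"],
        )
        return SIMPLIFIED_WEIGHTS[key]
    return xaid_weight

hostel_dweller = {
    "institutional": True,
    "province": "Eastern Cape",
    "population_group": "African",
    "gender": "male",
    "age_band": "20-29",
}
w = person_weight(hostel_dweller, xaid_weight=1.0)
```

With a weight of 1 the hostel-dweller would have represented exactly one person; with the simplified weight he represents slightly more than one, which is how the adjustment counters an underestimate of young males.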