grouplabs - create meaningful value labels for group variables by Sergiy Radyakin

Description

grouplabs creates value labels for grouped variables in Stata. It is most useful when a group is built from several variables, especially variables of different types (numeric or string). The standard Stata command egen group can create value labels with its label option, but those labels contain the values of the contributing attributes, not their labels. In contrast, grouplabs builds readable labels from the original variables' value labels, variable labels, or, as a last resort, variable names. It also accounts for missing values if they were used for grouping.

What does this command do?

Suppose you look at the NLSW88 example dataset. You want to profile workers by two attributes: race and marital status. In the original dataset each of these variables appears with the following values:

. tabulate race

       race |      Freq.     Percent        Cum.
------------+-----------------------------------
      white |      1,637       72.89       72.89
      black |        583       25.96       98.84
      other |         26        1.16      100.00
------------+-----------------------------------
      Total |      2,246      100.00

. tabulate married

    married |      Freq.     Percent        Cum.
------------+-----------------------------------
     single |        804       35.80       35.80
    married |      1,442       64.20      100.00
------------+-----------------------------------
      Total |      2,246      100.00

Now suppose you create your categorical profile variable x with the following Stata command:

. egen x=group(race married)

. tabulate x

 group(race |
   married) |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        487       21.68       21.68
          2 |      1,150       51.20       72.89
          3 |        309       13.76       86.64
          4 |        274       12.20       98.84
          5 |          8        0.36       99.20
          6 |         18        0.80      100.00
------------+-----------------------------------
      Total |      2,246      100.00

Now calling the grouplabs command creates much more useful labels:

. grouplabs race married, groupvar(x) val

. tabulate x

   group(race |
     married) |      Freq.     Percent        Cum.
--------------+-----------------------------------
 white single |        487       21.68       21.68
white married |      1,150       51.20       72.89
 black single |        309       13.76       86.64
black married |        274       12.20       98.84
 other single |          8        0.36       99.20
other married |         18        0.80      100.00
--------------+-----------------------------------
        Total |      2,246      100.00
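The steps above can be collected into a short do-file. This is a minimal sketch, assuming grouplabs is already installed and that the NLSW88 data are loaded with sysuse (the loading step is not shown in the original walkthrough):

```stata
* Load the NLSW88 example dataset shipped with Stata
sysuse nlsw88, clear

* Number the race x marital-status combinations
egen x = group(race married)

* Attach labels built from the value labels of race and married
grouplabs race married, groupvar(x) values

* Tabulate the labelled group variable
tabulate x
```

The varlist passed to grouplabs repeats the one passed to egen group, in the same order.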

A larger example for multiple attributes based on the NLSW88 example dataset is available here. Note that the text in square brackets in that example corresponds to missing values of the corresponding variables.

Another example of output

In this example five binary attributes x1-x5 are combined into one categorical variable x, which is then given value labels based on the labels of the contributing variables. Below is the (truncated) tabulation of the resulting variable with value labels shown.

                  group(x1 x2 x3 x4 x5) |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
                                    --- |         25        2.50        2.50
                               imported |         26        2.60        5.10
                                  heavy |         31        3.10        8.20
                     heavy and imported |         22        2.20       10.40
                             waterproof |         38        3.80       14.20
                waterproof and imported |         26        2.60       16.80
                   waterproof and heavy |         38        3.80       20.60
      waterproof and heavy and imported |         41        4.10       24.70

....table continues but truncated here......................................

This example illustrates the use of the variable labels for binary attributes and multiple (more than 2) variables in the list.

Installation instructions

The module is compatible with Stata 9 and higher, and could perhaps be made compatible with earlier versions of Stata if necessary. grouplabs is available from the SSC archive. To install it, type the following in Stata:

  findit grouplabs

Stata will respond with a link to the program. Click it, then click "install" in the viewer window that pops up with the program description.
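Since the program is distributed through the SSC archive, it can probably also be installed without the viewer. A minimal sketch, assuming the package is listed on SSC under the name grouplabs:

```stata
* Install directly from the SSC archive (assumes the package name is grouplabs)
ssc install grouplabs

* Confirm that Stata can find the installed command
which grouplabs
```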

Syntax

The basic syntax is simple and mirrors the egen statement that was used to create the group variable:

  grouplabs varlist, groupvar(varname)

Both varlist and groupvar are mandatory. The full syntax is:

  grouplabs varlist, groupvar(varname) [values lname(string) emptylabel(string) separator(string)]
  • varlist is the list of variables used to create the categorical group variable (binary attributes in the simplest case);
  • groupvar is the categorical group variable; groupvar can be abbreviated to g;
  • values tells grouplabs to build the labels from the values of the contributing variables (their value labels, where defined) instead of their variable labels, as in the race and married example above, which abbreviates it to val; values is optional;
  • lname is the name of the value label that will be created; lname is optional and defaults to the name of the groupvar; lname can be abbreviated to ln;
  • emptylabel is the string used for the category corresponding to no attribute (all binary variables of the group are zero); emptylabel is optional and defaults to "---"; emptylabel can be abbreviated to emptyl;
  • separator is the token that separates the components of the composite label; separator is optional and defaults to a single space; other reasonable values are ",", "&", " & ", " and ", etc.; separator can be abbreviated to sep.
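The options can be tried on a small constructed dataset. The following is a sketch only: the variables x1-x3, their labels, and the label name xlab are hypothetical, chosen to resemble the binary-attribute example above, and grouplabs is assumed to be installed.

```stata
clear
set obs 8

* Three hypothetical 0/1 attributes covering all 8 combinations
generate byte x1 = mod(floor((_n - 1) / 4), 2)
generate byte x2 = mod(floor((_n - 1) / 2), 2)
generate byte x3 = mod(_n - 1, 2)

* Variable labels supply the text of the composite labels
label variable x1 "waterproof"
label variable x2 "heavy"
label variable x3 "imported"

* Create the group variable, then label it
egen x = group(x1 x2 x3)
grouplabs x1 x2 x3, groupvar(x) lname(xlab) emptylabel("none") separator(" and ")

tabulate x
```

Here emptylabel("none") replaces the default "---" for the all-zero category, and separator(" and ") joins the components of each composite label.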

Notes

  1. If the group variable does not exist, it will be created without the missing option. If that option is needed, create the group variable before calling grouplabs.
  2. If the labels of variables x1 and x2 are "x1" and "x2" respectively, the labels of the group variable will be "x1", "x2", "x1 x2", and "---" (or the string specified in emptylabel). If missing values were treated as grouping categories (egen group was run with the missing option), the corresponding labels appear in square brackets, such as "[x1]", "[x1] x2", "[x2]", and "[x1] [x2]", in addition to the above.
  3. String variables are supported. Because string variables cannot carry value labels, their values are always used, regardless of the values option.
  4. When choosing the separator, keep in mind that the width of a value label as displayed by Stata's table command is limited to 40 characters and cannot be expanded (a Stata limitation).
  5. The standard one-way tabulate command may have difficulty displaying the resulting value labels, since they can become long when many variables make up a group. The tab1w command is recommended for one-way tabulations with long labels, but grouplabs does not require it.
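Notes 1 and 2 can be illustrated with a variable that has missing values. This is a sketch, assuming grouplabs is installed and using the union variable of the NLSW88 dataset, which is missing for some workers. The group variable name u is arbitrary.

```stata
sysuse nlsw88, clear

* Create the group variable first, so that the missing option takes effect
egen u = group(race union), missing

* Label the existing group variable; the groups with missing union
* should get a bracketed component, as described in note 2
grouplabs race union, groupvar(u) values

tabulate u
```

Creating u before calling grouplabs is what keeps the missing-value categories; had grouplabs created the group variable itself, those observations would not form groups of their own.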

Here is an executable example:

  do "http://www.radyakin.org/stata/grouplabs/example1.do"

Author and support

If you experience a problem with this program and think the error is mine, kindly let me know.

grouplabs was written by Sergiy Radyakin. To contact the author, send email to sradyakin/at/worldbank.org.

Or write to Statalist and tag your message with grouplabs.