USECSPRO: basic information about CSPro data

The Census and Survey Processing System, better known as CSPro, is a software package developed by the US Census Bureau to facilitate basic data processing: most importantly data entry, but also validation, correction and recoding, tabulation, and various other tasks. It has been in active use since about 2000 by hundreds of organizations to collect data for censuses, labor force surveys, income and expenditure surveys, and demographic and health surveys. A large number of such data files are currently available to researchers.

CSPro dictionary files (*.dcf) define the data structure: the hierarchy of data levels, the number and types of data fields, variable and value labels, etc. Dictionary files do not contain actual data (only metadata, that is, data about data) and are usually not confidential. In fact, dictionary files are usually circulated some time before the survey starts so that they can be test-driven to find any deficiencies or particular respondents' situations that the dictionary cannot accommodate.

One dictionary may apply to multiple datasets. Often several data entry operators use the same dictionary developed for the survey to enter data and produce, say, 50 different data files, one per operator. These data files need to be combined into a single file to produce the total sample file representing the survey. This can be done before or after conversion.
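
Combining the operators' files before conversion can be as simple as concatenating the text files. The following Python sketch is an illustration only, not part of CSPro or of this package. It assumes that all part files were entered with the same dictionary, that they are UTF-8 text, and that case identifiers do not clash between operators. The function name combine_data_files and the file names are hypothetical.

```python
def combine_data_files(parts, combined):
    """Concatenate several data files into one, record by record.

    Assumes (not checked here) that every part file follows the same
    dictionary and that case identifiers are unique across parts.
    Returns the number of records written.
    """
    written = 0
    with open(combined, "w", encoding="utf-8", newline="\n") as out:
        for part in parts:
            with open(part, "r", encoding="utf-8") as src:
                for line in src:
                    # Strip line endings so each record ends with exactly
                    # one newline, even if a part lacks a trailing newline.
                    record = line.rstrip("\r\n")
                    if record:  # skip empty lines between parts
                        out.write(record + "\n")
                        written += 1
    return written
```

A call such as combine_data_files(["operator01.dat", "operator02.dat"], "survey.dat") would then produce one file that can be read with the shared dictionary.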

Usually a dictionary file maps all data in the data file. Occasionally, however, a dictionary covers only a subset of the data in the file, leaving some content undocumented. This can happen for various reasons, but most often it results from changes in the survey design that expanded the survey questionnaire and correspondingly modified the dictionary files. A separate data processing station equipped with an older version of the dictionary might still be able to extract the data it was designed to process, but will not be able to access the newer data. In such a case, replace the dictionary file with the corresponding recent version.

CSPro data files are text files with a hierarchical record structure and positional content location. Data files do not contain any information that would determine which dictionary must be used to read them. Since CSPro data files contain actual raw person-level data, they are usually confidential, but are still often available to researchers after proper security clearance and a confidentiality agreement.
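
To illustrate what positional content location means in practice, here is a minimal Python sketch that slices fields out of a text record by column position. The layout is invented for the example and is not taken from any real dictionary: it assumes a one-character record type in the first column, household records of type "1" and person records of type "2". The names LAYOUT and parse_record are hypothetical. Offsets are 0-based to suit Python slicing. Record types missing from the layout are returned with no fields, much as an outdated dictionary leaves newer content unread.

```python
# Hypothetical layout for illustration only: field name -> (offset, length).
# In real use this information comes from the dictionary, not the data file.
LAYOUT = {
    "1": {"hh_id": (1, 4), "region": (5, 2)},               # household record
    "2": {"hh_id": (1, 4), "person": (5, 2), "age": (7, 3)},  # person record
}


def parse_record(line):
    """Split one text record into named fields using column positions."""
    rec_type = line[0:1]  # assumed: record type occupies the first column
    fields = LAYOUT.get(rec_type)
    if fields is None:
        # Record type not described by the layout: nothing can be extracted.
        return rec_type, {}
    values = {
        name: line[start:start + length].strip()
        for name, (start, length) in fields.items()
    }
    return rec_type, values
```

Under this layout, the record 1000107 is read as household 0001 in region 07, and 2000101035 as person 01 of the same household with age 035. Values are kept as text; converting them to numbers and attaching labels is left to the dictionary-driven conversion.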