Standalone converter of tab-separated data to Stata's .dta format

Description

The program tab2dta.exe is a standalone converter of tab-separated data to Stata's .dta dataset format. Variable and value labels can be applied by specifying an optional .do file, which should follow Stata's syntax for labelling. This .do file may use only a limited set of labelling commands and must conform to the additional limitations listed below.

Standalone here means that the converter is self-sufficient: it does not require Stata or Stat/Transfer to perform the conversion.

A tab-separated data file is a file that uses the tabulation character (ASCII code 09) to separate fields within records. It is a popular information interchange format, similar to CSV (comma-separated values). Most statistical packages and spreadsheet programs can export to this format.
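
As an illustration, a tab-separated file can also be produced with a few lines of Python. This is only a minimal sketch: the function name, the file name sample.tab, the column names and the presence of a header row are assumptions made for the example, not requirements stated in this document. The sample data is kept to plain ASCII so that no encoding questions arise.

```python
def write_tab_file(path, header, rows):
    """Write records as text with fields separated by a tab (ASCII code 09)."""
    lines = []
    for record in [header, *rows]:
        fields = [str(value) for value in record]
        for field in fields:
            # A tab or line break inside a value would break the record layout.
            if "\t" in field or "\n" in field or "\r" in field:
                raise ValueError(f"value not allowed in a tab-separated field: {field!r}")
        lines.append("\t".join(fields))
    # newline="" disables newline translation, so "\r\n" is written as given.
    with open(path, "w", newline="", encoding="ascii") as f:
        for line in lines:
            f.write(line + "\r\n")

# Hypothetical sample data; the lone dot stands for a missing number.
write_tab_file("sample.tab", ["id", "name", "score"],
               [[1, "Anna", 3.5], [2, "Boris", "."]])
```

String values are written without surrounding quotes.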

For example, OpenOffice Calc produces tab-separated files compatible with this program. Select "Save As..." in the "File" menu, choose the "Text CSV" format and tick "Edit filter settings". Then change the field separator to tabulation in the next dialog.

tab2dta is a .NET program. It is designed to work in MS Windows, but should be able to run in Linux and Mac provided that Microsoft .NET Framework or an alternative to it, such as Mono, is installed on these machines. The minimum required version of .NET is 2.0. If you have Windows Vista or newer, your system already has a compatible version of .NET installed. For Windows XP, Microsoft .NET Framework can be obtained from Microsoft's website for free.

Assumptions and limitations

  • output is written using the Stata dataset file specification 113, which corresponds to Stata versions 8-9 and is supported by all subsequent Stata versions;
  • maximum string length is 244 characters;
  • the decimal separator is not explicitly taken care of; it is up to the .NET conversion function to decide whether to use a dot or a comma;
  • all numeric data is saved as doubles; no byte, int, long or float;
  • value labels may be listed in any order, not only in ascending order of values:
    label define correct 1 "yes" 2 "no"
    label define alsocorrect 2 "no" 1 "yes"
  • new value labels override already defined value labels; there is no need to specify the options add or modify;
  • no labels for extended missings;
  • all strings in the do file must be in Stata's compound quotes, like so: `"string"';
  • no quotes in quotes:
    `"this is `"not"' allowed"'
    `"this "is" allowed"'
    `"this is `also' allowed"'
    `"this 'should work' too"'
  • there is no need to put quotes around string values in the tab-separated input data file, but if they are present, they will become part of the value;
  • Unicode is not supported by Stata's dataset format 113, hence text is converted to ANSI using codepage 1252;
  • Stata seems to prefer a numeric interpretation in ambiguous cases, e.g. when a whole column consists only of dots: these are imported as numeric missings, although an alternative interpretation would be a string variable with dots as values. This converter follows the same convention;
  • the program was designed to work in an automated environment that produces correct inputs, thus tab2dta has only minimal error handling and reporting, and no validation of inputs.
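
Because the converter does not validate its inputs, it may be worth checking the data against the limits above before conversion. The Python sketch below shows one possible way to do so. The function names are hypothetical, and the numeric test relies on Python's float() with a dot as the decimal separator, which is an assumption and may differ from what the .NET conversion function accepts (float() also accepts forms such as "nan" or "1e5").

```python
MAX_STRING_LENGTH = 244  # maximum string length listed above


def is_numeric_column(values):
    """Return True if every value is a number or a lone dot.

    Follows the convention described above: a column made only of dots
    counts as numeric (missing values), not as a string variable.
    """
    for value in values:
        if value == ".":
            continue
        try:
            float(value)  # assumption: dot as decimal separator
        except ValueError:
            return False
    return True


def find_problems(column_name, values):
    """Return a list of messages for values that break the string limits."""
    problems = []
    for row_number, value in enumerate(values, start=1):
        if len(value) > MAX_STRING_LENGTH:
            problems.append(
                f"{column_name}, row {row_number}: longer than "
                f"{MAX_STRING_LENGTH} characters")
        try:
            value.encode("cp1252")
        except UnicodeEncodeError:
            problems.append(
                f"{column_name}, row {row_number}: not representable "
                f"in codepage 1252")
    return problems
```

A column for which is_numeric_column returns False would be treated as a string column in this sketch, and only such columns need to be passed to find_problems.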

Command mode operation

This converter can be invoked from a command line with parameters to work in batch mode and be reused for various purposes.

  • Supply 3 arguments in the following order:
    fully qualified name of the data file,
    fully qualified name of the do file,
    fully qualified name of the resulting dta file;
  • the do file is optional in both command-line and interactive mode (this has not been tested).
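
A batch invocation could be scripted along the following lines. This is a sketch rather than a tested recipe: the helper names and file names are made up for the example, tab2dta.exe is assumed to be reachable from the working directory, and the optional "mono" prefix is the usual way to start a .NET executable under Mono. Only the three-argument form described above is built; how to omit the do file is not shown, because it is not documented here.

```python
import os
import subprocess


def build_command(data_file, do_file, dta_file,
                  exe="tab2dta.exe", use_mono=False):
    """Build the argument list: data file, do file, resulting dta file."""
    # os.path.abspath turns relative names into fully qualified ones.
    paths = [os.path.abspath(p) for p in (data_file, do_file, dta_file)]
    command = [exe] + paths
    if use_mono:
        # Assumption: on Linux or Mac the program is started through Mono.
        command = ["mono"] + command
    return command


def convert(data_file, do_file, dta_file, **options):
    """Run the converter and return its exit code.

    Whether tab2dta reports failures through the exit code is not
    documented here, so the caller should also check the output file.
    """
    completed = subprocess.run(build_command(data_file, do_file, dta_file,
                                             **options))
    return completed.returncode


# Show the command that would be run for hypothetical file names.
print(build_command("simple.tab", "simple.do", "simple.dta"))
```

Passing the arguments as a list avoids quoting problems when a path contains spaces.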

Examples

The following examples demonstrate what input files for the program may look like:

Example 1: simple.tab, simple.do
Example 2: example.tab, example.do

Download

No installation is necessary. The application tab2dta.exe is portable.

You need only the application file: download tab2dta.exe

The most recent version is: 1.0.5193.25553 (compiled 2014.03.21)

Author and support

tab2dta was written by Sergiy Radyakin.

To contact the author, send an email to sradyakin/at/worldbank.org.