Coding Variables

Published in 2005 in the Handbook of Social Measurement, ed. Kimberly Kempf-Leonard. Academic Press.

Lee Epstein
Andrew D. Martin

The introduction is below.

Introduction

Social scientists engaged in empirical research, that is, research seeking to make claims or inferences based on observations of the real world, undertake an enormous range of activities. Some investigators collect information from primary sources; others rely primarily on secondary archival data. Many do little more than categorize the information they collect, but many more deploy complex technologies to analyze their data.

Seen in this way, it might appear that, beyond following some basic rules of inference and guidelines for the conduct of their research, scholars producing empirical work have little in common. Their data come from a multitude of sources; their tools for making use of the data are equally varied. But there exists at least one task in empirical scholarship that is universal, that virtually all scholars and their students perform every time they undertake a new project: coding variables, or the process of translating properties or attributes of the world (i.e., variables) into a form that researchers can systematically analyze after they have chosen the appropriate measures to tap the underlying variable of interest. Regardless of whether the data are qualitative or quantitative, regardless of the form the analyses take, virtually all researchers seeking to make claims or inferences based on observations of the real world engage in the process of coding data. That is, after measurement has taken place, they (1) develop a precise schema to account for the values that each variable of interest can take and then (2) methodically and physically assign each unit under study a value for every given variable.
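The two steps just described can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a procedure taken from the chapter: the variable names (court_level, outcome), the numeric codes, and the helper assign_codes are all hypothetical. The schema lists every value each variable may take, and the assignment step refuses to record a value the schema does not define.

```python
# Step 1: a hypothetical coding schema (codebook). Each variable lists
# every code it may take, together with a human-readable label.
CODEBOOK = {
    "court_level": {1: "trial", 2: "appellate", 3: "supreme"},
    "outcome": {0: "affirmed", 1: "reversed", 9: "unclear or missing"},
}


def assign_codes(unit_id, raw_values, codebook=CODEBOOK):
    """Step 2: give one unit a value for every variable in the schema.

    Raises KeyError if a variable is left uncoded and ValueError if a
    code falls outside the values the schema allows.
    """
    record = {"unit_id": unit_id}
    for variable, allowed in codebook.items():
        if variable not in raw_values:
            raise KeyError(f"{unit_id}: no value supplied for {variable!r}")
        code = raw_values[variable]
        if code not in allowed:
            raise ValueError(
                f"{unit_id}: {code!r} is not a valid code for {variable!r}"
            )
        record[variable] = code
    return record


if __name__ == "__main__":
    # One hypothetical unit under study, coded against the schema.
    print(assign_codes("case_001", {"court_level": 2, "outcome": 1}))
```

Keeping the schema separate from the assignment step means that a change to the schema (for example, adding a code) is made in one place, and every unit is checked against the same definition.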

And yet, despite the universality of the task (not to mention the fundamental role it plays in research), it typically receives only the briefest mention in most volumes on designing research or analyzing data. Why this is the case is a question on which we can only speculate, but an obvious response centers on the seemingly idiosyncratic nature of the undertaking. For some projects, researchers may be best off coding inductively, that is, collecting their data, drawing a representative sample, examining the data in the sample, and then developing their coding scheme; for others, investigators proceed in a deductive manner, that is, they develop their schemes first and then collect/code their data; and for still a third set, a combination of inductive and deductive coding may be most appropriate. (Some writers associate inductive coding with research that primarily relies on qualitative [nonnumerical] data/research and deductive coding with quantitative [numerical] research. Given the [typically] dynamic nature of the processes of collecting data and coding, however, these associations do not always or perhaps even usually hold. Indeed, it is probably the case that most researchers, regardless of whether their data are qualitative or quantitative, invoke some combination of deductive and inductive coding.) The relative ease (or difficulty) of the coding task also can vary, depending on the types of data with which the researcher is working, the level of detail for which the coding scheme calls, and the amount of pretesting the analyst has conducted, to name just three.
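The inductive route described above (collect the data, draw a sample, examine it, then develop the scheme) might be sketched in Python as follows. This is a rough illustration only: the function names, the use of a simple random sample, the normalization of text, and the choice of 0 as a catch-all "other" code are all assumptions made for the example, and in practice a researcher would review and revise the draft scheme by hand before applying it.

```python
import random
from collections import Counter


def normalize(text):
    """Collapse trivial differences in spelling (case, stray spaces)."""
    return text.strip().lower()


def draft_scheme_from_sample(raw_observations, sample_size, seed=0):
    """Draw a random sample and propose one code per observed category.

    Categories are numbered from 1 in order of frequency in the sample;
    code 0 is reserved (an assumption of this sketch) for "other".
    """
    rng = random.Random(seed)
    size = min(sample_size, len(raw_observations))
    sample = rng.sample(raw_observations, size)
    tally = Counter(normalize(text) for text in sample)
    scheme = {
        category: code
        for code, (category, _count) in enumerate(tally.most_common(), start=1)
    }
    scheme["other"] = 0
    return scheme


def apply_scheme(raw_observations, scheme):
    """Code every observation; values not seen in the sample become 0."""
    return [scheme.get(normalize(text), scheme["other"]) for text in raw_observations]


if __name__ == "__main__":
    # Hypothetical raw data, invented for illustration.
    raw = ["Affirmed", "reversed", "affirmed ", "Remanded", "affirmed"]
    scheme = draft_scheme_from_sample(raw, sample_size=5)
    print(scheme)
    print(apply_scheme(raw + ["vacated"], scheme))
```

A deductive approach would reverse the order: the scheme would be written down first, and the raw observations would then be collected and coded against it.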

Nonetheless, we believe it is possible to develop some generalizations about the process of coding variables, as well as guidelines for so doing. This much we attempt to accomplish here. Our discussion is divided into two sections, corresponding to the two key phases of the coding process: (1) developing a precise schema to account for the values of the variables and (2) methodically assigning each unit under study a value for every given variable. Readers should be aware, however, that although we made as much use as we could of existing literatures, discussions of coding variables are sufficiently few and far between (and where they do exist, rather scanty) that many of the generalizations we make and the guidelines we offer come largely from our own experience. Accordingly, sins of commission and omission probably loom large in our discussion (with the latter particularly likely in light of space limitations).