The Classification of Measurement Scales, Part I

In 1946, Stanley Smith Stevens of Harvard University published a brief article in the journal Science that classified measurement scales into four types.  Since that time, his classification has been used by statisticians, researchers, academics, and quantitative analysts of all sorts.  Different professional fields have adapted his classification in their own ways, but his original article remains a classic in the field of measurement theory.

Stevens’ classification identifies four types of measurement scales.  His purpose in developing the classification was to clarify the meaning of the term “measurement” at a time when there was much disagreement about the term, particularly among social and behavioral scientists.  Here are Stevens’ four types of scales.

Nominal Scale:  Each value on the scale is a label that represents a category of a characteristic.  Examples are gender (female, male), ethnicity (Asian, Hispanic, Black, White, etc.), political party, religious affiliation, eye color, type of school attended, and so on.  Values on a nominal scale have no inherent order and do not represent quantities on a continuum.  Sometimes numerals are used to represent a nominal scale (female = 1, male = 2), but the numerals do not represent quantities.  Data from a nominal scale are sometimes referred to as “categorical” data, “attribute” data, and (incorrectly) “qualitative” data.

Ordinal Scale:  Each value on the scale represents a quantity of a characteristic by a rating.  Examples are ratings of customer satisfaction (Very Satisfied, Somewhat Satisfied, Somewhat Dissatisfied, Very Dissatisfied), ratings of quality (5 = Very Good, 4 = Good, 3 = Fair, 2 = Poor, 1 = Very Poor), hardness of minerals, grades of meat, socio-economic status, and scoring of Olympic figure skaters.  The values on an ordinal scale have an inherent order (from low to high or less to more), but the intervals between the ordered values are not known to be equal because an ordinal scale lacks a “unit of measure.”

Interval Scale:  Each value on the scale represents a precise quantity of the characteristic being measured.  An interval scale incorporates a unit of measure to indicate quantities.  Examples are the scales used to measure temperature (Fahrenheit or Celsius, both of which use the “degree” as the unit of measure) and dates counted in years.  An interval scale is similar to a ratio scale (see next) except that an interval scale does not have an absolute zero.  On the Fahrenheit scale, the zero is arbitrary and does not mean “no temperature at all.”  In counting years, the starting point is an arbitrary convention rather than the beginning of time, and in the common calendar there is not even a “zero year.”

Ratio Scale:  Like an interval scale, each value on a ratio scale represents a precise quantity based on a unit of measure.  Unlike an interval scale, a ratio scale has a true zero value that represents the absence of quantity.  Examples are age, income, length, weight, time, and electrical voltage.  Interval and ratio scales are considered “quantitative” measures in the ordinary sense of the word.
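The practical difference between interval and ratio scales shows up when two values are compared as a ratio. The short Python sketch below suggests why: rescaling a ratio scale leaves ratios unchanged, while moving between interval scales also shifts the arbitrary zero and so changes the ratio. The function names are hypothetical, 2.20462 is the approximate number of pounds per kilogram, and kelvin is included only as an example of a temperature scale with an absolute zero.

```python
def c_to_f(celsius: float) -> float:
    """Celsius to Fahrenheit: rescales AND shifts the arbitrary zero."""
    return celsius * 9 / 5 + 32

def c_to_k(celsius: float) -> float:
    """Celsius to kelvin: kelvin's zero is absolute zero."""
    return celsius + 273.15

def kg_to_lb(kg: float) -> float:
    """Kilograms to pounds: a pure rescaling of a ratio scale (true zero stays zero)."""
    return kg * 2.20462

def ratio(a: float, b: float) -> float:
    return a / b

# Is 20 degrees "twice as warm" as 10 degrees? The answer depends on the zero chosen.
print(ratio(20, 10))                      # Celsius: 2.0
print(ratio(c_to_f(20), c_to_f(10)))      # Fahrenheit: 68 / 50 = 1.36
print(ratio(c_to_k(20), c_to_k(10)))      # kelvin: roughly 1.035

# Is 20 kg twice as heavy as 10 kg? Yes, in any unit of weight.
print(ratio(20, 10))                      # kilograms: 2.0
print(ratio(kg_to_lb(20), kg_to_lb(10)))  # pounds: 2.0
```

So “20 degrees is twice as warm as 10 degrees” is an artifact of where the zero happens to sit, whereas “20 kg is twice as heavy as 10 kg” holds whatever unit is used.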

If you are interested in reading the original article, here is the reference:  Stevens, S. S. On the Theory of Scales of Measurement. Science Vol. 103, No. 2684 (June 7, 1946), 677-680.

As always, your comments are welcome by clicking on Comments below.

Does a performance measure tell the truth? A follow-up

A reader of this blog sent the following comment in reply to last month’s post:

“It seems that the difficulty comes in trying to create a measure when an activity takes a long time to complete and new activities keep being added.  Maybe a measure of “time to complete” is needed as well.”

Recall that last month’s post described a Washington Post reporter’s criticism of a performance measure reported by the Chief of Police of the District of Columbia.  The D.C. Police Chief reported that in 2011 the department’s homicide closure rate was 94%.  The reporter claimed that the “true” closure rate was 57% and that the Chief misrepresented the “truth.”

The reader is correct in identifying time as the complicating factor for this measure.  “Time to complete” can be a useful measure, but it does not resolve the debate about the “true” closure rate.  The technical measurement question is how to account for cases that remain open across more than one reporting period.

One response is to not bother.  Instead, manage process performance by keeping track of the volume of open cases and the case backlog because these measures can inform the allocation of staffing resources toward the most urgent priorities.

However, if stakeholders need to assess the overall performance of the process, reports on open cases and total backlog do not tell them what they want to know.  They want to know the “closure rate,” which is the percent of cases successfully closed in some period of time.  To analyze the possibilities, let us construct a hypothetical flow of cases through a hypothetical process:

  • Assume that from the inception of the process 100 cases have been entered and 75 have been closed, but the closure rate has not been calculated or reported.
  • At the start of a new reporting period, 25 “prior” cases are still open.
  • During the reporting period, 20 of the 25 prior open cases are closed.
  • During the reporting period, 100 “new” cases are added to the process.
  • During the reporting period, 60 of the 100 new cases are closed.

In the reporting period a total of 80 cases (20 prior and 60 new) were closed.  What is the closure rate?  Here are several answers, depending on the denominator used to calculate the percentage:

  1. The overall closure rate since inception is (75 + 20 + 60) ÷ (100 + 100) = 78%
  2. The closure rate for new cases in the reporting period is 60 ÷ 100 = 60%
  3. The closure rate for all open cases in the reporting period is (20 + 60) ÷ (25 + 100) = 64%
  4. The closure rate for all cases closed in the reporting period out of new cases in the reporting period is (20 + 60) ÷ 100 = 80%
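The four calculations can be checked with a small Python function. This is a sketch: the parameter names are hypothetical, and the inputs are the hypothetical case flow described above.

```python
def closure_rates(entered_before: int, closed_before: int,
                  prior_closed: int, new_cases: int, new_closed: int) -> dict[str, float]:
    """Return four closure rates (as fractions) for one reporting period.

    entered_before, closed_before: cases entered and closed since inception,
        before the reporting period starts.
    prior_closed: prior open cases closed during the period.
    new_cases, new_closed: cases added during the period, and how many of them closed.
    """
    prior_open = entered_before - closed_before       # open at the start of the period
    closed_in_period = prior_closed + new_closed      # shared numerator of rates 3 and 4
    return {
        "overall since inception": (closed_before + closed_in_period) / (entered_before + new_cases),
        "new cases only": new_closed / new_cases,
        "all open cases": closed_in_period / (prior_open + new_cases),
        "all closed out of new cases": closed_in_period / new_cases,
    }

rates = closure_rates(entered_before=100, closed_before=75,
                      prior_closed=20, new_cases=100, new_closed=60)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
```

Running it prints 78%, 60%, 64%, and 80%. Rates 3 and 4 share the same numerator (the 80 cases closed in the period) and differ only in the denominator.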

Unless you are a believer in the positivist paradigm for scientific inquiry, asking which rate is the “true” one is not a useful question for managing organizational performance.  Any delineation of a reporting time period will have the problem of selecting the appropriate denominator.  The choice will depend not on what is true but on what aspect of performance is of interest.

As always, your comments are welcome by clicking on Comments below.

Does a performance measure tell the truth? A debate in the press.

One reason to measure organizational performance is to report to interested stakeholders on how an organization is doing.  Unfortunately, in a recent case where the stakeholder was the public, the organization was the local police department, and a local newspaper reporter got involved, a measure of police department performance was reported in the press in an uninformed and accusatory manner that did not advance the public’s understanding of performance data.

Under the headline “The trick to D.C.’s homicide closure rate,” The Washington Post on February 19, 2012 printed an in-depth report criticizing a performance measure reported by the District of Columbia Chief of Police.  The measure was the annual closure rate for homicide cases.  The Chief reported that in 2011 the department’s homicide closure rate was 94%.

In the opening paragraphs of the article, the reporter stated, “But an examination of District homicides found that the department’s closure rate is a statistical mishmash that makes things seem much better than they are.  The District had 108 homicides last year. A 94% closure rate would mean that detectives solved 102 of them.  But only 62 were solved as of year’s end, for a true closure rate of 57%.”

The reporter claimed that the “true” rate is 57%, and that the D.C. Chief is misrepresenting the “truth.”  In The Washington Post on February 28, 2012, the D.C. Chief wrote a response taking offense at the charge that she intentionally tricked the public.

Is this a case of manipulating data to hide the “truth” and “trick” the public? Did an intrepid reporter expose measurement malfeasance? Is there a “true” homicide closure rate for 2011?

The reality of solving a homicide case can be complex and time-consuming.  For a given year, there can be several different homicide closure rates, each giving a partial view of police department performance as discussed below.  None gives the whole “truth” and each should be understood in the context of how the measure helps to manage performance.

(1)  The reporter’s rate:  The percent of new cases closed in the year.

The reporter calculated this measure to be 62 out of 108, or 57%.  It is accurate as far as it goes, but it gives no credit for prior-year cases that were closed during the reporting year.  Because it counts only new cases, it focuses attention on closing new cases rather than open cases from prior years, which is arguably not good performance management.

(2)  The Chief’s rate:  The percent of all cases (both prior and new) closed in the year out of new cases in the year.

The D.C. Chief of Police calculated this measure to be 102 out of 108, or 94%.  According to the Chief, this is the rate required by the Uniform Crime Reporting guidelines established by the FBI and used nationwide.  It gives credit for closing old as well as new cases, but it does not reflect the total work load: prior-year cases count in the numerator when they are closed but never appear in the denominator.  According to the article, the Chief used this measure to warn potential killers that they will be caught if they kill in D.C., which amounts to using it as a crime-deterrent tactic.

Here is a third way to calculate the 2011 D.C. homicide closure rate:

(3)  The percent of all cases closed in a year out of all open cases (both prior and new) in the year.

At the beginning of 2011, there were 43 open cases from prior years. During 2011, 108 new cases were reported for a total open case load of 151 for the year. By the end of the year, the 43 prior cases were closed and 62 of the new cases were closed for a total of 105 closed cases.  The closure rate for 2011 is 105 out of 151 or 70%.  This calculation keeps the total work load in focus to manage a balance between old and new cases to maximize productivity.

The truth is that the D.C. homicide closure rate for 2011 is variously 57%, 94%, and 70%.  There are still other measures of homicide closure performance as suggested in several letters to the editor criticizing the reporter’s analysis of the data:

(4)  A rolling aggregated closure rate based on the total number of homicides in a two, three, or four year period.

(5)  The average length of time it takes to solve a case.

(6)  The percentage of closed cases resulting in conviction.

There is an element of “truth” in each of these six measures, but each truth is partial.  It is more important to understand how the measures are used to manage performance for success.

As always, your comments are welcome by clicking on Comments below.

How do you identify a measure for an attribute that is hard to measure?

When a management goal is specific, the relevant measure will be obvious. For example, suppose management set the following goal:

  “Customer calls are processed within four hours.”

A relevant measure would be:

             The percent of customer calls processed within four hours.
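As a sketch of how such a measure might be computed, assuming a hypothetical call log that records when each call was received and when it was processed, and assuming that a call processed in exactly four hours counts as meeting the goal:

```python
from datetime import datetime, timedelta

# Hypothetical call log: (time received, time processed) for each customer call.
calls = [
    (datetime(2012, 3, 1, 9, 0), datetime(2012, 3, 1, 10, 30)),   # 1 h 30 min
    (datetime(2012, 3, 1, 9, 15), datetime(2012, 3, 1, 14, 0)),   # 4 h 45 min
    (datetime(2012, 3, 1, 11, 0), datetime(2012, 3, 1, 12, 45)),  # 1 h 45 min
    (datetime(2012, 3, 1, 13, 30), datetime(2012, 3, 1, 17, 30)), # exactly 4 h
]

GOAL = timedelta(hours=4)

def percent_within(calls: list[tuple[datetime, datetime]], limit: timedelta) -> float:
    """Percent of calls whose processing time is at most `limit` (inclusive)."""
    on_time = sum(1 for received, processed in calls if processed - received <= limit)
    return 100 * on_time / len(calls)

print(f"{percent_within(calls, GOAL):.0f}% of calls processed within four hours")
```

With these made-up records, three of the four calls meet the goal, so the measure is 75%.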

But what if the performance goal is not about cycle time or some physical attribute that is easy to measure?  Suppose management wants to improve customer service and customer satisfaction by giving front line staff greater responsibility for day-to-day decisions when serving customers? Assume that management stated the following goal in its strategic plan:

“Empower employees to make decisions that best serve the needs of customers.”

How can “employee empowerment” be measured?  To identify a relevant measure, management would have to define what empowerment means and how an empowered employee behaves.  For example, management could define empowerment as follows:

Empowerment is defined as taking a non-standard but appropriate action to satisfy a customer need without management approval. The action is non-standard if it is not specified as a standard operating procedure for the situation.

To describe the behavior of an empowered employee, management could develop a performance logic model for the work process that an empowered employee uses to meet a customer need.  For example:

  1. Employee receives a call from a customer with a concern.
  2. Employee assesses the customer need.
  3. Employee reviews the standard procedure for meeting customer need.
  4. If the standard procedure meets the need, employee provides it, and the call is closed successfully.
  5. If the standard procedure does not meet the need, employee creates a procedure and provides it.
  6. If the employee-created procedure satisfies the customer, the call is closed successfully.

Using this logic model, management could measure progress toward the empowerment goal with this measure:

 A. Percent of customer calls resolved by employee-created procedures
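A minimal Python sketch of measure A, assuming each call is logged with two yes/no facts taken from steps 4 to 6 of the logic model: did the standard procedure meet the need, and did an employee-created procedure satisfy the customer. The enum, function names, and call records are hypothetical, and the denominator is assumed to be all customer calls.

```python
from enum import Enum

class Outcome(Enum):
    """How a call ended, following steps 4 to 6 of the logic model."""
    STANDARD = "closed with the standard procedure"                  # step 4
    EMPLOYEE_CREATED = "closed with an employee-created procedure"   # steps 5 and 6
    UNRESOLVED = "not closed successfully"                           # outside the model

def classify(standard_meets_need: bool, created_procedure_satisfied: bool) -> Outcome:
    if standard_meets_need:
        return Outcome.STANDARD
    if created_procedure_satisfied:
        return Outcome.EMPLOYEE_CREATED
    return Outcome.UNRESOLVED

def measure_a(outcomes: list[Outcome]) -> float:
    """Measure A: percent of customer calls resolved by employee-created procedures."""
    created = sum(1 for o in outcomes if o is Outcome.EMPLOYEE_CREATED)
    return 100 * created / len(outcomes)

# Hypothetical calls: (standard procedure met the need?, created procedure satisfied?)
log = [(True, False), (True, False), (False, True), (False, False), (True, False)]
outcomes = [classify(std, created) for std, created in log]
print(f"Measure A: {measure_a(outcomes):.0f}%")
```

With these records, one call in five was resolved by an employee-created procedure, giving 20%. A low value can also mean that few calls needed a non-standard procedure at all.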

Now suppose management determined that an organizational policy, such as requiring several levels of management approval before taking a non-standard action, is a barrier to employee empowerment.  Management might state the following goal in support of empowerment:

“Reduce the number of management reviews required for a non-standard procedure.”

To measure progress toward this goal supporting employee empowerment, management could use the following measure:

B. Number of management reviews required for an employee to use a non-standard procedure to meet a customer need.

Note that B is an upstream (leading) measure and A is a downstream (lagging) measure, but both are relevant to increasing employee empowerment. The ultimate lagging measure, however, is C below:

 C. Percent of satisfied customers who called with a concern

What is challenging about measuring employee empowerment is that empowerment has the nature of a capability, a potential for action, which may or may not result in observable behavior.  If no customer has a need that requires a non-standard procedure, measure A will not indicate the amount of empowerment in the organization. To get around this measurement difficulty, management would have to measure an employee’s state of mind with the following:

D. Percent of employees who say they feel empowered to use a non-standard procedure without management approval if one were needed to meet a customer need.

Employee empowerment is hard to measure directly, and so it has to be surrounded with several related measures.

As always, your comments are welcome by clicking on Comments below.

What are the good uses of organizational performance measures and data?

I am writing this post at the end of 2011.  In this season of making a list and checking it twice, I thought I would offer a list of the many uses of data that come from measuring organizational performance.  Here is my list, organized into three categories of use:

A.    Measuring Organizational Performance for Managing Performance [These uses are guided by management's stated expectations for success.]

  •  Measures make clear management’s performance goals and expectations.
  •  Measurement data inform managers about progress toward performance goals.
  •  Measurement data inform employees of how well their work unit is doing.
  •  Measurement data can suggest the need for preventive action before a problem occurs.
  •  Measurement data can be used for organizational learning and improvement.
  •  Measurement data document successful accomplishments.
  •  Measurement data can be used to hold managers accountable.

B. Measuring Organizational Performance for Planning, Evaluation, and Decision-making [These uses are guided by a specific problematic situation requiring study.]

  •  Measurement data can be used to set organizational priorities.
  •  Measurement data can be used to study organizational problems.
  •  Measurement data can be used for allocating resources.
  •  Measurement data can be used to make policy and program decisions.
  •  Measurement data can be used to describe, evaluate, or audit programs.

C.    Measuring Organizational Performance for Scientific Research [These uses are guided by relevant bodies of scientific knowledge.]

  •  Measures are used to test theories and hypotheses about organizational behavior.
  • Measures are used to answer questions about a field of organizational policy or practice.

Here at the Managing with Measures website, I am primarily interested in category A, in particular, how to develop a system of measures that provide the right data about organizational performance on a timely and on-going basis.

I also have an interest in category B, which involves designing empirical studies to address specific questions or problems.

 Wishing you all a happy, prosperous, and high-performing 2012!

 As always, your comments are invited by clicking on “Comments” below this post.

A question about using a Yes-No scale

A reader of this blog sent the following questions to Managing with Measures.

(1)   Is a Yes-No scale used to quantify by rating or is it used to quantify by counting?

(2)   I see a Yes-No scale being used to record two things—it exists or it does not exist; it works or it does not work. An example is a lamp in a hotel room. Are there other possibilities?

These questions go back to a post from November 2010, which discussed the three ways to measure anything: using a measuring device, counting, and rating.  To review briefly:

  • Quantifying With a Measuring Device

Time is measured with a chronometer, weight is measured with a scale, temperature is measured with a thermometer, etc.  Each of these devices has a well-defined unit of measure; the units are counted to determine the observed amount of an attribute, and the device records the number of units observed.

  •  Quantifying By Counting

Instances are counted in which a defined attribute is present.  An example is the number of ball bearings in a sample that fail to meet a specification.

  •  Quantifying by Rating

A rating scale does not have a unit of measure that indicates an amount.  Instead human judgment is used to estimate the amount of an attribute. Examples are customer satisfaction and employee morale.

To answer the reader’s first question, Yes-No is a categorical scale with just two categories. It is an example of quantifying by counting, because it is used to determine the presence or absence of an attribute, not the amount of an attribute (a rating).

To answer the reader’s second question, you can use a Yes-No scale (or any type of categorical scale that has only two values that apply to all observations) to count the presence or absence of all sorts of attributes.  For example:

  •  Number of employees who like their boss
  • Number of travel vouchers that did not comply with company policy
  • Number of rainy days in a month
  • Number of registered voters who will vote in the next election
  • Number of participants who rode the bus
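Quantifying with a Yes-No scale is simply tallying the observations in each of the two categories. A short Python sketch, using hypothetical hotel-room lamp inspections in the spirit of the reader's example:

```python
from collections import Counter

# Hypothetical inspection results: does the lamp in each hotel room work?
lamp_works = ["Yes", "Yes", "No", "Yes", "No", "Yes"]

# A Yes-No scale has exactly two categories, and one of them applies to every observation.
if not set(lamp_works) <= {"Yes", "No"}:
    raise ValueError("every observation must be 'Yes' or 'No'")

counts = Counter(lamp_works)          # quantify by counting, not by rating
number_working = counts["Yes"]
percent_working = 100 * number_working / len(lamp_works)

print(f"{number_working} of {len(lamp_works)} lamps work ({percent_working:.0f}%)")
```

The result is a count (4 working lamps) that can also be expressed as a percent of all observations; no judgment about the amount of "working" is involved.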

Now, if you push on these examples, you could argue that each has to do with either the “existence” of an attribute  (likes boss, compliance with policy, rain) or whether the thing observed “works” or “does not work” (will vote, rode the bus).  So the reader may have a point in his lamp example.

What is a performance attribute?

I was teaching a seminar on stating and measuring strategic goals recently.  We discussed the different types of goals and I explained the “measurability” of each type.  (Types of goals and their measurability may be the topic of a future posting to this blog!)  When we moved on to the topic of selecting specific measures, one participant shared the thought that customer service can’t be measured.

I always enjoy it when someone says you can’t measure something.  It can be a “teaching moment.”  I personally think that customer service is easy to measure, and so I asked why he thought it was hard to measure.  His answer was that customer service is very complex.  I invited the class to imagine themselves in a customer service department and to identify specific characteristics of customer service performance that could be measured.

They came up with the following list:

  • Percent of satisfied customers
  • Number of customer complaints received
  • Number of new orders from customer referrals
  • Number of customer calls for information or help
  • Percent of on-time deliveries to customers
  • Average time to deliver products to customers
  • Average time to resolve a customer concern
  • Average time that customers waited for a response after calling
  • Cost of servicing a customer problem
  • Backlog of unresolved customer problems

It became clear to the class that it is not hard to identify measures for customer service.  The key is to identify specific attributes of performance.  A performance attribute is a single characteristic of performance.  (Some authorities use other terms—dimension, property, aspect, factor—I prefer attribute.)  Each individual attribute will suggest a measure.  To measure anything—the size of a room, the talent of an Olympic figure skater, organizational performance—you have to identify each specific attribute you want to measure.  A single measure can only be applied to a single attribute.  If what you want to measure has several attributes, you will need a different measure for each.  A performance attribute is what a performance measure measures.

The person who thought that customer service could not be measured was making the mistake of assuming that the performance of the customer service department had to be measured by a single comprehensive measure that represented the overall performance of the department.  It is possible to represent the overall performance of the department in a single number, but this is done by combining individual measures of specific attributes into an index.
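One common way to build such an index is a weighted average of per-attribute scores. The Python sketch below shows only one possible design, not a standard formula: the measures, targets, and weights are hypothetical, and each measure is scored as attainment of its target (observed ÷ target, inverted when lower values are better).

```python
# Hypothetical customer-service measures: (observed value, target, higher_is_better)
measures = {
    "percent of satisfied customers": (88.0, 90.0, True),
    "percent of on-time deliveries": (95.0, 95.0, True),
    "average hours to resolve a concern": (30.0, 24.0, False),
}
# Hypothetical weights reflecting management's priorities; they must sum to 1.
weights = {
    "percent of satisfied customers": 0.5,
    "percent of on-time deliveries": 0.3,
    "average hours to resolve a concern": 0.2,
}

def attainment(observed: float, target: float, higher_is_better: bool) -> float:
    """Score one attribute: 1.0 means exactly on target."""
    return observed / target if higher_is_better else target / observed

def service_index(measures: dict, weights: dict) -> float:
    """Weighted average of per-attribute attainment scores, scaled so 100 = all targets met."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return 100 * sum(weights[name] * attainment(*measures[name]) for name in measures)

print(round(service_index(measures, weights), 1))
```

With these made-up figures the index is about 94.9. Note that each component is still a separate measure of a single attribute; the index only summarizes them.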

I invite questions or comments by clicking on “Comments” immediately below this post, or you can send an e-mail.

What is a metric owner?

Like an employee, a measure of organizational performance needs someone to supervise it to make sure it is doing its job. That person is called a “metric owner.” The two primary responsibilities of a metric owner are:

  • To regularly review the data produced by a measure to ensure that they are trustworthy and useful for managing with measures.
  • To look for and advocate ways that a measure can be improved or a better measure substituted if necessary.

Owning a performance measure does not mean that the owner is responsible for achieving the performance that is being measured. The responsibility for achieving higher organizational performance should be shared jointly by all members of the leadership team. Assigning ownership of measures to members of the leadership team is simply a way to ensure that the measures are functioning properly over time.

I facilitated the development of a strategic plan and performance measures for a large corporation using the balanced scorecard methodology.  Here is how the leadership team shared measure ownership.

 The CFO owned the three measures in the Financial perspective:

  • Number of sales quotes issued with total dollar value
  • Number of orders received as a percent of orders received + lost
  • Warranty costs from repairing or replacing as a percent of sales

 The V. P. Marketing owned the two measures in the Customer perspective:

  • Percent of surveyed customers who are very satisfied
  • Percent of orders delivered on the original day of commitment

 The V. P. Production owned the three measures in the Internal Process perspective:

  • Cycle time index of average percent improvement in critical processes
  • Cost reduction index of average percent improvement of critical products
  • Fixed costs as a percent of sales

 The V. P. Human Resources owned the three measures in the Learning and Growth perspective:

  • Percent of employees who are very satisfied
  • Number of employees leaving voluntarily
  • Employee time lost due to accidents

It was understood by these leaders that they were mutually accountable for achieving the goals that were being monitored with these measures.  For example, the CFO was not solely responsible for reducing the costs of honoring warranties, only for making sure that the cost data were properly collected and reported.
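Several of these scorecard measures are ratios or averages, and writing each definition down once helps a metric owner confirm that the data are computed the same way every period. A Python sketch with hypothetical figures and function names, assuming the cycle time index is the simple mean of the percent improvements in the critical processes:

```python
# Hypothetical quarterly figures.
orders_received = 42
orders_lost = 18
warranty_costs = 125_000.0
sales = 5_000_000.0
cycle_time_improvements = [12.0, 5.0, 8.5]   # percent improvement per critical process

def win_rate(received: int, lost: int) -> float:
    """Orders received as a percent of orders received + lost."""
    return 100 * received / (received + lost)

def percent_of_sales(cost: float, sales: float) -> float:
    """A cost expressed as a percent of sales."""
    return 100 * cost / sales

def average_improvement_index(improvements: list[float]) -> float:
    """Index = average percent improvement across critical processes."""
    return sum(improvements) / len(improvements)

print(f"Orders won: {win_rate(orders_received, orders_lost):.1f}%")                  # 70.0%
print(f"Warranty costs: {percent_of_sales(warranty_costs, sales):.1f}% of sales")    # 2.5%
print(f"Cycle time index: {average_improvement_index(cycle_time_improvements):.1f}%")  # 8.5%
```

A metric owner reviewing these numbers would check the inputs (for example, that lost orders are being recorded at all), not be held solely responsible for the results.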

What is performance benchmarking?

Once I attended a presentation on how an organization developed a system of performance measures.   The speaker said that they “benchmarked” their organization.  To do this, they visited several other organizations and reviewed their performance measures to get ideas for their own measures.

 What they did was make “site visits.”  They did not benchmark.

The notion of a benchmark comes from the field of surveying. A surveyor will measure the altitude of a selected position and mark it with a metal plate that documents the altitude at that position.  The surveyor then can measure the altitude at other positions using the metal plate as a reference point.  Essentially, a benchmark is a data point used as a basis for comparison.  In the field of organizational performance, a benchmark is a measured level of performance in one organization that is shared with another organization in order to compare performances.  To make this comparison, both organizations need to have measured their own performance.  Benchmarking is a way for two organizations to analyze and improve performance by sharing data on comparable organizational processes with each other.
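The essential point, that a benchmark is useful only when both organizations measured the same attribute in the same way, can be sketched in a few lines of Python. The data class, function, and delivery-time figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MeasuredValue:
    organization: str
    measure: str      # the definition both partners agreed to use
    unit: str
    value: float

def compare_to_benchmark(own: MeasuredValue, benchmark: MeasuredValue) -> float:
    """Return own value minus the partner's benchmark value.

    The comparison only makes sense when both organizations measured the same
    attribute with the same definition and unit, so that is checked first.
    """
    if (own.measure, own.unit) != (benchmark.measure, benchmark.unit):
        raise ValueError("measures are not comparable")
    return own.value - benchmark.value

ours = MeasuredValue("Our company", "days from order to delivery", "days", 6.0)
partner = MeasuredValue("Partner", "days from order to delivery", "days", 4.5)
print(compare_to_benchmark(ours, partner))   # 1.5: we take 1.5 days longer
```

The gap itself only starts the analysis; the benchmarking partners still have to learn why one process performs better.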

To conduct a benchmarking study, one organization needs to find another organization that is willing to be a benchmarking partner.  The partners then exchange performance data on a selected work process to learn how well each one is doing, which organization is performing better, and why.  An internal benchmarking study is carried out across divisions within a single organization.  An external benchmarking study is carried out between different organizations in the same or even different industries.  Sometimes a benchmarking study is conducted by an independent consulting firm that invites a number of organizations to participate in sharing data and best practices.  A benchmarking study needs to be carefully planned so that the data that are shared are comparable and useful to all participants.

Carl Thor in his book, The Measures of Success: Creating a High Performing Organization, describes the steps in a benchmarking study:  (1) Select a function or process to benchmark.  (2) Choose a benchmarking team.  (3) Gather data on organizational performance on the selected function or process.  (4) Select a benchmarking partner.  (5) Agree on the ground rules for confidentiality, data sharing, and schedule.  (6) Teams meet to compare data, discuss how each performs its work, and consider the possibility of mutual site visits.  (7) In each organization, the teams implement performance improvements learned from the study.

On the distinction between measuring and evaluating performance

Recently I was involved in a disagreement about whether or not a strategic goal can be measured without a performance target.  The answer depends on what is meant by “measured.”  What I want to do in this blog post is clarify a very basic distinction that lies at the core of how performance measures are used to manage performance.

What complicates discussions of measuring organizational performance is the unfortunate reality that the word “measure” has many meanings in our everyday language.  The Oxford English Dictionary lists more than 30 different meanings for the word “measure” as a noun and as a verb.

  • Among its many meanings as a noun, “measure” can refer to a definite quantity of something, a standard for evaluation, a measuring device, and a course of action to accomplish something.
  •  Among its many meanings as a verb, “measure” can refer to estimating, marking off, judging, and weighing one’s thinking carefully.

So the word “measure” can refer to the act of determining the quantity of something (which is its technical meaning) and to the act of judging or evaluating something (which is not its technical meaning).  When discussing measures of organizational performance, the word measure should be used only in its technical meaning to avoid confusion and unproductive disagreements.

If we understand measurement as a process of determining the quantity of something, the job of a performance measure is to provide data that describe the current level of performance on some operational characteristic or attribute being measured.  Measuring performance is simply a systematic and objective method for observing the level of performance.  What measuring does not do is judge the observed level of performance.  Judging performance is the job of the manager, and to do this, a manager needs to compare the measured level of performance to the desired level of performance (which is stated in the manager’s performance expectation).

The person who argued that you can’t measure a strategic goal without a performance target was using the word “measure” to mean evaluate.  The person who argued that you can measure without a target was using the word measure in its technical sense of determining the quantity or level of performance.  Using the same word with different meanings caused an unproductive discussion.

In summary, when reviewing and evaluating organizational performance, the manager considers four things.

  • A performance attribute:  The specific aspect or dimension of organizational performance that is being reviewed and evaluated.
  • A performance expectation:  What the manager thinks is adequate, satisfactory, or acceptable performance for the specific performance attribute at the time of the review.
  • Data from a performance measure:  A quantitative description of the observed level of performance on the attribute being reviewed.
  • A performance evaluation:  The manager’s judgment of how satisfactory the observed level of performance is in comparison to expected performance.
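The four elements can be kept distinct in code, which underlines that measuring and evaluating are separate steps. A Python sketch under stated assumptions: the expectation is modeled as a single threshold (real expectations may be more nuanced), and the attribute, threshold, and delivery counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Expectation:
    """Performance expectation for one attribute, modeled here as a simple threshold."""
    attribute: str
    threshold: float
    higher_is_better: bool = True

def measure(on_time: int, total: int) -> float:
    """Measuring: describe the observed level of performance, with no judgment."""
    return 100 * on_time / total

def evaluate(observed: float, expectation: Expectation) -> str:
    """Evaluating: the manager compares observed performance with expected performance."""
    if expectation.higher_is_better:
        meets = observed >= expectation.threshold
    else:
        meets = observed <= expectation.threshold
    return "satisfactory" if meets else "below expectation"

# Attribute and expectation (hypothetical)
expectation = Expectation(attribute="percent of on-time deliveries", threshold=95.0)
# Data from the performance measure (hypothetical counts)
observed = measure(on_time=183, total=200)
# Evaluation
print(f"{expectation.attribute}: {observed:.1f}% -> {evaluate(observed, expectation)}")
```

The same measured value, 91.5%, would be judged satisfactory under a 90% expectation; the measurement does not change, only the evaluation does.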