Identification and Data Assessment

Chapter 10

© 2019 McGraw-Hill Education. All rights reserved. Authorized only for instructor use in the classroom. No reproduction or distribution without the prior written consent of McGraw-Hill Education

Learning Objectives

Explain what it means for a variable’s effect to be identified in a model

Explain extrapolation and interpolation and how each inherently suffers from an identification problem

Distinguish between functional form assumptions and enhanced data coverage as remedies for identification problems stemming from extrapolation and interpolation

Differentiate between endogeneity and types of multicollinearity as identification problems due to variable co-movement

Articulate remedies for identification problems and inference challenges due to variable co-movement

Solve for the direction of bias in cases of variable co-movement


The table below shows a subsample of rocking chair data

Your goal is to estimate the average treatment effect of price on sales. On average, when price increases by $1, what is the effect on sales of rocking chairs?

Assessing Data via Identification


A parameter (e.g., β) is identified within a given model if it can be estimated with any level of precision given a large enough sample from the population

Suppose we assume the data-generating process as:

\(\text{Sales}_i = \alpha + \beta\,\text{Price}_i + U_i\)

Within this model, we are interested in accurately estimating β.

A parameter is identified if, for any given confidence level K (< 100%) and any given “length” L, we can build a confidence interval that contains β with length less than L and confidence level K, given a large enough sample of data

Assessing Data via Identification


Identification Example

Define p as the probability of rolling a 3 on any single roll of a die.

Define X to be the number of 3s observed on a single roll of a die (X = 1 for a roll of 3 and X = 0 for any other number), so E[X] = p

It can be shown that Var[X] = p(1 – p). Using this framework, the parameter p is identified

We can estimate p as precisely as we want given enough data on the roll of the die (given enough rolls of the die)

Assessing Data via Identification


The fact that p is identified follows directly from the central limit theorem

Suppose the die is rolled N times. Define x1 as the observed value of X for the first roll, x2 for the second, and so on.

Then, define \(\bar{X} = \frac{1}{N}\sum_{i=1}^{N} x_i\), the sample mean for X, or equivalently, the proportion of the N rolls that showed a 3

Given these definitions, the central limit theorem states that \(\bar{X} \sim N\left(p, \frac{p(1-p)}{N}\right)\) approximately as N gets large, where the second argument is the variance of \(\bar{X}\)

Assessing Data via Identification


Distribution of Mean of X for N = 50 and N = 5,000
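The narrowing of the distribution of the sample mean can be sketched with a small simulation. This is only an illustration under stated assumptions: the die is taken to be fair (so p = 1/6), NumPy is available, and the helper name `sample_means` and the number of repeated experiments are hypothetical choices, not part of the original slides.

```python
import numpy as np

def sample_means(n_rolls, n_experiments, rng):
    """Share of 3s in each of n_experiments runs of n_rolls fair-die rolls."""
    rolls = rng.integers(1, 7, size=(n_experiments, n_rolls))  # faces 1..6
    return (rolls == 3).mean(axis=1)

rng = np.random.default_rng(0)
p = 1 / 6  # assumption: a fair die

for n in (50, 5000):
    means = sample_means(n, 1000, rng)
    clt_sd = np.sqrt(p * (1 - p) / n)  # standard deviation implied by Var[X] / N
    print(f"N={n}: average of sample means={means.mean():.4f}, "
          f"spread={means.std():.4f}, CLT spread={clt_sd:.4f}")
```

The spread of the sample mean shrinks roughly with the square root of N, which is why a confidence interval for p can be made as short as desired with enough rolls.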

Assessing Data via Identification


Extrapolation and Interpolation

NOTE HOW THE VARIABLES “SALES” AND “PRICE” MOVE TOGETHER IN THE PRICE RANGE OF $210 TO $225 AND IN THE PRICE RANGE OF $275 TO $300


Suppose we want to know how Sales move with Prices in other price ranges

Interpolation involves drawing conclusions where there are “gaps” in the data

A data gap is any place where there are missing data for a variable over an interval of values, but data are not missing for at least some values on both ends of the interval

Extrapolation involves drawing conclusions beyond the extent of the data

Extrapolation and Interpolation


Identification problems must be considered when engaging in interpolation and/or extrapolation

The determining factor is whether the gap(s) in, or extent of, the data are due to random limitations in the sample or limitations in the population

If it is the former, there may be no identification problem

If it is the latter, then there is an identification problem that must be addressed

Identification Problems


Attempt to draw f(.) and g(.) without any mathematical formulas

WE ARE ATTEMPTING TO INTERPOLATE (FILL IN THE DATA GAP) AND ATTEMPTING TO EXTRAPOLATE (EXTEND BEYOND THE DATA’S RANGE).

Identification Problems


When interpolation or extrapolation is used to fill in gaps in, or the limited extent of, the data sample, but not the population, there is not an identification problem

When interpolation or extrapolation is used to fill in gaps in, or the limited extent of, the population, there is an identification problem

No matter how much data is collected from the population, it will not help to draw any conclusions about what is happening in the unobserved range(s)

Identification Problems


Suppose you want to engage in interpolation and/or extrapolation when there exists an identification problem

For a general model of the data-generating process, where no assumptions are made about the determining function, we cannot solve the problem by sampling more data from the population

There are two key approaches toward solving this type of identification problem:

Changes in the population

A functional form assumption

Remedies


Changing the population to alleviate an identification problem

A new singer has been promoting her music by selling physical copies of her music at various high schools.

She charges the same price to everyone and finds that seniors buy the most often, freshmen the least, and sophomores and juniors are in between

This tells her that her sales appear to be increasing by age of customers

She would like to extrapolate this relationship beyond just high school-aged kids

Using only data from high schools, she has an identification problem

Remedies: An Example


The figure illustrates possible ways to extrapolate past age 18, but there are no data to sort through the options.

A CLEAR SOLUTION TO THIS IDENTIFICATION PROBLEM WOULD BE TO TRY SELLING HER MUSIC AT COLLEGES AND COLLECT DATA ON HER SALES PERFORMANCE AMONG THIS GROUP.

THIS SIMPLE EXPANSION OF POPULATION WILL ALLEVIATE THE IDENTIFICATION PROBLEM.

Remedies


Imposing a functional form assumption to alleviate an identification problem

Standard practice is to assume a functional form of the determining function that applies for all relevant price levels

Assume a data-generating process with a linear functional form for the determining function: \(\text{Sales}_i = \alpha + \beta\,\text{Price}_i + U_i\)

This assumption not only imposes a linear shape on the relationship between Sales and Price, but also dictates how to interpolate and/or extrapolate

Remedies


HERE, WE ARE ESTIMATING α AND β USING ONLY DATA WITH PRICE IN THE RANGES ($210, $225) AND ($275, $300).

WE ARE APPLYING THESE ESTIMATED VALUES ACROSS MANY OTHER PRICE LEVELS.

WE ARE USING THESE VALUES TO INTERPOLATE BETWEEN $225 AND $275 AND TO EXTRAPOLATE ALL THE WAY TO $350.

Regression Line for Rocking Chair Sales and Price Data
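A minimal sketch of this idea follows: fit the linear model using only prices in the two observed ranges, then apply the estimates at prices with no data. The rocking chair data are not reproduced in the text, so the data below are synthetic, and the values `alpha_true`, `beta_true`, the noise level, and the helper `predict` are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: prices observed only in ($210, $225) and ($275, $300).
price = np.concatenate([rng.uniform(210, 225, 40), rng.uniform(275, 300, 40)])
alpha_true, beta_true = 100.0, -0.25  # made-up parameters, not from the source
sales = alpha_true + beta_true * price + rng.normal(0, 2, price.size)

# Ordinary least squares for Sales_i = alpha + beta * Price_i + U_i.
X = np.column_stack([np.ones_like(price), price])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, sales, rcond=None)

def predict(p):
    """Predicted Sales at price p under the assumed linear functional form."""
    return alpha_hat + beta_hat * np.asarray(p, dtype=float)

print("estimated alpha, beta:", alpha_hat, beta_hat)
print("interpolated prediction at $250:", predict(250))
print("extrapolated prediction at $350:", predict(350))
```

The predictions at $250 and $350 rest entirely on the linearity assumption; the data alone say nothing about those price levels.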


Another circumstance in which identification problems typically arise is when there is variable co-movement in the population

We use the broader term “co-movement” rather than correlation, since simple correlation alone does not encompass all the ways variables may move together in a population that result in identification problems

Variable Co-Movement


Variable Co-Movement

Three types of variable co-movement:

Perfect multicollinearity

Imperfect multicollinearity

Endogeneity


Consider the following data-generating process:

\(Y_i = \alpha + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + U_i\)

Use regression analysis to estimate \(\beta_1, \dots, \beta_K\)

We have assumed a functional form, so as long as there is some variation in each X, there will not be identification problems stemming from voids in the data

There may still be an identification problem when there is co-movement among the Xs and/or co-movement between one or more X and U

Variable Co-Movement


Perfect multicollinearity is a condition in which two or more independent variables have an exact linear relationship

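If we can write one independent variable as an exact linear function of one or more of the others, there is perfect multicollinearity

Perfect multicollinearity in our model is equivalent to being able to express some \(X_{ji}\) as \(X_{ji} = \gamma_0 + \sum_{k \neq j} \gamma_k X_{ki}\) for all i in the population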

Perfect multicollinearity implies a special type of correlation among two or more independent variables

Variable Co-Movement


Variable Co-Movement

Imperfect multicollinearity is a condition in which two or more independent variables have nearly an exact linear relationship

When this condition exists for a data-generating process, we cannot express any independent variable as an exact linear function of the others for all i in the population, although such a relationship nearly holds

Imperfect multicollinearity is equivalent to there being at least one semi-partial correlation that is “high” (nearly equal to 1)

It is common to characterize a correlation above 0.8 as high


Variable Co-Movement

Endogeneity, in the context of identification problems, involves co-movement between one or more independent variables and the error term in a data-generating process


Perfect multicollinearity always leads to an identification problem in regression analysis

As an example, suppose we believe that Sales of rocking chairs depend not only on Price, but also on Distance from the designer’s location

We assume the data-generating process: \(\text{Sales}_i = \alpha + \beta_1 \text{Price}_i + \beta_2 \text{Distance}_i + U_i\)

The population from which we are drawing suffers from perfect multicollinearity, creating an identification problem, particularly for β1 and β2.

Identification Problems


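The presence of perfect multicollinearity is clear, since we can write one independent variable as a linear function of another for every element in the population: \(\text{Price}_i = 200 + 0.04 \times \text{Distance}_i\)

The identification problem comes from the fact that we cannot separately estimate β1 and β2, the marginal effects of Price and Distance on Sales

Substituting for Price, the data-generating process becomes:

\(\text{Sales}_i = \alpha + \beta_1(200 + 0.04 \times \text{Distance}_i) + \beta_2 \text{Distance}_i + U_i\)

\(\text{Sales}_i = (\alpha + 200\beta_1) + (0.04\beta_1 + \beta_2)\,\text{Distance}_i + U_i\)

Only the two combinations (α + 200β1) and (0.04β1 + β2) can be estimated; infinitely many values of α, β1, and β2 produce the same two combinations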
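The substitution above can be checked numerically. The sketch below uses the exact relationship Price = 200 + 0.04 × Distance from the text, but the distances are synthetic and the two parameter vectors `theta_a` and `theta_b` are hypothetical values chosen so that (α + 200β1) and (0.04β1 + β2) coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
distance = rng.uniform(100, 2500, 200)  # synthetic distances in miles
price = 200 + 0.04 * distance           # exact linear relationship from the text

# Regressor matrix: intercept, Price, Distance.
X = np.column_stack([np.ones_like(distance), price, distance])
rank = np.linalg.matrix_rank(X)
print("columns:", X.shape[1], "rank:", rank)  # rank below column count signals the problem

# Two different (alpha, beta1, beta2) vectors with the same
# alpha + 200*beta1 and the same 0.04*beta1 + beta2.
theta_a = np.array([100.0, -0.5, 0.0])
theta_b = np.array([0.0, 0.0, -0.02])
same_predictions = np.allclose(X @ theta_a, X @ theta_b)
print("identical predicted Sales for both parameter vectors:", same_predictions)
```

Because both parameter vectors imply the same expected Sales for every observation, no amount of data from this population can tell them apart.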

Perfect Multicollinearity


Three ways to detect perfect multicollinearity

A known linear relationship among two or more independent variables

Recognize misuse of dummy variables

Let the data reveal it
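One way to let the data reveal perfect multicollinearity, including the dummy variable misuse mentioned above, is to compare the rank of the regressor matrix with its number of columns. This is a sketch under assumptions: the `region` variable and the helper `has_perfect_multicollinearity` are hypothetical, and a rank check is one possible diagnostic rather than the method prescribed by the source.

```python
import numpy as np

# Hypothetical categorical variable with three categories.
region = np.array(["north", "south", "west", "north",
                   "west", "south", "north", "south"])
dummies = np.column_stack(
    [(region == r).astype(float) for r in ("north", "south", "west")]
)
intercept = np.ones((region.size, 1))

# Misuse: an intercept plus a dummy for every category (the dummies sum to the intercept).
X_trap = np.hstack([intercept, dummies])
# Correct use: drop one dummy so it serves as the base category.
X_ok = np.hstack([intercept, dummies[:, 1:]])

def has_perfect_multicollinearity(X):
    """True when the columns of X are linearly dependent."""
    return np.linalg.matrix_rank(X) < X.shape[1]

print("all dummies plus intercept:", has_perfect_multicollinearity(X_trap))
print("one dummy dropped:", has_perfect_multicollinearity(X_ok))
```

Many regression packages perform a similar check internally and drop a variable or report an error when the columns are linearly dependent.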

Perfect Multicollinearity


Although imperfect multicollinearity does not cause an identification problem, it can create challenges with inference

Imperfect multicollinearity can generate inflated p-values and wide confidence intervals, making it difficult to make any strong inductive arguments about population parameters

Because there is not an identification problem, these challenges go away with enough data

Imperfect Multicollinearity


To illustrate imperfect multicollinearity, suppose Price has a near-perfect linear relationship with Distance:

\(\text{Price}_i = 200 + 0.04 \times \text{Distance}_i + V_i\),

where \(V_i\) contains other factors such as local fuel costs

A customer at a Distance of 2,000 miles might have a value for V of 3 and so face a Price of 200 + 0.04 × 2,000 + 3 = $283

A customer at a Distance of 400 miles might have a value for V of -2 and so face a Price of 200 + 0.04 × 400 − 2 = $214

Price and Distance have imperfect multicollinearity

Imperfect Multicollinearity: An Example


Assume the following data-generating process:

\(\text{Sales}_i = \alpha + \beta_1 \text{Price}_i + \beta_2 \text{Distance}_i + U_i\)

There is not perfect multicollinearity, so we can get estimates of all the parameters when regressing Sales on Price and Distance
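The claim that imperfect multicollinearity is an inference challenge rather than an identification problem can be illustrated by simulation. The sketch below uses the Price and Distance relationship from the example, but the Sales parameters, the noise levels, and the helpers `ols_with_se` and `simulate` are hypothetical and chosen only to make the pattern visible.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

def simulate(n, rng):
    """Draw n observations with Price nearly linear in Distance, then regress."""
    distance = rng.uniform(100, 2500, n)
    v = rng.normal(0, 3, n)                    # other factors, e.g. local fuel costs
    price = 200 + 0.04 * distance + v          # near-perfect linear relationship
    # Made-up data-generating process for Sales (assumption, not from the source).
    sales = 500 - 1.5 * price + 0.02 * distance + rng.normal(0, 10, n)
    X = np.column_stack([np.ones(n), price, distance])
    return ols_with_se(X, sales)

rng = np.random.default_rng(0)
for n in (100, 10_000):
    beta, se = simulate(n, rng)
    print(f"N={n}: estimate of beta1={beta[1]:.3f}, standard error={se[1]:.3f}")
```

With the small sample the standard error on the Price coefficient is large; with the large sample it shrinks, which is the sense in which the challenge goes away with enough data.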

Imperfect Multicollinearity


Ways to check whether there is imperfect multicollinearity, and thus the possibility that this condition is inflating p-values and confidence intervals:

Calculate semi-partial correlations among independent variables and check whether they are close to 1

Calculate the variance inflation factor (VIF) for each independent variable

Imperfect Multicollinearity


The variance inflation factor (VIF) for an independent variable, say \(X_1\), is equal to \(\frac{1}{1 - R_1^2}\), where \(R_1^2\) is the R-squared from regressing that independent variable (\(X_1\)) on all other independent variables (\(X_2, \dots, X_K\)) for a given determining function

A higher VIF for a given variable implies more noise (less certainty) in its coefficient estimator

VIF also tells us how much uncertainty this co-movement in the Xs is injecting into our estimators

Variance Inflation Factor (VIF)
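A VIF can be computed directly from its definition, 1 / (1 − R²), where R² comes from regressing one independent variable on the others. The sketch below is an illustration under assumptions: the data are synthetic, `income` is a hypothetical unrelated control, and the helper `vif` is written here for illustration rather than taken from any particular library.

```python
import numpy as np

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing X[:, j]
    on the remaining columns plus an intercept."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    centered = y - y.mean()
    r2 = 1 - (resid @ resid) / (centered @ centered)
    return 1 / (1 - r2)

rng = np.random.default_rng(0)
n = 1000
distance = rng.uniform(100, 2500, n)
price = 200 + 0.04 * distance + rng.normal(0, 3, n)  # nearly linear in Distance
income = rng.normal(50, 10, n)                        # hypothetical unrelated control

X = np.column_stack([price, distance, income])
for name, j in (("Price", 0), ("Distance", 1), ("Income", 2)):
    print(name, "VIF:", round(vif(X, j), 2))
```

Price and Distance should show large VIFs because each is almost fully explained by the other, while the unrelated control should have a VIF close to 1.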


Endogeneity can lead to estimators that are not consistent

Assume the following data-generating process:

\(Y_i = \alpha + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + U_i\)

and there is a non-zero correlation between X1 and U

This correlation means \(\hat{\beta}_1\) from a regression of Y on X1,…, XK need not be consistent

The inconsistency of \(\hat{\beta}_1\) due to endogeneity is what makes endogeneity an identification problem

Endogeneity as an Identification Problem


WE HAVE \(\hat{\beta}_1\) APPROACHING A NUMBER C ≠ \(\beta_1\) AS THE SAMPLE GETS LARGE

Example of Inconsistent Estimator
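An inconsistent estimator can be illustrated with a short simulation in which the regressor is built to co-move with the error term. All numbers here are made up for illustration: the true effect is set to 2, and X1 is constructed as independent noise plus 0.5 × U, so the estimator's limit is 2 + Cov(X1, U) / Var(X1) = 2 + 0.5 / 1.25 = 2.4 rather than 2. The helper `ols_slope` is a hypothetical name.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from regressing y on x with an intercept."""
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

rng = np.random.default_rng(0)
beta1 = 2.0  # true effect (made up)

for n in (100, 10_000, 1_000_000):
    u = rng.normal(0, 1, n)              # error term
    x1 = rng.normal(0, 1, n) + 0.5 * u   # X1 co-moves with U: endogeneity
    y = 1.0 + beta1 * x1 + u
    print(f"N={n}: estimated slope={ols_slope(x1, y):.3f}")
```

More data makes the estimate more precise, but it settles on the wrong number; this is the difference between a noisy estimator and an unidentified parameter.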


The Effects of Variable Co-Movement on Identification

For the data-generating process \(Y_i = \alpha + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + U_i\): If there exists an exact linear relationship between at least two of the independent variables (Xs), defined as perfect multicollinearity, then there is an identification problem

In contrast, if there is no exact linear relationship among the Xs, it is always possible to distinguish the effects of the independent variables on the outcome (Y) with any level of precision with sufficient data, even if some Xs exhibit imperfect multicollinearity

If there is correlation between any independent variable and the error term, defined as endogeneity, then there is an identification problem, no matter whether the correlation is via an exact linear relationship or not


For perfect multicollinearity

As long as our goal is to estimate the treatment effect and we have no particular interest in distinguishing the effects of controls, dropping one of the control variables contributing to perfect multicollinearity is an effective remedy

The only viable remedy when the treatment contributes to a perfect multicollinearity problem is to change the population from which you are sampling

Remedies for Identification Problems


Remedies for Identification Problems

For imperfect multicollinearity

If estimates are noisy and VIF calculations suggest imperfect multicollinearity, the simple solution is to gather more data

If the imperfect multicollinearity involves only controls and there is no interest in estimating the effects of the controls per se, then collecting more data will not necessarily be worthwhile


Remedies for Identification Problems

For endogeneity

The only viable remedy is to change the population from which you are sampling

It does not matter whether the endogeneity involves the treatment or not

Options include: collecting controls, finding a proxy variable(s), finding an instrument(s), and/or transforming cross-sectional data to become a panel


Suppose we have assumed the following data-generating process: \(Y_i = \alpha + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + U_i\)

Let X1 be the treatment and X2, … , XK be controls

Suppose that there is an omitted variable XK+1 that affects Y (and so is part of U) and is correlated with X1

The data-generating process can be written as:

\(Y_i = \alpha + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + \beta_{K+1} X_{K+1,i} + V_i\)

Identification Damage Control: Signing the Bias


Let \(X_{K+1,i} = \hat{\gamma}_0 + \hat{\gamma}_1 X_{1i} + \dots + \hat{\gamma}_K X_{Ki}\) be the estimated regression equation we get if we were to regress XK+1 on X1, …, XK

Within this framework, define \(\beta_{K+1} \times \hat{\gamma}_1\) as the omitted variable bias

Omitted variable bias is the product of the effect of the omitted variable on the outcome (\(\beta_{K+1}\)) and the (semi-partial) correlation between the omitted variable and the treatment (\(\hat{\gamma}_1\))
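The product form of omitted variable bias can be verified on simulated data, where the "omitted" variable is actually observed. In the sketch below, `x3` plays the role of the omitted variable, `gamma[1]` is the coefficient on the treatment from regressing `x3` on the included variables, and all parameter values are made up. In-sample, the gap between the treatment coefficient without and with `x3` equals the estimated effect of `x3` times `gamma[1]`.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for y regressed on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)                          # treatment
x2 = rng.normal(size=n)                          # observed control
x3 = 0.6 * x1 + 0.3 * x2 + rng.normal(size=n)    # omitted variable, correlated with x1
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 1.5 * x3 + rng.normal(size=n)

ones = np.ones(n)
short = ols(np.column_stack([ones, x1, x2]), y)       # regression that omits x3
long_ = ols(np.column_stack([ones, x1, x2, x3]), y)   # regression that includes x3
gamma = ols(np.column_stack([ones, x1, x2]), x3)      # x3 regressed on x1 and x2

bias = long_[3] * gamma[1]  # estimated effect of x3 times coefficient on treatment
print("treatment coefficient, x3 omitted:", short[1])
print("treatment coefficient, x3 included:", long_[1])
print("difference:", short[1] - long_[1], "product formula:", bias)
```

In practice the omitted variable is unobserved, so neither factor can be estimated; the simulation only shows why reasoning about the sign of each factor is informative.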

Identification Damage Control: Signing the Bias

‹#›

© 2019 McGraw-Hill Education.

Since we do not observe the omitted variable, we cannot estimate either of the components of omitted variable bias

We often can use theory to guide us with regard to the sign of each component.

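The basic relationship is: \(\text{sign}(\beta_{K+1} \times \hat{\gamma}_1) = \text{sign}(\beta_{K+1}) \times \text{sign}(\hat{\gamma}_1)\), where \(\hat{\gamma}_1\) is the coefficient on X1 from regressing XK+1 on X1, …, XK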

Identification Damage Control: Signing the Bias


Identification Damage Control: Signing the Bias

The four possibilities for the sign of the omitted variable bias are shown in the table below:
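The table itself did not survive in this copy of the slides, but the four cases follow from the rule that the sign of a product is the product of the signs. The sketch below simply enumerates them; the helper `bias_sign` and the row labels are hypothetical wording, not the original table's headings.

```python
from itertools import product

def bias_sign(sign_effect, sign_comovement):
    """Sign (+1 or -1) of the omitted variable bias, given the sign of the
    omitted variable's effect on Y and the sign of its co-movement with the treatment."""
    return sign_effect * sign_comovement

label = {1: "positive", -1: "negative"}
rows = []
for s_effect, s_comove in product((1, -1), repeat=2):
    rows.append((label[s_effect], label[s_comove], label[bias_sign(s_effect, s_comove)]))

for effect, comove, bias in rows:
    print(f"effect on Y: {effect:8s} | co-movement with treatment: {comove:8s} | bias: {bias}")
```

When both components share a sign the bias is positive, so the estimated treatment effect is overstated; when they differ the bias is negative.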


When extrapolating and interpolating data, two main remedies address the identification problems that arise:

Functional form assumptions are used to address identification problems stemming from extrapolation and interpolation. The most common functional form assumptions are linear, quadratic and log-linear. Additive and multiplicative forms of these functions have also been used with success by researchers in the past decade or so (Mezey, 2005).

Enhanced data coverage is a way of addressing identification problems stemming from extrapolation and interpolation. The idea behind enhanced data coverage is that it provides information about ranges of the data that would otherwise be unobserved, so conclusions about those ranges rest less heavily on functional form assumptions.

Enhanced data coverage can reduce reliance on functional form assumptions, but it is important to understand both before using them as remedies for identification problems caused by extrapolation or interpolation


A functional form assumption dictates how the estimated relationship extends to values that are not represented in the data. Observations that do not fall within the assumed functional form are a signal that the assumption may be inappropriate.
