The Scientific Method: The Gold Standard for Establishing Causality

Chapter 4

Learning Objectives

Recall the elements of the scientific method.

Explain how experiments can be used to measure treatment effects.

Execute a hypothesis test concerning a treatment effect using experimental data.

Construct a confidence interval for a treatment effect using experimental data.

Differentiate experimental from nonexperimental data.

Explain why using nonexperimental data presents challenges when trying to measure treatment effects.

The Scientific Method

The scientific method is a process designed to generate knowledge through the collection and analysis of experimental data.

A classic application is in medicine, where researchers run clinical trials to learn the impact of a new drug on patients’ health outcomes.

The scientific method is an effective way to establish causality.

The Scientific Method

The scientific method consists of the following six parts:

Ask a question

Do background research

Formulate a hypothesis

Conduct an experiment to test the hypothesis

Analyze the data from the experiment and draw conclusions

Communicate the findings

The Scientific Method Process

The Scientific Method

Step 1: Ask a question. The choice of question is often motivated by interest in a particular outcome.

Step 2: Do background research. This involves learning more about the issues surrounding the posed question. The purpose is to find information that will help identify a possible answer to the question.

The Scientific Method

Step 3: Formulate a hypothesis. This involves proposing a possible answer to the question.

Hypothesis

A proposed idea based on limited evidence that leads to further investigation.

A hypothesis is typically grounded in the background research and makes a positive statement about causality

The Scientific Method

Step 4: Run an experiment

Experiment

A test within a controlled environment designed to examine the validity of a hypothesis

Experimental data

Data that result from an experiment

The Scientific Method

For a hypothesis about causality, the experiment generally involves allocating a binary treatment, or different treatment levels, across two or more groups

Treatment

Something that is administered to members of at least one participating group

Treatment effect

The change in the outcome resulting from variation in the treatment

The Scientific Method

Step 5: Analyze the data and draw conclusions.

Compare the measured outcomes between the group that received the treatment and the group that did not

Build a confidence interval for the treatment effect

Is there a causal relationship and how big is it?

Step 6: Communicate the findings. Explain the methodology and findings.

A report of the findings typically includes the main conclusion, a confidence level, a description of the experiment, the reasoning leading to the conclusion, and a summary of the statistics used

Summaries of Scientific Method for Medicine and Business Examples

The Scientific Method and Causal Inference

A Simple Treatment Framework

The basic goal when running an experiment is to measure a treatment effect

Potential outcomes framework:

Consider a group of subjects who will participate in an experiment. Index each with the letter i, so i = 1 refers to the first subject, i = 2 refers to the second subject, etc.

$\text{Outcome}_i^{T}$ is the outcome realized by subject i if it receives the treatment (T)

$\text{Outcome}_i^{NT}$ is the outcome realized by that same subject if it does not receive the treatment (NT), then:

$\text{Treatment Effect}_i = \text{Outcome}_i^{T} - \text{Outcome}_i^{NT}$

The Scientific Method and Causal Inference

The problem in trying to measure the treatment effect is that the subjects cannot be both untreated and treated at the same time

Hence, a single treatment status is chosen at the time of the experiment for any given subject

At least two subjects are needed to observe both an outcome with the treatment and an outcome without the treatment

The treatment effect on one subject may be different from the treatment effect on another subject.

The Scientific Method and Causal Inference

Since we are unable to measure treatment effects for individual subjects, we attempt to estimate the mean treatment effect for the entire population of subjects who may receive the treatment

Average treatment effect (ATE)

The average difference between the treated and untreated outcomes across all subjects in a population

The expected value of the treatment effect for a randomly drawn subject from the population, written as $E[\text{Treatment Effect}_i]$:

$\text{ATE} = E[\text{Treatment Effect}_i] = E[\text{Outcome}_i^{T} - \text{Outcome}_i^{NT}]$
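To make the definition concrete, here is a minimal Python sketch that simulates both potential outcomes for every subject and averages the individual treatment effects. The population, the function names, and all numbers (a mean untreated outcome of 50, a mean effect of 5) are hypothetical choices for illustration. In a real experiment only one of the two outcomes is ever observed for each subject, so this calculation is possible only in a simulation.

```python
import random


def simulate_potential_outcomes(n, seed=0):
    """Return hypothetical (outcome_T, outcome_NT) pairs for n subjects."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        outcome_nt = rng.gauss(50.0, 10.0)  # outcome without the treatment
        effect_i = rng.gauss(5.0, 2.0)      # effect differs across subjects
        subjects.append((outcome_nt + effect_i, outcome_nt))
    return subjects


def average_treatment_effect(subjects):
    """Average of Outcome_i^T - Outcome_i^NT over all subjects."""
    effects = [outcome_t - outcome_nt for outcome_t, outcome_nt in subjects]
    return sum(effects) / len(effects)


subjects = simulate_potential_outcomes(10_000)
ate = average_treatment_effect(subjects)
print(f"ATE in this simulated population: {ate:.2f}")
```

The printed value should be close to the assumed mean effect of 5, even though individual effects vary from subject to subject.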

The Scientific Method and Causal Inference

From Experiments to Treatment Effects

$\text{Treated}_i$: This variable equals 1 if subject i receives the treatment and 0 if subject i does not receive the treatment.

$\text{Outcome}_i$: This variable equals the outcome actually experienced by subject i after the experiment.

Mean outcome for the treated group: $\overline{\text{Outcome}}_{\text{Treated}=1}$

Mean outcome for the untreated group: $\overline{\text{Outcome}}_{\text{Treated}=0}$

The Scientific Method and Causal Inference

When does the difference in the mean outcomes across the treated and untreated groups yield an unbiased estimate of the ATE?

Participants are a random sample of the population

Assignment into the treated group is random
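The two conditions above can be illustrated with a short simulation. The sketch below assumes a hypothetical population with a constant treatment effect of 5, assigns the treatment by a fair coin flip, and compares the mean observed outcomes of the two groups. All names and numbers are assumptions made for illustration.

```python
import random


def difference_in_means(outcomes, treated):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    treated_vals = [y for y, d in zip(outcomes, treated) if d == 1]
    untreated_vals = [y for y, d in zip(outcomes, treated) if d == 0]
    return (sum(treated_vals) / len(treated_vals)
            - sum(untreated_vals) / len(untreated_vals))


rng = random.Random(1)
n = 20_000
true_ate = 5.0  # assumed constant effect, chosen for illustration

# Both potential outcomes exist for every subject in the simulation.
outcome_nt = [rng.gauss(50.0, 10.0) for _ in range(n)]
outcome_t = [y + true_ate for y in outcome_nt]

# Random assignment: each subject is treated with probability 1/2.
treated = [rng.randint(0, 1) for _ in range(n)]

# Only the outcome matching the assigned treatment status is observed.
observed = [yt if d == 1 else ynt
            for yt, ynt, d in zip(outcome_t, outcome_nt, treated)]

estimate = difference_in_means(observed, treated)
print(f"Difference in means: {estimate:.2f} (true ATE = {true_ate})")
```

Because assignment does not depend on the subjects' potential outcomes, the difference in means should land near the true ATE, apart from sampling noise.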

The Scientific Method and Causal Inference

Why might the mean outcome for the treated differ from the mean outcome for the untreated?

Effect of the treatment on the treated (ETT): the average treatment effect among the subjects who receive the treatment; the treated group responds to the treatment when the ETT is nonzero

If the ETT is nonzero, then even if both groups would have the same mean outcome when not given the treatment, a difference emerges once the treated group receives the treatment

Selection bias: the mean outcome for the treated group would differ from the mean outcome for the untreated group even if neither group received the treatment
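The two sources of difference listed above can be combined into one decomposition. This is a standard identity, written here in the potential outcomes notation, where $E[\,\cdot \mid \text{Treated}_i = 1]$ is assumed to denote the population mean among subjects who receive the treatment:

```latex
\begin{aligned}
E[\text{Outcome}_i \mid \text{Treated}_i = 1] - E[\text{Outcome}_i \mid \text{Treated}_i = 0]
&= \underbrace{E[\text{Outcome}_i^{T} - \text{Outcome}_i^{NT} \mid \text{Treated}_i = 1]}_{\text{ETT}} \\
&\quad + \underbrace{E[\text{Outcome}_i^{NT} \mid \text{Treated}_i = 1] - E[\text{Outcome}_i^{NT} \mid \text{Treated}_i = 0]}_{\text{Selection bias}}
\end{aligned}
```

The identity follows from adding and subtracting $E[\text{Outcome}_i^{NT} \mid \text{Treated}_i = 1]$. When assignment is random, the selection bias term is zero and the ETT equals the ATE, so the difference in means measures the ATE.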

Data Analysis Using the Scientific Method

Hypothesis Testing for the Treatment Effect

For a given experiment with N participants and a single, binary treatment, suppose the following hold:

The set of participants is a random sample from the population

The sample size N is large, so that there are at least 30 participants in each of the treated and untreated groups

Assignment of the treatment is random

The average treatment effect is zero (ATE = 0); this is the null hypothesis

Data Analysis Using the Scientific Method

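Under these assumptions, the difference in the average outcome for the treated and untreated groups is approximately distributed as:

$\overline{\text{Outcome}}_{\text{Treated}=1} - \overline{\text{Outcome}}_{\text{Treated}=0} \sim N\left(0,\ \frac{\sigma_1^2}{N_1} + \frac{\sigma_0^2}{N_0}\right)$, where $\sigma_1^2$ and $\sigma_0^2$ are the outcome variances and $N_1$ and $N_0$ the numbers of subjects in the treated and untreated groups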

This difference will fall within 1.65 (1.96, 2.58) standard deviations of 0 approximately 90% (95%, 99%) of the time

Data Analysis Using the Scientific Method

Using t-stats: If the absolute value of the t-stat is greater than 1.65 (1.96, 2.58), reject the deduced distribution for the difference in sample means. Otherwise, fail to reject. The objective degree of support for this inductive argument is 90% (95%, 99%)

Using p-values: If the p-value of the t-stat is less than 0.10 (0.05, 0.01), reject the deduced distribution for the difference in sample means. Otherwise, fail to reject. The objective degree of support for this inductive argument is 90% (95%, 99%)
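The two decision rules above can be sketched in a few lines of Python. The sketch assumes the t-stat is the difference in sample means divided by its estimated standard deviation, and it uses the large-sample normal approximation for the p-value. The function names and the summary statistics in the example are hypothetical.

```python
import math


def t_stat(mean1, mean0, s1, s0, n1, n0):
    """Difference in sample means divided by its estimated std. deviation."""
    std_dev_of_diff = math.sqrt(s1 ** 2 / n1 + s0 ** 2 / n0)
    return (mean1 - mean0) / std_dev_of_diff


def two_sided_p_value(t):
    """P(|Z| > |t|) for a standard normal Z (large-sample approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))


# Hypothetical summary statistics from an experiment.
t = t_stat(mean1=12.0, mean0=10.0, s1=4.0, s0=4.0, n1=100, n0=100)
p = two_sided_p_value(t)

# Both rules give the same decision at the 95% level.
reject_by_t = abs(t) > 1.96
reject_by_p = p < 0.05
print(f"t-stat = {t:.3f}, p-value = {p:.5f}, reject: {reject_by_t}")
```

The two rules are equivalent because a t-stat of 1.96 in absolute value corresponds to a two-sided p-value of about 0.05.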

Data Analysis Using the Scientific Method

Transposition: If inductive reasoning leads to a rejection of the distribution for the difference in sample means, reject at least one of the assumptions leading to that distribution. If the sample is large, and there is confidence in the random sample and random treatment assignment, reject the null hypothesis that ATE = 0

P-Value for T-Stat of 3.466

95% Confidence Interval When ATE = 0

Confidence Interval for the Treatment Effect

Deductive reasoning:

IF…

The set of participants is a random sample from the population

The sample size N is large, so that there are at least 30 participants in each of the treated and untreated groups

Assignment of the treatment is random

Then…

The interval consisting of the difference between the average outcome for the treated and untreated, plus or minus 1.65 (1.96, 2.58) standard deviations for this difference, will contain the average treatment effect approximately 90% (95%, 99%) of the time

Confidence Interval for the Treatment Effect

Inductive reasoning:

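We observe the difference between the average outcome for the treated and untreated, $\overline{\text{Outcome}}_{\text{Treated}=1} - \overline{\text{Outcome}}_{\text{Treated}=0}$, the sample standard deviations for the treated ($S_1$) and untreated ($S_0$), and the number of subjects receiving the treatment ($N_1$) and not receiving the treatment ($N_0$). We conclude the ATE is contained in the interval

$\overline{\text{Outcome}}_{\text{Treated}=1} - \overline{\text{Outcome}}_{\text{Treated}=0} \pm 1.65 \sqrt{\frac{S_1^2}{N_1} + \frac{S_0^2}{N_0}}$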

Confidence Interval for the Treatment Effect

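The objective degree of support for this inductive argument is 90%. If we use the intervals

$\overline{\text{Outcome}}_{\text{Treated}=1} - \overline{\text{Outcome}}_{\text{Treated}=0} \pm 1.96 \sqrt{\frac{S_1^2}{N_1} + \frac{S_0^2}{N_0}}$

$\overline{\text{Outcome}}_{\text{Treated}=1} - \overline{\text{Outcome}}_{\text{Treated}=0} \pm 2.58 \sqrt{\frac{S_1^2}{N_1} + \frac{S_0^2}{N_0}}$

the objective degree of support becomes 95% and 99%, respectively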
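The interval calculation can be sketched as a small Python function. It takes the two sample means, the sample standard deviations, and the group sizes, and applies the multipliers 1.65, 1.96, and 2.58 from the slides. The function name and the example inputs are hypothetical.

```python
import math

# Multipliers for the 90%, 95%, and 99% levels, as used in the slides.
MULTIPLIERS = {90: 1.65, 95: 1.96, 99: 2.58}


def ate_confidence_interval(mean1, mean0, s1, s0, n1, n0, level=95):
    """Confidence interval for the ATE from experimental summary statistics."""
    diff = mean1 - mean0
    std_dev_of_diff = math.sqrt(s1 ** 2 / n1 + s0 ** 2 / n0)
    margin = MULTIPLIERS[level] * std_dev_of_diff
    return diff - margin, diff + margin


# Hypothetical summary statistics from an experiment.
for level in (90, 95, 99):
    low, high = ate_confidence_interval(12.0, 10.0, 4.0, 4.0, 100, 100, level)
    print(f"{level}% interval: ({low:.3f}, {high:.3f})")
```

A higher level of support widens the interval, since the multiplier grows while the difference in means and its standard deviation stay the same.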

Experimental Data vs Nonexperimental Data

Experimental data are well suited to measuring causal effects of treatments

Most data that are available to businesses are nonexperimental

Nonexperimental data are data that were not produced by an experiment

With nonexperimental data, the analyst can no longer control how the treatment is administered

Treatment is very seldom randomly assigned, which can interfere with estimating the treatment effect

Examples of Nonexperimental Business Treatments and Outcomes

Panel Data on Price and Sales

If these were experimental data to be used to measure a treatment effect, the price would have varied randomly across regions and time.

Experimental Data vs Nonexperimental Data

Consequences of Using Nonexperimental Data to Estimate Treatment Effects

High likelihood that the treatment is not randomly assigned

If treatment assignment is nonrandom, then we risk the possibility that ETT ≠ ATE, Selection Bias ≠ 0, or both

Comparing the means between the treated and the untreated groups is no longer a proper estimator for the ATE
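A short simulation shows how nonrandom assignment can distort the comparison of means. In the sketch below, subjects with a high untreated outcome select into the treatment. This is one hypothetical selection rule chosen for illustration, and the population and all numbers are likewise assumptions. The treatment effect is held constant at 5, so ETT = ATE and the entire distortion comes from selection bias.

```python
import random


def mean(values):
    return sum(values) / len(values)


rng = random.Random(2)
n = 20_000
true_ate = 5.0  # assumed constant effect, so ETT equals ATE here

outcome_nt = [rng.gauss(50.0, 10.0) for _ in range(n)]
outcome_t = [y + true_ate for y in outcome_nt]

# Nonrandom assignment: only subjects with a high untreated outcome are treated.
treated = [1 if y > 50.0 else 0 for y in outcome_nt]
observed = [yt if d == 1 else ynt
            for yt, ynt, d in zip(outcome_t, outcome_nt, treated)]

# Naive estimator: difference in observed means across the two groups.
naive = (mean([y for y, d in zip(observed, treated) if d == 1])
         - mean([y for y, d in zip(observed, treated) if d == 0]))

# Selection bias: gap in untreated outcomes between the two groups.
# It is computable here only because the simulation knows both outcomes.
selection_bias = (mean([y for y, d in zip(outcome_nt, treated) if d == 1])
                  - mean([y for y, d in zip(outcome_nt, treated) if d == 0]))

print(f"True ATE: {true_ate}")
print(f"Naive difference in means: {naive:.2f}")
print(f"Selection bias: {selection_bias:.2f}")
```

In this setup the naive difference in means equals the true effect plus the selection bias, so it overstates the ATE by a wide margin.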
