Statistical Tests

Confidence Interval/Margin of error

We try to answer the following question:

Find an interval, computed from a sample, that we are “reasonably confident” (for example, 95% confident) contains the true population mean.

To calculate a confidence interval, we need the population standard deviation. Since it is usually unknown, we use the sample standard deviation as an estimate of it. This approximation works well when the sample is reasonably large.

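$$ \text{Confidence Interval} = \bar{s} \pm z\frac{s}{\sqrt{n}} $$

$$ \text{where } \bar{s} := \text{sample mean}, \quad s := \text{sample stddev}, \quad z := \text{z-score}, \quad n := \text{sample size} $$

The term after ± is the margin of error, and z is the critical value for the chosen confidence level (about 1.96 for a 95% interval).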
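The formula can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not a reference implementation: `confidence_interval` is a hypothetical helper name, the `heights` values are made-up example data, and `statistics.stdev` computes s with n - 1 in the denominator.

```python
import math
import statistics
from statistics import NormalDist


def confidence_interval(sample, confidence=0.95):
    """Hypothetical helper: return (lower, upper) using mean ± z * s / sqrt(n)."""
    n = len(sample)
    sample_mean = statistics.mean(sample)   # \bar{s}
    sample_sd = statistics.stdev(sample)    # s, computed with n - 1 in the denominator
    # Two-sided critical value: leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sample_sd / math.sqrt(n)   # margin of error
    return sample_mean - margin, sample_mean + margin


# Hypothetical example data, for illustration only.
heights = [170.2, 168.5, 175.1, 172.3, 169.8, 171.0, 174.4, 166.9]
low, high = confidence_interval(heights)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

For a sample as small as this one, a t critical value with n - 1 degrees of freedom would give a slightly wider interval; the sketch keeps to the z-based formula above.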

It is important to note that a 95% confidence interval does not mean that there is a 95% chance that the population mean is within the interval. Rather, it means that if we were to repeat the sampling process many times and calculate a 95% confidence interval each time, then approximately 95% of the intervals would contain the true population mean.
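The repeated-sampling interpretation can be checked by simulation. A minimal sketch, assuming a normal population whose mean and standard deviation are made up for the example (`coverage_rate`, `true_mean` and `true_sd` are hypothetical names): draw many samples, build an interval from each with the formula above, and count how often the interval contains the true mean.

```python
import math
import random
import statistics
from statistics import NormalDist


def coverage_rate(true_mean=50.0, true_sd=10.0, n=30, trials=5_000,
                  confidence=0.95, seed=0):
    """Hypothetical helper: fraction of simulated intervals containing true_mean."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample from the assumed normal population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = statistics.mean(sample)
        margin = z * statistics.stdev(sample) / math.sqrt(n)
        if m - margin <= true_mean <= m + margin:
            hits += 1
    return hits / trials


print(f"Coverage: {coverage_rate():.3f}")
```

With n = 30 the rate tends to come out slightly below 0.95, because the z critical value ignores the extra uncertainty from estimating s; larger samples bring it closer to 0.95.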

Hypothesis testing

We assume that the null hypothesis is true. Under that assumption, we find the probability of observing results at least as extreme as the ones we actually saw.

Generic Steps

Context: We know the population mean without treatment (the control population). We also have the mean and standard deviation of a sample drawn from the treatment group.

  1. Imagine the sampling distribution of sample means (the means of many samples of the same size). We need to find the mean and standard deviation of this distribution.
    1. The mean of this distribution equals the population mean, and its standard deviation (the standard error) equals the population standard deviation divided by the square root of the sample size.
    2. Since we assume that the null hypothesis is true, the mean of the treatment population is the same as the mean of the control population.
    3. That common mean is therefore the mean of the sampling distribution of sample means.
    4. We don't know the population standard deviation, so we estimate it with the sample standard deviation. (Strictly, it is the sample variance, computed with n - 1 in the denominator, that is an unbiased estimator of the population variance; the sample standard deviation is a close but slightly biased estimate of the population standard deviation.)
    5. Dividing the sample standard deviation by the square root of the sample size gives the estimated standard deviation of the sampling distribution of sample means.
  2. Next, we measure how far the treatment mean is from the mean of the sampling distribution by computing its z-score: z = (treatment mean - population mean) / (estimated standard deviation from step 1.5). This lets us estimate how likely it is to observe the treatment mean if the null hypothesis is true.
  3. Calculate the p-value from the z-score.
  4. If the p-value is less than a chosen significance level (e.g. 0.05), we reject the null hypothesis. Why? See the definition of the p-value below.
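The generic steps above can be sketched as a one-sample z-test using only the standard library. This is an illustration under assumptions, not the notes' own method: `z_test` is a hypothetical name, the test is assumed to be two-sided (a one-sided test would use `1 - NormalDist().cdf(z)` instead), and the population mean of 100 and the `scores` data are made up.

```python
import math
import statistics
from statistics import NormalDist


def z_test(population_mean, treatment_sample):
    """Hypothetical helper: one-sample, two-sided z-test. Returns (z, p_value)."""
    n = len(treatment_sample)
    # Step 1: under H0 the sampling distribution of sample means is centred on
    # the untreated population mean; its spread is estimated from the sample sd.
    standard_error = statistics.stdev(treatment_sample) / math.sqrt(n)
    # Step 2: how many standard errors the treatment mean lies from that centre.
    z = (statistics.mean(treatment_sample) - population_mean) / standard_error
    # Step 3: probability under H0 of a z-score at least this extreme,
    # in either direction (two-sided).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Step 4 with hypothetical data: untreated population mean 100.
alpha = 0.05  # significance level
scores = [104, 108, 99, 111, 106, 102, 109, 105, 107, 103]
z, p = z_test(100, scores)
print(f"z = {z:.2f}, p = {p:.4g}")
print("Reject H0" if p < alpha else "Fail to reject H0")
```

For small samples like this one, a one-sample t-test (for example `scipy.stats.ttest_1samp`) is the more usual choice; the sketch keeps to the z-score described in the steps.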