```python
import numpy as np
from scipy import stats
```

# Problem 1

> It is claimed that a certain type of bipolar transistor has a mean value of current
> gain that is at least 210. A sample of these transistors is tested. If the sample mean
> value of current gain is 200 with a sample standard deviation of 35, would the
> claim be rejected at the 5 percent level of signiﬁcance if
> 
> - (a) the sample size is 25;
> - (b) the sample size is 64?

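The claim is that the mean current gain is at least 210. We take the claim as the null hypothesis and reject it only if the sample mean is significantly below 210:

$$H_0 : \mu \geq 210$$
$$H_1 : \mu < 210$$

We don't know the variance of the population, only the sample standard deviation $S$, so we use a one-sided *t-test*.

First the test statistic is calculated with
$$
TS = \sqrt{n} (\bar{X} - \mu_0) / S, \qquad \mu_0 = 210
$$

Small values of $TS$ speak against $H_0$, so the p-value is the lower tail:
$$
p = P\{T_{n-1} \leq TS\} = F_{T_{n-1}}(TS)
$$
where $F_{T_{n-1}}$ is the CDF of the $t$-distribution with $n-1$ degrees of freedom.

$H_0$ (the claim) is rejected at the 5 percent level if the *p-value* is smaller than $\alpha = 0.05$.

```python
mu_0 = 210     # claimed lower bound for the mean current gain
x_bar = 200    # sample mean
s = 35         # sample standard deviation
alpha = 0.05

def test_with_n(n):
    """One-sided t-test of H_0: mu >= mu_0 against H_1: mu < mu_0."""
    TS = np.sqrt(n) * (x_bar - mu_0) / s
    # Lower-tail p-value P{T_{n-1} <= TS}
    p_value = stats.t.cdf(TS, n - 1)
    print(f"n = {n}: TS = {TS:.4f}, p-value = {p_value:.4f}")
    return p_value < alpha  # True means the claim is rejected

print(f"(a) reject claim: {test_with_n(25)}")
print(f"(b) reject claim: {test_with_n(64)}")
```

```
n = 25: TS = -1.4286, p-value = 0.0830
(a) reject claim: False
n = 64: TS = -2.2857, p-value = 0.0128
(b) reject claim: True
```

With $n = 25$ the claim is not rejected ($p \approx 0.083$), but with $n = 64$ it is rejected at the 5 percent level ($p \approx 0.013$).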
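The same decision can be reached without a p-value by comparing $TS$ with a critical value: for $H_0 : \mu \geq 210$ against $H_1 : \mu < 210$, the claim is rejected when $TS < -t_{\alpha, n-1}$, where $t_{\alpha, n-1}$ is the upper $\alpha$ quantile of the $t$-distribution with $n - 1$ degrees of freedom. A minimal sketch using `stats.t.ppf` (the helper name `reject_by_critical_value` is only illustrative):

```python
import numpy as np
from scipy import stats

def reject_by_critical_value(n, x_bar=200, mu_0=210, s=35, alpha=0.05):
    """Hypothetical helper: reject H_0: mu >= mu_0 if TS < -t_{alpha, n-1}."""
    TS = np.sqrt(n) * (x_bar - mu_0) / s
    t_crit = stats.t.ppf(1 - alpha, n - 1)  # upper alpha quantile t_{alpha, n-1}
    return TS, -t_crit, bool(TS < -t_crit)

for n in (25, 64):
    TS, bound, reject = reject_by_critical_value(n)
    print(f"n = {n}: TS = {TS:.4f}, bound = {bound:.4f}, reject claim: {reject}")
```

For $n = 25$ the bound is about $-1.711$ and $TS \approx -1.429$ stays above it, so the claim is not rejected; for $n = 64$ the bound is about $-1.669$ and $TS \approx -2.286$ falls below it.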


# Problem 2

> A question of medical importance is whether jogging leads to a reduction in
> one’s pulse rate. To test this hypothesis, 8 nonjogging volunteers agreed to begin
> a 1-month jogging program. After the month their pulse rates were determined
> and compared with their earlier values. If the data are as follows, can we conclude
> that jogging has had an effect on the pulse rates?

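The data from the book (pulse rates of the 8 volunteers before and after the jogging program):

| Volunteer | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Before | 74 | 86 | 98 | 102 | 78 | 84 | 79 | 70 |
| After | 70 | 85 | 90 | 110 | 71 | 80 | 69 | 74 |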

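The after values are not independent of the before values, because each pair of measurements comes from the same volunteer. We therefore work with the differences $d_i = \text{after}_i - \text{before}_i$ and apply a one-sample *t-test* to them (a paired t-test).

The question is whether jogging has had any effect on the pulse rate, so with $\mu_d$ the mean difference we test
$$H_0 : \mu_d = 0$$
$$H_1 : \mu_d \neq 0$$

with test statistic $TS = \sqrt{n}\,\bar{d} / S_d$ and two-sided p-value $P\{|T_{n-1}| \geq |TS|\}$. We use the significance level $\alpha = 0.05$.

```python
before = np.array([74, 86, 98, 102, 78, 84, 79, 70])
after = np.array([70, 85, 90, 110, 71, 80, 69, 74])
diff = after - before        # paired differences, one per volunteer
n = len(diff)

d_bar = np.mean(diff)        # sample mean of the differences
s_d = np.std(diff, ddof=1)   # sample standard deviation (divisor n - 1)

TS = np.sqrt(n) * (d_bar - 0) / s_d
# Two-sided p-value P{|T_{n-1}| >= |TS|}
p_value = 2 * (1 - stats.t.cdf(np.abs(TS), n - 1))
print(f"TS: {TS:.4f}, p-value: {p_value:.4f}")
if p_value < alpha:
    print("H_0 is rejected: jogging has had an effect on the pulse rates")
else:
    print(f"H_0 is not rejected at alpha = {alpha}")
```

```
TS: -1.2630, p-value: 0.2470
H_0 is not rejected at alpha = 0.05
```

So we cannot conclude that jogging has had an effect on the pulse rates: $H_0$ would only be rejected at significance levels above 0.247. Even a one-sided test against $H_1 : \mu_d < 0$ (a reduction) gives $p \approx 0.124$.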
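As a cross-check, the statistic can be worked out by hand from the differences $d = (-4, -1, -8, 8, -7, -4, -10, 4)$:
$$\bar d = \frac{-22}{8} = -2.75, \qquad S_d = \sqrt{\frac{265.5}{7}} \approx 6.159, \qquad TS = \frac{\sqrt{8}\,(-2.75)}{6.159} \approx -1.263$$
SciPy also provides the paired t-test directly as `stats.ttest_rel`. The sketch below assumes SciPy 1.6 or newer, where `ttest_rel` accepts the `alternative` argument; `after` is passed first so that the differences are `after - before`.

```python
import numpy as np
from scipy import stats

before = np.array([74, 86, 98, 102, 78, 84, 79, 70])
after = np.array([70, 85, 90, 110, 71, 80, 69, 74])

# Paired t-test on after - before, H_1: mu_d != 0 (two-sided is the default)
two_sided = stats.ttest_rel(after, before)
# Same test with the one-sided alternative H_1: mu_d < 0 (a reduction)
reduction = stats.ttest_rel(after, before, alternative="less")

print(f"TS: {two_sided.statistic:.4f}")
print(f"two-sided p-value: {two_sided.pvalue:.4f}")
print(f"one-sided p-value (reduction): {reduction.pvalue:.4f}")
```

It should report $TS \approx -1.263$, a two-sided p-value of about 0.247 and a one-sided p-value of about 0.124, matching the hand computation.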


# Problem 3

> According to the U.S. Bureau of the Census, 25.5 percent of the population of
> those age 18 or over smoked in 1990. A scientist has recently claimed that this
> percentage has since increased, and to prove her claim she randomly sampled 500
> individuals from this population. If 138 of them were smokers, is her claim proved?
> Use the 5 percent level of signiﬁcance.

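Let $p$ be the current proportion of smokers and $p_0 = 0.255$. The scientist claims that $p > p_0$, so the null hypothesis is that the percentage is lower than or equal to 25.5 percent:
$$H_0 : p \leq p_0$$
$$H_1 : p > p_0$$

Each sampled person is a smoker with probability $p$, independently of the others (a Bernoulli trial), so the number of smokers $X$ in the sample is binomial, $X \sim \text{Bin}(500, p)$.

We reject $H_0$ if $X$ is large enough. The p-value is therefore the upper tail at $p = p_0$, *including* the observed count 138:
$$
p\text{-value} = P_{p_0}\{X \geq 138\} = \sum_{k=138}^{500} \binom{500}{k} (0.255)^k (0.745)^{500-k}
$$

```python
p_0 = 0.255      # smoking rate in 1990
n = 500          # sample size
smokers = 138    # observed number of smokers

# P{X >= 138} for X ~ Bin(n, p_0); binom.sf(k) = P{X > k}, hence smokers - 1
p_value = stats.binom.sf(smokers - 1, n, p_0)
print(f"p-value: {p_value:.4f}")
if p_value > alpha:
    print("H_0 is not rejected, thus the claim has not been proven")
else:
    print("H_0 is rejected, thus the claim is supported")
```

```
p-value: 0.1525
H_0 is not rejected, thus the claim has not been proven
```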
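Two cross-checks for Problem 3, sketched under the assumption of SciPy 1.7 or newer (where `stats.binomtest` is available): the exact one-sided binomial test, which should give the same p-value as `stats.binom.sf(137, 500, 0.255)`, and the normal approximation with continuity correction,
$$P\{X \geq 138\} \approx P\left\{Z \geq \frac{137.5 - n p_0}{\sqrt{n p_0 (1 - p_0)}}\right\}, \qquad n p_0 = 127.5$$

```python
import numpy as np
from scipy import stats

p_0, n, smokers = 0.255, 500, 138

# Exact one-sided binomial test of H_0: p <= p_0 against H_1: p > p_0
exact = stats.binomtest(smokers, n, p_0, alternative="greater")

# Normal approximation with continuity correction for P{X >= 138}
z = (smokers - 0.5 - n * p_0) / np.sqrt(n * p_0 * (1 - p_0))
approx = stats.norm.sf(z)

print(f"exact p-value:        {exact.pvalue:.4f}")
print(f"normal approximation: {approx:.4f}")
```

Both p-values come out at about 0.15, well above $\alpha = 0.05$, so the conclusion does not depend on the method.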
