sem6/prob/stat5/Opgaver.ipynb



{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy import stats"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Problem 1\n",
    "\n",
    "> It is claimed that a certain type of bipolar transistor has a mean value of current\n",
    "> gain that is at least 210. A sample of these transistors is tested. If the sample mean\n",
    "> value of current gain is 200 with a sample standard deviation of 35, would the\n",
    "> claim be rejected at the 5 percent level of significance if\n",
    "> \n",
    "> - (a) the sample size is 25;\n",
    "> - (b) the sample size is 64?\n",
    "\n",
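    "First we define our $H_0$. The claim itself is the null hypothesis: the mean current gain is at least 210.\n",
    "\n",
    "We do not know the variance of the distribution, only the sample standard deviation $S$.\n",
    "We can therefore use the one-sided *t-test*.\n",
    "\n",
    "$$H_0 : \\mu \\geq 210$$\n",
    "$$H_1 : \\mu < 210$$\n",
    "\n",
    "First the test statistic $TS$ is calculated with $\\mu_0 = 210$:\n",
    "$$\n",
    "TS = \\sqrt{n} (\\bar{X} - \\mu_0) / S\n",
    "$$\n",
    "\n",
    "A sample mean far below $\\mu_0$ speaks against $H_0$, so the p-value is the lower tail, where $T$ is the CDF of the $t$-distribution with $n-1$ degrees of freedom:\n",
    "$$\n",
    "p = P(T_{n-1} \\leq TS) = T(TS)\n",
    "$$\n",
    "\n",
    "$H_0$ is rejected when the *p-value* is smaller than $\\alpha = 0.05$, or equivalently when $1 - T(TS)$ is larger than $0.95$.\n"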
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TS: -1.4286, p_value: 0.0830\n",
      "(a) n = 25, claim rejected: False\n",
      "TS: -2.2857, p_value: 0.0128\n",
      "(b) n = 64, claim rejected: True\n"
     ]
    }
   ],
   "source": [
    "mu_min = 210  # claimed minimum mean (mu_0)\n",
    "mu_sample = 200  # sample mean\n",
    "sigma_sample = 35  # sample standard deviation\n",
    "alpha = 0.05\n",
    "\n",
    "def test_with_n(n):\n",
    "    # One-sided t-test of H_0: mu >= mu_min against H_1: mu < mu_min\n",
    "    TS = np.sqrt(n) * (mu_sample - mu_min) / sigma_sample\n",
    "    # Lower-tail p-value: P(T_{n-1} <= TS)\n",
    "    p_value = stats.t.cdf(TS, n - 1)\n",
    "\n",
    "    print(f\"TS: {TS:.4f}, p_value: {p_value:.4f}\")\n",
    "    # True means H_0 (the claim) is rejected at level alpha\n",
    "    return p_value < alpha\n",
    "\n",
    "print(f\"(a) n = 25, claim rejected: {test_with_n(25)}\")\n",
    "print(f\"(b) n = 64, claim rejected: {test_with_n(64)}\")"
   ]
  },
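  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a cross-check on Problem 1, the test statistic can be worked out by hand and compared with critical values instead of p-values. This is only a sketch: it assumes the claim $\\mu \\geq 210$ is the null hypothesis, and the critical values $t_{0.05,24} \\approx 1.711$ and $t_{0.05,63} \\approx 1.669$ are approximate values from a standard $t$-table, not taken from the exercise.\n",
    "\n",
    "$$\n",
    "TS_a = \\sqrt{25}\\,\\frac{200 - 210}{35} = -\\frac{50}{35} \\approx -1.429 > -1.711\n",
    "$$\n",
    "$$\n",
    "TS_b = \\sqrt{64}\\,\\frac{200 - 210}{35} = -\\frac{80}{35} \\approx -2.286 < -1.669\n",
    "$$\n",
    "\n",
    "So with $n = 25$ the claim would not be rejected at the 5 percent level, while with $n = 64$ it would.\n"
   ]
  },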
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Problem 2\n",
    "\n",
    "> A question of medical importance is whether jogging leads to a reduction in\n",
    "one’s pulse rate. To test this hypothesis, 8 nonjogging volunteers agreed to begin\n",
    "a 1-month jogging program. After the month their pulse rates were determined\n",
    "and compared with their earlier values. If the data are as follows, can we conclude\n",
    "that jogging has had an effect on the pulse rates?\n",
    "\n",
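    "I won't put the table from the book in :-(. The values are entered in the code cell below.\n",
    "\n",
    "Here the after measurements depend on the before measurements (same volunteers), so the samples are paired.\n",
    "We therefore have to look at the differences (after minus before).\n",
    "\n",
    "We let $H_0$ be that the pulse is not lowered, thus the mean of the differences $\\mu_d \\geq 0$, and $H_1$ that the pulse is lowered, $\\mu_d < 0$.\n",
    "\n",
    "We assume $\\alpha = 0.05$."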
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TS: -1.2630, p-value: 0.1235\n",
      "H_0 is not rejected at any alpha below 0.1235\n"
     ]
    }
   ],
   "source": [
    "before = np.array([74, 86, 98, 102, 78, 84, 79, 70])\n",
    "after = np.array([70, 85, 90, 110, 71, 80, 69, 74])\n",
    "diff = after - before  # paired differences, after minus before\n",
    "n = len(diff)\n",
    "\n",
    "mu_s = np.mean(diff)\n",
    "std_s = np.std(diff, ddof=1)  # sample standard deviation of the differences\n",
    "\n",
    "# One-sided paired t-test of H_0: mu_d >= 0 against H_1: mu_d < 0\n",
    "TS = np.sqrt(n) * (mu_s - 0) / std_s\n",
    "p_value = stats.t.cdf(TS, n - 1)  # lower-tail p-value\n",
    "print(f\"TS: {TS:.4f}, p-value: {p_value:.4f}\")\n",
    "print(f\"H_0 is not rejected at any alpha below {p_value:.4f}\")\n"
   ]
  },
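  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The numbers in Problem 2 can be followed by hand. The sketch below uses the differences (after minus before) $-4, -1, -8, 8, -7, -4, -10, 4$ from the data in the code cell; the symbols $\\bar{d}$ and $S_d$ for their sample mean and sample standard deviation are notation introduced here, and all values are rounded.\n",
    "\n",
    "$$\n",
    "\\bar{d} = \\frac{-22}{8} = -2.75, \\qquad S_d = \\sqrt{\\frac{265.5}{7}} \\approx 6.159\n",
    "$$\n",
    "$$\n",
    "TS = \\sqrt{8}\\,\\frac{\\bar{d} - 0}{S_d} \\approx \\frac{2.828 \\cdot (-2.75)}{6.159} \\approx -1.263\n",
    "$$\n",
    "\n",
    "If one instead reads the question as two-sided ($\\mu_d = 0$ against $\\mu_d \\neq 0$), the p-value doubles to roughly $0.247$, which is still well above $0.05$.\n"
   ]
  },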
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Problem 3\n",
    "\n",
    "> According to the U.S. Bureau of the Census, 25.5 percent of the population of\n",
    "those age 18 or over smoked in 1990. A scientist has recently claimed that this\n",
    "percentage has since increased, and to prove her claim she randomly sampled 500\n",
    "individuals from this population. If 138 of them were smokers, is her claim proved?\n",
    "Use the 5 percent level of significance.\n",
    "\n",
    "The $H_0$ is that the new percentage is less than or equal to 25.5.\n",
    "Each sampled person is a Bernoulli trial (smoker or not), so the number of smokers in the sample is binomially distributed.\n",
    "\n",
    "$H_0$ is therefore $p \\leq p_0$ where $p$ is the Bernoulli probability and $p_0 = 0.255$, and $H_1$ is $p > p_0$.\n",
    "\n",
    "We will let $X$ be the number of smokers in the sample, so we will reject $H_0$ if $X$ is large enough, that is, if the p-value $P_{p_0}(X \\geq 138)$ is smaller than $0.05$.\n"
   ]
  },
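  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sanity check on Problem 3, the upper tail $P_{p_0}(X \\geq 138)$ can be estimated with the normal approximation to the binomial. This is a swapped-in technique rather than the exact binomial calculation; it assumes a continuity correction of $0.5$, $Z$ denotes a standard normal variable, and the numbers are rounded.\n",
    "\n",
    "$$\n",
    "n p_0 = 500 \\cdot 0.255 = 127.5, \\qquad \\sqrt{n p_0 (1 - p_0)} = \\sqrt{500 \\cdot 0.255 \\cdot 0.745} \\approx 9.75\n",
    "$$\n",
    "$$\n",
    "P_{p_0}(X \\geq 138) \\approx P\\left(Z \\geq \\frac{137.5 - 127.5}{9.75}\\right) \\approx P(Z \\geq 1.03) \\approx 0.15\n",
    "$$\n",
    "\n",
    "An upper tail of about $0.15$ is well above $0.05$, so under this approximation 138 smokers out of 500 would not be enough to reject $H_0$.\n"
   ]
  },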
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "p_value: 0.15\n",
      "We do not reject H_0, thus the claim has not been proven\n"
     ]
    }
   ],
   "source": [
    "p_0 = 0.255\n",
    "n = 500\n",
    "smokers = 138\n",
    "alpha = 0.05\n",
    "\n",
    "# Upper-tail p-value P(X >= smokers) under p = p_0.\n",
    "# sf(k) = P(X > k), so we pass smokers - 1 to include X = smokers.\n",
    "p_value = stats.binom.sf(smokers - 1, n, p_0)\n",
    "print(f\"p_value: {p_value:.2f}\")\n",
    "if p_value > alpha:\n",
    "    print(\"We do not reject H_0, thus the claim has not been proven\")\n",
    "else:\n",
    "    print(\"We reject H_0, thus the claim is supported\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}