aboutsummaryrefslogtreecommitdiff
path: root/sem6/prob/stat5/Opgaver.ipynb
diff options
context:
space:
mode:
Diffstat (limited to 'sem6/prob/stat5/Opgaver.ipynb')
-rw-r--r--sem6/prob/stat5/Opgaver.ipynb204
1 files changed, 204 insertions, 0 deletions
diff --git a/sem6/prob/stat5/Opgaver.ipynb b/sem6/prob/stat5/Opgaver.ipynb
new file mode 100644
index 0000000..4942f92
--- /dev/null
+++ b/sem6/prob/stat5/Opgaver.ipynb
@@ -0,0 +1,204 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "from scipy import stats"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Problem 1\n",
+ "\n",
+ "> It is claimed that a certain type of bipolar transistor has a mean value of current\n",
+ "> gain that is at least 210. A sample of these transistors is tested. If the sample mean\n",
+ "> value of current gain is 200 with a sample standard deviation of 35, would the\n",
+ "> claim be rejected at the 5 percent level of significance if\n",
+ "> \n",
+ "> - (a) the sample size is 25;\n",
+ "> - (b) the sample size is 64?\n",
+ "\n",
+ "First we define our $H_0$, which we set to $\\mu < 10$.\n",
+ "\n",
+ "We dont know the varience of the distribution, but instead the sample varience $S$.\n",
+ "Then we can use the one sided *t-test*.\n",
+ "\n",
+ "$$H_0 : \\mu < 210$$\n",
+ "$$H_1 : \\mu \\geq 210$$\n",
+ "\n",
+ "First TS is calculated with \n",
+ "$$\n",
+ "TS = \\sqrt{n} (\\bar{X} - \\mu_0) / S\n",
+ "$$\n",
+ "\n",
+ "Then the p value is calculated\n",
+ "$$\n",
+ "p = P{T_{n-1} \\geq TS} = 1 - T(TS)\n",
+ "$$\n",
+ "\n",
+ "Then one can check if the *p-value* is smaller than $0.95$.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "TS: -1.4285714285714286, p_value: 0.9169932070815955\n",
+ "Assignment A: True\n",
+ "TS: -2.2857142857142856, p_value: 0.987176989403574\n",
+ "Assignment B: False\n"
+ ]
+ }
+ ],
+ "source": [
+ "mu_min = 210\n",
+ "mu_sample = 200\n",
+ "sigma_sample= 35\n",
+ "alpha = 0.05\n",
+ "accept = 1 - alpha\n",
+ "\n",
+ "def test_with_n(n):\n",
+ " TS = np.sqrt(n) * (mu_sample - mu_min) / sigma_sample\n",
+ " p_value = 1 - stats.t.cdf(TS, n-1)\n",
+ " \n",
+ " print(f\"TS: {TS}, p_value: {p_value}\")\n",
+ " return p_value < accept\n",
+ "\n",
+ "# Part A\n",
+ "print(f\"Assignment A: {test_with_n(25)}\")\n",
+ "print(f\"Assignment B: {test_with_n(64)}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Problem 2\n",
+ "\n",
+ "> A question of medical importance is whether jogging leads to a reduction in\n",
+ "one’s pulse rate. To test this hypothesis, 8 nonjogging volunteers agreed to begin\n",
+ "a 1-month jogging program. After the month their pulse rates were determined\n",
+ "and compared with their earlier values. If the data are as follows, can we conclude\n",
+ "that jogging has had an effect on the pulse rates?\n",
+ "\n",
+ "I wont put the table from the book in :-(.\n",
+ "\n",
+ "Here the after is dependent of the before.\n",
+ "We therefore have to look at the differences\n",
+ "\n",
+ "We let $H_0$ be that the pulse is lowered, thus the difference mean $\\mu_d < 0$.\n",
+ "\n",
+ "We assume $\\alpha = 0.05$"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "TS: -1.2629741003498156, p-value: 0.12351970736529827\n",
+ "H_0 will be accepted with all alpha=0.12351970736529827\n"
+ ]
+ }
+ ],
+ "source": [
+ "before = np.array([74, 86, 98, 102, 78, 84, 79, 70])\n",
+ "after = np.array([70, 85, 90, 110, 71, 80, 69, 74])\n",
+ "diff = after - before\n",
+ "n = len(diff)\n",
+ "\n",
+ "\n",
+ "mu_s = np.mean(diff)\n",
+ "var_s = np.sqrt(np.sum((diff - mu_s)**2 / (n - 1)))\n",
+ "\n",
+ "TS = np.sqrt(n) * (mu_s - 0) / var_s\n",
+ "p_value = 1 - stats.t.cdf(np.abs(TS), n-1)\n",
+ "print(f\"TS: {TS}, p-value: {p_value}\")\n",
+ "print(f\"H_0 will be accepted with all alpha={p_value}\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Problem 3\n",
+ "\n",
+ "> According to the U.S. Bureau of the Census, 25.5 percent of the population of\n",
+ "those age 18 or over smoked in 1990. A scientist has recently claimed that this\n",
+ "percentage has since increased, and to prove her claim she randomly sampled 500\n",
+ "individuals from this population. If 138 of them were smokers, is her claim proved?\n",
+ "Use the 5 percent level of significance.\n",
+ "\n",
+ "The $H_0$ is that the new percentage is lower of equal than 25.5.\n",
+ "Because each person is a coin flip, this is a Bernoulli distribution.\n",
+ "\n",
+ "$H_0$ is therefore $p \\leq p_0$ where $p$ is the Bernoulli probability and $p_0 = 0.255$.\n",
+ "\n",
+ "We will let $X$ be the number of smokers in a population, so we will reject $H_0$ if $X$ is large enough.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "p_value for accepting h_0: 0.12996099025442587\n",
+ "We accept H_0, thus the claim has not been proven\n"
+ ]
+ }
+ ],
+ "source": [
+ "p = 0.255\n",
+ "n = 500\n",
+ "smokers = 138\n",
+ "\n",
+ "p_value = 1 - stats.binom.cdf(138, 500, 0.255)\n",
+ "print(f\"p_value for accepting h_0: {p_value}\")\n",
+ "if p_value > alpha:\n",
+ " print(f\"We accept H_0, thus the claim has not been proven\")\n",
+ "else:\n",
+ " print(f\"We do not accept H_0, thus the claim is proven\")"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.2"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}