Diffstat (limited to 'sem6/prob/stat5/Opgaver.ipynb')
-rw-r--r-- | sem6/prob/stat5/Opgaver.ipynb | 204 |
1 files changed, 204 insertions, 0 deletions
diff --git a/sem6/prob/stat5/Opgaver.ipynb b/sem6/prob/stat5/Opgaver.ipynb
new file mode 100644
index 0000000..4942f92
--- /dev/null
+++ b/sem6/prob/stat5/Opgaver.ipynb
@@ -0,0 +1,204 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "from scipy import stats"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Problem 1\n",
+    "\n",
+    "> It is claimed that a certain type of bipolar transistor has a mean value of current\n",
+    "> gain that is at least 210. A sample of these transistors is tested. If the sample mean\n",
+    "> value of current gain is 200 with a sample standard deviation of 35, would the\n",
+    "> claim be rejected at the 5 percent level of significance if\n",
+    "> \n",
+    "> - (a) the sample size is 25;\n",
+    "> - (b) the sample size is 64?\n",
+    "\n",
+    "First we define our $H_0$, which is the claim itself, $\\mu \\geq 210$.\n",
+    "\n",
+    "We don't know the variance of the distribution, only the sample standard deviation $S$.\n",
+    "Then we can use the one-sided *t-test*.\n",
+    "\n",
+    "$$H_0 : \\mu \\geq 210$$\n",
+    "$$H_1 : \\mu < 210$$\n",
+    "\n",
+    "First the test statistic TS is calculated with $\\mu_0 = 210$\n",
+    "$$\n",
+    "TS = \\sqrt{n} (\\bar{X} - \\mu_0) / S\n",
+    "$$\n",
+    "\n",
+    "Then the upper-tail probability is calculated, with $F_{n-1}$ the CDF of the $t$-distribution with $n-1$ degrees of freedom\n",
+    "$$\n",
+    "p = P\\{T_{n-1} \\geq TS\\} = 1 - F_{n-1}(TS)\n",
+    "$$\n",
+    "\n",
+    "The p-value of the test is $1 - p$. $H_0$ is rejected at the 5 percent level when $1 - p < 0.05$, so the code checks whether $p$ is smaller than $0.95$ and returns `True` when the claim is not rejected.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "TS: -1.4285714285714286, p_value: 0.9169932070815955\n",
+      "Assignment A: True\n",
+      "TS: -2.2857142857142856, p_value: 0.987176989403574\n",
+      "Assignment B: False\n"
+     ]
+    }
+   ],
+   "source": [
+    "mu_min = 210\n",
+    "mu_sample = 200\n",
+    "sigma_sample = 35\n",
+    "alpha = 0.05\n",
+    "accept = 1 - alpha\n",
+    "\n",
+    "def test_with_n(n):\n",
+    "    TS = np.sqrt(n) * (mu_sample - mu_min) / sigma_sample\n",
+    "    p_value = 1 - stats.t.cdf(TS, n-1)  # upper tail P(T >= TS)\n",
+    "\n",
+    "    print(f\"TS: {TS}, p_value: {p_value}\")\n",
+    "    return p_value < accept  # True: the claim is not rejected\n",
+    "\n",
+    "# Parts (a) and (b)\n",
+    "print(f\"Assignment A: {test_with_n(25)}\")\n",
+    "print(f\"Assignment B: {test_with_n(64)}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Problem 2\n",
+    "\n",
+    "> A question of medical importance is whether jogging leads to a reduction in\n",
+    "one’s pulse rate. To test this hypothesis, 8 nonjogging volunteers agreed to begin\n",
+    "a 1-month jogging program. After the month their pulse rates were determined\n",
+    "and compared with their earlier values. If the data are as follows, can we conclude\n",
+    "that jogging has had an effect on the pulse rates?\n",
+    "\n",
+    "I won't put the table from the book in :-(.\n",
+    "\n",
+    "Here the after measurement depends on the before measurement.\n",
+    "We therefore have to look at the paired differences.\n",
+    "\n",
+    "We let $H_0$ be that jogging does not lower the pulse, $\\mu_d \\geq 0$, and $H_1$ be that the pulse is lowered, thus the difference mean $\\mu_d < 0$.\n",
+    "\n",
+    "We assume $\\alpha = 0.05$. The code computes a one-tailed p-value; a two-sided test would double it to about 0.247, which gives the same conclusion."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "TS: -1.2629741003498156, p-value: 0.12351970736529827\n",
+      "H_0 is not rejected for any alpha below 0.12351970736529827\n"
+     ]
+    }
+   ],
+   "source": [
+    "before = np.array([74, 86, 98, 102, 78, 84, 79, 70])\n",
+    "after = np.array([70, 85, 90, 110, 71, 80, 69, 74])\n",
+    "diff = after - before\n",
+    "n = len(diff)\n",
+    "\n",
+    "\n",
+    "mu_s = np.mean(diff)\n",
+    "std_s = np.sqrt(np.sum((diff - mu_s)**2 / (n - 1)))  # sample standard deviation\n",
+    "\n",
+    "TS = np.sqrt(n) * (mu_s - 0) / std_s\n",
+    "p_value = 1 - stats.t.cdf(np.abs(TS), n-1)  # one-tailed p-value\n",
+    "print(f\"TS: {TS}, p-value: {p_value}\")\n",
+    "print(f\"H_0 is not rejected for any alpha below {p_value}\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Problem 3\n",
+    "\n",
+    "> According to the U.S. Bureau of the Census, 25.5 percent of the population of\n",
+    "those age 18 or over smoked in 1990. A scientist has recently claimed that this\n",
+    "percentage has since increased, and to prove her claim she randomly sampled 500\n",
+    "individuals from this population. If 138 of them were smokers, is her claim proved?\n",
+    "Use the 5 percent level of significance.\n",
+    "\n",
+    "The $H_0$ is that the new percentage is lower than or equal to 25.5.\n",
+    "Because each person is an independent Bernoulli trial, the number of smokers is binomially distributed.\n",
+    "\n",
+    "$H_0$ is therefore $p \\leq p_0$ where $p$ is the Bernoulli probability and $p_0 = 0.255$.\n",
+    "\n",
+    "We will let $X$ be the number of smokers in the sample, so we will reject $H_0$ if $X$ is large enough.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 46,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "p_value for accepting h_0: 0.12996099025442587\n",
+      "We accept H_0, thus the claim has not been proven\n"
+     ]
+    }
+   ],
+   "source": [
+    "p = 0.255\n",
+    "n = 500\n",
+    "smokers = 138\n",
+    "\n",
+    "p_value = 1 - stats.binom.cdf(smokers, n, p)  # P(X > 138); the exact tail P(X >= 138) would use smokers - 1\n",
+    "print(f\"p_value for accepting h_0: {p_value}\")\n",
+    "if p_value > alpha:\n",
+    "    print(\"We accept H_0, thus the claim has not been proven\")\n",
+    "else:\n",
+    "    print(\"We do not accept H_0, thus the claim is proven\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
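
The three tests in this notebook can also be written so that each p-value is compared directly against the significance level. The following is only a sketch of that formulation and not part of the commit: the names `ALPHA` and `lower_tail_t_pvalue` are hypothetical, and it assumes the hypotheses are $H_0: \mu \geq 210$ for Problem 1, $H_0: \mu_d \geq 0$ for Problem 2, and $H_0: p \leq 0.255$ for Problem 3.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # significance level used in all three problems


def lower_tail_t_pvalue(sample_mean, mu_0, sample_std, n):
    """p-value of a one-sample t-test of H0: mu >= mu_0 against H1: mu < mu_0."""
    ts = np.sqrt(n) * (sample_mean - mu_0) / sample_std
    # Small values of TS speak against H0, so the p-value is the lower tail.
    return float(stats.t.cdf(ts, n - 1))


# Problem 1: the claim mu >= 210 is rejected when the p-value is below ALPHA.
p_a = lower_tail_t_pvalue(200, 210, 35, 25)  # about 0.083
p_b = lower_tail_t_pvalue(200, 210, 35, 64)  # about 0.0128
reject_a = p_a < ALPHA
reject_b = p_b < ALPHA

# Problem 2: paired t-test on the differences, H1: mean difference < 0.
before = np.array([74, 86, 98, 102, 78, 84, 79, 70])
after = np.array([70, 85, 90, 110, 71, 80, 69, 74])
diff = after - before
n_pairs = len(diff)
ts_2 = np.sqrt(n_pairs) * diff.mean() / diff.std(ddof=1)
p_2 = float(stats.t.cdf(ts_2, n_pairs - 1))
reject_2 = p_2 < ALPHA

# Problem 3: exact binomial tail P(X >= 138) under p_0 = 0.255.
# sf(k) is P(X > k), so k = 137 includes the observed count itself.
p_3 = float(stats.binom.sf(137, 500, 0.255))
reject_3 = p_3 < ALPHA

print(f"Problem 1 (a): p={p_a:.4f}, reject claim: {reject_a}")
print(f"Problem 1 (b): p={p_b:.4f}, reject claim: {reject_b}")
print(f"Problem 2: TS={ts_2:.4f}, p={p_2:.4f}, reject H_0: {reject_2}")
print(f"Problem 3: p={p_3:.4f}, reject H_0: {reject_3}")
```

Written this way, the Problem 1 conclusions agree with the notebook output (the claim stands for n = 25 and is rejected for n = 64). The Problem 3 tail includes the observed count of 138, so its p-value is slightly larger than the recorded 0.130 and the conclusion is unchanged.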