From c41a7739a79ab4dd6728d26359f8c118af8e2555 Mon Sep 17 00:00:00 2001 From: Frank Sauerburger <frank@sauerburger.com> Date: Mon, 7 Aug 2017 17:52:29 +0200 Subject: [PATCH] Add preliminary version of real-life section Add a preliminary version, i.e. without proof reading, of the section describing how to read, plot and fit data. --- README.md | 215 +++++++++++++++++++++++++++++++++++++++++++++++++++++- decay.txt | 128 ++++++++++++++++++++++++++++++++ 2 files changed, 342 insertions(+), 1 deletion(-) create mode 100644 decay.txt diff --git a/README.md b/README.md index c0660de..5ea9bfc 100644 --- a/README.md +++ b/README.md @@ -5,6 +5,15 @@ years code examples for [ROOT](https://root.cern.ch/) have been provided. Material on the usage of python was missing. The code examples shown in this repository follow the examples shown in the ROOT introductions. +# Table of Contents +1. [Installation](#installation) +2. [Prerequisites and Structure](#prerequisites-and-structure) +3. ['Hello World' Example](#hello-world-example) +4. [Numpy Arrays](#numpy-arrays) +5. [Plotting Functions](#numpy-arrays) +6. [Plotting Data Points](#plotting-data-points) +7. [Reading, Plotting and Fitting Experimental Data](#reading-plotting-and-fitting-experimental-data) + # Installation To get started with python for data analysis in the advanced laboratories you need the python interpreter. In this document we will use `python3`. The @@ -245,7 +254,7 @@ $ cp func_plot.py data_plot.py --> We generate the pseudo data points by adding random deviations to the expected -y-values. Lets assume we measured data points for all half-integer x-values. +y-values. Lets assume we measured data points for all quarter-integer x-values. <!-- append data_plot.py --> ```python x_data = np.array([-2.5, -2, -1.5, -1, -.5, 0, .5, 1, 1.5, 2, 2.5, 3]) @@ -299,3 +308,207 @@ measurement.png After running `data_plot.py` you should have a plot similar to this.  +# Reading, Plotting and Fitting Experimental data +We are given with experimental data from a radioactive decay in this example. +The experimental setup consisted of a radioactive probe, a detector and a +multi-channel-analyzer. The recorded data in `decay.txt` consist of two +tab-separated columns. The first column is called channel. Each channel +corresponds to a certain energy range. The multi-channel-analyzer maintains a +counter for each channel. For each recorded decay the internal counter which +corresponds to the measured energy is incremented. The seconds column stored +these counts. Open the file with out favorite text editor and have a look at the +data. + +As usual, create the file `decay.py` and the import statements, which we need +for this example. + +<!-- Add additional files for non-X11 environment in CI --> +<!-- write decay.py +```python +import matplotlib +matplotlib.use('Agg') +``` +--> +<!-- write decay.py --> +```python +import numpy as np +import scipy.optimize # provides fit routines +import scipy.stats # provides statistical tests and distributions +import matplotlib.pyplot as plt +``` + +To inspect the provided data, we can plot the raw data points first. Numpy +provides the function `loadtxt`, which read a whitespace-separated file into a +numpy array. The function returns a two dimensional array. The outer array has +one entry for each line, and the inner two entries in our case, one for the +channel and the other one for the event count. We can use `transpose()` to flip +the matrix, such that the outer array has two entries, one with an array of +channel values and the other one with event counts. Since our measured event +counts stem from radioactive decay, we know that the event counts follow a +Poisson distribution. Therefore, the uncertainty of the events counts is simply +the square root of the number of events. + +<!-- append decay.py --> +```python +channel, count = np.loadtxt("decay.txt").transpose() +s_count = np.sqrt(count) +plt.errorbar(channel, count, s_count, fmt='.k', capsize=0, label="Data") +plt.savefig("decay_raw.eps") +``` +<!-- append decay.py +``` +plt.savefig("decay_raw.png") +``` +--> +<!-- console +``` +$ python3 decay.py +``` +--> + +The plot of the raw data is shown below. The plot shows the channel on the +x-axis and the number of events per bin on the y-axis. From the experimental setup we expect a +linearly rising background plus a Gaussian peak. + + +Judging from the plot, it looks like the assumed model describes the data. We +would like this model to our data to determine the optimal value of the model +parameter and their uncertainties (and the covariance matrix). Lets give a more +formal version of the expected model +```math + n(c) = A exp\left(-\frac{(c-m)^2}{2 s^2}\right) + y_0 + bc, +``` +where $`n(c)`$ is the expected number of events in channel $`c`$; $`A, m`$ +and $`s`$ are the height, center and width of the Gaussian respectively. The +parameters $`y_0`$ and $`b`$ are the usual parameter of a linear curve which is +assumed to describe our background. The model can be implemented in python as a +function. The first parameter should be the x value, all following arguments are +free parameters of the model. The return value corresponds to the y value, in +our case the expected number of events. + +<!-- append decay.py --> +``` +def model(channel, m, s, A, y0, b): + return A * np.exp(-0.5 * (channel - m)**2 / s**2) + y0 + b * channel +``` + +Please note that we are going to make an approximation with this definiton. +Strictly speaking, comparing thhe return values of our model to the measured +count is not correct. The variable channel corresponds to the radiation energy +measured with the setup. Lets assume channel $`c_i`$ corresponds to energy +$`E_i`$. If we measure $`n_i`$ events in channel $`c_i`$, this means that we +have measured $`n_i`$ in the energy interval $`[\frac{1}{2}(E_{i-1} + E_i), +\frac{1}{2}(E_i} + E_{i+}]`$. The proper way is to integrate our continuous +function $`n(c)`$ in each bin $`[c_{i} - \frac{1}{2}, c_{i} } \frac{1}{2}]`$. The +prodcedure show here is a good approximation, if the function can be considered +is linear within each bin. However, the parameter $`A`$ and $`b`$ are not normalized to +the bin width in this case. + + +To fit this model to our experimental data, we can use the function `curve_fit` +provided by the scipy package. The function `curve_fit` performs a least square +fit and returns the optimal parameters and the covariance matrix. The fit might +not converge on its one. We can guide optimization procedure by providing +suitable start values of the free parameters. From the plot I read off a height +$`A=50`$, a center $`m=60`$ and a width $`s = 10`$ for the Gaussian part and +$`y_0 = 20`$, $`b = 1`$ for the linear part. These values don't have to be +accurate. They should be a rough estimation, this is usually enough to get a +stable fit result. + +<!-- append decay.py --> +``` +p0 = (60, 10, 50, 20, 1) +popt, pcov = scipy.optimize.curve_fit(model, channel, count, p0, np.sqrt(count)) +``` + +To visualize the fitted model, we need to evaluate our model with the optimized +parameters `popt`. We are now ready to add the fitted curve to the plot and save +it. + +<!-- append decay.py --> +``` +fit_count = model(channel, *popt) +plt.plot(channel, model(channel, *popt), label="Linear + Gauss") + +plt.xlabel("Channel") +plt.ylabel("Counts") +plt.ylim(0, 1.1 * max(count)) +plt.legend() +plt.savefig("decay.eps") +``` +<!-- append decay.py +``` +plt.savefig("decay.png") +``` +--> +<!-- console +``` +$ python3 decay.py +``` +--> +The result should look like this. + + +Usually we want to measure some quantity with the experimental setup. For this +we need the optimized parameters and the covariance matrix returned by the +fit. Lets assume we are interested in the best fit value of the parameters and +their uncertainties. We can add the following print outs, to display this kind +of information. + +<!-- append decay.py --> +```python +print("Optimal parameters:") +print(" m = %g +- %g" % (popt[0], np.sqrt(pcov[0][0]))) +print(" s = %g +- %g" % (popt[1], np.sqrt(pcov[1][1]))) +print(" A = %g +- %g" % (popt[2], np.sqrt(pcov[2][2]))) +print(" y0 = %g +- %g" % (popt[3], np.sqrt(pcov[3][3]))) +print(" b = %g +- %g" % (popt[4], np.sqrt(pcov[4][4]))) +print() +``` + +A $`\chi^2`$-test can also be performed, to assess the goodness of this fit. In +a counting experiment like this one, we can rely on scipy's `chisquare`, which +returns the $`\chi^2`$ and the $`p`$-value. The `chisquare` method assumes, that +the uncertainties are the square root of the expectation. If this is not the +case, we have to compute the $`\chi^2`$ manually. The following example shows +both examples. The print statements produce the same output. Please note, that +we have five degrees of freedom, since we have five free parameters in our +model. + +<!-- append decay.py --> +```python +print("chi^2 from scipy:") +chi2, p = scipy.stats.chisquare(count, fit_count, ddof=5) +print(" chi2 / ndf = %g / %d" % (chi2, len(count) - 6)) +print(" p-value = %g" % p) + +print() + +print("Manual chi^2 test:") +uncertainty = np.sqrt(fit_count) +chi2 = ((count - fit_count)**2 / uncertainty**2).sum() +p = scipy.stats.distributions.chi2.sf(chi2, len(count) - 6) +print(" chi2 / ndf = %g / %d" % (chi2, len(count) - 6)) +print(" p-value = %g" % p) + +``` + +If you run the `decay.py` you should see the following fit results. +<!-- console_output --> +```bash +$ python3 decay.py +Optimal parameters: + m = 61.756 +- 0.696195 + s = 8.59757 +- 0.708479 + A = 43.2461 +- 3.24047 + y0 = 20.6589 +- 1.0885 + b = 0.841156 +- 0.0193703 + +chi^2 from scipy: + chi2 / ndf = 120.577 / 122 + p-value = 0.519418 + +Manual chi^2 test: + chi2 / ndf = 120.577 / 122 + p-value = 0.519418 +``` diff --git a/decay.txt b/decay.txt new file mode 100644 index 0000000..b8892c8 --- /dev/null +++ b/decay.txt @@ -0,0 +1,128 @@ +0 21 +1 19 +2 18 +3 23 +4 37 +5 30 +6 23 +7 28 +8 32 +9 24 +10 32 +11 32 +12 39 +13 32 +14 31 +15 37 +16 32 +17 31 +18 36 +19 30 +20 35 +21 39 +22 39 +23 39 +24 30 +25 45 +26 50 +27 46 +28 49 +29 53 +30 41 +31 45 +32 57 +33 48 +34 39 +35 65 +36 61 +37 53 +38 53 +39 51 +40 48 +41 53 +42 54 +43 71 +44 64 +45 69 +46 59 +47 81 +48 73 +49 93 +50 68 +51 80 +52 98 +53 88 +54 102 +55 99 +56 106 +57 113 +58 95 +59 107 +60 107 +61 117 +62 106 +63 107 +64 115 +65 125 +66 127 +67 136 +68 120 +69 110 +70 99 +71 106 +72 103 +73 96 +74 92 +75 107 +76 93 +77 86 +78 72 +79 108 +80 92 +81 94 +82 83 +83 101 +84 110 +85 105 +86 110 +87 94 +88 87 +89 106 +90 74 +91 92 +92 98 +93 98 +94 102 +95 94 +96 114 +97 91 +98 117 +99 107 +100 100 +101 97 +102 106 +103 91 +104 94 +105 96 +106 98 +107 128 +108 120 +109 105 +110 104 +111 129 +112 126 +113 120 +114 115 +115 122 +116 127 +117 131 +118 142 +119 128 +120 117 +121 139 +122 128 +123 122 +124 123 +125 129 +126 137 +127 110 -- GitLab