Skip to content
Snippets Groups Projects
Commit c41a7739 authored by Frank Sauerburger's avatar Frank Sauerburger
Browse files

Add preliminary version of real-life section

Add a preliminary version, i.e. without proof reading, of the section describing
how to read, plot and fit data.
parent 5cd36dc5
No related branches found
No related tags found
No related merge requests found
Pipeline #
......@@ -5,6 +5,15 @@ years code examples for [ROOT](https://root.cern.ch/) have been provided.
Material on the usage of python was missing. The code examples shown in this
repository follow the examples shown in the ROOT introductions.
# Table of Contents
1. [Installation](#installation)
2. [Prerequisites and Structure](#prerequisites-and-structure)
3. ['Hello World' Example](#hello-world-example)
4. [Numpy Arrays](#numpy-arrays)
5. [Plotting Functions](#numpy-arrays)
6. [Plotting Data Points](#plotting-data-points)
7. [Reading, Plotting and Fitting Experimental Data](#reading-plotting-and-fitting-experimental-data)
# Installation
To get started with python for data analysis in the advanced laboratories you
need the python interpreter. In this document we will use `python3`. The
......@@ -245,7 +254,7 @@ $ cp func_plot.py data_plot.py
-->
We generate the pseudo data points by adding random deviations to the expected
y-values. Lets assume we measured data points for all half-integer x-values.
y-values. Lets assume we measured data points for all quarter-integer x-values.
<!-- append data_plot.py -->
```python
x_data = np.array([-2.5, -2, -1.5, -1, -.5, 0, .5, 1, 1.5, 2, 2.5, 3])
......@@ -299,3 +308,207 @@ measurement.png
After running `data_plot.py` you should have a plot similar to this.
![Plot of cropped parabola with data points](https://srv.sauerburger.com/esel/FP-python-examples/-/jobs/artifacts/master/raw/measurement.png?job=doxec_test)
# Reading, Plotting and Fitting Experimental data
We are given with experimental data from a radioactive decay in this example.
The experimental setup consisted of a radioactive probe, a detector and a
multi-channel-analyzer. The recorded data in `decay.txt` consist of two
tab-separated columns. The first column is called channel. Each channel
corresponds to a certain energy range. The multi-channel-analyzer maintains a
counter for each channel. For each recorded decay the internal counter which
corresponds to the measured energy is incremented. The seconds column stored
these counts. Open the file with out favorite text editor and have a look at the
data.
As usual, create the file `decay.py` and the import statements, which we need
for this example.
<!-- Add additional files for non-X11 environment in CI -->
<!-- write decay.py
```python
import matplotlib
matplotlib.use('Agg')
```
-->
<!-- write decay.py -->
```python
import numpy as np
import scipy.optimize # provides fit routines
import scipy.stats # provides statistical tests and distributions
import matplotlib.pyplot as plt
```
To inspect the provided data, we can plot the raw data points first. Numpy
provides the function `loadtxt`, which read a whitespace-separated file into a
numpy array. The function returns a two dimensional array. The outer array has
one entry for each line, and the inner two entries in our case, one for the
channel and the other one for the event count. We can use `transpose()` to flip
the matrix, such that the outer array has two entries, one with an array of
channel values and the other one with event counts. Since our measured event
counts stem from radioactive decay, we know that the event counts follow a
Poisson distribution. Therefore, the uncertainty of the events counts is simply
the square root of the number of events.
<!-- append decay.py -->
```python
channel, count = np.loadtxt("decay.txt").transpose()
s_count = np.sqrt(count)
plt.errorbar(channel, count, s_count, fmt='.k', capsize=0, label="Data")
plt.savefig("decay_raw.eps")
```
<!-- append decay.py
```
plt.savefig("decay_raw.png")
```
-->
<!-- console
```
$ python3 decay.py
```
-->
The plot of the raw data is shown below. The plot shows the channel on the
x-axis and the number of events per bin on the y-axis. From the experimental setup we expect a
linearly rising background plus a Gaussian peak.
![Data only for the decay](https://srv.sauerburger.com/esel/FP-python-examples/-/jobs/artifacts/master/raw/decay_raw.png?job=doxec_test)
Judging from the plot, it looks like the assumed model describes the data. We
would like this model to our data to determine the optimal value of the model
parameter and their uncertainties (and the covariance matrix). Lets give a more
formal version of the expected model
```math
n(c) = A exp\left(-\frac{(c-m)^2}{2 s^2}\right) + y_0 + bc,
```
where $`n(c)`$ is the expected number of events in channel $`c`$; $`A, m`$
and $`s`$ are the height, center and width of the Gaussian respectively. The
parameters $`y_0`$ and $`b`$ are the usual parameter of a linear curve which is
assumed to describe our background. The model can be implemented in python as a
function. The first parameter should be the x value, all following arguments are
free parameters of the model. The return value corresponds to the y value, in
our case the expected number of events.
<!-- append decay.py -->
```
def model(channel, m, s, A, y0, b):
return A * np.exp(-0.5 * (channel - m)**2 / s**2) + y0 + b * channel
```
Please note that we are going to make an approximation with this definiton.
Strictly speaking, comparing thhe return values of our model to the measured
count is not correct. The variable channel corresponds to the radiation energy
measured with the setup. Lets assume channel $`c_i`$ corresponds to energy
$`E_i`$. If we measure $`n_i`$ events in channel $`c_i`$, this means that we
have measured $`n_i`$ in the energy interval $`[\frac{1}{2}(E_{i-1} + E_i),
\frac{1}{2}(E_i} + E_{i+}]`$. The proper way is to integrate our continuous
function $`n(c)`$ in each bin $`[c_{i} - \frac{1}{2}, c_{i} } \frac{1}{2}]`$. The
prodcedure show here is a good approximation, if the function can be considered
is linear within each bin. However, the parameter $`A`$ and $`b`$ are not normalized to
the bin width in this case.
To fit this model to our experimental data, we can use the function `curve_fit`
provided by the scipy package. The function `curve_fit` performs a least square
fit and returns the optimal parameters and the covariance matrix. The fit might
not converge on its one. We can guide optimization procedure by providing
suitable start values of the free parameters. From the plot I read off a height
$`A=50`$, a center $`m=60`$ and a width $`s = 10`$ for the Gaussian part and
$`y_0 = 20`$, $`b = 1`$ for the linear part. These values don't have to be
accurate. They should be a rough estimation, this is usually enough to get a
stable fit result.
<!-- append decay.py -->
```
p0 = (60, 10, 50, 20, 1)
popt, pcov = scipy.optimize.curve_fit(model, channel, count, p0, np.sqrt(count))
```
To visualize the fitted model, we need to evaluate our model with the optimized
parameters `popt`. We are now ready to add the fitted curve to the plot and save
it.
<!-- append decay.py -->
```
fit_count = model(channel, *popt)
plt.plot(channel, model(channel, *popt), label="Linear + Gauss")
plt.xlabel("Channel")
plt.ylabel("Counts")
plt.ylim(0, 1.1 * max(count))
plt.legend()
plt.savefig("decay.eps")
```
<!-- append decay.py
```
plt.savefig("decay.png")
```
-->
<!-- console
```
$ python3 decay.py
```
-->
The result should look like this.
![Data and fit for the decay](https://srv.sauerburger.com/esel/FP-python-examples/-/jobs/artifacts/master/raw/decay.png?job=doxec_test)
Usually we want to measure some quantity with the experimental setup. For this
we need the optimized parameters and the covariance matrix returned by the
fit. Lets assume we are interested in the best fit value of the parameters and
their uncertainties. We can add the following print outs, to display this kind
of information.
<!-- append decay.py -->
```python
print("Optimal parameters:")
print(" m = %g +- %g" % (popt[0], np.sqrt(pcov[0][0])))
print(" s = %g +- %g" % (popt[1], np.sqrt(pcov[1][1])))
print(" A = %g +- %g" % (popt[2], np.sqrt(pcov[2][2])))
print(" y0 = %g +- %g" % (popt[3], np.sqrt(pcov[3][3])))
print(" b = %g +- %g" % (popt[4], np.sqrt(pcov[4][4])))
print()
```
A $`\chi^2`$-test can also be performed, to assess the goodness of this fit. In
a counting experiment like this one, we can rely on scipy's `chisquare`, which
returns the $`\chi^2`$ and the $`p`$-value. The `chisquare` method assumes, that
the uncertainties are the square root of the expectation. If this is not the
case, we have to compute the $`\chi^2`$ manually. The following example shows
both examples. The print statements produce the same output. Please note, that
we have five degrees of freedom, since we have five free parameters in our
model.
<!-- append decay.py -->
```python
print("chi^2 from scipy:")
chi2, p = scipy.stats.chisquare(count, fit_count, ddof=5)
print(" chi2 / ndf = %g / %d" % (chi2, len(count) - 6))
print(" p-value = %g" % p)
print()
print("Manual chi^2 test:")
uncertainty = np.sqrt(fit_count)
chi2 = ((count - fit_count)**2 / uncertainty**2).sum()
p = scipy.stats.distributions.chi2.sf(chi2, len(count) - 6)
print(" chi2 / ndf = %g / %d" % (chi2, len(count) - 6))
print(" p-value = %g" % p)
```
If you run the `decay.py` you should see the following fit results.
<!-- console_output -->
```bash
$ python3 decay.py
Optimal parameters:
m = 61.756 +- 0.696195
s = 8.59757 +- 0.708479
A = 43.2461 +- 3.24047
y0 = 20.6589 +- 1.0885
b = 0.841156 +- 0.0193703
chi^2 from scipy:
chi2 / ndf = 120.577 / 122
p-value = 0.519418
Manual chi^2 test:
chi2 / ndf = 120.577 / 122
p-value = 0.519418
```
decay.txt 0 → 100644
0 21
1 19
2 18
3 23
4 37
5 30
6 23
7 28
8 32
9 24
10 32
11 32
12 39
13 32
14 31
15 37
16 32
17 31
18 36
19 30
20 35
21 39
22 39
23 39
24 30
25 45
26 50
27 46
28 49
29 53
30 41
31 45
32 57
33 48
34 39
35 65
36 61
37 53
38 53
39 51
40 48
41 53
42 54
43 71
44 64
45 69
46 59
47 81
48 73
49 93
50 68
51 80
52 98
53 88
54 102
55 99
56 106
57 113
58 95
59 107
60 107
61 117
62 106
63 107
64 115
65 125
66 127
67 136
68 120
69 110
70 99
71 106
72 103
73 96
74 92
75 107
76 93
77 86
78 72
79 108
80 92
81 94
82 83
83 101
84 110
85 105
86 110
87 94
88 87
89 106
90 74
91 92
92 98
93 98
94 102
95 94
96 114
97 91
98 117
99 107
100 100
101 97
102 106
103 91
104 94
105 96
106 98
107 128
108 120
109 105
110 104
111 129
112 126
113 120
114 115
115 122
116 127
117 131
118 142
119 128
120 117
121 139
122 128
123 122
124 123
125 129
126 137
127 110
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment