diff --git a/README.md b/README.md
index 86ebec3e0d5ad6f1b4a680372f15ba3ac0b10b73..43ccde130c9ebd9c9a96710ceaac9502e542e00a 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,11 @@
 This repository consists of a collection of python examples intended as an
-introduction on the use of python in data analysis, especially for the
+introduction to the use of python in data analysis, especially for the
 advanced laboratories in physics at the University of Freiburg.
 In previous years, code examples for [ROOT](https://root.cern.ch/) have been
 provided. Material showing how to use python for the same task was missing.
 This document tries to fill the gap.
 
-If you think this tutorial is useful, lacks important information, or is
+If you think this tutorial is useful, lacks essential information, or is
 unclear, don't hesitate to give [feedback](mailto:frank@sauerburger.com).
 
 # Table of Contents
@@ -36,7 +36,7 @@ $ pip3 install --user numpy scipy matplotlib
 ```
 
 The `--user` argument for `pip` installs the python package in your home
-directory, which hides potentially older packages installed with `apt-get`.
+directory, which hides the potentially older packages installed with `apt-get`.
 
 ## Windows
 Since I'm not using python on Windows myself, I don't have first-hand experience
@@ -46,7 +46,7 @@ good solution for windows users since it provides all required packages.
 
 # Prerequisites and About the Tutorial
 This tutorial assumes that you have some experience with python, which
-includes variable assignment, function calling and function definition, `if` statements and
+includes variable assignment, function calls and definitions, `if` statements, and
 `for` loops. The tutorial uses only the very basics, such as variable
 assignments and function calls, but it is certainly advisable to know about
 control structures.
@@ -113,7 +113,7 @@ The standard data structure to store numerical data is a numpy array.
 Numpy arrays are defined in the numpy package and are implemented in a very
 efficient way.
 
-To get stared with numpy arrays, create a file `np_arrays.py` and add all lines
+To get started with numpy arrays, create a file `np_arrays.py` and add all lines
 listed in this section. The first line should be an import statement.
 <!-- write np_arrays.py -->
 ```python
@@ -167,7 +167,7 @@ The result is
 [-1. 0.5 2. 5. 6.5]
 ```
 Numpy offers many other functionalities which are beyond the scope of this basic
-introduction. It is definitely worth glancing at the
+introduction. It is worth glancing at the
 [documentation](https://docs.scipy.org/doc/numpy/index.html).
 
 # Plotting Functions
@@ -201,7 +201,7 @@ matplotlib.use('Agg')
 # Import the numpy library.
 import numpy as np
 
-# Import the powerfull plotting library.
+# Import the powerful plotting library.
 import matplotlib.pyplot as plt
 ```
 
@@ -215,7 +215,7 @@ x = np.linspace(-2.5, 3, 200)
 ```
 
 We can easily calculate the square of all these values with `x**2`. Cropping the
-right part is a bit more tricky. First we create an index array of `1`'s and
+right part is a bit more tricky. First, we create an index array of `1`'s and
 `0`'s, which indicate whether $`x \geq 2`$. This index array has the same
 length as our $`x`$-grid. The first elements of the index array are `0`'s,
 since the
@@ -225,7 +225,7 @@ The index array can be used to select a subset of $`y`$-values, namely all
 $`y`$-values, for which $`x\geq 2`$. Finally, we can assign the value $`4`$ to
 this subset, and therefore effectively crop the parabola. The implementation in
 python of the algorithm outlined
-above is rather short.
+above is rather short.
 <!-- append func_plot.py -->
 ```python
 # Calculate the regular parabola.
@@ -237,7 +237,7 @@ idx = (x >= 2)
 # Set all y-values to 4, for which x >= 2.
 y[idx] = 4
 
-# One can get rid of the intermetdiate index array and combine both lines into
+# One can get rid of the intermediate index array and combine both lines into
 # the statement y[x >= 2] = 4
 ```
 
@@ -253,7 +253,7 @@ plt.plot(x, y)
 plt.xlabel("$x$")
 plt.ylabel("Cropped Parabola")
 
-# Save the figure. Various different output formats are available.
+# Save the figure. Different output formats are available.
 plt.savefig("cropped_parabola.eps")
 ```
 <!-- append func_plot.py
@@ -288,7 +288,7 @@ follow $`f(x)`$.
 This example is based on the code from the previous example. Copy the file
 from the previous example to `data_plot.py`, such that we can append the
 following code snippets to `data_plot.py`. Keep
-the plotting code from the previous example as it is.
+the plotting code from the last example as it is.
 <!-- console
 ```bash
 $ cp func_plot.py data_plot.py
@@ -367,7 +367,7 @@ After running `data_plot.py`, you should have a plot similar to this.
 
 
 # Reading, Plotting and Fitting Experimental Data
-We are given with experimental data from a radioactive decay in this example.
+We are given experimental data from radioactive decay in this example.
 The experimental setup consisted of a radioactive probe, a detector, and a
 multi-channel-analyzer. The recorded data in
 [`decay.txt`](https://gitlab.sauerburger.com/frank/FP-python-examples/blob/master/decay.txt)
@@ -471,13 +471,13 @@ def model(channel, m, s, A, y0, b):
 Please note that we are making an approximation with this definition.
 Strictly speaking, comparing the return values of our model to the measured
 count is not correct. The variable _channel_ corresponds to the energy
-measured with the setup. Lets assume channel $`c_i`$ corresponds to energy
+measured with the setup. Let's assume channel $`c_i`$ corresponds to the energy
 $`E_i`$. If we measure $`n_i`$ events in channel $`c_i`$, this means that we
 have measured $`n_i`$ in the energy interval
 $`[\frac{1}{2}(E_{i-1} + E_i), \frac{1}{2}(E_i + E_{i+1})]`$.
 The proper way is to integrate our continuous function $`n(c)`$ in each bin
 $`[c_{i} - \frac{1}{2}, c_{i} + \frac{1}{2}]`$ and compare these bin-wise integrals to the measured data. The
-procedure shown here is a good approximation, if the function can be considered
+procedure shown here is a good approximation if the function can be considered
 to be linear within each bin. However, the parameter $`A`$ and $`b`$ are not
 normalized to the bin width in this case.
 
@@ -486,10 +486,10 @@ To fit this model to our experimental data, we can use the function `curve_fit`
 provided by the scipy package. The function `curve_fit` performs a least square
 fit and returns the optimal parameters and the covariance matrix. The fit might
 not converge on its own. We can guide the optimization procedure by providing
-suitable start values of the free parameters. From the plot I read off a height
+suitable start values of the free parameters. From the plot, I read off a height
 $`A=50`$, a center $`m=60`$ and a width $`s = 10`$ for the Gaussian part and
 $`y_0 = 20`$ and $`b = 1`$ for the linear part. These values don't have to be
-accurate. They should be a rough estimation, this is usually enough to get a
+accurate. A rough estimate is usually enough to get a
 stable fit result. More information on the fitting method can be found in the
 [documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html).
 
@@ -600,9 +600,9 @@ print(" p-value = %g" % p)
 ```
 
 The additional `- 1` in the call of `chisquare` is necessary because the method
-implicitly assumes that the total number of events is fixed. The method
-therefore reduces the number of degrees of freedom by one. However, in our case,
-the number of decays is not fixed and we have an additional degree
+implicitly assumes that the total number of events is fixed. The method,
+therefore, reduces the number of degrees of freedom by one. However, in our case,
+the number of decays is not fixed, and we have an additional degree
 of freedom compared to what `chisquare` assumes. The `- 1` in the `ddof`
 corrects for this.
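The `ddof` bookkeeping described in the last `chisquare` paragraph of this diff can be checked with a few lines of plain Python. This is a minimal sketch, not the tutorial's code. It mirrors the rule in the scipy documentation that `scipy.stats.chisquare` computes its p-value with `k - 1 - ddof` degrees of freedom for `k` bins. It also assumes that the tutorial passes `ddof` equal to the number of fit parameters minus one. The helper names `chisquare_dof` and `pearson_chi2`, the bin count, and the toy counts are hypothetical and chosen only for illustration.

```python
# Illustrative sketch (hypothetical helpers): degrees-of-freedom bookkeeping
# behind the `ddof` argument of scipy.stats.chisquare, which uses
# k - 1 - ddof degrees of freedom for k bins. No scipy call is made here.


def chisquare_dof(n_bins, ddof=0):
    """Degrees of freedom scipy.stats.chisquare uses for the given ddof."""
    return n_bins - 1 - ddof


def pearson_chi2(observed, expected):
    """Pearson statistic: sum over bins of (n_i - mu_i)**2 / mu_i."""
    return sum((n - mu) ** 2 / mu for n, mu in zip(observed, expected))


# model(channel, m, s, A, y0, b) has five free parameters.
n_params = 5
# Hypothetical number of channels, for illustration only.
n_bins = 100

# Default ddof=0: chisquare only subtracts one dof for the fixed total.
print(chisquare_dof(n_bins))  # 99
# Assumed tutorial choice ddof = n_params - 1: one dof per fitted parameter,
# minus the one chisquare already removes for the (not actually) fixed total.
print(chisquare_dof(n_bins, ddof=n_params - 1))  # 95, i.e. n_bins - n_params

# Toy counts and model predictions, made up for illustration.
print(pearson_chi2([10, 12, 9], [11.0, 11.0, 11.0]))
```

With five fitted parameters and a total that is free to float, `ddof = n_params - 1` leaves `n_bins - n_params` degrees of freedom, which is the count the paragraph argues for.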