{% extends 'uhepp_vault/base.html' %} {% load pygmentify_tags %} {% block content %}
This getting started guide will walk you through creating and displaying your first histogram with the uhepp Python package.
The guide expects that you have some basic experience with Python. You need to be familiar with functions and objects in Python, but you don't need to be a library developer. The uhepp ecosystem does not rely on ROOT, so it is no disadvantage if you are not familiar with ROOT. You should have a working installation of Python 3 on your machine. The guide will install additional Python packages, so it expects that you can install packages, either in your home directory, system-wide or in a virtual environment.
This guide will not show you how to process HEP data. It assumes that you have an analysis framework in place that iterates over individual events and whose output is a collection of histograms. These might be ROOT TH1 histograms or arrays produced with NumPy, pandas, Dask, PySpark or Coffea. For this guide, we will read sample histograms, including their statistical uncertainties, from a JSON file.
In principle, you only need a text editor to write and edit JSON or YAML files and a simple tool like curl to interact with the API. However, this would be extremely cumbersome. It is much more convenient to use the uhepp Python package for both tasks. The package provides an interface to create and manipulate plots in uhepp format stored locally or remotely via the API.
For this guide we will need the uhepp package to work with the toy data. Make sure the following package is installed. If you use pip, simply run:
{% pygmentify %}pip install uhepp{% endpygmentify %}
To test that it is installed, open an interactive Python 3 shell and type
{% pygmentify %}import uhepp{% endpygmentify %}
Keep the shell open; we will use it throughout this guide.
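If you would rather check for the package without triggering an ImportError, the standard library can look it up for you. The following is a minimal sketch; the helper name is_installed is made up for this illustration and is not part of uhepp.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the import system can find the top-level package."""
    return importlib.util.find_spec(package_name) is not None


# "json" ships with Python, so this lookup always succeeds.
print(is_installed("json"))
# For uhepp, the result depends on whether the installation succeeded.
print(is_installed("uhepp"))
```

If the second line prints False, the package was most likely installed for a different Python interpreter or virtual environment than the one running the shell.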
The first step is to load the sample histograms from a prepared JSON file. You can download the file using your browser, or if you prefer your terminal, execute
{% pygmentify %}curl -o toyhisto.json https://xxx/{% endpygmentify %}
To load the sample data, we can use Python's built-in json module. If you have downloaded the file to a different location, you need to adjust the path in the following code snippet.
{% pygmentify %}import json

with open("toyhisto.json") as toyhisto_file:
    toydata = json.load(toyhisto_file){% endpygmentify %}
The toydata variable is a dictionary that contains binned yields for three processes: simulated background (bkg), simulated signal (sig) and measured data (data). (In this example, even the measured data is taken from a random generator.) Besides the yields, the dictionary also contains the statistical uncertainty for each bin and process, stored under keys such as sig_stat. Additionally, the dictionary contains the key bin_edges, which stores the boundaries of the bins. Feel free to explore the data. For example, run
{% pygmentify %}toydata["bkg"]{% endpygmentify %}
or
{% pygmentify %}toydata["sig_stat"]{% endpygmentify %}
in your interactive Python shell.
You might have noticed that there are 41 bin edges, so you would expect 40 bins. However, the binned data lists have 42 entries. The two additional values correspond to the underflow and overflow bins. By convention, these two bins are always included in the raw data of a uhepp histogram. It is up to you to tell uhepp whether to include these events in the visual plot.
A uhepp histogram stores the raw data separately from the visual specification. Let's start by creating a UHeppHist object and adding the raw data to the histogram. When you create a histogram object, the first argument is the mathematical symbol of the quantity on the x-axis. In our case, the data represents a mass distribution, so we use the letter m. The second argument is the list of bin boundaries. We take them directly from the toydata dictionary.
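The bin counting can be checked on a small stand-in. The sketch below uses the same layout as the sample file, but with three made-up bins instead of 40. The dictionary toy_example and its numbers are invented for illustration, and the sketch assumes, as in ROOT, that the underflow bin comes first and the overflow bin last.

```python
# Hypothetical stand-in with the same layout as the sample data:
# 4 bin edges define 3 visible bins, plus one underflow and one overflow bin.
toy_example = {
    "bin_edges": [100, 110, 120, 130],
    "bkg": [0.5, 12.0, 9.0, 7.5, 1.0],
}

n_edges = len(toy_example["bin_edges"])
n_visible_bins = n_edges - 1
n_entries = len(toy_example["bkg"])

# Assumption: first entry is the underflow bin, last entry is the overflow bin.
underflow = toy_example["bkg"][0]
overflow = toy_example["bkg"][-1]
visible = toy_example["bkg"][1:-1]

print(n_edges, n_visible_bins, n_entries)
print(underflow, visible, overflow)
```

For the sample file, the same arithmetic gives 41 edges, 40 visible bins and 40 + 2 = 42 entries per list.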
{% pygmentify %}hist = uhepp.UHeppHist("m", toydata["bin_edges"]){% endpygmentify %}
The raw data is stored in the yields attribute, a dictionary that maps arbitrary internal names to the binned data. The binned data is stored as Yield objects, which couple the values with their uncertainties. A Yield object comes close to a ROOT TH1 object (with some key distinctions). Yield objects can be added, scaled, etc. while propagating uncertainties. Let's first create the Yield objects from the sample data.
{% pygmentify %}signal_yield = uhepp.Yield(toydata["sig"], toydata["sig_stat"])
background_yield = uhepp.Yield(toydata["bkg"], toydata["bkg_stat"])
data_yield = uhepp.Yield(toydata["data"], toydata["data_stat"]){% endpygmentify %}
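To make "propagating uncertainties" concrete, here is a plain-Python sketch of the usual rules for statistically independent bins: when two yields are added, their uncertainties combine in quadrature; when a yield is scaled by a constant, its uncertainties scale by the absolute value of that constant. This illustrates the idea only. It is not uhepp's implementation, and the helpers add_binned and scale_binned are hypothetical names.

```python
import math


def add_binned(values_a, stat_a, values_b, stat_b):
    """Add two binned yields bin by bin, combining uncertainties in quadrature."""
    values = [a + b for a, b in zip(values_a, values_b)]
    stat = [math.hypot(sa, sb) for sa, sb in zip(stat_a, stat_b)]
    return values, stat


def scale_binned(values, stat, factor):
    """Scale a binned yield by a constant; uncertainties scale by |factor|."""
    return [v * factor for v in values], [s * abs(factor) for s in stat]


# Made-up numbers for two processes with two bins each.
total, total_stat = add_binned([10.0, 20.0], [3.0, 6.0], [5.0, 8.0], [4.0, 8.0])
print(total, total_stat)  # [15.0, 28.0] [5.0, 10.0]

scaled, scaled_stat = scale_binned(total, total_stat, 2.0)
print(scaled, scaled_stat)  # [30.0, 56.0] [10.0, 20.0]
```

Note that the combined uncertainty (5.0) is smaller than the plain sum of the inputs (3.0 + 4.0), which is why values and uncertainties need to travel together through every operation.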
Finally, let's add the yields to our histogram.
{% pygmentify %}hist.yields = {
    "signal": signal_yield,
    "background": background_yield,
    "data": data_yield,
}{% endpygmentify %}
The names that we use here as dictionary keys can be arbitrary strings. You are encouraged to use descriptive names, which make editing the histogram much easier. We will use these names later to refer to the yields when we specify the content of the main plot or the ratio plot. In this example, we've created a single background entry. In a real-world histogram, you would have several different physics processes with one entry per process. You are encouraged to use a fine-grained process list here, because merging two or more yield entries in the visual specification is easy.
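What merging means numerically can be sketched without uhepp: the bin contents of the grouped entries are summed, while the fine-grained raw data stays untouched. The process names and numbers below are made up, and the helper merge_yields is hypothetical; in uhepp itself, the grouping is expressed in the visual specification rather than by editing the raw data.

```python
def merge_yields(yields, names):
    """Sum the bin contents of the named entries bin by bin."""
    selected = [yields[name] for name in names]
    return [sum(bins) for bins in zip(*selected)]


# Hypothetical fine-grained process list with three bins each.
fine_grained = {
    "ttbar": [4.0, 6.0, 2.0],
    "single_top": [1.0, 2.0, 0.5],
    "diboson": [0.5, 1.0, 1.5],
}

top = merge_yields(fine_grained, ["ttbar", "single_top"])
print(top)  # [5.0, 8.0, 2.5]
```

Keeping the entries separate in the raw data means a different grouping can be chosen later without rerunning the analysis.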
{% pygmentify %}# Stack the simulated background and signal in the main plot
signal_si = uhepp.StackItem(["signal"], "Signal")
background_si = uhepp.StackItem(["background"], "Background")
mc_stack = uhepp.Stack([background_si, signal_si])
hist.stacks.append(mc_stack){% endpygmentify %}
{% pygmentify %}# Add the data as a second stack, drawn as points
data_si = uhepp.StackItem(["data"], "Data")
data_stack = uhepp.Stack([data_si], bartype="points")
hist.stacks.append(data_stack){% endpygmentify %}
{% pygmentify %}# Set the axis label, unit and metadata
hist.variable = "Mass"
hist.unit = "GeV"
hist.filename = "higgs_mass_dist"
hist.author = "Your name"{% endpygmentify %}
{% pygmentify %}# Add a ratio plot built from the data and the summed signal and background
ratio_si = uhepp.RatioItem(["data"], ["signal", "background"], bartype="points")
hist.ratio.append(ratio_si){% endpygmentify %}
{% pygmentify %}# Render the histogram
hist.show(){% endpygmentify %}
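The quantity shown in a ratio plot can be sketched in plain Python. The sketch assumes that the first list passed to RatioItem names the numerator and the second list names the yields that are summed to form the denominator. The numbers are made up, the helper bin_ratio is hypothetical, and returning None for empty bins is a choice made for this sketch, not necessarily what uhepp does.

```python
def bin_ratio(numerator, denominator):
    """Divide bin by bin; bins with an empty denominator yield None."""
    return [n / d if d != 0 else None for n, d in zip(numerator, denominator)]


# Made-up yields for three bins.
data = [12.0, 30.0, 0.0]
signal = [2.0, 10.0, 0.0]
background = [10.0, 10.0, 0.0]

# The denominator is the bin-wise sum of signal and background.
expected = [s + b for s, b in zip(signal, background)]
print(bin_ratio(data, expected))  # [1.0, 1.5, None]
```

A ratio close to 1 in a bin means the data agrees with the sum of the simulated processes there.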