Unverified Commit e4131c4f authored by Frank Sauerburger

Add optional column creators to Cut

parent a51669be
Branches 75-add-support-for-scale-factors
No related tags found
1 merge request !67 Resolve "Add support for scale factors"
Pipeline #12696 passed
@@ -9,7 +9,7 @@ doctest:
   image: python:3.7
   script:
     - pip install -r requirements.txt
-    - "python -m doctest -v $(ls freeforestml/*.py | grep -v '__init__.py')"
+    - ci/doctest.sh
 unittest:
   stage: test
......
+#!/bin/bash
+python3 -m doctest -v $(ls freeforestml/*.py | grep -v '__init__.py')
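The shell script above hands every module in freeforestml/ except __init__.py to `python3 -m doctest -v`. A rough Python equivalent is sketched below, e.g. for running the same check from another script. `run_doctests` is a hypothetical helper, and loading each file as a top-level module named after the file (similar to what `python -m doctest` does) assumes the modules import cleanly that way.

```python
import doctest
import importlib.util
import pathlib
import sys


def run_doctests(package_dir, verbose=False):
    """Hypothetical helper: run the doctests of every module in package_dir
    except __init__.py and return the total number of failed examples."""
    failed = 0
    for path in sorted(pathlib.Path(package_dir).glob("*.py")):
        if path.name == "__init__.py":
            continue  # same exclusion as grep -v '__init__.py'
        # Load the file as a top-level module named after the file, following
        # the importlib "import a source file directly" recipe. Registering it
        # in sys.modules shadows any module that already uses that name.
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        module = importlib.util.module_from_spec(spec)
        sys.modules[path.stem] = module
        spec.loader.exec_module(module)
        # testmod returns TestResults(failed, attempted).
        failed += doctest.testmod(module, verbose=verbose).failed
    return failed
```

To make a CI job fail like the shell script does, the return value could be turned into an exit status, e.g. `sys.exit(1 if run_doctests("freeforestml", verbose=True) else 0)`.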
@@ -8,7 +8,7 @@ class Cut:
     quantities.

     Cuts store the condition to be applied to a dataframe. New cut objects
-    accept all event by default. The selection can be limited by passing a
+    accept all events by default. The selection can be limited by passing a
     lambda to the constructor.

     >>> sel_all = Cut()
@@ -65,9 +65,20 @@ class Cut:
     >>> sel_sr = Cut(lambda df: df.is_sr == 1, label="Signal Region")
     >>> sel_sr.label
     'Signal Region'
+
+    If applying a cut requires changing the event weights by so-called scale
+    factors, you can pass additional optional keyword arguments that specify
+    how the new weights should be computed.
+
+    >>> sel_sample = Cut(lambda df: df.value % 2 == 0,
+    ...                  weight=lambda df: df.weight * 2)
+
+    The keyword 'weight' in this example is arbitrary; it names the column
+    that is overwritten. New columns can be added to the returned dataframe
+    in the same way; however, this is not recommended.
     """

-    def __init__(self, func=None, label=None):
+    def __init__(self, func=None, label=None, **columns):
         """
         Creates a new cut. The optional func argument is called with the
         dataframe upon evaluation. The function must return an index array. If
@@ -77,16 +88,21 @@ class Cut:
         if isinstance(func, Cut):
             self.func = func.func
             self.label = label or func.label
+            self.columns = columns or func.columns
         else:
             self.func = func
             self.label = label
+            self.columns = columns

     def __call__(self, dataframe):
         """
         Applies the internally stored cut to the given dataframe and returns a
         new dataframe containing only entries passing the event selection.
         """
-        return dataframe[self.idx_array(dataframe)]
+        new_df = dataframe[self.idx_array(dataframe)]
+        if self.columns:
+            new_df = new_df.assign(**self.columns)
+        return new_df

     def idx_array(self, dataframe):
         """
......
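With this change, `__call__` filters the rows first and only then hands `self.columns` to pandas' `DataFrame.assign`, which calls each callable with the filtered frame. The sketch below isolates that mechanism in a hypothetical, reduced `MiniCut` class; it is not the freeforestml `Cut`, and treating `func=None` as "accept every row" is an assumption about the elided `idx_array`.

```python
import pandas as pd


class MiniCut:
    """Hypothetical, reduced stand-in for Cut (not the freeforestml class)."""

    def __init__(self, func=None, label=None, **columns):
        if isinstance(func, MiniCut):
            # Copying a cut: explicit arguments win, otherwise inherit them.
            self.func = func.func
            self.label = label or func.label
            self.columns = columns or func.columns
        else:
            self.func = func
            self.label = label
            self.columns = columns

    def idx_array(self, dataframe):
        # Assumption: a cut without a function accepts every row.
        if self.func is None:
            return pd.Series(True, index=dataframe.index)
        return self.func(dataframe)

    def __call__(self, dataframe):
        new_df = dataframe[self.idx_array(dataframe)]
        if self.columns:
            # DataFrame.assign calls each callable with new_df, so the scale
            # factor only sees rows that passed; it returns a new frame and
            # leaves the caller's dataframe unchanged.
            new_df = new_df.assign(**self.columns)
        return new_df


events = pd.DataFrame({"value": [1, 2, 3, 4],
                       "weight": [1.0, 1.0, 0.5, 0.5]})

# Keep even values and double their weights (a made-up scale factor of 2).
even = MiniCut(lambda df: df.value % 2 == 0, weight=lambda df: df.weight * 2)
selected = even(events)  # values [2, 4], weights [2.0, 1.0]

# Copying the cut with a new label keeps the column creators.
relabeled = MiniCut(even, label="Even values")
```

This also shows why the keyword name is arbitrary: `assign` simply creates or overwrites a column with that name.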
@@ -311,3 +311,26 @@ class CutTestCase(unittest.TestCase):
         high_sale = Cut(lambda df: df.sale > 10)
         self.assertEqual(list(high_sale(self.df).year), [])
+
+    def test_assign_columns(self):
+        """
+        Check that passing a keyword argument overwrites an existing column.
+        """
+        alternate = Cut(lambda df: df.year % 2 == 0,
+                        sale=lambda df: df.sale * 2)
+        df_alt = alternate(self.df)
+        self.assertEqual(list(df_alt.year), [2010, 2012, 2014, 2016])
+        self.assertEqual(list(df_alt.sale), [7.8, 9.4, 15.0, 4.6])
+
+    def test_assign_new_columns(self):
+        """
+        Check that passing a keyword argument creates a new column.
+        """
+        alternate = Cut(lambda df: df.year % 2 == 0,
+                        weight=lambda df: df.year * 0 + 2)
+        df_alt = alternate(self.df)
+        self.assertEqual(list(df_alt.year), [2010, 2012, 2014, 2016])
+        self.assertEqual(list(df_alt.sale), [3.9, 4.7, 7.5, 2.3])
+        self.assertEqual(list(df_alt.weight), [2, 2, 2, 2])
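The tests exercise a single cut. Because each column creator receives the dataframe produced by the previous step, scale factors written as `weight=lambda df: df.weight * factor` compound when weighted cuts are applied one after another. A minimal sketch, assuming cuts are chained by calling one on the output of the other; `MiniCut`, the `pt` column and the factors 0.9 and 0.5 are all hypothetical.

```python
import pandas as pd


class MiniCut:
    """Hypothetical, reduced stand-in for Cut: filter, then assign columns."""

    def __init__(self, func, **columns):
        self.func = func
        self.columns = columns

    def __call__(self, dataframe):
        new_df = dataframe[self.func(dataframe)]
        if self.columns:
            # Callables see the frame as it arrives, including any weight
            # already rescaled by an earlier cut.
            new_df = new_df.assign(**self.columns)
        return new_df


events = pd.DataFrame({"pt": [10.0, 30.0, 50.0], "weight": [1.0, 1.0, 1.0]})

# Two made-up selections, each with its own made-up scale factor.
loose = MiniCut(lambda df: df.pt > 20, weight=lambda df: df.weight * 0.9)
tight = MiniCut(lambda df: df.pt > 40, weight=lambda df: df.weight * 0.5)

after_loose = loose(events)      # pt 30 and 50, weight 0.9 each
after_both = tight(after_loose)  # pt 50 only, weight 0.9 * 0.5
```

Since both factors multiply the incoming weight, the order in which the two cuts are applied does not change the final weight of an event that passes both.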