Blinded analyses

In some studies it becomes necessary to hide the true value of a magnitude in order to prevent analysts to bias the result. The MinKit package offers a way to wrap the values and errors of the parameters to complicate the access to these attributes. Let’s start by building a model composed by two components.

[1]:
%matplotlib inline
import matplotlib.pyplot as plt
import minkit
import numpy as np

m = minkit.Parameter('m', bounds=(0, 50))
c = minkit.Parameter('c', 30, bounds=(25, 35))
s = minkit.Parameter('s', 2, bounds=(1, 4))
a = minkit.Parameter('a', 0.5, bounds=(0.2, 10))
n = minkit.Parameter('n', 2, bounds=(1, 20))

cb = minkit.CrystalBall('sig', m, c, s, a, n)

k = minkit.Parameter('k', -0.01, bounds=(-1, 0))
e = minkit.Exponential('bkg', m, k)

y = minkit.Parameter('y', 0.5, bounds=(0, 1))

pdf = minkit.AddPDFs.two_components('pdf', cb, e, y)

data = pdf.generate(10000)

# Add a constraint to the parameter ruling the power-law tail of the Crystal-Ball
n_c = minkit.Parameter('n_c', n.value)
n_s = minkit.Parameter('n_s', 0.1)
n_constraint = minkit.Gaussian('n_constraint', n, n_c, n_s)

Now that we have the PDF and a data sample to fit we will blind some parameters of the Crystal-Ball function.

[2]:
a.set_blinding_configuration(scale=10, offset=2)
c.set_blinding_configuration(scale=10, offset=3)

The first element of the tuple rules the distance of the blinded value to the true value of the parameter. The second is related to a scale factor that multiplies the sum of the two previous quantities. Two random numbers are generated and determine the final equation to transform true values into blinded values. You can see that both the values and the bounds of the parameters are now modified when we access them:

[3]:
print(f'(a) value: {a.value}, bounds: {a.bounds}')
print(f'(c) value: {c.value}, bounds: {c.bounds}')
(a) value: 4.916079425426541, bounds: [ 3.75531604 41.67358666]
(c) value: 171.59662810889603, bounds: [143.03794197 200.15531425]

Note that from this and from the initial values it is possible to determine the transformation that is used in order to hide the true values. The blinding procedure is just a way to avoid looking at the true result before the whole analysis strategy has been defined, and the sensitivities have been computed. We can now run a fit and see the results.

[4]:
with minkit.minimizer('uml', pdf, data, constraints=[n_constraint]) as minimizer:
    minimizer.minimize()

FCN = 74285.66681671573 TOTAL NCALL = 183 NCALLS = 183
EDM = 6.39271292379903e-07 GOAL EDM = 1e-05 UP = 1.0
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed?
0 y 0.503738 0.0132383 0 1 No
1 c 170.989 0.645717 143.038 200.155 No
2 s 2.10274 0.0981425 1 4 No
3 a 4.97862 0.152174 3.75532 41.6736 No
4 n 1.99169 0.0995817 1 20 No
5 k -0.00969058 0.00134413 -1 0 No

You can see that in the information displayed by MIGRAD the bounds and values of the parameters are shown blinded. If we attempt to generate or evaluate the PDF on our own, it will be called with the blinded values. In order to retrieve the true result we must remove the configuration of the data parameters, as

[5]:
a.set_blinding_configuration(None)
print(f'value: {a.value}, error: {a.error}, bounds: {a.bounds}')
value: 0.5161635411619723, error: 0.039329485850082085, bounds: [ 0.2 10. ]

If we want to modify the value of the parameter, either we work with the blinded parameter (thus the provided value must lie within the blinded bounds), or we must unblind it before. In order to have a better control of the blinding status of the parameter, it is possible to create a context where the parameter is unblinded. This turns usefull when plotting results, for example:

[6]:
fig, ax = plt.subplots(1, 1, figsize=(6, 4))

with c.blind(status=False): # we disable the blind status
    values, edges = minkit.data_plotting_arrays(data, bins=100)
    centers = 0.5 * (edges[1:] + edges[:-1])
    bkg_x, bkg_y = minkit.pdf_plotting_arrays(pdf, values, edges, component='bkg')
    sig_x, sig_y = minkit.pdf_plotting_arrays(pdf, values, edges, component='sig')
    pdf_x, pdf_y = minkit.pdf_plotting_arrays(pdf, values, edges)
    ax.hist(centers, bins=edges, weights=values, histtype='step')
    ax.plot(bkg_x, bkg_y, ':', label='bkg')
    ax.plot(sig_x, sig_y, '--', label='sig')
    ax.plot(pdf_x, pdf_y, '-')
    ax.set_xlabel('m')
    ax.legend()
../_images/notebooks_blinding_11_0.png

We can also calculate asymmetric errors or FCN profiles but, to ensure that the minimum of the parameter remains blinded, the values to provide must correspond to the blinded bounds.

[7]:
fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(12, 4))

with minkit.minimizer('uml', pdf, data, constraints=[n_constraint]) as minimizer:

    minimizer.minuit.print_level = 0

    minimizer.minimize()

    minimizer.asymmetric_errors('c')

    print(f'(blind) asymmetric errors: {c.asym_errors}')
    with c.blind(status=False):
        print(f'(unblind) asymmetric errors: {c.asym_errors}')

    values = np.linspace(*c.bounds, 40)
    profile = minimizer.fcn_profile('c', values)
    ax0.plot(values, profile, 'C0-', label='FCN profile')
    profile = minimizer.minimization_profile('c', values)
    ax0.plot(values, profile, 'C1:', label='FCN minimization profile')
    ax0.set_title('blind')

    with c.blind(status=False):
        values = np.linspace(*c.bounds, 40)
        profile = minimizer.fcn_profile('c', values)
        ax1.plot(values, profile, 'C0-', label='FCN profile')
        profile = minimizer.minimization_profile('c', values)
        ax1.plot(values, profile, 'C1:', label='FCN minimization profile')
        ax1.set_title('unblind')

    for ax in (ax0, ax1):
        ax.set_xlabel('c')
        ax.set_ylabel('FCN')
        ax.legend()

fig.tight_layout()
(blind) asymmetric errors: (0.638809925788479, 0.6633281677584932)
(unblind) asymmetric errors: (0.11184161671235143, 0.11613422349595204)
../_images/notebooks_blinding_13_1.png

In conclusion, operations involving blinded parameters are completely safe as long as the user does not compute the transformation coefficients.

Blinding based on an offset

It might be of interest to set a blinding method in such a way that the absolute error of the blinded parameter reflects the actual error of the true value. This can only be achieved by applying a blinding transformation based on an offset. As can be checked below, after the fit the two errors for the center of the Gaussian function are exactly the same.

[8]:
m = minkit.Parameter('m', bounds=(10, 30))
c = minkit.Parameter('c', 15, bounds=(10, 30))
s = minkit.Parameter('s', 2, bounds=(0.5, 4))
pdf = minkit.Gaussian('pdf', m, c, s)

data = pdf.generate(10000)

c.set_blinding_configuration(offset=10)

with minkit.minimizer('uml', pdf, data) as minimizer:

    minimizer.minimize()

    print(f'(blinded) value: {c.value}, error: {c.error}')
    with c.blind(status=False):
        print(f'(unblinded) value: {c.value}, error: {c.error}')

FCN = 41579.73929204687 TOTAL NCALL = 35 NCALLS = 35
EDM = 7.330694387630302e-07 GOAL EDM = 1e-05 UP = 1.0
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed?
0 c 24.592 0.0204609 19.6135 39.6135 No
1 s 1.99019 0.0154367 0.5 4 No

(blinded) value: 24.592028073491736, error: 0.020460881416493493
(unblinded) value: 14.97853447997468, error: 0.020460881416493493

Blinding based on a scale

Instead of blinding a parameter in such a way that the absolute error remains constant after the transformation, we might be interested on only keeping the relative error. This is achieved by blinding the parameter using a scale factor. In the following example you can see that the relative error for the center remains unchanged.

[9]:

m = minkit.Parameter('m', bounds=(10, 30))
c = minkit.Parameter('c', 15, bounds=(10, 30))
s = minkit.Parameter('s', 2, bounds=(0.5, 4))
pdf = minkit.Gaussian('pdf', m, c, s)

data = pdf.generate(10000)

c.set_blinding_configuration(scale=10)

with minkit.minimizer('uml', pdf, data) as minimizer:

    minimizer.minimize()

    print(f'(blinded) value: {c.value}, relative error: {c.error / c.value}')
    with c.blind(status=False):
        print(f'(unblinded) value: {c.value}, relative error: {c.error / c.value}')

FCN = 41698.14236785427 TOTAL NCALL = 27 NCALLS = 27
EDM = 4.1328799917193134e-05 GOAL EDM = 1e-05 UP = 1.0
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed?
0 c 135.22 0.185788 90.1757 270.527 No
1 s 2.0029 0.0155532 0.5 4 No

(blinded) value: 135.22040686287573, relative error: 0.0013739644691175504
(unblinded) value: 14.995207545101994, relative error: 0.0013739644691175504