Blinded analyses¶

In some studies it becomes necessary to hide the true value of a magnitude in order to prevent analysts to bias the result. The MinKit package offers a way to wrap the values and errors of the parameters to complicate the access to these attributes. Let’s start by building a model composed by two components.

[1]:

%matplotlib inline
import matplotlib.pyplot as plt
import minkit
import numpy as np

m = minkit.Parameter('m', bounds=(0, 50))
c = minkit.Parameter('c', 30, bounds=(25, 35))
s = minkit.Parameter('s', 2, bounds=(1, 4))
a = minkit.Parameter('a', 0.5, bounds=(0.2, 10))
n = minkit.Parameter('n', 2, bounds=(1, 20))

cb = minkit.CrystalBall('sig', m, c, s, a, n)

k = minkit.Parameter('k', -0.01, bounds=(-1, 0))
e = minkit.Exponential('bkg', m, k)

y = minkit.Parameter('y', 0.5, bounds=(0, 1))

pdf = minkit.AddPDFs.two_components('pdf', cb, e, y)

data = pdf.generate(10000)

# Add a constraint to the parameter ruling the power-law tail of the Crystal-Ball
n_c = minkit.Parameter('n_c', n.value)
n_s = minkit.Parameter('n_s', 0.1)
n_constraint = minkit.Gaussian('n_constraint', n, n_c, n_s)

Now that we have the PDF and a data sample to fit we will blind some parameters of the Crystal-Ball function.

[2]:

a.set_blinding_configuration(scale=10, offset=2)
c.set_blinding_configuration(scale=10, offset=3)

The first element of the tuple rules the distance of the blinded value to the true value of the parameter. The second is related to a scale factor that multiplies the sum of the two previous quantities. Two random numbers are generated and determine the final equation to transform true values into blinded values. You can see that both the values and the bounds of the parameters are now modified when we access them:

[3]:

print(f'(a) value: {a.value}, bounds: {a.bounds}')
print(f'(c) value: {c.value}, bounds: {c.bounds}')

(a) value: 4.916079425426541, bounds: [ 3.75531604 41.67358666]
(c) value: 171.59662810889603, bounds: [143.03794197 200.15531425]

Note that from this and from the initial values it is possible to determine the transformation that is used in order to hide the true values. The blinding procedure is just a way to avoid looking at the true result before the whole analysis strategy has been defined, and the sensitivities have been computed. We can now run a fit and see the results.

[4]:

with minkit.minimizer('uml', pdf, data, constraints=[n_constraint]) as minimizer:
    minimizer.minimize()

FCN = 74285.66681671573	TOTAL NCALL = 183	NCALLS = 183
EDM = 6.39271292379903e-07	GOAL EDM = 1e-05	UP = 1.0

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Hesse Error	Minos Error-	Minos Error+	Limit-	Limit+	Fixed?
0	y	0.503738	0.0132383			0	1	No
1	c	170.989	0.645717			143.038	200.155	No
2	s	2.10274	0.0981425			1	4	No
3	a	4.97862	0.152174			3.75532	41.6736	No
4	n	1.99169	0.0995817			1	20	No
5	k	-0.00969058	0.00134413			-1	0	No

\begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Hesse Error & Minos Error- & Minos Error+ & Limit- & Limit+ & Fixed?\\
\hline
0 & y & 0.503738 & 0.0132383 &  &  & 0.0 & 1 & No\\
\hline
1 & c & 170.989 & 0.645717 &  &  & 143.03794197011052 & 200.155 & No\\
\hline
2 & s & 2.10274 & 0.0981425 &  &  & 1.0 & 4 & No\\
\hline
3 & a & 4.97862 & 0.152174 &  &  & 3.7553160389169893 & 41.6736 & No\\
\hline
4 & n & 1.99169 & 0.0995817 &  &  & 1.0 & 20 & No\\
\hline
5 & k & -0.00969058 & 0.00134413 &  &  & -1.0 & 0 & No\\
\hline
\end{tabular}

You can see that in the information displayed by MIGRAD the bounds and values of the parameters are shown blinded. If we attempt to generate or evaluate the PDF on our own, it will be called with the blinded values. In order to retrieve the true result we must remove the configuration of the data parameters, as

[5]:

a.set_blinding_configuration(None)
print(f'value: {a.value}, error: {a.error}, bounds: {a.bounds}')

value: 0.5161635411619723, error: 0.039329485850082085, bounds: [ 0.2 10. ]

If we want to modify the value of the parameter, either we work with the blinded parameter (thus the provided value must lie within the blinded bounds), or we must unblind it before. In order to have a better control of the blinding status of the parameter, it is possible to create a context where the parameter is unblinded. This turns usefull when plotting results, for example:

[6]:

fig, ax = plt.subplots(1, 1, figsize=(6, 4))

with c.blind(status=False): # we disable the blind status
    values, edges = minkit.data_plotting_arrays(data, bins=100)
    centers = 0.5 * (edges[1:] + edges[:-1])
    bkg_x, bkg_y = minkit.pdf_plotting_arrays(pdf, values, edges, component='bkg')
    sig_x, sig_y = minkit.pdf_plotting_arrays(pdf, values, edges, component='sig')
    pdf_x, pdf_y = minkit.pdf_plotting_arrays(pdf, values, edges)
    ax.hist(centers, bins=edges, weights=values, histtype='step')
    ax.plot(bkg_x, bkg_y, ':', label='bkg')
    ax.plot(sig_x, sig_y, '--', label='sig')
    ax.plot(pdf_x, pdf_y, '-')
    ax.set_xlabel('m')
    ax.legend()

We can also calculate asymmetric errors or FCN profiles but, to ensure that the minimum of the parameter remains blinded, the values to provide must correspond to the blinded bounds.

[7]:

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(12, 4))

with minkit.minimizer('uml', pdf, data, constraints=[n_constraint]) as minimizer:

    minimizer.minuit.print_level = 0

    minimizer.minimize()

    minimizer.asymmetric_errors('c')

    print(f'(blind) asymmetric errors: {c.asym_errors}')
    with c.blind(status=False):
        print(f'(unblind) asymmetric errors: {c.asym_errors}')

    values = np.linspace(*c.bounds, 40)
    profile = minimizer.fcn_profile('c', values)
    ax0.plot(values, profile, 'C0-', label='FCN profile')
    profile = minimizer.minimization_profile('c', values)
    ax0.plot(values, profile, 'C1:', label='FCN minimization profile')
    ax0.set_title('blind')

    with c.blind(status=False):
        values = np.linspace(*c.bounds, 40)
        profile = minimizer.fcn_profile('c', values)
        ax1.plot(values, profile, 'C0-', label='FCN profile')
        profile = minimizer.minimization_profile('c', values)
        ax1.plot(values, profile, 'C1:', label='FCN minimization profile')
        ax1.set_title('unblind')

    for ax in (ax0, ax1):
        ax.set_xlabel('c')
        ax.set_ylabel('FCN')
        ax.legend()

fig.tight_layout()

(blind) asymmetric errors: (0.638809925788479, 0.6633281677584932)
(unblind) asymmetric errors: (0.11184161671235143, 0.11613422349595204)

In conclusion, operations involving blinded parameters are completely safe as long as the user does not compute the transformation coefficients.

Blinding based on an offset¶

It might be of interest to set a blinding method in such a way that the absolute error of the blinded parameter reflects the actual error of the true value. This can only be achieved by applying a blinding transformation based on an offset. As can be checked below, after the fit the two errors for the center of the Gaussian function are exactly the same.

[8]:

m = minkit.Parameter('m', bounds=(10, 30))
c = minkit.Parameter('c', 15, bounds=(10, 30))
s = minkit.Parameter('s', 2, bounds=(0.5, 4))
pdf = minkit.Gaussian('pdf', m, c, s)

data = pdf.generate(10000)

c.set_blinding_configuration(offset=10)

with minkit.minimizer('uml', pdf, data) as minimizer:

    minimizer.minimize()

    print(f'(blinded) value: {c.value}, error: {c.error}')
    with c.blind(status=False):
        print(f'(unblinded) value: {c.value}, error: {c.error}')

FCN = 41579.73929204687	TOTAL NCALL = 35	NCALLS = 35
EDM = 7.330694387630302e-07	GOAL EDM = 1e-05	UP = 1.0

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Hesse Error	Minos Error-	Minos Error+	Limit-	Limit+	Fixed?
0	c	24.592	0.0204609			19.6135	39.6135	No
1	s	1.99019	0.0154367			0.5	4	No

\begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Hesse Error & Minos Error- & Minos Error+ & Limit- & Limit+ & Fixed?\\
\hline
0 & c & 24.592 & 0.0204609 &  &  & 19.613493593517056 & 39.6135 & No\\
\hline
1 & s & 1.99019 & 0.0154367 &  &  & 0.5 & 4 & No\\
\hline
\end{tabular}

(blinded) value: 24.592028073491736, error: 0.020460881416493493
(unblinded) value: 14.97853447997468, error: 0.020460881416493493

Blinding based on a scale¶

Instead of blinding a parameter in such a way that the absolute error remains constant after the transformation, we might be interested on only keeping the relative error. This is achieved by blinding the parameter using a scale factor. In the following example you can see that the relative error for the center remains unchanged.

[9]:

m = minkit.Parameter('m', bounds=(10, 30))
c = minkit.Parameter('c', 15, bounds=(10, 30))
s = minkit.Parameter('s', 2, bounds=(0.5, 4))
pdf = minkit.Gaussian('pdf', m, c, s)

data = pdf.generate(10000)

c.set_blinding_configuration(scale=10)

with minkit.minimizer('uml', pdf, data) as minimizer:

    minimizer.minimize()

    print(f'(blinded) value: {c.value}, relative error: {c.error / c.value}')
    with c.blind(status=False):
        print(f'(unblinded) value: {c.value}, relative error: {c.error / c.value}')

FCN = 41698.14236785427	TOTAL NCALL = 27	NCALLS = 27
EDM = 4.1328799917193134e-05	GOAL EDM = 1e-05	UP = 1.0

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Hesse Error	Minos Error-	Minos Error+	Limit-	Limit+	Fixed?
0	c	135.22	0.185788			90.1757	270.527	No
1	s	2.0029	0.0155532			0.5	4	No

\begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Hesse Error & Minos Error- & Minos Error+ & Limit- & Limit+ & Fixed?\\
\hline
0 & c & 135.22 & 0.185788 &  &  & 90.17574878918155 & 270.527 & No\\
\hline
1 & s & 2.0029 & 0.0155532 &  &  & 0.5 & 4 & No\\
\hline
\end{tabular}

(blinded) value: 135.22040686287573, relative error: 0.0013739644691175504
(unblinded) value: 14.995207545101994, relative error: 0.0013739644691175504