Stochastic Optimization with Consistent Constraints

Application of the method of consistent constraints in SOM is twofold:

A new Markov chain update that is enabled by calling Som.accumulate() with cc_update=True.
A new method Som.compute_final_solution_cc() to build the final solution out of accumulated particular solutions.

Both applications are rather involved and support many user-adjustable parameters. Names of their respective keyword arguments are typeset as monospaced text in the following.

Consistent-constraints updates

Consistent-constraints (CC) updates introduced in SOM 2.0 are a new type of updates in the Markov chain used to accumulate particular solutions (Section II.A of [GMPPS2017]). Unlike elementary updates, they radically change an MC configuration and help reveal local minima of the objective function more quickly.

The CC updates are proposed during the first stage of a global update at regular intervals. Two consecutive CC updates are separated by cc_update_cycle_length elementary updates. A CC update is skipped if it could potentially result in a configuration with too many rectangles, \(2K_0+1>\) max_rects, where \(K_0\) is the number of rectangles in the current configuration. CC updates are accepted and rejected based on the same Metropolis criterion as the elementary updates. A proposed CC update proceeds as follows.

Take the current configuration, construct a configuration made out of non-overlapping rectangles \(\{(c_k, w_k, h_k)\}\) (Eqs. (16)-(17) of [GMPPS2017]) and discard rectangles with width below min_rect_width. If size \(K\) of the resulting non-overlapping configuration is less than 3 (too small to evaluate the 2nd derivative), reject the update.
Collect heights \(h_k\) of rectangles in the non-overlapping configuration in a \(K\)-dimensional vector \(\mathbf{h}\).
Starting from the collected heights, use the CC protocol (see below) to compute optimal heights \(\mathbf{h}^\mathrm{(opt)}\).
If any of \(h^\mathrm{(opt)}_k\) is significantly negative, i.e. \(w_k h^\mathrm{(opt)}_k < -\) min_rect_weight, reject the update.
Replace heights \(h_k\) in the non-overlapping configuration with the optimal heights \(h^\mathrm{(opt)}_k\).
Remove small rectangles (\(w_k h^\mathrm{(opt)}_k <\) min_rect_weight) from the configuration and redistribute their weight by changing heights of their neighboring rectangles. This step is repeated until none of the rectangles has weight below min_rect_weight.
Evaluate the \(\chi^2\)-functional for the new optimized configuration and use \(\chi^2_\mathrm{(opt)}\) to compute the Metropolis ratio.

The CC protocol consists in iterative minimization of a quadratic function of \(h_k\) with self-consistent determination of regularization parameters \(Q^{(0)}_k, Q^{(1)}_k, Q^{(2)}_k\) this function depends on. The function in question is a sum of the \(\chi^2\)-functional and a few regularization terms,

\[O[\mathbf{h}; Q^{(0)},Q^{(1)},Q^{(2)}] = \chi^2[\mathbf{h}] + O_0[\mathbf{h};Q^{(0)}] + O_1[\mathbf{h};Q^{(1)}] + O_2[\mathbf{h};Q^{(2)}].\]

The regularization term \(O_0\) penalizes large amplitudes of a solution,

\[O_0[\mathbf{h};Q^{(0)}] = \sum_{k=1}^K (Q^{(0)}_k)^2 h_k^2 = \mathbf{h}^T \hat O_{(0)} \mathbf{h}, \quad \hat O_{(0)} = \mathrm{diag}((Q^{(0)}_k)^2).\]

\(O_1\) and \(O_2\) penalize large values of the 1st and 2nd derivatives of a solution respectively,

\[ \begin{align}\begin{aligned}O_1[\mathbf{h};Q^{(1)}] &= \sum_{k=1}^{K-1} (Q^{(1)}_k)^2 |A'(\epsilon_k)|^2 = \mathbf{h}^T \hat O_{(1)} \mathbf{h},\\O_2[\mathbf{h};Q^{(2)}] &= \sum_{k=2}^{K-1} (Q^{(2)}_{k-1})^2 |A''(\epsilon_k)|^2 = \mathbf{h}^T \hat O_{(2)} \mathbf{h}.\end{aligned}\end{align} \]

Matrices \(\hat O_{(1)}\) and \(\hat O_{(2)}\) are derived from finite-difference approximations of \(A'\) and \(A''\) at points \(\epsilon_k = c_k\).

CC protocol iterations are organized as follows.

Initialize regularization parameters according to
- \(Q^{(0)}_k = 0,\ k=\overline{1,K}\),
- \(Q^{(1)}_k = (\) cc_update_der_penalty_init \()W^2/ \mathcal{N},\ k = \overline{1,K-1}\),
- \(Q^{(2)}_k = (\) cc_update_der_penalty_init \()W^3/ \mathcal{N},\ k = \overline{1,K-2}\),
where \(W\) is the energy window width (energy_window[1] - energy_window[0]) and \(\mathcal{N}\) is the requested solution norm.
Use a numerical linear algebra algorithm to minimize the quadratic function \(O[\mathbf{h}; Q^{(0)},Q^{(1)},Q^{(2)}]\) w.r.t. \(\mathbf{h}\) subject to equality constraint \(\sum_{k=1}^K w_k h_k = \mathcal{N}\).
Compute discrepancy of weights of rectangles with heights \(\mathbf{h}\) and \(\mathbf{h}^\mathrm{new}\) as \(\frac{1}{\mathcal{N}}\sum_{k=1}^K |w_k (h_k - h^\mathrm{new}_k)|\). If it lies within a fixed tolerance level cc_update_rect_norm_variation_tol, terminate iterations.
Update amplitude regularization parameters \(Q^{(0)}_k\): If \(h^\mathrm{new}_k\) is negative, set \(Q^{(0)}_k =(\) cc_update_height_penalty_max \()W / \mathcal{N}\), otherwise divide \(Q^{(0)}_k\) by cc_update_height_penalty_divisor.
Use finite-difference approximations to compute derivatives of \(A\) at \(\epsilon_k = c_k\):
- 1st derivatives \(d^{(1)}_k, k = \overline{1,K-1}\).
- 2nd derivatives \(d^{(2)}_k, k = \overline{1,K-2}\).
Compute limiting constants for \(Q^{(1)}_k\) and \(Q^{(2)}_k\):
- \(Q^{(1)}_\mathrm{lim} = (\) cc_update_der_penalty_limiter \() \min_k Q^{(1)}_k\).
- \(Q^{(2)}_\mathrm{lim} = (\) cc_update_der_penalty_limiter \() \min_k Q^{(2)}_k\).
Update the derivative regularization parameters based on the magnitudes of \(d^{(1)}_k\) and \(d^{(2)}_k\).
- If \(|d^{(1)}_k| > (\) cc_update_der_penalty_threshold \(/ \sqrt{K-1}) / Q^{(1)}_k\), then set \(Q^{(1)}_k = (\) cc_update_der_penalty_threshold \(/ \sqrt{K-1}) / |d^{(1)}_k|\). Otherwise multiply \(Q^{(1)}_k\) by cc_update_der_penalty_increase_coeff.
- If \(|d^{(2)}_k| > (\) cc_update_der_penalty_threshold \(/ \sqrt{K-1}) / Q^{(2)}_k\), then set \(Q^{(2)}_k = (\) cc_update_der_penalty_threshold \(/ \sqrt{K-1}) / |d^{(2)}_k|\). Otherwise multiply \(Q^{(2)}_k\) by cc_update_der_penalty_increase_coeff.
Reduce excessively large derivative regularization parameters to avoid divergent behaviour:
- If \(Q^{(1)}_k > Q^{(1)}_\mathrm{lim}\), then set \(Q^{(1)}_k = Q^{(1)}_\mathrm{lim}\).
- If \(Q^{(2)}_k > Q^{(2)}_\mathrm{lim}\), then set \(Q^{(2)}_k = Q^{(2)}_\mathrm{lim}\).
If the maximal allowed number of iterations cc_update_max_iter is reached, terminate. Otherwise set \(\mathbf{h} = \mathbf{h}^\mathrm{new}\) and repeat from step 2.

Construction of final solution using the consistent-constraints protocol

SOM 2.x features an optimization procedure that finds a vector of final solution coefficients \(\mathbf{c}\) resulting in a smoother spectral function. It also allows to bias the final solution towards a user-provided default model.

The optimization procedure minimizes a quadratic function of \(\mathbf{c}\) depending on a few sets of regularization parameters,

\[O[\mathbf{c}; Q_k, D_k, B_k, T_k, A_T(\epsilon_k), U] = O_Q[\mathbf{c}; Q_k] + O_D[\mathbf{c}; D_k] + O_B[\mathbf{c}; B_k] + O_T[\mathbf{c}; T_k, A_T(\epsilon)] + O_U[\mathbf{c}; U].\]

\(O_Q[\mathbf{c}; Q_k]\) stabilizes the final solution by penalizing large amplitudes,

\[O_Q[\mathbf{c}; Q_k] = \sum_{k=1}^K Q_k^2 A(\epsilon_k)^2 = \mathbf{c}^T \hat O_Q \mathbf{c}, \quad (\hat O_Q)_{jj'} = \sum_{k=1}^K Q_k^2 A_j(\epsilon_k) A_{j'}(\epsilon_k).\]
\(O_D[\mathbf{c}; D_k]\) penalizes large 1st derivatives of the final solution,

\[O_D[\mathbf{c}; D_k] = \sum_{k=1}^{K-1} D_k^2 |A'(\epsilon_k)|^2 = \mathbf{c}^T \hat O_D \mathbf{c}, \quad (\hat O_D)_{jj'} = \sum_{k=1}^{K-1} D_k^2 A'_j(\epsilon_k) A'_{j'}(\epsilon_k).\]
\(O_B[\mathbf{c}; B_k]\) penalizes large 2nd derivatives of the final solution,

\[O_B[\mathbf{c}; B_k] = \sum_{k=2}^{K-1} B_{k-1}^2 |A''(\epsilon_k)|^2 = \mathbf{c}^T \hat O_B \mathbf{c}, \quad (\hat O_B)_{jj'} = \sum_{k=2}^{K-1} B_{k-1}^2 A''_j(\epsilon_k) A''_{j'}(\epsilon_k).\]
\(O_T[\mathbf{c}; T_k, A_T(\epsilon)]\) penalizes deviations from a predefined default model (“target”) \(A_T(\epsilon)\),

\[ \begin{align}\begin{aligned}O_T[\mathbf{c}; T_k, A_T(\epsilon)] &= \sum_{k=1}^K T_k [A(\epsilon_k) - A_T(\epsilon_k)]^2 = \mathbf{c}^T \hat O_T \mathbf{c} - 2\mathbf{f}_T^T\mathbf{c}+\mathrm{const},\\(\hat O_T)_{jj'} &= \sum_{k=1}^K T_k A_j(\epsilon_k) A_{j'}(\epsilon_k), \quad (\mathbf{f}_T)_j = \sum_{k=1}^K T_k A_j(\epsilon_k) A_T(\epsilon_k).\end{aligned}\end{align} \]
\(O_U[\mathbf{c}; U]\) penalizes deviations from the equal weight superposition \(c_j = 1/J\).

\[ \begin{align}\begin{aligned}O_U[\mathbf{c}; U] &= U \sum_{j=1}^J (c_j - 1/J)^2 = \mathbf{c}^T \hat O_U \mathbf{c}-2\mathbf{f}_U^T\mathbf{c}+\mathrm{const},\\(\hat O_U)_{jj'} &= U \delta_{jj'},\quad (\mathbf{f}_U)_j = U / J.\end{aligned}\end{align} \]

Values of particular spectral functions \(A_j(\epsilon)\) and their derivatives are approximated on a user-supplied uniform or non-uniform energy grid \(\{\epsilon_k\}\) (refreq_mesh).

The iterative optimization procedure proceeds as follows.

Precompute approximate values of \(A_j(\epsilon_k)\), \(A'_j(\epsilon_k)\) and \(A''_j(\epsilon_k)\).
Using user-supplied parameters \(A_T(\epsilon_k)=\) default_model, \(T_k =\) default_model_weights and \(U=\) ew_penalty_coeff precompute \(\mathbf{f} = \mathbf{f}_T + \mathbf{f}_U\).
Initialize regularization parameters according to
- \(\mathcal{D} = (\) der_penalty_init \() / J\),
- \(Q_k = 0\),
- \(D_k = \mathcal{D}\),
- \(B_k = \mathcal{D}\).
Use the regularization parameters to compute matrix \(\hat O = \hat O_Q + \hat O_D + \hat O_B + \hat O_T + \hat O_U\).
Use a numerical linear algebra algorithm to minimize \(\mathbf{c}^T \hat O \mathbf{c} - 2\mathbf{f}^T\mathbf{c}\) w.r.t. \(\mathbf{c}\) subject to the normalization sum rule \(\sum_{J=1}^J c_j = 1\). The minimization result at the \(I\)-th iteration is stored as a vector \(\mathbf{c}^{(I)}\).
Compute and store relative \(L_1\)-distance between \(\mathbf{c}^{(I)}\) and \(\mathbf{c}^{(I-1)}\),

\[d(I, I-1) = \frac{\sum_{j=1}^J |c^{(I)}_j - c^{(I-1)}_j|} {\sum_{j=1}^J |c^{(I)}_j|},\]

where the initial vector \(\mathbf{c}^{(0)}\) is taken to have all components equal to \(1/J\).
In case user has provided a monitor function (monitor), it is called with 4 arguments,
- a 1D NumPy array \(\mathbf{c}^{(I)}\),
- a 2-tuple of 1D arrays \((A(\epsilon_k), Q_k)\),
- a 2-tuple of 1D arrays \((A'(\epsilon_k), D_k)\),
- a 2-tuple of 1D arrays \((A''(\epsilon_k), B_k)\).
Returning True from the function terminates iterations.
If \(\sum_{j=1}^J |c^{(I)}_j| >\) max_sum_abs_c (contribution of the negative coefficients is too big), then terminate iterations.
If \(d(I, I-1) < \min_n |\sigma_n/G(n)|\), then convergence is reached and iterations are terminated. \(G(n)\) is the right-hand side of the analytic continuation problem, and \(\sigma_n\) are respective estimated error bars on the input data.
Form a new final solution from coefficients \(\mathbf{c}^{(I)}\),

\[A(\epsilon_k) = \sum_{j=1}^J c^{(I)}_j A_j(\epsilon_k),\quad A'(\epsilon_k) = \sum_{j=1}^J c^{(I)}_j A'_j(\epsilon_k),\quad A''(\epsilon_k) = \sum_{j=1}^J c^{(I)}_j A''_j(\epsilon_k).\]

Increase regularization parameters \(\mathcal{D}\), \(D_k\) and \(B_k\) by the factor der_penalty_coeff.
Update the amplitude regularization parameters \(Q_k\): If \(A(\epsilon_k) < 0\), then set \(Q_k =\) amp_penalty_max. Otherwise divide \(Q_k\) by amp_penalty_divisor.
Update the derivative regularization parameters:
- If \(D_k |A'(\epsilon_k)| > \mathcal{D}\), then set \(D_k = \mathcal{D} / |A'(\epsilon_k)|\);
- If \(B_k |A''(\epsilon_k)| > \mathcal{D}\), then set \(B_k = \mathcal{D} / |A''(\epsilon_k)|\).
If the maximal allowed number of iterations max_iter is reached, terminate. Otherwise repeat from step 4.

Once the iterations have been terminated, one inspects the accumulated list of pairs \((\mathbf{c}^{(1)}, d(1,0)), (\mathbf{c}^{(2)}, d(2,1)), \ldots\). Elements of the list are sorted in the ascending order according to \(d(I,I-1)\), i.e. \(\mathbf{c}^{(I)}\) from the iteration closest to convergence comes first in the sorted list. The first vector of coefficients from the sorted list that obeys

\(\chi^2\left[\sum_{j=1}^J c^{(I)}_j A_j\right] \leq (\) good_chi_abs \()^2\) and
\(\chi^2\left[\sum_{j=1}^J c^{(I)}_j A_j\right] \leq \min_j \chi^2[A_j] (\) good_chi_rel \()^2\)

represents the sought final solution.