Stochastic Optimization with Consistent Constraints
Application of the method of consistent constraints in SOM is twofold:
A new Markov chain update that is enabled by calling
Som.accumulate()
withcc_update=True
.A new method
Som.compute_final_solution_cc()
to build the final solution out of accumulated particular solutions.
Both applications are rather involved and support many user-adjustable
parameters. Names of their respective keyword arguments are typeset as
monospaced text
in the following.
Consistent-constraints updates
Consistent-constraints (CC) updates introduced in SOM 2.0 are a new type of updates in the Markov chain used to accumulate particular solutions (Section II.A of [GMPPS2017]). Unlike elementary updates, they radically change an MC configuration and help reveal local minima of the objective function more quickly.
The CC updates are proposed during the first stage of a global update at regular
intervals. Two consecutive CC updates are separated by
cc_update_cycle_length
elementary updates. A CC update is skipped if it
could potentially result in a configuration with too many rectangles,
\(2K_0+1>\) max_rects
, where \(K_0\) is the number of rectangles in
the current configuration. CC updates are accepted and rejected based on the
same Metropolis criterion as the elementary updates. A proposed CC update
proceeds as follows.
Take the current configuration, construct a configuration made out of non-overlapping rectangles \(\{(c_k, w_k, h_k)\}\) (Eqs. (16)-(17) of [GMPPS2017]) and discard rectangles with width below
min_rect_width
. If size \(K\) of the resulting non-overlapping configuration is less than 3 (too small to evaluate the 2nd derivative), reject the update.Collect heights \(h_k\) of rectangles in the non-overlapping configuration in a \(K\)-dimensional vector \(\mathbf{h}\).
Starting from the collected heights, use the CC protocol (see below) to compute optimal heights \(\mathbf{h}^\mathrm{(opt)}\).
If any of \(h^\mathrm{(opt)}_k\) is significantly negative, i.e. \(w_k h^\mathrm{(opt)}_k < -\)
min_rect_weight
, reject the update.Replace heights \(h_k\) in the non-overlapping configuration with the optimal heights \(h^\mathrm{(opt)}_k\).
Remove small rectangles (\(w_k h^\mathrm{(opt)}_k <\)
min_rect_weight
) from the configuration and redistribute their weight by changing heights of their neighboring rectangles. This step is repeated until none of the rectangles has weight belowmin_rect_weight
.Evaluate the \(\chi^2\)-functional for the new optimized configuration and use \(\chi^2_\mathrm{(opt)}\) to compute the Metropolis ratio.
The CC protocol consists in iterative minimization of a quadratic function of \(h_k\) with self-consistent determination of regularization parameters \(Q^{(0)}_k, Q^{(1)}_k, Q^{(2)}_k\) this function depends on. The function in question is a sum of the \(\chi^2\)-functional and a few regularization terms,
The regularization term \(O_0\) penalizes large amplitudes of a solution,
\(O_1\) and \(O_2\) penalize large values of the 1st and 2nd derivatives of a solution respectively,
Matrices \(\hat O_{(1)}\) and \(\hat O_{(2)}\) are derived from finite-difference approximations of \(A'\) and \(A''\) at points \(\epsilon_k = c_k\).
CC protocol iterations are organized as follows.
Initialize regularization parameters according to
\(Q^{(0)}_k = 0,\ k=\overline{1,K}\),
\(Q^{(1)}_k = (\)
cc_update_der_penalty_init
\()W^2/ \mathcal{N},\ k = \overline{1,K-1}\),\(Q^{(2)}_k = (\)
cc_update_der_penalty_init
\()W^3/ \mathcal{N},\ k = \overline{1,K-2}\),
where \(W\) is the energy window width (
energy_window[1] - energy_window[0]
) and \(\mathcal{N}\) is the requested solution norm.Use a numerical linear algebra algorithm to minimize the quadratic function \(O[\mathbf{h}; Q^{(0)},Q^{(1)},Q^{(2)}]\) w.r.t. \(\mathbf{h}\) subject to equality constraint \(\sum_{k=1}^K w_k h_k = \mathcal{N}\).
Compute discrepancy of weights of rectangles with heights \(\mathbf{h}\) and \(\mathbf{h}^\mathrm{new}\) as \(\frac{1}{\mathcal{N}}\sum_{k=1}^K |w_k (h_k - h^\mathrm{new}_k)|\). If it lies within a fixed tolerance level
cc_update_rect_norm_variation_tol
, terminate iterations.Update amplitude regularization parameters \(Q^{(0)}_k\): If \(h^\mathrm{new}_k\) is negative, set \(Q^{(0)}_k =(\)
cc_update_height_penalty_max
\()W / \mathcal{N}\), otherwise divide \(Q^{(0)}_k\) bycc_update_height_penalty_divisor
.Use finite-difference approximations to compute derivatives of \(A\) at \(\epsilon_k = c_k\):
1st derivatives \(d^{(1)}_k, k = \overline{1,K-1}\).
2nd derivatives \(d^{(2)}_k, k = \overline{1,K-2}\).
Compute limiting constants for \(Q^{(1)}_k\) and \(Q^{(2)}_k\):
\(Q^{(1)}_\mathrm{lim} = (\)
cc_update_der_penalty_limiter
\() \min_k Q^{(1)}_k\).\(Q^{(2)}_\mathrm{lim} = (\)
cc_update_der_penalty_limiter
\() \min_k Q^{(2)}_k\).
Update the derivative regularization parameters based on the magnitudes of \(d^{(1)}_k\) and \(d^{(2)}_k\).
If \(|d^{(1)}_k| > (\)
cc_update_der_penalty_threshold
\(/ \sqrt{K-1}) / Q^{(1)}_k\), then set \(Q^{(1)}_k = (\)cc_update_der_penalty_threshold
\(/ \sqrt{K-1}) / |d^{(1)}_k|\). Otherwise multiply \(Q^{(1)}_k\) bycc_update_der_penalty_increase_coeff
.If \(|d^{(2)}_k| > (\)
cc_update_der_penalty_threshold
\(/ \sqrt{K-1}) / Q^{(2)}_k\), then set \(Q^{(2)}_k = (\)cc_update_der_penalty_threshold
\(/ \sqrt{K-1}) / |d^{(2)}_k|\). Otherwise multiply \(Q^{(2)}_k\) bycc_update_der_penalty_increase_coeff
.
Reduce excessively large derivative regularization parameters to avoid divergent behaviour:
If \(Q^{(1)}_k > Q^{(1)}_\mathrm{lim}\), then set \(Q^{(1)}_k = Q^{(1)}_\mathrm{lim}\).
If \(Q^{(2)}_k > Q^{(2)}_\mathrm{lim}\), then set \(Q^{(2)}_k = Q^{(2)}_\mathrm{lim}\).
If the maximal allowed number of iterations
cc_update_max_iter
is reached, terminate. Otherwise set \(\mathbf{h} = \mathbf{h}^\mathrm{new}\) and repeat from step 2.
Construction of final solution using the consistent-constraints protocol
SOM 2.x features an optimization procedure that finds a vector of final solution coefficients \(\mathbf{c}\) resulting in a smoother spectral function. It also allows to bias the final solution towards a user-provided default model.
The optimization procedure minimizes a quadratic function of \(\mathbf{c}\) depending on a few sets of regularization parameters,
\(O_Q[\mathbf{c}; Q_k]\) stabilizes the final solution by penalizing large amplitudes,
\[O_Q[\mathbf{c}; Q_k] = \sum_{k=1}^K Q_k^2 A(\epsilon_k)^2 = \mathbf{c}^T \hat O_Q \mathbf{c}, \quad (\hat O_Q)_{jj'} = \sum_{k=1}^K Q_k^2 A_j(\epsilon_k) A_{j'}(\epsilon_k).\]\(O_D[\mathbf{c}; D_k]\) penalizes large 1st derivatives of the final solution,
\[O_D[\mathbf{c}; D_k] = \sum_{k=1}^{K-1} D_k^2 |A'(\epsilon_k)|^2 = \mathbf{c}^T \hat O_D \mathbf{c}, \quad (\hat O_D)_{jj'} = \sum_{k=1}^{K-1} D_k^2 A'_j(\epsilon_k) A'_{j'}(\epsilon_k).\]\(O_B[\mathbf{c}; B_k]\) penalizes large 2nd derivatives of the final solution,
\[O_B[\mathbf{c}; B_k] = \sum_{k=2}^{K-1} B_{k-1}^2 |A''(\epsilon_k)|^2 = \mathbf{c}^T \hat O_B \mathbf{c}, \quad (\hat O_B)_{jj'} = \sum_{k=2}^{K-1} B_{k-1}^2 A''_j(\epsilon_k) A''_{j'}(\epsilon_k).\]\(O_T[\mathbf{c}; T_k, A_T(\epsilon)]\) penalizes deviations from a predefined default model (“target”) \(A_T(\epsilon)\),
\[ \begin{align}\begin{aligned}O_T[\mathbf{c}; T_k, A_T(\epsilon)] &= \sum_{k=1}^K T_k [A(\epsilon_k) - A_T(\epsilon_k)]^2 = \mathbf{c}^T \hat O_T \mathbf{c} - 2\mathbf{f}_T^T\mathbf{c}+\mathrm{const},\\(\hat O_T)_{jj'} &= \sum_{k=1}^K T_k A_j(\epsilon_k) A_{j'}(\epsilon_k), \quad (\mathbf{f}_T)_j = \sum_{k=1}^K T_k A_j(\epsilon_k) A_T(\epsilon_k).\end{aligned}\end{align} \]\(O_U[\mathbf{c}; U]\) penalizes deviations from the equal weight superposition \(c_j = 1/J\).
\[ \begin{align}\begin{aligned}O_U[\mathbf{c}; U] &= U \sum_{j=1}^J (c_j - 1/J)^2 = \mathbf{c}^T \hat O_U \mathbf{c}-2\mathbf{f}_U^T\mathbf{c}+\mathrm{const},\\(\hat O_U)_{jj'} &= U \delta_{jj'},\quad (\mathbf{f}_U)_j = U / J.\end{aligned}\end{align} \]
Values of particular spectral functions \(A_j(\epsilon)\) and their
derivatives are approximated on a user-supplied uniform or non-uniform energy
grid \(\{\epsilon_k\}\) (refreq_mesh
).
The iterative optimization procedure proceeds as follows.
Precompute approximate values of \(A_j(\epsilon_k)\), \(A'_j(\epsilon_k)\) and \(A''_j(\epsilon_k)\).
Using user-supplied parameters \(A_T(\epsilon_k)=\)
default_model
, \(T_k =\)default_model_weights
and \(U=\)ew_penalty_coeff
precompute \(\mathbf{f} = \mathbf{f}_T + \mathbf{f}_U\).Initialize regularization parameters according to
\(\mathcal{D} = (\)
der_penalty_init
\() / J\),\(Q_k = 0\),
\(D_k = \mathcal{D}\),
\(B_k = \mathcal{D}\).
Use the regularization parameters to compute matrix \(\hat O = \hat O_Q + \hat O_D + \hat O_B + \hat O_T + \hat O_U\).
Use a numerical linear algebra algorithm to minimize \(\mathbf{c}^T \hat O \mathbf{c} - 2\mathbf{f}^T\mathbf{c}\) w.r.t. \(\mathbf{c}\) subject to the normalization sum rule \(\sum_{J=1}^J c_j = 1\). The minimization result at the \(I\)-th iteration is stored as a vector \(\mathbf{c}^{(I)}\).
Compute and store relative \(L_1\)-distance between \(\mathbf{c}^{(I)}\) and \(\mathbf{c}^{(I-1)}\),
\[d(I, I-1) = \frac{\sum_{j=1}^J |c^{(I)}_j - c^{(I-1)}_j|} {\sum_{j=1}^J |c^{(I)}_j|},\]where the initial vector \(\mathbf{c}^{(0)}\) is taken to have all components equal to \(1/J\).
In case user has provided a monitor function (
monitor
), it is called with 4 arguments,a 1D NumPy array \(\mathbf{c}^{(I)}\),
a 2-tuple of 1D arrays \((A(\epsilon_k), Q_k)\),
a 2-tuple of 1D arrays \((A'(\epsilon_k), D_k)\),
a 2-tuple of 1D arrays \((A''(\epsilon_k), B_k)\).
Returning
True
from the function terminates iterations.If \(\sum_{j=1}^J |c^{(I)}_j| >\)
max_sum_abs_c
(contribution of the negative coefficients is too big), then terminate iterations.If \(d(I, I-1) < \min_n |\sigma_n/G(n)|\), then convergence is reached and iterations are terminated. \(G(n)\) is the right-hand side of the analytic continuation problem, and \(\sigma_n\) are respective estimated error bars on the input data.
Form a new final solution from coefficients \(\mathbf{c}^{(I)}\),
Increase regularization parameters \(\mathcal{D}\), \(D_k\) and \(B_k\) by the factor
der_penalty_coeff
.Update the amplitude regularization parameters \(Q_k\): If \(A(\epsilon_k) < 0\), then set \(Q_k =\)
amp_penalty_max
. Otherwise divide \(Q_k\) byamp_penalty_divisor
.Update the derivative regularization parameters:
If \(D_k |A'(\epsilon_k)| > \mathcal{D}\), then set \(D_k = \mathcal{D} / |A'(\epsilon_k)|\);
If \(B_k |A''(\epsilon_k)| > \mathcal{D}\), then set \(B_k = \mathcal{D} / |A''(\epsilon_k)|\).
If the maximal allowed number of iterations
max_iter
is reached, terminate. Otherwise repeat from step 4.
Once the iterations have been terminated, one inspects the accumulated list of pairs \((\mathbf{c}^{(1)}, d(1,0)), (\mathbf{c}^{(2)}, d(2,1)), \ldots\). Elements of the list are sorted in the ascending order according to \(d(I,I-1)\), i.e. \(\mathbf{c}^{(I)}\) from the iteration closest to convergence comes first in the sorted list. The first vector of coefficients from the sorted list that obeys
\(\chi^2\left[\sum_{j=1}^J c^{(I)}_j A_j\right] \leq (\)
good_chi_abs
\()^2\) and\(\chi^2\left[\sum_{j=1}^J c^{(I)}_j A_j\right] \leq \min_j \chi^2[A_j] (\)
good_chi_rel
\()^2\)
represents the sought final solution.