Stochastic Optimization with Consistent Constraints

Application of the method of consistent constraints in SOM is twofold:

• Consistent-constraints (CC) updates proposed during accumulation of particular solutions;
• Construction of the final solution out of the accumulated particular solutions using the CC protocol.

Both applications are rather involved and support many user-adjustable parameters. Names of their respective keyword arguments are typeset as monospaced text in the following.

Consistent-constraints (CC) updates introduced in SOM 2.0 are a new type of update in the Markov chain used to accumulate particular solutions (Section II.A of [GMPPS2017]). Unlike elementary updates, they radically change an MC configuration and help reveal local minima of the objective function more quickly.

The CC updates are proposed during the first stage of a global update at regular intervals. Two consecutive CC updates are separated by cc_update_cycle_length elementary updates. A CC update is skipped if it could result in a configuration with too many rectangles, $$2K_0+1>$$ max_rects, where $$K_0$$ is the number of rectangles in the current configuration. CC updates are accepted or rejected based on the same Metropolis criterion as the elementary updates. A proposed CC update proceeds as follows.

1. Take the current configuration, construct a configuration made out of non-overlapping rectangles $$\{(c_k, w_k, h_k)\}$$ (Eqs. (16)-(17) of [GMPPS2017]) and discard rectangles with width below min_rect_width. If size $$K$$ of the resulting non-overlapping configuration is less than 3 (too small to evaluate the 2nd derivative), reject the update.
2. Collect heights $$h_k$$ of rectangles in the non-overlapping configuration in a $$K$$-dimensional vector $$\mathbf{h}$$.
3. Starting from the collected heights, use the CC protocol (see below) to compute optimal heights $$\mathbf{h}^\mathrm{(opt)}$$.
4. If any of $$h^\mathrm{(opt)}_k$$ is significantly negative, i.e. $$w_k h^\mathrm{(opt)}_k < -$$ min_rect_weight, reject the update.
5. Replace heights $$h_k$$ in the non-overlapping configuration with the optimal heights $$h^\mathrm{(opt)}_k$$.
6. Remove small rectangles ($$w_k h^\mathrm{(opt)}_k <$$ min_rect_weight) from the configuration and redistribute their weight by changing heights of their neighboring rectangles. This step is repeated until none of the rectangles has weight below min_rect_weight.
7. Evaluate the $$\chi^2$$-functional for the new optimized configuration and use $$\chi^2_\mathrm{(opt)}$$ to compute the Metropolis ratio.
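The small-rectangle removal of step 6 can be sketched as follows. This is a minimal illustration, not the package's actual implementation; in particular, the choice of the nearest remaining rectangle as the recipient of the removed weight is an assumption.

```python
def prune_small_rects(rects, min_rect_weight):
    """Remove rectangles whose weight w*h falls below min_rect_weight,
    transferring each removed weight onto the nearest remaining rectangle
    by adjusting that rectangle's height (weight change wj * dh = w * h).
    rects: list of (center, width, height) tuples, assumed non-overlapping."""
    rects = sorted(rects)  # order by center
    while len(rects) > 1:
        small = [i for i, (c, w, h) in enumerate(rects)
                 if abs(w * h) < min_rect_weight]
        if not small:
            break
        c, w, h = rects.pop(small[0])
        # nearest neighbor (by center distance) receives the removed weight
        j = min(range(len(rects)), key=lambda k: abs(rects[k][0] - c))
        cj, wj, hj = rects[j]
        rects[j] = (cj, wj, hj + w * h / wj)
    return rects
```

Note that the total spectral weight $$\sum_k w_k h_k$$ is conserved by construction, since every removed weight is added back to a neighbor.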

The CC protocol consists in iterative minimization of a quadratic function of $$h_k$$ with self-consistent determination of the regularization parameters $$Q^{(0)}_k, Q^{(1)}_k, Q^{(2)}_k$$ on which this function depends. The function in question is a sum of the $$\chi^2$$-functional and a few regularization terms,

$O[\mathbf{h}; Q^{(0)},Q^{(1)},Q^{(2)}] = \chi^2[\mathbf{h}] + O_0[\mathbf{h};Q^{(0)}] + O_1[\mathbf{h};Q^{(1)}] + O_2[\mathbf{h};Q^{(2)}].$

The regularization term $$O_0$$ penalizes large amplitudes of a solution,

$O_0[\mathbf{h};Q^{(0)}] = \sum_{k=1}^K (Q^{(0)}_k)^2 h_k^2 = \mathbf{h}^T \hat O_{(0)} \mathbf{h}, \quad \hat O_{(0)} = \mathrm{diag}((Q^{(0)}_k)^2).$

$$O_1$$ and $$O_2$$ penalize large values of the 1st and 2nd derivatives of a solution respectively,

\begin{align}\begin{aligned}O_1[\mathbf{h};Q^{(1)}] &= \sum_{k=1}^{K-1} (Q^{(1)}_k)^2 |A'(\epsilon_k)|^2 = \mathbf{h}^T \hat O_{(1)} \mathbf{h},\\O_2[\mathbf{h};Q^{(2)}] &= \sum_{k=2}^{K-1} (Q^{(2)}_{k-1})^2 |A''(\epsilon_k)|^2 = \mathbf{h}^T \hat O_{(2)} \mathbf{h}.\end{aligned}\end{align}

Matrices $$\hat O_{(1)}$$ and $$\hat O_{(2)}$$ are derived from finite-difference approximations of $$A'$$ and $$A''$$ at points $$\epsilon_k = c_k$$.
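For instance, $$\hat O_{(1)}$$ can be assembled as $$D^T \mathrm{diag}((Q^{(1)}_k)^2) D$$, where $$D$$ is a finite-difference operator acting on the heights. A sketch assuming simple first-order forward differences on the centers $$c_k$$ (the actual stencils used by the package may differ):

```python
import numpy as np

def derivative_penalty_matrix(c, Q):
    """Build O1 = D^T diag(Q^2) D, where D approximates A'(eps_k)
    by forward differences of the heights h at the centers c.
    c: (K,) rectangle centers; Q: (K-1,) regularization parameters."""
    K = len(c)
    D = np.zeros((K - 1, K))
    for k in range(K - 1):
        dc = c[k + 1] - c[k]
        D[k, k], D[k, k + 1] = -1.0 / dc, 1.0 / dc
    return D.T @ np.diag(Q**2) @ D
```

With this construction $$\mathbf{h}^T \hat O_{(1)} \mathbf{h}$$ reproduces $$\sum_k (Q^{(1)}_k)^2 |A'(\epsilon_k)|^2$$ with the forward-difference approximation of the derivative.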

CC protocol iterations are organized as follows.

1. Initialize regularization parameters according to

• $$Q^{(0)}_k = 0,\ k=\overline{1,K}$$,
• $$Q^{(1)}_k = ($$ cc_update_der_penalty_init $$)W^2/ \mathcal{N},\ k = \overline{1,K-1}$$,
• $$Q^{(2)}_k = ($$ cc_update_der_penalty_init $$)W^3/ \mathcal{N},\ k = \overline{1,K-2}$$,

where $$W$$ is the energy window width (energy_window[1] - energy_window[0]) and $$\mathcal{N}$$ is the requested solution norm.

2. Use a numerical linear algebra algorithm to minimize the quadratic function $$O[\mathbf{h}; Q^{(0)},Q^{(1)},Q^{(2)}]$$ w.r.t. $$\mathbf{h}$$ subject to the equality constraint $$\sum_{k=1}^K w_k h_k = \mathcal{N}$$; denote the minimizer $$\mathbf{h}^\mathrm{new}$$.
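This step amounts to an equality-constrained quadratic program. A minimal sketch for a generic quadratic form $$\mathbf{h}^T \hat M \mathbf{h} - 2\mathbf{b}^T\mathbf{h}$$, based on solving the KKT (Lagrange-multiplier) linear system; the actual linear-algebra routine employed by the package is not specified here:

```python
import numpy as np

def constrained_min(M, b, w, norm):
    """Minimize h^T M h - 2 b^T h subject to w @ h = norm by solving
    the KKT system  [2M  w; w^T  0] [h; lam] = [2b; norm]."""
    K = len(w)
    KKT = np.zeros((K + 1, K + 1))
    KKT[:K, :K] = 2 * M
    KKT[:K, K] = w       # constraint gradient column
    KKT[K, :K] = w       # constraint row
    rhs = np.concatenate([2 * b, [norm]])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:K]       # drop the Lagrange multiplier
```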

3. Compute discrepancy of weights of rectangles with heights $$\mathbf{h}$$ and $$\mathbf{h}^\mathrm{new}$$ as $$\frac{1}{\mathcal{N}}\sum_{k=1}^K |w_k (h_k - h^\mathrm{new}_k)|$$. If it lies within a fixed tolerance level cc_update_rect_norm_variation_tol, terminate iterations.

4. Update amplitude regularization parameters $$Q^{(0)}_k$$: If $$h^\mathrm{new}_k$$ is negative, set $$Q^{(0)}_k =($$ cc_update_height_penalty_max $$)W / \mathcal{N}$$, otherwise divide $$Q^{(0)}_k$$ by cc_update_height_penalty_divisor.

5. Use finite-difference approximations to compute derivatives of $$A$$ at $$\epsilon_k = c_k$$:

• 1st derivatives $$d^{(1)}_k, k = \overline{1,K-1}$$.
• 2nd derivatives $$d^{(2)}_k, k = \overline{1,K-2}$$.
6. Compute limiting constants for $$Q^{(1)}_k$$ and $$Q^{(2)}_k$$:

• $$Q^{(1)}_\mathrm{lim} = ($$ cc_update_der_penalty_limiter $$) \min_k Q^{(1)}_k$$.
• $$Q^{(2)}_\mathrm{lim} = ($$ cc_update_der_penalty_limiter $$) \min_k Q^{(2)}_k$$.
7. Update the derivative regularization parameters based on the magnitudes of $$d^{(1)}_k$$ and $$d^{(2)}_k$$.

• If $$|d^{(1)}_k| > ($$ cc_update_der_penalty_threshold $$/ \sqrt{K-1}) / Q^{(1)}_k$$, then set $$Q^{(1)}_k = ($$ cc_update_der_penalty_threshold $$/ \sqrt{K-1}) / |d^{(1)}_k|$$. Otherwise multiply $$Q^{(1)}_k$$ by cc_update_der_penalty_increase_coeff.
• If $$|d^{(2)}_k| > ($$ cc_update_der_penalty_threshold $$/ \sqrt{K-2}) / Q^{(2)}_k$$, then set $$Q^{(2)}_k = ($$ cc_update_der_penalty_threshold $$/ \sqrt{K-2}) / |d^{(2)}_k|$$. Otherwise multiply $$Q^{(2)}_k$$ by cc_update_der_penalty_increase_coeff.
8. Reduce excessively large derivative regularization parameters to avoid divergent behaviour:

• If $$Q^{(1)}_k > Q^{(1)}_\mathrm{lim}$$, then set $$Q^{(1)}_k = Q^{(1)}_\mathrm{lim}$$.
• If $$Q^{(2)}_k > Q^{(2)}_\mathrm{lim}$$, then set $$Q^{(2)}_k = Q^{(2)}_\mathrm{lim}$$.
9. If the maximal allowed number of iterations cc_update_max_iter is reached, terminate. Otherwise set $$\mathbf{h} = \mathbf{h}^\mathrm{new}$$ and repeat from step 2.
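Steps 6-8 can be sketched for one set of derivative regularization parameters as follows (an illustration only; the per-component threshold is taken here as cc_update_der_penalty_threshold divided by the square root of the number of parameters in the set):

```python
import numpy as np

def update_der_penalties(Q, d, threshold, increase_coeff, limiter):
    """One pass of steps 6-8 for a set of derivative regularization
    parameters Q and finite-difference derivatives d (equal lengths)."""
    Q = np.asarray(Q, float).copy()
    thr = threshold / np.sqrt(len(Q))
    Q_lim = limiter * Q.min()            # step 6: limiting constant
    for k in range(len(Q)):              # step 7: threshold-based update
        if abs(d[k]) > thr / Q[k]:
            Q[k] = thr / abs(d[k])
        else:
            Q[k] *= increase_coeff
    return np.minimum(Q, Q_lim)          # step 8: cap at Q_lim
```

Components associated with large derivatives get their penalties reduced (so that sharp features survive), while the rest are gradually increased until capped by the limiting constant.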

Construction of the final solution using the consistent-constraints protocol

SOM 2.0 features an optimization procedure that finds a vector of final solution coefficients $$\mathbf{c}$$ resulting in a smoother spectral function. It also makes it possible to bias the final solution towards a user-provided default model.

The optimization procedure minimizes a quadratic function of $$\mathbf{c}$$ depending on a few sets of regularization parameters,

$O[\mathbf{c}; Q_k, D_k, B_k, T_k, A_T(\epsilon_k), U] = O_Q[\mathbf{c}; Q_k] + O_D[\mathbf{c}; D_k] + O_B[\mathbf{c}; B_k] + O_T[\mathbf{c}; T_k, A_T(\epsilon)] + O_U[\mathbf{c}; U].$
• $$O_Q[\mathbf{c}; Q_k]$$ stabilizes the final solution by penalizing large amplitudes,

$O_Q[\mathbf{c}; Q_k] = \sum_{k=1}^K Q_k^2 A(\epsilon_k)^2 = \mathbf{c}^T \hat O_Q \mathbf{c}, \quad (\hat O_Q)_{jj'} = \sum_{k=1}^K Q_k^2 A_j(\epsilon_k) A_{j'}(\epsilon_k).$
• $$O_D[\mathbf{c}; D_k]$$ penalizes large 1st derivatives of the final solution,

$O_D[\mathbf{c}; D_k] = \sum_{k=1}^{K-1} D_k^2 |A'(\epsilon_k)|^2 = \mathbf{c}^T \hat O_D \mathbf{c}, \quad (\hat O_D)_{jj'} = \sum_{k=1}^{K-1} D_k^2 A'_j(\epsilon_k) A'_{j'}(\epsilon_k).$
• $$O_B[\mathbf{c}; B_k]$$ penalizes large 2nd derivatives of the final solution,

$O_B[\mathbf{c}; B_k] = \sum_{k=2}^{K-1} B_{k-1}^2 |A''(\epsilon_k)|^2 = \mathbf{c}^T \hat O_B \mathbf{c}, \quad (\hat O_B)_{jj'} = \sum_{k=2}^{K-1} B_{k-1}^2 A''_j(\epsilon_k) A''_{j'}(\epsilon_k).$
• $$O_T[\mathbf{c}; T_k, A_T(\epsilon)]$$ penalizes deviations from a predefined default model (“target”) $$A_T(\epsilon)$$,

\begin{align}\begin{aligned}O_T[\mathbf{c}; T_k, A_T(\epsilon)] &= \sum_{k=1}^K T_k [A(\epsilon_k) - A_T(\epsilon_k)]^2 = \mathbf{c}^T \hat O_T \mathbf{c} - 2\mathbf{f}_T^T\mathbf{c}+\mathrm{const},\\(\hat O_T)_{jj'} &= \sum_{k=1}^K T_k A_j(\epsilon_k) A_{j'}(\epsilon_k), \quad (\mathbf{f}_T)_j = \sum_{k=1}^K T_k A_j(\epsilon_k) A_T(\epsilon_k).\end{aligned}\end{align}

• $$O_U[\mathbf{c}; U]$$ penalizes deviations from the equal-weight superposition $$c_j = 1/J$$,

\begin{align}\begin{aligned}O_U[\mathbf{c}; U] &= U \sum_{j=1}^J (c_j - 1/J)^2 = \mathbf{c}^T \hat O_U \mathbf{c}-2\mathbf{f}_U^T\mathbf{c}+\mathrm{const},\\(\hat O_U)_{jj'} &= U \delta_{jj'},\quad (\mathbf{f}_U)_j = U / J.\end{aligned}\end{align}

Values of particular spectral functions $$A_j(\epsilon)$$ and their derivatives are approximated on a user-supplied uniform or non-uniform energy grid $$\{\epsilon_k\}$$ (refreq_mesh).
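With the values $$A_j(\epsilon_k)$$ collected into a matrix, $$\hat O_T$$ and $$\mathbf{f}_T$$ reduce to simple matrix products. A sketch (illustrative only; variable names are not the package's):

```python
import numpy as np

def target_penalty(Amat, T, A_target):
    """Assemble O_T and f_T for the default-model penalty.
    Amat[k, j] = A_j(eps_k) on the energy grid; T: (K,) weights
    (default_model_weights); A_target: (K,) default model values."""
    O_T = Amat.T @ (T[:, None] * Amat)   # (O_T)_{jj'} = sum_k T_k A_j A_{j'}
    f_T = Amat.T @ (T * A_target)        # (f_T)_j = sum_k T_k A_j A_T
    return O_T, f_T
```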

The iterative optimization procedure proceeds as follows.

1. Precompute approximate values of $$A_j(\epsilon_k)$$, $$A'_j(\epsilon_k)$$ and $$A''_j(\epsilon_k)$$.

2. Using user-supplied parameters $$A_T(\epsilon_k)=$$ default_model, $$T_k =$$ default_model_weights and $$U=$$ ew_penalty_coeff precompute $$\mathbf{f} = \mathbf{f}_T + \mathbf{f}_U$$.

3. Initialize regularization parameters according to

• $$\mathcal{D} = ($$ der_penalty_init $$) / J$$,
• $$Q_k = 0$$,
• $$D_k = \mathcal{D}$$,
• $$B_k = \mathcal{D}$$.
4. Use the regularization parameters to compute matrix $$\hat O = \hat O_Q + \hat O_D + \hat O_B + \hat O_T + \hat O_U$$.

5. Use a numerical linear algebra algorithm to minimize $$\mathbf{c}^T \hat O \mathbf{c} - 2\mathbf{f}^T\mathbf{c}$$ w.r.t. $$\mathbf{c}$$ subject to the normalization sum rule $$\sum_{j=1}^J c_j = 1$$. The minimization result at the $$I$$-th iteration is stored as a vector $$\mathbf{c}^{(I)}$$.

6. Compute and store relative $$L_1$$-distance between $$\mathbf{c}^{(I)}$$ and $$\mathbf{c}^{(I-1)}$$,

$d(I, I-1) = \frac{\sum_{j=1}^J |c^{(I)}_j - c^{(I-1)}_j|} {\sum_{j=1}^J |c^{(I)}_j|},$

where the initial vector $$\mathbf{c}^{(0)}$$ is taken to have all components equal to $$1/J$$.
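This distance can be computed directly, e.g.:

```python
import numpy as np

def rel_l1_distance(c_new, c_old):
    """Relative L1 distance d(I, I-1) between consecutive coefficient
    vectors; for I = 1 the previous vector has all components 1/J."""
    c_new, c_old = np.asarray(c_new), np.asarray(c_old)
    return np.abs(c_new - c_old).sum() / np.abs(c_new).sum()
```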

7. If the user has provided a monitor function (monitor), it is called with 4 arguments:

• a 1D NumPy array $$\mathbf{c}^{(I)}$$,
• a 2-tuple of 1D arrays $$(A(\epsilon_k), Q_k)$$,
• a 2-tuple of 1D arrays $$(A'(\epsilon_k), D_k)$$,
• a 2-tuple of 1D arrays $$(A''(\epsilon_k), B_k)$$.

Returning True from the function terminates iterations.

8. If $$\sum_{j=1}^J |c^{(I)}_j| >$$ max_sum_abs_c (contribution of the negative coefficients is too big), then terminate iterations.

9. If $$d(I, I-1) < \min_n |\sigma_n/G(n)|$$, then convergence is reached and iterations are terminated. $$G(n)$$ is the right-hand side of the analytic continuation problem, and $$\sigma_n$$ are respective estimated error bars on the input data.

10. Form a new final solution from coefficients $$\mathbf{c}^{(I)}$$,

$A(\epsilon_k) = \sum_{j=1}^J c^{(I)}_j A_j(\epsilon_k),\quad A'(\epsilon_k) = \sum_{j=1}^J c^{(I)}_j A'_j(\epsilon_k),\quad A''(\epsilon_k) = \sum_{j=1}^J c^{(I)}_j A''_j(\epsilon_k).$
11. Increase regularization parameters $$\mathcal{D}$$, $$D_k$$ and $$B_k$$ by the factor der_penalty_coeff.
12. Update the amplitude regularization parameters $$Q_k$$: If $$A(\epsilon_k) < 0$$, then set $$Q_k =$$ amp_penalty_max. Otherwise divide $$Q_k$$ by amp_penalty_divisor.
13. Update the derivative regularization parameters:
• If $$D_k |A'(\epsilon_k)| > \mathcal{D}$$, then set $$D_k = \mathcal{D} / |A'(\epsilon_k)|$$;
• If $$B_k |A''(\epsilon_k)| > \mathcal{D}$$, then set $$B_k = \mathcal{D} / |A''(\epsilon_k)|$$.
14. If the maximal allowed number of iterations max_iter is reached, terminate. Otherwise repeat from step 4.

Once the iterations have been terminated, one inspects the accumulated list of pairs $$(\mathbf{c}^{(1)}, d(1,0)), (\mathbf{c}^{(2)}, d(2,1)), \ldots$$. Elements of the list are sorted in ascending order of $$d(I,I-1)$$, i.e. the $$\mathbf{c}^{(I)}$$ from the iteration closest to convergence comes first in the sorted list. The first vector of coefficients from the sorted list that obeys

• $$\chi^2\left[\sum_{j=1}^J c^{(I)}_j A_j\right] \leq ($$ good_chi_abs $$)^2$$ and
• $$\chi^2\left[\sum_{j=1}^J c^{(I)}_j A_j\right] \leq \min_j \chi^2[A_j] ($$ good_chi_rel $$)^2$$

represents the sought final solution.
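The selection described above can be sketched as follows (`chi2_of` and `chi2_min_particular` are hypothetical names standing for evaluation of the $$\chi^2$$-functional of the combined solution and for $$\min_j \chi^2[A_j]$$):

```python
def select_final(candidates, chi2_of, chi2_min_particular,
                 good_chi_abs, good_chi_rel):
    """candidates: list of (c_vec, d) pairs accumulated over iterations.
    Returns the first coefficient vector, in ascending order of d,
    that satisfies both goodness-of-fit criteria, or None if none does."""
    for c, d in sorted(candidates, key=lambda pair: pair[1]):
        chi2 = chi2_of(c)
        if (chi2 <= good_chi_abs**2
                and chi2 <= chi2_min_particular * good_chi_rel**2):
            return c
    return None
```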