
MLE Hessian Derivation

MLE Space-Time Numerical Optimization II: Hessian

In my last post, we saw the speedup gained by providing SciPy's minimize solver with a gradient. We will take this a step further and see what happens when we also provide the Hessian. The Hessian is the matrix of second partial derivatives of a function $f(x)$; in our case, the negative log-likelihood of a Gaussian process with an anisotropic covariance function.

For a given year $n$, we use an anisotropic exponential space-time covariance function $$ k_n(x_1 , t_1 , x_2 , t_2 ; \theta) = \phi\, exp(-d(x_1 , t_1 , x_2 , t_2 )) $$ where the GP variance $\phi > 0$ and $$ d = \sqrt{ P }, \qquad P = \left( \frac{\Delta lat}{\theta_{lat}} \right)^2 + \left( \frac{\Delta lon}{\theta_{lon}} \right)^2 + \left( \frac{\Delta t}{\theta_{t}} \right)^2 $$
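As a concrete illustration, here is a minimal NumPy sketch of how this covariance could be assembled from raw coordinate arrays. The real implementation (space_time_covariance_exp) comes from the previous post's local_mle_space_time module; the helper name and arguments below are purely illustrative.

import numpy as np

def exp_space_time_cov(lat, lon, t, phi, th_lat, th_lon, th_t):
    # scaled pairwise differences between every pair of observations
    dlat = (lat[:, None] - lat[None, :]) / th_lat
    dlon = (lon[:, None] - lon[None, :]) / th_lon
    dt   = (t[:, None]   - t[None, :])   / th_t
    P = dlat ** 2 + dlon ** 2 + dt ** 2
    return phi * np.exp(-np.sqrt(P))      # k = phi * exp(-d), d = sqrt(P)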

For year $n$, with $m_n$ observations $y_n$, the Gaussian process contributes the term

$$ l_n(\theta) = log (det(K_n(\theta) + \sigma^2\ I)) + y_n^T(K_n(\theta) + \sigma^2\ I)^{-1}y_n + m_n log(2\pi) $$

The total negative log-likelihood over all years is then the summation

$$ l(\theta) = \frac{1}{2}\sum\limits_{n=0}^{total\ years} l_n(\theta) $$

In what follows, we write $\kappa_n = K_n(\theta) + \sigma^2\ I$.
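For reference, a single term $l_n(\theta)$ can be evaluated stably with a Cholesky factorization of $\kappa_n$. The sketch below is independent of the neg_log_liklihood function imported from the previous post and only illustrates the formula.

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def l_n(K, y, sigma):
    # kappa_n = K_n(theta) + sigma^2 I
    kappa = K + sigma ** 2 * np.eye(len(y))
    c, low = cho_factor(kappa)
    logdet = 2.0 * np.sum(np.log(np.diag(c)))   # log det(kappa_n)
    quad = y @ cho_solve((c, low), y)           # y_n^T kappa_n^{-1} y_n
    return logdet + quad + len(y) * np.log(2 * np.pi)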

In our last entry, we derived the gradient

$$ \nabla l(\theta) = \frac{1}{2}\sum\limits_{n=0}^{total\ years} tr\left( \left(\kappa_n(\theta)^{-1} - \kappa_n(\theta)^{-T} y_ny_n^T \kappa_n(\theta)^{-T}\right)\frac{ \partial \kappa_n(\theta) }{\partial \theta} \right) $$

We are interested in finding $\theta_{opt}$ that minimizes $l(\theta)$, and we have found that providing the analytical gradient $\nabla l(\theta)$ greatly speeds up the BFGS minimization algorithm. But there are algorithms that converge even faster if you also provide the Hessian.

$$ \nabla^2 l_n(\theta) = tr\left( \kappa_n^{-1}\, \partial^2\kappa_n\, \kappa_n^{-1}\left(\kappa_n - y_ny_n^T\right)\right) + 2\,tr\left( \kappa_n^{-1}\, \partial \kappa_n\, \kappa_n^{-1}\,\partial\kappa_n\, \kappa_n^{-1}\left(y_ny_n^T-\frac{1}{2}\kappa_n\right)\right) $$

In this notebook, we compute the Hessian by hand and use a few different methods that find the optimal values of $\theta$ in far fewer steps. Also, recall that we are working with a constrained optimization problem ($\theta > 0$), but unconstrained optimization is possible with the transformation $\zeta = exp(\theta)$, which gives

$$ \nabla_\theta\, l(\zeta) = \nabla_\zeta\, l(\zeta) \odot exp(\theta) $$

$$ \nabla^2_\theta\, l(\zeta) = \nabla^2_\zeta\, l(\zeta) \odot exp(\theta)\, exp(\theta)^T $$

where $\odot$ is the elementwise (Hadamard) product. We shall see that this step is not needed for constrained optimization algorithms.

For

$$\theta = \begin{bmatrix} \phi \\ \theta_{lat} \\ \theta_{lon} \\ \theta_{t} \\ \sigma \end{bmatrix}$$

The Hessian $\nabla^2 l(\theta)$ is a $5 \times 5$ matrix, as is $\nabla^2 l_n(\theta)$. However, $\nabla^2 \kappa_n$ is a rank-4 tensor, $\nabla^2 \kappa_n \in \mathbb{R}^{5 \times 5 \times m_n\ \times \ m_n }$, revealing a daunting Hessian:

$$ \Large \nabla^2\kappa = \begin{bmatrix} \frac{\partial^2 \kappa}{\partial\phi^2} & \frac{\partial^2 \kappa}{\partial\phi\partial \theta_{lat}} & \frac{\partial^2 \kappa}{\partial\phi\partial \theta_{lon}} & \frac{\partial^2 \kappa}{\partial\phi\partial \theta_{t}} & \frac{\partial^2 \kappa}{\partial\phi\partial \sigma} \\ \frac{\partial^2 \kappa}{\partial\theta_{lat}\partial\phi} & \frac{\partial^2 \kappa}{\partial\theta_{lat}^2} & \frac{\partial^2 \kappa}{\partial\theta_{lat}\partial \theta_{lon}} & \frac{\partial^2 \kappa}{\partial\theta_{lat}\partial \theta_{t}} & \frac{\partial^2 \kappa}{\partial\theta_{lat}\partial \sigma} \\ \frac{\partial^2 \kappa}{\partial\theta_{lon}\partial\phi} & \frac{\partial^2 \kappa}{\partial\theta_{lon}\partial \theta_{lat}} & \frac{\partial^2 \kappa}{\partial\theta_{lon}^2} & \frac{\partial^2 \kappa}{\partial\theta_{lon}\partial \theta_{t}} & \frac{\partial^2 \kappa}{\partial\theta_{lon}\partial \sigma} \\ \frac{\partial^2 \kappa}{\partial \theta_{t}\partial\phi} & \frac{\partial^2 \kappa}{\partial\theta_{t}\partial \theta_{lat}} & \frac{\partial^2 \kappa}{\partial\theta_{t}\partial \theta_{lon}} & \frac{\partial^2 \kappa}{\partial\theta_{t}^2} & \frac{\partial^2 \kappa}{\partial\theta_{t}\partial \sigma} \\ \frac{\partial^2 \kappa}{\partial \sigma \partial \phi} & \frac{\partial^2 \kappa}{\partial \sigma \partial \theta_{lat}} & \frac{\partial^2 \kappa}{\partial \sigma \partial \theta_{lon}} & \frac{\partial^2 \kappa}{\partial \sigma \partial \theta_{t}} & \frac{\partial^2 \kappa}{\partial \sigma^2} \end{bmatrix} $$

Wow. Fortunately, we shall see that this matrix is symmetric and many of these terms are zero. The nonzero terms are

$\Large\ \frac{\partial^2 \kappa}{\partial\theta_{lat}^2} = \frac{\phi\Delta\ lat^2}{\theta_{lat} ^ 4} exp\left(-\sqrt{P}\right)\ P^{-\frac{1}{2}}\left(-3 + \frac{\Delta lat^2}{\theta_{lat}^2}P^{-\frac{1}{2}}\left( 1 + P^{-\frac{1}{2}}\right)\right)\\\Large\ \frac{\partial^2 \kappa}{\partial\theta_{lon}^2} = \frac{\phi\Delta\ lon^2}{\theta_{lon} ^ 4} exp\left(-\sqrt{P}\right)\ P^{-\frac{1}{2}}\left(-3 + \frac{\Delta lon^2}{\theta_{lon}^2}P^{-\frac{1}{2}}\left( 1 + P^{-\frac{1}{2}}\right)\right)\\\Large\ \frac{\partial^2 \kappa}{\partial\theta_{t}^2} = \frac{\phi\Delta\ t^2}{\theta_{t} ^ 4} exp\left(-\sqrt{P}\right)\ P^{-\frac{1}{2}}\left(-3 + \frac{\Delta t^2}{\theta_{t}^2}P^{-\frac{1}{2}}\left( 1 + P^{-\frac{1}{2}}\right)\right)\\\Large\ \frac{\partial^2 \kappa}{\partial\sigma^2} = 2I\\\Large\ \frac{\partial^2 \kappa}{\partial \phi \partial\theta_{lat}} = \frac{\partial^2 \kappa}{\partial \theta_{lat}\partial\phi} = \ \frac{\Delta lat^2}{\theta_{lat}^3}exp\left(-\sqrt{P}\right)P^{-\frac{1}{2}} \\\Large\ \frac{\partial^2 \kappa}{\partial \phi \partial\theta_{lon}} = \frac{\partial^2 \kappa}{\partial \theta_{lon}\partial\phi} = \ \frac{\Delta lon^2}{\theta_{lon}^3}exp\left(-\sqrt{P}\right)P^{-\frac{1}{2}} \\\Large\ \frac{\partial^2 \kappa}{\partial \phi \partial\theta_{t}} = \frac{\partial^2 \kappa}{\partial \theta_{t}\partial\phi} = \ \frac{\Delta t^2}{\theta_{t}^3}exp\left(-\sqrt{P}\right)P^{-\frac{1}{2}} \\\Large\ \frac{\partial^2 \kappa}{\partial \theta_{lat} \partial\theta_{lon}} = \frac{\partial^2 \kappa}{\partial \theta_{lon}\partial \theta_{lat}} = \ \frac{\phi \Delta lat^2 \Delta lon^2}{\theta_{lat}^3\theta_{lon}^3}exp\left(-\sqrt{P}\right)P^{-1}\left( 1 + P^{-\frac{1}{2}} \right) \\\Large\ \frac{\partial^2 \kappa}{\partial \theta_{lat} \partial\theta_{t}} = \frac{\partial^2 \kappa}{\partial \theta_{t} \partial\theta_{lat}} = \ \frac{\phi \Delta lat^2 \Delta t^2 }{\theta_{lat}^3\theta_{t}^3}exp\left(-\sqrt{P}\right)P^{-1}\left( 1 + P^{-\frac{1}{2}} \right) \\\Large\ \frac{\partial^2 \kappa}{\partial \theta_{lon} \partial\theta_{t}} = \frac{\partial^2 \kappa}{\partial \theta_{t} \partial\theta_{lon}} = \ \frac{\phi \Delta lon^2 \Delta t^2 }{\theta_{lon}^3\theta_{t}^3}exp\left(-\sqrt{P}\right)P^{-1}\left( 1 + P^{-\frac{1}{2}} \right) $
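Before implementing these, we can sanity-check one of them symbolically. The snippet below uses SymPy (not used anywhere else in this post, so treat it as an optional aside) to differentiate $k = \phi\, exp(-\sqrt{P})$ twice with respect to $\theta_{lat}$ and compare it numerically against the hand-derived expression above.

import sympy as sp

phi, tlat, tlon, tt, dlat, dlon, dt = sp.symbols(
    'phi theta_lat theta_lon theta_t dlat dlon dt', positive=True)
P = (dlat / tlat) ** 2 + (dlon / tlon) ** 2 + (dt / tt) ** 2
k = phi * sp.exp(-sp.sqrt(P))

d2k = sp.diff(k, tlat, 2)   # symbolic second derivative with respect to theta_lat
hand = (phi * dlat ** 2 / tlat ** 4 * sp.exp(-sp.sqrt(P)) / sp.sqrt(P)
        * (-3 + dlat ** 2 / tlat ** 2 / sp.sqrt(P) * (1 + 1 / sp.sqrt(P))))

vals = {phi: 2.0, tlat: 1.3, tlon: 0.7, tt: 2.1, dlat: 0.4, dlon: 1.1, dt: 3.0}
print(float(d2k.subs(vals)), float(hand.subs(vals)))   # the two numbers agree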

Implementing in Python

Here we check our analytical functions for $\nabla^2 \kappa$, $\nabla^2 l_n$, and $\nabla^2 l$ against numerical approximations. We reuse some of the classes and functions from the last post.

In [2]:
from local_mle_space_time import *
import itertools
In [3]:
np.random.seed(seed=2)
def make_data(mLen=40):
    lat = np.random.uniform(-10,10, mLen)
    lon = np.random.uniform(-10,10, mLen)
    res = np.random.normal(0, 2.5, mLen)
    julDay = np.random.uniform(0, 365, mLen)
    agg = Agg(lat, lon, res, julDay)
    return agg

n = 10
LARGEVAL = 10**8
SMALLVAL = np.finfo(np.float64).eps
thetaInitGuess = [np.log(x) for x in [400**2, 5, 5, 5, 100]]
theta = Theta(*thetaInitGuess)
data = Data(*make_data(n).arr())
In [4]:
def hess_kappa(theta, data):
    """Analytic Hessian of kappa_n with respect to theta, as a (5, 5, m, m) array."""
    H = np.zeros((5,5, data.m, data.m))
    P = p_fun(theta, data)
    P[P == 0] = np.inf # divide by zero correction
    Pnsq = P ** -.5
    Pchunk = 1/P * ( 1 + Pnsq)
    
    stChunk = lambda th1, th2: Pchunk / (th1 ** 3 * th2 ** 3 )
    stcov = space_time_covariance_exp(theta, data)
    stcovdp = stcov / theta.phi
    H[4,4, :, :] = 2 * np.eye(data.m)
    H[0,1, :, :] = stcovdp * partial_p_fun(data.dlat, theta.lat) * Pnsq
    H[1,0, :, :] = H[0,1, :, :]
    H[0,2, :, :] = stcovdp * partial_p_fun(data.dlon, theta.lon) * Pnsq
    H[2,0, :, :] = H[0,2, :, :]
    H[0,3, :, :] = stcovdp * partial_p_fun(data.dt, theta.t) * Pnsq
    H[3,0, :, :] = H[0,3, :, :]
    
    dlatsdlons = data.dlat ** 2 * data.dlon ** 2
    dlatsdts = data.dlat ** 2 * data.dt ** 2
    dlonsdts = data.dt ** 2 * data.dlon ** 2
    
    latlngChunk = stChunk(theta.lat, theta.lon)
    lattChunk = stChunk(theta.lat, theta.t)
    lontChunk = stChunk(theta.t, theta.lon)
    
    H[1,2, :, :] = stcov * dlatsdlons * latlngChunk
    H[2,1, :, :] = stcov * dlatsdlons * latlngChunk

    H[3,1, :, :] = stcov * dlatsdts  * lattChunk
    H[1,3, :, :] = stcov * dlatsdts * lattChunk

    H[3,2, :, :] = stcov * dlonsdts * lontChunk
    H[2,3, :, :] = stcov * dlonsdts * lontChunk
    
    ddChunk = lambda dx, th: -3 + (dx / th) ** 2 * Pnsq * ( 1 + Pnsq)
    mChunk = stcov * Pnsq
    
    H[1, 1, :, :] = mChunk * data.dlat ** 2 / theta.lat ** 4 * ddChunk(data.dlat, theta.lat)
    H[2, 2, :, :] = mChunk * data.dlon ** 2 / theta.lon ** 4 * ddChunk(data.dlon, theta.lon)
    H[3, 3, :, :] = mChunk * data.dt ** 2 / theta.t ** 4 * ddChunk(data.dt, theta.t)
    return H

Our numerical Hessian approximation functions are defined here.

In [5]:
def hessian ( x, the_func, eps=10**-8):
    """Numerical approximation to the Hessian
    Parameters
    ------------
    x: array-like
        The evaluation point
    the_func: function
        The function. We assume that the function returns the function value and
        the associated gradient as the second return element
    eps: float
        The size of the step
    """
    N = x.size
    H = np.zeros((N,N))
    df_0 = the_func (x)[1]
    for idx in range(N):
        xx0 = 1. * x[idx]
        x[idx] = xx0 + eps
        df_1 = the_func ( x )[1]
        H[idx, :] = ((df_1 - df_0)/eps)
        x[idx] = xx0
    return H

def hessian_4D ( x, the_func, eps=10**-8):
    """Numerical approximation to the 4D Hessian
    Parameters
    ------------
    x: array-like
        The evaluation point
    the_func: function
        The function. We assume that the function returns the function value and
        the associated gradient as the second return element
    eps: float
        The size of the step
    """
    N = x.size
    H = np.zeros((N,N, data.m, data.m))  # note: relies on the global `data` for the block shape
    df_0 = the_func (x)[1]
    for idx in range(N):
        xx0 = 1. * x[idx]
        x[idx] = xx0 + eps
        df_1 = the_func ( x )[1]
        H[idx, :, :, :] = ((df_1 - df_0)/eps)
        x[idx] = xx0
    return H
In [6]:
kappaHess = hess_kappa(theta, data)
my_fun = lambda th: (kappa_fun(Theta(*th), data), grad_kappa(Theta(*th), data))
kappaHessNum = hessian_4D(np.array(thetaInitGuess), my_fun)
for idx, jdx in itertools.product(range(5), range(5)):
    pdd = kappaHess[idx, jdx, :, :]
    pddx = kappaHessNum[idx, jdx, :, :]
    if np.allclose(pdd, pddx, 10**-5):
        continue
    for kdx, ldx in itertools.product(range(pdd.shape[0]), range(pdd.shape[1])):
        pdde = pdd[kdx, ldx]
        pddex = pddx[kdx, ldx]
        print(f'elem: {idx}, {jdx}, {kdx}, {ldx}, analytic: {pdde}, numeric: {pddex}, ratio: {pdde/pddex}')

The analytic function agrees with the numeric approximation.

$nll_n(\theta)$ Hessian

In [7]:
def hess_nll(theta, data):
    """Analytic 5 x 5 Hessian of the per-year term l_n(theta)."""
    kappa = kappa_fun(theta, data)
    gradKappa = grad_kappa(theta, data)
    hessKappa = hess_kappa(theta, data)
    nllHess = np.zeros((5,5))
    for idx, jdx in itertools.product(range(5), range(5)):
        ft1 = solve(kappa, hessKappa[idx, jdx, :, :])
        ft2 = solve(kappa, kappa - data.yyT)
        firstTerm = np.dot(ft1, ft2)
        st1 = solve(kappa, gradKappa[idx, :, :])
        st2 = solve(kappa, gradKappa[jdx, :, :])
        st3 = solve(kappa, data.yyT - .5 * kappa)
        secondTerm = np.dot(np.dot(st1, st2), st3)
        nllHess[idx, jdx] =  np.trace(firstTerm) + 2 * np.trace(secondTerm)
    return nllHess
In [8]:
nll_fun = lambda th: (neg_log_liklihood(Theta(*th), data), grad_nll(Theta(*th), data))
nllHessNum = hessian(np.array(thetaInitGuess), nll_fun)
nllHess = hess_nll(theta, data)
for idx, jdx in itertools.product(range(5), range(5)):
    pdd = nllHess[idx, jdx]
    pddx = nllHessNum[idx, jdx]
    if np.isclose(pdd, pddx, 10**-4):
        continue
    print(f'elem: {idx}, {jdx}, analytic: {pdd}, numeric: {pddx}, ratio: {pdd/pddx}')
elem: 1, 4, analytic: 1.0090298510675597e-08, numeric: 4.440892098500626e-08, ratio: 0.22721332306367845

Sometimes elements disagree, but we can ignore them when both values are this small.
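Note that np.isclose(pdd, pddx, 10**-4) passes 10**-4 as the relative tolerance only; for entries this close to zero, adding an explicit absolute tolerance is the more forgiving check:

np.isclose(pdd, pddx, rtol=1e-4, atol=1e-7)   # treat tiny absolute differences as equal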

Verify $nll(\theta)$

In [9]:
def make_agg_years():
    aggYears = {}
    for year in range(2007, 2019):
        data = Data(*make_data(10).arr())
        aggYears[str(year)] = data
    return aggYears

def pairwise_product(arr):
    '''Outer product of arr with itself (equivalent to np.outer(arr, arr))'''
    cp = np.array(np.meshgrid(arr, arr)).T
    pwp = cp[:,:,0].flatten() * cp[:,:,1].flatten()
    return pwp.reshape(len(arr), len(arr))

def sum_nll(thetaArr, filtAggs):
    theta = Theta(*thetaArr)
    years = filtAggs.keys()
    nllSum = 0
    for iYear in years:
        data = Data(*filtAggs[iYear].arr())
        nllSum += neg_log_liklihood(theta, data)
    return .5 * nllSum

def grad_sum_nll(thetaArr, aggYears):
    theta = Theta(*thetaArr)
    years = aggYears.keys()
    gradSum = np.zeros(5)
    for iYear in years:
        data = Data(*aggYears[iYear].arr())
        gradSum = np.add(gradSum, grad_nll(theta, data))
    return .5 * gradSum

def hess_sum_nll(thetaArr, aggYears):
    theta = Theta(*thetaArr)
    years = aggYears.keys()
    hessSum = np.zeros((5, 5))
    for iYear in years:
        data    = aggYears[iYear]
        hessNll = hess_nll(theta, data)
        hessSum = np.add(hessSum, hessNll)
    return .5 * hessSum
In [10]:
aggYears = make_agg_years()
sum_nll_fun = lambda th: (sum_nll(th, aggYears), grad_sum_nll(th, aggYears))
sumNllHessNum = hessian(np.array(thetaInitGuess), sum_nll_fun, 10**-6)
In [11]:
sumNllHess = hess_sum_nll(thetaInitGuess, aggYears)
for idx, jdx in itertools.product(range(5), range(5)):
    pdd = sumNllHess[idx, jdx]
    pddx = sumNllHessNum[idx, jdx]
    if np.isclose(0, pdd, 10**-4 ) or np.isclose(0, pddx, 10**-4 ):
        continue
    if np.isclose(pdd, pddx, 10**-4):
        continue
    print(f'elem: {idx}, {jdx}, analytic: {pdd}, numeric: {pddx}, ratio: {pdd/pddx}')

Excellent! Numerical approximations and analytical results are roughly in agreement.

Unconstrained Optimization

Remember that the covariance parameters are constrained to be positive, $\theta > 0$. To make the problem unconstrained, we optimize over log-parameters and set $\zeta = exp(\theta)$ inside the objective (feeding negative parameter values directly into the likelihood would cause errors). This slightly modifies the negative log-likelihood gradient and Hessian.

$$\large \nabla_\theta\, l(\zeta) = \nabla_\zeta\, l(\zeta) \odot exp(\theta) $$ and $$\large \nabla^2_\theta\, l(\zeta) = \nabla^2_\zeta\, l(\zeta) \odot exp(\theta)\, exp(\theta)^T $$

where $\odot$ is the elementwise (Hadamard) product. (Strictly, the chain rule also adds a diagonal term $\nabla_\zeta\, l(\zeta) \odot exp(\theta)$ to the Hessian; it vanishes at a stationary point and is omitted in the code below.)
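As a quick sanity check of this chain rule on a toy problem (a simple quadratic stands in for the real likelihood; none of the names below come from the earlier code), central differences in $\theta$ should reproduce the transformed gradient:

import numpy as np

def l(zeta):                        # stand-in objective in the constrained variables
    return 0.5 * np.sum(zeta ** 2)

def grad_l(zeta):
    return zeta

theta = np.array([0.3, -1.2, 0.7])
zeta = np.exp(theta)
analytic = grad_l(zeta) * zeta      # chain rule: gradient with respect to theta

eps = 1e-6
numeric = np.array([(l(np.exp(theta + eps * e)) - l(np.exp(theta - eps * e))) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(analytic, numeric))   # True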

We assign our objective, gradient, and Hessian functions and apply several of the algorithms listed in SciPy's minimize documentation. The methods that did not work for this problem are omitted.

In [12]:
def obj_fun(thetaArr, aggYears):
    eth = [np.exp(x) for x in thetaArr]
    return sum_nll(eth, aggYears)
    
def grad_fun(thetaArr, aggYears):
    eth = [np.exp(x) for x in thetaArr]
    return np.multiply(grad_sum_nll(eth, aggYears), np.array(eth))

def hess_fun(thetaArr, aggYears):
    eth = [np.exp(x) for x in thetaArr]
    return np.multiply(hess_sum_nll(eth, aggYears), pairwise_product(eth))

def timeit(method):
    def timed(*args, **kw):
        ts = time.time()
        result = method(*args, **kw)
        te = time.time()
        print('%r  %2.2f ms' % \
              (method.__name__, (te - ts) * 1000))
        return result
    return timed

options = {'disp': True, 'maxiter': 1000 }

def vanilla_cb(x):
    print(x)

def is_pos_def(x):
    return np.all(np.linalg.eigvals(x) > 0)

print('Hessian positive definite?: {}'.format(is_pos_def(hess_fun(thetaInitGuess, aggYears))))
aggYears = make_agg_years()
@timeit
def min_obj_no_grad():
    return minimize(obj_fun, thetaInitGuess, args=aggYears, method='BFGS', tol=10**-4, options=options)

@timeit
def min_obj_grad():
    return minimize(obj_fun, thetaInitGuess, args=aggYears, method='BFGS', tol=10**-4, jac=grad_fun, options=options)

@timeit
def min_obj_grad_nm():
    return minimize(obj_fun, thetaInitGuess, args=aggYears, method='Nelder-Mead', tol=10**-4, options=options)

@timeit
def min_obj_NCG():
    return minimize(obj_fun, thetaInitGuess, args=aggYears, method='Newton-CG', jac=grad_fun, hess=hess_fun, tol=10**-4, options=options)

@timeit
def min_obj_cg():
    return minimize(obj_fun, thetaInitGuess, args=aggYears, method='CG', jac=grad_fun, hess=hess_fun, tol=10**-4, options=options)

xOpt = min_obj_no_grad()
xOptFast = min_obj_grad()
xOptNM = min_obj_grad_nm()
xOptsCG = min_obj_cg()
xOptsNCG = min_obj_NCG()
Hessian positive definite?: False
Optimization terminated successfully.
         Current function value: 278.898117
         Iterations: 108
         Function evaluations: 786
         Gradient evaluations: 131
'min_obj_no_grad'  6824.62 ms
Optimization terminated successfully.
         Current function value: 278.898116
         Iterations: 104
         Function evaluations: 120
         Gradient evaluations: 120
'min_obj_grad'  3374.10 ms
Optimization terminated successfully.
         Current function value: 278.945299
         Iterations: 116
         Function evaluations: 220
'min_obj_grad_nm'  1887.61 ms
/home/tyler/anaconda3/envs/argo/lib/python3.6/site-packages/scipy/optimize/_minimize.py:523: RuntimeWarning: Method CG does not use Hessian information (hess).
  RuntimeWarning)
Optimization terminated successfully.
         Current function value: 278.945330
         Iterations: 6
         Function evaluations: 19
         Gradient evaluations: 19
'min_obj_cg'  572.18 ms
Optimization terminated successfully.
         Current function value: 278.945982
         Iterations: 16
         Function evaluations: 21
         Gradient evaluations: 21
         Hessian evaluations: 16
'min_obj_NCG'  3071.64 ms

Each method terminated successfully. Comparing the minimization routines: Nelder-Mead (simplex) finished in less wall-clock time than BFGS despite a similar number of iterations, while Conjugate Gradient and Newton Conjugate Gradient converged both faster and in far fewer iterations than BFGS. (Note the warning above that the CG method ignores the Hessian argument.)

Constrained Optimization

We know at heart that this is a constrained optimization problem, so why insist on using $\zeta = exp(\theta)$? SciPy's sequential least squares option (SLSQP) accepts bounds directly and only uses the gradient, yet it converges very quickly.

In [19]:
def obj_fun(thetaArr, aggYears):
    return sum_nll(thetaArr, aggYears)
    
def grad_fun(thetaArr, aggYears):
    return grad_sum_nll(thetaArr, aggYears)

def hess_fun(thetaArr, aggYears):
    return hess_sum_nll(thetaArr, aggYears)  # not used by SLSQP, which only needs the gradient
maxB = 10**10
minB = 10**-6
bounds = [ (minB, maxB) for x in range(5) ]

@timeit
def min_obj_grad_slqsp(): 
    return minimize(obj_fun, thetaInitGuess, args=aggYears, method='SLSQP', tol=10**-4, jac=grad_fun, bounds=bounds, options=options)

xOptSLSQP = min_obj_grad_slqsp()
Optimization terminated successfully    (Exit mode 0)
            Current function value: 278.73682561966064
            Iterations: 28
            Function evaluations: 38
            Gradient evaluations: 28
'min_obj_grad_slqsp'  795.42 ms

Final Remarks

As painful as it is to calculate gradients and Hessians, it does pay off to have analytical functions to plug into your optimization algorithms.

I was surprised at how well the results turned out when we defined bounds for the Sequential Least Squares Programming (SLSQP) algorithm. Even without the Hessian, it performs extremely well. Bear in mind that the data used in these demonstrations is synthetic; real data is rarely truly Gaussian or even stationary. I wouldn't be so quick to declare which method is best, let alone trust that their results agree. What we have now are options for when we move on to real data: we can implement a cascade of methods, so that when one is suboptimal or fails we try another.
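A minimal sketch of such a cascade is below; cascade_minimize is a hypothetical helper (not defined anywhere above), and the objective would be pre-bound to the data, e.g. fun = lambda th: obj_fun(th, aggYears).

from scipy.optimize import minimize

def cascade_minimize(fun, x0, jac=None, hess=None,
                     methods=('SLSQP', 'Newton-CG', 'BFGS', 'Nelder-Mead')):
    """Try each method in turn; return the first successful result,
    otherwise the best (lowest objective value) result seen."""
    best = None
    for method in methods:
        kwargs = {}
        if jac is not None and method != 'Nelder-Mead':
            kwargs['jac'] = jac
        if hess is not None and method == 'Newton-CG':
            kwargs['hess'] = hess
        try:
            res = minimize(fun, x0, method=method, tol=1e-4, **kwargs)
        except Exception:
            continue                 # this method blew up entirely; try the next one
        if res.success:
            return res
        if best is None or res.fun < best.fun:
            best = res
    return best

Thanks for reading!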