Sunday, February 12, 2012

Using rst2blogger and the Breusch-Godfrey test

This is mainly a test to see whether rst2blogger works. The description sounded like just what I need to make going from an rst file to Blogger less work. Some setup inconveniences: why do Python programs need to install distribute and destroy my setuptools? lxml didn't have Windows binaries, and easy_install failed because the source doesn't build. In contrast to Sphinx, plain docutils does not highlight Python code, at least not right away. And if there is an exception while rendering the HTML, the message is completely uninformative: except Exception

(Reminds me of what I hear when the kids are just guessing in a numbers game: "Try again", "Try again", ...)

Trying an rst list: some categories of tests that I have been working on recently.

  • diagnostic tests
    • autocorrelation
    • heteroscedasticity
    • multicollinearity
    • functional form
    • normality
    • outliers
    • influence
  • tests for coefficients/parameters
    • t test for linear restrictions
    • F test for linear restrictions
    • Wald test for linear restrictions (missing)
    • Likelihood Ratio test to compare nested models
    • Wald/F test to compare nested models
    • comparing non-nested models: J, Cox

Writing the Breusch-Godfrey autocorrelation test in 5 minutes

The background story: I wrote several diagnostic tests a while ago, maybe one and a half years ago. Many of the diagnostic tests have a similar structure, and it's easy to just copy the pattern. For one test, acorr_lm, I took the code and the basic idea of the augmented Dickey-Fuller test for unit roots and wanted to make a Lagrange Multiplier test for autocorrelation out of it. Some time later, I added some comments that this could be used as Engle's ARCH test, but that there is a difference from the Breusch-Godfrey serial correlation test.

While I was writing tests comparing my diagnostic tests with R, I saw that Breusch-Godfrey was still missing. Since this time I had unit tests, it was very quick work: copy and paste, and checking Wikipedia.

The test passed after a few changes, and I barely had to think.

Build the matrix of lagged residuals, column_stack it with the original regressors, and get the restriction matrix for the hypothesis that the joint effect of the lagged residuals is zero. Most of the work is done by existing functions.

I haven't changed the docstring yet, and there is still cleanup work to do.

import numpy as np
from scipy import stats
import scikits.statsmodels.api as sm
from scikits.statsmodels.tsa.tsatools import lagmat
# ResultsStore is a small attribute-container helper defined in the same
# diagnostics module as this function.

def acorr_breush_godfrey(results, nlags=None, store=False):
    '''Lagrange Multiplier tests for autocorrelation

    not checked yet, copied from unitroot_adf with adjustments
    check array shapes because of the addition of the constant.
    written/copied without reference
    This is not Breusch-Godfrey. BG adds lags of the residual to exog in the
    design matrix for the auxiliary regression with residuals as endog,
    see Greene 12.7.1.

    Notes
    -----
    If x is calculated as y^2 for a time series y, then this test corresponds
    to the Engle test for autoregressive conditional heteroscedasticity (ARCH).
    TODO: get details and verify

    '''

    x = np.asarray(results.resid)
    exog_old = results.model.exog
    nobs = x.shape[0]
    if nlags is None:
        #for adf from Greene referencing Schwert 1989
        nlags = int(np.trunc(12. * np.power(nobs/100., 1/4.)))  #nobs//4  #TODO: check default, or do AIC/BIC
    # prepend nlags zeros so no observations are lost to the lags; this matches
    # R's treatment of the pre-sample residuals
    x = np.concatenate((np.zeros(nlags), x))

    xdall = lagmat(x[:,None], nlags, trim='both')
    nobs = xdall.shape[0]
    xdall = np.c_[np.ones((nobs,1)), xdall]
    xshort = x[-nobs:]
    exog = np.column_stack((exog_old, xdall))
    k_vars = exog.shape[1]

    if store: resstore = ResultsStore()

    # auxiliary regression: residuals on the original regressors plus their own lags
    resols = sm.OLS(xshort, exog).fit()
    # F test that the coefficients of the nlags lagged residuals are jointly zero
    ft = resols.f_test(np.eye(nlags, k_vars, k_vars - nlags))
    fval = ft.fvalue
    fpval = ft.pvalue
    fval = np.squeeze(fval)[()]   #TODO: fix this in ContrastResults
    fpval = np.squeeze(fpval)[()]
    lm = nobs * resols.rsquared
    lmpval = stats.chi2.sf(lm, nlags)
    # Note: degrees of freedom for LM test is nvars minus constant = usedlags
    #return fval, fpval, lm, lmpval

    if store:
        resstore.resols = resols
        resstore.usedlag = nlags
        return fval, fpval, lm, lmpval, resstore
    else:
        return fval, fpval, lm, lmpval
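
To try it out, here is a minimal sketch of a call (not from the post or the statsmodels test suite; the simulated AR(1) data, the seed, and nlags=4 are only illustrative, and it assumes scikits.statsmodels is installed and the function above is defined):

import numpy as np
import scikits.statsmodels.api as sm

# simulate a regression with AR(1) errors so the test has something to detect
np.random.seed(12345)
nobs = 200
exog = np.column_stack((np.ones(nobs), np.random.randn(nobs)))
err = np.zeros(nobs)
for t in range(1, nobs):
    err[t] = 0.6 * err[t-1] + np.random.randn()
endog = exog.dot([1.0, 0.5]) + err

res = sm.OLS(endog, exog).fit()
fval, fpval, lm, lmpval = acorr_breush_godfrey(res, nlags=4)
# lm is nobs * R^2 of the auxiliary regression, compared to a chi2 with nlags df;
# fval, fpval come from the F test that all lag coefficients are jointly zero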

Just a brief explanation of the main tools:

lagmat just creates a matrix of lagged values without having to worry about how to shift and cut the arrays.

>>> from scikits.statsmodels.tsa.tsatools import lagmat
>>> lagmat(np.arange(8), 3, trim='both')
array([[ 2.,  1.,  0.],
       [ 3.,  2.,  1.],
       [ 4.,  3.,  2.],
       [ 5.,  4.,  3.],
       [ 6.,  5.,  4.]])

numpy.eye is very flexible and allows shifting the diagonal.

>>> np.eye(3, 5, 5-3)
array([[ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])

The rest is mainly calls to OLS and f_test. One part I needed to change: R takes the initial errors to be zero instead of truncating the regression sample, so I also had to concatenate some zeros in front of the residual array.
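
To make that concrete, here is a tiny sketch of the padding step with toy numbers (2 lags; resid is just a stand-in for results.resid, not data from the post):

import numpy as np
from scikits.statsmodels.tsa.tsatools import lagmat

resid = np.arange(1., 6.)                        # stand-in for results.resid
xpad = np.concatenate((np.zeros(2), resid))      # prepend nlags=2 zeros, as in the function
lagged = lagmat(xpad[:, None], 2, trim='both')   # shape (5, 2): columns are lag 1 and lag 2
xshort = xpad[-lagged.shape[0]:]                 # all five residuals remain as endog
# without the padding, the trimming would drop the first two residuals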
