Tuesday, April 28, 2009

Having fun with expectations

Using the distribution classes in scipy.stats, it is easy to calculate the expectation of a function with respect to any distribution by numerical integration.

I’m going to write a function that calculates the expectation, then attach it to the class of continuous distributions in scipy.stats. Finally, we can use our new method with any existing distribution.

Warning: Monkey-patching a class can have unintended effects if the new or changed methods interfere with other uses. In this case we only add a new method, which does not affect any of the original uses of the distributions.

import numpy as np
from scipy import stats, integrate

def expectedfunc(self, fn=None, args=(), lb=None, ub=None, conditional=False):
    '''calculate expected value of a function with respect to the distribution

    only for standard version of distribution,
    location and scale not tested

    all parameters are keyword parameters
    fn : function (default: identity mapping)
        Function for which the integral is calculated. Takes only one argument.
    args : tuple
        arguments (parameters) of the distribution
    lb, ub : numbers
        lower and upper bound for integration, default is set to the support
        of the distribution
    conditional : boolean (False)
        If true then the integral is corrected by the conditional probability
        of the integration interval. The return value is the expectation
        of the function, conditional on being in the given interval.

    Returns
    -------
    expected value : float
    '''
    if fn is None:
        def fun(x, *args):
            return x * self.pdf(x, *args)
    else:
        def fun(x, *args):
            return fn(x) * self.pdf(x, *args)
    if lb is None:
        lb = self.a
    if ub is None:
        ub = self.b
    if conditional:
        invfac = self.sf(lb, *args) - self.sf(ub, *args)
    else:
        invfac = 1.0
    return integrate.quad(fun, lb, ub, args=args)[0] / invfac

For now this is just a function whose first argument is a distribution instance, as they are available in scipy.stats. We can call this function to calculate the fourth moment of the standard normal distribution and compare it with the moment method of stats.norm:

>>> print stats.norm.moment(4)
>>> print expectedfunc(stats.norm, lambda x: x**4)

We obtain the same result, which means in this case our function works correctly.
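As a cross-check in modern Python 3 (a sketch, assuming numpy and scipy are installed), the same quantity can be computed by integrating x**4 against the standard normal pdf directly:

```python
import numpy as np
from scipy import stats, integrate

# E[X**4] for the standard normal, computed by numerical integration;
# the exact fourth moment of N(0, 1) is 3
val, err = integrate.quad(lambda x: x**4 * stats.norm.pdf(x), -np.inf, np.inf)
print(val)  # close to 3.0
```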

Now we can attach it to stats.distributions.rv_continuous, which is the superclass of all continuous distributions. We could also have used new.instancemethod, which is, however, deprecated and will be removed in py3k.

>>> import types
>>> stats.distributions.rv_continuous.expectedfunc =
...       types.MethodType(expectedfunc,None,stats.distributions.rv_continuous)
>>> #print dir(stats.norm)
>>> print stats.norm.expectedfunc
<bound method norm_gen.expectedfunc of <scipy.stats.distributions.norm_gen object at 0x02122830>>
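Under Python 3 the three-argument form of types.MethodType is gone, but attaching the method becomes simpler: a plain function assigned to the class is already a method. A compact Python 3 sketch of the same idea (my own condensed version, not the original post's code):

```python
from scipy import stats, integrate

def expectedfunc(self, fn=None, args=(), lb=None, ub=None, conditional=False):
    """Condensed Python 3 sketch of the expectation method above."""
    if fn is None:
        fn = lambda x: x  # identity mapping by default
    if lb is None:
        lb = self.a       # lower end of the support
    if ub is None:
        ub = self.b       # upper end of the support
    # correction factor for the conditional expectation, if requested
    invfac = (self.sf(lb, *args) - self.sf(ub, *args)) if conditional else 1.0
    return integrate.quad(lambda x: fn(x) * self.pdf(x, *args),
                          lb, ub)[0] / invfac

# in Python 3, assigning the function to the class is enough
stats.distributions.rv_continuous.expectedfunc = expectedfunc
print(stats.norm.expectedfunc(lambda x: x**4))  # about 3.0
```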


Here is the fourth moment for both the normal and the t distribution. The t distribution requires one parameter, the degrees of freedom, which I set to 10 to get fatter tails:

>>> print stats.norm.expectedfunc(lambda x: x**4)
>>> print stats.norm.moment(4)
>>> print stats.t.expectedfunc(lambda x: x**4, args=(10,))
>>> print stats.t.moment(4, 10)

Expectation of some additional functions:

>>> print stats.norm.expectedfunc(lambda x: np.sqrt(np.abs(x)))
>>> print stats.norm.expectedfunc(lambda x: np.exp(-np.abs(x)))
>>> print stats.norm.expectedfunc(lambda x: np.exp(-x), lb=0)

If our function is identically one and we put bounds on the integral, then we get the probability of the interval:

>>> print stats.norm.expectedfunc(lambda x: 1, lb=-1, ub=0.5)
>>> print stats.norm.cdf(0.5) - stats.norm.cdf(-1)
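The identity can also be checked outside the monkey-patched method (a Python 3 sketch, assuming scipy): integrating the pdf over the interval reproduces the cdf difference:

```python
from scipy import stats, integrate

lb, ub = -1.0, 0.5
# P(lb < X < ub) by integrating the density ...
prob_quad = integrate.quad(stats.norm.pdf, lb, ub)[0]
# ... and by differencing the cdf
prob_cdf = stats.norm.cdf(ub) - stats.norm.cdf(lb)
print(prob_quad, prob_cdf)  # both about 0.5328
```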

Can we calculate the expectation of exp(x)?

>>> print stats.norm.expectedfunc(lambda x: np.exp(x))
Warning: The occurrence of roundoff error is detected, which prevents
  the requested tolerance from being achieved.  The error may be
  underestimated.
The integration returns nan, not a number. The expectation itself is actually finite: E[exp(X)] for a standard normal is exp(1/2), about 1.6487, the mean of the standard lognormal distribution. The nan is a numerical artifact of the integrand: for large x, np.exp(x) overflows to inf before the pdf underflows to zero, and inf times zero is nan.
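The expectation is in fact finite (exp(x) times the normal pdf integrates to sqrt(e)); the overflow of exp(x) for large x is what produces the nan. A sketch (assuming numpy and scipy) that combines the two exponents so nothing overflows:

```python
import numpy as np
from scipy import integrate

# exp(x) * pdf(x) = exp(x - x**2/2) / sqrt(2*pi); the combined exponent
# is bounded above by 1/2, so the integrand never overflows
stable = lambda x: np.exp(x - 0.5 * x**2) / np.sqrt(2 * np.pi)
val = integrate.quad(stable, -np.inf, np.inf)[0]
print(val, np.sqrt(np.e))  # both about 1.64872
```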

If we integrate with respect to a distribution with finite support, for example the R distribution, stats.rdist, then the integration has no trouble:

>>> print stats.rdist.expectedfunc(lambda x: np.exp(x), args=(0.1,))

We can also try complex values:

>>> print stats.norm.expectedfunc(lambda x: np.exp(1j*x))

I have no idea if this is correct, but this is the basic calculation for the characteristic function of a distribution.
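quad only handles real-valued integrands, so whether the complex call above works depends on the scipy version. A safer sketch (Python 3, assuming numpy and scipy) integrates the real and imaginary parts separately and compares against the known characteristic function of the standard normal, exp(-t**2/2):

```python
import numpy as np
from scipy import stats, integrate

def charfunc(t):
    # E[exp(1j*t*X)] via separate real and imaginary integrals
    re = integrate.quad(lambda x: np.cos(t * x) * stats.norm.pdf(x),
                        -np.inf, np.inf)[0]
    im = integrate.quad(lambda x: np.sin(t * x) * stats.norm.pdf(x),
                        -np.inf, np.inf)[0]
    return re + 1j * im

print(charfunc(1.0))          # about (0.6065+0j)
print(np.exp(-0.5 * 1.0**2))  # exact value exp(-t**2/2) at t=1
```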

Next we can try out the conditional expectation. As an example, we calculate the expectation of a standard normal random variable conditional on being in the top decile, i.e. the expectation over the largest 10% of values.

>>> lbdec = stats.norm.isf(0.1)
>>> print stats.norm.expectedfunc(lb=lbdec, conditional=True)
>>> print expectedfunc(stats.norm, lb=lbdec, conditional=True)
>>> print stats.norm.expectedfunc(lambda x: 1, lb=-lbdec, ub=lbdec)
>>> #should be 0.8
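For the standard normal the conditional mean of the upper tail has a closed form to check against, the inverse Mills ratio E[X | X > a] = pdf(a) / sf(a) (a standard fact, not from the post). A Python 3 sketch:

```python
from scipy import stats, integrate

lbdec = stats.norm.isf(0.1)  # 90th percentile, about 1.2816
# numerator: integral of x * pdf(x) over the top decile
num = integrate.quad(lambda x: x * stats.norm.pdf(x), lbdec, float('inf'))[0]
cond_mean = num / 0.1        # divide by P(X > lbdec) = 0.1
# closed-form check: inverse Mills ratio
mills = stats.norm.pdf(lbdec) / stats.norm.sf(lbdec)
print(cond_mean, mills)  # both about 1.7550
```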

What’s the variance if we truncate the normal distribution at the 10% and 90% quantiles?

>>> print stats.norm.expectedfunc(lambda x: x**2, lb=-lbdec, ub=lbdec, conditional=True)
>>> print expectedfunc(stats.norm, lambda x: x**2, lb=-lbdec, ub=lbdec, conditional=True)

and verify the result with the truncated normal distribution. We can also compute the conditional expectation of the top decile for the t distribution:

>>> print stats.truncnorm.moment(2, -lbdec, lbdec)
>>> lbdect = stats.t.isf(0.1, 10)
>>> print stats.t.expectedfunc(args=(10,), lb=lbdect, conditional=True)
>>> print expectedfunc(stats.t, args=(10,), lb=lbdect, conditional=True)

The t distribution has fatter tails than the normal distribution, so the conditional expectation of the top decile is larger for the t distribution than for the normal distribution, 1.989 versus 1.755.
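The truncated-normal cross-check can also be done standalone (a Python 3 sketch, assuming scipy): the conditional second moment computed by quad should agree with truncnorm.moment:

```python
from scipy import stats, integrate

a = stats.norm.isf(0.1)  # truncation point, about 1.2816
# conditional second moment of N(0, 1) restricted to (-a, a):
# integral of x**2 * pdf(x), divided by P(-a < X < a) = 0.8
num = integrate.quad(lambda x: x**2 * stats.norm.pdf(x), -a, a)[0]
cond_m2 = num / (stats.norm.cdf(a) - stats.norm.cdf(-a))
print(cond_m2)
# same quantity from the truncated normal distribution
print(stats.truncnorm.moment(2, -a, a))
```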


  1. thanks for this.
    just wondering why your comment states 'only for standard version of distribution, location and scale not tested'
    since you pass in args to the integration, this should work just fine if we pass in loc and scale, right?

  2. Hi, if it's not tested then I don't trust it. After rewriting and testing this function also for location and scale, I needed to adjust the lower and upper bound to take the location shift and scaling into account. This might not be a problem for unbounded distributions, integrating from -inf to +inf, but it was incorrect for distributions with a finite bound, for example distributions on the positive real line.

    The successor to this function is now available as the expect method of the distributions in scipy.stats (added in scipy 0.9).
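That successor is the expect method; a sketch of its usage against current scipy (it also handles loc and scale directly):

```python
from scipy import stats

# fourth moment of the standard normal via the built-in successor method
m4 = stats.norm.expect(lambda x: x**4)
print(m4)  # about 3.0

# loc and scale are handled by the method itself, e.g. the mean of N(5, 2**2)
m = stats.norm.expect(lambda x: x, loc=5, scale=2)
print(m)  # about 5.0
```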