Using the distribution classes in scipy.stats it is easy to calculate the expectation of a function with respect to any distribution by numerical integration.
I'm going to write a function that calculates the expectation and then attach it to the class of continuous distributions in scipy.stats. Finally, we can use the new method with any existing distribution.
Warning: Monkey patching a class can have unintended effects if the new or changed methods interfere with other uses. In this case we just add a new method, which does not affect any of the original uses of the distributions.
import numpy as np
from scipy import stats, integrate


def expectedfunc(self, fn=None, args=(), lb=None, ub=None, conditional=False):
    '''calculate expected value of a function with respect to the distribution

    only for standard version of distribution,
    location and scale not tested

    Parameters
    ----------
    all parameters are keyword parameters
    fn : function (default: identity mapping)
        Function for which the integral is calculated. Takes only one argument.
    args : tuple
        arguments (parameters) of the distribution
    lb, ub : numbers
        lower and upper bound for integration, default is set to the support
        of the distribution
    conditional : boolean (False)
        If true then the integral is corrected by the conditional probability
        of the integration interval. The return value is the expectation
        of the function, conditional on being in the given interval.

    Returns
    -------
    expected value : float
    '''
    if fn is None:
        # default: expectation of the identity, i.e. the mean
        def fun(x, *args):
            return x * self.pdf(x, *args)
    else:
        def fun(x, *args):
            return fn(x) * self.pdf(x, *args)
    if lb is None:
        lb = self.a    # lower bound of the support
    if ub is None:
        ub = self.b    # upper bound of the support
    if conditional:
        # probability mass of the integration interval
        invfac = self.sf(lb, *args) - self.sf(ub, *args)
    else:
        invfac = 1.0
    return integrate.quad(fun, lb, ub, args=args)[0] / invfac
For now this is just a function whose first argument is a distribution instance, as they are available in scipy.stats. We can call this function to calculate the fourth moment of the standard normal distribution and compare it with the moment method of stats.norm:
>>> print stats.norm.moment(4)
3.0
>>> print expectedfunc(stats.norm, lambda(x): (x)**4)
3.0
We obtain the same result, which means in this case our function works correctly.
Now we can attach it to stats.distributions.rv_continuous, which is the superclass of all continuous distributions. We could also have used new.instancemethod, which is, however, deprecated and has been removed in Python 3.
>>> import types
>>> stats.distributions.rv_continuous.expectedfunc = \
...     types.MethodType(expectedfunc, None, stats.distributions.rv_continuous)
>>> #print dir(stats.norm)
>>> print stats.norm.expectedfunc
<bound method norm_gen.expectedfunc of <scipy.stats.distributions.norm_gen object at 0x02122830>>
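As an aside, a plain assignment would also work (and is the only option on Python 3, where the three-argument form of types.MethodType no longer exists), because a function attached to a class behaves as a method for its instances. A minimal sketch:

>>> # equivalent monkey patch: assign the function directly to the class
>>> stats.distributions.rv_continuous.expectedfunc = expectedfunc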
Examples
Here is the fourth moment for both the normal and the t distribution. The t distribution requires one parameter, the degrees of freedom, which I set to 10 to get fatter tails:
>>> print stats.norm.expectedfunc(lambda(x): (x)**4)
3.0
>>> print stats.norm.moment(4)
3.0
>>> print stats.t.expectedfunc(lambda(x): (x)**4, 10)
6.25
>>> print stats.t.moment(4, 10)
6.25
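As a quick analytic check: for a t distribution with df degrees of freedom (df > 4) the fourth moment is 3*df**2/((df-2)*(df-4)), which for df=10 gives 300/48 = 6.25, in agreement with the values above:

>>> print 3*10.**2/((10-2)*(10-4))   # 6.25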
Expectation of some additional functions:
>>> print stats.norm.expectedfunc(lambda(x): np.sqrt(np.abs(x)))
0.822178958662
>>> print stats.norm.expectedfunc(lambda(x): np.exp(-np.abs(x)))
0.52315658373
>>> print stats.norm.expectedfunc(lambda(x): np.exp(-x), lb=0)
0.261578291865
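The last value can be checked by completing the square: exp(-x) times the standard normal pdf equals exp(1/2) times the normal pdf shifted to mean -1, so the integral over (0, inf) is exp(1/2)*P(X > 1). A quick check using only quantities already defined above:

>>> print np.exp(0.5)*stats.norm.sf(1)   # should agree with 0.261578... above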
If our function is identically one and we restrict the integration bounds, then we get the probability of the interval:
>>> print stats.norm.expectedfunc(lambda(x): 1, lb=-1, ub=0.5)
0.532807207343
>>> print stats.norm.cdf(0.5) - stats.norm.cdf(-1)
0.532807207343
Can we calculate the expectation of exp(x)?
>>> print stats.norm.expectedfunc(lambda(x): np.exp(x))
Warning: The occurrence of roundoff error is detected, which prevents
the requested tolerance from being achieved. The error may be
underestimated.
-1.#IND
The expectation of exp(x) with respect to the standard normal distribution is actually finite, exp(1/2), about 1.65 (it is the mean of a lognormal distribution), but exp(x) overflows in floating point for large x, so the integrand evaluates to inf times 0, and our numerical integration returns nan, not a number.
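One way to see this is to cap the integration range. The bounds -20 and 20 below are just an illustrative choice, wide enough to cover essentially all of the mass of exp(x)*pdf(x); with them the overflow is avoided and we recover a value close to exp(1/2):

>>> print stats.norm.expectedfunc(lambda x: np.exp(x), lb=-20, ub=20)   # ~ exp(0.5) = 1.6487...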
If we integrate instead with respect to a distribution with finite support, for example the R-distribution, stats.rdist, the integrand stays bounded and the integration works without further adjustment:
>>> print stats.rdist.expectedfunc(lambda(x): np.exp(x),0.1)
1.49242160729
We can also try complex values:
>>> print stats.norm.expectedfunc(lambda(x): np.exp(1j*x))
0.606530659713
This is the basic calculation for the characteristic function of a distribution, evaluated at t=1. For the standard normal the characteristic function is exp(-t**2/2), so the exact value at t=1 is exp(-0.5); the imaginary part is zero by symmetry, and quad only integrates the real part here.
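We can compare directly:

>>> print np.exp(-0.5)   # exact value, matches the 0.606530659713 printed above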
Next we can try out conditional expectation. As an example, we calculate the expectation of a standard normal random variable conditional on values being in the top decile, i.e. the expectation of all values in the top 10 %.
>>> lbdec = stats.norm.isf(0.1)
>>> print stats.norm.expectedfunc(lb=lbdec, conditional=True)
1.75498331932
>>> print expectedfunc(stats.norm, lb=lbdec, conditional=True)
1.75498331932
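For the standard normal this conditional mean has a closed form, the inverse Mills ratio phi(a)/P(X > a); since lbdec was chosen so that P(X > lbdec) = 0.1, we can check the number directly:

>>> print stats.norm.pdf(lbdec)/0.1   # should agree with 1.75498... above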
>>> print stats.norm.expectedfunc(lambda(x): 1, lb=-lbdec, ub=lbdec)
0.8
>>> #should be 0.8
What's the variance if we truncate the normal distribution at the 0.1 and 0.9 quantiles? Because the truncation is symmetric around zero, the conditional mean is zero and the variance equals the conditional second moment:
>>> print stats.norm.expectedfunc(lambda(x): x**2, lb=-lbdec, ub=lbdec,conditional=True)
0.437724594904
>>> print expectedfunc(stats.norm, lambda(x): x**2, lb=-lbdec, ub=lbdec,conditional=True)
0.437724594904
and verify the result with the truncated normal distribution, stats.truncnorm:
>>> print stats.truncnorm.moment(2,-lbdec,lbdec)
0.437724594904
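The same number also follows from the closed-form second moment of a standard normal truncated symmetrically at -c and c, namely 1 - 2*c*phi(c)/(2*Phi(c) - 1); here the denominator is 0.8 by construction of lbdec:

>>> print 1 - 2*lbdec*stats.norm.pdf(lbdec)/0.8   # should agree with 0.4377... above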
>>> lbdect = stats.t.isf(0.1, 10)
>>> print stats.t.expectedfunc(args=(10,), lb=lbdect, conditional=True)
1.9892028422
>>> print expectedfunc(stats.t, args=(10,), lb=lbdect, conditional=True)
1.9892028422
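There is a closed form for this case as well: for a t distribution with df degrees of freedom, the partial first moment over (a, inf) is pdf(a)*(df + a**2)/(df - 1), so the conditional expectation of the top decile is that expression divided by 0.1. A quick check under that formula:

>>> print stats.t.pdf(lbdect, 10)*(10 + lbdect**2)/((10 - 1)*0.1)   # should agree with 1.9892... above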
The t distribution has fatter tails than the normal distribution, so the conditional expectation of the top decile is larger for the t distribution than for the normal distribution, 1.989 versus 1.755.
Comments
Thanks for this.
Just wondering why your comment states 'only for standard version of distribution, location and scale not tested' -
since you pass in args to the integration, this should work just fine if we pass in loc and scale, right?
Hi, if it's not tested then I don't trust it. After rewriting and testing this function also for location and scale, I needed to adjust the lower and upper bounds to take the location shift and scaling into account. This might not be a problem for unbounded distributions, integrating from -inf to +inf, but it was incorrect for distributions with a finite bound, for example distributions on the positive real line.
The successor to this function is now a method of the distributions in scipy.stats, expect, added in scipy 0.9.