Using the distribution classes in scipy.stats it is easy to calculate
expectations of a function with respect to any distributions
using numerical integration.
I’m going to write a function that calculates the expectation,
then I attach it to the class of continuous distributions
in scipy.stats. Finally we can use our new method with any existing
distribution.
Warning: Monkey Patching a class can have unintended effects if the
new or changed methods interfere with other uses. In this case
we just add a new method, which does not effect any of the original
use of the distributions.
import numpy as np
from scipy import stats, integrate
def expectedfunc(self, fn=None, args=(), lb=None, ub=None, conditional=False):
'''calculate expected value of a function with respect to the distribution
only for standard version of distribution,
location and scale not tested
Parameters
----------
all parameters are keyword parameters
fn : function (default: identity mapping)
Function for which integral is calculated. Takes only one argument.
args : tuple
argument (parameters) of the distribution
lb, ub : numbers
lower and upper bound for integration, default is set to the support
of the distribution
conditional : boolean (False)
If true then the integral is corrected by the conditional probability
of the integration interval. The return value is the expectation
of the function, conditional on being in the given interval.
Returns
-------
expected value : float
'''
if fn is None:
def fun(x, *args):
return x*self.pdf(x, *args)
else:
def fun(x, *args):
return fn(x)*self.pdf(x, *args)
if lb is None:
lb = self.a
if ub is None:
ub = self.b
if conditional:
invfac = self.sf(lb,*args) - self.sf(ub,*args)
else:
invfac = 1.0
return integrate.quad(fun, lb, ub,
args=args)[0]/invfac
For now this is just a function where the first argument is
a distribution instance, as they are available in scipy.stats.
We can call this function to calculate the forth moment of the
standard normal distribution and compare it with the moment
of stats.norm
>>> print stats.norm.moment(4)
3.0
>>> print expectedfunc(stats.norm, lambda(x): (x)**4)
3.0
We obtain the same result, which means in this case our function works
correctly.
Now we can attach it to stats.distributions.rv_continuous, which is the
superclass of all continuous distributions. We could have alse used
new.instancemethod which is, however, depreciated and will be removen in py3k.
>>> import types
>>> stats.distributions.rv_continuous.expectedfunc =
... types.MethodType(expectedfunc,None,stats.distributions.rv_continuous)
>>> #print dir(stats.norm)
>>> print stats.norm.expectedfunc
<bound method norm_gen.expectedfunc of <scipy.stats.distributions.norm_gen object at 0x02122830>>
Examples
Here is the forth moment for both the normal and the t distribution.
The t distribution requires one parameter, the degrees of freedom, which I
set to 10 to get fatter tails:
>>> print stats.norm.expectedfunc(lambda(x): (x)**4)
3.0
>>> print stats.norm.moment(4)
3.0
>>> print stats.t.expectedfunc(lambda(x): (x)**4, 10)
6.25
>>> print stats.t.moment(4, 10)
6.25
Expectation of some additional functions:
>>> print stats.norm.expectedfunc(lambda(x): np.sqrt(np.abs(x)))
0.822178958662
>>> print stats.norm.expectedfunc(lambda(x): np.exp(-np.abs(x)))
0.52315658373
>>> print stats.norm.expectedfunc(lambda(x): np.exp(-x), lb=0)
0.261578291865
If our function is identical to one, and we use integration bounds on our
integral, then we get the probability of the interval:
>>> print stats.norm.expectedfunc(lambda(x): 1, lb=-1, ub=0.5)
0.532807207343
>>> print stats.norm.cdf(0.5) - stats.norm.cdf(-1)
0.532807207343
Can we calculate the expectation of exp(x)?
>>> print stats.norm.expectedfunc(lambda(x): np.exp(x))
Warning: The ocurrence of roundoff error is detected, which prevents
the requested tolerance from being achieved. The error may be
underestimated.
-1.#IND
The expectation of exp(x) with respect to the standard normal
distribution is unbound, and our numerical integration function
returns nan, not a number.
If we integrate with respect to a distribution with finite
support, for example the rdistribution, rdist, then the
expectation is finite:
>>> print stats.rdist.expectedfunc(lambda(x): np.exp(x),0.1)
1.49242160729
We can also try complex values:
>>> print stats.norm.expectedfunc(lambda(x): np.exp(1j*x))
0.606530659713
I have no idea if this is correct, but this is the basic
calculation for the characteristic function of a distribution.
Next we can try out conditional expectation.
As an example, we calculate the expectation of a standard
normal random variable conditional on values being in the top
decile, i.e. the expectation of all values in the top 10 %.
>>> lbdec = stats.norm.isf(0.1)
>>> print stats.norm.expectedfunc(lb=lbdec, conditional=True)
1.75498331932
>>> print expectedfunc(stats.norm, lb=lbdec, conditional=True)
1.75498331932
>>> print stats.norm.expectedfunc(lambda(x): 1, lb=-lbdec, ub=lbdec)
0.8
>>> #should be 0.8
What’s the variance if we truncate the normal distribution
at the 0.1 and 0.9 deciles?
>>> print stats.norm.expectedfunc(lambda(x): x**2, lb=-lbdec, ub=lbdec,conditional=True)
0.437724594904
>>> print expectedfunc(stats.norm, lambda(x): x**2, lb=-lbdec, ub=lbdec,conditional=True)
0.437724594904
and verify the result with truncated normal
>>> print stats.truncnorm.moment(2,-lbdec,lbdec)
0.437724594904
>>> lbdect = stats.t.isf(0.1, 10)
>>> print stats.t.expectedfunc(args=(10,), lb=lbdect, conditional=True)
1.9892028422
>>> print expectedfunc(stats.t, args=(10,), lb=lbdect, conditional=True)
1.9892028422
The t distribution has fatter tails than the normal distribution, so the
conditional expectation of the top decile is larger for the
t distribution than for the normal distribution, 1.989 versus 1.755.