
Wednesday, May 1, 2013

Power Plots in statsmodels

I just want to show two more plots for the statistical power of a test, since I didn't have time for them earlier.

The code to produce them is just a call to the plot method of the power classes, for example for the one-sample t-test, as in the sketch below.
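Something like the following minimal sketch, assuming statsmodels.stats.power.TTestPower and its plot_power method; the effect sizes and the range of sample sizes are illustrative choices, not necessarily the ones behind the figures above.

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.stats.power import TTestPower

    fig, ax = plt.subplots()
    # power of the one-sample t-test as a function of the number of
    # observations, one curve per effect size (illustrative values)
    TTestPower().plot_power(dep_var='nobs',
                            nobs=np.arange(5, 100),
                            effect_size=np.array([0.2, 0.5, 0.8]),
                            alpha=0.05, ax=ax,
                            title='Power of one-sample t-test')
    plt.show()

The same pattern works for the other power classes, for example TTestIndPower for the two-sample t-test.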

Wednesday, April 24, 2013

Help: I caught a bug

I think I must be turning too much into a statistician and econometrician lately; I must have caught a virus or something. Maybe it started a while ago already.

The theme of the scipy conference this year is "Machine Learning & Tools for Reproducible Science". However, I'm not doing any sexy Twitter analysis; I just spent some days coding tests for proportions, boring stuff like pairwise comparisons of proportions.

Anyway, I decided to submit a tutorial proposal for econometrics with statsmodels to the scipy conference; see the (lightly edited) proposal below. Since my proposal didn't get accepted, my first response was: wrong topic, too much statistics, "we just want numbers, not check whether the model is correct and find out how to fix it".

That leaves me with more time to go back to figuring out which other basic statistical tests are still missing in Python.

Statistics in Python: Reproducing Research

This is just a short comment on a blog post.

Fernando Perez wrote a nice article about "Literate computing" and computational reproducibility: IPython in the age of data-driven journalism

In the second part, he explains that Vincent Arel-Bundock came up with an IPython notebook within three hours to replicate some criticism of an economics journal article. Vincent's notebook can be seen here.

What I found most striking was not the presentation as a notebook, although that makes it easy to read; it was the tools: pandas, patsy and statsmodels, and no R in sight. We have come a long way with Statistics in Python since I started to get involved in it five years ago.

Vincent has made many improvements and contributions to statsmodels in the last year.

Aside

I'm not following much of the economics debates these days, so I only know what I read in the two references that Fernando gave.

My impression is that it's just the usual (mis)use of economics research results. Politicians like the numbers that give them ammunition for their position. As an economist, you are either very careful about how you present your results, or you join the political game (I worked for several years in an agricultural department of a small country). (An example of the use of economics results in another area: blaming the financial crisis on the work on copulas.)

"Believable" research: If your results sound too good or too interesting to be true, maybe they are not, and you better check your calculations. Although mistakes are not uncommon, the business as usual part is that the results are often very sensitive to assumptions, and it takes time to figure out what results are robust. I have seen enough economic debates where there never was a clear answer that convinced more than half of all economists. A long time ago, when the Asian Tigers where still tigers, one question was: Did they grow because of or in spite of government intervention?

Wednesday, April 17, 2013

Multiple Testing P-Value Corrections in Statsmodels

Series: "Statistics in Python"

This is a follow-up to my previous posts, here and this post, which are about software development, and to the post on multiple comparisons, which looks at the specific case of pairwise mean comparisons.

In the following, I provide a brief introduction to the statsmodels functions for p-value adjustments to correct for multiple testing problems, and then illustrate some properties using several Monte Carlo experiments.
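As a preview, here is a minimal sketch of the main entry point, assuming a statsmodels version in which multipletests is available in statsmodels.stats.multitest (in earlier versions it lives in statsmodels.sandbox.stats.multicomp); the p-values are made up for illustration.

    import numpy as np
    from statsmodels.stats.multitest import multipletests

    pvals = np.array([0.001, 0.012, 0.03, 0.04, 0.2, 0.6])  # made-up p-values
    # 'fdr_bh' is Benjamini-Hochberg; other methods include 'bonferroni',
    # 'holm', 'hommel', 'sidak' and 'fdr_by'
    reject, pvals_corrected, _, _ = multipletests(pvals, alpha=0.05,
                                                  method='fdr_bh')
    print(reject)           # which null hypotheses are rejected at level 0.05
    print(pvals_corrected)  # p-values adjusted for multiple testing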

Thursday, March 28, 2013

Inter-rater Reliability, Cohen's Kappa and Surprises with R

Introduction

This is part three in adventures with statsmodels.stats, after power and multicomparison.

This time it is about Cohen's kappa, a measure of inter-rater agreement or reliability. Suppose we have two raters who each assign the same subjects or objects to one of a fixed number of categories. The question then is: How well do the raters agree in their assignments? Kappa provides a measure of association; the largest possible value is one, the smallest is minus one, and it has a corresponding statistical test for the hypothesis that agreement occurs only by chance. Cohen's kappa and similar measures are widely used, among other fields, in medicine and biostatistics. In one class of applications, the raters are doctors, the subjects are patients, and the categories are medical diagnoses. Cohen's kappa provides a measure of how well the two sets of diagnoses agree, and a hypothesis test of whether the agreement is purely random.

For more background, see this Wikipedia page, which was the starting point for my code.
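To fix ideas, here is a minimal sketch using cohens_kappa from statsmodels.stats.inter_rater; the 3-by-3 contingency table of rating pairs is made up for illustration, and the printed result attributes follow the KappaResults object as I understand it.

    import numpy as np
    from statsmodels.stats.inter_rater import cohens_kappa

    # rows: categories assigned by rater 1, columns: categories by rater 2
    table = np.array([[20,  5,  1],
                      [ 4, 15,  3],
                      [ 1,  2, 10]])
    res = cohens_kappa(table)      # simple (unweighted) kappa
    print(res.kappa)               # point estimate of agreement
    print(res.pvalue_two_sided)    # test of H0: agreement is only by chance
    # weighted kappa, with default scores and weights that are linear
    # in the distance between category indices
    res_w = cohens_kappa(table, wt='linear')
    print(res_w.kappa)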

Most of the following focuses on weighted kappa and the interpretation of different weighting schemes. In the last part, I add some comments about R, which provided me with several hours of debugging, since I'm essentially an R newbie and have not yet figured out some of its "funny" behavior.