I’m going to try to blog more technical stuff here, as well as simple recipes. I’m working out the best way to present it – whether to bundle everything together or separate out the technical posts.
Today we’re going to use Python to find a simple correlation, and then fit a straight line to the data.
First you want to install the SciPy and NumPy libraries – they have a lot of cool functions for Python. On Mac, if you have MacPorts installed, this is trivial:
$ sudo port install py27-numpy py27-scipy
Then you can find Pearson’s correlation coefficient and the best-fit line as follows:
# The independent variable
>>> x = [1, -2, 2, 3, 1]
# The dependent variable
>>> y = [7.5, -3.5, 14.5, 19, 6.6]
# Find the correlation
>>> from scipy.stats import pearsonr
# The first value is the r-value, the second is the p-value
>>> pearsonr(x, y)
(0.98139984935586166, 0.0030366388199721478)
# To find the best-fit line, use the NumPy library
>>> import numpy
>>> from numpy.linalg import lstsq
# Put the x variable in the correct format: one column of x values
# and one column of ones, so the fit includes an intercept.
>>> A = numpy.vstack([x, numpy.ones(len(x))]).T
>>> A
array([[ 1.,  1.],
       [-2.,  1.],
       [ 2.,  1.],
       [ 3.,  1.],
       [ 1.,  1.]])
>>> lstsq(A, y)
(array([ 4.5 ,  4.32]), array([ 10.848]), 2, array([ 4.53897844,  1.84327826]))
# 4.5 is the slope, 4.32 is the y-intercept. 10.848 is the sum of the squared residuals.
# 2 is the rank of A. The last array holds the singular values of A.
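If you’re curious what those two calls are working out, here is a rough sketch of the same numbers in plain Python, using the textbook formulas for Pearson’s r and a least-squares line. The function names pearson_r and fit_line are made up for this post (they are not part of SciPy or NumPy), and this is only meant as a sanity check, not a replacement for the library routines.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = float(len(x))  # float so division also behaves on Python 2
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope, intercept, and sum of squared residuals."""
    n = float(len(x))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Squared vertical distance from each point to the fitted line
    ssr = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    return slope, intercept, ssr


x = [1, -2, 2, 3, 1]
y = [7.5, -3.5, 14.5, 19, 6.6]

r = pearson_r(x, y)
slope, intercept, ssr = fit_line(x, y)
print(r, slope, intercept, ssr)
```

Up to floating-point rounding, the printed values should agree with the r-value from pearsonr and with the slope, intercept, and residual figure from lstsq in the session above.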
For more details, including how to plot the fitted line, see the NumPy documentation for numpy.linalg.lstsq.