Go back to FacetsPythonCourse2008

Scientific Programming in Python

Author: Eilif Muller
Date: Tue, 10 Jun 2008

Prelude

Why Python?

Reason 1: Not all programming languages were created equal

The brainfuck programming language: http://en.wikipedia.org/wiki/Brainfuck

Char      Meaning
====      =======
>        increment the pointer
          (to point to the next cell to the right)
<        decrement the pointer
          (to point to the next cell to the left).
+        increment (increase by one)
          the byte at the pointer.
-        decrement (decrease by one)
          the byte at the pointer.
.         output the value of the byte at the pointer.
,         accept one byte of input, storing its value
          in the byte at the pointer.
[         jump forward to the command after the corresponding ]
          if the byte at the pointer is zero.
]         jump back to the command after the corresponding [
          if the byte at the pointer is nonzero.

Reason 1: brainfuck (hello world)

++++++++++
[>+++++++>++++++++++>+++>+<<<<-] The initial loop to set up
                                 useful values in the array
>++.                             Print 'H'
>+.                              Print 'e'
+++++++.                         Print 'l'
.                                Print 'l'
+++.                             Print 'o'
>++.                             Print ' '
<<+++++++++++++++.               Print 'W'
>.                               Print 'o'
+++.                             Print 'r'
------.                          Print 'l'
--------.                        Print 'd'
>+.                              Print '!'
>.                               Print newline

Reason 2: (deep quote)

If you want to build a ship, don't drum up people together to collect wood or assign them tasks and work,

but rather teach them to long for the endless immensity of the sea

-- Antoine de Saint Exupery

Reason 2: (deep thoughts)

Freiburg train station - total recall dejavu

Reason 2: (deep and abstruse)

David Hume quote

Reason 2: (shall we build a ship)

Problem:

State of Neuroscience Modelling: Models generally don't exceed the complexity of what can be built by one PhD student from scratch in 3 years using one simulator plus heaps of throw-away MATLAB code.

A step in the right direction:

  1. Use a state-of-the-art systems integration language (Python)
  2. Build modular and re-usable models in python and share them: http://neuralensemble.org: a repository for collaborative modeling
  3. Long for the endless immensity of the sea

Reason 3: (CV)

Curriculum Vitae
Eilif Muller

...

Programming skills:

brainfuck.

Reason 3: (CV)

Curriculum Vitae
Eilif Muller

...

Programming skills:

Python.

  • Python will get you a job at google when your grant proposal gets rejected
  • brainfuck won't.
  • Think about it.

Introduction

  • Debian + Ubuntu

    apt-get install python-numpy python-scipy python-matplotlib ipython
    
  • Gentoo

    emerge numpy scipy matplotlib ipython
    
  • Window$, Redhat EL3, Mac OS X (soon)
    Enthought Python -> http://www.enthought.com
  • Mac OS X: Fink

    fink install scipy-core-py25 scipy-py25 matplotlib-py25 ipython-py25
    
echo "alias pylab=\"ipython -pylab\"" >> \
      ~/.bash_profile && source ~/.bash_profile

ipython

emuller@markov ~/theway $ pylab

Python 2.5.2 (r252:60911, Jun  2 2008, 15:58:57)
Type "copyright", "credits" or "license" for more information.

IPython 0.8.2 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object'. ?object also works, ?? prints more.

  Welcome to pylab, a matplotlib-based Python environment.
  For more information, type 'help(pylab)'.

In [1]:
  • interactive demo:

    cd,pwd,ls,magic,?,??,reset,who,debug,pdb,logstart,logon,logoff,p
    

numpy: overview

  • Implements an array-type with MATLAB-like syntax
  • n-dimensional, unlike MATLAB
  • supports many types: bool,int,float,object; unlike MATLAB
  • Near C performance for large arrays
  • Obsoletes older packages: Numeric, Numarray
  • For our intro we will assume no MATLAB experience
  • Fee based documentation - please get your lab to buy a copy! (20-40$)

numpy: importing

  • Importing:

    import numpy
    
  • In general don't from X import *. It ruins readability.

  • But numpy is sometimes an exception to this rule ...

  • pylab does this for you:

    from numpy import *
    
  • I will assume a pylab environment for what follows

numpy: creating an array

  • from a sequence

    a = array([[1.0,2,3],[4,5,6]])
    
  • 2x2x2 of zeros,[ones]

    a = zeros((2,2,2),float)
    
  • of regularly spaced numbers

    a = arange(0,10,0.1)
    

numpy: slicing

a = zeros((2,3,4),float)

a[0,0,1:-1] = [2.1,3.1]
a[:,2,3:1:-1] = [[0.1, 0.2],[0.6,0.5]]

a
Out[]: array([[[ 0. ,  2.1,  3.1,  0. ],
               [ 0. ,  0. ,  0. ,  0. ],
               [ 0. ,  0. ,  0.2,  0.1]],

              [[ 0. ,  0. ,  0. ,  0. ],
               [ 0. ,  0. ,  0. ,  0. ],
               [0. ,  0. ,  0.5,  0.6]]])

numpy: element-wise operations

a = arange(0,10,0.1)
b = a*a
plot(a,b,'b-')
c = 1/a[1:]
plot(a[1:],c,'r--')
alltrue(2*a == a+a)
# True!

numpy: shaping

a = 8*ones((2,3,4),float)
print a.shape
b = 9*ones(a.shape,float)
c = a*b
alltrue(c[:]==8*9)   # True!
b = a.reshape(6,4)
b[-2,-1] = 1.0
print b
array([[ 8.,  8.,  8.,  8.],
       [ 8.,  8.,  8.,  8.],
       [ 8.,  8.,  8.,  8.],
       [ 8.,  8.,  8.,  8.],
       [ 8.,  8.,  8.,  1.],
       [ 8.,  8.,  8.,  8.]])
print b.flat
<numpy.flatiter object at 0x8c9cbd0>
print b.flat[:]
[ 8.  8.  8.  8.  8.  8.  8.  8.  8.  8.  8.  8.  8.  8.  8.  8.  8.  8.
8.  1.  8.  8.  8.  8.]
alltrue(b.flatten() == b.reshape(6*4))   # True!

numpy: resizing, appending

a = array([1,2])
b = resize(a,(2,3))
print b
array([[1, 2, 1],
       [2, 1, 2]])
c = array([3,4])
d = c[newaxis].T
alltrue(d == c.reshape(2,1))  # True!
alltrue(d == transpose(c[newaxis]))  # True!
e = concatenate((b,d),axis=1)
print c[newaxis]
array([[3, 4]])
print d
array([[1],
       [2]])
print e
array([[1, 2, 1, 3],
       [2, 1, 2, 4]])

numpy: more appending

>>> a = array([10,20,30,40])
>>> append(a,50)
array([10, 20, 30, 40, 50])
>>> append(a,[50,60])
array([10, 20, 30, 40, 50, 60])
>>> a = array([[10,20,30],[40,50,60],[70,80,90]])
>>> append(a,[[15,15,15]],axis=0)
array([[10, 20, 30],
[40, 50, 60],
[70, 80, 90],
[15, 15, 15]])
>>> append(a,[[15],[15],[15]],axis=1)
array([[10, 20, 30, 15],
[40, 50, 60, 15],
[70, 80, 90, 15]])

numpy: random number generation

#import numpy.random as random
# in pylab: from numpy.random import *
isi = normal(loc=10,scale=2,size=1000)
mean(isi)
Out[]: 10.0382244096
std(isi)
Out[]: 1.96909053812
gamma, exponential, poisson, uniform, multivariate_normal,
randint, ...  # Have a look: dir(random)
a = arange(0,10,0.1)
seed(8)
b = permutation(a)
seed(8)
idx = permutation(100)
alltrue(b == a[idx])  # True!

numpy: indexing

isi = exponential(0.1,size=1000)
st = add.accumulate(isi)
st[searchsorted(st,10)]
Out[]: 10.0875751149
st[0:5]
Out[]: array([ 0.14864697,  0.15272492,
               0.20884776,  0.24667073,  0.34786589])
st[ [1,4] ]
Out[]: array([ 0.15272492,  0.34786589])
st[ [1,4] ][1]
Out[]: 0.347865887059
st_cut = st[st>0.500]
# equiv:
#   st_cut = compress(st>0.500,st)
# equiv:
#   st_cut = st[nonzero(st>0.500)]
st_cut[0:2]
Out[]: array([ 0.60674483,  0.75081591])
a = [-1, 1, -2, 2]
alltrue(absolute(a) == where(greater(a,0),a,negative(a)) )  # ufunc!

numpy: min, max, sorting ...

a = array([1,2.1,2.0,5.1,6.1])
clip(a)
Out[]: array([ 2. ,  2.1,  2. ,  5.1,  6. ])
min(a),max(a), median(a), sort(a)
Out[]: (1, 6.1, 2.1, array([ 1. ,  2. ,  2.1,  5.1,  6.1]))
argmin(a),argmax(a), argsort(a)
Out[]: (0, 4, array([0, 2, 1, 3, 4]))
var(a),std(a),mean(a)
Out[]: (3.8984, 1.97443662851, 3.26)
unique([5,2,4,0,4,4,2,2,1])
Out[]: array([0, 1, 2, 4, 5])

numpy: fft

x = arange(-100,100,0.01)
y = zeros_like(x)
# A box of height 1
y[(x>-0.5)&(x<0.5)] = 1.0

figure()
plot(x,y)
axis([-1,1,-0.5,2])
z = fft(y)
f = fftfreq(len(y),d=0.01)

figure()
plot(f,abs(z))
axis([-5,5,0,100])
img/fft1.png img/fft2.png

numpy: complex numbers

x = arange(0,2*pi,2*pi/10)
a = zeros(x.shape,complex)
a.real = cos(x)
a.imag = sin(x)
abs(a)
Out[]: array([ 1.,  1.,  1.,  1.,  1.,...])
arg = arctan2(a.imag,a.real)
arg
Out[]: array([ 0.        ,  0.62831853,  1.25663706,  1.88495559,  2.51327412,
      3.14159265, -2.51327412, -1.88495559, -1.25663706, -0.62831853])
sqrt(-1)
Warning: invalid value encountered in sqrt
Out[17]: nan

from scipy import sqrt
x = 1+sqrt(-1)
x == 1+1j  # True!
conj(x)
Out[]: (1-1j)

scipy

scipy.special:
exotic special functions: erfc, legendre, wofz, beta, hermite, etc.
scipy.linalg:
eig, eigvals, inv, solve, lstsq, lu, ... have a look with ?
scipy.interpolate:
easy class-based 1d, 2d interpolation, FITPACK wrappers
scipy.weave:
inline C!
scipy.optimize:
A collection of general-purpose optimization routines. anneal, fsolve,
scipy.sparse:
2D sparse matrix module.

more scipy

scipy.cluster:
Vector Quantization / Kmeans
scipy.signal:
Signal Processing: correlate, convolve, resample, filter design, wavelets Waveforms/Windows: sawtooth, gausspulse, chirp
scipy.stats:
probability distributions and statistical functions.
scipy.integrate:
ode solvers, numerical integration
scipy.ndimage:
n-dimensional image package

...

scipy.weave

For when C is your only friend

import scipy.weave as weave

def f_blitz(a,b,c):

    code = r"""

    for(int i=0;i<Na[0];i++) {
      c(i) = a(i)*b(i);
    }

    """

    weave.inline(code,['a','b','c'], type_converters=weave.converters.blitz)

 a = 2*ones(10)
 b = arange(0,10)
 c = zeros(10)

 <weave: compiling>
 creating /tmp/emuller/python25_intermediate/compiler_6148187adcbc8be6683ba9eef0ee9abd

 c
 Out[]: array([  0.,   2.,   4., ...])

pytables

High-level interface to Heirarchical Data Format 5 (HDF5)

apt-get install python-tables
HDF5:
  • Scientific data storage format (newer netcdf)
  • array/table container + metadata as node
  • nodes organized in tree structure
  • high-performence I/O
  • on-the-fly de/compression
  • Widely used and supported
  • MATLAB reads/writes hdf5

Note: use pytables>2.0

pytables

Basic usage

import tables,numpy.random
h5 = tables.openFile('test.h5','w')
h5.createGroup(h5.root,'mygroup')
h5.createArray('/mygroup','isi1',numpy.random.exponential(size=10000))
h5.createArray('/mygroup','isi2',numpy.random.exponential(size=10000))
h5.close()
h5 = tables.openFile('test.h5')
a = h5.root.mygroup.isi1.read()
mean(a)

But much, much more: http://pytables.org

matplotlib

MATLAB style plotting ...

isi = random.exponential(0.1,1000)
#h1,bins = histogram(isi)

bins = arange(0,1.0,0.01)

h1,null = histogram(isi,bins)

p = h1.astype(float)/1000/0.01  # or cast[float](h1)
plot(bins,p,ls='steps')
plot(bins,1/0.1*exp(-bins/0.1),'r--',lw=2)
# mathtext - Pseudo LaTeX
xlabel(r'$\lambda_{\rm{q^\ast}}$ [$\mu$S]',size=20)
Real LaTeX rendering:
  1. set test.usetex = True in your matplotlibrc
  2. read http://www.scipy.org/Wiki/Cookbook/Matplotlib/UsingTex

See http://matplotlib.sourceforge.net

matplotlib (figure management)

Like MATLAB

figure()
plot(arange(0,10))
# another figure
figure()
plot(arange(10,0,-1))
#back to the other one
figure(1)
# clear the figure
clf()
# close the figure
close()

curve fitting

from scipy.optimize import leastsq

# generate data
from numpy import random,histogram,arange,sqrt,exp,nonzero

n = 1000; isi = random.exponential(0.1,size=n)

db = 0.01; bins = arange(0,1.0,db)
h = histogram(isi,bins)[0]

p = h.astype(float)/n/db

# function to be fit
# x - independent variable
# p - tuple of parameters
fitfunc = lambda p, x: exp(-x/p[0])/p[0]
# standard form, here err is absolute error
errfunc = lambda p, x, y, err: (y - fitfunc(p, x)) / err

curve fitting

# initial values for fit parameters
pinit = array([0.2])

# hist count less than 4 has poor estimate of the weight
# don't use in the fitting process
idx = nonzero(h>4)

out = leastsq(errfunc, pinit,\
              args=(bins[idx]+0.01/2, p[idx], \
              p[idx]/sqrt(h[idx])),
              full_output = 1)


l1 = 'data'

errorbar(bins[idx],p[idx],\
         yerr=p[idx]/sqrt(h[idx]),fmt='ko',label=l1)

l2 = r"""$\frac{1}{\lambda} exp(-x/\lambda)$,
$\lambda = %.3f \pm %.3f$""" % (out[0], sqrt(out[1]))

plot(bins,fitfunc((out[0],),bins),'r--',lw=2,label=l2)

legend()

MayaVI2

How to plot interactively with MayaVI

$ ipython -wthread
import scipy
from scipy import sqrt

# some interesting function:
def f(x, y):
    return scipy.cos(sqrt(x**2+y**2)*2)

x = scipy.arange(-7., 7.05, 0.1)
y = scipy.arange(-5., 5.05, 0.1)

# 3D visualization of f:
from enthought.tvtk.tools import mlab
fig = mlab.figure()

s = mlab.SurfRegular(x, y, f)
fig.add(s)

#fig.renwin.isometric_view()
#fig.renview.camera.zoom(0.1)
#fig.renwin.set_size((400,400))
#fig.renview.render()
#fig.pop()

# movie!

Lots of other tools!

Ipython1, mpi4py, pygsl, rpy (R), NeuroTools, ScientificPython

Modular toolkit for Data Processing (MDP) is a Python data processing framework. Implemented algorithms include: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Slow Feature Analysis (SFA), Independent Slow Feature Analysis (ISFA), Growing Neural Gas (GNG), Factor Analysis, Fisher Discriminant Analysis (FDA), Gaussian Classifiers, and Restricted Boltzmann Machines. Read the full list.

http://mdp-toolkit.sourceforge.net/

... and much more!

Attachments