jeudi 30 juin 2011

calling shell commands from within python

That's quite easy to interact with the shell from within python, it only needs to import the subprocess module.
If the matter is just executing a command, use the call method:

>>>>subprocess.call('ls')
Applications    SharedLin       Volumes         etc             mach_kernel     sbin            var
Developer       SharedXP        bin             home            net             sw
Library         System          cores           iraf            opt             tmp
Network         Users           dev             lost+found      private         usr


If some arguments need to be used, you cannot put them in the same string as the command, need to use:

>>>>subprocess.call(['ls','-l'])
total 40733
drwxrwxr-x+ 134 root                admin      4556 28 jui 00:21 Applications
drwxrwxr-x@  18 root                admin       612 18 mar 20:03 Developer

[...]

or you'll need to execute the command in a shell:
>>>>subprocess.call('ls -l', shell=True)

If one want to deal with the output, use the Popen and communicate methods :

>>>>ls = subprocess.Popen(["ls"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
>>>>stdout, stderr = ls.communicate()

You can access the distinct values of the stdout by splitting them:
stdout.split()

Now, one can need to execute the command with a givenfile as input and redirect the output to another file.
That's possible using the Popen method:

input_file = open('model.in','r')
output_file = open('model.out','wb')
p = subprocess.Popen('cloudy.exe', stdout=output_file, stdin = input_file)
input_file.close()
output_file.close()
That's it!

References:
http://docs.python.org/library/subprocess.html

http://jimmyg.org/blog/2009/working-with-python-subprocess.html

jeudi 23 juin 2011

.r and %run and Debugging

From IDL, it's easy to run a program at main level and stay after the execution with all the variables as they are defined by the program.
In (i)python, if you interactively import myprog it will execute it, but let you without any access to what has been set within the program.
There is a way to do the equivalent of .r in IDL, but you needs to run ipython.

ipython:
%run myprog
..... executing the myprog.py and give you the prompt back so you can explore the variables and objects to see if they are as they are supposed to be...

The %run (the % is needed, it's not a prompt) function is recompiling the code each time it is used, so changes are taken into account (it's not the case with import).

Nice tip for developpers...

BTW, there is also a debugger mode : from ipython, just type %pdb and then import or %run what you want. If something wrong happens, you will be let where it happened, with access to all the variables... like in IDL ;-) You have this activated by default using ipython -pdb.

More and more important informations  on ipython here (I think I must read these pages every week for the next 3 months!):
http://ipython.org/documentation.html

lundi 6 juin 2011

Broadcasting: adding arrays of different shapes

Something very different between python-numpy and IDL is the way they are both dealing with linear algebra in case of operations on different size or shape arrays.

In IDL, trying to operate on 2 vectors with different size reduces the oprtation to the lowest size:
IDL> a=[1,2,3]
IDL> b=[10,20,30,40]
IDL> print,a+b
       11       22       33

As we can see, the latest element is purely omitted.
IDL> c=[[1,2,3,4]]
IDL> d=[[10],[20],[30],[40]]
IDL> print,c
       1       2       3       4
IDL> print,d
      10
      20
      30
      40

The shape of the first operand is conserved:
IDL> print,c+d
      11      22      33      44
IDL> print,d+c
      11
      22
      33
      44

And we can loose part of the vector:
IDL> print,a+c
       2       4       6
IDL> print,a+d
      11      22      33



On the contrary, Python-numpy is adding some information, this is the broadcasting.

import numpy as np
a=np.array([1,2,3.])
b=np.array([10,20,30.,40])

a*b
# Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
#ValueError: shape mismatch: objects cannot be broadcast to a single shape



BUT:

a = a.reshape(3,1)
a
#array([[ 1.],

#      [ 2.],
#       [ 3.]])

b
#array([ 10.,  20.,  30.,  40.])


a*b
#array([[  10.,   20.,   30.,   40.],
#       [  20.,   40.,   60.,   80.],
#       [  30.,   60.,   90.,  120.]])


Another example where Python is guessing in which dimension the array must be extended:
c= a*b
c.shape
# (3, 4)
c*a
#array([[  10.,   20.,   30.,   40.],
#       [  40.,   80.,  120.,  160.],
#       [  90.,  180.,  270.,  360.]])
c*b
#array([[  100.,   400.,   900.,  1600.],
#       [  200.,   800.,  1800.,  3200.],
#       [  300.,  1200.,  2700.,  4800.]])
b2==np.array([10,20,30.])
#array([False, False, False], dtype=bool)
d=a*b2
d.shape
#(3, 3)
d
#array([[ 1.,  2.,  3.],
#       [ 2.,  4.,  6.],
3       [ 3.,  6.,  9.]])
d*a
#array([[  1.,   2.,   3.],

#       [  4.,   8.,  12.],
#       [  9.,  18.,  27.]])
d*b2
#array([[  1.,   4.,   9.],
#       [  2.,   8.,  18.],
#       [  3.,  12.,  27.]])


More details and nice figures in:

http://www.scipy.org/EricsBroadcastingDoc

Structure-like object (2): record arrays, access and views

Some complement to the previous message on structured arrays and record arrays:

import numpy as np
a = np.zeros((10,),dtype=[('name', str), ('ra', float), ('dec', float)])
a['ra'] = np.random.random_sample(10)*360
a['dec'] = np.random.random_sample(10)*180-90
tt = ((a['ra'] > 5.) & (abs(a['dec']) < 10.))
b = a[tt]
a.size
#10
tt.size
#10
tt.sum()
#1
b.size
#1


If no names age given to the different tags, it is set by default to f0,f1,...fN.
One can access the data without knowing the names of the tags:
a[a.dtype.names[2]]
is more complicated that IDL>a.(2), but at least it's possible ;-) And it as some advantage: you can change the names:
a.dtype.names = ('obs','ra','dec')

Be careful with subset, they are views:
b=a[1]
a['dec'][1]
# 0.0
b['dec']
# 0.0
b['dec'] = 2
a['dec'][1]
# 2.0
But it's not so easy to see it:
b['dec'] is a['dec'][1]
# False

Now we can turn the structured array a into a recarray:

a2 = a.view(np.recarray)
This add a new access mode for the data:
>>> a2.dec
array([ 32.61106119,  82.72958898, -18.46190884,  44.79729473,
       -54.65838972, -23.78818937,   3.56472044, -79.63061338,
        15.81108779,  73.37221597])

Be careful, this is a view, it means that the data are NOT duplicated, they are the same:
>>> a2.dec[1] = 2
>>>
>>> a2.dec[1]
2.0
>>>
>>> a[1]['dec']
2.0
>>>
>>> a['dec'][1]
2.0


This can slow down the access to a2 AND to a !!! So not so useful, or for small tables.

Refs:
http://docs.scipy.org/doc/numpy/user/basics.rec.html
http://www.scipy.org/Cookbook/Recarray

dimanche 5 juin 2011

Structure-like object

I really like the structure variable in IDL. Especially the arrays of structure. It allows to search for elements using the where function and to extract a sub-structure matching a give condition.
I mean, if I have an dataset like this:
IDL> a = replicate({name:'',ra:0.0,dec:0.0},1000)
I can search for all the elements matching for exemple:
IDL> tt = where(a.ra gt 10. and abs(a.dec) gt 5.)
IDL> b = a[tt]


The same (more or less) can be made using numpy (imported as np):

a = np.zeros((1000,),dtype=[('name', str), ('ra', float), ('dec', float)])
tt = ((a['ra'] > 5.) & (abs(a['dec']) < 10.))
b = a[tt]

I don't really know if this is the best way. And also don't know how to access for example the second tag without naming it, like in IDL a.(1)...

I can also create an object:
class obs(object):
    def __init__(self,name='',ra=0.,dec=0.):
        self.name=name
        self.ra=ra
        self.dec=dec
And even define an array of objects:
colec = np.empty( (3,3), dtype=object)
And then put the objects in the colec:
colec[:,:] = obs()
BUT this will create a collection of 9 times the same object!!!
colec[0,0].ra = 5.5
colec[1,1].ra
>>>5.5
Some loop needed here. But the worst is that one will loose all the power of linear algebra from numpy.
So stay with the first approach for now.

Row- or Column major arrays and loops order.

After a too long period of silence, I'm coming back to learn Python. I just want to point out a little difference between IDL and Python in the order arrays are stored in the computer memory.
There is two ways arrays can be stored: row- or column major. It has a direct impact on the way one has to loop on the arrays. IDL is like Fortran (column major) and Python is like C (row major). It means that in Python, as you move linearly through the memory of an array, the second dimension (rigthmost) changes the fastest, while in IDL the first (leftmost) dimension changes the fastest.
Consequence on the loop order:

for i in range(0,5):
 for j in range(0,4):
              ... a[i,j] ...