I'm not sure I'll follow with this blog...
I'm now involved in teaching Python to astronomers colleagues... and I set up another one, where I share the course I'm giving. If you are interested, follow it THERE.
From IDL to Python
The blog of a long standing IDL user who tries to learn Python.
lundi 27 février 2012
jeudi 30 juin 2011
calling shell commands from within python
That's quite easy to interact with the shell from within python, it only needs to import the subprocess module.
If the matter is just executing a command, use the call method:
>>>>subprocess.call('ls')
Applications SharedLin Volumes etc mach_kernel sbin var
Developer SharedXP bin home net sw
Library System cores iraf opt tmp
Network Users dev lost+found private usr
If some arguments need to be used, you cannot put them in the same string as the command, need to use:
>>>>subprocess.call(['ls','-l'])
total 40733
drwxrwxr-x+ 134 root admin 4556 28 jui 00:21 Applications
drwxrwxr-x@ 18 root admin 612 18 mar 20:03 Developer
[...]
or you'll need to execute the command in a shell:
>>>>subprocess.call('ls -l', shell=True)
If one want to deal with the output, use the Popen and communicate methods :
>>>>ls = subprocess.Popen(["ls"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
>>>>stdout, stderr = ls.communicate()
You can access the distinct values of the stdout by splitting them:
stdout.split()
Now, one can need to execute the command with a givenfile as input and redirect the output to another file.
That's possible using the Popen method:
input_file = open('model.in','r')
output_file = open('model.out','wb')
p = subprocess.Popen('cloudy.exe', stdout=output_file, stdin = input_file)
input_file.close()
output_file.close()
That's it!
References:
http://docs.python.org/library/subprocess.html
http://jimmyg.org/blog/2009/working-with-python-subprocess.html
If the matter is just executing a command, use the call method:
>>>>subprocess.call('ls')
Applications SharedLin Volumes etc mach_kernel sbin var
Developer SharedXP bin home net sw
Library System cores iraf opt tmp
Network Users dev lost+found private usr
If some arguments need to be used, you cannot put them in the same string as the command, need to use:
>>>>subprocess.call(['ls','-l'])
total 40733
drwxrwxr-x+ 134 root admin 4556 28 jui 00:21 Applications
drwxrwxr-x@ 18 root admin 612 18 mar 20:03 Developer
[...]
or you'll need to execute the command in a shell:
>>>>subprocess.call('ls -l', shell=True)
If one want to deal with the output, use the Popen and communicate methods :
>>>>ls = subprocess.Popen(["ls"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
>>>>stdout, stderr = ls.communicate()
You can access the distinct values of the stdout by splitting them:
stdout.split()
Now, one can need to execute the command with a givenfile as input and redirect the output to another file.
That's possible using the Popen method:
input_file = open('model.in','r')
output_file = open('model.out','wb')
p = subprocess.Popen('cloudy.exe', stdout=output_file, stdin = input_file)
input_file.close()
output_file.close()
That's it!
References:
http://docs.python.org/library/subprocess.html
http://jimmyg.org/blog/2009/working-with-python-subprocess.html
jeudi 23 juin 2011
.r and %run and Debugging
From IDL, it's easy to run a program at main level and stay after the execution with all the variables as they are defined by the program.
In (i)python, if you interactively import myprog it will execute it, but let you without any access to what has been set within the program.
There is a way to do the equivalent of .r in IDL, but you needs to run ipython.
ipython:
%run myprog
..... executing the myprog.py and give you the prompt back so you can explore the variables and objects to see if they are as they are supposed to be...
The %run (the % is needed, it's not a prompt) function is recompiling the code each time it is used, so changes are taken into account (it's not the case with import).
Nice tip for developpers...
BTW, there is also a debugger mode : from ipython, just type %pdb and then import or %run what you want. If something wrong happens, you will be let where it happened, with access to all the variables... like in IDL ;-) You have this activated by default using ipython -pdb.
More and more important informations on ipython here (I think I must read these pages every week for the next 3 months!):
http://ipython.org/documentation.html
In (i)python, if you interactively import myprog it will execute it, but let you without any access to what has been set within the program.
There is a way to do the equivalent of .r in IDL, but you needs to run ipython.
ipython:
%run myprog
..... executing the myprog.py and give you the prompt back so you can explore the variables and objects to see if they are as they are supposed to be...
The %run (the % is needed, it's not a prompt) function is recompiling the code each time it is used, so changes are taken into account (it's not the case with import).
Nice tip for developpers...
BTW, there is also a debugger mode : from ipython, just type %pdb and then import or %run what you want. If something wrong happens, you will be let where it happened, with access to all the variables... like in IDL ;-) You have this activated by default using ipython -pdb.
More and more important informations on ipython here (I think I must read these pages every week for the next 3 months!):
http://ipython.org/documentation.html
lundi 6 juin 2011
Broadcasting: adding arrays of different shapes
Something very different between python-numpy and IDL is the way they are both dealing with linear algebra in case of operations on different size or shape arrays.
In IDL, trying to operate on 2 vectors with different size reduces the oprtation to the lowest size:
IDL> a=[1,2,3]
IDL> b=[10,20,30,40]
IDL> print,a+b
11 22 33
As we can see, the latest element is purely omitted.
IDL> c=[[1,2,3,4]]
IDL> d=[[10],[20],[30],[40]]
IDL> print,c
1 2 3 4
IDL> print,d
10
20
30
40
The shape of the first operand is conserved:
IDL> print,c+d
11 22 33 44
IDL> print,d+c
11
22
33
44
And we can loose part of the vector:
IDL> print,a+c
2 4 6
IDL> print,a+d
11 22 33
On the contrary, Python-numpy is adding some information, this is the broadcasting.
import numpy as np
a=np.array([1,2,3.])
b=np.array([10,20,30.,40])
a*b
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
#ValueError: shape mismatch: objects cannot be broadcast to a single shape
BUT:
a = a.reshape(3,1)
a
#array([[ 1.],
# [ 2.],
# [ 3.]])
b
#array([ 10., 20., 30., 40.])
a*b
#array([[ 10., 20., 30., 40.],
# [ 20., 40., 60., 80.],
# [ 30., 60., 90., 120.]])
Another example where Python is guessing in which dimension the array must be extended:
c= a*b
c.shape
# (3, 4)
c*a
#array([[ 10., 20., 30., 40.],
# [ 40., 80., 120., 160.],
# [ 90., 180., 270., 360.]])
c*b
#array([[ 100., 400., 900., 1600.],
# [ 200., 800., 1800., 3200.],
# [ 300., 1200., 2700., 4800.]])
b2==np.array([10,20,30.])
#array([False, False, False], dtype=bool)
d=a*b2
d.shape
#(3, 3)
d
#array([[ 1., 2., 3.],
# [ 2., 4., 6.],
3 [ 3., 6., 9.]])
d*a
#array([[ 1., 2., 3.],
# [ 4., 8., 12.],
# [ 9., 18., 27.]])
d*b2
#array([[ 1., 4., 9.],
# [ 2., 8., 18.],
# [ 3., 12., 27.]])
More details and nice figures in:
http://www.scipy.org/EricsBroadcastingDoc
In IDL, trying to operate on 2 vectors with different size reduces the oprtation to the lowest size:
IDL> a=[1,2,3]
IDL> b=[10,20,30,40]
IDL> print,a+b
11 22 33
As we can see, the latest element is purely omitted.
IDL> c=[[1,2,3,4]]
IDL> d=[[10],[20],[30],[40]]
IDL> print,c
1 2 3 4
IDL> print,d
10
20
30
40
The shape of the first operand is conserved:
IDL> print,c+d
11 22 33 44
IDL> print,d+c
11
22
33
44
And we can loose part of the vector:
IDL> print,a+c
2 4 6
IDL> print,a+d
11 22 33
On the contrary, Python-numpy is adding some information, this is the broadcasting.
import numpy as np
a=np.array([1,2,3.])
b=np.array([10,20,30.,40])
a*b
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
#ValueError: shape mismatch: objects cannot be broadcast to a single shape
BUT:
a = a.reshape(3,1)
a
#array([[ 1.],
# [ 2.],
# [ 3.]])
b
#array([ 10., 20., 30., 40.])
a*b
#array([[ 10., 20., 30., 40.],
# [ 20., 40., 60., 80.],
# [ 30., 60., 90., 120.]])
Another example where Python is guessing in which dimension the array must be extended:
c= a*b
c.shape
# (3, 4)
c*a
#array([[ 10., 20., 30., 40.],
# [ 40., 80., 120., 160.],
# [ 90., 180., 270., 360.]])
c*b
#array([[ 100., 400., 900., 1600.],
# [ 200., 800., 1800., 3200.],
# [ 300., 1200., 2700., 4800.]])
b2==np.array([10,20,30.])
#array([False, False, False], dtype=bool)
d=a*b2
d.shape
#(3, 3)
d
#array([[ 1., 2., 3.],
# [ 2., 4., 6.],
3 [ 3., 6., 9.]])
d*a
#array([[ 1., 2., 3.],
# [ 4., 8., 12.],
# [ 9., 18., 27.]])
d*b2
#array([[ 1., 4., 9.],
# [ 2., 8., 18.],
# [ 3., 12., 27.]])
More details and nice figures in:
http://www.scipy.org/EricsBroadcastingDoc
Structure-like object (2): record arrays, access and views
Some complement to the previous message on structured arrays and record arrays:
import numpy as np
a = np.zeros((10,),dtype=[('name', str), ('ra', float), ('dec', float)])
a['ra'] = np.random.random_sample(10)*360
a['dec'] = np.random.random_sample(10)*180-90
tt = ((a['ra'] > 5.) & (abs(a['dec']) < 10.))
b = a[tt]
a.size
#10
tt.size
#10
tt.sum()
#1
b.size
#1
If no names age given to the different tags, it is set by default to f0,f1,...fN.
One can access the data without knowing the names of the tags:
a[a.dtype.names[2]]
is more complicated that IDL>a.(2), but at least it's possible ;-) And it as some advantage: you can change the names:
a.dtype.names = ('obs','ra','dec')
Be careful with subset, they are views:
b=a[1]
a['dec'][1]
# 0.0
b['dec']
# 0.0
b['dec'] = 2
a['dec'][1]
# 2.0
But it's not so easy to see it:
b['dec'] is a['dec'][1]
# False
Now we can turn the structured array a into a recarray:
>>> a2.dec
array([ 32.61106119, 82.72958898, -18.46190884, 44.79729473,
-54.65838972, -23.78818937, 3.56472044, -79.63061338,
15.81108779, 73.37221597])
Be careful, this is a view, it means that the data are NOT duplicated, they are the same:
>>> a2.dec[1] = 2
>>>
>>> a2.dec[1]
2.0
>>>
>>> a[1]['dec']
2.0
>>>
>>> a['dec'][1]
2.0
This can slow down the access to a2 AND to a !!! So not so useful, or for small tables.
Refs:
http://docs.scipy.org/doc/numpy/user/basics.rec.html
http://www.scipy.org/Cookbook/Recarray
import numpy as np
a = np.zeros((10,),dtype=[('name', str), ('ra', float), ('dec', float)])
a['ra'] = np.random.random_sample(10)*360
a['dec'] = np.random.random_sample(10)*180-90
tt = ((a['ra'] > 5.) & (abs(a['dec']) < 10.))
b = a[tt]
a.size
#10
tt.size
#10
tt.sum()
#1
b.size
#1
If no names age given to the different tags, it is set by default to f0,f1,...fN.
One can access the data without knowing the names of the tags:
a[a.dtype.names[2]]
is more complicated that IDL>a.(2), but at least it's possible ;-) And it as some advantage: you can change the names:
a.dtype.names = ('obs','ra','dec')
Be careful with subset, they are views:
b=a[1]
a['dec'][1]
# 0.0
b['dec']
# 0.0
b['dec'] = 2
a['dec'][1]
# 2.0
But it's not so easy to see it:
b['dec'] is a['dec'][1]
# False
Now we can turn the structured array a into a recarray:
a2 = a.view(np.recarray)
This add a new access mode for the data:>>> a2.dec
array([ 32.61106119, 82.72958898, -18.46190884, 44.79729473,
-54.65838972, -23.78818937, 3.56472044, -79.63061338,
15.81108779, 73.37221597])
Be careful, this is a view, it means that the data are NOT duplicated, they are the same:
>>> a2.dec[1] = 2
>>>
>>> a2.dec[1]
2.0
>>>
>>> a[1]['dec']
2.0
>>>
>>> a['dec'][1]
2.0
This can slow down the access to a2 AND to a !!! So not so useful, or for small tables.
Refs:
http://docs.scipy.org/doc/numpy/user/basics.rec.html
http://www.scipy.org/Cookbook/Recarray
dimanche 5 juin 2011
Structure-like object
I really like the structure variable in IDL. Especially the arrays of structure. It allows to search for elements using the where function and to extract a sub-structure matching a give condition.
I mean, if I have an dataset like this:
IDL> a = replicate({name:'',ra:0.0,dec:0.0},1000)
I can search for all the elements matching for exemple:
IDL> tt = where(a.ra gt 10. and abs(a.dec) gt 5.)
IDL> b = a[tt]
The same (more or less) can be made using numpy (imported as np):
a = np.zeros((1000,),dtype=[('name', str), ('ra', float), ('dec', float)])
tt = ((a['ra'] > 5.) & (abs(a['dec']) < 10.))
b = a[tt]
I don't really know if this is the best way. And also don't know how to access for example the second tag without naming it, like in IDL a.(1)...
I can also create an object:
class obs(object):
def __init__(self,name='',ra=0.,dec=0.):
self.name=name
self.ra=ra
self.dec=decAnd even define an array of objects:
colec = np.empty( (3,3), dtype=object)
And then put the objects in the colec:
colec[:,:] = obs()
BUT this will create a collection of 9 times the same object!!!
colec[0,0].ra = 5.5
colec[1,1].ra
>>>5.5
Some loop needed here. But the worst is that one will loose all the power of linear algebra from numpy.
So stay with the first approach for now.
I mean, if I have an dataset like this:
IDL> a = replicate({name:'',ra:0.0,dec:0.0},1000)
I can search for all the elements matching for exemple:
IDL> tt = where(a.ra gt 10. and abs(a.dec) gt 5.)
IDL> b = a[tt]
The same (more or less) can be made using numpy (imported as np):
a = np.zeros((1000,),dtype=[('name', str), ('ra', float), ('dec', float)])
tt = ((a['ra'] > 5.) & (abs(a['dec']) < 10.))
b = a[tt]
I don't really know if this is the best way. And also don't know how to access for example the second tag without naming it, like in IDL a.(1)...
I can also create an object:
class obs(object):
def __init__(self,name='',ra=0.,dec=0.):
self.name=name
self.ra=ra
self.dec=decAnd even define an array of objects:
colec = np.empty( (3,3), dtype=object)
And then put the objects in the colec:
colec[:,:] = obs()
BUT this will create a collection of 9 times the same object!!!
colec[0,0].ra = 5.5
colec[1,1].ra
>>>5.5
Some loop needed here. But the worst is that one will loose all the power of linear algebra from numpy.
So stay with the first approach for now.
Row- or Column major arrays and loops order.
After a too long period of silence, I'm coming back to learn Python. I just want to point out a little difference between IDL and Python in the order arrays are stored in the computer memory.
There is two ways arrays can be stored: row- or column major. It has a direct impact on the way one has to loop on the arrays. IDL is like Fortran (column major) and Python is like C (row major). It means that in Python, as you move linearly through the memory of an array, the second dimension (rigthmost) changes the fastest, while in IDL the first (leftmost) dimension changes the fastest.
Consequence on the loop order:
There is two ways arrays can be stored: row- or column major. It has a direct impact on the way one has to loop on the arrays. IDL is like Fortran (column major) and Python is like C (row major). It means that in Python, as you move linearly through the memory of an array, the second dimension (rigthmost) changes the fastest, while in IDL the first (leftmost) dimension changes the fastest.
Consequence on the loop order:
for i in range(0,5): for j in range(0,4):
... a[i,j] ...
Inscription à :
Articles (Atom)