mercredi 29 septembre 2010

Arrays and types: the rec.array type

Here are some example of unexpected behaviors, when coming from IDL:
In [4]: a
Out[4]: [1.0, 2.0, 3] #a is a list
In [5]: b
Out[5]: array([ 0.84147098,  0.90929743,  0.14112001])
In [6]: 2*b
Out[6]: array([ 1.68294197,  1.81859485,  0.28224002])
In [7]: 2*a
Out[7]: [1.0, 2.0, 3, 1.0, 2.0, 3] #this double the list, not the elements!!!
In [10]: c=array([1,2.,3])
In [11]: 2*c
Out[11]: array([ 2.,  4.,  6.]) #now it's OK
In [12]: d=[1,2.,'3'] #mixing int, float and string
In [13]: e=array(d)
In [14]: e
Out[14]: array(['1', '2.0', '3'], dtype='|S8')
#converted to a single type: string

As you can see, the [] are not always defining arrays, sometime it needs explicitly the function array().
A list can contain different types, not an array.

Let's see now how the csv-reader can deal with different types in the same line (as when using structures in IDL):
The file to read is:
int  flt  str  int
1 1 "test1" 1
2 2.3 "tralala" 2
5 3.14 "" 3
6 1e3 "double " 4
7 1e79 "big one" 5


The reading process:
rec=csv2rec('test1.csv',delimiter=" ")
And the type is a rec.array, with mixing types:

In [43]: rec
Out[43]:
rec.array([(1, 1.0, 'test1', 1), (2, 2.2999999999999998, 'tralala', 2),
       (5, 3.1400000000000001, '', 3), (6, 1000.0, 'double ', 4),
       (7, 9.9999999999999997e+78, 'big one', 5)],
      dtype=[('int', '<i8'), ('flt', '<f8'), ('str', '|S7'), ('int_1', '<i8')])


Note that no problems with the value of 1 for the 2nd elements in the first data row: it's an integer, but as the element of the same column in the 2nd data row is 2.3, it is not wrongly considered that the type of this row is integer (contrary to IDL read_ascii(), which only consider the first data row to determine the data types).
Also note the headers, defined using the first row: int is used twice, the second time the name is converted to int_1. So cute!

lundi 27 septembre 2010

the basic syntax (1)

Following the IDL cookbook, I first explore some syntax of Python.

I use the ipython, so the keyboards inputs are transcript with "In [n]" and the corresponding output with "Out [n]". To print something, just type it and press Enter:

In [4]: 'Hello mundo;-)'
Out[4]: 'Hello mundo;-)'
In [5]: a=1
In [6]: a
Out[6]: 1
In [8]: a?
Type:        int
Base Class:    <type 'int'>
String Form:    1
Namespace:    Interactive
Docstring:
    int(x[, base]) -> integer
.... 
In [35]: 3*exp(-12/(27.*log(1.3)))
Out[35]: 0.55135006225210048
In [36]: 3*exp(-12/(27.*log10([12.,14.,15])))
Out[36]: array([ 2.5086749 ,  2.53502116,  2.54592125])
 
Special characters:

;  is use to have 2 comands on the same line (and is NOT starting a comment!) 
a=5 ; b= 3
which actually can be done with:
a,b = 5,3
No matter spaces IF NOT AT THE BEGINNING OF THE LINE:
a , b = 5 , 3  
BTW this can be very powerful:
a, b, c = b, a+b, c+1  
Do you remember this one? Fibonnacci!

# is starting a comment 
a = b # a and b are pointing to different memory place

Parameters are separated by "," on the same line:
can = Canvas(f,width =250, height =250, bg ='ivory')

The blocks are defined by indentation. No begin, no end. In the following, the ....: are printed by ipython. The indentation is automatic in ipython, but not in python: you have to type at least one space to define the IF block. To finish a block, just empty lines (one in python, 2 in ipython...)
In [70]: a,b=5,6

In [71]: if a<b:
   ....:     print a+b
   ....:    
   ....:    
11


In [65]: def f(x):
   ....:     return x*2
   ....: 
In [67]: f(5)
Out[67]: 10



mercredi 22 septembre 2010

The same module in different libraries

One thing that I found nice in Python compared to IDL, is the amount of very complete and complex libraries that have been developed and are accessible so easily... And one thing I quickly found horrible is the amount os libraries!...

Just an example: I installed the Mayavi package (it means that I have to also install the vtk package, but hopefully all is already in the macport repositories...). and began to try playing with it.
And I found that sometimes some tutorial are talking about functions I don't have... For example, the points3d() function is not available. Well, I realized that it is in the mlab module and that actually I have... 6 mlab.py files in the directory where Python is looking for!!!
Just asking from a command line shell:
find /opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages mlab.py
give me:
./enthought/mayavi/mlab.py
./enthought/mayavi/tools/mlab.py
./enthought/tvtk/tools/mlab.py
./matplotlib/mlab.py
./numpy/numarray/mlab.py
./numpy/oldnumeric/mlab.py
So I have to be very careful with which module I download and in which order, to be sure that I have the mlab module I want.
Or the best is to call it explicitely and to check what is in each:
from enthought.mayavi import mlab as mlab_mayavi 
from enthought.tvtk.tools import mlab as mlab_tvtk
To obtain the list of the functions available from one module, just type for example mlab_mayavi.  and use the TAB to obtain the possible completions.
We can very quickly realize that the needed points3d() function is absent in all mlab modules except the mayavi one.
So be very careful with the modules, mlab is not always mlab...

mardi 21 septembre 2010

Arrays bounds using NumPy

NumPy is load by default when calling ipython -pylab. If not, it's always possible to import all the module by from numpy import *  or import numpy as N (then you need to start any numpy function with N.).
To easy create arrays, just use:
a=r_[1:10]
It will create an array from 1 to... 9 !!! It is actually a kind of shortcut for:
a=arange(1,10)
which also provides an array of 9 elements, from 1 to 9.
The step can be specified:
a=r_[1:10:.1] or a=arange(1,10,.1).
In this case, the array latest element is 9.9, not 10.
It is also possible to give the number of elements, for this a "j" is added to the 3rd parameter:
b=r_[1:10:100j] or b=linspace(1,10,100).
BUT in these latest two equivalent cases, the latest element is actually 10!...
Another way to say this:
a=r_[1:10:.1] and b=r_[1:9.9:90j] are equivalent. Not obvious for me...
BTW, the same philosophy apply when selecting part of an array:
In [71]: a=r_[1:10]
In [72]: a
Out[72]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
In [73]: a[0:2]
Out[73]: array([1, 2])

I certainly would expect the answer to be a 3 elements sub-array. I'm sure this will be a big source of bug in my future Python codes ;-) 
I found something quite powefull: accessing the latest elements of an array:
In [77]: a[-1]
Out[77]: 9
In [78]: a[-5:-1]
Out[78]: array([5, 6, 7, 8])
In [79]: a[-5::]
Out[79]: array([5, 6, 7, 8, 9])


I also saw that the indexing of arrays can be quite sophisticated: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

lundi 20 septembre 2010

Quick function definition: lambda and "closure"

There is 2 ways to define a function in Python. The classical:
def fitfunc(p,x):
         return p[0]*exp(-(x-p[1])**2/(2.0*p[2]**2))
and with the use of lambda:
fitfunc = lambda p, x: p[0]*exp(-(x-p[1])**2/(2.0*p[2]**2))

Both can be used in any place:
plot(x,fitfunc([1.,0.7,.3],x),'.')

From what I saw on the net, the compact lambda definition is more used when the function is small and used locally...

BTW, while Googling around this concept, I found this:

In [114]: def summer(nombre):
   .....:     def sumit(value):
   .....:         return value+nombre
   .....:     return sumit
   .....: 
In [115]: sum10 = summer(10)

In [116]: sum10(20)
Out[116]: 30


The summer function return a function, which depend on the value of the nombre variable. And the sum10 function keep in mind the value 10, even after leaving the summer function. This is what the computing science people name a "closure"...

dimanche 19 septembre 2010

filtering data: the where() function

I'm now trying to use my favorite IDL function in Python: the where() function. It is used to select the subscripts of an array where a condition is fulfilled.
As I'm using the pylab version of ipython, it comes with a where function that can match more or less the IDL one. But it seems that another method can also be used...
Let's do some examples: I want to select from a table of 1000000 lines and 10 columns the elements that have a value for the 3rd and 7th columns bigger than 0.5.


I'm first creating a 2D-table on which I will apply my filter.

a=random((1e6,10))

I first had to realize that the order of the subscripts are in the inverse order than in IDL: first rows, then columns.
So the elements of the 3rd columns are a[:,2], and the ones for the 7th columns are a[:,6].
I first try:

tt = where(a[:,2] > 0.5 and a[:,6] > 0.5)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

OK, I'm not yet at the level of understanding what Python told me, but clearly it's not correct. I finally found that with parenthesis and & instead og and, things are going well:
tt = where((a[:,2] > 0.5) & (a[:,6] > 0.5))

tt is a so-called "tuple", which seems to be an array, it has a size:
In [90]: size(tt)
Out[90]: 250248


It's a credible value given the condition and the size of the input table.

I can now use this variable to extract the values from the initial table:
x=a[tt,2]
y=a[tt,6]

The main problem here is that the table I have in x and y are not correctly shaped:
In [112]: x.shape
Out[112]: (1, 250248)


If I try to plot this (using plot(x,y,'.')), it doesn't work (well, after some minutes waiting for a result I killed the plot!).
I can reshape the x and y tables using for example the transpose function:
plot(transpose(x),transpose(y),'.')
but the best is to transpose the filter before using it:
tt2 = transpose(where((a[:,2] > 0.5) & (a[:,6] > 0.5)))
plot(a[tt2,2],a[tt2,6],'.')
is working fine. BTW, tt2 is not anymore a tuple, it's not an array of integers, so for example:
In [146]: tt2.size
Out[146]: 250248

Another way of filtering the data is to generate a table of booleans:
tt3 = (a[:,2] > 0.5) & (a[:,6] > 0.5)
This is an array:
In [147]: tt3.size
Out[147]: 1000000

In [148]: tt3.dtype
Out[148]: dtype('bool')

Contrary to IDL, it can directly be used in a table:
In [149]: a[tt3,2].size
Out[149]: 250248

In [150]: a[tt3,2].shape
Out[150]: (250248,)


Fine, the plot works also with this:
In [154]: plot(a[tt3,2],a[tt3,6],'.')

I tried both methods (where and boolean) on big table, but didn't saw any real difference on time execution. If some readers could tell me which one is really the more "python-way"...

dimanche 12 septembre 2010

Reading an ascii file

I used to use the read_ascii() function in IDL (BTW I use my own version, which return an array of structures and not a structure containing arrays...), and it seems there is not an equivalent tool... or I didn't found it yet!
Nevertheless, I tried to make the job simpler by reading a Comma Separated Variable file (csv), as it can be output by any spreadsheet or downloaded from the net (e.g. Vizier tables).
I first found a module able to read such file: csv (!).
This is the first solution I used:

import csv
from scipy import *
rr = csv.DictReader(open('DIGEDA1.1.csv'))
x=[]
y=[]
for row in rr:
    x.append(row['5007'])
    y.append(row['3727'])
plot(x,y)

The problem is that I didn't succeed to apply simple filter to the data: I want to plot the log of the values, so I have to filter out the negative values... It seems that the format used for the x and y variables (namely "list") is not really the best one to be used with where.
 
Finally, I found another function to read the file, which outputs something very similar to a structure.
It is part of the pylab distribution (matplolib), so no need to import it as I use the pylab argument when calling ipython. So here my second (and successful) try to read, filter and plot my data:

rec=csv2rec('DIGEDA1.1.csv')
tt = where(logical_and(rec['3727']>0,rec['5007']>0))
rec2=rec[tt]
x=rec2['5007']
y=rec2['3727']
plot(log(y),log(x),'o')


The rec variable contains the whole file, in such a way very similar to IDL structure. A very nice thing is that the names of the different columns are directly taken from the first line of my file, which contains this information.

It is finally quite easy (well, it took me 3 hours to find all this in the Google jungle... it's why I'm doing this blog!).

samedi 11 septembre 2010

Where is my plot?

Let's go to the first plot in ipython!
I just copy-past from the web something very promising:
ipython -pylab
Will import the modules I will need for most of my purposes. Nice!
And then, from within ipython:
x = arange(10)
y = x**2
plot(x,y)
That's it!
but... nothing appears!
On the Ubuntu, a nice window with a parabolic function, but in Mac, nothing!
Only a nice message like this one: 

Out[3]: [<matplotlib.lines.Line2D object at 0x103f139d0>]
 

But Linux gave the same, so it doesn't seems to be an error.
After quite one hour Googling all around, I found that for some strange reason my output device is wrongly set. In the Python language, it is called a backend, and some information can be found here:

This is determined by a value set in the matplotlibrc files. One is the default, coming with the matplotlib distribution, another one can be defined in the user folder.
It seems that the default value in my case is Agg, but for the interactivity it should be TkAgg.

So in a file named ~/.matplotlib/matplotlibrc I put:
backend      : TkAgg

And I saw a very nice parabolic plot! And what a surprise: the plot is dynamic, I can change the axes, make close-up, and DIRECTLY SAVE IN PS OR PDF file (IDL users know why capitalize here...).

Installation on the Ubuntu

Well, the Ubuntu comes with an already recent python (2.5) version,and the package manager lets me install ipython, matplotlib, numpy straightforwardly . Nothing else, sorry, too easy ;-)

Installation

OK, let's start to use Python! Well, first of all, where can I obtain it? It's a freeware, so don't need to ask for money to buy a licence, that's a first very good point!
I'm working with mainly 2 OS: Linux (very old Fedora) and OSX from Mac. It's easier for me to work now with the Mac on this project, but from what I understood, anything will be translated from one OS to the other one without problems. I also have a very uptodate Ubuntu in a VirtualBox, where I can also try things.
Python is already installed on the Mac. Fine! But it seems I will have to install "modules" (the way Python people call libraries). Me first problem is that other programs in my Mac depends on Python, so what If I do something wrong? To avoid this, I decided to install another Python distribution in the Port environment, and no to touch the /Library one.

Very easy to install, just a "sudo port install python-26" and it's OK!
I realized that a previous Python was already present, the 24. I install python_select and use it to define the default version to be 26, namely the 2.6.6 one.
Ok, ready to go.
First command:
Python 2.6.6 (r266:84292, Sep  9 2010, 10:10:29)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 5+7
12

So cool!

I quickly Google some informations and install some modules that seems to be of big interest:
scipy, numpy, pylab. All available by the "sudo port install" command.
The 2 other ones I installed today are numdisplay and pyfits, both are not in the port distribution. Easy to find the page where they can be downloaded, and after a gunzip-tar, I'm with directories in which it seems the tradition wants a "sudo python setup.py install" to do the job.

As I'm coming from IDL, I don't want to loose one of the 3 main issues of this language: the I (which stay for Interactive). I already know that Python can deal with the D (data) and the L (language), but for the I, it seems that the use of ipython is preferable. OK, not a problem, it's available from the port repository.

Now I can start to play with my brand new game!...

Here is a list of all the modules accessible from my ipython, using help ('modules'):
{CM:cmmacbook-2 ~/Python/Modules} ipython-2.6
Python 2.6.6 (r266:84292, Sep  9 2010, 10:10:29)
Type "copyright", "credits" or "license" for more information.

IPython 0.10 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object'. ?object also works, ?? prints more.

In [1]: help ('modules')

Please wait a moment while I gather a list of all available modules...

/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/matplotlib/numerix/__init__.py:18: DeprecationWarning:
**********************************************************
matplotlib.numerix and all its subpackages are deprecated.
They will be removed soon.  Please use numpy instead.
**********************************************************

  warnings.warn(msg, DeprecationWarning)
AppKit              _multibytecodec     hotshot             posix
Audio_mac           _multiprocessing    htmlentitydefs      posixfile
BaseHTTPServer      _random             htmllib             posixpath
Bastion             _scproxy            httplib             pprint
CGIHTTPServer       _socket             ibrowse             profile
Canvas              _sqlite3            ic                  pspersistence
Carbon              _sre                icglue              pstats
Cocoa               _ssl                icopen              pty
CodeWarrior         _strptime           idlelib             pwd
ColorPicker         _struct             igrid               py2app
ConfigParser        _symtable           ihooks              py_compile
Cookie              _testcapi           imaplib             pyclbr
CoreFoundation      _threading_local    imghdr              pydoc
Dialog              _tkinter            imp                 pydoc_topics
DocXMLRPCServer     _warnings           imputil             pyexpat
EasyDialogs         _weakref            inspect             pyfits
Explorer            abc                 io                  pylab
FileDialog          aepack              ipipe               pytz
Finder              aetools             ipy_app_completers  quopri
FixTk               aetypes             ipy_autoreload      random
Foundation          aifc                ipy_bzr             re
FrameWork           altgraph            ipy_completers      readline
HTMLParser          anydbm              ipy_constants       repr
IN                  applesingle         ipy_defaults        resource
IPython             appletrawmain       ipy_editors         rexec
InterpreterExec     appletrunner        ipy_exportdb        rfc822
InterpreterPasteInput argvemulator        ipy_extutil         rlcompleter
MacOS               array               ipy_fsops           robotparser
MimeWriter          ast                 ipy_gnuglobal       runpy
MiniAEFrame         astyle              ipy_greedycompleter sched
Nav                 asynchat            ipy_jot             scipy
Netscape            asyncore            ipy_kitcfg          scitedirector
OSATerminology      atexit              ipy_legacy          select
PhysicalQInput      audiodev            ipy_lookfor         sets
PhysicalQInteractive audioop             ipy_p4              setuptools
PixMapWrapper       autoGIL             ipy_pretty          sgmllib
Queue               base64              ipy_profile_doctest sha
Scientific          bdb                 ipy_profile_none    shelve
Scientific_affinitypropagation bdist_mpkg          ipy_profile_numpy   shlex
Scientific_interpolation bgenlocations       ipy_profile_scipy   shutil
Scientific_netcdf   binascii            ipy_profile_sh      signal
Scientific_numerics_package_id binhex              ipy_profile_zope    site
Scientific_vector   bisect              ipy_pydb            smtpd
ScrolledText        bsddb               ipy_rehashdir       smtplib
SimpleDialog        buildtools          ipy_render          sndhdr
SimpleHTTPServer    bundlebuilder       ipy_server          socket
SimpleXMLRPCServer  bz2                 ipy_signals         sqlite3
SocketServer        cPickle             ipy_stock_completers sre
StdSuites           cProfile            ipy_synchronize_with sre_compile
StringIO            cStringIO           ipy_system_conf     sre_constants
SystemEvents        calendar            ipy_traits_completer sre_parse
Terminal            cfmfile             ipy_user_conf       ssl
Tix                 cgi                 ipy_vimserver       stat
Tkconstants         cgitb               ipy_which           statvfs
Tkdnd               chunk               ipy_winpdb          string
Tkinter             clearcmd            ipy_workdir         stringold
UserDict            cmath               itertools           stringprep
UserList            cmd                 jobctrl             strop
UserString          code                json                struct
_AE                 codecs              keyword             subprocess
_AH                 codeop              ledit               sunau
_App                collections         lib2to3             sunaudio
_CF                 colorsys            libxml2             symbol
_CG                 commands            libxml2mod          symtable
_CarbonEvt          compileall          linecache           sys
_Cm                 compiler            locale              syslog
_Ctl                configobj           logging             tabnanny
_Dlg                contextlib          macerrors           tarfile
_Drag               cookielib           macholib            telnetlib
_Evt                copy                macostools          tempfile
_File               copy_reg            macpath             terminalcommand
_Fm                 crypt               macresource         termios
_Folder             csv                 macurl2path         test
_Help               ctypes              mailbox             textwrap
_IBCarbon           curses              mailcap             this
_Icn                datetime            markupbase          thread
_LWPCookieJar       dateutil            marshal             threading
_Launch             dbhash              math                time
_List               dbm                 matplotlib          timeit
_Menu               decimal             md5                 tkColorChooser
_Mlte               difflib             mhlib               tkCommonDialog
_MozillaCookieJar   dircache            mimetools           tkFileDialog
_OSA                dis                 mimetypes           tkFont
_Qd                 distutils           mimify              tkMessageBox
_Qdoffs             doctest             mmap                tkSimpleDialog
_Qt                 drv_libxml2         modulefinder        toaiff
_Res                dumbdbm             modulegraph         token
_Scrap              dummy_thread        mpl_toolkits        tokenize
_Snd                dummy_threading     multifile           trace
_TE                 easy_install        multiprocessing     traceback
_Win                email               mutex               tty
__builtin__         encodings           netrc               turtle
__future__          envpersist          new                 types
_abcoll             errno               nis                 unicodedata
_ast                exceptions          nntplib             unittest
_bisect             ext_rescapture      nose                urllib
_bsddb              fcntl               ntpath              urllib2
_builtinSuites      filecmp             nturl2path          urlparse
_bytesio            fileinput           numbers             user
_codecs             findertools         numdisplay          uu
_codecs_cn          fnmatch             numeric_formats     uuid
_codecs_hk          formatter           numpy               validate
_codecs_iso2022     fpformat            objc                videoreader
_codecs_jp          fractions           opcode              warnings
_codecs_kr          ftplib              operator            wave
_codecs_tw          functools           optparse            weakref
_collections        future_builtins     os                  webbrowser
_csv                gc                  os2emxpath          whichdb
_ctypes             gdbm                parser              win32clip
_ctypes_test        genericpath         pdb                 wsgiref
_curses             gensuitemodule      pickle              xdrlib
_curses_panel       gestalt             pickleshare         xml
_elementtree        getopt              pickletools         xml2po
_fileio             getpass             pimp                xmllib
_functools          gettext             pipes               xmlrpclib
_hashlib            glob                pkg_resources       xxsubtype
_heapq              grp                 pkgutil             zipfile
_hotshot            gzip                platform            zipimport
_json               hashlib             plistlib            zlib
_locale             heapq               popen2             
_lsprof             hmac                poplib             

Enter any module name to get more help.  Or, type "modules spam" to search
for modules whose descriptions contain the word "spam".

Well, it's lot!
I will first concentrate on trying to reproduce what I'm teaching to the students in IDL. The best could be for example to try to follow the Cook-book I wrote 10 years ago: http://132.248.1.102/~morisset/idl_cours/IDL/index_local.htm

So... why changing?

I'm an IDL user since ... 1992, when I started my PhD thesis. I use it since then, I'm even teaching the use of it some times. I'm not really a "guru", but people in my Institute use to ask me when they have problems.
It's true that I'm using IDL for everything, even for a simple addition ;-) I use to write IDL programs to generate the figures, the LaTeX tables and the main results of my papers and presentations. I'm quite able to do anything I want in IDL, such as string manipulation, reading-writing data, interfacing with other programs. I have a lot of ready-to-use routines that I can apply to new problems. I also wrote 3 big programs using IDL (Cloudy_3D, X-SSN, 3MdB managing tools).
So... why changing in these conditions? Various reasons, without one dominating.
  • I have problems since the beginning of my IDL story of using a commercial language. I tried GDL, but some important functions are still not implemented (for example SAVE applied to structure containing pointers to arrays...). It's not a problem in my every day work, I have access to IDL licenses at my Institute. But teaching to students a language that they cannot afford is going against my personal politic...
  • It seems that the next generation softwares in my area will by in Python: data processing pipelines (satellite data), data analysing tools (IRAF is now PyRAF), are in Python. Dedicated library specific to my area are now available in Python (e.g. FITS reading tools)
  • Python was not so useful for me 5 years ago, as some important functionalities were missing. It changed in the last years and now it seems to me that it's really powerful. Let's see...
The aim of this blog is to share my experience in this learning of Python, first to help me to organize my ideas and secondly to help others to see how hard it is.
Any tip, tool, link is really welcome.