Sunday, 3 October 2010

My first module! Still on reading ASCII-formatted files.

I wrote my first program in Python! Here it is:

def ReadFortran(file,format,names,comment="#"):
    """
    Read a file using a Fortran-style format.
    Return a NumPy rec.array with each column named following the given names.
    Example: data = ReadFortran('test2.dat','a10,1x,f6.2,1x,f6.2,1x,i2',['name', 'ra', 'dec','mag'],comment="#")
    Morisset, IA-UNAM, Oct. 2010
    """

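    # FortranFormat comes from the ScientificPython package.
    import Scientific.IO.FortranFormat as FF
    import numpy.core.records as nprec
    # Parse the format string once, then reuse it for every line.
    FFformat = FF.FortranFormat(format)
    f=open(file,'r')
    rows=[]
    for line in f:
        # Skip lines that start with the comment character.
        if line[0] != comment:
            row = FF.FortranLine(line,FFformat)
            rows.append(row.data)
    f.close()
    # Build the named record array from the list of parsed rows.
    return nprec.fromrecords(rows, names=names)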

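To make the docstring example more concrete, here is a minimal sketch of how the format 'a10,1x,f6.2,1x,f6.2,1x,i2' carves up one line. It does not use Scientific.IO.FortranFormat: the FIELDS table and the parse_line helper are hypothetical stand-ins written with plain string slicing, the sample values are made up, and stripping the padding from the name field is my own choice.

```python
# Hypothetical stand-in for FF.FortranLine(line, FFformat).data, hard-wired
# to the format 'a10,1x,f6.2,1x,f6.2,1x,i2' from the docstring example.
# Each entry is (start, stop, converter); the 1x descriptors are the gaps.
FIELDS = [
    (0, 10, str.strip),   # a10  -> name
    (11, 17, float),      # f6.2 -> ra
    (18, 24, float),      # f6.2 -> dec
    (25, 27, int),        # i2   -> mag
]

def parse_line(line):
    """Slice one fixed-width line into a tuple of converted values."""
    return tuple(conv(line[start:stop]) for start, stop, conv in FIELDS)

# Made-up sample line, built so that it matches the column widths above.
sample = "%-10s %6.2f %6.2f %2d\n" % ("Vega", 18.62, 38.78, 0)
print(parse_line(sample))
```

Each tuple produced this way plays the role of row.data in the function above.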

OK, it's not a very big one, but it took me a lot of time trying to avoid the list.append command, and I didn't find a way. In any case, it seems that most of the execution time is spent in the FortranLine command.
It is much slower than the equivalent in IDL: it reads a 1,000,000-line file in some 45 seconds, while IDL takes 5... csv2rec takes 20 seconds.
Perhaps one of these days I'll try to call a Fortran routine to read the file...
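One rough way to check where the time goes is to time the loop with and without the parsing step. This is only a sketch: parse is a hypothetical stand-in for FF.FortranLine(line,FFformat).data, the input lines are generated in memory rather than read from a file, and the numbers it prints will differ from the timings quoted above.

```python
import time

def parse(line):
    # Hypothetical stand-in for FF.FortranLine(line, FFformat).data
    return (line[0:10].strip(), float(line[11:17]))

# Made-up input: 100000 fixed-width lines held in memory.
lines = ["%-10s %6.2f\n" % ("star%d" % i, i % 360) for i in range(100000)]

def time_it(func):
    """Return the wall-clock seconds taken by one call of func()."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start

def append_only():
    rows = []
    for line in lines:
        rows.append(line)          # no parsing, only the list.append cost

def parse_and_append():
    rows = []
    for line in lines:
        rows.append(parse(line))   # parsing plus list.append

t_append = time_it(append_only)
t_total = time_it(parse_and_append)
print("append only: %.3f s, parse + append: %.3f s" % (t_append, t_total))
```

If the first number is a small fraction of the second, avoiding list.append will not buy much.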

ADDENDUM:
It seems that a more compact and Pythonic way of writing the loop is to change:
rows=[]
for line in f:
    if line[0] != comment:
        row = FF.FortranLine(line,FFformat)
        rows.append(row.data)

into:
rows = [FF.FortranLine(line,FFformat).data for line in f if line[0] != comment]

The map function could also be used:
rows = map(lambda line:FF.FortranLine(line,FFformat).data,f)
BUT written this way, there is no place to handle the comment parameter.
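One possible way around this is to filter out the comment lines before mapping. The sketch below assumes a hypothetical parse function standing in for FF.FortranLine(line,FFformat).data and uses a made-up list of lines instead of an open file. It is written for Python 3, where map returns an iterator, hence the list() call.

```python
def parse(line):
    # Hypothetical stand-in for FF.FortranLine(line, FFformat).data
    return line.split()

comment = "#"
# Made-up input; in the real function these lines would come from the file f.
lines = ["# name ra\n", "Vega 18.62\n", "# skipped\n", "Deneb 20.69\n"]

# Drop the comment lines first, then map the parser over what is left.
kept = filter(lambda line: not line.startswith(comment), lines)
rows = list(map(parse, kept))

# The list comprehension gives the same rows.
rows_lc = [parse(line) for line in lines if not line.startswith(comment)]
print(rows == rows_lc)
```

The list comprehension still reads more easily, since the filter and the parsing sit in one expression.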

Got the tips from http://jaynes.colorado.edu/PythonIdioms.html
