Wednesday, 29 September 2010

Arrays and types: the rec.array type

Here are some examples of behavior that is unexpected when coming from IDL:
In [4]: a
Out[4]: [1.0, 2.0, 3] #a is a list
In [5]: b
Out[5]: array([ 0.84147098,  0.90929743,  0.14112001])
In [6]: 2*b
Out[6]: array([ 1.68294197,  1.81859485,  0.28224002])
In [7]: 2*a
Out[7]: [1.0, 2.0, 3, 1.0, 2.0, 3] #this doubles the list, not the elements!!!
In [10]: c=array([1,2.,3])
In [11]: 2*c
Out[11]: array([ 2.,  4.,  6.]) #now it's OK
In [12]: d=[1,2.,'3'] #mixing int, float and string
In [13]: e=array(d)
In [14]: e
Out[14]: array(['1', '2.0', '3'], dtype='|S8')
#converted to a single type: string

As you can see, [] on its own defines a list, not an array: to get an array you have to call the function array() explicitly.
A list can hold elements of different types; an array cannot, so all its elements are converted to a single common type.
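To make the list behavior concrete, here is a minimal sketch in plain Python (no array() involved). The variable names repeated, doubled and mixed are only illustrative; the list a is the one from the session above.

```python
a = [1.0, 2.0, 3]               # a list, as in the session above

repeated = 2 * a                # list repetition: the list is glued to a copy of itself
doubled = [2 * x for x in a]    # element-wise doubling of a list needs an explicit loop

print(repeated)
print(doubled)

mixed = [1, 2.0, '3']           # a list keeps the original type of each element
print([type(x).__name__ for x in mixed])
```

If element-wise arithmetic is what you want most of the time, converting to an array once with array() is usually more convenient than writing the loop each time.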

Now let's see how the CSV reader deals with different types on the same line (as when using structures in IDL).
The file to read is:
int flt str int
1 1 "test1" 1
2 2.3 "tralala" 2
5 3.14 "" 3
6 1e3 "double " 4
7 1e79 "big one" 5


Reading the file:
rec=csv2rec('test1.csv',delimiter=" ") #csv2rec is provided by matplotlib.mlab (loaded by pylab)
The result is a rec.array, with mixed types:

In [43]: rec
Out[43]:
rec.array([(1, 1.0, 'test1', 1), (2, 2.2999999999999998, 'tralala', 2),
       (5, 3.1400000000000001, '', 3), (6, 1000.0, 'double ', 4),
       (7, 9.9999999999999997e+78, 'big one', 5)],
      dtype=[('int', '<i8'), ('flt', '<f8'), ('str', '|S7'), ('int_1', '<i8')])
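Once you have such a rec.array, each column can be selected by name, much like a tag of an IDL structure. The sketch below assumes NumPy is imported as np and rebuilds the first two rows by hand, so it does not need the file or csv2rec; the 'U7' string type is my choice for a current Python, not what the session above printed.

```python
import numpy as np

# First two rows of the output above, typed by hand (no input file needed).
rows = [(1, 1.0, 'test1', 1), (2, 2.3, 'tralala', 2)]
dtype = [('int', 'i8'), ('flt', 'f8'), ('str', 'U7'), ('int_1', 'i8')]
rec = np.rec.array(rows, dtype=dtype)

print(rec['flt'])     # one column, selected by its name
print(rec.flt)        # the same column, as an attribute
print(rec[0])         # one row, mixing the four types
print(2 * rec.flt)    # a column is an ordinary array: element-wise math works
```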


Note that the value 1 in the second column of the first data row causes no problem: it looks like an integer, but since the same column holds 2.3 in the second data row, the column is not wrongly typed as integer (unlike IDL's read_ascii(), which only considers the first data row to determine the data types).
Also note the headers, taken from the first row: int is used twice, so the second occurrence is renamed int_1. So cute!
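The two behaviors just described (typing a column from all of its rows, and renaming a repeated header) can be sketched with the standard library alone. This is only an illustration of the idea, not the actual code of csv2rec; unique_names and column_type are hypothetical helpers, and the file content is inlined with single spaces so that the sketch runs without test1.csv.

```python
import csv
import io

# Same content as the file above, inlined as a string.
TEXT = '''int flt str int
1 1 "test1" 1
2 2.3 "tralala" 2
5 3.14 "" 3
6 1e3 "double " 4
7 1e79 "big one" 5
'''

def unique_names(names):
    """Rename repeated headers: the second 'int' becomes 'int_1'."""
    seen = {}
    out = []
    for name in names:
        count = seen.get(name, 0)
        out.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    return out

def column_type(values):
    """Return the first of int, float, str that accepts every value of the column."""
    for caster in (int, float):
        try:
            for v in values:
                caster(v)
            return caster
        except ValueError:
            continue
    return str

reader = csv.reader(io.StringIO(TEXT), delimiter=" ")
header, *rows = list(reader)

names = unique_names(header)
types = [column_type(col) for col in zip(*rows)]   # one type per column, using all rows
records = [tuple(t(v) for t, v in zip(types, row)) for row in rows]

print(names)
print([t.__name__ for t in types])
print(records[0])
```

Because column_type looks at every row, the 1 in the first data row is read as a float like the rest of its column.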
