As I'm using IPython in pylab mode, I have access to a where function that more or less matches the IDL one. But it seems that another method can also be used...
Let's work through an example: from a table of 1,000,000 rows and 10 columns, I want to select the rows whose values in both the 3rd and the 7th columns are greater than 0.5.
I first create a 2D table on which I will apply my filter.
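The post doesn't show how the table is built. Here is a minimal sketch, assuming the values are uniform random numbers between 0 and 1 (an assumption: the real data may be anything), written with plain NumPy instead of the pylab namespace:

```python
import numpy as np

# Assumption: uniform random values in [0, 1); the original data is not shown.
rng = np.random.default_rng(42)   # seeded so the example is reproducible
a = rng.random((1000000, 10))     # 1,000,000 rows x 10 columns

print(a.shape)  # (1000000, 10)
```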
I first had to realize that the subscripts come in the reverse order from IDL: rows first, then columns.
So the elements of the 3rd column are a[:,2], and those of the 7th column are a[:,6].
My first attempt was:
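A tiny table makes the row-then-column order easy to check. This small 3 x 4 array is just an illustration, not the data from the post:

```python
import numpy as np

# 3 rows, 4 columns, filled row by row with 0..11
a = np.arange(12).reshape(3, 4)

cell = a[1, 2]     # row 1, column 2: 6
col3 = a[:, 2]     # the whole 3rd column: 2, 6, 10
row2 = a[1, :]     # the whole 2nd row: 4, 5, 6, 7
print(cell, col3, row2)
```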
tt = where(a[:,2] > 0.5 and a[:,6] > 0.5)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
OK, I'm not yet at the level of understanding exactly what Python is telling me, but clearly it's not correct. I finally found that things work with parentheses and & instead of and:
tt = where((a[:,2] > 0.5) & (a[:,6] > 0.5))
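As far as I understand it, the error comes from and needing a single True/False for each whole array, which NumPy refuses to guess for an array with many elements, while & combines the two boolean arrays element by element. The parentheses matter because & binds more tightly than >. This sketch (on a small random stand-in table, an assumption) reproduces the error and checks that & gives the same result as np.logical_and:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((1000, 10))   # small stand-in table (assumption)

# 'and' needs one truth value per whole array, so NumPy raises ValueError
try:
    (a[:, 2] > 0.5) and (a[:, 6] > 0.5)
    and_failed = False
except ValueError:
    and_failed = True

# '&' works element by element. Without the parentheses,
# a[:, 2] > 0.5 & a[:, 6] > 0.5 is parsed as a[:, 2] > (0.5 & a[:, 6]) > 0.5
mask = (a[:, 2] > 0.5) & (a[:, 6] > 0.5)
same = np.array_equal(mask, np.logical_and(a[:, 2] > 0.5, a[:, 6] > 0.5))

tt = np.where(mask)
print(and_failed, same, type(tt), len(tt))  # True True <class 'tuple'> 1
```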
tt is a so-called "tuple" rather than an array: it holds one index array per dimension of the condition, so here a single array of row numbers. It still has a size:
In : size(tt)
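It's a credible value given the condition and the size of the input table: if the values are spread uniformly between 0 and 1, each condition keeps about half of the rows, so roughly a quarter of the 1,000,000 rows should pass.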
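The size can be cross-checked against the number of True values in the condition. pylab's size is NumPy's np.size, which turns the tuple into a (1, N) array first, so it reports N. This sketch again assumes uniform random data:

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random((1000000, 10))  # assumption: uniform random values in [0, 1)

mask = (a[:, 2] > 0.5) & (a[:, 6] > 0.5)
tt = np.where(mask)

n = np.size(tt)                # same as tt[0].size for a 1D condition
print(n == tt[0].size == np.count_nonzero(mask))  # True
print(n / a.shape[0])          # close to 0.25 for uniform data
```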
I can now use this variable to extract the values from the initial table:
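The extraction lines themselves aren't shown here. A (1, 250248) shape is what you get when the index tuple ends up treated as a 2D index of shape (1, N); the sketch below reproduces that with np.asarray(tt) and contrasts it with indexing by tt[0], the single array stored in the tuple. The small random table is an assumption:

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random((1000, 10))  # smaller stand-in table (assumption)

tt = np.where((a[:, 2] > 0.5) & (a[:, 6] > 0.5))

# The tuple seen as an array is a (1, N) index, so the result keeps
# an extra leading dimension
x2d = a[np.asarray(tt), 2]

# The index array inside the tuple gives flat 1D results
rows = tt[0]
x = a[rows, 2]
y = a[rows, 6]
print(x2d.shape, x.shape, y.shape)   # (1, N) (N,) (N,)
```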
The main problem here is that the tables I get in x and y are not correctly shaped:
In : x.shape
Out: (1, 250248)
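If I try to plot this (using plot(x,y,'.')), it doesn't work (well, after waiting several minutes for a result, I killed the plot!). The likely reason: with 2D inputs, plot draws one data set per column, so a (1, 250248) array gives 250,248 single-point data sets instead of one cloud of points.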
I can reshape the x and y tables using for example the transpose function:
but it is better to transpose the filter before using it:
tt2 = transpose(where((a[:,2] > 0.5) & (a[:,6] > 0.5)))
This works fine. By the way, tt2 is no longer a tuple but an array of integers, so for example:
In : tt2.size
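Both fixes can be compared on shapes alone. A (N, 1) column is what plot needs, since it draws each column of a 2D array as one data set; ravel is another option that gives a plain 1D array. The sketch also checks that np.argwhere, documented as np.transpose(np.nonzero(a)), returns the same array as tt2. The random table is again an assumption:

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random((1000, 10))  # smaller stand-in table (assumption)
mask = (a[:, 2] > 0.5) & (a[:, 6] > 0.5)
n = np.count_nonzero(mask)

# Option 1: fix x after the fact (x has shape (1, N), as in the post)
x = a[np.asarray(np.where(mask)), 2]
x_col = np.transpose(x)   # shape (N, 1): a single column
x_flat = x.ravel()        # shape (N,): a plain 1D array

# Option 2: transpose the index first, like tt2
tt2 = np.transpose(np.where(mask))   # integer array of shape (N, 1)
x2 = a[tt2, 2]                       # shape (N, 1) as well

# np.argwhere gives the same (N, 1) index array
same = np.array_equal(tt2, np.argwhere(mask))
print(x_col.shape, x_flat.shape, tt2.shape, x2.shape, same)
```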
Another way of filtering the data is to generate a table of booleans:
tt3 = (a[:,2] > 0.5) & (a[:,6] > 0.5)
This is an array:
In : tt3.size
In : tt3.dtype
Unlike in IDL, it can be used directly to index the table:
In : a[tt3,2].size
In : a[tt3,2].shape
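With a boolean mask, the result is already 1D, which is why no transpose is needed. The sketch below (random stand-in data, an assumption) also checks that the mask and the where() indices pick exactly the same values:

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random((1000, 10))  # smaller stand-in table (assumption)
tt3 = (a[:, 2] > 0.5) & (a[:, 6] > 0.5)   # one boolean per row

x = a[tt3, 2]      # 1D array of the selected values in the 3rd column
rows = a[tt3]      # the whole selected rows, shape (N, 10)

# Mask and where() indices select the same elements
same = np.array_equal(x, a[np.where(tt3)[0], 2])
print(tt3.dtype, x.shape, rows.shape, same)
```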
Fine, the plot also works with this:
In : plot(a[tt3,2],a[tt3,6],'.')
I tried both methods (where and the boolean mask) on a big table, but didn't see any real difference in execution time. If any reader can tell me which one is really the more "Pythonic" way...
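For anyone who wants to repeat the comparison, here is a small timeit sketch on the same kind of random table (an assumption). It first checks that both variants return identical data; the timings depend on the machine and NumPy version, so no numbers are claimed here. In NumPy code the mask is commonly used directly, since it skips building an index array, while where is handy when the row numbers themselves are needed.

```python
import timeit

import numpy as np

rng = np.random.default_rng(42)
a = rng.random((1000000, 10))  # assumption: same kind of table as above

def with_where():
    # Build the row indices, then use them to pick the two columns
    idx = np.where((a[:, 2] > 0.5) & (a[:, 6] > 0.5))[0]
    return a[idx, 2], a[idx, 6]

def with_mask():
    # Use the boolean mask directly
    m = (a[:, 2] > 0.5) & (a[:, 6] > 0.5)
    return a[m, 2], a[m, 6]

# Both variants must return the same data before timing them
assert all(np.array_equal(p, q) for p, q in zip(with_where(), with_mask()))

for f in (with_where, with_mask):
    t = min(timeit.repeat(f, number=10, repeat=3)) / 10
    print(f"{f.__name__}: {t * 1e3:.1f} ms per call")
```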