Announcement for numarray

We have been working on a reimplementation of Numeric, the numeric array manipulation extension module for Python. The reimplementation is virtually a complete rewrite. While we believe that performance is generally very good for arrays larger than 100K elements, there are functions and aspects of numarray which have not been optimized yet, so please be careful about drawing conclusions about its efficiency from limited test cases. We also welcome those who would like to contribute to this effort by helping with the development or adding libraries.

This new version was developed for a number of reasons. To summarize, we regularly deal with large datasets, and the new version gives us the capabilities that we feel are necessary for working with such datasets. In particular:

1) Avoiding promotion of array types in expressions involving Python scalars (e.g., 2.*a should not result in a Float64 array when a is a Float32 array).
2) Ability to use memory-mapped files [largely implemented].
3) Ability to access data in fields of arrays of records as numeric arrays without copying the data to a new array.
4) Ability to reference byteswapped data or non-aligned data (as might be found in record arrays) without producing new temporary arrays.
5) Reuse of temporary arrays in expressions when possible [not implemented yet].
6) More convenient use of index arrays (put and take).

A new implementation was decided upon since many of the existing numpy developers agree that the existing implementation is not suitable for massive changes and enhancements. Furthermore, Guido is very reluctant to accept Numeric into the Standard Library unless it is rewritten.

This version has nearly the full functionality of basic Numeric; the only missing pieces are masked arrays, the convolve function, and the use of the ufunc methods reduce, accumulate, and outer for complex types. No other libraries have been adapted to numarray yet.

NUMARRAY IS NOT FULLY COMPATIBLE WITH NUMERIC (but it is very similar in most respects). The incompatibilities are listed below.

Major Compatibility issues:

1) Coercion rules are different. Expressions involving scalars may not produce the same type of arrays (see the sketch following this list).
2) Types are represented by type objects rather than character codes (though the old character codes may still be used as arguments to the functions).
3) The C interface implements a subset of the Numeric API to provide source-level compatibility for Numeric extensions. This means that Numeric extensions need to be ported to and recompiled for numarray. Numeric C-extension binaries are not directly usable by numarray, but Numeric source code is typically easy to port.
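The following is a minimal, hypothetical sketch of the coercion and type-object changes described in points 1) and 2); it is not taken from the numarray manual. It assumes that array() and the type objects (Float32, Float64, Int32, ...) are available at the package level, that array() accepts either a type object or an old character code as its second argument, and that arrays have a type() method returning their type object.

    import numarray

    a = numarray.array([1.0, 2.0, 3.0], numarray.Float32)  # type object instead of the old 'f' code
    b = numarray.array([1.0, 2.0, 3.0], 'f')                # the old character codes are still accepted

    # Numeric promotes 2.*a to Float64 because the Python scalar is double
    # precision; numarray's coercion rules keep the Float32 type of the array.
    c = 2. * a
    c_type = c.type()                                       # expected to be Float32 under numarray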
OPEN ISSUES:

There are still some open questions about the interface which we hope to resolve with feedback from the Numeric community. These issues are listed below. Depending on what the consensus is, we may change aspects of the user interface. It is not our intent to make the case for either side here; that should be done by the proponents of each view, in more detail, on the numpy mailing list.

1) Currently a number of functions or methods (e.g., reduce) act on the first dimension rather than the last by default (as does Numeric). Some feel that it makes sense to always apply functions to the most rapidly varying dimension by default, and they propose that we change numarray to reflect this. Others say that the current behavior is more compatible with Python behavior, and that which dimension a function applies to depends on the function (reduce would apply to the first, while FFT would apply to the last).

2) Should the ones and zeros functions return Float64 arrays by default? (They currently return Int32.)

3) What is the necessary degree of optimization for small arrays? The current implementation has much of the code in Python. Performance on large arrays is generally fast (or easy to optimize), but reducing the overhead for setting up array operations would require either rewriting much more of the current implementation in C or using more complex algorithms in Python. Either approach involves significant work and a module that is more difficult to maintain. We have not done any significant benchmarking lately, but we do have some order-of-magnitude estimates of relative performance (the following numbers depend on the platform, so do not take them too seriously): IDL is already at about half its asymptotic throughput for 200-element arrays; Numeric reaches half throughput for 2000-element arrays; we estimate that numarray is probably another order of magnitude worse, i.e., that 20K-element arrays run at half the asymptotic speed. How much should this be improved? We have some thoughts about other approaches that do not require this sort of optimization but instead try to cast iterative calculations on small arrays into a form that numarray can perform in one call. These ideas will be elaborated on in a separate message.

4) What types are most important to add: Float128? Complex256?

New capabilities:

o Record arrays: arrays of records that have fixed-format fields.
o Character arrays: arrays of fixed-length character strings.
o Support for memory-mapped files.
o Use of index arrays within subscripts: e.g., if ind = array([4, 4, 0, 2]) and x = 2*arange(6), then x[ind] results in array([8, 8, 0, 4]) (see the short sketch at the end of this announcement).
o Support for byteswapped representation.

Things yet to be done:

o Add all the commonly available Numeric libraries to numarray.
o Benchmarking and optimization. For example, some of the functions could be reimplemented with much higher performance (e.g., array printing).

We have developed a revised version of the Numeric manual for numarray (all in all, there is a great deal of commonality).
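To make the index-array example above concrete, here is a minimal sketch built around the values given in this announcement. It is illustrative only and assumes that array(), arange(), and take() can be imported directly from the numarray package.

    from numarray import array, arange, take

    x = 2 * arange(6)            # 0, 2, 4, 6, 8, 10
    ind = array([4, 4, 0, 2])    # the index array from the example above

    y = x[ind]                   # array([8, 8, 0, 4]) -- index-array subscripting
    z = take(x, ind)             # the same selection, via the take() function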