Announcement for numarray

We have been working on a reimplementation of Numeric, the numeric array manipulation extension module for Python. The reimplementation is virtually a complete rewrite. While we believe that performance is generally very good for arrays larger than 100K elements, there are functions and aspects of numarray which have not been optimized yet, so please be careful about drawing conclusions about its efficiency from limited test cases. We also welcome those who would like to contribute to this effort by helping with the development or adding libraries.

This new version was developed for a number of reasons. To summarize, we regularly deal with large datasets, and the new version gives us the capabilities that we feel are necessary for working with such datasets. In particular:

1) Avoiding promotion of array types in expressions involving Python scalars (e.g., 2.*a should not result in a Float64 array when a is a Float32 array).
2) Ability to use memory-mapped files [largely implemented].
3) Ability to access data in fields of arrays of records as numeric arrays without copying the data to a new array.
4) Ability to reference byteswapped data or non-aligned data (as might be found in record arrays) without producing new temporary arrays.
5) Reuse of temporary arrays in expressions when possible [not implemented yet].
6) More convenient use of index arrays (put and take).

A new implementation was decided upon since many of the existing numpy developers agree that the existing implementation is not suitable for massive changes and enhancements. Furthermore, Guido is very reluctant to accept Numeric into the Standard Library unless it is rewritten.

This version has nearly the full functionality of basic Numeric; the only missing pieces are masked arrays, the convolve function, and the use of the ufunc methods reduce, accumulate, and outer for complex types. No other libraries have been adapted to numarray yet.

NUMARRAY IS NOT FULLY COMPATIBLE WITH NUMERIC (but it is very similar in most respects). The incompatibilities are listed below.

Major Compatibility issues:

1) Coercion rules are different. Expressions involving scalars may not produce the same type of arrays (see the sketch following this list).
2) Types are represented by type objects rather than character codes (though the old character codes may still be used as arguments to the functions).
3) The C interface implements a subset of the Numeric API to provide source-level compatibility for Numeric extensions. This means that Numeric extensions need to be ported to and recompiled for numarray. Numeric C-extension binaries are not directly usable by numarray, but Numeric source code is typically easy to port.
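The following is a minimal, hypothetical sketch of the coercion and type-object changes described in points 1) and 2); it is not taken from the numarray manual. It assumes that array() and the type objects (Float32, Float64, Int32, ...) are available at the package level, that array() accepts either a type object or an old character code as its second argument, and that arrays have a type() method returning their type object.

    import numarray

    a = numarray.array([1.0, 2.0, 3.0], numarray.Float32)  # type object instead of the old 'f' code
    b = numarray.array([1.0, 2.0, 3.0], 'f')                # the old character codes are still accepted

    # Numeric promotes 2.*a to Float64 because the Python scalar is double
    # precision; numarray's coercion rules keep the Float32 type of the array.
    c = 2. * a
    c_type = c.type()                                       # expected to be Float32 under numarray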
OPEN ISSUES:

There are still some open questions about the interface which we hope to resolve with feedback from the Numeric community. These issues are listed below. Depending on what the consensus is, we may change aspects of the user interface. It is not our intent to make the case for either side here; that should be done by the proponents of each view, in more detail, on the numpy mailing list.

1) Currently a number of functions or methods (e.g., reduce) act on the first dimension rather than the last by default (as does Numeric). Some feel that it makes sense to always apply functions to the most rapidly varying dimension by default, and they propose that we change numarray to reflect this. Others say that the current behavior is more compatible with Python behavior, and that which dimension a function applies to depends on the function (reduce would apply to the first, while FFT would apply to the last).

2) Should the ones and zeros functions return Float64 arrays by default? (They currently return Int32.)

3) What is the necessary degree of optimization for small arrays? The current implementation has much of the code in Python. Performance on large arrays is generally fast (or easy to optimize), but reducing the overhead for setting up array operations would require either rewriting much more of the current implementation in C or using more complex algorithms in Python. Either approach involves significant work and a module that is more difficult to maintain. We have not done any significant benchmarking lately, but we do have some order-of-magnitude estimates of relative performance (the following numbers depend on the platform, so do not take them too seriously): IDL is already at about half its asymptotic throughput for 200-element arrays; Numeric reaches half throughput for 2000-element arrays; we estimate that numarray is probably another order of magnitude worse, i.e., that 20K-element arrays run at half the asymptotic speed. How much should this be improved? We have some thoughts about other approaches that do not require this sort of optimization but instead try to cast iterative calculations on small arrays into a form that numarray can perform in one call. These ideas will be elaborated on in a separate message.

4) What types are most important to add: Float128? Complex256?

New capabilities:

o Record arrays: arrays of records that have fixed-format fields.
o Character arrays: arrays of fixed-length character strings.
o Support for memory-mapped files.
o Use of index arrays within subscripts: e.g., if ind = array([4, 4, 0, 2]) and x = 2*arange(6), then x[ind] results in array([8, 8, 0, 4]) (see the short sketch at the end of this announcement).
o Support for byteswapped representation.

Things yet to be done:

o Add all the commonly available Numeric libraries to numarray.
o Benchmarking and optimization. For example, some of the functions could be reimplemented with much higher performance (e.g., array printing).

We have developed a revised version of the Numeric manual for numarray (all in all, there is a great deal of commonality).
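To make the index-array example above concrete, here is a minimal sketch built around the values given in this announcement. It is illustrative only and assumes that array(), arange(), and take() can be imported directly from the numarray package.

    from numarray import array, arange, take

    x = 2 * arange(6)            # 0, 2, 4, 6, 8, 10
    ind = array([4, 4, 0, 2])    # the index array from the example above

    y = x[ind]                   # array([8, 8, 0, 4]) -- index-array subscripting
    z = take(x, ind)             # the same selection, via the take() function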