Section 2: The numpy Library

We have seen how a Python list can be used to represent information of arbitrary size.  Unfortunately, processing very large amounts of data stored in Python lists can be slow.  The numpy library provides a more efficient representation for a large amount of data in the form of an array.  Unlike Python lists, which are heterogeneous (a given list can have entries of different types), arrays are homogeneous: all entries must be of the same type.
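As a rough illustration of the homogeneity requirement, the sketch below builds an array from a list that mixes integers and a float.  The variable names are illustrative only, and the dtype attribute (not otherwise covered in this section) is used simply to report the single element type that numpy chose.

```python
import numpy as np

mixed_list = [1, 2.5, 3]           # a list may mix ints and floats
mixed_arr = np.array(mixed_list)   # the array stores every entry as one type

print(type(mixed_list[0]))         # <class 'int'>
print(mixed_arr.dtype)             # float64: the ints were converted to floats
```

Here numpy picks a type that can hold every entry, so the integers 1 and 3 are stored as floating-point values.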

Creating numpy Arrays
You can create numpy arrays from existing Python lists or tuples using the array() function:

>>> import numpy as np

>>> x = np.array([1, 2, 3])
>>> y = np.array((4.0, 5.0, 6.0))

>>> type(x)
<class 'numpy.ndarray'>

>>> type(y)
<class 'numpy.ndarray'>

Note that the type of both x and y is numpy.ndarray.  The nd stands for n-dimensional.  The arrays that we just created are 1-dimensional, but higher-dimensional arrays are possible, and we'll consider them a little later in this chapter.

The numpy library has a function arange() that is analogous to Python's range() function; see:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html#numpy.arange 

Note the values produced by each of the following calls:

>>> np.arange(5)
array([0, 1, 2, 3, 4])

>>> np.arange(2, 5)
array([2, 3, 4])

>>> np.arange(1, 10, 2)
array([1, 3, 5, 7, 9])

>>> np.arange(1, 5, 0.5)
array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

Unlike the Python range() function, numpy's arange() works with arguments of type float.
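One caveat worth keeping in mind: with a non-integer step, floating-point rounding can make the number of values produced by arange() hard to predict.  The sketch below shows np.linspace(), a numpy function not discussed in this section, which takes the number of points instead of a step and includes the end point by default.  The variable names are illustrative.

```python
import numpy as np

halves = np.arange(1, 5, 0.5)      # start, end (excluded), step
print(halves.size)                 # 8

# linspace is given the first value, the last value and the number of points
points = np.linspace(1, 4.5, 8)
print(np.allclose(halves, points)) # True
```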


Broadcasting Operations
An important way in which numpy arrays differ from Python lists is that operations on numpy arrays are broadcast across elements of the array.

Note the following comparison of operations on Python lists and numpy arrays.  Let's start with Python lists:

>>> lst1 = list(range(5))
>>> lst2 = list(range(5, 10))
>>> lst1 + lst2
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> lst1 + 5
TypeError: can only concatenate list (not "int") to list


Now let's try the same with numpy arrays...

>>> arr1 = np.arange(5)
>>> arr2 = np.arange(5, 10)

>>> arr1 + arr2
array([ 5,  7,  9, 11, 13])

Compare the last result with the one obtained above for lst1 + lst2.  When applied to Python lists, the + operator creates a new list that consists of all the elements of lst1 followed by all the elements in lst2.  However, when applied to numpy arrays, the + operator is broadcast to produce the sum of the data in the arrays, element-by-element.   Here are a few more examples of broadcasting operations:

>>> arr1 + 5   # broadcast '+ 5' across all elements of arr1
array([5, 6, 7, 8, 9]) 

>>> arr1 < 3   # broadcast  '< 3' across all elements of arr1
array([ True,  True,  True, False, False])
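To make the contrast with lists concrete, the following sketch performs the same doubling operation on a list and on an array.  The names values, arr, doubled_list and doubled_arr are illustrative.

```python
import numpy as np

values = list(range(5))
arr = np.arange(5)

doubled_list = [v * 2 for v in values]   # a list needs an explicit loop
doubled_arr = arr * 2                    # '* 2' is applied to each element

print(doubled_list)           # [0, 2, 4, 6, 8]
print(doubled_arr.tolist())   # [0, 2, 4, 6, 8]
print(values * 2)             # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
```

Note that values * 2 repeats the list rather than doubling its entries, just as + concatenates lists rather than adding them.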

The numpy library has a very large number of functions (numpy calls them routines) that operate on arrays.  See, for example, the collection of mathematical functions:

http://docs.scipy.org/doc/numpy/reference/routines.math.html

and the collection of statistical functions:

http://docs.scipy.org/doc/numpy/reference/routines.statistics.html

Some of these functions broadcast an operation across all items in the array to which they are applied:

np.sin(arr1)     # produce the sin() of each element in arr1

while others produce aggregate results:

np.mean(arr1)    # produce the mean of all the entries in arr1

The documentation describes the purpose of each function and the value produced.
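The difference between the two kinds of function can be seen in the size of the result.  The sketch below uses np.sqrt(), np.sum() and np.max(), chosen here only as further examples of each kind.

```python
import numpy as np

arr1 = np.arange(5)

roots = np.sqrt(arr1)     # element-by-element: one result per entry of arr1
total = np.sum(arr1)      # aggregate: a single value, 0 + 1 + 2 + 3 + 4
largest = np.max(arr1)    # aggregate: the largest entry

print(roots.size)         # 5
print(total)              # 10
print(largest)            # 4
```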



Indexing Operations
You can index into a numpy array exactly the same way as you would with any sequence type in Python:

>>> x = np.arange(5)
>>> x[0]
0

>>> x[1]
1

>>> x[-1]
4
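Indexing can also be used on the left-hand side of an assignment to change a single entry in place.  The sketch below is illustrative; it also shows one consequence of arrays being homogeneous, namely that a float assigned into an integer array is truncated to an integer.

```python
import numpy as np

x = np.arange(5)      # integer array: 0, 1, 2, 3, 4

x[0] = 10             # replace the first entry in place
x[-1] = 40            # negative indexes count back from the end
print(x.tolist())     # [10, 1, 2, 3, 40]

x[1] = 2.7            # the array holds integers, so 2.7 is stored as 2
print(x[1])           # 2
```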



Slicing Operations
Slicing operations can be applied to numpy arrays using the same syntax as for Python lists.  Recall that a slicing operation on a Python list produces a copy of the list.  Consequently, changes made to the copy are not reflected in the original (and vice versa).  However, a slicing operation on a numpy array produces a view onto the original array.  Hence, any change made to the view is made to the original and vice-versa – BEWARE!

arr[start:end:skip]
- produces a view onto the numpy array arr starting at index start, skipping ahead skip entries every time, up to but not including the element at index end.  Note that all of these arguments are optional.  If skip is not provided, it assumes the default value of 1.  For a positive skip, start defaults to 0 and end defaults to arr.size (the size of the array).
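The following sketch exercises the slicing form described above on a small array, leaving out different arguments in turn.  The array name arr and its contents are illustrative.

```python
import numpy as np

arr = np.arange(10)           # 0, 1, ..., 9

print(arr[2:8:3].tolist())    # [2, 5]       start=2, end=8, skip=3
print(arr[:3].tolist())       # [0, 1, 2]    start defaults to 0
print(arr[7:].tolist())       # [7, 8, 9]    end defaults to arr.size
print(arr[::4].tolist())      # [0, 4, 8]    only skip is given
```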

Consider the following operations on lists:

>>> l = [1, 2, 3, 4]
>>> m = l[:]              #copy using a slicing operation

>>> m[0] = 7              #change first item in m

>>> m
[7, 2, 3, 4]              #see that first item has been changed

>>> l
[1, 2, 3, 4]              #but original list has not been modified


and compare them with similar operations on numpy arrays:

>>> r = np.array([1, 2, 3, 4])
>>> s = r[:]                #s is a view onto the whole of r

>>> s[0] = 7                #change first item in s        
>>> s
array([7, 2, 3, 4])         #see that first item has been changed

>>> r
array([7, 2, 3, 4])         #original array has also been modified


This can be incredibly useful, particularly in cases where you want to apply an operation to a subset of the entries of an array.  You simply create a view onto the array, then apply the desired operation to the view:

>>> r = np.arange(1, 5)
>>> s = r[::2]          #view consisting of every 2nd element of r
>>> s[:] = 0            #assign 0 to every element of s
>>> r
array([0, 2, 0, 4])     #every 2nd element of r has the value 0
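When an independent copy is wanted instead of a view, one option is the array's copy() method.  The sketch below also uses np.shares_memory(), a numpy function not covered in this section, purely to check whether two arrays share the same underlying data.  The variable names are illustrative.

```python
import numpy as np

r = np.array([1, 2, 3, 4])
view = r[:]               # a view: shares its data with r
independent = r.copy()    # a copy: has its own data

independent[0] = 7        # changes the copy only
print(r.tolist())                         # [1, 2, 3, 4]
print(np.shares_memory(r, view))          # True
print(np.shares_memory(r, independent))   # False
```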



Fancy Indexing
There are a couple of different forms of "fancy" indexing that can be performed on numpy arrays.

Arrays of integers can be used to index into other arrays:

>>> ia = np.array([0, 3, 4])
>>> a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
>>> a[ia]          
array([1., 4., 5.])                 #new array (a copy, not a view) of
                                    #the entries at indexes 0, 3 & 4
>>> a[ia] = -1                      #assigning through a[ia] changes a
>>> a
array([-1.,  2.,  3., -1., -1.,  6.])   #entries at index 0, 3 and 4
                                        #have the value -1
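Unlike a slice, the array produced by indexing with an array of integers holds its own copy of the selected entries.  The sketch below, with the illustrative name picked, shows that changing that result leaves a untouched, whereas assigning through a[ia] changes a itself.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
ia = np.array([0, 3, 4])

picked = a[ia]       # new array holding copies of the selected entries
picked[0] = 99.0     # changes the copy only
print(a.tolist())    # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

a[ia] = -1           # assignment through the index does change a
print(a.tolist())    # [-1.0, 2.0, 3.0, -1.0, -1.0, 6.0]
```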


Arrays of Booleans can be used to index into other arrays of the same size.  The result is an array containing only the elements of the indexed array that correspond to True values in the Boolean array.

>>> ba = np.array([True, False, True, True, False, True])
>>> a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
>>> a[ba]
array([1., 3., 4., 6.])           #new array (a copy, not a view) of entries
                                  #that correspond to True values in ba
>>> a[ba] = 0                     #assigning through a[ba] changes a
>>> a
array([0., 2., 0., 0., 5., 0.])   #entries that correspond to True
                                  #values in ba are now 0
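In practice the Boolean array is often produced by a comparison that is broadcast across the array, as in the arr1 < 3 example earlier in this section.  The sketch below combines the two ideas; the name mask and the threshold 3.5 are illustrative.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

mask = a > 3.5             # True where the entry is greater than 3.5
print(mask.tolist())       # [False, False, False, True, True, True]
print(a[mask].tolist())    # [4.0, 5.0, 6.0]

a[mask] = 0                # set every entry greater than 3.5 to 0
print(a.tolist())          # [1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
```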