A Beginner's Guide to Vectorization, Broadcasting & Indexing in NumPy

SIMD (single instruction, multiple data)

SIMD on single core
SIMD on Wiki
SIMD vs. SISD (image source: NativeScript blog)
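
As a rough illustration of the idea at the Python level, the sketch below adds two arrays twice: once with an explicit loop that handles one element per step, and once with a single whole-array expression. The names `a`, `b`, `looped` and `vectorized` are made up for this example. Whether NumPy really uses SIMD instructions underneath depends on the NumPy build and the CPU, so treat this as an analogy rather than a measurement.

```python
import numpy as np

a = np.arange(5)        # [0, 1, 2, 3, 4]
b = np.arange(5) * 10   # [0, 10, 20, 30, 40]

# One element per step, the way a plain Python loop works
looped = np.empty_like(a)
for i in range(len(a)):
    looped[i] = a[i] + b[i]

# One expression over the whole array; NumPy runs the loop in compiled code
vectorized = a + b

print(looped)
print(vectorized)
```

Both results hold the same values; the second form is shorter and is usually much faster on large arrays.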

1. Vectors, Matrices, and Arrays

list([1, 2, 3])
# [1, 2, 3]
import numpy as np
arr = np.array([1, 2, 3])
type(arr)
# numpy.ndarray
import numpy as np

vector_row = np.array([1, 2, 3])   # one-dimensional, shape (3,)
vector_column = np.array([[1],
                          [2],
                          [3]])    # two-dimensional, shape (3, 1)
import numpy as np
# np.asmatrix replaces the np.mat alias, which was removed in NumPy 2.0
matrix = np.asmatrix([[1, 2],
                      [1, 2],
                      [1, 2]])
type(matrix)
# numpy.matrix
array_matrix = np.array([[1, 2],
                         [1, 2],
                         [1, 2]])
array_matrix.size
# 6, the total number of elements
array_matrix.shape
# (3, 2), the length of each dimension: 3 rows and 2 columns
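
To make the difference between a row vector, a column vector and a matrix concrete, here is a minimal sketch that prints `ndim`, `shape` and `size` for the three arrays defined in this section. The loop and the `name` label are only there for printing.

```python
import numpy as np

vector_row = np.array([1, 2, 3])
vector_column = np.array([[1], [2], [3]])
array_matrix = np.array([[1, 2], [1, 2], [1, 2]])

arrays = [
    ("vector_row", vector_row),
    ("vector_column", vector_column),
    ("array_matrix", array_matrix),
]

for name, a in arrays:
    # ndim = number of dimensions, shape = length per dimension, size = element count
    print(name, "ndim:", a.ndim, "shape:", a.shape, "size:", a.size)
```

Note that the row vector has a single dimension, while the column vector is already a two-dimensional array with one column.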

2. Intro to NumPy Arrays

arr = np.arange(18).reshape(2, 3, 3)   # 2 blocks of 3 rows x 3 columns
arr2 = np.arange(18).reshape(3, 2, 3)  # 3 blocks of 2 rows x 3 columns
arange and reshape from the NumPy library
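
A small sketch of how the two shapes above can be read, assuming the same 18 values. The names `first_block` and `inferred` are hypothetical and only used here.

```python
import numpy as np

arr = np.arange(18).reshape(2, 3, 3)
arr2 = np.arange(18).reshape(3, 2, 3)

print(arr.shape)    # (2, 3, 3)
print(arr2.shape)   # (3, 2, 3)

# Indexing the first axis of arr gives one 3x3 block holding the values 0..8
first_block = arr[0]
print(first_block)

# reshape needs the same number of elements; -1 lets NumPy work out one dimension
inferred = np.arange(18).reshape(2, -1)
print(inferred.shape)   # (2, 9)
```

The total number of elements has to match the product of the new shape, otherwise `reshape` raises a `ValueError`.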

3. Vectorization

import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

sqr = lambda i: i * i

vectorized_sqr = np.vectorize(sqr)

vectorized_sqr(matrix)
# array([[ 1,  4,  9],
#        [16, 25, 36],
#        [49, 64, 81]])
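
One caveat worth showing: the NumPy documentation describes `np.vectorize` as a convenience that is essentially a Python `for` loop, not a performance tool. For simple arithmetic like squaring, the array operators and built-in functions such as `np.square` give the same result. The sketch below compares the three; the `via_*` names are made up for this example.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Wrapping a Python function: convenient, but still calls it once per element
vectorized_sqr = np.vectorize(lambda i: i * i)
via_vectorize = vectorized_sqr(matrix)

# Native element-wise operations run in compiled code
via_operator = matrix * matrix
via_ufunc = np.square(matrix)

print(np.array_equal(via_vectorize, via_operator))
print(np.array_equal(via_operator, via_ufunc))
```

`np.vectorize` is still handy when the function has no array equivalent, for example when it contains Python branching that is awkward to express with array operations.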

4. Broadcasting

matrix2 = np.arange(8).reshape(2, 2, 2)
# Reshape into 2 matrices of 2 rows x 2 columns
vectorized_sqr(matrix2)
# The same function still works on data with a new shape
x = np.array([1, 2, 3])
y = np.array([10])
x + y
# array([11, 12, 13]): y is stretched to match x, then the sum is done element-wise
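
Broadcasting also works across dimensions. As a minimal sketch, adding a column of shape (3, 1) to a row of shape (3,) produces a full 3x3 table, because each axis of length 1 is stretched to match the other array. The names `column`, `row_vec` and `table` are assumptions for this example only.

```python
import numpy as np

column = np.array([[0], [10], [20]])   # shape (3, 1)
row_vec = np.array([1, 2, 3])          # shape (3,)

# Shapes are compared from the right; a length of 1 stretches to fit
table = column + row_vec               # shape (3, 3)
print(table)

# Lengths 3 and 2 are neither equal nor 1, so these shapes cannot be broadcast
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError as err:
    print("cannot broadcast:", err)
```

The rule of thumb is that two dimensions are compatible when they are equal or when one of them is 1.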

5. Indexing

one_d = np.array([1, 2, 3, 4, 5])
one_d[0]   # first element, counting from the start = 1
one_d[-5]  # the same element, counting from the end = 1
one_d[4]   # 5
one_d[-1]  # 5
two_d = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
two_d[1][2]    # 1 selects the second row, 2 selects its third element = 8
two_d[-1][-3]  # the same element, counting from the end = 8
3D array
three_d = np.arange(18).reshape(2, 3, 3)
three_d[0][0][1]     # = 1
three_d[-2][-3][-2]  # the same element, counting from the end = 1
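
NumPy also accepts all the indices inside one pair of brackets, separated by commas. The sketch below checks that this gives the same elements as the chained form used above; `chained` and `tupled` are hypothetical names.

```python
import numpy as np

two_d = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
three_d = np.arange(18).reshape(2, 3, 3)

chained = two_d[1][2]   # first builds the row two_d[1], then indexes into it
tupled = two_d[1, 2]    # looks up the element in a single step
print(chained, tupled)

# The same works for any number of dimensions, including negative indices
print(three_d[0, 0, 1], three_d[-2, -3, -2])
```

The comma form avoids creating the intermediate row and is the more common spelling in NumPy code.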
n = np.arange(9).reshape(3, 3)
# array([[0, 1, 2],
#        [3, 4, 5],
#        [6, 7, 8]])
n[[0, 1], [1, 1]]
# array([1, 4]): the middle elements of the first two rows
n[[0, 1, 2], [1, 1, 1]]
# array([1, 4, 7])
# The first list holds the row indices, the second holds the column indices
# For our matrix n:
# 1 is at index [0, 1]
# 4 is at index [1, 1]
# 7 is at index [2, 1]
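
Because the two index lists are paired up position by position, they do not select a rectangular block. If a block of rows and columns is what we want, one option is `np.ix_`, sketched below; `paired` and `block` are names invented for this example.

```python
import numpy as np

n = np.arange(9).reshape(3, 3)

# Paired indices: picks n[0, 1] and n[1, 2]
paired = n[[0, 1], [1, 2]]
print(paired)

# np.ix_ combines every listed row with every listed column
block = n[np.ix_([0, 1], [1, 2])]
print(block)
```

Here `paired` has two elements, while `block` is the 2x2 sub-matrix formed by rows 0 and 1 and columns 1 and 2.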
normal_list = list(range(1, 10))
np_array = np.arange(1, 10)
accessor = np.array([2, 3, 4])
normal_list[accessor]
# TypeError: only integer scalar arrays can be converted to a scalar index
np_array[accessor]
# array([3, 4, 5]): an ndarray can be indexed with another ndarray
accessor_list = [2, 3, 4]
np_array[accessor_list]
# array([3, 4, 5]): indexing an ndarray with a plain Python list is also valid
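
Two follow-ups that may help here, written as a sketch rather than the only way to do it. A plain list can still be indexed with several positions by looping over them, and the result of indexing an ndarray with an array of positions is a copy, so changing it leaves the original untouched. The names `from_list` and `picked` are hypothetical.

```python
import numpy as np

normal_list = list(range(1, 10))
np_array = np.arange(1, 10)
accessor = np.array([2, 3, 4])

# A list accepts only one index at a time, so we loop over the positions
from_list = [normal_list[i] for i in accessor]
print(from_list)

# Indexing with an array of positions returns a new array, not a view
picked = np_array[accessor]
picked[0] = 99
print(picked)
print(np_array)   # unchanged
```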
m = np.array([[1, -2, 3], [-2, 4, 3]])
m[m < 2]
# array([ 1, -2, -2])
# We get the elements that are less than 2
m[m > 2] * 2
# array([6, 8, 6])
# We can also do calculations on the fly like this
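
The same kind of condition can also sit on the left side of an assignment, which changes only the matching elements in place. A minimal sketch, working on a copy named `clipped` (a hypothetical name) so that `m` stays as it was:

```python
import numpy as np

m = np.array([[1, -2, 3], [-2, 4, 3]])

clipped = m.copy()
# Every element where the condition is True is overwritten with 0
clipped[clipped < 0] = 0

print(clipped)
print(m)   # the original is untouched because we worked on a copy
```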
row = np.arange(25).reshape(5, 5)
# array([[ 0,  1,  2,  3,  4],
#        [ 5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14],
#        [15, 16, 17, 18, 19],
#        [20, 21, 22, 23, 24]])
# 3rd and 4th columns of that matrix
row[:, [2, 3]]
row[:, 2:4:1]  # the same columns, selected with a slice
# The : in the first position means "every row"
# array([[ 2,  3],
#        [ 7,  8],
#        [12, 13],
#        [17, 18],
#        [22, 23]])
# 4th and 3rd columns of that matrix, in that order
row[:, -2:-4:-1]
# Remember how negative indices count from the end?
# Here we not only start from the end, the result also comes back in that reversed order
# array([[ 3,  2],
#        [ 8,  7],
#        [13, 12],
#        [18, 17],
#        [23, 22]])
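
The two ways of selecting columns 2 and 3 return the same values but behave differently afterwards: a slice is a view onto the original data, while a list of indices produces a copy. The sketch below shows this with `np.shares_memory`; `by_slice` and `by_list` are names made up for this example.

```python
import numpy as np

row = np.arange(25).reshape(5, 5)

by_slice = row[:, 2:4]     # view: shares memory with row
by_list = row[:, [2, 3]]   # copy: independent data

print(np.array_equal(by_slice, by_list))
print(np.shares_memory(row, by_slice))
print(np.shares_memory(row, by_list))

# Writing through the view changes the original array as well
by_slice[0, 0] = -1
print(row[0, 2])
```

If the original must stay unchanged, calling `.copy()` on the slice is a simple way to be safe.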
greater_than_4 = row > 4
# array([[False, False, False, False, False],
#        [ True,  True,  True,  True,  True],
#        [ True,  True,  True,  True,  True],
#        [ True,  True,  True,  True,  True],
#        [ True,  True,  True,  True,  True]])
row[greater_than_4]
# array([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24])
# Notice that the result is flattened into a one-dimensional array
# Replace every element that is not greater than 4 with 0
# Arguments: condition, value where the condition holds, replacement where it does not
np.where(greater_than_4, row, 0)
# array([[ 0,  0,  0,  0,  0],
#        [ 5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14],
#        [15, 16, 17, 18, 19],
#        [20, 21, 22, 23, 24]])
zero_under_4_over_20 = np.logical_and(row > 4, row < 20)
# array([[False, False, False, False, False],
#        [ True,  True,  True,  True,  True],
#        [ True,  True,  True,  True,  True],
#        [ True,  True,  True,  True,  True],
#        [False, False, False, False, False]])
row[zero_under_4_over_20]
# As usual, we get a one-dimensional array back
np.where(zero_under_4_over_20, row, 0)
# array([[ 0,  0,  0,  0,  0],
#        [ 5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14],
#        [15, 16, 17, 18, 19],
#        [ 0,  0,  0,  0,  0]])
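
Conditions can also be combined with the `&`, `|` and `~` operators instead of `np.logical_and` and friends. The parentheses are required because these operators bind more tightly than the comparisons. This is a sketch with made-up names (`with_function`, `with_operator`, `outside`):

```python
import numpy as np

row = np.arange(25).reshape(5, 5)

with_function = np.logical_and(row > 4, row < 20)
with_operator = (row > 4) & (row < 20)   # same mask, operator form

# | is "or"; this mask marks everything outside the 5..19 range
outside = (row <= 4) | (row >= 20)

print(np.array_equal(with_function, with_operator))
print(np.array_equal(outside, ~with_operator))   # ~ inverts a mask
print(row[with_operator].size)
```

Python's own `and` and `or` keywords do not work on whole arrays here; they raise a `ValueError` about the ambiguous truth value of an array.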

6. Average, Variance & Standard Deviation

np.max(matrix)
# 9, the maximum element
np.min(matrix)
# 1, the minimum element
np.mean(matrix)
# 5.0, the mean
np.var(matrix)
# about 6.667, the variance
np.std(matrix)
# about 2.582, the standard deviation
matrix.flatten()
# Flatten into a one-dimensional array
matrix.reshape(9)
# matrix is the 3x3 array with 9 elements defined in section 3, so this also gives a flat one-dimensional array
matrix.reshape(1, -1)
# -1 means "as many as needed"; this gives a nested array of shape (1, 9), so I suggest avoiding it when a flat array is wanted
matrix.reshape(9, -1)
# Reshape into a column vector of shape (9, 1)
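
These functions also take an `axis` argument to work per column or per row, and `np.var` and `np.std` take `ddof` to switch from the population formula (the default, `ddof=0`) to the sample formula. A minimal sketch on the same 3x3 `matrix`; the result names are hypothetical.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

col_means = matrix.mean(axis=0)   # one mean per column
row_means = matrix.mean(axis=1)   # one mean per row
print(col_means)
print(row_means)

population_var = np.var(matrix)        # divides by n
sample_var = np.var(matrix, ddof=1)    # divides by n - 1
print(population_var, sample_var)
```

Which variance to use depends on whether the data is the whole population or a sample from it.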
