Over the past decades, the rise of machine learning (ML), and deep learning (DL) in particular, has drawn in people from many backgrounds: academics (logicians, mathematicians, neurologists, etc.), scientists and engineers (computing with Matlab or Octave) and especially programmers. Google the term “AI hype cycle” and you’ll see how the promise of Artificial General Intelligence (AGI) flipped, and how we now have narrow (weak) ML, or should I say deep artificial neural networks, everywhere. For complex systems to emerge we need more than that: pattern formation, collective behavior, nonlinear dynamics, evolution and adaptation, just to name a few. I am not here to show you those, only the humble building blocks that are part of an ML system in production.
Many weak AIs depend on vector- and matrix-friendly programming languages like Python and R. Both languages offer array-like types out of the box, but we need more, and NumPy comes to the rescue. This short article covers three useful topics: vectorization, broadcasting and indexing.
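Q: Let’s address the elephant in the room: why vector or multi-dimensional array processing?
A: Vector processing is much faster than handling one value at a time in general-purpose registers, especially on modern CPUs, because a single vector instruction operates on several values at once.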
Some practical uses:
- Multimedia applications that need fast arithmetic on large volumes of integer or floating-point data
- Intensive GPU processing, such as games at the highest resolutions or mining :P
- Machine learning: gradients and automatic differentiation, or a simple Y = a + bX
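To make the last item concrete, here is a minimal sketch of Y = a + bX written once as a Python loop and once as a NumPy array expression. The values of a, b and X are made up purely for illustration.

```python
import numpy as np

# Hypothetical coefficients and inputs, chosen only for illustration
a, b = 2.0, 0.5
X = np.array([0.0, 1.0, 2.0, 3.0])

# Loop version: one Python-level operation per element
y_loop = [a + b * x for x in X]

# Array expression: the same arithmetic applied to the whole array at once
Y = a + b * X

print(Y.tolist())  # [2.0, 2.5, 3.0, 3.5]
```

Both versions compute the same numbers; the array expression simply leaves the looping to NumPy.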
For details on computer architectures, see Flynn’s taxonomy. Feel free to skip the next two sections if you’re familiar with the basic concepts.
1. Vectors, Matrices, and Arrays
List: a built-in Python data type that can be created either with the list() function or with a literal.
list([1,2,3])
[1,2,3]
Array
import numpy as np
arr = np.array([1,2,3])
type(arr)
# numpy.ndarray
Vector: a one-dimensional array, laid out horizontally (row) or vertically (column).
import numpy as np
vector_row = np.array([1, 2, 3])
vector_column = np.array([[1],
[2],
[3]])
Matrix: simply a rectangular array with m rows and n columns.
import numpy as np
# np.asmatrix is the current name; older NumPy versions also offer the np.mat alias
matrix = np.asmatrix([[1, 2],
                      [1, 2],
                      [1, 2]])
type(matrix)
# numpy.matrix
array_matrix = np.array([[1, 2],
                         [1, 2],
                         [1, 2]])
Note: please don’t use np.mat() (or np.matrix); stick with np.array(), since ndarray is the de facto standard in NumPy and most functions and operations return an array (numpy.ndarray), not a matrix (numpy.matrix).
A vector is comparable to a fixed-length array containing integer or floating-point values. — apple doc
array_matrix.size
# number of elements in the matrix: 6
array_matrix.shape
# shape of the matrix as a tuple: (3, 2)
2. Intro to NumPy Arrays
I assume you have basic Python skills here; otherwise this freeCodeCamp course will help you. Let’s create evenly spaced values using the arange() function and reshape them into some interesting shapes.
arr = np.arange(18).reshape(2, 3, 3)
arr2 = np.arange(18).reshape(3,2,3)
Both arrays hold 18 elements, starting from zero. Read the shape of the first array from left to right: “2 matrices of 3 rows X 3 columns each”. In the same way, arr2 is “3 matrices of 2 rows X 3 columns each”.
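If the left-to-right reading is hard to picture, a quick sketch like the following may help. It only inspects shapes and the first sub-matrix of each array, using the same arr and arr2 as above.

```python
import numpy as np

arr = np.arange(18).reshape(2, 3, 3)
arr2 = np.arange(18).reshape(3, 2, 3)

# The first number in the shape counts the matrices, the rest describe each one
print(arr.shape)         # (2, 3, 3)
print(arr[0].shape)      # (3, 3)
print(arr2[0].shape)     # (2, 3)

# The first matrix of arr2 holds the first 6 values
print(arr2[0].tolist())  # [[0, 1, 2], [3, 4, 5]]
```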
NumPy arrays are extremely fast; in short, their elements are stored in one contiguous block of memory.
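One way to peek at that memory layout is through the array's flags and strides. This is a small sketch; the dtype is fixed to int64 here (my assumption, not something the article requires) so that the byte counts are the same on every platform.

```python
import numpy as np

# dtype pinned to int64 so each element is 8 bytes everywhere
arr = np.arange(18, dtype=np.int64).reshape(2, 3, 3)

# True when the elements sit in one contiguous, row-major block
print(arr.flags['C_CONTIGUOUS'])  # True

# Bytes per element
print(arr.itemsize)               # 8

# Bytes to skip to move one step along each axis
print(arr.strides)                # (72, 24, 8)
```

The strides say that stepping to the next matrix skips 9 elements, stepping to the next row skips 3, and stepping to the next column skips 1.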
3. Vectorization
Vectorization is a method for achieving parallelism inside a single processor core. — Intel blog
In layman’s terms, a single instruction is applied to several elements of an array at once, on a single CPU core. Vectorized operations can replace traditional for loops, run faster and work on arrays of different dimensions. The takeaway: it’s just an array expression replacing an explicit for loop.
This practice of replacing explicit loops with array expressions is commonly referred to as vectorization. In general, vectorized array operations will often be one or two (or more) orders of magnitude faster than their pure Python equivalents, with the biggest impact [seen] in any kind of numerical computations. ~ Wes McKinney
import numpy as np
matrix = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
sqr = lambda i: i * i
vectorized_sqr = np.vectorize(sqr)
vectorized_sqr(matrix)
To explain the snippet: we use a lambda (a nameless, anonymous function) to square the input value i, pass it [the lambda named sqr] to NumPy’s vectorize function as the instruction, and then pass in the data [matrix].
np.vectorize is just a convenience: it won’t run any faster than a loop, and np.array already supports vectorized operations, so matrix * matrix gives the same result.
Many examples in this article refer to this variable called [matrix].
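To compare the options side by side, here is a small sketch that squares the same matrix three ways: with explicit loops, with np.vectorize and with a plain array expression. It only checks that the results agree; timing them (for example with the timeit module) is left to you.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# 1) Explicit Python loops over every element
looped = np.empty_like(matrix)
for r in range(matrix.shape[0]):
    for c in range(matrix.shape[1]):
        looped[r, c] = matrix[r, c] * matrix[r, c]

# 2) np.vectorize: convenient, but still calls the Python function per element
wrapped = np.vectorize(lambda i: i * i)(matrix)

# 3) Plain array expression: the element-wise work happens inside NumPy
native = matrix * matrix

print(native.tolist())  # [[1, 4, 9], [16, 25, 36], [49, 64, 81]]
print(np.array_equal(looped, native) and np.array_equal(wrapped, native))  # True
```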
4. Broadcasting
The vectorized function from the previous example works on arrays of any shape. A closely related idea is broadcasting: NumPy can combine arrays of different but compatible shapes in a single operation, so the operands are not required to have the same shape.
matrix2 = np.arange(8).reshape(2,2,2)
# Reshaping into 2 matrices of 2 rows X 2 columns
vectorized_sqr(matrix2)
# Can still use this function with the newly shaped data
A simple example is the following snippet; you can also read up on the theory behind it.
x = np.array([1,2,3])
y = np.array([10])
x + y
# array([11, 12, 13]), an element-wise sum: y is stretched to match x
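Broadcasting also works across dimensions. The sketch below (array names are mine, picked for illustration) adds a column of shape (3, 1) to a row of shape (3,), and then shows what happens when the shapes are not compatible.

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])          # shape (3,)

# Shapes are compared from the right; a dimension of size 1 is stretched
result = col + row
print(result.shape)     # (3, 3)
print(result.tolist())  # [[1, 2, 3], [11, 12, 13], [21, 22, 23]]

# Lengths 3 and 2 cannot be matched, so NumPy raises ValueError
raised = False
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError:
    raised = True
print(raised)  # True
```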
5. Indexing
Access the array elements arbitrarily permitting items to be access out of order and even repeatedly. — unknown source
We’ll start with a very basic 1D array and work up to multi-dimensional arrays.
one_d = np.array([1,2,3,4,5])
one_d[0] # access from the start, index 0 = 1
one_d[-5] # reverse access from the end = 1
one_d[4] # 5
one_d[-1] # 5
2-dimensional array
two_d = np.array([[1,2,3,4,5], [6,7,8,9,10]])
two_d[1][2] # 1 selects the second row, 2 selects index 2 (the third element) of that row = 8
two_d[-1][-3] # the same element, accessed from the end = 8
3-dimensional array
three_d = np.arange(18).reshape(2,3,3)
three_d[0][0][1] # we get 1
three_d[-2][-3][-2] # same value = 1
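The chained brackets used above can also be written as a single index with commas, which is the more common NumPy spelling. A short sketch on the same three_d array:

```python
import numpy as np

three_d = np.arange(18).reshape(2, 3, 3)

chained = three_d[0][0][1]  # one bracket pair per dimension
single = three_d[0, 0, 1]   # all dimensions in one index

print(chained == single)    # True

# Second matrix, third row, first column
print(three_d[1, 2, 0])     # 15
```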
Let’s move on to accessing arbitrary elements of an array. We pass one list of row indices and one list of column indices; NumPy pairs them up position by position, so two lists of length 2 select 2 elements and two lists of length 3 select 3 elements.
n = np.arange(9).reshape(3,3)
# array([[0, 1, 2],
#        [3, 4, 5],
#        [6, 7, 8]])
n[[0,1],[1,1]]
# Gets the middle elements of the first two rows: 1 & 4
n[[0,1,2],[1,1,1]]
# The first list holds the row numbers, the second the column numbers
# According to our matrix [n]
# 1 is at index [0,1]
# 4 is at index [1,1]
# 7 is at index [2,1]
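The pairing of row and column lists can be spelled out with zip, as in the sketch below. It also shows np.ix_, which is an addition of mine rather than part of the example above: it selects every combination of the given rows and columns instead of pairing them.

```python
import numpy as np

n = np.arange(9).reshape(3, 3)
rows = [0, 1, 2]
cols = [1, 1, 1]

# Index lists are paired position by position: (0,1), (1,1), (2,1)
picked = n[rows, cols]
manual = np.array([n[r, c] for r, c in zip(rows, cols)])
print(picked.tolist())  # [1, 4, 7]

# np.ix_ takes every combination of rows 0, 2 and columns 0, 2
corners = n[np.ix_([0, 2], [0, 2])]
print(corners.tolist())  # [[0, 2], [6, 8]]
```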
To be fancy, we will access a group of elements from both a normal list and a NumPy array.
normal_list = list(range(1,10))
np_array = np.arange(1,10)
accessor = np.array([2,3,4])
normal_list[accessor]
# TypeError: only integer scalar arrays can be converted to a scalar index
np_array[accessor]
# array([3, 4, 5]): the elements at index 2, 3, 4 of np_array, indexed with a NumPy array
accessor_list = [2,3,4]
np_array[accessor_list]
# Also valid: a NumPy array can be indexed with a normal Python list
Let’s switch gears to Boolean indexing; what we have seen above is integer indexing. Boolean indexing is useful for getting rid of unwanted data, replacing null/None or incomplete data, and other tasks like masking and filtering.
m = np.array([[1,-2,3],[-2,4,3]])
m[m < 2]
# array([ 1, -2, -2])
# We got the elements that are less than 2
m[m > 2] * 2
# array([6, 8, 6])
# We can do on-the-fly calculations like this too
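Replacing incomplete data works the same way: build a Boolean mask, then assign through it. A minimal sketch, assuming missing values are stored as NaN (the data array is made up for illustration):

```python
import numpy as np

# Hypothetical readings where NaN marks a missing value
data = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# True wherever a value is missing
mask = np.isnan(data)

# Work on a copy so the original readings stay untouched
cleaned = data.copy()
cleaned[mask] = 0.0
print(cleaned.tolist())  # [1.0, 0.0, 3.0, 0.0, 5.0]

# The same pattern replaces the negative values of m with zero
m = np.array([[1, -2, 3], [-2, 4, 3]])
m_clean = m.copy()
m_clean[m_clean < 0] = 0
print(m_clean.tolist())  # [[1, 0, 3], [0, 4, 3]]
```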
Slicing: we use the : (colon) operator to access rows or columns in steps, or all elements.
For picking columns, an index list such as array[:, [2, 3]] and a slice such as array[:, start:end:step] can often be used interchangeably.
row = np.arange(25).reshape(5,5)
# array([[ 0,  1,  2,  3,  4],
#        [ 5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14],
#        [15, 16, 17, 18, 19],
#        [20, 21, 22, 23, 24]])
# 3rd and 4th columns of that matrix, written two ways
row[:,[2,3]]
row[:, 2:4:1]
# : as the first index means every row
# array([[ 2,  3],
#        [ 7,  8],
#        [12, 13],
#        [17, 18],
#        [22, 23]])
# 4th and 3rd columns of that matrix
row[:,-2:-4:-1]
# Remember how we reversed by using negative values?
# Here we not only access from the last element, the result is also returned in that order
# array([[ 3,  2],
#        [ 8,  7],
#        [13, 12],
#        [18, 17],
#        [23, 22]])
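One caveat about using the two forms interchangeably: they return the same values, but a slice gives a view into the original array while an index list gives a copy. The following sketch checks this with np.shares_memory.

```python
import numpy as np

row = np.arange(25).reshape(5, 5)

by_slice = row[:, 2:4]    # basic slicing returns a view
by_list = row[:, [2, 3]]  # indexing with a list returns a copy

print(np.array_equal(by_slice, by_list))  # True
print(np.shares_memory(row, by_slice))    # True
print(np.shares_memory(row, by_list))     # False

# Writing through the view changes the original array
by_slice[0, 0] = -1
print(row[0, 2])  # -1
```

So if you plan to modify the selection, decide first whether the original array should change too.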
We can also build a Boolean mask first and reuse it for both filtering and replacing; it’s getting more interesting.
greater_than_4 = row > 4
# array([[False, False, False, False, False],
#        [ True,  True,  True,  True,  True],
#        [ True,  True,  True,  True,  True],
#        [ True,  True,  True,  True,  True],
#        [ True,  True,  True,  True,  True]])
row[greater_than_4]
# array([ 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24])
# Notice that the result is flattened into a one-dimensional array
# Replace every element that is not greater than 4
# Arguments: condition, data, replacement where the condition is not met
np.where(greater_than_4, row, 0)
# array([[ 0,  0,  0,  0,  0],
#        [ 5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14],
#        [15, 16, 17, 18, 19],
#        [20, 21, 22, 23, 24]])
Add the famous logical and/or/not, not only for filtering but also for replacing and slicing.
zero_under_4_over_20 = np.logical_and(row > 4, row < 20)
# array([[False, False, False, False, False],
#        [ True,  True,  True,  True,  True],
#        [ True,  True,  True,  True,  True],
#        [ True,  True,  True,  True,  True],
#        [False, False, False, False, False]])
row[zero_under_4_over_20]
# as usual we get a one-dimensional array back
np.where(zero_under_4_over_20, row, 0)
# array([[ 0,  0,  0,  0,  0],
#        [ 5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14],
#        [15, 16, 17, 18, 19],
#        [ 0,  0,  0,  0,  0]])
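The same masks can be written with the &, | and ~ operators instead of the np.logical_* functions. A short sketch; note that each comparison needs its own parentheses because these operators bind tighter than < and >.

```python
import numpy as np

row = np.arange(25).reshape(5, 5)

with_function = np.logical_and(row > 4, row < 20)
with_operator = (row > 4) & (row < 20)   # AND
print(np.array_equal(with_function, with_operator))  # True

outside = ~with_operator                 # NOT: everything outside 5..19
either = (row <= 4) | (row >= 20)        # OR: the same set, written directly
print(np.array_equal(outside, either))   # True
print(row[outside].tolist())  # [0, 1, 2, 3, 4, 20, 21, 22, 23, 24]
```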
I have covered some advanced indexing topics here, but as always, please refer to the official documentation for more information.
6. Average, Variance & Standard Deviation
Let’s continue with some useful NumPy functions, using the same old data [matrix].
np.max(matrix)
# Maximum element: 9
np.min(matrix)
# Minimum element: 1
np.mean(matrix)
# Mean value: 5.0
np.var(matrix)
# Variance: about 6.667
np.std(matrix)
# Standard deviation: about 2.582
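These functions also accept an axis argument, so the statistic can be computed per column or per row instead of over the whole matrix. A brief sketch on the same 3 X 3 matrix; the ddof parameter shown at the end switches np.var from the population variance (the default) to the sample variance.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# axis=0 collapses the rows, giving one value per column
col_means = np.mean(matrix, axis=0)
print(col_means.tolist())  # [4.0, 5.0, 6.0]

# axis=1 collapses the columns, giving one value per row
row_means = np.mean(matrix, axis=1)
print(row_means.tolist())  # [2.0, 5.0, 8.0]

# Default divides by N; ddof=1 divides by N - 1
population_var = np.var(matrix)
sample_var = np.var(matrix, ddof=1)
print(sample_var)  # 7.5
```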
Another useful trick is to flatten the matrix into a vector, either row or column. flatten() and reshape() give the same flat result here; the differences show up in the calls that use -1 and in the column orientation.
matrix.flatten()
# flatten into a 1-dimensional array (row vector)
matrix.reshape(9)
# matrix, the variable defined earlier, has 9 elements, so this reshapes it into a flat row vector
matrix.reshape(1,-1)
# -1 means "as many as needed", but this creates a nested array of shape (1, 9), so I suggest avoiding it
matrix.reshape(9, -1)
# Reshape into a column vector of shape (9, 1)
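There is one more difference worth knowing, sketched below: flatten() always returns a copy, while reshape() returns a view of the same memory whenever the layout allows it, which is the case for this freshly created matrix.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

flat_copy = matrix.flatten()    # always a new array
flat_view = matrix.reshape(-1)  # a view here, because matrix is contiguous
column = matrix.reshape(-1, 1)  # column vector

print(flat_copy.shape, flat_view.shape, column.shape)  # (9,) (9,) (9, 1)
print(np.shares_memory(matrix, flat_copy))  # False
print(np.shares_memory(matrix, flat_view))  # True
```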
I hope you got the essence of vectorization and the other tricks provided by NumPy. Happy learning, and please leave your feedback in the comments.