Data Manipulation

1. Data Manipulation

1.1. Getting Started

We can create a new Vector from a range. The syntax a:b creates a range that starts at a and ends at b, with both endpoints included (a UnitRange when a and b are integers). The ... operator (splat) splits one collection into many separate arguments in a function call or, as here, into the individual elements of an array literal.

x = Float32[1.0:12.0...]

12-element Vector{Float32}:
  1.0
  2.0
  3.0
  4.0
  5.0
  6.0
  7.0
  8.0
  9.0
 10.0
 11.0
 12.0

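Each of the following lines builds the same vector. Note that collect needs the element type as its first argument here, since collect(1.0:12.0) alone would return a Vector{Float64}.

x = Vector{Float32}(1.0:12.0)
x = collect(Float32, 1.0:12.0)
x = Float32[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0]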

12-element Vector{Float32}:
  1.0
  2.0
  3.0
  4.0
  5.0
  6.0
  7.0
  8.0
  9.0
 10.0
 11.0
 12.0
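As a quick check of these constructions, the sketch below compares the types they produce. The variable names (r_int, r_float, v1 to v4) are chosen only for this illustration and do not appear elsewhere in the text.

```julia
# Integer endpoints give a UnitRange; floating-point endpoints give another range type.
r_int = 1:12
r_float = 1.0:12.0

v1 = Float32[r_float...]          # splat the range into a typed array literal
v2 = Vector{Float32}(r_float)     # convert the range to a Vector{Float32}
v3 = collect(Float32, r_float)    # collect with an explicit element type
v4 = collect(r_float)             # without the type, the elements stay Float64

println(r_int isa UnitRange)      # true
println(r_float isa UnitRange)    # false, although it is still an AbstractRange
println(v1 == v2 == v3)           # true
println(eltype(v4))               # Float64
```

The values are identical in all four vectors; only the element type of v4 differs.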

A notable difference between Julia and many other programming languages is that array indices start at 1.

x[1]

1.0f0

In Julia, a Vector is a 1-dimensional Array: Vector{Int} is shorthand for Array{Int, 1}.

Vector{Int} == Array{Int,1}

true

We can access an Array’s shape (the length along each axis) via the size function, which returns a Tuple containing the dimensions of the array. Because we are dealing with a vector here, the returned Tuple contains just a single element, which is identical to the length.

size(x)

(12,)

We can inspect the total number of elements in a Vector or Matrix via the length function.

length(x)

12

We can change the shape of an Array without altering its length or values by invoking the reshape function. For example, we can transform our vector x, whose shape is (12,), into a matrix X with shape (3, 4). The new matrix retains all elements. Because Julia is column-major, the elements of the vector are laid out one column at a time, and thus x[3] == X[3,1].

X = reshape(x,3,4)

3×4 Matrix{Float32}:
 1.0  4.0  7.0  10.0
 2.0  5.0  8.0  11.0
 3.0  6.0  9.0  12.0
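To make the column-major layout concrete, the following sketch checks that entry (i, j) of a 3×4 matrix sits at linear position i + (j - 1) * 3 of the underlying vector. The names v, M, nrows and layout_ok are local to this example.

```julia
v = collect(Float32, 1:12)
M = reshape(v, 3, 4)

# In column-major order, M[i, j] is stored at linear position i + (j - 1) * nrows.
nrows = size(M, 1)
layout_ok = all(M[i, j] == v[i + (j - 1) * nrows] for i in 1:3, j in 1:4)
println(layout_ok)                 # true

# LinearIndices performs the same conversion for us.
println(LinearIndices(M)[3, 2])    # 6
```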

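To fill the matrix row by row instead, reshape the vector to 4×3 and then swap the two dimensions, either with permutedims, which takes the permutation explicitly, or with transpose. permutedims copies the data, whereas transpose returns a lazy wrapper that shares memory with x: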

X = permutedims(reshape(x,4,3),(2,1))
X = transpose(reshape(x,4,3))

3×4 transpose(::Matrix{Float32}) with eltype Float32:
 1.0   2.0   3.0   4.0
 5.0   6.0   7.0   8.0
 9.0  10.0  11.0  12.0

Like Vector, Matrix is an alias for 2-dimensional Array.

Matrix{Int} == Array{Int,2}

true

The new dimensions may be specified either as a list of arguments or as a shape tuple, as in reshape(x,(3,4)). At most one dimension may be given as :, in which case its length is computed so that the product of all dimensions equals the length of the original array x. We could therefore have equivalently called reshape(x, 3, :) or reshape(x, :, 4).
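The three calling styles can be compared directly. This is a minimal sketch on a fresh vector v (a name used only here), not on the x from the text.

```julia
v = collect(1:12)

A1 = reshape(v, (3, 4))   # shape given as a tuple
A2 = reshape(v, 3, :)     # number of columns inferred as 12 ÷ 3
A3 = reshape(v, :, 4)     # number of rows inferred as 12 ÷ 4

println(size(A1), " ", size(A2), " ", size(A3))   # (3, 4) (3, 4) (3, 4)
println(A1 == A2 == A3)                           # true
```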

We can also construct higher-dimensional Arrays with the reshape function. The Julia manual has more on multi-dimensional Arrays.

X2 = reshape(x,2,2,3)

2×2×3 Array{Float32, 3}:
[:, :, 1] =
 1.0  3.0
 2.0  4.0

[:, :, 2] =
 5.0  7.0
 6.0  8.0

[:, :, 3] =
  9.0  11.0
 10.0  12.0

reshape returns an array that shares memory with the original vector rather than a copy, so writing to X2 also changes x:

X2[2,1,1] = 1
x

12-element Vector{Float32}:
  1.0
  1.0
  3.0
  4.0
  5.0
  6.0
  7.0
  8.0
  9.0
 10.0
 11.0
 12.0
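When this sharing is not wanted, an explicit copy breaks the link. The sketch below contrasts the two cases on a small vector; v, shared and independent are names introduced only for this illustration.

```julia
v = collect(Float32, 1:6)
shared = reshape(v, 2, 3)               # shares memory with v
independent = copy(reshape(v, 2, 3))    # owns its own memory

shared[1, 1] = 100       # also changes v[1]
independent[2, 1] = -1   # leaves v untouched

println(v[1])   # 100.0
println(v[2])   # 2.0
```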

We can construct a multi-dimensional Array with all elements set to zero and a shape of (2, 3, 4) via the zeros function.

zeros(Int,(2, 3, 4))

2×3×4 Array{Int64, 3}:
[:, :, 1] =
 0  0  0
 0  0  0

[:, :, 2] =
 0  0  0
 0  0  0

[:, :, 3] =
 0  0  0
 0  0  0

[:, :, 4] =
 0  0  0
 0  0  0

Calling zero(X) returns an all-zero array with the same shape and element type as X.

zero(X)

3×4 Matrix{Float32}:
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0

Similarly, we can create a multi-dimensional Array with all ones by invoking ones.

ones((2, 3, 4))

2×3×4 Array{Float64, 3}:
[:, :, 1] =
 1.0  1.0  1.0
 1.0  1.0  1.0

[:, :, 2] =
 1.0  1.0  1.0
 1.0  1.0  1.0

[:, :, 3] =
 1.0  1.0  1.0
 1.0  1.0  1.0

[:, :, 4] =
 1.0  1.0  1.0
 1.0  1.0  1.0
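For constants other than zero and one, Base provides fill. The sketch below also shows that zeros and ones accept an element type as their first argument; the variable names are illustrative only.

```julia
sevens = fill(7, (2, 3))          # 2×3 Matrix{Int64} in which every element is 7
halves = fill(0.5f0, 2, 3, 4)     # the element type follows the value: Float32
ones_f32 = ones(Float32, 2, 3)    # ones with an explicit element type

println(size(sevens), " ", eltype(sevens))
println(size(halves), " ", eltype(halves))
println(eltype(ones_f32))
```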

We often wish to sample each element randomly (and independently) from a given probability distribution. For example, the parameters of neural networks are often initialized randomly. The following snippet creates a matrix with elements drawn from a standard Gaussian (normal) distribution with mean 0 and standard deviation 1.

randn((3,4))

3×4 Matrix{Float64}:
 -0.0725183  -0.383395   1.87141   -1.51073
  1.33734    -0.74673    0.600773  -1.4579
 -0.300499    0.348163  -1.58002    1.50599
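Because the values are random, the output above differs from run to run. If reproducible numbers are needed, one option is to pass an explicitly seeded generator from the Random standard library. This is a sketch under that assumption; the seed 42 and the names rng1, rng2, W1 and W2 are arbitrary.

```julia
using Random

# Two generators created with the same seed produce the same stream of numbers.
rng1 = MersenneTwister(42)
rng2 = MersenneTwister(42)

W1 = randn(rng1, Float32, 3, 4)   # element type and shape can be given explicitly
W2 = randn(rng2, Float32, 3, 4)

println(W1 == W2)                 # true
println(size(W1), " ", eltype(W1))
```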

Finally, we can construct a matrix by supplying the exact value of each element, separating the elements of a row with spaces and the rows with semicolons.

[2 1 4 3;
1 2 3 4;
4 3 4 1;]

3×4 Matrix{Int64}:
 2  1  4  3
 1  2  3  4
 4  3  4  1

1.2. Indexing and Slicing

We can access array elements by indexing, starting from 1. To refer to the first or last index of an Array, we can use begin and end.

x[begin],x[end],x[end-1]

(1.0f0, 12.0f0, 11.0f0)

We can access a whole range of elements of a multi-dimensional Array, unfolded in column-major order, via slicing (e.g., X[begin:end]). The returned value includes both the first index (begin) and the last (end).

X[begin:end]

12-element Vector{Float32}:
  1.0
  5.0
  9.0
  1.0
  6.0
 10.0
  3.0
  7.0
 11.0
  4.0
  8.0
 12.0

When only one index is given for a multi-dimensional Array, it is treated as a linear index into the array unfolded in column-major order.

X[5]

6.0f0

In the following code, [end,:] selects the last row.

X[end,:]

4-element Vector{Float32}:
  9.0
 10.0
 11.0
 12.0

And [2:3,:] selects the second and third rows.

X[2:3,:]

2×4 Matrix{Float32}:
 5.0   6.0   7.0   8.0
 9.0  10.0  11.0  12.0

More generally, we can use any range to slice an array, including a stepped range of the form a:s:b.

A = reshape(collect(1:36),6,6)
A[begin:2:end,begin:2:end] 

3×3 Matrix{Int64}:
 1  13  25
 3  15  27
 5  17  29
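Indexing with ranges in this way returns a new array holding copies of the selected elements. When a copy is not wanted, a view can be used instead. The sketch below contrasts the two on a separate matrix B, defined only for this example so that A is left untouched.

```julia
B = reshape(collect(1:36), 6, 6)

slice_copy = B[1:2:end, 1:2:end]         # new 3×3 array with copied values
slice_view = @view B[1:2:end, 1:2:end]   # window onto B, no copy

slice_copy[1, 1] = -1    # B is unchanged
slice_view[1, 2] = 0     # writes through to B[1, 3]

println(B[1, 1])   # 1
println(B[1, 3])   # 0
```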

If we want to assign the same value to multiple elements, we can broadcast the value via .=. For instance, [1:2,:] accesses the first and second rows, where : selects every column. Although we discussed matrices here, this also works for vectors and for arrays of more than 2 dimensions.

A[1:2,:] .= 12
A

6×6 Matrix{Int64}:
 12  12  12  12  12  12
 12  12  12  12  12  12
  3   9  15  21  27  33
  4  10  16  22  28  34
  5  11  17  23  29  35
  6  12  18  24  30  36

1.3. Operations

Functions and unary operators can be applied elementwise with the vectorized “dot” syntax:

exp.(x)

12-element Vector{Float32}:
      2.7182817
      2.7182817
     20.085537
     54.59815
    148.41316
    403.4288
   1096.6332
   2980.958
   8103.084
  22026.467
  59874.14
 162754.8

Also, for every binary operation like ^, there is a corresponding “dot” operation .^ that is automatically defined to perform ^ element-by-element on arrays.

x = [1.0,2,4,8]
y = [2.0,2,2,2]
x.+y, x.-y, x.*y, x./y, x.^y

([3.0, 4.0, 6.0, 10.0], [-1.0, 0.0, 2.0, 6.0], [2.0, 4.0, 8.0, 16.0], [0.5, 1.0, 2.0, 4.0], [1.0, 4.0, 16.0, 64.0])
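The dot syntax is not limited to built-in operators. The sketch below applies a user-defined scalar function elementwise and shows the @. macro, which adds the dots to every call and operator in an expression. The function squash and the names u, w, r1 and r2 are hypothetical, introduced only for this illustration.

```julia
# A scalar function, written without any thought for arrays (hypothetical example).
squash(t) = t / (1 + abs(t))

u = [1.0, 2.0, 4.0, 8.0]
w = [2.0, 2.0, 2.0, 2.0]

r1 = squash.(u .^ w .- 10)     # every call and operator carries its own dot
r2 = @. squash(u ^ w - 10)     # @. inserts the dots for us

println(r1 == r2)              # true
```

Nested dot calls such as these are fused into a single loop over the elements, so no temporary array is created for the intermediate results.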

We can also concatenate multiple arrays, stacking them end-to-end to form a larger array. We just need to provide the arrays and say along which axis to concatenate. The example below concatenates two matrices along the rows (dimension 1) with vcat and along the columns (dimension 2) with hcat.

X = reshape(collect(1:12),(3,4))
Y = [1.0 1 4 3; 1 2 3 4; 4 3 2 1]
vcat(X,Y)

6×4 Matrix{Float64}:
 1.0  4.0  7.0  10.0
 2.0  5.0  8.0  11.0
 3.0  6.0  9.0  12.0
 1.0  1.0  4.0   3.0
 1.0  2.0  3.0   4.0
 4.0  3.0  2.0   1.0

hcat(X,Y)

3×8 Matrix{Float64}:
 1.0  4.0  7.0  10.0  1.0  1.0  4.0  3.0
 2.0  5.0  8.0  11.0  1.0  2.0  3.0  4.0
 3.0  6.0  9.0  12.0  4.0  3.0  2.0  1.0

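With cat, we can specify the dimensions to concatenate along through the dims keyword. Passing several dimensions at once, such as dims=(1,2), places the inputs along the diagonal and fills the remaining entries with zeros, which allows one to construct block-diagonal matrices: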

cat(X,Y,dims=(2))

3×8 Matrix{Float64}:
 1.0  4.0  7.0  10.0  1.0  1.0  4.0  3.0
 2.0  5.0  8.0  11.0  1.0  2.0  3.0  4.0
 3.0  6.0  9.0  12.0  4.0  3.0  2.0  1.0

cat(X,Y,dims=(1,2))

6×8 Matrix{Float64}:
 1.0  4.0  7.0  10.0  0.0  0.0  0.0  0.0
 2.0  5.0  8.0  11.0  0.0  0.0  0.0  0.0
 3.0  6.0  9.0  12.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0   0.0  1.0  1.0  4.0  3.0
 0.0  0.0  0.0   0.0  1.0  2.0  3.0  4.0
 0.0  0.0  0.0   0.0  4.0  3.0  2.0  1.0

Sometimes, we want to construct a binary array via logical statements. Take X .== Y as an example. For each position i, j, if X[i, j] and Y[i, j] are equal, then the corresponding entry in the result takes value 1, otherwise it takes value 0.

X .== Y

3×4 BitMatrix:
 1  0  0  0
 0  0  0  0
 0  0  0  0
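A BitMatrix like this one is often used as a mask. The following sketch, on two small matrices P and Q defined only for this example, counts the matching positions, locates them, and uses the mask for logical indexing.

```julia
P = [1 4; 2 5]
Q = [1.0 0.0; 2.0 9.0]

mask = P .== Q              # BitMatrix; true where the entries agree
println(count(mask))        # 2, the number of matching positions
println(findall(mask))      # their positions as CartesianIndex values
println(P[mask])            # [1, 2], the matching entries of P
```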

Summing all the elements in the array:

reduce(+,X)

78
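The built-in sum function gives the same total and can also reduce along a single dimension through its dims keyword. This is a minimal sketch on a matrix S with the same values as X; the names S, total, col_sums and row_sums are used only here.

```julia
S = reshape(collect(1:12), 3, 4)

total = sum(S)                 # same result as reduce(+, S)
col_sums = sum(S, dims=1)      # 1×4 matrix, one sum per column
row_sums = sum(S, dims=2)      # 3×1 matrix, one sum per row

println(total)      # 78
println(col_sums)   # [6 15 24 33]
println(row_sums)   # [22; 26; 30;;]
```

The last printed form may vary slightly between Julia versions; the values are 22, 26 and 30.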

1.4. Broadcasting

Under certain conditions, even when shapes differ, we can still perform elementwise binary operations by invoking the broadcasting mechanism. Broadcasting works according to the following two-step procedure: (i) expand one or both arrays by copying elements along dimensions of length 1 so that, after this transformation, the two arrays have the same shape; (ii) perform an elementwise operation on the resulting arrays. In the example below, a is a 3×1 matrix and b is a 1×2 matrix, so both are expanded to 3×2 before being added.

a = reshape(collect(1:3),(3,1))

3×1 Matrix{Int64}:
 1
 2
 3

b = reshape(collect(1:2),(1,2))

1×2 Matrix{Int64}:
 1  2

a.+b

3×2 Matrix{Int64}:
 2  3
 3  4
 4  5
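The two-step procedure can be spelled out by hand with repeat. This is only a conceptual sketch: the names col, row, col_expanded, row_expanded and manual are illustrative, and actual broadcasting does not need to materialize the expanded copies.

```julia
col = reshape(collect(1:3), (3, 1))
row = reshape(collect(1:2), (1, 2))

# Step (i): copy each array along its length-1 dimension to reach shape (3, 2).
col_expanded = repeat(col, 1, 2)   # 3×2, the column repeated twice
row_expanded = repeat(row, 3, 1)   # 3×2, the row repeated three times

# Step (ii): ordinary elementwise addition on equally shaped arrays.
manual = col_expanded + row_expanded

println(manual == col .+ row)      # true
```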

1.5. Saving Memory

Running operations can cause new memory to be allocated to hold the results. For example, if we write Y = Y .+ X, the name Y stops referring to the array it used to point to and is bound instead to newly allocated memory. We can demonstrate this with Julia’s objectid function: objectid(x) == objectid(y) whenever x === y, and for mutable values (arrays, mutable composite types) x === y is true only if x and y are the same object, stored at the same location in memory. After we run Y = Y .+ X, objectid(Y) returns a different value, because Julia first evaluates Y .+ X, allocating new memory for the result, and then binds Y to this new array.

before = objectid(Y)
Y = Y.+X
objectid(Y) == before

false

However, .+= is an in-place operation:

before = objectid(Y)
Y .+=X
objectid(Y) == before

true

“Dotted” updating operators like a .+= b (or @. a += b) are parsed as a .= a .+ b, where .= is a fused in-place assignment operation:

before = objectid(Y)
Y .= Y.+X
objectid(Y) == before

true
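The distinction matters whenever another name refers to the same array. In the sketch below, alias keeps pointing at the original memory: it sees the in-place update but not the result of the rebinding. The names C, D and alias are introduced only for this illustration.

```julia
C = [1.0 2.0; 3.0 4.0]
D = [10.0 20.0; 30.0 40.0]

alias = C                   # a second name for the same array
C .+= D                     # in-place update: alias sees the new values
same_after_inplace = alias === C
println(same_after_inplace) # true

C = C .+ D                  # rebinding: C now names a newly allocated array
same_after_rebind = alias === C
println(same_after_rebind)  # false
println(alias[1, 1], " ", C[1, 1])   # 11.0 21.0
```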