1. Data Manipulation
1.1. Getting Started
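We can create a new Vector from a range. The syntax a:b creates a range that starts at a and ends at b, and both endpoints are included. When a and b are integers, the result is a UnitRange. With floating-point endpoints such as 1.0:12.0, Julia builds an equivalent range with step 1.0. In function calls, the splat operator ... splits one argument into many separate arguments. Here it places every element of the range inside the array literal.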
x = Float32[1.0:12.0...]
12-element Vector{Float32}:
1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
10.0
11.0
12.0
Each of the following is equivalent:
x = Vector{Float32}(1.0:12.0)
x = collect(Float32, 1.0:12.0)
x = Float32[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0]
12-element Vector{Float32}:
1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
10.0
11.0
12.0
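The following sketch checks the claims above. The variable names r, a, b and c are illustrative only. Note that a plain collect(1.0:12.0) keeps the range's Float64 elements, so you have to request the Float32 element type explicitly.

```julia
# Integer endpoints produce a UnitRange, and both ends are included.
r = 1:5
@show typeof(r) length(r)

# The splat operator `...` passes each element as a separate argument,
# so max(r...) is the same call as max(1, 2, 3, 4, 5).
@show max(r...)

# Three ways to build the same Float32 vector.
a = Float32[1.0:12.0...]
b = Vector{Float32}(1.0:12.0)
c = collect(Float32, 1.0:12.0)
@show a == b == c

# Without an element type, collect keeps the range's Float64 elements.
@show eltype(collect(1.0:12.0))
```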
One notable difference between Julia and many other programming languages is that array indices start at 1.
x[1]
1.0f0
In Julia, Vector is a one-dimensional Array: Vector{Int} is an alias for Array{Int, 1}.
Vector{Int} == Array{Int,1}
true
We can access an Array's shape (the length along each axis) via the size function. It returns a Tuple containing the dimensions of the array. Because we are dealing with a vector here, the returned Tuple contains just a single element, which equals the length.
size(x)
(12,)
We can inspect the total number of elements in a Vector or Matrix via the length function.
length(x)
12
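For arrays with more than one axis, the distinction between size and length is clearer. This is a small sketch on an illustrative 3×4 matrix M. It also shows size with an axis argument and ndims, both standard Base functions.

```julia
M = zeros(Float32, 3, 4)   # a 3×4 matrix used only for illustration

@show size(M)        # tuple with one entry per axis
@show size(M, 2)     # length along the second axis only
@show length(M)      # total number of elements
@show ndims(M)       # number of axes
```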
We can change the shape of an Array without altering its length or values by invoking the reshape function. For example, we can transform our vector x, whose shape is (12,), into a matrix X with shape (3, 4). This new matrix retains all elements. Julia stores arrays in column-major order, so the elements of our vector are laid out one column at a time. Thus x[3] == X[3,1].
X = reshape(x,3,4)
3×4 Matrix{Float32}:
1.0 4.0 7.0 10.0
2.0 5.0 8.0 11.0
3.0 6.0 9.0 12.0
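Column-major layout means the first index varies fastest. The sketch below uses an illustrative copy of the data (v and V) to verify the correspondence stated above. CartesianIndices translates a linear position into a (row, column) index, and vec flattens back in the same order.

```julia
v = Float32[1.0:12.0...]
V = reshape(v, 3, 4)

# Column-major order: the first index varies fastest.
@show v[3] == V[3, 1]   # end of the first column
@show v[4] == V[1, 2]   # start of the second column

# CartesianIndices maps a linear index to its (row, column) position.
@show CartesianIndices(V)[4]

# vec flattens the matrix back in the same column-major order.
@show vec(V) == v
```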
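To lay the elements out row by row instead, reshape into a 4×3 matrix and then swap its two dimensions. You can do this with permutedims, whose second argument specifies the permutation, or with transpose: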
X = permutedims(reshape(x,4,3),(2,1))
X = transpose(reshape(x,4,3))
3×4 transpose(::Matrix{Float32}) with eltype Float32:
1.0 2.0 3.0 4.0
5.0 6.0 7.0 8.0
9.0 10.0 11.0 12.0
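permutedims and transpose give equal values here, but they store them differently. permutedims allocates a new Matrix, while transpose returns a lazy Transpose wrapper around the reshaped data. The sketch below uses the illustrative names w, P and T to show the consequence. This is also why an X built with transpose keeps reflecting later changes made to the vector it came from.

```julia
using LinearAlgebra   # provides the Transpose wrapper type

w = Float32[1.0:12.0...]
P = permutedims(reshape(w, 4, 3), (2, 1))  # new Matrix: data is copied
T = transpose(reshape(w, 4, 3))            # lazy wrapper over w's memory

@show P == T          # same values

w[1] = 100            # mutate the underlying vector
@show T[1, 1]         # the transpose sees the change
@show P[1, 1]         # the permuted copy does not
```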
Like Vector, Matrix is an alias for a two-dimensional Array.
Matrix{Int} == Array{Int,2}
true
The new dimensions may be given either as separate arguments or as a shape tuple, e.g. reshape(x,(3,4)). At most one dimension may be specified with a colon (:). Its length is then computed so that the product of all dimensions equals the length of the original array. For our x of length 12, we could equivalently have called reshape(x, 3, :) or reshape(x, :, 4).
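Here is a quick check that the four spellings agree, using an illustrative integer vector u. It also shows what happens when the inferred dimension cannot divide the length evenly: reshape throws a DimensionMismatch.

```julia
u = collect(1:12)

A1 = reshape(u, 3, 4)     # dimensions as separate arguments
A2 = reshape(u, (3, 4))   # dimensions as a tuple
A3 = reshape(u, 3, :)     # number of columns inferred: 12 ÷ 3 = 4
A4 = reshape(u, :, 4)     # number of rows inferred: 12 ÷ 4 = 3
@show A1 == A2 == A3 == A4

# The inferred dimension must divide evenly; 12 is not a multiple of 5.
err = try
    reshape(u, 5, :)
catch e
    e
end
@show typeof(err)
```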
We can also construct higher-dimensional Arrays with the reshape function. The Julia manual's section on multi-dimensional arrays covers them in more detail.
X2 = reshape(x,2,2,3)
2×2×3 Array{Float32, 3}:
[:, :, 1] =
1.0 3.0
2.0 4.0
[:, :, 2] =
5.0 7.0
6.0 8.0
[:, :, 3] =
9.0 11.0
10.0 12.0
The array returned by reshape shares its underlying data with the original vector, so no copy is made. Writing to X2 therefore also changes x:
X2[2,1,1] = 1
x
12-element Vector{Float32}:
1.0
1.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
10.0
11.0
12.0
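When shared memory is not wanted, one option is to copy the reshaped array. In the sketch below, the names s, shared and independent are illustrative. Only writes through the shared array reach the original vector.

```julia
s = Float32[1.0:12.0...]

shared = reshape(s, 2, 2, 3)              # shares memory with s
independent = copy(reshape(s, 2, 2, 3))   # owns a separate copy

shared[2, 1, 1] = 0        # writes through to s[2]
independent[1, 1, 1] = -1  # leaves s untouched

@show s[1:2]
```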
We can construct a multi-dimensional Array with all elements set to zero and a shape of (2, 3, 4) via the zeros function.
zeros(Int,(2, 3, 4))
2×3×4 Array{Int64, 3}:
[:, :, 1] =
0 0 0
0 0 0
[:, :, 2] =
0 0 0
0 0 0
[:, :, 3] =
0 0 0
0 0 0
[:, :, 4] =
0 0 0
0 0 0
zero(X) creates an array of zeros with the same shape and element type as X.
zero(X)
3×4 Matrix{Float32}:
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
Similarly, we can create a multi-dimensional Array filled with ones by invoking ones.
ones((2, 3, 4))
2×3×4 Array{Float64, 3}:
[:, :, 1] =
1.0 1.0 1.0
1.0 1.0 1.0
[:, :, 2] =
1.0 1.0 1.0
1.0 1.0 1.0
[:, :, 3] =
1.0 1.0 1.0
1.0 1.0 1.0
[:, :, 4] =
1.0 1.0 1.0
1.0 1.0 1.0
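These constructors accept an optional element type as the first argument, and the dimensions either as separate arguments or as a tuple. fill generalizes them to any constant. You can also take both the shape and element type from an existing array. This is a small sketch; the array names are illustrative.

```julia
Z = zeros(Int, 2, 3, 4)       # Int zeros, dimensions as arguments
O = ones(Float32, (2, 3))     # Float32 ones, dimensions as a tuple
F = fill(7, 2, 2)             # any constant value

R = rand(Float32, 3, 4)       # some existing array
Zr = zero(R)                  # zeros with the same shape and element type
Or = ones(eltype(R), size(R)) # ones with the same shape and element type
```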
We often wish to sample each element randomly (and independently) from a given probability distribution. For example, the parameters of neural networks are often initialized randomly. The following snippet creates a matrix with elements drawn from a standard Gaussian (normal) distribution with mean 0 and standard deviation 1.
randn((3,4))
3×4 Matrix{Float64}:
-0.0725183 -0.383395 1.87141 -1.51073
1.33734 -0.74673 0.600773 -1.4579
-0.300499 0.348163 -1.58002 1.50599
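For parameter initialization, it is often useful to make the random draws reproducible and to choose the element type. The sketch below passes an explicit MersenneTwister generator from the Random standard library to randn. The seeds 42 and 0 and the values of μ and σ are arbitrary choices for illustration. Shifting and scaling standard normal samples gives samples with mean μ and standard deviation σ.

```julia
using Random

# Passing an explicit generator with a fixed seed (42 is arbitrary)
# makes the "random" initialization reproducible.
W1 = randn(MersenneTwister(42), Float32, 3, 4)
W2 = randn(MersenneTwister(42), Float32, 3, 4)

# Samples with mean μ and standard deviation σ (values chosen for illustration).
μ, σ = 0.5f0, 0.01f0
W3 = μ .+ σ .* randn(MersenneTwister(0), Float32, 3, 4)
```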
Finally, we can construct a matrix by supplying the exact value of each element.
[2 1 4 3;
1 2 3 4;
4 3 4 1;]
3×4 Matrix{Int64}:
2 1 4 3
1 2 3 4
4 3 4 1
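In array literals, the separators determine the shape of the result. Commas build a Vector, spaces place elements side by side in a row, and semicolons start a new row. Here is a short sketch with illustrative names:

```julia
col = [1, 2, 3]        # commas: a Vector
row = [1 2 3]          # spaces: a 1×3 Matrix (a single row)
M   = [1 2 3; 4 5 6]   # semicolons separate rows
@show size(col) size(row) size(M)
```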
1.2. Indexing and Slicing
We can access array elements by indexing, starting at 1. Within an indexing expression, begin and end refer to the first and last index, so x[end-1] is the second-to-last element.
x[begin],x[end],x[end-1]
(1.0f0, 12.0f0, 11.0f0)
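We can access whole ranges of a multi-dimensional Array, unfolded in column-major order, via slicing (e.g., X[begin:end]). The range includes both the first index (begin) and the last (end). The second 1.0 in the output below is x[2], which was overwritten through X2 above. X still reflects this change because transpose does not copy the data it wraps.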
X[begin:end]
12-element Vector{Float32}:
1.0
5.0
9.0
1.0
6.0
10.0
3.0
7.0
11.0
4.0
8.0
12.0
When only one index is given for a multi-dimensional Array, it acts as a linear index into the array unfolded in column-major order.
X[5]
6.0f0
In the following code, [end,:] selects the last row.
X[end,:]
4-element Vector{Float32}:
9.0
10.0
11.0
12.0
And [2:3,:] selects the second and third rows.
X[2:3,:]
2×4 Matrix{Float32}:
5.0 6.0 7.0 8.0
9.0 10.0 11.0 12.0
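Unlike reshape, slicing with ranges such as X[2:3, :] copies the selected elements into a new array. When you do not want a copy, Base's @view macro (or the view function) returns a view into the original data instead. Here is a sketch with an illustrative matrix N:

```julia
N = reshape(collect(1:12), 3, 4)

s = N[2:3, :]          # slicing allocates a new 2×4 Matrix
v = @view N[2:3, :]    # a view refers to the same memory as N

s[1, 1] = 0            # modifies only the copy
@show N[2, 1]          # still 2

v[1, 1] = 0            # writes through to N[2, 1]
@show N[2, 1]          # now 0
```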
In fact, we can slice an array with any range, not only a UnitRange. For example, begin:2:end takes every other index:
A = reshape(collect(1:36),6,6)
A[begin:2:end,begin:2:end]
3×3 Matrix{Int64}:
1 13 25
3 15 27
5 17 29
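A range with an explicit step is a StepRange, and the step may also be negative. The sketch below uses an illustrative matrix B, built the same way as A above. It selects every other row of the first column and then walks that column backwards.

```julia
B = reshape(collect(1:36), 6, 6)

@show typeof(1:2:5)          # a StepRange rather than a UnitRange
@show B[begin:2:end, 1]      # every other row of the first column
@show B[end:-1:begin, 1]     # a negative step walks backwards
```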
If we want to assign the same value to multiple elements, we can broadcast the value with .=. For instance, [1:2,:] accesses the first and second rows, where : takes all elements along the second axis (every column). While we discussed this for matrices, it also works for vectors and for arrays with more than two dimensions.
A[1:2,:] .= 12
A
6×6 Matrix{Int64}:
12 12 12 12 12 12
12 12 12 12 12 12
3 9 15 21 27 33
4 10 16 22 28 34
5 11 17 23 29 35
6 12 18 24 30 36
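The sketch below (arrays C and D are illustrative) applies .= to a column and to a slice of a three-dimensional array. It also shows that plain = with a scalar on the right does not broadcast over a slice. Julia raises an ArgumentError that suggests .= instead.

```julia
C = zeros(Int, 3, 4)
C[:, end] .= 5            # broadcast into the last column

D = zeros(2, 2, 3)
D[:, :, 2] .= 1.5         # works the same way for 3-D arrays

# Plain `=` with a scalar on the right does not broadcast over a slice.
err = try
    C[1, :] = 9
catch e
    e
end
@show typeof(err)
```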
1.3. Operations
The dot syntax applies any function elementwise, including unary functions such as exp:
exp.(x)
12-element Vector{Float32}:
2.7182817
2.7182817
20.085537
54.59815
148.41316
403.4288
1096.6332
2980.958
8103.084
22026.467
59874.14
162754.8
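The dot syntax is not limited to built-in functions. In the sketch below, f is a hypothetical user-defined scalar function, and dotting it maps it over a vector just like map. Nested dot calls are fused into a single loop, and the @. macro adds the dots to every call and operator in an expression.

```julia
f(t) = 2t + 1          # an ordinary scalar function (illustrative)
v = [1.0, 2.0, 3.0]

@show f.(v)            # applies f to every element
@show map(f, v)        # same result via map

# Nested dot calls fuse into a single loop; @. adds the dots for you.
g1 = sqrt.(abs.(v .- 2))
g2 = @. sqrt(abs(v - 2))
@show g1 == g2
```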
Likewise, for every binary operator such as ^, there is a corresponding “dot” operator .^ that is automatically defined to apply ^ element by element on arrays.
x = [1.0,2,4,8]
y = [2.0,2,2,2]
x.+y, x.-y, x.*y, x./y, x.^y
([3.0, 4.0, 6.0, 10.0], [-1.0, 0.0, 2.0, 6.0], [2.0, 4.0, 8.0, 16.0], [0.5, 1.0, 2.0, 4.0], [1.0, 4.0, 16.0, 64.0])
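The dots matter because the undotted operators follow linear-algebra rules on arrays. The sketch below reuses the vectors above and adds an illustrative 2×2 matrix M. It contrasts the inner product and the matrix product and power with their elementwise counterparts. LinearAlgebra, a standard library, provides dot and the adjoint-vector product.

```julia
using LinearAlgebra

x = [1.0, 2, 4, 8]
y = [2.0, 2, 2, 2]
@show x' * y            # inner product: 2 + 4 + 8 + 16
@show dot(x, y)         # the same, via LinearAlgebra.dot

M = [1 2; 3 4]
@show M .* M            # elementwise product
@show M * M             # matrix product
@show M .^ 2            # elementwise square
@show M ^ 2             # matrix power, equal to M * M
```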
We can also concatenate multiple arrays together, stacking them end-to-end to form a larger array. We just need to provide a list of arrays and tell the system along which axis to concatenate. The example below shows what happens when we concatenate two matrices along rows (dimension 1) and columns (dimension 2).
X = reshape(collect(1:12),(3,4))
Y = [1.0 1 4 3; 1 2 3 4; 4 3 2 1]
vcat(X,Y)
6×4 Matrix{Float64}:
1.0 4.0 7.0 10.0
2.0 5.0 8.0 11.0
3.0 6.0 9.0 12.0
1.0 1.0 4.0 3.0
1.0 2.0 3.0 4.0
4.0 3.0 2.0 1.0
hcat(X,Y)
3×8 Matrix{Float64}:
1.0 4.0 7.0 10.0 1.0 1.0 4.0 3.0
2.0 5.0 8.0 11.0 1.0 2.0 3.0 4.0
3.0 6.0 9.0 12.0 4.0 3.0 2.0 1.0
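The bracket syntax offers a shorthand for the same operations. A semicolon between arrays concatenates vertically like vcat, and a space concatenates horizontally like hcat. Because Y holds Float64 values, the integer entries of X are promoted in the result. Here is a quick sketch:

```julia
X = reshape(collect(1:12), (3, 4))
Y = [1.0 1 4 3; 1 2 3 4; 4 3 2 1]

V = [X; Y]    # semicolon: same as vcat(X, Y)
H = [X Y]     # space: same as hcat(X, Y)
@show size(V) size(H) eltype(V)
```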
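The general cat function takes a dims keyword that specifies the dimension(s) along which to concatenate. With dims=2 it gives the same result as hcat. Passing several dimensions, such as dims=(1,2), constructs a block-diagonal matrix: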
cat(X, Y, dims=2)
3×8 Matrix{Float64}:
1.0 4.0 7.0 10.0 1.0 1.0 4.0 3.0
2.0 5.0 8.0 11.0 1.0 2.0 3.0 4.0
3.0 6.0 9.0 12.0 4.0 3.0 2.0 1.0
cat(X,Y,dims=(1,2))
6×8 Matrix{Float64}:
1.0 4.0 7.0 10.0 0.0 0.0 0.0 0.0
2.0 5.0 8.0 11.0 0.0 0.0 0.0 0.0
3.0 6.0 9.0 12.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0 1.0 4.0 3.0
0.0 0.0 0.0 0.0 1.0 2.0 3.0 4.0
0.0 0.0 0.0 0.0 4.0 3.0 2.0 1.0
Sometimes we want to construct a binary array via logical statements. Take X .== Y as an example. For each position i, j, the result's entry is true (displayed as 1) if X[i, j] and Y[i, j] are equal, and false (displayed as 0) otherwise. The result is stored compactly as a BitMatrix.
X .== Y
3×4 BitMatrix:
1 0 0 0
0 0 0 0
0 0 0 0
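Logical arrays are useful beyond display. In the sketch below (X and Y rebuilt as above), count and findall report how many entries are true and where they are. A logical array used as an index selects the matching elements in column-major order.

```julia
X = reshape(collect(1:12), (3, 4))
Y = [1.0 1 4 3; 1 2 3 4; 4 3 2 1]

eq = X .== Y
@show count(eq)       # number of true entries
@show findall(eq)     # their (row, column) positions

# A logical array can also be used as an index: it selects
# the elements where the mask is true, in column-major order.
@show X[X .> 10]
```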
Summing all the elements in the array:
reduce(+,X)
78
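sum(X) is the usual shorthand for reduce(+, X). It can also reduce along a single axis with the dims keyword and keep that axis with length 1. Here is a sketch on the same 3×4 X:

```julia
X = reshape(collect(1:12), (3, 4))

@show sum(X)            # shorthand for reduce(+, X)
@show sum(X, dims=1)    # column sums, a 1×4 Matrix
@show sum(X, dims=2)    # row sums, a 3×1 Matrix
```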
1.4. Broadcasting
Under certain conditions, even when shapes differ, we can still perform elementwise binary operations by invoking the broadcasting mechanism. Broadcasting works according to the following two-step procedure: (i) expand one or both arrays by repeating elements along the dimensions of length 1 (missing trailing dimensions count as length 1), so that after this transformation the two arrays have the same shape; (ii) perform an elementwise operation on the resulting arrays. Julia does not actually materialize the repeated copies.
a = reshape(collect(1:3),(3,1))
3×1 Matrix{Int64}:
1
2
3
b = reshape(collect(1:2),(1,2))
1×2 Matrix{Int64}:
1 2
a.+b
3×2 Matrix{Int64}:
2 3
3 4
4 5
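The two-step description can be checked by performing step (i) by hand with repeat. The sketch below compares the result with a .+ b. It also shows that a plain Vector acts as a single column, and that sizes which are neither equal nor 1 raise a DimensionMismatch. The name expanded is illustrative.

```julia
a = reshape(collect(1:3), (3, 1))
b = reshape(collect(1:2), (1, 2))

# Explicitly repeat both operands to 3×2, then add elementwise.
expanded = repeat(a, 1, 2) .+ repeat(b, 3, 1)
@show a .+ b == expanded

# A 3-element Vector behaves like the 3×1 column a.
@show collect(1:3) .+ b == a .+ b

# Sizes 3 and 2 along the first axis cannot be reconciled.
err = try
    a .+ collect(1:2)
catch e
    e
end
@show typeof(err)
```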
1.5. Saving Memory
Running operations can cause new memory to be allocated to hold results. For example, if we write Y = Y .+ X, Y stops referring to the array it used to point to and instead points to newly allocated memory. We can demonstrate this with Julia's objectid function. objectid(x) == objectid(y) holds whenever x === y. For mutable values (arrays, mutable composite types), x === y is true only if x and y are the same object, stored at the same location in memory. After we run Y = Y .+ X, objectid(Y) changes. This is because Julia first evaluates Y .+ X, allocating new memory for the result, and then binds Y to this new array.
before = objectid(Y)
Y = Y.+X
objectid(Y) == before
false
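The distinction between binding a name and copying an array can be seen directly. In this sketch, A, B and C are illustrative. Assignment only adds another name for the same object, while copy creates a new object with equal contents.

```julia
A = [1, 2, 3]
B = A          # another name bound to the same array
C = copy(A)    # a new array with equal contents

@show A === B, objectid(A) == objectid(B)
@show A == C, A === C
```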
In contrast, .+= is an in-place operation:
before = objectid(Y)
Y .+=X
objectid(Y) == before
true
“Dotted” updating operators like a .+= b (or @. a += b) are parsed as a .= a .+ b, where .= is a fused in-place assignment operation:
before = objectid(Y)
Y .= Y.+X
objectid(Y) == before
true
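A common way to apply this is to allocate an output buffer once and fill it in place with .=. Here is a sketch with illustrative arrays X, Y and Z. Applied to a whole assignment, @. turns every operator and the = itself into their dotted forms.

```julia
X = reshape(collect(1.0:12.0), 3, 4)
Y = ones(3, 4)

Z = similar(X)          # preallocate an output buffer once
before = objectid(Z)

Z .= X .+ Y             # fills Z in place; no new result array
@. Z = 2X + Y           # @. dots every operator: Z .= 2 .* X .+ Y
@show objectid(Z) == before
```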