MATLAB stands for MATrix LABoratory. This program and Octave define almost all operations through arrays and matrices. The reasoning behind such a decision is that for solving most complex problems, we will simplify them into a linear system, and solve the approximated problem.

Having a good and efficient matrix system means to be able to write efficient mathematical code. Most importantly, it allows writing compact instructions in order to be able to compare side by side the implementation and mathematical definition of most algorithms.

## A Naive Approach for Implementing a Vector and Class Matrix

Even if the C++ standard library has a std::vector class, it has nothing to do with a mathematical vector.
The std::array would probably be a better candidate since it has a fixed size, but we still need to define most mathematical operations.
Since in C++ we are able to overload different operators (in particular `+`

, `-`

, `*`

, `/`

and `[]`

), we can provide a nice syntax for our mathematical entities.

We will also need to define a matrix class. Like for a mathematical vector class, it is not difficult to create such a class and the implementation is trivial, but it is not very efficient for a lot of numerical problems. Writing a more advanced matrix class would make the code more complicated and harder to test, therefore I went with the simple implementation.

```
namespace calcnum{
template<std::size_t N>
class vector{
public:
std::array<double, N> val = {}; // 0-init
vector() = default;
explicit vector( const double (& val_) [N]) {
std::copy(std::begin(val_), std::end(val_), std::begin(val));
}
double& operator [](std::size_t idx) {
return val.at(idx);
}
const double& operator [](std::size_t idx) const {
return val.at(idx);
}
};
template<std::size_t ROW, std::size_t COL>
class matrix{
std::array<std::array<double, COL>, ROW> val = {};
public:
matrix() = default;
explicit matrix(const double (&val_)[ROW][COL]) {
for(std::size_t row = 0; row != ROW; ++row){
for(std::size_t col= 0; col != COL; ++col){
val.at(row).at(col) = val_[row][col];
}
}
}
struct index{
std::size_t row;
std::size_t col;
};
double& operator [](index idx) {
return val.at(idx.row).at(idx.col);
}
const double& operator [](index idx) const {
return val.at(idx.row).at(idx.col);
}
};
}
```

We still need to define all operators for those the matrix and vector class.
In the snippet above I’ve only provided `operator[]`

, which gives us a nice syntax for operating with these objects:

```
int main(){
{
calcnum::matrix<2, 2> I;
I[{0,0}] = 1;
I[{1,1}] = 1;
const calcnum::matrix<2, 2> I2({{1,0},{0,1}});
assert(calcnum::approx_equal(I[{0,0}], I2[{0,0}], 0.0001));
}
{
const calcnum::matrix<2, 3> M({{1,2,3},{4,5,6}});
assert(M[{0,0}] == 1);
assert(M[{0,1}] == 2);
assert(M[{0,2}] == 3);
assert(M[{1,0}] == 4);
assert(M[{1,1}] == 5);
assert(M[{1,2}] == 6);
}
}
```

## Advantages and Drawbacks of the Current Approach

With the previously presented classes, we are able to define matrices of all sizes. Since the row and column parameters are defined at compile time, we are able to detect at compile time if we are trying to add or multiplicate together matrices of incorrect sizes. There is no need to verify the sizes at runtime. This is a big win compared to the runtime error provided by both MATLAB and Octave: not only because the code might be faster, but because we can assume that the operation is done with the right types since the compiler itself will do the validation continuously. The first drawback is that all elements are always saved on the stack. This is great for performance since we do not need to allocate memory dynamically, but if our matrices are too big, maybe 100 or more elements, the stack might overflow. Thus we need if we want to store a generic big matrix, to allocate memory.

Last but not least, most matrices coming from approximation problems and iterative matrices are not really random, they have some special properties about their structures.
Many are, for example, diagonal or `n`

-diagonal, in which case we need approx `n`

times `std::min(ROW, COL)`

elements and not `ROW*COL`

, since all elements that are not on the diagonal are zero.
In the case of lower or upper triangular matrices, half of the elements are 0.
In those cases, we can also implement a more efficient `operator+`

and `operator*`

.
Sometimes matrices are sparse: it means that most elements are zero, and it would surely be more efficient to store only those that have a different value from zero and their position inside the matrix, instead of the whole matrix.

It is, therefore, possible to provide multiple implementations of the matrix class, based on the properties. This also means that those matrices need to be able to interact together and that all operators should get overloaded for every matrix type. I did not try to solve all these problems because they have been already solved more than once by different libraries. This is just one more reason to use these libraries unless we do want to write one by ourselves.

## Solving a Linear Systems

Given a system $Ax=b$, with $A$ nonsingular and $x$ unknown, we could simply get the value of $x$ by calculating the inverse of $Ax=b$, since it would hold $x={A}^{-1}Ax={A}^{-1}b$. Unfortunately, this approach does not scale well, and for big matrices, it is also unstable. If we insist in doing so, we will probably also lose some nice properties of the $Ax=b$ matrix, for example, sparsity.

Therefore we normally try to solve linear systems with other methods.

## LU Factorization

When solving a triangular system, for example, of the form

the algorithm is trivial. We iterate from the first through the last row and simply substitute all variables. The following function solves a triangular lower system, the code for an upper triangular matrix is very similar.

```
template<std::size_t N>
calcnum::vector<N> solve_triang_low(const matrix<N, N>& A, calcnum::vector<N> b){
assert(is_triang_low(A, 0.001));
auto& x = b;
x[0] = b[0]/A[{0,0}];
for(std::size_t row = 1; row != N; ++row){
double sumprod = 0;
for (std::size_t col = 0; col != row; ++col){
sumprod += A[{row,col}] * x[col];
}
x[i] = (1/A[{row,row}]) * (b[row] - sumprod);
}
return x;
}
```

There is an algorithm, the LU decomposition, that factors a matrix as the product of a lower triangular matrix and an upper triangular matrix, i.e. we can, therefore, apply the LU decomposition, and solve two triangular systems.

Before implementing the LU factorization, we need to realize that we might need to reorder the columns (or rows) of `A`

, because if `A[{i,i}]`

is zero, then we’ll have a division through zero.
Even if that would not happen, reordering the rows (or columns) of `A`

will make the algorithm more stable.
As a general rule, we’d like to maximize `A[{i,i}]`

in order to avoid dividing through small numbers.

```
template<std::size_t N>
matrix<N,N> pivotize_by_row(const matrix<N, N>& A){
auto P = identity<N,N>();
for (std::size_t col = 0; col != N; ++col) {
// find max value on specific row
double a_max = A[{col,col}];
std::size_t row = col;
for(std::size_t j = col; j != N; ++j){
if(A[{j,col}]>a_max){
a_max = A[{j,col}];
row = j;
}
}
if(row != col){
for(std::size_t j = 0; j != N; ++j){
std::swap(P[{j,col}], P[{j,row}]);
}
}
}
template<std::size_t N>
struct lu_fact_res{
matrix<N, N> P;
matrix<N, N> L;
matrix<N, N> U;
};
template<std::size_t N>
lu_fact_res<N> LU(matrix<N, N> A){
const auto P = pivotize_by_row(A);
A = P*A;
matrix<N, N> L;
matrix<N, N> U;
for (std::size_t j = 0; j != N; ++j) {
L[{j,j}] = 1;
for (std::size_t i = 0; i != j + 1; ++i) {
double s1 = 0;
for (std::size_t k = 0; k != i; ++k){
s1 += U[{k,j}] * L[{i,k}];
}
U[{i,j}] = A[{i,j}] - s1;
}
for (std::size_t i = j; i != N; ++i) {
double s2 = 0;
for (std::size_t k = 0; k != j; k++){
s2 += U[{k,j}] * L[{i,k}];
}
L[{i,j}] = (A[{i,j}] - s2) / U[{j,j}];
}
}
return {P, L, U};
}
```

And after the factorization, we can reuse the previously defined functions for solving triangular systems

## Iterative Methods

The LU factorization approach seems to be good enough for all systems, but unfortunately, it is not.
When applying the factorization, we might lose other nice properties.
What if, for example, the matrix is sparse?
What if the matrix is n-diagonal with `n>1`

?
We will lose those properties during the factorization.
And if the matrix we are handling is really big, there might be other methods that, with a lot fewer operations, will give us a satisfactory result.

Iterative approaches, like Jacobi and Gauss-Seidel, give us control over the error, and the possibility to use a lot fewer FLOP, and eventually less memory.

```
template<std::size_t N>
struct jacobi_status{
calcnum::vector<N> x;
};
template<std::size_t N>
class jacobi_iter{
matrix<N, N> BJ;
calcnum::vector<N> db;
jacobi_status<N> st;
public:
explicit jacobi_iter(const matrix<N, N>& A_, const calcnum::vector<N>& b_, const calcnum::vector<N>& x0) : BJ(A_), db(b_){
st.x = x0;
// create iteration matrix
for(std::size_t i = 0; i != N; ++i){
for(std::size_t j = 0; j != N; ++j){
if(i != j){
BJ[{i,j}] = -BJ[{i,j}]/BJ[{i,i}];
}
}
db[i] = db[i]/BJ[{i,i}];
BJ[{i,i}] = 0;
}
}
jacobi_iter& operator++(){
st.x = BJ*st.x + db;
return *this;
}
const jacobi_status<N>* operator->() const {
return &st;
}
};
```

After `N`

iterations of the Jacobi method, if it converges, we will have the same result of the LU decomposition (rounding issues besides).

Similarly, we can define the Gauss-Seidel algorithm iterator

```
template<std::size_t N>
struct gauss_seidel_status{
calcnum::vector<N> x;
};
template<std::size_t N>
class gauss_seidel_iter{
matrix<N, N> U;
matrix<N, N> DL;
calcnum::vector<N> db;
gauss_seidel_status<N> st;
public:
explicit gauss_seidel_iter(const matrix<N, N>& A, calcnum::vector<N> b_, calcnum::vector<N> x0){
st.x = x0;
// create iteration matrixes
for(std::size_t i = 0; i != N; ++i){
for(std::size_t j = 0; j != N; ++j){
if(i < j){
U[{i,j}] = A[{i,j}];
}else{
DL[{i,j}] = A[{i,j}];
}
}
}
db = solve_triang_low(DL,b_);
}
gauss_seidel_iter& operator++(){
auto y = U*st.x;
st.x = db - solve_triang_inf(DL,y);
return *this;
}
gauss_seidel_status<N>* operator->() {
return &st;
}
};
```

## SOR Algorithms

The SOR algorithm is a variation of the Gauss-Seidel algorithm. Since we implemented the Gauss-Seidel algorithm as an iterator algorithm, we can, at every step, "fix" the approximation. We can, therefore, reuse the whole Gauss-Seidel iterator algorithm just by wrapping it in another iterator algorithm

```
// for omega = 1, we have Gauss-Seidel
template<std::size_t N>
class SOR_iter{
gauss_seidel_iter<N> it;
double omega;
public:
explicit SOR_iter(const matrix<N, N>& A, calcnum::vector<N> b_, calcnum::vector<N> x0, double omega_) :it(A,b_,x0), omega(omega_){
}
SOR_iter& operator++(){
auto xk = it->x;
++it;
auto x = it->x;
x = omega*x + (1-omega)*xk;
it->x = x;
return *this;
}
gauss_seidel_status<N>* operator->() {
return &(*it);
}
};
```

## Conclusion

So, here it is; writing a naive matrix system is very naive, and writing it in C++ with templates and operator overloading will give us some advantages out of the box compared to other languages.
Nevertheless, **use a matrix library**.
It will result in even more compact code, with more optimizations, and fewer headaches for ourselves when implementing an algorithm.
Armadillo has even a comparison table with MatLab/Octave, in case you are familiar with those programming languages.