Floating in C++ - part 1 - Root problem

The introduction to this article series can be found here, whereas here you can find all posts.

In this article, I’m going to implement the Bisection method and Newton’s method, both used for finding the root of a generic function. Given a function, we want to find a value $x\in \Omega \subseteq ℝ$ that solves following problem

$f\left(x\right)=0,f:\Omega \to ℝ$

bisection method

The first algorithm is the bisection method. It is very easy to implement, and a good starting point for gathering some ideas on how to structure the future code.

A complete description of how this algorithms works, together with an example, and an implementation in pseudocode, can be found on wikipedia.

Here is what a first naive implementation would look like

#include <functional>

double apply_bisection(std::function<double(double)> f, double a, double b){
double fleft = f(a);
if(fleft==0){
return a;
}
double fright = f(b);
if(fright==0){
return b;
}
double  mid = (a+b)/2;
double fmid = f(mid);
while(fmid != 0){
if(fleft*fmid < 0) {
b = mid;
fmid = f(mid);
} else {
a = mid;
fmid = f(mid);
}
}
return mid;
}

even if such an algorithm is theoretically correct, it may never end, because of a rounding issue. There might never be a value such that fmid==0. Therefore it might be a good idea to set an epsilon value explicitly:

#include <functional>

double apply_bisection(std::function<double(double)> f, double a, double b, double err){
double fleft = f(a);
if(fleft==0){
return a;
}
double fright = f(b);
if(fright==0){
return b;
}
double  mid = (a+b)/2;
double fmid = f(mid);
while(approx_value(fmid, 0, err)){
if(fleft*fmid < 0) {
b = mid;
fmid = f(mid);
} else {
a = mid;
fmid = f(mid);
}
}
return mid;
}

What if the algorithm is too slow to converge? Or what if the err value is too small and the algorithm never stops? We should also always set a maximum number of iterations.

#include <functional>

double apply_bisection(std::function<double(double)> f, double a, double b, double err, std::size_t nitmax=100){
double fleft = f(a);
if(fleft==0){
return a;
}
double fright = f(b);
if(fright==0){
return b;
}
double  mid = (a+b)/2;
double fmid = f(mid);
while(approx_value(fmid, 0, err) && nit != nitmax){
if(fleft*fmid < 0) {
b = mid;
fmid = f(mid);
} else {
a = mid;
fmid = f(mid);
}
}
return mid;
}

Otherwise we could also stop the algorithm if the subinterval is small enough. Fortunately, for the bisection method, it is equivalent to setting a maximum number of iteration. So, which condition is easier for the caller to consider or set?

#include <functional>

double apply_bisection(std::function<double(double)> f, double a, double b, double err1, double err2){
double fleft = f(a);
if(fleft==0){
return a;
}
double fright = f(b);
if(fright==0){
return b;
}
double  mid = (a+b)/2;
double fmid = f(mid);
while(approx_value(fmid, 0, err1) && (b-a)<=err2){
if(fleft*fmid < 0) {
b = mid;
fmid = f(mid);
} else {
a = mid;
fmid = f(mid);
}
}
return mid;
}

The bisection algorithm is very simple, but we already have different stopping strategies. The biggest problem of the current implementation is that the caller does not even know how good the approximation is. They need to call the function f again, even if we just calculated the result internally.

Most of the times, when I had to implement those algorithms for studying their properties, I always needed to return more information, for example:

• How many iterations were necessary before we approximated the solution

• How the error converged to zero

• Absolute and relative error, sometimes at every iteration

But creating such a generic apply_bisection function would make it hard to use, maintain, and reuse in other algorithms.

Even worse, the caller of the algorithm may have other information about f, that might help to make the approximation more accurate, but we cannot use those information in such a generic algorithm.

The keyword for me is iterative algorithm.

Since we are programming in C++, we should also take advantage of some functionalities that it offers in comparison to C and Octave.

If we had some sort of function that every time it gets called would give us a better approximation, then the caller could implement the stopping criteria (it would actually be his responsibility) without changing the underlying algorithm every time.

And if we want return more information, for example the evaluation of f or the interval where the functions have been evaluated, then the caller would be able to collect all the statistical information he needs for studying e.g. the convergence order.

Instead of passing two doubles, it would be nice to pass some sort of interval as a parameter.

After making those considerations, I came up with the following implementation

struct closed_interval{
double a;
double b;
};
inline double midpoint(const closed_interval& interval){
return (interval.a + interval.b)/2;
}

struct bisect_status{
closed_interval interval;
double fa;
double fb;
};

class bisect_iter{
std::function<double(double)> f;
bisect_status st;

public:
explicit bisect_iter(const std::function<double(double)>& f_, const closed_interval& interval) : f(f_){
st.interval = interval;
st.fa = f(interval.a);
st.fb = f(interval.b);
}
bisect_iter& operator++(){
const auto mid = midpoint(st.interval);
const auto fmid = f(mid);

const auto signfmid = signum(fmid);
const auto signfa = signum(st.fa);
const auto signfb = signum(st.fb);
if(st.fa * st.fb == 0){ // just in case we have really hit the 0, will probably never happen with real data...
st.interval = {mid, mid};
st.fa = fmid;
st.fb = fmid;
} else if(fmid*st.fa > 0){
st.interval.a = mid;
st.fa = fmid;
} else {
st.interval.b = mid;
st.fb = fmid;
}
return *this;
}

const bisect_status& operator*() const {
return st;
}
const bisect_status* operator->() const {
return &st;
}
};

For someone coming from C or Matlab, this implementation of bisection algorithm might seem a bit odd. What is operator++()? Which function do we need to call to get an approximation of x0? And where are the stopping criteria defined?

It might be easier to understand the implementation after seeing how this class can be used:

return x*x-25;
}

int main(){
closed_interval interval{0,11};

const double max_int_length = 0.00001;

while(length(it->interval) > max_int_length){
++it;
const auto& st = *it;
std::cout << " a: " << st.interval.a << ", b: " << st.interval.b << "\n";
}
}

Every time we write ++it, the bisection algorithm moves a step forward. The caller, in the example, decided to implement a stopping criteria based on the length of the interval. With this implementation, we can reuse the algorithm with a different stopping criteria, for example, something based on the error of f evaluated in a and b.

return x*x-25;
}

int main(){
closed_interval interval{0,11};

const double max_err = 0.00001;

while(std::min(std::fabs(it->fa), std::fabs(it->fb)) > max_err){
++it;
std::cout << " fa: " << it->fa << ", fb: " << it->fb << "\n";
}
}

Newton method

Another famous algorithm for solving the root problem is the Newton method. Similar to the bisection algorithm, it is possible to define multiple stopping criteria, therefore I decided to implement the method as an iterator:

struct newton_status{
double fx;
double dfx;
double x;
};

class newton_iter{
std::function<double(double)> f;
std::function<double(double)> df;
newton_status st;

public:
explicit newton_iter(const std::function<double(double)>& f_, const std::function<double(double)>& df_, const double& x0) : f(f_), df(df_){
st.x = x0;
st.fx = f(st.x);
st.dfx = df(st.x);
}
newton_iter& operator++(){
auto oldst = st;
st.x = oldst.x - oldst.fx/oldst.dfx;
st.fx = f(st.x);
st.dfx = df(st.x);
return *this;
}
const newton_status& operator*() const {
return st;
}
const newton_status* operator->() const {
return &st;
}
};

Like earlier, it is the responsibility of the caller to decide when to stop the iterations

double fun_new(double x){
return x*x*x-3*x-2;
}
double dfun_new(double x){
return 3*x*x-3;
}

int main(){
newton_iter it(f, df, x0);
while(it->fx > 0.0000001){
++it;
std::cout << " fx: " << it->fx << "\n";
}
}

Secant method

The Newton method has many variations and one of them is the Secant method. The secant method does not need the first derivative of the function but it approximates the derivative evaluating f in two different points. This algorithm is very practical if we do not have, or do not know, the derivative of a given function.

struct secant_status{
double xn;
double xn_1;
double fxn;
double fxn_1;
};

class secant_iter{
std::function<double(double)> f;
secant_status st;
public:
explicit secant_iter(const std::function<double(double)>& f_, double x0, double x1) : f(f_){
st.xn_1 = x0;
st.xn = x1;
st.fxn_1 = f(x0);
st.fxn = f(x1);
}
secant_iter& operator++(){
const auto x_new = st.xn - ((st.xn - st.xn_1)/(st.fxn - st.fxn_1))*st.fxn;
// watch ordering!
st.xn_1 = st.xn;
st.fxn_1 = st.fxn;
st.xn = x_new;
st.fxn = f(x_new);

return *this;
}
const secant_status& operator*() const {
return st;
}
const secant_status* operator->() const {
return &st;
}
};
double fun_sec(double x){
return x+0.5-std::sin(x);
}
int main(){
secant_iter it(fun_sec, -1, -2);
auto oldst = *it;
while(std::fabs(it->fxn) > 0.00001){
++it;
std::cout << " fx: " << it->fx << "\n";
}
}

Conclusion

Implementing the bisection and Newton method is straightforward. With the "iterative algorithm" approach the code should be easily reusable for other algorithms. It is possible to implement the Newton method for multidimensional functions the same way, but we would need to solve a matrix system. The standard library does not provide a (mathematical) matrix class, so you’ll need to implement some matrix and vector classes, and operations beforehand.

The implementations of the newton and bisection algorithm can be found on githube here and here. I’ve also tested the implementation with Catch, test cases can be found here) and here.