This library is meant for interpolating one dimensional data by methods which both hit every point exactly or smooth the data into an approximate fitting curve. The library contains the following interpolation methods:
Linear Interpolation
Quadratic Interpolation
Lagrange Interpolation
Zero Spline
Quadratic Spline
Cubic Spline
BSpline Global Curve Interpolation
BSpline Global Curve Approximation
Loess Interpolation
Gaussian Processes Interpolation
Curve fitting
We will use the following data to demonstrate interpolation methods.
using DataInterpolations, Plots gr() # Dependent variable u = [14.7, 11.51, 10.41, 14.95, 12.24, 11.22] # Independent variable t = [0.0, 62.25, 109.66, 162.66, 205.8, 252.3]
6-element Array{Float64,1}: 0.0 62.25 109.66 162.66 205.8 252.3
For each method, we will show how to perform the fit and use the plot recipe to show the fitting curve.
This is a linear interpolation between ends points of interval of input data point.
A = LinearInterpolation(u,t) scatter(t, u, label="input data") plot!(A)
This function fits a parabola passing through three nearest points from input data point. It is continuous and piecewise differentiable.
A = QuadraticInterpolation(u,t) scatter(t, u, label="input data") plot!(A)
It fits polynomial of degree d (=length(t)-1), and is thuse a continuously differentiable function.
A = LagrangeInterpolation(u,t) scatter(t, u, label="input data") plot!(A)
This function is constant between data points. By default it takes value at left end of the interval. One can change that behavior by passing the keyword argument dir = :right
.
A = ConstantInterpolation(u,t) scatter(t, u, label="input data") plot!(A)
Or using the right endpoints:
A = ConstantInterpolation(u,t,dir=:right) scatter(t, u, label="input data") plot!(A)
This is the quadratic spline. It is a continuously differentiable interpolation which hits each of the data points exactly. Splines are a local interpolation method, meaning that the curve in a given spot is only affected by the points nearest to it. It is explained in more detail at https://www.math.uh.edu/~jingqiu/math4364/spline.pdf .
A = QuadraticSpline(u,t) scatter(t, u, label="input data") plot!(A)
This is the cubic spline. It is a continuously twice differentiable interpolation which hits each of the data points exactly. It is explained in more detail at https://www.math.uh.edu/~jingqiu/math4364/spline.pdf .
A = CubicSpline(u,t) scatter(t, u, label="input data") plot!(A)
This is an interpolating B-spline. B-splines are a global method, meaning that every data point is taken into account for each point of the curve. The interpolating B-spline is the version which hits each of the points. This method is described in more detail at https://pages.mtu.edu/~shene/COURSES/cs3621/NOTES/INT-APP/CURVE-INT-global.html . Let's plot a cubic B-spline (3rd order). Since the data points are not close to uniformly spaced, we will use the :ArcLen
and :Average
choices:
A = BSplineInterpolation(u,t,3,:ArcLen,:Average) scatter(t, u, label="input data") plot!(A)
The approximating B-spline is a smoothed version of the B-spline. It again is a global method. In this case, we need to give a number of control points length(t)>h
and this method fits a B-spline through the control points which is a least square approximation. This has a natural effect of smoothing the data. For example, if we use 4 control points, we get the result:
A = BSplineApprox(u,t,3,4,:ArcLen,:Average) scatter(t, u, label="input data") plot!(A)
Some methods are better suited for dense data. Let's generate such data to demonstrate these methods.
t = sort(10 .* rand(100)) u = sin.(t) .+ 0.5 * randn(100)
100-element Array{Float64,1}: -0.30439819039116406 0.009263076286891103 -0.9344936486849522 0.17837111909468986 0.7935457490963249 -1.328039642867978 0.3925664350171329 0.7334487618175094 0.41615724418615657 0.9399573345627144 â‹® 0.015347216501376187 -0.09760492670482843 0.7049014268831115 0.6689527222344425 -0.13410788198783755 -0.17457646703182647 -0.38384082815612125 -1.0015208982338497 -0.6283578154290572
LOESS is a local least square method. Let's use a 3rd degree polynomial with a smoothing α=0.75.
A = Loess(u,t,2,0.75) scatter(t,u,label="points",legend=:bottomright) plot!(A,label="2nd Degree LOESS") A = Loess(u,t,3,0.75) plot!(A,label="3rd Degree LOESS")
A Gaussian Process is a stochastic interpolation, meaning that it gives a different result each time A(t)
is called, with the mean and variance determined from the uncertainty in the dataset. This functionality is provided as a wrapper over GaussianProcess.jl. Here we will use a zero-mean function with a squared exponential covariance kernal function, and -1 as our guess for the log standard deviation of the observation noise. More details on the arguments can be found at the GaussianProcess.jl repository. Using our wrapper, we build a GPInterpolation like any other:
mZero = MeanZero() kern = SE(0.0,0.0) logObsNoise = -1.0 A = GPInterpolation(u,t,mZero,kern,logObsNoise) scatter(t,u,label="points",legend=:bottomright) plot!(A)
But now we notice that adding more samples of the same interpolation gives different results:
plot!(A)
We can directly plot the Gaussian Process to see the mean and variance functions:
plot(A.gp)
A curve fit works with both dense and sparse data. We will demonstrate the curve fit on the dense data since we generated it based on sin(t)
, so this is the curve we want to fit through it. Do do so, let's define a similar function with parameters. Let's choose the form:
m(t,p) = @. p[1]*sin(p[2]*t) + p[3]*cos(p[4]*t)
m (generic function with 1 method)
Notice that this is a function on the whole array of t
and expects an array for the predicted u
out. This choice of m
is the assumption that our function is of the form p1*sin(p2*t)+p3*cos(p4*t)
. We want to find the p
to match our data. Let's start with the guess of every p
being zero, that is p=ones(4)
. Then we would fit this curve using:
A = Curvefit(u,t,m,ones(4))
ERROR: MethodError: no method matching Curvefit(::Array{Float64,1}, ::Array{Float64,1}, ::typeof(Main.WeaveSandBox0.m), ::Array{Float64,1}) Closest candidates are: Curvefit(::Any, ::Any, ::Any, ::Any, !Matched::Any) at /Users/andreasnoack/.julia/dev/DataInterpolations/src/interpolation_caches.jl:377 Curvefit(::Any, ::Any, ::Any, ::Any, !Matched::Any, !Matched::Any) at /Users/andreasnoack/.julia/dev/DataInterpolations/src/interpolation_caches.jl:377 Curvefit(::Any, ::Any, ::Any, ::Any, !Matched::Any, !Matched::Any, !Matched::Any) at /Users/andreasnoack/.julia/dev/DataInterpolations/src/interpolation_caches.jl:377 ...
scatter(t,u,label="points",legend=:bottomright) plot!(A)
We can check what the fitted parameters are via:
A.c_f.param
ERROR: type GPInterpolation has no field c_f
Notice that it essentially made p3=0
with p1=p2=1
, meaning it approximately found sin(t)
! But note that the ability to fit is dependent on the initial parameters. For example, with p=zeros(4)
as the initial parameters the fit is not good:
A = Curvefit(u,t,m,zeros(4))
ERROR: MethodError: no method matching Curvefit(::Array{Float64,1}, ::Array{Float64,1}, ::typeof(Main.WeaveSandBox0.m), ::Array{Float64,1}) Closest candidates are: Curvefit(::Any, ::Any, ::Any, ::Any, !Matched::Any) at /Users/andreasnoack/.julia/dev/DataInterpolations/src/interpolation_caches.jl:377 Curvefit(::Any, ::Any, ::Any, ::Any, !Matched::Any, !Matched::Any) at /Users/andreasnoack/.julia/dev/DataInterpolations/src/interpolation_caches.jl:377 Curvefit(::Any, ::Any, ::Any, ::Any, !Matched::Any, !Matched::Any, !Matched::Any) at /Users/andreasnoack/.julia/dev/DataInterpolations/src/interpolation_caches.jl:377 ...
scatter(t,u,label="points",legend=:bottomright) plot!(A)
And the parameters show the issue:
A.c_f.param
ERROR: type GPInterpolation has no field c_f