Multivariate Nonlinear Regression

This example demonstrates how to use a product group to manipulate a multivariate regression model in an algebraically precise way. It builds on the example monomial_regression.jl.

Let m be a model like

f(x,y) = a*x + b*y
x,y = rand(Normal(0,1), (n,2))
target[i] = f(x[i],y[i]) + rand(Normal(0,1), 1)
min_{a,b} sum_i (f(x[i],y[i]) - target[i])^2

So it is a bivariate least squares regression model. Define the set of transformations $T$ as the free monoid generated by

x^i -> x^(i+1)
x^i -> x^(i-1)
y^i -> y^(i+1)
y^i -> y^(i-1)

Because each generator in this set has an inverse in the set, the monoid of transformations it generates is actually the group $Z^2 = Z \times Z$, the product of two infinite cyclic groups, with componentwise addition $+: Z^2 \to Z^2$ as the group operation. So you get the set of models:

f(x,y) = a*x^i + b*y^j
x,y = rand(Normal(0,1), (n,2))
target[i] = f(x[i],y[i]) + rand(Normal(0,1), 1)
min_{a,b} sum_i (f(x[i],y[i]) - target[i])^2

for all pairs of integers $i,j$, which is a pretty cool structure. If you apply the transformations with the exponents taken mod $N$, you get $Z_N^2$, which is a finite abelian group. It doesn't get more computationally tractable than componentwise arithmetic mod $N$.
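To make that concrete, here is a minimal standalone sketch of the group algebra in plain Julia (the names compose, inverse, and compose_mod are ours, for illustration only; the SemanticModels types used later play the same role):

# A transformation is a pair of exponent increments (Δi, Δj) in Z²,
# and composing transformations adds the increments componentwise.
compose(s::Tuple{Int,Int}, t::Tuple{Int,Int}) = s .+ t
inverse(t::Tuple{Int,Int}) = .-t    # every element has an inverse
identity_el = (0, 0)                # the identity transformation

# The finite variant Z_N²: reduce each component mod N.
compose_mod(s, t, N) = mod.(s .+ t, N)

@assert compose((1, 2), (3, -1)) == (4, 1)
@assert compose((1, 2), inverse((1, 2))) == identity_el
@assert compose_mod((2, 3), (3, 2), 4) == (1, 1)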

We can then talk about the orbits of a given model under this group. For the full group there is only one orbit, but if you take a subgroup such as the one generated by $(1,1)$, written $\langle (1,1) \rangle$, you get multiple orbits. For example, if you start with

f(x,y) = a*x^1 + b*y^2
x,y = rand(Normal(0,1), (n,2))
target[i] = f(x[i],y[i]) + rand(Normal(0,1), 1)
min_{a,b} sum_i (f(x[i],y[i]) - target[i])^2

the orbit is all of the models of the form:

f(x,y) = a*x^(1+i) + b*y^(2+i)
x,y = rand(Normal(0,1), (n,2))
target[i] = f(x[i],y[i]) + rand(Normal(0,1), 1)
min_{a,b} sum_i (f(x[i],y[i]) - target[i])^2

where $i$ ranges over $Z_N$. You could then find the best fitting model within a given orbit, and compare the orbits to see which one contains the best fitting model overall.
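As a quick illustration (again a standalone sketch; the choice of N = 5 and the starting exponents (1, 2) are arbitrary), the orbit under $\langle (1,1) \rangle$ can be enumerated directly:

# Enumerate the orbit of an exponent pair under the cyclic subgroup
# generated by `gen` inside Z_N².
orbit(start::Tuple{Int,Int}, gen::Tuple{Int,Int}, N::Int) =
    unique(mod.(start .+ k .* gen, N) for k in 0:N-1)

@show orbit((1, 2), (1, 1), 5)
# 5-element orbit: (1, 2), (2, 3), (3, 4), (4, 0), (0, 1)

Starting points whose coordinates differ by a different amount mod $N$ land in different orbits, which is what makes comparing orbits meaningful.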

Here is an example implementation of this theory.

Example Model

In [1]:
# Our working example of a multivariate nonlinear regression model
expr = quote
    module Regression
    using Random
    using LsqFit
    using LinearAlgebra

    # The model template: both exponents start at 0 and will be rewritten
    # by the group action defined below.
    function f(X, β)
        a = β[1]
        b = β[2]
        x = X[:, 1]
        y = X[:, 2]
        return a.*x.^0 .+ b.*y.^0
    end

    # Draw n random inputs and noisy targets from the ground-truth model g.
    function sample(g::Function, n)
        x = randn(Float64, n)
        target = g(x) .+ randn(Float64, n[1])./1600
        return x, target
    end

    function describe(fit)
        if !fit.converged
            error("Did not converge")
        end
        return (β = fit.param, r=norm(fit.resid,2))
    end
    #setup

    # Random.seed!(42)
    β = (1/2, 1/2)
    n = (1000,2)
    g(x) = β[1].*x[:,1].^2 + β[2].*x[:,2].^3
    X, target = sample(g, n)
    # x, y = X[:,1], X[:,2]
    # @show size(X), size(target)
    # loss(a) = sum((f.(a, x, y) .- target).^2)
    # @show loss.([-1,-1/2,-1/4, 0, 1/4,1/3,1/2,2/3, 1])

    a₀ = [1.5, 0.1]
    try
        ŷ₀ = f(X, a₀)
    catch
        error("Could not execute f on the initial data")
    end

    #solving
    fit = curve_fit(f, X, target, a₀; autodiff=:forwarddiff)
    result = describe(fit)
    end
end;

Implementation Details

The following code gives the implementation details: representing the models as an AbstractModel, representing the transformations as Product{Tuple{Pow{Int}, Pow{Int}}}, and applying the transformations to the models.

In [2]:
using SemanticModels
using SemanticModels.ModelTools
using SemanticModels.ModelTools.Transformations
In [3]:
import SemanticModels.ModelTools: model, AbstractModel
import SemanticModels.Parsers: findfunc, findassign
import Base: show
In [4]:
""""    MultivariateLsq


A program that solves min_β || f(X,β) - y ||_2

Example:

`f(X, β) = β[1]*X[:,1]^p + β[2]*X[:,2]^q`
"""
struct MultivariateLsq <: AbstractModel
    expr
    f
    coefficient
    p₀
end
Out[4]:
MultivariateLsq
In [5]:
function show(io::IO, m::MultivariateLsq)
    write(io, "MultivariateLsq(\n  f=$(repr(m.f)),\n  coefficient=$(repr(m.coefficient)),\n  p₀=$(repr(m.p₀))\n)")
end
Out[5]:
show (generic function with 354 methods)
In [6]:
function model(::Type{MultivariateLsq}, expr::Expr)
    if expr.head == :block
        return model(MultivariateLsq, expr.args[2])
    end
    objective = :l2norm
    # Recover the model function, its coefficient vector, and the initial
    # guess from the call sites of curve_fit inside the program.
    f = callsites(expr, :curve_fit)[end].args[3]
    coeff = callsites(expr, f)[1].args[end]
    p₀ = callsites(expr, :curve_fit)[end].args[end]
    return MultivariateLsq(expr, f, coeff, p₀)
end
Out[6]:
model (generic function with 7 methods)
In [7]:
"""    poly(m::MultivariateLsq)::Expr

Find the part of the model that implements the polynomial used for regression.
"""
function poly(m::MultivariateLsq)
    func = findfunc(m.expr, m.f)[1]
    poly = func.args[2].args[end].args[1]
    return poly
end
Out[7]:
poly
In [8]:
"""    (t::Pow)(m::MultivariateLsq, i::Int)

apply the power transformation to the i-th coordinate of m.

Example:

If `m` is a program implementing `f(X, β) = β[1]*X[:,1]^p + β[2]*X[:,2]^q`

a) and `t = Pow(2)` then `t(m, 1)` is the model implementing
`f(X, β) = β[1]*X[:,1]^(p+2) + β[2]*X[:,2]^q`,

or

b) and `t = Pow(2)` then `t(m, 2)` is the model implementing
`f(X, β) = β[1]*X[:,1]^p + β[2]*X[:,2]^(q+2)`.

See also [`(t::Product)(m::MultivariateLsq)`](@ref)
"""
function (t::Pow)(m::MultivariateLsq, i::Int)
    p = poly(m)
    # p.args[1] is the broadcast operator .+, so the i-th term sits at args[i+1].
    comp = p.args[i+1]
    pow = callsites(comp, :(.^))
    pow[end].args[3] += t.inc
    return m
end
Out[8]:
SemanticModels.ModelTools.Transformations.Pow
In [9]:
"""    (t::Product)(m::MultivariateLsq)

apply the product transformation to the model m.
For a MultivariateLsq model a product transformation works
by applying the transformation component-wise where each component represents a
column of the data.

Example:

If `m` is a program implementing `f(X, β) = β[1]*X[:,1]^p + β[2]*X[:,2]^q`
and `t = Product((Pow(2), Pow(-1)))` then `t(m)` is the model implementing
`f(X, β) = β[1]*X[:,1]^(p+2) + β[2]*X[:,2]^(q-1)`.
"""
function (t::Product)(m::MultivariateLsq)
    # Apply each component transformation to its corresponding column of X.
    for (i, tᵢ) in enumerate(t.dims)
        m = tᵢ(m, i)
    end
    return m
end
Out[9]:
SemanticModels.ModelTools.Transformations.Product

Model Augmentation with Group Actions

The group Product{Tuple{Pow{Int},Pow{Int}}}, which is isomorphic to $(Z, +, 0)\times (Z,+,0)$, can act on our class of models.
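Concretely, a group element $(i,j)$ acts on a model by shifting the exponent of each column independently:

$(i,j) \cdot (a x^p + b y^q) = a x^{p+i} + b y^{q+j}$

The identity $(0,0)$ fixes every model, and acting by $(i,j)$ then $(k,l)$ is the same as acting by $(i+k, j+l)$, so this is a genuine group action.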

Let's build an instance of the model object from the code snippet expr.

In [10]:
m = model(MultivariateLsq, expr)
@show m
poly(m)
m = MultivariateLsq(
  f=:f,
  coefficient=:β,
  p₀=:a₀
)
Out[10]:
:(a .* x .^ 0 .+ b .* y .^ 0)

We can get the poly from the model and show how the Pow group elements can act on it componentwise.

In [11]:
@show poly((one(Pow)^4)(deepcopy(m), 1))
@show poly((one(Pow)^4)(deepcopy(m), 2))
poly((one(Pow) ^ 4)(deepcopy(m), 1)) = :(a .* x .^ 4 .+ b .* y .^ 0)
poly((one(Pow) ^ 4)(deepcopy(m), 2)) = :(a .* x .^ 0 .+ b .* y .^ 4)
Out[11]:
:(a .* x .^ 0 .+ b .* y .^ 4)

Some basis vectors will come in handy for building elements of the product group: $T_1, T_2$ are generators for $Z \times Z$.

In [12]:
@show T₁ = Product((one(Pow), zero(Pow)))
@show T₂ = Product((zero(Pow), one(Pow)))
T₁ = Product((one(Pow), zero(Pow))) = (Pow(1), Pow(0))
T₂ = Product((zero(Pow), one(Pow))) = (Pow(0), Pow(1))
Out[12]:
(Pow(0), Pow(1))

Exploring the Orbits

We use the Levenberg-Marquardt algorithm for general least squares problems in data analysis. Its goal is to fix a model class and find the best coefficients for that class. Our algebraic representation allows a similar treatment of the model class itself (in this case the exponents $i,j$ in our formula): we can sample from the orbits of the group acting on the model and solve for the best coefficients of each sampled model in order to find the best model class.

In this case we have reduced a search over code to a search over models, and that model search to an algebraic formulation.
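For comparison, here is what the same search looks like written directly against LsqFit, without program transformation (a sketch only; grid_search is our name, and it assumes the X and target produced by the Regression module above):

using LsqFit, LinearAlgebra

# Search the exponent grid directly: each (i, j) fixes a model class,
# and curve_fit finds the best coefficients within that class.
function grid_search(X, target, irange, jrange; p₀=[1.5, 0.1])
    best = nothing
    for i in irange, j in jrange
        fᵢⱼ = (X, β) -> β[1] .* X[:, 1] .^ i .+ β[2] .* X[:, 2] .^ j
        fit = curve_fit(fᵢⱼ, X, target, p₀)
        r = norm(fit.resid, 2)
        if best === nothing || r < best.r
            best = (i=i, j=j, β=fit.param, r=r)
        end
    end
    return best
end

The transformation-based version below performs the same search, but the loop ranges over group elements acting on a single program, which is the structure this example is about.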

In [13]:
m = model(MultivariateLsq, expr)
results = []
for i in 1:3
    for j in 1:3
        m₀ = deepcopy(m)
        T = (T₁^i) ∘ (T₂^j)
        mᵢⱼ = T(m₀)
        p = poly(mᵢⱼ)
        M = eval(mᵢⱼ.expr)
        push!(results, (i=i, j=j, M.result..., p=p))
    end
end
WARNING: replacing module Regression.
WARNING: replacing module Regression.
WARNING: replacing module Regression.
WARNING: replacing module Regression.
WARNING: replacing module Regression.
WARNING: replacing module Regression.
WARNING: replacing module Regression.
WARNING: replacing module Regression.

It turns out that this method recovers the true model order $(2,3)$.

In [14]:
sort(results, by=x->x.r)
Out[14]:
9-element Array{Any,1}:
 (i = 2, j = 3, β = [0.499994, 0.499999], r = 0.019543920062539754, p = :(a .* x .^ 2 .+ b .* y .^ 3))
 (i = 1, j = 3, β = [-0.0286846, 0.50621], r = 27.71519694321366, p = :(a .* x .^ 1 .+ b .* y .^ 3))  
 (i = 3, j = 3, β = [-0.00581907, 0.49535], r = 28.603785715392704, p = :(a .* x .^ 3 .+ b .* y .^ 3))
 (i = 3, j = 1, β = [0.0238286, 1.38205], r = 43.276924121433915, p = :(a .* x .^ 3 .+ b .* y .^ 1))  
 (i = 2, j = 1, β = [0.527814, 1.61601], r = 50.57776110721493, p = :(a .* x .^ 2 .+ b .* y .^ 1))    
 (i = 2, j = 2, β = [0.565564, -0.221878], r = 54.29029352260181, p = :(a .* x .^ 2 .+ b .* y .^ 2))  
 (i = 1, j = 1, β = [0.0352979, 1.55163], r = 58.213154771372025, p = :(a .* x .^ 1 .+ b .* y .^ 1))  
 (i = 1, j = 2, β = [-0.191248, 0.0919223], r = 62.50087875442424, p = :(a .* x .^ 1 .+ b .* y .^ 2)) 
 (i = 3, j = 2, β = [-0.069573, 0.0807769], r = 64.98340128442709, p = :(a .* x .^ 3 .+ b .* y .^ 2)) 

Conclusions

We have seen how abstract algebra can be applied to the category of models to build a systematic treatment of model augmentation. This shows that model transformations can be arranged into a simple algebraic structure that acts on a model to build new models, and the structure of the transformations is easier to analyze than the changes to the models themselves.