This example demonstrates how to use a product group to manipulate a multivariate regression model in an algebraically precise way. It builds on the example monomial_regression.jl
.
Let m
be a model like
f(x,y,z) = a*x + b*y
x,y = rand(Normal(0,1), (n,2))
target[i] = f(x[i],y[i]) + rand(Normal(0,1), 1)
min_{a,b} sum_i (f(x[i],y[i],z[i]) - target[i])^2
So it is a bivariate least squares regression model. Define the set of transformations $T$ as the free monoid generated by
x^i->x^i+1
x^i->x^i-1
y^i->y^i+1
y^i->y^i-1
The free monoid over this set of transformations is actually the product of cyclic groups $Z^2 = Z x Z$ with the group operation of $+:Z^2->Z^2$. so you get the set of models:
f(x,y) = a*x^i + b*y^j
x,y = rand(Normal(0,1), (n,2))
target[i] = f(x[i],y[i]) + rand(Normal(0,1), 1)
min_{a,b} sum_i (f(x[i],y[i]) - target[i])^2
for all combinations of integers $i,j$ which is a pretty cool structure. If you do the transformations with i mod N
then you get $Z_N^2$ which is a finite abelian group. It doesn't get more computationally tractable than component-wise arithmetic mod N.
We could then talk about the orbits of a given model under this group. For this group, there is only one orbit, but if you take subgroups like the group generated by $(1,1)$ aka $\langle 1,1\rangle$ you get multiple orbits. For example if you start with
f(x,y) = a*x^1 + b*y^2
x,y = rand(Normal(0,1), (n,2))
target[i] = f(x[i],y[i]) + rand(Normal(0,1), 1)
min_{a,b} sum_i (f(x[i],y[i]) - target[i])^2
the orbit is all of the models of the form:
f(x,y) = a*x^1+i + b*y^2+i
x,y = rand(Normal(0,1), (n,2))
target[i] = f(x[i],y[i]) + rand(Normal(0,1), 1)
min_{a,b} sum_i (f(x[i],y[i]) - target[i])^2
where $i$ is in $Z_N$. You could reason about finding the best fitting model from a given orbit. and then comparing the orbits to see which one was contained the best fitting model.
Here is an example implementation of this theory
#Our working example of a multivariate nonlinear regression model
expr = quote
module Regression
using Random
using LsqFit
using LinearAlgebra
function f(X, β)
a = β[1]
b = β[2]
x = X[:, 1]
y = X[:, 2]
return a.*x.^0 .+ b.*y.^0
end
function sample(g::Function, n)
x = randn(Float64, n)
target = g(x) .+ randn(Float64, n[1])./1600
return x, target
end
function describe(fit)
if !fit.converged
error("Did not converge")
end
return (β = fit.param, r=norm(fit.resid,2))
end
#setup
# Random.seed!(42)
β = (1/2, 1/2)
n = (1000,2)
g(x) = β[1].*x[:,1].^2 + β[2].*x[:,2].^3
X, target = sample(g, n)
# x, y = X[:,1], X[:,2]
# @show size(X), size(target)
# loss(a) = sum((f.(a, x, y) .- target).^2)
# @show loss.([-1,-1/2,-1/4, 0, 1/4,1/3,1/2,2/3, 1])
a₀ = [1.5, 0.1]
try
ŷ₀ = f(X, a₀)
catch
error("Could not execute f on the initial data")
end
#solving
fit = curve_fit(f, X, target, a₀; autodiff=:forwarddiff)
result = describe(fit)
end
end;
The following code is the implementation details for representing the models as an AbstractProblem
and representing the transformations as Product{Tuple{Pow{Int}, Pow{Int}}}
and applying the transformations onto the models.
using SemanticModels
using SemanticModels.ModelTools
using SemanticModels.ModelTools.Transformations
import SemanticModels.ModelTools: model, AbstractModel
import SemanticModels.Parsers: findfunc, findassign
import Base: show
"""" MultivariateLsq
A program that solves min_β || f(X,β) - y ||_2
Example:
`f(X, β) = β[1]*X[:,1]^p + β[2]*X[:,2]^q`
"""
struct MultivariateLsq <: AbstractModel
expr
f
coefficient
p₀
end
function show(io::IO, m::MultivariateLsq)
write(io, "MultivariateLsq(\n f=$(repr(m.f)),\n coefficient=$(repr(m.coefficient)),\n p₀=$(repr(m.p₀))\n)")
end
function model(::Type{MultivariateLsq}, expr::Expr)
if expr.head == :block
return model(MultivariateLsq, expr.args[2])
end
objective = :l2norm
f = callsites(expr, :curve_fit)[end].args[3]
coeff = callsites(expr, f)[1].args[end]
p₀ = callsites(expr, :curve_fit)[end].args[end]
return MultivariateLsq(expr, f, coeff, p₀)
end
""" poly(m::MultivariateLsq)::Expr
find the part of the model that implements the polynomial model for regression.
"""
function poly(m::MultivariateLsq)
func = findfunc(m.expr, m.f)[1]
poly = func.args[2].args[end].args[1]
return poly
end
""" (t::Pow)(m::MultivariateLsq, i::Int)
apply the power transformation to the i-th coordinate of m.
Example:
If `m` is a program implementing `f(X, β) = β[1]*X[:,1]^p + β[2]*X[:,2]^q`
a) and `t = Pow(2)` then `t(m, 1)` is the model implementing
`f(X, β) = β[1]*X[:,1]^p+2 + β[2]*X[:,2]^q`.
or
b) and `t = Pow(2)` then `t(m, 2)` is the model implementing
`f(X, β) = β[1]*X[:,1]^p + β[2]*X[:,2]^q+2`.
See also [`(t::Pow)(m::MultivariateLsq, i::Int)`](@ref)
"""
function (t::Pow)(m::MultivariateLsq, i::Int)
p = poly(m)
comp = p.args[i+1]
pow = callsites(comp, :(.^))
pow[end].args[3] += t.inc
return m
end
""" (t::Product)(m::MultivariateLsq)
apply the product transformation to the model m.
For a MultivariateLsq model a product transformation works
by applying the transformation component-wise where each component represents a
column of the data.
Example:
If `m` is a program implementing `f(X, β) = β[1]*X[:,1]^p + β[2]*X[:,2]^q`
and `t = Pow(2,-1)` then `t(m)` is the model implementing
`f(X, β) = β[1]*X[:,1]^p+2 + β[2]*X[:,2]^q-1`.
"""
function (t::Product)(m::MultivariateLsq)
m = m
for (i, tᵢ) in enumerate(t.dims)
m = tᵢ(m, i)
end
return m
end
the Group Product{Tuple{Pow{Int},Pow{Int}}}
which is isomorphic to $(Z, +, 0)\times (Z,+,0)$ can act on our class of models
Let's build an instance of the model object from the code snippet expr
m = model(MultivariateLsq, expr)
@show m
poly(m)
We can get the poly from the model and show how the Pow group elements can act on it componentwise.
@show poly((one(Pow)^4)(deepcopy(m), 1))
@show poly((one(Pow)^4)(deepcopy(m), 2))
Some basis vectors will come in handy for building elements of the product group. $T_1,T_2$ are generators for $Z\times Z$
@show T₁ = Product((one(Pow), zero(Pow)))
@show T₂ = Product((zero(Pow), one(Pow)))
We use the Levenberg Marquardt algorithm for general least squares problems in data analysis. The goal is to fix a model class and find the best coefficients for that class. Our algebraic representation allows us to have a similar treatment of model class (in this case the exponents $i,j$ in our formula). We can sample from the orbits of the group when applied to the model and solve for the best coefficients in order to find the best model class.
In this case we have reduced the code search space to a model search space to an algebraic formulation.
m = model(MultivariateLsq, expr)
results = []
for i in 1:3
for j in 1:3
m₀ = deepcopy(m)
T = (T₁^i) ∘ (T₂^j)
mᵢⱼ = T(m₀)
p = poly(mᵢⱼ)
M = eval(mᵢⱼ.expr)
push!(results, (i=i, j=j, M.result..., p=p))
end
end
It turns out that this method recovers the true model order $2,3$
sort(results, by=x->x.r)
We have seen how abstract algebra can be applied to the category of models to build a systematic treatment of model augmentation. This proves that the model transformations can be arranged into a simple algebraic structure that can act on a model to build new models. The structure of the transformations are easier to analyze than the changes to the models themselves.