Partitioned Least Squares

Linear least squares is one of the most widely used regression methods among scientists in many fields. The simplicity of the model allows this method to be used when data is scarce and it is usually appealing to practitioners that need to gather some insight into the problem by inspecting the values of the learnt parameters. PartitionedLS is a variant of the linear least squares model allowing practitioners to partition the input features into groups of variables that they require to contribute similarly to the final result.

An example of analysing a dataset using PartitionedLS is given here

The model

The Partitioned Least Squares model is formally defined as:

\[\begin{gather*} \text{minimize}_{\mathbf{\alpha}, \mathbf{\beta}} \| \mathbf{X} \times (\mathbf{P} \circ \mathbf{\alpha}) \times \mathbf{\beta} - \mathbf{y} \|_2^2 \\ \begin{aligned} \quad s.t.\quad &\mathbf{\alpha} \succeq 0\\ &\mathbf{P}^T \times \mathbf{\alpha} = \mathbf{1}. \end{aligned} \end{gather*}\]

where:

  • $\mathbf{X}$ is a $N × M$ matrix or table with Continuous element scitype containing the examples for which the predictions are sought. Check column scitypes of a table X with schema(X).
  • $\mathbf{y}$ is a $N$ vector with Continuous element scitype. Check scitype with scitype(y).
  • $\mathbf{P}$ is a $M × K$ Int matrix specifying how to partition the $M$ attributes into $K$ subsets. $P_{m,k}$ should be 1 if attribute number $m$ belongs to partition $k$.
  • $\mathbf{\beta}$ is a vector weighting the importance of each set of attributes in the partition;
  • $\mathbf{\alpha}$ is a vector weighting the importance of each attribute within one of the sets in the partition. Note that the constraints imply that for each set in the partition the weights of the corresponding $\alpha$ variables are all positive and sum to $1$.

The PartitionedLS problem is non-convex and NP-complete. The library provides two algorithms to solve the problem anyway: an iterative algorithm based on the Alternating Least Squares approach and an optimal algorithm that guarantees requiring however exponential time in the cardinality of the partition (i.e., it is mainly useful when $K$ is small).

More details can be found in the paper Partitioned Least Squares.

To install this library

Just add it as a dependency to your Julia environment. Launch julia from the main directory of your project and enter the following commands:

# Opens the package manager REPL
]

# Activate you local environment (can be skipped if you want to install the library globally)
activate .

# Adds the library to the environment
add PartitionedLS

To use this library

You will need a matrix P describing the partitioning of your variables, e.g.:

P = [[1 0]; 
     [1 0]; 
     [0 1]]

specifies that the first and the second variable belongs to the first partition, while the third variable belongs to the second.

You have then the choice to use either the standard interface or the MLJ interface.

Standard interface

The standard interface defines a fit function for each of the implemented algorithms. The function returns a tuple containing:

  • a PartLSFitResult object containing the model and the parameters found by the algorithm;
  • nothing (this is mandated by the MLJ interface, but it is not used in this case).
  • a NamedTuple containing some additional information.

A complete example:


using PartitionedLS

X = [[1. 2. 3.]; 
     [3. 3. 4.]; 
     [8. 1. 3.]; 
     [5. 3. 1.]]

y = [1.; 
     1.; 
     2.; 
     3.]

P = [[1 0]; 
     [1 0]; 
     [0 1]]


# fit using the optimal algorithm 
result = fit(Opt, X, y, P, η = 0.0)


# Make predictions on the given data matrix. The function works
# with results returned by anyone of the solvers.
predict(result[1], X)

MLJ interface

The MLJ interface is a allows you to use the library in a more MLJ-like fashion. The interface is defined by the PartLS model, which can be used in the MLJ framework. The model can be used in the same way as any other MLJ model.

A complete example:

using MLJ
using PartitionedLS

X = [[1. 2. 3.]; 
     [3. 3. 4.]; 
     [8. 1. 3.]; 
     [5. 3. 1.]]

y = [1.;
     1.;
     2.;
     3.]

P = [[1 0]; 
     [1 0]; 
     [0 1]]

# Define the model

model = PartLS(P=P, Optimizer=Opt, η=0.0)

# Fit the model
mach = machine(model, X, y)
fit!(mach)

# Make predictions
predict(mach, X)

API Documentation

PartitionedLS.PartLSType
PartLS

A model type for fitting a partitioned least squares model to data. Both an MLJ and native interfacew are provided.

MLJ Interface

From MLJ, the type can be imported using

PartLS = @load PartLS pkg=PartitionedLS

Construct an instance with default hyper-parameters using the syntax model = PartLS(). Provide keyword arguments to override hyper-parameter defaults, as in model = PartLS(P=...).

Training data

In MLJ or MLJBase, bind an instance model to data with

mach = machine(model, X, y)

where

  • X: any matrix or table with Continuous element scitype. Check column scitypes of a table X with schema(X).

Train the machine using fit!(mach).

Hyper-parameters

  • Optimizer: the optimization algorithm to use. It can be Opt, Alt or BnB (names exported by PartitionedLS.jl).

  • P: the partition matrix. It is a binary matrix where each row corresponds to a partition and each column corresponds to a feature. The element P_{k, i} = 1 if feature i belongs to partition k.

  • η: the regularization parameter. It controls the strength of the regularization.

  • ϵ: the tolerance parameter. It is used to determine when the Alt optimization algorithm has converged. Only used by the Alt algorithm.

  • T: the maximum number of iterations. It is used to determine when to stop the Alt optimization algorithm has converged. Only used by the Alt algorithm.

  • rng: the random number generator to use.

    • If nothing, the global random number generator rand is used.

    • If an integer, the global number generator rand is used after seeding it with the given integer.

    • If an object of type AbstractRNG, the given random number generator is used.

Operations

  • predict(mach, Xnew): return the predictions of the model on new data Xnew

Fitted parameters

The fields of fitted_params(mach) are:

  • α: the values of the α variables. For each partition k, it holds the values of the α variables are such that $\sum_{i \in P_k} \alpha_{k} = 1$.
  • β: the values of the β variables. For each partition k, β_k is the coefficient that multiplies the features in the k-th partition.
  • t: the intercept term of the model.
  • P: the partition matrix. It is a binary matrix where each row corresponds to a partition and each column corresponds to a feature. The element P_{k, i} = 1 if feature i belongs to partition k.

Examples

PartLS = @load PartLS pkg=PartitionedLS

X = [[1. 2. 3.];
     [3. 3. 4.];
     [8. 1. 3.];
     [5. 3. 1.]]

y = [1.;
     1.;
     2.;
     3.]

P = [[1 0];
     [1 0];
     [0 1]]


model = PartLS(P=P)
mach = machine(model, X, y) |> fit!

# predictions on the training set:
predict(mach, X)

Native Interface

using PartitionedLS

X = [[1. 2. 3.];
     [3. 3. 4.];
     [8. 1. 3.];
     [5. 3. 1.]]

y = [1.;
     1.;
     2.;
     3.]

P = [[1 0];
     [1 0];
     [0 1]]


# fit using the optimal algorithm
result = fit(Opt, X, y, P, η = 0.0)
y_hat = predict(result.model, X)

For other fit keyword options, refer to the "Hyper-parameters" section for the MLJ interface.

source
PartitionedLS.PartLSFitResultType
struct PartLSFitResult

The PartLSFitResult struct represents the solution of the partitioned least squares problem. It contains the values of the α and β variables, the intercept t and the partition matrix P.

Fields

  • α::Vector{AbstractFloat}: The values of the α variables. For each partition $k$, it holds the values of the α variables are such that $\sum_{i \in P_k} \alpha_{k} = 1$.
  • β::Vector{AbstractFloat}: The values of the β variables. For each partition $k$, $\beta_k$ is the coefficient that multiplies the features in the k-th partition.
  • t::AbstractFloat: The intercept term of the model.
  • P::Matrix{Int64}: The partition matrix. It is a binary matrix where each row corresponds to a partition and each column corresponds to a feature. The element $P_{k, i} = 1$ if feature $i$ belongs to partition $k$.
source
MLJModelInterface.fitFunction
fit(
    ::Type{Alt},
    X::Array{F<:AbstractFloat, 2},
    y::Array{F<:AbstractFloat, 1},
    P::Matrix{Int64};
    η,
    ϵ,
    T,
    nnlsalg,
    rng
) -> Tuple{PartLSFitResult, Nothing, NamedTuple{(:opt,), <:Tuple{Any}}}

Fits a PartitionedLS model by alternating the optimization of the α and β variables. This version uses an optimization strategy based on non-negative-least-squaes solvers. This formulation is faster and more numerically stable with respect to fit(Alt, ...)`.

Arguments

  • X: $N × M$ matrix or table with Continuous element scitype containing the examples for which the predictions are sought. Check column scitypes of a table X with schema(X).
  • y: $N$ vector with Continuous element scitype. Check scitype with scitype(y).
  • P: $M × K$ Int matrix specifying how to partition the $M$ attributes into $K$ subsets. $P_{m,k}$ should be 1 if attribute number $m$ belongs to partition $k$.
  • η: regularization factor, higher values implies more regularized solutions. Default is 0.0.
  • T: number of alternating loops to be performed. Default is 100.
  • ϵ: minimum relative improvement in the objective function before stopping the optimization. Default is 1e-6
  • nnlsalg: specific flavour of nnls algorithm to be used, possible values are :pivot, :nnls, :fnnls. Default is :nnls

Result

A Tuple with the following fields:

  1. a PartLSFitResult object containing the fitted model
  2. a nothing object
  3. a NamedTuple with a field opt containing the optimal value of the objective function
source
#(TYPEDSIGNATURES)

Fits a PartialLS Regression model to the given data and resturns the learnt model (see the Result section). It uses a coplete enumeration strategy which is exponential in K, but guarantees to find the optimal solution.

Arguments

  • X: $N × M$ matrix or table with Continuous element scitype containing the examples for which the predictions are sought. Check column scitypes of a table X with schema(X).
  • y: $N$ vector with Continuous element scitype. Check scitype with scitype(y).
  • P: $M × K$ Int matrix specifying how to partition the $M$ attributes into $K$ subsets. $P_{m,k}$ should be 1 if attribute number $m$ belongs to partition $k$.
  • η: regularization factor, higher values implies more regularized solutions (default: 0.0)
  • returnAllSolutions: if true an additional output is appended to the resulting tuple containing all solutions found during the algorithm.
  • nnlsalg: the kind of nnls algorithm to be used during solving. Possible values are :pivot, :nnls, :fnnls (default: :nnls)

Example

X = rand(100, 10)
y = rand(100)
P = [1 0 0; 0 1 0; 0 0 1; 1 1 0; 0 1 1]
result = fit(Opt, X, y, P)
source
fit(
    ::Type{BnB},
    X::Matrix{<:AbstractFloat},
    y::AbstractVector{<:AbstractFloat},
    P::Matrix{Int64};
    η,
    nnlsalg
) -> Tuple{PartLSFitResult, Nothing, NamedTuple{(:opt, :nopen), <:Tuple{Any, Any}}}

Implements the Branch and Bound algorithm to fit a Partitioned Least Squres model.

Arguments

  • X: $N × M$ matrix or table with Continuous element scitype containing the examples for which the predictions are sought. Check column scitypes of a table X with schema(X).
  • y: $N$ vector with Continuous element scitype. Check scitype with scitype(y).
  • P: $M × K$ Int matrix specifying how to partition the $M$ attributes into $K$ subsets. $P_{m,k}$ should be 1 if attribute number $m$ belongs to partition $k$.
  • η: regularization factor, higher values implies more regularized solutions (default: 0.0)
  • nnlsalg: the kind of nnls algorithm to be used during solving. Possible values are :pivot, :nnls, :fnnls (default: :nnls)

Result

A tuple with the following fields:

  1. a PartLSFitResult object containing the fitted model
  2. a nothing object
  3. a NamedTuple with fields:
    • opt containing the optimal value of the objective function
    • nopen containing the number of open nodes in the branch and bound tree
source
fit(
    m::PartLS,
    verbosity,
    X,
    y
) -> Tuple{PartLSFitResult, Nothing, Any}

Fits a PartitionedLS Regression model to the given data and resturns the learnt model (see the Result section). It conforms to the MLJ interface.

Arguments

  • m: A PartLS model to fit
  • verbosity: the verbosity level
  • X: any matrix or table with Continuous element scitype. Check column scitypes of a table X with schema(X).
  • y: any vector with Continuous element scitype. Check scitype with scitype(y).
source
MLJModelInterface.predictFunction
predict(
    α::AbstractVector{<:AbstractFloat},
    β::AbstractVector{<:AbstractFloat},
    t::AbstractFloat,
    P::Matrix{Int64},
    X::Matrix{<:AbstractFloat}
) -> Any

Result

the prediction for the partitioned least squares problem with solution α, β, t over the dataset X and partition matrix P

source
predict(
    model::PartLSFitResult,
    X::Matrix{<:AbstractFloat}
) -> Any

Make predictions for the datataset X using the PartialLS model model.

Arguments

  • model: a PartLSFitResult
  • X: any matrix or table with Continuous element scitype containing the examples for which the predictions are sought. Check column scitypes of a table X with schema(X).

Return

the predictions of the given model on examples in X.

source
predict(model::PartLS, fitresult, X) -> Any

Make predictions for the datataset X using the PartitionedLS model model. It conforms to the MLJ interface.

source
PartitionedLS.homogeneousCoordsFunction

Rewrites X and P in homogeneous coordinates. The result is a tuple (Xo, Po) where Xo is the homogeneous version of X and Po is the homogeneous version of P.

Arguments

  • X: any matrix or table with Continuous element scitype. Check column scitypes of a table X with schema(X).
  • P: the partition matrix

Return

  • Xo: the homogeneous version of X
  • Po: the homogeneous version of P
source
PartitionedLS.regularizeProblemFunction

Adds regularization terms to the problem. The regularization terms are added to the objective function as a sum of squares of the α variables. The regularization parameter η controls the strength of the regularization.

Arguments

  • X: any matrix or table with Continuous element scitype. Check column scitypes of a table X with schema(X).
  • y: any vector with Continuous element scitype. Check scitype with scitype(y).
  • P: the partition matrix
  • η: the regularization parameter

Return

  • Xn: the new data matrix
  • yn: the new target vector

Main idea

K new rows are added to the data matrix X, row $k \in \{1 \dots K\}$ is a vector of zeros except for the components that corresponds to features belonging to the k-th partition, which is set to sqrt(η). The target vector y is extended with K zeros.

The point of this change is that when the objective function is evaluated as $math \|Xw - y\|^2$, the new part of the matrix contributes to the loss with a factor of $η \sum \|w_i\|^2$ . This is equivalent to adding a regularization term to the objective function.

source