| Title: | Regularized Multi-Task Learning |
|---|---|
| Description: | Efficient solvers for 10 regularized multi-task learning algorithms applicable to regression, classification, joint feature selection, task clustering, low-rank learning, sparse learning and network incorporation. Based on the accelerated gradient descent method, the algorithms feature a state-of-the-art convergence rate of O(1/k^2). Sparse model structure is induced by solving the proximal operator. Details of the package are described in the paper by Han Cao and Emanuel Schwarz (2018) <doi:10.1093/bioinformatics/bty831>. |
| Authors: | Han Cao [cre, aut, cph], Emanuel Schwarz [aut] |
| Maintainer: | Han Cao <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.0 |
| Built: | 2026-05-21 08:40:22 UTC |
| Source: | https://github.com/transbiozi/rmtl |
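The proximal step mentioned in the package description can be illustrated for an L2,1 penalty. The sketch below is the textbook row-wise group soft-thresholding operator written in base R. The function name `prox_l21` and the toy matrix are hypothetical, and the code is not taken from the RMTL source.

```r
# Textbook proximal operator of lambda * sum_i ||W[i, ]||_2
# (hypothetical helper for illustration, not part of RMTL).
prox_l21 <- function(W, lambda) {
  row_norms <- sqrt(rowSums(W^2))
  # Shrink every row towards zero; rows whose norm is <= lambda become exactly zero.
  shrink <- pmax(0, 1 - lambda / pmax(row_norms, .Machine$double.eps))
  W * shrink  # the vector recycles down the columns, so shrink[i] scales row i
}

W <- rbind(c(3, 4),      # norm 5: kept, scaled by 1 - 1/5
           c(0.3, 0.4),  # norm 0.5: below lambda, set to zero
           c(0, 0))      # already zero
W_shrunk <- prox_l21(W, lambda = 1)
print(W_shrunk)
```

Whole rows are zeroed at once, which is how a penalty of this kind can remove a feature from all tasks jointly.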
Calculate the averaged prediction error across tasks. For a classification problem, the misclassification rate is returned; for a regression problem, the mean squared error (MSE) is returned.
calcError(m, newX = NULL, newY = NULL)
| m | An MTL model |
|---|---|
| newX | The feature matrices of new individuals |
| newY | The responses of new individuals |
The averaged prediction error
#create example data
data<-Create_simulated_data(Regularization="L21", type="Regression")
#train a model
model<-MTL(data$X, data$Y, type="Regression", Regularization="L21", Lam1=0.1, Lam2=0, opts=list(init=0, tol=10^-6, maxIter=1500))
#calculate the training error
calcError(model, newX=data$X, newY=data$Y)
#calculate the test error
calcError(model, newX=data$tX, newY=data$tY)
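The error described above can be sketched in base R. This is a minimal illustration, assuming the per-task errors are averaged with equal weight and that predicted and true class labels share the same coding. The helper `avg_task_error` and the toy inputs are hypothetical and are not part of RMTL.

```r
# Hypothetical helper: average a per-task error over all tasks.
# pred and truth are lists with one vector per task.
avg_task_error <- function(pred, truth, type = c("Regression", "Classification")) {
  type <- match.arg(type)
  per_task <- mapply(function(p, y) {
    if (type == "Regression") {
      mean((p - y)^2)  # mean squared error of this task
    } else {
      mean(p != y)     # misclassification rate of this task
    }
  }, pred, truth)
  mean(per_task)       # assumption: equal weight for every task
}

# Regression: per-task MSEs are 3 and 1
reg_err <- avg_task_error(list(c(1, 2, 3), c(0, 0)),
                          list(c(1, 2, 6), c(1, -1)),
                          type = "Regression")

# Classification: per-task error rates are 0.25 and 0.5
cls_err <- avg_task_error(list(c("a", "b", "a", "a"), c("b", "b")),
                          list(c("a", "a", "a", "a"), c("b", "a")),
                          type = "Classification")
print(c(regression = reg_err, classification = cls_err))
```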
Create an example dataset which contains (1) training datasets (X: feature matrices, Y: response vectors); (2) test datasets (tX: feature matrices, tY: response vectors); (3) the ground truth model (W: coefficient matrix); and (4) extra information for some algorithms (e.g. a matrix encoding the network information, which is necessary for calling the MTL method with network structure, Regularization="Graph").
Create_simulated_data(t = 5, p = 50, n = 20, type = "Regression", Regularization = "L21")
| t | Number of tasks |
|---|---|
| p | Number of features |
| n | Number of samples of each task. For simplicity, all tasks contain the same number of samples. |
| type | The type of problem, must be "Regression" or "Classification" |
| Regularization | The type of MTL algorithm (cross-task regularizer). The value must be one of { |
The example dataset.
data<-Create_simulated_data(t=5, p=50, n=20, type="Regression", Regularization="L21")
str(data)
Perform k-fold cross-validation to estimate Lam1.
cvMTL(
  X,
  Y,
  type = "Classification",
  Regularization = "L21",
  Lam1_seq = 10^seq(1, -4, -1),
  Lam2 = 0,
  G = NULL,
  k = 2,
  opts = list(init = 0, tol = 10^-3, maxIter = 1000),
  stratify = FALSE,
  nfolds = 5,
  ncores = 2,
  parallel = FALSE
)
| X | A set of feature matrices |
|---|---|
| Y | A set of responses, could be binary (classification problem) or continuous (regression problem). The valid value of binary outcome |
| type | The type of problem, must be "Regression" or "Classification" |
| Regularization | The type of MTL algorithm (cross-task regularizer). The value must be one of { |
| Lam1_seq | A positive sequence of |
| Lam2 | A positive constant |
| G | A matrix to encode the network information. This parameter is only used in the MTL with graph structure (Regularization = "Graph") |
| k | A positive number to modulate the structure of clusters with the default of 2. This parameter is only used in MTL with clustering structure ( |
| opts | Options of the optimization procedure. One can set the initial search point, the tolerance and the maximum number of iterations through this parameter. The default value is list(init = 0, tol = 10^-3, maxIter = 1000) |
| stratify | |
| nfolds | The number of folds |
| ncores | The number of cores used for parallel computing with the default value of 2 |
| parallel | |
The estimated Lam1 and related information
#create the example data
data<-Create_simulated_data(Regularization="L21", type="Classification")
#perform the cross validation
cvfit<-cvMTL(data$X, data$Y, type="Classification", Regularization="L21", Lam2=0, opts=list(init=0, tol=10^-6, maxIter=1500), nfolds=5, stratify=TRUE, Lam1_seq=10^seq(1,-4, -1))
#show meta-information
str(cvfit)
#plot the CV accuracies across lam1 sequence
plot(cvfit)
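The general idea of choosing a regularization value by k-fold cross-validation can be sketched without the package. The example below is an illustration under stated assumptions: it uses a single task and a closed-form ridge regression as a stand-in model, and it does not reproduce the internals of cvMTL. The names `fit_ridge`, `fold_id`, `cv_err` and `best_lam` are hypothetical.

```r
set.seed(1)

# Toy single-task regression data (assumption: one task keeps the sketch short)
n <- 40; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- as.vector(X %*% c(2, -1, 0, 0, 0) + rnorm(n, sd = 0.5))

# Stand-in model: ridge regression solved in closed form
fit_ridge <- function(X, y, lam) {
  solve(crossprod(X) + lam * diag(ncol(X)), crossprod(X, y))
}

lam_seq <- 10^seq(1, -4, -1)  # same grid as the Lam1_seq default
nfolds <- 5
# Randomly assign each sample to one of nfolds equally sized folds
fold_id <- sample(rep(seq_len(nfolds), length.out = n))

# For every candidate value, average the held-out MSE over the folds
cv_err <- sapply(lam_seq, function(lam) {
  mean(sapply(seq_len(nfolds), function(f) {
    test <- fold_id == f
    w <- fit_ridge(X[!test, , drop = FALSE], y[!test], lam)
    mean((y[test] - X[test, , drop = FALSE] %*% w)^2)
  }))
})

# Keep the candidate with the smallest cross-validated error
best_lam <- lam_seq[which.min(cv_err)]
print(best_lam)
```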
Train a multi-task learning model.
MTL(
  X,
  Y,
  type = "Classification",
  Regularization = "L21",
  Lam1 = 0.1,
  Lam1_seq = NULL,
  Lam2 = 0,
  opts = list(init = 0, tol = 10^-3, maxIter = 1000),
  G = NULL,
  k = 2
)
| X | A set of feature matrices |
|---|---|
| Y | A set of responses, could be binary (classification problem) or continuous (regression problem). The valid value of binary outcome |
| type | The type of problem, must be "Regression" or "Classification" |
| Regularization | The type of MTL algorithm (cross-task regularizer). The value must be one of { |
| Lam1 | A positive constant |
| Lam1_seq | A positive sequence of |
| Lam2 | A non-negative constant |
| opts | Options of the optimization procedure. One can set the initial search point, the tolerance and the maximum number of iterations using this parameter. The default value is list(init = 0, tol = 10^-3, maxIter = 1000) |
| G | A matrix to encode the network information. This parameter is only used in the MTL with graph structure (Regularization = "Graph") |
| k | A positive number to modulate the structure of clusters with the default of 2. This parameter is only used in MTL with clustering structure ( |
The trained model, including the coefficient matrix W, the intercepts C, and related meta information
#create the example data
data<-Create_simulated_data(Regularization="L21", type="Regression")
#train a MTL model
#cold-start
model<-MTL(data$X, data$Y, type="Regression", Regularization="L21", Lam1=0.1, Lam2=0, opts=list(init=0, tol=10^-6, maxIter=1500))
#warm-start
model<-MTL(data$X, data$Y, type="Regression", Regularization="L21", Lam1=0.1, Lam1_seq=10^seq(1,-4, -1), Lam2=0, opts=list(init=0, tol=10^-6, maxIter=1500))
#meta-information
str(model)
#plot the historical objective values
plotObj(model)
Plot the cross-validation curve
## S3 method for class 'cvMTL'
plot(x, ...)
| x | The object returned by the function cvMTL |
|---|---|
| ... | Other parameters |
#create the example data
data<-Create_simulated_data(Regularization="L21", type="Classification")
#perform the cv
cvfit<-cvMTL(data$X, data$Y, type="Classification", Regularization="L21", Lam2=0, opts=list(init=0, tol=10^-6, maxIter=1500), nfolds=5, stratify=TRUE, Lam1_seq=10^seq(1,-4, -1))
#plot the curve
plot(cvfit)
Plot the values of the objective function across iterations of the optimization procedure. This function shows the "inner status" of the solver during optimization and can be used to diagnose the solver and the training procedure.
plotObj(m)
| m | A trained MTL model |
|---|---|
#create the example data
data<-Create_simulated_data(Regularization="L21", type="Regression")
#Train a MTL model
model<-MTL(data$X, data$Y, type="Regression", Regularization="L21", Lam1=0.1, Lam2=0, opts=list(init=0, tol=10^-6, maxIter=1500))
#plot the objective values
plotObj(model)
Predict the outcomes of new individuals. For classification, the probability of an individual being assigned the positive label, P(y==1), is estimated; for regression, the prediction score is estimated.
## S3 method for class 'MTL'
predict(object, newX = NULL, ...)
| object | A trained MTL model |
|---|---|
| newX | The feature matrices of new individuals |
| ... | Other parameters |
The predictive outcome
#Create data
data<-Create_simulated_data(Regularization="L21", type="Regression")
#Train
model<-MTL(data$X, data$Y, type="Regression", Regularization="L21", Lam1=0.1, Lam2=0, opts=list(init=0, tol=10^-6, maxIter=1500))
predict(model, newX=data$tX)
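How a linear multi-task model with a coefficient matrix W and intercepts C might turn features into predictions can be sketched in base R. This is an illustration only: the logistic link used for the classification probability is an assumption, and the helper `predict_tasks` and the toy values are hypothetical rather than the package's implementation.

```r
# Hypothetical helper: per-task linear scores, optionally mapped to P(y==1).
# W has one column per task and C has one intercept per task.
predict_tasks <- function(X_list, W, C, type = c("Regression", "Classification")) {
  type <- match.arg(type)
  lapply(seq_along(X_list), function(t) {
    score <- as.vector(X_list[[t]] %*% W[, t] + C[t])
    # Assumption: a logistic link converts the score into a probability
    if (type == "Classification") 1 / (1 + exp(-score)) else score
  })
}

W <- cbind(c(1, 0), c(0, 2))  # 2 features x 2 tasks
C <- c(0, -1)
X_list <- list(matrix(c(1, 2, 3, 4), 2, 2),    # task 1: rows (1, 3) and (2, 4)
               matrix(c(0, 1, 0.5, 0), 2, 2))  # task 2: rows (0, 0.5) and (1, 0)

reg <- predict_tasks(X_list, W, C, type = "Regression")
prob <- predict_tasks(X_list, W, C, type = "Classification")
print(reg)
print(prob)
```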
Print the meta information of the model
## S3 method for class 'MTL'
print(x, ...)
| x | A trained MTL model |
|---|---|
| ... | Other parameters |
#create data
data<-Create_simulated_data(Regularization="L21", type="Regression")
#train a MTL model
model<-MTL(data$X, data$Y, type="Regression", Regularization="L21", Lam1=0.1, Lam2=0, opts=list(init=0, tol=10^-6, maxIter=1500))
#print the information of the model
print(model)