Tutorial for the use of package "regpro"

# Introduction

We start the tutorial with data visualization tools like QQ plots, tail plots, and smooth scatter plots. Then we give an introduction for calculating an estimator in the linear regression, kernel regression, local linear regression, additive model, single index model, and forward stagewise modeling. Finally we introduce the calculation of linear and kernel quantile regression estimators. The functions are available in the R-package regpro''. This package can be used to learn and to implement regression methods. We introduce also programs of package denpro'' which contains tools for data visualization, function visualization, and density estimation.

# Contents

• 1 Data Visualization
• 1.1 QQ Plots
• 1.2 Tail Plots
• 1.3 Two Dimensional Scatter Plots
• 1.4 Three Dimensional Scatter Plots
• 2 Linear Regression
• 3 Kernel Regression
• 3.1 One Dimensional Kernel Regression
• 3.2 Moving Averages
• 3.3 Two Dimensional Kernel Regression
• 3.4 Three and Higher Dimensional Kernel Regression
• 3.5 Kernel Estimator of Derivatives
• 3.6 Combined State and Time Space Smoothing
• 4 Local Linear Regression
• 4.1 One Dimensional Local Linear Regression
• 4.2 Two Dimensional Local Linear Regression
• 4.3 Three and Higher Dimensional Local Linear Regression
• 4.4 Local Linear Derivative Estimation
• 5 Additive Models: Backfitting
• 6 Single Index Regression
• 6.1 Estimating the Index
• 6.2 Estimating the Link Function
• 6.3 Plotting the Single Index Regression Function
• 7 Forward Stagewise Modeling
• 7.1 Stagewise Fitting of Additive Models
• 7.2 Projection Pursuit Regression
• 8 Quantile Regression
• 8.1 Linear Quantile Regression
• 8.2 Kernel Quantile Regression

# 1 Data Visualization

## 1.1 QQ Plots

QQ plots are defined in Section 6.1.2 of the book. First we generate data from t-distribution. It is useful to set the seed of the random number generator using the function set.seed''. Then we can later reproduce the results. In the following sections of this tutorial we skip the application of function set.seed'' to save space.

set.seed(1)
dendat<-rt(1000,df=6)


We compare the sample to the normal distribution. The QQ plot is now the plot of the points

( x_{(i)} , F^{-1}(p_i) ) , i=1, … ,n ,

where

x_{(1)} ≤ … ≤ x_{(n)}

is the ordered sample, p_i = (i-1 ⁄ 2) ⁄ n,

and F is the distribution function of the standard normal distribution. We plot also the line x=y with the function segments''.

x<-dendat[order(dendat)]
n<-length(x)
p<-(seq(1:n)-1/2)/n
y<-qnorm(p,mean=mean(x),sd=sd(x))
plot(x,y)
segments(-6,-6,6,6)


## 1.2 Tail Plots

Tail plots are defined in Section 6.1.2 of the book. We show how a left tail plot can be made. The right tail plot can be made analogously. The left tail consists of the observations that are to the left from the median. The data is ordered before plotting. The left tail plot is a scatter plot of the observations in the left tail and the level, defined as the order statistics of the observations. The logarithmic scale is used for the y-axis.

split<-median(dendat)
left.tail<-dendat[(dendat < split)]
ord<-order(left.tail,decreasing=TRUE)
ordered.left.tail<-left.tail[ord]
level<-seq(length(left.tail),1)
plot(ordered.left.tail,level,log="y")


## 1.3 Two Dimensional Scatter Plots

Function plot'' can bes used to plot two dimensional scatter plots.

n<-20000
dendat<-matrix(rnorm(2*n),n,2)
plot(dendat)


When the sample size n is large it is useful to plot binned data, as in Figure 6.1(b) of the book. Bin counts can be calculated with the function pcf.histo'' in package denpro''. Parameter N gives the number of bins for each direction.

N<-c(100,100)
pcf<-pcf.histo(dendat,N)


Then we transform the bin counts to interval [0,1], make a scale of gray values, and use the function plot.histo'' from the package denpro''.

f<-sqrt(pcf$value) colo<-1-(f-min(f)+0.5)/(max(f)-min(f)+0.5) col<-gray(colo) plot.histo(pcf,col=col)  Available are also the function hexbin'' in the R package hexbin'' which uses hexagonal binning, and the function hist2d'' in the R package gplots''. ## 1.4 Three Dimensional Scatter Plots Let us simulate three dimensional data. dendat<-matrix(rnorm(3*20000),20000,3)  Then we calculate a three dimensional histogram. N<-c(100,100,100) pcf<-pcf.histo(dendat,N)  We use the function histo2data'' to make data points from the histogram. The centers of the bins are the new data points. We rotate the new data and project it to two dimension. Before plotting the data, we order it in such a way that the observations with the largest value for the third coordinate are plotted last. hd<-histo2data(pcf) alpha<-pi; beta<-pi; gamma<-0 rotdat<-rotation3d(hd$dendat,alpha,beta,gamma)
i1<-1; i2<-2; i3<-3
ord<-order(rotdat[,i3]); plotdat<-rotdat[ord,c(i1,i2)]
contour(dp$x,dp$y,dp$z,nlevels=30)  ### 3.3.2 Level Set Trees Functionpcf.kernesti'' gives an output that can be used to calculate level set trees, using package denpro''. Funnction leafsfirst'' calculates a level set tree using the Leafsfirst algorithm. Function plotvolu'' plots a volume plot. Parameter cutlev'' can be used to cut the lower part of the tree and to zoom into the higher levels. Function plotbary'' plots the barycenter plot of the level set tree. Parameter coordi'' is used to choose the coordinate of the barycenter plot. Function plottree'' plots the tree structure of the level set tree. Function treedisc'' can be used to prune the level set tree to make the plotting faster. pcf<-pcf.kernesti(x,y,h=0.5,N=c(15,15)) lst<-leafsfirst(pcf,lowest="regr") plotvolu(lst,lowest="regr") plotvolu(lst,lowest="regr",cutlev=0.05) plotbary(lst,lowest="regr") # barycenter plot,1st coord. plotbary(lst,coordi=2,lowest="regr") # 2nd coord. plottree(lst,lowest="regr") # plot level set tree td<-treedisc(lst,pcf,ngrid=30,lowest="regr") plotvolu(td,modelabel=FALSE,lowest="regr")  ## 3.4 Three and Higher Dimensional Kernel Regression Three and higher dimensional regression function estimates can be visualized by one or two dimensional slices, partial dependency plots, and level set tree based visualizations. First we simulate data. The regression function is a mixture f(x) = ∑_{i=1}^4 φ(x-c_i) , where c_1=D× (1/2,0,0), c_2=D× (-1/2,0,0), c_3=D× (0,\sqrt{3}/2,0), c_4=D× (0,1/(2\sqrt{3}),\sqrt{2/3}), D=3, and φ is the density function of the standard normal density. The distribution of the explanatory variables X=(X_1,X_2,X_3) is uniform on [-3,3]^3 and the error term ε has distribution N(0,σ^2). The sample size is denoted n and the number of explanatory variables is denoted d. n<-1000; d<-3 x<-8*matrix(runif(n*d),n,d)-3 C<-(2*pi)^(-d/2) phi<-function(x){ return( C*exp(-sum(x^2)/2) ) } D<-3 c1<-D*c(1/2,0,0) c2<-D*c(-1/2,0,0) c3<-D*c(0,sqrt(3)/2,0) c4<-D*c(0,1/(2*sqrt(3)),sqrt(2/3)) fun<-function(x){phi(x-c1)+phi(x-c2)+phi(x-c3)+phi(x-c4)} y<-matrix(0,n,1) for (i in 1:n) y[i]<-fun(x[i,])+0.01*rnorm(1)  ### 3.4.1 Slices A one dimensional slice of regression function estimate \hat{f} :{\bf R}^d → {\bf R} is g(x_1) = \hat{f}(x_1,x_{0,2}, … ,x_{0,d}) , where x_{0,2}, … ,x_{0,d} is fixed point. We can choose x_{0,k} to be the sample median of the k:th coordinate of the vector of explanatory variables: x_{0,k} = \mbox{median}(X_{1,k},…,X_{n,k}) for k=2,…,d. One dimensional slices can be calculated using function pcf.kernesti.slice''. The parameter p=0.5 indicates that the fixed point is the median. The parameter N gives the number of evaluation points. pcf<-pcf.kernesti.slice(x,y,h=0.5,N=50,coordi=1,p=0.5) dp<-draw.pcf(pcf); plot(dp$x,dp$y,type="l")  We can compare the one dimensional slice to the one dimensional slice of the linear regression estimate. coordi<-1 notcoordi<-c(2,3) li<-linear(x,y); a<-li$beta1[notcoordi]
x0<-c(median(x[,2]),median(x[,3]))
flin<-li$beta0+li$beta1[coordi]*dp$x+sum(a*x0) plot(dp$x,flin,type="l")


We can now plot the slice of kernel regression estimate and the slice of linear regression in the same figure.

ylim<-c(min(flin,dp$y),max(flin,dp$y))
contour(dp$x,dp$y,dp$z,nlevels=30)  # 5 Additive Models: Backfitting The additive models were covered in Section 4.2 of the book, where the backfitting algorithm was presented. In the two dimensional additive model the response variable Y can be written Y = f_1(X_1) + f_2(X_2) + ε , where X_1 and X_2 are the explanatory variables, f_k : {\bf R} → {\bf R} are the unknown components, and ε is an error term. First we simulate data from an additive model. The distribution of the explanatory variable X=(X_1,X_2) is uniform on [0,1]^2. The regression function is f(x_1,x_2) = f_1(x_1) + f_2(x_2), where f_1(x_1) = x_1^2 - EX_1^2, and f_2(x_2) = log(x_2) - E log(X_2). The response variable is Y = f(x_1,x_2) + ε, where ε ∼ N(0,σ^2). n<-100 d<-2 x<-matrix(runif(n*d),n,d) fun1<-function(t){t^2}; fun2<-function(t){log(t)} f<-matrix(0,n,d) f[,1]<-fun1(x[,1])-mean(fun1(x[,1])) f[,2]<-fun2(x[,2])-mean(fun2(x[,2])) y<-f[,1]+f[,2]+0.1*rnorm(n)  We estimate the additive model using function additive''. We need to give as arguments the n× d-matrix x of the values of the explanatory variables, the n vector y of the values of the response variable, the smoothing parameter h>0, and the number of iterations M ≥ 1. h<-0.1 M<-5 est<-additive(x,y,h=h,M=M)  The output est$eval is n× d matrix that contains the evaluations \hat{f}_k(X_{i,k}), i=1,…,n, k=1,…,d, where X_{ik} is the k:th component of the i:th observation. Next we plot the components of the estimate. The code below plots the estimate of the first component and the true first component, The functions are plotted at the observations X_{i1}, i=1,…,n.

or<-order(x[,1])
t<-x[or,1]
hatf1<-est$eval[or,1]; f1<-f[or,1] plot(t,y[or]) # data matplot(t,hatf1,type="l",add=TRUE) # estimate matplot(t,f1,type="l",add=TRUE) # true function  We can evaluate the estimate on a regular grid. We need to give the matrix est$eval as an argument. The matrix was calculated at a previous step. The code below calculates the estimates \hat{f}_1 and \hat{f}_2 and the estimate \hat{f}(x_1,x_2) = \bar{y} + \hat{f}_1(x_1) + \hat{f}_2(x_2) on a grid, where \bar{y} is the mean of the values of the response variable. Parameter num gives the number of grid points in one direction.

num<-50
t<-seq(1,num)/(num+1)
u<-t
hatf<-matrix(0,num,num)
hatf1<-matrix(0,num,1); hatf2<-matrix(0,num,1)
func<-function(arg){
additive(x,y,arg,h=h,M=M,eval=est$eval)$valvec
}
for (i in 1:num){ for (j in 1:num){
valvec<-func(c(t[i],u[j]))
hatf1[i]<-valvec; hatf2[j]<-valvec
hatf[i,j]<-mean(y)+sum(valvec)
} }
plot(t,hatf1,type="l")
persp(t,u,hatf,ticktype="detailed",phi=30,theta=-30)


Function pcf.additive'' can be used for perspective plots, contour plots, and level set trees.

N<-c(50,50)
pcf<-pcf.additive(x,y,N=N,h=h,eval=est$eval,M=M) dp<-draw.pcf(pcf,minval=min(pcf$value))
persp(dp$x,dp$y,dp$z,phi=30,theta=-30) contour(dp$x,dp$y,dp$z,nlevels=30)

lst<-leafsfirst(pcf)
plotvolu(lst,lowest="regr")


# 6 Single Index Regression

The single index model was defined in Section 4.1 of the book. In the single index model the response variable can be written

Y = g(θ'X) + ε ,

where X∈ {\bf R}^d, θ ∈ {\bf R}^d is an unknown direction vector with \|θ\|=1, and g :{\bf R} → {\bf R} is an unknown link function.

First we simulate data from a single index model. The distribution of the vector X=(X_1,X_2) of the explanatory variables is the standard normal 2D distribution. The error term is ε ∼ N(0,σ^2). The index vector is θ = (0,1) and the link function g is the distribution function Φ of the standard normal distribution.

n<-100
x<-matrix(rnorm(n*2),n,2)
theta<-matrix(c(0,1),2,1)
x1d<-x%*%theta
y<-pnorm(x1d)+0.1*rnorm(n)


## 6.1 Estimating the Index

We cover four estimators discussed in Section 4.1.2 of the book: derivative method, average derivative method, numerical minimization to find the least squares solution, and a stagewise algorithm to find the least squares solution.

The average derivative method was defined in equation (4.9) of the book. The following commands can be used to to estimate the direction vector θ with the average derivative method.

method<-"aved"
h<-1.5
hat.theta<-single.index(x,y,h=h,method=method)


The derivative method was defined in equation (4.8) of the book. In the derivative method we have to specify additionally the point at which the gradient is estimated. We choose the point (0,0).

method<-"poid"
h<-1.5
argd<-c(0,0)
hat.theta<-single.index(x,y,h=h,method=method,argd=argd)


The direction vector can be estimated using numerical minimization to find the solution in the least squares problem of equation (4.2) of the book. The numerical minimization can be performed with the following commands.

method<-"nume"; h<-1.5
hat.theta<-single.index(x,y,h=h,method=method)


The least squares problem in equation (4.2) of the book can be solved by using an iterative method, as explained in Section 4.1.2 of the book. The following commands can be used to apply the iterative method. Argument M gives the number of iterations.

method<-"iter"
h<-1.5
M<-10
hat.theta<-single.index(x,y,h=h,method=method,M=M)


## 6.2 Estimating the Link Function

After estimating the direction vector θ, we have to estimate the link function g:{\bf R} → {\bf R}. We do this by the kernel estimator using the following commands, which plot also the true link function.

x1d<-x%*%hat.theta  # project data to 1D
pcf<-pcf.kernesti(x1d,y,h=0.3,N=20)
dp<-draw.pcf(pcf)
matplot(dp$x,dp$y,type="l",ylim=c(min(y,0),max(y,1)))
matplot(dp$x,pnorm(dp$x),type="l",add=TRUE)


## 6.3 Plotting the Single Index Regression Function

We can estimate the regression function $f(x) = g(θ'x)$ on a grid with the function pcf.single.index''.

h<-0.3
N<-c(40,40)
method<-"poid"
pcf<-pcf.single.index(x,y,h=h,N=N,method=method)
dp<-draw.pcf(pcf)
persp(dp$x,dp$y,dp$z,ticktype="detailed",phi=3,theta=30)  If the data does not come from a single index model, it can be better to use the derivative method, and estimate the gradient at the point of evaluation. We give an example where the regression function is the standard normal density function. First we simulate data. n<-1000 x<-matrix(6*runif(n*2)-3,n,2) C<-(2*pi)^(-1) phi<-function(x){ return( C*exp(-rowSums(x^2)/2) ) } y<-phi(x)+0.1*rnorm(n)  Then we use the derivative method and estimate the direction so that the gradient is estimated separately at each the point of evaluation. method<-"poid" h<-1.5 num<-50 t<-6*seq(1,num)/(num+1)-3; u<-t hatf<-matrix(0,num,num) for (i in 1:num){ for (j in 1:num){ arg<-c(t[i],u[j]) hatf[i,j]<- single.index(x,y,arg=arg,h=h,method=method,argd=arg) } } persp(t,u,hatf,ticktype="detailed",phi=30,theta=-30)  # 7 Forward Stagewise Modeling Algorithms for forward stagewise modeling were given in Section 5.4 of the book. We include the stagewise fitting of additive models of Section 5.4.2 of the book and projection pursuit regression of Section 5.4.3 of the book to this tutorial. ## 7.1 Stagewise Fitting of Additive Models An algorithm for stagewise fitting of additive models was given in Section 5.4.2 of the book. The stagewise fitting is an alternative method to backfitting for finding an estimate with the additive structure. We use the same simulated data as in the case of backfitting the additive model in Section 5 of the tutorial: n<-100 d<-2 x<-matrix(runif(n*d),n,d) fun1<-function(t){t^2}; fun2<-function(t){log(t)} f<-matrix(0,n,d) f[,1]<-fun1(x[,1])-mean(fun1(x[,1])) f[,2]<-fun2(x[,2])-mean(fun2(x[,2])) y<-f[,1]+f[,2]+0.1*rnorm(n)  We estimate the additive model using function additive.stage''. The arguments are the n × d matrix x of the values of the explanatory variables, the n vector y of the values of the response variable, the smoothing parameter h>0, and the number of iterations M. h<-0.1 M<-5 est<-additive.stage(x,y,h=h,M=M)  The output est$eval'' is the n× d matrix that contains the evaluations \hat{f}_k(X_{i,k}), i=1,… ,n, k=1,… ,d. We can use the same code to plot the components as in the case of the backfitting algorithm.

We can evaluate the estimate on a regular grid. The evaluation proceeds differently than in the case of backfitting, because now we need to give as arguments the n× M matrix of residuals and the M vector, which indicates which variable is chosen at each step. These are given as est$residu'' and est$deet''. Note that in the case of backfitting it is enough to calculate the n × d matrix of final y-values, whereas to evaluate the stagewise estimator at an arbitrary point, we need to calculate and save the complete n× M matrix of residuals and the indicator vector which contains the estimated direction for each step.

num<-50
t<-seq(1,num)/(num+1)
u<-t
hatf<-matrix(0,num,num)
funi<-function(t,u){
additive.stage(x,y,c(t,u),h=h,M=M,residu=est$residu, deet=est$deet)$value } for (i in 1:num){ for (j in 1:num){ hatf[i,j]<-funi(t[i],u[j]) } } persp(t,u,hatf,ticktype="detailed",phi=30,theta=3)  ## 7.2 Projection Pursuit Regression An algorithm for projection pursuit regression was given in Section 5.4.3 of the book. We simulate data where X=(X_1,X_2) is uniformly distributed on [-3,3]^2, ε ∼ N(0,σ^2), and the regression function is the density function of the standard 2D normal distribution. n<-1000 x<-matrix(6*runif(n*2)-3,n,2) phi<-function(x){ (2*pi)^(-1)*exp(-rowSums(x^2)/2) } y<-phi(x)+0.1*rnorm(n)  We calculate the projection pursuit regression function estimate using function pp.regression''. The arguments are the n× d-matrix x of the values of the explanatory variables, the n vector y of the values of the response variable, the smoothing parameter h>0, and the number of iterations M. h<-0.5 M<-3 est<-pp.regression(x,y,h=h,M=M)  The output est$eval'' is a vector of length n that contains the evaluations \hat{f}(X_{i}), i=1,… ,n. We can evaluate the estimate on a regular grid. We need to give as arguments the n× M matrix of residuals and the M vector of directions \hat{ θ }_m. These are given as est$residu'' and est$teet''.

num<-30
t<-6*seq(1,num)/(num+1)-3
u<-t
hatf<-matrix(0,num,num)
funi<-function(t,u){
beta1<-op$par[2:(d+1)] # the coefficients  ## 8.2 Kernel Quantile Regression Function pcf.kern.quan'' implements kernel quantile regression. pcf<-pcf.kern.quan(x,y,h=0.5,N=50,p=0.1) dp<-draw.pcf(pcf); plot(x,y); lines(dp$x,dp\$y,type="l")


Let us look at the code of function pcf.kern.quan''. We want to calculate the estimate at point arg. First we calculate the weights, similarily as in kernel mean regression.

arg<-1
h<-0.5
p<-0.1
ker<-function(x){ (1-x^2)*(x^2<=1) }
w<-ker((arg-x)/h); ps<-w/sum(w)


Then we implement the rule given in equation (3.55) of the book. The estimate of p-quantile is hatq''.

or<-order(y)
ps.ord<-ps[or]
i<-1
zum<-0
while ((i<=n) && (zumn) hatq<-max(y) else hatq<-y[or[i]]


January 2014