Semiparametric analysis of complex longitudinal data
Event history data consist of longitudinal records of event occurrence times. Recurrent event data and panel count data are two common types of event history data that arise in many areas, such as medical studies and the social sciences. An extensive literature exists on their analysis. Nevertheless, only limited research exists on variable selection for recurrent event data and panel count data. The existing methods can be seen as direct generalizations of the available penalized procedures for linear models, but may not perform as well as expected because of the complex structure of event history data. The first and second parts of this dissertation therefore address simultaneous parameter estimation and variable selection for event history data. We present a new variable selection method with a new penalty function, which will be referred to as the broken adaptive ridge regression approach. In addition to establishing the oracle property, we show that the proposed variable selection method has a clustering or grouping effect when covariates are highly correlated. Numerical studies indicate that the method works well in practical situations and can outperform the existing methods. Applications to real data are provided.
Most existing studies of longitudinal data assume that covariates are observed at the same times as the response variable, and that the observation process is independent of the response variable, either completely or given covariates. In practice, the response variable and covariates are sometimes observed intermittently at different time points, leading to sparse asynchronous longitudinal data. The observation process may also be related to the response variable even given covariates, and both issues can occur at the same time.
Although each of the two issues has been addressed in the literature, there does not appear to be an established approach that can deal with both together. To address both issues simultaneously, the third part of this dissertation proposes a flexible semiparametric transformation conditional model and a kernel-weighted estimating equation-based approach. The proposed estimators of the regression parameters are shown to be consistent and asymptotically normal. An extensive simulation study is carried out to assess the finite-sample performance of the proposed method and suggests that it performs well in practical situations. The approach is applied to a prospective HIV study that motivated this investigation.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License. Copyright held by author.