Research

Robert Lunde (2023). On the Validity of Conformal Prediction for Network Data Under Non-Uniform Sampling. In submission.

Summary: We study the properties of conformal prediction for network data under various sampling schemes that often result in a non-representative sample of nodes. We interpret these sampling mechanisms as selection rules and show that conformal prediction remains finite-sample valid if the regression array is jointly exchangeable and the selection rule satisfies an invariance property. We also show that a weighted conformal prediction procedure is asymptotically valid for samples from a random walk on a graph under appropriate conditions.
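For intuition, here is a minimal sketch of weighted split conformal prediction (the generic technique, not the paper's exact procedure): nonconformity scores on a calibration set are combined through a weighted quantile, with weights assumed to come from the sampling mechanism. The example below uses hypothetical data and uniform weights, which recover ordinary split conformal; the test-point weight term of exact weighted conformal is omitted for simplicity.

```python
import numpy as np

def weighted_quantile(scores, weights, alpha):
    """Smallest score s such that the normalized weight of {scores <= s}
    is at least 1 - alpha (a weighted empirical quantile)."""
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)
    idx = np.searchsorted(cdf, 1 - alpha)
    return s[min(idx, len(s) - 1)]

def weighted_split_conformal(X_cal, y_cal, x_new, predict, weights, alpha=0.1):
    """Interval: point prediction +/- the weighted (1 - alpha)-quantile
    of absolute calibration residuals."""
    scores = np.abs(y_cal - predict(X_cal))        # nonconformity scores
    q = weighted_quantile(scores, weights, alpha)
    yhat = predict(x_new[None, :])[0]
    return yhat - q, yhat + q

# Hypothetical example: linear model, uniform weights (ordinary split conformal).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)); y = X @ np.ones(3) + rng.normal(size=200)
beta = np.linalg.lstsq(X[:100], y[:100], rcond=None)[0]
lo, hi = weighted_split_conformal(X[100:], y[100:], X[150], lambda Z: Z @ beta,
                                  np.ones(100))
```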

Robert Lunde, Elizaveta Levina, and Ji Zhu (2023). Conformal Prediction for Network-Assisted Regression. Major Revision, Journal of the American Statistical Association.

Summary: We study the properties of conformal prediction in regression problems with network information. We show that conformal prediction remains finite-sample valid in these settings under a joint exchangeability condition on a regression array and a mild symmetry condition on the network statistics. We also show that a form of asymptotic conditional validity is achievable.
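A minimal sketch of split conformal prediction with a network statistic (node degree) appended to the covariates; the graph, model, and features below are hypothetical stand-ins, not the paper's setup. The quantile uses the usual ceil((1 - alpha)(m + 1)) order statistic that underlies finite-sample validity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
# Hypothetical network: Erdos-Renyi adjacency matrix; node degree serves
# as the network statistic appended to the node covariate.
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1); A = A + A.T
x = rng.normal(size=n)
deg = A.sum(axis=1)
y = 1.0 * x + 0.1 * deg + rng.normal(size=n)

# Split: fit on one half, calibrate on the other.
fit, cal = np.arange(0, n // 2), np.arange(n // 2, n)
Z = np.column_stack([np.ones(n), x, deg])       # regression with network feature
beta = np.linalg.lstsq(Z[fit], y[fit], rcond=None)[0]
scores = np.abs(y[cal] - Z[cal] @ beta)         # calibration residuals

# Finite-sample-valid quantile: ceil((1 - alpha)(m + 1))-th order statistic.
alpha, m = 0.1, len(cal)
k = int(np.ceil((1 - alpha) * (m + 1)))
q = np.sort(scores)[min(k, m) - 1]

z_new = np.array([1.0, 0.0, deg.mean()])        # hypothetical new node
interval = (z_new @ beta - q, z_new @ beta + q)
```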

Robert Lunde, Purnamrita Sarkar, and Rachel Ward (2021). Bootstrapping the Error of Oja’s Algorithm. Accepted at NeurIPS 2021, Spotlight Talk.

Summary: We establish a high-dimensional weighted chi-squared approximation for the sine-squared error of Oja’s algorithm, a widely used method for streaming PCA. We also propose an online multiplier bootstrap method and establish consistency of the procedure.
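For intuition, here is a minimal sketch of Oja's iteration alongside one plausible form of an online multiplier bootstrap: each replicate reuses the incoming data but perturbs its update with a N(0,1) multiplier. The exact weighting, step sizes, and error approximation in the paper differ.

```python
import numpy as np

def oja_with_multiplier_bootstrap(stream, d, eta=0.01, B=50, seed=0):
    """Run Oja's iteration on a data stream, together with B
    multiplier-perturbed replicates that reuse the same data
    (an illustrative online multiplier bootstrap, not the paper's exact one)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=d); w /= np.linalg.norm(w)
    W = rng.normal(size=(B, d)); W /= np.linalg.norm(W, axis=1, keepdims=True)
    for x in stream:
        g = (x @ w) * x                      # Oja update direction x x^T w
        w = w + eta * g; w /= np.linalg.norm(w)
        xi = rng.normal(size=(B, 1))         # i.i.d. N(0,1) multipliers
        G = (W @ x)[:, None] * x[None, :]    # per-replicate update directions
        W = W + eta * (1 + xi) * G
        W /= np.linalg.norm(W, axis=1, keepdims=True)
    # sin^2 distance of each replicate to the point estimate, used as a
    # proxy for the fluctuation of the sin^2 error
    sin2 = 1 - (W @ w) ** 2
    return w, sin2

rng = np.random.default_rng(1)
data = rng.normal(size=(2000, 10)) @ np.diag(np.linspace(2, 1, 10) ** 0.5)
w_hat, boot_err = oja_with_multiplier_bootstrap(data, d=10)
```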

Qiaohui Lin, Robert Lunde, and Purnamrita Sarkar (2020). Trading off Accuracy for Speedup: Multiplier Bootstraps for Subgraph Counts. In submission.

Summary: We propose a family of bootstrap procedures, ranging from a fast, randomized linear bootstrap for massive, sparse graphs to an accurate quadratic procedure for smaller graphs. We establish conditions under which the randomized linear bootstrap offers consistent inference for an appropriate target distribution and conditions under which higher-order correctness holds for the quadratic bootstrap.
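A sketch of the linear multiplier bootstrap idea for a triangle count: node-level contributions are centered and perturbed with i.i.d. N(0,1) multipliers. The linearization and scaling here are illustrative assumptions; the paper's randomized linear and quadratic procedures differ in detail.

```python
import numpy as np

def triangle_multiplier_bootstrap(A, B=500, seed=0):
    """Linear multiplier bootstrap for the triangle count (illustrative).
    Each node's contribution is the number of triangles it belongs to;
    replicates perturb the centered contributions with N(0,1) multipliers."""
    rng = np.random.default_rng(seed)
    t_node = np.diag(A @ A @ A) / 2.0        # triangles through each node
    T_hat = t_node.sum() / 3.0               # each triangle counted 3 times
    contrib = t_node / 3.0                   # node-level linear terms
    xi = rng.normal(size=(B, A.shape[0]))
    # replicate fluctuations around the point estimate
    T_star = T_hat + xi @ (contrib - contrib.mean())
    return T_hat, T_star

rng = np.random.default_rng(2)
A = (rng.random((300, 300)) < 0.05).astype(float)
A = np.triu(A, 1); A = A + A.T
T_hat, T_star = triangle_multiplier_bootstrap(A)
ci = np.quantile(T_star, [0.025, 0.975])     # percentile interval
```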

Qiaohui Lin, Robert Lunde, and Purnamrita Sarkar (2020). On the Theoretical Properties of the Network Jackknife. ICML 2020.

Summary: We study the properties of a leave-node-out jackknife procedure for network data. Under the sparse graphon model, we prove an Efron-Stein-type inequality, showing that the jackknife estimate of the variance is conservative in expectation, analogous to the independent setting. We also establish consistency of the network jackknife for the count functionals introduced by Bickel et al. (2011).
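A minimal sketch of the leave-node-out jackknife for a network functional, using triangle density as the example statistic (a sketch of the general recipe; the paper supplies the theory):

```python
import numpy as np

def network_jackknife_variance(A, stat):
    """Leave-node-out jackknife: recompute the statistic on the graph with
    each node removed, then form the usual jackknife variance estimate."""
    n = A.shape[0]
    keep = np.arange(n)
    reps = np.array([stat(A[np.ix_(np.delete(keep, i), np.delete(keep, i))])
                     for i in range(n)])
    return (n - 1) / n * np.sum((reps - reps.mean()) ** 2)

def triangle_density(A):
    n = A.shape[0]
    return np.trace(A @ A @ A) / (n * (n - 1) * (n - 2))

rng = np.random.default_rng(3)
A = (rng.random((100, 100)) < 0.1).astype(float)
A = np.triu(A, 1); A = A + A.T
var_hat = network_jackknife_variance(A, triangle_density)
```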

Robert Lunde and Purnamrita Sarkar (2022). Subsampling Sparse Graphons Under Minimal Assumptions. Accepted at Biometrika. R code (zip)

Summary: We establish a general theory for subsampling network data generated by the sparse graphon model; the main requirement is weak convergence of the functional of interest. Under appropriate sparsity conditions, we also derive a multivariate central limit theorem for the nonzero eigenvalues of an adjacency matrix generated by a low-rank sparse graphon. Our weak convergence result yields the asymptotic validity of subsampling for eigenvalues.

Keywords: networks, sparse graphons, subsampling, eigenvalues, weak convergence
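A minimal sketch of node subsampling for a network eigenvalue statistic: repeatedly draw b of the n nodes, recompute the statistic on the induced subgraph, and use the empirical distribution of the replicates. The subsample size and per-node normalization below are illustrative assumptions; the correct rescaling rates depend on the functional and the sparsity level.

```python
import numpy as np

def node_subsample_distribution(A, b, stat, B=500, seed=0):
    """Subsampling for network data: recompute stat on induced subgraphs
    of b nodes drawn without replacement (sketch; rescaling omitted)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    reps = np.empty(B)
    for j in range(B):
        idx = rng.choice(n, size=b, replace=False)
        reps[j] = stat(A[np.ix_(idx, idx)])
    return reps

def top_eigenvalue(A):
    # leading eigenvalue, normalized by the number of nodes for comparability
    return np.linalg.eigvalsh(A)[-1] / A.shape[0]

rng = np.random.default_rng(4)
A = (rng.random((400, 400)) < 0.1).astype(float)
A = np.triu(A, 1); A = A + A.T
reps = node_subsample_distribution(A, b=100, stat=top_eigenvalue)
```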

Robert Lunde (2019). Sample Splitting and Weak Assumption Inference for Time Series.

Summary: We show that sample splitting remains asymptotically valid under appropriate dependence conditions. In addition, we prove a non-stationary central limit theorem by combining the dependent Lindeberg method of Bardet et al. (2008) with a phenomenon involving the variance of weakly dependent sequences. Using this central limit theorem, we also demonstrate the validity of a block multiplier bootstrap under θ-dependence and mean-stationarity.

Keywords: time series, weak dependence, sample splitting, central limit theorem, non-stationarity, bootstrap
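A minimal sketch of a block multiplier bootstrap for the centered sample mean: non-overlapping block sums of centered observations are perturbed with i.i.d. N(0,1) multipliers. The block length and the MA(1) example are hypothetical choices, not the paper's setting.

```python
import numpy as np

def block_multiplier_bootstrap(x, block_len, B=1000, seed=0):
    """Replicates of sqrt(n) * (mean* - mean) formed by perturbing
    non-overlapping block sums of the centered series with N(0,1)
    multipliers (sketch)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    k = n // block_len
    blocks = (x[: k * block_len] - x.mean()).reshape(k, block_len).sum(axis=1)
    xi = rng.normal(size=(B, k))
    return (xi @ blocks) / np.sqrt(n)

rng = np.random.default_rng(5)
e = rng.normal(size=1100)
x = np.array([e[t] + 0.5 * e[t - 1] for t in range(100, 1100)])  # MA(1) series
reps = block_multiplier_bootstrap(x, block_len=20)
ci = x.mean() + np.quantile(reps, [0.025, 0.975]) / np.sqrt(len(x))
```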

Robert Lunde and Cosma Rohilla Shalizi (2017). Bootstrapping Generalization Error Bounds for Time Series.

Summary: We establish conditions under which a bootstrap estimator of the generalization error may be used to construct valid confidence intervals for the risk of a time series model. We show that autoregressive models satisfy the conditions in our theorem even when the model is misspecified. We also show that empirical processes formed by splitting a β-mixing process in half are asymptotically independent.

Keywords: time series, statistical learning, empirical processes, bootstrap
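A minimal sketch of the idea: fit an AR(1) model on the first half of a series, collect one-step-ahead test losses on the second half, and form a moving-block bootstrap percentile interval for the mean loss. The estimator, block scheme, and conditions in the paper differ; this is only an illustration.

```python
import numpy as np

def ar1_test_losses(x, split):
    """Fit AR(1) by least squares on the first part of the series and
    return squared one-step-ahead prediction errors on the rest."""
    xt, xl = x[1:split], x[:split - 1]
    phi = (xl @ xt) / (xl @ xl)                   # AR(1) coefficient
    test_prev, test_next = x[split - 1:-1], x[split:]
    return (test_next - phi * test_prev) ** 2

def block_bootstrap_ci(losses, block_len=20, B=1000, alpha=0.1, seed=0):
    """Moving-block bootstrap percentile interval for the mean loss
    (generalization error); a sketch, not the paper's exact estimator."""
    rng = np.random.default_rng(seed)
    m = len(losses)
    k = int(np.ceil(m / block_len))
    starts = rng.integers(0, m - block_len + 1, size=(B, k))
    reps = np.array([
        np.concatenate([losses[s:s + block_len] for s in row])[:m].mean()
        for row in starts])
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(6)
e = rng.normal(size=1001)
x = np.empty(1000); x[0] = e[0]
for t in range(1, 1000):
    x[t] = 0.6 * x[t - 1] + e[t]                  # AR(1) data
losses = ar1_test_losses(x, split=500)
ci = block_bootstrap_ci(losses)
```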