Research

Robert Lunde, Minjie Yang, Elizaveta Levina, and Ji Zhu (2026). Conformal Prediction For Dyadic Regression Under Complex Missingness.

Summary: We study the properties of conformal prediction for dyadic regression, which includes the ubiquitous link prediction problem. Under a joint exchangeability assumption on the regression array and the missingness mechanism, we establish finite-sample validity of various conformal prediction procedures for sampled elements. This result makes use of new machinery for conformal prediction beyond exchangeability, which we believe is of independent interest. We also establish asymptotic validity of weighted conformal prediction for missing elements under a graphon missingness mechanism.

Wei Li, Nilanjan Chakraborty, and Robert Lunde (2025). Assumption-Lean Inference for Network-Linked Data. Major Revision, Bernoulli.

Summary: In most regression problems involving network data, it is natural to posit that unobserved latent variables affect the response. In our view, this makes parametric modeling difficult to justify; accordingly, we propose an assumption-lean paradigm for linear regression on network-linked data. We consider inference under a jointly exchangeable regression array. We establish an Aldous-Hoover representation for such arrays, which is of independent interest. Finally, we consider inference for subgraph frequencies and spectral embeddings; for the former, we show that explicitly correcting for network bias can lead to improvements over the OLS estimator.

Ayoushman Bhattacharya, Nilanjan Chakraborty, and Robert Lunde (2025). Statistical Inference for Subgraph Frequencies of Exchangeable Hyperedge Models. Major Revision, Journal of the Royal Statistical Society, Series B.

Summary: In many applications, it is natural to view interactions rather than nodes as the fundamental units. Under an exchangeable hyperedge framework, we propose several notions of subgraph frequencies for hypergraphs, including novel edge-colored subgraphs that account for multiplicity. We derive the asymptotic normality of these statistics, explore their robustness to the omission of low-degree nodes, and study the asymptotic properties of a class of statistics without multiplicity.

Robert Lunde (2023). On the Validity of Conformal Prediction for Network Data Under Non-Uniform Sampling. Major Revision, Bernoulli.

Summary: We study the properties of conformal prediction for network data under various sampling schemes that often result in a non-representative sample of nodes. We interpret these sampling mechanisms as selection rules and show that conformal prediction remains finite-sample valid if the regression array is jointly exchangeable and the selection rule satisfies an invariance property. We also show that a weighted conformal prediction procedure is asymptotically valid for samples from a random walk on a graph under appropriate conditions.

Robert Lunde, Elizaveta Levina, and Ji Zhu (2025). Conformal Prediction for Network-Assisted Regression. Journal of the American Statistical Association, 120(551), 1633–1644.

Summary: We study the properties of conformal prediction in regression problems with network information. We show that conformal prediction remains finite-sample valid in these settings under a joint exchangeability condition on a regression array and a mild symmetry condition on the network statistics. We also show that a form of asymptotic conditional validity is achievable.

Robert Lunde, Purnamrita Sarkar, and Rachel Ward (2021). Bootstrapping the Error of Oja’s Algorithm. Accepted at NeurIPS 2021, Spotlight Talk.

Summary: We establish a high-dimensional weighted chi-squared approximation for the sine-squared error of Oja’s algorithm, a widely used method for streaming PCA. We also propose an online multiplier bootstrap method and establish consistency of the procedure.

Qiaohui Lin, Robert Lunde, and Purnamrita Sarkar (2020). Trading off Accuracy for Speedup: Multiplier Bootstraps for Subgraph Counts. Major Revision, Statistica Sinica.

Summary: We propose a family of bootstrap procedures, ranging from a fast, randomized linear bootstrap for massive, sparse graphs to an accurate quadratic procedure for smaller graphs. We establish conditions under which the randomized linear bootstrap offers consistent inference for an appropriate target distribution and conditions under which higher-order correctness holds for the quadratic bootstrap.

Qiaohui Lin, Robert Lunde, and Purnamrita Sarkar (2020). On the Theoretical Properties of the Network Jackknife. ICML 2020.

Summary: We study the properties of a leave-node-out jackknife procedure for network data. Under the sparse graphon model, we prove an Efron-Stein-type inequality, showing that the jackknife estimate of the variance is conservative in expectation, analogous to the independent setting. We also establish consistency of the network jackknife for count functionals introduced by Bickel et al. (2011).

Robert Lunde and Purnamrita Sarkar. Subsampling Sparse Graphons Under Minimal Assumptions. Biometrika, Volume 110, Issue 1, March 2023, Pages 15–32. R code (zip)

Summary: We establish a general theory for subsampling network data generated by the sparse graphon model; the main requirement is weak convergence of the functional of interest. Under appropriate sparsity conditions, we also derive a multivariate central limit theorem for the nonzero eigenvalues of an adjacency matrix generated by a low-rank sparse graphon. Our weak convergence result yields the asymptotic validity of subsampling for eigenvalues.

Keywords: networks, sparse graphons, subsampling, eigenvalues, weak convergence

Robert Lunde (2019). Sample Splitting and Weak Assumption Inference for Time Series.

Summary: We show that sample splitting remains asymptotically valid under appropriate dependence conditions. In addition, we prove a non-stationary central limit theorem by combining the Dependent Lindeberg Method of Bardet et al. (2008) with a phenomenon involving the variance of weakly dependent sequences. Using this central limit theorem, we also demonstrate the validity of a block-multiplier bootstrap under θ-dependence and mean-stationarity.

Keywords: time series, weak dependence, sample splitting, central limit theorem, non-stationarity, bootstrap

Robert Lunde and Cosma Rohilla Shalizi (2017). Bootstrapping Generalization Error Bounds for Time Series. Major Revision, Sankhya A.

Summary: We establish conditions under which a bootstrap estimator of the generalization error may be used to construct valid confidence intervals for the risk of a time series model. We show that autoregressive models satisfy the conditions in our theorem even when the model is misspecified. We also show that empirical processes formed by splitting a β-mixing process in half are asymptotically independent.

Keywords: time series, statistical learning, empirical processes, bootstrap