When less is more: Sample-aware pruning of uninformative features improves neural-network regression

Nicholas Christakis, Dimitris Drikakis

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Modern regression datasets often contain a mix of weakly informative predictors and purely noisy features. We establish, both theoretically and empirically, that discarding variables whose population covariance with the target is zero does not increase expected generalisation risk, up to a vanishing O(log n/n) term under weight decay, and typically enhances performance in small-sample regimes. Risk bounds show that pruning reduces finite-sample estimation error by an O(log n/n) term without affecting approximation error, while gradients associated with noise dimensions vanish asymptotically. Controlled synthetic benchmarks spanning 27 different configurations confirm these predictions: at n = 10² samples with 18/20 informative features (correlation strength of 0.5), pruning reduces MSE by 44% and increases R² from 0.426 to 0.678; as n grows to 10⁴, the benefit tapers, reflecting the growing influence of implicit regularisation. Attribution analysis via SHAP corroborates the oracle-level identification of relevant features. These conclusions are further validated on real-world data using the Boston Housing Dataset, where pruning just two statistically less informative features yields consistent gains in both full-sample and small-sample regimes, despite the presence of negatively correlated predictors, supporting our theoretical claim that correlation strength, not sign, determines informativeness. The safety guarantee is conditional on informativeness defined by nonzero population covariance with the target; variables that are informative only through purely non-monotonic or symmetric effects fall outside this scope. Our results caution against the automatic retention of high-p-value variables, provide sample-size-aware guidelines for feature filtering, and extend to any shrinkage-based learner, including ridge regression, kernel methods, and Gaussian processes. Code and data are publicly released to ensure full reproducibility.
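    The pruning criterion described above — drop features whose population covariance with the target is (near) zero — can be sketched with plain NumPy. This is an illustrative approximation, not the authors' released code: the function name `prune_uninformative` and the correlation threshold of 0.15 are assumptions chosen for demonstration, and the sample correlation is used as a finite-sample proxy for the population covariance.

    ```python
    import numpy as np

    def prune_uninformative(X, y, threshold=0.15):
        """Drop columns whose absolute sample correlation with y falls below
        `threshold` (a finite-sample proxy for zero population covariance).
        Returns the pruned matrix and a boolean mask of retained columns."""
        corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
        keep = np.abs(corr) >= threshold
        return X[:, keep], keep

    # Synthetic setup echoing the paper's 18/20-informative configuration:
    # 18 features drive the target, 2 are pure noise.
    rng = np.random.default_rng(0)
    n, d_inf, d_noise = 100, 18, 2
    X_inf = rng.normal(size=(n, d_inf))
    X_noise = rng.normal(size=(n, d_noise))
    X = np.hstack([X_inf, X_noise])
    y = X_inf @ rng.normal(size=d_inf) + 0.5 * rng.normal(size=n)

    X_pruned, keep = prune_uninformative(X, y)
    ```

    At n = 100 the sample correlation of a noise column has standard deviation of roughly 1/√n ≈ 0.1, so a fixed threshold will occasionally misclassify a feature; this is the small-sample estimation error the paper's sample-size-aware guidelines are meant to manage.
    
    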

    Original language: English
    Article number: 131913
    Journal: Neurocomputing
    Volume: 660
    DOIs
    Publication status: Published - 7 Jan 2026

    Keywords

    • Artificial neural networks
    • Feature selection
    • Feature sparsity
    • Generalisation gains
    • Noise-neural networks
    • Sample-aware pruning
