TY - JOUR
T1 - When less is more
T2 - Sample-aware pruning of uninformative features improves neural-network regression
AU - Christakis, Nicholas
AU - Drikakis, Dimitris
N1 - Publisher Copyright:
© 2025 Elsevier B.V.
PY - 2026/1/7
Y1 - 2026/1/7
N2 - Modern regression datasets often contain a mix of weakly informative predictors and purely noisy features. We establish, both theoretically and empirically, that discarding variables whose population covariance with the target is zero does not increase expected generalisation risk, up to a vanishing O(log n/n) term under weight decay, and typically enhances performance in small-sample regimes. Risk bounds show that pruning reduces finite-sample estimation error by O(log n/n) without affecting approximation error, while gradients associated with noise dimensions vanish asymptotically. Controlled synthetic benchmarks spanning 27 different configurations confirm these predictions: at n=10² samples with 18/20 informative features (correlation strength of 0.5), pruning reduces MSE by 44% and increases R² from 0.426 to 0.678; as n grows to 10⁴, the benefit tapers, reflecting the growing influence of implicit regularisation. Attribution analysis via SHAP corroborates the oracle-level identification of relevant features. These conclusions are further validated on real-world data using the Boston Housing Dataset, where pruning just two statistically less informative features yields consistent gains in both full-sample and small-sample regimes, despite the presence of negatively correlated predictors—supporting our theoretical claim that correlation strength, not sign, determines informativeness. The safety guarantee is conditional on informativeness defined by nonzero population covariance with the target; variables that are informative only through purely non-monotonic or symmetric effects fall outside this scope. Our results caution against the automatic retention of high-p-value variables, provide sample-size-aware guidelines for feature filtering, and extend to any shrinkage-based learner, including ridge regression, kernel methods, and Gaussian processes. Code and data are publicly released to ensure full reproducibility.
AB - Modern regression datasets often contain a mix of weakly informative predictors and purely noisy features. We establish, both theoretically and empirically, that discarding variables whose population covariance with the target is zero does not increase expected generalisation risk, up to a vanishing O(log n/n) term under weight decay, and typically enhances performance in small-sample regimes. Risk bounds show that pruning reduces finite-sample estimation error by O(log n/n) without affecting approximation error, while gradients associated with noise dimensions vanish asymptotically. Controlled synthetic benchmarks spanning 27 different configurations confirm these predictions: at n=10² samples with 18/20 informative features (correlation strength of 0.5), pruning reduces MSE by 44% and increases R² from 0.426 to 0.678; as n grows to 10⁴, the benefit tapers, reflecting the growing influence of implicit regularisation. Attribution analysis via SHAP corroborates the oracle-level identification of relevant features. These conclusions are further validated on real-world data using the Boston Housing Dataset, where pruning just two statistically less informative features yields consistent gains in both full-sample and small-sample regimes, despite the presence of negatively correlated predictors—supporting our theoretical claim that correlation strength, not sign, determines informativeness. The safety guarantee is conditional on informativeness defined by nonzero population covariance with the target; variables that are informative only through purely non-monotonic or symmetric effects fall outside this scope. Our results caution against the automatic retention of high-p-value variables, provide sample-size-aware guidelines for feature filtering, and extend to any shrinkage-based learner, including ridge regression, kernel methods, and Gaussian processes. Code and data are publicly released to ensure full reproducibility.
KW - Artificial neural networks
KW - Feature selection
KW - Feature sparsity
KW - Generalisation gains
KW - Noise-neural networks
KW - Sample-aware pruning
UR - https://www.scopus.com/pages/publications/105020253605
U2 - 10.1016/j.neucom.2025.131913
DO - 10.1016/j.neucom.2025.131913
M3 - Article
AN - SCOPUS:105020253605
SN - 0925-2312
VL - 660
JO - Neurocomputing
JF - Neurocomputing
M1 - 131913
ER -