Special Session 133: New developments on nonlinear expectations

Large Language Model Training under Sublinear Expectation
Shuzhen Yang
Shandong University
People's Rep of China
Co-Author(s):    
Abstract:
As the complexity of large language models (LLMs) continues to grow, the scale of training data expands exponentially, and the inherent uncertainty in the data distribution has emerged as a critical, non-negligible challenge. We incorporate sublinear expectation theory into LLM training to analyze the uncertainties inherent in the training data. Leveraging the $\phi$-max-mean algorithm, we propose a novel cross-entropy loss function (Loss$_{\text{max-mean}}$) and a corresponding training strategy (SLE-Strategy). Using a decoder-only Transformer as the base model, we conduct five sets of controlled experiments with different parameter sizes, comparing the model trained with the SLE-Strategy (SLE-model) against a model trained with the traditional strategy. Experimental results demonstrate that the SLE-model achieves a substantial improvement in training efficiency, reducing per-epoch training time by 57.46%–86.34% while maintaining a controllable performance gap. This approach effectively balances cost and model performance, offering a promising direction for optimizing LLM training in resource-constrained scenarios.
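
The abstract does not spell out Loss$_{\text{max-mean}}$; the display below is a minimal sketch of a group-wise max-mean cross-entropy loss in the spirit of the $\phi$-max-mean estimator from sublinear expectation theory, where the grouping $B_1,\dots,B_k$, the transform $\phi$, and the per-sample loss $\ell_{\mathrm{CE}}$ are illustrative assumptions rather than the authors' exact construction:
\[
\mathrm{Loss}_{\text{max-mean}} \;=\; \max_{1 \le j \le k} \; \frac{1}{|B_j|} \sum_{i \in B_j} \phi\bigl(\ell_{\mathrm{CE}}(x_i, y_i)\bigr),
\]
where each $B_j$ is a group of training samples and $\ell_{\mathrm{CE}}(x_i, y_i)$ denotes the usual cross-entropy of sample $i$. Under this reading, the maximum of the group means plays the role of an upper (sublinear) expectation over a family of candidate data distributions, so a training step is driven by the worst-case group rather than the global average.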