ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models
Recurrent Neural Networks (RNNs) laid the foundation for sequence modeling, but their intrinsic sequential nature restricts parallel computation, creating a fundamental barrier to scaling. This has led to the dominance of parallelizable architectures like Transformers and, more recently, State Space Models (SSMs). While SSMs achieve efficient parallelization through structured linear recurrences, this linearity constraint limits their expressive power and precludes modeling complex, nonlinear sequence-wise dependencies. To address this, we present ParaRNN, a framework that breaks the sequence-parallelization barrier for nonlinear RNNs.
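The parallelization contrast the abstract draws can be made concrete. A linear recurrence h_t = a_t * h_{t-1} + b_t is a chain of affine maps, and affine-map composition is associative, so all T states can be computed with a parallel scan in O(log T) depth; this is exactly the structure SSMs exploit. A general nonlinear recurrence h_t = f(h_{t-1}, x_t) admits no such decomposition and, without further machinery, must be unrolled step by step. The following is a minimal JAX sketch of that contrast; the function names and the toy diagonal recurrence are illustrative assumptions, not code from the paper.

```python
import jax
import jax.numpy as jnp

def linear_recurrence_parallel(a, b):
    """Evaluate h_t = a_t * h_{t-1} + b_t (with h_0 = 0) for all t in parallel.

    Each step is summarized by the affine map h -> a_t * h + b_t. These maps
    compose associatively:
        (a2, b2) o (a1, b1) = (a2 * a1, a2 * b1 + b2),
    so an associative scan yields every prefix composition in O(log T) depth.
    """
    def compose(left, right):
        a_l, b_l = left
        a_r, b_r = right
        return a_r * a_l, a_r * b_l + b_r

    _, h = jax.lax.associative_scan(compose, (a, b))
    return h  # with h_0 = 0, the offset term of the composed map is h_t

# Toy usage: a diagonal linear recurrence over a length-8 sequence.
T, D = 8, 4
a = jax.random.uniform(jax.random.PRNGKey(0), (T, D))   # per-step decay, as in SSMs
b = jax.random.normal(jax.random.PRNGKey(1), (T, D))
h_parallel = linear_recurrence_parallel(a, b)

# Sequential reference: the same recurrence unrolled one step at a time, which
# is the only naive option for a general *nonlinear* f(h, x).
def step(h, ab):
    a_t, b_t = ab
    h_new = a_t * h + b_t
    return h_new, h_new

_, h_sequential = jax.lax.scan(step, jnp.zeros(D), (a, b))
assert jnp.allclose(h_parallel, h_sequential, atol=1e-5)
```

Note that the scan composes the (a_t, b_t) map parameters rather than the states themselves, which is what makes the logarithmic-depth evaluation possible; a nonlinear f offers no analogous compact representation of composed steps, and that is the barrier ParaRNN targets.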