Heterogeneous Student Knowledge Distillation From BERT Using a Lightweight Ensemble Framework
Deep learning models have demonstrated their effectiveness in capturing complex relationships between input features and target outputs across many different application domains. These models, however, often come with considerable memory and computational demands, posing challenges for deployment on resource-constrained edge devices. Knowledge distil