Date and Place: Thursdays, in hybrid format (live in 32-349 and online via Zoom). For detailed dates, see below.
Content
In the Scientific Computing Seminar, we host talks by guests and members of the SciComp team as well as students of mathematics, computer science and engineering. Everybody interested in the topics is welcome.
List of Talks
Event Information:
Tue, 14 Sep 2021
SC Seminar: Kavyashree Renukachari
15:30, Online
Kavyashree Renukachari, TU Kaiserslautern
Title: Estimation of Critical Batch Sizes for Distributed Deep Learning
Abstract:
The application of Deep Learning in various domains is increasing, and so are the sizes of the datasets and the complexity of the models used in such applications. This increase creates a need for more computational power and for strategies that enable faster training of Deep Learning models. Data Parallelism is one such strategy and is used extensively to handle large datasets: the number of compute resources is increased while the workload on each resource is kept constant. Several studies have also shown that Deep Learning models can be trained in a shorter time with larger batch sizes. However, there is no established rule for determining the upper limit on the batch size.
A recent study introduced a statistic called the Gradient Noise Scale that can help identify the largest efficient batch size for DNN training. This study also showed that, initially, a linear scaling rule holds for the batch size and that, beyond a certain point, additional parallelism provides little or no benefit. The Gradient Noise Scale value is calculated during DNN training. It was also noted that the noise scale value depends predominantly on the dataset. For these reasons, in this thesis we try to estimate the gradient noise scale value before DNN training. Experiments are carried out to derive a relationship between the statistical properties of the dataset and the gradient noise scale value. Once the gradient noise scale value is understood as a function of one of the statistical properties, it could be used to obtain the largest efficient batch size for a given dataset before DNN training.
How to join:
The talk is held online via Zoom. You can join with the link https://uni-kl-de.zoom.us/j/94636397127?pwd=Y1g4dGVFQitzUHVRQUFpcFB4WVFKQT09.