Ben Sorscher: Data Pruning for Efficient Machine Learning

Update: 2023-03-02
Description

In this episode, Ben Sorscher, a PhD student at Stanford, discusses the challenges posed by the ever-growing data sets used to train machine learning models, particularly large language models. The sheer size of these data sets is pushing the limits of scaling, as the cost of training and the environmental impact of the electricity that training consumes continue to grow.

As a solution, Ben discusses “data pruning”, a method of reducing the size of data sets without sacrificing model performance. Data pruning involves selecting the most important or representative data points and removing the rest, yielding a smaller, more efficient data set that can still be used to train accurate models.
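
To make the idea concrete, here is a minimal sketch of score-based pruning: rank every example by some importance score and keep only the top fraction. The function name `prune_dataset`, the `score_fn` argument, and the `keep_fraction` parameter are all hypothetical names chosen for illustration. The episode description does not say how importance is measured, so the toy score used below (string length) is only a placeholder, not the method discussed in the episode.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def prune_dataset(
    examples: Sequence[T],
    score_fn: Callable[[T], float],
    keep_fraction: float,
) -> List[T]:
    """Keep the highest-scoring fraction of examples, in their original order.

    score_fn is an assumed stand-in for whatever importance metric is used;
    higher scores mean "more worth keeping".
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not examples:
        return []

    # Score each example once.
    scores = [score_fn(x) for x in examples]

    # Number of examples to keep (at least one).
    n_keep = max(1, math.ceil(keep_fraction * len(examples)))

    # Rank indices from highest to lowest score; the sort is stable,
    # so tied examples keep their original relative order.
    ranked = sorted(range(len(examples)), key=lambda i: -scores[i])

    # Restore the original ordering among the kept examples.
    kept_indices = sorted(ranked[:n_keep])
    return [examples[i] for i in kept_indices]


# Toy usage: string length is a placeholder score, not a real importance metric.
data = ["a", "bbbb", "cc", "ddddd", "eee", "ffffff"]
pruned = prune_dataset(data, len, keep_fraction=0.5)
print(pruned)
```

In practice the interesting design question is the scoring function itself. The ranking and selection step shown here stays the same whichever metric is plugged in.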

Throughout the podcast, Ben delves into the intricacies of data pruning, including the benefits and drawbacks of the technique, the practical considerations for implementing it in machine learning models, and the potential impact it could have on the field of artificial intelligence.

Craig Smith Twitter: https://twitter.com/craigss
Eye on A.I. Twitter: https://twitter.com/EyeOn_AI

Craig Smith