I’m an Assistant Professor and a Fellow at the NYU Center for Data Science in New York, where I work with Andrew Gordon Wilson and Yann LeCun to understand and develop better models using Bayesian deep networks, information theory, and self-supervised learning.
I completed my Ph.D. under the supervision of Prof. Naftali Tishby at the Hebrew University of Jerusalem, where I also worked with Prof. Haim Sompolinsky. My Ph.D. focused on the connection between deep neural networks and information theory: I sought to develop a deeper understanding of deep networks based on information theory and to apply it to large-scale problems. I received the Google Ph.D. Fellowship.
Additionally, I am a researcher at Intel’s Artificial Intelligence Research Group, where I am involved in developing deep learning, computer vision, and sensory-data solutions for healthcare, manufacturing, and marketing, for both internal and external use.
In 2019-2020, I worked at Google Brain in California, exploring DNN generalization using information-theoretic tools.
In the past, I have also been involved in several Wikipedia projects. I volunteer at The Public Knowledge Workshop in my spare time.
I also enjoy playing basketball.
PhD in Computer Science and Neuroscience, 2021
The Hebrew University of Jerusalem
MSc in Computer Science and Neuroscience, 2016
The Hebrew University of Jerusalem
BSc in Computer Science and Bioinformatics, 2014
The Hebrew University of Jerusalem
Empirical and theoretical study of DNNs based on information-theoretic principles.
Developing novel deep learning, computer vision, and sensory-data solutions for healthcare, manufacturing, sales, and marketing, for both internal and external use.

Selected Projects
We examine self-supervised learning methods to provide an information-theoretic understanding of their construction. As a first step, we demonstrate how information-theoretic quantities can be obtained for a deterministic network. This enables us to show how SSL methods can be (re)discovered from first principles, together with their assumptions about the data distribution. Furthermore, we empirically demonstrate the validity of these assumptions, confirming our new understanding.
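As a point of reference, one common information-theoretic template for multi-view SSL (illustrative only, not necessarily the exact objective derived in this work) maximizes the information each view's representation carries about the other view:

```latex
% Illustrative multi-view InfoMax template; Z_1, Z_2 are the representations
% of two augmented views X_1, X_2 of the same input, produced by an encoder
% with parameters \theta.
\max_{\theta} \; I(Z_1; X_2) + I(Z_2; X_1)
```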
We show that we can learn highly informative posteriors from the source task, through supervised or self-supervised approaches, which then serve as the basis for priors that modify the whole loss surface on the downstream task. This approach enables significant performance gains and more data-efficient learning on a variety of downstream classification and segmentation tasks.
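A minimal sketch of the idea, assuming a diagonal Gaussian posterior has already been fitted on the source task (e.g. via SWAG or a Laplace approximation); the function and argument names are illustrative, not the paper's code:

```python
# Sketch only: re-use a diagonal Gaussian source-task posterior as a prior
# that reshapes the downstream loss surface. `source_mean`/`source_var` map
# parameter names to tensors and are assumed to come from any approximate
# posterior fitted on the source task.
import torch
import torch.nn.functional as F

def gaussian_prior_penalty(model, source_mean, source_var):
    """Negative log-density (up to a constant) of a diagonal Gaussian prior
    centred on the source-task posterior."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + 0.5 * ((p - source_mean[name]) ** 2 / source_var[name]).sum()
    return penalty

def downstream_loss(model, x, y, source_mean, source_var, prior_scale=1.0):
    """Downstream objective: task NLL plus the informative-prior penalty."""
    nll = F.cross_entropy(model(x), y)
    return nll + prior_scale * gaussian_prior_penalty(model, source_mean, source_var)
```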
We explored whether deep models should be a recommended option for tabular data by rigorously comparing the new deep models to XGBoost on various datasets. Our study shows that XGBoost outperforms these deep models across the datasets, including the datasets used in the papers that proposed the deep models. We also show that an ensemble of deep models and XGBoost performs better on these datasets than XGBoost alone.
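A hedged sketch of the kind of ensemble compared in the study (the dataset, model sizes, and unweighted averaging here are illustrative; the study's exact setup differs):

```python
# Average the predicted class probabilities of XGBoost and a small MLP.
import numpy as np
from xgboost import XGBClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

xgb = XGBClassifier(n_estimators=200).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500).fit(X_tr, y_tr)

# Unweighted probability averaging; a tuned weighting would be used in practice.
proba = (xgb.predict_proba(X_te) + mlp.predict_proba(X_te)) / 2
print("ensemble accuracy:", (proba.argmax(axis=1) == y_te).mean())
```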
A new framework that resolves some of the known drawbacks of the Information Bottleneck. We provide a theoretical analysis of the framework, characterize the structure of its solutions, and present a novel variational formulation for DNNs.
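For reference, the standard Information Bottleneck objective that the framework revisits (the framework's own modification is not reproduced here):

```latex
% Standard IB Lagrangian: compress X into a representation T while
% preserving information about the target Y; \beta trades off the two terms.
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}} = I(X;T) - \beta\, I(T;Y)
```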
We study the generalization properties of infinite ensembles of infinitely wide neural networks, reporting analytical and empirical investigations in search of signals that correlate with generalization.
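A minimal sketch of such an infinite ensemble, assuming a one-hidden-layer ReLU network whose NNGP kernel has the arc-cosine closed form (normalization conventions and depth recursions vary across papers; this is not the paper's exact setup):

```python
import numpy as np

def relu_nngp_kernel(X, Z):
    """Arc-cosine kernel of degree 1: the GP kernel of an infinitely wide
    one-hidden-layer ReLU network, up to weight-variance scaling."""
    nx = np.linalg.norm(X, axis=1, keepdims=True)   # (n, 1)
    nz = np.linalg.norm(Z, axis=1, keepdims=True)   # (m, 1)
    cos = np.clip((X @ Z.T) / (nx * nz.T), -1.0, 1.0)
    theta = np.arccos(cos)
    return (nx * nz.T) * (np.sin(theta) + (np.pi - theta) * cos) / np.pi

def gp_posterior_mean(X_train, y_train, X_test, noise=1e-3):
    """Posterior mean of GP regression with this kernel, i.e. the mean
    prediction of the infinite ensemble of infinitely wide networks."""
    K = relu_nngp_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    return relu_nngp_kernel(X_test, X_train) @ np.linalg.solve(K, y_train)
```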
A semi-supervised model for detecting anomalies in videos, inspired by the Video Pixel Network (VPN). We extend the VPN's Convolutional LSTM video encoder with a novel convolution-based attention mechanism. This approach could serve as a component in applications requiring visual common sense.
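A hypothetical sketch of a convolution-based attention gate of this kind (module and tensor names are illustrative, not the project's code): a 1x1 convolution over the concatenated encoder features and ConvLSTM hidden state produces a spatial attention map that reweights the features.

```python
import torch
import torch.nn as nn

class ConvAttention(nn.Module):
    """Spatial attention map computed by a 1x1 convolution over the
    concatenation of encoder features and the ConvLSTM hidden state."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(2 * channels, 1, kernel_size=1)

    def forward(self, features, hidden):
        # features, hidden: (batch, channels, H, W)
        attn = torch.sigmoid(self.score(torch.cat([features, hidden], dim=1)))
        return attn * features
```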
We extend the standard LSTM architecture by augmenting it with an additional gate that produces a memory-control vector. This vector is fed back to the LSTM instead of the original output prediction. By decoupling the LSTM prediction from its role as a memory controller, we allow each output to specialize in its own task.
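A hypothetical sketch of the decoupling (the gate's parameterization here is an assumption, not the paper's equations): the cell still emits a prediction h, but the vector fed back as recurrent state is produced by a separate control gate.

```python
import torch
import torch.nn as nn

class ControlGateLSTMCell(nn.Module):
    """LSTM cell whose recurrent feedback is a separate memory-control
    vector rather than the output prediction itself (illustrative)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.control = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, state):
        h_ctrl, c = state
        h, c = self.cell(x, (h_ctrl, c))                 # h: output prediction
        m = torch.sigmoid(self.control(torch.cat([x, h], dim=-1)))
        h_ctrl = m * torch.tanh(c)                       # memory-control vector fed back
        return h, (h_ctrl, c)
```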
We demonstrate the effectiveness of the Information-Plane visualization of DNNs. (i) Most of the training epochs are spent on compressing the input into an efficient representation. (ii) The representation-compression phase begins when the SGD steps change from a fast drift into a stochastic relaxation. (iii) The converged layers lie very close to the information bottleneck theoretical bound, and the maps to the hidden layers satisfy the IB self-consistent equations. (iv) The training time is dramatically reduced when adding more hidden layers.
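A minimal sketch of the binning estimator typically used to produce such Information-Plane plots (bin count and discretization details are illustrative): discretize the hidden activations and estimate I(T;X) and I(T;Y) from empirical frequencies.

```python
import numpy as np

def discrete_mutual_information(a, b):
    """Plug-in MI (in bits) between two integer label arrays."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    for i, j in zip(a, b):
        joint[i, j] += 1
    joint /= joint.sum()
    pa, pb = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())

def layer_information(activations, x_ids, y, n_bins=30):
    """I(T;X) and I(T;Y) for one layer: bin the activations, then treat each
    distinct binned pattern as a discrete value of T."""
    edges = np.linspace(activations.min(), activations.max(), n_bins)
    t_ids = np.unique(np.digitize(activations, edges), axis=0, return_inverse=True)[1]
    return (discrete_mutual_information(t_ids, x_ids),
            discrete_mutual_information(t_ids, y))
```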