Ravid Shwartz-Ziv

Assistant Professor and Faculty Fellow, Center for Data Science, New York University

Senior Research Scientist, Wand AI

Biography

I am an Assistant Professor and Faculty Fellow at NYU’s Center for Data Science, where I lead cutting-edge research in artificial intelligence, with a particular focus on Large Language Models (LLMs) and their applications. My work spans theoretical foundations and practical implementations, combining academic rigor with industry impact.

Research Focus

My research bridges fundamental theoretical understanding with practical applications in AI, particularly focusing on:

  • Pioneering novel approaches for analyzing LLM representations and intermediate layer dynamics
  • Developing efficient model adaptation and personalization techniques
  • Advancing information-theoretic frameworks for understanding neural networks
  • Creating innovative benchmarking frameworks for evaluating AI systems

Notable Achievements

  • Pioneer in applying information theory to neural networks, with seminal work featured in Quanta Magazine and Wired
  • Published extensively in top-tier venues (NeurIPS, ICLR, ICML)
  • Google PhD Fellowship recipient (2018-2021)
  • Best Paper Award, Information Fusion journal (2023)
  • CPAL Rising Star Award, The University of Hong Kong (2023)
  • Moore-Sloan Fellowship, NYU (2021-2022)

Current Projects

  • Leading research initiatives in LLM personalization and adaptation
  • Developing novel approaches for analyzing and improving model efficiency
  • Creating new frameworks for understanding information flow in large-scale models
  • Advancing multi-agent systems and user-centric search implementations

My work combines theoretical insight with real-world impact, contributing both to the academic understanding of AI systems and to their deployment in practice. Through my dual role in academia and industry, I strive to bridge the gap between theoretical breakthroughs and practical applications in AI.

Interests

  • Large Language Models (LLMs)
  • Model Efficiency & Compression
  • Information Theory
  • Neural Network Interpretability
  • Self-Supervised Learning
  • Representation Learning
  • Multi-Agent Systems
  • Personalization in AI

Education

  • Ph.D. in Computational Neuroscience, 2021

    Hebrew University of Jerusalem

  • B.Sc. in Computer Science and Computational Biology, 2014

    Hebrew University of Jerusalem

Experience

Senior AI Researcher & Team Lead

Wand AI

Jan 2023 – Present, New York, NY
  • Develop novel LLM personalization techniques
  • Pioneer efficient adaptation methods for LLMs
  • Direct end-to-end development from research prototypes to production

Assistant Professor and Faculty Fellow

NYU, Center for Data Science

Sep 2021 – Present, New York, NY
  • Lead research initiatives in LLMs, focusing on model efficiency, compression techniques, and novel benchmarking frameworks
  • Pioneer novel approaches for analyzing LLM representations and information flow
  • Direct research in representation learning through an information-theoretic lens
  • Develop and teach graduate-level courses in Advanced Machine Learning and Deep Learning

Senior AI & Data Science Researcher

Intel

Jan 2020 – Dec 2023, New York, NY
  • Led development of AI algorithms focusing on LLMs and RAG systems
  • Optimized automated code validation processes
  • Collaborated with cross-functional teams on tabular data solutions

Research Student

Google AI

Jun 2019 – May 2020, Mountain View, CA
  • Developed an information-theoretic framework for infinitely-wide neural networks
  • Created efficient data compression algorithms leveraging information theory

AI & Data Science Researcher

Intel

Feb 2013 – May 2019, Petah-Tikva, Israel
  • Developed computer vision solutions for GPU defect detection
  • Created ML-based automated testing frameworks
  • Designed sensor-based ML systems for healthcare monitoring

Algorithm and Web Developer

Wikipedia

Jan 2010 – Jan 2013, Israel
  • Developed machine learning projects, including an OCR system and a copyright detection tool
  • Enhanced the user experience by creating editing gadgets
  • Contributed to projects promoting public knowledge accessibility

Recent Publications


Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs

We introduce min-p sampling, a dynamic truncation method for language models that improves text generation quality and diversity, especially at high temperatures, showing superior performance across multiple benchmarks.
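As a rough illustration, here is a minimal NumPy sketch of the min-p idea, assuming a vector of next-token logits: the truncation threshold scales with the probability of the most likely token, so it adapts to the model's confidence at each step. Function and variable names are mine, and the paper's implementation may differ in its details.

    # Minimal sketch of min-p sampling (illustrative, not the paper's reference code).
    import numpy as np

    def min_p_sample(logits, min_p=0.1, temperature=1.0, rng=None):
        """Sample a token id with min-p truncation from raw next-token logits."""
        if rng is None:
            rng = np.random.default_rng()
        logits = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        threshold = min_p * probs.max()                    # dynamic: scales with model confidence
        probs = np.where(probs >= threshold, probs, 0.0)   # drop the low-probability tail
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)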

Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning

We propose Seq-VCR, a method to prevent representation collapse in Transformer models, significantly improving their performance on complex reasoning tasks without requiring chain-of-thought supervision.
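The summary above is high level; as a loose sketch of the variance-covariance regularization idea the method's name refers to (in the spirit of VICReg, applied here to intermediate token representations), one could add a penalty like the following to the training loss. This is only an illustration under my own assumptions, not the paper's exact loss or hyperparameters.

    # Illustrative variance-covariance penalty on intermediate representations (PyTorch).
    import torch

    def variance_covariance_penalty(h, gamma=1.0, eps=1e-4):
        """h: (num_tokens, dim) intermediate token representations from one layer."""
        h = h - h.mean(dim=0)                        # center each dimension
        std = torch.sqrt(h.var(dim=0) + eps)
        var_loss = torch.relu(gamma - std).mean()    # keep per-dimension variance above gamma
        n, d = h.shape
        cov = (h.T @ h) / (n - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        cov_loss = (off_diag ** 2).sum() / d         # decorrelate dimensions
        return var_loss + cov_loss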

LiveBench: A Challenging, Contamination-Free LLM Benchmark

We introduce LiveBench, a novel contamination-free LLM benchmark featuring continuously updated questions from recent sources, objective scoring, and challenging tasks across multiple domains.

OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset

We present OpenDebateEvidence, a massive dataset containing 3.5 million documents from competitive debate, enabling advancement in argument mining and summarization through large language model fine-tuning.

Inheritune: Training Smaller Yet More Attentive Language Models

We present Inheritune, a training method that creates smaller yet equally effective language models by inheriting and fine-tuning early transformer layers from larger models, addressing the issue of lazy layers in deep networks.
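As a hypothetical sketch of the layer-inheritance idea, the snippet below initializes a smaller GPT-2 student from the embeddings and first k transformer blocks of a larger model via the Hugging Face transformers API; the model name and the choice of k are placeholders, and this is not the paper's reference implementation.

    # Illustrative layer inheritance: build a k-layer student from a larger GPT-2.
    from transformers import GPT2Config, GPT2LMHeadModel

    def inherit_early_layers(teacher_name="gpt2-large", k=6):
        teacher = GPT2LMHeadModel.from_pretrained(teacher_name)
        cfg = GPT2Config.from_pretrained(teacher_name)
        cfg.n_layer = k                              # student keeps only the first k blocks
        student = GPT2LMHeadModel(cfg)

        # Copy token/position embeddings and the first k transformer blocks.
        student.transformer.wte.load_state_dict(teacher.transformer.wte.state_dict())
        student.transformer.wpe.load_state_dict(teacher.transformer.wpe.state_dict())
        for i in range(k):
            student.transformer.h[i].load_state_dict(teacher.transformer.h[i].state_dict())
        student.transformer.ln_f.load_state_dict(teacher.transformer.ln_f.state_dict())
        return student                               # then fine-tune the student on target data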

Talks

Compression in deep learning - an information theory perspective

While DNNs have achieved many breakthroughs, our understanding of their internal structure, optimization process, and generalization remains poor, and we often treat them as black boxes. We attempt to resolve these issues by suggesting that DNNs learn to optimize the Information Bottleneck (IB) principle - the tradeoff between information compression and prediction quality. In the first part of the talk, I present this approach, showing an analytical and numerical study of DNNs in the information plane. This analysis reveals how the training process compresses the input into an optimal, efficient representation. I discuss recent works inspired by this analysis and show how we can apply them to real-world problems. In the second part of the talk, I discuss information in infinitely-wide neural networks using recent results on Neural Tangent Kernel (NTK) networks. The NTK allows us to derive many tractable information-theoretic quantities; by utilizing these derivations, we can search empirically for the information-theoretic quantities that affect generalization in DNNs. Finally, I present the Dual Information Bottleneck (dualIB) framework, which finds an optimal representation that resolves some of the drawbacks of the original IB. A theoretical analysis of the dualIB reveals the structure of its solution and its ability to preserve the original distribution’s statistics. Within this framework, we focus on the variational form of the dualIB, allowing its application to DNNs.
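For reference, the IB tradeoff referred to above is usually written as the following Lagrangian, where X is the input, Y the target, T the learned representation, and β controls the balance between compression and prediction:

    \min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)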

Information in Infinite Ensembles of Infinitely-Wide Neural Networks

Finding generalization signals in infinitely-wide neural networks using information-theoretic quantities.

Representation Compression in Deep Neural Network

An information-theoretic viewpoint on the optimization process and generalization abilities of deep networks, analyzed through the information plane, and how compression can help.

On the Information Theory of Deep Neural Networks

Understanding deep neural networks with the Information Bottleneck principle.

Open the Black Box of Deep Neural Networks

Where is the information in deep neural networks? Trying to find it by looking at the information plane.

Contact