I’m an Assistant Professor and a Fellow at the NYU Center for Data Science in New York, where I work with Andrew Gordon Wilson and Yann LeCun to understand and develop better models using Bayesian deep networks, information theory, and self-supervised learning.
I completed my Ph.D. under the supervision of Prof. Naftali Tishby at the Hebrew University of Jerusalem, where I also worked with Prof. Haim Sompolinsky. My Ph.D. focused on the connection between deep neural networks and information theory: I sought to develop a deeper understanding of deep networks based on information theory and to apply it to large-scale problems. I received the Google Ph.D. Fellowship.
Additionally, I am a researcher at Intel’s Artificial Intelligence Research Group, where I am involved in developing deep learning, computer vision, and sensory-data solutions for healthcare, manufacturing, and marketing, for both internal and external use.
In 2019-2020, I worked at Google Brain in California, exploring DNN generalization using information-theoretic tools.
In the past, I have also been involved in several Wikipedia projects. I volunteer at The Public Knowledge Workshop in my spare time.
I also enjoy playing basketball.
PhD in Computer Science and Neuroscience, 2021
The Hebrew University of Jerusalem
MSc in Computer Science and Neuroscience, 2016
The Hebrew University of Jerusalem
BSc in Computer Science and Bioinformatics, 2014
The Hebrew University of Jerusalem
Empirical and theoretical study of DNNs based on information-theoretic principles.
Developing novel deep learning, computer vision, and sensory-data solutions for healthcare, manufacturing, sales, and marketing, for both internal and external use.

Selected Projects
We examine self-supervised learning methods to provide an information-theoretic understanding of their construction. As a first step, we demonstrate how information-theoretic quantities can be obtained for a deterministic network. This enables us to show how SSL methods can be (re)discovered from first principles, together with their assumptions about the data distribution. Furthermore, we empirically demonstrate the validity of these assumptions, confirming our new understanding.
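As a point of reference, one common information-theoretic template for multi-view SSL (illustrative only, not necessarily the exact objective derived in this work) maximizes the information each view's representation carries about the other view:

```latex
% Illustrative multi-view InfoMax template; Z_1, Z_2 are the representations
% of two augmented views X_1, X_2 of the same input, produced by an encoder
% with parameters \theta.
\max_{\theta} \; I(Z_1; X_2) + I(Z_2; X_1)
```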
We show that we can learn highly informative posteriors from the source task, through supervised or self-supervised approaches, which then serve as the basis for priors that modify the whole loss surface on the downstream task. This approach enables significant performance gains and more data-efficient learning on a variety of downstream classification and segmentation tasks.
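A minimal sketch of the idea, assuming a diagonal Gaussian posterior has already been fitted on the source task (e.g. via SWAG or a Laplace approximation); the function and argument names are illustrative, not the paper's code:

```python
# Sketch only: re-use a diagonal Gaussian source-task posterior as a prior
# that reshapes the downstream loss surface. `source_mean`/`source_var` map
# parameter names to tensors and are assumed to come from any approximate
# posterior fitted on the source task.
import torch
import torch.nn.functional as F

def gaussian_prior_penalty(model, source_mean, source_var):
    """Negative log-density (up to a constant) of a diagonal Gaussian prior
    centred on the source-task posterior."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + 0.5 * ((p - source_mean[name]) ** 2 / source_var[name]).sum()
    return penalty

def downstream_loss(model, x, y, source_mean, source_var, prior_scale=1.0):
    """Downstream objective: task NLL plus the informative-prior penalty."""
    nll = F.cross_entropy(model(x), y)
    return nll + prior_scale * gaussian_prior_penalty(model, source_mean, source_var)
```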
We explored whether deep models should be a recommended option for tabular data by rigorously comparing the new deep models to XGBoost on various datasets. Our study shows that XGBoost outperforms these deep models across the datasets, including the datasets used in the papers that proposed the deep models. We also show that an ensemble of deep models and XGBoost performs better on these datasets than XGBoost alone.
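A hedged sketch of the kind of ensemble compared in the study (the dataset, model sizes, and unweighted averaging here are illustrative; the study's exact setup differs):

```python
# Average the predicted class probabilities of XGBoost and a small MLP.
import numpy as np
from xgboost import XGBClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

xgb = XGBClassifier(n_estimators=200).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500).fit(X_tr, y_tr)

# Unweighted probability averaging; a tuned weighting would be used in practice.
proba = (xgb.predict_proba(X_te) + mlp.predict_proba(X_te)) / 2
print("ensemble accuracy:", (proba.argmax(axis=1) == y_te).mean())
```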
A new framework that resolves some of the known drawbacks of the Information Bottleneck. We provide a theoretical analysis of the framework, characterize the structure of its solutions, and present a novel variational formulation for DNNs.
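For reference, the standard Information Bottleneck objective that the framework revisits (the framework's own modification is not reproduced here):

```latex
% Standard IB Lagrangian: compress X into a representation T while
% preserving information about the target Y; \beta trades off the two terms.
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}} = I(X;T) - \beta\, I(T;Y)
```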
We study the generalization properties of infinite ensembles of infinitely wide neural networks, reporting analytical and empirical investigations in search of signals that correlate with generalization.
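A minimal sketch of such an infinite ensemble, assuming a one-hidden-layer ReLU network whose NNGP kernel has the arc-cosine closed form (normalization conventions and depth recursions vary across papers; this is not the paper's exact setup):

```python
import numpy as np

def relu_nngp_kernel(X, Z):
    """Arc-cosine kernel of degree 1: the GP kernel of an infinitely wide
    one-hidden-layer ReLU network, up to weight-variance scaling."""
    nx = np.linalg.norm(X, axis=1, keepdims=True)   # (n, 1)
    nz = np.linalg.norm(Z, axis=1, keepdims=True)   # (m, 1)
    cos = np.clip((X @ Z.T) / (nx * nz.T), -1.0, 1.0)
    theta = np.arccos(cos)
    return (nx * nz.T) * (np.sin(theta) + (np.pi - theta) * cos) / np.pi

def gp_posterior_mean(X_train, y_train, X_test, noise=1e-3):
    """Posterior mean of GP regression with this kernel, i.e. the mean
    prediction of the infinite ensemble of infinitely wide networks."""
    K = relu_nngp_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    return relu_nngp_kernel(X_test, X_train) @ np.linalg.solve(K, y_train)
```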
A semi-supervised model for detecting anomalies in videos, inspired by the Video Pixel Network (VPN). We extend the VPN's Convolutional LSTM video encoder with a novel convolution-based attention mechanism. This approach could serve as a component in applications requiring visual common sense.
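A hypothetical sketch of a convolution-based attention gate of this kind (module and tensor names are illustrative, not the project's code): a 1x1 convolution over the concatenated encoder features and ConvLSTM hidden state produces a spatial attention map that reweights the features.

```python
import torch
import torch.nn as nn

class ConvAttention(nn.Module):
    """Spatial attention map computed by a 1x1 convolution over the
    concatenation of encoder features and the ConvLSTM hidden state."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(2 * channels, 1, kernel_size=1)

    def forward(self, features, hidden):
        # features, hidden: (batch, channels, H, W)
        attn = torch.sigmoid(self.score(torch.cat([features, hidden], dim=1)))
        return attn * features
```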
We extend the standard LSTM architecture by augmenting it with an additional gate that produces a memory-control vector. This vector is fed back to the LSTM instead of the original output prediction. By decoupling the LSTM prediction from its role as a memory controller, we allow each output to specialize in its own task.
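A hypothetical sketch of the decoupling (the gate's parameterization here is an assumption, not the paper's equations): the cell still emits a prediction h, but the vector fed back as recurrent state is produced by a separate control gate.

```python
import torch
import torch.nn as nn

class ControlGateLSTMCell(nn.Module):
    """LSTM cell whose recurrent feedback is a separate memory-control
    vector rather than the output prediction itself (illustrative)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.control = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, state):
        h_ctrl, c = state
        h, c = self.cell(x, (h_ctrl, c))                 # h: output prediction
        m = torch.sigmoid(self.control(torch.cat([x, h], dim=-1)))
        h_ctrl = m * torch.tanh(c)                       # memory-control vector fed back
        return h, (h_ctrl, c)
```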
We demonstrate the effectiveness of the Information-Plane visualization of DNNs. (i) Most of the training epochs are spent on compressing the input into an efficient representation. (ii) The representation-compression phase begins when the SGD steps change from a fast drift into a stochastic relaxation. (iii) The converged layers lie very close to the information bottleneck theoretical bound, and the maps to the hidden layers satisfy the IB self-consistent equations. (iv) The training time is dramatically reduced when adding more hidden layers.
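A minimal sketch of the binning estimator typically used to produce such Information-Plane plots (bin count and discretization details are illustrative): discretize the hidden activations and estimate I(T;X) and I(T;Y) from empirical frequencies.

```python
import numpy as np

def discrete_mutual_information(a, b):
    """Plug-in MI (in bits) between two integer label arrays."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    for i, j in zip(a, b):
        joint[i, j] += 1
    joint /= joint.sum()
    pa, pb = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())

def layer_information(activations, x_ids, y, n_bins=30):
    """I(T;X) and I(T;Y) for one layer: bin the activations, then treat each
    distinct binned pattern as a discrete value of T."""
    edges = np.linspace(activations.min(), activations.max(), n_bins)
    t_ids = np.unique(np.digitize(activations, edges), axis=0, return_inverse=True)[1]
    return (discrete_mutual_information(t_ids, x_ids),
            discrete_mutual_information(t_ids, y))
```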