We examine self-supervised learning methods to provide an information-theoretic understanding of their construction. As a first step, we show how information-theoretic quantities can be obtained for a deterministic network. This allows us to (re)discover SSL methods from first principles, together with their assumptions about the data distribution. We further validate these assumptions empirically, supporting our new understanding.
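As a minimal sketch of one common way to obtain finite information estimates for a deterministic network, the hidden activations can be discretized and I(X;T) estimated from the empirical joint distribution. The binning choices and function names below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def discrete_mutual_information(x_ids, t_ids):
    """Estimate I(X; T) in bits from paired discrete symbols via the empirical joint distribution."""
    joint = np.zeros((x_ids.max() + 1, t_ids.max() + 1))
    for xi, ti in zip(x_ids, t_ids):
        joint[xi, ti] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    pt = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ pt)[nz])).sum())

def layer_information(activations, n_bins=30):
    """Bin continuous activations so a deterministic layer T = f(X) yields a finite I(X; T) estimate.

    Each input sample is treated as one value of X; binning the activations is the (assumed)
    relaxation that makes the estimate finite and comparable across layers.
    """
    edges = np.linspace(activations.min(), activations.max(), n_bins + 1)
    binned = np.digitize(activations, edges[1:-1])          # (n_samples, n_units) bin indices
    # Collapse each binned activation vector into a single discrete symbol for T.
    _, t_ids = np.unique(binned, axis=0, return_inverse=True)
    x_ids = np.arange(activations.shape[0])                 # one symbol per input sample
    return discrete_mutual_information(x_ids, t_ids)

# Example: activations of one hidden layer for 1,000 inputs.
acts = np.tanh(np.random.randn(1000, 16))
print(layer_information(acts))
```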
We show that highly informative posteriors can be learned on the source task, through supervised or self-supervised approaches, and then serve as the basis for priors that modify the whole loss surface on the downstream task. This approach yields significant performance gains and more data-efficient learning on a variety of downstream classification and segmentation tasks.
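A minimal sketch of one simple instantiation of this idea: a diagonal-Gaussian posterior learned on the source task is reused as a quadratic prior penalty added to the downstream loss. The function names, the diagonal-Gaussian assumption, and the penalty scale are illustrative, not the paper's exact construction.

```python
import torch

def prior_penalty(model, prior_mean, prior_precision):
    """Quadratic penalty from a diagonal-Gaussian prior centred on the source-task posterior.

    prior_mean / prior_precision map parameter names to tensors estimated on the source task.
    """
    penalty = 0.0
    for name, param in model.named_parameters():
        penalty = penalty + 0.5 * (prior_precision[name] * (param - prior_mean[name]) ** 2).sum()
    return penalty

def training_step(model, batch, prior_mean, prior_precision, scale=1e-3):
    """Downstream step: task loss plus the prior term that reshapes the loss surface."""
    inputs, targets = batch
    task_loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    return task_loss + scale * prior_penalty(model, prior_mean, prior_precision)
```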
We explore whether deep models should be the recommended option for tabular data by rigorously comparing recent deep models to XGBoost on a variety of datasets. Our study shows that XGBoost outperforms these deep models across the datasets, including the datasets used in the papers that proposed the deep models. We also show that an ensemble of the deep models and XGBoost performs better on these datasets than XGBoost alone.
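The ensemble finding can be illustrated by blending class probabilities from XGBoost and a deep model. In this sketch the sklearn MLP stands in for any of the deep tabular models, and the hyperparameters and 50/50 weighting are illustrative choices, not the weighting used in the study.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

def fit_ensemble(X_train, y_train, X_test, deep_weight=0.5):
    """Average the predicted class probabilities of XGBoost and a simple deep model."""
    xgb = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1)
    mlp = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=200)
    xgb.fit(X_train, y_train)
    mlp.fit(X_train, y_train)
    proba = (1 - deep_weight) * xgb.predict_proba(X_test) + deep_weight * mlp.predict_proba(X_test)
    return proba.argmax(axis=1)
```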
We introduce a new framework that resolves some of the known drawbacks of the Information Bottleneck. We provide a theoretical analysis of the framework, characterize the structure of its solutions, and present a novel variational formulation for DNNs.
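For context, the standard Information Bottleneck trade-off that such a framework starts from (this is the classical formulation, not the new framework's own objective) is:

```latex
% Compress X into T while preserving information about Y,
% subject to the Markov chain Y \leftrightarrow X \leftrightarrow T.
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}} = I(X;T) - \beta\, I(T;Y)
```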
We study the generalization properties of infinite ensembles of infinitely-wide neural networks, reporting analytical and empirical investigations in search of signals that correlate with generalization.
We analyze perceptual learning of pure tones in the auditory cortex. Using a novel computational model, we show that overrepresentation of the learned tones does not improve over the course of training.
A semi-supervised model for detecting anomalies in videos, inspired by the Video Pixel Network (VPN). We extend the convolutional LSTM video encoder of the VPN with a novel convolution-based attention mechanism. This approach could serve as a component in applications requiring visual common sense.
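A minimal sketch of the attention idea only: a small convolutional stack scores each spatial location of the encoder's hidden state, and a softmax over locations produces an attention map that reweights the features. The layer sizes and the exact way the VPN encoder is extended are assumptions.

```python
import torch
import torch.nn as nn

class ConvSpatialAttention(nn.Module):
    """Convolution-based spatial attention over a ConvLSTM hidden state."""
    def __init__(self, channels, hidden=32):
        super().__init__()
        self.score = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, 1, kernel_size=1),
        )

    def forward(self, h):                      # h: (batch, channels, H, W) ConvLSTM hidden state
        logits = self.score(h)                 # (batch, 1, H, W) per-location scores
        b, _, H, W = logits.shape
        attn = torch.softmax(logits.view(b, 1, -1), dim=-1).view(b, 1, H, W)
        return h * attn                        # attended hidden state, same shape as h

# Usage: re-weight the encoder state before it is passed to the anomaly-scoring head.
attended = ConvSpatialAttention(channels=64)(torch.randn(2, 64, 16, 16))
```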
We extend the standard LSTM architecture with an additional gate that produces a memory-control vector. This vector, rather than the original output prediction, is fed back to the LSTM. By decoupling the LSTM's prediction from its role as a memory controller, we allow each output to specialize in its own task.
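A sketch of one way such a cell could look: a fifth gate produces the memory-control vector m_t, and m_t (not the output h_t) is carried as the recurrent signal. The exact gating equations here are an illustrative guess, not the paper's.

```python
import torch
import torch.nn as nn

class MemoryControlLSTMCell(nn.Module):
    """LSTM cell with an extra gate producing a memory-control vector m_t, fed back in place of h_t."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gates = nn.Linear(input_size + hidden_size, 5 * hidden_size)  # i, f, o, g, m
        self.hidden_size = hidden_size

    def forward(self, x, state):
        m_prev, c_prev = state                                   # feedback vector and cell state
        z = self.gates(torch.cat([x, m_prev], dim=-1))
        i, f, o, g, m = z.chunk(5, dim=-1)
        c = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)                     # output prediction
        m = torch.sigmoid(m) * torch.tanh(c)                     # memory-control vector, fed back
        return h, (m, c)

# One step on a batch of 4 inputs with 8 features and a hidden size of 16.
cell = MemoryControlLSTMCell(8, 16)
h, state = cell(torch.randn(4, 8), (torch.zeros(4, 16), torch.zeros(4, 16)))
```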
We demonstrate the effectiveness of the Information-Plane visualization of DNNs. (i) Most of the training epochs are spent on compressing the input into an efficient representation. (ii) The representation-compression phase begins when the SGD steps change from a fast drift to stochastic relaxation. (iii) The converged layers lie very close to the Information Bottleneck theoretical bound, and the maps to the hidden layers satisfy the IB self-consistent equations. (iv) The training time is dramatically reduced when more hidden layers are added.
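The IB self-consistent equations referenced in (iii) are the standard ones:

```latex
\begin{aligned}
p(t \mid x) &= \frac{p(t)}{Z(x;\beta)} \exp\!\left(-\beta\, D_{\mathrm{KL}}\!\left[\,p(y \mid x)\,\|\,p(y \mid t)\,\right]\right),\\
p(t)        &= \sum_{x} p(t \mid x)\, p(x),\\
p(y \mid t) &= \frac{1}{p(t)} \sum_{x} p(y \mid x)\, p(t \mid x)\, p(x).
\end{aligned}
```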