Arsenii Ashukha
I am a Research Scientist at Samsung AI Center. I (almost) received a PhD in ML at BayesGroup, so I can make big, overcomplicated DNNs work 🙂. My PhD work contributed to sparsification, uncertainty estimation, ensembling, and the fundamentals of Bayesian deep learning.
Prior to that, I was part of Yandex Research, collaborating with the University of Amsterdam, where I worked on Bayesian deep learning with Dmitry Vetrov and Max Welling. I did ML engineering internships at Yandex (deep learning for music) and Rambler (recommendation systems), and worked on NLP with Natalia Loukachevitch.
Email /
CV /
Google Scholar /
GitHub /
Twitter


Research

Resolution-robust Large Mask Inpainting with Fourier Convolutions
Roman Suvorov, Elizaveta Logacheva, Anton Mashikhin, Anastasia Remizova,
Arsenii Ashukha, Aleksei Silvestrov, Naejin Kong, Harshith Goka, Kiwoong Park, Victor Lempitsky
WACV, 2022
project page /
arXiv /
code /
bibtex
LaMa generalizes surprisingly well to much higher resolutions (~2k❗️) than it saw during training (256x256), and achieves excellent performance even in challenging scenarios, e.g., completion of periodic structures.
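A toy sketch (PyTorch-style, not the LaMa code) of the spectral branch of a fast Fourier convolution, whose image-wide receptive field is what helps the model cope with resolutions far above the training one; the channel count and the 1x1 convolution below are illustrative assumptions:

    import torch
    import torch.nn as nn

    class SpectralConv(nn.Module):
        """Global branch of a fast Fourier convolution: convolve in the frequency domain."""
        def __init__(self, channels):
            super().__init__()
            # 1x1 convolution over stacked real/imaginary parts (illustrative choice).
            self.conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

        def forward(self, x):
            b, c, h, w = x.shape
            freq = torch.fft.rfft2(x, norm="ortho")            # (b, c, h, w//2 + 1), complex
            freq = torch.cat([freq.real, freq.imag], dim=1)    # (b, 2c, h, w//2 + 1)
            freq = self.conv(freq)                             # mix channels with a global view
            real, imag = freq.chunk(2, dim=1)
            return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")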


Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning
Arsenii Ashukha*,
Alexander Lyzhov*,
Dmitry Molchanov*,
Dmitry Vetrov
ICLR, 2020
blog post /
poster video (5mins) /
code /
arXiv /
bibtex
The work shows that: i) a simple ensemble of independently trained networks performs significantly better than recent techniques; ii) simple test-time augmentation applied to a conventional network outperforms low-parameter ensembles (e.g., Dropout) and also improves all ensembles for free; iii) comparisons of the uncertainty estimation ability of algorithms are often done incorrectly in the literature.
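A minimal sketch of these two baselines in PyTorch-style pseudocode (the models and the transforms are hypothetical placeholders, not the paper's code):

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def ensemble_predict(models, x):
        # Deep ensemble: average predictive probabilities of independently trained networks.
        probs = [F.softmax(m(x), dim=1) for m in models]
        return torch.stack(probs).mean(dim=0)

    @torch.no_grad()
    def tta_predict(model, x, transforms):
        # Test-time augmentation: average predictions over augmented copies of the input.
        probs = [F.softmax(model(t(x)), dim=1) for t in transforms]
        return torch.stack(probs).mean(dim=0)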


Greedy Policy Search: A Simple Baseline for Learnable Test-Time Augmentation
Dmitry Molchanov*,
Alexander Lyzhov*,
Yuliya Molchanova*,
Arsenii Ashukha*,
Dmitry Vetrov
UAI, 2020
code /
arXiv /
slides /
bibtex
We introduce greedy policy search (GPS), a simple but high-performing method for learning a test-time augmentation policy.
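A rough sketch of the greedy idea (not the released implementation): grow the augmentation policy one transform at a time, keeping the candidate that most improves the validation log-likelihood of the averaged prediction; `model`, `candidate_transforms`, and the validation tensors are placeholders:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def greedy_policy_search(model, candidate_transforms, x_val, y_val, steps=5):
        policy, pooled = [], []
        for _ in range(steps):
            best = None
            for t in candidate_transforms:
                probs = F.softmax(model(t(x_val)), dim=1)
                avg = torch.stack(pooled + [probs]).mean(dim=0)
                # Validation log-likelihood of the pooled prediction.
                ll = torch.log(avg[torch.arange(len(y_val)), y_val] + 1e-12).mean().item()
                if best is None or ll > best[0]:
                    best = (ll, t, probs)
            policy.append(best[1])   # transforms may repeat across steps
            pooled.append(best[2])
        return policy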


The Deep Weight Prior
Andrei Atanov*,
Arsenii Ashukha*,
Kirill Struminsky,
Dmitry Vetrov,
Max Welling
ICLR, 2019
code /
arXiv /
bibtex
The deep weight prior is a generative model over the kernels of convolutional neural networks that acts as a prior distribution when training on new datasets.


Variance Networks: When Expectation Does Not Meet Your Expectations
Kirill Neklyudov*,
Dmitry Molchanov*,
Arsenii Ashukha*,
Dmitry Vetrov
ICLR, 2019
code /
arXiv /
bibtex
It is possible to learn a zero-centered Gaussian distribution over the weights of a neural network by learning only variances, and it works surprisingly well.
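A minimal sketch of a variance layer, assuming PyTorch (an illustration of the idea rather than the authors' code): the weight mean is fixed at zero and only the log-variances are trained:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VarianceLinear(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            # Only log-variances are trained; the weight mean is fixed at zero.
            self.log_sigma = nn.Parameter(torch.full((out_features, in_features), -3.0))
            self.bias = nn.Parameter(torch.zeros(out_features))

        def forward(self, x):
            eps = torch.randn_like(self.log_sigma)
            weight = torch.exp(self.log_sigma) * eps  # zero-mean Gaussian weight sample
            return F.linear(x, weight, self.bias)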


Semi-Conditional Normalizing Flows for Semi-Supervised Learning
Andrei Atanov,
Alexandra Volokhova,
Arsenii Ashukha,
Ivan Sosnovik,
Dmitry Vetrov
INNF Workshop at ICML, 2019
code /
arXiv /
bibtex
We employ a semi-conditional normalizing flow architecture that allows efficient training of normalizing flows when only a few labeled data points are available.


Unsupervised Domain Adaptation with SharedLatent Dynamics for Reinforcement Learning
Evgenii Nikishin,
Arsenii Ashukha,
Dmitry Vetrov
BLD Workshop at NeurIPS, 2019
code /
poster
Domain adaptation via learning shared dynamics in a latent space with adversarial matching of latent states.


Uncertainty Estimation via Stochastic Batch Normalization
Andrei Atanov,
Arsenii Ashukha,
Dmitry Molchanov,
Kirill Neklyudov,
Dmitry Vetrov
Joint Workshop Track at ICLR, 2018
code /
arXiv
Inference-time stochastic batch normalization improves the uncertainty estimation performance of ensembles.
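One crude way to see the idea (a sketch under simplifying assumptions, not the paper's exact procedure): keep BatchNorm in batch-statistics mode at test time and average predictions over passes whose statistics are driven by a random training batch; `model`, `train_batches`, and `x_test` are placeholders:

    import random
    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def stochastic_bn_predict(model, x_test, train_batches, n_samples=10):
        model.train()  # BatchNorm uses current-batch statistics (note: also updates running stats)
        probs = []
        for _ in range(n_samples):
            x_train = random.choice(train_batches)
            # Statistics are computed over the concatenated batch, so they vary per pass.
            logits = model(torch.cat([x_train, x_test], dim=0))[len(x_train):]
            probs.append(F.softmax(logits, dim=1))
        model.eval()
        return torch.stack(probs).mean(dim=0)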


Structured Bayesian Pruning via Log-Normal Multiplicative Noise
Kirill Neklyudov,
Dmitry Molchanov,
Arsenii Ashukha,
Dmitry Vetrov
NeurIPS, 2017
code /
arXiv /
bibtex /
poster
The model allows sparsifying a DNN with an arbitrary pattern of sparsity, e.g., removing whole neurons or convolutional filters.
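A minimal sketch of the noise layer for dense activations (PyTorch-style, not the reference code): each unit's output is multiplied by a log-normal sample, and units with a low signal-to-noise ratio can be removed after training:

    import torch
    import torch.nn as nn

    class LogNormalNoise(nn.Module):
        def __init__(self, n_units):
            super().__init__()
            self.mu = nn.Parameter(torch.zeros(n_units))
            self.log_sigma = nn.Parameter(torch.full((n_units,), -3.0))

        def forward(self, x):
            if self.training:
                eps = torch.randn_like(self.mu)
                theta = torch.exp(self.mu + torch.exp(self.log_sigma) * eps)  # LogNormal sample
            else:
                theta = torch.exp(self.mu)  # a deterministic scale (median of the noise)
            return x * theta

        def snr(self):
            # Signal-to-noise ratio of a LogNormal multiplier: 1 / sqrt(exp(sigma^2) - 1).
            # Units with low SNR can be pruned (whole neurons or convolutional filters).
            sigma_sq = torch.exp(self.log_sigma) ** 2
            return 1.0 / torch.sqrt(torch.expm1(sigma_sq))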


Variational Dropout Sparsifies Deep Neural Networks
Dmitry Molchanov*,
Arsenii Ashukha*,
Dmitry Vetrov
ICML, 2017
retrospective⏳ /
talk (15 mins) /
arXiv /
bibtex /
code (theano,
tf by GoogleAI,
colab pytorch)
Variational dropout secretly trains highly sparsified deep neural networks, while the sparsity pattern is learned jointly with the weights during training.
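A compact sketch of sparse variational dropout on a fully connected layer (an illustration of the mechanism, not the paper's reference code): each weight has a learned mean and log-variance, training uses the local reparameterization trick, and weights with a large dropout rate alpha = sigma^2 / mu^2 are zeroed out:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SparseVDLinear(nn.Module):
        def __init__(self, in_features, out_features, log_alpha_threshold=3.0):
            super().__init__()
            self.mu = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
            self.log_sigma2 = nn.Parameter(torch.full((out_features, in_features), -10.0))
            self.bias = nn.Parameter(torch.zeros(out_features))
            self.log_alpha_threshold = log_alpha_threshold

        def log_alpha(self):
            # log(alpha) = log(sigma^2) - log(mu^2); large values mean the weight is noise.
            return self.log_sigma2 - torch.log(self.mu ** 2 + 1e-16)

        def forward(self, x):
            if self.training:
                # Local reparameterization: sample pre-activations instead of weights.
                mean = F.linear(x, self.mu, self.bias)
                std = torch.sqrt(F.linear(x ** 2, torch.exp(self.log_sigma2)) + 1e-12)
                return mean + std * torch.randn_like(mean)
            # At test time, zero out weights whose dropout rate exceeds the threshold.
            mask = (self.log_alpha() < self.log_alpha_threshold).float()
            return F.linear(x, self.mu * mask, self.bias)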

Code
Check out very short, simple, and fun-to-make implementations of ML algorithms:
Also, check out more solid implementations (at least they can do ImageNet):

The webpage template was borrowed from Jon Barron.

