I am a research MSc student at Mila, affiliated with the University of Montréal and supervised by Irina Rish.

I am also the Founder and President of Landskape AI, a non-profit research organization focused on theoretical and analytical deep learning.

I broadly work on robust and interpretable learning under constraints, with a focus on, but not limited to, the following domains:

  • Diffusion Models and Prompting
  • Compression (MoE, Sparsity)
  • Vision Language Models
  • Continual Learning

CV  /  Google Scholar  /  GitHub  /  Blog  /  Certificates

   Research Experience

Research Associate I, Oct. 2023 - Present
Carnegie Mellon University (CMU), Human Sensing Lab (HSL)
Supervisor: Prof. Fernando De la Torre
Research Area: Transfer of personalization on continual update of diffusion models.

Machine Learning Researcher, Apr. 2022 - Feb. 2023
Morgan Stanley
Supervisor: Kashif Rasul
Research Area: Continual Learning, Time Series, Model Reprogramming

Remote Visiting Research Scholar, Aug. 2021 - Present
VITA, University of Texas at Austin
Supervisor: Dr. Zhangyang Wang
Research Area: Sparsity, Robustness and Knowledge Distillation.

Research Affiliate, Feb. 2020 - Present
Laboratory of Space Research (LSR), University of Hong Kong
Supervisor: Dr. Quentin A. Parker
Research Area: Computer Vision applications in PNe Exploration.

Research Intern, Jun. 2018 - Aug. 2018
NVIDIA AI Lab, Bennett University
Supervisors: Dr. Deepak Garg and Dr. Suneet Gupta
Research Area: Large Scale Visual Recognition.

   Industrial and Leadership Experience

Founder, President and Researcher, Sept. 2019 - Present
Landskape AI
Mentors: Assoc. Prof. Jaegul Choo, Javier Ideami and Federico Lois
Research Area: Analytical Deep Learning Theory.

Technical Content and Course Developer, Nov. 2023 - Present
Towards AI
Area: RAG, LLM, LangChain, LlamaIndex

Machine Learning Engineer, Dec. 2020 - Oct. 2021
Weights & Biases
Team: Frameworks and Integrations.

Technical Content Developer, Jun. 2020 - Jan. 2021
Paperspace
Blog
Topic Area: Computer Vision (Attention Mechanisms).

Publication
* indicates equal contribution

Mish: A Self Regularized Non-Monotonic Neural Activation Function
Diganta Misra

BMVC, 2020
project / paper / abstract / bibtex

We propose Mish, a novel self-regularized non-monotonic activation function which can be mathematically defined as: $f(x) = x \tanh(\operatorname{softplus}(x))$. As activation functions play a crucial role in the performance and training dynamics in neural networks, we validated experimentally on several well-known benchmarks against the best combinations of architectures and activation functions. We also observe that data augmentation techniques have a favorable effect on benchmarks like ImageNet-1k and MS-COCO across multiple architectures. For example, Mish outperformed Leaky ReLU on YOLOv4 with a CSP-DarkNet-53 backbone on average precision ($AP^{val}_{50}$) by $2.1\%$ in MS-COCO object detection and ReLU on ResNet-50 on ImageNet-1k in Top-1 accuracy by $\approx 1 \%$ while keeping all other network parameters and hyperparameters constant. Furthermore, we explore the mathematical formulation of Mish in relation with the Swish family of functions and propose an intuitive understanding on how the first derivative behavior may be acting as a regularizer helping the optimization of deep neural networks.
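
The definition in the abstract above translates almost directly into code. Here is a minimal sketch in plain Python for scalar inputs; the function names and the numerically stable form of softplus are illustrative choices, not the reference implementation from the project repository.

```python
import math


def softplus(x: float) -> float:
    # softplus(x) = log(1 + exp(x)), written so that exp() never overflows.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Mish as defined in the abstract: f(x) = x * tanh(softplus(x)).
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    # The function dips below zero for negative inputs and then rises
    # back toward zero, which is the non-monotonic behavior in the title.
    for value in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(value, mish(value))
```

For large positive inputs the output approaches the input itself, and for large negative inputs it approaches zero from below.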

@article{misra2019mish,
title={Mish: A self regularized non-monotonic neural activation function},
author={Misra, Diganta},
journal={arXiv preprint arXiv:1908.08681},
volume={4},
pages={2},
year={2019},
publisher={CoRR}
}
CV Talk Episode / ML Cafe Episode / Sicara Talk / W&B Salon Episode / CROWN

For those who are curious, the name Mish was coined by my girlfriend. 👩‍💻
Rotate to Attend: Convolutional Triplet Attention Module
Diganta Misra*, Trikay Nalamada*, Ajay Uppili Arasanipalai*, Qibin Hou

WACV, 2021
project / paper / supplementary / video / abstract / bibtex

Benefiting from the capability of building interdependencies among channels or spatial locations, attention mechanisms have been extensively studied and broadly used in a variety of computer vision tasks recently. In this paper, we investigate light-weight but effective attention mechanisms and present triplet attention, a novel method for computing attention weights by capturing cross-dimension interaction using a three-branch structure. For an input tensor, triplet attention builds inter-dimensional dependencies by the rotation operation followed by residual transformations and encodes inter-channel and spatial information with negligible computational overhead. Our method is simple as well as efficient and can be easily plugged into classic backbone networks as an add-on module. We demonstrate the effectiveness of our method on various challenging tasks including image classification on ImageNet-1k and object detection on MSCOCO and PASCAL VOC datasets. Furthermore, we provide extensive insight into the performance of triplet attention by visually inspecting the GradCAM and GradCAM++ results. The empirical evaluation of our method supports our intuition on the importance of capturing dependencies across dimensions when computing attention weights.

@inproceedings{misra2021rotate,
title={Rotate to attend: Convolutional triplet attention module},
author={Misra, Diganta and Nalamada, Trikay and Arasanipalai, Ajay Uppili and Hou, Qibin},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={3139--3148},
year={2021}
}
Just Say the Name: Online Continual Learning with Category Names Only via Data Generation   New!

Minhyuk Seo, Diganta Misra, Seongwon Cho, MinJae Lee, Jonghyun Choi

Preprint
paper / abstract / bibtex

Twitter

In real-world scenarios, extensive manual annotation for continual learning is impractical due to prohibitive costs. Although prior arts, influenced by large-scale webly supervised training, suggest leveraging web-scraped data in continual learning, this poses challenges such as data imbalance, usage restrictions, and privacy concerns. Addressing the risks of continual webly supervised training, we present an online continual learning framework - Generative Name only Continual Learning (G-NoCL). The proposed G-NoCL uses a set of generators G along with the learner. When encountering new concepts (i.e., classes), G-NoCL employs the novel sample complexity-guided data ensembling technique DIverSity and COmplexity enhancing ensemBlER (DISCOBER) to optimally sample training data from generated data. Through extensive experimentation, we demonstrate superior performance of DISCOBER in G-NoCL online CL benchmarks, covering both In-Distribution (ID) and Out-of-Distribution (OOD) generalization evaluations, compared to naive generator-ensembling, web-supervised, and manually annotated data.

@misc{seo2024just,
title={Just Say the Name: Online Continual Learning with Category Names Only via Data Generation},
author={Minhyuk Seo and Diganta Misra and Seongwon Cho and Minjae Lee and Jonghyun Choi},
year={2024},
eprint={2403.10853},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Mila CERC AAI AI & Scale Workshop 2024 Talk

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order   New!

Diganta Misra, Multiple authors

Preprint
paper / abstract / bibtex

HuggingFace report / HuggingFace Models / Twitter

Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community development. However, such existing models face challenges: limited multilingual capabilities, continual pretraining causing catastrophic forgetting, whereas pretraining from scratch is computationally expensive, and compliance with AI safety and development laws. This paper presents Aurora-M, a 15B parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code. Continually pretrained from StarCoderPlus on 435 billion additional tokens, Aurora-M surpasses 2 trillion tokens in total training token count. It is the first open-source multilingual model fine-tuned on human-reviewed safety instructions, thus aligning its development not only with conventional red-teaming considerations, but also with the specific concerns articulated in the Biden-Harris Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Aurora-M is rigorously evaluated across various tasks and languages, demonstrating robustness against catastrophic forgetting and outperforming alternatives in multilingual settings, particularly in safety evaluations.

@misc{nakamura2024auroram, title={Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order}, author={Taishi Nakamura and Mayank Mishra and Simone Tedeschi and Yekun Chai and Jason T Stillerman and Felix Friedrich and Prateek Yadav and Tanmay Laud and Vu Minh Chien and Terry Yue Zhuo and Diganta Misra and Ben Bogin and Xuan-Son Vu and Marzena Karpinska and Arnav Varma Dantuluri and Wojciech Kusa and Tommaso Furlanello and Rio Yokota and Niklas Muennighoff and Suhas Pai and Tosin Adewumi and Veronika Laippala and Xiaozhe Yao and Adalberto Junior and Alpay Ariyak and Aleksandr Drozd and Jordan Clive and Kshitij Gupta and Liangyu Chen and Qi Sun and Ken Tsui and Noah Persaud and Nour Fahmy and Tianlong Chen and Mohit Bansal and Nicolo Monti and Tai Dang and Ziyang Luo and Tien-Tung Bui and Roberto Navigli and Virendra Mehta and Matthew Blumberg and Victor May and Huu Nguyen and Sampo Pyysalo}, year={2024}, eprint={2404.00399}, archivePrefix={arXiv}, primaryClass={cs.CL} }
APP: Anytime Progressive Pruning
Diganta Misra*, Bharat Runwal*, Tianlong Chen, Zhangyang Wang, Irina Rish

DyNN workshop at ICML, 2022
SNN, 2022
CLL workshop at ACML, 2022
SlowDNN workshop, 2023
project / paper / webpage / poster / abstract / bibtex

With the latest advances in deep learning, there has been a lot of focus on the online learning paradigm due to its relevance in practical settings. Although many methods have been investigated for optimal learning settings in scenarios where the data stream is continuous over time, sparse network training in such settings has often been overlooked. In this paper, we explore the problem of training a neural network with a target sparsity in a particular case of online learning: the anytime learning at macroscale paradigm (ALMA). We propose a novel way of progressive pruning, referred to as Anytime Progressive Pruning (APP); the proposed approach significantly outperforms the baseline dense and Anytime OSP models across multiple architectures and datasets under short, moderate, and long-sequence training. Our method, for example, shows an improvement in accuracy of $\approx 7\%$ and a reduction in the generalization gap by $\approx 22\%$, while being $\approx 1/3$ the size of the dense baseline model in few-shot restricted ImageNet training. We further observe interesting nonmonotonic transitions in the generalization gap in ALMA with a high number of megabatches. The code and experiment dashboards can be accessed at https://github.com/landskape-ai/Progressive-Pruning and https://wandb.ai/landskape/APP, respectively.
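
To make the idea of pruning progressively toward a target sparsity over a stream of megabatches concrete, here is a rough sketch in plain Python. It is not the APP algorithm from the paper: the linear sparsity ramp, the magnitude-based pruning criterion, and the names `sparsity_at` and `magnitude_prune` are all assumptions chosen for illustration, and the training step is left as a comment.

```python
def sparsity_at(step, total_steps, target):
    # Assumed schedule: ramp linearly from 0 to the target sparsity.
    return target * min(step, total_steps) / total_steps


def magnitude_prune(weights, sparsity):
    # Zero out the fraction `sparsity` of weights with the smallest magnitude.
    # Weights pruned earlier are already zero, so they stay pruned.
    k = int(sparsity * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def run_stream(weights, num_megabatches, target):
    # One pruning step per incoming megabatch.
    for megabatch in range(1, num_megabatches + 1):
        # (training on the new megabatch would happen here)
        level = sparsity_at(megabatch, num_megabatches, target)
        weights = magnitude_prune(weights, level)
    return weights


if __name__ == "__main__":
    toy = [0.5, -0.1, 0.3, -0.8, 0.05, 0.9, -0.2, 0.4]
    print(run_stream(toy, num_megabatches=4, target=0.75))
```

The point of the sketch is only that the sparsity level grows as data arrives, so the model is usable at any time in the stream, not only once the target has been reached.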

@misc{misra2022app,
title={APP: Anytime Progressive Pruning},
author={Diganta Misra and Bharat Runwal and Tianlong Chen and Zhangyang Wang and Irina Rish},
year={2022},
eprint={2204.01640},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
NSL presentation / MLC Research Jam #8 /
MLC Research Jam #9 / Continual AI Seminar

$\mathcal{D}^2$-Sparse: Navigating the low data learning regime with sparse networks   New!
Diganta Misra*, Niklas Nolte*, Sparsha Mishra, Lu Yin

PML4LRS @ ICLR, 2024 (Oral)
abstract

Learning under constraints has been a fundamental avenue of research in deep learning since the advent of modern deep neural networks. In parallel to the upwards trajectory of scaling neural networks, one practical constraint that has embodied efficient deep learning has been that of sparsity. Unstructured weight sparsity has been the cornerstone of pioneering works in the space of pruning and the lottery ticket hypothesis. In this paper, we propose $\mathcal{D}^2$-Sparse, a novel dual dynamic sparse learning system for the low-data learning regime. Our paper combines two popular constraints in deep learning, namely sparsity and low-data learning, often studied in disjoint paradigms, thus opening new directions of research in sparsity. $\mathcal{D}^2$-Sparse outperforms standard iterative pruning schema when coupled with standard deep networks in computer vision tasks like image classification and in natural language processing tasks like code generation, with no extra overhead cost on inference. Compared to iterative pruning, on a $\frac{1}{8}$-th total data budget, $\mathcal{D}^2$-Sparse achieves a $\approx$ 4% top-1 accuracy boost for ResNet-18 on the CIFAR-100 classification task. Further, we demonstrate the effectiveness of the proposed method in anytime learning scenarios and provide extensive analysis into the evolution of sparse masks in $\mathcal{D}^2$-Sparse over the training process. Code, dashboard, and model weights will be open-sourced for public access upon acceptance.

GitChameleon: Breaking the version barrier for code generation models   New!
Justine Gehring*, Nizar Islah*, Diganta Misra*, Massimo Caccia, Irina Rish

DMLR @ ICLR, 2024
abstract

The ever-changing landscape of programming languages poses a significant challenge in the development and training of models designed for code generation. Code, being a dynamic and constantly evolving environment, necessitates a continuous process of adaptation to stay in sync with the rapidly shifting paradigms, frameworks, and methodologies within the software development domain. The inherent variability in coding styles, the emergence of new programming languages, and the continuous evolution of libraries and packages underscore the imperative for an active approach in updating code generation models. In response to this challenge, we introduce GitChameleon, an innovative dataset comprising more than 12,000 version-sensitive examples in Python, designed to facilitate research into the adaptation of code generation models to the rapidly changing landscape of programming languages. Furthermore, we assess the performance of state-of-the-art code models and demonstrate their inadequacy in generating version-specific code. For example, the latest CodeLlama-70B only achieves a 46.76% exact string match score when evaluated on GitChameleon.
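
The exact string match score mentioned at the end of the abstract can be sketched as follows. This is only an illustration of the metric's general shape: the function name is hypothetical, and stripping surrounding whitespace before comparison is an assumption, since the abstract does not say how outputs are normalized.

```python
def exact_match_score(predictions, references):
    # Percentage of predictions that equal their reference string exactly.
    # Assumption: leading and trailing whitespace is ignored.
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


if __name__ == "__main__":
    # Hypothetical model outputs and ground-truth snippets.
    preds = ["x = np.asarray(a)", "df.append(row)"]
    refs = ["x = np.asarray(a)", "pd.concat([df, row])"]
    print(exact_match_score(preds, refs))
```

Exact match is a strict criterion: a semantically equivalent completion that differs by a single character counts as a miss.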

Uncovering the Hidden Cost of Model Compression   New!
Diganta Misra*, Muawiz Chaudhary*, Agam Goyal*, Bharat Runwal*, Pin Yu Chen

PiV @ CVPR, 2024
paper / abstract / bibtex

In the era of resource-intensive foundation models, efficient adaptation in downstream tasks has become paramount. Visual Prompting (VP), inspired by prompting in Large Language Models (LLMs), has emerged as a key transfer learning method in computer vision. Aligned with the growing significance of efficiency, research in model compression has become pivotal to alleviate the computational burden in both training and deploying over-parameterized neural networks. A key goal in model compression is the development of sparse models capable of matching or surpassing the performance of their over-parameterized, dense counterparts. While prior research has explored the impact of model sparsity on transfer learning, its effects on visual prompting-based transfer remain unclear. This study addresses this gap, revealing that model sparsity adversely affects the performance of visual prompting-based transfer, particularly in low-data-volume scenarios. Furthermore, our findings highlight the negative influence of sparsity on the calibration of downstream visual-prompted models. This empirical exploration calls for a nuanced understanding beyond accuracy in sparse settings, opening avenues for further research in Visual Prompting for sparse models.

@article{misra2023reprogramming,
title = {Reprogramming under constraints: Revisiting efficient and reliable transferability of lottery tickets},
author = {Diganta Misra and Agam Goyal and Bharat Runwal and Pin Yu Chen},
year = {2023},
journal = {arXiv preprint arXiv: 2308.14969}
}
Cohere ForAI Lightning Talk / Google Sparsity Reading Group Talk / MLC Research Jam 17

Challenging Common Assumptions about Catastrophic Forgetting
Timothée Lesort, Oleksiy Ostapenko, Diganta Misra, Md Rifat Arefin, Pau Rodriguez, Laurent Charlin, Irina Rish

CoLLAs workshop, 2022
CoLLAs, 2023
paper / abstract / bibtex

Standard gradient descent algorithms applied to sequences of tasks are known to produce catastrophic forgetting in deep neural networks. When trained on a new task in a sequence, the model updates its parameters on the current task, forgetting past knowledge. This article explores scenarios where we scale the number of tasks in a finite environment. Those scenarios are composed of a long sequence of tasks with reoccurring data. We show that in such a setting, stochastic gradient descent can learn, progress, and converge to a solution that according to existing literature needs a continual learning algorithm. In other words, we show that the model performs knowledge retention and accumulation without specific memorization mechanisms. We propose a new experimentation framework, SCoLe (Scaling Continual Learning), to study the knowledge retention and accumulation of algorithms in potentially infinite sequences of tasks. To explore this setting, we performed a large number of experiments on sequences of 1,000 tasks to better understand this new family of settings. We also propose a slight modification to the vanilla stochastic gradient descent to facilitate continual learning in this setting. The SCoLe framework represents a good simulation of practical training environments with reoccurring situations and allows the study of convergence behavior in long sequences. Our experiments show that previous results on short scenarios cannot always be extrapolated to longer scenarios.

@article{lesort2022scaling,
title = {Scaling the Number of Tasks in Continual Learning},
author = {Timothée Lesort and Oleksiy Ostapenko and Diganta Misra and Md Rifat Arefin and Pau Rodríguez and Laurent Charlin and Irina Rish},
year = {2022},
journal = {arXiv preprint arXiv: Arxiv-2207.04543}
}

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Diganta Misra, Mukund Varma T., Multiple authors

TMLR, 2023
project / paper / abstract / bibtex

Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

@article{srivastava2022beyond,
title = {Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models},
author = {Aarohi Srivastava and Abhinav Rastogi and Abhishek Rao and Abu Awal Md Shoeb and Abubakar Abid and Adam Fisch and Adam R. Brown and Adam Santoro and Aditya Gupta and Adrià Garriga-Alonso and Agnieszka Kluska and Aitor Lewkowycz and Akshat Agarwal and Alethea Power and Alex Ray and Alex Warstadt and Alexander W. Kocurek and Ali Safaya and Ali Tazarv and Alice Xiang and Alicia Parrish and Allen Nie and Aman Hussain and Amanda Askell and Amanda Dsouza and Ameet Rahane and Anantharaman S. Iyer and Anders Andreassen and Andrea Santilli and Andreas Stuhlmüller and Andrew Dai and Andrew La and Andrew Lampinen and Andy Zou and Angela Jiang and Angelica Chen and Anh Vuong and Animesh Gupta and Anna Gottardi and Antonio Norelli and Anu Venkatesh and Arash Gholamidavoodi and Arfa Tabassum and Arul Menezes and Arun Kirubarajan and Asher Mullokandov and Ashish Sabharwal and Austin Herrick and Avia Efrat and Aykut Erdem and Ayla Karakaş and B. Ryan Roberts and Bao Sheng Loe and Barret Zoph and Bartłomiej Bojanowski and Batuhan Özyurt and Behnam Hedayatnia and Behnam Neyshabur and Benjamin Inden and Benno Stein and Berk Ekmekci and Bill Yuchen Lin and Blake Howald and Cameron Diao and Cameron Dour and Catherine Stinson and Cedrick Argueta and César Ferri Ramírez and Chandan Singh and Charles Rathkopf and Chenlin Meng and Chitta Baral and Chiyu Wu and Chris Callison-Burch and Chris Waites and Christian Voigt and Christopher D. Manning and Christopher Potts and Cindy Ramirez and Clara E. 
Rivera and Clemencia Siro and Colin Raffel and Courtney Ashcraft and Cristina Garbacea and Damien Sileo and Dan Garrette and Dan Hendrycks and Dan Kilman and Dan Roth and Daniel Freeman and Daniel Khashabi and Daniel Levy and Daniel Moseguí González and Danny Hernandez and Danqi Chen and Daphne Ippolito and Dar Gilboa and David Dohan and David Drakard and David Jurgens and Debajyoti Datta and Deep Ganguli and Denis Emelin and Denis Kleyko and Deniz Yuret and Derek Chen and Derek Tam and Dieuwke Hupkes and Diganta Misra and Dilyar Buzan and Dimitri Coelho Mollo and Diyi Yang and Dong-Ho Lee and Ekaterina Shutova and Ekin Dogus Cubuk and Elad Segal and Eleanor Hagerman and Elizabeth Barnes and Elizabeth Donoway and Ellie Pavlick and Emanuele Rodola and Emma Lam and Eric Chu and Eric Tang and Erkut Erdem and Ernie Chang and Ethan A. Chi and Ethan Dyer and Ethan Jerzak and Ethan Kim and Eunice Engefu Manyasi and Evgenii Zheltonozhskii and Fanyue Xia and Fatemeh Siar and Fernando Martínez-Plumed and Francesca Happé and Francois Chollet and Frieda Rong and Gaurav Mishra and Genta Indra Winata and Gerard de Melo and Germán Kruszewski and Giambattista Parascandolo and Giorgio Mariani and Gloria Wang and Gonzalo Jaimovitch-López and Gregor Betz and Guy Gur-Ari and Hana Galijasevic and Hannah Kim and Hannah Rashkin and Hannaneh Hajishirzi and Harsh Mehta and Hayden Bogar and Henry Shevlin and Hinrich Schütze and Hiromu Yakura and Hongming Zhang and Hugh Mee Wong and Ian Ng and Isaac Noble and Jaap Jumelet and Jack Geissinger and Jackson Kernion and Jacob Hilton and Jaehoon Lee and Jaime Fernández Fisac and James B. 
Simon and James Koppel and James Zheng and James Zou and Jan Kocoń and Jana Thompson and Jared Kaplan and Jarema Radom and Jascha Sohl-Dickstein and Jason Phang and Jason Wei and Jason Yosinski and Jekaterina Novikova and Jelle Bosscher and Jennifer Marsh and Jeremy Kim and Jeroen Taal and Jesse Engel and Jesujoba Alabi and Jiacheng Xu and Jiaming Song and Jillian Tang and Joan Waweru and John Burden and John Miller and John U. Balis and Jonathan Berant and Jörg Frohberg and Jos Rozen and Jose Hernandez-Orallo and Joseph Boudeman and Joseph Jones and Joshua B. Tenenbaum and Joshua S. Rule and Joyce Chua and Kamil Kanclerz and Karen Livescu and Karl Krauth and Karthik Gopalakrishnan and Katerina Ignatyeva and Katja Markert and Kaustubh D. Dhole and Kevin Gimpel and Kevin Omondi and Kory Mathewson and Kristen Chiafullo and Ksenia Shkaruta and Kumar Shridhar and Kyle McDonell and Kyle Richardson and Laria Reynolds and Leo Gao and Li Zhang and Liam Dugan and Lianhui Qin and Lidia Contreras-Ochando and Louis-Philippe Morency and Luca Moschella and Lucas Lam and Lucy Noble and Ludwig Schmidt and Luheng He and Luis Oliveros Colón and Luke Metz and Lütfi Kerem Şenel and Maarten Bosma and Maarten Sap and Maartje ter Hoeve and Madotto Andrea and Maheen Farooqi and Manaal Faruqui and Mantas Mazeika and Marco Baturan and Marco Marelli and Marco Maru and Maria Jose Ramírez Quintana and Marie Tolkiehn and Mario Giulianelli and Martha Lewis and Martin Potthast and Matthew L. Leavitt and Matthias Hagen and Mátyás Schubert and Medina Orduna Baitemirova and Melody Arnaud and Melvin McElrath and Michael A. 
Yee and Michael Cohen and Michael Gu and Michael Ivanitskiy and Michael Starritt and Michael Strube and Michał Swędrowski and Michele Bevilacqua and Michihiro Yasunaga and Mihir Kale and Mike Cain and Mimee Xu and Mirac Suzgun and Mo Tiwari and Mohit Bansal and Moin Aminnaseri and Mor Geva and Mozhdeh Gheini and Mukund Varma T and Nanyun Peng and Nathan Chi and Nayeon Lee and Neta Gur-Ari Krakover and Nicholas Cameron and Nicholas Roberts and Nick Doiron and Nikita Nangia and Niklas Deckers and Niklas Muennighoff and Nitish Shirish Keskar and Niveditha S. Iyer and Noah Constant and Noah Fiedel and Nuan Wen and Oliver Zhang and Omar Agha and Omar Elbaghdadi and Omer Levy and Owain Evans and Pablo Antonio Moreno Casares and Parth Doshi and Pascale Fung and Paul Pu Liang and Paul Vicol and Pegah Alipoormolabashi and Peiyuan Liao and Percy Liang and Peter Chang and Peter Eckersley and Phu Mon Htut and Pinyu Hwang and Piotr Miłkowski and Piyush Patil and Pouya Pezeshkpour and Priti Oli and Qiaozhu Mei and Qing Lyu and Qinlang Chen and Rabin Banjade and Rachel Etta Rudolph and Raefer Gabriel and Rahel Habacker and Ramón Risco Delgado and Raphaël Millière and Rhythm Garg and Richard Barnes and Rif A. Saurous and Riku Arakawa and Robbe Raymaekers and Robert Frank and Rohan Sikand and Roman Novak and Roman Sitelew and Ronan LeBras and Rosanne Liu and Rowan Jacobs and Rui Zhang and Ruslan Salakhutdinov and Ryan Chi and Ryan Lee and Ryan Stovall and Ryan Teehan and Rylan Yang and Sahib Singh and Saif M. Mohammad and Sajant Anand and Sam Dillavou and Sam Shleifer and Sam Wiseman and Samuel Gruetter and Samuel R. Bowman and Samuel S. Schoenholz and Sanghyun Han and Sanjeev Kwatra and Sarah A. 
Rous and Sarik Ghazarian and Sayan Ghosh and Sean Casey and Sebastian Bischoff and Sebastian Gehrmann and Sebastian Schuster and Sepideh Sadeghi and Shadi Hamdan and Sharon Zhou and Shashank Srivastava and Sherry Shi and Shikhar Singh and Shima Asaadi and Shixiang Shane Gu and Shubh Pachchigar and Shubham Toshniwal and Shyam Upadhyay and Shyamolima Debnath and Siamak Shakeri and Simon Thormeyer and Simone Melzi and Siva Reddy and Sneha Priscilla Makini and Soo-Hwan Lee and Spencer Torene and Sriharsha Hatwar and Stanislas Dehaene and Stefan Divic and Stefano Ermon and Stella Biderman and Stephanie Lin and Stephen Prasad and Steven T. Piantadosi and Stuart M. Shieber and Summer Misherghi and Svetlana Kiritchenko and Swaroop Mishra and Tal Linzen and Tal Schuster and Tao Li and Tao Yu and Tariq Ali and Tatsu Hashimoto and Te-Lin Wu and Théo Desbordes and Theodore Rothschild and Thomas Phan and Tianle Wang and Tiberius Nkinyili and Timo Schick and Timofei Kornev and Timothy Telleen-Lawton and Titus Tunduny and Tobias Gerstenberg and Trenton Chang and Trishala Neeraj and Tushar Khot and Tyler Shultz and Uri Shaham and Vedant Misra and Vera Demberg and Victoria Nyamai and Vikas Raunak and Vinay Ramasesh and Vinay Uday Prabhu and Vishakh Padmakumar and Vivek Srikumar and William Fedus and William Saunders and William Zhang and Wout Vossen and Xiang Ren and Xiaoyu Tong and Xinyi Wu and Xudong Shen and Yadollah Yaghoobzadeh and Yair Lakretz and Yangqiu Song and Yasaman Bahri and Yejin Choi and Yichi Yang and Yiding Hao and Yifu Chen and Yonatan Belinkov and Yu Hou and Yufang Hou and Yuntao Bai and Zachary Seid and Zhao Xinran and Zhuoye Zhao and Zijian Wang and Zijie J. Wang and Zirui Wang and Ziyi Wu},
year = {2022},
journal = {arXiv preprint arXiv: Arxiv-2206.04615}
}
Tense task

On the low-shot transferability of [V]-Mamba   New!
Diganta Misra*, Jay Gala*, Antonio Orvieto

PiV @ CVPR, 2024
paper / abstract / bibtex

The strength of modern large-scale neural networks lies in their ability to efficiently adapt to new tasks with few examples. Although extensive research has investigated the transferability of Vision Transformers (ViTs) to various downstream tasks under diverse constraints, this study shifts focus to explore the transfer learning potential of [V]-Mamba. We compare its performance with ViTs across different few-shot data budgets and efficient transfer methods. Our analysis yields three key insights into [V]-Mamba's few-shot transfer performance: (a) [V]-Mamba demonstrates superior or equivalent few-shot learning capabilities compared to ViTs when utilizing linear probing (LP) for transfer, (b) Conversely, [V]-Mamba exhibits weaker or similar few-shot learning performance compared to ViTs when employing visual prompting (VP) as the transfer method, and (c) We observe a weak positive correlation between the performance gap in transfer via LP and VP and the scale of the [V]-Mamba model. This preliminary analysis lays the foundation for more comprehensive studies aimed at furthering our understanding of the capabilities of [V]-Mamba variants and their distinctions from ViTs.

@misc{misra2024lowshot, title={On the low-shot transferability of [V]-Mamba}, author={Diganta Misra and Jay Gala and Antonio Orvieto}, year={2024}, eprint={2403.10696}, archivePrefix={arXiv}, primaryClass={cs.CV} }
Mitigating Mode Collapse in Sparse Mixture of Experts   New!
Nizar Islah, Diganta Misra, Timothy Nest, Matthew Riemer

NeurIPS New in ML workshop, 2023
abstract

The recent success of Sparse Mixture-of-Experts (SMoEs) models has sparked renewed interest in routed networks in deep learning. A prominent aspect of the SMoE is the scaling of the number of total parameters in a model, effectively increasing capacity while keeping computation costs similar to dense models. Yet, these models pose optimization challenges as inputs are routed discretely to experts in each layer. Often, a regularization term is added to the loss function to penalize the imbalanced selection of experts. We aim to demonstrate that the heuristic regularization strategies used in recent SMoEs, while successful in some tasks, have significant limitations which we aim to address. In multi-domain or multi-task settings, without explicit knowledge of the task or domain, the network will suffer from a mode collapse-performance tradeoff, in which some experts will receive significantly less training signal, or performance on some tasks will suffer. Second, we derive a theoretical basis of the various routing functions, with entropy-maximization as a common objective. Third, we will demonstrate a first application of Generative Flow Networks (GFlowNets) to SMoEs, with a state, policy, and action space, represented at a particular layer of the model by the input, routing network, and sampling from expert probabilities, respectively. We aim to show that SMoEs trained with the Trajectory Balance objective from GFlowNet literature can achieve competitive performance with state of the art routing methods, such as Switch Transformer, and suffer less from expert collapse in multi-task (NYUv2, Pascal-Context) and multi-domain (Omniglot) settings. This work lays some foundations for further exploration of theoretically motivated approaches to routing in sparse MoEs.
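
As an example of the kind of regularization term the abstract refers to, here is a sketch of the auxiliary load-balancing loss used with top-1 routing in Switch Transformer, written in plain Python. It illustrates the heuristic baseline only, not the GFlowNet-based routing proposed in this work; the function names and the coefficient `alpha` are illustrative.

```python
import math


def softmax(logits):
    # Numerically stable softmax over one token's router logits.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits, alpha=1.0):
    # router_logits: one list of per-expert logits for each token.
    # loss = alpha * N * sum_i f_i * P_i, where f_i is the fraction of tokens
    # whose top-1 expert is i and P_i is the mean router probability of i.
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    fraction = [0.0] * num_experts
    mean_prob = [0.0] * num_experts
    for logits in router_logits:
        probs = softmax(logits)
        top = max(range(num_experts), key=lambda e: probs[e])
        fraction[top] += 1.0 / num_tokens
        for e in range(num_experts):
            mean_prob[e] += probs[e] / num_tokens
    return alpha * num_experts * sum(f * p for f, p in zip(fraction, mean_prob))


if __name__ == "__main__":
    balanced = [[2.0, 0.0], [0.0, 2.0]]   # tokens split evenly across experts
    collapsed = [[2.0, 0.0], [2.0, 0.0]]  # every token picks expert 0
    print(load_balancing_loss(balanced), load_balancing_loss(collapsed))
```

The loss is smallest when tokens and router probability mass are spread evenly across experts, and it grows as routing collapses onto a few of them.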

Knowing Your Nonlinearities: Shapley Interactions Reveal the Underlying Structure of Data   New!
Divyansh Singhvi, Andrej Erkelens, Raghav Jain, Diganta Misra, Naomi Saphra

NeurIPS ATTRIB workshop, 2023
paper / abstract / bibtex

Twitter thread

Measuring nonlinear feature interaction is an established approach to understanding complex patterns of attribution in many models. In this paper, we use Shapley Taylor interaction indices (STII) to analyze the impact of underlying data structure on model representations in a variety of modalities, tasks, and architectures. Considering linguistic structure in masked and auto-regressive language models (MLMs and ALMs), we find that STII increases within idiomatic expressions and that MLMs scale STII with syntactic distance, relying more on syntax in their nonlinear structure than ALMs do. Our speech model findings reflect the phonetic principle that the openness of the oral cavity determines how much a phoneme varies based on its context. Finally, we study image classifiers and illustrate that feature interactions intuitively reflect object boundaries. Our wide range of results illustrates the benefits of interdisciplinary work and domain expertise in interpretability research.

@misc{singhvi2024knowing, title={Knowing Your Nonlinearities: Shapley Interactions Reveal the Underlying Structure of Data}, author={Divyansh Singhvi and Andrej Erkelens and Raghav Jain and Diganta Misra and Naomi Saphra}, year={2024}, eprint={2403.13106}, archivePrefix={arXiv}, primaryClass={cs.LG} }

Pruning CodeBERT for Improved Code-to-Text Efficiency   New!
Alex Gu, Ria Sonecha, Saaketh Vedantam, Bharat Runwal, Diganta Misra

SNN workshop at ICLR, 2023
abstract

The size and prevalence of large language models (LLMs) make them an apt target for model compression. Most LLMs consist of a Transformer encoder and decoder, which each have 6 to 12 layers of multiheaded self-attention blocks, along with fully connected layers. This results in a large number of parameters, making them quite expensive to train and query. Our work focuses on finding techniques to prune CodeBERT, a specific LLM trained to work multimodally between text and code. We explore the effects of structured and unstructured magnitude pruning on the encoder layers of CodeBERT, evaluating on the task of generating natural language comments from a piece of Ruby code.

poster / paper
SPIRIT: Zero Shot Information Retrieval Domain Transfer with Soft Prompts
Ethan Kim, Diganta Misra

Preprint
paper / abstract

Dense information retrieval yields strong in-domain performance, but often struggles with out-of-domain generalization, lagging behind unsupervised methods. Retrieval tasks can vary across a number of dimensions including domain, query intent, and language. Using a single dense retrieval model for all tasks often underperforms lexical methods such as BM25. For practical information retrieval systems, it is expensive to deploy a different model for each task. Therefore, our motivation is to develop a cheap and effective information retrieval model that maintains strong performance across different domains while easily adapting to any new domain. Other approaches to domain transfer in information retrieval rely on large auxiliary language models or datasets and create a separate model for each task. In this work, we develop a method utilizing prompt tuning to efficiently adapt dense retrievers with a minimal amount of additional computation. By combining models trained on a variety of different domains, we can effectively boost performance on a target task in a new domain. Specifically, we train dense retrieval models using prompt tuning on a large number of information retrieval tasks across diverse domains and types of query intents. To adapt to a new domain, we create new prompt embeddings by averaging the prompt embeddings from a set of source tasks selected in an unsupervised manner. We evaluate zero-shot transfer performance across a wide variety of information retrieval domains and show competitive performance while leveraging a minimal amount of compute. Notably, our SPIRIT method is extremely lightweight and practical to deploy in production.

Genetic Algorithm Optimized Inkjet Printed Electromagnetic Absorber on Paper Substrate
Diganta Misra, Rahul Pelluri, Vijay Kumar Verma, Bhargav Appasani, Nisha Gupta

IEEE AESPC, 2018
paper / abstract / bibtex

Electromagnetic absorbers based on printable electronics are receiving increasing attention from the electromagnetic community because of their unprecedented advantages. This paper presents the design of printable electromagnetic absorbers for the X band. The design of the absorber is optimized using the Genetic Algorithm (GA) to enhance the absorptivity and the absorption bandwidth. The design involves the placement of several square-shaped patches of conductive ink at optimal locations on the paper substrate such that the desired absorption characteristics are obtained. Simulations are carried out using the HFSS simulation software. The optimized structure offers an absorptivity of more than 90% in the X band, thereby proving to be a viable solution for stealth applications.

@inproceedings{misra2018genetic,
title={Genetic Algorithm Optimized Inkjet Printed Electromagnetic Absorber on Paper Substrate},
author={Misra, Diganta and Pelluri, Rahul and Verma, Vijay Kumar and Appasani, Bhargav and Gupta, Nisha},
booktitle={2018 International Conference on Applied Electromagnetics, Signal Processing and Communication (AESPC)},
volume={1},
pages={1--3},
year={2018},
organization={IEEE}
}
Large-Scale Meta-Analysis of Genes Encoding Pattern in Wilson’s Disease   (Best Paper Award)
Diganta Misra, Anurag Tiwari, Amrita Chaturvedi

Springer IC4S, 2018
paper / abstract / bibtex

In this paper, we propose an unsupervised learning approach with an objective to understand gene expressions for analysis of Wilson’s disease in the liver of Mus musculus organisms. We proceeded to obtain the best parameters for cluster division to correctly classify gene expression sets so as to capture the effect and characteristics of the disease in the genome levels of the organisms in the best possible way. The clustering proved beneficial in capturing the correct genetic analogy of Wilson’s disease. Analytical experiments were carried out using various clustering algorithms and were evaluated using performance metrics including silhouette score analysis and Calinski–Harabasz index.

@inproceedings{misra2019large, title={Large-Scale Meta-Analysis of Genes Encoding Pattern in Wilson’s Disease}, author={Misra, Diganta and Tiwari, Anurag and Chaturvedi, Amrita}, booktitle={Advances in Computer Communication and Computational Sciences: Proceedings of IC4S 2018}, pages={389--400}, year={2019}, organization={Springer} }
Convoluted Cosmos: Classifying Galaxy Images Using Deep Learning
Diganta Misra, Sachi Nandan Mohanty, Mohit Agarwal, Suneet K Gupta

Springer ICDMAI, 2019 (Proceedings of the AISC)
paper / abstract / bibtex

In this paper, a deep learning-based approach has been developed to classify the images of galaxies into three major categories, namely, elliptical, spiral, and irregular. The classifier successfully classified the images with an accuracy of 97.3958%, which outperformed conventional classifiers like Support Vector Machine and Naive Bayes. The convolutional neural network architecture involves one input convolution layer having 16 filters, followed by 4 hidden layers, 1 penultimate dense layer, and an output Softmax layer. The model was trained on 4614 images for 200 epochs using NVIDIA-DGX-1 Tesla-V100 Supercomputer machine and was subsequently tested on new images to evaluate its robustness and accuracy.

@incollection{misra2020convoluted,
title={Convoluted cosmos: classifying galaxy images using deep learning},
author={Misra, Diganta and Mohanty, Sachi Nandan and Agarwal, Mohit and Gupta, Suneet K},
booktitle={Data Management, Analytics and Innovation},
pages={569--579},
year={2020},
publisher={Springer}
}
Research in progress
Dynamic Sparse Upcycling
Diganta Misra, Mohammed Muqeeth (UNC Chapel Hill), Nizar Islah (Mila), Shiwei Liu (VITA, UT-Austin), Zhangyang "Atlas" Wang (VITA, UT-Austin), Conglong Li (Microsoft)

Open Source Frameworks & Projects
Avalanche: an End-to-End Library for Continual Learning
Dec'20 - Present

I am a lead maintainer of Avalanche's Reproducible Continual Learning framework and also work on Avalanche's evaluation framework, mainly on integrating the Weights & Biases API.

Echo
Jun'19 - Present

Echo is an OSS deep learning package with support for TensorFlow, PyTorch and MegEngine, containing novel validated methods, components and building blocks used in deep learning.

Evonorm
Apr'20

Created the most popular open source reimplementation of Evolving Normalization-Activation Layers by Liu et al.

ECANets
Jan'21

Reproduced the CVPR 2020 paper: ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks for the ML Reproducibility Challenge 2020. Integrated with Weights & Biases.

Big Bench
Aug'21

Our fine-grained tense modification task was accepted to Google's Big Bench for testing large LMs. In collaboration with Mukund Varma T.

Aurora-M
April 2023 - Present

I currently lead the modelling effort for "Multi-Domain Expert Layers (MDEL) Training: How to increase knowledge without breaking the bank?", a collaborative effort coordinated by Ontocord AI. My team works on different aspects of architecture design and training of the MDEL model on the SUMMIT supercomputer cluster as part of the INCITE allocation.

HuggingFace report



   Education

ELLIS PhD Fellow, September 2024 (Upcoming)
Max Planck Institute for Intelligent Systems (MPI-IS) + ELLIS Institute Tübingen + Tübingen AI Center
Advisor: Antonio Orvieto
Tübingen, Germany

Master's in Machine Learning, September 2021 - Present
Montréal Institute for Learning Algorithms (Mila)
Advisor: Professor Irina Rish
Montréal, Canada

Master's in Computer Science (MSc CS), September 2021 - Present
University of Montréal
Advisor: Professor Irina Rish
Montréal, Canada

Bachelor of Technology (B.Tech) in EEE, Jun. 2016 - May 2020
Kalinga Institute of Industrial Technology (KIIT)
Advisor: Asst. Prof. Dr. Bhargav Appasani
Bhubaneswar, India

   Internships and Exchange Programs

Data Science Intern, Jun. 2018 - Feb. 2019
CSIR-CDRI

During this internship, I helped build the analytical pipeline and worked on data collection, pre-processing, cleaning, geospatial analysis and documentation for a project on understanding the demographics of Venture Capital and Early Seed Investments. As part of a team of three, I was advised and mentored by Dr. Sukant Khurana.

Remote

Summer Intern, May 2018 - Jun. 2018
IIT-Kharagpur

Studied basic algorithmic techniques using functional programming languages - Lisp and Prolog under the guidance of Assc. Prof. Pawan Kumar.

Kharagpur, India

Summer Exchange Intern, Jun. 2017 - Aug. 2017
Bangkok University

Served as a primary instructor for cultural engagements and taught basic English and computer science to primary grade students at RangsonWittaya School, Nakhon Sawan under the AIESEC SDG #4 programme. Also took part in cultural exchange, entrepreneurship and social service programs at Bangkok University.

Bangkok, Thailand

Initiatives and Academic Services
NeuroMatch Academy

I was responsible for developing the content for the Strategies section in the Continual Learning lecture of the Deep Learning Cohort of Neuromatch Academy 2021.
W&B ML Reproducibility Challenge

I was the lead organizer of the W&B MLRC 2021 where I actively supported our challenge participants. Our mission of organizing this challenge was to make machine learning research reproducible, transparent and accessible to everyone. This initiative was also supported by our W&B MLRC Grant of $500 for each participant.
INF8225: Probabilistic Learning

I was a teaching assistant for the INF8225: Probabilistic Learning course at Polytechnique University taught by Christopher J. Pal for the Winter 2022 semester.
INF8245E: Machine Learning

I was a teaching assistant for the INF8245E: Machine Learning course at Polytechnique University taught by Sarath Chandar for the Fall 2023 semester.
IFT6390: Machine Learning

I was a teaching assistant for the IFT6390: Machine Learning course at UdeM taught by Ioannis Mitliagkas for the Fall 2023 semester.
IFT6760A: Towards AGI: Scaling, Emergence, Alignment

I am a teaching assistant for the IFT6760A: Towards AGI: Scaling, Emergence, Alignment course at UdeM taught by Irina Rish for the Winter 2024 semester.
Deep Learning Theory Reading Group, Mila

I was an organizer of the DL Theory Reading Group at Mila, Montreal.
Mila 2022 Entrepreneurs Cohort Program

I was selected as one of the entrepreneurs in residence and pitched my startup idea called 9CLEF (Elevator Pitch).
Mila 2022 TRAIL

I was selected as one of the first students in the Trustworthy Responsible AI Learning certificate (TRAIL) program at Mila, Montreal. (Certificate)
McMedHacks 2023

I was selected as one of the mentors of McMedHacks 2023 and will be mentoring the participants on the topic of AI in Healthcare.
NewinML workshop @ NeurIPS, 2023

I am serving as an organizer, Program Chair (PC) and Area Chair (AC) for the NewinML workshop at NeurIPS 2023.
Volunteer @ NeurIPS, 2023

I will be serving as a volunteer at the NeurIPS, 2023 conference.
Served as a Reviewer/ Program Committee member for:

ECCV 2024 (R), CVPR 2024 (R), Workshop on Data-centric Machine Learning Research (DMLR): Harnessing Momentum for Science @ ICLR, 2024 (R), Workshop on Secure and Trustworthy Large Language Models @ ICLR, 2024 (R), TMLR (R), Conference on Lifelong Learning Agents (CoLLA) 2022 (PC), Conference on Lifelong Learning Agents (CoLLA) 2023, 2024 (R), ICASSP 2023 (R) (Certificate), Efficient Systems for Foundation Models workshop at ICLR 2023 (R), Continual Learning AI Un-Conference (R), Springer Soft Computing (R).
Achievements
(Complete list available upon request)
Quebec Ministry of Higher Education International Students Scholarship
2022


I was awarded the DIRO x Quebec Ministry of Higher Education international students scholarship worth CAD$4,000 for the academic year 2022.
Quebec Ministry of Higher Education International Students Scholarship
2023


I was awarded the DIRO x Quebec Ministry of Higher Education international students scholarship worth CAD$3,000 for the academic year 2023.
UNIQUE AI Excellence Scholarship
2022


I was awarded the UNIQUE AI Excellence Scholarship worth CAD$10,000 for the academic year 2022. Under this scholarship, I will be working with Irina Rish and Pouya Bashivan on dynamic sparsity based research.
PaperswithCode Top Contributor award
2022


I was awarded the PaperswithCode Top Contributor award for the academic year 2022.
MILA Entrepreneurs Grant
2022


I was awarded the MILA Entrepreneurs Grant worth CAD$5,000 to pursue my startup venture 9CLEF (Elevator Pitch) and build an early prototype.
AMII AI Week Travel Bursary
2022


I was awarded the AMII AI Week 2022 Student Travel Bursary worth CAD$1,500.

Updated on: 1st April, 2024
Thank you, Jon Barron!