Machine Learning Algorithms that Make Sense
in constrained and large-scale settings, with applications in Advertising, Healthcare, Sustainability (Climate, Computing, Agriculture), Social Good, and more
MAIL stands for the practical Machine Learning and AI Lab, led by Dr. Khoa D. Doan.
Here at MAIL, we develop computational frameworks that make existing complex/deep models more suitable for practical use. We focus on improving the following aspects of existing models: (i) training/inference, (ii) realistic assumptions, (iii) algorithmic robustness, and (iv) efficiency in constrained settings. Most of our ML/AI solutions center on large-scale approaches that have low computational complexity and require less human effort.
Selected Press Coverage: khoahocphattrien, Thanh Nien, VnExpress, BaoDauTu, DanTri, Vietnam.vn, Yahoo Finance, Benzinga, Macau Business, Taiwan News, TNGlobal, VinGroup…
Research Interests
Our research seeks to answer the following question: Are existing ML methods simple to use and reliable in practice? Simplicity, given the context where ML is to be deployed, refers to the ability to (i) feasibly build or implement the method, (ii) execute the deployed model efficiently, and (iii) evolve the deployed model with less effort. Reliability relates to (i) whether we can rely on the model to solve the intended task well, (ii) whether this performance is preserved in the frequently perturbed environments encountered in practice, such as under data corruption or distributional shift, and (iii) whether the model is resilient to (i.e., its performance is not significantly affected by) various forms of security attacks, such as adversarial examples and causative attacks. In this sense, we believe that many existing ML methods, including those built on complex deep neural networks (DNNs), are reliable in ideal, high-resource settings but are not yet reliable and simple to use under real-world constraints. Answering this question will help us truly realize the potential of AI/ML methodology in practice and further advance its societal benefits, especially for the many low-resource and low-income communities that have yet to benefit equally from these advancements.
Our goal, therefore, is to develop computational frameworks that enable existing complex/deep models to be more suitable for practical uses. We focus on studying and improving the following aspects of existing ML models: (i) training, (ii) inference, (iii) realistic assumptions, and (iv) security/robustness. Our current research scope roughly falls into the following areas or themes:
Robust Machine Learning
We first aim to investigate the security and reliability issues of existing ML models under various forms of attacks, including causative attacks (such as trojan/backdoor attacks) and exploratory attacks (such as adversarial examples), as well as under test-time distributional shifts. We then develop suitable countermeasures and mitigation approaches to ensure their safe and effective deployment in real-world scenarios.
- Overcoming Catastrophic Forgetting in Federated Class-Incremental Learning via Federated Global Twin Generator (2024 by Nguyen et al.)
- Estimating Uncertainties of Multimodal Models with Missing Modalities (2024 by Nguyen et al.)
- Non-Cooperative Backdoor Attacks in Federated Learning: A New Threat Landscape (2024 by Nguyen et al.)
- Everyone Can Attack: Repurpose Lossy Compression as a Natural Backdoor Attack (2024 by Yang et al.)
- Synthesizing Physical Backdoor Datasets: An Automated Framework Leveraging Deep Generative Models (2024 by Yang et al.)
- Flatness-aware Sequential Learning Generates Resilient Backdoors (ECCV 2024 by Pham et al.)
- Data Poisoning Quantization Backdoor Attack (ECCV 2024 by Huynh et al.)
- Composite Concept Extraction through Backdooring (ICPR* 2024 by Ghosh et al.)
- Fooling the Textual Fooler via Randomizing Latent Representations (ACL 2024 by Hoang et al.)
- Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks (ICLR 2024 by Nguyen et al.)
- Backdoor Attacks and Defenses in Federated Learning: Survey, Challenges and Future Research Directions (EAAI 2024 by Nguyen et al.)
- IBA: Towards Irreversible Backdoor Attacks in Federated Learning (NeurIPS 2023 by Nguyen et al.)
- A Cosine Similarity-based Method for Out-of-Distribution Detection (ICML-W 2023 by Nguyen et al.)
- Clean-label Backdoor Attacks by Selectively Poisoning with Limited Information from Target Class (NeurIPS-W 2023 by Nguyen et al.)
- Defending Backdoor Attacks on Vision Transformer via Patch Processing (AAAI 2023 by Doan et al.)
- Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class (NeurIPS 2022 by Doan et al.)
- Backdoor Attack with Imperceptible Input and Latent Modification (NeurIPS 2021 by Doan et al.)
- LIRA: Learnable, Imperceptible and Robust Backdoor Attacks (ICCV 2021 by Doan et al.)
Practical Information Retrieval
We aim to develop retrieval models, especially hashing/quantization techniques, that (i) can be trained efficiently on large-scale data, (ii) can make inference decisions in real time, (iii) generalize well with limited labeled data, and (iv) remain robust at inference time, for example to out-of-distribution inputs and missing data.
- Cooperative Learning of Multipurpose Descriptor and Contrastive Pair Generator via Variational MCMC Teaching for Supervised Image Hashing (2024 by Doan et al.)
- Asymmetric Hashing for Fast Ranking via Neural Network Measures (SIGIR 2023 by Doan et al.)
- One Loss for Quantization: Deep Hashing with Discrete Wasserstein Distributional Matching (CVPR 2022 by Doan et al.)
- Generative Hashing Network (ACCV 2022 by Doan et al.)
- Interpretable Graph Similarity Computation via Differentiable Optimal Alignment of Node Embeddings (SIGIR 2021 by Doan et al.)
- Image Hashing by Minimizing Discrete Component-wise Wasserstein Distance (2021 by Doan et al.)
- Efficient Implicit Unsupervised Text Hashing using Adversarial Autoencoder (WWW 2020 by Doan et al.)
Generative Modeling and its Applications
We aim to (i) study and understand the characteristics and principles behind generative models, including generative adversarial networks, energy-based models, and diffusion models, and then (ii) develop robust, data-efficient, and/or secure generative-based predictive frameworks, with a focus on accelerating security and reliability research in practical applications.
- Unveiling Concept Attribution in Diffusion Models (2024 by Nguyen et al.)
- Sparse Watermarking in LLMs with Enhanced Text Quality (2024 by Hoang et al.)
- Fair Generation in LLMs with RAG (Chu et al.)
- Reward Over-optimization in Direct Alignment Algorithms with Adaptive Learning (2024 by Nguyen et al.)
- Predictive Concept Attribution in Diffusion Models (by Nguyen et al.)
- Synthesizing Physical Backdoor Datasets: An Automated Framework Leveraging Deep Generative Models (2024 by Yang et al.)
- Image Generation via Minimizing Fréchet Distance in Discriminator Feature Space (2021 by Doan et al.)
Low-resource Machine Learning
One of our lifelong passions is to catalyze equitable access to AI tools and approaches in low-resource and low-income communities, starting with the development of high-quality, culture-aware benchmarking systems and ending with suitable solutions that adapt existing ML models to achieve satisfactory performance with efficient resource utilization. This effort will help democratize knowledge in these communities and significantly improve the quality of life of their citizens.
Consequently, a large part of our research is now devoted to solving various societal challenges with AI in low-resource communities. We’re working on problems such as low-resource NLP, low-resource remote-sensing predictive algorithms, cross-cultural language understanding, and visual question answering algorithms for the medical domain. For more information, please visit the Center for Environmental Intelligence (CEI, where Prof. Khoa D. Doan is currently the Environment Monitoring Lab Director) and the VinUni-Illinois Smart Health Center (VISHC, where Prof. Khoa D. Doan is currently the Associate Director).
VinUni-Illinois Smart Health Center (VISHC) – VISHC is open to collaboration with all researchers and research/industry institutions in Vietnam and around the world. VISHC aims to solve various healthcare-related challenges through translational and innovative research. Led by Prof. Minh Do and Prof. Helen Nguyen at UIUC, and Prof. Khoa D Doan at VinUni (who leads MAIL-Research), VISHC’s team comprises world-renowned researchers and talented PhD and Master’s students, research assistants, and postdocs. Please reach out via email for any collaboration opportunities.
Center for Environmental Intelligence (CEI) – MAIL-Research is a member of CEI. CEI is a pioneering initiative at the intersection of advanced technology, environmental science, and interdisciplinary research, and is open to collaboration. Led by Prof. Laurent El Ghaoui, CEI aims to address critical global sustainability challenges with innovative AI-based approaches. Please reach out via email for any collaboration opportunities, especially those related to environmental monitoring.