Machine Learning Algorithms that Make Sense

for constrained and large-scale societal applications in Advertising, Healthcare, and Sustainability (Remote Sensing, Computing, Agriculture)...

MAIL stands for the Practical Machine Learning and AI Lab, led by Dr. Khoa D. Doan.

We aim to develop simple, reliable, and secure computational frameworks that make existing complex/deep models usable in practical applications. Our lifelong passion is to deploy these algorithms in low-resource and low-income communities, catalyzing equitable access to ML, helping to democratize knowledge, and improving the quality of life of their citizens.

Selected Press Coverage: khoahocphattrien, Thanh Nien, VnExpress, BaoDauTu, DanTri, Vietnam.vn, Yahoo Finance, Benzinga, Macau Business, Taiwan News, TNGlobal, VinGroup

Research Interests

Our research seeks answers to the following question: are existing ML models, and more recently multimodal LLMs (MLLMs), simple, reliable, and secure enough for practical applications? Here, simplicity, given the context of the model deployment, refers to the ability to (i) feasibly build or adapt the method, (ii) execute the deployed model efficiently, and (iii) evolve the deployed model with little effort. Reliability measures (i) how well the model understands real-world contexts (e.g., spatial and multi-lingual/cultural) and solves the intended task, and (ii) how much of this performance is preserved under frequently perturbed, practical environments such as data corruptions or distributional changes. Finally, security refers to the model’s resilience against various forms of security vulnerabilities, such as exploratory and causative attacks.

Although many ML methods are reliable in ideal and high-resource settings, we believe that they remain fragile and difficult to deploy under real-world constraints – often failing to maintain their performance, adaptability, and usability. Addressing this gap is central to our research, as it will enable us to realize the true potential of ML in practice – translating methodological progress into tangible societal impact, particularly for low-resource and underserved communities that have yet to fully benefit from these advancements.

Our current research scope falls roughly into four areas: (1) understanding security vulnerabilities, (2) assessing modern ML systems in real-world conditions, (3) mechanistic interpretation and theoretical understanding of real-world failure modes, and (4) efficient control of models (during training and inference) for practical use.

Robust and Secure Machine Learning

We first investigate the security and reliability issues of existing ML models under various forms of attacks – including causative attacks (such as trojan/backdoor attacks) and exploratory attacks (such as adversarial examples) – as well as test-time distributional changes. We then develop suitable countermeasures and mitigation approaches to ensure safe and effective deployment in real-world scenarios.

  • Clean-Label Physical Backdoor Attacks with Data Distillation (AAAI 2026 by Dao et al.)
  • Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks (ICLR 2025 by Nguyen et al.)
  • Overcoming Catastrophic Forgetting in Federated Class-Incremental Learning via Federated Global Twin Generator (2024 by Nguyen et al.)
  • Estimating Uncertainties of Multimodal Models with Missing Modalities (2024 by Nguyen et al.)
  • Non-Cooperative Backdoor Attacks in Federated Learning: A New Threat Landscape (2024 by Nguyen et al.)
  • Everyone Can Attack: Repurpose Lossy Compression as a Natural Backdoor Attack (2024 by Yang et al.)
  • Synthesizing Physical Backdoor Datasets: An Automated Framework Leveraging Deep Generative Models (2024 by Yang et al.)
  • Flatness-aware Sequential Learning Generates Resilient Backdoors (ECCV ORAL 2024 by Pham et al.)
  • Data Poisoning Quantization Backdoor Attack (ECCV 2024 by Huynh et al.)
  • Composite Concept Extraction through Backdooring (ICPR* 2024 by Ghosh et al.)
  • Fooling the Textual Fooler via Randomizing Latent Representations (ACL 2024 by Hoang et al.)
  • Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks (ICLR 2024 by Nguyen et al.)
  • Backdoor Attacks and Defenses in Federated Learning: Survey, Challenges and Future Research Directions (EAAI 2024 by Nguyen et al.)
  • IBA: Towards Irreversible Backdoor Attacks in Federated Learning (NeurIPS 2023 by Nguyen et al.)
  • A Cosine Similarity-based Method for Out-of-Distribution Detection (ICML-W 2023 by Nguyen et al.)
  • Defending Backdoor Attacks on Vision Transformer via Patch Processing (AAAI 2023 by Doan et al.)
  • Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class (NeurIPS 2022 by Doan et al.)
  • Backdoor Attack with Imperceptible Input and Latent Modification (NeurIPS 2021 by Doan et al.)
  • LIRA: Learnable, Imperceptible and Robust Backdoor Attacks (ICCV 2021 by Doan et al.)
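As a toy illustration of the causative threat model studied above, the sketch below stamps a fixed pixel trigger onto a small fraction of training samples and relabels them with an attacker-chosen target class: a basic dirty-label poisoning step, whereas the papers above study far subtler clean-label, physical, and learnable triggers. All names here (`TRIGGER`, `stamp_trigger`, `poison`) are hypothetical, not from any of the listed works.

```python
# Toy dirty-label backdoor poisoning sketch (illustrative only).
# Images are 4x4 grayscale grids represented as nested lists of floats.

TRIGGER = [(3, 3), (3, 2), (2, 3)]  # a small bottom-right corner patch

def stamp_trigger(image, value=1.0):
    """Return a copy of `image` with the trigger pixels set to `value`."""
    patched = [row[:] for row in image]
    for r, c in TRIGGER:
        patched[r][c] = value
    return patched

def poison(dataset, target_label, rate=0.1):
    """Stamp the trigger on the first `rate` fraction of (image, label)
    samples and flip their labels to the attacker's `target_label`."""
    n_poison = int(len(dataset) * rate)
    poisoned = []
    for i, (image, label) in enumerate(dataset):
        if i < n_poison:
            poisoned.append((stamp_trigger(image), target_label))
        else:
            poisoned.append((image, label))
    return poisoned
```

A model trained on such a set behaves normally on clean inputs but predicts `target_label` whenever the trigger appears; defenses like those above aim to detect or neutralize exactly this hidden behavior.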

Practical Information Retrieval

We aim to develop retrieval models, especially hashing/quantization techniques, that (i) can be trained efficiently on large-scale data, (ii) can make inference decisions in real time, (iii) can generalize well with limited labeled data, and (iv) remain robust at inference time, e.g., to out-of-distribution inputs and missing data.

  • Cooperative Learning of Multipurpose Descriptor and Contrastive Pair Generator via Variational MCMC Teaching for Supervised Image Hashing (2024 by Doan et al.)
  • Asymmetric Hashing for Fast Ranking via Neural Network Measures (SIGIR 2023 by Doan et al.)
  • One Loss for Quantization: Deep Hashing with Discrete Wasserstein Distributional Matching (CVPR 2022 by Doan et al.)
  • Generative Hashing Network (ACCV 2022 by Doan et al.)
  • Interpretable Graph Similarity Computation via Differentiable Optimal Alignment of Node Embeddings (SIGIR 2021 by Doan et al.)
  • Image Hashing by Minimizing Discrete Component-wise Wasserstein Distance (2021 by Doan et al.)
  • Efficient Implicit Unsupervised Text Hashing using Adversarial Autoencoder (WWW 2020 by Doan et al.)
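To illustrate why binary hash codes enable real-time retrieval, here is a minimal LSH-style sketch: items are hashed to bits via random hyperplanes and candidates are ranked by Hamming distance, which is cheap to compute at scale. The works above instead learn the codes with deep networks (e.g., via distributional matching); the function names below are hypothetical.

```python
# Minimal sketch of retrieval with binary hash codes (illustrative only;
# fixed random projections stand in for a learned deep hashing model).
import random

def random_hash(vector, planes):
    """Hash a real-valued vector to bits via random hyperplane signs."""
    return tuple(int(sum(v * p for v, p in zip(vector, plane)) >= 0)
                 for plane in planes)

def hamming(a, b):
    """Number of differing bits between two equal-length codes."""
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_code, database_codes, k=3):
    """Return indices of the k database codes closest in Hamming distance."""
    ranked = sorted(range(len(database_codes)),
                    key=lambda i: hamming(query_code, database_codes[i]))
    return ranked[:k]
```

Because the codes are short bit strings, ranking millions of items reduces to XOR-and-popcount operations, which is what makes real-time inference feasible.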

Generative Modeling and its Applications

We aim to (i) study and understand the characteristics and principles behind generative models, including generative adversarial networks, energy-based models, and diffusion models, and then (ii) develop robust, data-efficient, and/or secure generative predictive frameworks, focusing on accelerating security and reliability research in practical applications.

  • Unveiling Concept Attribution in Diffusion Models (2024 by Nguyen et al.)
  • Sparse Watermarking in LLMs with Enhanced Text Quality (2024 by Hoang et al.)
  • Fair Generation in LLMs with RAG (Chu et al.)
  • Reward Over-optimization in Direct Alignment Algorithms with Adaptive Learning (2024 by Nguyen et al.)
  • Predictive Concept Attribution in Diffusion Models (by Nguyen et al.)
  • Synthesizing Physical Backdoor Datasets: An Automated Framework Leveraging Deep Generative Models (2024 by Yang et al.)
  • Image Generation Via Minimizing Frechet Distance in Discriminator Feature Space (2021 by Doan et al.)
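As a worked example of the Fréchet-distance idea behind the last paper above (and FID-style evaluation of generative models more broadly), the distance between two Gaussian fits reduces in 1-D to (μ₁ − μ₂)² + (σ₁ − σ₂)². The helper below is a hypothetical illustration of that formula, not code from the paper.

```python
# 1-D Frechet distance between the Gaussian fits of two samples
# (illustrative; FID applies the same idea in a feature space).
import math

def frechet_1d(samples_a, samples_b):
    """Frechet distance between Gaussian fits of two 1-D samples:
    (mu_a - mu_b)^2 + (sd_a - sd_b)^2."""
    def stats(xs):
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        return mu, math.sqrt(var)
    mu_a, sd_a = stats(samples_a)
    mu_b, sd_b = stats(samples_b)
    return (mu_a - mu_b) ** 2 + (sd_a - sd_b) ** 2
```

Identical samples give distance zero; the further apart the generated and real distributions drift in mean or spread, the larger the distance, which is why minimizing it in a discriminator's feature space drives the generator toward the data distribution.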

Low-resource Machine Learning

One of our lifelong passions is to catalyze equitable access to AI tools and approaches in low-resource and low-income communities, starting with the development of high-quality, culture-aware benchmarking systems and ending with suitable solutions for adapting existing ML models with satisfactory performance and efficient resource utilization. This effort helps democratize knowledge in these communities and significantly improve the quality of life of their citizens.

Consequently, a large part of our research is now devoted to solving various societal challenges with AI in low-resource communities. We’re working on problems such as low-resource NLP, low-resource remote-sensing predictive algorithms, cross-cultural language understanding, and visual question answering algorithms for the medical domain. For more information, please visit the Center for Environmental Intelligence (CEI, where Prof. Khoa D. Doan is currently the Environment Monitoring Lab Director) and the VinUni-Illinois Smart Health Center (VISHC, where Prof. Khoa D. Doan is currently the Associate Director).

VinUni-Illinois Smart Health Center (VISHC) – VISHC is open to collaboration with all researchers and research/industry institutions in Vietnam and around the world. VISHC aims to solve various healthcare-related challenges through translational and innovative research. Led by Prof. Minh Do and Prof. Helen Nguyen at UIUC, and Prof. Khoa D. Doan at VinUni (who leads MAIL-Research), the VISHC team comprises world-renowned researchers and talented PhD/Master’s students, research assistants, and postdocs. Please reach out via email for any collaboration opportunities.

Center for Environmental Intelligence (CEI) – MAIL-Research is a member of CEI. CEI represents a pioneering initiative at the intersection of advanced technology, environmental science, and interdisciplinary research and is open for collaboration. Led by Prof. Laurent El Ghaoui, CEI aims to address critical global sustainability challenges with innovative approaches based on AI. Please reach out via email for any collaboration opportunities, especially those related to environmental monitoring.

The brick walls are there for a reason. The brick walls are not there to keep us out. The brick walls are there to give us a chance to show how badly we want something -- Randy Pausch