Scientific discovery often begins with the observation of an inconsistency with the "normal" patterns in our data. Detecting such observations, something different and incongruous with the rest of the data, is what we call anomaly detection. Looking for anomalies differs from most other tasks in that we do not know exactly what to look for; we only need to look for something different. The goal of this workshop is to nurture the community of researchers working at the intersection of machine learning and various scientific domains toward scientific discovery.
This workshop will also serve as the award ceremony, with special recognition of the winners, for the 1st HDR Interdisciplinary Machine Learning Challenge on anomaly detection. The challenge is a series of four challenges: one for each of three distinct scientific domains (biology, physics, and climate science) and a combined challenge across domains. A critical element of this challenge was the integration of FAIR and reproducible science.
Our one-day workshop will include keynote and invited talks, contributed paper presentations, a poster session, a panel discussion, and presentations by the winning teams of their challenge solutions.
The intended audience for this workshop includes (a) AI/ML/data science researchers working on topics such as anomaly and novelty detection, out-of-distribution detection, open-world recognition, scientific discovery, and FAIR datasets and reproducible workflows, who are looking for novel interdisciplinary research problems; and (b) domain scientists working on problems amenable to data-driven scientific discovery.
9:00 - 9:10am: Opening Remarks
9:10 - 9:50am: Eric Nalisnick—Anomalous Anomalies: Monitoring and Adapting Anomaly Detectors
Abstract: Anomaly detectors perform essential tasks ranging from protecting an autonomous system from abnormal inputs to isolating interesting scientific signals that may lead to discovery. But how do we know that the anomaly detector itself will be robust? The detector could fail if the distribution of expected anomalies changes, and to make matters worse, we usually don't have an abundance of test-time anomalies labeled as such. In this talk, I will discuss my group's recent work on monitoring whether our anomaly detector is still valid and on adapting it to data shift, without supervision.
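As a purely illustrative aside (this is not the method presented in the talk), the sketch below shows one simple way the monitoring problem can be posed: a deployed detector's anomaly scores on new, unlabeled batches are compared against a reference score distribution with a two-sample Kolmogorov-Smirnov test, flagging when the detector itself may no longer be valid. The detector choice, threshold, and synthetic data are assumptions made only for the example.

```python
# Hypothetical sketch: monitoring a deployed anomaly detector for shift.
# NOT the method from the talk; only a minimal illustration of the monitoring
# problem, in which the anomaly *scores* themselves are tested for drift.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Train a detector on "normal" data and record its reference score distribution.
X_train = rng.normal(size=(2000, 5))
detector = IsolationForest(random_state=0).fit(X_train)
reference_scores = detector.score_samples(X_train)

def monitor(batch, alpha=0.01):
    """Return (anomaly_scores, shift_detected) for a new unlabeled batch."""
    scores = detector.score_samples(batch)
    # Two-sample KS test between deployment-time scores and reference scores.
    _, p_value = ks_2samp(scores, reference_scores)
    return scores, p_value < alpha

# A batch drawn from a shifted distribution should trigger the monitor.
X_shifted = rng.normal(loc=0.7, size=(500, 5))
_, shifted = monitor(X_shifted)
print("shift detected:", shifted)
```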
Bio: Eric Nalisnick is an assistant professor at Johns Hopkins University. His research interests span statistical machine learning and probabilistic modeling, with an emphasis on quantifying uncertainty in deep learning, human-AI collaboration, specifying prior knowledge, and detecting distribution shift. He was previously an assistant professor at the University of Amsterdam, a postdoctoral researcher at the University of Cambridge, and a PhD student at the University of California, Irvine. Eric has also held research positions at DeepMind, Microsoft, Twitter, and Amazon. His papers have been recognized with selective oral presentations (ECCV 2024) and awards (AISTATS 2023, AISTATS 2024).
9:50 - 10:30am: Jennifer Ngadiuba—Boosting sensitivity to new physics at the LHC with anomaly detection
Abstract: Anomaly detection techniques have been proposed as a way to mitigate the impact of model-specific assumptions when searching for new physics at the LHC. In this talk, I will discuss how these techniques, when based on modern AI developments, could be utilized at different stages of the data-processing workflow, from real-time systems to offline analysis, and the impact they could have in revolutionizing the current paradigms in the search for new physics.
Bio: Jennifer Ngadiuba, a Wilson Fellow at Fermilab since 2021, specializes in searching for new physics in collider data and advancing AI applications in high-energy physics. After earning her Ph.D. from the University of Zurich, she contributed to CMS experiment studies, focusing on diboson resonances and jet substructure techniques. Starting when she was a research fellow at CERN and Caltech, she pioneered deep learning for anomaly detection and fast machine learning on FPGAs for real-time systems in particle physics. Her work earned her the DOE AI4HEP award and the AI2050 fellowship in 2023, recognizing her transformative contributions to experimental physics and machine learning.
10:30 - 11:30am: Coffee Break and Poster Session (Papers and Associated Posters)
11:30 - 12:10pm: Adji Bousso Dieng—Vendi Scoring For Discovery
Abstract: This talk will cover the concepts, tools, and methods that make up Vendi Scoring, a new research direction focused on the concept of diversity. I'll begin by introducing the Vendi Scores, a family of diversity metrics rooted in ecology and quantum mechanics, along with their extensions. Next, I'll discuss algorithms for efficiently searching large materials databases and exploring complex energy landscapes, such as those found in molecular simulations, using the Vendi Scores. Finally, I'll introduce the new concept of 'algorithmic microscopy,' which stems from Vendi Scoring, and describe the Vendiscope, the first algorithmic microscope designed to help scientists zoom in on large data collections for data-driven discovery.
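For readers unfamiliar with the metric, the Vendi Score of a collection is the exponential of the Shannon entropy of the eigenvalues of the normalized similarity matrix, i.e., an effective count of distinct items. The minimal sketch below is only an illustration; the RBF similarity and its bandwidth are assumptions for the example, not choices from the talk.

```python
# Minimal sketch of the Vendi Score (Friedman & Dieng, 2023): the exponential
# of the Shannon entropy of the eigenvalues of the normalized similarity matrix.
# The RBF similarity below is an illustrative choice only.
import numpy as np

def vendi_score(X, gamma=1.0):
    """Effective number of distinct items in X under an RBF similarity."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq_dists)          # similarity matrix with K[i, i] = 1
    eigvals = np.linalg.eigvalsh(K / len(X))
    eigvals = eigvals[eigvals > 1e-12]     # drop numerical zeros
    return float(np.exp(-(eigvals * np.log(eigvals)).sum()))

# Duplicated points yield a low score; spread-out points yield a higher one.
rng = np.random.default_rng(0)
print(vendi_score(np.zeros((10, 3))))          # ~1: all items identical
print(vendi_score(rng.normal(size=(10, 3))))   # close to 10: diverse items
```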
Bio: Adji Bousso Dieng is an Assistant Professor of Computer Science at Princeton University, where she leads Vertaix, a lab doing research at the intersection of artificial intelligence and the natural sciences. She is affiliated with the Chemical and Biological Engineering Department, the Princeton Materials Institute, the Princeton Quantum Initiative, the Andlinger Center for Energy and the Environment, and the High Meadows Environmental Institute (HMEI) at Princeton. She is also a Research Scientist at Google AI and the founder and President of the nonprofit The Africa I Know. She has recently been named an Early-Career Distinguished Presenter at the MRS Spring Meeting, one of 10 African Scholars to watch in 2025 by The Africa Report, an Outstanding Recent Alumna by Columbia University's Graduate School of Arts and Sciences, an AI2050 Early Career Fellow by Schmidt Sciences, and the Annie T. Randall Innovator of 2022 for her research and advocacy by the American Statistical Association. She received her Ph.D. from Columbia University. Her doctoral work received many recognitions, including a Google Ph.D. Fellowship in Machine Learning, a Rising Star in Machine Learning nomination by the University of Maryland, and a Savage Award from the International Society for Bayesian Analysis for her doctoral thesis. Dieng's research has been covered in media outlets such as New Scientist and TechXplore. She hails from Kaolack, Senegal.
12:10 - 12:30pm: Suhee Yoon—Diffusion based Semantic Outlier Generation via Nuisance Awareness for Out-of-Distribution Detection
Abstract: Out-of-distribution (OOD) detection, which determines whether a given sample is part of the in-distribution (ID), has recently shown promising results through training with synthetic OOD datasets. Nonetheless, existing methods often produce outliers that are considerably distant from the ID, showing limited efficacy for capturing subtle distinctions between ID and OOD. To address these issues, we propose a novel framework, Semantic Outlier generation via Nuisance Awareness (SONA), which notably produces challenging outliers by directly leveraging pixel-space ID samples through diffusion models. Our approach incorporates SONA guidance, providing separate control over semantic and nuisance regions of ID samples. Thereby, the generated outliers achieve two crucial properties: (i) they present explicit semantic-discrepant information, while (ii) maintaining various levels of nuisance resemblance with ID. Furthermore, the improved OOD detector training with SONA outliers facilitates learning with a focus on semantic distinctions. Extensive experiments demonstrate the effectiveness of our framework, achieving an impressive AUROC of 88% on near-OOD datasets, surpassing the performance of baseline methods by a significant margin of approximately 6%.
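For context on the reported numbers, AUROC is the standard evaluation metric for OOD detection: ID samples are labeled 0, OOD samples 1, and the detector's outlier scores are ranked. The snippet below illustrates only that evaluation step on synthetic scores; it is not the SONA pipeline, and the score distributions are assumptions for the example.

```python
# Illustrative only: how AUROC numbers for OOD detection are typically computed.
# ID samples get label 0, OOD samples label 1, and roc_auc_score ranks the
# detector's outlier scores (higher = more OOD-like).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical outlier scores from some detector.
id_scores = rng.normal(loc=0.0, scale=1.0, size=1000)    # in-distribution
ood_scores = rng.normal(loc=1.5, scale=1.0, size=1000)   # near-OOD

labels = np.concatenate([np.zeros_like(id_scores), np.ones_like(ood_scores)])
scores = np.concatenate([id_scores, ood_scores])
print(f"AUROC: {roc_auc_score(labels, scores):.3f}")
```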
Bio: Suhee Yoon is an AI Research Scientist at LG AI Research, specializing in AI Safety & Reliability with a focus on robustness under distribution shifts and efficient adaptation of large-scale foundation models. She holds a Master of Science in Industrial Engineering from Sungkyunkwan University. Her research spans various data domains, including computer vision, chemistry, and tabular data, aiming to develop AI models that are both reliable and adaptable to real-world challenges.
12:30 - 2:00pm: Lunch (not provided)
2:00 - 2:10pm: Challenge Overview
2:10 - 2:50pm: Butterfly Challenge Talk
2:50 - 3:30pm: Gravitational Waves Talk
3:30 - 4:00pm: Coffee Break
4:00 - 4:40pm: Sea Level Rise Challenge Talk
4:40 - 5:00pm: Overall Challenge Talk
5:00 - 5:30pm: Closing Remarks and Discussion of Next Challenge
We encourage participation from researchers working on a broad range of topics that explore AI/ML techniques to detect novel patterns and anomalies within data and promote scientific discovery. Examples of research questions include (but are not limited to):
We offer an extended submission deadline with the same Camera-Ready and Poster Deadlines; keep in mind that these fall after the AAAI Early Registration Deadline has passed.
We are accepting extended abstract submissions presenting positions, reviews, or research results (up to 2 pages, excluding references). Shortened versions (up to 6 pages, excluding references) of articles under submission or accepted at other venues (or presented after Oct. 1, 2024) are acceptable as long as they do not violate the dual-submission policy of the other venue. An appendix (no page limit) may be added after the references. All submissions will undergo double-blind peer review.
Submissions should follow the AAAI template format (two-column, camera-ready style; AAAI Author Kit) and be submitted via CMT.
Accepted papers will NOT be archived in the AAAI proceedings. This allows authors to extend their work afterward and submit it to a conference or journal.