Goncalo Jorge Gouveia, Ph.D.
Senior Research Associate
Boyce Thompson Institute
Department of Chemistry and Chemical Biology
Cornell University
Molecular formulas from isotopic fine structures of ultra-high-resolution LC-MS
Goncalo J. Gouveia¹, Brandon Bills², Guohui Li², Xiao Wang², Joshua P. Kline², Sunandini Yedla², Delia Qu³, Aaron M. Ferber³, Utku Acikalin³, Yingheng Wang³, Maximilian J. Helf⁴, Carla P. Gomes³, Frank C. Schroeder¹
¹ Boyce Thompson Institute and Department of Chemistry and Chemical Biology, Cornell University, Ithaca, NY
² Thermo Fisher Scientific, San Jose, CA
³ Department of Computer Science, Cornell University, Ithaca, NY
⁴ Novartis, Basel, Switzerland
Abstract
Molecular formula (MF) assignment to liquid chromatography-mass spectrometry (LC-MS) features is a critical first step in identifying unknown metabolites. MS1 data alone are generally insufficient to derive a unique MF for metabolites above m/z ~300, even at mass accuracies better than 5 ppm. However, in MS1 data collected at ultra-high resolution, the isotopic fine structure (IFS) of metabolites becomes sufficiently resolved to distinguish nominal isobars (e.g., the M+2 peaks corresponding to ¹³C₂ and ¹⁸O), creating a pattern that can be compared with the IFS predicted for each candidate MF.
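To make the resolution argument concrete, the short Python sketch that follows computes how far apart the ¹³C₂ and ¹⁸O M+2 isotopologues lie and the resolving power needed to separate them at a given m/z. It assumes standard monoisotopic isotope masses, the simple criterion R = m/Δm (practical baseline separation needs more), and the common approximation that Orbitrap resolving power falls as 1/√(m/z) from the value quoted at m/z 200. The function names are illustrative and do not come from the authors' software.

```python
import math

# Monoisotopic isotope masses in unified atomic mass units (u)
M_12C = 12.0
M_13C = 13.0033548350
M_16O = 15.9949146196
M_18O = 17.9991596129


def m2_split() -> float:
    """Mass difference (u) between the 13C2 and 18O M+2 isotopologues."""
    shift_13c2 = 2 * (M_13C - M_12C)  # two 12C -> 13C substitutions
    shift_18o = M_18O - M_16O         # one 16O -> 18O substitution
    return shift_13c2 - shift_18o


def required_resolving_power(mz: float, delta_m: float) -> float:
    """Minimum resolving power R = m / delta_m (simple FWHM criterion)."""
    return mz / delta_m


def orbitrap_resolution_at(mz: float, r_ref: float, mz_ref: float = 200.0) -> float:
    """Approximate Orbitrap resolving power at mz, given r_ref quoted at mz_ref.

    Assumes R scales as 1/sqrt(m/z) for a fixed transient length.
    """
    return r_ref * math.sqrt(mz_ref / mz)


if __name__ == "__main__":
    dm = m2_split()
    print(f"13C2 vs 18O split: {dm * 1000:.3f} mDa")
    for mz in (300.0, 600.0, 900.0):
        needed = required_resolving_power(mz, dm)
        available = orbitrap_resolution_at(mz, 1_000_000)  # "1M" quoted at m/z 200
        print(f"m/z {mz:.0f}: need ~{needed:,.0f}, available ~{available:,.0f}")
```

Under these assumptions the two M+2 peaks sit roughly 2.5 mDa apart, so even at m/z 300 a resolving power of about 1.2×10⁵ is needed, and the margin over the instrument's resolving power narrows as m/z increases.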
Our approach combines heuristic elemental filters with a stratified ensemble of similarity metrics, which enabled assignment of >1300 MFs in a single SRM1950 sample collected on a Thermo Scientific™ Orbitrap™ IQ-X™ Tribrid™ at 1M resolution. The approach does not require MS2 and works directly on raw, unprocessed data, regardless of adduct annotation. We then explored the ability to assign MFs as a function of resolution and of parameters that modulate ion population-dependent effects. Space charge effects are well known to cause errors in measured masses and to alter peak intensities and shapes. We demonstrate that the number of assigned MFs doubles when data are acquired over narrower m/z ranges, and that the tradeoff between resolution and the fidelity of experimental peak intensities to predicted intensities can be fine-tuned by choosing appropriate acquisition parameters. We benchmarked our method against 495 MS2-based annotations, achieving a top-1 correct MF accuracy of 95.3% and establishing an orthogonal method capable of assigning MFs at scale to LC-MS features, even those of unknown metabolites. Integrating MS2 data with each assignment provides validation and raises MF annotation accuracy to nearly 100%.
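The abstract does not specify the elemental filters or the metrics in the ensemble, so the Python sketch that follows only illustrates the general shape of such a scoring step under stated assumptions: a ppm mass-error function, a hypothetical elemental filter (an H/C window of 0.2 to 3.1, similar to the one used in the "Seven Golden Rules" heuristics, plus a non-negative ring-plus-double-bond count), and an unweighted average of two stand-in similarity metrics (cosine similarity and mean absolute intensity error) on observed and predicted IFS peaks matched within a ppm tolerance. All names, thresholds, and peak lists are hypothetical; predicted patterns are taken as input rather than computed, and no stratification is attempted.

```python
import math
from dataclasses import dataclass


@dataclass
class Peak:
    mz: float
    intensity: float  # relative to the monoisotopic peak (= 1.0)


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def passes_elemental_filter(counts: dict[str, int]) -> bool:
    """Hypothetical heuristic: H/C ratio window and non-negative RDBE."""
    c, h = counts.get("C", 0), counts.get("H", 0)
    n, p = counts.get("N", 0), counts.get("P", 0)
    halogens = sum(counts.get(x, 0) for x in ("F", "Cl", "Br", "I"))
    if c == 0:
        return False
    rdbe = c - (h + halogens) / 2 + (n + p) / 2 + 1  # rings plus double bonds
    return 0.2 <= h / c <= 3.1 and rdbe >= 0


def match_peaks(observed, predicted, tol_ppm=2.0):
    """Pair observed and predicted intensities; unmatched peaks pair with 0."""
    pairs, used = [], set()
    for p in predicted:
        hits = [i for i, o in enumerate(observed)
                if abs(ppm_error(o.mz, p.mz)) <= tol_ppm]
        best = max(hits, key=lambda i: observed[i].intensity, default=None)
        if best is None:
            pairs.append((0.0, p.intensity))  # predicted but not observed
        else:
            used.add(best)
            pairs.append((observed[best].intensity, p.intensity))
    # observed peaks never selected as a match count against the candidate
    pairs += [(o.intensity, 0.0) for i, o in enumerate(observed) if i not in used]
    return pairs


def cosine(pairs) -> float:
    dot = sum(o * p for o, p in pairs)
    norm = math.sqrt(sum(o * o for o, _ in pairs)) * math.sqrt(sum(p * p for _, p in pairs))
    return dot / norm if norm else 0.0


def intensity_agreement(pairs) -> float:
    """1 minus the mean absolute relative-intensity error, floored at 0."""
    if not pairs:
        return 0.0
    mae = sum(abs(o - p) for o, p in pairs) / len(pairs)
    return max(0.0, 1.0 - mae)


def ensemble_score(observed, predicted, tol_ppm=2.0) -> float:
    """Unweighted mean of the two stand-in metrics."""
    pairs = match_peaks(observed, predicted, tol_ppm)
    return (cosine(pairs) + intensity_agreement(pairs)) / 2


if __name__ == "__main__":
    # Toy peak lists (hypothetical numbers, not data from the study)
    observed = [Peak(400.0, 1.0), Peak(401.00335, 0.20),
                Peak(402.00424, 0.020), Peak(402.00671, 0.018)]
    cand_a = list(observed)  # predicts both the 18O and 13C2 M+2 peaks
    cand_b = [Peak(400.0, 1.0), Peak(401.00335, 0.25), Peak(402.00671, 0.030)]
    for name, pred in (("A", cand_a), ("B", cand_b)):
        print(name, round(ensemble_score(observed, pred), 4))
```

Candidate A reproduces every observed peak and scores 1.0, while candidate B is penalized both for its intensity mismatches and for leaving the resolved ¹⁸O peak, about 6 ppm from the ¹³C₂ peak at m/z 402, unexplained. A real implementation would generate each predicted IFS from the candidate formula with an isotope-pattern calculator and would weight or stratify the metrics.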
Biosketch
I am a Senior Research Associate at the Boyce Thompson Institute at Cornell University, where I currently lead efforts to develop AI tools for identifying unknown metabolites from high-resolution LC-MS spectra. I earned a PhD in Biochemistry and Molecular Biology from the University of Georgia in 2022 under the mentorship of Prof. Arthur Edison, working on a framework that integrates NMR and LC-MS, two technologies essential for de novo chemical structure elucidation. Before joining Cornell, I was a Postdoctoral Scientist at the Institute for Bioscience and Biotechnology Research, mentored by Dr. Frank Delaglio at the National Institute of Standards and Technology. There, I developed quality frameworks for real-time computational tools that optimized CHO and CAR-T cell production in bioreactors, using bench-top NMR metabolomics outputs to adjust bioreactor parameters on the fly. Prior to my academic career, I obtained BSc and MSc degrees in Forensic Science in the United Kingdom, which led to a professional career as a forensic drugs and toxicology scientist at LGC Forensics, where I gained extensive experience in lab management, accreditation, method development, and testifying in court.
Date
November 18, 2025
6:00 pm - 8:00 pm
Location
Université de Montréal, Campus MIL (beer and pizza at 6:00 pm; talk at 7:00 pm in room A-4502)

