Evaluation of Distance Metrics and Spatial Autocorrelation in Uniform Manifold Approximation and Projection Applied to Mass Spectrometry Imaging Data
In this work, we explored the utility of a recently introduced nonlinear dimensionality reduction method, Uniform Manifold Approximation and Projection (UMAP), for mass spectrometry imaging (MSI) data analysis. We compared UMAP to PCA and t-SNE, another widely used nonlinear dimensionality reduction approach that is increasingly applied to MSI data. The work was primarily carried out by Tina Smets, with input and supervision from Nico Verbeeck and Marc Claesen on the data analysis aspects.
Figure 1: Visualizations of two human lymphoma tissues using different approaches. The top row shows the resulting three-dimensional embedding encoded in the color channels of each image. The bottom row shows the associated embedding space. In this figure, Euclidean distance was used for the methods that allow a choice of metric. The data were acquired using a rapifleX MALDI-TOF instrument at 10 µm spatial resolution.
Specifically, our results illustrate that UMAP and t-SNE yield comparable visualizations, both clearly superior to those of simpler linear methods like PCA. Compared to t-SNE, however, UMAP provides two significant practical advantages:
- UMAP shows dramatically reduced computation time compared to t-SNE. In our experiments, we observed an order-of-magnitude speedup of UMAP over the well-known Barnes-Hut approximation of t-SNE.
- In contrast to t-SNE, UMAP enables out-of-sample prediction, which means that the model can be used to embed data it was not trained on. This is a critical advantage for many applications.
In addition to investigating UMAP itself, we compared various distance metrics for MSI data. The results are shown in Figure 2.
Figure 2: UMAP-based visualizations of two human lymphoma tissues using various distance metrics.
Comparing Figures 1 and 2, we can clearly see that distance metrics like cosine distance and correlation distance outperform the standard Euclidean distance for modeling chemical similarity across spectra. The main mathematical weaknesses of the Euclidean distance in this setting are its sensitivity to outliers (here, m/z bins with very high intensity compared to the rest) and its well-known problems in sparse, high-dimensional spaces.
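A toy calculation makes the intensity sensitivity concrete. The three spectra below are invented for illustration and are not taken from the study: `spectrum_b` has the same peak profile as `spectrum_a` at three times the intensity, while `spectrum_c` has peaks in different bins at a similar overall intensity.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length intensity vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def correlation_distance(a, b):
    """Cosine distance computed after mean-centering each vector."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_distance([x - mean_a for x in a], [y - mean_b for y in b])

# Hypothetical intensities in five m/z bins.
spectrum_a = [10.0, 0.0, 5.0, 0.0, 1.0]
spectrum_b = [30.0, 0.0, 15.0, 0.0, 3.0]  # same profile as a, three times as intense
spectrum_c = [0.0, 10.0, 0.0, 5.0, 1.0]   # different peaks, similar intensity

for name, metric in [("euclidean", euclidean),
                     ("cosine", cosine_distance),
                     ("correlation", correlation_distance)]:
    print(name, metric(spectrum_a, spectrum_b), metric(spectrum_a, spectrum_c))
```

Under Euclidean distance, `spectrum_a` is closer to the differently shaped `spectrum_c` than to its own scaled copy `spectrum_b`, whereas cosine and correlation distance treat the scaled copy as essentially identical and the different profile as distant.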
Finally, during our investigation we identified a region of outlier pixels that skewed the UMAP analysis, so that the dynamic range of colors was poorly used. After removing the impact of these outliers, our visualizations improved further.
Figure 3: UMAP-based visualization and associated embedding of a human lymphoma tissue after removing the influence of outlier pixels.
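How the outliers were handled is not detailed in this summary. As an assumption for illustration only, the sketch below uses percentile clipping, one common way to stop a few extreme embedding coordinates from compressing the color range when a three-dimensional embedding is mapped to color channels. The function names, the percentile limits, and the toy embedding are all hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile of a list, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def embedding_to_rgb(embedding, low_q=1.0, high_q=99.0):
    """Rescale each of the three embedding axes to [0, 1] between two
    percentiles, clipping values that fall outside that range."""
    channels = []
    for axis in range(3):
        column = [point[axis] for point in embedding]
        lo, hi = percentile(column, low_q), percentile(column, high_q)
        span = (hi - lo) or 1.0  # avoid division by zero on a constant axis
        channels.append([min(1.0, max(0.0, (v - lo) / span)) for v in column])
    return list(zip(*channels))

# Toy embedding: 100 well-behaved pixels plus one extreme outlier pixel.
embedding = [(i / 99, (99 - i) / 99, 0.5) for i in range(100)]
embedding.append((50.0, 50.0, 50.0))

naive_rgb = embedding_to_rgb(embedding, 0.0, 100.0)  # plain min-max scaling
clipped_rgb = embedding_to_rgb(embedding)            # 1st to 99th percentile

print(max(p[0] for p in naive_rgb[:100]))    # regular pixels squeezed near zero
print(max(p[0] for p in clipped_rgb[:100]))  # regular pixels span the channel
```

With plain min-max scaling, the single outlier claims almost the entire color range; with clipping, the regular pixels use the full range and the outlier simply saturates.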
We were the first to apply UMAP to MSI data and showed that it has significant potential for MSI data analysis. Since our publication, UMAP has received a lot of attention from the MSI community and is now being adopted as a staple approach for MSI dimensionality reduction.
Tina Smets (1), Nico Verbeeck (1,2), Marc Claesen (1,2), Arndt Asperger (3), Gerard Griffioen (4), Thomas Tousseyn (5), Wim Waelput (6), Etienne Waelkens (7), Bart De Moor (1). Evaluation of Distance Metrics and Spatial Autocorrelation in Uniform Manifold Approximation and Projection Applied to Mass Spectrometry Imaging Data, Analytical Chemistry 91, 5706–5714, 2019
(1) STADIUS Center for Dynamical Systems, Signal Processing, and Data Analytics, Department of Electrical Engineering (ESAT), KU Leuven, 3001 Leuven, Belgium
(2) Aspect Analytics NV, C-mine 12, 3600 Genk, Belgium
(3) Bruker Daltonik GmbH, Fahrenheitstrasse 4, 28359 Bremen, Germany
(4) reMYND, Bio-Incubator, Gaston Geenslaan 1, 3000 Leuven, Belgium
(5) Department of Pathology, University Hospitals KU Leuven, 3001 Leuven, Belgium
(6) Department of Pathology, UZ-Brussel, 1000 Brussels, Belgium
(7) Department of Cellular and Molecular Medicine, KU Leuven, 3000 Leuven, Belgium