Evaluation of Distance Metrics and Spatial Autocorrelation in Uniform Manifold Approximation and Projection Applied to Mass Spectrometry Imaging Data

In this work, we explored the utility of a recently introduced nonlinear dimensionality reduction method, Uniform Manifold Approximation and Projection (UMAP), for the analysis of mass spectrometry imaging (MSI) data. We compared UMAP to principal component analysis (PCA) and to t-distributed stochastic neighbor embedding (t-SNE), another widely used nonlinear dimensionality reduction approach that is increasingly applied to MSI data. The work was carried out primarily by Tina Smets, with input and supervision on the data analysis from Nico Verbeeck and Marc Claesen.

Figure 1: Visualizations of two human lymphoma tissues using different approaches. The top row shows each three-dimensional embedding encoded in the color channels of the image; the bottom row shows the corresponding embedding space. Where a method allows a choice of distance metric, Euclidean distance was used. The data were acquired on a rapifleX MALDI-TOF instrument at 10 µm spatial resolution.
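Encoding a three-dimensional embedding as color can be sketched as follows. This is a minimal illustration, not necessarily the procedure used for Figure 1: it assumes that each embedding dimension is min-max scaled independently to the 0 to 255 range of one RGB channel and that pixels are stored in row-major order. The function name `embedding_to_rgb` is hypothetical.

```python
def embedding_to_rgb(embedding, height, width):
    """Map an (n_pixels x 3) embedding to an image given as rows of (r, g, b) tuples.

    Assumption: each embedding dimension is min-max scaled on its own to 0..255,
    and pixels are listed in row-major order.
    """
    if len(embedding) != height * width:
        raise ValueError("embedding must contain exactly one row per pixel")
    # Per-dimension minimum and maximum over all pixels.
    lows = [min(row[d] for row in embedding) for d in range(3)]
    highs = [max(row[d] for row in embedding) for d in range(3)]

    def scale(value, d):
        span = highs[d] - lows[d]
        if span == 0:  # constant dimension: use mid-level for that channel
            return 128
        return round(255 * (value - lows[d]) / span)

    colors = [tuple(scale(row[d], d) for d in range(3)) for row in embedding]
    # Reshape the flat list of pixel colors back into image rows.
    return [colors[r * width:(r + 1) * width] for r in range(height)]


# Usage: a 2 x 2 "image" with a toy 3-D embedding per pixel.
emb = [[0.0, 1.0, 5.0], [1.0, 1.0, 5.0], [0.5, 3.0, 5.0], [1.0, 2.0, 5.0]]
print(embedding_to_rgb(emb, 2, 2))
```

Per-channel scaling keeps the three embedding dimensions independent, so nearby points in the embedding receive similar colors in the tissue image.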

Our results show that UMAP and t-SNE yield comparable results, both clearly superior to those of simpler linear methods such as PCA. Compared with t-SNE, however, UMAP offers two significant advantages:

  • UMAP requires dramatically less computation time than t-SNE. In our experiments, we observed an order-of-magnitude speedup of UMAP over the well-known Barnes-Hut approximation of t-SNE.
  • Unlike t-SNE, UMAP supports out-of-sample prediction: a trained model can embed data it was not trained on. This is a critical advantage in many applications.
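To make out-of-sample prediction concrete, the sketch below places an unseen spectrum into an existing embedding at the inverse-distance-weighted average of the embedding coordinates of its k nearest training spectra. It is a simplified stand-in, not UMAP's implementation: the transform in the umap-learn library also starts from the new point's nearest training neighbors, but it uses different weights and then refines the positions by optimization, which is omitted here. The function `place_new_point` and its parameters are hypothetical.

```python
import math


def place_new_point(new_spectrum, train_spectra, train_embedding, k=3,
                    distance=math.dist, eps=1e-12):
    """Embed an unseen spectrum using an existing embedding (simplified sketch).

    The new point is placed at the inverse-distance-weighted average of the
    embedding coordinates of its k nearest training spectra. Unlike UMAP,
    no optimization step refines the position afterwards.
    """
    # (distance, index) pairs for the k nearest training spectra.
    nearest = sorted(
        (distance(new_spectrum, s), i) for i, s in enumerate(train_spectra)
    )[:k]
    # Closer neighbors get larger weights; eps avoids division by zero.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    n_dims = len(train_embedding[0])
    return [
        sum(w * train_embedding[i][j] for w, (_, i) in zip(weights, nearest)) / total
        for j in range(n_dims)
    ]


# Usage: three training spectra with a 2-D embedding, one new spectrum.
train_spectra = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_embedding = [[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]]
print(place_new_point([1.0, 0.2], train_spectra, train_embedding, k=2))
```

The key point is that the training embedding stays fixed: new spectra are mapped into it rather than recomputing the whole embedding, which t-SNE would require.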

In addition to investigating UMAP itself, we compared various distance metrics for MSI data. The results are shown in Figure 2.

Figure 2: UMAP-based visualizations of two human lymphoma tissues using various distance metrics.

Comparing Figures 1 and 2 clearly shows that distance metrics such as cosine and correlation distance model chemical similarity between spectra better than standard Euclidean distance. The main mathematical weaknesses of Euclidean distance for this purpose are its sensitivity to outliers (here, m/z bins with very high intensity relative to the rest) and its well-known problems in sparse, high-dimensional spaces.
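The three distance metrics can each be written in a few lines. The sketch below is illustrative only and does not reproduce the analysis behind Figure 2; it highlights one related property not discussed above: cosine and correlation distance ignore an overall rescaling of a spectrum's intensities, whereas Euclidean distance does not. The toy spectra are made up for illustration.

```python
import math


def euclidean_distance(a, b):
    return math.dist(a, b)


def cosine_distance(a, b):
    """1 - cosine similarity; unaffected by rescaling either spectrum.

    Assumes neither spectrum is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def correlation_distance(a, b):
    """1 - Pearson correlation, i.e. cosine distance of mean-centered spectra.

    Assumes neither spectrum is constant.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_distance([x - mean_a for x in a], [y - mean_b for y in b])


# Toy spectra (intensities per m/z bin).
spectrum = [1.0, 2.0, 0.0, 3.0]
brighter = [2.0 * x for x in spectrum]  # same profile, twice as intense
different = [3.0, 0.0, 2.0, 1.0]        # different profile, same total intensity

for name, dist in [("euclidean", euclidean_distance),
                   ("cosine", cosine_distance),
                   ("correlation", correlation_distance)]:
    print(name, round(dist(spectrum, brighter), 3), round(dist(spectrum, different), 3))
```

Here Euclidean distance rates the brighter copy (about 3.74) almost as far away as the spectrum with a genuinely different profile (4.0), while cosine and correlation distance rate the brighter copy as identical.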

Finally, during our investigation we identified a region of outlier pixels that skewed the UMAP analysis, so that the available range of colors was poorly used. Removing the influence of these outliers further improved our visualizations.
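One hypothetical way to limit the influence of such outlier pixels on the color mapping (not necessarily the procedure used for Figure 3) is sketched below. It flags outlier pixels in the embedding with a robust, median/MAD-based modified z-score, derives the color scaling limits from the remaining pixels only, and clips outliers to the ends of the range. The threshold of 3.5 is a common rule of thumb, not a value from the paper, and all function names are hypothetical.

```python
import statistics


def inlier_mask(embedding, threshold=3.5):
    """Return True for pixels whose embedding coordinates are not extreme.

    Uses the modified z-score 0.6745 * (x - median) / MAD per dimension;
    a pixel is flagged as an outlier if any coordinate exceeds the threshold.
    """
    mask = [True] * len(embedding)
    for d in range(len(embedding[0])):
        values = [row[d] for row in embedding]
        med = statistics.median(values)
        mad = statistics.median(abs(v - med) for v in values)
        if mad == 0:  # no spread in this dimension: nothing to flag
            continue
        for i, v in enumerate(values):
            if abs(0.6745 * (v - med) / mad) > threshold:
                mask[i] = False
    return mask


def color_limits(embedding, mask):
    """Per-dimension (min, max) computed from inlier pixels only."""
    inliers = [row for row, keep in zip(embedding, mask) if keep]
    return [(min(r[d] for r in inliers), max(r[d] for r in inliers))
            for d in range(len(embedding[0]))]


def to_channel(value, lo, hi):
    """Scale to 0..255, clipping values (e.g. outliers) outside [lo, hi]."""
    if hi == lo:
        return 128
    t = min(1.0, max(0.0, (value - lo) / (hi - lo)))
    return round(255 * t)


# Usage: the last pixel lies far from the others in the first dimension.
emb = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [1.5, 0.5], [50.0, 1.0]]
mask = inlier_mask(emb)
limits = color_limits(emb, mask)
print(mask, limits)
print([to_channel(row[0], *limits[0]) for row in emb])
```

With plain min-max scaling over all pixels, the first four pixels in this toy example would be squeezed into channel values 0 to 10 out of 255; with the inlier-based limits they span the full range, and the single outlier is simply clipped to 255.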

Figure 3: UMAP-based visualization and associated embedding of a human lymphoma tissue after removing the influence of outlier pixels.

We were the first to apply UMAP to MSI data and showed that it has significant potential for MSI data analysis. Since our publication, UMAP has received considerable attention from the MSI community and is now being adopted as a staple approach for dimensionality reduction of MSI data.

Publication details

Tina Smets, Nico Verbeeck, Marc Claesen, Arndt Asperger, Gerard Griffioen, Thomas Tousseyn, Wim Waelput, Etienne Waelkens, Bart De Moor. Evaluation of Distance Metrics and Spatial Autocorrelation in Uniform Manifold Approximation and Projection Applied to Mass Spectrometry Imaging Data. Analytical Chemistry 91, 5706–5714, 2019.