https://doi.org/10.1140/epjti/s40485-015-0018-6
Review
Review of multidimensional data processing approaches for Raman and infrared spectroscopy
1 Department of Inorganic and Physical Chemistry, Indian Institute of Science, Bangalore, 560012, India
2 LaserLaB, VU University Amsterdam, Amsterdam, the Netherlands
3 Department of Instrumentation and Applied Physics, Indian Institute of Science, Bangalore, 560012, India
* e-mail: umapathy@ipc.iisc.ernet.in
Received: 1 November 2014
Accepted: 5 May 2015
Published online: 2 June 2015
Raman and infrared (IR) spectroscopies provide information about the structure, functional groups and environment of the molecules in a sample. In combination with a microscope, these techniques can also be used to study molecular distributions in heterogeneous samples. Over the past few decades, Raman and IR microspectroscopy-based techniques have been used extensively to understand fundamental biology and the responses of living systems under diverse physiological and pathological conditions. Spectra from biological systems are complex and diverse because the samples are heterogeneous mixtures of biomolecules such as proteins, lipids, nucleic acids and carbohydrates, and minor spectral differences may sometimes carry critical information. Interpreting Raman and IR results is therefore difficult, and data mining methods are needed to overcome these intricacies and gain deeper insight. These methods must be suitable for handling large multidimensional data sets and for exploring the complete spectral information simultaneously. Effective use of these multivariate data analysis methods requires pretreatment of the data: preprocessing the raw data helps to eliminate noise (unwanted signals) and to enhance discriminating features. This review outlines the state-of-the-art data processing tools for multivariate analysis and the preprocessing methods widely used in Raman and IR spectroscopy, including imaging, for better qualitative and quantitative analysis of biological samples.
Key words: Preprocessing / Baseline removal / Principal component analysis / Linear discriminant analysis / Classification models / Clustering / Partial least squares / Cross validation / Receiver operating characteristic
© The Author(s), 2015