An alternative method for presence/absence maps determination by orthogonal projections

Raman imaging system

The image was collected using an RM300 PerkinElmer system (PerkinElmer, Waltham, MA) and the Spectrum Image software, version 6.1. The microscope was coupled to the spectrometer, and spectra were acquired through it with a spatial resolution of 10 µm in Raman diffuse reflection mode. The wavenumber range was 3200–100 cm⁻¹ with a resolution of 2 cm⁻¹. Spectra were acquired point by point: a spectrum was recorded at a single point on the sample, the sample was moved, and another spectrum was taken. This process was repeated until spectra had been obtained for points covering the whole region of interest.
A 785 nm laser with a power of 400 mW was used. Two scans of two seconds each were accumulated for each spectrum. An image of 70 × 70 pixels, corresponding to 4900 spectra, was acquired over a surface of 700 µm × 700 µm.

Pre-processing

Data were pre-processed to remove non-chemical biases from the spectra (scattering effects due to non-homogeneity of the surface, interference from external light sources, spikes due to cosmic rays, random noise). First, the data were spike-corrected to reduce the effect of cosmic rays [61]. Next, the spectral range was reduced to the region of interest, corresponding to Raman shifts from 1800 cm⁻¹ to 200 cm⁻¹. The reduced spectra were then pre-processed by standard normal variate (SNV) correction [72] to reduce the effect of baseline variations and of uninformative variations in global spectral intensity.
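The range reduction and SNV steps can be sketched as follows. This is a minimal illustration, assuming the spectra are stored row-wise in a NumPy array with a matching Raman shift axis; the function names `crop_range` and `snv`, the 2 cm-1 sampling step and the random data are hypothetical, and spike correction is not shown.

```python
import numpy as np

def crop_range(spectra, shifts, low=200.0, high=1800.0):
    """Keep only the variables whose Raman shift lies in [low, high] cm-1."""
    keep = (shifts >= low) & (shifts <= high)
    return spectra[:, keep], shifts[keep]

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# Hypothetical data: 5 spectra on a 3200-100 cm-1 axis sampled every 2 cm-1,
# each with a different additive offset to mimic a global intensity variation.
rng = np.random.default_rng(0)
shifts = np.arange(3200.0, 98.0, -2.0)
raw = rng.random((5, shifts.size)) + np.linspace(0.0, 3.0, 5)[:, None]

cropped, cropped_shifts = crop_range(raw, shifts)
corrected = snv(cropped)
```

After SNV, every spectrum has zero mean and unit standard deviation, so additive offsets and multiplicative intensity differences between pixels no longer dominate the comparison of spectra.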

Independent Component Analysis (ICA)

ICA is one of the most powerful techniques for blind source separation [119]. It was developed to extract the pure underlying signals from a set of signals mixed in unknown proportions. Considering a noise-free ICA model, a matrix X (n x m) of n spectra and m variables (Raman shifts) is decomposed according to a linear generative model, given by the following expression:
$$X = AS \qquad \text{(III-1)}$$
where S is a (k x m) matrix of k independent source signals, called the independent components, and A is an (n x k) mixing matrix of coefficients, or proportions, of the pure signals in each mixed signal of X.
The objective of ICA is to estimate a set of vectors that are as independent as possible, and the mixed signals in X can then be expressed as linear combinations of these independent components (ICs). It attempts to recover the original signals by estimating a linear transformation, using a criterion which reflects the statistical independence among the sources.
To solve the previous equation (Eq. III-1), an unmixing matrix W must be calculated from the observed X. The output U, made up of the independent components u1, u2, …, uk, should be as independent as possible. For a noise-free ICA model, W should be the inverse of A, and U should be equal to S, according to the following equation:
$$U = WX = W(AS) = S \qquad \text{(III-2)}$$
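The relationship between the mixing and unmixing matrices in the noise-free case can be checked numerically. The sketch below is only an illustration under stated assumptions: the sources and proportions are simulated, and because A is (n x k) rather than square, the "inverse" is taken here as the Moore-Penrose pseudo-inverse. In a real analysis A is unknown and W has to be estimated from X alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 50, 3, 200          # mixed spectra, sources, variables (hypothetical sizes)

S = rng.laplace(size=(k, m))  # simulated non-Gaussian source signals (k x m)
A = rng.random((n, k))        # simulated mixing proportions (n x k)
X = A @ S                     # noise-free mixtures (n x m)

# A is not square, so its pseudo-inverse plays the role of the inverse.
W = np.linalg.pinv(A)         # unmixing matrix (k x n)
U = W @ X                     # recovered signals (k x m)

print(np.allclose(U, S))
```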
Many algorithms are available to perform ICA, such as FastICA [120] or Radical [121]. In this work, the JADE (Joint Approximate Diagonalization of Eigenmatrices) algorithm was used [122]. Unlike methods based on parameter optimization, the JADE algorithm relies on matrix diagonalizations and therefore does not involve an optimization procedure [123].
The ICA_by_blocks algorithm [124] was used to determine the optimal number of signals to extract. This method starts by splitting the initial data matrix X into B blocks of samples (with approximately equal numbers of rows). The samples in each block have to be representative of the whole dataset. ICA models are then computed for each block with an increasing number of ICs. To ensure that the ICs of the different models have the same signs, the signs of the vectors of A (and therefore of the corresponding vectors of S) are adjusted so that the most intense value in each vector of A is positive. ICs corresponding to true source signals should be found in all representative subsets of samples, or row blocks, of the full data matrix, and these ICs should be strongly correlated across blocks.
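The sign adjustment can be sketched as follows. This is a hypothetical helper, not the ICA_by_blocks code itself, and it assumes that "most intense value" means the entry with the largest absolute value in each column of A. Flipping a column of A together with the matching row of S leaves the product AS unchanged.

```python
import numpy as np

def fix_signs(A, S):
    """Flip each IC so that the largest-magnitude entry of its column of A is positive.

    A: (n x k) proportions, S: (k x m) signals. The product A @ S is unchanged.
    """
    idx = np.argmax(np.abs(A), axis=0)               # row of the most intense value per column
    signs = np.sign(A[idx, np.arange(A.shape[1])])   # sign of that value
    signs[signs == 0] = 1.0                          # leave all-zero columns untouched
    return A * signs, S * signs[:, None]

# Toy example: the first IC needs flipping, the second does not.
A = np.array([[0.2, 0.9], [-0.7, 0.1], [0.3, -0.4]])
S = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
A_fixed, S_fixed = fix_signs(A, S)
```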

Data analysis

Data analysis was performed using Matlab R2012a. The Matlab code of the JADE algorithm was downloaded from the website given in ref. [125].

Results & discussion

Selection of number of independent components

Determining the number of ICs for the ICA decomposition is a critical step of the data analysis. Calculating too few ICs results in non-pure signals, whereas calculating too many ICs can split pure signals into several contributions. The ICA_by_blocks method was applied by splitting the dataset row-wise into two blocks and performing ICA on each block. The samples for the two subsets were selected using a “venetian blind” procedure: each subset is formed by selecting every b-th object in the dataset (b being the number of blocks), starting at object number one. ICA models were calculated for both blocks with 1 to 20 ICs. For a given model, the ICs of the two blocks were compared by calculating the correlation coefficients between all pairs of signals from both blocks. The highest-dimensional model for which the ICs obtained in one block were similar to those obtained in the other block indicates the optimal number of ICs to extract from the data under study. Figure III-2 shows that the lowest correlation between signals decreases significantly after 9 ICs, which was therefore taken as the optimal number of components for the decomposition of the dataset. The initial drops after 4 ICs and then after 7 ICs are assumed to be due to the ICs not being extracted from the two data blocks in exactly the same order.
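The block splitting and the between-block comparison can be sketched as follows. The function names are hypothetical, the ICA step itself is not shown, and the matching rule (each signal of one block is paired with its most correlated signal in the other block, in absolute value, and the weakest pairing is reported) is an assumption about how the lowest correlation is summarised.

```python
import numpy as np

def venetian_blind_blocks(X, n_blocks=2):
    """Block b holds rows b, b + n_blocks, b + 2*n_blocks, ... of X."""
    return [X[b::n_blocks] for b in range(n_blocks)]

def lowest_matched_correlation(S1, S2):
    """Pair each signal of S1 with its most correlated signal of S2 (absolute value)
    and return the weakest of these best matches."""
    k = S1.shape[0]
    corr = np.abs(np.corrcoef(S1, S2)[:k, k:])   # (k x k) cross-block correlations
    return corr.max(axis=1).min()

rng = np.random.default_rng(2)
X = rng.random((4900, 10))                       # placeholder for the unfolded image
block_a, block_b = venetian_blind_blocks(X, n_blocks=2)

# Hypothetical ICs: block B returns the same signals in another order, one with flipped sign.
S_a = rng.standard_normal((3, 500))
S_b = S_a[[2, 0, 1]] * np.array([[1.0], [-1.0], [1.0]])
print(lowest_matched_correlation(S_a, S_b))
```

Matching on the best pairing rather than on position makes the comparison insensitive to the order and sign in which the ICs are extracted. Repeating the comparison for models with 1 to 20 ICs and plotting the returned value against the number of ICs gives a curve of the kind used to choose the model size.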
Since the sample contains five compounds, and supposing that the five pure spectra are independent and that the acquired mixture spectra are linear combinations of them, five ICs should have been sufficient. In this example, in contrast with this theoretical decomposition, four additional components were used to build the ICA model. Physical effects such as particle size variation or fluorescence of a compound could explain this “over-decomposition” of the dataset.

Distribution of API

An ICA model based on the JADE decomposition with 9 ICs was calculated on the unfolded, SNV pre-processed data cube. The columns of the matrix of proportions, A, each associated with one signal of S, were then folded back to obtain a representation of the spatial distribution of each independent component. Different image textures can be observed in Figure III-3. IC1, IC6 and IC9 show very specific inhomogeneous distributions with agglomerates. Taking into account the different scales of the score images, IC2, IC3, IC4 and IC5 have similar textures (or distributions), as do IC7 and IC8, whose texture is the same as that of IC1. It can also be seen that the distributions observed in these two sets of images are complementary, indicating that these two sets of independent components occupy complementary regions of the tablet. In order to associate each independent component with a chemical compound, the calculated signals were examined.
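Unfolding the data cube and folding the proportions back into images amounts to two reshape operations. The sketch below is an illustration with placeholder data: the number of variables is arbitrary, and the matrix A is filled with random numbers instead of being computed by an ICA routine.

```python
import numpy as np

rows, cols, m, k = 70, 70, 100, 9         # m is an arbitrary placeholder

rng = np.random.default_rng(3)
cube = rng.random((rows, cols, m))        # placeholder for the pre-processed image

X = cube.reshape(rows * cols, m)          # unfolded matrix: one spectrum per row

# Placeholder for the proportions returned by an ICA routine (not computed here).
A = rng.random((rows * cols, k))

score_maps = A.reshape(rows, cols, k)     # fold back: one 70 x 70 map per IC
ic1_map = score_maps[:, :, 0]             # spatial distribution of the first IC
```

Because both reshapes use the same row-major ordering, the pixel at image position (i, j) corresponds to row i * cols + j of X and of A.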

Table of contents:

Chapter I: General introduction
1. Introduction
2. Outline of the thesis
Chapter II: The use of Raman spectroscopy in the pharmaceutical environment: theory and applications
1. Raman spectroscopy
1.1. Theoretical aspects
1.2. Raman chemical imaging
1.3. Applications in the pharmaceutical environment
2. Chemometric tools
2.1. Data pre-processing
2.1.1. Spike correction
2.1.2. Baseline correction
2.1.3. Normalisation
2.1.4. Derivatives
2.2. Multivariate data analysis
2.2.1. Principal component analysis
2.2.2. Independent component analysis
2.2.3. Multivariate curve resolution-Alternating least squares
3. Identification of a low dose compound
3.1. Definition of a low dose compound
3.2. The sampling aspect
3.3. Data analysis aspect
3.4. Contributions of the thesis
Chapter III: Use of blind source separation approach for pure spectra determination and spatial distribution of constituents
1. Introduction
2. Materials and methods
2.1. Samples
2.2. Raman imaging system
2.3. Pre-processing
2.4. Independent Component Analysis (ICA)
2.5. Data analysis
3. Results & discussion
3.1. Selection of number of independent components
3.2. Distribution of API
4. Conclusions
Chapter IV: Use of multivariate curve resolution for identification of a low dose compound
1. Introduction
2. Materials and Methods
2.1. Samples
2.2. Raman imaging system
2.3. Pre-processing
2.4. Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS)
3. Results and discussion
3.1. Exploratory analysis
3.2. MCR-ALS
3.2.1. Non-negativity and local rank constraints
3.2.2. Effect of PCA filtering on MCR-ALS results
3.2.3. Pure spectrum augmented matrix
4. Conclusions
Chapter V: An alternative method for presence/absence maps determination by orthogonal projections
1. Introduction
2. Theory
2.1. Notations
2.2. Pretreatment using orthogonal projections
2.3. Multivariate curve resolution-alternating least squares (MCR-ALS)
2.4. Proposed approach to determine presence/absence maps of compounds to set local rank constraints
3. Materials and methods
3.1. Raman microscopy
3.2. Samples
3.2.1. Simulated data
3.2.2. Real dataset
4. Results and discussion
4.1. Principal component analysis (PCA) on pure images
4.2. Proposed approach on simulated data
4.3. Proposed approach on real dataset
5. Conclusions
Chapter VI: An iterative approach for compound detection in an unknown formulation
1. Introduction
2. Materials and methods
2.1. Notations
2.2. Samples
2.3. Raman imaging system
2.4. Spectral library
2.5. Proposed approach
2.5.1. Spectral distances
2.5.2. Identification of the pure compound
2.5.3. Orthogonal projection
2.5.4. Overview of the iterative approach
3. Results and discussion
3.1. Identification of the tablet compounds
3.2. Multivariate curve resolution-alternating least squares
4. Conclusions
Chapter VII: Conclusions and future work
1. Introduction
2. Main contributions
2.1. A flashback to the beginning of this work
2.2. Applications of a blind source separation methodology
2.3. Applications of multivariate curve resolution
2.4. Alternative method for presence/absence map estimations
2.5. Compound detection in an unknown formulation
3. Limits and future work