Institute of Measurement Science, Slovak Academy of Sciences (SAS)
BAMOD - Breath-gas analysis for molecular-oriented detection of minimal diseases

Results

During the first 12 months of the BAMOD project, we developed a first version of the Matlab toolbox for statistical analysis of concentration measurements of volatile organic compounds (VOCs) in human breath gas, measured by proton-transfer-reaction mass spectrometry (PTR-MS), for the potential detection of primary lung cancer (PLC). The system can also be applied to the statistical analysis of PTR-MS measurements on cell and bacterial cultures, and to human breath gas concentration measurements obtained by selected-ion-flow-tube mass spectrometry (SIFT-MS), provided that the input data are in the standard format.

The current version of the toolbox consists of a set of Matlab functions divided into the following packages:

  • Functions for preparing and manipulating datasets in the standard format. This version includes an interface for extracting data from the so-called GES database (the data structure used at the Medical University of Innsbruck, Austria, for PTR-MS measurements of human breath gas samples from primary lung cancer patients and healthy individuals). It also covers pre-filtering of the data and the application of further restrictions on the selected compounds and samples, e.g. by age, gender or smoking habit.
  • Functions for basic descriptive statistical analyses, including tests for normality, comparison of empirical distributions with fitted normal and log-normal distributions, confidence intervals for the difference of means of two normal distributions and for the difference and ratio of means of two log-normal distributions, a multivariate test for equality of means, and multivariate analysis of variance (MANOVA).
  • Algorithms for classification into two or more groups (e.g. cancer patients and healthy controls). The following classification methods are included:
    • Classifier based on linear/quadratic discriminant analysis;
    • Weighted ranks classifier;
    • Support vector machines classifier;
    • Partial least squares classifier;
    • Weighted voting classifier and a classifier based on classification trees.
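
The comparison of fitted normal and log-normal distributions listed above can be illustrated with a minimal Python sketch (not the toolbox's Matlab code). Both models are fitted by maximum likelihood and the one with the larger maximized log-likelihood is preferred; since both have two parameters, this is equivalent to comparing AIC values. The function names are hypothetical.

```python
import math
from statistics import fmean


def normal_loglik(x):
    """Maximized log-likelihood of a normal fit (MLE variance, divisor n)."""
    n = len(x)
    mu = fmean(x)
    var = sum((v - mu) ** 2 for v in x) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def lognormal_loglik(x):
    """Maximized log-likelihood of a log-normal fit to strictly positive data."""
    logs = [math.log(v) for v in x]
    # Log-normal density = normal density of log(x) times the Jacobian 1/x,
    # hence the extra -sum(log x) term.
    return normal_loglik(logs) - sum(logs)


def better_fit(x):
    """Name of the model with the larger maximized log-likelihood."""
    return "log-normal" if lognormal_loglik(x) > normal_loglik(x) else "normal"


# Hypothetical right-skewed concentrations (ppb)
print(better_fit([1, 2, 4, 8, 16, 32, 64]))  # log-normal
```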

The Matlab toolbox, distributed as the Bamod Statistical Toolbox, is available in the statistical algorithms section of the BAMOD communication platform: https://ox.voc-research.at/.

An important objective of the Deliverable is the analysis of the discriminating potential of selected VOCs for classification into two groups (lung cancer patients and healthy controls), based on preliminary PTR-MS measurements of concentrations in exhaled breath at ppb levels.

The preliminary database of clinical-trial data (the GES database), containing PTR-MS measurements of breath gas samples from primary lung cancer patients and healthy individuals, consists of concentration measurements of product ions at 11 selected m/z values (mass-to-charge ratios): 'm31', 'm33', 'm42', 'm59', 'm63', 'm69', 'm79', 'm93', 'm107', 'm108', and 'm115'.

Based on the analysis of those measurements, we have characterised the potential VOC markers with the discriminating power to distinguish between healthy controls and PLC patients from exhaled breath gas concentrations. The best predictive potential appears to lie with 'm31' (formaldehyde), followed by 'm33' (methanol), 'm42' (acetaldehyde), 'm59' (acetone), 'm63' (dimethyl sulfide), 'm69' (isoprene), 'm107' (xylene), and 'm115' (heptanone). VOC concentrations in patients are higher at the mass-to-charge ratios 'm31', 'm42', 'm59', 'm107', and 'm115', and lower at 'm63' and 'm69'.

Based on the average misclassification errors obtained by quadratic discriminant analysis (QDA) with individual explanatory variables, the best performance was achieved with 'm31' (formaldehyde), followed by 'm42' (acetaldehyde), 'm69' (isoprene), and 'm33' (methanol).

Based on the average misclassification errors obtained by QDA for pairs of explanatory variables, the best performance was achieved by the following pairs: 'm31', 'm33' (formaldehyde, methanol); 'm31', 'm79' (formaldehyde, benzene); 'm31', 'm69' (formaldehyde, isoprene); and 'm31', 'm59' (formaldehyde, acetone).
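
Ranking single variables and pairs by quadratic discriminant analysis (QDA) error can be sketched in Python as follows. The sketch assumes Gaussian class densities with class-specific covariance matrices and priors equal to the class frequencies, and it estimates the error by leave-one-out cross-validation; how the errors were averaged in the project is not described here, so this resampling scheme is a stand-in. All names are hypothetical, and the inputs would typically be log-transformed concentrations.

```python
import math
from itertools import combinations


def fit_gaussian(rows):
    """Mean vector and unbiased covariance (divisor n-1) of equal-length tuples."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mean, cov


def inverse_and_logdet(cov):
    """Inverse and log-determinant of a 1x1 or 2x2 covariance matrix."""
    if len(cov) == 1:
        return [[1.0 / cov[0][0]]], math.log(cov[0][0])
    (a, b), (c, d) = cov
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], math.log(det)


def qda_score(x, mean, cov, prior):
    """Log prior + Gaussian log-density of x, up to a constant shared by all classes."""
    inv, logdet = inverse_and_logdet(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    k = len(diff)
    maha = sum(diff[i] * inv[i][j] * diff[j] for i in range(k) for j in range(k))
    return math.log(prior) - 0.5 * logdet - 0.5 * maha


def loo_qda_error(X, y):
    """Leave-one-out misclassification rate of QDA on samples X with labels y."""
    errors = 0
    for i in range(len(X)):
        train = [(x, c) for k, (x, c) in enumerate(zip(X, y)) if k != i]
        scores = {}
        for label in set(y):
            rows = [x for x, c in train if c == label]
            mean, cov = fit_gaussian(rows)
            scores[label] = qda_score(X[i], mean, cov, len(rows) / len(train))
        errors += max(scores, key=scores.get) != y[i]
    return errors / len(X)


def rank_subsets(data, y, size):
    """Rank all variable subsets of `size` (1 or 2) by leave-one-out QDA error."""
    results = []
    for subset in combinations(data, size):
        X = [tuple(data[name][k] for name in subset) for k in range(len(y))]
        results.append((loo_qda_error(X, y), subset))
    return sorted(results)
```

Here `data` maps an m/z label to a list of values (one per subject) and `y` holds the group labels. Each group needs enough subjects (at least `size + 2`) for the covariance matrices to stay non-singular after one sample is left out.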

Figure: BAMOD box plots of selected VOC concentrations (measurements in parts per billion, log-transformed). (e) denotes concentrations in the exhaled breath gas, (i) concentrations in the inspired breath gas, (c) measurements in the group of healthy controls, and (p) measurements in the group of lung cancer patients.
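
The per-group statistics behind such box plots can be sketched in Python. The base-10 logarithm and Tukey's 1.5 IQR whisker rule are assumptions (the caption states only that the ppb values were log-transformed), and `box_stats` is a hypothetical helper.

```python
import math
from statistics import quantiles


def box_stats(values_ppb):
    """Box-plot statistics of log10-transformed concentrations (1.5 IQR whiskers)."""
    logs = sorted(math.log10(v) for v in values_ppb)
    q1, median, q3 = quantiles(logs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in logs if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers end at the most extreme observations within the fences.
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": [v for v in logs if v < low_fence or v > high_fence],
    }


# Hypothetical values, labelled as in the figure: breath type (e)/(i), group (c)/(p).
groups = {"(e)(c)": [12.0, 15.5, 9.8, 20.1, 14.2],
          "(e)(p)": [30.2, 25.7, 41.0, 28.9, 35.5]}
for label, values in groups.items():
    s = box_stats(values)
    print(label, round(s["median"], 3), s["outliers"])
```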

During the second reporting period (08/2007-01/2009), IM SAS developed a second version of the Matlab toolbox for statistical analysis of concentration measurements of volatile organic compounds (VOCs) in human breath gas, measured by proton-transfer-reaction mass spectrometry (PTR-MS), for the potential detection of primary lung cancer (PLC). The toolbox contains functions for data selection, manipulation and visualization, as well as statistical functions such as descriptive statistics, several classification methods and ROC analysis. It was designed and developed under Matlab R2006b, so it should also be compatible with later Matlab versions. The up-to-date release of the Bamod Statistical Toolbox is available in the statistical algorithms section of the BAMOD communication platform (https://ox.voc-research.at/).

The Bamod Statistical Toolbox is divided into four sub-packages, each implementing a different set of features:

  1. Data management package;
  2. Descriptive statistics package;
  3. Classification methods package;
  4. ROC analysis package.

Data management

Because data in several different formats, coming from a variety of studies and instruments, must be processed, a set of data management functions has become an essential part of the Bamod Statistical Toolbox. Its core element is the standard data structure, designed for exchanging data between all other parts of the toolbox. This reduces the effort needed to incorporate new data formats into the toolbox: any external data format is translated into the standard data structure by a customized converter function. Such converters were prepared for most data formats used in the BAMOD project.
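
The idea of one standard structure plus one converter per external format can be sketched in Python. The field layout, the CSV input format and the names `StandardDataset` and `from_csv` are hypothetical; the toolbox's actual Matlab structure is not documented here.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StandardDataset:
    """Hypothetical standard structure shared by all analysis functions."""
    compounds: List[str]                # m/z labels, e.g. "m31"
    concentrations: List[List[float]]   # one row per sample, one column per compound (ppb)
    attributes: List[Dict[str, str]]    # per-sample metadata, e.g. group, gender, smoker

    def select(self, predicate):
        """New dataset keeping only samples whose attributes satisfy `predicate`."""
        keep = [i for i, a in enumerate(self.attributes) if predicate(a)]
        return StandardDataset(self.compounds,
                               [self.concentrations[i] for i in keep],
                               [self.attributes[i] for i in keep])

    def column(self, compound):
        """All concentrations of one compound, in sample order."""
        j = self.compounds.index(compound)
        return [row[j] for row in self.concentrations]


def from_csv(text, meta_cols):
    """Hypothetical converter: CSV with metadata columns plus one column per m/z label."""
    reader = csv.DictReader(io.StringIO(text))
    compounds = [c for c in reader.fieldnames if c not in meta_cols]
    concentrations, attributes = [], []
    for row in reader:
        concentrations.append([float(row[c]) for c in compounds])
        attributes.append({c: row[c] for c in meta_cols})
    return StandardDataset(compounds, concentrations, attributes)
```

Supporting a new instrument or study would then mean writing one more `from_...` function, while the statistical code keeps working on `StandardDataset` only.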

Descriptive statistics

The Descriptive statistics package includes functions for basic description and statistical analysis of multivariate measurements (VOC concentrations) on subjects organized into sub-groups (populations) according to different classification rules (e.g. health status (cancer patient, healthy control), gender (male, female), smoking habit (smoker, non-smoker)) or according to different sampling methods (e.g. type of breath sample (inhaled, exhaled)).
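
Organizing subjects into sub-groups by classification rules amounts to grouping on one or more attributes. The Python sketch below assumes samples are stored as dictionaries (attribute names such as `status` and `smoker` are hypothetical) and reports the size and geometric mean of each sub-group, a natural summary for right-skewed concentration data.

```python
from collections import defaultdict
from statistics import geometric_mean


def partition(samples, *keys):
    """Group samples by the tuple of their values for the attributes named in `keys`."""
    groups = defaultdict(list)
    for s in samples:
        groups[tuple(s[k] for k in keys)].append(s)
    return dict(groups)


def describe(samples, compound, *keys):
    """Per-sub-group size, geometric mean, minimum and maximum of one compound."""
    summary = {}
    for group, members in partition(samples, *keys).items():
        values = [m[compound] for m in members]
        summary[group] = {"n": len(values), "geo_mean": geometric_mean(values),
                          "min": min(values), "max": max(values)}
    return summary
```

For example, `describe(samples, "m59", "status", "smoker")` would summarize acetone separately for every combination of health status and smoking habit present in the data.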

Classification methods

The Classification methods package includes algorithms for classification into two (or more) groups: Fisher's linear/quadratic classifier, a weighted ranks (non-parametric) classifier, a support vector machines classifier, a partial least squares classifier, and classifiers that handle missing values (NaNs), namely a weighted voting classifier and a classifier based on classification trees.
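
As an illustration of the first method in the list, here is a minimal two-class Fisher linear discriminant in plain Python. It projects samples onto w = S_W^-1 (m1 - m0), where S_W is the pooled within-class scatter matrix and m0, m1 are the group means, and assigns a sample to group 1 when its projection exceeds the midpoint of the projected group means (equal priors assumed). This is a textbook version, not the toolbox's implementation; all names are hypothetical.

```python
def mean_vector(rows):
    """Component-wise mean of equal-length tuples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fisher_lda(group0, group1):
    """Fisher direction w = S_W^-1 (m1 - m0) and the midpoint threshold."""
    m0, m1 = mean_vector(group0), mean_vector(group1)
    d = len(m0)
    # Pooled within-class scatter matrix S_W.
    sw = [[0.0] * d for _ in range(d)]
    for rows, m in ((group0, m0), (group1, m1)):
        for r in rows:
            for a in range(d):
                for b in range(d):
                    sw[a][b] += (r[a] - m[a]) * (r[b] - m[b])
    w = solve(sw, [b - a for a, b in zip(m0, m1)])
    threshold = sum(wi * (a + b) / 2 for wi, a, b in zip(w, m0, m1))
    return w, threshold


def classify(w, threshold, x):
    """1 (e.g. patient) if the projection exceeds the threshold, else 0 (control)."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) > threshold)
```

Scaling S_W (scatter versus covariance) changes the length of w but not the resulting decisions, which is why the unnormalized scatter matrix suffices here.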

ROC analysis

The package contains functions for calculating empirical ROC curves as well as generalized ROC curves based on Fisher's linear/quadratic approach. Methods for calculating confidence intervals for the empirical ROC curve and for the Youden index are also available. Furthermore, the package contains functions for generating comprehensive ROC overviews of single VOCs (i.e. m/z ratios) and of VOC pairs.

The applicability of the suggested procedures and algorithms was tested with selected data from clinical trials aimed at the early diagnosis of primary lung cancer. The data were measured with the PTR-MS instrument at Innsbruck Medical University, Austria, between Feb 2006 and May 2008.
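
The empirical ROC curve and the Youden index can be sketched in Python as follows, assuming that larger concentrations indicate the patient group (a sample is called positive when its value is at least the threshold). The area under the curve (AUC) is computed by the trapezoidal rule, and the Youden index is J = max over thresholds of (TPR - FPR). Confidence intervals and the Fisher-based generalized ROC curves are not covered; the function names are hypothetical.

```python
def empirical_roc(controls, patients):
    """Empirical ROC points (FPR, TPR), calling a sample positive when value >= threshold."""
    thresholds = sorted(set(controls) | set(patients), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpr = sum(v >= t for v in controls) / len(controls)
        tpr = sum(v >= t for v in patients) / len(patients)
        points.append((fpr, tpr))
    return thresholds, points


def auc(points):
    """Area under the ROC polyline (trapezoidal rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


def youden(thresholds, points):
    """Maximum of TPR - FPR over thresholds, and the threshold attaining it."""
    t, (fpr, tpr) = max(zip(thresholds, points[1:]), key=lambda tp: tp[1][1] - tp[1][0])
    return tpr - fpr, t


th, pts = empirical_roc([1, 2, 3], [2, 4, 5])   # hypothetical controls, patients
print(auc(pts))          # approximately 0.833 (= 5/6)
print(youden(th, pts))   # J of about 0.667 at threshold 4
```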

Based on the given data and the suggested procedures, the individual VOCs with the highest power to distinguish between healthy controls and PLC patients from exhaled breath gas concentrations (i.e. compounds measured by PTR-MS at particular m/z ratios that passed the suggested examination procedures) are, ordered from best to worst: m/z 39, 41, 40, 43, 44, 88, 27, and 172. For Fisher's linear approach, the most successful combinations of compounds proved to be "m/z 41", "m/z 39, 40", and "m/z 39". For Fisher's quadratic approach, the most successful combinations were "m/z 41", "m/z 41, 172", and "m/z 39, 172".
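
Ranking individual compounds can be sketched in Python by computing each compound's empirical AUC (the Mann-Whitney probability that a patient value exceeds a control value, with ties counting one half) and scoring it by max(AUC, 1 - AUC), so that compounds that are lower in patients also rank highly. This criterion is an assumption for illustration; the examination procedures used in the project may differ, and the names are hypothetical.

```python
def mann_whitney_auc(controls, patients):
    """P(patient value > control value) + 0.5 * P(tie), over all pairs."""
    wins = sum((p > c) + 0.5 * (p == c) for p in patients for c in controls)
    return wins / (len(patients) * len(controls))


def rank_compounds(data):
    """Rank compounds by max(AUC, 1 - AUC); `data` maps m/z label -> (controls, patients)."""
    scored = []
    for mz, (controls, patients) in data.items():
        a = mann_whitney_auc(controls, patients)
        direction = "higher in patients" if a >= 0.5 else "lower in patients"
        scored.append((max(a, 1 - a), mz, direction))
    # Stable sort: compounds with equal scores keep their input order.
    return sorted(scored, key=lambda s: s[0], reverse=True)
```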

Further Activities

  • Analysis of the variability of PTR-MS measurements based on a large, well-controlled experiment with a sufficient number of samples and replications taken from healthy controls.
  • Analysis of the uncertainty of PTR-MS measurements. Analysis of within-sample distributions of different mass-to-charge ratios. Research and development of a probability distribution model for PTR-MS concentration measurements.
  • Research into optimal statistical methods for Receiver Operating Characteristic (ROC) curve estimation. Evaluation of different classification methods based on ROC curves.
  • Inspection of the available mass-to-charge ratios measured by PTR-MS with respect to their discriminative potential, using ROC analysis.
  • Development of methods and algorithms for confidence intervals for the parameters of the log-normal distribution and for the difference and ratio of two log-normal means.
  • Statistical analysis of breath gas samples from clinical study measurements, measured by PTR-MS on healthy volunteers; in particular, analysis of isoprene, acetone and methanol levels in the breath of the healthy population and their relationship to age, cholesterol and BMI, respectively.
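
For the log-normal confidence intervals mentioned above, one well-known large-sample approach is Cox's method, sketched below in Python. For log-transformed data with sample mean m and sample variance s², the logarithm of the log-normal mean, theta = mu + sigma²/2, is estimated by m + s²/2 with approximate variance s²/n + s⁴/(2(n - 1)); the difference of two such estimates gives an interval for the ratio of two log-normal means. Whether the project uses this or another method is not stated, so the sketch is illustrative only, and the names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def cox_theta(sample):
    """Cox estimate of theta = log(log-normal mean) and its approximate variance."""
    logs = [math.log(v) for v in sample]
    n, s2 = len(logs), variance(logs)
    return fmean(logs) + s2 / 2, s2 / n + s2 ** 2 / (2 * (n - 1))


def lognormal_mean_ci(sample, level=0.95):
    """Approximate (lower, estimate, upper) for the mean of a log-normal population."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    theta, var = cox_theta(sample)
    half = z * math.sqrt(var)
    return math.exp(theta - half), math.exp(theta), math.exp(theta + half)


def lognormal_ratio_ci(sample1, sample2, level=0.95):
    """Approximate (lower, estimate, upper) for the ratio of two log-normal means."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    t1, v1 = cox_theta(sample1)
    t2, v2 = cox_theta(sample2)
    half = z * math.sqrt(v1 + v2)
    return math.exp(t1 - t2 - half), math.exp(t1 - t2), math.exp(t1 - t2 + half)
```

Because the interval relies on a normal approximation for the estimate of theta, its coverage can be poor for small samples or large log-scale variances.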

Deliverables and Milestones

During the second reporting period, Deliverable D6.2, "ROC based determination of optimal thresholds for separation of cancer patient groups and controls using train and test data", was published. The technical annex specifies two milestones for WP6:

  • M1: Suggestion of potential methods for discrimination and classification of breath gas samples.
  • M2: Analysis of basic statistical properties of classification rules based on preliminary training and testing data.

Those milestones have been reached.
