Improved Chemical Structure–Activity Modeling Through Data Augmentation
Cortes-Ciriano, Isidro ; Bender, Andreas
Journal of chemical information and modeling, 2015-12-28, Vol.55 (12), p.2682-2692
[Peer Reviewed Journal]
Title:
Improved Chemical Structure–Activity Modeling Through Data Augmentation
Author:
Cortes-Ciriano, Isidro ; Bender, Andreas
Subjects:
Algorithms ; Animals ; Proteins - metabolism ; Humans ; Models, Molecular ; Rats ; Linear Models ; Proteins - chemistry ; Quantitative Structure-Activity Relationship ; Index Medicus
Is Part Of:
Journal of chemical information and modeling, 2015-12-28, Vol.55 (12), p.2682-2692
Description:
Extending the original training data with simulated unobserved data points has proven a powerful way to increase both the generalization ability of predictive models and their robustness against changes in the structure of data (e.g., systematic drifts in the response variable) in diverse areas such as the analysis of spectroscopic data or the detection of conserved domains in protein sequences. In this contribution, we explore the effect of data augmentation on the predictive power of QSAR models, quantified by the RMSE values on the test set. We collected 8 diverse data sets from the literature and ChEMBL version 19 reporting compound activity as pIC50 values. The original training data were replicated (i.e., augmented) N times (N ∈ {0, 1, 2, 4, 6, 8, 10}), and these replications were perturbed with Gaussian noise (μ = 0, σ = σ_noise) on either (i) the pIC50 values, (ii) the compound descriptors, (iii) both the compound descriptors and the pIC50 values, or (iv) none of them. The effect of data augmentation was evaluated across three different algorithms (RF, GBM, and SVM radial) and two descriptor types (Morgan fingerprints and physicochemical-property-based descriptors). The influence of all factor levels was analyzed with a balanced fixed-effect full-factorial experiment. Overall, data augmentation consistently led to increased predictive power on the test set by 10–15%. Injecting noise on (i) compound descriptors or on (ii) both compound descriptors and pIC50 values led to the largest drop in RMSE_test values (from 0.67–0.72 to 0.60–0.63 pIC50 units). The maximum increase in predictive power provided by data augmentation is reached when the training data is replicated one time. Therefore, extending the original training data with one perturbed repetition thereof represents a reasonable trade-off between the increased performance of the models and the computational cost of data augmentation, namely an increase in (i) model complexity due to the need for optimizing σ_noise and (ii) the number of training examples.
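
The augmentation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `augment_training_data`, its parameters, and the toy values are hypothetical, and using a single `sigma_noise` for both descriptors and pIC50 values is an assumption the abstract does not confirm. The two boolean flags map onto the four noise settings (i) to (iv).

```python
import random


def augment_training_data(X, y, n_copies=1, sigma_noise=0.1,
                          perturb_descriptors=True, perturb_response=True,
                          seed=0):
    """Append n_copies noisy replicas of (X, y) to the original data.

    X is a list of descriptor rows, y a list of pIC50 values.
    Noise is Gaussian with mean 0 and standard deviation sigma_noise
    (assumption: the same sigma is used for descriptors and responses).
    """
    rng = random.Random(seed)
    # Start from an unperturbed copy of the original training data.
    X_aug = [list(row) for row in X]
    y_aug = list(y)
    for _ in range(n_copies):
        for row, target in zip(X, y):
            if perturb_descriptors:
                new_row = [v + rng.gauss(0.0, sigma_noise) for v in row]
            else:
                new_row = list(row)
            if perturb_response:
                new_target = target + rng.gauss(0.0, sigma_noise)
            else:
                new_target = target
            X_aug.append(new_row)
            y_aug.append(new_target)
    return X_aug, y_aug


# Toy data (made up for illustration): 3 compounds, 2 descriptors each.
X_train = [[0.2, 1.5], [0.9, 0.3], [1.1, 2.0]]
y_train = [6.1, 7.4, 5.8]

# One perturbed replica, the setting the abstract reports as a reasonable trade-off.
X_aug, y_aug = augment_training_data(X_train, y_train, n_copies=1, sigma_noise=0.05)
print(len(X_aug), len(y_aug))  # 6 6
```

In practice `sigma_noise` would presumably be tuned like any other hyperparameter, which is the extra model-complexity cost the abstract mentions. The augmented set would then be passed to the learner (RF, GBM, or SVM) in place of the original training data.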
Publisher:
United States: American Chemical Society
Language:
English
Identifier:
ISSN: 1549-9596
EISSN: 1549-960X
DOI: 10.1021/acs.jcim.5b00570
PMID: 26619900
Source:
© ProQuest LLC All rights reserved