Analysing double-strand breaks in cultured cells for drug screening applications by causal inference


The emergent role of deep learning in high content screening

Deep learning is touted as a panacea for computer vision problems, and high content data should be no exception. In applications of deep learning, a neural network may perform several of the HCA stages (Figure 1.2) simultaneously, as neural networks naturally incorporate elements of feature extraction and dimensionality reduction (see Sommer et al. [2017] for a good example).
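To illustrate how feature extraction and dimensionality reduction can be folded into a single trainable model, the sketch below is a minimal, hypothetical PyTorch example (not an architecture used in this work; channel counts and the number of phenotype classes are illustrative assumptions). Convolutional layers learn image features, a pooling step reduces dimensionality, and a linear head classifies the resulting representation.

import torch
import torch.nn as nn

# Minimal sketch: convolutions act as a learned feature extractor, global
# pooling reduces each feature map to one value, and a linear head maps the
# resulting low-dimensional vector to class scores.
class PhenotypeNet(nn.Module):
    def __init__(self, in_channels: int = 3, n_classes: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # dimensionality reduction to a 32-d vector
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.features(x).flatten(1)  # per-image feature vector
        return self.classifier(z)        # class scores

# Example: a batch of 4 three-channel 64x64 crops -> 8 class scores each.
scores = PhenotypeNet()(torch.randn(4, 3, 64, 64))
print(scores.shape)  # torch.Size([4, 8])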
However, the successful deployment of deep neural networks relies crucially on vast volumes of data to enable effective generalisation. Large annotated datasets such as ImageNet (Russakovsky et al. [2015]) were among the key preconditions for the rise of deep learning for object classification in 2012, and extensions of deep learning to other problem domains were likewise accompanied by the curation of large, special-purpose datasets, for example Lin et al. [2014] for object detection.

Annotated data for supervised training is expensive, however, requiring manual effort, often by domain experts. ImageNet relies on online crowdsourcing platforms, with quality ensured by the consensus of multiple annotators. This annotation bottleneck affects all deep learning research, and therefore extends to computational phenotyping. The Broad Bioimage Benchmark Collection (BBBC) datasets (Ljosa et al. [2012] and, in particular, Caie et al. [2010]) have become de facto benchmarks for developing drug response phenotyping algorithms (for example, Kraus et al. [2016] or Kandaswamy et al. [2016]). Yet while benchmark datasets are undoubtedly useful and have had an enormous impact on the field, we still face the problem that new imaging projects rarely provide enough data to train neural networks from scratch. This stems from the fact that bio-images vary enormously between projects: their visual appearance is heavily influenced by the choice of markers and the mode of microscopy. Indeed, by selecting different markers, one is effectively looking at different objects. For this reason, it seems unlikely that large-scale datasets will definitively solve the problem of annotated data, except for the most widely used markers and imaging modalities.

The building blocks of artificial neural networks

An artificial neural network (henceforth neural network) is a collection of computational units known as neurons, organised into a sequence of layers. An input passes forward through a neural network, undergoing a series of transformations through combination with a set of weights in each layer. During a training procedure, the neural network is shown examples of input data paired with target outputs. After each round of training, the neural network adjusts its (randomly initialised) weights so as to make it a little more likely to emit the target values given future appearances of the input data.
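To make this concrete, the following minimal NumPy sketch (an illustrative toy example, not code from this thesis) implements a two-layer network: the input is combined with a weight matrix in each layer, the output is compared with a target, and the randomly initialised weights are nudged by gradient descent so that the target output becomes more likely on the next pass.

import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer network with randomly initialised weights.
W1 = rng.normal(scale=0.1, size=(4, 8))  # input dim 4 -> hidden dim 8
W2 = rng.normal(scale=0.1, size=(8, 1))  # hidden dim 8 -> output dim 1

def forward(x):
    """Pass the input forward through the layers."""
    h = np.tanh(x @ W1)  # layer 1: combine input with weights, apply non-linearity
    return h @ W2, h     # layer 2: combine hidden activations with weights

x = rng.normal(size=(1, 4))  # one training example
y = np.array([[1.0]])        # its target output

for step in range(100):
    y_hat, h = forward(x)
    error = y_hat - y                              # gradient of squared error w.r.t. the output
    grad_W2 = h.T @ error                          # backpropagate to the second layer's weights
    grad_W1 = x.T @ ((error @ W2.T) * (1 - h**2))  # and to the first layer's weights (tanh derivative)
    W2 -= 0.1 * grad_W2                            # gradient descent nudges the weights...
    W1 -= 0.1 * grad_W1                            # ...towards emitting the target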

Fighting overfitting in deep learning

In machine learning, overfitting is the effect of fitting the noise instead of the signal. In practice, all data contains noise that obscures the underlying signal, and when a dataset is sufficiently small, a modestly powerful model may interpolate it perfectly, only to then be useless on independent test data. Much of machine learning is ultimately concerned with striking a balance between overfitting and underfitting. In supervised learning, this balance is evaluated with the generalisation error, a measure of a model's ability to generalise beyond its training data. It is estimated by evaluating a trained model over a test set of independent, unseen data. The bias-variance decomposition illustrates the tradeoff between over- and underfitting. Suppose training data is generated by $Y = f(x) + \epsilon$, where $f(x)$ is the true signal for data point $x$ and $\epsilon \sim \mathcal{N}(0, \sigma^2)$ is the noise. Let our model's estimate be denoted $\hat{f}(x)$. Then the MSE (here an arbitrary measure of goodness of fit) between the estimate and the true values, averaged over all possible data, decomposes as

$\mathbb{E}\big[(Y - \hat{f}(x))^2\big] = \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2,$

that is, squared bias plus variance plus irreducible noise: an underfitting model incurs high bias, an overfitting model incurs high variance, and the noise term cannot be reduced by any choice of model.
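As an illustration of estimating generalisation error on held-out data, the following is a minimal sketch with invented toy data (not an experiment from this thesis): a low-degree and a high-degree polynomial are fitted to noisy samples of a sine signal, and the flexible model attains near-zero training error but a larger test error, the signature of overfitting.

import numpy as np

rng = np.random.default_rng(42)

def f(x):
    """True signal; the observed data adds Gaussian noise on top of it."""
    return np.sin(2 * np.pi * x)

# Small noisy training set and an independent test set from the same process.
x_train = rng.uniform(0, 1, size=12)
y_train = f(x_train) + rng.normal(scale=0.2, size=x_train.shape)
x_test = rng.uniform(0, 1, size=200)
y_test = f(x_test) + rng.normal(scale=0.2, size=x_test.shape)

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)  # fit a polynomial model
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # The degree-9 model nearly interpolates the 12 training points
    # (low bias, high variance) but generalises worse than the degree-3 model.
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")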


Table of contents:

1 Introduction 
1.1 Computational phenotyping
1.1.1 High content screening
1.1.2 The elements of high content screening
1.1.3 High content analysis
1.2 Challenges for high content analysis
1.2.1 Multi-cell-line data
1.2.2 The emergent role of deep learning in high content screening
1.3 Contributions
2 Deep learning fundamentals 
2.1 The building blocks of artificial neural networks
2.1.1 Backpropagation
2.2 Convolutional neural networks
2.2.1 AlexNet and the ConvNet revolution
2.3 Neural object detection
2.3.1 Regions with CNN features
2.4 Fighting overfitting in deep learning
2.4.1 Data augmentation
2.5 Transfer learning
2.5.1 Domain-adversarial neural networks
2.6 Generative adversarial networks
2.6.1 Deep convolutional GANs
2.6.2 Conditional GANs
2.6.3 Assorted GANs
I Computational phenotyping for multiple cell lines 
3 High content analysis in drug and wild type screens 
3.1 Overview
3.2 Datasets
3.2.1 Wild type screen dataset
3.3 Cell measurement pipeline
3.3.1 Nuclei segmentation
3.3.2 Cell membrane segmentation
3.3.3 Feature extraction
3.4 Use cases in high content analysis
3.4.1 Controlling for spatial effects in the drug screen dataset
3.4.2 Viabilities of TNBC cell lines correlate
3.4.3 Cell cycle modulates double-strand break rate
3.4.4 TNBC cell lines assume distinct wild type morphologies
3.5 Discussion
4 Domain-invariant features for mechanism of action prediction in a multi-cell-line drug screen 
4.1 Overview
4.2 Phenotypic profiling for mechanism of action prediction
4.2.1 MOA prediction
4.2.2 Phenotypic profiling
4.2.3 Multi-cell-line analysis
4.2.4 Model evaluation
4.2.5 Software
4.3 Results
4.3.1 Single cell line analysis
4.3.2 Analysis on multiple cell lines
4.4 Discussion
II Fluorescence-free phenotyping 
5 Experimentally-generated ground truth for detecting cell types in phase contrast time-lapse microscopy 
5.1 Overview
5.1.1 Biological context
5.1.2 Computational phenotyping for phase contrast images
5.2 CAR-T dataset
5.2.1 Observations on the dataset
5.2.2 Coping with fluorescent quenching
5.3 Fluorescence prediction
5.3.1 Image-to-image translation models
5.3.2 Results
5.3.3 Bridging the gap to cell detection
5.4 Object detection system
5.4.1 Experimentally-generated ground truth
5.4.2 Object detection system
5.4.3 Results
5.5 Discussion
6 Deep style transfer for synthesis of images of CAR-T cell populations 
6.1 Overview
6.2 Feasibility study: synthesising cell crops
6.2.1 Generative adversarial networks
6.2.2 Deep convolutional GANs
6.2.3 Conditional GANs
6.3 Style transfer for simulating populations of Raji cells
6.3.1 Conditional dilation for tessellating cell contours
6.4 CycleGANs for cell population synthesis
6.4.1 First results with CycleGANs
6.5 Fine-tuning a state-of-the-art object detection system
6.6 Perspectives on synthesising a full object detection dataset
6.6.1 Region of interest discrimination
6.7 Discussion
7 Conclusions 
7.1 Chapter summaries
7.2 The future of high content screening
7.2.1 Overcoming massive image annotation
7.2.2 The renaissance of label-free microscopy
7.2.3 Towards toxicogenetics
A Supplementary figures
B Glossary of neural network architectures 
B.1 Fully-connected GAN
B.2 Deep convolutional GAN
B.3 F-Net
B.4 PatchGAN
C Analysing double-strand breaks in cultured cells for drug screening applications by causal inference 
C.1 Introduction
C.2 Experimental setup
C.3 Approaches to measuring double-strand breaks
C.3.1 Counting spots with diameter openings
C.3.2 Granulometry-based features
C.3.3 Average intensity
C.4 Analysis
C.4.1 Causal considerations
C.5 Results
C.6 Conclusions
D Supplementary analysis 
D.1 Discovering phenotypic classes with unsupervised learning
D.2 Training a RoI classifier

