ODEs and Neural Network Optimization


Table of contents

List of Figures
List of Tables
List of Acronyms
I. Motivation
1. Introduction
1.1. Context
1.2. Subject and Contributions of this Thesis
1.2.1. General-Purpose Unsupervised Representation Learning for Time Series
1.2.2. Dynamical Systems and Representation Learning for Complex Spatiotemporal Data
1.2.3. Study of Generative Adversarial Networks via their Training Dynamics
1.2.4. Outline of this Thesis
2. Background and Related Work
2.1. Neural Architecture for Sequence Modeling
2.1.1. Recurrent Neural Networks
2.1.1.1. Principle
2.1.1.2. Refinements
2.1.2. Neural Differential Equations
2.1.2.1. ODEs and PDEs
2.1.2.2. Differential Equations and Neural Networks
2.1.2.3. ODEs and Neural Network Optimization
2.1.3. Alternatives
2.1.3.1. Convolutional Neural Networks
2.1.3.2. Transformers
2.2. Unsupervised Representation Learning for Temporal Data
2.2.1. Contrastive Learning
2.2.2. Learning from Autoencoding and Prediction
2.2.2.1. Learning Methods
2.2.2.2. Disentangled Representations
2.3. Deep Generative Modeling
2.3.1. Families of Deep Generative Models
2.3.1.1. Variational Autoencoders
2.3.1.2. Generative Adversarial Networks
2.3.1.3. Other Categories
2.3.2. Sequential Deep Generative Models
2.3.2.1. Temporally Aware Training Objectives
2.3.2.2. Stochastic and Deterministic Models for Sequence-to-Sequence Tasks
2.3.2.3. Latent Generative Temporal Structure
II. Time Series Representation Learning
3. Unsupervised Scalable Representation Learning for Time Series
3.1. Introduction
3.2. Related Work
3.3. Unsupervised Training
3.4. Encoder Architecture
3.5. Experimental Results
3.5.1. Classification
3.5.1.1. Univariate Time Series
3.5.1.2. Multivariate Time Series
3.5.2. Evaluation on Long Time Series
3.6. Discussion
3.6.1. Behavior of the Learned Representations Throughout Training
3.6.2. Influence of K
3.6.3. Discussion of the Choice of Encoder
3.6.4. Reproducibility
3.7. Conclusion
III. State-Space Predictive Models for Spatiotemporal Data
4. Stochastic Latent Residual Video Prediction
4.1. Introduction
4.2. Related Work
4.3. Model
4.3.1. Latent Residual Dynamic Model
4.3.2. Content Variable
4.3.3. Variational Inference and Architecture
4.4. Experiments
4.4.1. Evaluation and Comparisons
4.4.2. Datasets and Prediction Results
4.4.2.1. Stochastic Moving MNIST
4.4.2.2. KTH Action Dataset
4.4.2.3. Human3.6M
4.4.2.4. BAIR Robot Pushing Dataset
4.4.3. Illustration of Residual, State-Space and Latent Properties
4.4.3.1. Generation at Varying Frame Rate
4.4.3.2. Disentangling Dynamics and Content
4.4.3.3. Interpolation of Dynamics
4.4.3.4. Autoregressivity and Impact of the Encoder and Decoder Architecture
4.5. Conclusion
5. PDE-Driven Spatiotemporal Disentanglement
5.1. Introduction
5.2. Background: Separation of Variables
5.2.1. Simple Case Study
5.2.2. Functional Separation of Variables
5.3. Proposed Method
5.3.1. Problem Formulation Through Separation of Variables
5.3.2. Fundamental Limits and Relaxation
5.3.3. Temporal ODEs
5.3.4. Spatiotemporal Disentanglement
5.3.5. Loss Function
5.3.6. Discussion of Differences with Chapter 4’s Model
5.4. Experiments
5.4.1. Physical Datasets: Wave Equation and Sea Surface Temperature
5.4.2. A Synthetic Video Dataset: Moving MNIST
5.4.3. A Multi-View Dataset: 3D Warehouse Chairs
5.4.4. A Crowd Flow Dataset: TaxiBJ
5.5. Conclusion
IV. Analysis of GANs’ Training Dynamics
6. A Neural Tangent Kernel Perspective of GANs
6.1. Introduction
6.2. Related Work
6.3. Limits of Previous Studies
6.3.1. Generative Adversarial Networks
6.3.2. On the Necessity of Modeling Discriminator Parameterization
6.4. NTK Analysis of GANs
6.4.1. Modeling Inductive Biases of the Discriminator in the Infinite-Width Limit
6.4.2. Existence, Uniqueness and Characterization of the Discriminator
6.4.3. Differentiability of the Discriminator and its NTK
6.4.4. Dynamics of the Generated Distribution
6.5. Fine-Grained Study for Specific Losses
6.5.1. The IPM as an NTK MMD Minimizer
6.5.2. LSGAN, Convergence, and Emergence of New Divergences
6.6. Empirical Study with GAN(TK)2
6.6.1. Adequacy for Fixed Distributions
6.6.2. Convergence of Generated Distribution
6.6.3. Visualizing the Gradient Field Induced by the Discriminator
6.6.3.1. Setting
6.6.3.2. Qualitative Analysis of the Gradient Field
6.7. Conclusion and Discussion
V. Conclusion
7. Overview of our Work
7.1. Summary of Contributions
7.2. Reproducibility
7.3. Acknowledgements
7.4. Other Works
8. Perspectives
8.1. Unfinished Projects
8.1.1. Adaptive Stochasticity for Video Prediction
8.1.2. GAN Improvements via the GAN(TK)2 Framework
8.1.2.1. New Discriminator Architectures
8.1.2.2. New NTK-Based GAN Model
8.2. Future Directions
8.2.1. Temporal Data and Text
8.2.2. Spatiotemporal Prediction
8.2.2.1. Merging the Video and PDE-Based Models
8.2.2.2. Scaling Models
8.2.2.3. Relaxing the Constancy of the Content Variable
8.2.3. NTKs for the Analysis of Generative Models
8.2.3.1. Analysis of GANs’ Generators
8.2.3.2. Analysis of Other Models
Appendix
A. Supplementary Material of Chapter 3
A.1. Training Details
A.1.1. Input Preprocessing
A.1.2. SVM Training
A.1.3. Hyperparameters
A.2. Univariate Time Series
A.3. Multivariate Time Series
B. Supplementary Material of Chapter 4
B.1. Evidence Lower Bound
B.2. Datasets Details
B.2.1. Data Representation
B.2.2. Stochastic Moving MNIST
B.2.3. KTH Action Dataset (KTH)
B.2.4. Human3.6M
B.2.5. BAIR Robot Pushing Dataset (BAIR)
B.3. Training Details
B.3.1. Architecture
B.3.2. Optimization
B.4. Influence of the Euler Step Size
B.5. Pendulum Experiment
B.6. Additional Samples
B.6.1. Stochastic Moving MNIST
B.6.2. KTH
B.6.3. Human3.6M
B.6.4. BAIR
B.6.5. Oversampling
B.6.6. Content Swap
B.6.7. Interpolation in the Latent Space
C. Supplementary Material of Chapter 5
C.1. Proofs
C.1.1. Resolution of the Heat Equation
C.1.2. Heat Equation with Advection Term
C.2. Accessing Time Derivatives of w and Deriving a Feasible Weaker Constraint
C.3. Proof of Spatiotemporal Disentanglement
C.3.1. Separation of Variables Preserves the Mutual Information of w and y through Time
C.3.1.1. Invertible Flow of an Ordinary Differential Equation (ODE)
C.3.1.2. Preservation of Mutual Information by Invertible Mappings
C.3.2. Ensuring Disentanglement at any Time
C.4. Datasets
C.4.1. WaveEq and WaveEq-100
C.4.2. Sea Surface Temperature (SST)
C.4.3. Moving MNIST
C.4.4. 3D Warehouse Chairs
C.4.5. TaxiBJ
C.5. Training Details
C.5.1. Baselines
C.5.2. Model Specifications
C.5.2.1. Architecture
C.5.3. Optimization
C.5.4. Prediction Offset for SST
C.6. Additional Results and Samples
C.6.1. Preliminary Results on KTH
C.6.2. Modeling SST with Separation of Variables
C.6.3. Additional Samples
C.6.3.1. WaveEq
C.6.3.2. SST
C.6.3.3. Moving MNIST
C.6.3.4. 3D Warehouse Chairs
D. Supplementary Material of Chapter 6
D.1. Proofs of Theoretical Results and Additional Results
D.1.1. Recall of Assumptions
D.1.2. On the Solutions of Equation (6.9)
D.1.3. Differentiability of Infinite-Width Networks and their NTKs
D.1.3.1. K Preserves Kernel Differentiability
D.1.3.2. Differentiability of Conjugate Kernels, NTKs and Discriminators
D.1.4. Dynamics of the Generated Distribution
D.1.5. Optimality in Concave Setting
D.1.5.1. Assumptions
D.1.5.2. Optimality Result
D.1.6. Case Studies of Discriminator Dynamics
D.1.6.1. Preliminaries
D.1.6.2. LSGAN
D.1.6.3. IPMs
D.1.6.4. Vanilla GAN
D.2. Discussions and Remarks
D.2.1. From Finite to Infinite-Width Networks
D.2.2. Loss of the Generator and its Gradient
D.2.3. Differentiability of the Bias-Free ReLU Kernel
D.2.4. Integral Operator and Instance Noise
D.2.5. Positive Definite NTKs
D.3. Experimental Details
D.3.1. Datasets
D.3.2. Parameters
Bibliography
