Datasets for Visual Question Answering (VQA)


Table of contents

1 Introduction
1.1 Context
1.1.1 Joint image/text understanding
1.1.2 Visual Question Answering
1.2 Contributions
1.3 Industrial context
2 Related Work
2.1 VQA architecture
2.2 Mono-modal representations
2.2.1 Image representation
2.2.2 Textual embedding
2.3 Multi-modal fusion
2.3.1 Fusion in Visual Question Answering (VQA)
2.3.2 Bilinear models
2.4 Towards visual reasoning
2.4.1 Visual attention
2.4.2 Image/question attention
2.4.3 Exploiting relations between regions
2.4.4 Composing neural architectures
2.5 Datasets for VQA
2.6 Outline and contributions
3 MUTAN: Multimodal Tucker Fusion for VQA
3.1 Introduction
3.2 Bilinear models
3.2.1 Tucker decomposition
3.2.2 Multimodal Tucker Fusion
3.2.3 MUTAN fusion
3.2.4 Model unification and discussion
3.2.5 MUTAN architecture
3.3 Experiments
3.3.1 Comparison with leading methods
3.3.2 Further analysis
3.4 Conclusion
4 BLOCK: Bilinear Superdiagonal Fusion for VQA and VRD
4.1 Introduction
4.2 BLOCK fusion model
4.2.1 BLOCK model
4.3 BLOCK fusion for VQA task
4.3.1 VQA architecture
4.3.2 Fusion analysis
4.3.3 Comparison to leading VQA methods
4.4 BLOCK fusion for VRD task
4.4.1 VRD architecture
4.4.2 Fusion analysis
4.4.3 Comparison to leading VRD methods
4.5 Conclusion
5 MuRel: Multimodal Relational Reasoning for VQA
5.1 Introduction
5.2 MuRel approach
5.2.1 MuRel cell
5.2.2 MuRel network
5.3 Experiments
5.3.1 Experimental setup
5.3.2 Qualitative results
5.3.3 Model validation
5.3.4 State-of-the-art comparison
5.4 Conclusion
6 Conclusion
6.1 Summary of Contributions
6.2 Perspectives for Future Work
Bibliography
