Talk: Towards Understanding Vision and Language Systems

Dear All,

The IEEE Signal Processing Society, Bangalore Chapter, and the Department of Computational and Data Sciences, Indian Institute of Science, invite you to the following talk:

SPEAKER : Badri Narayana Patro, Postdoctoral Fellow, Google Research India
TITLE : “Towards Understanding Vision and Language Systems: Controllability, Uncertainty and Interpretability for VQA and VQG”
Venue : #102, CDS Seminar Hall
Date & Time : Feb 26, 2020, 04:00 PM

Intelligent interaction between humans and automated systems is an important area of research in computer vision, natural language processing, and machine learning. One such interaction is visual question answering (VQA), where a machine answers questions asked by humans. This requires combining information from different modalities, such as image and language. Such automated systems have many potential applications, including assistive technology for visually impaired people and efficient surveillance and robotics systems. For a healthy interaction, it is also important that machines generate engaging questions. Generating natural questions based on an image is a challenging semantic task, termed visual question generation (VQG), that requires multimodal representations. Images can have multiple visual and language contexts that are relevant for generating questions, namely places, captions, and tags. The literature offers several deep-learning-based approaches to these problems. In this talk, we aim to understand these techniques better in order to incorporate controllability, estimate the uncertainty of their predictions, and explain them robustly using visual and textual explanations.

Badri Narayana Patro is currently a Postdoctoral Fellow at Google Research Lab for AI, India. He has submitted his PhD thesis, titled “Towards Understanding Vision and Language Systems”, in the Department of Electrical Engineering at the Indian Institute of Technology, Kanpur. He received his M.Tech degree from the Department of Electrical Engineering, Indian Institute of Technology, Bombay, and his B.Tech degree in Electronics and Telecommunication Engineering from the National Institute of Science and Technology, Odisha. He previously worked as a Lead Engineer at Samsung R&D Institute Delhi, India. He has also worked as an associate software engineer at Harman International Limited, Pune, India, and as an assistant software engineer at Larsen and Toubro, Mysore. His expertise is in the fields of Computer Vision, Natural Language Processing, Pattern Recognition, and applied Machine Learning. He has authored papers at venues such as CVPR, ICCV, AAAI, EMNLP, COLING, and WACV. He works on the Vision and Language team at Google Research Lab and collaborates closely with allied teams within Google, such as the AI for Social Goods and Google Image teams. He has served as a reviewer for conferences and journals including CVPR, ICCV, AAAI, ACL, TIP, TMM, WACV, ICVGIP, and NCVPRIPG. He is a member of the ACL, AAAI, and CVF. He actively collaborates with institutes to further his research interests.

Host Faculty: Prof. Venkatesh Babu
