
Tutorials

The role of the tutorials is to provide a platform for a more intensive scientific exchange amongst researchers interested in a particular topic and as a meeting point for the community. Tutorials complement the depth-oriented technical sessions by providing participants with broad overviews of emerging fields. A tutorial can be scheduled for 1.5 or 3 hours.

TUTORIALS LIST

A Guided Tour of Computational Modelling of Visual Attention  (VISIGRAPP)
Lecturer(s): Olivier Le Meur

Building Personal AIs with First Person (Egocentric) Vision  (VISIGRAPP)
Lecturer(s): Antonino Furnari



A Guided Tour of Computational Modelling of Visual Attention


Lecturer

Olivier Le Meur
Univ Rennes, CNRS, IRISA
France
 
Abstract

Since the first computational model of visual attention was proposed by Itti et al. in 1998 [1], a lot of progress has been made. This progress concerns both the modelling itself and the way we assess the performance of saliency models. Recently, advances in machine learning, and more specifically in deep learning, have brought new momentum to the field. In this tutorial, we present saliency models as well as the metrics used to assess their performance. In particular, we will emphasize new saliency models based on convolutional neural networks. We will present different deep architectures and the different loss functions used during training. We will conclude by introducing saccadic models [2,3], which are a generalization of saliency models.

[1] Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254-1259.
[2] Le Meur, O., & Liu, Z. (2015). Saccadic model of eye movements for free-viewing condition. Vision Research, 116, 152-164.
[3] Le Meur, O., & Coutrot, A. (2016). Introducing context-dependent and spatially-variant viewing biases in saccadic models. Vision Research, 121, 72-84.


Keywords

Visual attention
Saliency modelling
Eye movements
Deep saliency network


Aims and Learning Objectives

This tutorial aims to present both the history and the latest achievements in the computational modelling of visual attention.

Target Audience

Master's students, PhD students, postdocs and researchers

Prerequisite Knowledge of Audience

Computer vision, image processing, machine learning

Detailed Outline

I. Introduction
II. Definitions and concepts of visual attention
III. Non-supervised saliency models
IV. Ground truth definition and methods for assessing saliency models
V. Deep saliency network, a new breakthrough
VI. Saccadic model as a new generation of saliency models
VII. Attentive applications
VIII. Conclusions

Secretariat Contacts
e-mail: visigrapp.secretariat@insticc.org

Building Personal AIs with First Person (Egocentric) Vision


Lecturer

Antonino Furnari
University of Catania
Italy
 
Brief Bio
Antonino Furnari is a postdoctoral fellow at the University of Catania. He received his bachelor's and master's degrees in computer science (both magna cum laude) from the University of Catania in 2010 and 2013, respectively, and his PhD in Mathematics and Computer Science from the same university in 2016. He has been a member of the IPLAB research group (University of Catania) since 2012 and a research member of the joint laboratory STMicroelectronics - University of Catania, Italy, since 2013. He is author or co-author of more than 30 papers in international book chapters, journals and conference proceedings. Since 2017 he has served as a contract professor at the Department of Mathematics and Computer Science, University of Catania, Italy, and as academic assessment chair at the International Computer Vision Summer School (ICVSS). His research interests concern Computer Vision, Pattern Recognition, and Machine Learning, with a focus on First Person Vision. Personal web page: http://dmi.unict.it/~furnari

Abstract

The increasing availability of wearable devices capable of acquiring and processing images and video from the point of view of the user (e.g., Google Glass, Microsoft HoloLens and Magic Leap One) has increased the interest of the computer vision community in first person (egocentric) vision. Because they are portable and can mediate reality as perceived by their users, such devices are ideal candidates for implementing personal intelligent assistants which can understand our behavior and augment our abilities. Unlike standard “third person vision”, which assumes that the processed images and video are acquired from a static point of view neutral to the perceived events, first person (egocentric) vision assumes images and video to be acquired from the non-static point of view of the user by means of a wearable device. These unique acquisition settings make first person (egocentric) vision different from standard third person vision. Most notably, the visual information collected using wearable cameras always “tells something” about the user, revealing what they do, what they pay attention to and how they interact with the world. Moreover, wearable devices make it possible to collect huge quantities of user-centric visual data effortlessly. In this tutorial, we will discuss the challenges and opportunities offered by first person (egocentric) vision, cover the historical background and seminal works, present the main technological tools (including devices and algorithms) which can be used to analyze first person visual data, and outline open problems.

Keywords

wearable, first person, egocentric, localization, action recognition, action anticipation

Aims and Learning Objectives

The participants will understand the main advantages of first person (egocentric) vision over third person vision for understanding the user’s behavior and building personalized applications. Specifically, the participants will learn about: 1) the main differences between third person and first person (egocentric) vision, including the way in which the data is collected and processed; 2) the devices which can be used to collect data and provide services to the users; 3) the algorithms which can be used to manage first person visual data, for instance to perform localization, indexing, and action and activity recognition.

Target Audience

First year PhD students, graduate students, researchers.

Prerequisite Knowledge of Audience

Fundamentals of Computer Vision and Machine Learning (including Deep Learning)

Detailed Outline

The tutorial will cover the following topics:
- Outline of the tutorial;
- History of first person (egocentric) vision and motivation;
- Differences between third person and first person vision;
- Wearable devices to acquire/process first person visual data;
- Main problems of interest in first person vision:
  - Localization;
  - Attention;
  - Action recognition;
  - Object recognition;
  - Activity recognition;
  - Action anticipation;
- Indexing and exploitation of egocentric visual data;
- Technological tools (devices and algorithms) which can be used to build first person vision applications;
- Challenges and open problems;
- Conclusions and insights for research in the field.

