From Scene Classification to Activity Recognition: Hierarchical, Multi-Task, Multi-Modal and Federated Learning Strategies

Scene classification remains a central challenge in computer vision, requiring models to capture both the local structure and the global context of visual environments. As scene understanding becomes increasingly important in engineering applications, from autonomous navigation to environmental monitoring, there is a need for learning frameworks that go beyond modelling each scene category in isolation. Recent advances have also emphasized the importance of privacy-preserving approaches, particularly in settings where data cannot be centrally aggregated. In this context, federated learning has emerged as a promising paradigm for activity recognition: it enables collaborative model training across distributed devices while keeping raw user data on those devices.

In this talk, Reza will explore emerging strategies that combine multi-task learning, context-aware representations and knowledge transfer to improve scene classification performance. He will also present his work on human activity recognition, specifically fall detection, using a multimodal and federated learning approach, and will show how privacy-aware solutions can be applied effectively to real-world scene understanding tasks.