The project "Relationship Modeling and Learning of Visual Data in the Open World" won the 2022 First Prize of the Natural Science Award established by the China Society of Image and Graphics (CSIG). The main contributors were Ruiping Wang, Shiguang Shan, Xilin Chen, Huajie Jiang, and Wen Wang, representing the Key Laboratory of Intelligent Information Processing at ICT-CAS.
A paradigm shift over the past decade has moved visual intelligence research from single-task visual perception in a closed domain to more challenging compositional tasks that involve both visual perception and cognition in the realistic open world. A key issue now is how to effectively explore the multi-level correlations (e.g., structural, semantic, and contextual correlations) hidden in visual data while establishing a simple but efficient general-purpose representation model as a basis for subsequent visual classification and understanding tasks.
To tackle such challenges, the project made three main proposals:
(1) It proposed a statistical manifold modeling and metric learning method for video/image sets that overcomes the inherent contradiction between the heterogeneous spaces of the data distribution manifold and linear learning algorithms, reveals the intrinsic properties of the geometric structure of the Riemannian manifold, and points toward a new research direction of "Riemannian metric learning."
(2) It proposed a transferable visual attribute learning and scalable visual classification method for the open world, constructed a deep hash learning architecture that jointly embeds high-level categories and middle-level attributes, developed a visual feature learning algorithm guided by attribute knowledge, and explicitly disentangled the hierarchically structured classification criteria of visual categories, opening a new direction for constructing interpretable visual classification models.
(3) It proposed a scene context graph modeling and structured reasoning method, which builds a hierarchical scene graph reflecting the priority mechanism of human cognition, characterizes multi-dimensional visual concepts (such as entities, attributes, and relationships) in the scene under a unified framework, and effectively supports progress on high-level scene cognition tasks such as image captioning, image-text cross-modal retrieval, and visual question answering.
The project's eight selected representative papers have been cited more than 2,100 times according to Google Scholar and have received wide-ranging recognition from dozens of senior fellow researchers. Technologies based on the papers have been successfully transferred to application scenarios such as remote sensing image interpretation and smart security checks.
CSIG’s Science and Technology Award is given to encourage researchers’ enthusiasm and creativity in the image and graphics field while promoting overall innovation and industrial development in image and graphics technology. In 2022, through a rigorous review process comprising formal review, initial evaluation, and final evaluation, CSIG conferred a total of six natural science awards, including a first prize given to ICT’s Key Lab of Intelligent Information Processing, as well as three technical invention awards and six scientific and technological progress awards.