A Unified Framework for Locating and Recognizing Human Actions Proposed at CVPR 2011

The spatiotemporal representation and recognition of human actions in videos is one of the most widely studied topics in the computer vision and pattern recognition community. It has great potential in applications such as human-computer interaction, video surveillance and multimedia data search.

Under the guidance of Associate Researcher CHANG Hong and colleagues, master's student XIE Yuelei presented a novel pose-based unified framework for locating and recognizing human actions in videos. In their method, human poses are detected and represented using a deformable part model. This is the first work to explore the effectiveness of the deformable part model in combining human detection and pose estimation for action recognition. Compared with previous methods, it has three main advantages. First, the method does not rely on the quality of video pre-processing, such as satisfactory foreground segmentation or reliable tracking. Second, it introduces a novel compact representation of human poses that works together with human detection and captures the spatial and temporal structure within an action. Third, because human detection is taken into account, the method can localize and recognize multiple human actions simultaneously in the same cluttered scene. Evaluation on benchmark datasets and recorded videos demonstrates the efficacy of the proposed method.

The work, entitled “A Unified Framework for Locating and Recognizing Human Actions”, was presented at the 28th annual IEEE Conference on Computer Vision and Pattern Recognition (CVPR), held in Colorado Springs, Colorado, USA, from 21 to 25 June 2011. The paper will be published by IEEE and included in IEEE Xplore.