Academic Paper

Video Event Detection Using Temporal Pyramids of Visual Semantics with Kernel Optimization and Model Subspace Boosting
Document Type
Conference
Source
2012 IEEE International Conference on Multimedia and Expo (ICME), pp. 747-752, Jul 2012
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Kernel
Boosting
Visualization
Support vector machines
Data models
Multimedia communication
Event detection
Video
Event
Classification
Modeling
SVM
Semantic
Visual
Pyramid
Bipartite
Model
Selection
Optimization
Temporal
TRECVID
MED
Language
ISSN
1945-7871
1945-788X
Abstract
In this study, we present a system for video event classification that generates a temporal pyramid of static visual semantics using minimum-value, maximum-value, and average-value aggregation techniques. Kernel optimization and model subspace boosting are then applied to customize the pyramid for each event. SVM models are trained independently for each level of the pyramid, with kernels selected by 3-fold cross-validation. Two families of kernels are evaluated: those that enforce a static temporal order and those that permit temporal alignment. Model subspace boosting is used to select the best combination of pyramid levels and aggregation techniques for each event. The NIST TRECVID Multimedia Event Detection (MED) 2011 dataset was used for evaluation. Results demonstrate that kernel optimization using temporally static and dynamic kernels together achieves better performance than either type of kernel alone. In addition, model subspace boosting reduces the size of the model by 80% while maintaining 96% of the performance gain.
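The temporal pyramid described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `temporal_pyramid`, the choice of splitting level k into 2**k equal contiguous segments, and the input layout (one list of semantic classifier scores per frame) are all assumptions made for illustration, since the abstract does not specify them. Only the three aggregation techniques (minimum, maximum, average) come from the source.

```python
from statistics import fmean


def temporal_pyramid(frame_scores, levels=3):
    """Aggregate per-frame semantic scores into a temporal pyramid.

    frame_scores: list of equal-length score lists, one per frame, in time order.
    Level k splits the video into 2**k contiguous segments (an assumption;
    the abstract does not state the split factor).
    Returns a dict mapping (level, aggregation_name) to a flat feature list
    that concatenates the per-segment aggregates in temporal order.
    """
    if not frame_scores:
        raise ValueError("need at least one frame")

    # The three aggregation techniques named in the abstract.
    aggregators = {"min": min, "max": max, "avg": fmean}
    n = len(frame_scores)
    pyramid = {}

    for level in range(levels):
        n_seg = 2 ** level
        for name, agg in aggregators.items():
            features = []
            for s in range(n_seg):
                start = s * n // n_seg
                # Guarantee a non-empty segment even for very short videos.
                end = max((s + 1) * n // n_seg, start + 1)
                segment = frame_scores[start:end]
                # Aggregate each semantic dimension over the segment's frames.
                features.extend(agg(column) for column in zip(*segment))
            pyramid[(level, name)] = features
    return pyramid
```

Each `(level, aggregation)` entry is a separate feature vector, which matches the abstract's description of training one SVM per pyramid level and then selecting the best combination of levels and aggregation techniques per event. The kernel selection and model subspace boosting steps are not sketched here.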