Latent Spatio-temporal Models for Action Localization and Recognition in Nursing Home Surveillance Video


This paper presents an application of vision-based monitoring to residents of long-term care facilities. We develop an algorithm to detect events of interest, in particular falls by elderly residents. The algorithm uses a max-margin latent variable approach in which the spatio-temporal locations of the person in the video are treated as latent variables. The recently developed Action Bank descriptor is used as a rich feature representation for each frame. Empirical results demonstrate the effectiveness of this method.
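
To make the latent variable idea concrete, the following is a minimal sketch of how a max-margin model might score a video when the person's location in each frame is unobserved. It is an illustration under simplifying assumptions, not the model used in this paper: each frame is assumed to offer a small set of candidate locations, each with its own feature vector (standing in for a per-frame descriptor such as Action Bank), frames are scored independently with no temporal smoothness term, and the label is binary. The names `latent_score`, `hinge_loss`, and `candidates` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def latent_score(w: Vector, candidates: List[List[Vector]]) -> Tuple[float, List[int]]:
    """Score a video by maximizing over the latent location in each frame.

    candidates[t][k] is the (hypothetical) feature vector of candidate
    location k in frame t. Assumption: frames are scored independently,
    so the maximization decomposes into one argmax per frame.
    Returns the total score and the chosen location index for each frame.
    """
    total = 0.0
    chosen: List[int] = []
    for frame in candidates:
        scores = [dot(w, feat) for feat in frame]
        best = max(range(len(scores)), key=scores.__getitem__)
        chosen.append(best)
        total += scores[best]
    return total, chosen


def hinge_loss(w: Vector, candidates: List[List[Vector]], label: int) -> float:
    """Hinge loss max(0, 1 - y * f(x)) for a binary label y in {-1, +1},
    where f(x) is the latent-maximized score defined above."""
    score, _ = latent_score(w, candidates)
    return max(0.0, 1.0 - label * score)
```

In a sketch like this, training would alternate between inferring the best locations with `latent_score` and updating `w` to reduce the hinge loss. The inferred indices also give a localization of the person as a by-product of recognition.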