Smart Surveillance for Smart City

EasyChair Preprint 3519, version 1
6 pages • Date: May 30, 2020

Abstract

In recent years, video surveillance technology has become pervasive in every sphere. Manually describing video content is time- and labor-intensive, and important aspects of a video are sometimes overlooked in human summaries. The present work is an attempt toward automated description generation for surveillance video. The proposed method consists of extracting key-frames from a surveillance video, detecting objects in the key-frames, generating natural-language (English) descriptions of the key-frames, and finally summarizing those descriptions. Key-frames are identified based on a mean-squared-error ratio. Object detection in a key-frame is performed using a region-based convolutional neural network (R-CNN). A Long Short-Term Memory (LSTM) network generates captions from the frames. Translation Error Rate (TER) is used to identify and remove duplicate event descriptions. TF-IDF is used to rank the event descriptions generated from a video, and the top-ranked description is returned as the system-generated summary of the video. We evaluated our approach on the MSVD dataset, on which the system achieves a Bilingual Evaluation Understudy (BLEU) score of 46.83.

Keyphrases: Content-Based Video Retrieval, Image Frame, Smart City, Smart Surveillance, Key Frame, Key-Frame Extraction, Microsoft Video Description, Object Detection, Pattern Recognition, Real-Time Object Detection, Video Description Corpus, Video Summarization
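As a rough illustration of the key-frame extraction step mentioned in the abstract, the sketch below selects key-frames by comparing the mean squared error (MSE) between consecutive frames. The `ratio_threshold` parameter and the running-average comparison are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def mse(frame_a, frame_b):
    """Mean squared error between two equally sized grayscale frames."""
    diff = frame_a.astype(np.float64) - frame_b.astype(np.float64)
    return float(np.mean(diff ** 2))

def extract_keyframes(frames, ratio_threshold=2.0):
    """Return indices of key-frames in a list of frames.

    Heuristic (an assumption for illustration): a frame is kept as a
    key-frame when its MSE to the previous frame exceeds
    `ratio_threshold` times the running average MSE, which signals an
    abrupt change of scene content.
    """
    if not frames:
        return []
    keyframes = [0]          # always keep the first frame
    errors = []
    for i in range(1, len(frames)):
        e = mse(frames[i - 1], frames[i])
        if errors and e > ratio_threshold * (sum(errors) / len(errors)):
            keyframes.append(i)
        errors.append(e)
    return keyframes
```

For example, a sequence of identical dark frames followed by bright frames yields a single extra key-frame at the transition.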
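The TF-IDF ranking step could be sketched as follows, treating each event description as a document: terms frequent within one description but rare across the set score higher, so the top-ranked description is the most distinctive. The whitespace tokenization and summed-weight scoring are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter

def rank_descriptions(descriptions):
    """Rank event descriptions by their summed TF-IDF term weights."""
    docs = [d.lower().split() for d in descriptions]
    n = len(docs)
    # Document frequency: in how many descriptions each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Sum of tf * idf over the terms of this description.
        score = sum((cnt / len(doc)) * math.log(n / df[t])
                    for t, cnt in tf.items())
        scores.append(score)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [descriptions[i] for i in order]
```

With three descriptions where two are identical, the unique one carries the rare terms and is ranked first.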