MemoVAD: Resource-Efficient Video Anomaly Detection via Dynamic Semantic Memory in Edge Computing Scenarios

Abstract

Deploying Video Anomaly Detection (VAD) in real-world surveillance faces a fundamental tension: effective detection demands high-level semantics, yet edge devices have limited computational resources. Vision–Language Models (VLMs) provide rich open-vocabulary semantics, but their latency and computational cost preclude on-device deployment. To address this challenge, we propose MemoVAD, an edge–cloud collaborative framework that selectively incorporates VLM semantics into streaming VAD. MemoVAD runs most inference on the edge, using a lightweight detector and a causal Temporal Context Encoder (TCE) to model temporal dependencies. We introduce an Uncertainty-Aware Gating (UAG) policy, grounded in Subjective Logic, that models perceived uncertainty and queries the cloud-based VLM only for clips that are both high-uncertainty and semantically novel. In addition, a Dynamic Semantic Memory (DSM) caches VLM-verified prototypes for efficient retrieval, enabling the edge model to progressively absorb VLM-level semantics via a semantic adapter. Experiments on the UCF-Crime and XD-Violence datasets, run on a real edge device, show that MemoVAD substantially reduces communication overhead while surpassing state-of-the-art performance.
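The gating and caching idea described in the abstract can be illustrated with a small sketch. This is not the MemoVAD implementation: the evidence-to-uncertainty mapping is the standard Subjective Logic form (uncertainty u = K / S, with S the sum of per-class evidence plus K), while the thresholds `tau_u` and `tau_sim`, the cosine-similarity novelty test, the FIFO eviction, and every function and class name below are assumptions made purely for illustration. The semantic adapter is omitted.

```python
import math
from dataclasses import dataclass, field


def subjective_uncertainty(evidence):
    """Map non-negative per-class evidence to (beliefs, uncertainty).

    Standard Subjective Logic form: S = sum(e_k + 1), b_k = e_k / S, u = K / S.
    """
    k = len(evidence)
    strength = sum(evidence) + k
    beliefs = [e / strength for e in evidence]
    return beliefs, k / strength


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


@dataclass
class SemanticMemory:
    """Hypothetical prototype cache: (embedding, anomaly_score) pairs."""
    capacity: int = 128
    prototypes: list = field(default_factory=list)

    def nearest(self, emb):
        """Return (score, similarity) of the closest prototype, or (None, 0.0)."""
        if not self.prototypes:
            return None, 0.0
        best = max(self.prototypes, key=lambda p: cosine(emb, p[0]))
        return best[1], cosine(emb, best[0])

    def add(self, emb, score):
        if len(self.prototypes) >= self.capacity:
            self.prototypes.pop(0)  # FIFO eviction (an assumption)
        self.prototypes.append((emb, score))


def route_clip(evidence, emb, memory, query_vlm, tau_u=0.5, tau_sim=0.8):
    """Decide where a clip's anomaly score comes from.

    evidence: [normal_evidence, anomaly_evidence] from the edge detector.
    query_vlm: callable standing in for the cloud VLM; returns a score in [0, 1].
    Returns (source, anomaly_score), source in {"edge", "memory", "cloud"}.
    """
    _, u = subjective_uncertainty(evidence)
    if u <= tau_u:
        # Confident: use the expected anomaly probability (e_1 + 1) / S.
        strength = sum(evidence) + len(evidence)
        return "edge", (evidence[1] + 1) / strength
    score, sim = memory.nearest(emb)
    if sim >= tau_sim:
        # Uncertain but not novel: reuse the cached VLM-verified prototype.
        return "memory", score
    # Uncertain and novel: pay for one cloud query, then cache the result.
    score = query_vlm(emb)
    memory.add(emb, score)
    return "cloud", score
```

In this sketch, a clip reaches the cloud only when the edge detector is uncertain and no cached prototype is similar enough, so repeated occurrences of a similar event are served from memory rather than triggering further queries.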

Result Visualization

The architecture of our MemoVAD system.

Qualitative Results on UCF-Crime Dataset

Detection result on test video Arrest024_x264. The system successfully localizes the anomaly window (frames 1005-3120) with high confidence scores, marked by the red alert indicator (Anomaly start time: approximately 00:00:33).

Detection result on test video Assault006_x264. The proposed method accurately localizes the start of the assault and sustains detection across the 6000+ frames of the anomalous event (Anomaly start time: approximately 00:00:38).

Detection of multiple anomalies in video Explosion033_x264. The model demonstrates robust temporal localization capabilities by identifying discontinuous anomalous segments (Anomaly start times: approximately 00:00:32 and 00:00:51).

Robustness test on Normal_Videos_150_x264. In the absence of anomalous events, the system consistently produces low confidence scores, as shown by the stable green border, indicating that it discriminates effectively between everyday activities and potential threats.
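The frame indices and start times quoted in the captions above can be related by a simple conversion. The sketch below assumes a frame rate of 30 fps, which is not stated on this page; the helper name `frame_to_timestamp` is likewise hypothetical. Under that assumption, frame 1005 of Arrest024_x264 falls at roughly 00:00:33, consistent with the caption.

```python
def frame_to_timestamp(frame_index, fps=30.0):
    """Convert a frame index to an HH:MM:SS string (fps=30 is an assumption)."""
    total_seconds = int(frame_index / fps)  # truncate to whole seconds
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(frame_to_timestamp(1005))  # 00:00:33
print(frame_to_timestamp(3120))  # 00:01:44
```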