MemoVAD: Resource-Efficient Video Anomaly Detection via Dynamic Semantic Memory in Edge Computing Scenarios

Abstract

Deploying Video Anomaly Detection (VAD) in real-world surveillance faces a fundamental tension between the need for high-level semantics to ensure effectiveness and the limited computational resources of edge devices. Vision–Language Models (VLMs) provide rich open-vocabulary semantics, but their latency and computational cost preclude on-device deployment. To address this challenge, we propose MemoVAD, an edge–cloud collaborative framework that selectively incorporates VLM semantics into streaming VAD. MemoVAD runs most inference on the edge, using a lightweight detector and a causal Temporal Context Encoder (TCE) to model temporal dependencies. An Uncertainty-Aware Gating (UAG) policy grounded in Subjective Logic models perceived uncertainty and queries the cloud-based VLM only for clips that are both highly uncertain and semantically novel. In addition, a Dynamic Semantic Memory (DSM) caches VLM-verified prototypes for efficient retrieval, enabling the edge model to progressively absorb VLM-level semantics via a semantic adapter. Experiments on the UCF-Crime and XD-Violence datasets, run on a real edge device, show that MemoVAD substantially reduces communication overhead while surpassing state-of-the-art performance.
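The gating idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common Subjective Logic formulation in which non-negative per-class evidence e_k gives Dirichlet parameters alpha_k = e_k + 1 and an uncertainty mass u = K / sum(alpha), and it assumes novelty is measured by cosine similarity against cached prototypes. The function names and the thresholds tau_u and tau_s are hypothetical.

```python
import numpy as np

def subjective_uncertainty(evidence):
    """Return (belief, uncertainty) from non-negative per-class evidence.

    Assumed formulation: alpha_k = e_k + 1, S = sum(alpha),
    belief_k = e_k / S, uncertainty = K / S, so belief.sum() + uncertainty == 1.
    """
    evidence = np.asarray(evidence, dtype=float)
    alpha = evidence + 1.0
    strength = alpha.sum()
    belief = evidence / strength
    uncertainty = evidence.size / strength
    return belief, float(uncertainty)

def max_similarity(feature, prototypes):
    """Highest cosine similarity between a clip feature and cached prototypes."""
    if len(prototypes) == 0:
        return -1.0  # empty memory: every clip counts as novel
    f = feature / np.linalg.norm(feature)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return float((p @ f).max())

def should_query_cloud(evidence, feature, prototypes, tau_u=0.5, tau_s=0.8):
    """Query the cloud VLM only if the clip is both uncertain and novel.

    tau_u and tau_s are illustrative thresholds, not values from the paper.
    """
    _, u = subjective_uncertainty(evidence)
    novel = max_similarity(feature, prototypes) < tau_s
    return bool(u > tau_u and novel)

# Usage: weak evidence and an empty memory trigger a cloud query.
clip_feature = np.array([1.0, 0.0])
empty_memory = np.empty((0, 2))
print(should_query_cloud([0.2, 0.3], clip_feature, empty_memory))
```

Under these assumptions, an uncertain clip that closely matches a cached prototype is handled on the edge, which is how a prototype cache could reduce the number of cloud queries over time.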

The architecture of our MemoVAD system.

Qualitative Results on UCF-Crime Dataset

Detection result on test video Arrest024_x264. The system successfully localizes the anomaly window (frames 1005-3120) with high confidence scores, marked by the red alert indicator (Anomaly start time: approximately 00:00:33).

Detection result on test video Assault006_x264. The proposed method accurately localizes the start of the assault and maintains robust detection across more than 6,000 frames of the anomalous event (Anomaly start time: approximately 00:00:38).

Detection of multiple anomalies in video Explosion033_x264. The model demonstrates robust temporal localization capabilities by identifying discontinuous anomalous segments (Anomaly start times: approximately 00:00:32 and 00:00:51).

Robustness test on Normal_Videos_150_x264. In the absence of anomalous events, the detection system consistently produces low confidence scores, as evidenced by the stable green border, indicating effective discrimination between daily activities and potential threats.
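The frame range and start time quoted for Arrest024_x264 above can be cross-checked with a small conversion. The sketch below assumes a frame rate of 30 frames per second, which is not stated on this page, and the helper name frame_to_timestamp is hypothetical.

```python
def frame_to_timestamp(frame_index: int, fps: float = 30.0) -> str:
    """Convert a frame index to an HH:MM:SS string, truncating to whole seconds."""
    total_seconds = int(frame_index // fps)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

# Frame 1005 at the assumed 30 fps falls at about 33 seconds.
print(frame_to_timestamp(1005))  # prints 00:00:33
```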
