Recent work on behaviour recognition, semantic video search, crowd analysis and person re-identification led to the spin-out of Vision Semantics in 2007, which has attracted investment in a joint venture from a global market leader in the banking sector. It has also benefited UK government services and industrial competitiveness.
Recent work (2007-2013) on self-learning visual context of crowded spaces, unsupervised video behaviour profiling, abnormal behaviour recognition, semantic video search/screening, distributed multi-camera behaviour correlation, and person re-identification over large distributed spaces led to the formation of the QMUL spin-out company Vision Semantics Limited (VSL) in 2007. VSL has developed semantic video analysis technologies that fill a gap in the market, together with multi-camera tracking systems for watchlist re-identification in distributed urban environments.
The system has opened up opportunities to potentially improve existing industrial video analytics products used by a wide range of customers. VSL has also developed innovative technologies for automatic passenger management and real-time crowd density analysis for transport safety and crowd evacuation requirements.
Early work (1996-2006) on motion analysis, face detection, tracking and recognition was licensed to a start-up video analytics company, Safehouse Technologies, which built a substantial computer vision technology product base with numerous patents granted and recognition within the industry. Safehouse made considerable inroads into the networked CCTV surveillance sector in North America and Australasia.
QMUL's Computer Vision Lab has undertaken substantial research programmes in collaboration with DSTL, MOD SA/SD, the US Army Labs, Vision Semantics and Safehouse Technologies to develop robust and scalable mathematical models and computer algorithms for automatic detection, tracking and recognition of object behaviour patterns captured from a distance by distributed CCTV cameras in public spaces. This work addresses the significant challenge of how to analyse and effectively filter massive amounts of public space video data to find “needles in haystacks” [1,6].
There has been an accelerated expansion of Closed-Circuit Television (CCTV) camera systems in public spaces, ranging from transport infrastructures, shopping centres and sports arenas to residential streets, serving as a tool for crime reduction and risk management. CCTV surveillance continues to be a repetitive, time-consuming manual task that often relies on a human operator to spot a momentary incident while watching dozens of monitors concurrently. CCTV systems rely heavily on human operators to monitor activities and determine incidents, e.g. tracking a suspicious target from one camera to another in a large area of distributed space, or across disjoint views. However, there are inherent limitations to employing unsupervised human operators, due to the lack of a priori knowledge of what to look for.
Consequently, most existing CCTV recordings are never replayed, or at best are retrieved only after an incident has occurred. Very little, if anything, is known about what exactly has been recorded. When a major incident occurs, the police have to review thousands of hours of video recordings to look for a single event that may last only a few seconds. Even if the precise image frames of interest are identified, the image data can often be of insufficient quality either for recognition or as evidence.
There is massive demand from commercial technology providers and end-users for activity- and behaviour-based semantic video content analysis to enable fully automated and highly selective screening and search of salient events and objects (e.g. a watchlist) in the colossal amount of video data generated from both infrastructure CCTV cameras and mobile devices. Currently there is no suitable solution on the market. Existing video analytics suffer from (a) crude signal thresholding; (b) hard-wired configuration requiring specialist setup per application domain; (c) rule-based detection systems that are inflexible and do not scale to different operational conditions; and (d) unacceptable false alarm rates and poor usability for user control.
The Computer Vision Lab has endeavoured to develop leading and innovative techniques for object tracking and re-identification, and for semantic video search/auto-screening based on behaviour profiling and anomaly detection, scalable to large-scale public space video data. Specifically, fundamental mathematical models and scalable computer algorithms have been developed for
This research has been led by Professor Shaogang Gong, founder and Chief Scientist of Vision Semantics since 2007, who also founded the QMUL Computer Vision Lab and has led the QMUL Computer Vision Group since 1993.
The research has been funded by six consecutive EPSRC/DTI grants between 1995 and 2011, including two successive MOD JGS grants between 2004 and 2011, with a further MOD grant from 2011 to 2015. The research has also been funded by three EU FP7 Security grants between 2008 and 2017, and an EU FP7 Transport grant between 2011 and 2014.