
Ad Hoc Networks

Exploiting scene and body contexts in controlling continuous vision body cameras

Abstract

Ever-increasing performance at decreasing price has fueled camera deployments in a wide variety of real-world applications and strengthened the case for battery-powered, continuous-vision camera systems. However, given the state of the art in battery technology and embedded systems, most battery-powered mobile devices still do not support continuous vision. To reduce energy and storage requirements, there have been proposals to offload energy-demanding computations to the cloud (Naderiparizi et al., 2016) [1], to discard uninteresting video frames (Naderiparizi et al., 2017), and to use additional sensors to detect and predict when to turn on the camera (Bahl et al., 2012) [2]. However, these proposals either require high communication bandwidth or risk missing important events.

In this paper, we present ZenCam, an always-on body camera that exploits readily available information in the encoded video stream from the on-chip firmware to classify the dynamics of the scene. This scene context is combined with a simple inertial measurement unit (IMU)-based activity-level context of the wearer to optimally control the camera configuration at run-time and keep the device within the desired energy budget. We describe the design and implementation of ZenCam and thoroughly evaluate its performance in real-world scenarios. Our evaluation shows a 29.8%–35% reduction in energy consumption and a 48.1%–49.5% reduction in storage usage when compared to a standard baseline setting of 1920 × 1080 at 30 fps, while maintaining a competitive or better video quality at minimal computational overhead.

Introduction

In recent years, the use of body-worn cameras has increased exponentially, from law enforcement officers to first responders to the military. The need for a body camera to carry out day-to-day jobs with a higher level of competency has brought many commercial products into the consumer market. Typically, body cameras are used as recording devices that store multimedia content on the device; these video feeds are later downloaded and analyzed off-line. Besides satisfying application-specific video-quality requirements, the two most desirable properties of body cameras are extended battery life and efficient storage. However, most of today's body cameras have a short battery life, which limits our ability to capture hours-long events. Storage space is also limited in these portable systems. Although storage has become inexpensive, efficiency is always desirable, and in some cases necessary, as archiving large amounts of data carries a monetary cost; for this reason, security videos are often archived only for a limited period.

Unfortunately, today's body cameras are developed somewhat in an ad hoc manner by gluing together different sensing, computational, and communication modules with limited or guarantee-less optimization in their designs. When such systems are deployed in the real world, they fail to provide a promised quality of service and an extended battery life. We argue that in order to develop an efficient body camera that provides an extended battery-life, efficient storage, and satisfactory video quality, instead of an ad hoc combination of independently developed modules, we need to apply context-aware control-theoretic principles and engineer a body-worn camera that is dependable, efficient, and robust.

We envision that as the technology behind video capture and processing matures, we will see increased use of body cameras in the days to come. Of particular interest to us in this paper are the body cameras worn by the law enforcement officers, because of their direct impact on our society. Several studies [4], [5], [6] have shown that the use of body cameras among the officers increases transparency and reduces the abuse of power. For instance, a Phoenix-based study [4] in 2014 showed that officers who kept their body cameras on made 17% more arrests and received 23% fewer complaints than the officers who did not. The reason behind this is psychological for the most part, as the officer is aware that his actions are being recorded. This develops an extra sense of dutifulness among the law enforcement officers, which is likely to reduce the number of disturbing events like the ones we have witnessed in recent years all over the country.

Existing implementations of body cameras typically consist of an on/off switch that is controlled by the wearer. While this design suits the purpose in an ideal scenario, in many situations the wearer (e.g., a law-enforcement officer) may not be able to predict the right moment to turn the camera on or may completely forget to do so in the midst of an action. Some cameras automatically turn on at an event (e.g., when a gun is pulled), but they miss the "back-story", i.e., how the situation had developed.

We advocate that body cameras should be always on so that they are able to continuously capture the scene for an extended period. However, since cameras are among the most power-hungry sensors, the lifetime of a continuous-vision camera is limited to tens of minutes to a few hours, depending on the size of the battery. A commonsense approach to extending the battery life of a continuous-vision camera is to analyze the scene and record at high resolution and/or high frame rate only when the scene dynamics are high, and vice versa. However, on-device scene dynamics analysis is extremely costly, and the cost generally outweighs the benefit. Recent works [7], [8] therefore employ secondary low-power cameras (e.g., a thermal imaging sensor) and FPGA-based scene analysis to wake up the primary camera when an interesting event is detected at a lower energy cost. These systems, however, have the same limitation of missing the back-story, and in general, they are bulkier than a single-camera system due to the additional camera sensor.

In this paper, we present ZenCam, an always-on, continuously recording body camera that ensures the desired battery life and a near-optimal video quality. The camera analyzes the dynamics of the scene as well as the activity level of the wearer in real-time, and controls the parameters of the body camera (e.g., frame rate and resolution) in order to satisfy the battery-life and video-quality requirements. The novel technical aspects of ZenCam are two-fold:

First, we develop a light-weight video analytics algorithm that operates entirely on the encoded video stream obtained directly from the camera firmware for fast and low-power scene dynamics classification. This is different from existing algorithms that decompress and analyze video frames in the pixel domain.

Second, we take a novel control-theoretic approach in which we employ a model predictive controller [9] to dynamically control the frame rate and resolution of the camera sensor to achieve the desired battery life and a near-optimal video quality.

We develop a prototype of ZenCam using low-cost, off-the-shelf sensors, and lightweight, open source software modules. We identify the system parameters that affect the output power and the video quality, and empirically model their relationship. We implement a scene dynamics analysis algorithm which uses natively available motion vectors from the camera and implement a light-weight activity-level classifier that uses an on-board accelerometer to determine the activity-level of the wearer. The control algorithm uses both types of contextual information to dynamically adapt the camera parameters.
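To make the activity-level side of this concrete, the following is a minimal sketch of how an accelerometer-based activity-level classifier could look, using the variance of the acceleration magnitude over a short window; the window length and thresholds here are illustrative assumptions, not the parameters used in ZenCam.

# Minimal activity-level classifier sketch (illustrative thresholds and window,
# not ZenCam's actual parameters): the variance of the acceleration magnitude
# over a short window is mapped to a coarse activity level.
import numpy as np

def activity_level(accel_window, low_thresh=0.05, high_thresh=0.5):
    """Classify wearer activity from a window of 3-axis accelerometer samples.

    accel_window : array of shape (N, 3), acceleration in g.
    Returns one of "still", "moderate", "active".
    """
    magnitude = np.linalg.norm(accel_window, axis=1)   # per-sample magnitude
    var = np.var(magnitude)                            # proxy for motion energy
    if var < low_thresh:
        return "still"
    elif var < high_thresh:
        return "moderate"
    return "active"

# Example: one second of samples at 50 Hz from a nearly stationary wearer.
samples = np.random.normal([0.0, 0.0, 1.0], 0.01, size=(50, 3))
print(activity_level(samples))   # -> "still"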

We deploy ZenCam in multiple real-world environments, such as an office building, a street, and inside a car, and evaluate its performance. We demonstrate that ZenCam achieves a 29.8%–35% reduction in energy consumption and a 48.1%–49.5% reduction in storage when compared to a fixed-configuration body camera, without losing video quality. Compared to an Oracle system, ZenCam's computational overhead is 10%–17% with an unoptimized hardware and software implementation. To the best of our knowledge, ZenCam is the first of its kind to achieve such energy and storage savings via an on-device, single-camera, light-weight, low-power scene dynamics analysis algorithm and a minimal hardware addition (i.e., an IMU), without sacrificing video quality as the scene and user dynamics change at run-time.

The paper is organized as follows. Section 2 provides a brief background on video coding and control theory. The system overview and challenges are discussed in Section 3. Section 4 details the compressed video analysis and activity analysis. The controller design is presented in Section 5. Section 6 describes the implementation of the system. Sections 7 and 8 provide system overhead measurements and real-world deployment evaluations, respectively. Related work is listed in Section 10. Finally, we discuss the system in Section 9 and conclude in Section 11. We note that parts of this work have been previously published in the 2019 IEEE 15th International Conference on Distributed Computing in Sensor Systems (DCOSS) [3].

Section snippets

Video coding

Video cameras produce videos by taking consecutive images (frames). However, if a video were composed of raw images only, its size could easily reach gigabytes (GB) within a few minutes. Video coding, sometimes referred to as video compression, reduces video data size by reducing spatial and temporal redundancies [10], [11]. Spatial redundancy reduction exploits structural similarity within a single image, and temporal redundancy reduction exploits similarity across
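To make the temporal-redundancy idea concrete, the following sketch shows the core of block-based motion estimation on two grayscale frames: for one macro-block, the encoder searches a small window of the previous frame for the best match (lowest sum of absolute differences) and keeps only the resulting motion vector plus the residual block. The block size and search range are illustrative.

# Illustrative block-matching step of temporal redundancy reduction:
# find the motion vector for one 8x8 macro-block by exhaustive SAD search
# in a small window of the previous frame, then form the residual block.
import numpy as np

def match_block(prev, curr, top, left, block=8, search=4):
    target = curr[top:top + block, left:left + block].astype(np.int32)
    best = (0, 0, np.inf)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > prev.shape[0] or x + block > prev.shape[1]:
                continue
            candidate = prev[y:y + block, x:x + block].astype(np.int32)
            sad = np.abs(target - candidate).sum()       # sum of absolute differences
            if sad < best[2]:
                best = (dy, dx, sad)
    dy, dx, sad = best
    residual = target - prev[top + dy:top + dy + block,
                             left + dx:left + dx + block].astype(np.int32)
    return (dy, dx), residual, sad

# Two synthetic 64x64 frames where the content shifts right by 2 pixels.
prev = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
curr = np.roll(prev, 2, axis=1)
mv, residual, sad = match_block(prev, curr, top=16, left=16)
print(mv, sad)   # expected motion vector (0, -2) with SAD 0 for this block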

ZenCam system design

ZenCam is an autonomous, always-on, continuous-vision body camera that dynamically adjusts its configurations based on both the degree of body movement of the wearer and the dynamics of the scene. As a result, ZenCam achieves a desired battery life and ensures an optimal video quality of the recorded scenes. The system architecture of ZenCam is depicted in Fig. 2.

Encoded video analytics

When a video is recorded, the encoder calculates motion vectors and residual values [17]. Motion vectors represent the translation of pair-wise most similar macro-blocks (e.g., 8 × 8 pixels) across frames. The residual is calculated as the difference between a macro-block in the current frame and its matching macro-block in the previous frame. In the final video, the current frame is reconstructed from macro-blocks of the previous frame and the differences. ZenCam exploits these two freely available
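Since the prototype is built around the Raspberry Pi camera (Section 6), one way to tap this information without decoding frames is through the picamera library's motion-vector output hook, sketched below; the combined dynamics score and its weighting are illustrative placeholders rather than ZenCam's published classifier.

# Sketch of reading encoder motion vectors on a Raspberry Pi camera without
# decoding frames (picamera exposes them per macro-block alongside the H.264
# stream). The "dynamics" score below is a crude illustrative proxy.
import numpy as np
from picamera import PiCamera
from picamera.array import PiMotionAnalysis

class SceneDynamics(PiMotionAnalysis):
    def analyse(self, a):
        # 'a' is one macro-block grid per frame with fields x, y (motion
        # vector components) and sad (residual sum of absolute differences).
        motion = np.sqrt(a['x'].astype(np.float32) ** 2 +
                         a['y'].astype(np.float32) ** 2)
        self.latest_score = motion.mean() + 0.001 * a['sad'].mean()

camera = PiCamera(resolution=(1280, 720), framerate=30)
analyzer = SceneDynamics(camera)
camera.start_recording('clip.h264', format='h264', motion_output=analyzer)
camera.wait_recording(10)           # record for 10 s while scores are computed
camera.stop_recording()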

Camera controller

We design the body camera control system as shown in Fig. 4. The controller takes input from the sensor which measures the system's battery level b(t). The context look-up table provides the weights (w1, w2, ...) and the reference values of resolution r_r and frame rate f_r for the current context. The controller computes the control input (r, f) based on these parameters and the reference value of the desired battery level b_r(t).

We define the objective function for the controller to match the actual
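Because the objective function is truncated in this snippet, the sketch below only illustrates the general shape of such a controller under an assumed quadratic cost: among a discrete set of candidate settings, it picks the (r, f) pair that minimizes a weighted sum of the predicted deviation from the battery reference and the deviation from the context references. The power model, weights, and candidate settings are placeholders, not ZenCam's model predictive controller.

# Illustrative one-step controller sketch under an assumed quadratic cost;
# the weights, power model, and candidate settings are placeholders.
RESOLUTIONS = [(1280, 720), (1600, 900), (1920, 1080)]
FRAME_RATES = [5, 15, 30, 40]

def predicted_power(res, fps):
    # Placeholder power model (watts), standing in for one fitted offline.
    return 1.0 + 1e-7 * res[0] * res[1] * fps

def choose_setting(b, b_ref, r_ref, f_ref, w1, w2, battery_wh, dt_h):
    best, best_cost = None, float('inf')
    for res in RESOLUTIONS:
        for fps in FRAME_RATES:
            # Predict battery level after one control interval of dt_h hours.
            b_next = b - predicted_power(res, fps) * dt_h / battery_wh
            track = (b_next - b_ref) ** 2                    # battery tracking error
            pref = ((res[1] - r_ref[1]) / r_ref[1]) ** 2 + \
                   ((fps - f_ref) / f_ref) ** 2              # deviation from context refs
            cost = w1 * track + w2 * pref
            if cost < best_cost:
                best, best_cost = (res, fps), cost
    return best

# Example: battery at 60%, reference 62%, context prefers 1600x900 at 30 fps.
print(choose_setting(0.60, 0.62, (1600, 900), 30, w1=5.0, w2=1.0,
                     battery_wh=10.0, dt_h=0.05))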

Hardware development

We implement a prototype of ZenCam which is shown in Fig. 6. We use a Raspberry Pi Zero W and a Pi Camera Module V2 as the main components. We connect an LSM9DS0 IMU to the Raspberry Pi via I2C. A Raspberry Pi Zero is used to demonstrate that our system works with off-the-shelf parts and runs on resource-constrained platforms. The Pi Camera also provides us with easy access to motion vectors and residual values without decoding the frames. This information is available in nearly every camera

Energy overhead measurement

We use Pi Camera sensor mode 4 (out of the 7 modes available), shown in Fig. 7(a), as it utilizes the full sensor area and produces frame rates from 0.1 fps to 40 fps. The power consumption of the system for different combinations of resolution and frame rate is shown in Fig. 7(b). Based on these measurements, we select our resolution settings from a pool of three choices: 1920 × 1080, 1600 × 900, and 1280 × 720, which are commonly used in real-world applications. Our measurements reveal that
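For reference, switching among such settings on the prototype's Pi Camera amounts to updating a couple of properties before recording starts; the snippet below is a minimal illustration using the picamera API, with the sensor mode and values mirroring those mentioned above (the 15 fps "low" setting is an arbitrary illustrative choice).

# Minimal illustration of selecting one of the resolution/frame-rate settings
# discussed above on the Pi Camera (sensor mode 4 uses the full sensor area).
from picamera import PiCamera

SETTINGS = {
    "low":  ((1280, 720), 15),
    "mid":  ((1600, 900), 30),
    "high": ((1920, 1080), 40),
}

camera = PiCamera(sensor_mode=4)
resolution, fps = SETTINGS["mid"]
camera.resolution = resolution      # must be set before recording starts
camera.framerate = fps
camera.start_recording('segment.h264', format='h264')
camera.wait_recording(5)
camera.stop_recording()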

Deployment setup

Two body cameras with identical hardware are deployed in multiple real-world scenarios (indoors and outdoors) in two sessions for a total of 271 min. In each session, both cameras start and finish recording at the same time. The first camera runs the ZenCam algorithms and the other uses a fixed setting of 1600 × 900 resolution and 40 fps to record high-quality ground-truth video of the scene. This second video footage is later used in our lab to produce two different solutions to compare

Discussion

Why are the video features free? The video features we use (i.e., motion vectors and residuals) are generated by the video codec during the encoding process. Since these two are already computed inside the camera firmware, all we need to do is expose them to the application layer and use them at run-time in the proposed scene dynamics analyzer. Since there is no overhead of frame decoding, feature computation, and frame re-encoding, we are relieved of an expensive step of feature

Continuous mobile vision

Prior works [26] have leveraged low-power IMU sensors to control camera parameters. However, lacking scene dynamics analysis, these systems fail to record videos at a suitable frame rate when the scene is dynamic. [31] also utilizes an IMU for selecting data for 3D reconstruction, which is not a continuous-vision application. [7], [8] utilize low-power secondary camera sensors, such as a thermal imaging sensor, and hardware implementations of algorithms using FPGAs to reduce power

Conclusion

In this paper, we design, implement and evaluate ZenCam, an always-on body camera that saves system resources via scene dynamics analysis in the encoded video domain and human activity classification using IMU-based sensing. ZenCam uses this information to control camera parameters for energy and storage consumption reduction. Our evaluation shows that ZenCam achieves 29.8%–35% energy savings and 48.1–49.5% storage space reduction, while maintaining a competitive or better video quality, when

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This paper was supported, in part, by NSF grants CNS-1816213 and CNS-1704469.


References (45)

  • et al., Body-worn cameras for police accountability: Opportunities and risks, Comput. Law Secur. Rev. (2015)
  • Saman Naderiparizi et al., WISPCam: An RF-powered smart camera for machine vision applications
  • Paramvir Bahl et al., Vision: Cloud-powered sight for all: Showing the cloud what you see
  • Shiwei Fang et al., ZenCam: Context-driven control of autonomous body cameras
  • Charles M. Katz et al., Evaluating the Impact of Officer Worn Body Cameras in the Phoenix Police Department (2014)
  • Allyson Roy, On-Officer Video Cameras: Examining the Effects of Police Department Policy and Assignment on Camera Use and Activation (2014)
  • Seungyeop Han et al., GlimpseData: Towards continuous vision-based personal analytics
  • Saman Naderiparizi et al., Glimpse: A programmable early-discard camera architecture for continuous mobile vision
  • Liuping Wang, Model Predictive Control System Design and Implementation Using MATLAB® (2009)
  • S. Liu et al., CodingFlow: Enable video coding for video stabilization, IEEE Trans. Image Process. (2017)
  • R. Venkatesh Babu et al., A survey on compressed domain video analysis techniques, Multimedia Tools Appl. (2016)
  • Y. Taki et al., Interframe coding that follows the motion, Proc. Inst. Electron. Commun. Eng. Jpn. Annu. Conv. (1974)
  • Y. Ninomiya et al., A motion-compensated interframe coding scheme for television pictures, IEEE Trans. Commun. (1982)
  • Nasir Ahmed et al., Discrete cosine transform, IEEE Trans. Comput. (1974)
  • Bing Zeng et al., Directional discrete cosine transforms: A new framework for image coding, IEEE Trans. Circuits Syst. Video Technol. (2008)
  • Thomas Wiegand et al., Overview of the H.264/AVC video coding standard, IEEE Trans. Circuits Syst. Video Technol. (2003)
  • Iain E. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia (2004)
  • Gene F. Franklin et al., Feedback Control of Dynamic Systems, Vol. 3 (1994)
  • Ioan Doré Landau et al., Adaptive Control, Vol. 51 (1998)
  • Yen-Fu Ou, Tao Liu, Zhi Zhao, Zhan Ma, Yao Wang, Modeling the impact of frame rate on perceptual quality of video, in: ...
  • T. Zinner, O. Hohlfeld, O. Abboud, T. Hossfeld, Impact of frame rate and resolution on objective QoE metrics, in: 2010 ...
  • Zhou Wang et al., Image quality assessment: From error visibility to structural similarity, IEEE Trans. Image Process. (2004)


Shiwei Fang is a Ph.D. candidate in the Department of Computer Science at the University of North Carolina at Chapel Hill, USA, and a member of the Embedded Intelligence Lab under the supervision of Prof. Shahriar Nirjon. He received his B.E. in computer engineering from Stony Brook University, USA. He is interested in developing cyber-physical systems that can extract useful and high-quality hidden information from sensors, such as cameras, IMUs, WiFi, and mmWave radar, while maintaining a minimal overhead. His work has applications in public health and safety, customer analytics, and sensing for robotics. He has published in multiple conferences and won a best paper award at the International Conference on Distributed Computing in Sensor Systems (DCOSS 2019).

Ketan Mayer-Patel is an associate professor of Computer Science at UNC-Chapel Hill. He has over two decades of experience in developing distributed multimedia systems. In particular, he investigates ways in which application-level knowledge can be used to inform how media streams are compressed and represented and align those representations with streaming mechanisms to best optimize overall system performance. He is chair of the executive committee for ACM Multimedia Systems (MMSys) and the International Workshop for Network and Operating System Support for Digital Audio and Video (NOSSDAV).

    Shahriar Nirjon is an assistant professor of Computer Science at UNC Chapel Hill. Shahriar (who goes by "Nirjon") is interested in Embedded Intelligence—the general idea of which is to make resource-constrained embedded systems capable of sensing, learning, adapting, and evolving in real-time. Research challenges that he deals with include on-device machine learning, RF sensing, real-time issues, and a variety of optimization problems on resource-constrained embedded platforms. His work has applications in the area of smart cities, remote health and wellness monitoring, and the Internet of Things. Nirjon received his Ph.D. from the University of Virginia, Charlottesville in 2014.
