Publications

Z. -S. Fu*, S. Sudhakar*, S. Karaman and V. Sze, "DecTrain: Deciding When to Train a Monocular Depth DNN Online," in IEEE Robotics and Automation Letters, vol. 10, no. 3, pp. 2822-2829, March 2025

Abstract Paper Video Poster Slides Software

Deep neural networks (DNNs) can deteriorate in accuracy when deployment data differs from training data. While performing online training at all timesteps can improve accuracy, it is computationally expensive. We propose DecTrain, a new algorithm that decides when to train a monocular depth DNN online using self-supervision with low overhead. To make the decision at each timestep, DecTrain compares the cost of training with the predicted accuracy gain. We evaluate DecTrain on out-of-distribution data, and find DecTrain maintains accuracy compared to online training at all timesteps, while training only 44% of the time on average. We also compare the recovery of a low inference cost DNN using DecTrain and a more generalizable high inference cost DNN on various sequences. DecTrain recovers the majority (97%) of the accuracy gain of online training at all timesteps while reducing computation compared to the high inference cost DNN which recovers only 66%. With an even smaller DNN, we achieve 89% recovery while reducing computation by 56%. DecTrain enables low-cost online training for a smaller DNN to have competitive accuracy with a larger, more generalizable DNN at a lower overall computational cost.

Dasong Gao*, Peter Zhi Xuan Li*, Vivienne Sze, Sertac Karaman, "GEVO: Memory-Efficient Monocular Visual Odometry Using Gaussians," Arxiv 2024.

Abstract Paper Video Poster Slides Software

Constructing a high-fidelity representation of the 3D scene using a monocular camera can enable a wide range of applications on mobile devices, such as micro-robots, smartphones, and AR/VR headsets. On these devices, memory is often limited in capacity and its access often dominates the consumption of compute energy. Although Gaussian Splatting (GS) allows for high-fidelity reconstruction of 3D scenes, current GS-based SLAM is not memory efficient as a large number of past images is stored to retrain Gaussians for reducing catastrophic forgetting. These images often require two-orders-of-magnitude higher memory than the map itself and thus dominate the total memory usage. In this work, we present GEVO, a GS-based monocular SLAM framework that achieves comparable fidelity as prior methods by rendering (instead of storing) them from the existing map. Novel Gaussian initialization and optimization techniques are proposed to remove artifacts from the map and delay the degradation of the rendered images over time. Across a variety of environments, GEVO achieves comparable map fidelity while reducing the memory overhead to around 58 MBs, which is up to 94× lower than prior works.

Peter Zhi Xuan Li, Sertac Karaman, Vivienne Sze, “GMMap: Memory-Efficient Continuous Occupancy Map Using Gaussian Mixture Model,” IEEE Transactions on Robotics (T-RO), Vol. 40, pp. 1339 – 1355, January 2024.

Abstract Paper Video Poster Slides Software

Energy consumption of memory accesses dominates the compute energy in energy-constrained robots which require a compact 3D map of the environment to achieve autonomy. Recent mapping frameworks only focused on reducing the map size while incurring significant memory usage during map construction due to multi-pass processing of each depth image. In this work, we present a memory-efficient continuous occupancy map, named GMMap, that accurately models the 3D environment using a Gaussian Mixture Model (GMM). Memory-efficient GMMap construction is enabled by the single-pass compression of depth images into local GMMs which are directly fused together into a globally-consistent map. By extending Gaussian Mixture Regression to model unexplored regions, occupancy probability is directly computed from Gaussians. Using a low-power ARM Cortex A57 CPU, GMMap can be constructed in real-time at up to 60 images per second. Compared with prior works, GMMap maintains high accuracy while reducing the map size by at least 56%, memory overhead by at least 88%, DRAM access by at least 78%, and energy consumption by at least 69%. Thus, GMMap enables real-time 3D mapping on energy-constrained robots.

Soumya Sudhakar, Vivienne Sze, Sertac Karaman, “Data Centers on Wheels: Emissions from Computing Onboard Autonomous Vehicles,” IEEE Micro, Special Issue on Environmentally Sustainable Computing, Vol. 43, No. 1, January/February 2023.

Abstract Paper Video Poster Slides Software

While much attention has been paid to data centers' greenhouse gas emissions, less attention has been paid to autonomous vehicles' (AVs) potential emissions. In this work, we introduce a framework to probabilistically model the emissions from computing onboard a global fleet of AVs and show that the emissions have the potential to make a non-negligible impact on global emissions, comparable to that of all data centers today. Based on current trends, a widespread AV adoption scenario where approximately 95% of all vehicles are autonomous requires computer power to be less than 1.2 kW for emissions from computing on AVs to be less than emissions from all data centers in 2018 in 90% of modeled scenarios. Anticipating a future scenario with high adoption of AVs, business-as-usual decarbonization, and workloads doubling every three years, hardware efficiency must double every 1.1 years for emissions in 2050 to equal 2018 data center emissions. The rate of increase in hardware efficiency needed in many scenarios to contain emissions is faster than the current rate. We discuss several avenues of future research unique to AVs to further analyze and potentially reduce the carbon footprint of AVs.

Peter Li, Sertac Karaman, Vivienne Sze, “Memory-Efficient Gaussian Fitting for Depth Images in Real Time,” IEEE International Conference on Robotics and Automation (ICRA), May 2022

Abstract Paper Video Poster Slides Software

Computing consumes a significant portion of energy in many robotics applications, especially the ones involving energy-constrained robots. In addition, memory access accounts for a significant portion of the computing energy. For mapping a 3D environment, prior approaches reduce the map size while incurring a large memory overhead used for storing sensor measurements and temporary variables during computation. In this work, we present a memory-efficient algorithm, named Single-Pass Gaussian Fitting (SPGF), that accurately constructs a compact Gaussian Mixture Model (GMM) which approximates measurements from a depthmap generated from a depth camera. By incrementally constructing the GMM one pixel at a time in a single pass through the depthmap, SPGF achieves higher throughput and orders-of-magnitude lower memory overhead than prior multi-pass approaches. By processing the depthmap row-by-row, SPGF exploits intrinsic properties of the camera to efficiently and accurately infer surface geometries, which leads to higher precision than prior approaches while maintaining the same compactness of the GMM. Using a low-power ARM Cortex-A57 CPU on the NVIDIA Jetson TX2 platform, SPGF operates at 32fps, requires 43KB of memory overhead, and consumes only 0.11J per frame (depthmap). Thus, SPGF enables real-time mapping of large 3D environments on energy-constrained robots.

Soumya Sudhakar, Vivienne Sze, Sertac Karaman, “Uncertainty from Motion for DNN Monocular Depth Estimation,” IEEE International Conference on Robotics and Automation (ICRA), May 2022

Abstract Paper Video Poster Slides Software

Deployment of deep neural networks (DNNs) for monocular depth estimation in safety-critical scenarios on resource-constrained platforms requires well-calibrated and efficient uncertainty estimates. However, many popular uncertainty estimation techniques, including state-of-the-art ensembles and popular sampling-based methods, require multiple inferences per input, making them difficult to deploy in latency-constrained or energy-constrained scenarios. We propose a new algorithm, called Uncertainty from Motion (UfM), that requires only one inference per input. UfM exploits the temporal redundancy in video inputs by merging incrementally the per-pixel depth prediction and per-pixel aleatoric uncertainty prediction of points that are seen in multiple views in the video sequence. When UfM is applied to ensembles, we show that UfM can retain the uncertainty quality of ensembles at a fraction of the energy by running only a single ensemble member at each frame and fusing the uncertainty over the sequence of frames. In a set of representative experiments using FCDenseNet and eight in-distribution and out-of-distribution video sequences, UfM offers comparable uncertainty quality to an ensemble of size 10 while consuming only 11.3% of the ensemble’s energy and running 6.4× faster on a single Nvidia RTX 2080 Ti GPU, enabling near ensemble uncertainty quality for resource-constrained, real-time scenarios.

Keshav Gupta, Peter Li, Sertac Karaman, Vivienne Sze, “Efficient Computation of Map-scale Continuous Mutual Information on Chip in Real Time,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2021

Abstract Paper Video Poster Slides Software

Exploration tasks are essential to many emerging robotics applications, ranging from search and rescue to space exploration. The planning problem for exploration requires determining the best locations for future measurements that will enhance the fidelity of the map, for example, by reducing its total entropy. A widely-studied technique involves computing the Mutual Information (MI) between the current map and future measurements, and utilizing this MI metric to decide the locations for future measurements. However, computing MI for reasonably-sized maps is slow and power hungry, which has been a bottleneck towards fast and efficient robotic exploration. In this paper, we introduce a new hardware accelerator architecture for MI computation that features a low-latency, energy-efficient MI compute core and an optimized memory subsystem that provides sufficient bandwidth to keep the cores fully utilized. The core employs interleaving to counter the recursive algorithm, and workload balancing and numerical approximations to reduce latency and energy consumption. We demonstrate this optimized architecture with a Field- Programmable Gate Array (FPGA) implementation, which can compute MI for all cells in an entire 201-by-201 occupancy grid (e.g., representing a 20.1m-by-20.1m map at 0.1m resolution) in 1.55 ms while consuming 1.7 mJ of energy, thus finally rendering MI computation for the whole map real time and at a fraction of the energy cost of traditional compute plat- forms. For comparison, this particular FPGA implementation running on the Xilinx Zynq-7000 platform is two orders of magnitude faster and consumes three orders of magnitude less energy per MI map compute, when compared to a baseline GPU implementation running on an NVIDIA GeForce GTX 980 platform. The improvements are more pronounced when compared to CPU implementations of equivalent algorithms.

Zhengdong Zhang, Theia Henderson, Sertac Karaman, Vivienne Sze, "FSMI: Fast computation of Shannon Mutual Information for information-theoretic mapping," International Journal of Robotics Research (IJRR), Vol. 39, No. 9, pp. 1155-1177, August 2020

Abstract Paper Video Poster Slides Software

Exploration tasks are embedded in many robotics applications, such as search and rescue and space exploration. Information-based exploration algorithms aim to find the most informative trajectories by maximizing an information- theoretic metric, such as the mutual information between the map and potential future measurements. Unfortunately, most existing information-based exploration algorithms are plagued by the computational difficulty of evaluating the Shannon mutual information metric. In this paper, we consider the fundamental problem of evaluating Shannon mutual information between the map and a range measurement. First, we consider 2D environments. We propose a novel algorithm, called the Fast Shannon Mutual Information (FSMI). The key insight behind the algorithm is that a certain integral can be computed analytically, leading to substantial computational savings. Second, we consider 3D environments, represented by efficient data structures, e.g., an OctoMap, such that the measurements are compressed by Run-Length Encoding (RLE). We propose a novel algorithm, called FSMI-RLE, that efficiently evaluates the Shannon mutual information when the measurements are compressed using RLE. For both the FSMI and the FSMI-RLE, we also propose variants that make different assumptions on the sensor noise distribution for the purpose of further computational savings. We evaluate the proposed algorithms in extensive experiments. In particular, we show that the proposed algorithms outperform existing algorithms that compute Shannon mutual information as well as other algorithms that compute the Cauchy-Schwarz Quadratic mutual information (CSQMI). In addition, we demonstrate the computation of Shannon mutual information on a 3D map for the first time.

Theia Henderson, Vivienne Sze, Sertac Karaman, “An Efficient and Continuous Approach to Information-Theoretic Exploration,” IEEE International Conference on Robotics and Automation (ICRA), May 2020

Abstract Paper Video Poster Slides Software

Exploration of unknown environments is embed- ded and essential in many robotics applications. Traditional algorithms, that decide where to explore by computing the expected information gain of an incomplete map from future sensor measurements, are limited to very powerful computa- tional platforms. In this paper, we describe a novel approach for computing this expected information gain efficiently, as principally derived via mutual information. The key idea behind the proposed approach is a continuous occupancy map framework and the recursive structure it reveals. This structure makes it possible to compute the expected information gain of sensor measurements across an entire map much faster than computing each measurements’ expected gain independently. Specifically, for an occupancy map composed of |M| cells and a range sensor that emits |Θ| measurement beams, the algorithm (titled FCMI) computes the information gain corresponding to measurements made at each cell in O(|Θ||M|) steps. To the best of our knowledge, this complexity bound is better than all existing methods for computing information gain. In our experiments, we observe that this novel, continuous approach is two orders of magnitude faster than the state-of-the-art FSMI algorithm.

Soumya Sudhakar, Sertac Karaman, Vivienne Sze, “Balancing Actuation and Computing Energy in Motion Planning,” IEEE International Conference on Robotics and Automation (ICRA), May 2020

Abstract Paper Video Poster Slides Software

We study a novel class of motion planning problems, inspired by emerging low-energy robotic vehicles, such as insect-size flyers, chip-size satellites, and high-endurance autonomous blimps, for which the energy consumed by computing hardware during planning a path can be as large as the energy consumed by actuation hardware during the execution of the same path. We propose a new algorithm, called Compute Energy Included Motion Planning (CEIMP). CEIMP operates similarly to any other anytime planning algorithm, except it stops when it estimates further computing will require more computing energy than potential savings in actuation energy. We show that CEIMP has the same asymptotic computational complexity as existing sampling-based motion planning algorithms, such as PRM*. We also show that CEIMP outperforms the average baseline of using maximum computing resources in realistic computational experiments involving 10 floor plans from MIT buildings. In one representative experiment, CEIMP outper- forms the average baseline 90.6% of the time when energy to compute one more second is equal to the energy to move one more meter, and 99.7% of the time when energy to compute one more second is equal to or greater than the energy to move 3 more meters.

Peter Li*, Zhengdong Zhang*, Sertac Karaman, Vivienne Sze, "High-throughput Computation of Shannon Mutual Information on Chip," Robotics: Science and Systems (RSS), June 2019

Abstract Paper Video Poster Slides Software

Exploration problems are fundamental to robotics, arising in various domains, ranging from search and rescue to space exploration. Many effective exploration algorithms rely on the computation of mutual information between the current map and potential future measurements in order to make planning decisions. Unfortunately, computing mutual information metrics is computationally challenging. In fact, a large fraction of the current literature focuses on approximation techniques to devise computationally-efficient algorithms. In this paper, we propose a novel computing hardware architecture to efficiently compute Shannon mutual information. The proposed architecture consists of multiple mutual information computation cores, each evaluating the mutual information between a single sensor beam and the occupancy grid map. The key challenge is to ensure that each core is supplied with data when requested, so that all cores are maximally utilized. Our key contribution consists of a novel memory architecture and data delivery method that ensures effective utilization of all mutual information computation cores. This architecture was optimized for 16 mutual information computation cores, and was implemented on an FPGA. We show that it computes the mutual information metric for an entire map of 20m × 20m at 0.1m resolution in near real time, at 2 frames per second, which is approximately two orders of magnitude faster, while consuming an order of magnitude less power, when compared to an equivalent implementation on a Xeon CPU.

Zhengdong Zhang, Trevor Henderson, Vivienne Sze, Sertac Karaman, “FSMI: Fast computation of Shannon Mutual Information for information-theoretic mapping,” IEEE International Conference on Robotics and Automation (ICRA), May 2019

Abstract Paper Video Poster Slides Software

Information-based mapping algorithms are crit- ical to robot exploration tasks in several applications rang- ing from disaster response to space exploration. Unfortu- nately, most existing information-based mapping algorithms are plagued by the computational difficulty of evaluating the Shan- non mutual information between potential future sensor mea- surements and the map. This has lead researchers to develop approximate methods, such as Cauchy-Schwarz Quadratic Mutual Information (CSQMI). In this paper, we propose a new algorithm, called Fast Shannon Mutual Information (FSMI), which is significantly faster than existing methods at computing the exact Shannon mutual information. The key insight behind FSMI is recognizing that the integral over the sensor beam can be evaluated analytically, removing an expensive numerical integration. In addition, we provide a number of approximation techniques for FSMI, which significantly improve computation time. Equipped with these approximation techniques, the FSMI algorithm is more than three orders of magnitude faster than the existing computation for Shannon mutual information; it also outperforms the CSQMI algorithm significantly, being roughly twice as fast, in our experiments.

Diana Wofk*, Fangchang Ma*, Tien-Ju Yang, Sertac Karaman, Vivienne Sze, "FastDepth: Fast Monocular Depth Estimation on Embedded Systems," International Conference on Robotics and Automation (ICRA), May 2019

Abstract Paper Video Poster Slides Software

Depth sensing is a critical function for robotic tasks such as localization, mapping and obstacle detection. There has been a significant and growing interest in depth estimation from a single RGB image, due to the relatively low cost and size of monocular cameras. However, state-of-the-art single-view depth estimation algorithms are based on fairly complex deep neural networks that are too slow for real-time inference on an embedded platform, for instance, mounted on a micro aerial vehicle. In this paper, we address the problem of fast depth estimation on embedded systems. We propose an efficient and lightweight encoder-decoder network architecture and apply network pruning to further reduce computational complexity and latency. In particular, we focus on the design of a low-latency decoder. Our methodology demonstrates that it is possible to achieve similar accuracy as prior work on depth estimation, but at inference speeds that are an order of magnitude faster. Our proposed network, FastDepth, runs at 178 fps on an NVIDIA Jetson TX2 GPU and at 27 fps when using only the TX2 CPU, with active power consumption under 10 W. FastDepth achieves close to state-of-the-art accuracy on the NYU Depth v2 dataset. To the best of the authors’ knowledge, this paper demonstrates real-time monocular depth estimation using a deep neural network with the lowest latency and highest throughput on an embedded platform that can be carried by a micro aerial vehicle.

Amr Suleiman, Zhengdong Zhang, Luca Carlone, Sertac Karaman, Vivienne Sze, "Navion: A 2mW Fully Integrated Real-Time Visual-Inertial Odometry Accelerator for Autonomous Navigation of Nano Drones," IEEE Journal of Solid State Circuits (JSSC), 54:4(1106-1119), April 2019

Abstract Paper Video Poster Slides Software

This paper presents Navion, an energy-efficient accelerator for visual-inertial odometry (VIO) that enables au- tonomous navigation of miniaturized robots (e.g., nano drones), and virtual/augmented reality on portable devices. The chip uses inertial measurements and mono/stereo images to estimate the drone’s trajectory and a 3D map of the environment. This estimate is obtained by running a state-of-the-art VIO algorithm based on non-linear factor graph optimization, which requires large irregularly structured memories and heterogeneous compu- tation flow. To reduce the energy consumption and footprint, the entire VIO system is fully integrated on chip to eliminate costly off-chip processing and storage. This work uses compression and exploits both structured and unstructured sparsity to reduce on-chip memory size by 4.1×. Parallelism is used under tight area constraints to increase throughput by 43%. The chip is fabricated in 65nm CMOS, and can process 752×480 stereo images from EuRoC dataset in real-time at 20 frames per second (fps) consuming only an average power of 2mW. At its peak performance, Navion can process stereo images at up to 171 fps and inertial measurements at up to 52 kHz, while consuming an average of 24mW. The chip is configurable to maximize accuracy, throughput and energy-efficiency trade-offs and to adapt to different environments. To the best of our knowledge, this is the first fully-integrated VIO system in an ASIC.

Amr Suleiman, Zhengdong Zhang, Luca Carlone, Sertac Karaman, Vivienne Sze, "Navion: A Fully Integrated Energy-Efficient Visual-Inertial Odometry Accelerator for Autonomous Navigation of Nano Drones," IEEE Symposium on VLSI Circuits (VLSI-Circuits), June 2018

Abstract Paper Video Poster Slides Software

This paper presents Navion, an energy-efficient accelerator for visual-inertial odometry (VIO) that enables autonomous navigation of miniaturized robots (e.g., nano drones), and virtual/augmented reality on portable devices. The chip uses inertial measurements and mono/stereo images to estimate the drone’s trajectory and a 3D map of the environment. This estimate is obtained by running a state-of-the-art algorithm based on non-linear factor graph optimization, which requires large irregularly structured memories and heterogeneous computation flow. To reduce the energy consumption and footprint, the entire VIO system is fully integrated on chip to eliminate costly off-chip processing and storage. This work uses compression and exploits both structured and unstructured sparsity to reduce on-chip memory size by 4.1x. Parallelism is used under tight area constraints to increase throughput by 43%. The chip is fabricated in 65nm CMOS, and can process 752x480 stereo images at up to 171 fps and inertial measurements at up to 52 kHz, while consuming an average of 24mW. The chip is configurable to maximize accuracy, throughput and energy-efficiency across different environments. To the best of our knowledge, this is the first fully integrated VIO system in an ASIC.

Zhengdond Zhang*, Amr Suleiman*, Luca Carlone, Vivienne Sze, Sertac Karaman, "Visual-Inertial Odometry on Chip: An Algorithm-and-Hardware Co-design Approach," Robotics: Science and Systems (RSS), July 2017

Abstract Paper Video Poster Slides Software

Autonomous navigation of miniaturized robots (e.g., nano/pico aerial vehicles) is currently a grand challenge for robotics research, due to the need for processing a large amount of sensor data (e.g., camera frames) with limited on-board computational resources. In this paper we focus on the design of a visual-inertial odometry (VIO) system in which the robot estimates its ego-motion (and a landmark-based map) from on- board camera and IMU data. We argue that scaling down VIO to miniaturized platforms (without sacrificing performance) requires a paradigm shift in the design of perception algorithms, and we advocate a co-design approach in which algorithmic and hardware design choices are tightly coupled. Our contribution is four-fold. First, we discuss the VIO co-design problem, in which one tries to attain a desired resource-performance trade-off, by making suitable design choices (in terms of hardware, algorithms, implementation, and parameters). Second, we characterize the design space, by discussing how a relevant set of design choices affects the resource-performance trade-off in VIO. Third, we provide a systematic experiment-driven way to explore the design space, towards a design that meets the desired trade-off. Fourth, we demonstrate the result of the co-design process by providing a VIO implementation on specialized hardware and showing that such implementation has the same accuracy and speed of a desktop implementation, while requiring a fraction of the power.