DEANZA | ENGINEER

Overview

The DeAnza project began as a rudimentary image viewer written in Qt and C++ on Unix/Ubuntu, with calibration of a statically loaded image. It revisited an implementation approach from the ultrasound domain, in which key physical and pixel parameters of an image are captured to connect the image to real-world distances.

It later expanded to support macOS, calibration of a frame frozen from a live video stream, and inference models for Head Pose Estimation on image frames received from an attached camera source.

Problem

The original problem statement behind the DeAnza project was to build a rudimentary image viewer in Qt and C++ on Unix/Ubuntu. This was a revisit to the Qt UI framework, and detecting user interaction with the image viewer set the stage for a rudimentary Data Distribution Service (DDS) middleware wire-up in late 2024.

When NVIDIA Jetson Orin Nano hardware entered the equation in 2025, the options opened up. The workbench provided a research environment for bringing up ROS2 packages, with close attention to the resource limits of edge hardware and to balancing work between the CPU and GPU.

Since then, it has been a workbench for building out and demonstrating understanding in a few areas, including NVIDIA edge systems, CUDA, ROS2, and additional models and inference pathways. In short, this is the beginning of the Perception and Interpretation subsystems needed to bring up a robot system.

Approach

The host application is written in C++ using the Qt framework and was generalized to build not only on Unix/Ubuntu but on macOS as well.

The backend is based on ROS2 Humble on NVIDIA Jetson Orin Nano hardware, with several application packages including a GStreamer video feed.

The feature set includes an inference pipeline for Head Pose Estimation and groundwork for an additional inference pathway using a foundation model for Depth Estimation. Specifically, TensorFlow models were executed using the ONNX engine approach.

All data channels between host and the ROS2 packages running on the edge system are based on Data Distribution Service (DDS) middleware.

Outcome

As a snapshot, the current resting state of the image viewer host frontend and the edge system backend is outlined below.

The live video is based on GStreamer, with a UVC USB camera connected to the NVIDIA Jetson Orin Nano edge hardware. The user can calibrate by identifying pixel and physical units, connecting pixel space to the real world, from either a static image or a frame frozen from live video. The user can define and send over connected sets of waypoints that enable pixel and physical measurements. The calibration and waypoint measurement workflow steps are functional, and a minimal Perception pipeline with Head Pose Estimation (HPE) inference is in place for scenes that contain a human head. A seed Interpretation channel is up for Depth Estimation, but the specifications for the related user features are still being thought through.
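The calibration and waypoint measurement steps can be illustrated with a minimal C++ sketch. It is not taken from the DeAnza source: the type and function names (PixelPoint, Calibration, calibrate, pathLengthPixels, pathLengthPhysical) are hypothetical, and it assumes a single uniform scale factor across the image, which holds only when the measured scene is roughly planar and parallel to the sensor.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical types for illustration only; not the project's actual API.
struct PixelPoint {
    double x;
    double y;
};

struct Calibration {
    double unitsPerPixel;  // physical units (e.g. metres) per pixel
};

// Euclidean distance between two points in pixel space.
double pixelDistance(const PixelPoint& a, const PixelPoint& b) {
    return std::hypot(b.x - a.x, b.y - a.y);
}

// Derive a scale from two user-picked points whose real-world separation is known.
// Assumption: one uniform scale applies to the whole image.
Calibration calibrate(const PixelPoint& a, const PixelPoint& b,
                      double knownPhysicalLength) {
    const double px = pixelDistance(a, b);
    if (px <= 0.0 || knownPhysicalLength <= 0.0) {
        throw std::invalid_argument("calibration needs two distinct points and a positive length");
    }
    return Calibration{knownPhysicalLength / px};
}

// Total length, in pixels, of a connected set of waypoints (a polyline).
double pathLengthPixels(const std::vector<PixelPoint>& waypoints) {
    double total = 0.0;
    for (std::size_t i = 1; i < waypoints.size(); ++i) {
        total += pixelDistance(waypoints[i - 1], waypoints[i]);
    }
    return total;
}

// The same polyline length converted to physical units via the calibration.
double pathLengthPhysical(const Calibration& cal,
                          const std::vector<PixelPoint>& waypoints) {
    return cal.unitsPerPixel * pathLengthPixels(waypoints);
}
```

In use, the two calibration points would come from user clicks on a feature of known size, and the waypoint list from subsequent clicks; an oblique view with strong perspective would need a richer model than a single scale factor.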

For the static still image scenario, a real-world example is described in the project summary at the link below. It is based on an actual photo taken from a vantage point above Princeton harbor using a Canon 50D with a 55-250mm zoom lens.

Read more about the application system including the image capture and inference pipeline here.