GPUs are ubiquitous throughout data science, but why have they not yet been integrated into radio frequency and wireless technology?
Authors
John Ferguson – CEO
Jeff Zurita – Senior Applications Engineer

The graphics processing unit (GPU) has served as the fuel for almost all modern Artificial Intelligence (AI) breakthroughs, a fact that has led to the incorporation of the GPU into many sensor systems. The GPU's ease of programming, plentiful software tools, and high efficiency for neural network computation have made it the processor of choice for AI. A GPU-equipped software defined radio (SDR) significantly improves on the computational limitations of traditional SDRs and also allows users to leverage the enterprise software tools that are developed and maintained by industry leaders such as NVIDIA, Microsoft, and Google. Reliability and dependability are thereby improved compared with many other SDR product lines.
Incorporating the GPU into the SDR allows parallelizable applications, such as AI and signal processing, to run directly on the sensor at the edge. This reduces cost and complexity by eliminating the auxiliary computer that the majority of SDR vendors require. So the obvious question is, "Why have GPUs not been readily incorporated into software-defined radios?" In this article, we attempt to answer this question and explain why the industry should embrace the GPU for RF systems and signals.
Deepwave Digital created the GPU-accelerated SDR market in 2019 with the introduction of the Artificial Intelligence Radio Transceiver (AIR-T) Embedded Series and recently announced the AIR-T Edge Series. The Edge Series SDRs are rugged, high-performance, GPU-enabled software-defined radios that are ready for deployment in outdoor and harsh environments. The AIR8201 sensor contains a fully functional Linux operating system and is equipped with three high-performance processors: a GPU, a CPU, and a field programmable gate array (FPGA).
The Software-Defined Radio
The basic building blocks of modern wireless or radio frequency (RF) systems are software-defined radios. Analog hardware components in traditional radio systems, such as filters and amplifiers, are replaced in SDRs by software components in order to provide greater performance and flexibility. Although the concepts were initially developed in the late 1980s, technological advancement has accelerated their development and use. SDRs are used extensively in military applications, commercial wireless enterprises, and even in the amateur/home use marketplace. The primary limitation of SDR systems is that the programmable processors must have enough horsepower to execute highly complex algorithms. Most SDRs are only equipped with a CPU and an FPGA. The CPU benefits from a wide range of available software tools but has limited processing power. The FPGA has a tremendous amount of processing power, but developing firmware for it can take months to a year. The inclusion of a GPU in an SDR, a feature unique to the Deepwave Digital SDR products, represents the best of both worlds: the ability to be easily programmed and reprogrammed, accompanied by immense processing power.
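To make the "filters become software" idea concrete, here is a minimal, hypothetical NumPy sketch of a digital low-pass filter of the kind an SDR runs in place of an analog filter stage. The windowed-sinc design and the signal parameters are illustrative choices, not taken from any Deepwave product.

```python
import numpy as np

def design_lowpass(num_taps: int, cutoff: float) -> np.ndarray:
    """Windowed-sinc low-pass FIR taps; cutoff is a fraction of the sample rate."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return h / h.sum()  # normalize to unity gain at DC

# A 1 kHz tone of interest plus a 20 kHz interferer, sampled at 100 kHz
fs = 100e3
t = np.arange(4096) / fs
x = np.cos(2 * np.pi * 1e3 * t) + np.cos(2 * np.pi * 20e3 * t)

# "Filtering" in an SDR is just arithmetic on samples: one dot product per output
taps = design_lowpass(num_taps=101, cutoff=5e3 / fs)
y = np.convolve(x, taps, mode="same")  # interferer suppressed, tone passes
```

Because the filter is software, its cutoff, width, and shape can be changed at runtime, which is exactly the flexibility an analog filter cannot offer.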
Benefits of GPU Processing for SDRs
The Ubiquity of the GPU for Artificial Intelligence
Almost all deep learning leverages the GPU for processing. This is primarily enabled by NVIDIA's CUDA and cuDNN software tools, which allow for efficient and reliable GPU computation of parallel algorithms and deep machine learning algorithms, respectively. The GPU is so ubiquitous in the AI space that it is supported by every deep learning framework. Most frameworks also support CPUs; however, computation becomes intractable because CPUs lack enough parallel cores. This is the key to the GPU's supremacy.
Not only is the GPU the workhorse for training AI algorithms, but it is also highly leveraged for inference — the deployment of AI for operation. When an SDR incorporates the GPU into the radio, pre-trained models may be downloaded and run on the SDR in a matter of minutes. This is not possible with an FPGA, and a CPU alone lacks the compute necessary for real-time operation.
Considerations for Using GPUs in Signal Processing
A few SDR systems use a GPU on a PCI Express bus: the same bus as the radio. While this does enable GPU acceleration of signal processing, it comes at the expense of an additional data copy, as shown in the figure below. The signal data is copied from the radio (RF front end + FPGA) to the system memory. It is then copied again to the GPU memory, introducing latency that must be overcome. This latency, and the complication of overcoming it, has led to much reluctance to incorporate the GPU in RF sensors.

Deepwave’s Approach to Reduce Latency: Zero-Copy
Deepwave’s AIR-T product line eliminates this extra memory copy by leveraging the NVIDIA Jetson modules that have a shared memory architecture. As shown in the figure above, the CPU and GPU share the same physical memory, making the data equally accessible between the two processors. This is called zero-copy and is the core functionality that enables real-time signal processing and AI on the AIR-T.
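As a loose CPU-only analogy for what zero-copy means, the sketch below wraps one underlying buffer in two NumPy views: both "processors" address the same physical bytes, so a write through one view is immediately visible through the other with no copy. On the AIR-T the equivalent mechanism is CUDA mapped (pinned) host memory, which both the CPU and the Jetson's integrated GPU can address directly; the variable names here are purely illustrative.

```python
import numpy as np

# One physical buffer, analogous to the Jetson's shared CPU/GPU DRAM
buffer = bytearray(8 * 1024)  # room for 1024 float64 samples

# Two views wrap the SAME memory -- no copy is ever made
cpu_view = np.frombuffer(buffer, dtype=np.float64)
gpu_view = np.frombuffer(buffer, dtype=np.float64)

cpu_view[:] = np.linspace(0.0, 1.0, 1024)  # the "radio" writes samples
gpu_view *= 2.0                            # the "GPU" processes them in place

# The CPU sees the processed data immediately -- same physical memory
print(cpu_view[-1])  # -> 2.0
```

Contrast this with the PCIe case above, where every buffer would be physically duplicated into GPU memory before processing and duplicated back afterward.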
Leveraging Trusted APIs
One of the major benefits of the AIR-T is that it leverages application programming interfaces (APIs) that are developed and maintained by industry leaders. This helps eliminate roadblocks that can occur when using non-enterprise-level APIs. The lists below provide a few of the most common APIs that the AIR-T supports. Support for the AirStack driver is provided for Python, C++, and Go programming languages.
Deepwave’s Drivers
- AirStack: AIR-T’s firmware, software, and Linux operating system
- AirStack Sandbox: FPGA development kit for custom firmware
- AirPack: Deep learning classification model building framework
- Radio Signal Classifier: Signal classification service for the AIR-T
Supported Deep Learning Tools
- cuDNN: (NVIDIA) Neural network computation engine
- PyTorch: (Facebook AI Lab) Deep learning training and inference
- TensorFlow: (Google) Deep learning training and inference
- MATLAB: (MathWorks) Deep learning toolbox
- ONNX: (Microsoft) Industry standard neural network model file format
- ONNX Runtime: (Microsoft) Inference engine for ONNX files
- TensorRT: (NVIDIA) Neural network runtime optimization
Supported Signal Processing Tools
- CUDA: (NVIDIA) GPU acceleration programming language
- cuSignal: (NVIDIA) GPU accelerated signal processing
- TorchAudio: (Facebook AI Lab) PyTorch signal processing
- GNU Radio: (GNU Radio Community) Signal processing and communications
Third Party Applications
- OmniSIG: (DeepSig) RF sensing and awareness using deep learning
Parallel Processing Example with CPU vs GPU
A great example of GPU signal processing is the polyphase filter: an industry-standard way to filter out interference. In the video below, this filter is implemented using a CPU (left) and a GPU (right) on the AIR-T. The GPU signal processing is implemented using cuSignal. The code is nearly identical; however, the GPU version provides an almost 8x speedup, processing signals at 3.727 gigabits per second, while the CPU implementation is limited to 0.467 gigabits per second. In terms of signal bandwidth, the CPU version can process up to 7 MHz while the GPU version processes 58 MHz.
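The video's demo runs on cuSignal, whose API mirrors scipy.signal, so the CPU and GPU code paths look nearly the same. As a CPU-only sketch of the polyphase structure itself (illustrative, not Deepwave's implementation), the decimator below splits one FIR filter into d independent sub-filters; it is those independent branches, each running at the low output rate, that map so naturally onto the GPU's parallel cores.

```python
import numpy as np

def polyphase_decimate(x: np.ndarray, h: np.ndarray, d: int) -> np.ndarray:
    """Filter x with FIR taps h and keep every d-th output (decimate by d),
    using the polyphase structure: d sub-filters with no data dependencies."""
    h = np.concatenate([h, np.zeros((-len(h)) % d)])  # pad taps to a multiple of d
    n_out = -(-(len(x) + len(h) - 1) // d)            # ceil division for output length
    y = np.zeros(n_out)
    for p in range(d):                                # one independent branch per phase
        h_p = h[p::d]                                 # sub-filter: every d-th tap
        x_p = x[::d] if p == 0 else np.concatenate([[0.0], x[d - p::d]])
        branch = np.convolve(x_p, h_p)[:n_out]        # runs at the LOW (output) rate
        y[:len(branch)] += branch
    return y

# Usage: decimate a noisy capture by 4 with a prototype low-pass filter
x = np.random.default_rng(0).standard_normal(4096)
h = np.hamming(63)                    # illustrative prototype taps
y = polyphase_decimate(x, h, d=4)     # equals np.convolve(x, h)[::4]
```

On the GPU, each branch (and each output sample within a branch) can be computed concurrently, which is where the roughly 8x speedup reported above comes from.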
Source Code For Signal Classification
In addition to its compatibility with standard tools, Deepwave also offers an optional software framework, AirPack, which greatly simplifies the creation, training, and deployment of neural networks. AirPack includes a Python API so its functions can be accessed from a larger Python application. AirPack also includes training data to verify and measure the performance of neural networks during tasks such as hyperparameter tuning or architectural changes.

Simplicity of Deep Learning Deployment
The Deepwave Digital AIR-T design, and the software APIs that support it, bring deep learning to the RF domain. The software support for the AIR-T allows deep learning neural networks to be trained offline in any deep learning framework or programming language, such as TensorFlow, PyTorch, or MATLAB. The trained neural network can then be ported to the AIR-T as an industry-standard ONNX file and incorporated into an application. The ONNX file can be called directly using ONNX Runtime or further optimized for execution speed using the TensorRT tool.
In the video below, Deepwave has trained a signal classification model to detect and classify various radar modulation methods. The signal is fed into the AIR-T and the deep learning classifier produces the label based on the model.
Summary
The processing power of GPUs offers a significant increase in computational capabilities for SDRs, allowing for high-performance signal processing operations and making deep learning in SDRs possible.
Deepwave Digital has embraced the GPU's potential with the AIR-T software-defined radio, which includes both a CPU and a GPU on board. The Deepwave AIR-T also mitigates latency issues in GPU computing by using a shared memory scheme, the zero-copy architecture, in which the CPU and GPU share a common memory pool. This minimizes latency and makes possible many real-time applications in both signal processing and machine learning.
Finally, the complete software suite for the AIR-T takes full advantage of the GPU computational resources and allows rapid development and deployment of RF applications.