PDigit's AI PORTFOLIO
Blending AI efficiency with human interpretation
EDGE AI Embedded Computing
Why? I’m a strong believer in the synergy between electronic/embedded engineering expertise and AI, and I find particular value in merging the two. Advantages:
Commercial Products Integration: Implementing TPU/accelerator solutions such as NVIDIA Jetson (Orin), Google Coral, Axelera Metis, and the Raspberry Pi AI HAT+ (Hailo-based) for real-time AI at the edge.
Custom HW/FPGA Integration: Designing hardware solutions for unique business requirements and ‘extreme’ targets: throughput (inferences/s), power efficiency, BOM reduction, maintainability, and serviceability.
Solution Examples:
- Smart surveillance systems with facial recognition and anomaly detection capabilities for enhanced security
- Autonomous vehicles using edge computing for immediate response to road conditions and obstacles
- IoT healthcare devices that monitor patient vitals and predict health episodes, sending alerts in real-time
- Embedded chatbots in industrial machinery providing immediate diagnostics and maintenance suggestions
Embedded Applications: Utilizing the computational power of embedded devices to perform data processing locally, rather than relying on cloud computing.
- Specific Implementations:
- In smart surveillance systems, using embedded devices with GPUs (like NVIDIA Jetson) for real-time image processing and anomaly detection without the latency of cloud processing
- For autonomous vehicles, embedding systems that process sensor data (LIDAR, radar, cameras) in real-time for immediate decision-making
- Healthcare IoT devices that process vital data locally and only send critical information or alerts to central servers, preserving bandwidth and enhancing privacy
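The pattern shared by these implementations — process everything on-device, transmit only alerts — can be sketched in a few lines. This is an illustrative sketch, not tied to any specific device SDK; the rolling window size, z-score threshold, and sample data are arbitrary assumptions.

```python
from collections import deque
from statistics import mean, stdev

def local_alert_filter(samples, window=10, z_threshold=3.0):
    """Process readings locally; return only anomalous ones (the 'alerts').

    Keeps a small rolling window on-device and flags samples whose z-score
    exceeds the threshold, so only critical events leave the device.
    """
    history = deque(maxlen=window)
    alerts = []
    for t, x in enumerate(samples):
        if len(history) >= 2:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(x - mu) / sigma > z_threshold:
                alerts.append((t, x))
        history.append(x)
    return alerts

# 50 normal heart-rate readings plus one spike: only the spike is "sent upstream"
readings = [72.0, 73.0] * 25 + [140.0]
print(local_alert_filter(readings))  # [(50, 140.0)]
```

The same gate structure applies whether the payload is a vital sign, a vibration sample, or a camera frame score: bandwidth is spent only on the events worth reporting.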
Other Edge AI HW Solutions
- Neuromorphic computing (µNPU), ultra-low-power examples: Neuronova; RISC-V neuromorphic edge AI microcontrollers
See also Small LLM and TinyML
The Evolution of Edge Computing: Fine-Tuning AI Models in Real-Time
NOTE: Here we refer to Edge AI in an embedded context. The term “Edge Computing” can also refer to distributed computing, which has quite a different meaning; see Wikipedia for the “distributed” definition.
Edge Computing represents a significant leap in data processing technology, particularly in its ability to fine-tune AI models in real-time. This process involves making incremental adjustments to AI models, allowing them to adapt more effectively to specific tasks. Crucial in dynamic environments, Edge Computing enables AI models to continuously learn and evolve, in contrast with traditional fixed models.
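The “incremental adjustment” idea is essentially online learning: the model updates its parameters one observation at a time as data streams in, instead of being retrained offline. A minimal sketch with a one-feature linear model and plain SGD — the learning rate, step count, and synthetic data stream are illustrative assumptions, not a real deployment recipe:

```python
def online_update(w, b, x, y, lr=0.1):
    """One incremental SGD step for the model y ≈ w*x + b (squared error)."""
    pred = w * x + b
    err = pred - y
    # Gradients of 0.5*err^2 with respect to w and b
    return w - lr * err * x, b - lr * err

# The device adapts as (x, y) pairs stream in -- no cloud retraining round-trip.
w, b = 0.0, 0.0
for step in range(2000):
    x = (step % 10) / 10.0      # streaming sensor input in [0, 1)
    y = 2.0 * x + 1.0           # true relation the device gradually learns
    w, b = online_update(w, b, x, y)
print(round(w, 2), round(b, 2))
```

After enough streamed samples, (w, b) converges close to the true (2.0, 1.0); in a real edge deployment the same per-sample update loop runs continuously against live data.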
This technology is transformative: bringing data processing closer to its source enhances efficiency and leads to smarter technological solutions. Key benefits include reduced latency, improved data management, and enhanced security.
Practical examples (2024/2025) include NVIDIA Jetson Nano/Orin, Axelera Metis, Raspberry Pi 5 + Pi AI HAT/HAT+, Google Coral TPUs, and STM32 AI — with many new chips announced continuously — showcasing Edge Computing’s versatility.
References (external links)
- AI at the Edge: Low Power, High Stakes – 2025
- Edge AI Movement (CES 2024)
- “Embedded Hardware for Processing AI at the Edge: GPU, VPU, FPGA and ASIC Explained” - Edge Computing
Advantages of Edge AI
Reduced Latency and Improved Speed
By processing data close to its source, Edge Computing minimizes latency, essential for real-time applications like autonomous vehicles and smart city infrastructures.
Enhanced Data Management and Privacy
Local data processing reduces reliance on cloud storage, enhancing data privacy and meeting data protection regulations.
Scalability and Cost-Effectiveness
It allows for efficient scalability without major infrastructure overhauls, exemplified by specialized hardware like the Axelera Metis TPU and NVIDIA Jetson Orin.
Resilience and Continuous Operations
Its independence from central servers ensures continuous operation, even in unstable environments.
Benchmarking and Profiling Performance
Typical measurements:
- TOPS, TOPS/W
- Inference latency and accuracy on common (open-source) models
- FPS
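Latency and FPS can be profiled with a simple timing harness around any inference callable. This is a generic sketch using only the standard library; the warmup/run counts and the stand-in “model” are illustrative assumptions (a real benchmark would call the actual runtime, e.g. a TFLite or ONNX session):

```python
import time

def profile_inference(infer, payload, warmup=10, runs=100):
    """Measure mean latency (ms) and throughput (FPS) of an inference callable."""
    for _ in range(warmup):                 # let caches and JITs settle first
        infer(payload)
    start = time.perf_counter()
    for _ in range(runs):
        infer(payload)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / runs * 1e3
    fps = runs / elapsed
    return latency_ms, fps

# Stand-in "model": any callable taking a frame; here a trivial checksum
dummy_model = lambda frame: sum(frame) % 256
lat, fps = profile_inference(dummy_model, list(range(1000)))
print(f"latency: {lat:.3f} ms  throughput: {fps:.0f} FPS")
```

Note that mean latency and FPS are reciprocal by construction here; on real hardware it is also worth reporting tail latency (p95/p99), since edge workloads often care about worst-case response.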
See also - Edge Benchmarking Projects
- my presentation with demos: AI & Edge embedded AI (May 2025)
- YT video [ITA with slides in ENG]
- deep-dive podcast summary in ENG
Small LLM and TinyML
What are Small LLMs and why use them? Small LLMs are streamlined versions of larger language models, designed to be more efficient while still delivering valuable language processing capabilities. They are particularly effective in scenarios where computational resources, response time, and application specificity are critical. Typical scenarios where Small LLMs can outperform their larger counterparts:
TinyML focuses on running machine learning models on extremely low-power, low-footprint microcontrollers (MCUs). Embedded MCU devices have very limited RAM (KBytes) and “slow” CPUs (in the MHz range). For ultra-low power, see neuromorphic chips (kHz, mW).
Limited Resource Environments
- Scenario: In embedded systems or IoT devices where memory and processing power are at a premium, small LLMs can provide intelligent capabilities without the overhead of larger models.
- Example: A wearable device that offers real-time language translation or a smart home device providing voice-activated control and simple Q&A functionalities.
Faster Response Times
- Scenario: Situations where rapid response is crucial, and the latency introduced by cloud processing is unacceptable.
- Example: Real-time language processing in customer service chatbots embedded in websites or applications, where quick responses are essential
- In TinyML, latency is very low (on the order of milliseconds)
Specificity and Customization
- Scenario: Cases where the model needs to be highly specialized for a specific domain, task, or language, and a larger, more general model might not offer the same level of accuracy.
- Example: An industrial or medical chatbot that needs to understand and use very specific terminologies accurately. A small LLM can be finely tuned to these sectors, providing more precise and reliable responses.
Small LLMs: Cost-Effectiveness
- Scenario: For businesses or applications where the cost of computing power is a significant consideration, smaller LLMs can be more economical.
- Example: Implementing AI-driven customer service solutions might opt for smaller LLMs to keep operational costs down while still benefiting from AI capabilities.
Small LLMs: Energy Efficiency
- Scenario: In environments where energy consumption is a concern, such as mobile applications or in developing regions with limited power infrastructure.
- Example: Mobile apps for language learning or translation services that use small LLMs to minimize battery drain.
External link: What is the difference between CPU vs GPU vs TPU? Complete Overview
Small LLMs: Privacy and Data Security
- Scenario: When processing sensitive data, keeping the data on the device (edge computing) rather than sending it to the cloud can be more secure.
- Example: Typical benefits are in healthcare applications where patient data privacy is paramount, using small LLMs to process and understand patient inquiries locally.
In these scenarios, small LLMs offer a balance between the advanced capabilities of AI language processing and the constraints of specific applications, environments, or resources. They represent a tailored approach, where the size and complexity of the model are matched to the specific needs and limitations of the use case.
- Other advantages & summary: LLMs may not be a one-size-fits-all solution. In several cases, Small LLMs have proven to be more efficient, cost-effective, and tailored for specific business needs.
TinyML typical Applications
- Keyword Spotting: Enabling voice assistants to wake up to a specific phrase (e.g., “Hey Google”)
- Simple Anomaly Detection: Detecting unusual vibrations in industrial machinery to predict failures
- Presence Detection: Using low-resolution image sensors to determine if a person is in a room to conserve energy
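Keyword spotting and presence detection share a pattern: an always-on, extremely cheap gate that wakes a heavier model only when a signal crosses a threshold. An illustrative energy-gate sketch — the frame size, threshold, and toy signal are assumptions, not values from any real device:

```python
def frames(signal, size):
    """Split a sample stream into consecutive fixed-size frames."""
    return [signal[i:i + size] for i in range(0, len(signal), size)]

def energy_gate(signal, frame_size=4, threshold=1.0):
    """Return indices of frames loud enough to wake the main model."""
    active = []
    for i, frame in enumerate(frames(signal, frame_size)):
        energy = sum(s * s for s in frame) / len(frame)  # mean squared amplitude
        if energy > threshold:
            active.append(i)
    return active

# Silence, a loud burst (possible keyword), then near-silence again
signal = [0.0] * 8 + [2.0, -2.0, 2.0, -2.0] + [0.1] * 8
print(energy_gate(signal))   # only the burst frame triggers
```

In a production pipeline the frames flagged here would be passed to the actual keyword-spotting network; everything else is discarded without ever running it, which is what keeps average power in the mW range.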
Embedded Applications (Small LLM & Edge Computing) Summary
Edge Computing has diverse industry applications, from healthcare’s real-time patient monitoring to predictive maintenance in manufacturing and enhanced personalized on-site services, as well as remote monitoring stations.
Deploying smaller, more efficient LLMs makes AI feasible on embedded systems with limited computational resources.
Specific implementation examples:
- Embedded chatbots in retail kiosks, providing customer assistance without needing a constant internet connection.
- Language translation tools in portable devices, offering real-time translation without relying heavily on server communication, suitable for travelers or field workers.
- Efficient language translation services for niche languages or dialects not well-covered by larger models.