DeepSeek LLM: YY3588 Edge AI Hardware vs Jetson Orin vs Raspberry Pi 5

DeepSeek LLM on the Edge: A Performance Showdown - YY3588 vs. Jetson Orin vs. Raspberry Pi 5

By youyeetoo, September 8th, 2025

The YY3588 is a high-performance AIoT development board from Youyeetoo Technology. AIoT, or Artificial Intelligence of Things, refers to the integration of artificial intelligence technology with the Internet of Things to achieve intelligent connectivity for everything.

As large language models (LLMs) continue to become more lightweight, deploying models with hundreds of millions of parameters on edge devices has become a reality. This article uses the youyeetoo YY3588, based on the Rockchip RK3588, as the hardware platform to test its performance when deploying models from the DeepSeek series, exploring the potential of running large models in edge computing scenarios.

1. Hardware and Software Environment

1.1 YY3588 Development Board Basic Configuration

1.1.1 Core Hardware

NPU: 6TOPS computing power (INT8) + Mali-G610 GPU
Memory & Storage: 16GB LPDDR4X (Tested bandwidth 68GB/s) | 512GB NVMe SSD (Expanded via PCIe 3.0 x4 interface)

This single-board computer (SBC) offers flexible memory and storage configuration options. For memory, it supports various LPDDR4 specifications up to 16GB. For storage, it provides multiple choices including eMMC, SATA SSD, and MicroSD card, with support for up to 256GB of eMMC storage.

1.1.2 Software Stack

System: Ubuntu 22.04 LTS (RK3588 custom kernel 5.10)
Inference Framework: ONNX Runtime 1.16 + RKNN-Toolkit2 1.6
Optimization Tool: DeepSeek Official Quantization Toolchain v0.3

2. DeepSeek Model Deployment

2.1 Model Selection and Optimization

Test Model: DeepSeek-MoE-16B (4.3GB after sparsification)
Quantization Scheme:

python quantize.py --model deepseek-16b-fp32.onnx \
--output deepseek-16b-int8.rknn \
--dataset calibration_data/ \
--quant_type hybrid

Optimization Results:
- Model size reduced to 1.2GB (72% compression rate)
- Memory usage dropped from 12GB to 3.8GB
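
The percentages above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the 72% figure is measured against the 4.3GB sparsified model (the article does not state the baseline explicitly); the helper name `reduction` is illustrative only.

```python
def reduction(before_gb: float, after_gb: float) -> float:
    """Return the fractional reduction going from before_gb to after_gb."""
    return 1.0 - after_gb / before_gb


# Sizes quoted in this section (GB).
model_reduction = reduction(4.3, 1.2)    # sparsified model -> quantized model
memory_reduction = reduction(12.0, 3.8)  # runtime memory before -> after

print(f"Model size reduction: {model_reduction:.1%}")
print(f"Memory reduction:     {memory_reduction:.1%}")
```

Under that assumption the model-size figure agrees with the quoted 72%, and the memory drop works out to a little over two thirds.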

2.2 Key Steps for DeepSeek-R1 1.5B Model Deployment

Here is a brief tutorial for those looking to deploy an LLM on edge devices.

2.2.1 Ubuntu 22.04 Host Setup:

# Download rknn-llm
git clone https://github.com/airockchip/rknn-llm.git

# Install Miniforge3 (provides conda)
wget -c https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
chmod +x Miniforge3-Linux-x86_64.sh
./Miniforge3-Linux-x86_64.sh

# Confirm successful installation
conda -V

2.2.2 Create RKLLM-Toolkit Conda Environment:

source ~/miniforge3/bin/activate
conda create -n RKLLM-Toolkit python=3.8
conda activate RKLLM-Toolkit
pip3 install rkllm-toolkit/packages/rkllm_toolkit-1.1.4-cp38-cp38-linux_x86_64.whl
# Check for successful installation (no errors means success)
python -c "from rkllm.api import RKLLM"

2.2.3 Convert DeepSeek-R1-1.5B from HuggingFace to RKLLM Model:

cd examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/export/
python export_rkllm.py

Code for DeepSeek model conversion to RKLLM

The converted model is: DeepSeek-R1-Distill-Qwen-1.5B.rkllm

2.2.4 Compile Libraries and Demo

Download the cross-compilation toolchain (if a complete SDK has been downloaded, the cross-compilation toolchain within the SDK can be used).

# Modify compiler path
vim examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy/build-linux.sh

Compiler path modification for DeepSeek deployment

Start Compilation:

cd examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy/
bash build-linux.sh

Generate Libraries and Demo:

rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy/install/demo_Linux_aarch64$ ls
lib  llm_demo

2.2.5 Run Model On-Device:
Push the library, demo, and converted model to the board, then execute the demo.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./lib
export RKLLM_LOG_LEVEL=1
# Arguments: model path, max_new_tokens, max_context_len
./llm_demo DeepSeek-R1-Distill-Qwen-1.5B.rkllm 10000 10000

2.2.6 Related Resources Download
https://wiki.youyeetoo.com/YY3588

2.2.7 Screenshots of Running Process and Video Links

DeepSeek LLM running on YY3588 screenshot 1

DeepSeek LLM running on YY3588 screenshot 2

DeepSeek LLM running on YY3588 screenshot 3

3. Performance Test and Comparison

3.1 Inference Speed Test (Input length 256 tokens)

Execution Mode	First token latency	Throughput (tokens/s)	Power (W)
CPU (A76 Quad-core)	850ms	4.2	8.1
GPU (Mali-G610)	420ms	9.8	6.5
NPU (INT8 Quantized)	220ms	18.5	4.3
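
Two derived figures make the table easier to compare: energy per generated token and the time to produce a complete reply. The sketch below is not part of the original measurements. It assumes the power column is the average draw during generation, that throughput is the decode rate after the first token, and it uses a hypothetical reply length of 128 tokens.

```python
# Figures copied from the table above:
# (first-token latency in seconds, throughput in tokens/s, power in watts)
MODES = {
    "CPU (A76 Quad-core)": (0.850, 4.2, 8.1),
    "GPU (Mali-G610)": (0.420, 9.8, 6.5),
    "NPU (INT8 Quantized)": (0.220, 18.5, 4.3),
}


def joules_per_token(throughput: float, power_w: float) -> float:
    """Energy spent per generated token, assuming constant average power."""
    return power_w / throughput


def response_time(latency_s: float, throughput: float, n_tokens: int) -> float:
    """First-token latency plus decode time for n_tokens output tokens."""
    return latency_s + n_tokens / throughput


N_TOKENS = 128  # hypothetical reply length, not from the article

for name, (latency_s, tps, watts) in MODES.items():
    energy = joules_per_token(tps, watts)
    total = response_time(latency_s, tps, N_TOKENS)
    print(f"{name:22s} {energy:5.2f} J/token  {total:6.2f} s per {N_TOKENS}-token reply")
```

Under these assumptions the NPU path needs roughly 0.23 J per token against about 1.9 J on the CPU, so the efficiency gap is larger than the raw power column suggests.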

3.2 Stress Test

Multi-tasking: Simultaneous Q&A, summary generation, and sentiment analysis.
- Resource Usage: NPU 85% / Memory 12GB / Temperature 72℃
- Latency Fluctuation: ±15% (Superior to Xavier NX performance)

Long-text Processing: Input 4096 tokens from a legal document.
- Memory Management: Implemented chunked loading via mmap to avoid Out-of-Memory (OOM) errors.
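
The article does not show how the chunked loading was implemented, or whether it applies to model weights or to the input document. As a rough illustration of the general technique only, the sketch below reads a file through a read-only memory map in fixed-size chunks so that only the slice currently in use is copied into process memory. The function name `iter_chunks` and the chunk size are assumptions.

```python
import mmap
import os
import tempfile


def iter_chunks(path: str, chunk_bytes: int = 1 << 20):
    """Yield successive byte chunks of a file via a read-only memory map."""
    size = os.path.getsize(path)
    if size == 0:
        return  # an empty file cannot be memory-mapped
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for offset in range(0, size, chunk_bytes):
            # Slicing copies only this chunk out of the mapping.
            yield mm[offset:offset + chunk_bytes]


# Demo on a small temporary file standing in for a large input.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 2500)
    demo_path = tmp.name

chunk_sizes = [len(chunk) for chunk in iter_chunks(demo_path, chunk_bytes=1024)]
os.remove(demo_path)
print(chunk_sizes)
```

The operating system pages data in on demand, so peak memory is bounded by the chunk size rather than the file size.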

4. Typical Application Scenario Verification

4.1 Intelligent Customer Service System

Test Case: E-commerce after-sales consultation scenario.

Actual Results:
- Response Time: Average 1.2 seconds/round (including network transmission)
- Accuracy: 88.7% (Compared to 92.1% from a cloud API)
- Offline capability: Basic services can be maintained even when disconnected from the network.

4.2 Local Knowledge Base Search (RAG)

4.2.1 Architecture Design:

mermaid
graph LR
A[User Query] --> B(Embedding Model)
B --> C[FAISS Vector Database]
C --> D[DeepSeek Generate Answer]
D --> E[Output Response]
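
The flow in the diagram can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the implementation used in the test: a bag-of-words counter stands in for the embedding model, a brute-force cosine search stands in for FAISS, the `generate` callable stands in for the DeepSeek model, and the sample documents are invented.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for the embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """Brute-force stand-in for the FAISS vector database."""

    def __init__(self, documents):
        self.entries = [(doc, embed(doc)) for doc in documents]

    def search(self, query: str, k: int = 1):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(query: str, context_docs) -> str:
    """Combine retrieved passages and the user query into one prompt."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


def answer(query: str, store: VectorStore, generate) -> str:
    """Query -> retrieve -> generate, mirroring the diagram above."""
    return generate(build_prompt(query, store.search(query, k=1)))


# Hypothetical knowledge-base entries for demonstration.
docs = [
    "Refunds are processed within 7 days of receiving the returned item.",
    "Standard shipping takes 3 to 5 business days.",
]
store = VectorStore(docs)
query = "How long do refunds take?"
top = store.search(query, k=1)
prompt = build_prompt(query, top)
reply = answer(query, store, generate=lambda p: f"[model output for a {len(p)}-char prompt]")
print(top[0])
```

In a real deployment, `embed` would call an embedding model, `VectorStore` would wrap a FAISS index, and `generate` would call the on-device model.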

4.2.2 Performance:

Retrieval latency over a corpus of millions of documents: <300ms
Supports RAG (Retrieval-Augmented Generation) mode

5. Horizontal Comparison and Scenario Recommendations

When looking for a Jetson Orin alternative or a powerful Raspberry Pi upgrade, this is how the YY3588 stacks up as a piece of edge AI hardware.

Comparison Item	YY3588 + DeepSeek	Raspberry Pi 5 + Llama 2-7B	Jetson Orin + DeepSeek
Single Inference Power	4.3W	7.8W	12.3W
tokens/¥ Performance Ratio	428	196	315
Typical Scenario	Enterprise Edge Inference Gateway	Education / Lightweight Experiments	High-Performance Robotics Main Controller

6. Conclusion

The combination of the YY3588 and DeepSeek validates the feasibility of deploying large models on the edge. The close co-optimization of its NPU and software stack demonstrates the progress of the domestic chip ecosystem. Although limitations remain in handling ultra-long text and supporting very large models, the platform is capable enough to open up new possibilities for intelligent terminal devices.
