Run AzureML model on ARM64 with CPU

Disclaimer

Similar to the previous blog post, this is still not supported by Microsoft / Azure. The project here is for educational purposes only and NOT FOR PRODUCTION. Nobody takes responsibility for this concept. You get it as it is and use it at your own risk.

The project is not supported, endorsed or encouraged by Microsoft. This is an independent personal (likely opinionated) project to demonstrate possible capabilities.

Introduction

This article will guide you through running an AzureML-generated Yolov5 model on an ARM64 board with the CPU. (NPU acceleration is coming soon.)

Pre-requisites

To achieve our target we need a proper ARM64 board, install Kubernetes on it, attach it to Azure with Arc, install the AzureML extension and attach the cluster to the AzureML workspace.
For details read the previous AzureML extension on ARM64 blog post.

In addition, we need to build our own ARM64-compatible container which can run the pre-trained AI model. If you don’t have an AI model yet, here is my guide: Azure ML in action with Lego figures

AzureML environments

The ML process requires a properly configured toolchain (PyTorch, scikit-learn, etc.) to train and run the AI model. This pre-installed and pre-configured toolchain setup is called an “Environment” in AzureML. The Environment is a Docker container where the Python runtime and the Python packages are pre-installed to support our ML activities. AzureML offers several ready-to-use environments which are called “Curated environments“. This is cool because we don’t need to bother with setting up the platform; we just pick an image and can focus on the ML pipeline.

BUT!!! … of course these environments are prepared for x86 CPUs, nothing for ARM64 (not a surprise, as ARM64 is NOT supported 🙂 ). Don’t worry, we will solve this in the coming sections.

The (almost) good news is that AzureML lets us create a custom environment container by defining the Dockerfile and the Conda environment file. This is a very nice way to install our wish-listed packages but, as you guessed, this is supported only on x86.
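For reference, such a custom environment pairs a Dockerfile with a Conda file along these lines (the package list below is illustrative only, not the one we will actually use later):

```yaml
# Conda environment file for a hypothetical inference environment.
# Package names and versions are examples, not a tested set.
name: yolo-inference
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - numpy
      - onnxruntime
      - opencv-python-headless
```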

However, while we cannot build ARM64 images this way, we can import them into the central registry, or we can simply run them from our local registry. And this is the approach we will use below.

Creating AzureML environment for Yolov5

As a starting point, I thought to take one of the curated environments, grab its Dockerfile and port it to ARM64. Unfortunately, some Python packages are not supported on ARM64 (really just a few) and hence this option doesn’t work because it ended in dependency hell.

The solution below is still based on the curated environment but installs only the minimum set of Python packages which are really needed to run the model. You can find the Dockerfile and the Conda file in my GitHub repo.

Build the environment and push it to our local registry. Run these commands on the ARM board:

docker build -t localhost:5000/azureml-inference-arm64:latest https://github.com/szasza576/arc-on-arm.git#main:azureml/arm-yolo-env-cpu
docker push localhost:5000/azureml-inference-arm64:latest

(Almost) running the AI model

Another thing we shall discuss is MLflow. The AzureML-generated (by AutoML) models use the MLflow concept to specify and run the AI models. This is a nice framework to handle models easily. Unfortunately, those AzureML Python packages which are intended to cover this functionality are not compatible with ARM, hence we cannot just reuse this part. MLflow is mainly used to load the model and run it. So instead we need to do 2 things: extract the raw AI model and write our own scoring script.

Extract the raw AI model

Again, good news 🙂 When we use AutoML, it saves the interim steps of the model training. We can extract the raw model in ONNX format, which we can run on our board.

  1. Go to the Jobs section
  2. Select the job which produced the best result during the AutoML run
  3. Go to the Outputs + logs
  4. You can expand the folders, especially the mlflow-model and the train_artifacts. You can find valuable information in both folders.
    We will need the “model.onnx” and the “labels.json” files from the train_artifacts and the settings.json from the mlflow-model/artifacts.
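If you later consume these files yourself, the JSON parts are plain reads; a minimal sketch with dummy files standing in for the downloaded artifacts (the schemas shown here are assumptions, verify them against your own AutoML export):

```python
import json
import os
import tempfile

def load_artifacts(model_dir):
    """Read labels.json and settings.json from the extracted artifacts.

    Assumes labels.json is a JSON array of class names and settings.json
    a flat JSON object; check this against your own export.
    """
    with open(os.path.join(model_dir, "labels.json")) as f:
        labels = json.load(f)
    with open(os.path.join(model_dir, "settings.json")) as f:
        settings = json.load(f)
    return labels, settings

# Demo with dummy files standing in for the downloaded artifacts:
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "labels.json"), "w") as f:
    json.dump(["lego_figure"], f)
with open(os.path.join(tmp, "settings.json"), "w") as f:
    json.dump({"img_size": 640}, f)

labels, settings = load_artifacts(tmp)
print(labels, settings["img_size"])  # prints: ['lego_figure'] 640
```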

Now you can download the files manually, put them into a folder and register a model in the Models menu on the left side … or you can use the following script which will do all of this automatically on your behalf.

You shall have Python3 (max. version 3.12), Python Pip and the Azure CLI installed on your machine. Then install additional packages:

python3 -m pip install azure-ai-ml
python3 -m pip install 'mlflow<=2.3'
python3 -m pip install azureml-mlflow

Download the converter script from GitHub:

curl https://raw.githubusercontent.com/szasza576/arc-on-arm/refs/heads/main/azureml/model-register/model-register.py -o model-register.py

The script does the following:

  • Searches for the best training job in the AutoML child results
  • Downloads the artifacts (model.onnx, labels.json, settings.json) into a temporary folder
  • (optionally converts the model to RKNN format. We don’t need it now. I will create another blog post about it.)
  • Moves the final files into the target folder
  • Creates and registers a new Model in the AzureML workspace
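The “best job” search in the first step essentially ranks the AutoML child runs by their primary metric. Stripped of the Azure SDK calls, the selection logic boils down to this (the dict shape and metric name are illustrative stand-ins for the run objects the SDK returns):

```python
def pick_best_run(child_runs, metric="mean_average_precision"):
    """Return the child run with the highest value of the primary metric.

    child_runs: list of dicts with a 'metrics' mapping -- a simplified
    stand-in for the run objects the Azure ML SDK returns.
    """
    scored = [r for r in child_runs if metric in r["metrics"]]
    if not scored:
        raise ValueError(f"no child run reports metric {metric!r}")
    return max(scored, key=lambda r: r["metrics"][metric])

runs = [
    {"name": "trial_0", "metrics": {"mean_average_precision": 0.81}},
    {"name": "trial_1", "metrics": {"mean_average_precision": 0.89}},
    {"name": "trial_2", "metrics": {}},  # failed run, reports no metric
]
print(pick_best_run(runs)["name"])  # prints: trial_1
```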

These are the configurable parameters:

--workspace_name --> Name of the ML workspace
--resource_group --> Name of the ML workspace's Resource Group
--subscription_id --> Subscription ID of the ML workspace
--job --> Name of the AutoML training job
--model_name --> Name under which the model will be registered

The following parameters will be used to convert the model. You can ignore them now:

--rknn --> Enable rknn conversion and save the model in RKNN format instead of ONNX
--quant --> Enable quantization during RKNN conversion
--rknn_platform --> Define the Rockchip SoC model, like rk3588. Quantization types: [i8, fp] for [rk3562,rk3566,rk3568,rk3588] and [u8, fp] for [rk1808,rv1109,rv1126]
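Taken together, the CLI surface above maps onto a standard argparse definition; a sketch of how it might be wired (help strings and defaults are mine, model-register.py itself is authoritative):

```python
import argparse

def build_parser():
    # Mirrors the flags documented above; see model-register.py for the real definition.
    p = argparse.ArgumentParser(description="Register an AutoML model in AzureML")
    p.add_argument("--workspace_name", required=True)
    p.add_argument("--resource_group", required=True)
    p.add_argument("--subscription_id", required=True)
    p.add_argument("--job", required=True, help="Name of the AutoML training job")
    p.add_argument("--model_name", required=True)
    # RKNN conversion options (ignored for the CPU/ONNX path)
    p.add_argument("--rknn", action="store_true")
    p.add_argument("--quant", action="store_true")
    p.add_argument("--rknn_platform", default=None, help="e.g. rk3588")
    return p

args = build_parser().parse_args([
    "--workspace_name", "armtesting",
    "--resource_group", "arc-on-arm",
    "--subscription_id", "11111111-2222-3333-4444-555555555555",
    "--job", "Legofigure_yolov5",
    "--model_name", "lego_onnx_model",
])
print(args.model_name, args.rknn)  # prints: lego_onnx_model False
```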

And here is an example run (the values are from my environment and might differ from yours):

# Don't forget to login
# az login

python3 model-register.py `
  --workspace_name armtesting `
  --resource_group arc-on-arm `
  --subscription_id 11111111-2222-3333-4444-555555555555 `
  --job Legofigure_yolov5 `
  --model_name lego_onnx_model

If everything went well then we can see our registered model on the Models tab.

Scoring script

The scoring script is the entry point of our environment. It implements the REST API, loads the model and runs the inference when a new picture arrives at the API.

As we had to rework our environment and cannot use the Azure-provided MLflow models, we cannot use the AzureML-provided scoring script either. It requires a small modification to adapt to our custom environment.

You can download the modified scoring script to your computer from my GitHub repo: scoring_arm_yolov5_cpu.py
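An AzureML scoring script exposes two functions: init(), called once when the container starts, and run(), called once per request. A stripped-down sketch of that contract with the model replaced by a dummy (the real scoring_arm_yolov5_cpu.py loads model.onnx with onnxruntime instead):

```python
import json

model = None   # stands in for an onnxruntime.InferenceSession in the real script
labels = None

def init():
    """Called once at container start: load the model and the label list."""
    global model, labels
    labels = ["lego_figure"]            # the real script reads labels.json
    model = lambda image: [(0, 0.97)]   # dummy inference: (class_id, confidence)

def run(raw_data):
    """Called per request: parse the payload, run inference, return JSON."""
    payload = json.loads(raw_data)
    detections = model(payload["image"])
    return json.dumps(
        [{"label": labels[cls], "score": score} for cls, score in detections]
    )

init()
print(run(json.dumps({"image": "<base64-encoded picture>"})))
# prints: [{"label": "lego_figure", "score": 0.97}]
```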

Attach the SBC to AzureML

By now our cluster shall be Arc-enabled and the AzureML extension shall be running. If not, check the Pre-requisites chapter again 😉

Attach the cluster to the AzureML workspace. You can check this blog post for details: Azure ML with our own Kubernetes cluster or you can do it inside Azure AI Studio.

  1. Go to the Compute menu (left side)
  2. Select Kubernetes (top middle)
  3. Click on +New

Fill out the form like:

  • Compute name: anything you wish, like “rock-5b”
  • Select your cluster from the drop-down list
  • Kubernetes namespace: “azureml”, which is the same as we used for the extension
  • Click on Attach

Finally we need to specify the “instance type” of our pod. The defaultinstancetype specifies low CPU and memory limits, hence we need to create our own. (Fun fact: this is a standard process, hence this is the first step which is not a hackish one 😀 )
Create a new CPU instance type with the following command:

kubectl apply -f https://raw.githubusercontent.com/szasza576/arc-on-arm/refs/heads/main/azureml/aml_cpu_instance.yaml
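The applied manifest defines an InstanceType custom resource of the AzureML extension. It looks roughly like this (the resource numbers here are illustrative; the repo’s aml_cpu_instance.yaml is authoritative):

```yaml
apiVersion: amlarc.azureml.com/v1alpha1
kind: InstanceType
metadata:
  name: cpuinstancetype
spec:
  resources:
    requests:
      cpu: "2"
      memory: "2Gi"
    limits:
      cpu: "4"
      memory: "4Gi"
```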

Now go back to the Azure AI Studio and select your cluster under the Computes. You shall see that the cluster is registered and the new instance type appears.

Deploy the model

Finally we can deploy our model to the ARM board.

  1. Go to the Models menu
  2. Select our model
  3. Click on the Deploy button
  4. Select the Real-time endpoint
  5. Select Kubernetes as Compute type
  6. Select our cluster in the drop-down menu
  7. Click on Next until you arrive at the Code + environment page
  8. At the scoring script, click on Browse and select the scoring script that you downloaded in the previous chapter
  9. Select Container registry image as environment type
  10. Paste the URL of the image we built in the previous chapter:
    localhost:5000/azureml-inference-arm64:latest

On the Compute page:

  1. Select cpuinstancetype as Instance type
  2. Change the Instance count to 1 (as we have only 1 node in our cluster)
  3. Click on Next until you arrive at the Review page and then click Create

Hurray, our model shall be deployed after 2-3 minutes.

Testing the setup

At this point the deployment shall behave similarly to its x86 sibling. The endpoint shall be created and it shall expose the REST API in the same way.
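Invoking the endpoint works exactly as for an x86 deployment: POST a JSON body to the scoring URI with the endpoint key in the Authorization header. A minimal payload builder, assuming the scoring script expects a base64-encoded image under an "image" key (match this to what your scoring script actually parses):

```python
import base64
import json

def build_payload(image_bytes):
    """Wrap raw image bytes into the JSON body for the scoring endpoint.

    The {"image": <base64>} shape is an assumption; align it with what
    your scoring script's run() function parses.
    """
    return json.dumps({"image": base64.b64encode(image_bytes).decode("ascii")})

body = build_payload(b"raw JPEG bytes from the camera")
```

You would then send the body with something like curl -X POST -H "Authorization: Bearer &lt;key&gt;" -H "Content-Type: application/json" -d @payload.json against the endpoint’s scoring URI.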

You can follow the IoTrain – Internet of Trains aka Azure ML on the Edge cloud guide to deploy the camera app on a Raspberry Pi or ESP32. You can also connect the camera to the Rock5B and deploy the components directly on that.

(optional) Building the “Detector” on ARM

If you follow the previously mentioned guide, it will build an x86 container for the Detector with the “az acr build” command. This is also fine because our ARM setup can run x86 code with the help of QEMU. But it would be much nicer to build an ARM-native container for this purpose as well.

  1. Go to the Azure Portal and search for the container registry that you created based on the previous blog posts
  2. Go to the Access keys
  3. Copy the Password field

On the ARM SBC use these commands to log in, build and push to our ACR. (Paste the password when prompted):

# Set this variable to your ACR's name
ACRName="iotrainsacr"

docker login ${ACRName}.azurecr.io --username $ACRName

docker build \
  https://github.com/szasza576/arc-iotrains.git#main:detector/dockerimage \
  -f Dockerfile \
  -t ${ACRName}.azurecr.io/detector-arm:latest

docker push ${ACRName}.azurecr.io/detector-arm:latest

As we changed the image name from detector to detector-arm, we need to update it in the deployment file as well. Use these commands instead of those in the previous post to deploy the detector. You can use the rest of the manifest files as documented earlier.

wget https://raw.githubusercontent.com/szasza576/arc-iotrains/main/detector/k8s-manifests/marker-deployment.yaml
sed -i s/"<YOURACR>"/$ACRName/g marker-deployment.yaml
sed -i s/"detector:latest"/"detector-arm:latest"/g marker-deployment.yaml
kubectl apply -f marker-deployment.yaml
rm marker-deployment.yaml

Once you are ready, you can start capturing pictures. The inference takes ~350 ms on the RK3588 CPU at 720p resolution. You shall see something like this:

Enjoy 🙂
