Disclaimer

The project is only for educational purposes and definitely NOT FOR PRODUCTION. Some of the steps go completely against standard security principles (running containers in privileged mode). There is no warranty and no SLA. Nobody takes responsibility for this concept. You get it as is and use it at your own risk.

The project is not supported, endorsed or encouraged by Microsoft. This is an independent personal (likely opinionated) project to demonstrate possible capabilities.

Introduction

Azure Arc makes on-premises servers capable of connecting to the Azure ecosystem and of deploying Azure services onto the connected servers. It deploys on top of a Kubernetes cluster, and it requires extensions to make the server/cluster capable of hosting the different services.

If you wish to learn more about Arc then I suggest visiting the Azure Arc JumpStart webpage.

Azure Arc is supported on ARM64 but most of the extensions aren’t yet. Very likely this will change in the next 1-2 years and make this project obsolete (fingers crossed :)). This is also clearly stated in the Arc documentation.

“Currently, Azure Arc-enabled Kubernetes cluster extensions aren’t supported on ARM64-based clusters, except for Flux (GitOps). To install and use other cluster extensions, the cluster must have at least one node of operating system and architecture type linux/amd64.”

https://learn.microsoft.com/en-us/azure/azure-arc/kubernetes/system-requirements#cluster-requirements

This project helps to deploy the Arc extensions on ARM64 boards. In some cases it reaches the goal with the QEMU emulator (running x86-64 binaries on ARM64), and in other cases it rebuilds some of the images. This blog post focuses on the Azure ML extension only.

Why???

First of all, Why not? 😀

ARM boards/CPUs are widely available and they have secured their place in the world. More and more SoCs (System on a Chip) have an integrated AI accelerator, which makes them capable of running AI models.

While hardware compute power increases, AI models are getting smaller and more efficient. It might be a bit early to say, but thanks to the rapid technology improvements, running AI models on ARM will become standard, especially on the edge cloud.

The ARM board

ARM boards were never famous for their compute power but rather for their energy efficiency. Nevertheless, they have become much more capable in recent years, which makes them a good choice for edge cloud appliances.

I use a Radxa Rock5B (aka Okdo Rock5B) board in this project for the following reasons:

  • CPU: Rockchip RK3588 CPU with 8 cores (4x Cortex-A76 and 4x Cortex-A55)
  • Memory: 16GB (more than enough to host several apps), and there is a 32GB variant as well.
  • eMMC and M.2 options for fast storage (Arc and K8s are sensitive to storage speed). You will need at least 64GB of storage but more is always better.
  • and most importantly, a 6 TOPS NPU (Neural Processing Unit) –> an AI accelerator.

You can read a detailed review of the board on CNX Software’s blog. Nevertheless, there are other boards from different vendors with this SoC and similar capabilities.

What about the Raspberry Pi (or other ARM boards)??? Well, I haven’t tested it yet but I guess it should work on an RPi 5 with 8GB memory as well. I will test it later if I have some spare time. For AzureML, I strongly suggest using an AI accelerator, which the RPi lacks.

Porting techniques

When we would like to port an application, there are different ways to achieve it. These containers were implemented in 3 fundamentally different ways.

Compiled binaries

Azure engineers also use the classic way: write code, compile it and build a container around it. The source code might be Go, C++ or anything else. These binaries are built for x86-64 and cannot be reverse engineered to port them to ARM64.

We would need the source code to compile it for ARM64 and run it natively, but of course we don’t have this code. Instead, we will use the QEMU emulator which runs the original x86-64 binaries on ARM64. This is a slow method and far from efficient, but it saves a lot of time. It works with almost all binaries, fully transparently.

As the extensions are mainly for management purposes, losing some performance here is acceptable. However, it is not recommended to use this method for the final workloads (App Services, AzureML models, etc.).

.NET / C# code

Some components were written in C#. The .NET Core framework is sensitive to QEMU, so running these containers in an emulated environment will fail (mainly due to memory management issues).

The good news is that .NET (and Java) compiles the source code into an intermediate object which is hardware agnostic. The .NET framework is responsible for the hardware adaptation. This means we can build a new ARM64-based container with the ARM64-specific .NET framework and just copy the pre-compiled objects from the original container.
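
A minimal sketch of this technique (all image names, paths and the DLL name below are hypothetical; the real Dockerfile used for the relayserver is in the linked repo):

FROM --platform=linux/amd64 example.azurecr.io/someservice:1.0.0 AS original
# ARM64-capable .NET runtime base image
FROM mcr.microsoft.com/dotnet/aspnet:6.0
# Copy the pre-compiled, architecture-agnostic assemblies from the original x86-64 image
COPY --from=original /app /app
WORKDIR /app
ENTRYPOINT ["dotnet", "SomeService.dll"]

The resulting image can then be built on the ARM board and pushed to the local registry, as shown later for the relayserver.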

3rd party containers

Azure also uses 3rd party containers, like Prometheus, without modification. In these cases we simply point our deployments to the original sources where ARM64 containers are available. In most cases we can just patch the Kubernetes manifest files, as illustrated below.
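
As a generic illustration only (the names below are hypothetical; the later chapters describe what the patcher really does), repointing a workload to an upstream multi-arch image can be as simple as:

kubectl -n azureml set image deployment/some-exporter some-exporter=quay.io/prometheus/node-exporter:v1.6.1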

Docker image inspection

Finally, a quick word about Docker images. Once you pull an image from anywhere, you can have a look inside it. You can see the steps executed during the image build and hence reproduce the original Dockerfile. Of course, you still don’t have access to all the components (like files), but with this you can recover most of the steps.

To do this you can use these commands:

docker pull <your image>
docker history <your image> --no-trunc
docker inspect <your image>

If you prefer to use a tool instead of the CLI then I can suggest the dive tool, which makes browsing easy and shows the different layers as well.

Basic setup

I don’t go into the details. This is just the usual setup and there is no magic here.

Node deployment

Install Ubuntu on the board. I used Joshua’s solution.

Set up a fixed IP address (run it on the ARM board):

if [ -z ${MasterIP+x} ]; then MasterIP="192.168.0.190"; fi

sudo tee /etc/netplan/10-basic-config.yaml<<EOF
network:
  ethernets:
    enP4p65s0:
      dhcp4: false
      dhcp6: false
      addresses:
        - ${MasterIP}/24
      routes:
        - to: default
          via: 192.168.0.1
      nameservers:
        addresses:
          - 192.168.0.1
  version: 2
EOF

sudo netplan apply

Set up the Kubernetes cluster (run it on the ARM board):

sudo apt update
sudo apt install -y curl
curl -sL https://raw.githubusercontent.com/szasza576/arc-on-arm/master/base-infra/kubernetes-setup.sh | sudo bash

Copy the kubeconfig file to your management computer. (I use Windows 11 but you can use your own OS of course.)
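
For example, you can fetch it over SSH (assuming the default "ubuntu" user and that the setup script left the kubeconfig at ~/.kube/config; adjust to your setup):

scp ubuntu@192.168.0.190:~/.kube/config rock5b-kubeconfig
$env:KUBECONFIG="$PWD\rock5b-kubeconfig"
kubectl get nodes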

Azure setup and Arc connectivity

Attaching the node with Arc is easy, as ARM is supported by Arc, so just follow the Azure documentation or check the Azure Arc with on-prem Kubernetes blog post.

Before starting, check if your cluster is the active context and switch if necessary (Powershell on management PC):

kubectl config get-contexts

kubectl config use-context <YourClusterName>

Create a Resource Group and attach the cluster (Powershell on management PC):

$ResourceGroup="arc-on-arm"
$ClusterName="rock5b"
$Location="westeurope"

az group create --name $ResourceGroup --location $Location

az connectedk8s connect `
  --name $ClusterName `
  --resource-group $ResourceGroup

QEMU emulator

The QEMU emulator can run x86-64 code on ARM64 by emulating the CPU. It comes with some overhead but that is fine for management containers.

The deployment is fairly easy as it can run as a container. It registers the module on the host and that’s it. Once an x86-64 binary is called, the emulator automatically loads in the background, so it is fully transparent.

I created a cronjob which runs on top of Kubernetes every 5 minutes and re-registers the module. I noticed that it sometimes gets de-registered, hence the 5-minute iteration. It also survives restarts automatically.

Deploy the QEMU emulator (Powershell on management PC):

kubectl apply -f https://raw.githubusercontent.com/szasza576/arc-on-arm/main/base-infra/multiarch.yaml
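
You can verify the result on the ARM board: the cronjob should be scheduled and the x86-64 handler should be registered with binfmt_misc (the exact entry name may vary):

kubectl get cronjob -n qemu
ls /proc/sys/fs/binfmt_misc/ | grep x86_64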

Credits go to @tonistiigi for his great binfmt work.

Local Docker registry

We will rebuild some images and hence we need a local Docker registry to store them. If you already have a working registry then of course you can use it as well; just update the patcher scripts to point to your registry address instead of “localhost:5000”.

Note that we will use the “localhost” name, so this works only with single-node clusters. If you are setting up a multi-node cluster then you need to run the registry service in a proper way.

Run these commands on the ARM board:

sudo mkdir /mnt/registry
sudo chown $(id -u):$(id -g) /mnt/registry
docker run -d -p 5000:5000 -v /mnt/registry:/var/lib/registry --restart unless-stopped --name myregistry registry:2
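
A quick check that the registry answers (and, later, that your images landed in it):

curl http://localhost:5000/v2/_catalog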

AzureML extension

Rebuild relayserver

The relayserver component uses C# code, hence its container has to be rebuilt with the ARM64-specific .NET framework.

Build the ARM64 container for the relayserver and push it to the local registry (run it on the ARM board):

docker build -t localhost:5000/azureml/amlarc/docker/relayserver:1.1.53 https://github.com/szasza576/arc-on-arm.git#main:azureml/relayserver
docker push localhost:5000/azureml/amlarc/docker/relayserver:1.1.53

Extension patchers

There is a set of patcher scripts which help you easily change the images to ARM64 images. They run in the background and do the patching when needed.

Deploy the patcher tools (Powershell on management PC):

kubectl apply -f https://raw.githubusercontent.com/szasza576/arc-on-arm/main/azureml/aml-patcher/aml-patcher.yaml
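
You can verify that the patcher pods are up and running:

kubectl get pods -n aml-patcher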

3 components will be patched when they are deployed. These are described in the following chapters.

Relayserver

The relayserver deployment appears when the extension is deployed. It uses the Azure-provided image and actually comes up green (healthy). However, the original code causes issues when we want to add the cluster to the Azure ML workspace, and that’s why we had to rebuild the Docker image.

The relayserver patcher tool periodically checks whether the relayserver is deployed and updates its image parameter to point to our registry at “localhost:5000”.

The patcher keeps running in the background, and if an update resets the image to Microsoft’s registry then it switches it back to ours.
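
To see which registry the relayserver currently points to, you can check the deployment’s image field (the namespace and deployment name are assumptions here, verify them with "kubectl get deployments -A"):

kubectl -n azureml get deployment relayserver -o jsonpath='{.spec.template.spec.containers[*].image}'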

Prometheus operator

The Prometheus operator is a selectable component when we deploy the Azure ML extension. Kubernetes recognizes that it is an AMD64-platform image and refuses to pull it despite the configured QEMU setup.

These are 3rd party images, identical to the ones found on the internet, hence the easiest solution is to point to the official image source where ARM64 images are also available.

The Prometheus patcher checks the CRD (Custom Resource Definition) where the image parameters are stored and patches it when it appears. This is a bit tricky because kubectl doesn’t support strategic merge for CRDs, hence a JSON merge has to be used here. If you check the code, you can see that it refers to a yaml file which contains the patches.
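
As an illustration, a JSON merge patch is applied with "--type merge"; with a patch file it could look like this (the resource name, file name and image tag are illustrative, the real patch yaml lives in the repo):

# prometheus-image-patch.yaml
spec:
  image: quay.io/prometheus/prometheus:v2.45.0

kubectl -n azureml patch prometheus <prometheus-cr-name> --type merge --patch-file prometheus-image-patch.yaml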

Storageinitializer

Finally, there is a component which downloads our AI model when we create a real-time endpoint. Nothing is wrong with this container as such, and it runs nicely (inside QEMU), but it produces TLS handshake errors here due to a too aggressive CPU limit.

The storage patcher runs continuously and checks whether the storageinitializer-modeldata container has started. If it is running, the patcher increases the container’s CPU limit directly at the containerd level. This eliminates the bottleneck and the container can download the AI model properly.

Ok, this is a strange one so let me explain. Once you create a real-time endpoint, a CRD is created on Kubernetes. This CRD is processed by the inference-operator, which creates the deployment for this endpoint. That deployment includes 3 containers: an identity sidecar, the AI’s runtime environment and the storageinitializer-modeldata.

If we patch the deployment (like we did with the relayserver) then the inference-operator finds the change and reverts it, which causes a restart and everything starts from zero. The CRD contains resource parameters only for the AI runtime container and nothing for the storageinitializer, so we cannot modify the CRD either. If we patch at the containerd level, the change remains unnoticed by Kubernetes because K8s is “just” an orchestrator. This is the only way to hack it.
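
For reference, the manual equivalent of what the patcher does looks roughly like this with crictl on the ARM board (assuming crictl is configured for containerd; the quota values are just examples for ~2 CPU cores):

CID=$(sudo crictl ps -q --name storageinitializer)
sudo crictl update --cpu-period 100000 --cpu-quota 200000 $CID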

Deploy AzureML extension

Now you can deploy the Azure ML extension in the traditional way, as described in the documentation.

You can use the portal to add the extension (deploy only the Prometheus and the Volcano components), or you can use the following Azure CLI command (Powershell on management PC):

$ResourceGroup="arc-on-arm"
$Location="westeurope"
$ClusterName="rock5b"
$AMLExtName="azureml"

az k8s-extension create `
  --name $AMLExtName `
  --cluster-name $ClusterName `
  --resource-group $ResourceGroup `
  --extension-type Microsoft.AzureML.Kubernetes `
  --config enableTraining=True `
           enableInference=True `
           inferenceRouterServiceType=LoadBalancer `
           allowInsecureConnections=True `
           inferenceLoadBalancerHA=False `
  --cluster-type connectedClusters  `
  --scope cluster
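
Once the deployment finishes, you can check the extension’s state; it should report “Succeeded” (Powershell on management PC):

az k8s-extension show `
  --name $AMLExtName `
  --cluster-name $ClusterName `
  --resource-group $ResourceGroup `
  --cluster-type connectedClusters `
  --query provisioningState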

Uninstalling

You can uninstall the extension in the normal way via the Portal.

To uninstall the patcher components just delete the namespaces and restart the node (to remove QEMU):

kubectl delete ns qemu
kubectl delete ns aml-patcher
sudo reboot

Troubleshooting

I get ImgPullError

The git repo and this blog post are not continuously updated, so if Azure updates the versions then they will fall out of sync. Please update the version numbers in the relevant files or leave a comment here so I’m notified.
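
To see which image and tag a failing pod is trying to pull (the namespace is an assumption, adjust it to your setup):

kubectl get pods -n azureml
kubectl -n azureml describe pod <failing-pod-name>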

Model inference timeout

You can deploy AI models with the default Azure ML environments but those are based on x86-64 code and hence will run within a QEMU-emulated environment. This means inference will be extremely slow.

You need to create an ARM64 environment image in AzureML and upload it to AzureML’s registry. It is important to build the whole image so that AzureML can import it. If you also specify a conda file then AzureML tries to build the image itself, and of course it cannot deal with ARM64 images, so it will fail. I will create a guide about this.

Just as a reference: running a Yolov5 on a weak notebook with an [Nvidia 940m](https://www.techpowerup.com/gpu-specs/geforce-940m.c2643) takes 150 ms to score an image. The same model can be deployed here, but it will run on the CPU due to the lack of an Nvidia card* and also inside QEMU, and it takes 100 seconds (yes, not ms but sec) to score.

*As the Rock5B has a PCIe 3.0 x4 M.2 connector, it is possible to attach an Nvidia card to it … but we will use its internal AI accelerator in the next round.

Dead Kubernetes after restart

Despite disabling the swap service, it sometimes comes back. Kubelet doesn’t start if swap is active.

Run these to disable the swap again:

sudo systemctl disable swapfile.swap
sudo systemctl stop swapfile.swap
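
You can verify the result: swapon should print nothing and kubelet should become active again:

swapon --show
systemctl status kubelet --no-pager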

Once it is done, Kubelet comes back into service and Kubernetes will be available in 1-2 minutes.
