AIoT Ops: A step forward in your 4.0 industry strategy

Last updated: May 7

PROFESSIONALIZE YOUR DEVICE ARMY WITH MDW PARTNERS AND MICROSOFT AZURE



Hundreds of AI models, thousands of IoT devices

Nowadays, it is common for companies to have Data Scientists developing AI models to solve different business needs. Sales forecasting, customer churn prediction, image recognition and many other applications can produce a notable increase in revenue and in the company’s performance. Some companies, especially those specialised in manufacturing, supply chain or energy, can take significant advantage of AI by joining the universes of AI and IoT. Companies of this kind commonly have several IoT devices in their factories and plants, and these devices have the capacity to run AI models on the edge. The tasks AI and IoT can accomplish are practically unlimited:

  • Detecting leaks and defects on pieces

  • Ensuring safety areas

  • Detecting PPE (personal protective equipment) on workers

  • Counting people and pieces

  • Scanning delivery notes

  • And many more

These kinds of solutions are a gigantic step forward in Industry 4.0 and can produce notable improvements in terms of revenue, safety, and automation. However, if you are planning to develop and maintain a solution like this and to create a secure environment for your AI models and IoT devices, there are several matters you should consider.


Figure 1: Industrial revolutions. Now we are in the era of AI and IoT.


The latest requirements in industry

Industry and AI are evolving together, and new requirements need to be met to keep moving forward. During the last decade, the goal of companies and data scientists has been to understand what problems could be resolved with AI. Nowadays, there is plenty of evidence confirming that AI can resolve a large variety of problems. The need now is to create a sustainable and secure environment for our models and their deployments. We are no longer in a research and development phase; it is time to professionalise the way we do AI and IoT.

Which AI models are embedded in which IoT devices? How can we update models safely? How can we replace a model if its performance is not what we expected? These may sound like basic questions, yet they become tricky when we are managing a large number of devices and models, which is a common situation in a medium-sized or large company.

For instance, imagine a company that has a Data Science team developing a new computer vision model to detect defects on metallurgical pieces. The aim is to deploy it onto different IoT devices across the production line. Once the first version of the model is trained, the Data Science team, in collaboration with an IT person, will create a module to deploy the model onto the devices. Now imagine that, while this model is in production, another person in the Data Science team achieves a better score on the same task with a different model architecture. For the next deployment onto devices, they will use that new version. Later, the company plans to replace some devices with next-generation ones, which will use an improved version of the model. Imagine how many models, versions, and deployments onto IoT devices this company could have after 6 or 12 months of work.



Figure 2: Necessity of a centralized environment between Data Scientists

Companies looking to create a sustainable and safe AI and IoT ecosystem will need to find a solution to this problem. In this blog post, we will explore practices that help implement such a solution. In the following sections, we will go through different technologies: we will see how to develop an AI model and how to create an IoT module with the model embedded. However, our main purpose is to show how to professionalise this flow and how to create an effective ecosystem from which everything can be controlled.



Three components of an AIoT solution

There are three technologies which need to be merged to create the desired ecosystem:

IoT: Firstly, we need to understand how IoT devices work. IoT devices are one of the main pillars of Industry 4.0. They are at the front line in our factories and plants, and they can have several functions, such as sending metrics of processes and components in real time. Furthermore, thanks to the latest technological advances, IoT devices are able to run embedded AI models and get predictions directly on site. This is known as running models on the edge. Therefore, what we need to understand is how to build these intelligent modules and how to organise massive deployments of a module (or a set of them) onto several IoT devices.

AI: On the AI side, the requirement is clear in terms of functionality: we want to create a secure and controlled environment for the life cycle of the AI models. To do so, we will need to automate and track all the steps involved in an AI solution. This is known as MLOps. MLOps will improve the development phase and deployment via monitoring, validation, and governance of machine learning models. What we will explore in the AI section is how to create a registry where the model versions and experiments will be stored, in a collaborative environment for data scientists. To be able to replicate each model we create, it is essential to track every experiment, dataset and transformation. This will be a crucial part of the environment we are aiming to build.

DevOps: Finally, we will need a mechanism to orchestrate all the aforementioned areas. Remember the example illustrated earlier, and the difficulty of effectively managing a large number of models and devices. Thanks to DevOps we can centralise and automate the whole process, creating an easily manageable and secure environment. Using DevOps pipelines, we can automate a set of tasks and execute them as many times as needed. In this blog post we will use DevOps to automate the whole process, from the model training phase to the deployment on the IoT device.


By merging these three technologies we ensure that we have control over every part of an AIoT solution.


In the next section we will focus on IoT devices. These devices have specific features we need to consider before we start deploying AI models onto them. We will also explore how to organise one deployment onto several devices and the cloud services necessary to do that.



The Internet of Things

IoT is the foundation of an Industry 4.0 strategy. The evolution in this area has been astonishing over the past years, opening up new possibilities for these devices.


The uses of IoT devices go from monitoring a critical machine on a production line, to warning workers of potential accidents. Nonetheless, by embedding AI models onto these devices, the applications become even more interesting.


Deployments on the edge have many advantages. For instance, the data does not need to travel to the cloud, which avoids connection issues and reduces the inference time. In addition, the evolution of these devices is making it possible to deploy increasingly complex models. Before we get into how to deploy models onto IoT devices, we need to understand the main differences between IoT devices and traditional computers.



IoT devices

Size: This is the most distinctive feature of IoT devices. Not every IoT device is small in terms of size, memory, and processor, but the great majority of them are. For AI models, this has several implications. One of them is that we cannot deploy a large model with many layers and parameters, so the models selected are usually simpler than those deployed onto large clusters. There are several techniques that allow us to shrink a model. One of the most common is Model Quantization, which stores the model’s weights at a lower numeric precision. We should keep this concept in mind, as we will need Model Quantization when the model we produce is too large for the memory and storage of the device.
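To make the idea of Model Quantization more concrete, here is a minimal sketch in plain Python. It assumes a simple affine scheme (one scale and one zero point for the whole list of weights) and maps 32-bit floats to 8-bit integers. The function names and the sample weights are purely illustrative; in a real project you would rely on the quantization tooling of your ML framework rather than on hand-written code.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float, int]:
    """Map float weights to int8 values using one scale and one zero point."""
    # The covered range must include 0.0 so that zero is represented exactly.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    # 255 steps separate -128 from 127; fall back to 1.0 for all-zero weights.
    scale = (w_max - w_min) / 255.0 or 1.0
    zero_point = round(-128 - w_min / scale)
    quantized = [
        max(-128, min(127, round(w / scale) + zero_point)) for w in weights
    ]
    return quantized, scale, zero_point


def dequantize(quantized: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float weights from their int8 representation."""
    return [(q - zero_point) * scale for q in quantized]


# Illustrative weights only, not taken from a real model.
weights = [-1.2, -0.3, 0.0, 0.45, 2.1]
q, scale, zp = quantize_int8(weights)
restored = dequantize(q, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight now needs one byte instead of four, at the cost of a small rounding error (`max_error` above). Whether that error is acceptable has to be checked against the model's metrics on a validation set.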

Processor architecture and GPU: IoT devices have another peculiarity. Due to their compute requirements, the family of IoT devices uses a variety of processor architectures. For example, Raspberry Pi boards commonly use ARM32, while Nvidia Jetson devices use ARM64 together with a GPU. There is a large variety of combinations in IoT device ecosystems. Therefore, the solution we need to implement to create an IoT module will differ depending on the selected device.


In the following section we will explore how to create an IoT module. Furthermore, we will explore how to create a collection of intelligent modules for different devices.


From models to intelligent modules

IoT devices use different technologies to deploy modules onto them, but probably the most common approach is Docker. Citing Wikipedia, Docker is a set of platform as a service (PaaS) products that use OS-level virtualisation to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries, and configuration files.

Thanks to Docker it is possible to encapsulate an application, an AI model or a set of scripts, together with the OS libraries they need, into a container.


In the example we are describing here, we must create a Docker image with three characteristics:

  • It contains a previously trained AI model.

  • Its base image is adapted to the processor architecture of our devices.

  • Its IoT module is set up to send predictions to our current systems and to the cloud.

To do so, we will generate the necessary instructions to create Docker images and manifests that meet all the previous requirements. However, what if we have different devices with different CPU architectures, as we mentioned earlier? In this case, we need to create a different image per CPU/GPU architecture. Once we have accomplished this task, we will have a set of images per model that we want to deploy, stored in our Docker image registry (Docker Hub / Azure Container Registry). For instance, if we are creating a model to detect defects on pieces and we have Raspberry Pi and Jetson devices, we will have two different images: one for the Raspberry Pi and another one for the Jetson. Both images will produce the same results, but they have different requirements in terms of processor.
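One simple way to keep this set of images organised is to encode the processor architecture in the image tag. The sketch below shows the idea. The registry name, the model name, the version and the tag scheme are all hypothetical, and the architecture labels are just one possible convention.

```python
from typing import Dict

# Hypothetical mapping from device family to an architecture label.
DEVICE_ARCH: Dict[str, str] = {
    "raspberry-pi": "arm32v7",
    "jetson": "arm64v8",
}


def image_reference(registry: str, model: str, version: str, device: str) -> str:
    """Build one image reference per model version and device architecture."""
    arch = DEVICE_ARCH[device]
    return f"{registry}/{model}:{version}-{arch}"


# One image per device family for the same model version.
images = {
    device: image_reference(
        "myregistry.azurecr.io", "defect-detector", "1.0.0", device
    )
    for device in DEVICE_ARCH
}
```

With a convention like this, the model version and the target architecture can be read directly from the name of every image in the registry.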

There is one more topic we will need to cover to deploy AI models onto IoT devices: the part responsible for orchestrating the massive deployments of our modules (Docker images) onto devices, and the communication between devices and the cloud. For this purpose, we will use one of Azure’s services, IoT Hub.

Figure 3: Basic IoT architecture: Registry of modules, devices, and cloud service for bi-directional communication


IoT Hub: From modules to massive deployments

Citing Microsoft, IoT Hub is a managed service, hosted in the cloud, that acts as a central message hub for bi-directional communication between your IoT application and the devices it manages. You can use Azure IoT Hub to build IoT solutions with reliable and secure communications between millions of IoT devices and a cloud-hosted solution backend. You can connect virtually any device to IoT Hub.

Thanks to this service we can easily organize deployments onto several devices, selecting the specific modules we want to deploy and the devices we want to use. Therefore, the mission of IoT Hub will be to join the software (modules) with the hardware (physical devices).
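The matching of modules to devices can be pictured with a small, self-contained sketch. This is not the IoT Hub API: the `Device` class, the `plan_deployment` function and all identifiers are hypothetical, and the only purpose is to show the kind of selection that the service takes care of for us.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Device:
    device_id: str
    arch: str  # CPU architecture label, e.g. "arm32v7" or "arm64v8"
    line: str  # production line where the device is installed


def plan_deployment(
    devices: List[Device], images_by_arch: Dict[str, str], line: str
) -> Dict[str, List[str]]:
    """Return, for each image, the ids of the devices on `line` that should run it."""
    plan: Dict[str, List[str]] = defaultdict(list)
    for device in devices:
        if device.line != line:
            continue  # device is outside the selected production line
        image = images_by_arch.get(device.arch)
        if image is None:
            raise ValueError(f"no image built for architecture {device.arch!r}")
        plan[image].append(device.device_id)
    return dict(plan)


# Hypothetical inventory and images.
devices = [
    Device("rpi-01", "arm32v7", "line-a"),
    Device("jetson-01", "arm64v8", "line-a"),
    Device("jetson-02", "arm64v8", "line-b"),
]
images_by_arch = {
    "arm32v7": "defect-detector:1.0.0-arm32v7",
    "arm64v8": "defect-detector:1.0.0-arm64v8",
}
plan = plan_deployment(devices, images_by_arch, line="line-a")
```

Failing loudly when no image exists for an architecture is a deliberate choice here: it is better to stop a deployment than to leave part of the production line without a model.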

Once we have checked and confirmed that our deployment is working correctly, we can add a step to our DevOps pipeline. This will be especially important to the final part of this blog, when we will cover how to automatise all the steps.

We have now covered IoT. In the following section we will explore the AI paradigm. We have seen how to encapsulate models into modules and deploy them onto several devices. Now the goal is to apply best practices in ML to create a centralized and secure ecosystem for our models.


MLOps

Before we start talking about MLOps, there is an important concept we should keep in mind: there are millions of ML models with a great variety of functions, but the steps to follow to resolve an AI problem are always very similar. The goal of MLOps is to help Data Scientists by automating the parts that remain the same during the entire project’s life cycle. It is important to note the phases of an AI project:

  • Exploratory Analysis: This step consists of analyzing datasets to summarize their main characteristics, often using visual methods. We will then gather insights about the dataset, data quality and data nature, in preparation for the modelling stage. This is a one-off step, as there is no need to repeat it each time a model is trained. When new data appears, we just need to ensure that the data characteristics remain the same.

  • Processing and feature engineering: This is a crucial step. The dataset and therefore the final model will be completely different depending on which features have been chosen and which transformations have been applied to the data. The final deployed model will need the new data to follow the same processing and feature engineering pipeline to be able to produce predictions.

  • Training: Once the data has been processed, we are ready to train an ML model. In this step there are several model architectures and hyperparameters to test. The main point here is to create a model registry where we can log each instance of training and execution. This registry should include all the relevant information to reproduce an experiment and all the metrics obtained. The aim is to facilitate the selection of the best model for our desired outcome.

  • Deployment: Once we have completed several experiments, we will have a registry of models. We can then select the best one, which will be put into production. You should consider the following matters: how will we consume the model? Do we need always-on, real-time inference? Will it be consumed just once a day? Or maybe just when an event happens? In the next section we will come back to this point and look at it from the perspective of IoT devices.

  • Monitoring: Once we have put the model into production, we need to ensure that everything is okay with it, that the assumptions we made about data are correct and that the model is working according to the metrics obtained during the training phase.
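The training and deployment phases above revolve around a model registry. The following is a minimal in-memory sketch of that idea, under the assumption that a run is described by its architecture, hyperparameters, dataset version and metrics. The class names, field names and values are hypothetical placeholders, not real experiment results.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RunRecord:
    run_id: int
    architecture: str
    hyperparameters: Dict[str, float]
    dataset_version: str
    metrics: Dict[str, float]


@dataclass
class ModelRegistry:
    runs: List[RunRecord] = field(default_factory=list)

    def log_run(
        self,
        architecture: str,
        hyperparameters: Dict[str, float],
        dataset_version: str,
        metrics: Dict[str, float],
    ) -> RunRecord:
        """Store everything needed to reproduce and compare a training run."""
        record = RunRecord(
            run_id=len(self.runs) + 1,
            architecture=architecture,
            hyperparameters=dict(hyperparameters),
            dataset_version=dataset_version,
            metrics=dict(metrics),
        )
        self.runs.append(record)
        return record

    def best(self, metric: str) -> RunRecord:
        """Return the run with the highest value for the given metric."""
        candidates = [r for r in self.runs if metric in r.metrics]
        if not candidates:
            raise LookupError(f"no run has logged the metric {metric!r}")
        return max(candidates, key=lambda r: r.metrics[metric])


# Placeholder runs; the metric values are made up for illustration.
registry = ModelRegistry()
registry.log_run("small-cnn", {"learning_rate": 0.01}, "pieces-v1", {"f1": 0.80})
registry.log_run("deeper-cnn", {"learning_rate": 0.001}, "pieces-v1", {"f1": 0.85})
best = registry.best("f1")
```

Because every record keeps the dataset version and the hyperparameters next to the metrics, selecting the model to deploy and reproducing it later both start from the same entry.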

The aim of MLOps is to automate the above phases and execute them when necessary. The triggers to execute the pipeline are typically modifications of the processing code, new data, new models to test, etc. In short, anything that could potentially lead to an improved version of the model.
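One possible way to detect such triggers is to fingerprint the inputs of the pipeline and compare them with the fingerprints stored at the last execution. The sketch below assumes that each input (processing code, dataset, and so on) can be read as bytes; the input names and contents are hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest that changes whenever the content changes."""
    return hashlib.sha256(content).hexdigest()


def changed_inputs(current: Dict[str, bytes], last_run: Dict[str, str]) -> List[str]:
    """Names of the inputs whose content differs from the last recorded run."""
    return sorted(
        name
        for name, content in current.items()
        if last_run.get(name) != fingerprint(content)
    )


# Fingerprints recorded when the pipeline was last executed.
last_run = {
    "processing_code": fingerprint(b"def clean(rows): ..."),
    "dataset": fingerprint(b"rows collected in week 1"),
}
# Current state: the code is unchanged, the dataset has grown.
current = {
    "processing_code": b"def clean(rows): ...",
    "dataset": b"rows collected in week 1 and week 2",
}
triggers = changed_inputs(current, last_run)
```

If `triggers` is not empty, the pipeline has a reason to run again. An input that has never been seen before also counts as changed, because it has no stored fingerprint.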


MLOps can help to significantly improve the performance and the quality of your AI solutions, but how can we start applying MLOps? One of Azure’s services, Azure ML Services, can be used for this purpose.

Azure ML Services

Azure ML Services allows us to centralize all the experiments, models, Docker images, computes, etc. in just one cloud-based service. This prevents Data Scientists from keeping different models and processes on their own machines and adds an extra traceability and recovery layer to the ML life cycle.


Figure 4: Thanks to Azure ML it is possible to automate and track all ML processes

How does Azure ML Services work? The objective of Azure ML Services is to centralize all the processes mentioned earlier with as little code as possible. To do so, just a few extra lines of code are needed to log everything associated with the model: experiments, metrics, model registrations, etc. Your scripts for these tasks will remain practically the same. You just need to set up the connection with your Azure ML Services Workspace to start logging and registering every item.

Figure 5: MLOps life-cycle

We now have scripts that accomplish each task in a cloud environment, but if we had to execute them manually, there would not be automation and therefore there would not be MLOps.

We need to automate the execution of all the scripts developed for each ML life cycle task by adding them to our DevOps pipeline. There will be several options depending on the infrastructure, services, and business requirements.

In the example we are looking into today, the requirement is clear: deploying models onto IoT devices. Having covered information on how to create intelligent modules for IoT devices and how to manage the life cycle of AI models, in the following section we will explore what is needed to get the entire process automatised and centralized using DevOps.


DevOps

Citing Wikipedia, DevOps is a set of practices that combine software development (Dev) and IT operations (Ops). DevOps aims to shorten systems’ development life cycles and provide continuous delivery with high software quality.

A DevOps pipeline could be understood as a set of tasks associated with code that have to be executed every time that something happens (a trigger). In traditional development, the most common triggers are either code being modified in a branch, or a time scheduled execution. In non-technical terms, the pipeline needs to be executed when there is something new that will produce a different result in the solution deployed.

When we are developing AI models and deploying them onto IoT devices, the tasks involved are well suited to automation with DevOps. Why, then, is it not a common practice to create DevOps pipelines for AI and IoT projects?

Centralizing AI and IoT processes in this kind of pipeline will produce a sizable improvement in the management of models and devices. Nonetheless, this is not the only advantage. Once we have created the DevOps pipeline, it will become very easy to execute it over and over. Consequently, you do not need to be an IT expert to get several models deployed onto IoT devices. Therefore, Data Scientists will become more autonomous and the IT team can focus on other tasks.

One of the main differences between AIoT development and traditional development is the set of triggers that can start the pipeline: new data available, new models to test or new processor architectures on the devices will all be possible triggers for our AIoT pipelines.
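These triggers do not all need to restart the pipeline from the beginning. The sketch below illustrates one possible policy, which is our assumption and not a rule: new data or a new model retrains everything, while a new processor architecture only requires rebuilding the module and deploying it again. The stage names are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class Trigger(Enum):
    NEW_DATA = "new_data"
    NEW_MODEL = "new_model"
    NEW_ARCHITECTURE = "new_architecture"


# Pipeline stages in execution order (hypothetical names).
STAGES: List[str] = ["train", "register", "build_module", "deploy"]

# Assumed policy: the first stage that has to run again for each trigger.
FIRST_STAGE: Dict[Trigger, str] = {
    Trigger.NEW_DATA: "train",
    Trigger.NEW_MODEL: "train",
    Trigger.NEW_ARCHITECTURE: "build_module",
}


def stages_for(trigger: Trigger) -> List[str]:
    """Return the stages to execute, from the first affected one to the end."""
    start = STAGES.index(FIRST_STAGE[trigger])
    return STAGES[start:]
```

For example, `stages_for(Trigger.NEW_ARCHITECTURE)` skips training and registration, since the already registered model can be reused in a module built for the new processor.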

Taking into account everything we have covered in this and previous sections, we will now explore what steps need to be added to the AIoT Ops pipeline.

First of all, we can automate all tasks associated with MLOps: the creation of models, registration of models, and centralisation of the experiments. One of the outputs of this will be a trained AI model.



Figure 6: ML steps in DevOps pipelines


Secondly, once we have obtained the model, we will create an intelligent IoT module with the model embedded. Then we will deploy this module onto as many devices as we want.
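The two groups of steps, ML first and IoT second, can be sketched as a chain of functions where each step receives the outputs of the previous ones. Every step below is a placeholder that only returns strings: a real pipeline would call the training scripts, the model registry, the Docker build and the device deployment service instead. All names are hypothetical.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]
Step = Callable[[Context], Context]


def train_model(ctx: Context) -> Context:
    return {"model": f"defect-detector trained on {ctx['dataset']}"}


def register_model(ctx: Context) -> Context:
    return {"model_version": 1}


def build_module(ctx: Context) -> Context:
    return {"image": f"defect-detector:{ctx['model_version']}"}


def deploy_module(ctx: Context) -> Context:
    return {"deployed": {device: ctx["image"] for device in ctx["devices"]}}


def run_pipeline(steps: List[Step], context: Context) -> Context:
    """Run the steps in order, merging each step's outputs into the context."""
    executed = []
    for step in steps:
        context = {**context, **step(context)}
        executed.append(step.__name__)
    return {**context, "executed": executed}


result = run_pipeline(
    [train_model, register_model, build_module, deploy_module],
    {"dataset": "pieces-v1", "devices": ["rpi-01", "jetson-01"]},
)
```

Keeping each step as a separate function with explicit inputs and outputs is what later makes it easy to map them onto individual tasks of a DevOps pipeline.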


Figure 7: IoT steps in DevOps pipelines


AIoT Ops: End2end Solution

We have seen different tasks we can automate to manage the deployments of AI models onto IoT devices. These tasks will be added to a DevOps pipeline to be executed whenever a trigger occurs. To summarise what our end2end solution consists of, the graphic below shows the steps of our DevOps pipeline and the services involved.



By fulfilling the tasks we have described in this blog post, we achieve a complete end2end solution. Thanks to DevOps we are able to centralize the AI and IoT worlds, avoiding inconsistency and uncertainty about models and deployments.


Did you know?

Creating a complex DevOps pipeline to deploy AI models onto IoT devices could be seen as a waste of resources. However, applying the best practices available in the field not only improves the management of the AI and IoT solution, but also saves time in long-term projects.

Every data scientist asked about applying MLOps will agree on its benefits. Therefore, why not start applying these technologies for deployments on IoT devices and take a step forward on your 4.0 transformation?

The way we plan our AI strategy is evolving, and companies are now aware of the importance of effectively managing their solutions. After a decade focused on developing models and testing whether they could resolve our business needs, it is now time to take all that learning and increase productivity with governance, management, and security. It is time for AIoT Ops.


How can we help you?

At MDW Partners, triple MS Gold Partner, we are a group of Data Scientists and Data Engineers, all cloud experts with extensive technology expertise. We help our clients define their IoT strategy and design their Artificial Intelligence ecosystem.

In which areas can we help?

1. Creating a centralized and sustainable environment for your Data Scientists.

2. Developing state of the art AI models to cover your needs.

3. Creating complex IoT modules with AI models embedded.

4. Orchestrating everything with DevOps pipelines.

5. Adapting solutions to your current infrastructure.


Would you like some advice to help you implement an AIoT solution? Go ahead and set up an appointment for a free consultation call at a time of your convenience.