{"id":1495,"date":"2025-01-05T15:54:00","date_gmt":"2025-01-05T15:54:00","guid":{"rendered":"https:\/\/www.pickplace.de\/?p=1495"},"modified":"2026-03-09T16:01:04","modified_gmt":"2026-03-09T16:01:04","slug":"cortex-m85-the-standard-for-ai-on-microcontrollers","status":"publish","type":"post","link":"https:\/\/www.pickplace.de\/en\/cortex-m85-der-standard-fur-ai-auf-mikrocontrollern\/","title":{"rendered":"Cortex M85 \u2013 the standard for AI on microcontrollers"},"content":{"rendered":"<div class=\"wp-block-stackable-text stk-block-text stk-block stk-fcaa249\" data-block-id=\"fcaa249\"><p class=\"stk-block-text__text\">The demands placed on embedded devices have risen sharply in recent years. Especially in areas such as machine learning (ML) and signal processing, there is a growing need for solutions that are both powerful and energy-efficient. Because of their limited resources and compute performance, microcontrollers are often only the second choice. Costlier, more power-hungry high-end processors such as Cortex-A, or specialized GPU-based approaches, take center stage. Now, however, manufacturers are pushing ahead with the new Cortex-M85 architecture.<\/p><\/div>\n\n\n\n<div class=\"wp-block-stackable-text stk-block-text stk-block stk-2d63d64\" data-block-id=\"2d63d64\"><p class=\"stk-block-text__text\">With the Arm Cortex-M85, a new standard for microcontrollers is now available, developed specifically for the requirements of resource-efficient mobile systems. The M85 combines high deterministic compute performance with innovative features that extend the use of MCUs beyond their classic fields of application. 
Ultra-low-power (ULP) applications are also possible. The high data throughput required for ML is built on Helium technology, which enables more complex matrix operations through the M-Profile Vector Extension (MVE).<\/p><\/div>\n\n\n\n<div class=\"wp-block-stackable-heading stk-block-heading stk-block-heading--v2 stk-block stk-7773f17\" id=\"m-profile-vector-extension-mve\" data-block-id=\"7773f17\"><h2 class=\"stk-block-heading__text\">M-Profile Vector Extension (MVE)<\/h2><\/div>\n\n\n\n<div class=\"wp-block-stackable-text stk-block-text stk-block stk-fe667a6\" data-block-id=\"fe667a6\"><p class=\"stk-block-text__text\">A core component of the Cortex-M85 is MVE, which massively increases processing speed for ML models and signal processing. ML applications at the edge often rely on models that were trained on powerful servers and are then deployed on microcontrollers as optimized matrix operations. With libraries such as CMSIS-NN, these models can be executed efficiently on Cortex-M.<\/p><\/div>\n\n\n\n<div class=\"wp-block-stackable-text stk-block-text stk-block stk-b895a07\" data-block-id=\"b895a07\"><p class=\"stk-block-text__text\">The Helium extension reuses the register file of the floating-point unit (FPU) as 128-bit vector registers, so that a single instruction can operate on 16 8-bit, 8 16-bit, or 4 32-bit elements. This results in up to four times the performance compared to a typical Cortex-M7 with similar parameters (clock, RAM\/ROM). In practice, the CMSIS library provides the abstraction from the microcontroller hardware. 
Arm thus ships MVE-optimized kernels directly with the CMSIS-NN library, which makes it considerably easier to deploy ML applications.<\/p><\/div>\n\n\n\n<div class=\"wp-block-stackable-text stk-block-text stk-block stk-e243ca6\" data-block-id=\"e243ca6\"><p class=\"stk-block-text__text\">The Helium technology of the Cortex-M85 optimizes data processing through the concept of \u201cbeatwise\u201d execution, which is based on 8 vector registers, each with a length of 128 bits. These registers are divided into four equal sections of 32 bits each, referred to as \u201cbeats\u201d (A through D). Each beat represents 32 bits of computation, regardless of element size, for example 1 x 32-bit MAC or 4 x 8-bit MAC.<br><br>A typical scenario, illustrated in the following diagram, is an alternating sequence of vector load (VLDR) and vector MAC (VMLA) instructions over four clock cycles. In a classic 128-bit data path architecture, large portions of the hardware, such as the memory path and the MAC blocks, would often be underutilized. The MVE architecture instead breaks each 128-bit-wide instruction down into four equally sized beats. 
Because the load and MAC hardware are separate, the processing of these beats can be overlapped: while beat A of a VLDR is being loaded, beat A of a VMLA, which accesses data from the previous cycle, is processed at the same time.<\/p><\/div>\n\n\n\n<div class=\"wp-block-stackable-image stk-block-image stk-block stk-66c7df4\" data-block-id=\"66c7df4\"><style>.stk-66c7df4 .stk-img-figcaption{text-align:center !important;}.stk-66c7df4 .stk-img-wrapper{width:597px !important;}<\/style><figure><span class=\"stk-img-wrapper stk-image--shape-stretch\"><img loading=\"lazy\" decoding=\"async\" class=\"stk-img wp-image-1496\" src=\"https:\/\/www.pickplace.de\/wp-content\/uploads\/2026\/03\/helium_tech.webp\" width=\"597\" height=\"204\" alt=\"Cortex M85 - Stepped arrangement of colored blocks A-H against a black background; modular electronics, embedded hardware.\"\/><\/span><figcaption class=\"stk-img-figcaption\">Beatwise Operation in the Helium MVE Cycle<\/figcaption><\/figure><\/div>\n\n\n\n<div class=\"wp-block-stackable-text stk-block-text stk-block stk-b0abbcd\" data-block-id=\"b0abbcd\"><p class=\"stk-block-text__text\">This overlapping design allows a narrower data path to achieve the same performance as a processor with a 128-bit data path. Even processors that only have a 32-bit data path can process comparable instructions efficiently through \u201cbeatwise\u201d execution. 
Such a design still doubles the performance of a single-issue scalar processor, loading 8 x 32-bit values and performing the MAC calculations on them in eight cycles, and it does so without the high hardware overhead of a dual-issue design.<\/p><\/div>\n\n\n\n<div class=\"wp-block-stackable-heading stk-block-heading stk-block-heading--v2 stk-block stk-3d7f3ad\" id=\"low-overhead-branch-extension\" data-block-id=\"3d7f3ad\"><h2 class=\"stk-block-heading__text\">Low Overhead Branch Extension<\/h2><\/div>\n\n\n\n<div class=\"wp-block-stackable-text stk-block-text stk-block stk-14216c6\" data-block-id=\"14216c6\"><p class=\"stk-block-text__text\">Processing efficiency in loop structures has a significant impact on overall performance, and not only in the context of ML. The Cortex-M85 offers optimized pipeline control here with the machine instructions WLS, DLS, and LE from the Armv8.1-M architecture. These instructions minimize overhead in loop operations, as the beginning and end of the loop are tracked directly by the core.<br><br>A special feature: the compiler takes care of using these instructions, so developers automatically benefit from improved performance. The loop instructions are available even if the MVE extension is not implemented.<\/p><\/div>\n\n\n\n<div class=\"wp-block-stackable-heading stk-block-heading stk-block-heading--v2 stk-block stk-0333079\" id=\"half-precision-floating-point\" data-block-id=\"0333079\"><h2 class=\"stk-block-heading__text\">Half Precision Floating Point<\/h2><\/div>\n\n\n\n<div class=\"wp-block-stackable-text stk-block-text stk-block stk-a59183b\" data-block-id=\"a59183b\"><p class=\"stk-block-text__text\">To further increase computational performance, the FPU of the Cortex-M85 supports 16-bit half-precision operations in addition to 32-bit single-precision and 64-bit double-precision operations. 
This is particularly helpful when normalizing ML models, as calculations with smaller data types not only reduce memory usage but also increase computation speed, without risking significant quality loss.<\/p><\/div>\n\n\n\n<div class=\"wp-block-stackable-heading stk-block-heading stk-block-heading--v2 stk-block stk-026bbff\" id=\"zusammenfassung\" data-block-id=\"026bbff\"><h2 class=\"stk-block-heading__text\">Summary<\/h2><\/div>\n\n\n\n<div class=\"wp-block-stackable-text stk-block-text stk-block stk-8c9035c\" data-block-id=\"8c9035c\"><p class=\"stk-block-text__text\">The Arm Cortex-M85 with Helium technology and MVE marks a significant step forward for microcontrollers. It enables demanding AI and signal-processing applications that were previously reserved for more expensive processors. With its high deterministic compute performance and its architectural optimizations, the Cortex-M85 shows how the boundaries of classic MCU use cases can be pushed.<\/p><\/div>","protected":false},"excerpt":{"rendered":"<p>The demands placed on embedded devices have risen sharply in recent years. Especially in areas such as machine learning (ML) and signal processing, there is a growing need for solutions that are both powerful and energy-efficient. Because of their limited resources and compute performance, microcontrollers are often only the second choice. 
Costlier, more power-hungry high-end processors such as Cortex-A, or specialized GPU-based approaches, take [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1501,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1495","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.pickplace.de\/en\/wp-json\/wp\/v2\/posts\/1495","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pickplace.de\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pickplace.de\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pickplace.de\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pickplace.de\/en\/wp-json\/wp\/v2\/comments?post=1495"}],"version-history":[{"count":3,"href":"https:\/\/www.pickplace.de\/en\/wp-json\/wp\/v2\/posts\/1495\/revisions"}],"predecessor-version":[{"id":1505,"href":"https:\/\/www.pickplace.de\/en\/wp-json\/wp\/v2\/posts\/1495\/revisions\/1505"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pickplace.de\/en\/wp-json\/wp\/v2\/media\/1501"}],"wp:attachment":[{"href":"https:\/\/www.pickplace.de\/en\/wp-json\/wp\/v2\/media?parent=1495"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pickplace.de\/en\/wp-json\/wp\/v2\/categories?post=1495"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pickplace.de\/en\/wp-json\/wp\/v2\/tags?post=1495"}],"curies":[{"name":"WP","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}