In embedded development, AI can reformulate requirements, generate test ideas, write code snippets, and prepare documentation. In the functional safety environment, however, a qualification problem always arises. Code generation tools generally fall into the so-called T3 tool class, which requires separate qualification or, in the case of safety functions, usually a certificate. The reason is that a development tool may only generate safety-related code if its behavior, operating limits, and potential errors can be demonstrably assessed. With generative AI systems, this assessment is possible only to a limited extent. So how should the interplay between functional safety and AI be approached and evaluated? A detailed look at this complex situation.
Starting position
Vibe coding refers to a way of working where software is generated and iteratively adjusted through natural language prompts. People describe the desired behavior, the language model generates code, tests, or documentation, followed by further prompts. In simple prototypes, this approach can speed up the initial implementation.
Functional safety follows a different logic. Safety-related software must be derived from requirements, tested, documented with traceability, and verified against defined acceptance criteria. This applies, for example, to automotive systems according to ISO 26262, industrial systems according to IEC 61508, aviation software according to DO-178C (with tool aspects covered by DO-330), and medical device software according to IEC 62304.
What software may and may not do is regulated relatively clearly, for example, in IEC 61508-3. The use of artificial intelligence is expressly not recommended there, although AI is mentioned only in the form of fault correction (error-correcting procedures), which is of course only one aspect of functional safety. Furthermore, generative AI in its current form is certainly not what the committees had in mind when they referred to AI in 2010.

The conflict therefore arises when AI outputs are directly incorporated into safety-related artifacts. A language model can generate code, test cases, or safety arguments. However, this does not prove that these results are correct, complete, or compliant with standards. Nevertheless, there are gradations to be made here.
Technical Background: Functional Safety & AI
In functional safety, a software error can affect a safety function, trigger a hazardous state, or mask an existing fault. Therefore, safety processes require evidence regarding requirements, architecture, implementation, verification, traceability, configuration management, and reviews.
Tool qualification starts here: a tool becomes critical when the development process relies on its correct function and faulty outputs are not fully detected. The automotive world, the industrial world, and all other regulated industries agree on this at the level of their standards. ISO 26262, for example, describes this principle as confidence in the use of software tools and considers the possibility that a malfunctioning tool introduces errors or fails to detect existing errors. IEC 61508 follows the same basic idea. DO-330 even explicitly formulates supplementary guidelines for tool qualification in the aviation context.
Tools differ in their effects. A code generator can introduce errors into a system. A testing tool can overlook errors. In both cases, the context of use must be considered. The key question is whether its output influences safety-related decisions and whether this output is detected or corrected by independent measures.
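The key question above can be made concrete with the classification scheme from ISO 26262-8, which combines tool impact (TI) and tool error detection (TD) into a tool confidence level (TCL). The following is a minimal sketch of that lookup as commonly summarized; the normative text of the standard takes precedence, and the class and function names are assumptions made for illustration.

```python
from enum import Enum


class ToolImpact(Enum):
    TI1 = 1  # a malfunction cannot introduce an error or mask one
    TI2 = 2  # a malfunction can introduce an error or fail to detect one


class ToolErrorDetection(Enum):
    TD1 = 1  # high confidence that a malfunction is prevented or detected
    TD2 = 2  # medium confidence
    TD3 = 3  # all other cases


def tool_confidence_level(ti: ToolImpact, td: ToolErrorDetection) -> str:
    """Return 'TCL1'..'TCL3' for a tool in one specific use case."""
    if ti is ToolImpact.TI1:
        return "TCL1"  # no safety impact, detection does not matter
    return {
        ToolErrorDetection.TD1: "TCL1",
        ToolErrorDetection.TD2: "TCL2",
        ToolErrorDetection.TD3: "TCL3",
    }[td]
```

Read this way, a code generator whose output is fully covered by independent review and testing could be argued toward TI2/TD1, while the same generator used without such measures lands at TI2/TD3. Which class applies in a real project is a judgment for the safety team, not something the lookup decides.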
With classic tools, functions, versions, configurations, test cases, and known error patterns can often be described. Providers can supply qualification packages, such as test specifications, expected results, and documentation for a specific tool function in a specific environment. With generative AI systems, this type of limitation is more difficult because outputs are prompt-dependent, context-dependent, and model-dependent.
Generative AI for Functional Safety
AI is changing embedded work in areas where writing, structuring, and variation are required. A model can derive test ideas from a requirement, pre-fill a traceability matrix, explain MISRA C guidelines, generate unit test stubs, or pre-structure an FMEA table. These activities produce drafts that can then be reviewed by domain experts.
The safety problem begins where such drafts are treated as verified artifacts. An AI-generated test case is initially a suggestion. It does not prove that a requirement has been fully tested. An AI-generated safety mechanism is initially an implementation idea. It does not prove that the diagnostic coverage is sufficient or that faulty assumptions have been ruled out. An AI-generated review comment is initially a hint. It does not replace a review with documented responsibility.
The thesis is therefore: For many AI-based development tools, a classic tool qualification as a safety tool will hardly be viable if their output is not fully verified. The more sensible process treats AI outputs as unchecked inputs. Safety assurance then lies in requirements, reviews, static analysis, testing, coverage evidence, traceability, and independent verification.
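Treating AI output as an unchecked input means that even a pre-filled traceability matrix is checked against the artifacts that actually exist. The sketch below assumes a hypothetical ID scheme (`REQ-<number>`) and that requirements and test sources are available as plain text; both are illustrative choices, not something the standards prescribe.

```python
import re

REQ_ID = re.compile(r"\bREQ-\d+\b")  # hypothetical requirement ID scheme


def untested_requirements(requirements: dict, test_sources: dict) -> list:
    """Return requirement IDs that no test source references."""
    referenced = set()
    for text in test_sources.values():
        referenced.update(REQ_ID.findall(text))
    return sorted(set(requirements) - referenced)


def dangling_references(requirements: dict, test_sources: dict) -> dict:
    """Return, per test file, referenced IDs that are not known requirements."""
    result = {}
    for name, text in test_sources.items():
        unknown = sorted(set(REQ_ID.findall(text)) - set(requirements))
        if unknown:
            result[name] = unknown
    return result


# Example data: REQ-2 has no test, and REQ-9 is referenced but does not exist.
requirements = {"REQ-1": "Timeout after 50 ms", "REQ-2": "Reject out-of-range input"}
test_sources = {"test_timeout.c": "/* covers REQ-1 and REQ-9 */"}
gaps = untested_requirements(requirements, test_sources)
dangling = dangling_references(requirements, test_sources)
```

A passing check only shows that links exist and point to real items. Whether a linked test is adequate for its requirement remains a review question.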
Tool Qualification and Generative AI
A tool qualification requires a robust description of the tool's behavior in its intended use. It is a distinct discipline within the context of functional safety. AI, on the other hand, is largely not covered by this tool qualification. Tool qualification typically includes the functions used, configurations and selection options, possible malfunctions, error detection measures, and proof that the tool delivers the expected results in the intended environment. Proof is provided through a set of detailed test procedures and is not trivial.
Generative AI complicates several of these points:
- The same goal can lead to different outputs depending on the prompt, context, and model version.
- The internal derivation of an output is not fully understandable to the development team.
- The model can generate plausible but incorrect justifications.
- The available context may be incomplete with larger codebases.
- Safety requirements can be phrased correctly but implemented incorrectly in the code.
- Unsafe patterns, missing input validations, or inappropriate library usage can arise during code generation.
These characteristics do not preclude the use of AI. However, they do change the role of the tool. If the AI only provides drafts and all safety-related results are checked by qualified or otherwise secured processes, the safety proof does not depend on the correct functioning of the AI. Conversely, if a project relies on AI outputs without verification, the AI itself would have to be assessed as a safety-relevant tool. This is precisely where the problem of qualification arises, which is difficult to solve.
Permissible and Problematic Usage Patterns
For non-safety-related code, vibe coding can be used more broadly, provided reviews, tests, and configuration management are in place. For safety-related software, the leeway decreases with criticality.
For ASIL A/B or SIL 1/2, AI can assist with preparatory activities. Examples include requirements decomposition, test case design, boilerplate code, review checklists, explanations of MISRA rules, or suggestions for range checks, timeout handling, watchdog supervision, and CRC checks. The output must then go through the intended development process.
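As an illustration of what going through the process can mean for a suggested CRC check, the following host-side sketch treats a CRC routine as a draft and verifies it against a published check value and an independent implementation. The choice of CRC-16/CCITT-FALSE and the helper name `verify_candidate` are assumptions for the example; production code would typically be C, with Python serving only as a test oracle.

```python
import binascii


def crc16_ccitt_false(data: bytes) -> int:
    """Draft under test: polynomial 0x1021, initial value 0xFFFF, MSB first."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


CHECK_INPUT = b"123456789"
CHECK_VALUE = 0x29B1  # published check value for this CRC variant


def verify_candidate(candidate) -> bool:
    """Check a CRC function against the check value and an independent oracle."""
    if candidate(CHECK_INPUT) != CHECK_VALUE:
        return False
    samples = [b"", b"\x00", b"\xff" * 16, bytes(range(256))]
    # binascii.crc_hqx uses the same polynomial; 0xFFFF is passed as start value.
    return all(candidate(s) == binascii.crc_hqx(s, 0xFFFF) for s in samples)
```

The point of the design is that acceptance rests on evidence that does not come from the model: a known reference value and a second implementation with a different origin.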
For ASIL C/D or SIL 3/4, AI-generated code should only be used in narrowly defined units. Functions with clear inputs, clear outputs, and complete testability are suitable. For safety-related logic, human reviews, independent tests, and formal or semi-formal proofs for critical parts are obvious measures.
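What complete testability can look like for a narrowly defined unit is sketched below: a function over two 8-bit inputs has only 65,536 input combinations, so every one of them can be checked against an independently written specification. The function `saturating_add_u8` is a hypothetical stand-in for a generated snippet.

```python
def saturating_add_u8(a: int, b: int) -> int:
    """Candidate under test: add two 8-bit unsigned values, clamping at 255."""
    total = a + b
    return 255 if total > 255 else total


def exhaustive_check(func) -> list:
    """Run func on all 256 x 256 input pairs and return the failing pairs."""
    failures = []
    for a in range(256):
        for b in range(256):
            expected = min(a + b, 255)  # specification, stated independently
            if func(a, b) != expected:
                failures.append((a, b))
    return failures
```

An empty result from `exhaustive_check(saturating_add_u8)` covers the whole input domain for this unit. It says nothing about timing, integration, or the correctness of the specification itself, which is why reviews and independent tests remain necessary.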
Use cases in which AI takes over safety authority are problematic. These include unverified ASIL D logic, automatically finalized safety requirements, estimated diagnostic coverage, skipped reviews, or attempts to replace toolchain qualification with AI outputs.
Where generative AI will likely be tolerated
Generative AI in software development cannot be permanently excluded from safety-critical projects. This is particularly evident in the area of code snippets. This refers to individual code fragments, helper functions, transformations, initializations, test aids, or clearly defined implementation parts that are generated based on a precise prompt. The AI does not assume responsibility for architecture, safety concepts, requirements, test strategy, or release. It acts more like a more productive form of development environment: an "extended keyboard" with a suggestion function.

This is precisely where the real challenge for functional safety lies. If developers use generative AI to formulate individual sub-aspects more quickly, a blanket ban is practically difficult to enforce. Therefore, for safety-critical systems, the first question is not whether generative AI should be fundamentally "allowed" or "forbidden." The more relevant question is: under what conditions can an AI-generated code snippet be incorporated into a safety-relevant development artifact?
A robust rule cannot be: "AI generated it, so it's wrong." But neither can it be: "The developer has adopted it, so it has been sufficiently tested."
AI-generated code snippets must be treated as foreign, unqualified code and should always be integrated through a process. Specifically in the context of functional safety: AI snippets may only be used if the developer fully understands the code, can verify its function against the associated requirements, and does not adopt any unclarified assumptions from the prompt or the model's response.
This shifts the organization's role. It doesn't just have to decide whether AI tools are allowed. It has to define:
- which artifacts may be created,
- how prompts are documented,
- whether AI-generated code needs to be marked,
- what review depth is required,
- which code areas are excluded.
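The decisions listed above can be written down as a machine-checkable policy, which makes them auditable. The following sketch is one possible shape; every field name, artifact type, path pattern, and review depth label is a hypothetical example rather than a recommendation.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class AiUsagePolicy:
    allowed_artifacts: frozenset  # artifact types that AI may draft
    excluded_paths: tuple         # glob patterns where AI code is not accepted
    review_depth: tuple           # (glob pattern, depth) pairs, first match wins
    default_depth: str = "standard"
    require_prompt_record: bool = True
    require_marker: bool = True


def evaluate(policy, artifact_type, path, has_prompt_record, is_marked):
    """Return (violations, required_review_depth) for one contribution."""
    violations = []
    if artifact_type not in policy.allowed_artifacts:
        violations.append(f"artifact type '{artifact_type}' is not allowed")
    if any(fnmatch(path, pattern) for pattern in policy.excluded_paths):
        violations.append(f"path '{path}' is in an excluded code area")
    if policy.require_prompt_record and not has_prompt_record:
        violations.append("prompt is not documented")
    if policy.require_marker and not is_marked:
        violations.append("AI-generated code is not marked")
    depth = next(
        (d for pattern, d in policy.review_depth if fnmatch(path, pattern)),
        policy.default_depth,
    )
    return violations, depth


policy = AiUsagePolicy(
    allowed_artifacts=frozenset({"unit_test", "helper_function"}),
    excluded_paths=("src/safety/*",),
    review_depth=(("src/drivers/*", "independent"),),
)
```

With this example policy, a marked and documented helper function under `src/drivers/` is accepted for an independent review, while anything under `src/safety/` is rejected regardless of how it was produced.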
The alternative would be a corporate or societal decision to explicitly prohibit the use of generative AI in safety-critical systems. This would be a clear but strict approach. It would have the advantage of unambiguous rules, but also significant practical weaknesses: the ban would have to be enforceable, must not drive usage into the shadows, and would have to withstand productivity pressure, the shortage of skilled labor, and international competition.
Technical limitations
Regardless of the points above, the technical limitations must be highlighted.
Language models can lose context with larger codebases. Changes can affect distant dependencies. Generated code may contain inconsistent structures, duplicate logic, or hidden assumptions. Safety-critical software also requires controlled versions, reproducible builds, and traceable changes. Prompt histories alone do not fulfill this role.
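One way to turn a prompt history into something closer to a traceable change is to store a small provenance record next to the commit. The sketch below hashes prompt and output and pins model and parameters; the field names and the idea of storing the record alongside a commit ID are assumptions for illustration.

```python
import hashlib
import json
from dataclasses import dataclass


def sha256_hex(text: str) -> str:
    """Hex digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class GenerationRecord:
    model_id: str       # exact model and version string reported by the tool
    parameters: str     # generation parameters as canonical JSON
    prompt_sha256: str
    output_sha256: str
    commit: str         # version control ID of the change using the snippet


def make_record(model_id, parameters: dict, prompt, output, commit):
    """Build a record; sort_keys makes the parameter string order-independent."""
    return GenerationRecord(
        model_id=model_id,
        parameters=json.dumps(parameters, sort_keys=True),
        prompt_sha256=sha256_hex(prompt),
        output_sha256=sha256_hex(output),
        commit=commit,
    )


def matches_committed_code(record: GenerationRecord, committed_text: str) -> bool:
    """True if the committed text is byte-identical to the recorded output."""
    return record.output_sha256 == sha256_hex(committed_text)
```

Such a record does not make the generation reproducible. It only documents what was generated, with which settings, and whether the committed code was changed afterwards.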
The biggest blocker for widespread use in functional safety is randomness: generative AI is inherently non-deterministic. The same prompt can lead to different code snippets, depending on the model version, parameters, context window, temperature, provider, or tool integration. For functional safety, this is a problem because safety-critical development relies on reproducibility.
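The effect can be made visible by running the same prompt several times and comparing the results. The following sketch uses Python's standard `difflib` for this; the normalization step (ignoring blank lines and trailing whitespace) is an assumption about which differences are considered irrelevant, and how the outputs are obtained is left open.

```python
import difflib


def normalize(code: str) -> list:
    """Drop blank lines and trailing whitespace so only real changes remain."""
    return [line.rstrip() for line in code.splitlines() if line.strip()]


def compare_runs(outputs: list) -> list:
    """Diff every run against the first and return one diff per deviating run."""
    baseline = normalize(outputs[0])
    diffs = []
    for index, output in enumerate(outputs[1:], start=2):
        delta = list(
            difflib.unified_diff(
                baseline,
                normalize(output),
                fromfile="run 1",
                tofile=f"run {index}",
                lineterm="",
            )
        )
        if delta:
            diffs.append("\n".join(delta))
    return diffs
```

An empty list means the runs agree after normalization. Agreement between runs is not evidence of correctness, and a non-empty diff is mainly a signal that the differing lines deserve particular attention in review.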
What works and what doesn't
A viable process separates AI assistance from any shift of responsibility for functional safety. AI then delivers drafts or snippets at most, and this must be demonstrable in case of doubt. The organization as a whole determines which drafts are adopted, which tests are performed, and which evidence is used in the safety case. This can work, but only under strict rules whose observance must be provable.
Here are a few examples of such rules:
- Code is accepted only after review and static analysis (PASS).
- Traceability is not derived from AI texts, but is checked against real artifacts.
- A prompt is executed multiple times. Different results are compared against each other using a diff.
- The four-eyes principle always applies.
- Generated snippets will be marked as such. One approach is markup in the code or in the ticket.
- Reviews and independent verification remain manually documented process steps.
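Rules of this kind can be enforced as a gate before a snippet is merged. The sketch below collects the evidence for one snippet and lists what still blocks acceptance; the field names are hypothetical, and in practice the values would come from the review tool, the static analyzer, and the ticket system rather than being entered by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnippetEvidence:
    author: str
    reviewers: tuple              # people who signed the documented review
    static_analysis_passed: bool
    marked_as_ai_generated: bool
    trace_links_checked: bool     # checked against real artifacts, not AI text


def acceptance_blockers(evidence: SnippetEvidence) -> list:
    """Return the reasons why the snippet cannot be accepted yet."""
    blockers = []
    independent_reviewers = set(evidence.reviewers) - {evidence.author}
    if not independent_reviewers:
        blockers.append("four-eyes principle: no reviewer other than the author")
    if not evidence.static_analysis_passed:
        blockers.append("static analysis has not passed")
    if not evidence.marked_as_ai_generated:
        blockers.append("snippet is not marked as AI-generated")
    if not evidence.trace_links_checked:
        blockers.append("traceability has not been checked against real artifacts")
    return blockers
```

An empty list means only that the formal preconditions are met. The gate deliberately contains no step in which an AI output confirms another AI output.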
Conclusion
Generative AI will not disappear from embedded development, not even in the context of functional safety. Therefore, the decisive factor is not a blanket yes or no, but a robust process: AI-generated snippets must be treated, understood, checked, documented, and verified like unqualified third-party code. For safety-critical systems, AI may deliver productivity, but it must never replace responsibility, tool qualification, or safety evidence.



