
General Requirements — Simulator Engine API

General

The API should be self-documenting and tooling-compatible (code generation, testing, mocking).

Why required: Mandating that the API is self-documenting and tooling-compatible allows any client or orchestrator to discover and consume the simulator's API. It also keeps the documentation from living out-of-band and drifting out of date.

An example of a standard that incorporates this practice is the OpenAPI Specification.
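As a minimal sketch of what "self-documenting" gives a client, the snippet below holds a tiny OpenAPI 3.0 description of two hypothetical endpoints (`/info` and `/health`; these paths and operation IDs are assumptions, not part of this specification). It then shows how generic tooling can enumerate the available operations from the document alone.

```python
import json

# Hypothetical, minimal OpenAPI 3.0 document a simulator might serve
# (e.g. at /openapi.json). Paths and operationIds are illustrative only.
OPENAPI_DOC = {
    "openapi": "3.0.3",
    "info": {"title": "Simulator Engine API", "version": "1.0.0"},
    "paths": {
        "/info": {
            "get": {
                "operationId": "getInfo",
                "summary": "Simulator name, version and API version",
                "responses": {"200": {"description": "Simulator info"}},
            }
        },
        "/health": {
            "get": {
                "operationId": "getHealth",
                "summary": "Deployment health status",
                "responses": {"200": {"description": "Health status"}},
            }
        },
    },
}

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(doc: dict) -> list[tuple[str, str, str]]:
    """Return (METHOD, path, operationId) for every operation in the document."""
    ops = []
    for path, item in doc.get("paths", {}).items():
        for method, op in item.items():
            if method in HTTP_METHODS:  # skip path-level keys such as "parameters"
                ops.append((method.upper(), path, op.get("operationId", "")))
    return sorted(ops)


if __name__ == "__main__":
    # A client needs nothing but the document to discover what the API offers.
    doc = json.loads(json.dumps(OPENAPI_DOC))  # round-trip as if fetched over HTTP
    for method, path, op_id in list_operations(doc):
        print(f"{method} {path} -> {op_id}")
```

The same document is what code generators, test tools, and mock servers consume, so it doubles as the contract between the simulator and its clients.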

The API should be non-blocking, so that it responds in a timely manner even when the simulator is performing a heavy, long-running task.

Why required: Simulators often perform computationally expensive operations (model loading, solving, stepping). A blocking API would stall all callers — including health probes and orchestrators — during these operations, potentially causing false-positive failure detection, watchdog restarts, or cascading timeouts in the wider system. Non-blocking design ensures the API remains responsive at all times, independent of simulation workload.
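One way to meet this requirement can be sketched with Python's `asyncio`. A long "model load" (simulated here with `time.sleep`; `load_model_blocking` is a hypothetical stand-in for the real engine call) is pushed to a worker thread with `asyncio.to_thread` (Python 3.9+). The health handler still answers immediately while the load is in progress.

```python
import asyncio
import time


def load_model_blocking(seconds: float) -> str:
    """Hypothetical stand-in for a heavy, blocking simulator call."""
    time.sleep(seconds)
    return "model loaded"


async def handle_load(seconds: float) -> str:
    # Offload the blocking work to a worker thread so the event loop
    # stays free to serve other requests in the meantime.
    return await asyncio.to_thread(load_model_blocking, seconds)


async def handle_health() -> dict:
    # Cheap handler: must answer promptly even while a load is running.
    return {"api_ready": True}


async def main() -> tuple[float, float]:
    start = time.monotonic()
    load_task = asyncio.create_task(handle_load(0.5))
    await asyncio.sleep(0.05)          # give the load a moment to start
    health = await handle_health()     # answered while the load is still busy
    health_latency = time.monotonic() - start
    result = await load_task
    total = time.monotonic() - start
    print(health, result)
    return health_latency, total


if __name__ == "__main__":
    health_latency, total = asyncio.run(main())
    print(f"health answered after {health_latency:.2f}s; load finished after {total:.2f}s")
```

A thread is enough when the heavy work releases the GIL (I/O, or native solver code). For pure-Python CPU-bound work, a process pool or a separate engine process would be needed to keep the API responsive.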

From our perspective, we prefer a fully open API without authorization on the simulator side, as user authorization can be implemented at a different level. If the simulator API does implement detailed authorization, it should be compatible with industry-standard authentication and authorization solutions (e.g. RBAC, OAuth 2.0).

Note: Simulators are typically deployed inside a controlled network boundary (e.g. a private cluster or VPN), where network-level access control already restricts who can reach the API. Adding authentication at the simulator layer introduces integration complexity — every client (orchestrator, health probe, CI pipeline) must manage credentials — without a meaningful security gain in that deployment context. Authorization is therefore left to the infrastructure layer (API gateway, service mesh, or similar) rather than mandated at the simulator API itself. If a specific deployment scenario requires it, the preference for industry-standard schemes (RBAC, OAuth 2.0) ensures it remains interoperable.

Start of Simulation

There are generally two options:
  1. Start the simulator without a config, and provide the configuration through the API after startup.
  2. Provide a "pointer/path" to the simulation configuration at startup.

When starting simulators from a central orchestrator, it is preferable to provide the configuration before or at startup, so that the container can be restarted automatically if it stops unexpectedly.

Why required: Defines how simulation initialization is triggered. The "provide config at startup" approach is critical for orchestrated/containerized environments: if a container crashes and restarts, the orchestrator can relaunch it with the same configuration without manual intervention, enabling resilience and automation.
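The "config at startup" option can be sketched as an entrypoint that resolves the configuration before anything else. A hypothetical `--config` flag takes precedence, with a hypothetical `SIM_CONFIG_PATH` environment variable as the fallback. Both names are assumptions; use whatever the orchestrator actually injects.

```python
import argparse
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def resolve_config_path(argv: Optional[Sequence[str]] = None,
                        env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the simulation config from --config, else from SIM_CONFIG_PATH.

    Both names are hypothetical; use whatever your orchestrator injects.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Simulator entrypoint (sketch)")
    # The command-line flag wins; the environment variable is the fallback default.
    parser.add_argument("--config", default=env.get("SIM_CONFIG_PATH"),
                        help="path or URI of the simulation configuration")
    args = parser.parse_args(argv)
    if not args.config:
        # Fail fast: a container started without config should not come up "empty".
        parser.error("no configuration given (use --config or SIM_CONFIG_PATH)")
    return Path(args.config)
```

Because the configuration is part of the container's launch arguments or environment, a restarted container resolves the same configuration without any follow-up API call.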

Info

  • Simulator software (name, e.g. K-Spice, LedaFlow)
  • Simulator version
  • API version
  • Short description of the simulator software and what it does

Why required: Allows operators and orchestrators to identify what they're talking to — software name, version, and purpose. The addition of an explicit API version field is critical: as the spec evolves, orchestrators must know which API version a simulator exposes to select compatible behaviour and avoid silent breaking changes in multi-simulator environments.
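The following sketch shows how an orchestrator might use the API version field. It assumes the version follows semantic versioning ("MAJOR.MINOR.PATCH"), which this specification does not mandate. `SimulatorInfo` and its field names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulatorInfo:
    """Shape of an Info response; field names are assumptions, not mandated."""
    software: str           # simulator software name
    software_version: str
    api_version: str        # assumed to be semantic versioning "MAJOR.MINOR.PATCH"
    description: str


def is_api_compatible(info: SimulatorInfo, supported_major: int, min_minor: int = 0) -> bool:
    """Orchestrator-side check: same major version and at least the required minor."""
    major, minor, *_ = (int(part) for part in info.api_version.split("."))
    return major == supported_major and minor >= min_minor


if __name__ == "__main__":
    info = SimulatorInfo("ExampleSim", "2024.1", "1.3.0", "Illustrative simulator")
    print(is_api_compatible(info, supported_major=1, min_minor=2))
```

Under semantic versioning, a different major version signals breaking changes. The orchestrator can therefore refuse to drive the simulator instead of failing silently later.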

Deployment Health Status

The status of the software:
  • API ready [true, false]
  • license status
  • get-log (should be limited to a meaningful number of event-log entries)
  • optional subscription to logging (e.g. OpenTelemetry)

Why required: Answers "is the software operational?" — distinct from the simulation itself. A health probe (license valid, API ready) lets orchestrators (Kubernetes liveness/readiness probes, load balancers) decide whether to route traffic or restart the container, independent of whether a simulation is actively running. get-log gives operators a bounded view of recent events for diagnosing faults (e.g. license expiry, startup errors) without requiring direct container access. The optional log subscription (e.g. via OpenTelemetry) enables integration into centralised observability platforms so that events and warnings are surfaced proactively rather than discovered during post-mortem log inspection.
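A minimal sketch of the health state a simulator could keep follows. It bounds get-log with `collections.deque(maxlen=...)`, so old entries drop out automatically, and derives a readiness answer from "API ready" plus license status. The class, enum, and the 200-entry cap are assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class LicenseStatus(Enum):
    VALID = "valid"
    EXPIRED = "expired"
    UNKNOWN = "unknown"


@dataclass
class DeploymentHealth:
    """Deployment health state; names and the 200-entry cap are illustrative."""
    api_ready: bool = False
    license_status: LicenseStatus = LicenseStatus.UNKNOWN
    # deque(maxlen=...) silently discards the oldest entries once full,
    # which keeps the event log (and get-log responses) bounded.
    _log: deque = field(default_factory=lambda: deque(maxlen=200), repr=False)

    def record(self, message: str) -> None:
        self._log.append(message)

    def get_log(self, limit: int = 50) -> list[str]:
        """Return at most `limit` of the most recent events, oldest first."""
        if limit <= 0:
            return []
        return list(self._log)[-limit:]

    def is_ready(self) -> bool:
        """Readiness: accept traffic only if the API is up and the license is valid."""
        return self.api_ready and self.license_status is LicenseStatus.VALID
```

A readiness endpoint could map `is_ready()` to an HTTP 200 or 503 response, which is the form Kubernetes HTTP readiness probes evaluate.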

Simulator Status

The running status of the simulation itself:
  • general simulation status: [running, stopped, loading, uninitialized]
  • achieved speed
  • requested speed
  • uptime
  • model time (clock)
  • currently loaded/running case(s)/model(s)
  • optional subscription to status variables (see subscription under Data Access)

Why required: Answers "what is the simulation doing right now?" — separate from deployment health. Real-time status (running/stopped, achieved vs. requested speed, model time, uptime) lets orchestrators detect stalls, speed mismatches, or incorrect model loads and react accordingly, without requiring log parsing.
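Here is a sketch of the status payload and one check an orchestrator might run on it: flagging a running simulation whose achieved speed falls short of the requested speed. The field names and the 10 % tolerance are assumptions. A stall check could similarly compare `model_time_s` between two polls.

```python
from dataclasses import dataclass
from enum import Enum


class SimState(Enum):
    RUNNING = "running"
    STOPPED = "stopped"
    LOADING = "loading"
    UNINITIALIZED = "uninitialized"


@dataclass
class SimulatorStatus:
    """Status payload; field names are illustrative, not mandated."""
    state: SimState
    achieved_speed: float    # real-time factor actually achieved (1.0 = real time)
    requested_speed: float   # real-time factor requested by the client
    uptime_s: float
    model_time_s: float
    loaded_cases: list[str]


def speed_lagging(status: SimulatorStatus, tolerance: float = 0.1) -> bool:
    """True if a running simulation falls short of the requested speed by more
    than `tolerance` (relative). The 10 % default is an arbitrary assumption."""
    if status.state is not SimState.RUNNING or status.requested_speed <= 0:
        return False
    shortfall = (status.requested_speed - status.achieved_speed) / status.requested_speed
    return shortfall > tolerance
```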

Simulator Control

  • start/run
  • stop/shut-down
  • optional set speed
  • optional step (one tick/time-step)
  • optional set time/simulator-clock
  • load file/case/snapshot
  • save file/case/snapshot
  • pause
  • download file/case/snapshot
  • upload file/case/snapshot
  • delete file/case/snapshot
  • optional multi-case run, with an argument listing the cases to execute

Why required: Provides full programmatic control over the simulation lifecycle:
  • start/run, stop/shut-down, pause: allow orchestrators to drive the simulation state machine without manual intervention.
  • set speed (optional): allows the orchestrator or test harness to control the real-time factor; running faster than real-time accelerates test throughput, while running slower than real-time ensures external systems can keep pace.
  • step / single tick (optional): enables deterministic, reproducible test replay by advancing the simulation exactly one time-step at a time; essential for debugging and validation workflows.
  • set time / simulator clock (optional): allows seeking to a specific model time, enabling fast-forward, reset, or branching from an arbitrary point without replaying from the start.
  • load/save file/case/snapshot: enable reproducible test scenarios and checkpointing; a saved snapshot can be reloaded after a crash or used to branch into multiple test runs from the same initial state.
  • upload/download file/case/snapshot: decouple case management from the host machine, allowing orchestrators or CI pipelines to push input files and pull results without shared filesystem access.
  • delete file/case/snapshot: prevent unbounded disk growth in long-running or automated environments where many cases are cycled through.
  • multi-case run (optional): allows an orchestrator to queue a batch of cases as a single operation rather than issuing sequential load/run/stop cycles, reducing round-trip overhead and simplifying pipeline logic.
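The control commands can be sketched as a small state machine over the status values listed under Simulator Status, together with a server-side multi-case run that drives the load/start/stop cycle for each case. The transition table and the internal `load_complete` event are assumptions. Pause is left out because the listed status set has no "paused" state, and how pause differs from stop is simulator-specific.

```python
from enum import Enum


class SimState(Enum):
    UNINITIALIZED = "uninitialized"
    LOADING = "loading"
    STOPPED = "stopped"
    RUNNING = "running"


# Allowed (state, command) -> next state. This table is an illustrative
# assumption; each simulator defines its own rules.
TRANSITIONS = {
    (SimState.UNINITIALIZED, "load"): SimState.LOADING,
    (SimState.STOPPED, "load"): SimState.LOADING,
    (SimState.LOADING, "load_complete"): SimState.STOPPED,  # internal event
    (SimState.STOPPED, "start"): SimState.RUNNING,
    (SimState.RUNNING, "stop"): SimState.STOPPED,
    (SimState.STOPPED, "step"): SimState.STOPPED,  # advance exactly one tick
}


class InvalidCommand(Exception):
    """Raised when a command is not allowed in the current state."""


def apply(state: SimState, command: str) -> SimState:
    """Return the next state, or raise if the command is not allowed now."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise InvalidCommand(f"'{command}' not allowed in state '{state.value}'") from None


def multi_case_run(state: SimState, cases: list[str]) -> tuple[SimState, list[str]]:
    """One request drives load/start/stop for every case, instead of the
    client issuing each call (a real run would last until the case ends)."""
    trace = []
    for case in cases:
        for command in ("load", "load_complete", "start", "stop"):
            state = apply(state, command)
            trace.append(f"{case}:{command}->{state.value}")
    return state, trace
```

Rejecting invalid commands explicitly (e.g. start before any case is loaded) gives orchestrators a clear error instead of an undefined simulator state.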

Data Access

  • available types (describes which kinds of units/modules/nodes/blocks/devices the simulator contains, and which attributes and functions/methods/commands each unit/module/node/block/device provides)
  • available units of measurement per attribute
  • topology (the structure of the available units/modules/nodes/blocks/devices). Each call should return a meaningful amount of data: retrieving a simple topology structure should not require thousands of API calls, but returning the entire topology of a large model at once is not practical either. The API should therefore offer parameters such as a limit, a recursion depth, or similar.
  • read the momentary value of a specific attribute/variable in a unit/module/node/block/device
  • set/write the value of a specific attribute/variable in a unit/module/node/block/device
  • optional subscription to a list/set of attributes/variables/values, possibly with an option to specify the target the data is published to. Subscriptions should either expire automatically after a certain period or be managed by an alternative reliable mechanism (e.g. keep-alive) that tolerates clients disappearing without notice, without accumulating unbounded overhead.
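The topology requirement can be sketched as a breadth-first listing with `depth`, `limit` and `offset` parameters. All three names are hypothetical. Each entry carries a `has_children` flag, so a client can drill into a subtree later instead of fetching everything at once. For brevity, this sketch builds the full list before slicing; a real implementation would stop walking once the page is full.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    path: str                                   # e.g. "Plant/S0/V1" (illustrative)
    type: str
    children: list["Node"] = field(default_factory=list)


def get_topology(root: Node, depth: int = 1, limit: int = 100, offset: int = 0) -> dict:
    """Breadth-first listing of the nodes below `root`.

    depth  -- how many levels to recurse (1 = direct children only)
    limit  -- maximum number of nodes returned in one response
    offset -- index of the first node to return (for fetching the next page)
    Parameter names are hypothetical, not mandated by this specification.
    """
    items = []
    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        if level > 0:  # the root itself is not listed
            items.append({"path": node.path, "type": node.type,
                          # lets a client decide whether to drill down later
                          "has_children": bool(node.children)})
        if level < depth:
            queue.extend((child, level + 1) for child in node.children)
    page = items[offset:offset + limit]
    next_offset = offset + limit if offset + limit < len(items) else None
    return {"items": page, "next_offset": next_offset}
```

A client pages through a level by repeating the call with `next_offset` until it is `None`, and expands a subtree by calling again with that node as the root.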

Why required: Makes the simulator's internal state observable and controllable by external systems:
  • available types: without a type catalogue, clients must hard-code assumptions about what attributes and commands exist; a self-describing API allows generic tooling to work with any simulator without bespoke adapters.
  • available units of measurement: numeric values are meaningless without knowing whether a pressure reading is in bar, Pa, or psi; exposing units per attribute allows consumers to correctly interpret and convert values without out-of-band documentation or hard-coded assumptions.
  • topology: models can have complex hierarchical structures; a paginated/recursive topology endpoint lets clients traverse large models efficiently without thousands of individual calls or receiving an unmanageably large payload in one go.
  • read momentary value: the primary mechanism for extracting live simulation data (sensor readings, state variables, outputs) into data pipelines, dashboards, or test assertions.
  • set/write value: enables closed-loop scenarios where external controllers or test harnesses inject inputs into the simulation at runtime, not just at load time.
  • subscription (optional): polling for high-frequency data is inefficient and creates unnecessary API load; subscriptions allow the simulator to push updates, with automatic expiry or keep-alive to prevent resource leaks when clients disconnect without cleanup.
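The expiry/keep-alive idea can be sketched as lease-based subscriptions. Each subscription expires `ttl_s` seconds after it was created or last renewed, so a client that dies silently stops costing resources once its lease lapses. The class names, the 30 s default, and the injectable `clock` are assumptions; the clock parameter exists only to make the behaviour easy to test.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Subscription:
    sub_id: int
    variables: list[str]
    expires_at: float


class SubscriptionRegistry:
    """Lease-based subscriptions: each one expires unless the client renews it
    (keep-alive). The 30 s default TTL and all names are assumptions."""

    def __init__(self, ttl_s: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_s
        self._clock = clock
        self._subs: dict[int, Subscription] = {}
        self._ids = itertools.count(1)

    def subscribe(self, variables: list[str]) -> int:
        sub_id = next(self._ids)
        self._subs[sub_id] = Subscription(sub_id, list(variables), self._clock() + self._ttl)
        return sub_id

    def keep_alive(self, sub_id: int) -> bool:
        """Extend the lease; returns False if it already expired or never existed."""
        self.expire()
        sub = self._subs.get(sub_id)
        if sub is None:
            return False
        sub.expires_at = self._clock() + self._ttl
        return True

    def expire(self) -> list[int]:
        """Drop subscriptions whose lease has lapsed; returns the removed ids."""
        now = self._clock()
        dead = [sid for sid, s in self._subs.items() if s.expires_at <= now]
        for sid in dead:
            del self._subs[sid]
        return dead

    def active(self) -> list[int]:
        self.expire()
        return sorted(self._subs)
```

In a running simulator, `expire()` would typically also be called periodically (e.g. once per publish cycle), so that abandoned subscriptions are cleaned up even when no client calls the API.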