Systems Foundations

Virtualization, Bare Metal, and VM Orchestration as Product Infrastructure

How QEMU/KVM, libvirt, OVS, KubeVirt, Firecracker-style microVMs, Linux, bare metal lifecycle, VM orchestration, metering, and billing become the compute substrate beneath larger platform ecosystems.

Context

Bare metal and virtual machines look like infrastructure primitives, but the companies that learned to operate them as products became some of the largest ecosystems in technology. AWS EC2, Google Compute Engine, and Azure Virtual Machines are not just compute services. They are control planes for capacity, lifecycle, identity, networking, storage, metering, billing, support, and the higher-level products that sit on top.

Kubernetes did not replace that substrate. It usually rides on it. Managed Kubernetes, databases, AI platforms, build systems, notebooks, observability products, and enterprise applications depend on the provider first solving machine lifecycle, image lifecycle, tenant boundaries, network attachment, storage attachment, quotas, usage accounting, failure recovery, and billing. If you build on bare metal or VMs, you inherit those responsibilities instead of getting them for free.

Linux Is the Common Substrate

Linux powers most of this stack because it owns the primitives the platform composes: KVM for hardware virtualization, cgroups and namespaces for isolation, bridges and OVS for networking, block devices and filesystems for storage, systemd for host lifecycle, eBPF and perf tooling for observability, and device drivers for GPUs, NICs, disks, and accelerators.
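
As a small illustration of how much of this sits in the host OS, the sketch below checks whether a Linux host looks ready for KVM. It assumes an x86 host, where `/proc/cpuinfo` lists `vmx` (Intel VT-x) or `svm` (AMD-V) on its `flags` line, and it treats the presence of `/dev/kvm` as a sign that the KVM module is loaded. The function names are hypothetical, and a real readiness check would cover more than this.

```python
import os


def cpu_virt_flags(cpuinfo_text: str) -> set[str]:
    """Return the hardware virtualization flags found in /proc/cpuinfo text."""
    found: set[str] = set()
    for line in cpuinfo_text.splitlines():
        # x86 hosts report CPU features on lines that start with "flags".
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            found |= {"vmx", "svm"} & set(value.split())
    return found


def kvm_ready(cpuinfo_path: str = "/proc/cpuinfo", kvm_dev: str = "/dev/kvm") -> bool:
    """True when the CPU advertises VT-x or AMD-V and the KVM device node exists."""
    try:
        with open(cpuinfo_path) as f:
            flags = cpu_virt_flags(f.read())
    except OSError:
        return False  # not Linux, or /proc is not readable
    return bool(flags) and os.path.exists(kvm_dev)


if __name__ == "__main__":
    print("KVM usable:", kvm_ready())
```

A guest without nested virtualization will usually report no flags here, which is itself useful evidence about which layer you are standing on.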

QEMU, KVM, libvirt, OVS, container runtimes, Kubernetes nodes, Slurm nodes, and many cloud agents are different layers over the same operating substrate. A serious platform has to understand where Linux ends, where the control plane begins, and which layer owns scheduling, isolation, networking, storage, telemetry, and recovery.

Virtualization and Bare Metal Are Product Boundaries

A VM is not only a technical isolation mechanism. It is a product boundary that customers understand: a shape, price, lifecycle, SLA, image, network interface, disk, console, log stream, and support contract. Once you expose that boundary, you need to decide what customers can control and what the platform owns.

Bare metal removes some virtualization overhead and exposes hardware directly, which can be valuable for GPUs, NICs, storage, low-latency systems, compliance, or customers who want full machine ownership. It also removes many guardrails. Firmware, BMC access, disk state, PXE or image boot, secure wipe, hardware failure, rack locality, network ports, and support workflows become part of the product.

The Ecosystem Does Not Exist Until You Build It

A bare metal or VM product needs more than a scheduler. It needs a way to discover inventory, classify hardware, allocate capacity, install or boot hosts, manage firmware and images, attach networking and storage, recover failed machines, rotate credentials, meter usage, and explain charges. Without that system, every customer request becomes a manual operations ticket.
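
To make the inventory and allocation part concrete, here is a minimal sketch of a host record and a first-fit allocator. Every name in it (`HostRecord`, `allocate`, the field names, the `"gpu-large"` class label) is hypothetical and chosen for illustration; a real system would also persist state, lock records, and audit changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostRecord:
    host_id: str
    rack: str
    gpu_class: Optional[str]      # None for CPU-only hosts
    nic_count: int
    firmware_version: str
    healthy: bool = True
    owner: Optional[str] = None   # tenant currently holding the host


def allocate(inventory: list[HostRecord], tenant: str,
             gpu_class: Optional[str]) -> Optional[HostRecord]:
    """Assign the first healthy, unowned host of the requested class to a tenant."""
    for host in inventory:
        if host.healthy and host.owner is None and host.gpu_class == gpu_class:
            host.owner = tenant
            return host
    return None  # no capacity of that class is available


if __name__ == "__main__":
    inventory = [
        HostRecord("h1", "rack-a", "gpu-large", 2, "1.0.3", healthy=False),
        HostRecord("h2", "rack-a", "gpu-large", 2, "1.0.3"),
        HostRecord("h3", "rack-b", None, 1, "1.0.3"),
    ]
    picked = allocate(inventory, "tenant-42", "gpu-large")
    print(picked.host_id if picked else "no capacity")
```

The point of the sketch is the shape of the data: allocation can only filter on attributes the inventory actually records.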

This is why hyperscalers became ecosystems. They did not only rent servers. They built APIs, instance types, identity, VPCs, volumes, snapshots, images, placement, autoscaling, tagging, metering, billing, support, quotas, and eventually managed services on top of the compute substrate.

Billing Is Architecture

Usage records are not an afterthought. The billing model decides what the system must observe: allocated time, running time, reserved capacity, GPU class, CPU and memory shape, storage size, network egress, snapshots, IPs, licenses, support tier, or marketplace fees. If the platform cannot meter the thing customers believe they bought, revenue leaks and billing disputes turn into operational work.

Good infrastructure products connect inventory, allocation, identity, metering, invoices, support, and lifecycle state. That is what makes cloud platforms feel like products instead of a pile of hosts, and it is why products built above EC2, GCE, and Azure VMs can become major revenue streams.
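
The difference between billing on running time and billing on allocated time can be sketched as a small calculation over lifecycle events. The event shape, the state names, and the `billable_seconds` helper are assumptions made for this example, and the sketch assumes every event falls inside the billing period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LifecycleEvent:
    resource_id: str
    timestamp: int   # seconds since epoch
    state: str       # hypothetical states: allocated, running, stopped, released


def billable_seconds(events, period_end, billable_states=frozenset({"running"})):
    """Sum the time one resource spent in billable states up to period_end."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    total = 0
    for current, following in zip(ordered, ordered[1:]):
        if current.state in billable_states:
            total += following.timestamp - current.timestamp
    # The last known state stays in effect until the end of the billing period.
    if ordered and ordered[-1].state in billable_states:
        total += max(0, period_end - ordered[-1].timestamp)
    return total


if __name__ == "__main__":
    events = [
        LifecycleEvent("vm-1", 0, "allocated"),
        LifecycleEvent("vm-1", 10, "running"),
        LifecycleEvent("vm-1", 110, "stopped"),
        LifecycleEvent("vm-1", 150, "released"),
    ]
    print(billable_seconds(events, period_end=200))  # running time: 100
    held = frozenset({"allocated", "running", "stopped"})
    print(billable_seconds(events, period_end=200, billable_states=held))  # allocated time: 150
```

The same event stream yields 100 or 150 billable seconds depending on which states the contract treats as billable, which is why the billing model has to be settled before the event model is.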

Comparison

What You Have to Build to Make Bare Metal or VMs a Product

The core work is not only creating machines. It is creating the control plane, lifecycle system, and commercial evidence needed for customers to trust, operate, and pay for them.

Layer | What It Must Do | Why It Matters
Inventory | Track hosts, parts, GPU class, NICs, disks, rack, power, firmware, health, and ownership. | You cannot allocate, repair, price, or support capacity you cannot describe.
Provisioning | Install, image, boot, configure, validate, and hand off bare metal or VM instances repeatedly. | Manual provisioning turns every customer launch into a services engagement.
Lifecycle | Create, start, stop, resize, rebuild, drain, recover, wipe, retire, and reassign resources. | Customers buy an operating model, not just access to a machine.
Networking | Attach IPs, VLANs, VPC-like boundaries, firewalls, SR-IOV, RDMA, load balancing, DNS, and routing policy. | Most customer failures show up as reachability, performance, isolation, or policy problems.
Storage | Attach disks, images, snapshots, shared filesystems, object storage, backup, and secure deletion. | Compute products become useful when state and recovery are predictable.
Metering and billing | Record usage, reservations, support tiers, credits, quotas, limits, invoices, and revenue reporting. | A platform cannot scale commercially if finance and operations disagree about consumption.
Higher-level products | Host Kubernetes, Slurm, databases, AI serving, build systems, notebooks, and managed services on the substrate. | The base compute layer becomes the launchpad for larger ecosystems and recurring revenue streams.
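
The Lifecycle row can be sketched as an explicit state machine. The state names and the transition table below are illustrative assumptions, not a standard; the one rule worth noticing is that nothing returns to the available pool without passing through a wipe.

```python
# Hypothetical lifecycle states and the transitions each one allows.
ALLOWED = {
    "available": {"provisioning"},
    "provisioning": {"running", "failed"},
    "running": {"stopped", "draining", "failed"},
    "stopped": {"running", "wiping"},
    "draining": {"stopped", "wiping"},
    "failed": {"repairing", "retired"},
    "repairing": {"wiping", "retired"},
    "wiping": {"available", "retired"},
    "retired": set(),
}


class InvalidTransition(Exception):
    """Raised when a requested lifecycle change is not in the table."""


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target


if __name__ == "__main__":
    state = "available"
    for step in ["provisioning", "running", "stopped", "wiping", "available"]:
        state = transition(state, step)
    print(state)
```

Keeping the table as data means the same definition can drive the API, the operator tooling, and the metering events.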

What to Understand

  • Bare metal and VMs are not just resources; they are product contracts for lifecycle, isolation, support, pricing, and recovery.
  • Kubernetes, Slurm, databases, notebooks, and AI products usually sit above a lower compute substrate that still needs ownership.
  • The missing ecosystem is the hard part: inventory, provisioning, networking, storage, identity, metering, billing, support, and failure recovery.
  • Virtualization is a product boundary as much as an infrastructure boundary: it affects isolation, upgrades, debugging, billing, support, and what customers believe they are buying.
  • QEMU/KVM and libvirt give broad VM flexibility, but that flexibility only holds up in production when CPU topology, memory backing, PCIe devices, OVS networking, image lifecycle, and host maintenance are explicit.
  • KubeVirt lets VMs live inside Kubernetes workflows, which is useful when teams need VM semantics with Kubernetes scheduling, GitOps, policy, service discovery, and operator patterns.
  • Firecracker-style microVMs are attractive for fast, strongly isolated CPU and memory workloads, but they do not provide QEMU-style VFIO passthrough and should not be treated as the answer for direct GPU or NIC ownership.
  • GPU and NIC passthrough belongs in the QEMU/KVM, libvirt, or bare-metal decision path; it can preserve performance, but it moves risk into firmware, IOMMU groups, NUMA placement, reset behavior, live migration expectations, and failure recovery.
  • Full passthrough outside Kubernetes can be the right answer when the workload needs simpler device ownership, lower orchestration overhead, predictable PCIe locality, direct scheduler control, or fewer abstraction layers between the job and hardware.
  • Slurm can live outside Kubernetes for classic HPC scheduling, inside Kubernetes through integration layers, or beside Kubernetes as a separate control plane; the right choice depends on job shape, tenancy, accounting, operator model, and how much Kubernetes should own.
  • Nested control planes can fight each other. A VM manager, Kubernetes, KubeVirt, batch scheduler, model server, network plugin, and storage client may all believe they own placement and recovery.
  • The right abstraction depends on the workload: developer sandbox, tenant VM, microVM job runner, bare-metal training node, edge appliance, or long-lived service host.
  • Isolation should be explicit across identity, data, credentials, topology, logs, management-plane visibility, and who can reach the guest or host control socket.
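
The IOMMU group risk mentioned for passthrough can be inspected directly on a Linux host. The sketch below reads `/sys/kernel/iommu_groups`, where each group directory has a `devices` subdirectory named by PCI address, and reports which other devices share a group with a device you intend to pass through. The helper names are hypothetical, and the output is a prompt for review rather than a verdict, since co-grouped entries are sometimes bridges that need no action.

```python
import os


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict[str, list[str]]:
    """Map each IOMMU group number to the PCI addresses it contains."""
    groups: dict[str, list[str]] = {}
    if not os.path.isdir(root):
        return groups  # IOMMU disabled, or not a Linux host
    for group in sorted(os.listdir(root), key=lambda g: int(g) if g.isdigit() else -1):
        devices_dir = os.path.join(root, group, "devices")
        if os.path.isdir(devices_dir):
            groups[group] = sorted(os.listdir(devices_dir))
    return groups


def co_grouped(groups: dict[str, list[str]], passthrough: list[str]) -> dict[str, list[str]]:
    """Return, per group, the devices that share a group with a passthrough device."""
    wanted = set(passthrough)
    shared: dict[str, list[str]] = {}
    for group, devices in groups.items():
        if wanted & set(devices):
            others = [d for d in devices if d not in wanted]
            if others:
                shared[group] = others
    return shared


if __name__ == "__main__":
    for group, devices in iommu_groups().items():
        print(group, devices)
```

Because the group is the unit of isolation for VFIO, a non-empty result from `co_grouped` means the passthrough plan has to account for those neighbors too.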

Common Failure Modes

  • The team builds a scheduler but not provisioning, repair, billing, quota, support evidence, or secure deprovisioning.
  • VMs are exposed as a product before image lifecycle, networking, storage, metering, and failure recovery are repeatable.
  • Bare metal becomes a manual operations queue because firmware, BMC, disk wipe, inventory, and validation loops are not automated.
  • Billing is added late, so usage records do not match customer contracts, resource ownership, or finance reporting.
  • The VM boundary hides topology and makes performance bugs look like application bugs instead of host, PCIe, NUMA, network, or storage placement issues.
  • Teams adopt KubeVirt because Kubernetes is familiar, then forget that VM boot, image import, guest agents, migration, storage classes, and node drains have different failure behavior than pods.
  • Kubernetes is forced into the control path for workloads that would be easier to operate with full passthrough, direct host ownership, or a Slurm-first scheduling model.
  • Slurm and Kubernetes both try to own placement, accounting, lifecycle, or recovery without a clear boundary for who schedules GPUs, drains nodes, handles failed jobs, and reports utilization.
  • Firecracker or microVM isolation is treated as a drop-in answer for GPU or NIC workloads even though it is not a VFIO passthrough runtime and its networking, storage, debugging, snapshotting, and observability paths have different constraints.
  • GPU reset, live migration expectations, or host drain behavior are not tested before customer workloads depend on them.
  • NUMA, hugepages, CPU pinning, interrupt placement, tap devices, OVS bridges, and storage paths are left to defaults on latency-sensitive workloads.
  • Isolation requirements are discussed after the architecture already assumes shared hosts, broad credentials, privileged agents, or a control plane with too much reach.
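
One of the "left to defaults" cases above is easy to detect. The sketch below parses the hugepage counters that Linux exposes in `/proc/meminfo` (`HugePages_Total`, `HugePages_Free`, `Hugepagesize`) and flags a host where no hugepages have been reserved. The function names are hypothetical, and the check covers only the explicitly reserved pool, not transparent hugepages.

```python
def hugepage_summary(meminfo_text: str) -> dict[str, int]:
    """Extract hugepage counters from /proc/meminfo text."""
    wanted = {"HugePages_Total", "HugePages_Free", "Hugepagesize"}
    summary: dict[str, int] = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted and rest.split():
            summary[key] = int(rest.split()[0])  # Hugepagesize is reported in kB
    return summary


def hugepages_unreserved(meminfo_text: str) -> bool:
    """True when the host has reserved no hugepages at all."""
    return hugepage_summary(meminfo_text).get("HugePages_Total", 0) == 0


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = ""
    print(hugepage_summary(text), "unreserved:", hugepages_unreserved(text))
```

A check like this belongs in host validation, so a latency-sensitive shape is never handed off on a host that was never tuned for it.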

What Good Looks Like

  • Every machine or VM has a clear owner, lifecycle state, image, network, storage, health, metering, billing, and support evidence.
  • Customers can consume repeatable shapes through APIs or workflows instead of one-off operator actions.
  • The platform can host higher-level products like Kubernetes, Slurm, AI serving, and databases without losing track of cost, ownership, and failure domains.
  • The design states which runtime owns which job: QEMU/libvirt for flexible VMs and passthrough, KubeVirt for Kubernetes-native VM lifecycle, Firecracker-style microVMs for fast isolated non-VFIO units, Slurm for HPC-style batch work, and bare metal when abstraction adds risk.
  • Full passthrough paths are documented as a first-class option when direct device ownership, deterministic locality, and simpler failure recovery matter more than Kubernetes-native lifecycle control.
  • Slurm and Kubernetes boundaries are explicit: which system schedules jobs, owns quotas and accounting, handles node drains, exposes logs, and reports GPU utilization.
  • Topology-sensitive settings are visible and testable: CPU layout, memory backing, PCIe devices, NIC queues, OVS paths, hugepages, storage class, guest image, and node placement.
  • Kubernetes and VM orchestration boundaries are clear: scheduling, admission, networking, storage, migration, health, rollback, and evidence collection each have an owner.
  • Operational playbooks cover host maintenance, stuck devices, guest failure, image updates, VM migration limits, node drains, and what evidence to collect before rebooting or rescheduling.
  • Bare metal remains an explicit option when virtualization would hide hardware behavior, complicate passthrough, or create more customer risk than value.
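
The "which runtime owns which job" statement can be written down as a small decision function so it is reviewable and testable. The mapping of traits to runtimes follows the list above; the `Workload` fields, the function name, and especially the priority order of the checks are assumptions made for this sketch and would need to reflect your own platform's rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    needs_device_passthrough: bool = False    # direct GPU or NIC ownership (VFIO)
    needs_kubernetes_lifecycle: bool = False  # GitOps, policy, operators around the VM
    fast_start_isolation: bool = False        # many short-lived isolated units
    hpc_batch: bool = False                   # classic HPC-style batch work
    abstraction_adds_risk: bool = False       # virtualization would hide hardware behavior


def choose_runtime(w: Workload) -> str:
    """Pick an owner for the workload; the check order is an assumed priority."""
    if w.abstraction_adds_risk:
        return "bare metal"
    if w.hpc_batch:
        return "Slurm"
    if w.needs_device_passthrough:
        # Passthrough stays on the QEMU/libvirt path, never on a non-VFIO microVM.
        return "QEMU/libvirt"
    if w.needs_kubernetes_lifecycle:
        return "KubeVirt"
    if w.fast_start_isolation:
        return "Firecracker-style microVM"
    return "QEMU/libvirt"  # default for flexible general-purpose VMs


if __name__ == "__main__":
    print(choose_runtime(Workload(fast_start_isolation=True)))
    print(choose_runtime(Workload(fast_start_isolation=True, needs_device_passthrough=True)))
```

Encoding the rule makes the conflict cases visible, such as a workload that wants both fast microVM startup and device passthrough.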

Field Notes

Public Checks and Protected Preview

These public snippets show the operating questions and evidence I look for. The protected area will add source-code context, diagrams, templates, and implementation examples when ready.

Quick Diagnostic

  • What is the unit of sale or operation: bare-metal host, VM shape, reserved capacity, GPU slice, tenant cluster, or managed platform surface?
  • Can the platform prove one source of truth for inventory, allocation, lifecycle state, metering, billing, support, and ownership?
  • Which layer owns Linux host configuration, firmware and BMC state, images, networking, storage, identity, quota, repair, and deprovisioning?
  • Which runtime is responsible for the workload: QEMU/libvirt VM, KubeVirt VM, Firecracker-style non-VFIO microVM, container, Slurm node, or bare metal?
  • Does the workload actually benefit from Kubernetes ownership, or would full passthrough outside Kubernetes reduce latency, placement ambiguity, and operational surface area?
  • Should Slurm own scheduling, should Kubernetes own scheduling, or should they integrate with a clear contract for quotas, accounting, node drains, and GPU allocation?
  • What does the VM boundary protect: customer isolation, lifecycle control, reproducibility, hardware sharing, fast job startup, or support workflow?
  • If passthrough is involved, are IOMMU grouping, reset behavior, NUMA locality, OVS paths, and host maintenance tested?
  • Are VM, Kubernetes, KubeVirt, batch scheduler, model server, network plugin, and storage placement decisions fighting each other?
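
The last question, whether control planes are fighting each other, can be answered mechanically once ownership is written down. The sketch below takes a map from control plane to the responsibilities it claims and reports responsibilities with more than one owner and responsibilities with none. The function names and the responsibility labels are hypothetical.

```python
from collections import defaultdict


def ownership_conflicts(claims: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return each responsibility that more than one control plane claims."""
    owners: dict[str, list[str]] = defaultdict(list)
    for plane, responsibilities in claims.items():
        for responsibility in responsibilities:
            owners[responsibility].append(plane)
    return {r: sorted(p) for r, p in owners.items() if len(p) > 1}


def unowned(claims: dict[str, set[str]], required: list[str]) -> list[str]:
    """Return the required responsibilities that no control plane claims."""
    claimed = set().union(*claims.values())
    return sorted(set(required) - claimed)


if __name__ == "__main__":
    claims = {
        "kubernetes": {"placement", "node_drain"},
        "slurm": {"placement", "accounting"},
        "libvirt": {"vm_lifecycle"},
    }
    print(ownership_conflicts(claims))
    print(unowned(claims, ["placement", "accounting", "gpu_reset"]))
```

Both outputs matter: a double claim predicts two systems fighting over the same decision, and an unowned responsibility predicts a gap nobody notices until an incident.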

Evidence to Look For

  • Inventory schema covering host class, GPU, NIC, disk, rack, power, firmware, BMC, health, lifecycle state, and customer or tenant ownership.
  • Provisioning and image-lifecycle record showing boot path, OS image, kernel, drivers, network attachment, storage attachment, validation checks, and handoff state.
  • Metering event model that maps resource allocation and runtime state to invoices, credits, quotas, alerts, support records, and revenue reporting.
  • Guest CPU layout, memory backing, hugepage use, device locality, NIC queues, OVS bridge path, storage class, and guest image notes.
  • Control-plane ownership map for libvirt, Kubernetes, KubeVirt, Firecracker-style microVM orchestration, GitOps, network policy, storage, and observability.
  • Passthrough decision record comparing Kubernetes-native VM lifecycle against direct host ownership, bare metal, and Slurm-first operation.
  • Slurm and Kubernetes boundary map covering job placement, quotas, accounting, logs, failed jobs, node drains, and utilization reporting.
  • Host maintenance, stuck-device, guest-failure, image-update, node-drain, migration-limit, and evidence-collection procedure summaries.
  • Isolation review covering identity, data, credentials, logs, topology visibility, guest access, host sockets, and management-plane reach.

Protected Preview

  • Control-plane schema reviews for inventory, allocation, lifecycle, billing, and support state.
  • Metering and billing model walkthroughs for bare metal, VM, storage, network, support, and reservation products.
  • Host lifecycle runbooks for provisioning, firmware, BMC, secure wipe, repair, drains, and reassignment.
  • QEMU/KVM/libvirt topology review examples.
  • KubeVirt and Firecracker-style non-VFIO orchestration decision templates.
  • Slurm inside, outside, and beside Kubernetes tradeoff reviews.
  • Full-passthrough architecture notes for GPU, NIC, and storage-heavy workloads.
  • Passthrough readiness and customer-isolation checklists.
  • Host configuration patterns for reproducible VM and microVM lifecycle work.

Further Resources

Protected Resources

Host configs, topology diagrams, passthrough notes, customer isolation reviews, billing schemas, and lifecycle runbooks stay in the protected area.