[PATCH i-g-t 08/10] tools/vmtb: Redesign VirtualMachine class

public inbox for igt-dev@lists.freedesktop.org
 help / color / mirror / Atom feed

From: Adam Miszczak <adam.miszczak@linux.intel.com>
To: igt-dev@lists.freedesktop.org
Cc: marcin.bernatowicz@linux.intel.com, kamil.konieczny@linux.intel.com
Subject: [PATCH i-g-t 08/10] tools/vmtb: Redesign VirtualMachine class
Date: Tue, 24 Feb 2026 08:50:25 +0100	[thread overview]
Message-ID: <20260224075027.2409675-9-adam.miszczak@linux.intel.com> (raw)
In-Reply-To: <20260224075027.2409675-1-adam.miszczak@linux.intel.com>

Split original VirtualMachine class into specialized abstractions:
VirtualMachine, VirtualDevice and VfDriver.
Extend new VirtualMachine with new functions to load and remove DRM driver
and discover devices on Guest OS.

Redesigned abstractions have the following responsibilities:
- VirtualMachine supports operations like start/stop VM, load/unload driver,
  enumerate virtual devices, execute process on guest,
  power states handling, save/load state etc.
- VirtualDevice represents GPU device visible on a guest OS
  (resources allocated to VF, local PCI BDF, Device ID)
- VfDriver handles low-level functions of virtual device
  as sysfs/debugfs access

New implementation provides placeholders for multiple VFs passed to VM,
but currently only single VF (one virtual GPU per VM) is supported.

Additionally, create separate VM/Guest kernel log files,
located in a root VMTB directory.
VM dmesg is also propagated to logfile.log as previously.

Signed-off-by: Adam Miszczak <adam.miszczak@linux.intel.com>
---
 tools/vmtb/bench/machines/virtual/vm.py | 252 +++++++++++-------------
 tools/vmtb/vmm_flows/conftest.py        |   7 +-
 2 files changed, 116 insertions(+), 143 deletions(-)

diff --git a/tools/vmtb/bench/machines/virtual/vm.py b/tools/vmtb/bench/machines/virtual/vm.py
index 312e87e4b..2469af965 100644
--- a/tools/vmtb/bench/machines/virtual/vm.py
+++ b/tools/vmtb/bench/machines/virtual/vm.py
@@ -4,14 +4,14 @@
 import base64
 import json
 import logging
-import os
-import posixpath
+import re
 import shlex
 import signal
 import subprocess
 import threading
 import time
 import typing
+from pathlib import Path
 from types import FrameType
 
 from bench import exceptions
@@ -21,6 +21,7 @@ from bench.machines.machine_interface import (DEFAULT_TIMEOUT,
                                               SuspendMode)
 from bench.machines.virtual.backends.guestagent import GuestAgentBackend
 from bench.machines.virtual.backends.qmp_monitor import QmpMonitor
+from bench.machines.virtual.device import VirtualDevice
 
 logger = logging.getLogger('VirtualMachine')
 
@@ -60,40 +61,34 @@ class VirtualMachine(MachineInterface):
 
     def __init__(self, vm_number: int, backing_image: str, driver: str,
                  igt_config: VmtbIgtConfig, vf_migration_support: bool) -> None:
-        self.vf_bdf: typing.Optional[str] = None
-        self.process: typing.Optional[subprocess.Popen] = None
         self.vmnum: int = vm_number
-        self.card_num: int = 0
-        self.sysfs_prefix_path = posixpath.join('/sys/class/drm/', f'card{str(self.card_num)}')
-        self.questagent_sockpath = posixpath.join('/tmp', f'qga{self.vmnum}.sock')
-        self.qmp_sockpath = posixpath.join('/tmp', f'mon{self.vmnum}.sock')
         self.drm_driver_name: str = driver
         self.igt_config: VmtbIgtConfig = igt_config
         self.vf_migration: bool = vf_migration_support
 
-        if not posixpath.exists(backing_image):
+        # Passed VFs and VirtualDevices list - placeholder for multiple VFs passed to single VM:
+        # currently only one VF/VirtualDevice per VM supported (ie. passed_vf_bdfs[0]).
+        # TODO: add support for multiple VFs/VirtualDevices per VM
+        self.passed_vf_bdfs: typing.List[str] = []
+        self.gpu_devices: typing.List[VirtualDevice] = []
+        self.process: typing.Optional[subprocess.Popen] = None
+
+        self.questagent_sockpath = Path('/tmp') / f'qga{self.vmnum}.sock'
+        self.qmp_sockpath = Path('/tmp') / f'mon{self.vmnum}.sock'
+
+        if not Path(backing_image).exists():
             logger.error('No image for VM%s', self.vmnum)
             raise exceptions.GuestError(f'No image for VM{self.vmnum}')
+
         self.image: str = self.__create_qemu_image(backing_image)
+
         self.migrate_source_image: typing.Optional[str] = None
         self.migrate_destination_vm: bool = False
 
-        # Resources provisioned to the VF/VM:
-        self._lmem_size: typing.Optional[int] = None
-        self._ggtt_size: typing.Optional[int] = None
-        self._contexts: typing.Optional[int] = None
-        self._doorbells: typing.Optional[int] = None
-
-        # GT number and tile is relevant mainly for multi-tile devices
-        # List of all GTs used by a given VF:
-        # - for single-tile: only root [0]
-        # - for multi-tile Mode 2/3: either root [0] or remote [1]
-        # - for multi-tile Mode 1: spans on both tiles [0, 1]
-        self._gt_nums: typing.List[int] = []
-        self._tile_mask: typing.Optional[int] = None
+        self.dmesg_logger = self.__setup_dmesg_logger()
 
     def __str__(self) -> str:
-        return f'VM{self.vmnum}_{self.vf_bdf}'
+        return f'VM{self.vmnum}-VF:{self.passed_vf_bdfs[0] if self.passed_vf_bdfs else 'N/A'}'
 
     def __del__(self) -> None:
         if not self.is_running():
@@ -112,6 +107,18 @@ class VirtualMachine(MachineInterface):
             print('QEMU did not terminate, killing it')
             self.process.kill()
 
+    def __setup_dmesg_logger(self) -> logging.Logger:
+        """Configure VM dmesg logger.
+        Logs are directed to dedicated vm[NUM]_dmesg.log and propagated to logfile.log.
+        """
+        dmesg_logger = logging.getLogger(f'VM{self.vmnum}-kmsg')
+        # Remove any existing logger handlers to avoid duplicated prints for parametrized tests
+        dmesg_logger.handlers.clear()
+        dmesg_log_handler = logging.FileHandler(f'vm{self.vmnum}_dmesg.log')
+        dmesg_logger.addHandler(dmesg_log_handler)
+
+        return dmesg_logger
+
     def __get_backing_file_format(self, backing_file: str) -> typing.Any:
         """Get the format of the backing image file using qemu-img info."""
         command = ['qemu-img', 'info', '--output=json', backing_file]
@@ -143,14 +150,13 @@ class VirtualMachine(MachineInterface):
         return output_image
 
     def __log_qemu_output(self, out: typing.TextIO) -> None:
-        stdoutlog = logging.getLogger(f'VM{self.vmnum}-kmsg')
         for line in iter(out.readline, ''):
-            stdoutlog.debug(line.strip())
+            self.dmesg_logger.debug(line.strip())
 
     def __sockets_exists(self) -> bool:
-        return os.path.exists(self.questagent_sockpath) and os.path.exists(self.qmp_sockpath)
+        return self.questagent_sockpath.exists() and self.qmp_sockpath.exists()
 
-    def __get_popen_command(self) -> typing.List[str]:
+    def __prepare_qemu_command(self) -> typing.List[str]:
         command = ['qemu-system-x86_64',
                    '-vnc', f':{self.vmnum}',
                    '-serial', 'stdio',
@@ -165,8 +171,8 @@ class VirtualMachine(MachineInterface):
                    '-chardev', f'socket,id=mon{self.vmnum},path=/tmp/mon{self.vmnum}.sock,server=on,wait=off',
                    '-mon', f'chardev=mon{self.vmnum},mode=control']
 
-        if self.vf_bdf:
-            command.extend(['-enable-kvm', '-cpu', 'host', '-device', f'vfio-pci,host={self.vf_bdf}'])
+        if self.passed_vf_bdfs:
+            command.extend(['-enable-kvm', '-cpu', 'host', '-device', f'vfio-pci,host={self.passed_vf_bdfs[0]}'])
             if self.vf_migration:
                 command[-1] += ',enable-migration=on'
 
@@ -185,77 +191,82 @@ class VirtualMachine(MachineInterface):
             cur = cur[key]
         return cur
 
-    @property
-    def get_vm_num(self) -> int:
-        return self.vmnum
-
     def assign_vf(self, vf_bdf: str) -> None:
-        self.vf_bdf = vf_bdf
+        """Pass VFs to VM - required to run QEMU (prior to VM power on)"""
+        self.passed_vf_bdfs.append(vf_bdf)
 
     def set_migration_source(self, src_image: str) -> None:
         self.migrate_source_image = src_image
         self.migrate_destination_vm = True
 
-    @property
-    def lmem_size(self) -> typing.Optional[int]:
-        if self._lmem_size is None:
-            self.helper_get_debugfs_selfconfig()
-
-        return self._lmem_size
-
-    @property
-    def ggtt_size(self) -> typing.Optional[int]:
-        if self._ggtt_size is None:
-            self.helper_get_debugfs_selfconfig()
-
-        return self._ggtt_size
-
-    @property
-    def contexts(self) -> typing.Optional[int]:
-        if self._contexts is None:
-            self.helper_get_debugfs_selfconfig()
-
-        return self._contexts
-
-    @property
-    def doorbells(self) -> typing.Optional[int]:
-        if self._doorbells is None:
-            self.helper_get_debugfs_selfconfig()
-
-        return self._doorbells
-
-    @property
-    def tile_mask(self) -> typing.Optional[int]:
-        if self._tile_mask is None:
-            self.helper_get_debugfs_selfconfig()
-
-        return self._tile_mask
-
-    @property
-    def gt_nums(self) -> typing.List[int]:
-        self._gt_nums = self.get_gt_num_from_sysfs()
-        if not self._gt_nums:
-            logger.warning("VM sysfs: missing GT index")
-            self._gt_nums = [0]
-
-        return self._gt_nums
-
-    def get_gt_num_from_sysfs(self) -> typing.List[int]:
-        # Get GT number of VF passed to a VM, based on an exisitng a sysfs path
-        vm_gt_num = []
-        if self.dir_exists(posixpath.join(self.sysfs_prefix_path, 'gt/gt0')):
-            vm_gt_num.append(0)
-        if self.dir_exists(posixpath.join(self.sysfs_prefix_path, 'gt/gt1')):
-            vm_gt_num.append(1)
-
-        return vm_gt_num
-
     def get_drm_driver_name(self) -> str:
         return self.drm_driver_name
 
     def get_igt_config(self) -> VmtbIgtConfig:
         return self.igt_config
 
+    def get_dut(self) -> VirtualDevice:
+        # Currently only one first enumerated device is supported (virtual card0)
+        return self.gpu_devices[0]
+
+    def is_drm_driver_loaded(self) -> bool:
+        return self.dir_exists(f'/sys/bus/pci/drivers/{self.drm_driver_name}')
+
+    def load_drm_driver(self) -> None:
+        """Load (modprobe) guest DRM driver."""
+        if not self.is_drm_driver_loaded():
+            logger.debug("VirtualMachine - load DRM driver")
+            drv_probe_pid = self.execute(f'modprobe {self.drm_driver_name}')
+            if self.execute_wait(drv_probe_pid).exit_code != 0:
+                logger.error("%s driver probe failed on guest!", self.drm_driver_name)
+                raise exceptions.GuestError(f'{self.drm_driver_name} driver probe failed on guest!')
+
+    def unload_drm_driver(self) -> None:
+        """Unload (remove) guest DRM driver."""
+        logger.debug("VirtualMachine - unload DRM driver")
+        for device in self.gpu_devices:
+            logger.debug("Unbind %s from virtual device %s", self.drm_driver_name, device.pci_info.bdf)
+            device.unbind_driver()
+
+        rmmod_pid = self.execute(f'modprobe -rf {self.drm_driver_name}')
+        if self.execute_wait(rmmod_pid).exit_code != 0:
+            logger.error("DRM driver remove failed!")
+            raise exceptions.HostError('DRM driver remove failed!')
+
+        logger.debug("%s successfully removed", self.drm_driver_name)
+
+    def discover_devices(self) -> None:
+        """Detect all (virtual) PCI GPU devices on the guest and initialize VirtualDevice list."""
+        if not self.is_drm_driver_loaded():
+            logger.error("Unable to discover devices on guest - %s driver is not loaded!", self.drm_driver_name)
+            raise exceptions.HostError(
+                f'Unable to discover devices on guest - {self.drm_driver_name} driver is not loaded!')
+
+        detected_devices: typing.List[VirtualDevice] = []
+        drv_path = Path('/sys/bus/pci/drivers/') / self.drm_driver_name
+
+        dev_dir_ls = self.dir_list(str(drv_path))
+
+        # Look for a directory name with a PCI BDF (e.g. 0000:1a:00.0)
+        for bdf in dev_dir_ls:
+            match = re.match(r'\d{4}(?::[0-9a-z-A-Z]{2}){2}.[0-7]', bdf)
+            if match:
+                device = VirtualDevice(bdf, self)
+                detected_devices.append(device)
+
+        # Output list of detected devices sorted by an ascending card index (device minor number)
+        self.gpu_devices = sorted(detected_devices, key=lambda dev: dev.pci_info.minor_number)
+
+        if not self.gpu_devices:
+            logger.error("Virtualized GPU PCI device (bound to %s driver) not detected!", self.drm_driver_name)
+            raise exceptions.GuestError(
+                f'Virtualized GPU PCI device (bound to {self.drm_driver_name} driver) not detected!')
+
+        logger.debug("Detected virtualized GPU PCI device(s):")
+        for dev in self.gpu_devices:
+            logger.debug("[virtual card%s] PCI BDF: %s / DevID: %s (%s)",
+                          dev.pci_info.minor_number, dev.pci_info.bdf, dev.pci_info.devid, dev.gpu_model)
+
     @Decorators.timeout_signal
     def poweron(self) -> None:
         logger.debug('Powering on VM%s', self.vmnum)
@@ -263,7 +274,7 @@ class VirtualMachine(MachineInterface):
             logger.warning('VM%s already running', self.vmnum)
             return
 
-        command = self.__get_popen_command()
+        command = self.__prepare_qemu_command()
         # We don't want to kill the process created here (like 'with' would do) so disable the following linter issue:
         # R1732: consider-using-with (Consider using 'with' for resource-allocating operations)
         # pylint: disable=R1732
@@ -292,8 +303,8 @@ class VirtualMachine(MachineInterface):
                 logger.info('waiting for socket')
                 time.sleep(1)
             # Passing five minutes timeout for every command
-            self.ga = GuestAgentBackend(self.questagent_sockpath, 300)
-            self.qm = QmpMonitor(self.qmp_sockpath, 300)
+            self.ga = GuestAgentBackend(str(self.questagent_sockpath), 300)
+            self.qm = QmpMonitor(str(self.qmp_sockpath), 300)
             vm_status = self.qm.query_status()
 
             if not self.migrate_destination_vm and vm_status != 'running':
@@ -317,6 +328,7 @@ class VirtualMachine(MachineInterface):
 
     @Decorators.timeout_signal
     def poweroff(self) -> None:
+        """Power off VM via the Guest-Agent guest-shutdown(powerdown) command."""
         logger.debug('Powering off VM%s', self.vmnum)
         assert self.process
         if not self.is_running():
@@ -338,8 +350,9 @@ class VirtualMachine(MachineInterface):
 
             if self.__sockets_exists():
                 # Remove leftovers and notify about unclear qemu shutdown
-                os.remove(self.questagent_sockpath)
-                os.remove(self.qmp_sockpath)
+                self.questagent_sockpath.unlink()
+                self.qmp_sockpath.unlink()
+                logger.error('VM%s was not gracefully powered off - sockets exist', self.vmnum)
                 raise exceptions.GuestError(f'VM{self.vmnum} was not gracefully powered off - sockets exist')
 
     def reboot(self) -> None:
@@ -441,7 +454,7 @@ class VirtualMachine(MachineInterface):
         ret = execout.get('return')
         if ret:
             pid: int = ret.get('pid')
-            logger.debug('Running %s on VM%s with pid %s', command, self.vmnum, pid)
+            logger.debug("Run command on VM%s: %s (PID: %s)", self.vmnum, command, pid)
             return pid
 
         logger.error('Command %s did not return pid', command)
@@ -519,9 +532,12 @@ class VirtualMachine(MachineInterface):
         return True
 
     def dir_list(self, path: str) -> typing.List[str]:
-        # TODO: implement, currently no-op to fulfill MachineInterface requirement
-        logger.warning("VirtualMachine.dir_list() is not implemented yet!")
-        return []
+        pid = self.execute(f'/bin/sh -c "ls {path}"')
+        status: ProcessResult = self.execute_wait(pid)
+        if status.exit_code:
+            raise exceptions.GuestError(f'VM ls failed - error: {status.exit_code}')
+
+        return status.stdout.split()
 
     def link_exists(self, path: str) -> bool:
         pid = self.execute(f'/bin/sh -c "[ -h {path} ]"')
@@ -571,45 +587,3 @@ class VirtualMachine(MachineInterface):
             raise exceptions.GuestError(f'VM{self.vmnum} state load error: {job_error}')
 
         logger.debug('VM state load finished successfully')
-
-    # helper_convert_units_to_bytes - convert size with units to bytes
-    # @size_str: multiple-byte unit size with suffix (K/M/G)
-    # Returns: size in bytes
-    # TODO: function perhaps could be moved to some new utils module
-    # improve - consider regex to handle various formats eg. both M and MB
-    def helper_convert_units_to_bytes(self, size_str: str) -> int:
-        size_str = size_str.upper()
-        size_int = 0
-
-        if size_str.endswith('B'):
-            size_int = int(size_str[0:-1])
-        elif size_str.endswith('K'):
-            size_int = int(size_str[0:-1]) * 1024
-        elif size_str.endswith('M'):
-            size_int = int(size_str[0:-1]) * 1024**2
-        elif size_str.endswith('G'):
-            size_int = int(size_str[0:-1]) * 1024**3
-
-        return size_int
-
-    # helper_get_debugfs_selfconfig - read resources allocated to VF from debugfs:
-    # /sys/kernel/debug/dri/@card/gt@gt_num/iov/self_config
-    # @card: card number
-    # @gt_num: GT instance number
-    def helper_get_debugfs_selfconfig(self, card: int = 0, gt_num: int = 0) -> None:
-        path = posixpath.join(f'/sys/kernel/debug/dri/{card}/gt{gt_num}/iov/self_config')
-        out = self.read_file_content(path)
-
-        for line in out.splitlines():
-            param, value = line.split(':')
-
-            if param == 'GGTT size':
-                self._ggtt_size = self.helper_convert_units_to_bytes(value)
-            elif param == 'LMEM size':
-                self._lmem_size = self.helper_convert_units_to_bytes(value)
-            elif param == 'contexts':
-                self._contexts = int(value)
-            elif param == 'doorbells':
-                self._doorbells = int(value)
-            elif param == 'tile mask':
-                self._tile_mask = int(value, base=16)
diff --git a/tools/vmtb/vmm_flows/conftest.py b/tools/vmtb/vmm_flows/conftest.py
index a2a6b6680..ae149d652 100644
--- a/tools/vmtb/vmm_flows/conftest.py
+++ b/tools/vmtb/vmm_flows/conftest.py
@@ -16,7 +16,6 @@ from bench.configurators.vgpu_profile_config import (VfProvisioningMode,
                                                      VfSchedulingMode,
                                                      VgpuProfileConfigurator)
 from bench.configurators.vmtb_config import VmtbConfigurator
-from bench.helpers.helpers import modprobe_driver, modprobe_driver_check
 from bench.helpers.log import HOST_DMESG_FILE
 from bench.machines.host import Device, Host
 from bench.machines.virtual.vm import VirtualMachine
@@ -273,9 +272,9 @@ def fixture_setup_vms(get_vmtb_config, get_cmdline_config, get_host, request):
         ts.poweron_vms()
 
         if tc.auto_probe_vm_driver:
-            modprobe_cmds = [modprobe_driver(vm) for vm in ts.get_vm]
-            for i, cmd in enumerate(modprobe_cmds):
-                assert modprobe_driver_check(ts.get_vm[i], cmd), f'modprobe failed on VM{i}'
+            for vm in ts.get_vm:
+                vm.load_drm_driver()
+                vm.discover_devices()
 
     logger.info('[Test execution: %sVF-%sVM]', num_vfs, num_vms)
     yield ts
-- 
2.39.1

next prev parent reply	other threads:[~2026-02-24  8:24 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-24  7:50 [PATCH i-g-t 00/10] vmtb: Modernize SR-IOV VM Test Bench core Adam Miszczak
2026-02-24  7:50 ` [PATCH i-g-t 01/10] tools/vmtb: Update QEMU parameters Adam Miszczak
2026-03-10 10:22   ` Bernatowicz, Marcin
2026-02-24  7:50 ` [PATCH i-g-t 02/10] tools/vmtb: Fix DUT selection based on card index Adam Miszczak
2026-03-10 10:26   ` Bernatowicz, Marcin
2026-02-24  7:50 ` [PATCH i-g-t 03/10] tools/vmtb: Fix VM snapshot query handling Adam Miszczak
2026-03-10 10:29   ` Bernatowicz, Marcin
2026-02-24  7:50 ` [PATCH i-g-t 04/10] tools/vmtb: Extend IGT and WSIM abstractions Adam Miszczak
2026-03-10 10:36   ` Bernatowicz, Marcin
2026-02-24  7:50 ` [PATCH i-g-t 05/10] tools/vmtb: VF auto/fair provisioning support Adam Miszczak
2026-03-10 10:38   ` Bernatowicz, Marcin
2026-02-24  7:50 ` [PATCH i-g-t 06/10] tools/vmtb: Refactor driver interfaces Adam Miszczak
2026-03-10 10:43   ` Bernatowicz, Marcin
2026-02-24  7:50 ` [PATCH i-g-t 07/10] tools/vmtb: Introduce VirtualDevice class Adam Miszczak
2026-03-10 10:45   ` Bernatowicz, Marcin
2026-02-24  7:50 ` Adam Miszczak [this message]
2026-03-10 10:47   ` [PATCH i-g-t 08/10] tools/vmtb: Redesign VirtualMachine class Bernatowicz, Marcin
2026-02-24  7:50 ` [PATCH i-g-t 09/10] tools/vmtb: Support max VFs configuration Adam Miszczak
2026-03-10 10:52   ` Bernatowicz, Marcin
2026-02-24  7:50 ` [PATCH i-g-t 10/10] tools/vmtb: Platform enabling: PTL and BMG support Adam Miszczak
2026-03-10 10:52   ` Bernatowicz, Marcin
2026-02-24 11:49 ` ✓ Xe.CI.BAT: success for vmtb: Modernize SR-IOV VM Test Bench core Patchwork
2026-02-24 12:43 ` ✓ i915.CI.BAT: " Patchwork
2026-02-24 16:27 ` ✗ i915.CI.Full: failure " Patchwork
2026-02-24 20:21 ` ✗ Xe.CI.FULL: " Patchwork

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:312e87e4 dfblob:2469af96 dfblob:a2a6b668 dfblob:ae149d65 )
 OR (
bs:"[PATCH i-g-t 08/10] tools/vmtb: Redesign VirtualMachine class" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260224075027.2409675-9-adam.miszczak@linux.intel.com \
    --to=adam.miszczak@linux.intel.com \
    --cc=igt-dev@lists.freedesktop.org \
    --cc=kamil.konieczny@linux.intel.com \
    --cc=marcin.bernatowicz@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox