From: "Bernatowicz, Marcin" <marcin.bernatowicz@linux.intel.com>
To: Adam Miszczak <adam.miszczak@linux.intel.com>,
igt-dev@lists.freedesktop.org
Cc: kamil.konieczny@linux.intel.com
Subject: Re: [PATCH i-g-t 3/3] tools/vmtb: Add VF migration tests
Date: Thu, 14 May 2026 15:21:19 +0200 [thread overview]
Message-ID: <bc707c43-dd34-4ab1-8ce3-dde055b2fc2c@linux.intel.com> (raw)
In-Reply-To: <20260416083544.2441874-4-adam.miszczak@linux.intel.com>
On 4/16/2026 10:35 AM, Adam Miszczak wrote:
> Introduce a comprehensive VF migration (state save/restore) test suite,
> covering the following scenarios:
> - idle migration: no GPU workload active during save/restore
> - idle app migration: GPU contexts created but idle during save/restore
> - busy migration (WSIM): short/long preemptable and non-preemptable batches
> - busy migration (IGT): xe_exec_reset, xe_exec_threads, xe_ccs,
> xe_compute_preempt workloads in multiple variants
> - double migration: re-migration while post-restore resource fixup
> (resfix) is in progress, tested at a few KMD debug checkpoints
> - checkpointing: restore a VM state saved at an earlier point in time
> - auxiliary: basic pause/resume exercise, migrate without VF driver loaded
>
> Tests can be executed in a variety of configurations:
> VF/VM counts, auto or vGPU profile provisioning, and scheduling modes.
>
> Signed-off-by: Adam Miszczak <adam.miszczak@linux.intel.com>
> ---
> tools/vmtb/vmm_flows/test_migration.py | 1199 ++++++++++++++++++++++++
> 1 file changed, 1199 insertions(+)
> create mode 100644 tools/vmtb/vmm_flows/test_migration.py
>
> diff --git a/tools/vmtb/vmm_flows/test_migration.py b/tools/vmtb/vmm_flows/test_migration.py
> new file mode 100644
> index 000000000..8a3f10d52
> --- /dev/null
> +++ b/tools/vmtb/vmm_flows/test_migration.py
> @@ -0,0 +1,1199 @@
> +# SPDX-License-Identifier: MIT
> +# Copyright © 2024-2026 Intel Corporation
> +
> +import enum
> +import logging
> +import random
> +import time
> +from dataclasses import dataclass
> +from typing import List, Tuple
> +
> +import pytest
> +
> +from bench import exceptions
> +from bench.configurators.vgpu_profile_config import VfProvisioningMode, VfSchedulingMode
> +from bench.executors.gem_wsim import ONE_CYCLE_DURATION_MS, PREEMPT_10MS_WORKLOAD, GemWsim
> +from bench.executors.igt import IgtExecutor, IgtType
> +from bench.executors.shell import ShellExecutor
> +from bench.helpers.helpers import (cmd_run_check, driver_check,
> + duplicate_vm_image, igt_check,
> + igt_run_check, modprobe_driver_run_check)
> +from bench.machines.host import Host
> +from bench.machines.virtual.vm import VirtualMachine
> +from vmm_flows.conftest import (VmmTestingConfig, VmmTestingSetup,
> + idfn_test_config)
> +
> +logger = logging.getLogger(__name__)
> +
> +IGT_INIT_DELAY = 6 # Seconds between WL start and VM pause (pre-save)
> +IGT_RESTORE_DELAY = 3 # Seconds between VM resume and WL status check (post-restore)
> +MS_IN_SEC = 1000
> +
> +
> +# Full configuration variant: 1xVF and 2xVF with auto and vGPU profile provisioning
> +# TODO: add MAXxVF variants
> +test_variants_full = [(1, VfProvisioningMode.AUTO, VfSchedulingMode.DEFAULT_PROFILE),
> + (2, VfProvisioningMode.AUTO, VfSchedulingMode.DEFAULT_PROFILE),
> + (1, VfProvisioningMode.VGPU_PROFILE, VfSchedulingMode.DEFAULT_PROFILE),
> + (2, VfProvisioningMode.VGPU_PROFILE, VfSchedulingMode.DEFAULT_PROFILE)]
> +
> +
> +# Basic configuration variant: 1xVF and 2xVF with auto provisioning
> +test_variants_basic = [(1, VfProvisioningMode.AUTO, VfSchedulingMode.DEFAULT_PROFILE),
> + (2, VfProvisioningMode.AUTO, VfSchedulingMode.DEFAULT_PROFILE)]
> +
> +
> +# vGPU profile configuration variant: 1xVF and 2xVF with vGPU profile provisioning
> +test_variants_profiles = [(1, VfProvisioningMode.VGPU_PROFILE, VfSchedulingMode.DEFAULT_PROFILE),
> + (2, VfProvisioningMode.VGPU_PROFILE, VfSchedulingMode.DEFAULT_PROFILE)]
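test_variants_full above is exactly test_variants_basic followed by test_variants_profiles. A minimal sketch of deriving it by concatenation so the three lists cannot drift apart; the two enums are hypothetical stand-ins for VfProvisioningMode and VfSchedulingMode from bench.configurators.vgpu_profile_config, reduced to the members used in this file:

```python
import enum
from typing import List, Tuple


class VfProvisioningMode(enum.Enum):  # hypothetical stand-in for the bench enum
    AUTO = enum.auto()
    VGPU_PROFILE = enum.auto()


class VfSchedulingMode(enum.Enum):  # hypothetical stand-in for the bench enum
    DEFAULT_PROFILE = enum.auto()


Variant = Tuple[int, VfProvisioningMode, VfSchedulingMode]

# 1xVF and 2xVF with auto provisioning
test_variants_basic: List[Variant] = [
    (num_vfs, VfProvisioningMode.AUTO, VfSchedulingMode.DEFAULT_PROFILE) for num_vfs in (1, 2)]

# 1xVF and 2xVF with vGPU profile provisioning
test_variants_profiles: List[Variant] = [
    (num_vfs, VfProvisioningMode.VGPU_PROFILE, VfSchedulingMode.DEFAULT_PROFILE) for num_vfs in (1, 2)]

# Full set: both lists, in the same order as the hand-written test_variants_full
test_variants_full: List[Variant] = test_variants_basic + test_variants_profiles
```

Adding the MAXxVF variants from the TODO would then only touch the two source lists.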
> +
> +
> +@dataclass
> +class MigrationWorkloadWsim:
> + workload_file: str # Wsim workload descriptor file
> + num_clients: int # Fork N clients emitting the workload simultaneously
> + num_repeats: int # How many times to emit the workload
> +
> + def __str__(self) -> str:
> + return f'WL:{self.workload_file}-(C:{self.num_clients} R:{self.num_repeats})'
> +
> +
> +# VF busy migration WSIM workloads (payload for TestBusyMigrationWsim[N]):
> +wsim_idle_app = MigrationWorkloadWsim('idle_ctxs', 1, 1)
> +wsim_short_preempt = MigrationWorkloadWsim('short_preempt', 1, 4000) # 5ms * 4000 (20s)
> +wsim_short_nonpreempt = MigrationWorkloadWsim('short_nonpreempt', 1, 4000)
> +wsim_long_preempt = MigrationWorkloadWsim('long_preempt', 1, 200) # 100ms * 200 (20s)
> +wsim_long_nonpreempt = MigrationWorkloadWsim('long_nonpreempt', 1, 200)
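The inline comments derive the run time of each WSIM workload as per-cycle duration times repeat count (5ms x 4000 and 100ms x 200, both about 20s). A sketch that makes this arithmetic explicit; the cycle_ms field and expected_duration_s() method are hypothetical additions, not part of the patch, and the per-cycle values are taken from the comments rather than measured:

```python
from dataclasses import dataclass


@dataclass
class MigrationWorkloadWsim:
    workload_file: str  # Wsim workload descriptor file
    num_clients: int    # Number of clients emitting the workload simultaneously
    num_repeats: int    # How many times to emit the workload
    cycle_ms: int = 0   # Hypothetical: approximate duration of one workload cycle

    def expected_duration_s(self) -> float:
        """Approximate run time of one client, in seconds."""
        return self.cycle_ms * self.num_repeats / 1000


wsim_short_preempt = MigrationWorkloadWsim('short_preempt', 1, 4000, cycle_ms=5)
wsim_long_preempt = MigrationWorkloadWsim('long_preempt', 1, 200, cycle_ms=100)
```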
> +
> +
> +@dataclass
> +class MigrationWorkloadIgt:
> + igt_test: IgtType # IGT test type
> + num_repeats: int = 1 # Number of repeats of the IGT test (calibrated at runtime)
> +
> + def __str__(self) -> str:
> + return f'WL:{self.igt_test}'
> +
> +
> +# VF busy migration IGT workloads (payload for TestBusyMigrationIgt[M]):
> +# xe_exec_reset/long_spin subtests:
> +# Average exec time: 12-13s - execute 1x
> +igt_exec_reset_long_spin_many_preempt = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_LONG_SPIN_MANY_PREEMPT)
> +igt_exec_reset_long_spin_many_preempt_media = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_LONG_SPIN_MANY_PREEMPT_MEDIA)
> +igt_exec_reset_long_spin_many_preempt_threads = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_LONG_SPIN_MANY_PREEMPT_THREADS)
> +igt_exec_reset_long_spin_many_preempt_gt0_threads = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_LONG_SPIN_MANY_PREEMPT_GT0_THREADS)
> +# Average exec time: 6-7s - execute 2x
> +igt_exec_reset_long_spin_many_preempt_gt1_threads = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_LONG_SPIN_MANY_PREEMPT_GT1_THREADS)
> +
> +# Average exec time: 12-13s - execute 1x
> +igt_exec_reset_long_spin_reuse_many_preempt = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_LONG_SPIN_REUSE_MANY_PREEMPT)
> +igt_exec_reset_long_spin_reuse_many_preempt_media = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_LONG_SPIN_REUSE_MANY_PREEMPT_MEDIA)
> +igt_exec_reset_long_spin_reuse_many_preempt_threads = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_LONG_SPIN_REUSE_MANY_PREEMPT_THREADS)
> +igt_exec_reset_long_spin_reuse_many_preempt_gt0_threads = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_LONG_SPIN_REUSE_MANY_PREEMPT_GT0_THREADS)
> +# Average exec time: 6-7s - execute 2x
> +igt_exec_reset_long_spin_reuse_many_preempt_gt1_threads = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_LONG_SPIN_REUSE_MANY_PREEMPT_GT1_THREADS)
> +
> +# Average exec time: 12-13s - execute 1x
> +igt_exec_reset_long_spin_sys_reuse_many_preempt_threads = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_LONG_SPIN_SYS_REUSE_MANY_PREEMPT_THREADS)
> +igt_exec_reset_long_spin_comp_reuse_many_preempt_threads = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_LONG_SPIN_COMP_REUSE_MANY_PREEMPT_THREADS)
> +
> +# xe_exec_reset/cancel subtests:
> +# Average exec time: 5-7s - execute 2x
> +igt_exec_reset_cancel = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_CANCEL)
> +igt_exec_reset_cancel_preempt = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_CANCEL_PREEMPT)
> +# Average exec time: 10-15s - execute 1x
> +igt_exec_reset_cancel_timeslice_preempt = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_CANCEL_TIMESLICE_PREEMPT)
> +# Average exec time: 20-25s - execute 1x
> +igt_exec_reset_cancel_timeslice_many_preempt = MigrationWorkloadIgt(
> + IgtType.EXEC_RESET_CANCEL_TIMESLICE_MANY_PREEMPT)
> +
> +# xe_exec_threads subtests (short, execute in a loop):
> +# Average exec time: <500ms
> +igt_exec_threads_basic = MigrationWorkloadIgt(
> + IgtType.EXEC_THREADS_BASIC)
> +igt_exec_threads_bal_basic = MigrationWorkloadIgt(
> + IgtType.EXEC_THREADS_BAL_BASIC)
> +# Average exec time: 1-2s
> +igt_exec_threads_cm_userptr_invalidate = MigrationWorkloadIgt(
> + IgtType.EXEC_THREADS_CM_USERPTR_INVALIDATE)
> +igt_exec_threads_bal_mixed_userptr_invalidate = MigrationWorkloadIgt(
> + IgtType.EXEC_THREADS_BAL_MIXED_USERPTR_INVALIDATE)
> +# Average exec time: 1-4s
> +igt_exec_threads_many_queues = MigrationWorkloadIgt(
> + IgtType.EXEC_THREADS_MANY_QUEUES)
> +
> +# xe_ccs subtest (short, execute in a loop):
> +# Average exec time: 200-600ms
> +igt_ccs_block_copy_compressed = MigrationWorkloadIgt(
> + IgtType.CCS_BLOCK_COPY_COMPRESSED)
> +
> +# xe_compute_preempt subtest (short, execute in a loop):
> +# Average exec time: 1.8-2s
> +igt_compute_preempt_many = MigrationWorkloadIgt(
> + IgtType.COMPUTE_PREEMPT_MANY)
> +
> +
> +class BaseTestBusyMigration:
> + """Base class for busy migration tests (with a workload executing).
> +
> + Implements the VF state save and restore subtests, and supports parametrization
> + with different numbers of VMs and various IGT/WSIM workload types.
> +
> + Intended to be subclassed by separate test classes, each with a specific workload,
> + to avoid running all workload variants in bulk on the same VM setup.
> + """
> +
> + # State save result flag: test_restore runs only if the preceding test_save succeeded
> + test_save_failed = True
> +
> + def __calibrate_igt_wl(self, vm: VirtualMachine, igt_wl: MigrationWorkloadIgt):
> + logger.info("Starting %s test loop calibration for migration workload", igt_wl.igt_test)
> + igt_exec = IgtExecutor(vm, igt_wl.igt_test)
> + assert igt_exec.check_results(), 'Calibration IGT run failed'
> +
> + results_log = igt_exec.get_results_log()
> + igt_exec_time: float = round(results_log['time_elapsed']['end'] - results_log['time_elapsed']['start'], 3)
> +
> + # Adjust IGT workload loop to execute longer than pre-save wait (with additional margins)
> + if igt_exec_time < IGT_INIT_DELAY + 2:
> + igt_wl.num_repeats = int(IGT_INIT_DELAY * 2 / igt_exec_time) + 1
> +
> + logger.debug("Calibrated IGT workload loop: %s iteration(s) x ~%ss", igt_wl.num_repeats, igt_exec_time)
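The calibration rule above, extracted as a pure function so the arithmetic can be checked in isolation. This is a sketch, not the patch's code: calibrate_repeats is a hypothetical name, and the max() guard against a zero elapsed time (possible after round(..., 3) for sub-millisecond runs) is an addition:

```python
IGT_INIT_DELAY = 6  # Seconds between workload start and VM pause (pre-save)


def calibrate_repeats(exec_time_s: float, current_repeats: int = 1) -> int:
    """Return how many times to loop an IGT test so it outlives the pre-save delay.

    Mirrors the rule in __calibrate_igt_wl: tests that already run for at least
    IGT_INIT_DELAY + 2 seconds keep their repeat count; shorter ones are looped
    enough times to cover roughly twice the pre-save delay.
    """
    if exec_time_s >= IGT_INIT_DELAY + 2:
        return current_repeats
    exec_time_s = max(exec_time_s, 0.001)  # Avoid division by zero for sub-ms runs
    return int(IGT_INIT_DELAY * 2 / exec_time_s) + 1
```

With this rule a ~6.5s run yields 2 repeats, consistent with the "execute 2x" comments in the workload table, while a ~0.5s xe_exec_threads run is looped 25 times.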
> +
> + @pytest.fixture(scope='class', name='run_source_workload')
> + def fixture_run_source_workload(self, setup_vms, set_migration_wl):
> + ts: VmmTestingSetup = setup_vms
> + vm_src: VirtualMachine = ts.get_vm[0] # First VM as source
> + migration_wl = set_migration_wl # WSIM/IGT Workload variant
> +
> + if isinstance(migration_wl, MigrationWorkloadWsim):
> + wsim_file_path = ts.wsim_wl_dir / f'{migration_wl.workload_file}.wsim' # Workload descriptor file path
> + if not wsim_file_path.exists():
> + logger.error("gem_wsim workload file %s not available!", wsim_file_path)
> + raise exceptions.GemWsimError(f'gem_wsim workload file {wsim_file_path} not available!')
> +
> + # Start the gem_wsim workload pre-migration; its completion is checked post-migration
> + return GemWsim(vm_src, migration_wl.num_clients, migration_wl.num_repeats, workload=wsim_file_path)
> +
> + if isinstance(migration_wl, MigrationWorkloadIgt):
> + self.__calibrate_igt_wl(vm_src, migration_wl)
> + return IgtExecutor(vm_src, migration_wl.igt_test, migration_wl.num_repeats)
> +
> + logger.error("Invalid workload type passed to run_source_workload fixture")
> + raise exceptions.BenchError('Invalid workload type passed to run_source_workload fixture')
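run_source_workload picks the executor with chained isinstance() checks. If more workload kinds are added, one alternative is functools.singledispatch. A sketch assuming the descriptors stay simple dataclasses; the classes below are stand-ins, and the handlers return strings instead of constructing GemWsim/IgtExecutor:

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class MigrationWorkloadWsim:  # Stand-in for the patch's dataclass
    workload_file: str
    num_clients: int
    num_repeats: int


@dataclass
class MigrationWorkloadIgt:  # Stand-in; igt_test is a plain string here, not IgtType
    igt_test: str
    num_repeats: int = 1


@singledispatch
def start_workload(workload, vm_name: str):
    """Fallback for unsupported descriptor types (mirrors the BenchError branch)."""
    raise TypeError(f'Invalid workload type: {type(workload).__name__}')


@start_workload.register
def _(workload: MigrationWorkloadWsim, vm_name: str) -> str:
    # The real fixture would build GemWsim(vm, clients, repeats, workload=path)
    return f'gem_wsim on {vm_name}: {workload.workload_file}.wsim'


@start_workload.register
def _(workload: MigrationWorkloadIgt, vm_name: str) -> str:
    # The real fixture would calibrate, then build IgtExecutor(vm, test, repeats)
    return f'igt on {vm_name}: {workload.igt_test} x{workload.num_repeats}'
```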
> +
> + @pytest.fixture(scope='function', name='setup_destination_vm')
> + def fixture_setup_destination_vm(self, setup_vms):
> + ts: VmmTestingSetup = setup_vms
> + vm_src: VirtualMachine = ts.get_vm[0] # First VM as a source
> + vm_dst: VirtualMachine = ts.get_vm[-1] # Last VM as a destination
> + num_vms = ts.get_num_vms()
> +
> + if num_vms == 1:
> + logger.debug("Single VM: the same source and destination VM instance")
> + assert vm_src == vm_dst
> + return vm_dst
> +
> + logger.debug("Multiple VMs: reload destination VM with the source image (with state snapshot)")
> +
> + if vm_src.is_running():
> + # Use QMP 'quit' for a paused VM (it cannot be powered off via the guest agent)
> + vm_src.quit()
> +
> + if vm_dst.is_running():
> + vm_dst.quit()
> + while vm_dst.is_running():
> + time.sleep(1) # VM usually doesn't terminate immediately
> +
> + # Re-start destination VM with an image containing a state snapshot
> + vm_dst.set_migration_source(vm_src.image)
> + vm_dst.poweron()
> +
> + return vm_dst
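The `while vm_dst.is_running(): time.sleep(1)` loop has no upper bound, so a VM that never exits would hang the whole session. A sketch of a bounded poll; wait_until and its default timeout are hypothetical, not part of the bench API:

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll predicate() until it returns True or timeout_s elapses.

    Returns True on success and False on timeout, so the caller can assert or raise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

In the fixture this could read `assert wait_until(lambda: not vm_dst.is_running()), 'Destination VM did not terminate'`.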
> +
> + def test_save(self, setup_vms, run_source_workload):
> + logger.info("Test VM busy migration: state save")
> + # Reset the flag for this test class (and parametrized config) before each save attempt
> + type(self).test_save_failed = True
> + ts: VmmTestingSetup = setup_vms
> + vm_src: VirtualMachine = ts.get_vm[0] # First VM as source
> +
> + logger.debug("Run the in-flight migration workload on source VM")
> + migration_wl = run_source_workload
> + time.sleep(IGT_INIT_DELAY)
> + assert migration_wl.is_running(), 'IGT/wsim migration workload is not running on source VM'
> +
> + # Pause VM and save snapshot
> + logger.debug("Pause execution and save source VM state")
> + try:
> + vm_src.pause()
> + vm_src.save_state()
> + except exceptions.GuestError as exc:
> + logger.error("State save error: %s", exc)
> + assert False, 'VF migration failed on save'
> +
> + logger.debug("Resume execution on source VM")
> + vm_src.resume()
> +
> + assert migration_wl.check_results(), 'VF migration workload failed on source VM (post-save)'
> +
> + if ts.get_num_vms() > 1:
> + logger.debug("Multiple VMs: shut down source VM")
> + vm_src.poweroff()
> +
> + # Per-class flag: a base-class attribute would be shared by all TestBusyMigration* subclasses
> + type(self).test_save_failed = False
> +
> + def test_restore(self, setup_vms, setup_destination_vm, run_source_workload):
> + logger.info("Test VM busy migration: state restore")
> + if type(self).test_save_failed:
> + logger.error("State save failed - restore is pointless (fail immediately)")
> + assert False, 'test_save subtest failed - do not execute test_restore'
> +
> + ts: VmmTestingSetup = setup_vms
> + vm_dst: VirtualMachine = setup_destination_vm
> + migration_wl = run_source_workload # Same workload instance that was started in test_save
> +
> + # Retarget the source IgtExecutor/GemWsim instance to the destination VM
> + migration_wl.target = vm_dst
> + if isinstance(migration_wl, IgtExecutor):
> + # Clear IGT test results cache - remove post-save source VM results
> + # TODO: implement common IgtExecutor/GemWsim results clear interface to avoid instance type check
> + migration_wl.results.clear()
> +
> + # Load the source state snapshot
> + logger.debug("Restore source state on the destination VM")
> + vm_dst.load_state()
> + vm_dst.resume()
> +
> + # TODO: add sync to VM class
> + sync_value = random.randint(1, 0xFFFF)
> + assert vm_dst.ga.sync(sync_value)['return'] == sync_value
> +
> + assert migration_wl.is_running(), 'IGT/wsim migration workload is not running on destination VM'
> + time.sleep(IGT_RESTORE_DELAY)
> +
> + assert migration_wl.check_results(), 'VF migration workload failed on destination VM (post-restore)'
> +
> + logger.debug("Check driver health on host and destination VM")
> + assert driver_check(ts.host)
> + assert driver_check(vm_dst)
> +
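A flag written as BaseTestBusyMigration.test_save_failed is a single attribute shared by every subclass, so one successful save clears it for all later classes, whereas type(self).test_save_failed creates a separate attribute on each concrete test class. A minimal illustration with hypothetical classes (plain methods instead of pytest tests):

```python
class SharedFlagBase:
    save_failed = True

    def save(self, ok: bool) -> None:
        if ok:
            SharedFlagBase.save_failed = False  # Written on the base class: shared by all subclasses

    def may_restore(self) -> bool:
        return not SharedFlagBase.save_failed


class PerClassFlagBase:
    save_failed = True

    def save(self, ok: bool) -> None:
        type(self).save_failed = True  # Reset for this class before each attempt
        if ok:
            type(self).save_failed = False  # Only the concrete class is affected

    def may_restore(self) -> bool:
        return not type(self).save_failed


class SharedA(SharedFlagBase):
    pass


class SharedB(SharedFlagBase):
    pass


class PerClassA(PerClassFlagBase):
    pass


class PerClassB(PerClassFlagBase):
    pass
```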
> +
> +@pytest.fixture(scope='class', name='set_migration_wl')
> +def fixture_set_migration_wl(request):
> + """Provide the IGT/WSIM workload used for migration in a TestBusyMigration* class."""
> + # Workload variant provided as a MigrationWorkloadWsim or MigrationWorkloadIgt instance
> + return request.param
> +
> +
> +def idfn_workload(workload: MigrationWorkloadWsim):
> + """Return the workload part of a parametrized test ID,
> + e.g. 'WL:short_preempt-(C:1 R:4000)' for a WSIM workload.
> + """
> + return str(workload)
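The parametrized IDs come straight from __str__ of the workload dataclasses, so idfn_workload accepts either descriptor type. A self-contained check with copies of both classes; igt_test is a plain string here rather than IgtType, so the IGT case only illustrates the 'WL:' prefix format and not how IgtType itself renders:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class MigrationWorkloadWsim:
    workload_file: str
    num_clients: int
    num_repeats: int

    def __str__(self) -> str:
        return f'WL:{self.workload_file}-(C:{self.num_clients} R:{self.num_repeats})'


@dataclass
class MigrationWorkloadIgt:
    igt_test: str  # Stand-in for IgtType
    num_repeats: int = 1

    def __str__(self) -> str:
        return f'WL:{self.igt_test}'


def idfn_workload(workload: Union[MigrationWorkloadWsim, MigrationWorkloadIgt]) -> str:
    """Return the workload part of a parametrized test ID."""
    return str(workload)
```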
> +
> +
> +def set_test_config(test_variants: List[Tuple[int, VfProvisioningMode, VfSchedulingMode]],
> + max_vms: int = 2, wa_reduce_vf_lmem: bool = False) -> List[VmmTestingConfig]:
> + """Helper function to provide a parametrized test with a list of test configuration variants."""
> + test_configs: List[VmmTestingConfig] = []
> +
> + for config in test_variants:
> + (num_vfs, provisioning_mode, scheduling_mode) = config
> + test_configs.append(VmmTestingConfig(num_vfs, max_vms, provisioning_mode, scheduling_mode,
> + wa_reduce_vf_lmem=wa_reduce_vf_lmem))
> +
> + return test_configs
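set_test_config() expands each (num_vfs, provisioning, scheduling) tuple into one configuration object, with max_vms and the LMEM workaround flag shared by all of them. A self-contained sketch of that expansion as a list comprehension; the Config dataclass and its field names are hypothetical stand-ins for VmmTestingConfig, whose real definition lives in vmm_flows/conftest.py:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Config:  # Hypothetical stand-in for VmmTestingConfig
    num_vfs: int
    max_num_vms: int
    provisioning_mode: str
    scheduling_mode: str
    wa_reduce_vf_lmem: bool = False


def set_test_config(test_variants: List[Tuple[int, str, str]],
                    max_vms: int = 2, wa_reduce_vf_lmem: bool = False) -> List[Config]:
    """Build one config per variant tuple; max_vms and the WA flag apply to all."""
    return [Config(num_vfs, max_vms, provisioning, scheduling, wa_reduce_vf_lmem=wa_reduce_vf_lmem)
            for (num_vfs, provisioning, scheduling) in test_variants]
```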
> +
> +
> +# Busy migration TCs with WSIM workload
> +@pytest.mark.parametrize('set_migration_wl', [wsim_short_preempt],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_full),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationWsim1(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing short (5ms) preemptable batches.
> +
> + An IGT/WSIM workload started pre-migration keeps firing short submissions on each engine,
> + and the VM state is migrated (snapshot saved, then restored) while it executes.
> + After migration, the workload submits its remaining batches.
> + Executed in the following VM count variants:
> + - single VF/VM: the same VM acts as both source and destination.
> + - multiple VFs/VMs: the workload is started on the source VM,
> + then migrated to and verified on the destination VM.
> + """
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [wsim_short_nonpreempt],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationWsim2(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing short (5ms) non-preemptable batches.
> + Similar to TestBusyMigrationWsim1, but emits non-preemptable batches.
> + """
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [wsim_long_preempt],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationWsim3(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing relatively long (100ms) preemptable batches.
> +
> + An IGT/WSIM workload started pre-migration keeps firing relatively long submissions,
> + and the VM state is migrated (snapshot saved, then restored) while it executes.
> + After migration, the workload submits its remaining batches.
> + Executed in the following VM count variants:
> + - single VF/VM: the same VM acts as both source and destination.
> + - multiple VFs/VMs: the workload is started on the source VM,
> + then migrated to and verified on the destination VM.
> + """
> +
> +
> +# TODO: convert to negative scenario.
> +# Test is expected to fail because the non-preemptable batch execution time exceeds the preemption timeout (PT), see VLK-81241
> +@pytest.mark.parametrize('set_migration_wl', [wsim_long_nonpreempt],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationWsim4(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing relatively long (100ms) non-preemptable batches.
> + Similar to TestBusyMigrationWsim3, but emits non-preemptable batches.
> + """
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [wsim_idle_app],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestIdleAppMigration(BaseTestBusyMigration):
> + """Save-restore VM state with an idle VF but a user application attached (contexts created).
> +
> + An IGT/WSIM workload started pre-migration creates multiple user contexts and
> + submits a short batch on each, stays idle during the save-restore operation,
> + then resumes post-migration to submit more batches on the previously created contexts.
> + Executed in the following VM count variants:
> + - single VF/VM: the same VM acts as both source and destination.
> + - multiple VFs/VMs: the workload is started on the source VM,
> + then migrated to and verified on the destination VM.
> + """
> +
> +
> +# Busy migration TCs with IGT workload
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_many_preempt],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset1(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-many-preempt."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_many_preempt_media],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset2(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-many-preempt-media."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_many_preempt_threads],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_full),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset3(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-many-preempt-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_many_preempt_gt0_threads],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset4(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-many-preempt-gt0-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_many_preempt_gt1_threads],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset5(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-many-preempt-gt1-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_reuse_many_preempt],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset6(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-reuse-many-preempt."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_reuse_many_preempt_media],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset7(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-reuse-many-preempt-media."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_reuse_many_preempt_threads],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_full),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset8(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-reuse-many-preempt-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_reuse_many_preempt_gt0_threads],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset9(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-reuse-many-preempt-gt0-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_reuse_many_preempt_gt1_threads],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset10(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-reuse-many-preempt-gt1-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_sys_reuse_many_preempt_threads],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_full),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset11(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-sys-reuse-many-preempt-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_comp_reuse_many_preempt_threads],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_full),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset12(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-comp-reuse-many-preempt-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_cancel],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset13(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@cancel."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_cancel_preempt],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset14(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@cancel-preempt."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_cancel_timeslice_preempt],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset15(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@cancel-timeslice-preempt."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_cancel_timeslice_many_preempt],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset16(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_reset@cancel-timeslice-many-preempt."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_threads_basic],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_full),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecThreads1(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_threads@threads-basic."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_threads_bal_basic],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecThreads2(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_threads@threads-bal-basic."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_threads_cm_userptr_invalidate],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecThreads3(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_threads@threads-cm-userptr-invalidate."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_threads_many_queues],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_full),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecThreads4(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_threads@threads-many-queues."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_threads_bal_mixed_userptr_invalidate],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecThreads5(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_exec_threads@threads-bal-mixed-userptr-invalidate."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_ccs_block_copy_compressed],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtCcs(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_ccs@block-copy-compressed."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_compute_preempt_many],
> + ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtComputePreempt(BaseTestBusyMigration):
> + """Save-restore VM state with VF busy executing IGT xe_compute_preempt@compute-preempt-many (CCS path)."""
> +
> +
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestIdleMigration:
> + """Save-restore VM state with an idle VF and no user application attached.
> +
> + An IGT workload is run to completion twice, pre- and post-migration, but is not executing
> + during the save-restore operation.
> + Test setup:
> + - N VFs running N VM instances (the first, VM[0], acts as the source and the last, VM[N-1], as the destination)
> + - platform provisioned with the relevant vGPU profile M[N] (ATSM, ADLP) or C[N] (PVC)
> + - VF state is saved on the source VM and then restored on the destination VM instance
> + (for the single VF variant, the source and destination are the same VM instance)
> + """
> +
> + @pytest.fixture(scope='function', name='setup_destination_vm')
> + def fixture_setup_destination_vm(self, setup_vms):
> + ts: VmmTestingSetup = setup_vms
> + vm_src: VirtualMachine = ts.get_vm[0] # First VM as a source
> + vm_dst: VirtualMachine = ts.get_vm[-1] # Last VM as a destination
> + num_vms = ts.get_num_vms()
> +
> + if num_vms == 1:
> + logger.debug("Single VM: the same source and destination VM instance")
> + assert vm_src == vm_dst
> + return vm_dst
> +
> + logger.debug("Multiple VMs: reload destination VM with the source image (with state snapshot)")
> +
> + if vm_src.is_running():
> + # Use QMP 'quit' for a paused VM (it cannot be powered off via the guest agent)
> + vm_src.quit()
> +
> + if vm_dst.is_running():
> + vm_dst.quit()
> + while vm_dst.is_running():
> + time.sleep(1) # VM usually doesn't terminate immediately
> +
> + # Re-start destination VM with an image containing a state snapshot
> + vm_dst.set_migration_source(vm_src.image)
> + vm_dst.poweron()
> +
> + return vm_dst
> +
> + def test_save(self, setup_vms):
> + logger.info("Test VM idle migration: state save")
> + ts: VmmTestingSetup = setup_vms
> + vm_src: VirtualMachine = ts.get_vm[0] # First VM as source
> +
> + # Run an interactive, non-returning program (vim) to verify state after migration
> + src_proc = ShellExecutor(vm_src, 'vim migrate.txt')
> + source_proc = vm_src.execute_status(src_proc.pid)
> + logger.debug("Source process: %s", source_proc)
> + assert source_proc.exited is False, 'Source process is not running'
> +
> + logger.debug("Execute pre-migration workload on source VM")
> + assert igt_run_check(vm_src, IgtType.EXEC_STORE)
> +
> + # Pause VM and save snapshot
> + logger.debug("Pause execution and save VM state")
> + try:
> + vm_src.pause()
> + vm_src.save_state()
> + except exceptions.GuestError as exc:
> + logger.error("State save error: %s", exc)
> + assert False, 'VF migration failed on save'
> +
> + def test_restore(self, setup_vms, setup_destination_vm):
> + logger.info("Test VM idle migration: state restore")
> + ts: VmmTestingSetup = setup_vms
> + vm_dst: VirtualMachine = setup_destination_vm
> +
> + # Load the source state snapshot
> + logger.debug("Restore source state on the destination VM")
> + vm_dst.load_state()
> + vm_dst.resume()
> +
> + # Verify program initiated on source VM is still running after migration
> + pgrep_dst = ShellExecutor(vm_dst, 'pgrep -f "vim migrate.txt"')
> + pgrep_dst_result = vm_dst.execute_wait(pgrep_dst.pid)
> + assert pgrep_dst_result.exit_code == 0, 'Source process (vim) not found'
> + restored_proc = vm_dst.execute_status(int(pgrep_dst_result.stdout))
> + logger.debug("Restored process: %s", restored_proc)
> + assert restored_proc.exited is False, 'Restored process is not running'
> +
> + logger.debug("Execute post-migration workload on destination VM")
> + assert igt_run_check(vm_dst, IgtType.EXEC_STORE)
> +
> + logger.debug("Check driver health on host and destination VM")
> + assert driver_check(ts.host)
> + assert driver_check(vm_dst)
> +
> +
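The destination-VM fixtures wait for `vm_dst.is_running()` to turn false in an unbounded `while` loop, so a VM that never exits would hang the session instead of failing it. A minimal sketch of a bounded alternative; `wait_until()` is a hypothetical helper (not part of vmtb) and the timeout in the usage comment is an arbitrary placeholder:

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float, interval: float = 1.0) -> bool:
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Hypothetical helper, not part of vmtb. Returns True if the condition
    was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline
        time.sleep(min(interval, remaining))


# Possible use in the fixture (sketch, placeholder timeout):
#   vm_dst.quit()
#   assert wait_until(lambda: not vm_dst.is_running(), timeout=60), 'Destination VM did not terminate'
```

Using `time.monotonic()` keeps the deadline immune to wall-clock adjustments on the host.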
> +class ResfixWaitStage(enum.IntEnum):
> + # Resfix stopper checkpoints
> + VF_MIGRATION_CONTINUE = 0
> + VF_MIGRATION_WAIT_BEFORE_RESFIX_START = 1 << 0
> + VF_MIGRATION_WAIT_BEFORE_FIXUPS = 1 << 1
> + VF_MIGRATION_WAIT_BEFORE_RESTART_JOBS = 1 << 2
> + VF_MIGRATION_WAIT_BEFORE_RESFIX_DONE = 1 << 3
> +
> +
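`resfix_stoppers` behaves as a bitmask: each set bit holds recovery at the matching checkpoint, and 0 lets it continue. The helper below reads the value back with `int(..., 16)`, i.e. as hex. A small sketch that decodes such a value into checkpoint names, assuming the debugfs file returns a hex string (with or without a `0x` prefix); `decode_resfix_stoppers()` is hypothetical:

```python
import enum


class ResfixWaitStage(enum.IntEnum):
    # Copy of the checkpoint bits defined in the patch
    VF_MIGRATION_CONTINUE = 0
    VF_MIGRATION_WAIT_BEFORE_RESFIX_START = 1 << 0
    VF_MIGRATION_WAIT_BEFORE_FIXUPS = 1 << 1
    VF_MIGRATION_WAIT_BEFORE_RESTART_JOBS = 1 << 2
    VF_MIGRATION_WAIT_BEFORE_RESFIX_DONE = 1 << 3


def decode_resfix_stoppers(raw: str) -> list[str]:
    """Hypothetical: names of checkpoints whose bit is set in a hex debugfs value."""
    value = int(raw.strip(), 16)  # accepts '3', '0x3' and a trailing newline
    # The CONTINUE member (0) never matches, since value & 0 == 0
    return [stage.name for stage in ResfixWaitStage if value & stage]
```

On the write side, note that `str()` of an `IntEnum` member returns `ClassName.MEMBER` before Python 3.11 and the bare number from 3.11 on, whereas `str(int(stage))` yields the digits on every version.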
> +class MigrationToRestore(enum.Enum):
> + FIRST = 1
> + SECOND = 2
> +
> +
> +@dataclass
> +class DoubleMigrationConfig:
> + resfix_stoppers: ResfixWaitStage # Stage for migration RESFIX stop
> + migration_to_restore: MigrationToRestore # Migration snapshot to be restored after doubled save
> +
> + def __str__(self) -> str:
> + return f'RS:{hex(self.resfix_stoppers)}-MR:{self.migration_to_restore}'
> +
> +
> +double_migration_1_resfix_1 = DoubleMigrationConfig(
> + ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_RESFIX_START, MigrationToRestore.FIRST)
> +double_migration_1_resfix_2 = DoubleMigrationConfig(
> + ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_FIXUPS, MigrationToRestore.FIRST)
> +double_migration_1_resfix_3 = DoubleMigrationConfig(
> + ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_RESTART_JOBS, MigrationToRestore.FIRST)
> +double_migration_1_resfix_4 = DoubleMigrationConfig(
> + ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_RESFIX_DONE, MigrationToRestore.FIRST)
> +
> +
> +double_migration_2_resfix_1 = DoubleMigrationConfig(
> + ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_RESFIX_START, MigrationToRestore.SECOND)
> +double_migration_2_resfix_2 = DoubleMigrationConfig(
> + ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_FIXUPS, MigrationToRestore.SECOND)
> +double_migration_2_resfix_3 = DoubleMigrationConfig(
> + ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_RESTART_JOBS, MigrationToRestore.SECOND)
> +double_migration_2_resfix_4 = DoubleMigrationConfig(
> + ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_RESFIX_DONE, MigrationToRestore.SECOND)
> +
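The eight `double_migration_*` objects enumerate every pairing of the two `MigrationToRestore` values with the four resfix checkpoints. As a sketch only, the same list could be generated with `itertools.product`; the types below are trimmed copies of the ones in the patch (the non-stopper `VF_MIGRATION_CONTINUE` member is left out), and `double_migration_configs` is a hypothetical name. The comment at the end shows the ID format `idfn_double_migration()` yields, where `MR:` renders the full enum member:

```python
import enum
import itertools
from dataclasses import dataclass


class ResfixWaitStage(enum.IntEnum):
    VF_MIGRATION_WAIT_BEFORE_RESFIX_START = 1 << 0
    VF_MIGRATION_WAIT_BEFORE_FIXUPS = 1 << 1
    VF_MIGRATION_WAIT_BEFORE_RESTART_JOBS = 1 << 2
    VF_MIGRATION_WAIT_BEFORE_RESFIX_DONE = 1 << 3


class MigrationToRestore(enum.Enum):
    FIRST = 1
    SECOND = 2


@dataclass
class DoubleMigrationConfig:
    resfix_stoppers: ResfixWaitStage
    migration_to_restore: MigrationToRestore

    def __str__(self) -> str:
        return f'RS:{hex(self.resfix_stoppers)}-MR:{self.migration_to_restore}'


# Snapshot-major order, matching the hand-written double_migration_{1,2}_resfix_{1..4} list
double_migration_configs = [
    DoubleMigrationConfig(stage, snapshot)
    for snapshot, stage in itertools.product(MigrationToRestore, ResfixWaitStage)
]
# str(double_migration_configs[0]) == 'RS:0x1-MR:MigrationToRestore.FIRST'
```

Passing such a list to one parametrized class would be one possible alternative to the eight subclasses; the per-class layout in the patch does keep every variant as a separately named class in reports.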
> +class BaseTestDoubleMigration:
> + """Base class for double migration tests.
> + The test scenario triggers VF re-migration while the initial restore (resource fixup) is still ongoing.
> +
> + Save, load and immediately save again before the initial migration completes (prior to resfix done).
> + Post-migration resource fixup is delayed via a KMD debug hook to allow initiating the 2nd save.
> + Tests the Xe KMD corner case where two migration notifications must be handled.
> + An IGT/WSIM workload is executing during the migration (started prior to the 1st save).
> +
> + The class implements the VF save and restore-save subtests and supports
> + parametrization with different numbers of VMs and double migration scenario variants.
> +
> + Intended to be inherited by separate child test classes with specific
> + double migration scenario configurations:
> + - stopping RESFIX at a different stage
> + - restoring from the initial (1st) or the latter (2nd) migration
> + """
> +
> + def __set_debugfs_resfix_stoppers(self, vm: VirtualMachine, stage: ResfixWaitStage):
> + """Set resfix_stoppers:
> + predefined checkpoints that allow the migration process to pause at specific stages.
> + Each stage pauses with a 1-second delay per iteration, continuing until
> + its corresponding bit is cleared.
> + Debug hook path: /sys/kernel/debug/dri/<card>/gt0/vf/resfix_stoppers
> + """
> + vf_driver = vm.get_dut().driver
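> + # str() of an IntEnum member returns its name on Python < 3.11; write the numeric value explicitly
> + vf_driver.write_debugfs(f'{vf_driver.debugfs_path}/gt0/vf/resfix_stoppers', str(int(stage)))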
> +
> + resfix_stoppers = vf_driver.read_debugfs(f'{vf_driver.debugfs_path}/gt0/vf/resfix_stoppers').strip()
> + logger.debug("[%s] Set migration resfix stoppers: %s (%s)"
> + "\nPause checkpoints:"
> + "\n\tVF_MIGRATION_WAIT_BEFORE_RESFIX_START: BIT(0)"
> + "\n\tVF_MIGRATION_WAIT_BEFORE_FIXUPS: BIT(1)"
> + "\n\tVF_MIGRATION_WAIT_BEFORE_RESTART_JOBS: BIT(2)"
> + "\n\tVF_MIGRATION_WAIT_BEFORE_RESFIX_DONE: BIT(3)"
> + "\n\tResume execution: 0",
> + vm, resfix_stoppers, bin(int(resfix_stoppers, 16)))
> +
> + return int(resfix_stoppers, 16) == stage
> +
> + def __is_resfix_stopped(self, vm: VirtualMachine):
> + vf_driver = vm.get_dut().driver
> + resfix_stoppers = vf_driver.read_debugfs(f'{vf_driver.debugfs_path}/gt0/vf/resfix_stoppers').strip()
> +
> + return int(resfix_stoppers, 16) != 0
> +
> + @pytest.fixture(scope='function', name='set_resfix_stoppers')
> + def fixture_set_resfix_stoppers(self, setup_vms, set_double_migration_config):
> + ts: VmmTestingSetup = setup_vms
> + migration_config: DoubleMigrationConfig = set_double_migration_config
> + vm_src: VirtualMachine = ts.get_vm[0] # First VM as a source
> +
> + return self.__set_debugfs_resfix_stoppers(vm_src, migration_config.resfix_stoppers)
> +
> + @pytest.fixture(scope='function', name='clear_resfix_stoppers')
> + def fixture_clear_resfix_stoppers(self, setup_vms):
> + ts: VmmTestingSetup = setup_vms
> + yield
> +
> + for vm in ts.get_vm:
> + if vm.is_running() and self.__is_resfix_stopped(vm):
> + logger.info("Teardown fixture - clear remaining resfix stoppers")
> + self.__set_debugfs_resfix_stoppers(vm, ResfixWaitStage.VF_MIGRATION_CONTINUE)
> +
> + @pytest.fixture(scope='class', name='run_source_workload')
> + def fixture_run_source_workload(self, setup_vms):
> + ts: VmmTestingSetup = setup_vms
> + vm_src: VirtualMachine = ts.get_vm[0] # First VM as source
> + migration_wl: MigrationWorkloadWsim = wsim_short_preempt # Workload variant
> + wsim_file_path = ts.wsim_wl_dir / f'{migration_wl.workload_file}.wsim' # Workload descriptor file path
> + if not wsim_file_path.exists():
> + logger.error("gem_wsim workload file %s not available!", wsim_file_path)
> + raise exceptions.GemWsimError(f'gem_wsim workload file {wsim_file_path} not available!')
> +
> + # Run IGT wsim workload in pre-migration and check completion in post-migration
> + return GemWsim(vm_src, migration_wl.num_clients, migration_wl.num_repeats, workload=wsim_file_path)
> +
> + @pytest.fixture(scope='function', name='setup_destination_vm')
> + def fixture_setup_destination_vm(self, setup_vms):
> + ts: VmmTestingSetup = setup_vms
> + vm_src: VirtualMachine = ts.get_vm[0] # First VM as a source
> + vm_dst: VirtualMachine = ts.get_vm[-1] # Last VM as a destination
> + num_vms = ts.get_num_vms()
> +
> + if num_vms == 1:
> + logger.debug("Single VM: the same source and destination VM instance")
> + assert vm_src == vm_dst
> + vm_dst.pause()
> + return vm_dst
> +
> + logger.debug("Multiple VMs: reload destination VM with the source image (with state snapshot)")
> +
> + if vm_src.is_running():
> + # QMP 'quit' is used for a paused VM (it cannot be powered off via the guest agent)
> + vm_src.quit()
> +
> + if vm_dst.is_running():
> + vm_dst.quit()
> + while vm_dst.is_running():
> + time.sleep(1) # VM usually doesn't terminate immediately
> +
> + # Re-start destination VM with an image containing a state snapshot
> + vm_dst.set_migration_source(vm_src.image)
> + vm_dst.poweron()
> +
> + return vm_dst
> +
> + def test_save(self, setup_vms, run_source_workload, set_resfix_stoppers):
> + logger.info("Test VM double migration: 1st state save")
> + ts: VmmTestingSetup = setup_vms
> + vm_src: VirtualMachine = ts.get_vm[0] # First VM as source
> + assert set_resfix_stoppers, 'Failed to set migration resfix stoppers'
> +
> + logger.debug("Execute throughout-migration workload on source VM")
> + migration_wl: GemWsim = run_source_workload
> + time.sleep(IGT_INIT_DELAY)
> + assert migration_wl.is_running(), 'IGT/wsim migration workload is not running on source VM'
> +
> + # Pause source VM and save snapshot
> + logger.debug("Pause execution and save source VM state (snapshot #1)")
> + try:
> + vm_src.pause()
> + vm_src.save_state() # snapshot #1
> + except exceptions.GuestError as exc:
> + logger.error("State save error: %s", exc)
> + assert False, 'VF migration failed on save'
> +
> + def test_restore_save(self, setup_destination_vm, run_source_workload,
> + set_double_migration_config, clear_resfix_stoppers):
> + logger.info("Test VM double migration: state restore and 2nd save before recovery is done")
> + vm_dst: VirtualMachine = setup_destination_vm
> + migration_wl: GemWsim = run_source_workload # Get an instance of the IGT WL started in a save test
> + migration_config: DoubleMigrationConfig = set_double_migration_config
> +
> + # Retarget the source workload instance (started in the save test) to the destination VM
> + migration_wl.target = vm_dst
> +
> + # Load the source state snapshot
> + logger.debug("Restore source state on the destination VM (snapshot #1)")
> + vm_dst.load_state() # snapshot #1
> + vm_dst.resume()
> +
> + time.sleep(3) # Give the migration recovery time to start
> + logger.debug("Save 2nd VM state (snapshot #2) while the 1st migration recovery is still in progress")
> +
> + if migration_config.migration_to_restore is MigrationToRestore.FIRST:
> + # VM pause/resume is implicitly called by save,
> + # snapshot #1 recovery is continued immediately after snapshot #2 save completes
> + vm_dst.save_state() # snapshot #2
> + logger.info("Continue source VM state recovery (snapshot #1)")
> +
> + if migration_config.migration_to_restore is MigrationToRestore.SECOND:
> + # Use an explicit VM pause/resume: snapshot #2 load must immediately follow its save,
> + # so that state recovery of snapshot #1 cannot continue.
> + vm_dst.pause()
> + vm_dst.save_state() # snapshot #2
> + logger.info("Load state and re-start state recovery of 2nd saved state (snapshot #2)")
> + vm_dst.load_state() # snapshot #2
> + vm_dst.resume()
> +
> + logger.info("Continue migration recovery - clear resfix stoppers")
> + self.__set_debugfs_resfix_stoppers(vm_dst, ResfixWaitStage.VF_MIGRATION_CONTINUE)
> +
> + logger.debug("Check migration in-flight workload after destination VM save")
> + time.sleep(IGT_RESTORE_DELAY)
> + assert migration_wl.check_results(), 'VF migration workload failed on destination VM (post-restore)'
> +
> +
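For reference while reviewing `test_restore_save()`, the VM operation order for the two `MigrationToRestore` variants can be written down as data. A sketch under that reading of the test; `restore_save_steps()` and the step labels are hypothetical stand-ins for the `VirtualMachine` calls (the preceding `test_save()` and the sleep delays are left out):

```python
import enum


class MigrationToRestore(enum.Enum):
    FIRST = 1
    SECOND = 2


def restore_save_steps(target: MigrationToRestore) -> list[str]:
    """Hypothetical: ordered VM operations issued by test_restore_save() for one variant."""
    # Restore snapshot #1; recovery then stalls at the configured resfix stopper
    steps = ['load_state #1', 'resume']
    if target is MigrationToRestore.FIRST:
        # Save pauses/resumes implicitly; recovery of #1 goes on afterwards
        steps += ['save_state #2']
    else:
        # Explicit pause so that recovery of #1 cannot continue,
        # then restart recovery from snapshot #2
        steps += ['pause', 'save_state #2', 'load_state #2', 'resume']
    # Write VF_MIGRATION_CONTINUE (0) to let recovery finish
    steps += ['clear resfix_stoppers']
    return steps
```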
> +@pytest.fixture(scope='class', name='set_double_migration_config')
> +def fixture_set_double_migration_config(request):
> + """Set the migration recovery wait stage for the double migration test and the snapshot to restore."""
> + # request.param is one of the DoubleMigrationConfig instances passed via indirect parametrization.
> + return request.param
> +
> +
> +def idfn_double_migration(config: DoubleMigrationConfig):
> + """Add double migration settings to a test config ID in parametrized tests
> + (e.g. test_something[2VF-RS:resfix_stopper-MR:snapshot_to_restore]).
> + """
> + return str(config)
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_1_resfix_1],
> + ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration1Resfix1(BaseTestDoubleMigration):
> + """Double migration test restoring the first snapshot (the former, initial migration):
> + save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> continue to recover #1
> + Stop resfix on VF_MIGRATION_WAIT_BEFORE_RESFIX_START (BIT0) checkpoint to initiate 2nd save.
> +
> + W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> + """
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_1_resfix_2],
> + ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration1Resfix2(BaseTestDoubleMigration):
> + """Double migration test restoring the first snapshot (the former, initial migration):
> + save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> continue to recover #1
> + Stop resfix on VF_MIGRATION_WAIT_BEFORE_FIXUPS (BIT1) checkpoint to initiate 2nd save.
> +
> + W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> + """
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_1_resfix_3],
> + ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration1Resfix3(BaseTestDoubleMigration):
> + """Double migration test restoring the first snapshot (the former, initial migration):
> + save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> continue to recover #1
> + Stop resfix on VF_MIGRATION_WAIT_BEFORE_RESTART_JOBS (BIT2) checkpoint to initiate 2nd save.
> +
> + W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> + """
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_1_resfix_4],
> + ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration1Resfix4(BaseTestDoubleMigration):
> + """Double migration test restoring the first snapshot (the former, initial migration):
> + save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> continue to recover #1
> + Stop resfix on VF_MIGRATION_WAIT_BEFORE_RESFIX_DONE (BIT3) checkpoint to initiate 2nd save.
> +
> + W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> + """
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_2_resfix_1],
> + ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration2Resfix1(BaseTestDoubleMigration):
> + """Double migration test restoring the second snapshot (the latter migration):
> + save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> load and recover #2
> + Stop resfix on VF_MIGRATION_WAIT_BEFORE_RESFIX_START (BIT0) checkpoint to initiate 2nd save.
> +
> + W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> + """
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_2_resfix_2],
> + ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration2Resfix2(BaseTestDoubleMigration):
> + """Double migration test restoring the second snapshot (the latter migration):
> + save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> load and recover #2
> + Stop resfix on VF_MIGRATION_WAIT_BEFORE_FIXUPS (BIT1) checkpoint to initiate 2nd save.
> +
> + W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> + """
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_2_resfix_3],
> + ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration2Resfix3(BaseTestDoubleMigration):
> + """Double migration test restoring the second snapshot (the latter migration):
> + save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> load and recover #2
> + Stop resfix on VF_MIGRATION_WAIT_BEFORE_RESTART_JOBS (BIT2) checkpoint to initiate 2nd save.
> +
> + W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> + """
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_2_resfix_4],
> + ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration2Resfix4(BaseTestDoubleMigration):
> + """Double migration test restoring the second snapshot (the latter migration):
> + save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> load and recover #2
> + Stop resfix on VF_MIGRATION_WAIT_BEFORE_RESFIX_DONE (BIT3) checkpoint to initiate 2nd save.
> +
> + W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> + """
> +
> +
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> + ids=idfn_test_config, indirect=['setup_vms'])
> +class TestCheckpoint:
> + """Verify that a state can be saved as a checkpoint for future use and later loaded back."""
> +
> + @pytest.fixture(scope='function', name='setup_destination_vm')
> + def fixture_setup_destination_vm(self, setup_vms):
> + ts: VmmTestingSetup = setup_vms
> + vm_src: VirtualMachine = ts.get_vm[0] # First VM as a source
> + vm_dst: VirtualMachine = ts.get_vm[-1] # Last VM as a destination
> + num_vms = ts.get_num_vms()
> +
> + if num_vms == 1:
> + logger.debug("Single VM: the same source and destination VM instance")
> + assert vm_src == vm_dst
> + return vm_dst
> +
> + logger.debug("Multiple VMs: restart destination VM with the source image (with state checkpoint)")
> + vm_dst.poweroff()
> + # Source qcow2 must be copied because multiple VMs cannot run with the same image file
> + vm_dst.set_migration_source(duplicate_vm_image(vm_src.image))
> + vm_dst.poweron()
> + vm_dst.resume()
> + assert modprobe_driver_run_check(vm_dst)
> +
> + return vm_dst
> +
> + @pytest.fixture(scope='class', name='run_source_workload')
> + def fixture_run_source_workload(self, setup_vms):
> + ts: VmmTestingSetup = setup_vms
> + vm_src: VirtualMachine = ts.get_vm[0] # First VM as source
> +
> + # Run IGT workload to check before and after a state checkpoint
> + return IgtExecutor(vm_src, IgtType.SPIN_BATCH)
> +
> + def test_save(self, setup_vms, run_source_workload):
> + logger.info("Test VM state checkpoint save")
> + ts: VmmTestingSetup = setup_vms
> + vm_src: VirtualMachine = ts.get_vm[0] # First VM as source
> + igt_src: IgtExecutor = run_source_workload
> +
> + # Save state checkpoint
> + logger.debug("Save VM state checkpoint")
> + try:
> + vm_src.save_state()
> + except exceptions.GuestError as exc:
> + logger.error("State save error: %s", exc)
> + assert False, 'VF migration failed on save'
> +
> + # Verify workload submitted prior to the state checkpoint succeeds
> + assert igt_check(igt_src), 'Source IGT workload has failed'
> +
> + logger.debug("Check driver health on host and source VM")
> + assert driver_check(ts.host)
> + assert driver_check(vm_src)
> +
> + def test_load(self, setup_vms, setup_destination_vm, run_source_workload):
> + logger.info("Test VM state checkpoint load")
> + ts: VmmTestingSetup = setup_vms
> + vm_dst: VirtualMachine = setup_destination_vm
> + igt_src: IgtExecutor = run_source_workload # Get an instance of the IGT WL started in a save test
> +
> + # Patch the source IgtExecutor instance with the current VM and clear results cache
> + igt_src.target = vm_dst
> + igt_src.results.clear()
> +
> + # Workload submitted before the checkpoint should not be active before load
> + logger.debug("Verify IGT workload is not executing prior to the state restore (expected pgrep error)")
> + assert not cmd_run_check(vm_dst, 'pgrep igt_runner'), 'IGT workload is (unexpectedly) running'
> +
> + # Load previously saved state checkpoint and resume on destination VM
> + logger.debug("Load VM state checkpoint")
> + vm_dst.load_state()
> +
> + # Workload submitted before the checkpoint should be restored in running state after load
> + logger.debug("Verify IGT workload is executing again after the state restore")
> + assert not igt_src.status().exited, 'IGT workload is not running after checkpoint load'
> + assert igt_check(igt_src), 'IGT workload loaded on checkpoint has failed'
> +
> + logger.debug("Check driver health on host and destination VM")
> + assert driver_check(ts.host)
> + assert driver_check(vm_dst)
> +
> +
> +def test_2vm_pause_resume(create_1host_2vm):
> + """
> + VM/VF pause-resume does not affect workload execution:
> + - 2xVFs running 2xVM instances
> + - both VFs auto-provisioned, running IGT workloads
> + - 1st VM/VF is paused and resumed (but VF state is not saved/loaded)
> + - 2nd VM/VF workload should not be interrupted
> + - IGT workloads shall finish successfully on both VMs
> + """
> + ts: VmmTestingSetup = create_1host_2vm
> + host: Host = ts.host
> + vm0: VirtualMachine = ts.get_vm[0]
> + vm1: VirtualMachine = ts.get_vm[1]
> + assert driver_check(host)
> +
> + num_vfs = ts.testing_config.num_vfs
> + assert ts.get_dut().create_vf(num_vfs) == num_vfs
> +
> + vf1, vf2 = ts.get_dut().get_vfs_bdf(1, 2)
> + vm0.assign_vf(vf1)
> + vm1.assign_vf(vf2)
> + ts.poweron_vms()
> +
> + pause_vf_num = 1
> +
> + assert modprobe_driver_run_check(vm0)
> + assert modprobe_driver_run_check(vm1)
> +
> + logger.debug("Submit IGT WL (gem_wsim) on VM0")
> + iterations = 3000 # 3k iterations of 10ms WLs give 30s total expected time
> + expected_elapsed_sec = ONE_CYCLE_DURATION_MS * iterations / MS_IN_SEC
> + gem_wsim_vm0 = GemWsim(vm0, 1, iterations, PREEMPT_10MS_WORKLOAD)
> +
> + # Allow wsim WL to run some time
> + time.sleep(IGT_INIT_DELAY)
> + assert gem_wsim_vm0.is_running()
> +
> + logger.debug("Submit IGT WL (gem_spin_batch) on VM1")
> + igt_vm1 = IgtExecutor(vm1, IgtType.SPIN_BATCH)
> +
> + # Special handling of pausing VMs with infinite ExecQuanta - refer to SAS for details
> + logger.debug("Set VF1 EQ/PF before the pause")
> + ts.get_dut().driver.set_exec_quantum_ms(pause_vf_num, 1)
> + ts.get_dut().driver.set_preempt_timeout_us(pause_vf_num, 100)
> +
> + logger.debug("Pause execution on VM0/VF1")
> + vm0.pause()
> +
> + assert igt_check(igt_vm1)
> + logger.debug("VM1 IGT WL (not paused) finished successfully")
> +
> + logger.debug("Resume execution on VM0/VF1")
> + vm0.resume()
> +
> + logger.debug("Reset VF1 EQ/PF to the initial values (infinite) after resume")
> + ts.get_dut().driver.set_exec_quantum_ms(pause_vf_num, 0)
> + ts.get_dut().driver.set_preempt_timeout_us(pause_vf_num, 0)
> +
> + result_vm0 = gem_wsim_vm0.wait_results() # Throws exception on wsim fail
> + assert expected_elapsed_sec * 0.8 < result_vm0.elapsed_sec < expected_elapsed_sec * 1.5
> + logger.debug("VM0 IGT WL (paused-resumed) finished successfully")
> +
> + # Check host and VM health status after pause-resume transition
> + assert driver_check(host)
> + assert driver_check(vm0)
> + assert driver_check(vm1)
> +
> +
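With `iterations = 3000` and a 10 ms cycle (per the in-code comment; `ONE_CYCLE_DURATION_MS` and `MS_IN_SEC` are defined outside this excerpt and are assumed here to be 10 and 1000), the nominal run time is 30 s and the assertion accepts 24 s < elapsed < 45 s. A sketch of the same check as standalone functions; both names are hypothetical:

```python
MS_IN_SEC = 1000  # assumed to match the constant used by the test


def expected_elapsed_sec(iterations: int, cycle_ms: float) -> float:
    """Nominal gem_wsim run time: one cycle of cycle_ms per iteration."""
    return cycle_ms * iterations / MS_IN_SEC


def elapsed_in_window(elapsed_sec: float, iterations: int, cycle_ms: float,
                      low: float = 0.8, high: float = 1.5) -> bool:
    """Hypothetical mirror of the test's assertion on result_vm0.elapsed_sec."""
    expected = expected_elapsed_sec(iterations, cycle_ms)
    # Strict bounds, as in the chained comparison in the test
    return expected * low < elapsed_sec < expected * high
```

The 1.5x upper bound leaves roughly 15 s above nominal; if the pause window counts toward the measured time, the VM0 pause has to fit within that headroom.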
> +def test_1vm_save_restore_no_driver(create_1host_1vm):
> + """
> + Save/restore single VM state with no guest driver loaded:
> + - 1xVF running 1xVM instance (single VM acts as source and destination)
> + - platform provisioned with vGPU profile M1 (ATSM, ADLP) or C1 (PVC)
> + - VF state saved and then restored on the same VM instance
> + - driver probed on VM after the resume, IGT workload executed
> + """
> + ts: VmmTestingSetup = create_1host_1vm
> + host: Host = ts.host
> + vm: VirtualMachine = ts.get_vm[0]
> + assert driver_check(host)
> +
> + num_vfs = ts.testing_config.num_vfs
> + assert ts.get_dut().create_vf(num_vfs) == num_vfs
> +
> + vf = ts.get_dut().get_vf_bdf(1)
> + vm.assign_vf(vf)
> +
> + vm.poweron()
> +
> + # Run an interactive, non-returning program (vim) to verify state after migration
> + src_proc = ShellExecutor(vm, 'vim migrate.txt')
> + src_pid = src_proc.pid
> +
> + # Pause VM and save snapshot
> + logger.debug("Pause execution and save VM state")
> + try:
> + vm.pause()
> + vm.save_state()
> + except exceptions.GuestError as exc:
> + logger.error("State save error: %s", exc)
> + assert False, 'VF migration failed on save'
> +
> + # Load previously saved snapshot and resume the same VM
> + logger.debug("Load state on the same VM instance")
> + vm.load_state()
> + vm.resume()
> +
> + # Verify program initiated on source VM is still running after migration
> + migrated_proc = vm.execute_status(src_pid)
> + logger.debug("Migrated process: %s", migrated_proc)
> + assert migrated_proc.exited is False, 'Migrated process is not running after VM snapshot load'
> +
> + logger.debug("Probe driver and execute workload on VM")
> + assert modprobe_driver_run_check(vm)
> + assert igt_run_check(vm, IgtType.EXEC_STORE)
> +
> + logger.debug("Check driver health on host and VM")
> + assert driver_check(host)
> + assert driver_check(vm)
LGTM,
Reviewed-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Thread overview: 12+ messages
2026-04-16 8:35 [PATCH i-g-t 0/3] vmtb: SR-IOV VF migration test suite Adam Miszczak
2026-04-16 8:35 ` [PATCH i-g-t 1/3] tools/vmtb: Define IGT tests used as VF migration workloads Adam Miszczak
2026-04-20 18:27 ` Kamil Konieczny
2026-05-14 13:18 ` Bernatowicz, Marcin
2026-04-16 8:35 ` [PATCH i-g-t 2/3] tools/vmtb: Provide VF busy migration IGT/gem_wsim workloads Adam Miszczak
2026-05-14 13:20 ` Bernatowicz, Marcin
2026-04-16 8:35 ` [PATCH i-g-t 3/3] tools/vmtb: Add VF migration tests Adam Miszczak
2026-05-14 13:21 ` Bernatowicz, Marcin [this message]
2026-04-16 15:40 ` ✓ i915.CI.BAT: success for vmtb: SR-IOV VF migration test suite Patchwork
2026-04-16 15:50 ` ✓ Xe.CI.BAT: " Patchwork
2026-04-16 17:34 ` ✗ Xe.CI.FULL: failure " Patchwork
2026-04-17 3:47 ` ✗ i915.CI.Full: " Patchwork