Igt-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Bernatowicz, Marcin" <marcin.bernatowicz@linux.intel.com>
To: Adam Miszczak <adam.miszczak@linux.intel.com>,
	igt-dev@lists.freedesktop.org
Cc: kamil.konieczny@linux.intel.com
Subject: Re: [PATCH i-g-t 3/3] tools/vmtb: Add VF migration tests
Date: Thu, 14 May 2026 15:21:19 +0200	[thread overview]
Message-ID: <bc707c43-dd34-4ab1-8ce3-dde055b2fc2c@linux.intel.com> (raw)
In-Reply-To: <20260416083544.2441874-4-adam.miszczak@linux.intel.com>


On 4/16/2026 10:35 AM, Adam Miszczak wrote:
> Introduce a comprehensive VF migration (state save/restore) test suite,
> covering the following scenarios:
> - idle migration: no GPU workload active during save/restore
> - idle app migration: GPU contexts created but idle during save/restore
> - busy migration (WSIM): short/long preemptable and non-preemptable batches
> - busy migration (IGT): xe_exec_reset, xe_exec_threads, xe_ccs,
>    xe_compute_preempt workloads in multiple variants
> - double migration: re-migration while post-restore resource fixup
>    (resfix) is in progress, tested at few KMD debug checkpoints
> - checkpointing: restore a VM state saved at earlier point of time
> - auxiliary: basic pause/resume exercise, migrate without VF driver loaded
>
> Tests support execution in diversity of configuration variants:
> VF/VM counts, auto provisioning or vGPU profiles and scheduling modes.
>
> Signed-off-by: Adam Miszczak <adam.miszczak@linux.intel.com>
> ---
>   tools/vmtb/vmm_flows/test_migration.py | 1199 ++++++++++++++++++++++++
>   1 file changed, 1199 insertions(+)
>   create mode 100644 tools/vmtb/vmm_flows/test_migration.py
>
> diff --git a/tools/vmtb/vmm_flows/test_migration.py b/tools/vmtb/vmm_flows/test_migration.py
> new file mode 100644
> index 000000000..8a3f10d52
> --- /dev/null
> +++ b/tools/vmtb/vmm_flows/test_migration.py
> @@ -0,0 +1,1199 @@
> +# SPDX-License-Identifier: MIT
> +# Copyright © 2024-2026 Intel Corporation
> +
> +import enum
> +import logging
> +import random
> +import time
> +from dataclasses import dataclass
> +from typing import List, Tuple
> +
> +import pytest
> +
> +from bench import exceptions
> +from bench.configurators.vgpu_profile_config import VfProvisioningMode, VfSchedulingMode
> +from bench.executors.gem_wsim import ONE_CYCLE_DURATION_MS, PREEMPT_10MS_WORKLOAD, GemWsim
> +from bench.executors.igt import IgtExecutor, IgtType
> +from bench.executors.shell import ShellExecutor
> +from bench.helpers.helpers import (cmd_run_check, driver_check,
> +                                   duplicate_vm_image, igt_check,
> +                                   igt_run_check, modprobe_driver_run_check)
> +from bench.machines.host import Host
> +from bench.machines.virtual.vm import VirtualMachine
> +from vmm_flows.conftest import (VmmTestingConfig, VmmTestingSetup,
> +                                idfn_test_config)
> +
> +logger = logging.getLogger(__name__)
> +
> +IGT_INIT_DELAY = 6 # Time between WL start and VM pause (pre-save)
> +IGT_RESTORE_DELAY = 3 # Time between VM resume and WL status check (post-restore)
> +MS_IN_SEC = 1000
> +
> +
> +# Full configuration variant: 1xVF, 2xVF and MAXxVF with auto and vGPU profiles provisioning
> +# TODO: add max VFs variants
> +test_variants_full = [(1, VfProvisioningMode.AUTO, VfSchedulingMode.DEFAULT_PROFILE),
> +                      (2, VfProvisioningMode.AUTO, VfSchedulingMode.DEFAULT_PROFILE),
> +                      (1, VfProvisioningMode.VGPU_PROFILE, VfSchedulingMode.DEFAULT_PROFILE),
> +                      (2, VfProvisioningMode.VGPU_PROFILE, VfSchedulingMode.DEFAULT_PROFILE)]
> +
> +
> +# Basic configuration variant: 1xVF and 2xVF with auto provisioning
> +test_variants_basic = [(1, VfProvisioningMode.AUTO, VfSchedulingMode.DEFAULT_PROFILE),
> +                       (2, VfProvisioningMode.AUTO, VfSchedulingMode.DEFAULT_PROFILE)]
> +
> +
> +# vGPU profiles configuration variant: 1xVF and 2xVF with vGPU profiles provisioning
> +test_variants_profiles = [(1, VfProvisioningMode.VGPU_PROFILE, VfSchedulingMode.DEFAULT_PROFILE),
> +                          (2, VfProvisioningMode.VGPU_PROFILE, VfSchedulingMode.DEFAULT_PROFILE)]
> +
> +
> +@dataclass
> +class MigrationWorkloadWsim:
> +    workload_file: str # Wsim workload descriptor file
> +    num_clients: int # Fork N clients emitting the workload simultaneously
> +    num_repeats: int # How many times to emit the workload
> +
> +    def __str__(self) -> str:
> +        return f'WL:{self.workload_file}-(C:{self.num_clients} R:{self.num_repeats})'
> +
> +
> +# VF busy migration WSIM workloads (payload for TestBusyMigrationWsim[N]):
> +wsim_idle_app = MigrationWorkloadWsim('idle_ctxs', 1, 1)
> +wsim_short_preempt = MigrationWorkloadWsim('short_preempt', 1, 4000) # 5ms * 4000 (20s)
> +wsim_short_nonpreempt = MigrationWorkloadWsim('short_nonpreempt', 1, 4000)
> +wsim_long_preempt = MigrationWorkloadWsim('long_preempt', 1, 200) # 100ms * 200 (20s)
> +wsim_long_nonpreempt = MigrationWorkloadWsim('long_nonpreempt', 1, 200)
> +
> +
> +@dataclass
> +class MigrationWorkloadIgt:
> +    igt_test: IgtType # IGT test type
> +    num_repeats: int = 1 # Number of repeats of the IGT test (calibrated in runtime)
> +
> +    def __str__(self) -> str:
> +        return f'WL:{self.igt_test}'
> +
> +# VF busy migration IGT workloads (payload for TestBusyMigrationIgt[M]):
> +# xe_exec_reset/long_spin subtests:
> +# Average exec time: 12-13s - execute 1x
> +igt_exec_reset_long_spin_many_preempt = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_LONG_SPIN_MANY_PREEMPT)
> +igt_exec_reset_long_spin_many_preempt_media = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_LONG_SPIN_MANY_PREEMPT_MEDIA)
> +igt_exec_reset_long_spin_many_preempt_threads = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_LONG_SPIN_MANY_PREEMPT_THREADS)
> +igt_exec_reset_long_spin_many_preempt_gt0_threads = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_LONG_SPIN_MANY_PREEMPT_GT0_THREADS)
> +# Average exec time: 6-7s - execute 2x
> +igt_exec_reset_long_spin_many_preempt_gt1_threads = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_LONG_SPIN_MANY_PREEMPT_GT1_THREADS)
> +
> +# Average exec time: 12-13s - execute 1x
> +igt_exec_reset_long_spin_reuse_many_preempt = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_LONG_SPIN_REUSE_MANY_PREEMPT)
> +igt_exec_reset_long_spin_reuse_many_preempt_media = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_LONG_SPIN_REUSE_MANY_PREEMPT_MEDIA)
> +igt_exec_reset_long_spin_reuse_many_preempt_threads = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_LONG_SPIN_REUSE_MANY_PREEMPT_THREADS)
> +igt_exec_reset_long_spin_reuse_many_preempt_gt0_threads = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_LONG_SPIN_REUSE_MANY_PREEMPT_GT0_THREADS)
> +# Average exec time: 6-7s  execute 2x
> +igt_exec_reset_long_spin_reuse_many_preempt_gt1_threads = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_LONG_SPIN_REUSE_MANY_PREEMPT_GT1_THREADS)
> +
> +# Average exec time: 12-13s - execute 1x
> +igt_exec_reset_long_spin_sys_reuse_many_preempt_threads = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_LONG_SPIN_SYS_REUSE_MANY_PREEMPT_THREADS)
> +igt_exec_reset_long_spin_comp_reuse_many_preempt_threads = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_LONG_SPIN_COMP_REUSE_MANY_PREEMPT_THREADS)
> +
> +# xe_exec_reset/cancel subtests:
> +# Average exec time: 5-7s  execute 2x
> +igt_exec_reset_cancel = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_CANCEL)
> +igt_exec_reset_cancel_preempt = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_CANCEL_PREEMPT)
> +# Average exec time: 10-15s  execute 1x
> +igt_exec_reset_cancel_timeslice_preempt = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_CANCEL_TIMESLICE_PREEMPT)
> +# Average exec time: 20-25s  execute 1x
> +igt_exec_reset_cancel_timeslice_many_preempt = MigrationWorkloadIgt(
> +    IgtType.EXEC_RESET_CANCEL_TIMESLICE_MANY_PREEMPT)
> +
> +# xe_exec_threads subtests (short, execute in a loop):
> +# Average exec time: <500ms
> +igt_exec_threads_basic = MigrationWorkloadIgt(
> +    IgtType.EXEC_THREADS_BASIC)
> +igt_exec_threads_bal_basic = MigrationWorkloadIgt(
> +    IgtType.EXEC_THREADS_BAL_BASIC)
> +# Average exec time: 1-2s
> +igt_exec_threads_cm_userptr_invalidate = MigrationWorkloadIgt(
> +    IgtType.EXEC_THREADS_CM_USERPTR_INVALIDATE)
> +igt_exec_threads_bal_mixed_userptr_invalidate = MigrationWorkloadIgt(
> +    IgtType.EXEC_THREADS_BAL_MIXED_USERPTR_INVALIDATE)
> +# Average exec time: 1-4s
> +igt_exec_threads_many_queues = MigrationWorkloadIgt(
> +    IgtType.EXEC_THREADS_MANY_QUEUES)
> +
> +# xe_ccs subtest (short, execute in a loop):
> +# Average exec time: 200-600ms
> +igt_ccs_block_copy_compressed = MigrationWorkloadIgt(
> +    IgtType.CCS_BLOCK_COPY_COMPRESSED)
> +
> +# xe_compute_preempt subtest (short, execute in a loop):
> +# Average exec time: 1.8-2s
> +igt_compute_preempt_many = MigrationWorkloadIgt(
> +    IgtType.COMPUTE_PREEMPT_MANY)
> +
> +
> +class BaseTestBusyMigration:
> +    """Base class for busy migration tests (with workload executed).
> +
> +    The class provides implementation for VF save and restore subtests,
> +    supports parametrization with a different VMs number and various IGT workload types.
> +
> +    Dedicated for inheritance by separate child test classes with specific workload setup
> +    to avoid bulk dynamic test variants execution with the same VM setup.
> +    """
> +
> +    # State save result flag: executing test_restore depends on prior test_save success
> +    test_save_failed = True
> +
> +    def __calibrate_igt_wl(self, vm: VirtualMachine, igt_wl: MigrationWorkloadIgt):
> +        logger.info("Starting %s test loop calibration for migration workload", igt_wl.igt_test)
> +        igt_exec =  IgtExecutor(vm, igt_wl.igt_test)
> +        assert igt_exec.check_results(), 'Calibration IGT run failed'
> +
> +        results_log = igt_exec.get_results_log()
> +        igt_exec_time: float = round(results_log['time_elapsed']['end'] - results_log['time_elapsed']['start'], 3)
> +
> +        # Adjust IGT workload loop to execute longer than pre-save wait (with additional margins)
> +        if igt_exec_time < IGT_INIT_DELAY + 2:
> +            igt_wl.num_repeats = int(IGT_INIT_DELAY * 2 / igt_exec_time) + 1
> +
> +        logger.debug("Calibrated IGT workload loop: %s iteration(s) x ~%ss", igt_wl.num_repeats, igt_exec_time)
> +
> +    @pytest.fixture(scope='class', name='run_source_workload')
> +    def fixture_run_source_workload(self, setup_vms, set_migration_wl):
> +        ts: VmmTestingSetup = setup_vms
> +        vm_src: VirtualMachine = ts.get_vm[0] # First VM as source
> +        migration_wl = set_migration_wl # WSIM/IGT Workload variant
> +
> +        if isinstance(migration_wl, MigrationWorkloadWsim):
> +            wsim_file_path = ts.wsim_wl_dir / f'{migration_wl.workload_file}.wsim' # Workload descriptor file path
> +            if not wsim_file_path.exists():
> +                logger.error("gem_wsim workload file %s not available!", wsim_file_path)
> +                raise exceptions.GemWsimError(f'gem_wsim workload file {wsim_file_path} not available!')
> +
> +            # Run IGT wsim workload in pre-migration and check completion in post-migration
> +            return GemWsim(vm_src, migration_wl.num_clients, migration_wl.num_repeats, workload=wsim_file_path)
> +
> +        if isinstance(migration_wl, MigrationWorkloadIgt):
> +            self.__calibrate_igt_wl(vm_src, migration_wl)
> +            return IgtExecutor(vm_src, migration_wl.igt_test, migration_wl.num_repeats)
> +
> +        logger.error("Invalid workload type passed to run_source_workload fixture")
> +        raise exceptions.BenchError('Invalid workload type passed to run_source_workload fixture')
> +
> +    @pytest.fixture(scope='function', name='setup_destination_vm')
> +    def fixture_setup_destination_vm(self, setup_vms):
> +        ts: VmmTestingSetup = setup_vms
> +        vm_src: VirtualMachine = ts.get_vm[0]  # First VM as a source
> +        vm_dst: VirtualMachine = ts.get_vm[-1] # Last VM as a destination
> +        num_vms = ts.get_num_vms()
> +
> +        if num_vms == 1:
> +            logger.debug("Single VM: the same source and destination VM instance")
> +            assert vm_src == vm_dst
> +            return vm_dst
> +
> +        logger.debug("Multiple VMs: reload destination VM with the source image (with state snapshot)")
> +
> +        if vm_src.is_running():
> +            # QMP 'quit' is used for paused VM (cannot be powered off via guest-agent)
> +            vm_src.quit()
> +
> +        if vm_dst.is_running():
> +            vm_dst.quit()
> +            while vm_dst.is_running():
> +                time.sleep(1) # VM usually doesn't terminate immediately
> +
> +        # Re-start destination VM with an image containing a state snapshot
> +        vm_dst.set_migration_source(vm_src.image)
> +        vm_dst.poweron()
> +
> +        return vm_dst
> +
> +    def test_save(self, setup_vms, run_source_workload):
> +        logger.info("Test VM busy migration: state save")
> +        ts: VmmTestingSetup = setup_vms
> +        vm_src: VirtualMachine = ts.get_vm[0] # First VM as source
> +
> +        logger.debug("Execute migration in-flight workload on source VM")
> +        migration_wl = run_source_workload
> +        time.sleep(IGT_INIT_DELAY)
> +        assert migration_wl.is_running(), 'IGT/wsim migration workload is not running on source VM'
> +
> +        # Pause VM and save snapshot
> +        logger.debug("Pause execution and save source VM state")
> +        try:
> +            vm_src.pause()
> +            vm_src.save_state()
> +        except exceptions.GuestError as exc:
> +            logger.error("State save error: %s", exc)
> +            assert False, 'VF migration failed on save'
> +
> +        logger.debug("Resume execution on source VM")
> +        vm_src.resume()
> +
> +        assert migration_wl.check_results(), 'VF migration workload failed on source VM (post-save)'
> +
> +        if ts.get_num_vms() > 1:
> +            logger.debug("Multiple VMs: shutdown source VM")
> +            vm_src.poweroff()
> +
> +        BaseTestBusyMigration.test_save_failed = False
> +
> +    def test_restore(self, setup_vms, setup_destination_vm, run_source_workload):
> +        logger.info("Test VM busy migration: state restore")
> +        if BaseTestBusyMigration.test_save_failed:
> +            logger.error("State save failed - restore is pointless (fail immediately)")
> +            assert False, 'test_save subtest failed - do not execute test_restore'
> +
> +        ts: VmmTestingSetup = setup_vms
> +        vm_dst: VirtualMachine = setup_destination_vm
> +        migration_wl = run_source_workload # Get an instance of the IGT WL started in a save test
> +
> +        # Patch the source IgtExecutor/GemWsim instance with the current VM
> +        migration_wl.target = vm_dst
> +        if isinstance(migration_wl, IgtExecutor):
> +            # Clear IGT test results cache - remove post-save source VM results
> +            # TODO: implement common IgtExecutor/GemWsim results clear interface to avoid instance type check
> +            migration_wl.results.clear()
> +
> +        # Load the source state snapshot
> +        logger.debug("Restore source state on the destination VM")
> +        vm_dst.load_state()
> +        vm_dst.resume()
> +
> +        # TODO: add sync to VM class
> +        sync_value = random.randint(1, 0xFFFF)
> +        assert vm_dst.ga.sync(sync_value)['return'] == sync_value
> +
> +        assert migration_wl.is_running(), 'IGT/wsim migration workload is not running on destination VM'
> +        time.sleep(IGT_RESTORE_DELAY)
> +
> +        assert migration_wl.check_results(), 'VF migration workload failed on destination VM (post-restore)'
> +
> +        logger.debug("Check driver health on host and destination VM")
> +        assert driver_check(ts.host)
> +        assert driver_check(vm_dst)
> +
> +
> +@pytest.fixture(scope='class', name='set_migration_wl')
> +def fixture_set_migration_wl(request):
> +    """Set IGT/wsim descriptor file used as a migration workload in a TestBusyMigration[WL]."""
> +    # Wsim workload variant provided as MigrationWorkload data class instance
> +    return request.param
> +
> +
> +def idfn_workload(workload: MigrationWorkloadWsim):
> +    """Add workload name to a test config ID in parametrized tests
> +    (e.g. test_something[2VF-WL:workload_type-C:n-R:m].
> +    """
> +    return str(workload)
> +
> +
> +def set_test_config(test_variants: List[Tuple[int, VfProvisioningMode, VfSchedulingMode]],
> +                    max_vms: int = 2, wa_reduce_vf_lmem: bool = False) -> List[VmmTestingConfig]:
> +    """Helper function to provide a parametrized test with a list of test configuration variants."""
> +    test_configs: List[VmmTestingConfig] = []
> +
> +    for config in test_variants:
> +        (num_vfs, provisioning_mode, scheduling_mode) = config
> +        test_configs.append(VmmTestingConfig(num_vfs, max_vms, provisioning_mode, scheduling_mode,
> +                                             wa_reduce_vf_lmem=wa_reduce_vf_lmem))
> +
> +    return test_configs
> +
> +
> +# Busy migration TCs with WSIM workload
> +@pytest.mark.parametrize('set_migration_wl', [wsim_short_preempt],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_full), ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationWsim1(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing short (5ms) preemptable batches.
> +
> +    IGT/WSIM workload initiated pre-migration starts firing short submissions on each engine and
> +    during the execution VM state is migrated (VM state snapshot is saved, then restored).
> +    In the post-migration some additional batches are submitted.
> +    Executed in the following VM number variants:
> +    - single VF/VM: same VM acts as a source and destination.
> +    - multiple VFs/VMs: the workload execution is initiated on the source VM,
> +      then migrated and verified on the other, destination one.
> +    """
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [wsim_short_nonpreempt],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationWsim2(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing short (5ms) non-preemptable batches.
> +    Similar to TestBusyMigrationShort subtest, but emits non-preemptable batches.
> +    """
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [wsim_long_preempt],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationWsim3(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing quite long (100ms) but preemptable batches.
> +
> +    IGT/WSIM workload initiated pre-migration starts firing relatively long submissions and
> +    during the execution VM state is migrated (VM state snapshot is saved, then restored).
> +    In the post-migration some additional batches are submitted.
> +    Executed in the following VM number variants:
> +    - single VF/VM: same VM acts as a source and destination.
> +    - multiple VFs/VMs: the workload execution is initiated on the source VM,
> +      then migrated and verified on the other, destination one.
> +    """
> +
> +
> +# TODO: convert to negative scenario.
> +# Test is expected to fail because non-premptable workload execution time > PT (VLK-81241)
> +@pytest.mark.parametrize('set_migration_wl', [wsim_long_nonpreempt],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationWsim4(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing quite long (100ms) non-preemptable batches.
> +    Similar to TestBusyMigrationLong subtest, but emits non-preemptable batches.
> +    """
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [wsim_idle_app],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestIdleAppMigration(BaseTestBusyMigration):
> +    """Save-restore VM state with an idle VF but user application attached (contexts created).
> +
> +    IGT/WSIM workload initiated pre-migration creates multiple user contexts and
> +    does short submission on each but is idle during a save-restore operation,
> +    then resumes post-migration to do more submissions on previously created contexts.
> +    Executed in the following VM number variants:
> +    - single VF/VM: same VM acts as a source and destination.
> +    - multiple VFs/VMs: the workload execution is initiated on the source VM,
> +      then migrated and verified on the other, destination one.
> +    """
> +
> +# Busy migration TCs with IGT workload
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_many_preempt],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset1(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-many-preempt."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_many_preempt_media],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset2(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-many-preempt-media."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_many_preempt_threads],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_full),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset3(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-many-preempt-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_many_preempt_gt0_threads],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset4(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-many-preempt-gt0-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_many_preempt_gt1_threads],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset5(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-many-preempt-gt1-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_reuse_many_preempt],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset6(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-reuse-many-preempt."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_reuse_many_preempt_media],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset7(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-reuse-many-preempt-media."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_reuse_many_preempt_threads],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_full),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset8(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-reuse-many-preempt-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_reuse_many_preempt_gt0_threads],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset9(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-reuse-many-preempt-gt0-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_reuse_many_preempt_gt1_threads],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset10(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-reuse-many-preempt-gt1-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_sys_reuse_many_preempt_threads],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_full),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset11(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-sys-reuse-many-preempt-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_long_spin_comp_reuse_many_preempt_threads],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_full),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset12(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@long-spin-comp-reuse-many-preempt-threads."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_cancel],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset13(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@cancel."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_cancel_preempt],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset14(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@cancel-preempt."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_cancel_timeslice_preempt],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset15(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@cancel-timeslice-preempt."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_reset_cancel_timeslice_many_preempt],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecReset16(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_reset@cancel-timeslice-many-preempt."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_threads_basic],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_full),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecThreads1(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_threads@threads-basic."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_threads_bal_basic],
> +                         ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecThreads2(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_threads@threads-bal-basic."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_threads_cm_userptr_invalidate],
> +                        ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecThreads3(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_threads@threads-cm-userptr-invalidate."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_threads_many_queues],
> +                        ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_full),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecThreads4(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_threads@threads-many-queues."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_exec_threads_bal_mixed_userptr_invalidate],
> +                        ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtExecThreads5(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_exec_threads@threads-bal-mixed-userptr-invalidate."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_ccs_block_copy_compressed],
> +                        ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtCcs(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_ccs@block-copy-compressed."""
> +
> +
> +@pytest.mark.parametrize('set_migration_wl', [igt_compute_preempt_many],
> +                        ids=idfn_workload, indirect=['set_migration_wl'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestBusyMigrationIgtComputePreempt(BaseTestBusyMigration):
> +    """Save-restore VM state with VF busy executing IGT xe_compute_preempt@compute-preempt-many (CCS path)."""
> +
> +
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestIdleMigration:
> +    """Save-restore VM state with an idle VF and no user application attached.
> +
> +    IGT workload initiated and ended twice: pre- and post-migration, but not executing during a save-restore operation.
> +    Test setup:
> +    - NxVFs running NxVM instances (first (VM[0]) acts as source and a last (VM[N-1] as a destination)
> +    - platform provisioned with the relevant vGPU profile M[N] (ATSM, ADLP) or C[N] (PVC)
> +    - VF state is saved on the source VM and then restored on the destination VM instance
> +      (in case of a single VF variant, source and destination is the same VM instance)
> +    """
> +
> +    @pytest.fixture(scope='function', name='setup_destination_vm')
> +    def fixture_setup_destination_vm(self, setup_vms):
> +        ts: VmmTestingSetup = setup_vms
> +        vm_src: VirtualMachine = ts.get_vm[0]  # First VM as a source
> +        vm_dst: VirtualMachine = ts.get_vm[-1] # Last VM as a destination
> +        num_vms = ts.get_num_vms()
> +
> +        if num_vms == 1:
> +            logger.debug("Single VM: the same source and destination VM instance")
> +            assert vm_src == vm_dst
> +            return vm_dst
> +
> +        logger.debug("Multiple VMs: reload destination VM with the source image (with state snapshot)")
> +
> +        if vm_src.is_running():
> +            # QMP 'quit' is used for paused VM (cannot be powered off via guest-agent)
> +            vm_src.quit()
> +
> +        if vm_dst.is_running():
> +            vm_dst.quit()
> +            while vm_dst.is_running():
> +                time.sleep(1) # VM usually doesn't terminate immediately
> +
> +        # Re-start destination VM with an image containing a state snapshot
> +        vm_dst.set_migration_source(vm_src.image)
> +        vm_dst.poweron()
> +
> +        return vm_dst
> +
> +    def test_save(self, setup_vms):
> +        logger.info("Test VM idle migration: state save")
> +        ts: VmmTestingSetup = setup_vms
> +        vm_src: VirtualMachine = ts.get_vm[0] # First VM as source
> +
> +        # Run some interactive program (not returning, as vim) to verify state after migration
> +        src_proc = ShellExecutor(vm_src, 'vim migrate.txt')
> +        source_proc = vm_src.execute_status(src_proc.pid)
> +        logger.debug("Source process: %s", source_proc)
> +        assert source_proc.exited is False, 'Source process is not running'
> +
> +        logger.debug("Execute pre-migration workload on source VM")
> +        assert igt_run_check(vm_src, IgtType.EXEC_STORE)
> +
> +        # Pause VM and save snapshot
> +        logger.debug("Pause execution and save VM state")
> +        try:
> +            vm_src.pause()
> +            vm_src.save_state()
> +        except exceptions.GuestError as exc:
> +            logger.error("State save error: %s", exc)
> +            assert False, 'VF migration failed on save'
> +
> +    def test_restore(self, setup_vms, setup_destination_vm):
> +        logger.info("Test VM idle migration: state restore")
> +        ts: VmmTestingSetup = setup_vms
> +        vm_dst: VirtualMachine = setup_destination_vm
> +
> +        # Load the source state snapshot
> +        logger.debug("Restore source state on the destination VM")
> +        vm_dst.load_state()
> +        vm_dst.resume()
> +
> +        # Verify program initiated on source VM is stil running after migration
> +        pgrep_dst = ShellExecutor(vm_dst, 'pgrep -f "vim migrate.txt"')
> +        pgrep_dst_result = vm_dst.execute_wait(pgrep_dst.pid)
> +        assert pgrep_dst_result.exit_code == 0, 'Source process (vim) not found'
> +        restored_proc = vm_dst.execute_status(int(pgrep_dst_result.stdout))
> +        logger.debug("Restored process: %s", restored_proc)
> +        assert restored_proc.exited is False, 'Restored process is not running'
> +
> +        logger.debug("Execute post-migration workload on destination VM")
> +        assert igt_run_check(vm_dst, IgtType.EXEC_STORE)
> +
> +        logger.debug("Check driver health on host and destination VM")
> +        assert driver_check(ts.host)
> +        assert driver_check(vm_dst)
> +
> +
> +class ResfixWaitStage(enum.IntEnum):
> +    # Resfix stopper checkpoints
> +    VF_MIGRATION_CONTINUE = 0
> +    VF_MIGRATION_WAIT_BEFORE_RESFIX_START = 1 << 0
> +    VF_MIGRATION_WAIT_BEFORE_FIXUPS = 1 << 1
> +    VF_MIGRATION_WAIT_BEFORE_RESTART_JOBS = 1 << 2
> +    VF_MIGRATION_WAIT_BEFORE_RESFIX_DONE = 1 << 3
> +
> +
> +class MigrationToRestore(enum.Enum):
> +    FIRST = 1
> +    SECOND = 2
> +
> +
> +@dataclass
> +class DoubleMigrationConfig:
> +    resfix_stoppers: ResfixWaitStage # Stage for migration RESFIX stop
> +    migration_to_restore: MigrationToRestore # Migration snapshot to be restored after doubled save
> +
> +    def __str__(self) -> str:
> +        return f'RS:{hex(self.resfix_stoppers)}-MR:{self.migration_to_restore}'
> +
> +
> +double_migration_1_resfix_1 = DoubleMigrationConfig(
> +    ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_RESFIX_START, MigrationToRestore.FIRST)
> +double_migration_1_resfix_2 = DoubleMigrationConfig(
> +    ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_FIXUPS, MigrationToRestore.FIRST)
> +double_migration_1_resfix_3 = DoubleMigrationConfig(
> +    ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_RESTART_JOBS, MigrationToRestore.FIRST)
> +double_migration_1_resfix_4 = DoubleMigrationConfig(
> +    ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_RESFIX_DONE, MigrationToRestore.FIRST)
> +
> +
> +double_migration_2_resfix_1 = DoubleMigrationConfig(
> +    ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_RESFIX_START, MigrationToRestore.SECOND)
> +double_migration_2_resfix_2 = DoubleMigrationConfig(
> +    ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_FIXUPS, MigrationToRestore.SECOND)
> +double_migration_2_resfix_3 = DoubleMigrationConfig(
> +    ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_RESTART_JOBS, MigrationToRestore.SECOND)
> +double_migration_2_resfix_4 = DoubleMigrationConfig(
> +    ResfixWaitStage.VF_MIGRATION_WAIT_BEFORE_RESFIX_DONE, MigrationToRestore.SECOND)
> +
> +class BaseTestDoubleMigration:
> +    """Base class for double migration tests.
> +    Test scenario triggers VF re-migrate while the initial restore (resources fixup) is still ongoing.
> +
> +    Save-load and immediately save again before the initial migration completes (prior to resfix done).
> +    Post migration resources fixup is delayed via KMD debug hook to initiate the 2nd save.
> +    Tests Xe KMD corner case where two migration notifications must be handled.
> +    IGT/WSIM workload is executing during the migration (started prior to 1st save).
> +
> +    The class provides implementation for VF save and restore-save subtests,
> +    supports parametrization with a different VMs number and double migration scenario variants.
> +
> +    Dedicated for inheritance by separate child test classes with specific
> +    double migration test scenarios configurations:
> +    - stopping RESFIX in a different stage
> +    - restoring from initial (1st) or latter (2nd) migration
> +    """
> +
> +    def __set_debugfs_resfix_stoppers(self, vm: VirtualMachine, stage: ResfixWaitStage):
> +        """Set resfix_stoppers:
> +        predefined checkpoints that allow the migration process to pause at specific stages.
> +        Each state will pause with a 1-second delay per iteration, continuing until
> +        its corresponding bit is cleared.
> +        Debug hook path: /sys/kernel/debug/dri/<card>/gt0/vf/resfix_stoppers
> +        """
> +        vf_driver = vm.get_dut().driver
> +        vf_driver.write_debugfs(f'{vf_driver.debugfs_path}/gt0/vf/resfix_stoppers', str(stage))
> +
> +        resfix_stoppers = vf_driver.read_debugfs(f'{vf_driver.debugfs_path}/gt0/vf/resfix_stoppers').strip()
> +        logger.debug("[%s] Set migration resfix stoppers: %s (%s)"
> +                     "\nPause checkpoints:"
> +                     "\n\tVF_MIGRATION_WAIT_BEFORE_RESFIX_START: BIT(0)"
> +                     "\n\tVF_MIGRATION_WAIT_BEFORE_FIXUPS: BIT(1)"
> +                     "\n\tVF_MIGRATION_WAIT_BEFORE_RESTART_JOBS: BIT(2)"
> +                     "\n\tVF_MIGRATION_WAIT_BEFORE_RESFIX_DONE: BIT(3)"
> +                     "\n\tResume execution: 0",
> +                     vm, resfix_stoppers, bin(int(resfix_stoppers, 16)))
> +
> +        return int(resfix_stoppers, 16) == stage
> +
> +    def __is_resfix_stopped(self, vm: VirtualMachine):
> +        vf_driver = vm.get_dut().driver
> +        resfix_stoppers = vf_driver.read_debugfs(f'{vf_driver.debugfs_path}/gt0/vf/resfix_stoppers').strip()
> +
> +        return int(resfix_stoppers, 16) != 0
> +
> +    @pytest.fixture(scope='function', name='set_resfix_stoppers')
> +    def fixture_set_resfix_stoppers(self, setup_vms, set_double_migration_config):
> +        ts: VmmTestingSetup = setup_vms
> +        migration_config: DoubleMigrationConfig = set_double_migration_config
> +        vm_src: VirtualMachine = ts.get_vm[0]  # First VM as a source
> +
> +        return self.__set_debugfs_resfix_stoppers(vm_src, migration_config.resfix_stoppers)
> +
> +    @pytest.fixture(scope='function', name='clear_resfix_stoppers')
> +    def fixture_clear_resfix_stoppers(self, setup_vms):
> +        ts: VmmTestingSetup = setup_vms
> +        yield
> +
> +        for vm in ts.get_vm:
> +            if vm.is_running() and self.__is_resfix_stopped(vm):
> +                logger.info("Teardown fixture - clear remaining resfix stoppers")
> +                self.__set_debugfs_resfix_stoppers(vm, ResfixWaitStage.VF_MIGRATION_CONTINUE)
> +
> +    @pytest.fixture(scope='class', name='run_source_workload')
> +    def fixture_run_source_workload(self, setup_vms):
> +        ts: VmmTestingSetup = setup_vms
> +        vm_src: VirtualMachine = ts.get_vm[0] # First VM as source
> +        migration_wl: MigrationWorkloadWsim = wsim_short_preempt # Workload variant
> +        wsim_file_path = ts.wsim_wl_dir / f'{migration_wl.workload_file}.wsim' # Workload descriptor file path
> +        if not wsim_file_path.exists():
> +            logger.error("gem_wsim workload file %s not available!", wsim_file_path)
> +            raise exceptions.GemWsimError(f'gem_wsim workload file {wsim_file_path} not available!')
> +
> +        # Run IGT wsim workload in pre-migration and check completion in post-migration
> +        return GemWsim(vm_src, migration_wl.num_clients, migration_wl.num_repeats, workload=wsim_file_path)
> +
> +    @pytest.fixture(scope='function', name='setup_destination_vm')
> +    def fixture_setup_destination_vm(self, setup_vms):
> +        ts: VmmTestingSetup = setup_vms
> +        vm_src: VirtualMachine = ts.get_vm[0]  # First VM as a source
> +        vm_dst: VirtualMachine = ts.get_vm[-1] # Last VM as a destination
> +        num_vms = ts.get_num_vms()
> +
> +        if num_vms == 1:
> +            logger.debug("Single VM: the same source and destination VM instance")
> +            assert vm_src == vm_dst
> +            vm_dst.pause()
> +            return vm_dst
> +
> +        logger.debug("Multiple VMs: reload destination VM with the source image (with state snapshot)")
> +
> +        if vm_src.is_running():
> +            # QMP 'quit' is used for paused VM (cannot be powered off via guest-agent)
> +            vm_src.quit()
> +
> +        if vm_dst.is_running():
> +            vm_dst.quit()
> +            while vm_dst.is_running():
> +                time.sleep(1) # VM usually doesn't terminate immediately
> +
> +        # Re-start destination VM with an image containing a state snapshot
> +        vm_dst.set_migration_source(vm_src.image)
> +        vm_dst.poweron()
> +
> +        return vm_dst
> +
> +    def test_save(self, setup_vms, run_source_workload, set_resfix_stoppers):
> +        logger.info("Test VM double migration: 1st state save")
> +        ts: VmmTestingSetup = setup_vms
> +        vm_src: VirtualMachine = ts.get_vm[0] # First VM as source
> +        assert set_resfix_stoppers, 'Failed to set migration resfix stoppers'
> +
> +        logger.debug("Execute throughout-migration workload on source VM")
> +        migration_wl: GemWsim = run_source_workload
> +        time.sleep(IGT_INIT_DELAY)
> +        assert migration_wl.is_running(), 'IGT/wsim migration workload is not running on source VM'
> +
> +        # Pause source VM and save snapshot
> +        logger.debug("Pause execution and save source VM state (snapshot #1)")
> +        try:
> +            vm_src.pause()
> +            vm_src.save_state() # snapshot #1
> +        except exceptions.GuestError as exc:
> +            logger.error("State save error: %s", exc)
> +            assert False, 'VF migration failed on save'
> +
> +    def test_restore_save(self, setup_destination_vm, run_source_workload,
> +                          set_double_migration_config, clear_resfix_stoppers):
> +        logger.info("Test VM double migration: state restore and 2nd save prior recovery is done")
> +        vm_dst: VirtualMachine = setup_destination_vm
> +        migration_wl: GemWsim = run_source_workload # Get an instance of the IGT WL started in a save test
> +        migration_config: DoubleMigrationConfig = set_double_migration_config
> +
> +        # Patch the source IgtExecutor instance with the current VM and clear results cache
> +        migration_wl.target = vm_dst
> +
> +        # Load the source state snapshot
> +        logger.debug("Restore source state on the destination VM (snapshot #1)")
> +        vm_dst.load_state() # snapshot #1
> +        vm_dst.resume()
> +
> +        time.sleep(3) # Wait a bit for the migration recovery fires
> +        logger.debug("Save 2nd VM state (snapshot #2) while the 1st migration recovery still in progress")
> +
> +        if migration_config.migration_to_restore is MigrationToRestore.FIRST:
> +            # VM pause/resume is implicitly called by save,
> +            # snapshot #1 recovery is continued immediately after snapshot #2 save completes
> +            vm_dst.save_state() # snapshot #2
> +            logger.info("Continue source VM state recovery (snapshot #1)")
> +
> +        if migration_config.migration_to_restore is MigrationToRestore.SECOND:
> +            # Include explicit VM pause/resume, snapshot #2 load shall immediately follow it's save,
> +            # to not allow continuation of state recovery of snapshot #1.
> +            vm_dst.pause()
> +            vm_dst.save_state() # snapshot #2
> +            logger.info("Load state and re-start state recovery of 2nd saved state (snapshot #2)")
> +            vm_dst.load_state() # snapshot #2
> +            vm_dst.resume()
> +
> +        logger.info("Continue migration recovery - clear resfix stoppers")
> +        self.__set_debugfs_resfix_stoppers(vm_dst, ResfixWaitStage.VF_MIGRATION_CONTINUE)
> +
> +        logger.debug("Check migration in-flight workload after destination VM save")
> +        time.sleep(IGT_RESTORE_DELAY)
> +        assert migration_wl.check_results(), 'VF migration workload failed on destination VM (post-restore)'
> +
> +
> +@pytest.fixture(scope='class', name='set_double_migration_config')
> +def fixture_set_double_migration_config(request):
> +    """Set migration recovery wait stage for double migration test and number of snapshot to restore."""
> +    # Provide list of DoubleMigrationConfig instances to setup the test.
> +    return request.param
> +
> +
> +def idfn_double_migration(config: DoubleMigrationConfig):
> +    """Add double migration settings to a test config ID in parametrized tests
> +    (e.g. test_something[2VF-RS:resfix_stopper-MR:snapshot_to_restore].
> +    """
> +    return str(config)
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_1_resfix_1],
> +                        ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration1Resfix1(BaseTestDoubleMigration):
> +    """Double migration test restoring the first snapshot (the former, initial migration):
> +    save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> continue to recover #1
> +    Stop resfix on VF_MIGRATION_WAIT_BEFORE_RESFIX_START (BIT0) checkpoint to initiate 2nd save.
> +
> +    W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> +    """
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_1_resfix_2],
> +                        ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration1Resfix2(BaseTestDoubleMigration):
> +    """Double migration test restoring the first snapshot (the former, initial migration):
> +    save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> continue to recover #1
> +    Stop resfix on VF_MIGRATION_WAIT_BEFORE_FIXUPS (BIT1) checkpoint to initiate 2nd save.
> +
> +    W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> +    """
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_1_resfix_3],
> +                        ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration1Resfix3(BaseTestDoubleMigration):
> +    """Double migration test restoring the first snapshot (the former, initial migration):
> +    save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> continue to recover #1
> +    Stop resfix on VF_MIGRATION_WAIT_BEFORE_RESTART_JOBS (BIT2) checkpoint to initiate 2nd save.
> +
> +    W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> +    """
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_1_resfix_4],
> +                        ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration1Resfix4(BaseTestDoubleMigration):
> +    """Double migration test restoring the first snapshot (the former, initial migration):
> +    save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> continue to recover #1
> +    Stop resfix on VF_MIGRATION_WAIT_BEFORE_RESFIX_DONE (BIT3) checkpoint to initiate 2nd save.
> +
> +    W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> +    """
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_2_resfix_1],
> +                        ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration2Resfix1(BaseTestDoubleMigration):
> +    """Double migration test restoring the second snapshot (the latter migration):
> +    save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> load and recover #2
> +    Stop resfix on VF_MIGRATION_WAIT_BEFORE_RESFIX_START (BIT0) checkpoint to initiate 2nd save.
> +
> +    W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> +    """
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_2_resfix_2],
> +                        ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration2Resfix2(BaseTestDoubleMigration):
> +    """Double migration test restoring the second snapshot (the latter migration):
> +    save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> load and recover #2
> +    Stop resfix on VF_MIGRATION_WAIT_BEFORE_FIXUPS (BIT1) checkpoint to initiate 2nd save.
> +
> +    W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> +    """
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_2_resfix_3],
> +                        ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration2Resfix3(BaseTestDoubleMigration):
> +    """Double migration test restoring the second snapshot (the latter migration):
> +    save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> load and recover #2
> +    Stop resfix on VF_MIGRATION_WAIT_BEFORE_RESTART_JOBS (BIT2) checkpoint to initiate 2nd save.
> +
> +    W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> +    """
> +
> +
> +@pytest.mark.parametrize('set_double_migration_config', [double_migration_2_resfix_4],
> +                        ids=idfn_double_migration, indirect=['set_double_migration_config'])
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_profiles, wa_reduce_vf_lmem=True),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestDoubleMigration2Resfix4(BaseTestDoubleMigration):
> +    """Double migration test restoring the second snapshot (the latter migration):
> +    save snapshot #1 -> load snapshot #1 -> save snapshot #2 (during #1 recovery) -> load and recover #2
> +    Stop resfix on VF_MIGRATION_WAIT_BEFORE_RESFIX_DONE (BIT3) checkpoint to initiate 2nd save.
> +
> +    W/A: reduce VF VRAM quota to speed up the 2nd save (to avoid time-out).
> +    """
> +
> +
> +@pytest.mark.parametrize('setup_vms', set_test_config(test_variants_basic),
> +                         ids=idfn_test_config, indirect=['setup_vms'])
> +class TestCheckpoint:
> +    """Verify a state can be saved for the future use and then loaded at the previous checkpoint."""
> +
> +    @pytest.fixture(scope='function', name='setup_destination_vm')
> +    def fixture_setup_destination_vm(self, setup_vms):
> +        ts: VmmTestingSetup = setup_vms
> +        vm_src: VirtualMachine = ts.get_vm[0]  # First VM as a source
> +        vm_dst: VirtualMachine = ts.get_vm[-1] # Last VM as a destination
> +        num_vms = ts.get_num_vms()
> +
> +        if num_vms == 1:
> +            logger.debug("Single VM: the same source and destination VM instance")
> +            assert vm_src == vm_dst
> +            return vm_dst
> +
> +        logger.debug("Multiple VMs: restart destination VM with the source image (with state checkpoint)")
> +        vm_dst.poweroff()
> +        # Source qcow2 must be copied because multiple VMs cannot run with the same image file
> +        vm_dst.set_migration_source(duplicate_vm_image(vm_src.image))
> +        vm_dst.poweron()
> +        vm_dst.resume()
> +        assert modprobe_driver_run_check(vm_dst)
> +
> +        return vm_dst
> +
> +    @pytest.fixture(scope='class', name='run_source_workload')
> +    def fixture_run_source_workload(self, setup_vms):
> +        ts: VmmTestingSetup = setup_vms
> +        vm_src: VirtualMachine = ts.get_vm[0] # First VM as source
> +
> +        # Run IGT workload to check before and after a state checkpoint
> +        return IgtExecutor(vm_src, IgtType.SPIN_BATCH)
> +
> +    def test_save(self, setup_vms, run_source_workload):
> +        logger.info("Test VM state checkpoint save")
> +        ts: VmmTestingSetup = setup_vms
> +        vm_src: VirtualMachine = ts.get_vm[0] # First VM as source
> +        igt_src: IgtExecutor = run_source_workload
> +
> +        # Save state checkpoint
> +        logger.debug("Save VM state checkpoint")
> +        try:
> +            vm_src.save_state()
> +        except exceptions.GuestError as exc:
> +            logger.error("State save error: %s", exc)
> +            assert False, 'VF migration failed on save'
> +
> +        # Verify workload submitted prior to the state checkpoint succeeds
> +        assert igt_check(igt_src), 'Source IGT workload has failed'
> +
> +        logger.debug("Check driver health on host and source VM")
> +        assert driver_check(ts.host)
> +        assert driver_check(vm_src)
> +
> +    def test_load(self, setup_vms, setup_destination_vm, run_source_workload):
> +        logger.info("Test VM state checkpoint load")
> +        ts: VmmTestingSetup = setup_vms
> +        vm_dst: VirtualMachine = setup_destination_vm
> +        igt_src: IgtExecutor = run_source_workload # Get an instance of the IGT WL started in a save test
> +
> +        # Patch the source IgtExecutor instance with the current VM and clear results cache
> +        igt_src.target = vm_dst
> +        igt_src.results.clear()
> +
> +        # Workload submitted before the checkpoint should not be active before load
> +        logger.debug("Verify IGT workload is not executing prior to the state restore (expected pgrep error)")
> +        assert not cmd_run_check(vm_dst, 'pgrep igt_runner'), 'IGT workload is (unexpectedly) running'
> +
> +        # Load previously saved state checkpoint and resume on destination VM
> +        logger.debug("Load VM state checkpoint")
> +        vm_dst.load_state()
> +
> +        # Workload submitted before the checkpoint should be restored in running state after load
> +        logger.debug("Verify IGT workload is executing again after the state restore")
> +        assert not igt_src.status().exited, 'IGT workload is not running after checkpoint load'
> +        assert igt_check(igt_src), 'IGT workload loaded on checkpoint has failed'
> +
> +        logger.debug("Check driver health on host and destination VM")
> +        assert driver_check(ts.host)
> +        assert driver_check(vm_dst)
> +
> +
> +def test_2vm_pause_resume(create_1host_2vm):
> +    """
> +    VM/VF pause-resume does not affect workload execution:
> +     - 2xVFs running 2xVM instance
> +     - both VFs auto-provisioned, running IGT workloads
> +     - 1st VM/VF is paused and resumed (but VF state is not saved/loaded)
> +     - 2nd VM/VF workload should not be interrupted
> +     - IGT workloads shall finish successfully on both VMs
> +    """
> +    ts: VmmTestingSetup = create_1host_2vm
> +    host: Host = ts.host
> +    vm0: VirtualMachine = ts.get_vm[0]
> +    vm1: VirtualMachine = ts.get_vm[1]
> +    assert driver_check(host)
> +
> +    num_vfs = ts.testing_config.num_vfs
> +    assert ts.get_dut().create_vf(num_vfs) == num_vfs
> +
> +    vf1, vf2 = ts.get_dut().get_vfs_bdf(1, 2)
> +    vm0.assign_vf(vf1)
> +    vm1.assign_vf(vf2)
> +    ts.poweron_vms()
> +
> +    pause_vf_num = 1
> +
> +    assert modprobe_driver_run_check(vm0)
> +    assert modprobe_driver_run_check(vm1)
> +
> +    logger.debug("Submit IGT WL (gem_wsim) on VM0")
> +    iterations = 3000 # 3k iterations of 10ms WLs give 30s total expected time
> +    expected_elapsed_sec = ONE_CYCLE_DURATION_MS * iterations / MS_IN_SEC
> +    gem_wsim_vm0 = GemWsim(vm0, 1, iterations, PREEMPT_10MS_WORKLOAD)
> +
> +    # Allow wsim WL to run some time
> +    time.sleep(IGT_INIT_DELAY)
> +    assert gem_wsim_vm0.is_running()
> +
> +    logger.debug("Submit IGT WL (gem_spin_batch) on VM1")
> +    igt_vm1 = IgtExecutor(vm1, IgtType.SPIN_BATCH)
> +
> +    # Special handling of pausing VMs with infinite ExecQuanta - refer to SAS for details
> +    logger.debug("Set VF1 EQ/PF before the pause")
> +    ts.get_dut().driver.set_exec_quantum_ms(pause_vf_num, 1)
> +    ts.get_dut().driver.set_preempt_timeout_us(pause_vf_num, 100)
> +
> +    logger.debug("Pause execution on VM0/VF1")
> +    vm0.pause()
> +
> +    assert igt_check(igt_vm1)
> +    logger.debug("VM1 IGT WL (not paused) finished successfully")
> +
> +    logger.debug("Resume execution on VM0/VF1")
> +    vm0.resume()
> +
> +    logger.debug("Reset VF1 EQ/PF to the initial values (infinite) after resume")
> +    ts.get_dut().driver.set_exec_quantum_ms(pause_vf_num, 0)
> +    ts.get_dut().driver.set_preempt_timeout_us(pause_vf_num, 0)
> +
> +    result_vm0 = gem_wsim_vm0.wait_results() # Throws exception on wsim fail
> +    assert expected_elapsed_sec * 0.8 < result_vm0.elapsed_sec < expected_elapsed_sec * 1.5
> +    logger.debug("VM0 IGT WL (paused-resumed) finished successfully")
> +
> +    # Check host and VM health status after pause-resume transition
> +    assert driver_check(host)
> +    assert driver_check(vm0)
> +    assert driver_check(vm1)
> +
> +
> +def test_1vm_save_restore_no_driver(create_1host_1vm):
> +    """
> +    Save/restore single VM state with no guest driver loaded:
> +     - 1xVFs running 1xVM instance (single VM acts as source and destination)
> +     - platform provisioned with vGPU profile M1 (ATSM, ADLP) or C1 (PVC)
> +     - VF state saved and then restored on the same VM instance
> +     - driver probed on VM after the resume, IGT workload executed
> +    """
> +    ts: VmmTestingSetup = create_1host_1vm
> +    host: Host = ts.host
> +    vm: VirtualMachine = ts.get_vm[0]
> +    assert driver_check(host)
> +
> +    num_vfs = ts.testing_config.num_vfs
> +    assert ts.get_dut().create_vf(num_vfs) == num_vfs
> +
> +    vf = ts.get_dut().get_vf_bdf(1)
> +    vm.assign_vf(vf)
> +
> +    vm.poweron()
> +
> +    # Run some interactive program (not returning, as vim) to verify state after migration
> +    src_proc = ShellExecutor(vm, 'vim migrate.txt')
> +    src_pid = src_proc.pid
> +
> +    # Pause VM and save snapshot
> +    logger.debug("Pause execution and save VM state")
> +    try:
> +        vm.pause()
> +        vm.save_state()
> +    except exceptions.GuestError as exc:
> +        logger.error("State save error: %s", exc)
> +        assert False, 'VF migration failed on save'
> +
> +    # Load previously saved snapshot and resume the same VM
> +    logger.debug("Load state on the same VM instance")
> +    vm.load_state()
> +    vm.resume()
> +
> +    # Verify program initiated on source VM is stil running after migration
> +    migrated_proc = vm.execute_status(src_pid)
> +    logger.debug("Migrated process: %s", migrated_proc)
> +    assert migrated_proc.exited is False, 'Migrated process is not running after VM snapshot load'
> +
> +    logger.debug("Probe driver and execute workload on VM")
> +    assert modprobe_driver_run_check(vm)
> +    assert igt_run_check(vm, IgtType.EXEC_STORE)
> +
> +    logger.debug("Check driver health on host and VM")
> +    assert driver_check(host)
> +    assert driver_check(vm)
LGTM,
Reviewed-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>

  reply	other threads:[~2026-05-14 13:21 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-16  8:35 [PATCH i-g-t 0/3] vmtb: SR-IOV VF migration test suite Adam Miszczak
2026-04-16  8:35 ` [PATCH i-g-t 1/3] tools/vmtb: Define IGT tests used as VF migration workloads Adam Miszczak
2026-04-20 18:27   ` Kamil Konieczny
2026-05-14 13:18   ` Bernatowicz, Marcin
2026-04-16  8:35 ` [PATCH i-g-t 2/3] tools/vmtb: Provide VF busy migration IGT/gem_wsim workloads Adam Miszczak
2026-05-14 13:20   ` Bernatowicz, Marcin
2026-04-16  8:35 ` [PATCH i-g-t 3/3] tools/vmtb: Add VF migration tests Adam Miszczak
2026-05-14 13:21   ` Bernatowicz, Marcin [this message]
2026-04-16 15:40 ` ✓ i915.CI.BAT: success for vmtb: SR-IOV VF migration test suite Patchwork
2026-04-16 15:50 ` ✓ Xe.CI.BAT: " Patchwork
2026-04-16 17:34 ` ✗ Xe.CI.FULL: failure " Patchwork
2026-04-17  3:47 ` ✗ i915.CI.Full: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bc707c43-dd34-4ab1-8ce3-dde055b2fc2c@linux.intel.com \
    --to=marcin.bernatowicz@linux.intel.com \
    --cc=adam.miszczak@linux.intel.com \
    --cc=igt-dev@lists.freedesktop.org \
    --cc=kamil.konieczny@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox