public inbox for stable@vger.kernel.org
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Longfang Liu <liulongfang@huawei.com>,
	Alex Williamson <alex@shazbot.org>,
	Sasha Levin <sashal@kernel.org>,
	kvm@vger.kernel.org
Subject: [PATCH AUTOSEL 6.19-6.18] hisi_acc_vfio_pci: resolve duplicate migration states
Date: Sun, 15 Feb 2026 10:03:25 -0500
Message-ID: <20260215150333.2150455-8-sashal@kernel.org>
In-Reply-To: <20260215150333.2150455-1-sashal@kernel.org>

From: Longfang Liu <liulongfang@huawei.com>

[ Upstream commit 8c6ac1730a977234dff74cc1753b4a953f59be7b ]

In special scenarios involving duplicate migrations, after the
first migration is completed, if the original VF device is used
again and then migrated to another destination, the state indicating
data migration completion for the VF device is not reset.
This results in the second migration to the destination being skipped
without performing data migration.
With this change, a complete data migration is performed on
each subsequent migration.

Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Link: https://lore.kernel.org/r/20260122020205.2884497-4-liulongfang@huawei.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

### 3. Understanding the Bug Mechanism

Now the bug is crystal clear:

1. **`vf_qm_check_match()`** (lines 401-453) performs VF device
   matching/validation during migration. Once matching succeeds, it sets
   `match_done = true` (line 451).

2. On subsequent calls, if `match_done` is already `true`, it **skips
   all validation** (line 411: early return 0).

3. **The bug**: When the device is opened for a second migration,
   `match_done` is never reset to 0. So `vf_qm_check_match()` returns
   immediately without performing the actual data migration matching,
   effectively **skipping the migration data transfer**.

4. **The fix**: Reset `match_done = 0` in
   `hisi_acc_vfio_pci_open_device()`, ensuring that each time the device
   is opened for a new migration, matching starts fresh.
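
The mechanism described in the list above can be modeled outside the
kernel. The following is a minimal user-space sketch, not the driver
code: the `toy_*` names, the `validations` counter, and the
`reset_on_open` switch are hypothetical and exist only for illustration;
`match_done` is the only name taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's per-device state. Only the
 * match_done flag mirrors a name used in the patch. */
struct toy_vdev {
	bool match_done;  /* set once validation has succeeded */
	int validations;  /* counts how often validation actually ran */
};

/* Models the early return described above: once match_done is set,
 * the validation step is skipped. */
static int toy_check_match(struct toy_vdev *vdev)
{
	if (vdev->match_done)
		return 0;

	vdev->validations++;  /* stands in for the real checks */
	vdev->match_done = true;
	return 0;
}

/* Models the open path. reset_on_open == false imitates the behavior
 * before the patch, true imitates the behavior after it. */
static void toy_open_device(struct toy_vdev *vdev, bool reset_on_open)
{
	if (reset_on_open)
		vdev->match_done = false;
}

/* Runs two open + match cycles on the same device object and returns
 * how many times validation actually ran. */
static int toy_run_two_migrations(bool reset_on_open)
{
	struct toy_vdev vdev = { .match_done = false, .validations = 0 };

	for (int i = 0; i < 2; i++) {
		toy_open_device(&vdev, reset_on_open);
		toy_check_match(&vdev);
	}
	return vdev.validations;
}
```

In this model, `toy_run_two_migrations(false)` returns 1 because the
second cycle hits the early return, while `toy_run_two_migrations(true)`
returns 2 because the flag is cleared on every open.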

### 4. Classification

This is a **clear functional bug fix**. It fixes incorrect behavior
where sequential VFIO device migrations fail silently after the first
one. The second migration appears to succeed but doesn't actually
transfer data.

### 5. Scope and Risk Assessment

- **One line change**: `hisi_acc_vdev->match_done = 0;`
- **Single file**: `drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c`
- **Minimal risk**: The change simply resets a state flag when a device
  is opened, which is the expected correct behavior
- **Located in the right place**: Inside the `open_device` function
  where other state is already being initialized (`mig_state`,
  `dev_opened`)
- **Under proper locking**: Inside the `open_mutex` critical section
- **Driver-specific**: Only affects HiSilicon accelerator VFIO devices,
  no risk to other subsystems

### 6. User Impact

- **Who is affected**: Users of HiSilicon accelerator hardware (common
  in Huawei/HiSilicon server platforms) performing VFIO-based VM live
  migration
- **Severity**: HIGH. The failure is silent, so VM state may be
  corrupted or lost during the second migration, and the VM may
  crash or behave incorrectly on the new host
- **Frequency**: Any time a VM is migrated more than once (a standard
  operation in cloud/datacenter environments for load balancing,
  maintenance, etc.)

### 7. Stable Kernel Criteria Check

- **Obviously correct**: Yes — resetting state on device open is clearly
  the right thing to do
- **Fixes a real bug**: Yes — second migrations silently fail
- **Small and contained**: Yes — single line addition in a single file
- **No new features**: Correct — this is purely a state reset bug fix
- **Tested**: No Tested-by tag in the commit message; the patch was
  accepted by the VFIO maintainer (Alex Williamson)

### 8. Dependency Check

No dependencies on other commits. The `match_done` field and the
`open_device` function structure already exist in stable trees that have
the HiSilicon VFIO migration support.

### Conclusion

This is a textbook stable backport candidate: a one-line fix for a clear
functional bug (state not being reset between migrations) that causes
silent data migration failures in production environments. The fix is
minimal, obviously correct, properly placed under existing locking, and
carries essentially zero regression risk.

**YES**

 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index cf45f6370c369..39bff70f1e14b 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -1547,6 +1547,7 @@ static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
 		}
 		hisi_acc_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
 		hisi_acc_vdev->dev_opened = true;
+		hisi_acc_vdev->match_done = 0;
 		mutex_unlock(&hisi_acc_vdev->open_mutex);
 	}
 
-- 
2.51.0

