public inbox for stable@vger.kernel.org
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Longfang Liu <liulongfang@huawei.com>,
	Alex Williamson <alex@shazbot.org>,
	Sasha Levin <sashal@kernel.org>,
	kvm@vger.kernel.org
Subject: [PATCH AUTOSEL 6.19-6.1] hisi_acc_vfio_pci: update status after RAS error
Date: Sun, 15 Feb 2026 10:03:20 -0500
Message-ID: <20260215150333.2150455-3-sashal@kernel.org>
In-Reply-To: <20260215150333.2150455-1-sashal@kernel.org>

From: Longfang Liu <liulongfang@huawei.com>

[ Upstream commit 8be14dd48dfee0df91e511acceb4beeb2461a083 ]

After a RAS error occurs on the accelerator device, the accelerator
device will be reset. The live migration state will be abnormal
after reset, and the original state needs to be restored during
the reset process.
Therefore, reset processing needs to be performed in a live
migration scenario.

Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Link: https://lore.kernel.org/r/20260122020205.2884497-3-liulongfang@huawei.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

This confirms the critical finding.

### 3. Bug Classification: Logic Error Causing Failed Reset After RAS Error

The old condition was:
```c
if (hisi_acc_vdev->core_device.vdev.migration_flags !=
			VFIO_MIGRATION_STOP_COPY)
	return;
```

But `migration_flags` is set to `VFIO_MIGRATION_STOP_COPY |
VFIO_MIGRATION_PRE_COPY` (line 1590), which is `0x1 | 0x4 = 0x5`, not
`0x1`.

So the condition `migration_flags != VFIO_MIGRATION_STOP_COPY` evaluates
to `0x5 != 0x1` = **TRUE**, causing the function to **always return
early** and **never perform the reset**.

This means:
- After a RAS error, the device resets
- The migration state becomes inconsistent
- The `hisi_acc_vf_reset()` call that should restore state is **never
  reached**
- The device is left in a broken/inconsistent migration state
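
The comparison described above can be reproduced in isolation. The
following is a minimal user-space sketch, not driver code: the `MIG_*`
macros are local copies of the two flag values quoted above (`0x1` and
`0x4`), and both function names are hypothetical. The mask-based
variant is shown only for contrast; it is not what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Local copies of the flag values quoted in the text (assumption: they
 * match the definitions in include/uapi/linux/vfio.h).
 */
#define MIG_STOP_COPY (1u << 0)	/* 0x1 */
#define MIG_PRE_COPY  (1u << 2)	/* 0x4 */

/* Old test: exact equality, so any additional bit makes it return early. */
bool old_check_returns_early(uint32_t flags)
{
	return flags != MIG_STOP_COPY;
}

/* Hypothetical mask test, for contrast only: looks at a single bit. */
bool mask_check_returns_early(uint32_t flags)
{
	return !(flags & MIG_STOP_COPY);
}

/* Walks through the cases discussed in the text. */
void check_flag_cases(void)
{
	uint32_t flags = MIG_STOP_COPY | MIG_PRE_COPY;

	assert(flags == 0x5);
	assert(old_check_returns_early(flags));		 /* reset is skipped */
	assert(!old_check_returns_early(MIG_STOP_COPY)); /* only exact 0x1 passes */
	assert(!mask_check_returns_early(flags));	 /* mask test would proceed */
}
```

Calling `check_flag_cases()` from a small `main()` exercises all four
cases; every assertion is expected to hold.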

The new condition `!mig_ops` returns early only when no migration ops
are registered (the pointer is non-NULL when they are), so the reset
handling runs for any device that supports migration, regardless of
which flag bits are set. This aligns with how the VFIO core itself
checks for migration capability.
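
As a rough illustration of why the pointer test is independent of the
flag bits, here is a small user-space sketch. `struct demo_vdev` and
`struct demo_mig_ops` are hypothetical stand-ins that model only the two
fields named in this explanation; they are not the kernel's
`struct vfio_device`, and the assumption that `mig_ops` stays NULL when
migration is not set up comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a migration ops table. */
struct demo_mig_ops {
	int placeholder;
};

/* Hypothetical stand-in modelling only the fields discussed here. */
struct demo_vdev {
	unsigned int migration_flags;
	const struct demo_mig_ops *mig_ops;	/* assumed NULL without migration */
};

/* Same shape as the new test: return early only when there are no ops. */
bool new_check_returns_early(const struct demo_vdev *vdev)
{
	return !vdev->mig_ops;
}

void check_mig_ops_cases(void)
{
	static const struct demo_mig_ops ops = { 0 };
	struct demo_vdev with_mig = { .migration_flags = 0x5, .mig_ops = &ops };
	struct demo_vdev without_mig = { .migration_flags = 0, .mig_ops = NULL };

	assert(!new_check_returns_early(&with_mig));	/* reset handling runs */
	assert(new_check_returns_early(&without_mig));	/* nothing to restore */
}
```

Unlike the exact-equality test, the result here does not change if
further capability bits are added to `migration_flags` later.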

### 4. Scope and Risk Assessment

- **Change size**: 2 lines modified (one condition check)
- **Files touched**: 1 file
- **Risk**: Very low - the change is a simple condition check
  improvement
- **Scope**: Well-contained to the AER reset handler for HiSilicon
  accelerator VFIO devices

### 5. User Impact

- **Who is affected**: Users of HiSilicon accelerators (SEC, HPRE, ZIP
  engines) with live migration enabled, especially in
  cloud/virtualization environments
- **Severity**: After a RAS error during live migration, the device
  state would not be properly restored, potentially causing:
  - Failed live migrations
  - Corrupted device state
  - Guest VM malfunction after host-side RAS recovery

### 6. Stability Indicators

- Merged by Alex Williamson (VFIO subsystem maintainer)
- The fix is obviously correct - the old check was demonstrably wrong
  due to the exact equality comparison against a bitmask field that has
  multiple bits set

### 7. Dependency Check

This change is self-contained. It doesn't depend on other commits. The
`mig_ops` field has existed in `struct vfio_device` since the VFIO
migration rework (which is present in recent stable kernels).

### Conclusion

This is a clear bug fix. The old condition had a logic error that made
the migration reset handler completely non-functional: it would
**always** return early because `migration_flags` was set to `STOP_COPY
| PRE_COPY` but was compared with exact equality to just `STOP_COPY`.
The fix is minimal (2 lines), obviously correct, and addresses a real
data integrity/reliability issue during RAS error recovery in live
migration scenarios. The change was accepted by the VFIO maintainer.

The only consideration is whether `mig_ops` exists in stable tree
versions, but since it's part of the VFIO migration rework that preceded
the addition of `PRE_COPY` support, it should be present in any kernel
that has this driver with `PRE_COPY` capability.

**YES**

 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 8ed00f6183622..1c0b960de93c6 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -1192,8 +1192,7 @@ static void hisi_acc_vf_pci_aer_reset_done(struct pci_dev *pdev)
 {
 	struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_drvdata(pdev);
 
-	if (hisi_acc_vdev->core_device.vdev.migration_flags !=
-				VFIO_MIGRATION_STOP_COPY)
+	if (!hisi_acc_vdev->core_device.vdev.mig_ops)
 		return;
 
 	mutex_lock(&hisi_acc_vdev->state_mutex);
-- 
2.51.0

