From: Alison Schofield <alison.schofield@intel.com>
To: Li Chen <me@linux.beauty>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>,
Dan Williams <dan.j.williams@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
"Dave Jiang" <dave.jiang@intel.com>,
Ira Weiny <ira.weiny@intel.com>, <virtualization@lists.linux.dev>,
<nvdimm@lists.linux.dev>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 0/7] nvdimm: virtio_pmem: fix request lifetime and converge broken queue failures
Date: Fri, 12 Jun 2026 11:42:29 -0700 [thread overview]
Message-ID: <aixTFVGZVDaCxMis@aschofie-mobl2.lan> (raw)
In-Reply-To: <20260609120726.1714780-1-me@linux.beauty>
On Tue, Jun 09, 2026 at 08:07:14PM +0800, Li Chen wrote:
> Hi,
Hi Li Chen,
In case you missed it, a Sashiko AI review of this set has posted
feedback. Please take a look.
https://sashiko.dev/#/patchset/20260609120726.1714780-1-me%40linux.beauty
-- Alison
>
> The nvdimm flush helper currently converts any non-zero provider flush
> callback error to -EIO. That hides useful errno values from providers. For
> example, virtio-pmem may fail child flush bio allocation with -ENOMEM, but
> that is currently reported as -EIO by nvdimm_flush().
>
> The raw failure seen in the local mkfs sanity test was:
>
> wipefs: /dev/pmem0: cannot flush modified buffers: Input/output error
> mkfs.ext4: Input/output error while writing out and closing file system
> nd_region region0: dbg: nvdimm_flush rc=-5
>
> The first two patches keep provider flush errors intact and make the
> virtio-pmem child flush bio allocation use GFP_NOIO. async_pmem_flush() can
> allocate a child flush bio from filesystem flush and writeback paths;
> GFP_NOIO is a better fit than GFP_ATOMIC there because the allocation may
> sleep but must not recurse into filesystem I/O reclaim. With these changes,
> the same mkfs sanity test reached mkfs_rc=0, mount_rc=0, and umount_rc=0.
>
> The rest of the series addresses virtio-pmem request lifetime and broken
> virtqueue handling. The virtio-pmem flush path uses a virtqueue cookie/token
> to carry a per-request context through completion. Under broken virtqueue /
> notify failure conditions, the submitter can return and free the request
> object while the host/backend may still complete the published request. The
> IRQ completion handler then dereferences freed memory when waking waiters,
> which is reported by KASAN as a slab-use-after-free and may manifest as lock
> corruption (e.g. "BUG: spinlock already unlocked") without KASAN.
>
> In addition, the flush path has two wait sites: one for virtqueue descriptor
> availability (-ENOSPC from virtqueue_add_sgs()) and one for request
> completion. If the virtqueue becomes broken, forward progress is no longer
> guaranteed and these waiters may sleep indefinitely unless the driver
> converges the failure and wakes all wait sites.
>
> This series addresses these issues:
>
> 1/7 nvdimm: preserve flush callback errors
> Return provider flush callback errors directly from nvdimm_flush().
>
> 2/7 nvdimm: virtio_pmem: use GFP_NOIO for child flush bio
> Use GFP_NOIO for the child flush bio allocation.
>
> 3/7 nvdimm: virtio_pmem: always wake -ENOSPC waiters
> Wake one -ENOSPC waiter for each reclaimed used buffer, decoupled from
> token completion.
>
> 4/7 nvdimm: virtio_pmem: use READ_ONCE()/WRITE_ONCE() for wait flags
> Use READ_ONCE()/WRITE_ONCE() for the wait_event() flags (done and
> wq_buf_avail).
>
> 5/7 nvdimm: virtio_pmem: refcount requests for token lifetime
> Refcount request objects so the token lifetime spans the window where it is
> reachable through the virtqueue until completion/drain drops the virtqueue
> reference.
>
> 6/7 nvdimm: virtio_pmem: converge broken virtqueue to -EIO
> Track a device-level broken state to converge broken/notify failures to -EIO:
> wake all waiters and drain/detach outstanding requests to complete them with
> an error, and fail-fast new requests.
>
> 7/7 nvdimm: virtio_pmem: drain requests in freeze
> Drain outstanding requests in freeze() before tearing down virtqueues so
> waiters do not sleep indefinitely.
>
> Testing was done on QEMU x86_64 with a virtio-pmem device exported as
> /dev/pmem0. This v4 series applies to v7.1-rc7, builds with
> CONFIG_VIRTIO_PMEM=m, passes checkpatch, and passed the local repro checks
> with a local-only virtqueue_kick() fault injection. I also checked that it
> applies cleanly to next/master at 6e845bcb78c9 ("Add linux-next specific
> files for 20260605").
>
> Thanks,
> Li Chen
>
> Changelog:
> v3->v4:
> - Rebased the series onto v7.1-rc7 so it applies cleanly to Linux 7.1-rc7.
> - Update the allocation site in 6/7 from kmalloc(sizeof(*req_data),
> GFP_KERNEL) to kmalloc_obj(*req_data) to match current nvdimm code.
> - Add 1/7 to preserve provider flush callback errors in nvdimm_flush().
> - Include the GFP_NOIO child flush bio allocation fix as 2/7.
> - Renumber the old request lifetime and broken virtqueue fixes after the two
> new flush error patches.
> v2->v3:
> - Split patch 1 as suggested by Pankaj Gupta: keep the waiter wakeup
> ordering change in 1/5 and move READ_ONCE()/WRITE_ONCE() updates to
> 2/5 (no functional change intended).
> - Add log report to commit msg
> - Fold the export fix into 4/5 to keep the series bisectable when
> CONFIG_VIRTIO_PMEM=m.
> v1->v2: add the export patch to fix compile issue.
>
> Links:
> v3: https://lore.kernel.org/all/20260226025712.2236279-1-me@linux.beauty/#t
> v2: https://lore.kernel.org/all/20251225042915.334117-1-me@linux.beauty/
> v1: https://www.spinics.net/lists/kernel/msg5974818.html
>
> Li Chen (7):
> nvdimm: preserve flush callback errors
> nvdimm: virtio_pmem: use GFP_NOIO for child flush bio
> nvdimm: virtio_pmem: always wake -ENOSPC waiters
> nvdimm: virtio_pmem: use READ_ONCE()/WRITE_ONCE() for wait flags
> nvdimm: virtio_pmem: refcount requests for token lifetime
> nvdimm: virtio_pmem: converge broken virtqueue to -EIO
> nvdimm: virtio_pmem: drain requests in freeze
>
> drivers/nvdimm/nd_virtio.c | 139 +++++++++++++++++++++++++++++------
> drivers/nvdimm/region_devs.c | 6 +-
> drivers/nvdimm/virtio_pmem.c | 14 ++++
> drivers/nvdimm/virtio_pmem.h | 6 ++
> 4 files changed, 139 insertions(+), 26 deletions(-)
> --
> 2.52.0
prev parent reply other threads:[~2026-06-12 18:42 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-09 12:07 [PATCH v4 0/7] nvdimm: virtio_pmem: fix request lifetime and converge broken queue failures Li Chen
2026-06-09 12:07 ` [PATCH v4 1/7] nvdimm: preserve flush callback errors Li Chen
2026-06-09 12:07 ` [PATCH v4 2/7] nvdimm: virtio_pmem: use GFP_NOIO for child flush bio Li Chen
2026-06-09 12:07 ` [PATCH v4 3/7] nvdimm: virtio_pmem: always wake -ENOSPC waiters Li Chen
2026-06-09 12:07 ` [PATCH v4 4/7] nvdimm: virtio_pmem: use READ_ONCE()/WRITE_ONCE() for wait flags Li Chen
2026-06-09 12:07 ` [PATCH v4 5/7] nvdimm: virtio_pmem: refcount requests for token lifetime Li Chen
2026-06-09 12:07 ` [PATCH v4 6/7] nvdimm: virtio_pmem: converge broken virtqueue to -EIO Li Chen
2026-06-09 12:07 ` [PATCH v4 7/7] nvdimm: virtio_pmem: drain requests in freeze Li Chen
2026-06-12 18:42 ` Alison Schofield [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aixTFVGZVDaCxMis@aschofie-mobl2.lan \
--to=alison.schofield@intel.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=ira.weiny@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=me@linux.beauty \
--cc=nvdimm@lists.linux.dev \
--cc=pankaj.gupta.linux@gmail.com \
--cc=virtualization@lists.linux.dev \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox