From: Pranjal Shrivastava <praan@google.com>
To: Matt Evans <matt@ozlabs.org>
Cc: "Alex Williamson" <alex@shazbot.org>,
"Leon Romanovsky" <leon@kernel.org>,
"Jason Gunthorpe" <jgg@nvidia.com>,
"Alex Mastro" <amastro@fb.com>,
"Christian König" <christian.koenig@amd.com>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Logan Gunthorpe" <logang@deltatee.com>,
"Mahmoud Adam" <mngyadam@amazon.de>,
"David Matlack" <dmatlack@google.com>,
"Björn Töpel" <bjorn@kernel.org>,
"Sumit Semwal" <sumit.semwal@linaro.org>,
"Kevin Tian" <kevin.tian@intel.com>,
"Ankit Agrawal" <ankita@nvidia.com>,
"Alistair Popple" <apopple@nvidia.com>,
"Vivek Kasireddy" <vivek.kasireddy@intel.com>,
linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org,
kvm@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [PATCH v3 6/9] vfio/pci: Clean up BAR zap and revocation
Date: Fri, 12 Jun 2026 19:39:17 +0000 [thread overview]
Message-ID: <aixgZQiBQKgS7yIM@google.com> (raw)
In-Reply-To: <20260610154327.37758-7-matt@ozlabs.org>
On Wed, Jun 10, 2026 at 04:43:20PM +0100, Matt Evans wrote:
> Previously, vfio_pci_zap_bars() (and the wrapper
> vfio_pci_zap_and_down_write_memory_lock()) calls were paired with
> calls to vfio_pci_dma_buf_move().
>
> This commit replaces them with a unified new function,
> vfio_pci_zap_revoke_bars() containing both the vfio_pci_dma_buf_move()
> and the unmap_mapping_range(), making it harder for callers to omit
> one. It adds a wrapper, vfio_pci_lock_zap_revoke_bars(), which takes
> the write memory_lock before zapping, and adds a new
> vfio_pci_unrevoke_bars() for the re-enable path.
>
> As of "vfio/pci: Convert BAR mmap() to use a DMABUF", the
> unmap_mapping_range() to zap is no longer performed for vfio-pci since
> the DMABUFs used for BAR mappings already zap PTEs when the
> vfio_pci_dma_buf_move() occurs.
>
> However, it must be assumed that VFIO drivers which override the .mmap
> op could create mappings _not_ backed by DMABUFs. So, the zap is
> still performed on revoke if .mmap is overridden, using a new
> zap_bars_on_revoke flag. A driver can explicitly opt out; the flag is
> cleared by the hisi_acc_vfio_pci driver, since its .mmap just wraps
> vfio_pci_core_mmap() and so still uses DMABUFs.
>
> Signed-off-by: Matt Evans <matt@ozlabs.org>
> ---
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 8 +++
> drivers/vfio/pci/vfio_pci_config.c | 30 ++++----
> drivers/vfio/pci/vfio_pci_core.c | 70 +++++++++++++------
> drivers/vfio/pci/vfio_pci_priv.h | 3 +-
> include/linux/vfio_pci_core.h | 1 +
> 5 files changed, 73 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index 86362ec424a5..51990f6d66d5 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -1692,6 +1692,14 @@ static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device
> if (ret)
> goto out_put_vdev;
>
> + /*
> + * hisi_acc_vfio_pci_mmap() calls down to
> + * vfio_pci_core_mmap(), so BAR mappings are still
> + * DMABUF-backed. They don't require a zap on revoke, so opt
> + * out:
> + */
> + hisi_acc_vdev->core_device.zap_bars_on_revoke = false;
> +
This seems to be happening after we vfio_pci_core_register_device, which
could be slightly problematic if another device in the same group races
to trigger a hot reset before we can set this to false. Could we
initialize this flag before registration instead?
> hisi_acc_vfio_debug_init(hisi_acc_vdev);
> return 0;
>
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index a10ed733f0e3..8bfab0da481c 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -590,12 +590,10 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
> virt_mem = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY);
> new_mem = !!(new_cmd & PCI_COMMAND_MEMORY);
>
> - if (!new_mem) {
> - vfio_pci_zap_and_down_write_memory_lock(vdev);
> - vfio_pci_dma_buf_move(vdev, true);
> - } else {
> + if (!new_mem)
> + vfio_pci_lock_zap_revoke_bars(vdev);
> + else
> down_write(&vdev->memory_lock);
> - }
>
> /*
> * If the user is writing mem/io enable (new_mem/io) and we
> @@ -631,7 +629,7 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
> *virt_cmd |= cpu_to_le16(new_cmd & mask);
>
> if (__vfio_pci_memory_enabled(vdev))
> - vfio_pci_dma_buf_move(vdev, false);
> + vfio_pci_unrevoke_bars(vdev);
> up_write(&vdev->memory_lock);
> }
>
> @@ -712,16 +710,14 @@ static int __init init_pci_cap_basic_perm(struct perm_bits *perm)
> static void vfio_lock_and_set_power_state(struct vfio_pci_core_device *vdev,
> pci_power_t state)
> {
> - if (state >= PCI_D3hot) {
> - vfio_pci_zap_and_down_write_memory_lock(vdev);
> - vfio_pci_dma_buf_move(vdev, true);
> - } else {
> + if (state >= PCI_D3hot)
> + vfio_pci_lock_zap_revoke_bars(vdev);
> + else
> down_write(&vdev->memory_lock);
> - }
>
> vfio_pci_set_power_state(vdev, state);
> if (__vfio_pci_memory_enabled(vdev))
> - vfio_pci_dma_buf_move(vdev, false);
> + vfio_pci_unrevoke_bars(vdev);
> up_write(&vdev->memory_lock);
> }
>
> @@ -908,11 +904,10 @@ static int vfio_exp_config_write(struct vfio_pci_core_device *vdev, int pos,
> &cap);
>
> if (!ret && (cap & PCI_EXP_DEVCAP_FLR)) {
> - vfio_pci_zap_and_down_write_memory_lock(vdev);
> - vfio_pci_dma_buf_move(vdev, true);
> + vfio_pci_lock_zap_revoke_bars(vdev);
> pci_try_reset_function(vdev->pdev);
> if (__vfio_pci_memory_enabled(vdev))
> - vfio_pci_dma_buf_move(vdev, false);
> + vfio_pci_unrevoke_bars(vdev);
> up_write(&vdev->memory_lock);
> }
> }
> @@ -993,11 +988,10 @@ static int vfio_af_config_write(struct vfio_pci_core_device *vdev, int pos,
> &cap);
>
> if (!ret && (cap & PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP)) {
> - vfio_pci_zap_and_down_write_memory_lock(vdev);
> - vfio_pci_dma_buf_move(vdev, true);
> + vfio_pci_lock_zap_revoke_bars(vdev);
> pci_try_reset_function(vdev->pdev);
> if (__vfio_pci_memory_enabled(vdev))
> - vfio_pci_dma_buf_move(vdev, false);
> + vfio_pci_unrevoke_bars(vdev);
> up_write(&vdev->memory_lock);
> }
> }
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index f9636d8f9e2a..5ea0bd4e7876 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -319,8 +319,7 @@ static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev,
> * The vdev power related flags are protected with 'memory_lock'
> * semaphore.
> */
> - vfio_pci_zap_and_down_write_memory_lock(vdev);
> - vfio_pci_dma_buf_move(vdev, true);
> + vfio_pci_lock_zap_revoke_bars(vdev);
>
> if (vdev->pm_runtime_engaged) {
> up_write(&vdev->memory_lock);
> @@ -406,7 +405,7 @@ static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
> down_write(&vdev->memory_lock);
> __vfio_pci_runtime_pm_exit(vdev);
> if (__vfio_pci_memory_enabled(vdev))
> - vfio_pci_dma_buf_move(vdev, false);
> + vfio_pci_unrevoke_bars(vdev);
> up_write(&vdev->memory_lock);
> }
>
> @@ -1256,6 +1255,8 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev,
> return ret;
> }
>
> +static void vfio_pci_zap_revoke_bars(struct vfio_pci_core_device *vdev);
> +
> static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
> void __user *arg)
> {
> @@ -1264,7 +1265,7 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
> if (!vdev->reset_works)
> return -EINVAL;
>
> - vfio_pci_zap_and_down_write_memory_lock(vdev);
> + down_write(&vdev->memory_lock);
>
> /*
> * This function can be invoked while the power state is non-D0. If
> @@ -1277,10 +1278,11 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
> */
> vfio_pci_set_power_state(vdev, PCI_D0);
>
> - vfio_pci_dma_buf_move(vdev, true);
> + vfio_pci_zap_revoke_bars(vdev);
I'm wondering if this change in behavior is correct?
BEFORE this patch the sequence was:
1. zap vma mappings
2. Enter D0
After this patch the sequence becomes
1. Take the lock
2. Enter D0
3. zap vma mappings
My worry is if user-space accesses a BAR *during* the transition to D0,
it could crash since the mappings still exist during the transition?
The old code is immune to it because it removed user-mappings first.
Following the discussion from v1 regarding the ordering of
vfio_pci_dma_buf_move() and the D0 transition.. while it makes sense to
perform the DMABUF revocation/move after the hardware is in D0.. I'm not
too confident about moving zap after D0 :/
I mean, sure, the user would just see all Fs on a read and writes will
be dropped silently until we are in D0.. but the behaviour before this
change was that the user access will fault and hang on the memory_lock
instead which ensures that the user observes a consistent dev state..
> +
> ret = pci_try_reset_function(vdev->pdev);
> if (__vfio_pci_memory_enabled(vdev))
> - vfio_pci_dma_buf_move(vdev, false);
> + vfio_pci_unrevoke_bars(vdev);
> up_write(&vdev->memory_lock);
>
> return ret;
> @@ -1648,20 +1650,37 @@ ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *bu
> }
Thanks,
Praan
next prev parent reply other threads:[~2026-06-12 19:39 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-10 15:43 [PATCH v3 0/9] vfio/pci: Add mmap() for DMABUFs Matt Evans
2026-06-10 15:43 ` [PATCH v3 1/9] PCI/P2PDMA: Add CONFIG_PCI_P2PDMA_CORE Matt Evans
2026-06-10 18:39 ` Leon Romanovsky
2026-06-11 16:07 ` Bjorn Helgaas
2026-06-11 17:44 ` Matt Evans
2026-06-11 18:37 ` Pranjal Shrivastava
2026-06-12 3:39 ` Tian, Kevin
2026-06-12 14:31 ` Matt Evans
2026-06-10 15:43 ` [PATCH v3 2/9] vfio/pci: Add a helper to look up PFNs for DMABUFs Matt Evans
2026-06-11 20:30 ` Pranjal Shrivastava
2026-06-12 17:37 ` Alex Williamson
2026-06-12 18:21 ` Pranjal Shrivastava
2026-06-12 8:42 ` Tian, Kevin
2026-06-10 15:43 ` [PATCH v3 3/9] vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA Matt Evans
2026-06-12 8:43 ` Tian, Kevin
2026-06-12 9:20 ` Pranjal Shrivastava
2026-06-10 15:43 ` [PATCH v3 4/9] vfio/pci: Convert BAR mmap() to use a DMABUF Matt Evans
2026-06-12 8:46 ` Tian, Kevin
2026-06-12 10:41 ` Pranjal Shrivastava
2026-06-12 15:22 ` Matt Evans
2026-06-12 19:43 ` Pranjal Shrivastava
2026-06-10 15:43 ` [PATCH v3 5/9] vfio/pci: Provide a user-facing name for BAR mappings Matt Evans
2026-06-12 8:46 ` Tian, Kevin
2026-06-12 14:06 ` Pranjal Shrivastava
2026-06-10 15:43 ` [PATCH v3 6/9] vfio/pci: Clean up BAR zap and revocation Matt Evans
2026-06-12 19:39 ` Pranjal Shrivastava [this message]
2026-06-10 15:43 ` [PATCH v3 7/9] vfio/pci: Support mmap() of a VFIO DMABUF Matt Evans
2026-06-12 20:35 ` Pranjal Shrivastava
2026-06-10 15:43 ` [PATCH v3 8/9] vfio/pci: Permanently revoke a DMABUF on request Matt Evans
2026-06-10 15:43 ` [PATCH v3 9/9] vfio/pci: Add mmap() attributes to DMABUF feature Matt Evans
2026-06-12 8:27 ` [PATCH v3 0/9] vfio/pci: Add mmap() for DMABUFs Tian, Kevin
2026-06-12 15:11 ` Matt Evans
2026-06-12 15:17 ` Pranjal Shrivastava
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aixgZQiBQKgS7yIM@google.com \
--to=praan@google.com \
--cc=alex@shazbot.org \
--cc=amastro@fb.com \
--cc=ankita@nvidia.com \
--cc=apopple@nvidia.com \
--cc=bhelgaas@google.com \
--cc=bjorn@kernel.org \
--cc=christian.koenig@amd.com \
--cc=dmatlack@google.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=jgg@nvidia.com \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=leon@kernel.org \
--cc=linaro-mm-sig@lists.linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-media@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=logang@deltatee.com \
--cc=matt@ozlabs.org \
--cc=mngyadam@amazon.de \
--cc=sumit.semwal@linaro.org \
--cc=vivek.kasireddy@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.