From: Catalin Marinas <catalin.marinas@arm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Marc Zyngier <maz@kernel.org>,
ankita@nvidia.com,
Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
oliver.upton@linux.dev, suzuki.poulose@arm.com,
yuzenghui@huawei.com, will@kernel.org, ardb@kernel.org,
akpm@linux-foundation.org, gshan@redhat.com, aniketa@nvidia.com,
cjia@nvidia.com, kwankhede@nvidia.com, targupta@nvidia.com,
vsethi@nvidia.com, acurrid@nvidia.com, apopple@nvidia.com,
jhubbard@nvidia.com, danw@nvidia.com, mochs@nvidia.com,
kvmarm@lists.linux.dev, kvm@vger.kernel.org,
lpieralisi@kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2 1/1] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory
Date: Wed, 6 Dec 2023 18:58:44 +0000
Message-ID: <ZXDEZO6sS1dE_to9@arm.com>
In-Reply-To: <20231206172035.GU2692119@nvidia.com>
On Wed, Dec 06, 2023 at 01:20:35PM -0400, Jason Gunthorpe wrote:
> On Wed, Dec 06, 2023 at 04:31:48PM +0000, Catalin Marinas wrote:
> > > This would be fine, as would a VMA flag. Please pick one :)
> > >
> > > I think a VMA flag is simpler than messing with pgprot.
> >
> > I guess one could write a patch and see how it goes ;).
>
> A lot of patches have been sent on this already :(

But not one with a VM_* flag. I guess we could also add a VM_VFIO flag
which implies KVM has fewer restrictions on the memory type. I think
that's more bike-shedding.

The key point is that we don't want to relax this for whatever KVM may
map in the guest, only for certain devices. Just having a vma may not
be sufficient: we can't tell where that vma came from.

So for the vfio bits, completely untested:
-------------8<----------------------------
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1929103ee59a..b89d2dfcd534 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1863,7 +1863,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
 	 * See remap_pfn_range(), called from vfio_pci_fault() but we can't
 	 * change vm_flags within the fault handler. Set them now.
 	 */
-	vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
+	vm_flags_set(vma, VM_VFIO | VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
 	vma->vm_ops = &vfio_pci_mmap_ops;
 
 	return 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 418d26608ece..6df46fd7836a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -391,6 +391,13 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_UFFD_MINOR VM_NONE
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
 
+#ifdef CONFIG_64BIT
+#define VM_VFIO_BIT 39
+#define VM_VFIO BIT(VM_VFIO_BIT)
+#else
+#define VM_VFIO VM_NONE
+#endif
+
 /* Bits set in the VMA until the stack is in its final location */
 #define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ | VM_STACK_EARLY)
 
-------------8<----------------------------
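To illustrate what the mm.h hunk buys us, here is a userspace model (not kernel code) of the flag test. vm_flags_t, SKETCH_32BIT and the two helpers are hypothetical stand-ins; the values of the existing flags mirror include/linux/mm.h. The point is that a vma only carries VM_VFIO if vfio_pci_core_mmap() set it, and that on 32-bit, where vm_flags has no bit 39, the test folds to a constant false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the proposed VM_VFIO bit, not kernel code.
 * Defining the hypothetical SKETCH_32BIT mimics a !CONFIG_64BIT build,
 * where vm_flags is only 32 bits wide and VM_VFIO degrades to VM_NONE.
 */
#ifdef SKETCH_32BIT
typedef uint32_t vm_flags_t;
#define VM_VFIO ((vm_flags_t)0)                 /* VM_NONE */
#else
typedef uint64_t vm_flags_t;
#define VM_VFIO_BIT 39
#define VM_VFIO ((vm_flags_t)1 << VM_VFIO_BIT)
#endif

/* Existing flags, values mirroring include/linux/mm.h. */
#define VM_PFNMAP     ((vm_flags_t)0x00000400)
#define VM_IO         ((vm_flags_t)0x00004000)
#define VM_DONTEXPAND ((vm_flags_t)0x00040000)
#define VM_DONTDUMP   ((vm_flags_t)0x04000000)

/* Flags vfio_pci_core_mmap() would set with the patch applied. */
static vm_flags_t vfio_mmap_flags(void)
{
	/* The new bit must not alias any of the existing ones. */
	assert((VM_VFIO & (VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP)) == 0);
	return VM_VFIO | VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
}

/* True only for vmas created by VFIO; constant false on 32-bit. */
static bool vma_flags_are_vfio(vm_flags_t flags)
{
	return (flags & VM_VFIO) != 0;
}
```

A vma with the same VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP combination coming from some other driver's mmap() is not reported as VFIO, which is the provenance information a bare vma lacks.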

In KVM, Ankit's patch would take this flag into account rather than rely
only on "device==true".

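A rough model of the KVM side, assuming the patch maps VM_VFIO vmas as Normal-NC at stage 2 and leaves the guest's stage 1 attributes to pick the final type, while every other device pfn keeps the Device-nGnRE stage 2 mapping. stage2_mem_type() and combine() are hypothetical helpers, and the enum ordering is a simplification of the architecture's stage 1/stage 2 combining rules (the more restrictive attribute wins).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t vm_flags_t;
#define VM_VFIO ((vm_flags_t)1 << 39)   /* as in the proposed mm.h hunk */

/* Memory types ordered from most to least restrictive (simplified). */
enum mem_type {
	MT_DEVICE_nGnRnE,
	MT_DEVICE_nGnRE,
	MT_DEVICE_nGRE,
	MT_DEVICE_GRE,
	MT_NORMAL_NC,
	MT_NORMAL,
};

/* Hypothetical stage 2 policy: relax only for VFIO-created vmas. */
static enum mem_type stage2_mem_type(bool device, vm_flags_t vm_flags)
{
	if (!device)
		return MT_NORMAL;
	if (vm_flags & VM_VFIO)
		return MT_NORMAL_NC;    /* guest stage 1 picks Device or Normal-NC */
	return MT_DEVICE_nGnRE;         /* any other device pfn stays Device */
}

/* Effective type seen by the guest: the more restrictive attribute wins. */
static enum mem_type combine(enum mem_type s1, enum mem_type s2)
{
	assert(s1 <= MT_NORMAL && s2 <= MT_NORMAL);
	return s1 < s2 ? s1 : s2;
}
```

With VM_VFIO set, a guest asking for Normal-NC gets Normal-NC and a guest asking for Device-nGnRnE still gets Device-nGnRnE; without the flag the guest is capped at Device-nGnRE whatever it asks for.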
> > > > If we want the VMM to drive this entirely, we could add a new mmap()
> > > > flag like MAP_WRITECOMBINE or PROT_WRITECOMBINE. They do feel a bit
> > >
> > > As in the other thread, we cannot unconditionally map NORMAL_NC into
> > > the VMM.
> >
> > I'm not suggesting this but rather the VMM map portions of the BAR with
> > either Device or Normal-NC, concatenate them (MAP_FIXED) and pass this
> > range as a memory slot (or multiple if a slot doesn't allow multiple
> > vmas).
>
> The VMM can't know what to do. We already talked about this. The VMM
> cannot be involved in the decision to make pages NORMAL_NC or
> not. That idea ignores how actual devices work.
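For reference, a minimal userspace sketch of the mechanics of the concatenation idea quoted above: reserve one VA range, then MAP_FIXED each part in place so the VMM can hand a single contiguous range to KVM. map_concatenated() is a hypothetical helper and anonymous memory stands in for the VFIO BAR mmaps; the thread only floats MAP_WRITECOMBINE/PROT_WRITECOMBINE as hypothetical flags, so here the two parts differ only in position. It says nothing about who decides the split, which is exactly Jason's objection.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns the base of a dev_len + nc_len range, or NULL on failure. */
static void *map_concatenated(size_t dev_len, size_t nc_len)
{
	size_t pg = (size_t)sysconf(_SC_PAGESIZE);
	size_t total = dev_len + nc_len;
	char *base;

	/* MAP_FIXED replacement works at page granularity. */
	assert(dev_len && nc_len && dev_len % pg == 0 && nc_len % pg == 0);

	/* Reserve the whole range first so nothing else can land in between. */
	base = mmap(NULL, total, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* Part 1: stands in for a Device mapping of the BAR. */
	if (mmap(base, dev_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
		goto fail;

	/* Part 2: stands in for a Normal-NC mapping, right after part 1. */
	if (mmap(base + dev_len, nc_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
		goto fail;

	return base;
fail:
	munmap(base, total);
	return NULL;
}
```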
[...]
> > Are the Device/Normal offsets within a BAR fixed, documented in e.g. the
> > spec or this is something configurable via some MMIO that the guest
> > does.
>
> No, it is fully dynamic on demand with firmware RPCs.

I think that's a key argument. The VMM cannot, on its own, configure the
BAR and figure out a way to communicate this to the guest. We could invent
some para-virtualisation/trapping mechanism, but that's unnecessarily
complicated. In the DPDK case, DPDK both configures and interacts with
the device. In the VMM/VM case, the VM needs to do both; we can't split
the configuration (in the VMM) from the interaction with the device (in
the VM).

--
Catalin