* VFIO (PCI) and write combine mapping of BARs @ 2023-07-14 2:32 Benjamin Herrenschmidt 2023-07-14 7:13 ` Lorenzo Pieralisi 0 siblings, 1 reply; 6+ messages in thread From: Benjamin Herrenschmidt @ 2023-07-14 2:32 UTC (permalink / raw) To: kvm; +Cc: linux-kernel, alex.williamson, osamaabb, linux-pci, Clint Sbisa Hi Folks ! I'd like to revive an old discussion as we (Amazon Linux) have been getting asks for it. What's the best interface to provide the option of write combine mmap's of BARs via VFIO ? The problem isn't so much the low level implementation, we just have to play with the pgprot, the question is more around what API to present to control this. One trivial way would be to have an ioctl to set a flag for a given region/BAR that cause subsequent mmap's to use write-combine. We would have to keep a bitmap for the "legacy" regions, and use a flag in struct vfio_pci_region for the others. One potentially better way is to make it strictly an attribute of vfio_pci_region, along with an ioctl that creates a "subregion". The idea here is that we would have an ioctl to create a region from an existing region dynamically, which represents a subset of the original region (typically a BAR), with potentially different attributes (or we keep the attribute get/set separate). I like the latter more because it will allow to more easily define that portions of a BAR can need different attributes without causing state/race issues between setting the attribute and mmap. This will also enable other attributes than write-combine if/when the need arises. Any better idea ? thoughs ? objections ? This is still quite specific to PCI, but so is the entire regions mechanism, so I don't see an easy path to something more generic at this stage. Cheers, Ben. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: VFIO (PCI) and write combine mapping of BARs 2023-07-14 2:32 VFIO (PCI) and write combine mapping of BARs Benjamin Herrenschmidt @ 2023-07-14 7:13 ` Lorenzo Pieralisi 2023-07-14 12:37 ` Jason Gunthorpe 0 siblings, 1 reply; 6+ messages in thread From: Lorenzo Pieralisi @ 2023-07-14 7:13 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: kvm, linux-kernel, alex.williamson, osamaabb, linux-pci, Clint Sbisa, catalin.marinas, jgg, maz [+Catalin, Marc, Jason] On Fri, Jul 14, 2023 at 12:32:49PM +1000, Benjamin Herrenschmidt wrote: > Hi Folks ! > > I'd like to revive an old discussion as we (Amazon Linux) have been > getting asks for it. > > What's the best interface to provide the option of write combine mmap's > of BARs via VFIO ? There is an ongoing thread on this topic that we should use to bring this discussion to completion: https://lore.kernel.org/linux-arm-kernel/ZHcxHbCb439I1Uk2@arm.com > > The problem isn't so much the low level implementation, we just have to > play with the pgprot, the question is more around what API to present > to control this. > > One trivial way would be to have an ioctl to set a flag for a given > region/BAR that cause subsequent mmap's to use write-combine. We would > have to keep a bitmap for the "legacy" regions, and use a flag in > struct vfio_pci_region for the others. > > One potentially better way is to make it strictly an attribute of > vfio_pci_region, along with an ioctl that creates a "subregion". The > idea here is that we would have an ioctl to create a region from an > existing region dynamically, which represents a subset of the original > region (typically a BAR), with potentially different attributes (or we > keep the attribute get/set separate). > > I like the latter more because it will allow to more easily define that > portions of a BAR can need different attributes without causing > state/race issues between setting the attribute and mmap. > > This will also enable other attributes than write-combine if/when the > need arises. > > Any better idea ? thoughs ? objections ? > > This is still quite specific to PCI, but so is the entire regions > mechanism, so I don't see an easy path to something more generic at > this stage. > > Cheers, > Ben. > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: VFIO (PCI) and write combine mapping of BARs 2023-07-14 7:13 ` Lorenzo Pieralisi @ 2023-07-14 12:37 ` Jason Gunthorpe 2023-07-25 6:15 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 6+ messages in thread From: Jason Gunthorpe @ 2023-07-14 12:37 UTC (permalink / raw) To: Lorenzo Pieralisi Cc: Benjamin Herrenschmidt, kvm, linux-kernel, alex.williamson, osamaabb, linux-pci, Clint Sbisa, catalin.marinas, maz On Fri, Jul 14, 2023 at 09:13:27AM +0200, Lorenzo Pieralisi wrote: > [+Catalin, Marc, Jason] > > On Fri, Jul 14, 2023 at 12:32:49PM +1000, Benjamin Herrenschmidt wrote: > > Hi Folks ! > > > > I'd like to revive an old discussion as we (Amazon Linux) have been > > getting asks for it. > > > > What's the best interface to provide the option of write combine mmap's > > of BARs via VFIO ? > > There is an ongoing thread on this topic that we should use to > bring this discussion to completion: > > https://lore.kernel.org/linux-arm-kernel/ZHcxHbCb439I1Uk2@arm.com There are two topics here 1) Make ARM KVM allow the VM to select WC for its MMIO. This has evolved in a way that is not related to VFIO 2) Allow VFIO to create mmaps with WC for non-VM use cases like DPDK. We have a draft patch for #1, and I think a general understanding with ARM folks that this is the right direction. 2 is more like what this email talks about - providing mmaps with specific flags. Benjamin, which are you interested in? > > The problem isn't so much the low level implementation, we just have to > > play with the pgprot, the question is more around what API to present > > to control this. Assuming this is for #2, I think VFIO has fallen into a bit of a trap by allowing userspace to form the mmap offset. I've seen this happen in other subsystems too. It seems like a good idea then you realize you need more stuff in the mmap space and become sad. Typically the way out is to covert the mmap offset into a cookie where userspace issues some ioctl and then the ioctl returns an opaque mmap offset to use. eg in the vfio context you'd do some 'prepare region for mmap' ioctl where you could specify flags. The kernel would encode the flags in the cookie and then mmap would do the right thing. Adding more stuff is done by enhancing the prepare ioctl. Legacy mmap offsets are kept working. > > This is still quite specific to PCI, but so is the entire regions > > mechanism, so I don't see an easy path to something more generic at > > this stage. Regions are general, but the encoding of the mmap cookie has various PCI semantics when used with the PCI interface.. We'd want the same ability with platform devices too, for instance. Jason ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: VFIO (PCI) and write combine mapping of BARs 2023-07-14 12:37 ` Jason Gunthorpe @ 2023-07-25 6:15 ` Benjamin Herrenschmidt 2023-07-25 11:39 ` Jason Gunthorpe 0 siblings, 1 reply; 6+ messages in thread From: Benjamin Herrenschmidt @ 2023-07-25 6:15 UTC (permalink / raw) To: Jason Gunthorpe, Lorenzo Pieralisi Cc: kvm, linux-kernel, alex.williamson, osamaabb, linux-pci, Clint Sbisa, catalin.marinas, maz On Fri, 2023-07-14 at 09:37 -0300, Jason Gunthorpe wrote: > > There are two topics here > > 1) Make ARM KVM allow the VM to select WC for its MMIO. This has > evolved in a way that is not related to VFIO > > 2) Allow VFIO to create mmaps with WC for non-VM use cases like DPDK. > > We have a draft patch for #1, and I think a general understanding with > ARM folks that this is the right direction. > > 2 is more like what this email talks about - providing mmaps with > specific flags. > > Benjamin, which are you interested in? Sorry for the delay, got caught up.... The customer request we have (and what I was indeed talking about) is 2. That said, when running in a VM, 2 won't do much without 1. > > > The problem isn't so much the low level implementation, we just have to > > > play with the pgprot, the question is more around what API to present > > > to control this. > > Assuming this is for #2, I think VFIO has fallen into a bit of a trap > by allowing userspace to form the mmap offset. I've seen this happen > in other subsystems too. It seems like a good idea then you realize > you need more stuff in the mmap space and become sad. > > Typically the way out is to covert the mmap offset into a cookie where > userspace issues some ioctl and then the ioctl returns an opaque mmap > offset to use. > > eg in the vfio context you'd do some 'prepare region for mmap' ioctl > where you could specify flags. The kernel would encode the flags in > the cookie and then mmap would do the right thing. Adding more stuff > is done by enhancing the prepare ioctl. > > Legacy mmap offsets are kept working. This indeed what I have in mind. IE. VFIO has legacy regions and add-on regions though the latter is currently only exploited by some drivers that create their own add-on regions. My proposal is to add an ioctl to create them from userspace as "children" of an existing driver-provided region, allowing to set different attributes for mmap. > > > This is still quite specific to PCI, but so is the entire regions > > > mechanism, so I don't see an easy path to something more generic at > > > this stage. > > Regions are general, but the encoding of the mmap cookie has various > PCI semantics when used with the PCI interface.. > > We'd want the same ability with platform devices too, for instance. In the current VFIO the implementation is *entirely* in vfio_pci_core for PCI and entirely in vfio_platform_common.c for platform, so while the same ioctls could be imagined to create sub-regions, it would have to be completely implemented twice unless we do a lot of heavy lifting to move some of that region stuff into common code. But yes, appart from that, no objection :-) Cheers, Ben. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: VFIO (PCI) and write combine mapping of BARs 2023-07-25 6:15 ` Benjamin Herrenschmidt @ 2023-07-25 11:39 ` Jason Gunthorpe 2023-07-26 1:19 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 6+ messages in thread From: Jason Gunthorpe @ 2023-07-25 11:39 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Lorenzo Pieralisi, kvm, linux-kernel, alex.williamson, osamaabb, linux-pci, Clint Sbisa, catalin.marinas, maz On Tue, Jul 25, 2023 at 04:15:39PM +1000, Benjamin Herrenschmidt wrote: > > Assuming this is for #2, I think VFIO has fallen into a bit of a trap > > by allowing userspace to form the mmap offset. I've seen this happen > > in other subsystems too. It seems like a good idea then you realize > > you need more stuff in the mmap space and become sad. > > > > Typically the way out is to covert the mmap offset into a cookie where > > userspace issues some ioctl and then the ioctl returns an opaque mmap > > offset to use. > > > > eg in the vfio context you'd do some 'prepare region for mmap' ioctl > > where you could specify flags. The kernel would encode the flags in > > the cookie and then mmap would do the right thing. Adding more stuff > > is done by enhancing the prepare ioctl. > > > > Legacy mmap offsets are kept working. > > This indeed what I have in mind. IE. VFIO has legacy regions and add-on > regions though the latter is currently only exploited by some drivers > that create their own add-on regions. My proposal is to add an ioctl to > create them from userspace as "children" of an existing driver-provided > region, allowing to set different attributes for mmap. I wouldn't call it children, you are just getting a different mmap cookie for the same region object. > In the current VFIO the implementation is *entirely* in vfio_pci_core > for PCI and entirely in vfio_platform_common.c for platform, so while > the same ioctls could be imagined to create sub-regions, it would have > to be completely implemented twice unless we do a lot of heavy lifting > to move some of that region stuff into common code. The machinery for managing the mmap cookies should be in common code Jason ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: VFIO (PCI) and write combine mapping of BARs 2023-07-25 11:39 ` Jason Gunthorpe @ 2023-07-26 1:19 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 6+ messages in thread From: Benjamin Herrenschmidt @ 2023-07-26 1:19 UTC (permalink / raw) To: Jason Gunthorpe Cc: Lorenzo Pieralisi, kvm, linux-kernel, alex.williamson, osamaabb, linux-pci, Clint Sbisa, catalin.marinas, maz On Tue, 2023-07-25 at 08:39 -0300, Jason Gunthorpe wrote: > On Tue, Jul 25, 2023 at 04:15:39PM +1000, Benjamin Herrenschmidt wrote: > > > Assuming this is for #2, I think VFIO has fallen into a bit of a trap > > > by allowing userspace to form the mmap offset. I've seen this happen > > > in other subsystems too. It seems like a good idea then you realize > > > you need more stuff in the mmap space and become sad. > > > > > > Typically the way out is to covert the mmap offset into a cookie where > > > userspace issues some ioctl and then the ioctl returns an opaque mmap > > > offset to use. > > > > > > eg in the vfio context you'd do some 'prepare region for mmap' ioctl > > > where you could specify flags. The kernel would encode the flags in > > > the cookie and then mmap would do the right thing. Adding more stuff > > > is done by enhancing the prepare ioctl. > > > > > > Legacy mmap offsets are kept working. > > > > This indeed what I have in mind. IE. VFIO has legacy regions and add-on > > regions though the latter is currently only exploited by some drivers > > that create their own add-on regions. My proposal is to add an ioctl to > > create them from userspace as "children" of an existing driver-provided > > region, allowing to set different attributes for mmap. > > I wouldn't call it children, you are just getting a different mmap > cookie for the same region object. I though they could be subsets but that might be overkill. > > In the current VFIO the implementation is *entirely* in vfio_pci_core > > for PCI and entirely in vfio_platform_common.c for platform, so while > > the same ioctls could be imagined to create sub-regions, it would have > > to be completely implemented twice unless we do a lot of heavy lifting > > to move some of that region stuff into common code. > > The machinery for managing the mmap cookies should be in common code Ok. I'll whip up a POC within vfio_pci only intially to test the concept and to agree on the API, then look at how we can clean all that up. Cheers, Ben. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-07-26 1:20 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-07-14 2:32 VFIO (PCI) and write combine mapping of BARs Benjamin Herrenschmidt 2023-07-14 7:13 ` Lorenzo Pieralisi 2023-07-14 12:37 ` Jason Gunthorpe 2023-07-25 6:15 ` Benjamin Herrenschmidt 2023-07-25 11:39 ` Jason Gunthorpe 2023-07-26 1:19 ` Benjamin Herrenschmidt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).