From: Oliver Upton <oliver.upton@linux.dev>
To: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
Catalin Marinas <catalin.marinas@arm.com>,
ankita@nvidia.com, maz@kernel.org, suzuki.poulose@arm.com,
yuzenghui@huawei.com, will@kernel.org,
alex.williamson@redhat.com, kevin.tian@intel.com,
yi.l.liu@intel.com, ardb@kernel.org, akpm@linux-foundation.org,
gshan@redhat.com, linux-mm@kvack.org, aniketa@nvidia.com,
cjia@nvidia.com, kwankhede@nvidia.com, targupta@nvidia.com,
vsethi@nvidia.com, acurrid@nvidia.com, apopple@nvidia.com,
jhubbard@nvidia.com, danw@nvidia.com, mochs@nvidia.com,
kvmarm@lists.linux.dev, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, james.morse@arm.com
Subject: Re: [PATCH v3 2/2] kvm: arm64: set io memory s2 pte as normalnc for vfio pci devices
Date: Thu, 14 Dec 2023 16:56:01 +0000
Message-ID: <ZXszoQ48pZ7FnQNV@linux.dev>
In-Reply-To: <ZXsjv+svp44YjMmh@lpieralisi>

On Thu, Dec 14, 2023 at 04:48:15PM +0100, Lorenzo Pieralisi wrote:
[...]
> > AFAICT, the only reason PCI devices can get the blanket treatment of
> > Normal-NC at stage-2 is because userspace has a Device-* mapping and can't
> > speculatively load from the alias. This feels a bit hacky, and maybe we
> > should prioritize an interface for mapping a device into a VM w/o a
> > valid userspace mapping.
>
> FWIW - I have tried to summarize the reasoning behind PCIe devices
> Normal-NC default stage-2 safety in a document that I have just realized
> now it has become this series cover letter, I don't think the PCI blanket
> treatment is related *only* to the current user space mappings (ie
> BTW, AFAICS it is also *possible* at present to map a prefetchable BAR through
> sysfs with Normal-NC memory attributes in the host at the same time a PCI
> device is passed-through to a guest with VFIO - and therefore we have a
> dev-nGnRnE stage-1 mapping for it. Don't think anyone does that - what for -
> but it is possible and KVM would not know about it).
>
> Again, FWIW, we were told (source Arm ARM) mismatched aliases concerning
> device-XXX vs Normal-NC are not problematic as long as the transactions
> issued for the related mappings are independent (and none of the
> mappings is cacheable).
>
> I appreciate this is not enough to give everyone full confidence on
> this solution robustness - that's why I wrote that up so that we know
> what we are up against and write KVM interfaces accordingly.

Apologies, I didn't mean to question what's going on here from the
hardware POV. My concern was more from the kernel + user interfaces POV:
this all seems to work (specifically for PCI) by maintaining an
intentional mismatch between the VFIO stage-1 and KVM stage-2 mappings.

If we add more behind-the-scenes tricks to get other MMIO mappings
working in the future then this whole interaction will get even
hairier. At least if we follow the stage-1 attributes (where possible)
then we can document some sort of expected behavior in KVM. The VMM would
need to know if the device has read side-effects, as the only way to get a
Normal-NC mapping in the guest would be to have one at stage-1.

Kinda stinks to make the VMM aware of the device, but IMO it is a
fundamental limitation of the way we back memslots right now.
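
To make the "follow the stage-1 attributes" idea concrete, here is a
minimal sketch in plain C. It is not KVM code: the enums and the helper
s2_attr_follow_s1() are hypothetical names made up for illustration. It
only shows the policy described above, where the guest gets Normal-NC at
stage-2 only if the VMM's stage-1 mapping is already Normal-NC, and
anything else stays Device.

```c
/* Hypothetical attribute names for illustration only; not KVM definitions. */
enum s1_attr {
	S1_DEVICE_nGnRnE,	/* strongest Device type at stage-1 */
	S1_DEVICE_nGnRE,
	S1_NORMAL_NC,		/* Normal non-cacheable at stage-1 */
};

enum s2_attr {
	S2_DEVICE,
	S2_NORMAL_NC,
};

/*
 * Pick the stage-2 attribute for an MMIO mapping by following the
 * attribute of the VMM's stage-1 mapping. Only a Normal-NC stage-1
 * mapping yields Normal-NC at stage-2; every other input falls back
 * to Device, the conservative choice.
 */
static enum s2_attr s2_attr_follow_s1(enum s1_attr s1)
{
	switch (s1) {
	case S1_NORMAL_NC:
		return S2_NORMAL_NC;
	default:
		return S2_DEVICE;
	}
}
```

Under this policy the VMM, which knows whether the device has read
side-effects, decides the attribute by choosing its own stage-1 mapping
type, so there is no hidden mismatch between the two stages.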
--
Thanks,
Oliver