From: Jason Gunthorpe <jgg@nvidia.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>,
ankita@nvidia.com,
Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
oliver.upton@linux.dev, suzuki.poulose@arm.com,
yuzenghui@huawei.com, will@kernel.org, ardb@kernel.org,
akpm@linux-foundation.org, gshan@redhat.com, aniketa@nvidia.com,
cjia@nvidia.com, kwankhede@nvidia.com, targupta@nvidia.com,
vsethi@nvidia.com, acurrid@nvidia.com, apopple@nvidia.com,
jhubbard@nvidia.com, danw@nvidia.com, mochs@nvidia.com,
kvmarm@lists.linux.dev, kvm@vger.kernel.org,
lpieralisi@kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2 1/1] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory
Date: Tue, 5 Dec 2023 12:43:18 -0400 [thread overview]
Message-ID: <20231205164318.GG2692119@nvidia.com> (raw)
In-Reply-To: <ZW9OSe8Z9gAmM7My@arm.com>
On Tue, Dec 05, 2023 at 04:22:33PM +0000, Catalin Marinas wrote:
> Yeah, I made this argument in the past. But it's a fair question to ask
> since the Arm world is different from x86. Just reusing an existing
> driver in a different context may break its expectations. Does Normal NC
> access complete by the time a TLBI (for Stage 2) and DSB (DVMsync) is
> completed? It does reach some point of serialisation with subsequent
> accesses to the same address but not sure how it is ordered with an
> access to a different location like the config space used for reset.
> Maybe it's not a problem at all or it is safe only for PCIe but it would
> be good to get to the bottom of this.
IMHO, the answer is you can't know architecturally. The specific
vfio-platform driver must do an analysis of it's specific SOC and
determine what exactly is required to order the reset. The primary
purpose of the vfio-platform drivers is to provide this reset!
In most cases I would expect some reads from the device to be required
before the reset.
> > Remember, the feedback we got from the CPU architects was that even
> > DEVICE_* will experience an uncontained failure if the device tiggers
> > an error response in shipping ARM IP.
> >
> > The reason PCIe is safe is because the PCI bridge does not generate
> > errors in the first place!
>
> That's an argument to restrict this feature to PCIe. It's really about
> fewer arguments on the behaviour of other devices. Marc did raise
> another issue with the GIC VCPU interface (does this even have a vma in
> the host VMM?). That's a class of devices where the mapping is
> context-switched, so the TLBI+DSB rules don't help.
I don't know anything about the GIC VCPU interface, to give any
comment unfortunately. Since it seems there is something to fix here I
would appreciate some background..
When you say it is context switched do you mean kvm does a register
write on every vm entry to set the proper HW context for the vCPU?
We are worrying that register write will possibly not order after
NORMAL_NC?
> > Thus, the way a platform device can actually be safe is if it too
> > never generates errors in the first place! Obviously this approach
> > works just as well with NORMAL_NC.
> >
> > If a platform device does generate errors then we shouldn't expect
> > containment at all, and the memory type has no bearing on the
> > safety. The correct answer is to block these platform devices from
> > VFIO/KVM/etc because they can trigger uncontained failures.
>
> Assuming the error containment is sorted, there are two other issues
> with other types of devices:
>
> 1. Ordering guarantees on reclaim or context switch
Solved in VFIO
> 2. Unaligned accesses
>
> On (2), I think PCIe is fairly clear on how the TLPs are generated, so I
> wouldn't expect additional errors here. But I have no idea what AMBA/AXI
> does here in general. Perhaps it's fine, I don't think we looked into it
> as the focus was mostly on PCIe.
I would expect AXI devices to throw errors in all sorts of odd
cases. eg I would not be surprised at all to see carelessly built AXI
devices error if ST64B is pointed at them. At least when I was
building AXI logic years ago I was so lazy :P
This is mostly my point - if the devices under vfio-platform were not
designed to have contained errors then, IMHO, it is hard to believe
that DEVICE_X/NORMAL_NC is the only issue in that HW.
> So, I think it would be easier to get this patch upstream if we limit
> the change to PCIe devices for now. We may relax this further in the
> future. Do you actually have a need for non-PCIe devices to support WC
> in the guest or it's more about the complexity of the logic to detect
> whether it's actually a PCIe BAR we are mapping into the guest? (I can
> see some Arm GPU folk asking for this but those devices are not easily
> virtualisable).
The complexity is my concern, and the disruption to the ecosystem with
some of the ideas given.
If there was a trivial way to convey in the VMA that it is safe then
sure, no objection from me.
My worry is this has turned from fixing a real problem we have today
into a debate about theoretical issues that nobody may care about that
are very disruptive to solve.
I would turn it around and ask we find a way to restrict platform
devices when someone comes with a platform device that wants to use
secure kvm and has a single well defined HW problem that is solved by
this work.
What if we change vfio-pci to use pgprot_device() like it already
really should and say the pgprot_noncached() is enforced as
DEVICE_nGnRnE and pgprot_device() may be DEVICE_nGnRE or NORMAL_NC?
Would that be acceptable?
Jason
WARNING: multiple messages have this Message-ID (diff)
From: Jason Gunthorpe <jgg@nvidia.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>,
ankita@nvidia.com,
Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
oliver.upton@linux.dev, suzuki.poulose@arm.com,
yuzenghui@huawei.com, will@kernel.org, ardb@kernel.org,
akpm@linux-foundation.org, gshan@redhat.com, aniketa@nvidia.com,
cjia@nvidia.com, kwankhede@nvidia.com, targupta@nvidia.com,
vsethi@nvidia.com, acurrid@nvidia.com, apopple@nvidia.com,
jhubbard@nvidia.com, danw@nvidia.com, mochs@nvidia.com,
kvmarm@lists.linux.dev, kvm@vger.kernel.org,
lpieralisi@kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2 1/1] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory
Date: Tue, 5 Dec 2023 12:43:18 -0400 [thread overview]
Message-ID: <20231205164318.GG2692119@nvidia.com> (raw)
In-Reply-To: <ZW9OSe8Z9gAmM7My@arm.com>
On Tue, Dec 05, 2023 at 04:22:33PM +0000, Catalin Marinas wrote:
> Yeah, I made this argument in the past. But it's a fair question to ask
> since the Arm world is different from x86. Just reusing an existing
> driver in a different context may break its expectations. Does Normal NC
> access complete by the time a TLBI (for Stage 2) and DSB (DVMsync) is
> completed? It does reach some point of serialisation with subsequent
> accesses to the same address but not sure how it is ordered with an
> access to a different location like the config space used for reset.
> Maybe it's not a problem at all or it is safe only for PCIe but it would
> be good to get to the bottom of this.
IMHO, the answer is you can't know architecturally. The specific
vfio-platform driver must do an analysis of it's specific SOC and
determine what exactly is required to order the reset. The primary
purpose of the vfio-platform drivers is to provide this reset!
In most cases I would expect some reads from the device to be required
before the reset.
> > Remember, the feedback we got from the CPU architects was that even
> > DEVICE_* will experience an uncontained failure if the device tiggers
> > an error response in shipping ARM IP.
> >
> > The reason PCIe is safe is because the PCI bridge does not generate
> > errors in the first place!
>
> That's an argument to restrict this feature to PCIe. It's really about
> fewer arguments on the behaviour of other devices. Marc did raise
> another issue with the GIC VCPU interface (does this even have a vma in
> the host VMM?). That's a class of devices where the mapping is
> context-switched, so the TLBI+DSB rules don't help.
I don't know anything about the GIC VCPU interface, to give any
comment unfortunately. Since it seems there is something to fix here I
would appreciate some background..
When you say it is context switched do you mean kvm does a register
write on every vm entry to set the proper HW context for the vCPU?
We are worrying that register write will possibly not order after
NORMAL_NC?
> > Thus, the way a platform device can actually be safe is if it too
> > never generates errors in the first place! Obviously this approach
> > works just as well with NORMAL_NC.
> >
> > If a platform device does generate errors then we shouldn't expect
> > containment at all, and the memory type has no bearing on the
> > safety. The correct answer is to block these platform devices from
> > VFIO/KVM/etc because they can trigger uncontained failures.
>
> Assuming the error containment is sorted, there are two other issues
> with other types of devices:
>
> 1. Ordering guarantees on reclaim or context switch
Solved in VFIO
> 2. Unaligned accesses
>
> On (2), I think PCIe is fairly clear on how the TLPs are generated, so I
> wouldn't expect additional errors here. But I have no idea what AMBA/AXI
> does here in general. Perhaps it's fine, I don't think we looked into it
> as the focus was mostly on PCIe.
I would expect AXI devices to throw errors in all sorts of odd
cases. eg I would not be surprised at all to see carelessly built AXI
devices error if ST64B is pointed at them. At least when I was
building AXI logic years ago I was so lazy :P
This is mostly my point - if the devices under vfio-platform were not
designed to have contained errors then, IMHO, it is hard to believe
that DEVICE_X/NORMAL_NC is the only issue in that HW.
> So, I think it would be easier to get this patch upstream if we limit
> the change to PCIe devices for now. We may relax this further in the
> future. Do you actually have a need for non-PCIe devices to support WC
> in the guest or it's more about the complexity of the logic to detect
> whether it's actually a PCIe BAR we are mapping into the guest? (I can
> see some Arm GPU folk asking for this but those devices are not easily
> virtualisable).
The complexity is my concern, and the disruption to the ecosystem with
some of the ideas given.
If there was a trivial way to convey in the VMA that it is safe then
sure, no objection from me.
My worry is this has turned from fixing a real problem we have today
into a debate about theoretical issues that nobody may care about that
are very disruptive to solve.
I would turn it around and ask we find a way to restrict platform
devices when someone comes with a platform device that wants to use
secure kvm and has a single well defined HW problem that is solved by
this work.
What if we change vfio-pci to use pgprot_device() like it already
really should and say the pgprot_noncached() is enforced as
DEVICE_nGnRnE and pgprot_device() may be DEVICE_nGnRE or NORMAL_NC?
Would that be acceptable?
Jason
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2023-12-05 16:43 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-05 3:30 [PATCH v2 1/1] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory ankita
2023-12-05 3:30 ` ankita
2023-12-05 9:21 ` Marc Zyngier
2023-12-05 9:21 ` Marc Zyngier
2023-12-05 11:40 ` Catalin Marinas
2023-12-05 11:40 ` Catalin Marinas
2023-12-05 13:05 ` Jason Gunthorpe
2023-12-05 13:05 ` Jason Gunthorpe
2023-12-05 14:37 ` Lorenzo Pieralisi
2023-12-05 14:37 ` Lorenzo Pieralisi
2023-12-05 14:44 ` Jason Gunthorpe
2023-12-05 14:44 ` Jason Gunthorpe
2023-12-05 16:24 ` Catalin Marinas
2023-12-05 16:24 ` Catalin Marinas
2023-12-05 17:10 ` Jason Gunthorpe
2023-12-05 17:10 ` Jason Gunthorpe
2023-12-05 16:22 ` Catalin Marinas
2023-12-05 16:22 ` Catalin Marinas
2023-12-05 16:43 ` Jason Gunthorpe [this message]
2023-12-05 16:43 ` Jason Gunthorpe
2023-12-05 17:01 ` Marc Zyngier
2023-12-05 17:01 ` Marc Zyngier
2023-12-05 17:33 ` Catalin Marinas
2023-12-05 17:33 ` Catalin Marinas
2023-12-05 17:50 ` Marc Zyngier
2023-12-05 17:50 ` Marc Zyngier
2023-12-05 18:40 ` Catalin Marinas
2023-12-05 18:40 ` Catalin Marinas
2023-12-06 11:39 ` Marc Zyngier
2023-12-06 11:39 ` Marc Zyngier
2023-12-06 12:14 ` Catalin Marinas
2023-12-06 12:14 ` Catalin Marinas
2023-12-06 15:16 ` Jason Gunthorpe
2023-12-06 15:16 ` Jason Gunthorpe
2023-12-06 16:31 ` Catalin Marinas
2023-12-06 16:31 ` Catalin Marinas
2023-12-06 17:20 ` Jason Gunthorpe
2023-12-06 17:20 ` Jason Gunthorpe
2023-12-06 18:58 ` Catalin Marinas
2023-12-06 18:58 ` Catalin Marinas
2023-12-06 19:03 ` Jason Gunthorpe
2023-12-06 19:03 ` Jason Gunthorpe
2023-12-06 19:06 ` Catalin Marinas
2023-12-06 19:06 ` Catalin Marinas
2023-12-07 2:53 ` Ankit Agrawal
2023-12-07 2:53 ` Ankit Agrawal
2023-12-06 11:52 ` Lorenzo Pieralisi
2023-12-06 11:52 ` Lorenzo Pieralisi
2023-12-05 19:24 ` Catalin Marinas
2023-12-05 19:24 ` Catalin Marinas
2023-12-05 19:48 ` Jason Gunthorpe
2023-12-05 19:48 ` Jason Gunthorpe
2023-12-06 14:49 ` Catalin Marinas
2023-12-06 14:49 ` Catalin Marinas
2023-12-06 15:05 ` Jason Gunthorpe
2023-12-06 15:05 ` Jason Gunthorpe
2023-12-06 15:18 ` Lorenzo Pieralisi
2023-12-06 15:18 ` Lorenzo Pieralisi
2023-12-06 15:38 ` Jason Gunthorpe
2023-12-06 15:38 ` Jason Gunthorpe
2023-12-06 16:23 ` Catalin Marinas
2023-12-06 16:23 ` Catalin Marinas
2023-12-06 16:48 ` Jason Gunthorpe
2023-12-06 16:48 ` Jason Gunthorpe
2023-12-07 10:13 ` Lorenzo Pieralisi
2023-12-07 10:13 ` Lorenzo Pieralisi
2023-12-07 13:38 ` Jason Gunthorpe
2023-12-07 13:38 ` Jason Gunthorpe
2023-12-07 14:50 ` Lorenzo Pieralisi
2023-12-07 14:50 ` Lorenzo Pieralisi
2023-12-05 13:28 ` Lorenzo Pieralisi
2023-12-05 13:28 ` Lorenzo Pieralisi
2023-12-05 14:16 ` Shameerali Kolothum Thodi
2023-12-05 14:16 ` Shameerali Kolothum Thodi
2023-12-06 8:17 ` Shameerali Kolothum Thodi
2023-12-06 8:17 ` Shameerali Kolothum Thodi
2023-12-05 11:48 ` Catalin Marinas
2023-12-05 11:48 ` Catalin Marinas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231205164318.GG2692119@nvidia.com \
--to=jgg@nvidia.com \
--cc=acurrid@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=aniketa@nvidia.com \
--cc=ankita@nvidia.com \
--cc=apopple@nvidia.com \
--cc=ardb@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=cjia@nvidia.com \
--cc=danw@nvidia.com \
--cc=gshan@redhat.com \
--cc=jhubbard@nvidia.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=kwankhede@nvidia.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lpieralisi@kernel.org \
--cc=maz@kernel.org \
--cc=mochs@nvidia.com \
--cc=oliver.upton@linux.dev \
--cc=shameerali.kolothum.thodi@huawei.com \
--cc=suzuki.poulose@arm.com \
--cc=targupta@nvidia.com \
--cc=vsethi@nvidia.com \
--cc=will@kernel.org \
--cc=yuzenghui@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.