From: Oliver Upton <oliver.upton@linux.dev>
To: Sean Christopherson <seanjc@google.com>
Cc: Marc Zyngier <maz@kernel.org>, Ankit Agrawal <ankita@nvidia.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Jason Gunthorpe <jgg@nvidia.com>,
"joey.gouly@arm.com" <joey.gouly@arm.com>,
"suzuki.poulose@arm.com" <suzuki.poulose@arm.com>,
"yuzenghui@huawei.com" <yuzenghui@huawei.com>,
"will@kernel.org" <will@kernel.org>,
"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
"shahuang@redhat.com" <shahuang@redhat.com>,
"lpieralisi@kernel.org" <lpieralisi@kernel.org>,
"david@redhat.com" <david@redhat.com>,
Aniket Agashe <aniketa@nvidia.com>, Neo Jia <cjia@nvidia.com>,
Kirti Wankhede <kwankhede@nvidia.com>,
"Tarun Gupta (SW-GPU)" <targupta@nvidia.com>,
Vikram Sethi <vsethi@nvidia.com>,
Andy Currid <acurrid@nvidia.com>,
Alistair Popple <apopple@nvidia.com>,
John Hubbard <jhubbard@nvidia.com>,
Dan Williams <danw@nvidia.com>, Zhi Wang <zhiw@nvidia.com>,
Matt Ochs <mochs@nvidia.com>, Uday Dhoke <udhoke@nvidia.com>,
Dheeraj Nigam <dnigam@nvidia.com>,
Krishnakant Jaju <kjaju@nvidia.com>,
"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
"sebastianene@google.com" <sebastianene@google.com>,
"coltonlewis@google.com" <coltonlewis@google.com>,
"kevin.tian@intel.com" <kevin.tian@intel.com>,
"yi.l.liu@intel.com" <yi.l.liu@intel.com>,
"ardb@kernel.org" <ardb@kernel.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"gshan@redhat.com" <gshan@redhat.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"ddutile@redhat.com" <ddutile@redhat.com>,
"tabba@google.com" <tabba@google.com>,
"qperret@google.com" <qperret@google.com>,
"kvmarm@lists.linux.dev" <kvmarm@lists.linux.dev>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v3 1/1] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
Date: Wed, 26 Mar 2025 11:51:57 -0700
Message-ID: <Z-RMzYHOzc36H7yR@linux.dev>
In-Reply-To: <Z-RGYO3QVj5JNjRB@google.com>
On Wed, Mar 26, 2025 at 11:24:32AM -0700, Sean Christopherson wrote:
> > > But I thought the whole problem is that mapping this fancy memory as device is
> > > unsafe on non-FWB hosts? If it's safe, then why does KVM need to reject anything
> > > in the first place?
> >
> > I don't know where you got that idea. This is all about what memory
> > type is exposed to a guest:
> >
> > - with FWB, no need for CMOs, so cacheable memory is allowed if the
> > device supports it (i.e. it actually exposes memory), and device
> > otherwise.
> >
> > - without FWB, CMOs are required, and we don't have a host mapping for
> > these pages. As a fallback, the mapping is device only, as this
> > doesn't require any CMO by definition.
> >
> > There is no notion of "safety" here.
>
> Ah, the safety I'm talking about is the CMO requirement. IIUC, not doing CMOs
> if the memory is cacheable could result in data corruption, i.e. would be a safety
> issue for the host. But I missed that you were proposing that the !FWB behavior
> would be to force device mappings.
To Jason's earlier point, you wind up with a security issue the other
way around.

Suppose the host is using a cacheable mapping to, say, zero the $THING
at the other end of the mapping. Without a way to CMO the $THING we
cannot make the zeroing visible to a guest with a stage-2 Device-* mapping.
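
To make the hazard concrete, here is a deliberately simplified toy model
in C. It is a sketch only: the struct and function names are hypothetical,
a single byte stands in for the $THING, and real hardware may write a dirty
line back at any time, so the stale read below is possible rather than
guaranteed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one byte of $THING plus the host cache line covering it. */
struct toy_line {
	uint8_t dram;    /* what an uncached (Device-*) access observes */
	uint8_t cached;  /* value held in the host's cache */
	bool dirty;      /* cache holds data not yet written back */
};

/* Host store through a cacheable mapping: lands in the cache only. */
static void host_write_cacheable(struct toy_line *l, uint8_t val)
{
	assert(l);
	l->cached = val;
	l->dirty = true;
}

/* Guest load through a stage-2 Device-* mapping: bypasses the cache. */
static uint8_t guest_read_device(const struct toy_line *l)
{
	assert(l);
	return l->dram;
}

/* Stand-in for a clean CMO: write the dirty line back to DRAM. */
static void cmo_clean(struct toy_line *l)
{
	assert(l);
	if (l->dirty) {
		l->dram = l->cached;
		l->dirty = false;
	}
}
```

In this model, a guest read issued after host_write_cacheable() but before
cmo_clean() still returns the old contents, which is the leak being
described.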

Marc, I understand that your proposed fallback is aligned to what we
do today, but I'm actually unconvinced that it provides any reliable/correct
behavior. We should then wind up with stage-2 memory attribute rules
like so:

 1) If struct page memory, use a cacheable mapping. CMO for non-FWB.

 2) If cacheable PFNMAP:

    a) With FWB, use a cacheable mapping

    b) Without FWB, fail.

 3) If VM_ALLOW_ANY_UNCACHED, use Normal Non-Cacheable mapping

 4) Otherwise, Device-nGnRE

I understand 2b breaks ABI, but the 'typical' VFIO usages fall into (3)
and (4).
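
For illustration only, the ordering of the rules above could be sketched
in C roughly as follows. This is not KVM's code: the enum, the struct, its
fields and s2_pick_memtype() are all hypothetical names, and the inputs
are assumed to have been derived from the VMA and the CPU features
elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the stage-2 attribute decision. */
enum s2_memtype {
	S2_NORMAL_WB,     /* cacheable mapping */
	S2_NORMAL_NC,     /* Normal Non-Cacheable */
	S2_DEVICE_nGnRE,  /* Device-nGnRE */
	S2_REJECT,        /* fail instead of silently falling back */
};

/* Hypothetical summary of what backs the faulting address. */
struct s2_fault_info {
	bool has_struct_page;     /* rule 1: struct page memory */
	bool cacheable_pfnmap;    /* rule 2: PFNMAP VMA with cacheable attrs */
	bool allow_any_uncached;  /* rule 3: VM_ALLOW_ANY_UNCACHED is set */
};

/*
 * Apply rules 1-4 in order. *needs_cmo is set when the caller must do
 * cache maintenance before exposing the memory to the guest.
 */
static enum s2_memtype s2_pick_memtype(const struct s2_fault_info *f,
				       bool has_fwb, bool *needs_cmo)
{
	assert(f && needs_cmo);
	*needs_cmo = false;

	if (f->has_struct_page) {
		*needs_cmo = !has_fwb;          /* 1) CMO for non-FWB */
		return S2_NORMAL_WB;
	}
	if (f->cacheable_pfnmap)                /* 2a) / 2b) */
		return has_fwb ? S2_NORMAL_WB : S2_REJECT;
	if (f->allow_any_uncached)              /* 3) */
		return S2_NORMAL_NC;
	return S2_DEVICE_nGnRE;                 /* 4) */
}
```

The point of returning S2_REJECT for 2b, rather than falling through to
(4), is that the failure is visible to userspace instead of the mapping
being quietly downgraded to Device.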
> > > > Importantly, it is *userspace* that is in charge of deciding how the
> > > > device is mapped at S2. And the memslot flag is the correct
> > > > abstraction for that.
> > >
> > > I strongly disagree. Whatever owns the underlying physical memory is in charge,
> > > not userspace. For memory that's backed by a VMA, userspace can influence the
> > > behavior through mmap(), mprotect(), etc., but ultimately KVM needs to pull state
> > > from mm/, via the VMA. Or in the guest_memfd case, from guest_memfd.
> >
> > I don't buy that. Userspace needs to know the semantics of the memory
> > it gives to the guest. Or at least discover that the same device
> > plugged into different hosts will have different behaviours. Just
> > letting things rip is not an acceptable outcome.
>
> Agreed, but that doesn't require a memslot flag. A capability to enumerate that
> KVM can do cacheable mappings for PFNMAP memory would suffice. And if we want to
> have KVM reject memslots that are cacheable in the VMA, but would get device in
> stage-2, then we can provide that functionality through the capability, i.e. let
> userspace decide if it wants "fallback to device" vs. "error on creation" on a
> per-VM basis.
>
> What I object to is adding a memslot flag.
A capability that says "I can force cacheable things to be cacheable" is
useful beyond even PFNMAP stuff. A pedantic but correct live migration /
snapshotting implementation on non-FWB would need to do CMOs in case the
VM used a non-WB mapping for memory.
Thanks,
Oliver