From: Nikita Kalyazin <kalyazin@amazon.com>
To: Sean Christopherson <seanjc@google.com>,
Nikita Kalyazin <kalyazin@amazon.co.uk>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
"shuah@kernel.org" <shuah@kernel.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-kselftest@vger.kernel.org"
<linux-kselftest@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"david@redhat.com" <david@redhat.com>,
"jthoughton@google.com" <jthoughton@google.com>,
"patrick.roy@linux.dev" <patrick.roy@linux.dev>,
Jack Thomson <jackabt@amazon.co.uk>,
Derek Manwaring <derekmn@amazon.com>,
Marco Cali <xmarcalx@amazon.co.uk>, <ackerleytng@google.com>,
"Vishal Annapurve" <vannapurve@google.com>
Subject: Re: [PATCH v6 1/2] KVM: guest_memfd: add generic population via write
Date: Fri, 24 Oct 2025 15:35:34 +0100 [thread overview]
Message-ID: <8a28ddea-35c0-490e-a7d2-7fb612fdd008@amazon.com> (raw)
In-Reply-To: <aPpS2aqdobVTk_ed@google.com>
On 23/10/2025 17:07, Sean Christopherson wrote:
> On Mon, Oct 20, 2025, Nikita Kalyazin wrote:
>> From: Nikita Kalyazin <kalyazin@amazon.com>
+ Vishal and Ackerley
>>
>> write syscall populates guest_memfd with user-supplied data in a generic
>> way, ie no vendor-specific preparation is performed. If the request is
>> not page-aligned, the remaining bytes are initialised to 0.
>>
>> write is only supported for non-CoCo setups where guest memory is not
>> hardware-encrypted.
>
> Please include all of the "why". The code mostly communicates the "what", but
> it doesn't capture why write() support is at all interesting, nor does it explain
> why read() isn't supported.
Hi Sean,
Thanks for the review.
Do you think including the explanation from the cover letter would be
sufficient? Shall I additionally say that read() isn't supported
because there is no use case for it as of now or would it be obvious?
>
>> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
>> ---
>> virt/kvm/guest_memfd.c | 48 ++++++++++++++++++++++++++++++++++++++++++
>
> There's a notable lack of uAPI and Documentation chanegs. I.e. this needs a
> GUEST_MEMFD_FLAG_xxx along with proper documentation.
Would the following be ok in the doc?
When the capability KVM_CAP_GUEST_MEMFD_WRITE is supported, the 'flags'
field
supports GUEST_MEMFD_FLAG_WRITE. Setting this flag on guest_memfd creation
enables write() syscall operations to populate guest_memfd memory from host
userspace.
When a write() operation is performed on a guest_memfd file descriptor
with the
GUEST_MEMFD_FLAG_WRITE set, the syscall will populate the guest memory with
user-supplied data in a generic way, without any vendor-specific
preparation.
The write operation is only supported for non-CoCo (Confidential Computing)
setups where guest memory is not hardware-encrypted. If the write request is
not page-aligned, any remaining bytes within the page are initialized to
zero.
>
> And while it's definitely it's a-ok to land .write() in advance of the direct map
> changes, we do need to at least map out how we want the two to interact, e.g. so
> that we don't end up with constraints that are impossible to satisfy.
>
write() shall not attempt to access a page that is not in the direct
map, which I believe can be achieved via kvm_kmem_gmem_write_begin()
consulting the KVM_GMEM_FOLIO_NO_DIRECT_MAP in folio->private
(introduced in [1]).
Do you think we should mention it in the commit message in some way?
What particular constraint are you cautious about?
[1] https://lore.kernel.org/kvm/20250924152214.7292-2-roypat@amazon.co.uk/
>> 1 file changed, 48 insertions(+)
>>
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index 94bafd6c558c..f4e218049afa 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -380,6 +380,8 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>>
>> static struct file_operations kvm_gmem_fops = {
>> .mmap = kvm_gmem_mmap,
>> + .llseek = default_llseek,
>> + .write_iter = generic_perform_write,
>> .open = generic_file_open,
>> .release = kvm_gmem_release,
>> .fallocate = kvm_gmem_fallocate,
>> @@ -390,6 +392,49 @@ void kvm_gmem_init(struct module *module)
>> kvm_gmem_fops.owner = module;
>> }
>>
>> +static int kvm_kmem_gmem_write_begin(const struct kiocb *kiocb,
>> + struct address_space *mapping,
>> + loff_t pos, unsigned int len,
>> + struct folio **foliop,
>> + void **fsdata)
>
> Over-aggressive wrapping, this can be
>
>
> static int kvm_kmem_gmem_write_begin(const struct kiocb *kiocb,
> struct address_space *mapping, loff_t pos,
> unsigned int len, struct folio **folio,
> void **fsdata)
>
> or
>
> static int kvm_kmem_gmem_write_begin(const struct kiocb *kiocb,
> struct address_space *mapping,
> loff_t pos, unsigned int len,
> struct folio **folio, void **fsdata)
>
> if we want to bundle pos+len.
Ack.
>
>> +{
>> + struct file *file = kiocb->ki_filp;
>
> ki_filp is already a file, and even if it were a "void *", there's no need for a
> local variable.
Ack.
>
>> + struct inode *inode = file_inode(file);
>> + pgoff_t index = pos >> PAGE_SHIFT;
>> + struct folio *folio;
>> +
>> + if (!kvm_gmem_supports_mmap(inode))
>
> Checking for MMAP is neither sufficient nor strictly necessary. MMAP doesn't
> imply SHARED, and it's not clear to me that mmap() support should be in any way
> tied to WRITE support.
As in my reply to the comment about doc, I plan to introduce
KVM_CAP_GUEST_MEMFD_WRITE and GUEST_MEMFD_FLAG_WRITE. The
kvm_arch_supports_gmem_write() will be a weak symbol and relying on
!kvm_arch_has_private_mem() on x86, similar to
kvm_arch_supports_gmem_mmap(). Does it look right?
>
>> + return -ENODEV;
>> +
>> + if (pos + len > i_size_read(inode))
>> + return -EINVAL;
>> +
>> + folio = kvm_gmem_get_folio(inode, index);
>
> Eh, since "index" is only used once, my vote is to use "pos" and do the shift
> here, so that it's obvious that the input to kvm_gmem_get_folio() is being checked.
Ack.
>
>> + if (IS_ERR(folio))
>> + return -EFAULT;
>
> Why EFAULT?
Will propagate the error like you suggest below.
>
>> +
>> + *foliop = folio;
>
> There shouldn't be any need for a local "folio". What about having the "out"
> param be just "folio"?
>
> E.g.
>
> static int kvm_kmem_gmem_write_begin(const struct kiocb *kiocb,
> struct address_space *mapping,
> loff_t pos, unsigned int len,
> struct folio **folio, void **fsdata)
> {
> struct inode *inode = file_inode(kiocb->ki_filp);
>
> if (!kvm_gmem_supports_write(inode))
> return -ENODEV;
>
> if (pos + len > i_size_read(inode))
> return -EINVAL;
>
> *folio = kvm_gmem_get_folio(inode, pos >> PAGE_SHIFT);
> if (IS_ERR(*folio))
> return PTR_ERR(*folio);
>
> return 0;
> }
Ack.
>
>
>> + return 0;
>> +}
>> +
>> +static int kvm_kmem_gmem_write_end(const struct kiocb *kiocb,
>> + struct address_space *mapping,
>> + loff_t pos, unsigned int len,
>> + unsigned int copied,
>> + struct folio *folio, void *fsdata)
>> +{
>> + if (copied && copied < len) {
>
> Why check if "copied" is non-zero? I don't see why KVM should behave differently
> with respect to unwritten bytes if copy_folio_from_iter_atomic() fails on the
> first byte or the Nth byte.
No, I don't think there is a need for this check indeed. It looks like
a leftover from my previous changes.
>
>> + unsigned int from = pos & ((1UL << folio_order(folio)) - 1);
>
> Uh, isn't this just offset_in_folio()?
>
>> +
>> + folio_zero_range(folio, from + copied, len - copied);
>
> I'd probably be in favor of omitting "from" entirely, e.g.
>
> if (copied < len)
> folio_zero_range(folio, offset_in_folio(pos) + copied,
> len - copied);
>
Ack.
>> + }
>> +
>> + folio_unlock(folio);
>> + folio_put(folio);
>> +
>> + return copied;
>> +}
next prev parent reply other threads:[~2025-10-24 14:36 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-20 16:13 [PATCH v6 0/2] KVM: guest_memfd: use write for population Kalyazin, Nikita
2025-10-20 16:14 ` [PATCH v6 1/2] KVM: guest_memfd: add generic population via write Kalyazin, Nikita
2025-10-23 16:07 ` Sean Christopherson
2025-10-24 14:35 ` Nikita Kalyazin [this message]
2025-10-30 21:37 ` Sean Christopherson
2025-11-03 16:55 ` Nikita Kalyazin
2025-11-07 15:42 ` Ackerley Tng
2025-11-07 17:23 ` Nikita Kalyazin
2025-10-20 16:14 ` [PATCH v6 2/2] KVM: selftests: update guest_memfd write tests Kalyazin, Nikita
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8a28ddea-35c0-490e-a7d2-7fb612fdd008@amazon.com \
--to=kalyazin@amazon.com \
--cc=ackerleytng@google.com \
--cc=david@redhat.com \
--cc=derekmn@amazon.com \
--cc=jackabt@amazon.co.uk \
--cc=jthoughton@google.com \
--cc=kalyazin@amazon.co.uk \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=patrick.roy@linux.dev \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
--cc=shuah@kernel.org \
--cc=vannapurve@google.com \
--cc=xmarcalx@amazon.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).