From: Sean Christopherson <seanjc@google.com>
To: Nikita Kalyazin <kalyazin@amazon.co.uk>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
"shuah@kernel.org" <shuah@kernel.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-kselftest@vger.kernel.org"
<linux-kselftest@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"david@redhat.com" <david@redhat.com>,
"jthoughton@google.com" <jthoughton@google.com>,
"patrick.roy@linux.dev" <patrick.roy@linux.dev>,
Jack Thomson <jackabt@amazon.co.uk>,
Derek Manwaring <derekmn@amazon.com>,
Marco Cali <xmarcalx@amazon.co.uk>
Subject: Re: [PATCH v6 1/2] KVM: guest_memfd: add generic population via write
Date: Thu, 23 Oct 2025 09:07:53 -0700 [thread overview]
Message-ID: <aPpS2aqdobVTk_ed@google.com> (raw)
In-Reply-To: <20251020161352.69257-2-kalyazin@amazon.com>
On Mon, Oct 20, 2025, Nikita Kalyazin wrote:
> From: Nikita Kalyazin <kalyazin@amazon.com>
>
> write syscall populates guest_memfd with user-supplied data in a generic
> way, ie no vendor-specific preparation is performed. If the request is
> not page-aligned, the remaining bytes are initialised to 0.
>
> write is only supported for non-CoCo setups where guest memory is not
> hardware-encrypted.
Please include all of the "why". The code mostly communicates the "what", but
it doesn't capture why write() support is at all interesting, nor does it explain
why read() isn't supported.
> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
> ---
> virt/kvm/guest_memfd.c | 48 ++++++++++++++++++++++++++++++++++++++++++
There's a notable lack of uAPI and Documentation chanegs. I.e. this needs a
GUEST_MEMFD_FLAG_xxx along with proper documentation.
And while it's definitely it's a-ok to land .write() in advance of the direct map
changes, we do need to at least map out how we want the two to interact, e.g. so
that we don't end up with constraints that are impossible to satisfy.
> 1 file changed, 48 insertions(+)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 94bafd6c558c..f4e218049afa 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -380,6 +380,8 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
>
> static struct file_operations kvm_gmem_fops = {
> .mmap = kvm_gmem_mmap,
> + .llseek = default_llseek,
> + .write_iter = generic_perform_write,
> .open = generic_file_open,
> .release = kvm_gmem_release,
> .fallocate = kvm_gmem_fallocate,
> @@ -390,6 +392,49 @@ void kvm_gmem_init(struct module *module)
> kvm_gmem_fops.owner = module;
> }
>
> +static int kvm_kmem_gmem_write_begin(const struct kiocb *kiocb,
> + struct address_space *mapping,
> + loff_t pos, unsigned int len,
> + struct folio **foliop,
> + void **fsdata)
Over-aggressive wrapping, this can be
static int kvm_kmem_gmem_write_begin(const struct kiocb *kiocb,
struct address_space *mapping, loff_t pos,
unsigned int len, struct folio **folio,
void **fsdata)
or
static int kvm_kmem_gmem_write_begin(const struct kiocb *kiocb,
struct address_space *mapping,
loff_t pos, unsigned int len,
struct folio **folio, void **fsdata)
if we want to bundle pos+len.
> +{
> + struct file *file = kiocb->ki_filp;
ki_filp is already a file, and even if it were a "void *", there's no need for a
local variable.
> + struct inode *inode = file_inode(file);
> + pgoff_t index = pos >> PAGE_SHIFT;
> + struct folio *folio;
> +
> + if (!kvm_gmem_supports_mmap(inode))
Checking for MMAP is neither sufficient nor strictly necessary. MMAP doesn't
imply SHARED, and it's not clear to me that mmap() support should be in any way
tied to WRITE support.
> + return -ENODEV;
> +
> + if (pos + len > i_size_read(inode))
> + return -EINVAL;
> +
> + folio = kvm_gmem_get_folio(inode, index);
Eh, since "index" is only used once, my vote is to use "pos" and do the shift
here, so that it's obvious that the input to kvm_gmem_get_folio() is being checked.
> + if (IS_ERR(folio))
> + return -EFAULT;
Why EFAULT?
> +
> + *foliop = folio;
There shouldn't be any need for a local "folio". What about having the "out"
param be just "folio"?
E.g.
static int kvm_kmem_gmem_write_begin(const struct kiocb *kiocb,
struct address_space *mapping,
loff_t pos, unsigned int len,
struct folio **folio, void **fsdata)
{
struct inode *inode = file_inode(kiocb->ki_filp);
if (!kvm_gmem_supports_write(inode))
return -ENODEV;
if (pos + len > i_size_read(inode))
return -EINVAL;
*folio = kvm_gmem_get_folio(inode, pos >> PAGE_SHIFT);
if (IS_ERR(*folio))
return PTR_ERR(*folio);
return 0;
}
> + return 0;
> +}
> +
> +static int kvm_kmem_gmem_write_end(const struct kiocb *kiocb,
> + struct address_space *mapping,
> + loff_t pos, unsigned int len,
> + unsigned int copied,
> + struct folio *folio, void *fsdata)
> +{
> + if (copied && copied < len) {
Why check if "copied" is non-zero? I don't see why KVM should behave differently
with respect to unwritten bytes if copy_folio_from_iter_atomic() fails on the
first byte or the Nth byte.
> + unsigned int from = pos & ((1UL << folio_order(folio)) - 1);
Uh, isn't this just offset_in_folio()?
> +
> + folio_zero_range(folio, from + copied, len - copied);
I'd probably be in favor of omitting "from" entirely, e.g.
if (copied < len)
folio_zero_range(folio, offset_in_folio(pos) + copied,
len - copied);
> + }
> +
> + folio_unlock(folio);
> + folio_put(folio);
> +
> + return copied;
> +}
next prev parent reply other threads:[~2025-10-23 16:07 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-20 16:13 [PATCH v6 0/2] KVM: guest_memfd: use write for population Kalyazin, Nikita
2025-10-20 16:14 ` [PATCH v6 1/2] KVM: guest_memfd: add generic population via write Kalyazin, Nikita
2025-10-23 16:07 ` Sean Christopherson [this message]
2025-10-24 14:35 ` Nikita Kalyazin
2025-10-30 21:37 ` Sean Christopherson
2025-11-03 16:55 ` Nikita Kalyazin
2025-11-07 15:42 ` Ackerley Tng
2025-11-07 17:23 ` Nikita Kalyazin
2025-10-20 16:14 ` [PATCH v6 2/2] KVM: selftests: update guest_memfd write tests Kalyazin, Nikita
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aPpS2aqdobVTk_ed@google.com \
--to=seanjc@google.com \
--cc=david@redhat.com \
--cc=derekmn@amazon.com \
--cc=jackabt@amazon.co.uk \
--cc=jthoughton@google.com \
--cc=kalyazin@amazon.co.uk \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=patrick.roy@linux.dev \
--cc=pbonzini@redhat.com \
--cc=shuah@kernel.org \
--cc=xmarcalx@amazon.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).