From: Sean Christopherson <seanjc@google.com>
To: Thanos Makatos <thanos.makatos@nutanix.com>
Cc: kvm@vger.kernel.org, john.levon@nutanix.com, mst@redhat.com,
john.g.johnson@oracle.com, dinechin@redhat.com,
cohuck@redhat.com, jasowang@redhat.com, stefanha@redhat.com,
jag.raman@oracle.com, eafanasova@gmail.com,
elena.ufimtseva@oracle.com, changpeng.liu@intel.com,
james.r.harris@intel.com, benjamin.walker@intel.com
Subject: Re: [RFC PATCH] KVM: optionally commit write on ioeventfd write
Date: Fri, 5 Sep 2025 07:09:49 -0700 [thread overview]
Message-ID: <aLrvLfkiz6TwR4ML@google.com> (raw)
In-Reply-To: <20221005211551.152216-1-thanos.makatos@nutanix.com>
On Wed, Oct 05, 2022, Thanos Makatos wrote:
Amusingly, I floated this exact idea internally without ever seeing this patch
(we ended up going a different direction). Sadly, I can't claim infringement,
as my suggestion was timestamped from December 2022 :-D
If this is useful for y'all, I don't see a reason not to do it.
> ---
> include/uapi/linux/kvm.h | 5 ++++-
> tools/include/uapi/linux/kvm.h | 2 ++
> virt/kvm/eventfd.c | 9 +++++++++
> 3 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index eed0315a77a6..0a884ac1cc76 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -804,6 +804,7 @@ enum {
> kvm_ioeventfd_flag_nr_deassign,
> kvm_ioeventfd_flag_nr_virtio_ccw_notify,
> kvm_ioeventfd_flag_nr_fast_mmio,
> + kvm_ioevetnfd_flag_nr_commit_write,
> kvm_ioeventfd_flag_nr_max,
> };
>
> @@ -812,16 +813,18 @@ enum {
> #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign)
> #define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
> (1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)
> +#define KVM_IOEVENTFD_FLAG_COMMIT_WRITE (1 << kvm_ioevetnfd_flag_nr_commit_write)
Maybe POST_WRITE to try to capture the effective semantics? As for read after
write hazards, my vote is to document that KVM provides no guarantees on that
front. I can't envision a use case where it makes sense to provide guarantees
in the kernel, since doing so would largely defeat the purpose of handling writes
in the fastpath.
> #define KVM_IOEVENTFD_VALID_FLAG_MASK ((1 << kvm_ioeventfd_flag_nr_max) - 1)
>
> struct kvm_ioeventfd {
> __u64 datamatch;
> __u64 addr; /* legal pio/mmio address */
> + __u64 vaddr; /* user address to write to if COMMIT_WRITE is set */
This needs to be placed at the end, i.e. actually needs to consume the pad[]
bytes. Inserting into the middle changes the layout of the structure and thus
breaks ABI.
And maybe post_addr (or commit_addr)? Because vaddr might be interpreted as the
host virtual address that corresponds to "addr", which may or may not be the case.
> __u32 len; /* 1, 2, 4, or 8 bytes; or 0 to ignore length */
> __s32 fd;
> __u32 flags;
> - __u8 pad[36];
> + __u8 pad[28];
> };
...
> @@ -812,6 +813,7 @@ enum {
> #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign)
> #define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
> (1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)
> +#define KVM_IOEVENTFD_FLAG_COMMIT_WRITE (1 << kvm_ioevetnfd_flag_nr_commit_write)
>
> #define KVM_IOEVENTFD_VALID_FLAG_MASK ((1 << kvm_ioeventfd_flag_nr_max) - 1)
>
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 2a3ed401ce46..c98e7b54fafa 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -682,6 +682,8 @@ struct _ioeventfd {
> struct kvm_io_device dev;
> u8 bus_idx;
> bool wildcard;
> + bool commit_write;
> + void *vaddr;
There's no need for a separate bool, just pivot on the validity of the pointer.
The simplest approach is to disallow NULL pointers (which aren't technically
illegal for userspace, but I doubt any use case actually cares). Alternatively,
set the internal pointer to e.g. -EINVAL and then act on !IS_ERR().
The pointer also needs to be tagged __user.
> };
>
> static inline struct _ioeventfd *
> @@ -753,6 +755,10 @@ ioeventfd_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
> if (!ioeventfd_in_range(p, addr, len, val))
> return -EOPNOTSUPP;
>
> + if (p->commit_write) {
> + if (unlikely(copy_to_user(p->vaddr, val, len)))
This needs to check that len > 0. I think it's also worth hoisting the validity
checks into kvm_assign_ioeventfd_idx() so that this can use the slightly more
optimal __copy_to_user().
E.g.
if (args->flags & KVM_IOEVENTFD_FLAG_REDIRECT) {
if (!args->len || !args->post_addr ||
args->redirect != untagged_addr(args->post_addr) ||
!access_ok((void __user *)(unsigned long)args->post_addr, args->len)) {
ret = -EINVAL;
goto fail;
}
p->post_addr = (void __user *)(unsigned long)args->post_addr;
}
And then the usage here can be
if (p->post_addr && __copy_to_user(p->post_addr, val, len))
return -EFAULT;
I assume the spinlock in eventfd_signal() provides ordering even on weakly
ordered architectures, but we should double check that, i.e. that we don't need
an explicitly barrier of some kind.
Lastly, I believe kvm_deassign_ioeventfd_idx() needs to check for a match on
post_addr (or whatever it gets named).
> + return -EFAULT;
> + }
> eventfd_signal(p->eventfd, 1);
> return 0;
> }
> @@ -832,6 +838,9 @@ static int kvm_assign_ioeventfd_idx(struct kvm *kvm,
> else
> p->wildcard = true;
>
> + p->commit_write = args->flags & KVM_IOEVENTFD_FLAG_COMMIT_WRITE;
> + p->vaddr = (void *)args->vaddr;
> +
> mutex_lock(&kvm->slots_lock);
>
> /* Verify that there isn't a match already */
> --
> 2.22.3
>
next prev parent reply other threads:[~2025-09-05 14:10 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-05 21:15 [RFC PATCH] KVM: optionally commit write on ioeventfd write Thanos Makatos
2025-09-05 14:09 ` Sean Christopherson [this message]
2025-12-02 9:15 ` Thanos Makatos
2025-12-02 22:53 ` Sean Christopherson
2025-12-03 14:25 ` Thanos Makatos
2025-12-15 22:42 ` Thanos Makatos
2025-12-19 1:20 ` Sean Christopherson
2025-12-19 10:36 ` John Levon
2026-01-13 19:43 ` Thanos Makatos
2026-01-13 20:00 ` Sean Christopherson
2026-03-02 12:28 ` [PATCH] KVM: optionally post " Thanos Makatos
2026-03-05 1:26 ` Sean Christopherson
2026-03-06 11:14 ` Thanos Makatos
2026-03-05 1:49 ` kernel test robot
2026-03-05 9:39 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aLrvLfkiz6TwR4ML@google.com \
--to=seanjc@google.com \
--cc=benjamin.walker@intel.com \
--cc=changpeng.liu@intel.com \
--cc=cohuck@redhat.com \
--cc=dinechin@redhat.com \
--cc=eafanasova@gmail.com \
--cc=elena.ufimtseva@oracle.com \
--cc=jag.raman@oracle.com \
--cc=james.r.harris@intel.com \
--cc=jasowang@redhat.com \
--cc=john.g.johnson@oracle.com \
--cc=john.levon@nutanix.com \
--cc=kvm@vger.kernel.org \
--cc=mst@redhat.com \
--cc=stefanha@redhat.com \
--cc=thanos.makatos@nutanix.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox