public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Thanos Makatos <thanos.makatos@nutanix.com>
Cc: kvm@vger.kernel.org, john.levon@nutanix.com, mst@redhat.com,
	 john.g.johnson@oracle.com, dinechin@redhat.com,
	cohuck@redhat.com,  jasowang@redhat.com, stefanha@redhat.com,
	jag.raman@oracle.com,  eafanasova@gmail.com,
	elena.ufimtseva@oracle.com, changpeng.liu@intel.com,
	 james.r.harris@intel.com, benjamin.walker@intel.com
Subject: Re: [RFC PATCH] KVM: optionally commit write on ioeventfd write
Date: Fri, 5 Sep 2025 07:09:49 -0700	[thread overview]
Message-ID: <aLrvLfkiz6TwR4ML@google.com> (raw)
In-Reply-To: <20221005211551.152216-1-thanos.makatos@nutanix.com>

On Wed, Oct 05, 2022, Thanos Makatos wrote:

Amusingly, I floated this exact idea internally without ever seeing this patch
(we ended up going a different direction).  Sadly, I can't claim infringement,
as my suggestion was timestamped from December 2022 :-D

If this is useful for y'all, I don't see a reason not to do it.

> ---
>  include/uapi/linux/kvm.h       | 5 ++++-
>  tools/include/uapi/linux/kvm.h | 2 ++
>  virt/kvm/eventfd.c             | 9 +++++++++
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index eed0315a77a6..0a884ac1cc76 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -804,6 +804,7 @@ enum {
>  	kvm_ioeventfd_flag_nr_deassign,
>  	kvm_ioeventfd_flag_nr_virtio_ccw_notify,
>  	kvm_ioeventfd_flag_nr_fast_mmio,
> +	kvm_ioevetnfd_flag_nr_commit_write,
>  	kvm_ioeventfd_flag_nr_max,
>  };
>  
> @@ -812,16 +813,18 @@ enum {
>  #define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)
>  #define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
>  	(1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)
> +#define KVM_IOEVENTFD_FLAG_COMMIT_WRITE (1 << kvm_ioevetnfd_flag_nr_commit_write)

Maybe POST_WRITE to try to capture the effective semantics?  As for read after
write hazards, my vote is to document that KVM provides no guarantees on that
front.  I can't envision a use case where it makes sense to provide guarantees
in the kernel, since doing so would largely defeat the purpose of handling writes
in the fastpath.

>  #define KVM_IOEVENTFD_VALID_FLAG_MASK  ((1 << kvm_ioeventfd_flag_nr_max) - 1)
>  
>  struct kvm_ioeventfd {
>  	__u64 datamatch;
>  	__u64 addr;        /* legal pio/mmio address */
> +	__u64 vaddr;       /* user address to write to if COMMIT_WRITE is set */

This needs to be placed at the end, i.e. actually needs to consume the pad[]
bytes.  Inserting into the middle changes the layout of the structure and thus
breaks ABI.

And maybe post_addr (or commit_addr)?  Because vaddr might be interpreted as the
host virtual address that corresponds to "addr", which may or may not be the case.

>  	__u32 len;         /* 1, 2, 4, or 8 bytes; or 0 to ignore length */
>  	__s32 fd;
>  	__u32 flags;
> -	__u8  pad[36];
> +	__u8  pad[28];
>  };
 
...

> @@ -812,6 +813,7 @@ enum {
>  #define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)
>  #define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
>  	(1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)
> +#define KVM_IOEVENTFD_FLAG_COMMIT_WRITE (1 << kvm_ioevetnfd_flag_nr_commit_write)
>  
>  #define KVM_IOEVENTFD_VALID_FLAG_MASK  ((1 << kvm_ioeventfd_flag_nr_max) - 1)
>  
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 2a3ed401ce46..c98e7b54fafa 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -682,6 +682,8 @@ struct _ioeventfd {
>  	struct kvm_io_device dev;
>  	u8                   bus_idx;
>  	bool                 wildcard;
> +	bool                 commit_write;
> +	void                 *vaddr;

There's no need for a separate bool, just pivot on the validity of the pointer.
The simplest approach is to disallow NULL pointers (which aren't technically
illegal for userspace, but I doubt any use case actually cares).  Alternatively,
set the internal pointer to e.g. -EINVAL and then act on !IS_ERR().

The pointer also needs to be tagged __user.

>  };
>  
>  static inline struct _ioeventfd *
> @@ -753,6 +755,10 @@ ioeventfd_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
>  	if (!ioeventfd_in_range(p, addr, len, val))
>  		return -EOPNOTSUPP;
>  
> +	if (p->commit_write) {
> +		if (unlikely(copy_to_user(p->vaddr, val, len)))

This needs to check that len > 0.  I think it's also worth hoisting the validity
checks into kvm_assign_ioeventfd_idx() so that this can use the slightly more
optimal __copy_to_user().

E.g. 

	if (args->flags & KVM_IOEVENTFD_FLAG_REDIRECT) {
		if (!args->len || !args->post_addr ||
		    args->redirect != untagged_addr(args->post_addr) ||
		    !access_ok((void __user *)(unsigned long)args->post_addr, args->len)) {
			ret = -EINVAL;
			goto fail;
		}

		p->post_addr = (void __user *)(unsigned long)args->post_addr;
	}

And then the usage here can be

	if (p->post_addr && __copy_to_user(p->post_addr, val, len))
		return -EFAULT;

I assume the spinlock in eventfd_signal() provides ordering even on weakly
ordered architectures, but we should double check that, i.e. that we don't need
an explicitly barrier of some kind.

Lastly, I believe kvm_deassign_ioeventfd_idx() needs to check for a match on
post_addr (or whatever it gets named).

> +			return -EFAULT;
> +	}
>  	eventfd_signal(p->eventfd, 1);
>  	return 0;
>  }
> @@ -832,6 +838,9 @@ static int kvm_assign_ioeventfd_idx(struct kvm *kvm,
>  	else
>  		p->wildcard = true;
>  
> +	p->commit_write = args->flags & KVM_IOEVENTFD_FLAG_COMMIT_WRITE;
> +	p->vaddr = (void *)args->vaddr;
> +
>  	mutex_lock(&kvm->slots_lock);
>  
>  	/* Verify that there isn't a match already */
> -- 
> 2.22.3
> 

  reply	other threads:[~2025-09-05 14:10 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-05 21:15 [RFC PATCH] KVM: optionally commit write on ioeventfd write Thanos Makatos
2025-09-05 14:09 ` Sean Christopherson [this message]
2025-12-02  9:15   ` Thanos Makatos
2025-12-02 22:53     ` Sean Christopherson
2025-12-03 14:25       ` Thanos Makatos
2025-12-15 22:42         ` Thanos Makatos
2025-12-19  1:20         ` Sean Christopherson
2025-12-19 10:36           ` John Levon
2026-01-13 19:43           ` Thanos Makatos
2026-01-13 20:00             ` Sean Christopherson
2026-03-02 12:28               ` [PATCH] KVM: optionally post " Thanos Makatos
2026-03-05  1:26                 ` Sean Christopherson
2026-03-06 11:14                   ` Thanos Makatos
2026-03-05  1:49                 ` kernel test robot
2026-03-05  9:39                 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aLrvLfkiz6TwR4ML@google.com \
    --to=seanjc@google.com \
    --cc=benjamin.walker@intel.com \
    --cc=changpeng.liu@intel.com \
    --cc=cohuck@redhat.com \
    --cc=dinechin@redhat.com \
    --cc=eafanasova@gmail.com \
    --cc=elena.ufimtseva@oracle.com \
    --cc=jag.raman@oracle.com \
    --cc=james.r.harris@intel.com \
    --cc=jasowang@redhat.com \
    --cc=john.g.johnson@oracle.com \
    --cc=john.levon@nutanix.com \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=thanos.makatos@nutanix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox