All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Thanos Makatos <thanos.makatos@nutanix.com>
Cc: kvm@vger.kernel.org, john.levon@nutanix.com, mst@redhat.com,
	 john.g.johnson@oracle.com, dinechin@redhat.com,
	cohuck@redhat.com,  jasowang@redhat.com, stefanha@redhat.com,
	jag.raman@oracle.com,  eafanasova@gmail.com,
	elena.ufimtseva@oracle.com, changpeng.liu@intel.com,
	 james.r.harris@intel.com, benjamin.walker@intel.com
Subject: Re: [RFC PATCH] KVM: optionally commit write on ioeventfd write
Date: Fri, 5 Sep 2025 07:09:49 -0700	[thread overview]
Message-ID: <aLrvLfkiz6TwR4ML@google.com> (raw)
In-Reply-To: <20221005211551.152216-1-thanos.makatos@nutanix.com>

On Wed, Oct 05, 2022, Thanos Makatos wrote:

Amusingly, I floated this exact idea internally without ever seeing this patch
(we ended up going a different direction).  Sadly, I can't claim infringement,
as my suggestion was timestamped from December 2022 :-D

If this is useful for y'all, I don't see a reason not to do it.

> ---
>  include/uapi/linux/kvm.h       | 5 ++++-
>  tools/include/uapi/linux/kvm.h | 2 ++
>  virt/kvm/eventfd.c             | 9 +++++++++
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index eed0315a77a6..0a884ac1cc76 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -804,6 +804,7 @@ enum {
>  	kvm_ioeventfd_flag_nr_deassign,
>  	kvm_ioeventfd_flag_nr_virtio_ccw_notify,
>  	kvm_ioeventfd_flag_nr_fast_mmio,
> +	kvm_ioevetnfd_flag_nr_commit_write,
>  	kvm_ioeventfd_flag_nr_max,
>  };
>  
> @@ -812,16 +813,18 @@ enum {
>  #define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)
>  #define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
>  	(1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)
> +#define KVM_IOEVENTFD_FLAG_COMMIT_WRITE (1 << kvm_ioevetnfd_flag_nr_commit_write)

Maybe POST_WRITE to try to capture the effective semantics?  As for read after
write hazards, my vote is to document that KVM provides no guarantees on that
front.  I can't envision a use case where it makes sense to provide guarantees
in the kernel, since doing so would largely defeat the purpose of handling writes
in the fastpath.

>  #define KVM_IOEVENTFD_VALID_FLAG_MASK  ((1 << kvm_ioeventfd_flag_nr_max) - 1)
>  
>  struct kvm_ioeventfd {
>  	__u64 datamatch;
>  	__u64 addr;        /* legal pio/mmio address */
> +	__u64 vaddr;       /* user address to write to if COMMIT_WRITE is set */

This needs to be placed at the end, i.e. actually needs to consume the pad[]
bytes.  Inserting into the middle changes the layout of the structure and thus
breaks ABI.

And maybe post_addr (or commit_addr)?  Because vaddr might be interpreted as the
host virtual address that corresponds to "addr", which may or may not be the case.

>  	__u32 len;         /* 1, 2, 4, or 8 bytes; or 0 to ignore length */
>  	__s32 fd;
>  	__u32 flags;
> -	__u8  pad[36];
> +	__u8  pad[28];
>  };
 
...

> @@ -812,6 +813,7 @@ enum {
>  #define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)
>  #define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
>  	(1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)
> +#define KVM_IOEVENTFD_FLAG_COMMIT_WRITE (1 << kvm_ioevetnfd_flag_nr_commit_write)
>  
>  #define KVM_IOEVENTFD_VALID_FLAG_MASK  ((1 << kvm_ioeventfd_flag_nr_max) - 1)
>  
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 2a3ed401ce46..c98e7b54fafa 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -682,6 +682,8 @@ struct _ioeventfd {
>  	struct kvm_io_device dev;
>  	u8                   bus_idx;
>  	bool                 wildcard;
> +	bool                 commit_write;
> +	void                 *vaddr;

There's no need for a separate bool, just pivot on the validity of the pointer.
The simplest approach is to disallow NULL pointers (which aren't technically
illegal for userspace, but I doubt any use case actually cares).  Alternatively,
set the internal pointer to e.g. -EINVAL and then act on !IS_ERR().

The pointer also needs to be tagged __user.

>  };
>  
>  static inline struct _ioeventfd *
> @@ -753,6 +755,10 @@ ioeventfd_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
>  	if (!ioeventfd_in_range(p, addr, len, val))
>  		return -EOPNOTSUPP;
>  
> +	if (p->commit_write) {
> +		if (unlikely(copy_to_user(p->vaddr, val, len)))

This needs to check that len > 0.  I think it's also worth hoisting the validity
checks into kvm_assign_ioeventfd_idx() so that this can use the slightly more
optimal __copy_to_user().

E.g. 

	if (args->flags & KVM_IOEVENTFD_FLAG_REDIRECT) {
		if (!args->len || !args->post_addr ||
		    args->redirect != untagged_addr(args->post_addr) ||
		    !access_ok((void __user *)(unsigned long)args->post_addr, args->len)) {
			ret = -EINVAL;
			goto fail;
		}

		p->post_addr = (void __user *)(unsigned long)args->post_addr;
	}

And then the usage here can be

	if (p->post_addr && __copy_to_user(p->post_addr, val, len))
		return -EFAULT;

I assume the spinlock in eventfd_signal() provides ordering even on weakly
ordered architectures, but we should double check that, i.e. that we don't need
an explicitly barrier of some kind.

Lastly, I believe kvm_deassign_ioeventfd_idx() needs to check for a match on
post_addr (or whatever it gets named).

> +			return -EFAULT;
> +	}
>  	eventfd_signal(p->eventfd, 1);
>  	return 0;
>  }
> @@ -832,6 +838,9 @@ static int kvm_assign_ioeventfd_idx(struct kvm *kvm,
>  	else
>  		p->wildcard = true;
>  
> +	p->commit_write = args->flags & KVM_IOEVENTFD_FLAG_COMMIT_WRITE;
> +	p->vaddr = (void *)args->vaddr;
> +
>  	mutex_lock(&kvm->slots_lock);
>  
>  	/* Verify that there isn't a match already */
> -- 
> 2.22.3
> 

  reply	other threads:[~2025-09-05 14:10 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-05 21:15 [RFC PATCH] KVM: optionally commit write on ioeventfd write Thanos Makatos
2025-09-05 14:09 ` Sean Christopherson [this message]
2025-12-02  9:15   ` Thanos Makatos
2025-12-02 22:53     ` Sean Christopherson
2025-12-03 14:25       ` Thanos Makatos
2025-12-15 22:42         ` Thanos Makatos
2025-12-19  1:20         ` Sean Christopherson
2025-12-19 10:36           ` John Levon
2026-01-13 19:43           ` Thanos Makatos
2026-01-13 20:00             ` Sean Christopherson
2026-03-02 12:28               ` [PATCH] KVM: optionally post " Thanos Makatos
2026-03-05  1:26                 ` Sean Christopherson
2026-03-06 11:14                   ` Thanos Makatos
2026-03-05  1:49                 ` kernel test robot
2026-03-05  9:39                 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aLrvLfkiz6TwR4ML@google.com \
    --to=seanjc@google.com \
    --cc=benjamin.walker@intel.com \
    --cc=changpeng.liu@intel.com \
    --cc=cohuck@redhat.com \
    --cc=dinechin@redhat.com \
    --cc=eafanasova@gmail.com \
    --cc=elena.ufimtseva@oracle.com \
    --cc=jag.raman@oracle.com \
    --cc=james.r.harris@intel.com \
    --cc=jasowang@redhat.com \
    --cc=john.g.johnson@oracle.com \
    --cc=john.levon@nutanix.com \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=thanos.makatos@nutanix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.