From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Christopher Clark <christopher.w.clark@gmail.com>
Cc: Juergen Gross <jgross@suse.com>,
Stefano Stabellini <sstabellini@kernel.org>,
Wei Liu <wei.liu2@citrix.com>,
Ross Philipson <ross.philipson@gmail.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
George Dunlap <George.Dunlap@eu.citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Jason Andryuk <jandryuk@gmail.com>,
Ian Jackson <ian.jackson@eu.citrix.com>,
Rich Persaud <persaur@gmail.com>, Tim Deegan <tim@xen.org>,
Daniel Smith <dpsmith@apertussolutions.com>,
Julien Grall <julien.grall@arm.com>,
Paul Durrant <paul.durrant@citrix.com>,
Jan Beulich <jbeulich@suse.com>,
xen-devel@lists.xenproject.org,
James McKenzie <james@bromium.com>,
Eric Chanudet <eric.chanudet@gmail.com>
Subject: Re: [PATCH v5 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
Date: Tue, 22 Jan 2019 13:08:22 +0100 [thread overview]
Message-ID: <20190122120822.v5eyc64fa4moourd@mac> (raw)
In-Reply-To: <1548064795-18160-10-git-send-email-christopher.w.clark@gmail.com>
On Mon, Jan 21, 2019 at 01:59:49AM -0800, Christopher Clark wrote:
> sendv operation is invoked to perform a synchronous send of buffers
> contained in iovs to a remote domain's registered ring.
>
> It takes:
> * A destination address (domid, port) for the ring to send to.
> It performs a most-specific match lookup, to allow for wildcard.
> * A source address, used to inform the destination of where to reply.
> * The address of an array of iovs containing the data to send
> * .. and the length of that array of iovs
> * and a 32-bit message type, available to communicate message context
> data (eg. kernel-to-kernel, separate from the application data).
>
> If insufficient space exists in the destination ring, it will return
> -EAGAIN and Xen will notify the caller when sufficient space becomes
> available.
>
> Accesses to the ring indices are appropriately atomic. The rings are
> mapped into Xen's private address space to write as needed and the
> mappings are retained for later use.
>
> Notifications are sent to guests via VIRQ and send_guest_global_virq is
> exposed in the change to enable argo to call it. VIRQ_ARGO_MESSAGE is
^ VIRQ_ARGO
> claimed from the VIRQ previously reserved for this purpose (#11).
>
> The VIRQ notification method is used rather than sending events using
> evtchn functions directly because:
>
> * no current event channel type is an exact fit for the intended
> behaviour. ECS_IPI is closest, but it disallows migration to
> other VCPUs which is not necessarily a requirement for Argo.
>
> * at the point of argo_init, allocation of an event channel is
> complicated by none of the guest VCPUs being initialized yet
> and the event channel logic expects that a valid event channel
> has a present VCPU.
IMO iff you wanted to use event channels those _must_ be setup by the
guest, ie: the guest argo driver would load, allocate an event channel
and then tell the hypervisor about the event channel that should be
used for argo notifications.
> +static int
> +memcpy_to_guest_ring(const struct domain *d, struct argo_ring_info *ring_info,
> + unsigned int offset,
> + const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> + unsigned int len)
> +{
> + unsigned int mfns_index = offset >> PAGE_SHIFT;
> + void *dst;
> + int ret;
> + unsigned int src_offset = 0;
> +
> + ASSERT(LOCKING_L3(d, ring_info));
> +
> + offset &= ~PAGE_MASK;
> +
> + if ( len + offset > XEN_ARGO_MAX_RING_SIZE )
> + return -EFAULT;
> +
> + while ( len )
> + {
> + unsigned int head_len = (offset + len) > PAGE_SIZE ? PAGE_SIZE - offset
> + : len;
IMO that would be clearer as:
head_len = min(PAGE_SIZE - offset, len);
But anyway, this should go away when you move to using vmap.
[...]
> +static int
> +ringbuf_insert(const struct domain *d, struct argo_ring_info *ring_info,
> + const struct argo_ring_id *src_id,
> + XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd,
> + unsigned long niov, uint32_t message_type,
> + unsigned long *out_len)
> +{
> + xen_argo_ring_t ring;
> + struct xen_argo_ring_message_header mh = { };
> + int sp, ret;
> + unsigned int len = 0;
> + xen_argo_iov_t iovs[XEN_ARGO_MAXIOV];
> + xen_argo_iov_t *piov;
> + XEN_GUEST_HANDLE(uint8_t) NULL_hnd =
> + guest_handle_from_param(guest_handle_from_ptr(NULL, uint8_t), uint8_t);
> +
> + ASSERT(LOCKING_L3(d, ring_info));
> +
> + ret = __copy_from_guest(iovs, iovs_hnd, niov) ? -EFAULT : 0;
> + if ( ret )
> + return ret;
> +
> + /*
> + * Obtain the total size of data to transmit -- sets the 'len' variable
> + * -- and sanity check that the iovs conform to size and number limits.
> + * Enforced below: no more than 'len' bytes of guest data
> + * (plus the message header) will be sent in this operation.
> + */
> + ret = iov_count(iovs, niov, &len);
> + if ( ret )
> + return ret;
> +
> + /*
> + * Size bounds check against ring size and static maximum message limit.
> + * The message must not fill the ring; there must be at least one slot
> + * remaining so we can distinguish a full ring from an empty one.
> + */
> + if ( ((ROUNDUP_MESSAGE(len) +
> + sizeof(struct xen_argo_ring_message_header)) >= ring_info->len) ||
> + (len > MAX_ARGO_MESSAGE_SIZE) )
len is already checked to be <= MAX_ARGO_MESSAGE_SIZE in iov_count
where it gets set, this is redundant.
> + return -EMSGSIZE;
> +
> + ret = get_sanitized_ring(d, &ring, ring_info);
> + if ( ret )
> + return ret;
> +
> + argo_dprintk("ring.tx_ptr=%u ring.rx_ptr=%u ring len=%u"
> + " ring_info->tx_ptr=%u\n",
> + ring.tx_ptr, ring.rx_ptr, ring_info->len, ring_info->tx_ptr);
> +
> + if ( ring.rx_ptr == ring.tx_ptr )
> + sp = ring_info->len;
> + else
> + {
> + sp = ring.rx_ptr - ring.tx_ptr;
> + if ( sp < 0 )
> + sp += ring_info->len;
> + }
> +
> + /*
> + * Size bounds check against currently available space in the ring.
> + * Again: the message must not fill the ring leaving no space remaining.
> + */
> + if ( (ROUNDUP_MESSAGE(len) +
> + sizeof(struct xen_argo_ring_message_header)) >= sp )
> + {
> + argo_dprintk("EAGAIN\n");
> + return -EAGAIN;
> + }
> +
> + mh.len = len + sizeof(struct xen_argo_ring_message_header);
> + mh.source.aport = src_id->aport;
> + mh.source.domain_id = src_id->domain_id;
> + mh.message_type = message_type;
> +
> + /*
> + * For this copy to the guest ring, tx_ptr is always 16-byte aligned
> + * and the message header is 16 bytes long.
> + */
> + BUILD_BUG_ON(
> + sizeof(struct xen_argo_ring_message_header) != ROUNDUP_MESSAGE(1));
> +
> + /*
> + * First data write into the destination ring: fixed size, message header.
> + * This cannot overrun because the available free space (value in 'sp')
> + * is checked above and must be at least this size.
> + */
> + ret = memcpy_to_guest_ring(d, ring_info,
> + ring.tx_ptr + sizeof(xen_argo_ring_t),
> + &mh, NULL_hnd, sizeof(mh));
> + if ( ret )
> + {
> + gprintk(XENLOG_ERR,
> + "argo: failed to write message header to ring (vm%u:%x vm%u)\n",
> + ring_info->id.domain_id, ring_info->id.aport,
> + ring_info->id.partner_id);
> +
> + return ret;
> + }
> +
> + ring.tx_ptr += sizeof(mh);
> + if ( ring.tx_ptr == ring_info->len )
> + ring.tx_ptr = 0;
> +
> + for ( piov = iovs; niov--; piov++ )
> + {
> + XEN_GUEST_HANDLE_64(uint8_t) buf_hnd = piov->iov_hnd;
> + unsigned int iov_len = piov->iov_len;
> +
> + /* If no data is provided in this iov, moan and skip on to the next */
> + if ( !iov_len )
> + {
> + gprintk(XENLOG_ERR,
This should likely be WARN or INFO, since it's not an error?
> + "argo: no data iov_len=0 iov_hnd=%p ring (vm%u:%x vm%u)\n",
> + buf_hnd.p, ring_info->id.domain_id, ring_info->id.aport,
> + ring_info->id.partner_id);
> +
> + continue;
> + }
> +
> + if ( unlikely(!guest_handle_okay(buf_hnd, iov_len)) )
> + {
> + gprintk(XENLOG_ERR,
> + "argo: bad iov handle [%p, %u] (vm%u:%x vm%u)\n",
> + buf_hnd.p, iov_len,
> + ring_info->id.domain_id, ring_info->id.aport,
> + ring_info->id.partner_id);
> +
> + return -EFAULT;
> + }
> +
> + sp = ring_info->len - ring.tx_ptr;
> +
> + /* Check: iov data size versus free space at the tail of the ring */
> + if ( iov_len > sp )
> + {
> + /*
> + * Second possible data write: ring-tail-wrap-write.
> + * Populate the ring tail and update the internal tx_ptr to handle
> + * wrapping at the end of ring.
> + * Size of data written here: sp
> + * which is the exact full amount of free space available at the
> + * tail of the ring, so this cannot overrun.
> + */
> + ret = memcpy_to_guest_ring(d, ring_info,
> + ring.tx_ptr + sizeof(xen_argo_ring_t),
> + NULL, buf_hnd, sp);
> + if ( ret )
> + {
> + gprintk(XENLOG_ERR,
> + "argo: failed to copy {%p, %d} (vm%u:%x vm%u)\n",
> + buf_hnd.p, sp,
> + ring_info->id.domain_id, ring_info->id.aport,
> + ring_info->id.partner_id);
> +
> + return ret;
> + }
> +
> + ring.tx_ptr = 0;
> + iov_len -= sp;
> + guest_handle_add_offset(buf_hnd, sp);
> +
> + ASSERT(iov_len <= ring_info->len);
> + }
> +
> + /*
> + * Third possible data write: all data remaining for this iov.
> + * Size of data written here: iov_len
> + *
> + * Case 1: if the ring-tail-wrap-write above was performed, then
> + * iov_len has been decreased by 'sp' and ring.tx_ptr is zero.
> + *
> + * We know from checking the result of iov_count:
> + * len + sizeof(message_header) <= ring_info->len
> + * We also know that len is the total of summing all iov_lens, so:
> + * iov_len <= len
> + * so by transitivity:
> + * iov_len <= len <= (ring_info->len - sizeof(msgheader))
> + * and therefore:
> + * (iov_len + sizeof(msgheader) <= ring_info->len) &&
> + * (ring.tx_ptr == 0)
> + * so this write cannot overrun here.
> + *
> + * Case 2: ring-tail-wrap-write above was not performed
> + * -> so iov_len is the guest-supplied value and: (iov_len <= sp)
> + * ie. less than available space at the tail of the ring:
> + * so this write cannot overrun.
> + */
> + ret = memcpy_to_guest_ring(d, ring_info,
> + ring.tx_ptr + sizeof(xen_argo_ring_t),
> + NULL, buf_hnd, iov_len);
> + if ( ret )
> + {
> + gprintk(XENLOG_ERR,
> + "argo: failed to copy [%p, %u] (vm%u:%x vm%u)\n",
> + buf_hnd.p, iov_len, ring_info->id.domain_id,
> + ring_info->id.aport, ring_info->id.partner_id);
> +
> + return ret;
> + }
> +
> + ring.tx_ptr += iov_len;
> +
> + if ( ring.tx_ptr == ring_info->len )
> + ring.tx_ptr = 0;
> + }
> +
> + ring.tx_ptr = ROUNDUP_MESSAGE(ring.tx_ptr);
> +
> + if ( ring.tx_ptr >= ring_info->len )
> + ring.tx_ptr -= ring_info->len;
You seem to handle the wrapping after each possible write, so I think
the above is not needed? Maybe it should be an assert instead?
> +
> + update_tx_ptr(d, ring_info, ring.tx_ptr);
> +
> + /*
> + * At this point (and also on an error exit paths from this function) it is
> + * possible to unmap the ring_info, ie:
> + * ring_unmap(d, ring_info);
> + * but performance should be improved by not doing so, and retaining
> + * the mapping.
> + * An XSM policy control over level of confidentiality required
> + * versus performance cost could be added to decide that here.
> + */
> +
> + *out_len = len;
> +
> + return ret;
> +}
> +
> static void
> wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
> {
> @@ -497,6 +918,25 @@ wildcard_pending_list_remove(domid_t domain_id, struct pending_ent *ent)
> }
>
> static void
> +wildcard_pending_list_insert(domid_t domain_id, struct pending_ent *ent)
> +{
> + struct domain *d = get_domain_by_id(domain_id);
> +
> + if ( !d )
> + return;
> +
> + ASSERT(LOCKING_Read_L1);
> +
> + if ( d->argo )
> + {
> + spin_lock(&d->argo->wildcard_L2_lock);
> + list_add(&ent->wildcard_node, &d->argo->wildcard_pend_list);
> + spin_unlock(&d->argo->wildcard_L2_lock);
> + }
> + put_domain(d);
> +}
> +
> +static void
> pending_remove_all(const struct domain *d, struct argo_ring_info *ring_info)
> {
> struct list_head *ring_pending = &ring_info->pending;
> @@ -518,6 +958,70 @@ pending_remove_all(const struct domain *d, struct argo_ring_info *ring_info)
> ring_info->npending = 0;
> }
>
> +static int
> +pending_queue(const struct domain *d, struct argo_ring_info *ring_info,
> + domid_t src_id, unsigned int len)
> +{
> + struct pending_ent *ent;
> +
> + ASSERT(LOCKING_L3(d, ring_info));
> +
> + if ( ring_info->npending >= MAX_PENDING_PER_RING )
> + return -ENOSPC;
> +
> + ent = xmalloc(struct pending_ent);
> + if ( !ent )
> + return -ENOMEM;
> +
> + ent->len = len;
> + ent->domain_id = src_id;
> + ent->ring_info = ring_info;
> +
> + if ( ring_info->id.partner_id == XEN_ARGO_DOMID_ANY )
> + wildcard_pending_list_insert(src_id, ent);
> + list_add(&ent->node, &ring_info->pending);
> + ring_info->npending++;
> +
> + return 0;
> +}
> +
> +static int
> +pending_requeue(const struct domain *d, struct argo_ring_info *ring_info,
> + domid_t src_id, unsigned int len)
> +{
> + struct list_head *cursor, *head;
> +
> + ASSERT(LOCKING_L3(d, ring_info));
> +
> + /* List structure is not modified here. Update len in a match if found. */
> + head = &ring_info->pending;
> +
> + for ( cursor = head->next; cursor != head; cursor = cursor->next )
list_for_each_entry
> long
> do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
> @@ -1145,6 +1734,53 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> break;
> }
>
> + case XEN_ARGO_OP_sendv:
> + {
> + xen_argo_send_addr_t send_addr;
> +
> + XEN_GUEST_HANDLE_PARAM(xen_argo_send_addr_t) send_addr_hnd =
> + guest_handle_cast(arg1, xen_argo_send_addr_t);
> + XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd =
> + guest_handle_cast(arg2, xen_argo_iov_t);
> + /* arg3 is niov */
> + /* arg4 is message_type. Must be a 32-bit value. */
> +
> + rc = copy_from_guest(&send_addr, send_addr_hnd, 1) ? -EFAULT : 0;
> + if ( rc )
> + break;
> +
> + /*
> + * Check padding is zeroed. Reject niov above limit or message_types
> + * that are outside 32 bit range.
> + */
> + if ( unlikely(send_addr.src.pad || send_addr.dst.pad ||
> + (arg3 > XEN_ARGO_MAXIOV) || (arg4 & ~0xffffffffUL)) )
arg4 & (GB(4) - 1)
Is clearer IMO, or:
arg4 > UINT32_MAX
> + {
> + rc = -EINVAL;
> + break;
> + }
> +
> + if ( send_addr.src.domain_id == XEN_ARGO_DOMID_ANY )
> + send_addr.src.domain_id = currd->domain_id;
> +
> + /* No domain is currently authorized to send on behalf of another */
> + if ( unlikely(send_addr.src.domain_id != currd->domain_id) )
> + {
> + rc = -EPERM;
> + break;
> + }
> +
> + /*
> + * Check access to the whole array here so we can use the faster __copy
> + * operations to read each element later.
> + */
> + if ( unlikely(!guest_handle_okay(iovs_hnd, arg3)) )
You need to set rc to EFAULT here, because the call to copy_from_guest
has set it to 0.
Alternatively you can change the call above to be:
if ( copy_from_guest(&send_addr, send_addr_hnd, 1) )
return -EFAULT;
So rc doesn't get set to 0 on success.
With those taken care of:
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
next prev parent reply other threads:[~2019-01-22 12:08 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-21 9:59 [PATCH v5 00/15] Argo: hypervisor-mediated interdomain communication Christopher Clark
2019-01-21 9:59 ` [PATCH v5 01/15] argo: Introduce the Kconfig option to govern inclusion of Argo Christopher Clark
2019-01-21 9:59 ` [PATCH v5 02/15] argo: introduce the argo_op hypercall boilerplate Christopher Clark
2019-01-21 9:59 ` [PATCH v5 03/15] argo: define argo_dprintk for subsystem debugging Christopher Clark
2019-01-21 9:59 ` [PATCH v5 04/15] argo: init, destroy and soft-reset, with enable command line opt Christopher Clark
2019-01-21 17:55 ` Roger Pau Monné
2019-01-31 4:06 ` Christopher Clark
2019-01-31 10:28 ` Jan Beulich
2019-01-21 9:59 ` [PATCH v5 05/15] errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI Christopher Clark
2019-01-21 9:59 ` [PATCH v5 06/15] xen/arm: introduce guest_handle_for_field() Christopher Clark
2019-01-21 9:59 ` [PATCH v5 07/15] argo: implement the register op Christopher Clark
2019-01-22 9:59 ` Roger Pau Monné
2019-01-31 4:08 ` Christopher Clark
2019-01-21 9:59 ` [PATCH v5 08/15] argo: implement the unregister op Christopher Clark
2019-01-22 11:02 ` Roger Pau Monné
2019-01-21 9:59 ` [PATCH v5 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq Christopher Clark
2019-01-22 12:08 ` Roger Pau Monné [this message]
2019-01-31 4:10 ` Christopher Clark
2019-01-31 10:18 ` Roger Pau Monné
2019-01-31 10:35 ` Jan Beulich
2019-01-31 11:00 ` Roger Pau Monné
2019-02-03 17:56 ` Christopher Clark
2019-02-04 9:30 ` Roger Pau Monné
2019-01-21 9:59 ` [PATCH v5 10/15] argo: implement the notify op Christopher Clark
2019-01-22 14:09 ` Roger Pau Monné
2019-01-31 4:12 ` Christopher Clark
2019-01-21 9:59 ` [PATCH v5 11/15] xsm, argo: XSM control for argo register Christopher Clark
2019-01-21 9:59 ` [PATCH v5 12/15] xsm, argo: XSM control for argo message send operation Christopher Clark
2019-01-21 9:59 ` [PATCH v5 13/15] xsm, argo: XSM control for any access to argo by a domain Christopher Clark
2019-01-21 9:59 ` [PATCH v5 14/15] xsm, argo: notify: don't describe rings that cannot be sent to Christopher Clark
2019-01-21 9:59 ` [PATCH v5 15/15] MAINTAINERS: add new section for Argo and self as maintainer Christopher Clark
2019-01-21 10:58 ` Wei Liu
2019-01-21 11:07 ` Jan Beulich
2019-01-22 14:17 ` [PATCH v5 00/15] Argo: hypervisor-mediated interdomain communication Roger Pau Monné
2019-01-31 4:05 ` Christopher Clark
2019-01-31 13:39 ` Roger Pau Monné
2019-02-03 18:04 ` Christopher Clark
2019-02-04 10:07 ` Roger Pau Monné
2019-02-04 18:22 ` Stefano Stabellini
2019-02-13 10:31 ` Rich Persaud
2019-02-13 22:19 ` Stefano Stabellini
2019-02-06 10:31 ` Roger Pau Monné
2019-02-06 10:36 ` Roger Pau Monné
2019-02-13 10:43 ` Rich Persaud
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190122120822.v5eyc64fa4moourd@mac \
--to=roger.pau@citrix.com \
--cc=George.Dunlap@eu.citrix.com \
--cc=andrew.cooper3@citrix.com \
--cc=christopher.w.clark@gmail.com \
--cc=dpsmith@apertussolutions.com \
--cc=eric.chanudet@gmail.com \
--cc=ian.jackson@eu.citrix.com \
--cc=james@bromium.com \
--cc=jandryuk@gmail.com \
--cc=jbeulich@suse.com \
--cc=jgross@suse.com \
--cc=julien.grall@arm.com \
--cc=konrad.wilk@oracle.com \
--cc=paul.durrant@citrix.com \
--cc=persaur@gmail.com \
--cc=ross.philipson@gmail.com \
--cc=sstabellini@kernel.org \
--cc=tim@xen.org \
--cc=wei.liu2@citrix.com \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.