From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B7EA0C7115B for ; Thu, 26 Jun 2025 07:49:42 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1uUhJy-0006JM-2R; Thu, 26 Jun 2025 03:46:58 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uUhJv-00066d-Ni for qemu-devel@nongnu.org; Thu, 26 Jun 2025 03:46:55 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uUhJs-000220-5H for qemu-devel@nongnu.org; Thu, 26 Jun 2025 03:46:55 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1750924006; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=e+H0MhAcAv4nq4Wbd68BYhhQHn3QGKnFblTAmr6dPgk=; b=ICbvdC5lTeHJt4Qp+6wHdoc4sX6QFQQKLgGaFbReal1rN0GqU4kIa5CYqM5L/aAKOSPppq M0DJqyU3H6DnUsy2Dq+M9k++CZwfQpqHHGqfVXAvHm+XVMHC7UCRCq5TDr2de9i8aSb3Ho 3fO0jTQU6K/7baE0nkL3THk0oXvvTB4= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-651-66eXxJ1AOP2NOarmzh-7HQ-1; Thu, 26 Jun 2025 03:46:42 -0400 X-MC-Unique: 66eXxJ1AOP2NOarmzh-7HQ-1 X-Mimecast-MFC-AGG-ID: 66eXxJ1AOP2NOarmzh-7HQ_1750924001 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id A5CF118DA385; Thu, 26 Jun 2025 07:46:41 +0000 (UTC) Received: from corto.redhat.com (unknown [10.44.32.51]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id D1ECC180045B; Thu, 26 Jun 2025 07:46:38 +0000 (UTC) From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= To: qemu-devel@nongnu.org Cc: Alex Williamson , John Levon , John Johnson , Elena Ufimtseva , Jagannathan Raman , =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Subject: [PULL 23/25] vfio-user: add coalesced posted writes Date: Thu, 26 Jun 2025 09:45:27 +0200 Message-ID: <20250626074529.1384114-24-clg@redhat.com> In-Reply-To: <20250626074529.1384114-1-clg@redhat.com> References: <20250626074529.1384114-1-clg@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Received-SPF: pass client-ip=170.10.129.124; envelope-from=clg@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: John Levon Add new message to send multiple writes to server in a single message. Prevents the outgoing queue from overflowing when a long latency operation is followed by a series of posted writes. Originally-by: John Johnson Signed-off-by: Elena Ufimtseva Signed-off-by: Jagannathan Raman Signed-off-by: John Levon Reviewed-by: Cédric Le Goater Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-18-john.levon@nutanix.com Signed-off-by: Cédric Le Goater --- hw/vfio-user/protocol.h | 21 ++++++++++ hw/vfio-user/proxy.h | 12 ++++++ hw/vfio-user/device.c | 40 +++++++++++++++++++ hw/vfio-user/proxy.c | 84 +++++++++++++++++++++++++++++++++++++++ hw/vfio-user/trace-events | 1 + 5 files changed, 158 insertions(+) diff --git a/hw/vfio-user/protocol.h b/hw/vfio-user/protocol.h index 3e9d8e576bae6a51d8fbc7cf26bb35d29e0e1aee..3249a4a6b62af95e3718ed5053bbf310cd08c024 100644 --- a/hw/vfio-user/protocol.h +++ b/hw/vfio-user/protocol.h @@ -39,6 +39,7 @@ enum vfio_user_command { VFIO_USER_DMA_WRITE = 12, VFIO_USER_DEVICE_RESET = 13, VFIO_USER_DIRTY_PAGES = 14, + VFIO_USER_REGION_WRITE_MULTI = 15, VFIO_USER_MAX, }; @@ -72,6 +73,7 @@ typedef struct { #define VFIO_USER_CAP_PGSIZES "pgsizes" #define VFIO_USER_CAP_MAP_MAX "max_dma_maps" #define VFIO_USER_CAP_MIGR "migration" +#define VFIO_USER_CAP_MULTI "write_multiple" /* "migration" members */ #define VFIO_USER_CAP_PGSIZE "pgsize" @@ -218,4 +220,23 @@ typedef struct { char data[]; } VFIOUserBitmap; +/* + * VFIO_USER_REGION_WRITE_MULTI + */ +#define VFIO_USER_MULTI_DATA 8 +#define VFIO_USER_MULTI_MAX 200 + +typedef struct { + uint64_t offset; + uint32_t region; + uint32_t count; + char data[VFIO_USER_MULTI_DATA]; +} VFIOUserWROne; + +typedef struct { + VFIOUserHdr hdr; + uint64_t wr_cnt; + VFIOUserWROne wrs[VFIO_USER_MULTI_MAX]; +} VFIOUserWRMulti; + #endif /* VFIO_USER_PROTOCOL_H */ diff --git a/hw/vfio-user/proxy.h b/hw/vfio-user/proxy.h index 0418f58bf156121d65d363fd647114ca0cfecec0..61e64a0020fcd3a12e6cf846682bb0c8e4352c07 100644 --- a/hw/vfio-user/proxy.h +++ b/hw/vfio-user/proxy.h @@ -85,6 +85,8 @@ typedef struct VFIOUserProxy { VFIOUserMsg *last_nowait; VFIOUserMsg *part_recv; size_t recv_left; + VFIOUserWRMulti *wr_multi; + int num_outgoing; enum proxy_state state; } VFIOUserProxy; @@ -92,6 +94,11 @@ typedef struct VFIOUserProxy { #define VFIO_PROXY_CLIENT 0x1 #define VFIO_PROXY_FORCE_QUEUED 0x4 #define VFIO_PROXY_NO_POST 0x8 +#define VFIO_PROXY_USE_MULTI 0x16 + +/* coalescing high and low water marks for VFIOProxy num_outgoing */ +#define VFIO_USER_OUT_HIGH 1024 +#define VFIO_USER_OUT_LOW 128 typedef struct VFIODevice VFIODevice; @@ -120,4 +127,9 @@ bool vfio_user_send_async(VFIOUserProxy *proxy, VFIOUserHdr *hdr, void vfio_user_send_reply(VFIOUserProxy *proxy, VFIOUserHdr *hdr, int size); void vfio_user_send_error(VFIOUserProxy *proxy, VFIOUserHdr *hdr, int error); +void vfio_user_flush_multi(VFIOUserProxy *proxy); +void vfio_user_create_multi(VFIOUserProxy *proxy); +void vfio_user_add_multi(VFIOUserProxy *proxy, uint8_t index, + off_t offset, uint32_t count, void *data); + #endif /* VFIO_USER_PROXY_H */ diff --git a/hw/vfio-user/device.c b/hw/vfio-user/device.c index aa07eac3303aef07d32592f9a56a58e85011e6ab..0609a7dc25428c1e7efed1e7a4e85de91aace2d4 100644 --- a/hw/vfio-user/device.c +++ b/hw/vfio-user/device.c @@ -9,6 +9,8 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "qemu/error-report.h" +#include "qemu/lockable.h" +#include "qemu/thread.h" #include "hw/vfio-user/device.h" #include "hw/vfio-user/trace.h" @@ -337,6 +339,7 @@ static int vfio_user_device_io_region_write(VFIODevice *vbasedev, uint8_t index, VFIOUserProxy *proxy = vbasedev->proxy; int size = sizeof(*msgp) + count; Error *local_err = NULL; + bool can_multi; int flags = 0; int ret; @@ -352,6 +355,43 @@ static int vfio_user_device_io_region_write(VFIODevice *vbasedev, uint8_t index, flags |= VFIO_USER_NO_REPLY; } + /* write eligible to be in a WRITE_MULTI msg ? */ + can_multi = (proxy->flags & VFIO_PROXY_USE_MULTI) && post && + count <= VFIO_USER_MULTI_DATA; + + /* + * This should be a rare case, so first check without the lock, + * if we're wrong, vfio_send_queued() will flush any posted writes + * we missed here + */ + if (proxy->wr_multi != NULL || + (proxy->num_outgoing > VFIO_USER_OUT_HIGH && can_multi)) { + + /* + * re-check with lock + * + * if already building a WRITE_MULTI msg, + * add this one if possible else flush pending before + * sending the current one + * + * else if outgoing queue is over the highwater, + * start a new WRITE_MULTI message + */ + WITH_QEMU_LOCK_GUARD(&proxy->lock) { + if (proxy->wr_multi != NULL) { + if (can_multi) { + vfio_user_add_multi(proxy, index, off, count, data); + return count; + } + vfio_user_flush_multi(proxy); + } else if (proxy->num_outgoing > VFIO_USER_OUT_HIGH && can_multi) { + vfio_user_create_multi(proxy); + vfio_user_add_multi(proxy, index, off, count, data); + return count; + } + } + } + msgp = g_malloc0(size); vfio_user_request_msg(&msgp->hdr, VFIO_USER_REGION_WRITE, size, flags); msgp->offset = off; diff --git a/hw/vfio-user/proxy.c b/hw/vfio-user/proxy.c index 7ce8573abbb4337028ab67cd04562ab822c672d2..c418954440839b92864518cd757c750f0576d0af 100644 --- a/hw/vfio-user/proxy.c +++ b/hw/vfio-user/proxy.c @@ -13,12 +13,14 @@ #include "hw/vfio-user/proxy.h" #include "hw/vfio-user/trace.h" #include "qapi/error.h" +#include "qobject/qbool.h" #include "qobject/qdict.h" #include "qobject/qjson.h" #include "qobject/qnum.h" #include "qemu/error-report.h" #include "qemu/lockable.h" #include "qemu/main-loop.h" +#include "qemu/thread.h" #include "system/iothread.h" static IOThread *vfio_user_iothread; @@ -445,6 +447,7 @@ static ssize_t vfio_user_send_one(VFIOUserProxy *proxy, Error **errp) } QTAILQ_REMOVE(&proxy->outgoing, msg, next); + proxy->num_outgoing--; if (msg->type == VFIO_MSG_ASYNC) { vfio_user_recycle(proxy, msg); } else { @@ -481,6 +484,11 @@ static void vfio_user_send(void *opaque) } qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx, vfio_user_recv, NULL, NULL, proxy); + + /* queue empty - send any pending multi write msgs */ + if (proxy->wr_multi != NULL) { + vfio_user_flush_multi(proxy); + } } } @@ -579,11 +587,18 @@ static bool vfio_user_send_queued(VFIOUserProxy *proxy, VFIOUserMsg *msg, { int ret; + /* older coalesced writes go first */ + if (proxy->wr_multi != NULL && + ((msg->hdr->flags & VFIO_USER_TYPE) == VFIO_USER_REQUEST)) { + vfio_user_flush_multi(proxy); + } + /* * Unsent outgoing msgs - add to tail */ if (!QTAILQ_EMPTY(&proxy->outgoing)) { QTAILQ_INSERT_TAIL(&proxy->outgoing, msg, next); + proxy->num_outgoing++; return true; } @@ -598,6 +613,7 @@ static bool vfio_user_send_queued(VFIOUserProxy *proxy, VFIOUserMsg *msg, if (ret == QIO_CHANNEL_ERR_BLOCK) { QTAILQ_INSERT_HEAD(&proxy->outgoing, msg, next); + proxy->num_outgoing = 1; qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx, vfio_user_recv, proxy->ctx, vfio_user_send, proxy); @@ -1151,12 +1167,27 @@ static bool check_migr(VFIOUserProxy *proxy, QObject *qobj, Error **errp) return caps_parse(proxy, qdict, caps_migr, errp); } +static bool check_multi(VFIOUserProxy *proxy, QObject *qobj, Error **errp) +{ + QBool *qb = qobject_to(QBool, qobj); + + if (qb == NULL) { + error_setg(errp, "malformed %s", VFIO_USER_CAP_MULTI); + return false; + } + if (qbool_get_bool(qb)) { + proxy->flags |= VFIO_PROXY_USE_MULTI; + } + return true; +} + static struct cap_entry caps_cap[] = { { VFIO_USER_CAP_MAX_FDS, check_max_fds }, { VFIO_USER_CAP_MAX_XFER, check_max_xfer }, { VFIO_USER_CAP_PGSIZES, check_pgsizes }, { VFIO_USER_CAP_MAP_MAX, check_max_dma }, { VFIO_USER_CAP_MIGR, check_migr }, + { VFIO_USER_CAP_MULTI, check_multi }, { NULL } }; @@ -1215,6 +1246,7 @@ static GString *caps_json(void) qdict_put_int(capdict, VFIO_USER_CAP_MAX_XFER, VFIO_USER_DEF_MAX_XFER); qdict_put_int(capdict, VFIO_USER_CAP_PGSIZES, VFIO_USER_DEF_PGSIZE); qdict_put_int(capdict, VFIO_USER_CAP_MAP_MAX, VFIO_USER_DEF_MAP_MAX); + qdict_put_bool(capdict, VFIO_USER_CAP_MULTI, true); qdict_put_obj(dict, VFIO_USER_CAP, QOBJECT(capdict)); @@ -1270,3 +1302,55 @@ bool vfio_user_validate_version(VFIOUserProxy *proxy, Error **errp) trace_vfio_user_version(msgp->major, msgp->minor, msgp->capabilities); return true; } + +void vfio_user_flush_multi(VFIOUserProxy *proxy) +{ + VFIOUserMsg *msg; + VFIOUserWRMulti *wm = proxy->wr_multi; + Error *local_err = NULL; + + proxy->wr_multi = NULL; + + /* adjust size for actual # of writes */ + wm->hdr.size -= (VFIO_USER_MULTI_MAX - wm->wr_cnt) * sizeof(VFIOUserWROne); + + msg = vfio_user_getmsg(proxy, &wm->hdr, NULL); + msg->id = wm->hdr.id; + msg->rsize = 0; + msg->type = VFIO_MSG_ASYNC; + trace_vfio_user_wrmulti("flush", wm->wr_cnt); + + if (!vfio_user_send_queued(proxy, msg, &local_err)) { + error_report_err(local_err); + vfio_user_recycle(proxy, msg); + } +} + +void vfio_user_create_multi(VFIOUserProxy *proxy) +{ + VFIOUserWRMulti *wm; + + wm = g_malloc0(sizeof(*wm)); + vfio_user_request_msg(&wm->hdr, VFIO_USER_REGION_WRITE_MULTI, + sizeof(*wm), VFIO_USER_NO_REPLY); + proxy->wr_multi = wm; +} + +void vfio_user_add_multi(VFIOUserProxy *proxy, uint8_t index, + off_t offset, uint32_t count, void *data) +{ + VFIOUserWRMulti *wm = proxy->wr_multi; + VFIOUserWROne *w1 = &wm->wrs[wm->wr_cnt]; + + w1->offset = offset; + w1->region = index; + w1->count = count; + memcpy(&w1->data, data, count); + + wm->wr_cnt++; + trace_vfio_user_wrmulti("add", wm->wr_cnt); + if (wm->wr_cnt == VFIO_USER_MULTI_MAX || + proxy->num_outgoing < VFIO_USER_OUT_LOW) { + vfio_user_flush_multi(proxy); + } +} diff --git a/hw/vfio-user/trace-events b/hw/vfio-user/trace-events index 44dde020b3198e215065c4adcb9d13832e5c4287..abb67f4c119af533a77ad04e383fa027e5aaa684 100644 --- a/hw/vfio-user/trace-events +++ b/hw/vfio-user/trace-events @@ -13,6 +13,7 @@ vfio_user_get_region_info(uint32_t index, uint32_t flags, uint64_t size) " index vfio_user_region_rw(uint32_t region, uint64_t off, uint32_t count) " region %d offset 0x%"PRIx64" count %d" vfio_user_get_irq_info(uint32_t index, uint32_t flags, uint32_t count) " index %d flags 0x%x count %d" vfio_user_set_irqs(uint32_t index, uint32_t start, uint32_t count, uint32_t flags) " index %d start %d count %d flags 0x%x" +vfio_user_wrmulti(const char *s, uint64_t wr_cnt) " %s count 0x%"PRIx64 # container.c vfio_user_dma_map(uint64_t iova, uint64_t size, uint64_t off, uint32_t flags, bool async_ops) " iova 0x%"PRIx64" size 0x%"PRIx64" off 0x%"PRIx64" flags 0x%x async_ops %d" -- 2.49.0