Date: Wed, 9 Sep 2015 21:19:06 +0800
From: Yuanhan Liu
Message-ID: <20150909131906.GN2925@yliu-dev.sh.intel.com>
References: <1441697927-16456-1-git-send-email-yuanhan.liu@linux.intel.com>
 <1441697927-16456-7-git-send-email-yuanhan.liu@linux.intel.com>
 <20150909151701-mutt-send-email-mst@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150909151701-mutt-send-email-mst@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 6/7] vhost-user: add multiple queue support
To: "Michael S. Tsirkin"
Cc: qemu-devel@nongnu.org, changchun.ouyang@intel.com

On Wed, Sep 09, 2015 at 03:18:53PM +0300, Michael S. Tsirkin wrote:
> On Tue, Sep 08, 2015 at 03:38:46PM +0800, Yuanhan Liu wrote:
> > From: Ouyang Changchun
> > 
> > This patch is initially based on a patch from Nikolay Nikolaev.
> > 
> > Here is the latest version for adding vhost-user multiple queue support,
> > by creating an nc and vhost_net pair for each queue.
> > 
> > What differs from the last version is that this patch addresses two major
> > concerns from Michael, and fixes one hidden bug.
> > 
> > - Concern #1: no feedback when the backend can't support the # of
> >   requested queues (given with the queues=# option).
> > 
> >   We address this issue by querying the VHOST_USER_PROTOCOL_F_MQ protocol
> >   feature first; if it is not set, the backend doesn't support the mq
> >   feature, and we let max_queues be 1. Otherwise, we send another message,
> >   VHOST_USER_GET_QUEUE_NUM, to get the max_queues the backend supports.
> > 
> >   At the vhost-user initialization stage (net_vhost_user_start), we then
> >   initialize one queue first, which, in the meantime, also fetches
> >   max_queues. We then do a simple compare: if requested_queues > max_queues,
> >   we exit (I guess it's safe to exit here, as the VM is not running yet).
> > 
> > - Concern #2: some messages are sent more times than necessary.
> > 
> >   We came to an agreement with Michael that we could categorize vhost-user
> >   messages into 2 types: non-vring specific messages, which should be sent
> >   only once, and vring specific messages, which should be sent per queue.
> > 
> >   Here I introduced a helper function, vhost_user_one_time_request(),
> >   which lists the following messages as non-vring specific:
> > 
> >       VHOST_USER_GET_FEATURES
> >       VHOST_USER_SET_FEATURES
> >       VHOST_USER_GET_PROTOCOL_FEATURES
> >       VHOST_USER_SET_PROTOCOL_FEATURES
> >       VHOST_USER_SET_OWNER
> >       VHOST_USER_RESET_DEVICE
> >       VHOST_USER_SET_MEM_TABLE
> >       VHOST_USER_GET_QUEUE_NUM
> > 
> >   These messages are sent only for the first queue; for later queues they
> >   are simply ignored.
> > 
> > I also observed a hidden bug in the last version. We register the char dev
> > event handler N times, which is unnecessary, and buggy as well: a later
> > registration overwrites the former one, since qemu_chr_add_handlers() does
> > not chain handlers but replaces the old one. So, in theory, invoking
> > qemu_chr_add_handlers() N times will not end up calling the handler N times.
> > 
> > However, the reason the handler is invoked N times is that we start the
> > backend (the connection server) first, hence when net_vhost_user_init() is
> > executed the connection is already established, and qemu_chr_add_handlers()
> > then invokes the handler immediately, which looks as if we invoked the
> > handler (net_vhost_user_event) directly from net_vhost_user_init().
> > 
> > The solution I came up with is to make VhostUserState an upper-level
> > structure that includes N nc and vhost_net pairs:
> > 
> >     struct VhostUserNetPeer {
> >         NetClientState *nc;
> >         VHostNetState *vhost_net;
> >     };
> > 
> >     typedef struct VhostUserState {
> >         CharDriverState *chr;
> >         bool running;
> >         int queues;
> >         struct VhostUserNetPeer peers[];
> >     } VhostUserState;
> > 
> > Signed-off-by: Nikolay Nikolaev
> > Signed-off-by: Changchun Ouyang
> > Signed-off-by: Yuanhan Liu
> > ---
> >  docs/specs/vhost-user.txt |  13 +++++
> >  hw/virtio/vhost-user.c    |  31 ++++++++++-
> >  include/net/net.h         |   1 +
> >  net/vhost-user.c          | 136 ++++++++++++++++++++++++++++++++--------------
> >  qapi-schema.json          |   6 +-
> >  qemu-options.hx           |   5 +-
> >  6 files changed, 146 insertions(+), 46 deletions(-)
> > 
> > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> > index 43db9b4..99d79be 100644
> > --- a/docs/specs/vhost-user.txt
> > +++ b/docs/specs/vhost-user.txt
> > @@ -135,6 +135,19 @@ As older slaves don't support negotiating protocol features,
> >  a feature bit was dedicated for this purpose:
> >  #define VHOST_USER_F_PROTOCOL_FEATURES 30
> >  
> > +Multiple queue support
> > +----------------------
> > +Multiple queue support is treated as a protocol extension, hence the slave
> > +has to implement protocol features first. Multiple queues are supported
> > +only when the protocol feature VHOST_USER_PROTOCOL_F_MQ (bit 0) is set.
> > +
> > +The max # of queues the slave supports can be queried with the message
> > +VHOST_USER_GET_QUEUE_NUM. The master should stop when the # of requested
> > +queues is bigger than that.
> > +
> > +As all queues share one connection, the master uses a unique index for each
> > +queue in the sent message to identify one specific queue.
> > +
> 
> Please also document that only 2 queues are enabled initially.
> More queues are enabled dynamically.
> To enable more queues, the new message needs to be sent
> (added by patch 7).

Hi Michael,

Will do that in the next version. Besides that, do you have more comments?
Say, does this patch address your two concerns?

	--yliu

> > 
> >  Message types
> >  -------------
> > 
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 8046bc0..11e46b5 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -187,6 +187,23 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
> >              0 : -1;
> >  }
> >  
> > +static bool vhost_user_one_time_request(VhostUserRequest request)
> > +{
> > +    switch (request) {
> > +    case VHOST_USER_GET_FEATURES:
> > +    case VHOST_USER_SET_FEATURES:
> > +    case VHOST_USER_GET_PROTOCOL_FEATURES:
> > +    case VHOST_USER_SET_PROTOCOL_FEATURES:
> > +    case VHOST_USER_SET_OWNER:
> > +    case VHOST_USER_RESET_DEVICE:
> > +    case VHOST_USER_SET_MEM_TABLE:
> > +    case VHOST_USER_GET_QUEUE_NUM:
> > +        return true;
> > +    default:
> > +        return false;
> > +    }
> > +}
> > +
> >  static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> >          void *arg)
> >  {
> > @@ -206,6 +223,14 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> >      else
> >          msg_request = request;
> >  
> > +    /*
> > +     * For non-vring specific requests, like VHOST_USER_GET_FEATURES,
> > +     * we only need to send it once, for the first queue; for later
> > +     * queues, such requests are simply ignored.
> > +     */
> > +    if (vhost_user_one_time_request(msg_request) && dev->vq_index != 0)
> > +        return 0;
> > +
> >      msg.request = msg_request;
> >      msg.flags = VHOST_USER_VERSION;
> >      msg.size = 0;
> > @@ -268,17 +293,20 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> >      case VHOST_USER_SET_VRING_NUM:
> >      case VHOST_USER_SET_VRING_BASE:
> >          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > +        msg.state.index += dev->vq_index;
> >          msg.size = sizeof(m.state);
> >          break;
> >  
> >      case VHOST_USER_GET_VRING_BASE:
> >          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > +        msg.state.index += dev->vq_index;
> >          msg.size = sizeof(m.state);
> >          need_reply = 1;
> >          break;
> >  
> >      case VHOST_USER_SET_VRING_ADDR:
> >          memcpy(&msg.addr, arg, sizeof(struct vhost_vring_addr));
> > +        msg.addr.index += dev->vq_index;
> >          msg.size = sizeof(m.addr);
> >          break;
> >  
> > @@ -286,7 +314,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> >      case VHOST_USER_SET_VRING_CALL:
> >      case VHOST_USER_SET_VRING_ERR:
> >          file = arg;
> > -        msg.u64 = file->index & VHOST_USER_VRING_IDX_MASK;
> > +        msg.u64 = (file->index + dev->vq_index) & VHOST_USER_VRING_IDX_MASK;
> >          msg.size = sizeof(m.u64);
> >          if (ioeventfd_enabled() && file->fd > 0) {
> >              fds[fd_num++] = file->fd;
> > @@ -330,6 +358,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> >              error_report("Received bad msg size.");
> >              return -1;
> >          }
> > +        msg.state.index -= dev->vq_index;
> >          memcpy(arg, &msg.state, sizeof(struct vhost_vring_state));
> >          break;
> >      default:
> > diff --git a/include/net/net.h b/include/net/net.h
> > index 6a6cbef..6f20656 100644
> > --- a/include/net/net.h
> > +++ b/include/net/net.h
> > @@ -92,6 +92,7 @@ struct NetClientState {
> >      NetClientDestructor *destructor;
> >      unsigned int queue_index;
> >      unsigned rxfilter_notify_enabled:1;
> > +    void *opaque;
> >  };
> >  
> >  typedef struct NICState {
> > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > index 2d6bbe5..7d4ac69 100644
> > --- a/net/vhost-user.c
> > +++ b/net/vhost-user.c
> > @@ -15,10 +15,16 @@
> >  #include "qemu/config-file.h"
> >  #include "qemu/error-report.h"
> >  
> > +struct VhostUserNetPeer {
> > +    NetClientState *nc;
> > +    VHostNetState *vhost_net;
> > +};
> > +
> >  typedef struct VhostUserState {
> > -    NetClientState nc;
> >      CharDriverState *chr;
> > -    VHostNetState *vhost_net;
> > +    bool running;
> > +    int queues;
> > +    struct VhostUserNetPeer peers[];
> >  } VhostUserState;
> >  
> >  typedef struct VhostUserChardevProps {
> > @@ -29,48 +35,75 @@ typedef struct VhostUserChardevProps {
> >  
> >  VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
> >  {
> > -    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
> > +    VhostUserState *s = nc->opaque;
> >      assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
> > -    return s->vhost_net;
> > -}
> > -
> > -static int vhost_user_running(VhostUserState *s)
> > -{
> > -    return (s->vhost_net) ? 1 : 0;
> > +    return s->peers[nc->queue_index].vhost_net;
> >  }
> >  
> >  static int vhost_user_start(VhostUserState *s)
> >  {
> >      VhostNetOptions options;
> > +    VHostNetState *vhost_net;
> > +    int max_queues;
> > +    int i = 0;
> >  
> > -    if (vhost_user_running(s)) {
> > +    if (s->running)
> >          return 0;
> > -    }
> >  
> >      options.backend_type = VHOST_BACKEND_TYPE_USER;
> > -    options.net_backend = &s->nc;
> >      options.opaque = s->chr;
> >  
> > -    s->vhost_net = vhost_net_init(&options);
> > +    options.net_backend = s->peers[i].nc;
> > +    vhost_net = s->peers[i++].vhost_net = vhost_net_init(&options);
> > +
> > +    max_queues = vhost_net_get_max_queues(vhost_net);
> > +    if (s->queues > max_queues) {
> > +        error_report("you are asking for more queues than supported: %d",
> > +                     max_queues);
> > +        return -1;
> > +    }
> > +
> > +    for (; i < s->queues; i++) {
> > +        options.net_backend = s->peers[i].nc;
> > +
> > +        s->peers[i].vhost_net = vhost_net_init(&options);
> > +        if (!s->peers[i].vhost_net)
> > +            return -1;
> > +    }
> > +    s->running = true;
> >  
> > -    return vhost_user_running(s) ? 0 : -1;
> > +    return 0;
> >  }
> >  
> >  static void vhost_user_stop(VhostUserState *s)
> >  {
> > -    if (vhost_user_running(s)) {
> > -        vhost_net_cleanup(s->vhost_net);
> > +    int i;
> > +    VHostNetState *vhost_net;
> > +
> > +    if (!s->running)
> > +        return;
> > +
> > +    for (i = 0; i < s->queues; i++) {
> > +        vhost_net = s->peers[i].vhost_net;
> > +        if (vhost_net)
> > +            vhost_net_cleanup(vhost_net);
> >      }
> >  
> > -    s->vhost_net = 0;
> > +    s->running = false;
> >  }
> >  
> >  static void vhost_user_cleanup(NetClientState *nc)
> >  {
> > -    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
> > +    VhostUserState *s = nc->opaque;
> > +    VHostNetState *vhost_net = s->peers[nc->queue_index].vhost_net;
> > +
> > +    if (vhost_net)
> > +        vhost_net_cleanup(vhost_net);
> >  
> > -    vhost_user_stop(s);
> >      qemu_purge_queued_packets(nc);
> > +
> > +    if (nc->queue_index == s->queues - 1)
> > +        g_free(s);
> >  }
> >  
> >  static bool vhost_user_has_vnet_hdr(NetClientState *nc)
> > @@ -89,7 +122,7 @@ static bool vhost_user_has_ufo(NetClientState *nc)
> >  
> >  static NetClientInfo net_vhost_user_info = {
> >          .type = NET_CLIENT_OPTIONS_KIND_VHOST_USER,
> > -        .size = sizeof(VhostUserState),
> > +        .size = sizeof(NetClientState),
> >          .cleanup = vhost_user_cleanup,
> >          .has_vnet_hdr = vhost_user_has_vnet_hdr,
> >          .has_ufo = vhost_user_has_ufo,
> > @@ -97,18 +130,25 @@ static NetClientInfo net_vhost_user_info = {
> >  
> >  static void net_vhost_link_down(VhostUserState *s, bool link_down)
> >  {
> > -    s->nc.link_down = link_down;
> > +    NetClientState *nc;
> > +    int i;
> >  
> > -    if (s->nc.peer) {
> > -        s->nc.peer->link_down = link_down;
> > -    }
> > +    for (i = 0; i < s->queues; i++) {
> > +        nc = s->peers[i].nc;
> >  
> > -    if (s->nc.info->link_status_changed) {
> > -        s->nc.info->link_status_changed(&s->nc);
> > -    }
> > +        nc->link_down = link_down;
> > +
> > +        if (nc->peer) {
> > +            nc->peer->link_down = link_down;
> > +        }
> >  
> > -    if (s->nc.peer && s->nc.peer->info->link_status_changed) {
> > -        s->nc.peer->info->link_status_changed(s->nc.peer);
> > +        if (nc->info->link_status_changed) {
> > +            nc->info->link_status_changed(nc);
> > +        }
> > +
> > +        if (nc->peer && nc->peer->info->link_status_changed) {
> > +            nc->peer->info->link_status_changed(nc->peer);
> > +        }
> >      }
> >  }
> >  
> > @@ -118,7 +158,8 @@ static void net_vhost_user_event(void *opaque, int event)
> >  
> >      switch (event) {
> >      case CHR_EVENT_OPENED:
> > -        vhost_user_start(s);
> > +        if (vhost_user_start(s) < 0)
> > +            exit(1);
> >          net_vhost_link_down(s, false);
> >          error_report("chardev \"%s\" went up", s->chr->label);
> >          break;
> > @@ -131,24 +172,28 @@ static void net_vhost_user_event(void *opaque, int event)
> >  }
> >  
> >  static int net_vhost_user_init(NetClientState *peer, const char *device,
> > -                               const char *name, CharDriverState *chr)
> > +                               const char *name, VhostUserState *s)
> >  {
> >      NetClientState *nc;
> > -    VhostUserState *s;
> > +    CharDriverState *chr = s->chr;
> > +    int i;
> >  
> > -    nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> > +    for (i = 0; i < s->queues; i++) {
> > +        nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> >  
> > -    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user to %s",
> > -             chr->label);
> > +        snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
> > +                 i, chr->label);
> >  
> > -    s = DO_UPCAST(VhostUserState, nc, nc);
> > +        /* We don't provide a receive callback */
> > +        nc->receive_disabled = 1;
> >  
> > -    /* We don't provide a receive callback */
> > -    s->nc.receive_disabled = 1;
> > -    s->chr = chr;
> > -    nc->queue_index = 0;
> > +        nc->queue_index = i;
> > +        nc->opaque = s;
> >  
> > -    qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event, s);
> > +        s->peers[i].nc = nc;
> > +    }
> > +
> > +    qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event, s);
> >  
> >      return 0;
> >  }
> > @@ -227,8 +272,10 @@ static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp)
> >  int net_init_vhost_user(const NetClientOptions *opts, const char *name,
> >                          NetClientState *peer, Error **errp)
> >  {
> > +    int queues;
> >      const NetdevVhostUserOptions *vhost_user_opts;
> >      CharDriverState *chr;
> > +    VhostUserState *s;
> >  
> >      assert(opts->kind == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
> >      vhost_user_opts = opts->vhost_user;
> > @@ -244,6 +291,11 @@ int net_init_vhost_user(const NetClientOptions *opts, const char *name,
> >          return -1;
> >      }
> >  
> > +    queues = vhost_user_opts->has_queues ? vhost_user_opts->queues : 1;
> > +    s = g_malloc0(sizeof(VhostUserState) +
> > +                  queues * sizeof(struct VhostUserNetPeer));
> > +    s->queues = queues;
> > +    s->chr = chr;
> >  
> > -    return net_vhost_user_init(peer, "vhost_user", name, chr);
> > +    return net_vhost_user_init(peer, "vhost_user", name, s);
> >  }
> > diff --git a/qapi-schema.json b/qapi-schema.json
> > index 67fef37..55c33db 100644
> > --- a/qapi-schema.json
> > +++ b/qapi-schema.json
> > @@ -2480,12 +2480,16 @@
> >  #
> >  # @vhostforce: #optional vhost on for non-MSIX virtio guests (default: false).
> >  #
> > +# @queues: #optional number of queues to be created for multiqueue vhost-user
> > +#          (default: 1) (Since 2.5)
> > +#
> >  # Since 2.1
> >  ##
> >  { 'struct': 'NetdevVhostUserOptions',
> >    'data': {
> >      'chardev': 'str',
> > -    '*vhostforce': 'bool' } }
> > +    '*vhostforce': 'bool',
> > +    '*queues': 'int' } }
> >  
> >  ##
> >  # @NetClientOptions
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 77f5853..5bfa7a3 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -1963,13 +1963,14 @@ The hubport netdev lets you connect a NIC to a QEMU "vlan" instead of a single
> >  netdev. @code{-net} and @code{-device} with parameter @option{vlan} create the
> >  required hub automatically.
> >  
> > -@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off]
> > +@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off][,queues=n]
> >  
> >  Establish a vhost-user netdev, backed by a chardev @var{id}. The chardev should
> >  be a unix domain socket backed one. The vhost-user uses a specifically defined
> >  protocol to pass vhost ioctl replacement messages to an application on the other
> >  end of the socket. On non-MSIX guests, the feature can be forced with
> > -@var{vhostforce}.
> > +@var{vhostforce}. Use 'queues=@var{n}' to specify the number of queues to
> > +be created for multiqueue vhost-user.
> >  
> >  Example:
> >  @example
> > -- 
> > 1.9.0
> > 
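
For illustration only, here is roughly how the new option could be invoked
once this series is applied. The socket path, object/chardev/netdev ids,
memory size and queue count below are made-up values; the mq=on and vectors=
properties on the virtio-net-pci device are what expose the extra queue
pairs to the guest (vectors is typically 2 * queues + 2):

    qemu-system-x86_64 -m 1024 \
        -object memory-backend-file,id=mem,size=1024M,mem-path=/dev/hugepages,share=on \
        -numa node,memdev=mem \
        -chardev socket,id=chr0,path=/tmp/vhost-user.sock \
        -netdev vhost-user,chardev=chr0,id=net0,queues=2 \
        -device virtio-net-pci,netdev=net0,mq=on,vectors=6

The backend on the other end of the socket has to advertise
VHOST_USER_PROTOCOL_F_MQ and report, via VHOST_USER_GET_QUEUE_NUM, at least
as many queues as requested; otherwise vhost_user_start() fails and QEMU
bails out when the vhost-user connection is established.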