Date: Wed, 9 Sep 2015 21:19:06 +0800
From: Yuanhan Liu
Message-ID: <20150909131906.GN2925@yliu-dev.sh.intel.com>
References: <1441697927-16456-1-git-send-email-yuanhan.liu@linux.intel.com>
 <1441697927-16456-7-git-send-email-yuanhan.liu@linux.intel.com>
 <20150909151701-mutt-send-email-mst@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150909151701-mutt-send-email-mst@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 6/7] vhost-user: add multiple queue support
To: "Michael S. Tsirkin"
Cc: qemu-devel@nongnu.org, changchun.ouyang@intel.com

On Wed, Sep 09, 2015 at 03:18:53PM +0300, Michael S. Tsirkin wrote:
> On Tue, Sep 08, 2015 at 03:38:46PM +0800, Yuanhan Liu wrote:
> > From: Ouyang Changchun
> > 
> > This patch is initially based on a patch from Nikolay Nikolaev.
> > 
> > Here is the latest version for adding vhost-user multiple queue support,
> > by creating an nc and vhost_net pair for each queue.
> > 
> > What differs from the last version is that this patch addresses two major
> > concerns from Michael, and fixes one hidden bug.
> > 
> > - Concern #1: no feedback when the backend can't support the # of
> >   requested queues (given with the queues=# option).
> > 
> >   We address this issue by querying the VHOST_USER_PROTOCOL_F_MQ protocol
> >   feature first; if it is not set, the backend doesn't support the mq
> >   feature, and we let max_queues be 1. Otherwise, we send another message,
> >   VHOST_USER_GET_QUEUE_NUM, to get the max_queues the backend supports.
> > 
> >   At the vhost-user initialization stage (net_vhost_user_start), we then
> >   initialize one queue first, which, in the meantime, also fetches
> >   max_queues. We then do a simple compare: if requested_queues > max_queues,
> >   we exit (I guess it's safe to exit here, as the VM is not running yet).
> > 
> > - Concern #2: some messages are sent more times than necessary.
> > 
> >   We came to an agreement with Michael that we could categorize vhost-user
> >   messages into 2 types: non-vring specific messages, which should be sent
> >   only once, and vring specific messages, which should be sent per queue.
> > 
> >   Here I introduced a helper function, vhost_user_one_time_request(),
> >   which lists the following messages as non-vring specific:
> > 
> >       VHOST_USER_GET_FEATURES
> >       VHOST_USER_SET_FEATURES
> >       VHOST_USER_GET_PROTOCOL_FEATURES
> >       VHOST_USER_SET_PROTOCOL_FEATURES
> >       VHOST_USER_SET_OWNER
> >       VHOST_USER_RESET_DEVICE
> >       VHOST_USER_SET_MEM_TABLE
> >       VHOST_USER_GET_QUEUE_NUM
> > 
> >   These messages are sent only for the first queue; for later queues they
> >   are simply ignored.
> > 
> > I also observed a hidden bug in the last version. We register the char dev
> > event handler N times, which is unnecessary, and buggy as well: a later
> > registration overwrites the former one, since qemu_chr_add_handlers() does
> > not chain handlers but replaces the old one. So, in theory, invoking
> > qemu_chr_add_handlers() N times will not end up calling the handler N times.
> > 
> > However, the reason the handler is invoked N times is that we start the
> > backend (the connection server) first, hence when net_vhost_user_init() is
> > executed the connection is already established, and qemu_chr_add_handlers()
> > then invokes the handler immediately, which looks as if we invoked the
> > handler (net_vhost_user_event) directly from net_vhost_user_init().
> > 
> > The solution I came up with is to make VhostUserState an upper-level
> > structure that includes N nc and vhost_net pairs:
> > 
> >     struct VhostUserNetPeer {
> >         NetClientState *nc;
> >         VHostNetState *vhost_net;
> >     };
> > 
> >     typedef struct VhostUserState {
> >         CharDriverState *chr;
> >         bool running;
> >         int queues;
> >         struct VhostUserNetPeer peers[];
> >     } VhostUserState;
> > 
> > Signed-off-by: Nikolay Nikolaev
> > Signed-off-by: Changchun Ouyang
> > Signed-off-by: Yuanhan Liu
> > ---
> >  docs/specs/vhost-user.txt |  13 +++++
> >  hw/virtio/vhost-user.c    |  31 ++++++++++-
> >  include/net/net.h         |   1 +
> >  net/vhost-user.c          | 136 ++++++++++++++++++++++++++++++++--------------
> >  qapi-schema.json          |   6 +-
> >  qemu-options.hx           |   5 +-
> >  6 files changed, 146 insertions(+), 46 deletions(-)
> > 
> > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> > index 43db9b4..99d79be 100644
> > --- a/docs/specs/vhost-user.txt
> > +++ b/docs/specs/vhost-user.txt
> > @@ -135,6 +135,19 @@ As older slaves don't support negotiating protocol features,
> >  a feature bit was dedicated for this purpose:
> >  #define VHOST_USER_F_PROTOCOL_FEATURES 30
> >  
> > +Multiple queue support
> > +----------------------
> > +Multiple queue support is treated as a protocol extension, hence the slave
> > +has to implement protocol features first. Multiple queues are supported
> > +only when the protocol feature VHOST_USER_PROTOCOL_F_MQ (bit 0) is set.
> > +
> > +The max # of queues the slave supports can be queried with the message
> > +VHOST_USER_GET_QUEUE_NUM. The master should stop when the # of requested
> > +queues is bigger than that.
> > +
> > +As all queues share one connection, the master uses a unique index for each
> > +queue in the sent message to identify one specific queue.
> > +
> 
> Please also document that only 2 queues are enabled initially.
> More queues are enabled dynamically.
> To enable more queues, the new message needs to be sent
> (added by patch 7).

Hi Michael,

Will do that in the next version. Besides that, do you have more comments?
Say, does this patch address your two concerns?

	--yliu

> > 
> >  Message types
> >  -------------
> > 
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 8046bc0..11e46b5 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -187,6 +187,23 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
> >              0 : -1;
> >  }
> >  
> > +static bool vhost_user_one_time_request(VhostUserRequest request)
> > +{
> > +    switch (request) {
> > +    case VHOST_USER_GET_FEATURES:
> > +    case VHOST_USER_SET_FEATURES:
> > +    case VHOST_USER_GET_PROTOCOL_FEATURES:
> > +    case VHOST_USER_SET_PROTOCOL_FEATURES:
> > +    case VHOST_USER_SET_OWNER:
> > +    case VHOST_USER_RESET_DEVICE:
> > +    case VHOST_USER_SET_MEM_TABLE:
> > +    case VHOST_USER_GET_QUEUE_NUM:
> > +        return true;
> > +    default:
> > +        return false;
> > +    }
> > +}
> > +
> >  static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> >          void *arg)
> >  {
> > @@ -206,6 +223,14 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> >      else
> >          msg_request = request;
> >  
> > +    /*
> > +     * For non-vring specific requests, like VHOST_USER_GET_FEATURES,
> > +     * we only need to send it once, for the first queue; for later
> > +     * queues, such requests are simply ignored.
> > +     */
> > +    if (vhost_user_one_time_request(msg_request) && dev->vq_index != 0)
> > +        return 0;
> > +
> >      msg.request = msg_request;
> >      msg.flags = VHOST_USER_VERSION;
> >      msg.size = 0;
> > @@ -268,17 +293,20 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> >      case VHOST_USER_SET_VRING_NUM:
> >      case VHOST_USER_SET_VRING_BASE:
> >          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > +        msg.state.index += dev->vq_index;
> >          msg.size = sizeof(m.state);
> >          break;
> >  
> >      case VHOST_USER_GET_VRING_BASE:
> >          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > +        msg.state.index += dev->vq_index;
> >          msg.size = sizeof(m.state);
> >          need_reply = 1;
> >          break;
> >  
> >      case VHOST_USER_SET_VRING_ADDR:
> >          memcpy(&msg.addr, arg, sizeof(struct vhost_vring_addr));
> > +        msg.addr.index += dev->vq_index;
> >          msg.size = sizeof(m.addr);
> >          break;
> >  
> > @@ -286,7 +314,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> >      case VHOST_USER_SET_VRING_CALL:
> >      case VHOST_USER_SET_VRING_ERR:
> >          file = arg;
> > -        msg.u64 = file->index & VHOST_USER_VRING_IDX_MASK;
> > +        msg.u64 = (file->index + dev->vq_index) & VHOST_USER_VRING_IDX_MASK;
> >          msg.size = sizeof(m.u64);
> >          if (ioeventfd_enabled() && file->fd > 0) {
> >              fds[fd_num++] = file->fd;
> > @@ -330,6 +358,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> >              error_report("Received bad msg size.");
> >              return -1;
> >          }
> > +        msg.state.index -= dev->vq_index;
> >          memcpy(arg, &msg.state, sizeof(struct vhost_vring_state));
> >          break;
> >      default:
> > diff --git a/include/net/net.h b/include/net/net.h
> > index 6a6cbef..6f20656 100644
> > --- a/include/net/net.h
> > +++ b/include/net/net.h
> > @@ -92,6 +92,7 @@ struct NetClientState {
> >      NetClientDestructor *destructor;
> >      unsigned int queue_index;
> >      unsigned rxfilter_notify_enabled:1;
> > +    void *opaque;
> >  };
> >  
> >  typedef struct NICState {
> > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > index 2d6bbe5..7d4ac69 100644
> > --- a/net/vhost-user.c
> > +++ b/net/vhost-user.c
> > @@ -15,10 +15,16 @@
> >  #include "qemu/config-file.h"
> >  #include "qemu/error-report.h"
> >  
> > +struct VhostUserNetPeer {
> > +    NetClientState *nc;
> > +    VHostNetState *vhost_net;
> > +};
> > +
> >  typedef struct VhostUserState {
> > -    NetClientState nc;
> >      CharDriverState *chr;
> > -    VHostNetState *vhost_net;
> > +    bool running;
> > +    int queues;
> > +    struct VhostUserNetPeer peers[];
> >  } VhostUserState;
> >  
> >  typedef struct VhostUserChardevProps {
> > @@ -29,48 +35,75 @@ typedef struct VhostUserChardevProps {
> >  
> >  VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
> >  {
> > -    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
> > +    VhostUserState *s = nc->opaque;
> >      assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
> > -    return s->vhost_net;
> > -}
> > -
> > -static int vhost_user_running(VhostUserState *s)
> > -{
> > -    return (s->vhost_net) ? 1 : 0;
> > +    return s->peers[nc->queue_index].vhost_net;
> >  }
> >  
> >  static int vhost_user_start(VhostUserState *s)
> >  {
> >      VhostNetOptions options;
> > +    VHostNetState *vhost_net;
> > +    int max_queues;
> > +    int i = 0;
> >  
> > -    if (vhost_user_running(s)) {
> > +    if (s->running)
> >          return 0;
> > -    }
> >  
> >      options.backend_type = VHOST_BACKEND_TYPE_USER;
> > -    options.net_backend = &s->nc;
> >      options.opaque = s->chr;
> >  
> > -    s->vhost_net = vhost_net_init(&options);
> > +    options.net_backend = s->peers[i].nc;
> > +    vhost_net = s->peers[i++].vhost_net = vhost_net_init(&options);
> > +
> > +    max_queues = vhost_net_get_max_queues(vhost_net);
> > +    if (s->queues > max_queues) {
> > +        error_report("you are asking for more queues than supported: %d",
> > +                     max_queues);
> > +        return -1;
> > +    }
> > +
> > +    for (; i < s->queues; i++) {
> > +        options.net_backend = s->peers[i].nc;
> > +
> > +        s->peers[i].vhost_net = vhost_net_init(&options);
> > +        if (!s->peers[i].vhost_net)
> > +            return -1;
> > +    }
> > +    s->running = true;
> >  
> > -    return vhost_user_running(s) ? 0 : -1;
> > +    return 0;
> >  }
> >  
> >  static void vhost_user_stop(VhostUserState *s)
> >  {
> > -    if (vhost_user_running(s)) {
> > -        vhost_net_cleanup(s->vhost_net);
> > +    int i;
> > +    VHostNetState *vhost_net;
> > +
> > +    if (!s->running)
> > +        return;
> > +
> > +    for (i = 0; i < s->queues; i++) {
> > +        vhost_net = s->peers[i].vhost_net;
> > +        if (vhost_net)
> > +            vhost_net_cleanup(vhost_net);
> >      }
> >  
> > -    s->vhost_net = 0;
> > +    s->running = false;
> >  }
> >  
> >  static void vhost_user_cleanup(NetClientState *nc)
> >  {
> > -    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
> > +    VhostUserState *s = nc->opaque;
> > +    VHostNetState *vhost_net = s->peers[nc->queue_index].vhost_net;
> > +
> > +    if (vhost_net)
> > +        vhost_net_cleanup(vhost_net);
> >  
> > -    vhost_user_stop(s);
> >      qemu_purge_queued_packets(nc);
> > +
> > +    if (nc->queue_index == s->queues - 1)
> > +        g_free(s);
> >  }
> >  
> >  static bool vhost_user_has_vnet_hdr(NetClientState *nc)
> > @@ -89,7 +122,7 @@ static bool vhost_user_has_ufo(NetClientState *nc)
> >  
> >  static NetClientInfo net_vhost_user_info = {
> >          .type = NET_CLIENT_OPTIONS_KIND_VHOST_USER,
> > -        .size = sizeof(VhostUserState),
> > +        .size = sizeof(NetClientState),
> >          .cleanup = vhost_user_cleanup,
> >          .has_vnet_hdr = vhost_user_has_vnet_hdr,
> >          .has_ufo = vhost_user_has_ufo,
> > @@ -97,18 +130,25 @@ static NetClientInfo net_vhost_user_info = {
> >  
> >  static void net_vhost_link_down(VhostUserState *s, bool link_down)
> >  {
> > -    s->nc.link_down = link_down;
> > +    NetClientState *nc;
> > +    int i;
> >  
> > -    if (s->nc.peer) {
> > -        s->nc.peer->link_down = link_down;
> > -    }
> > +    for (i = 0; i < s->queues; i++) {
> > +        nc = s->peers[i].nc;
> >  
> > -    if (s->nc.info->link_status_changed) {
> > -        s->nc.info->link_status_changed(&s->nc);
> > -    }
> > +        nc->link_down = link_down;
> > +
> > +        if (nc->peer) {
> > +            nc->peer->link_down = link_down;
> > +        }
> >  
> > -    if (s->nc.peer && s->nc.peer->info->link_status_changed) {
> > -        s->nc.peer->info->link_status_changed(s->nc.peer);
> > +        if (nc->info->link_status_changed) {
> > +            nc->info->link_status_changed(nc);
> > +        }
> > +
> > +        if (nc->peer && nc->peer->info->link_status_changed) {
> > +            nc->peer->info->link_status_changed(nc->peer);
> > +        }
> >      }
> >  }
> >  
> > @@ -118,7 +158,8 @@ static void net_vhost_user_event(void *opaque, int event)
> >  
> >      switch (event) {
> >      case CHR_EVENT_OPENED:
> > -        vhost_user_start(s);
> > +        if (vhost_user_start(s) < 0)
> > +            exit(1);
> >          net_vhost_link_down(s, false);
> >          error_report("chardev \"%s\" went up", s->chr->label);
> >          break;
> > @@ -131,24 +172,28 @@ static void net_vhost_user_event(void *opaque, int event)
> >  }
> >  
> >  static int net_vhost_user_init(NetClientState *peer, const char *device,
> > -                               const char *name, CharDriverState *chr)
> > +                               const char *name, VhostUserState *s)
> >  {
> >      NetClientState *nc;
> > -    VhostUserState *s;
> > +    CharDriverState *chr = s->chr;
> > +    int i;
> >  
> > -    nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> > +    for (i = 0; i < s->queues; i++) {
> > +        nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> >  
> > -    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user to %s",
> > -             chr->label);
> > +        snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
> > +                 i, chr->label);
> >  
> > -    s = DO_UPCAST(VhostUserState, nc, nc);
> > +        /* We don't provide a receive callback */
> > +        nc->receive_disabled = 1;
> >  
> > -    /* We don't provide a receive callback */
> > -    s->nc.receive_disabled = 1;
> > -    s->chr = chr;
> > -    nc->queue_index = 0;
> > +        nc->queue_index = i;
> > +        nc->opaque = s;
> >  
> > -    qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event, s);
> > +        s->peers[i].nc = nc;
> > +    }
> > +
> > +    qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event, s);
> >  
> >      return 0;
> >  }
> > @@ -227,8 +272,10 @@ static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp)
> >  int net_init_vhost_user(const NetClientOptions *opts, const char *name,
> >                          NetClientState *peer, Error **errp)
> >  {
> > +    int queues;
> >      const NetdevVhostUserOptions *vhost_user_opts;
> >      CharDriverState *chr;
> > +    VhostUserState *s;
> >  
> >      assert(opts->kind == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
> >      vhost_user_opts = opts->vhost_user;
> > @@ -244,6 +291,11 @@ int net_init_vhost_user(const NetClientOptions *opts, const char *name,
> >          return -1;
> >      }
> >  
> > +    queues = vhost_user_opts->has_queues ? vhost_user_opts->queues : 1;
> > +    s = g_malloc0(sizeof(VhostUserState) +
> > +                  queues * sizeof(struct VhostUserNetPeer));
> > +    s->queues = queues;
> > +    s->chr = chr;
> >  
> > -    return net_vhost_user_init(peer, "vhost_user", name, chr);
> > +    return net_vhost_user_init(peer, "vhost_user", name, s);
> >  }
> > diff --git a/qapi-schema.json b/qapi-schema.json
> > index 67fef37..55c33db 100644
> > --- a/qapi-schema.json
> > +++ b/qapi-schema.json
> > @@ -2480,12 +2480,16 @@
> >  #
> >  # @vhostforce: #optional vhost on for non-MSIX virtio guests (default: false).
> >  #
> > +# @queues: #optional number of queues to be created for multiqueue vhost-user
> > +#          (default: 1) (Since 2.5)
> > +#
> >  # Since 2.1
> >  ##
> >  { 'struct': 'NetdevVhostUserOptions',
> >    'data': {
> >      'chardev': 'str',
> > -    '*vhostforce': 'bool' } }
> > +    '*vhostforce': 'bool',
> > +    '*queues': 'int' } }
> >  
> >  ##
> >  # @NetClientOptions
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 77f5853..5bfa7a3 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -1963,13 +1963,14 @@ The hubport netdev lets you connect a NIC to a QEMU "vlan" instead of a single
> >  netdev. @code{-net} and @code{-device} with parameter @option{vlan} create the
> >  required hub automatically.
> >  
> > -@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off]
> > +@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off][,queues=n]
> >  
> >  Establish a vhost-user netdev, backed by a chardev @var{id}. The chardev should
> >  be a unix domain socket backed one. The vhost-user uses a specifically defined
> >  protocol to pass vhost ioctl replacement messages to an application on the other
> >  end of the socket. On non-MSIX guests, the feature can be forced with
> > -@var{vhostforce}.
> > +@var{vhostforce}. Use 'queues=@var{n}' to specify the number of queues to
> > +be created for multiqueue vhost-user.
> >  
> >  Example:
> >  @example
> > -- 
> > 1.9.0
> > 
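
For illustration only, here is roughly how the new option could be invoked
once this series is applied. The socket path, object/chardev/netdev ids,
memory size and queue count below are made-up values; the mq=on and vectors=
properties on the virtio-net-pci device are what expose the extra queue
pairs to the guest (vectors is typically 2 * queues + 2):

    qemu-system-x86_64 -m 1024 \
        -object memory-backend-file,id=mem,size=1024M,mem-path=/dev/hugepages,share=on \
        -numa node,memdev=mem \
        -chardev socket,id=chr0,path=/tmp/vhost-user.sock \
        -netdev vhost-user,chardev=chr0,id=net0,queues=2 \
        -device virtio-net-pci,netdev=net0,mq=on,vectors=6

The backend on the other end of the socket has to advertise
VHOST_USER_PROTOCOL_F_MQ and report, via VHOST_USER_GET_QUEUE_NUM, at least
as many queues as requested; otherwise vhost_user_start() fails and QEMU
bails out when the vhost-user connection is established.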