From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36084) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b7bWW-0007aN-8Q for qemu-devel@nongnu.org; Tue, 31 May 2016 00:43:26 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1b7bWQ-0007lw-7b for qemu-devel@nongnu.org; Tue, 31 May 2016 00:43:23 -0400 Received: from mx1.redhat.com ([209.132.183.28]:47687) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b7bWP-0007lp-VF for qemu-devel@nongnu.org; Tue, 31 May 2016 00:43:18 -0400 References: <1460004984-3784-1-git-send-email-jasowang@redhat.com> <20160530210308-mutt-send-email-mst@redhat.com> <574CFF32.2090005@redhat.com> <20160531041942.GA10021@redhat.com> From: Jason Wang Message-ID: <574D165F.5000106@redhat.com> Date: Tue, 31 May 2016 12:43:11 +0800 MIME-Version: 1.0 In-Reply-To: <20160531041942.GA10021@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH V3] tap: vhost busy polling support List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Michael S. Tsirkin" Cc: qemu-devel@nongnu.org, eblake@redhat.com, gkurz@linux.vnet.ibm.com, famz@redhat.com On 2016=E5=B9=B405=E6=9C=8831=E6=97=A5 12:19, Michael S. Tsirkin wrote: > On Tue, May 31, 2016 at 11:04:18AM +0800, Jason Wang wrote: >> >> On 2016=E5=B9=B405=E6=9C=8831=E6=97=A5 02:07, Michael S. Tsirkin wrote= : >>> On Thu, Apr 07, 2016 at 12:56:24PM +0800, Jason Wang wrote: >>>> This patch add the capability of basic vhost net busy polling which = is >>>> supported by recent kernel. User could configure the maximum number = of >>>> us that could be spent on busy polling through a new property of tap >>>> "vhost-poll-us". >>> I applied this but now I had a thought - should we generalize this to >>> "poll-us"? Down the road tun could support busy olling just like >>> sockets do. >> Looks two different things. Socket busy polling depends on the value s= et by >> sysctl or SO_BUSY_POLL, which should be transparent to qemu. > This is what I am saying. qemu can set SO_BUSY_POLL if poll-us is spec= ified, > can it not? With CAP_NET_ADMIN, it can. Without it, it can only decrease the value. > Onthe one hand this suggests a more generic name > for the option. I see, but there're some differences: - socket busy polling only poll for rx, vhost busy polling poll for both=20 tx and rx. - vhost busy polling does not depends on socket busy polling, it can=20 work with socket busy polling disabled. > On the other how does user discover whether it's > implemented for tap without vhost? Do we require kernel support? I believe this could be done by management through ioctl probing. > Does an unblocking read with tap without vhost also speed > things up a bit? Kernel will try one more round of napi poll, so we can only get speed up=20 if we can poll some packet in this case. > >>>> Signed-off-by: Jason Wang >>>> --- >>>> hw/net/vhost_net.c | 2 +- >>>> hw/scsi/vhost-scsi.c | 2 +- >>>> hw/virtio/vhost-backend.c | 8 ++++++++ >>>> hw/virtio/vhost.c | 40 +++++++++++++++++++++++++++= +++++++++++- >>>> include/hw/virtio/vhost-backend.h | 3 +++ >>>> include/hw/virtio/vhost.h | 3 ++- >>>> include/net/vhost_net.h | 1 + >>>> net/tap.c | 10 ++++++++-- >>>> net/vhost-user.c | 1 + >>>> qapi-schema.json | 6 +++++- >>>> qemu-options.hx | 3 +++ >>>> 11 files changed, 72 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c >>>> index 6e1032f..1840c73 100644 >>>> --- a/hw/net/vhost_net.c >>>> +++ b/hw/net/vhost_net.c >>>> @@ -166,7 +166,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions= *options) >>>> } >>>> r =3D vhost_dev_init(&net->dev, options->opaque, >>>> - options->backend_type); >>>> + options->backend_type, options->busyloop_tim= eout); >>>> if (r < 0) { >>>> goto fail; >>>> } >>>> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c >>>> index 9261d51..2a00f2f 100644 >>>> --- a/hw/scsi/vhost-scsi.c >>>> +++ b/hw/scsi/vhost-scsi.c >>>> @@ -248,7 +248,7 @@ static void vhost_scsi_realize(DeviceState *dev,= Error **errp) >>>> s->dev.backend_features =3D 0; >>>> ret =3D vhost_dev_init(&s->dev, (void *)(uintptr_t)vhostfd, >>>> - VHOST_BACKEND_TYPE_KERNEL); >>>> + VHOST_BACKEND_TYPE_KERNEL, 0); >>>> if (ret < 0) { >>>> error_setg(errp, "vhost-scsi: vhost initialization failed:= %s", >>>> strerror(-ret)); >>>> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c >>>> index b358902..d62372e 100644 >>>> --- a/hw/virtio/vhost-backend.c >>>> +++ b/hw/virtio/vhost-backend.c >>>> @@ -138,6 +138,12 @@ static int vhost_kernel_set_vring_call(struct v= host_dev *dev, >>>> return vhost_kernel_call(dev, VHOST_SET_VRING_CALL, file); >>>> } >>>> +static int vhost_kernel_set_vring_busyloop_timeout(struct vhost_dev= *dev, >>>> + struct vhost_vri= ng_state *s) >>>> +{ >>>> + return vhost_kernel_call(dev, VHOST_SET_VRING_BUSYLOOP_TIMEOUT,= s); >>>> +} >>>> + >>>> static int vhost_kernel_set_features(struct vhost_dev *dev, >>>> uint64_t features) >>>> { >>>> @@ -185,6 +191,8 @@ static const VhostOps kernel_ops =3D { >>>> .vhost_get_vring_base =3D vhost_kernel_get_vring_base, >>>> .vhost_set_vring_kick =3D vhost_kernel_set_vring_kick, >>>> .vhost_set_vring_call =3D vhost_kernel_set_vring_call, >>>> + .vhost_set_vring_busyloop_timeout =3D >>>> + vhost_kernel_set_vring_busyloop_tim= eout, >>>> .vhost_set_features =3D vhost_kernel_set_features, >>>> .vhost_get_features =3D vhost_kernel_get_features, >>>> .vhost_set_owner =3D vhost_kernel_set_owner, >>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c >>>> index 4400718..ebf8b08 100644 >>>> --- a/hw/virtio/vhost.c >>>> +++ b/hw/virtio/vhost.c >>>> @@ -964,6 +964,28 @@ static void vhost_eventfd_del(MemoryListener *l= istener, >>>> { >>>> } >>>> +static int vhost_virtqueue_set_busyloop_timeout(struct vhost_dev *d= ev, >>>> + int n, uint32_t tim= eout) >>>> +{ >>>> + int vhost_vq_index =3D dev->vhost_ops->vhost_get_vq_index(dev, = n); >>>> + struct vhost_vring_state state =3D { >>>> + .index =3D vhost_vq_index, >>>> + .num =3D timeout, >>>> + }; >>>> + int r; >>>> + >>>> + if (!dev->vhost_ops->vhost_set_vring_busyloop_timeout) { >>>> + return -EINVAL; >>>> + } >>>> + >>>> + r =3D dev->vhost_ops->vhost_set_vring_busyloop_timeout(dev, &st= ate); >>>> + if (r) { >>>> + return r; >>>> + } >>>> + >>>> + return 0; >>>> +} >>>> + >>>> static int vhost_virtqueue_init(struct vhost_dev *dev, >>>> struct vhost_virtqueue *vq, int n) >>>> { >>>> @@ -994,7 +1016,7 @@ static void vhost_virtqueue_cleanup(struct vhos= t_virtqueue *vq) >>>> } >>>> int vhost_dev_init(struct vhost_dev *hdev, void *opaque, >>>> - VhostBackendType backend_type) >>>> + VhostBackendType backend_type, uint32_t busyloop= _timeout) >>>> { >>>> uint64_t features; >>>> int i, r; >>>> @@ -1035,6 +1057,17 @@ int vhost_dev_init(struct vhost_dev *hdev, vo= id *opaque, >>>> goto fail_vq; >>>> } >>>> } >>>> + >>>> + if (busyloop_timeout) { >>>> + for (i =3D 0; i < hdev->nvqs; ++i) { >>>> + r =3D vhost_virtqueue_set_busyloop_timeout(hdev, hdev->= vq_index + i, >>>> + busyloop_timeo= ut); >>>> + if (r < 0) { >>>> + goto fail_busyloop; >>>> + } >>>> + } >>>> + } >>>> + >>>> hdev->features =3D features; >>>> hdev->memory_listener =3D (MemoryListener) { >>>> @@ -1077,6 +1110,11 @@ int vhost_dev_init(struct vhost_dev *hdev, vo= id *opaque, >>>> hdev->memory_changed =3D false; >>>> memory_listener_register(&hdev->memory_listener, &address_spac= e_memory); >>>> return 0; >>>> +fail_busyloop: >>>> + while (--i >=3D 0) { >>>> + vhost_virtqueue_set_busyloop_timeout(hdev, hdev->vq_index += i, 0); >>>> + } >>>> + i =3D hdev->nvqs; >>>> fail_vq: >>>> while (--i >=3D 0) { >>>> vhost_virtqueue_cleanup(hdev->vqs + i); >>>> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/v= host-backend.h >>>> index 95fcc96..84e1cb7 100644 >>>> --- a/include/hw/virtio/vhost-backend.h >>>> +++ b/include/hw/virtio/vhost-backend.h >>>> @@ -57,6 +57,8 @@ typedef int (*vhost_set_vring_kick_op)(struct vhos= t_dev *dev, >>>> struct vhost_vring_file *fi= le); >>>> typedef int (*vhost_set_vring_call_op)(struct vhost_dev *dev, >>>> struct vhost_vring_file *fi= le); >>>> +typedef int (*vhost_set_vring_busyloop_timeout_op)(struct vhost_dev= *dev, >>>> + struct vhost_vri= ng_state *r); >>>> typedef int (*vhost_set_features_op)(struct vhost_dev *dev, >>>> uint64_t features); >>>> typedef int (*vhost_get_features_op)(struct vhost_dev *dev, >>>> @@ -91,6 +93,7 @@ typedef struct VhostOps { >>>> vhost_get_vring_base_op vhost_get_vring_base; >>>> vhost_set_vring_kick_op vhost_set_vring_kick; >>>> vhost_set_vring_call_op vhost_set_vring_call; >>>> + vhost_set_vring_busyloop_timeout_op vhost_set_vring_busyloop_ti= meout; >>>> vhost_set_features_op vhost_set_features; >>>> vhost_get_features_op vhost_get_features; >>>> vhost_set_owner_op vhost_set_owner; >>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h >>>> index b60d758..2106ed8 100644 >>>> --- a/include/hw/virtio/vhost.h >>>> +++ b/include/hw/virtio/vhost.h >>>> @@ -64,7 +64,8 @@ struct vhost_dev { >>>> }; >>>> int vhost_dev_init(struct vhost_dev *hdev, void *opaque, >>>> - VhostBackendType backend_type); >>>> + VhostBackendType backend_type, >>>> + uint32_t busyloop_timeout); >>>> void vhost_dev_cleanup(struct vhost_dev *hdev); >>>> int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev); >>>> void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev); >>>> diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h >>>> index 3389b41..8354b98 100644 >>>> --- a/include/net/vhost_net.h >>>> +++ b/include/net/vhost_net.h >>>> @@ -10,6 +10,7 @@ typedef struct vhost_net VHostNetState; >>>> typedef struct VhostNetOptions { >>>> VhostBackendType backend_type; >>>> NetClientState *net_backend; >>>> + uint32_t busyloop_timeout; >>>> void *opaque; >>>> } VhostNetOptions; >>>> diff --git a/net/tap.c b/net/tap.c >>>> index 740e8a2..7e28102 100644 >>>> --- a/net/tap.c >>>> +++ b/net/tap.c >>>> @@ -663,6 +663,11 @@ static void net_init_tap_one(const NetdevTapOpt= ions *tap, NetClientState *peer, >>>> options.backend_type =3D VHOST_BACKEND_TYPE_KERNEL; >>>> options.net_backend =3D &s->nc; >>>> + if (tap->has_vhost_poll_us) { >>>> + options.busyloop_timeout =3D tap->vhost_poll_us; >>>> + } else { >>>> + options.busyloop_timeout =3D 0; >>>> + } >>>> if (vhostfdname) { >>>> vhostfd =3D monitor_fd_param(cur_mon, vhostfdname, &er= r); >>>> @@ -686,8 +691,9 @@ static void net_init_tap_one(const NetdevTapOpti= ons *tap, NetClientState *peer, >>>> "vhost-net requested but could not be initi= alized"); >>>> return; >>>> } >>>> - } else if (vhostfdname) { >>>> - error_setg(errp, "vhostfd=3D is not valid without vhost"); >>>> + } else if (vhostfdname || tap->has_vhost_poll_us) { >>>> + error_setg(errp, "vhostfd(s)=3D or vhost_poll_us=3D is not = valid" >>>> + " without vhost"); >>>> } >>>> } >>>> diff --git a/net/vhost-user.c b/net/vhost-user.c >>>> index 1b9e73a..b200182 100644 >>>> --- a/net/vhost-user.c >>>> +++ b/net/vhost-user.c >>>> @@ -80,6 +80,7 @@ static int vhost_user_start(int queues, NetClientS= tate *ncs[]) >>>> options.net_backend =3D ncs[i]; >>>> options.opaque =3D s->chr; >>>> + options.busyloop_timeout =3D 0; >>>> s->vhost_net =3D vhost_net_init(&options); >>>> if (!s->vhost_net) { >>>> error_report("failed to init vhost_net for queue %d", = i); >>>> diff --git a/qapi-schema.json b/qapi-schema.json >>>> index 54634c4..8afdedf 100644 >>>> --- a/qapi-schema.json >>>> +++ b/qapi-schema.json >>>> @@ -2531,6 +2531,9 @@ >>>> # >>>> # @queues: #optional number of queues to be created for multiqueue= capable tap >>>> # >>>> +# @vhost-poll-us: #optional maximum number of microseconds that cou= ld >>>> +# be spent on busy polling for vhost net (since 2.7) >>>> +# >>>> # Since 1.2 >>>> ## >>>> { 'struct': 'NetdevTapOptions', >>>> @@ -2547,7 +2550,8 @@ >>>> '*vhostfd': 'str', >>>> '*vhostfds': 'str', >>>> '*vhostforce': 'bool', >>>> - '*queues': 'uint32'} } >>>> + '*queues': 'uint32', >>>> + '*vhost-poll-us': 'uint32'} } >>>> ## >>>> # @NetdevSocketOptions >>>> diff --git a/qemu-options.hx b/qemu-options.hx >>>> index 587de8f..22ddb82 100644 >>>> --- a/qemu-options.hx >>>> +++ b/qemu-options.hx >>>> @@ -1569,6 +1569,7 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev, >>>> "-netdev tap,id=3Dstr[,fd=3Dh][,fds=3Dx:y:...:z][,ifname=3Dnam= e][,script=3Dfile][,downscript=3Ddfile]\n" >>>> " [,helper=3Dhelper][,sndbuf=3Dnbytes][,vnet_hdr=3Don|= off][,vhost=3Don|off]\n" >>>> " [,vhostfd=3Dh][,vhostfds=3Dx:y:...:z][,vhostforce=3D= on|off][,queues=3Dn]\n" >>>> + " [,vhost-poll-us=3Dn]\n" >>>> " configure a host TAP network backend with ID = 'str'\n" >>>> " use network scripts 'file' (default=3D" DEFAU= LT_NETWORK_SCRIPT ")\n" >>>> " to configure it and 'dfile' (default=3D" DEFA= ULT_NETWORK_DOWN_SCRIPT ")\n" >>>> @@ -1588,6 +1589,8 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev, >>>> " use 'vhostfd=3Dh' to connect to an already op= ened vhost net device\n" >>>> " use 'vhostfds=3Dx:y:...:z to connect to multi= ple already opened vhost net devices\n" >>>> " use 'queues=3Dn' to specify the number of que= ues to be created for multiqueue TAP\n" >>>> + " use 'vhost-poll-us=3Dn' to speciy the maximum = number of microseconds that could be\n" > typo above > >>>> + " spent on busy polling for vhost net\n" >>>> "-netdev bridge,id=3Dstr[,br=3Dbridge][,helper=3Dhelper]\n" >>>> " configure a host TAP network backend with ID = 'str' that is\n" >>>> " connected to a bridge (default=3D" DEFAULT_BR= IDGE_INTERFACE ")\n" >>>> --=20 >>>> 2.5.0