Linux virtualization list

Linux virtualization list
 help / color / mirror / Atom feed

* Re: [PATCH net-next v5 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Jason Wang @ 2018-07-04  9:18 UTC (permalink / raw)
  To: Toshiaki Makita, xiangxia.m.yue
  Cc: netdev, virtualization, Tonghao Zhang, mst
In-Reply-To: <808dea9b-6240-8055-acaa-a1b96389a673@lab.ntt.co.jp>



On 2018年07月04日 15:59, Toshiaki Makita wrote:
> On 2018/07/04 13:31, xiangxia.m.yue@gmail.com wrote:
> ...
>> +static void vhost_net_busy_poll(struct vhost_net *net,
>> +				struct vhost_virtqueue *rvq,
>> +				struct vhost_virtqueue *tvq,
>> +				bool rx)
>> +{
>> +	unsigned long uninitialized_var(endtime);
>> +	unsigned long busyloop_timeout;
>> +	struct socket *sock;
>> +	struct vhost_virtqueue *vq = rx ? tvq : rvq;
>> +
>> +	mutex_lock_nested(&vq->mutex, rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
>> +
>> +	vhost_disable_notify(&net->dev, vq);
>> +	sock = rvq->private_data;
>> +	busyloop_timeout = rx ? rvq->busyloop_timeout : tvq->busyloop_timeout;
>> +
>> +	preempt_disable();
>> +	endtime = busy_clock() + busyloop_timeout;
>> +	while (vhost_can_busy_poll(tvq->dev, endtime) &&
>> +	       !(sock && sk_has_rx_data(sock->sk)) &&
>> +	       vhost_vq_avail_empty(tvq->dev, tvq))
>> +		cpu_relax();
>> +	preempt_enable();
>> +
>> +	if ((rx && !vhost_vq_avail_empty(&net->dev, vq)) ||
>> +	    (!rx && (sock && sk_has_rx_data(sock->sk)))) {
>> +		vhost_poll_queue(&vq->poll);
>> +	} else if (vhost_enable_notify(&net->dev, vq) && rx) {
> Hmm... on tx here sock has no rx data, so you are waiting for sock
> wakeup for rx and vhost_enable_notify() seems not needed. Do you want
> this actually?
>
> } else if (rx && vhost_enable_notify(&net->dev, vq)) {

Right, rx need to be checked first here.

Thanks

>> +		vhost_disable_notify(&net->dev, vq);
>> +		vhost_poll_queue(&vq->poll);
>> +	}

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net-next 8/8] vhost: event suppression for packed ring
From: Wei Xu @ 2018-07-04  8:13 UTC (permalink / raw)
  To: Jason Wang
  Cc: kvm, mst, netdev, linux-kernel, virtualization, maxime.coquelin
In-Reply-To: <493c3a3e-e088-d6fb-1da4-cda8bfb34400@redhat.com>

On Wed, Jul 04, 2018 at 01:23:18PM +0800, Jason Wang wrote:
> 
> 
> On 2018年07月04日 12:13, Wei Xu wrote:
> >On Tue, Jul 03, 2018 at 01:38:04PM +0800, Jason Wang wrote:
> >>This patch introduces support for event suppression. This is done by
> >>have a two areas: device area and driver area. One side could then try
> >>to disable or enable (delayed) notification from other side by using a
> >>boolean hint or event index interface in the areas.
> >>
> >>For more information, please refer Virtio spec.
> >>
> >>Signed-off-by: Jason Wang<jasowang@redhat.com>
> >>---
> >>  drivers/vhost/vhost.c | 191 ++++++++++++++++++++++++++++++++++++++++++++++----
> >>  drivers/vhost/vhost.h |  10 ++-
> >>  2 files changed, 185 insertions(+), 16 deletions(-)
> >>
> >>diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>index 0f3f07c..cccbc82 100644
> >>--- a/drivers/vhost/vhost.c
> >>+++ b/drivers/vhost/vhost.c
> >>@@ -1115,10 +1115,15 @@ static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
> >>  			       struct vring_used __user *used)
> >>  {
> >>  	struct vring_desc_packed *packed = (struct vring_desc_packed *)desc;
> >>+	struct vring_packed_desc_event *driver_event =
> >>+		(struct vring_packed_desc_event *)avail;
> >>+	struct vring_packed_desc_event *device_event =
> >>+		(struct vring_packed_desc_event *)used;
> >>-	/* TODO: check device area and driver area */
> >>  	return access_ok(VERIFY_READ, packed, num * sizeof(*packed)) &&
> >>-	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed));
> >>+	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed)) &&
> >R/W parameter doesn't make sense to most architectures and the comment in x86
> >says WRITE is a superset of READ, is it possible to converge them here?
> >
> >/**
> >  * access_ok: - Checks if a user space pointer is valid
> >  * @type: Type of access: %VERIFY_READ or %VERIFY_WRITE.  Note that
> >  *        %VERIFY_WRITE is a superset of %VERIFY_READ - if it is safe
> >  *        to write to a block, it is always safe to read from it.
> >  * @addr: User space pointer to start of block to check
> >  * @size: Size of block to check
> >  *
> >  * Context: User context only. This function may sleep if pagefaults are
> >  *          enabled.
> >  *
> >  * Checks if a pointer to a block of memory in user space is valid.
> >  *
> >  * Returns true (nonzero) if the memory block may be valid, false (zero)
> >  * if it is definitely invalid.
> >  *
> >  * Note that, depending on architecture, this function probably just
> >  * checks that the pointer is in the user space range - after calling
> >  * this function, memory access functions may still return -EFAULT.
> >  */
> >#define access_ok(type, addr, size)
> >......
> >
> >Thanks,
> >Wei
> >
> 
> Well, this is a question that beyond the scope of this patch.
> 
> My understanding is we should keep it unless type was meaningless on all
> archs.

No problem, go ahead.

Wei

> 
> Thanks
> 
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net-next v5 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Toshiaki Makita @ 2018-07-04  7:59 UTC (permalink / raw)
  To: xiangxia.m.yue, jasowang; +Cc: netdev, virtualization, Tonghao Zhang, mst
In-Reply-To: <1530678698-33427-4-git-send-email-xiangxia.m.yue@gmail.com>

On 2018/07/04 13:31, xiangxia.m.yue@gmail.com wrote:
...
> +static void vhost_net_busy_poll(struct vhost_net *net,
> +				struct vhost_virtqueue *rvq,
> +				struct vhost_virtqueue *tvq,
> +				bool rx)
> +{
> +	unsigned long uninitialized_var(endtime);
> +	unsigned long busyloop_timeout;
> +	struct socket *sock;
> +	struct vhost_virtqueue *vq = rx ? tvq : rvq;
> +
> +	mutex_lock_nested(&vq->mutex, rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
> +
> +	vhost_disable_notify(&net->dev, vq);
> +	sock = rvq->private_data;
> +	busyloop_timeout = rx ? rvq->busyloop_timeout : tvq->busyloop_timeout;
> +
> +	preempt_disable();
> +	endtime = busy_clock() + busyloop_timeout;
> +	while (vhost_can_busy_poll(tvq->dev, endtime) &&
> +	       !(sock && sk_has_rx_data(sock->sk)) &&
> +	       vhost_vq_avail_empty(tvq->dev, tvq))
> +		cpu_relax();
> +	preempt_enable();
> +
> +	if ((rx && !vhost_vq_avail_empty(&net->dev, vq)) ||
> +	    (!rx && (sock && sk_has_rx_data(sock->sk)))) {
> +		vhost_poll_queue(&vq->poll);
> +	} else if (vhost_enable_notify(&net->dev, vq) && rx) {

Hmm... on tx here sock has no rx data, so you are waiting for sock
wakeup for rx and vhost_enable_notify() seems not needed. Do you want
this actually?

} else if (rx && vhost_enable_notify(&net->dev, vq)) {

> +		vhost_disable_notify(&net->dev, vq);
> +		vhost_poll_queue(&vq->poll);
> +	}

-- 
Toshiaki Makita

^ permalink raw reply

* Re: [PATCH net-next v5 0/4] net: vhost: improve performance when enable busyloop
From: Jason Wang @ 2018-07-04  6:27 UTC (permalink / raw)
  To: xiangxia.m.yue; +Cc: netdev, virtualization, mst
In-Reply-To: <1530678698-33427-1-git-send-email-xiangxia.m.yue@gmail.com>



On 2018年07月04日 12:31, xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> This patches improve the guest receive and transmit performance.
> On the handle_tx side, we poll the sock receive queue at the same time.
> handle_rx do that in the same way.
>
> For more performance report, see patch 4.
>
> v4 -> v5:
> fix some issues
>
> v3 -> v4:
> fix some issues
>
> v2 -> v3:
> This patches are splited from previous big patch:
> http://patchwork.ozlabs.org/patch/934673/
>
> Tonghao Zhang (4):
>    vhost: lock the vqs one by one
>    net: vhost: replace magic number of lock annotation
>    net: vhost: factor out busy polling logic to vhost_net_busy_poll()
>    net: vhost: add rx busy polling in tx path
>
>   drivers/vhost/net.c   | 108 ++++++++++++++++++++++++++++----------------------
>   drivers/vhost/vhost.c |  24 ++++-------
>   2 files changed, 67 insertions(+), 65 deletions(-)
>

Acked-by: Jason Wang <jasowang@redhat.com>

Note: you probably need a rebase on top of Makita's recent patches. It 
depends on the order of being merged, let's see.

Thanks

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v2 net-next 0/4] vhost_net: Avoid vq kicks during busyloop
From: Michael S. Tsirkin @ 2018-07-04  5:35 UTC (permalink / raw)
  To: Toshiaki Makita; +Cc: kvm, netdev, virtualization, David S. Miller
In-Reply-To: <1530603094-2476-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>

On Tue, Jul 03, 2018 at 04:31:30PM +0900, Toshiaki Makita wrote:
> Under heavy load vhost tx busypoll tend not to suppress vq kicks, which
> causes poor guest tx performance. The detailed scenario is described in
> commitlog of patch 2.
> Rx seems not to have that serious problem, but for consistency I made a
> similar change on rx to avoid rx wakeups (patch 3).
> Additionary patch 4 is to avoid rx kicks under heavy load during
> busypoll.
> 
> Tx performance is greatly improved by this change. I don't see notable
> performance change on rx with this series though.
> 
> Performance numbers (tx):
> 
> - Bulk transfer from guest to external physical server.
>     [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e --(wire)--> [Server]
> - Set 10us busypoll.
> - Guest disables checksum and TSO because of host XDP.
> - Measured single flow Mbps by netperf, and kicks by perf kvm stat
>   (EPT_MISCONFIG event).
> 
>                             Before              After
>                           Mbps  kicks/s      Mbps  kicks/s
> UDP_STREAM 1472byte              247758                 27
>                 Send   3645.37            6958.10
>                 Recv   3588.56            6958.10
>               1byte                9865                 37
>                 Send      4.34               5.43
>                 Recv      4.17               5.26
> TCP_STREAM             8801.03    45794   9592.77     2884

Thanks!
Series:

Acked-by: Michael S. Tsirkin <mst@redhat.com>


> v2:
> - Split patches into 3 parts (renaming variables, tx-kick fix, rx-wakeup
>   fix).
> - Avoid rx-kicks too (patch 4).
> - Don't memorize endtime as it is not needed for now.
> 
> Toshiaki Makita (4):
>   vhost_net: Rename local variables in vhost_net_rx_peek_head_len
>   vhost_net: Avoid tx vring kicks during busyloop
>   vhost_net: Avoid rx queue wake-ups during busypoll
>   vhost_net: Avoid rx vring kicks during busyloop
> 
>  drivers/vhost/net.c | 95 +++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 60 insertions(+), 35 deletions(-)
> 
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* Re: [PATCH net-next 6/8] virtio: introduce packed ring defines
From: Jason Wang @ 2018-07-04  5:26 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: kvm, mst, netdev, linux-kernel, virtualization, maxime.coquelin,
	wexu
In-Reply-To: <20180704044823.GA22808@debian>



On 2018年07月04日 12:48, Tiwei Bie wrote:
> On Tue, Jul 03, 2018 at 01:38:02PM +0800, Jason Wang wrote:
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   include/uapi/linux/virtio_config.h |  2 ++
>>   include/uapi/linux/virtio_ring.h   | 32 ++++++++++++++++++++++++++++++++
>>   2 files changed, 34 insertions(+)
>>
>> diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
>> index 449132c..947f6a3 100644
>> --- a/include/uapi/linux/virtio_config.h
>> +++ b/include/uapi/linux/virtio_config.h
>> @@ -75,6 +75,8 @@
>>    */
>>   #define VIRTIO_F_IOMMU_PLATFORM		33
>>   
>> +#define VIRTIO_F_RING_PACKED		34
> It's better to add some comments for this macro.

Ok.

>
>> +
>>   /*
>>    * Does the device support Single Root I/O Virtualization?
>>    */
>> diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
>> index 6d5d5fa..71c7a46 100644
>> --- a/include/uapi/linux/virtio_ring.h
>> +++ b/include/uapi/linux/virtio_ring.h
>> @@ -43,6 +43,8 @@
>>   #define VRING_DESC_F_WRITE	2
>>   /* This means the buffer contains a list of buffer descriptors. */
>>   #define VRING_DESC_F_INDIRECT	4
>> +#define VRING_DESC_F_AVAIL      7
> It's better to use tab between VRING_DESC_F_AVAIL and 7.

Let me fix.

>
>> +#define VRING_DESC_F_USED	15
> Maybe it's better to define VRING_DESC_F_AVAIL and
> VRING_DESC_F_USED as (1 << 7) and (1 << 15) or something
> similar to make them consistent with VRING_DESC_F_NEXT
> (1 << 0), VRING_DESC_F_WRITE (1 << 1) and
> VRING_DESC_F_INDIRECT (1 << 2).

Ok.

>
> I plan to define below macros in virtio_ring.c:
>
> #define _VRING_DESC_F_AVAIL(b)  ((__u16)(b) << 7)
> #define _VRING_DESC_F_USED(b)   ((__u16)(b) << 15)
>
>>   
>>   /* The Host uses this in used->flags to advise the Guest: don't kick me when
>>    * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
>> @@ -62,6 +64,36 @@
>>    * at the end of the used ring. Guest should ignore the used->flags field. */
>>   #define VIRTIO_RING_F_EVENT_IDX		29
>>   
>> +struct vring_desc_packed {
> We may have other related types named as:
>
> struct vring_packed;
> struct vring_packed_desc_event;
>
> So maybe it's better to name this type as:
>
> struct vring_packed_desc;

Yes.

>
>> +	/* Buffer Address. */
>> +	__virtio64 addr;
>> +	/* Buffer Length. */
>> +	__virtio32 len;
>> +	/* Buffer ID. */
>> +	__virtio16 id;
>> +	/* The flags depending on descriptor type. */
>> +	__virtio16 flags;
>> +};
>> +
>> +/* Enable events */
>> +#define RING_EVENT_FLAGS_ENABLE 0x0
>> +/* Disable events */
>> +#define RING_EVENT_FLAGS_DISABLE 0x1
>> +/*
>> + * Enable events for a specific descriptor
>> + * (as specified by Descriptor Ring Change Event Offset/Wrap Counter).
>> + * Only valid if VIRTIO_F_RING_EVENT_IDX has been negotiated.
>> + */
>> +#define RING_EVENT_FLAGS_DESC 0x2
> For above three macros, maybe it's better to name
> them as:
>
> VRING_EVENT_FLAGS_ENABLE
> VRING_EVENT_FLAGS_DISABLE
> VRING_EVENT_FLAGS_DESC
>
> or
>
> VRING_EVENT_F_ENABLE
> VRING_EVENT_F_DISABLE
> VRING_EVENT_F_DESC
>
> VRING_EVENT_F_* will be more consistent with
> VIRTIO_F_*, VRING_DESC_F_*, etc

True. Let me fix above in next version.

Thanks

>> +/* The value 0x3 is reserved */
>> +
>> +struct vring_packed_desc_event {
>> +	/* Descriptor Ring Change Event Offset and Wrap Counter */
>> +	__virtio16 off_wrap;
>> +	/* Descriptor Ring Change Event Flags */
>> +	__virtio16 flags;
>> +};
>> +
>>   /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
>>   struct vring_desc {
>>   	/* Address (guest-physical). */
>> -- 
>> 2.7.4
>>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net-next 8/8] vhost: event suppression for packed ring
From: Jason Wang @ 2018-07-04  5:23 UTC (permalink / raw)
  To: Wei Xu; +Cc: kvm, mst, netdev, linux-kernel, virtualization, maxime.coquelin
In-Reply-To: <20180704041352.GA9287@wei-ubt>



On 2018年07月04日 12:13, Wei Xu wrote:
> On Tue, Jul 03, 2018 at 01:38:04PM +0800, Jason Wang wrote:
>> This patch introduces support for event suppression. This is done by
>> have a two areas: device area and driver area. One side could then try
>> to disable or enable (delayed) notification from other side by using a
>> boolean hint or event index interface in the areas.
>>
>> For more information, please refer Virtio spec.
>>
>> Signed-off-by: Jason Wang<jasowang@redhat.com>
>> ---
>>   drivers/vhost/vhost.c | 191 ++++++++++++++++++++++++++++++++++++++++++++++----
>>   drivers/vhost/vhost.h |  10 ++-
>>   2 files changed, 185 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 0f3f07c..cccbc82 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -1115,10 +1115,15 @@ static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
>>   			       struct vring_used __user *used)
>>   {
>>   	struct vring_desc_packed *packed = (struct vring_desc_packed *)desc;
>> +	struct vring_packed_desc_event *driver_event =
>> +		(struct vring_packed_desc_event *)avail;
>> +	struct vring_packed_desc_event *device_event =
>> +		(struct vring_packed_desc_event *)used;
>>   
>> -	/* TODO: check device area and driver area */
>>   	return access_ok(VERIFY_READ, packed, num * sizeof(*packed)) &&
>> -	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed));
>> +	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed)) &&
> R/W parameter doesn't make sense to most architectures and the comment in x86
> says WRITE is a superset of READ, is it possible to converge them here?
>
> /**
>   * access_ok: - Checks if a user space pointer is valid
>   * @type: Type of access: %VERIFY_READ or %VERIFY_WRITE.  Note that
>   *        %VERIFY_WRITE is a superset of %VERIFY_READ - if it is safe
>   *        to write to a block, it is always safe to read from it.
>   * @addr: User space pointer to start of block to check
>   * @size: Size of block to check
>   *
>   * Context: User context only. This function may sleep if pagefaults are
>   *          enabled.
>   *
>   * Checks if a pointer to a block of memory in user space is valid.
>   *
>   * Returns true (nonzero) if the memory block may be valid, false (zero)
>   * if it is definitely invalid.
>   *
>   * Note that, depending on architecture, this function probably just
>   * checks that the pointer is in the user space range - after calling
>   * this function, memory access functions may still return -EFAULT.
>   */
> #define access_ok(type, addr, size)
> ......
>
> Thanks,
> Wei
>

Well, this is a question that beyond the scope of this patch.

My understanding is we should keep it unless type was meaningless on all 
archs.

Thanks

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net-next 6/8] virtio: introduce packed ring defines
From: Tiwei Bie @ 2018-07-04  4:48 UTC (permalink / raw)
  To: Jason Wang
  Cc: kvm, mst, netdev, linux-kernel, virtualization, maxime.coquelin,
	wexu
In-Reply-To: <1530596284-4101-7-git-send-email-jasowang@redhat.com>

On Tue, Jul 03, 2018 at 01:38:02PM +0800, Jason Wang wrote:
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  include/uapi/linux/virtio_config.h |  2 ++
>  include/uapi/linux/virtio_ring.h   | 32 ++++++++++++++++++++++++++++++++
>  2 files changed, 34 insertions(+)
> 
> diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> index 449132c..947f6a3 100644
> --- a/include/uapi/linux/virtio_config.h
> +++ b/include/uapi/linux/virtio_config.h
> @@ -75,6 +75,8 @@
>   */
>  #define VIRTIO_F_IOMMU_PLATFORM		33
>  
> +#define VIRTIO_F_RING_PACKED		34

It's better to add some comments for this macro.

> +
>  /*
>   * Does the device support Single Root I/O Virtualization?
>   */
> diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
> index 6d5d5fa..71c7a46 100644
> --- a/include/uapi/linux/virtio_ring.h
> +++ b/include/uapi/linux/virtio_ring.h
> @@ -43,6 +43,8 @@
>  #define VRING_DESC_F_WRITE	2
>  /* This means the buffer contains a list of buffer descriptors. */
>  #define VRING_DESC_F_INDIRECT	4
> +#define VRING_DESC_F_AVAIL      7

It's better to use tab between VRING_DESC_F_AVAIL and 7.

> +#define VRING_DESC_F_USED	15

Maybe it's better to define VRING_DESC_F_AVAIL and
VRING_DESC_F_USED as (1 << 7) and (1 << 15) or something
similar to make them consistent with VRING_DESC_F_NEXT
(1 << 0), VRING_DESC_F_WRITE (1 << 1) and
VRING_DESC_F_INDIRECT (1 << 2).

I plan to define below macros in virtio_ring.c:

#define _VRING_DESC_F_AVAIL(b)  ((__u16)(b) << 7)
#define _VRING_DESC_F_USED(b)   ((__u16)(b) << 15)

>  
>  /* The Host uses this in used->flags to advise the Guest: don't kick me when
>   * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
> @@ -62,6 +64,36 @@
>   * at the end of the used ring. Guest should ignore the used->flags field. */
>  #define VIRTIO_RING_F_EVENT_IDX		29
>  
> +struct vring_desc_packed {

We may have other related types named as:

struct vring_packed;
struct vring_packed_desc_event;

So maybe it's better to name this type as:

struct vring_packed_desc;

> +	/* Buffer Address. */
> +	__virtio64 addr;
> +	/* Buffer Length. */
> +	__virtio32 len;
> +	/* Buffer ID. */
> +	__virtio16 id;
> +	/* The flags depending on descriptor type. */
> +	__virtio16 flags;
> +};
> +
> +/* Enable events */
> +#define RING_EVENT_FLAGS_ENABLE 0x0
> +/* Disable events */
> +#define RING_EVENT_FLAGS_DISABLE 0x1
> +/*
> + * Enable events for a specific descriptor
> + * (as specified by Descriptor Ring Change Event Offset/Wrap Counter).
> + * Only valid if VIRTIO_F_RING_EVENT_IDX has been negotiated.
> + */
> +#define RING_EVENT_FLAGS_DESC 0x2

For above three macros, maybe it's better to name
them as:

VRING_EVENT_FLAGS_ENABLE
VRING_EVENT_FLAGS_DISABLE
VRING_EVENT_FLAGS_DESC

or

VRING_EVENT_F_ENABLE
VRING_EVENT_F_DISABLE
VRING_EVENT_F_DESC

VRING_EVENT_F_* will be more consistent with
VIRTIO_F_*, VRING_DESC_F_*, etc

> +/* The value 0x3 is reserved */
> +
> +struct vring_packed_desc_event {
> +	/* Descriptor Ring Change Event Offset and Wrap Counter */
> +	__virtio16 off_wrap;
> +	/* Descriptor Ring Change Event Flags */
> +	__virtio16 flags;
> +};
> +
>  /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
>  struct vring_desc {
>  	/* Address (guest-physical). */
> -- 
> 2.7.4
> 

^ permalink raw reply

* [PATCH net-next v5 4/4] net: vhost: add rx busy polling in tx path
From: xiangxia.m.yue @ 2018-07-04  4:31 UTC (permalink / raw)
  To: jasowang; +Cc: mst, netdev, virtualization, Tonghao Zhang
In-Reply-To: <1530678698-33427-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patch improves the guest receive and transmit performance.
On the handle_tx side, we poll the sock receive queue at the
same time. handle_rx do that in the same way.

We set the poll-us=100us and use the iperf3 to test
its bandwidth, use the netperf to test throughput and mean
latency. When running the tests, the vhost-net kthread of
that VM, is alway 100% CPU. The commands are shown as below.

iperf3  -s -D
iperf3  -c IP -i 1 -P 1 -t 20 -M 1400

or
netserver
netperf -H IP -t TCP_RR -l 20 -- -O "THROUGHPUT,MEAN_LATENCY"

host -> guest:
iperf3:
* With the patch:     27.0 Gbits/sec
* Without the patch:  14.4 Gbits/sec

netperf (TCP_RR):
* With the patch:     48039.56 trans/s, 20.64us mean latency
* Without the patch:  46027.07 trans/s, 21.58us mean latency

This patch also improves the guest transmit performance.

guest -> host:
iperf3:
* With the patch:     27.2 Gbits/sec
* Without the patch:  24.4 Gbits/sec

netperf (TCP_RR):
* With the patch:     47963.25 trans/s, 20.71us mean latency
* Without the patch:  45796.70 trans/s, 21.68us mean latency

Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
---
 drivers/vhost/net.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 2790959..3f26547 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -480,17 +480,13 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 				    struct iovec iov[], unsigned int iov_size,
 				    unsigned int *out_num, unsigned int *in_num)
 {
-	unsigned long uninitialized_var(endtime);
+	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
 	int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
 				  out_num, in_num, NULL, NULL);
 
 	if (r == vq->num && vq->busyloop_timeout) {
-		preempt_disable();
-		endtime = busy_clock() + vq->busyloop_timeout;
-		while (vhost_can_busy_poll(vq->dev, endtime) &&
-		       vhost_vq_avail_empty(vq->dev, vq))
-			cpu_relax();
-		preempt_enable();
+		vhost_net_busy_poll(net, &rnvq->vq, vq, false);
+
 		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
 				      out_num, in_num, NULL, NULL);
 	}
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next v5 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: xiangxia.m.yue @ 2018-07-04  4:31 UTC (permalink / raw)
  To: jasowang; +Cc: mst, netdev, virtualization, Tonghao Zhang
In-Reply-To: <1530678698-33427-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Factor out generic busy polling logic and will be
used for in tx path in the next patch. And with the patch,
qemu can set differently the busyloop_timeout for rx queue.

Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
---
 drivers/vhost/net.c | 94 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 55 insertions(+), 39 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 62bb8e8..2790959 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -429,6 +429,52 @@ static int vhost_net_enable_vq(struct vhost_net *n,
 	return vhost_poll_start(poll, sock->file);
 }
 
+static int sk_has_rx_data(struct sock *sk)
+{
+	struct socket *sock = sk->sk_socket;
+
+	if (sock->ops->peek_len)
+		return sock->ops->peek_len(sock);
+
+	return skb_queue_empty(&sk->sk_receive_queue);
+}
+
+static void vhost_net_busy_poll(struct vhost_net *net,
+				struct vhost_virtqueue *rvq,
+				struct vhost_virtqueue *tvq,
+				bool rx)
+{
+	unsigned long uninitialized_var(endtime);
+	unsigned long busyloop_timeout;
+	struct socket *sock;
+	struct vhost_virtqueue *vq = rx ? tvq : rvq;
+
+	mutex_lock_nested(&vq->mutex, rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
+
+	vhost_disable_notify(&net->dev, vq);
+	sock = rvq->private_data;
+	busyloop_timeout = rx ? rvq->busyloop_timeout : tvq->busyloop_timeout;
+
+	preempt_disable();
+	endtime = busy_clock() + busyloop_timeout;
+	while (vhost_can_busy_poll(tvq->dev, endtime) &&
+	       !(sock && sk_has_rx_data(sock->sk)) &&
+	       vhost_vq_avail_empty(tvq->dev, tvq))
+		cpu_relax();
+	preempt_enable();
+
+	if ((rx && !vhost_vq_avail_empty(&net->dev, vq)) ||
+	    (!rx && (sock && sk_has_rx_data(sock->sk)))) {
+		vhost_poll_queue(&vq->poll);
+	} else if (vhost_enable_notify(&net->dev, vq) && rx) {
+		vhost_disable_notify(&net->dev, vq);
+		vhost_poll_queue(&vq->poll);
+	}
+
+	mutex_unlock(&vq->mutex);
+}
+
+
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 				    struct vhost_virtqueue *vq,
 				    struct iovec iov[], unsigned int iov_size,
@@ -621,16 +667,6 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
 	return len;
 }
 
-static int sk_has_rx_data(struct sock *sk)
-{
-	struct socket *sock = sk->sk_socket;
-
-	if (sock->ops->peek_len)
-		return sock->ops->peek_len(sock);
-
-	return skb_queue_empty(&sk->sk_receive_queue);
-}
-
 static void vhost_rx_signal_used(struct vhost_net_virtqueue *nvq)
 {
 	struct vhost_virtqueue *vq = &nvq->vq;
@@ -645,39 +681,19 @@ static void vhost_rx_signal_used(struct vhost_net_virtqueue *nvq)
 
 static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 {
-	struct vhost_net_virtqueue *rvq = &net->vqs[VHOST_NET_VQ_RX];
-	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
-	struct vhost_virtqueue *vq = &nvq->vq;
-	unsigned long uninitialized_var(endtime);
-	int len = peek_head_len(rvq, sk);
+	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
+	struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
 
-	if (!len && vq->busyloop_timeout) {
-		/* Flush batched heads first */
-		vhost_rx_signal_used(rvq);
-		/* Both tx vq and rx socket were polled here */
-		mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
-		vhost_disable_notify(&net->dev, vq);
+	int len = peek_head_len(rnvq, sk);
 
-		preempt_disable();
-		endtime = busy_clock() + vq->busyloop_timeout;
-
-		while (vhost_can_busy_poll(&net->dev, endtime) &&
-		       !sk_has_rx_data(sk) &&
-		       vhost_vq_avail_empty(&net->dev, vq))
-			cpu_relax();
-
-		preempt_enable();
-
-		if (!vhost_vq_avail_empty(&net->dev, vq))
-			vhost_poll_queue(&vq->poll);
-		else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
-			vhost_disable_notify(&net->dev, vq);
-			vhost_poll_queue(&vq->poll);
-		}
+	if (!len && rnvq->vq.busyloop_timeout) {
+		/* Flush batched heads first */
+		vhost_rx_signal_used(rnvq);
 
-		mutex_unlock(&vq->mutex);
+		/* Both tx vq and rx socket were polled here */
+		vhost_net_busy_poll(net, &rnvq->vq, &tnvq->vq, true);
 
-		len = peek_head_len(rvq, sk);
+		len = peek_head_len(rnvq, sk);
 	}
 
 	return len;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next v5 2/4] net: vhost: replace magic number of lock annotation
From: xiangxia.m.yue @ 2018-07-04  4:31 UTC (permalink / raw)
  To: jasowang; +Cc: mst, netdev, virtualization, Tonghao Zhang
In-Reply-To: <1530678698-33427-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Use the VHOST_NET_VQ_XXX as a subclass for mutex_lock_nested.

Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index e7cf7d2..62bb8e8 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -484,7 +484,7 @@ static void handle_tx(struct vhost_net *net)
 	bool zcopy, zcopy_used;
 	int sent_pkts = 0;
 
-	mutex_lock(&vq->mutex);
+	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
 	sock = vq->private_data;
 	if (!sock)
 		goto out;
@@ -655,7 +655,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 		/* Flush batched heads first */
 		vhost_rx_signal_used(rvq);
 		/* Both tx vq and rx socket were polled here */
-		mutex_lock_nested(&vq->mutex, 1);
+		mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
 		vhost_disable_notify(&net->dev, vq);
 
 		preempt_disable();
@@ -789,7 +789,7 @@ static void handle_rx(struct vhost_net *net)
 	__virtio16 num_buffers;
 	int recv_pkts = 0;
 
-	mutex_lock_nested(&vq->mutex, 0);
+	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_RX);
 	sock = vq->private_data;
 	if (!sock)
 		goto out;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next v5 1/4] net: vhost: lock the vqs one by one
From: xiangxia.m.yue @ 2018-07-04  4:31 UTC (permalink / raw)
  To: jasowang; +Cc: mst, netdev, virtualization, Tonghao Zhang
In-Reply-To: <1530678698-33427-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patch changes the way that lock all vqs
at the same, to lock them one by one. It will
be used for next patch to avoid the deadlock.

Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 895eaa2..4ca9383 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -294,8 +294,11 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
 {
 	int i;
 
-	for (i = 0; i < d->nvqs; ++i)
+	for (i = 0; i < d->nvqs; ++i) {
+		mutex_lock(&d->vqs[i]->mutex);
 		__vhost_vq_meta_reset(d->vqs[i]);
+		mutex_unlock(&d->vqs[i]->mutex);
+	}
 }
 
 static void vhost_vq_reset(struct vhost_dev *dev,
@@ -887,20 +890,6 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 #define vhost_get_used(vq, x, ptr) \
 	vhost_get_user(vq, x, ptr, VHOST_ADDR_USED)
 
-static void vhost_dev_lock_vqs(struct vhost_dev *d)
-{
-	int i = 0;
-	for (i = 0; i < d->nvqs; ++i)
-		mutex_lock_nested(&d->vqs[i]->mutex, i);
-}
-
-static void vhost_dev_unlock_vqs(struct vhost_dev *d)
-{
-	int i = 0;
-	for (i = 0; i < d->nvqs; ++i)
-		mutex_unlock(&d->vqs[i]->mutex);
-}
-
 static int vhost_new_umem_range(struct vhost_umem *umem,
 				u64 start, u64 size, u64 end,
 				u64 userspace_addr, int perm)
@@ -950,7 +939,10 @@ static void vhost_iotlb_notify_vq(struct vhost_dev *d,
 		if (msg->iova <= vq_msg->iova &&
 		    msg->iova + msg->size - 1 > vq_msg->iova &&
 		    vq_msg->type == VHOST_IOTLB_MISS) {
+			mutex_lock(&node->vq->mutex);
 			vhost_poll_queue(&node->vq->poll);
+			mutex_unlock(&node->vq->mutex);
+
 			list_del(&node->node);
 			kfree(node);
 		}
@@ -982,7 +974,6 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
 	int ret = 0;
 
 	mutex_lock(&dev->mutex);
-	vhost_dev_lock_vqs(dev);
 	switch (msg->type) {
 	case VHOST_IOTLB_UPDATE:
 		if (!dev->iotlb) {
@@ -1016,7 +1007,6 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
 		break;
 	}
 
-	vhost_dev_unlock_vqs(dev);
 	mutex_unlock(&dev->mutex);
 
 	return ret;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next v5 0/4] net: vhost: improve performance when enable busyloop
From: xiangxia.m.yue @ 2018-07-04  4:31 UTC (permalink / raw)
  To: jasowang; +Cc: netdev, virtualization, mst

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patches improve the guest receive and transmit performance.
On the handle_tx side, we poll the sock receive queue at the same time.
handle_rx do that in the same way.

For more performance report, see patch 4.

v4 -> v5:
fix some issues

v3 -> v4:
fix some issues

v2 -> v3:
This patches are splited from previous big patch:
http://patchwork.ozlabs.org/patch/934673/

Tonghao Zhang (4):
  vhost: lock the vqs one by one
  net: vhost: replace magic number of lock annotation
  net: vhost: factor out busy polling logic to vhost_net_busy_poll()
  net: vhost: add rx busy polling in tx path

 drivers/vhost/net.c   | 108 ++++++++++++++++++++++++++++----------------------
 drivers/vhost/vhost.c |  24 ++++-------
 2 files changed, 67 insertions(+), 65 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Re: [PATCH net-next 8/8] vhost: event suppression for packed ring
From: Wei Xu @ 2018-07-04  4:13 UTC (permalink / raw)
  To: Jason Wang
  Cc: kvm, mst, netdev, linux-kernel, virtualization, maxime.coquelin
In-Reply-To: <1530596284-4101-9-git-send-email-jasowang@redhat.com>

On Tue, Jul 03, 2018 at 01:38:04PM +0800, Jason Wang wrote:
> This patch introduces support for event suppression. This is done by
> have a two areas: device area and driver area. One side could then try
> to disable or enable (delayed) notification from other side by using a
> boolean hint or event index interface in the areas.
> 
> For more information, please refer Virtio spec.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/vhost.c | 191 ++++++++++++++++++++++++++++++++++++++++++++++----
>  drivers/vhost/vhost.h |  10 ++-
>  2 files changed, 185 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 0f3f07c..cccbc82 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1115,10 +1115,15 @@ static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
>  			       struct vring_used __user *used)
>  {
>  	struct vring_desc_packed *packed = (struct vring_desc_packed *)desc;
> +	struct vring_packed_desc_event *driver_event =
> +		(struct vring_packed_desc_event *)avail;
> +	struct vring_packed_desc_event *device_event =
> +		(struct vring_packed_desc_event *)used;
>  
> -	/* TODO: check device area and driver area */
>  	return access_ok(VERIFY_READ, packed, num * sizeof(*packed)) &&
> -	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed));
> +	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed)) &&

R/W parameter doesn't make sense to most architectures and the comment in x86
says WRITE is a superset of READ, is it possible to converge them here?

/**
 * access_ok: - Checks if a user space pointer is valid
 * @type: Type of access: %VERIFY_READ or %VERIFY_WRITE.  Note that
 *        %VERIFY_WRITE is a superset of %VERIFY_READ - if it is safe
 *        to write to a block, it is always safe to read from it.
 * @addr: User space pointer to start of block to check
 * @size: Size of block to check
 *
 * Context: User context only. This function may sleep if pagefaults are
 *          enabled.
 *
 * Checks if a pointer to a block of memory in user space is valid.
 *
 * Returns true (nonzero) if the memory block may be valid, false (zero)
 * if it is definitely invalid.
 *
 * Note that, depending on architecture, this function probably just
 * checks that the pointer is in the user space range - after calling
 * this function, memory access functions may still return -EFAULT.
 */
#define access_ok(type, addr, size)     
......

Thanks,
Wei

> +	       access_ok(VERIFY_READ, driver_event, sizeof(*driver_event)) &&
> +	       access_ok(VERIFY_WRITE, device_event, sizeof(*device_event));
>  }
>  
>  static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num,
> @@ -1193,14 +1198,27 @@ static bool iotlb_access_ok(struct vhost_virtqueue *vq,
>  	return true;
>  }
>  
> -int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
> +int vq_iotlb_prefetch_packed(struct vhost_virtqueue *vq)
> +{
> +	int num = vq->num;
> +
> +	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
> +			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
> +	       iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->desc,
> +			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
> +	       iotlb_access_ok(vq, VHOST_ACCESS_RO,
> +			       (u64)(uintptr_t)vq->driver_event,
> +			       sizeof(*vq->driver_event), VHOST_ADDR_AVAIL) &&
> +	       iotlb_access_ok(vq, VHOST_ACCESS_WO,
> +			       (u64)(uintptr_t)vq->device_event,
> +			       sizeof(*vq->device_event), VHOST_ADDR_USED);
> +}
> +
> +int vq_iotlb_prefetch_split(struct vhost_virtqueue *vq)
>  {
>  	size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
>  	unsigned int num = vq->num;
>  
> -	if (!vq->iotlb)
> -		return 1;
> -
>  	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
>  			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
>  	       iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->avail,
> @@ -1212,6 +1230,17 @@ int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
>  			       num * sizeof(*vq->used->ring) + s,
>  			       VHOST_ADDR_USED);
>  }
> +
> +int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
> +{
> +	if (!vq->iotlb)
> +		return 1;
> +
> +	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> +		return vq_iotlb_prefetch_packed(vq);
> +	else
> +		return vq_iotlb_prefetch_split(vq);
> +}
>  EXPORT_SYMBOL_GPL(vq_iotlb_prefetch);
>  
>  /* Can we log writes? */
> @@ -1771,6 +1800,50 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
>  	return 0;
>  }
>  
> +static int vhost_update_device_flags(struct vhost_virtqueue *vq,
> +				     __virtio16 device_flags)
> +{
> +	void __user *flags;
> +
> +	if (vhost_put_user(vq, device_flags, &vq->device_event->flags,
> +			   VHOST_ADDR_USED) < 0)
> +		return -EFAULT;
> +	if (unlikely(vq->log_used)) {
> +		/* Make sure the flag is seen before log. */
> +		smp_wmb();
> +		/* Log used flag write. */
> +		flags = &vq->device_event->flags;
> +		log_write(vq->log_base, vq->log_addr +
> +			  (flags - (void __user *)vq->device_event),
> +			  sizeof(vq->device_event->flags));
> +		if (vq->log_ctx)
> +			eventfd_signal(vq->log_ctx, 1);
> +	}
> +	return 0;
> +}
> +
> +static int vhost_update_device_off_wrap(struct vhost_virtqueue *vq,
> +					__virtio16 device_off_wrap)
> +{
> +	void __user *off_wrap;
> +
> +	if (vhost_put_user(vq, device_off_wrap, &vq->device_event->off_wrap,
> +			   VHOST_ADDR_USED) < 0)
> +		return -EFAULT;
> +	if (unlikely(vq->log_used)) {
> +		/* Make sure the flag is seen before log. */
> +		smp_wmb();
> +		/* Log used flag write. */
> +		off_wrap = &vq->device_event->off_wrap;
> +		log_write(vq->log_base, vq->log_addr +
> +			  (off_wrap - (void __user *)vq->device_event),
> +			  sizeof(vq->device_event->off_wrap));
> +		if (vq->log_ctx)
> +			eventfd_signal(vq->log_ctx, 1);
> +	}
> +	return 0;
> +}
> +
>  static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
>  {
>  	if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx),
> @@ -2756,16 +2829,13 @@ int vhost_add_used_n(struct vhost_virtqueue *vq,
>  }
>  EXPORT_SYMBOL_GPL(vhost_add_used_n);
>  
> -static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +static bool vhost_notify_split(struct vhost_dev *dev,
> +			       struct vhost_virtqueue *vq)
>  {
>  	__u16 old, new;
>  	__virtio16 event;
>  	bool v;
>  
> -	/* TODO: check driver area */
> -	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> -		return true;
> -
>  	/* Flush out used index updates. This is paired
>  	 * with the barrier that the Guest executes when enabling
>  	 * interrupts. */
> @@ -2798,6 +2868,64 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>  	return vring_need_event(vhost16_to_cpu(vq, event), new, old);
>  }
>  
> +static bool vhost_notify_packed(struct vhost_dev *dev,
> +				struct vhost_virtqueue *vq)
> +{
> +	__virtio16 event_off_wrap, event_flags;
> +	__u16 old, new, off_wrap;
> +	bool v;
> +
> +	/* Flush out used descriptors updates. This is paired
> +	 * with the barrier that the Guest executes when enabling
> +	 * interrupts.
> +	 */
> +	smp_mb();
> +
> +	if (vhost_get_avail(vq, event_flags,
> +			   &vq->driver_event->flags) < 0) {
> +		vq_err(vq, "Failed to get driver desc_event_flags");
> +		return true;
> +	}
> +
> +	if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX))
> +		return event_flags !=
> +		       cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +
> +	old = vq->signalled_used;
> +	v = vq->signalled_used_valid;
> +	new = vq->signalled_used = vq->last_used_idx;
> +	vq->signalled_used_valid = true;
> +
> +	if (event_flags != cpu_to_vhost16(vq, RING_EVENT_FLAGS_DESC))
> +		return event_flags !=
> +		       cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +
> +	/* Read desc event flags before event_off and event_wrap */
> +	smp_rmb();
> +
> +	if (vhost_get_avail(vq, event_off_wrap,
> +			    &vq->driver_event->off_wrap) < 0) {
> +		vq_err(vq, "Failed to get driver desc_event_off/wrap");
> +		return true;
> +	}
> +
> +	off_wrap = vhost16_to_cpu(vq, event_off_wrap);
> +
> +	if (unlikely(!v))
> +		return true;
> +
> +	return vhost_vring_packed_need_event(vq, vq->last_used_wrap_counter,
> +					     off_wrap, new, old);
> +}
> +
> +static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +{
> +	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> +		return vhost_notify_packed(dev, vq);
> +	else
> +		return vhost_notify_split(dev, vq);
> +}
> +
>  /* This actually signals the guest, using eventfd. */
>  void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>  {
> @@ -2875,10 +3003,34 @@ static bool vhost_enable_notify_packed(struct vhost_dev *dev,
>  				       struct vhost_virtqueue *vq)
>  {
>  	struct vring_desc_packed *d = vq->desc_packed + vq->avail_idx;
> -	__virtio16 flags;
> +	__virtio16 flags = cpu_to_vhost16(vq, RING_EVENT_FLAGS_ENABLE);
>  	int ret;
>  
> -	/* TODO: enable notification through device area */
> +	if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
> +		return false;
> +	vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;
> +
> +	if (vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
> +		__virtio16 off_wrap = cpu_to_vhost16(vq, vq->avail_idx |
> +				      vq->avail_wrap_counter << 15);
> +
> +		ret = vhost_update_device_off_wrap(vq, off_wrap);
> +		if (ret) {
> +			vq_err(vq, "Failed to write to off warp at %p: %d\n",
> +			       &vq->device_event->off_wrap, ret);
> +			return false;
> +		}
> +		/* Make sure off_wrap is wrote before flags */
> +		smp_wmb();
> +		flags = cpu_to_vhost16(vq, RING_EVENT_FLAGS_DESC);
> +	}
> +
> +	ret = vhost_update_device_flags(vq, flags);
> +	if (ret) {
> +		vq_err(vq, "Failed to enable notification at %p: %d\n",
> +			&vq->device_event->flags, ret);
> +		return false;
> +	}
>  
>  	/* They could have slipped one in as we were doing that: make
>  	 * sure it's written, then check again.
> @@ -2945,7 +3097,18 @@ EXPORT_SYMBOL_GPL(vhost_enable_notify);
>  static void vhost_disable_notify_packed(struct vhost_dev *dev,
>  					struct vhost_virtqueue *vq)
>  {
> -	/* TODO: disable notification through device area */
> +	__virtio16 flags;
> +	int r;
> +
> +	if (vq->used_flags & VRING_USED_F_NO_NOTIFY)
> +		return;
> +	vq->used_flags |= VRING_USED_F_NO_NOTIFY;
> +
> +	flags = cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +	r = vhost_update_device_flags(vq, flags);
> +	if (r)
> +		vq_err(vq, "Failed to enable notification at %p: %d\n",
> +		       &vq->device_event->flags, r);
>  }
>  
>  static void vhost_disable_notify_split(struct vhost_dev *dev,
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index db09958..d071daf 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -96,8 +96,14 @@ struct vhost_virtqueue {
>  		struct vring_desc __user *desc;
>  		struct vring_desc_packed __user *desc_packed;
>  	};
> -	struct vring_avail __user *avail;
> -	struct vring_used __user *used;
> +	union {
> +		struct vring_avail __user *avail;
> +		struct vring_packed_desc_event __user *driver_event;
> +	};
> +	union {
> +		struct vring_used __user *used;
> +		struct vring_packed_desc_event __user *device_event;
> +	};
>  	const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS];
>  	struct file *kick;
>  	struct eventfd_ctx *call_ctx;
> -- 
> 2.7.4
> 

^ permalink raw reply

* Re: [PATCH net-next 1/8] vhost: move get_rx_bufs to vhost.c
From: Jason Wang @ 2018-07-04  3:28 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kvm, mst, netdev, linux-kernel, virtualization, maxime.coquelin,
	kbuild-all, wexu
In-Reply-To: <201807040813.LULqRH8c%fengguang.wu@intel.com>



On 2018年07月04日 09:01, kbuild test robot wrote:
> Hi Jason,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on net-next/master]
>
> url:https://github.com/0day-ci/linux/commits/Jason-Wang/Packed-virtqueue-for-vhost/20180703-154751
> config: x86_64-randconfig-s0-07032254 (attached as .config)
> compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
> reproduce:
>          # save the attached .config to linux build tree
>          make ARCH=x86_64
>
> Note: the linux-review/Jason-Wang/Packed-virtqueue-for-vhost/20180703-154751 HEAD 01b902f1126212ea2597e6d09802bd9c4431bf82 builds fine.
>        It only hurts bisectibility.
>
> All errors (new ones prefixed by >>):
>
>     drivers//vhost/net.c: In function 'handle_rx':
>>> drivers//vhost/net.c:738:15: error: implicit declaration of function 'get_rx_bufs' [-Werror=implicit-function-declaration]
>        headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
>                    ^~~~~~~~~~~
>     cc1: some warnings being treated as errors
>
> vim +/get_rx_bufs +738 drivers//vhost/net.c

My bad, forget to do one by one compling test.

Will post V2.

Thanks

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v2 net-next 4/4] vhost_net: Avoid rx vring kicks during busyloop
From: Jason Wang @ 2018-07-04  3:27 UTC (permalink / raw)
  To: Toshiaki Makita, David S. Miller, Michael S. Tsirkin
  Cc: netdev, kvm, virtualization
In-Reply-To: <1530603094-2476-5-git-send-email-makita.toshiaki@lab.ntt.co.jp>



On 2018年07月03日 15:31, Toshiaki Makita wrote:
> We may run out of avail rx ring descriptor under heavy load but busypoll
> did not detect it so busypoll may have exited prematurely. Avoid this by
> checking rx ring full during busypoll.
>
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> ---
>   drivers/vhost/net.c | 10 +++++++---
>   1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 791bc8b..b224036 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -658,6 +658,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
>   {
>   	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
>   	struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
> +	struct vhost_virtqueue *rvq = &rnvq->vq;
>   	struct vhost_virtqueue *tvq = &tnvq->vq;
>   	unsigned long uninitialized_var(endtime);
>   	int len = peek_head_len(rnvq, sk);
> @@ -677,7 +678,8 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
>   				*busyloop_intr = true;
>   				break;
>   			}
> -			if (sk_has_rx_data(sk) ||
> +			if ((sk_has_rx_data(sk) &&
> +			     !vhost_vq_avail_empty(&net->dev, rvq)) ||
>   			    !vhost_vq_avail_empty(&net->dev, tvq))
>   				break;
>   			cpu_relax();
> @@ -827,7 +829,6 @@ static void handle_rx(struct vhost_net *net)
>   
>   	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
>   						      &busyloop_intr))) {
> -		busyloop_intr = false;
>   		sock_len += sock_hlen;
>   		vhost_len = sock_len + vhost_hlen;
>   		headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
> @@ -838,7 +839,9 @@ static void handle_rx(struct vhost_net *net)
>   			goto out;
>   		/* OK, now we need to know about added descriptors. */
>   		if (!headcount) {
> -			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> +			if (unlikely(busyloop_intr)) {
> +				vhost_poll_queue(&vq->poll);
> +			} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>   				/* They have slipped one in as we were
>   				 * doing that: check again. */
>   				vhost_disable_notify(&net->dev, vq);
> @@ -848,6 +851,7 @@ static void handle_rx(struct vhost_net *net)
>   			 * they refilled. */
>   			goto out;
>   		}
> +		busyloop_intr = false;
>   		if (nvq->rx_ring)
>   			msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
>   		/* On overrun, truncate and discard */

Acked-by: Jason Wang <jasowang@redhat.com>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v2 net-next 3/4] vhost_net: Avoid rx queue wake-ups during busypoll
From: Jason Wang @ 2018-07-04  3:26 UTC (permalink / raw)
  To: Toshiaki Makita, David S. Miller, Michael S. Tsirkin
  Cc: netdev, kvm, virtualization
In-Reply-To: <1530603094-2476-4-git-send-email-makita.toshiaki@lab.ntt.co.jp>



On 2018年07月03日 15:31, Toshiaki Makita wrote:
> We may run handle_rx() while rx work is queued. For example a packet can
> push the rx work during the window before handle_rx calls
> vhost_net_disable_vq().
> In that case busypoll immediately exits due to vhost_has_work()
> condition and enables vq again. This can lead to another unnecessary rx
> wake-ups, so poll rx work instead of enabling the vq.
>
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> ---
>   drivers/vhost/net.c | 26 +++++++++++++++++++-------
>   1 file changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 811c0e5..791bc8b 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -653,7 +653,8 @@ static void vhost_rx_signal_used(struct vhost_net_virtqueue *nvq)
>   	nvq->done_idx = 0;
>   }
>   
> -static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
> +static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
> +				      bool *busyloop_intr)
>   {
>   	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
>   	struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
> @@ -671,11 +672,16 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
>   		preempt_disable();
>   		endtime = busy_clock() + tvq->busyloop_timeout;
>   
> -		while (vhost_can_busy_poll(endtime) &&
> -		       !vhost_has_work(&net->dev) &&
> -		       !sk_has_rx_data(sk) &&
> -		       vhost_vq_avail_empty(&net->dev, tvq))
> +		while (vhost_can_busy_poll(endtime)) {
> +			if (vhost_has_work(&net->dev)) {
> +				*busyloop_intr = true;
> +				break;
> +			}
> +			if (sk_has_rx_data(sk) ||
> +			    !vhost_vq_avail_empty(&net->dev, tvq))
> +				break;
>   			cpu_relax();
> +		}
>   
>   		preempt_enable();
>   
> @@ -795,6 +801,7 @@ static void handle_rx(struct vhost_net *net)
>   	s16 headcount;
>   	size_t vhost_hlen, sock_hlen;
>   	size_t vhost_len, sock_len;
> +	bool busyloop_intr = false;
>   	struct socket *sock;
>   	struct iov_iter fixup;
>   	__virtio16 num_buffers;
> @@ -818,7 +825,9 @@ static void handle_rx(struct vhost_net *net)
>   		vq->log : NULL;
>   	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
>   
> -	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
> +	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
> +						      &busyloop_intr))) {
> +		busyloop_intr = false;
>   		sock_len += sock_hlen;
>   		vhost_len = sock_len + vhost_hlen;
>   		headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
> @@ -905,7 +914,10 @@ static void handle_rx(struct vhost_net *net)
>   			goto out;
>   		}
>   	}
> -	vhost_net_enable_vq(net, vq);
> +	if (unlikely(busyloop_intr))
> +		vhost_poll_queue(&vq->poll);
> +	else
> +		vhost_net_enable_vq(net, vq);
>   out:
>   	vhost_rx_signal_used(nvq);
>   	mutex_unlock(&vq->mutex);

Acked-by: Jason Wang <jasowang@redhat.com>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v2 net-next 4/4] vhost_net: Avoid rx vring kicks during busyloop
From: Jason Wang @ 2018-07-04  3:26 UTC (permalink / raw)
  To: Toshiaki Makita, David S. Miller, Michael S. Tsirkin
  Cc: netdev, kvm, virtualization
In-Reply-To: <251f4d24-f8e2-7549-064e-94b4d6c6f4bd@lab.ntt.co.jp>



On 2018年07月04日 09:21, Toshiaki Makita wrote:
> On 2018/07/03 18:05, Jason Wang wrote:
>> On 2018年07月03日 15:31, Toshiaki Makita wrote:
>>> We may run out of avail rx ring descriptor under heavy load but busypoll
>>> did not detect it so busypoll may have exited prematurely. Avoid this by
>>> checking rx ring full during busypoll.
>> Actually, we're checking whether it was empty in fact?
> My understanding is that on rx empty avail means no free descriptors
> from the perspective of host so the ring is conceptually full.

Ok.

>
>>> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
>>> ---
>>>    drivers/vhost/net.c | 10 +++++++---
>>>    1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>> index 791bc8b..b224036 100644
>>> --- a/drivers/vhost/net.c
>>> +++ b/drivers/vhost/net.c
>>> @@ -658,6 +658,7 @@ static int vhost_net_rx_peek_head_len(struct
>>> vhost_net *net, struct sock *sk,
>>>    {
>>>        struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
>>>        struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
>>> +    struct vhost_virtqueue *rvq = &rnvq->vq;
>>>        struct vhost_virtqueue *tvq = &tnvq->vq;
>>>        unsigned long uninitialized_var(endtime);
>>>        int len = peek_head_len(rnvq, sk);
>>> @@ -677,7 +678,8 @@ static int vhost_net_rx_peek_head_len(struct
>>> vhost_net *net, struct sock *sk,
>>>                    *busyloop_intr = true;
>>>                    break;
>>>                }
>>> -            if (sk_has_rx_data(sk) ||
>>> +            if ((sk_has_rx_data(sk) &&
>>> +                 !vhost_vq_avail_empty(&net->dev, rvq)) ||
>>>                    !vhost_vq_avail_empty(&net->dev, tvq))
>>>                    break;
>>>                cpu_relax();
>>> @@ -827,7 +829,6 @@ static void handle_rx(struct vhost_net *net)
>>>    
>> I thought below codes should belong to patch 3. Or I may miss something.
> That codes are added for the case that busypoll is waiting for rx vq
> avail but interrupted by another work. At the point of patch 3 busypoll
> does not wait for rx vq avail so I don't think we move them to patch 3.

I get this.

Thanks

>
>> Thanks
>>
>>>        while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
>>>                                  &busyloop_intr))) {
>>> -        busyloop_intr = false;
>>>            sock_len += sock_hlen;
>>>            vhost_len = sock_len + vhost_hlen;
>>>            headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
>>> @@ -838,7 +839,9 @@ static void handle_rx(struct vhost_net *net)
>>>                goto out;
>>>            /* OK, now we need to know about added descriptors. */
>>>            if (!headcount) {
>>> -            if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>>> +            if (unlikely(busyloop_intr)) {
>>> +                vhost_poll_queue(&vq->poll);
>>> +            } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>>>                    /* They have slipped one in as we were
>>>                     * doing that: check again. */
>>>                    vhost_disable_notify(&net->dev, vq);
>>> @@ -848,6 +851,7 @@ static void handle_rx(struct vhost_net *net)
>>>                 * they refilled. */
>>>                goto out;
>>>            }
>>> +        busyloop_intr = false;
>>>            if (nvq->rx_ring)
>>>                msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
>>>            /* On overrun, truncate and discard */

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net-next v4 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Tonghao Zhang @ 2018-07-04  1:40 UTC (permalink / raw)
  To: jasowang
  Cc: Linux Kernel Network Developers, virtualization, Tonghao Zhang,
	mst
In-Reply-To: <2f5d20a0-cc95-ed37-3f6c-3368f92a7586@redhat.com>

On Tue, Jul 3, 2018 at 1:55 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
>
> On 2018年07月03日 13:48, Tonghao Zhang wrote:
> > On Tue, Jul 3, 2018 at 10:12 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >>
> >> On 2018年07月02日 20:57, xiangxia.m.yue@gmail.com wrote:
> >>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >>>
> >>> Factor out generic busy polling logic and will be
> >>> used for in tx path in the next patch. And with the patch,
> >>> qemu can set differently the busyloop_timeout for rx queue.
> >>>
> >>> Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
> >>> ---
> >>>    drivers/vhost/net.c | 94 +++++++++++++++++++++++++++++++----------------------
> >>>    1 file changed, 55 insertions(+), 39 deletions(-)
> >>>
> >>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >>> index 62bb8e8..2790959 100644
> >>> --- a/drivers/vhost/net.c
> >>> +++ b/drivers/vhost/net.c
> >>> @@ -429,6 +429,52 @@ static int vhost_net_enable_vq(struct vhost_net *n,
> >>>        return vhost_poll_start(poll, sock->file);
> >>>    }
> >>>
> >>> +static int sk_has_rx_data(struct sock *sk)
> >>> +{
> >>> +     struct socket *sock = sk->sk_socket;
> >>> +
> >>> +     if (sock->ops->peek_len)
> >>> +             return sock->ops->peek_len(sock);
> >>> +
> >>> +     return skb_queue_empty(&sk->sk_receive_queue);
> >>> +}
> >>> +
> >>> +static void vhost_net_busy_poll(struct vhost_net *net,
> >>> +                             struct vhost_virtqueue *rvq,
> >>> +                             struct vhost_virtqueue *tvq,
> >>> +                             bool rx)
> >>> +{
> >>> +     unsigned long uninitialized_var(endtime);
> >>> +     unsigned long busyloop_timeout;
> >>> +     struct socket *sock;
> >>> +     struct vhost_virtqueue *vq = rx ? tvq : rvq;
> >>> +
> >>> +     mutex_lock_nested(&vq->mutex, rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
> >>> +
> >>> +     vhost_disable_notify(&net->dev, vq);
> >>> +     sock = rvq->private_data;
> >>> +     busyloop_timeout = rx ? rvq->busyloop_timeout : tvq->busyloop_timeout;
> >>> +
> >>> +     preempt_disable();
> >>> +     endtime = busy_clock() + busyloop_timeout;
> >>> +     while (vhost_can_busy_poll(tvq->dev, endtime) &&
> >>> +            !(sock && sk_has_rx_data(sock->sk)) &&
> >>> +            vhost_vq_avail_empty(tvq->dev, tvq))
> >>> +             cpu_relax();
> >>> +     preempt_enable();
> >>> +
> >>> +     if ((rx && !vhost_vq_avail_empty(&net->dev, vq)) ||
> >>> +         (!rx && (sock && sk_has_rx_data(sock->sk)))) {
> >>> +             vhost_poll_queue(&vq->poll);
> >>> +     } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> >> One last question, do we need this for rx? This check will be always
> >> true under light or medium load.
> > will remove the 'unlikely'
>
> Not only the unlikely. We only want rx kick when packet is pending at
> socket but we're out of available buffers. This is not the case here
> (you have polled the vq above).
>
> So we probably want to do this only when rx == true.
got it. change it to:
+       } else if (vhost_enable_notify(&net->dev, vq) && rx) {
+               vhost_disable_notify(&net->dev, vq);
+               vhost_poll_queue(&vq->poll);
+       }

> Thanks
>
> >
> >> Thanks
> >>
> >>> +             vhost_disable_notify(&net->dev, vq);
> >>> +             vhost_poll_queue(&vq->poll);
> >>> +     }
> >>> +
> >>> +     mutex_unlock(&vq->mutex);
> >>> +}
> >>> +
> >>> +
> >>>    static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
> >>>                                    struct vhost_virtqueue *vq,
> >>>                                    struct iovec iov[], unsigned int iov_size,
> >>> @@ -621,16 +667,6 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
> >>>        return len;
> >>>    }
> >>>
> >>> -static int sk_has_rx_data(struct sock *sk)
> >>> -{
> >>> -     struct socket *sock = sk->sk_socket;
> >>> -
> >>> -     if (sock->ops->peek_len)
> >>> -             return sock->ops->peek_len(sock);
> >>> -
> >>> -     return skb_queue_empty(&sk->sk_receive_queue);
> >>> -}
> >>> -
> >>>    static void vhost_rx_signal_used(struct vhost_net_virtqueue *nvq)
> >>>    {
> >>>        struct vhost_virtqueue *vq = &nvq->vq;
> >>> @@ -645,39 +681,19 @@ static void vhost_rx_signal_used(struct vhost_net_virtqueue *nvq)
> >>>
> >>>    static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
> >>>    {
> >>> -     struct vhost_net_virtqueue *rvq = &net->vqs[VHOST_NET_VQ_RX];
> >>> -     struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> >>> -     struct vhost_virtqueue *vq = &nvq->vq;
> >>> -     unsigned long uninitialized_var(endtime);
> >>> -     int len = peek_head_len(rvq, sk);
> >>> +     struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
> >>> +     struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
> >>>
> >>> -     if (!len && vq->busyloop_timeout) {
> >>> -             /* Flush batched heads first */
> >>> -             vhost_rx_signal_used(rvq);
> >>> -             /* Both tx vq and rx socket were polled here */
> >>> -             mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
> >>> -             vhost_disable_notify(&net->dev, vq);
> >>> +     int len = peek_head_len(rnvq, sk);
> >>>
> >>> -             preempt_disable();
> >>> -             endtime = busy_clock() + vq->busyloop_timeout;
> >>> -
> >>> -             while (vhost_can_busy_poll(&net->dev, endtime) &&
> >>> -                    !sk_has_rx_data(sk) &&
> >>> -                    vhost_vq_avail_empty(&net->dev, vq))
> >>> -                     cpu_relax();
> >>> -
> >>> -             preempt_enable();
> >>> -
> >>> -             if (!vhost_vq_avail_empty(&net->dev, vq))
> >>> -                     vhost_poll_queue(&vq->poll);
> >>> -             else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> >>> -                     vhost_disable_notify(&net->dev, vq);
> >>> -                     vhost_poll_queue(&vq->poll);
> >>> -             }
> >>> +     if (!len && rnvq->vq.busyloop_timeout) {
> >>> +             /* Flush batched heads first */
> >>> +             vhost_rx_signal_used(rnvq);
> >>>
> >>> -             mutex_unlock(&vq->mutex);
> >>> +             /* Both tx vq and rx socket were polled here */
> >>> +             vhost_net_busy_poll(net, &rnvq->vq, &tnvq->vq, true);
> >>>
> >>> -             len = peek_head_len(rvq, sk);
> >>> +             len = peek_head_len(rnvq, sk);
> >>>        }
> >>>
> >>>        return len;
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v2 net-next 4/4] vhost_net: Avoid rx vring kicks during busyloop
From: Toshiaki Makita @ 2018-07-04  1:21 UTC (permalink / raw)
  To: Jason Wang, David S. Miller, Michael S. Tsirkin
  Cc: netdev, kvm, virtualization
In-Reply-To: <abc4f8ce-35da-f628-811f-053250fa9b0e@redhat.com>

On 2018/07/03 18:05, Jason Wang wrote:
> On 2018年07月03日 15:31, Toshiaki Makita wrote:
>> We may run out of avail rx ring descriptor under heavy load but busypoll
>> did not detect it so busypoll may have exited prematurely. Avoid this by
>> checking rx ring full during busypoll.
> 
> Actually, we're checking whether it was empty in fact?

My understanding is that on rx empty avail means no free descriptors
from the perspective of host so the ring is conceptually full.

>> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
>> ---
>>   drivers/vhost/net.c | 10 +++++++---
>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 791bc8b..b224036 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -658,6 +658,7 @@ static int vhost_net_rx_peek_head_len(struct
>> vhost_net *net, struct sock *sk,
>>   {
>>       struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
>>       struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
>> +    struct vhost_virtqueue *rvq = &rnvq->vq;
>>       struct vhost_virtqueue *tvq = &tnvq->vq;
>>       unsigned long uninitialized_var(endtime);
>>       int len = peek_head_len(rnvq, sk);
>> @@ -677,7 +678,8 @@ static int vhost_net_rx_peek_head_len(struct
>> vhost_net *net, struct sock *sk,
>>                   *busyloop_intr = true;
>>                   break;
>>               }
>> -            if (sk_has_rx_data(sk) ||
>> +            if ((sk_has_rx_data(sk) &&
>> +                 !vhost_vq_avail_empty(&net->dev, rvq)) ||
>>                   !vhost_vq_avail_empty(&net->dev, tvq))
>>                   break;
>>               cpu_relax();
>> @@ -827,7 +829,6 @@ static void handle_rx(struct vhost_net *net)
>>   
> 
> I thought below codes should belong to patch 3. Or I may miss something.

That codes are added for the case that busypoll is waiting for rx vq
avail but interrupted by another work. At the point of patch 3 busypoll
does not wait for rx vq avail so I don't think we move them to patch 3.

> 
> Thanks
> 
>>       while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
>>                                 &busyloop_intr))) {
>> -        busyloop_intr = false;
>>           sock_len += sock_hlen;
>>           vhost_len = sock_len + vhost_hlen;
>>           headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
>> @@ -838,7 +839,9 @@ static void handle_rx(struct vhost_net *net)
>>               goto out;
>>           /* OK, now we need to know about added descriptors. */
>>           if (!headcount) {
>> -            if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>> +            if (unlikely(busyloop_intr)) {
>> +                vhost_poll_queue(&vq->poll);
>> +            } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>>                   /* They have slipped one in as we were
>>                    * doing that: check again. */
>>                   vhost_disable_notify(&net->dev, vq);
>> @@ -848,6 +851,7 @@ static void handle_rx(struct vhost_net *net)
>>                * they refilled. */
>>               goto out;
>>           }
>> +        busyloop_intr = false;
>>           if (nvq->rx_ring)
>>               msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
>>           /* On overrun, truncate and discard */

-- 
Toshiaki Makita

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net-next 1/8] vhost: move get_rx_bufs to vhost.c
From: kbuild test robot @ 2018-07-04  1:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: kvm, mst, netdev, linux-kernel, virtualization, maxime.coquelin,
	kbuild-all, wexu
In-Reply-To: <1530596284-4101-2-git-send-email-jasowang@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 12105 bytes --]

Hi Jason,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jason-Wang/Packed-virtqueue-for-vhost/20180703-154751
config: x86_64-randconfig-s0-07032254 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

Note: the linux-review/Jason-Wang/Packed-virtqueue-for-vhost/20180703-154751 HEAD 01b902f1126212ea2597e6d09802bd9c4431bf82 builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

   drivers//vhost/net.c: In function 'handle_rx':
>> drivers//vhost/net.c:738:15: error: implicit declaration of function 'get_rx_bufs' [-Werror=implicit-function-declaration]
      headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
                  ^~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/get_rx_bufs +738 drivers//vhost/net.c

030881372 Jason Wang         2016-03-04  687  
3a4d5c94e Michael S. Tsirkin 2010-01-14  688  /* Expects to be always run from workqueue - which acts as
3a4d5c94e Michael S. Tsirkin 2010-01-14  689   * read-size critical section for our kind of RCU. */
94249369e Jason Wang         2011-01-17  690  static void handle_rx(struct vhost_net *net)
3a4d5c94e Michael S. Tsirkin 2010-01-14  691  {
81f95a558 Michael S. Tsirkin 2013-04-28  692  	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_RX];
81f95a558 Michael S. Tsirkin 2013-04-28  693  	struct vhost_virtqueue *vq = &nvq->vq;
8dd014adf David Stevens      2010-07-27  694  	unsigned uninitialized_var(in), log;
8dd014adf David Stevens      2010-07-27  695  	struct vhost_log *vq_log;
8dd014adf David Stevens      2010-07-27  696  	struct msghdr msg = {
8dd014adf David Stevens      2010-07-27  697  		.msg_name = NULL,
8dd014adf David Stevens      2010-07-27  698  		.msg_namelen = 0,
8dd014adf David Stevens      2010-07-27  699  		.msg_control = NULL, /* FIXME: get and handle RX aux data. */
8dd014adf David Stevens      2010-07-27  700  		.msg_controllen = 0,
8dd014adf David Stevens      2010-07-27  701  		.msg_flags = MSG_DONTWAIT,
8dd014adf David Stevens      2010-07-27  702  	};
0960b6417 Jason Wang         2015-02-15  703  	struct virtio_net_hdr hdr = {
0960b6417 Jason Wang         2015-02-15  704  		.flags = 0,
0960b6417 Jason Wang         2015-02-15  705  		.gso_type = VIRTIO_NET_HDR_GSO_NONE
8dd014adf David Stevens      2010-07-27  706  	};
8dd014adf David Stevens      2010-07-27  707  	size_t total_len = 0;
910a578f7 Michael S. Tsirkin 2012-10-24  708  	int err, mergeable;
f5a4941aa Jason Wang         2018-05-29  709  	s16 headcount;
8dd014adf David Stevens      2010-07-27  710  	size_t vhost_hlen, sock_hlen;
8dd014adf David Stevens      2010-07-27  711  	size_t vhost_len, sock_len;
2e26af79b Asias He           2013-05-07  712  	struct socket *sock;
ba7438aed Al Viro            2014-12-10  713  	struct iov_iter fixup;
0960b6417 Jason Wang         2015-02-15  714  	__virtio16 num_buffers;
db688c24e Paolo Abeni        2018-04-24  715  	int recv_pkts = 0;
8dd014adf David Stevens      2010-07-27  716  
aaa3149bb Jason Wang         2018-03-26  717  	mutex_lock_nested(&vq->mutex, 0);
2e26af79b Asias He           2013-05-07  718  	sock = vq->private_data;
2e26af79b Asias He           2013-05-07  719  	if (!sock)
2e26af79b Asias He           2013-05-07  720  		goto out;
6b1e6cc78 Jason Wang         2016-06-23  721  
6b1e6cc78 Jason Wang         2016-06-23  722  	if (!vq_iotlb_prefetch(vq))
6b1e6cc78 Jason Wang         2016-06-23  723  		goto out;
6b1e6cc78 Jason Wang         2016-06-23  724  
8ea8cf89e Michael S. Tsirkin 2011-05-20  725  	vhost_disable_notify(&net->dev, vq);
8241a1e46 Jason Wang         2016-06-01  726  	vhost_net_disable_vq(net, vq);
2e26af79b Asias He           2013-05-07  727  
81f95a558 Michael S. Tsirkin 2013-04-28  728  	vhost_hlen = nvq->vhost_hlen;
81f95a558 Michael S. Tsirkin 2013-04-28  729  	sock_hlen = nvq->sock_hlen;
8dd014adf David Stevens      2010-07-27  730  
ea16c5143 Michael S. Tsirkin 2014-06-05  731  	vq_log = unlikely(vhost_has_feature(vq, VHOST_F_LOG_ALL)) ?
8dd014adf David Stevens      2010-07-27  732  		vq->log : NULL;
ea16c5143 Michael S. Tsirkin 2014-06-05  733  	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
8dd014adf David Stevens      2010-07-27  734  
030881372 Jason Wang         2016-03-04  735  	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
8dd014adf David Stevens      2010-07-27  736  		sock_len += sock_hlen;
8dd014adf David Stevens      2010-07-27  737  		vhost_len = sock_len + vhost_hlen;
f5a4941aa Jason Wang         2018-05-29 @738  		headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
f5a4941aa Jason Wang         2018-05-29  739  					vhost_len, &in, vq_log, &log,
94249369e Jason Wang         2011-01-17  740  					likely(mergeable) ? UIO_MAXIOV : 1);
8dd014adf David Stevens      2010-07-27  741  		/* On error, stop handling until the next kick. */
8dd014adf David Stevens      2010-07-27  742  		if (unlikely(headcount < 0))
8241a1e46 Jason Wang         2016-06-01  743  			goto out;
8dd014adf David Stevens      2010-07-27  744  		/* OK, now we need to know about added descriptors. */
8dd014adf David Stevens      2010-07-27  745  		if (!headcount) {
8ea8cf89e Michael S. Tsirkin 2011-05-20  746  			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
8dd014adf David Stevens      2010-07-27  747  				/* They have slipped one in as we were
8dd014adf David Stevens      2010-07-27  748  				 * doing that: check again. */
8ea8cf89e Michael S. Tsirkin 2011-05-20  749  				vhost_disable_notify(&net->dev, vq);
8dd014adf David Stevens      2010-07-27  750  				continue;
8dd014adf David Stevens      2010-07-27  751  			}
8dd014adf David Stevens      2010-07-27  752  			/* Nothing new?  Wait for eventfd to tell us
8dd014adf David Stevens      2010-07-27  753  			 * they refilled. */
8241a1e46 Jason Wang         2016-06-01  754  			goto out;
8dd014adf David Stevens      2010-07-27  755  		}
5990a3051 Jason Wang         2018-01-04  756  		if (nvq->rx_ring)
6e474083f Wei Xu             2017-12-01  757  			msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
6e474083f Wei Xu             2017-12-01  758  		/* On overrun, truncate and discard */
6e474083f Wei Xu             2017-12-01  759  		if (unlikely(headcount > UIO_MAXIOV)) {
6e474083f Wei Xu             2017-12-01  760  			iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
6e474083f Wei Xu             2017-12-01  761  			err = sock->ops->recvmsg(sock, &msg,
6e474083f Wei Xu             2017-12-01  762  						 1, MSG_DONTWAIT | MSG_TRUNC);
6e474083f Wei Xu             2017-12-01  763  			pr_debug("Discarded rx packet: len %zd\n", sock_len);
6e474083f Wei Xu             2017-12-01  764  			continue;
6e474083f Wei Xu             2017-12-01  765  		}
8dd014adf David Stevens      2010-07-27  766  		/* We don't need to be notified again. */
ba7438aed Al Viro            2014-12-10  767  		iov_iter_init(&msg.msg_iter, READ, vq->iov, in, vhost_len);
ba7438aed Al Viro            2014-12-10  768  		fixup = msg.msg_iter;
ba7438aed Al Viro            2014-12-10  769  		if (unlikely((vhost_hlen))) {
ba7438aed Al Viro            2014-12-10  770  			/* We will supply the header ourselves
ba7438aed Al Viro            2014-12-10  771  			 * TODO: support TSO.
ba7438aed Al Viro            2014-12-10  772  			 */
ba7438aed Al Viro            2014-12-10  773  			iov_iter_advance(&msg.msg_iter, vhost_hlen);
ba7438aed Al Viro            2014-12-10  774  		}
1b7841404 Ying Xue           2015-03-02  775  		err = sock->ops->recvmsg(sock, &msg,
8dd014adf David Stevens      2010-07-27  776  					 sock_len, MSG_DONTWAIT | MSG_TRUNC);
8dd014adf David Stevens      2010-07-27  777  		/* Userspace might have consumed the packet meanwhile:
8dd014adf David Stevens      2010-07-27  778  		 * it's not supposed to do this usually, but might be hard
8dd014adf David Stevens      2010-07-27  779  		 * to prevent. Discard data we got (if any) and keep going. */
8dd014adf David Stevens      2010-07-27  780  		if (unlikely(err != sock_len)) {
8dd014adf David Stevens      2010-07-27  781  			pr_debug("Discarded rx packet: "
8dd014adf David Stevens      2010-07-27  782  				 " len %d, expected %zd\n", err, sock_len);
8dd014adf David Stevens      2010-07-27  783  			vhost_discard_vq_desc(vq, headcount);
8dd014adf David Stevens      2010-07-27  784  			continue;
8dd014adf David Stevens      2010-07-27  785  		}
ba7438aed Al Viro            2014-12-10  786  		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
4c5a84421 Michael S. Tsirkin 2015-02-25  787  		if (unlikely(vhost_hlen)) {
4c5a84421 Michael S. Tsirkin 2015-02-25  788  			if (copy_to_iter(&hdr, sizeof(hdr),
4c5a84421 Michael S. Tsirkin 2015-02-25  789  					 &fixup) != sizeof(hdr)) {
4c5a84421 Michael S. Tsirkin 2015-02-25  790  				vq_err(vq, "Unable to write vnet_hdr "
4c5a84421 Michael S. Tsirkin 2015-02-25  791  				       "at addr %p\n", vq->iov->iov_base);
8241a1e46 Jason Wang         2016-06-01  792  				goto out;
8dd014adf David Stevens      2010-07-27  793  			}
4c5a84421 Michael S. Tsirkin 2015-02-25  794  		} else {
4c5a84421 Michael S. Tsirkin 2015-02-25  795  			/* Header came from socket; we'll need to patch
4c5a84421 Michael S. Tsirkin 2015-02-25  796  			 * ->num_buffers over if VIRTIO_NET_F_MRG_RXBUF
4c5a84421 Michael S. Tsirkin 2015-02-25  797  			 */
4c5a84421 Michael S. Tsirkin 2015-02-25  798  			iov_iter_advance(&fixup, sizeof(hdr));
4c5a84421 Michael S. Tsirkin 2015-02-25  799  		}
8dd014adf David Stevens      2010-07-27  800  		/* TODO: Should check and handle checksum. */
5201aa49b Michael S. Tsirkin 2015-02-03  801  
0960b6417 Jason Wang         2015-02-15  802  		num_buffers = cpu_to_vhost16(vq, headcount);
cfbdab951 Jason Wang         2011-01-17  803  		if (likely(mergeable) &&
0d79a493e Michael S. Tsirkin 2015-02-25  804  		    copy_to_iter(&num_buffers, sizeof num_buffers,
0d79a493e Michael S. Tsirkin 2015-02-25  805  				 &fixup) != sizeof num_buffers) {
8dd014adf David Stevens      2010-07-27  806  			vq_err(vq, "Failed num_buffers write");
8dd014adf David Stevens      2010-07-27  807  			vhost_discard_vq_desc(vq, headcount);
8241a1e46 Jason Wang         2016-06-01  808  			goto out;
8dd014adf David Stevens      2010-07-27  809  		}
f5a4941aa Jason Wang         2018-05-29  810  		nvq->done_idx += headcount;
f5a4941aa Jason Wang         2018-05-29  811  		if (nvq->done_idx > VHOST_RX_BATCH)
f5a4941aa Jason Wang         2018-05-29  812  			vhost_rx_signal_used(nvq);
8dd014adf David Stevens      2010-07-27  813  		if (unlikely(vq_log))
8dd014adf David Stevens      2010-07-27  814  			vhost_log_write(vq, vq_log, log, vhost_len);
8dd014adf David Stevens      2010-07-27  815  		total_len += vhost_len;
db688c24e Paolo Abeni        2018-04-24  816  		if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
db688c24e Paolo Abeni        2018-04-24  817  		    unlikely(++recv_pkts >= VHOST_NET_PKT_WEIGHT)) {
8dd014adf David Stevens      2010-07-27  818  			vhost_poll_queue(&vq->poll);
8241a1e46 Jason Wang         2016-06-01  819  			goto out;
8dd014adf David Stevens      2010-07-27  820  		}
8dd014adf David Stevens      2010-07-27  821  	}
8241a1e46 Jason Wang         2016-06-01  822  	vhost_net_enable_vq(net, vq);
2e26af79b Asias He           2013-05-07  823  out:
f5a4941aa Jason Wang         2018-05-29  824  	vhost_rx_signal_used(nvq);
8dd014adf David Stevens      2010-07-27  825  	mutex_unlock(&vq->mutex);
8dd014adf David Stevens      2010-07-27  826  }
8dd014adf David Stevens      2010-07-27  827  

:::::: The code at line 738 was first introduced by commit
:::::: f5a4941aa6d190e676065e8f4ed35999f52a01c3 vhost_net: flush batched heads before trying to busy polling

:::::: TO: Jason Wang <jasowang@redhat.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34871 bytes --]

[-- Attachment #3: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 14/15] drm/virtio: Remove unecessary dma_fence_ops
From: Daniel Vetter @ 2018-07-03 11:14 UTC (permalink / raw)
  To: DRI Development
  Cc: David Airlie, Daniel Vetter, Intel Graphics Development,
	virtualization
In-Reply-To: <20180503142603.28513-15-daniel.vetter@ffwll.ch>

On Thu, May 03, 2018 at 04:26:02PM +0200, Daniel Vetter wrote:
> dma_fence_default_wait is the default now, same for the trivial
> enable_signaling implementation.
> 
> Reviewed-by: Eric Anholt <eric@anholt.net>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: David Airlie <airlied@linux.ie>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: virtualization@lists.linux-foundation.org

Ok pulled in the remaining patches here with acks/r-bs, will resend the
others.
-Daniel

> ---
>  drivers/gpu/drm/virtio/virtgpu_fence.c | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/virtio/virtgpu_fence.c b/drivers/gpu/drm/virtio/virtgpu_fence.c
> index 23353521f903..00c742a441bf 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_fence.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_fence.c
> @@ -36,11 +36,6 @@ static const char *virtio_get_timeline_name(struct dma_fence *f)
>  	return "controlq";
>  }
>  
> -static bool virtio_enable_signaling(struct dma_fence *f)
> -{
> -	return true;
> -}
> -
>  static bool virtio_signaled(struct dma_fence *f)
>  {
>  	struct virtio_gpu_fence *fence = to_virtio_fence(f);
> @@ -67,9 +62,7 @@ static void virtio_timeline_value_str(struct dma_fence *f, char *str, int size)
>  static const struct dma_fence_ops virtio_fence_ops = {
>  	.get_driver_name     = virtio_get_driver_name,
>  	.get_timeline_name   = virtio_get_timeline_name,
> -	.enable_signaling    = virtio_enable_signaling,
>  	.signaled            = virtio_signaled,
> -	.wait                = dma_fence_default_wait,
>  	.fence_value_str     = virtio_fence_value_str,
>  	.timeline_value_str  = virtio_timeline_value_str,
>  };
> -- 
> 2.17.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply

* Re: [PATCH v2 net-next 4/4] vhost_net: Avoid rx vring kicks during busyloop
From: Jason Wang @ 2018-07-03  9:05 UTC (permalink / raw)
  To: Toshiaki Makita, David S. Miller, Michael S. Tsirkin
  Cc: netdev, kvm, virtualization
In-Reply-To: <1530603094-2476-5-git-send-email-makita.toshiaki@lab.ntt.co.jp>



On 2018年07月03日 15:31, Toshiaki Makita wrote:
> We may run out of avail rx ring descriptor under heavy load but busypoll
> did not detect it so busypoll may have exited prematurely. Avoid this by
> checking rx ring full during busypoll.

Actually, we're checking whether it was empty in fact?

>
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> ---
>   drivers/vhost/net.c | 10 +++++++---
>   1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 791bc8b..b224036 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -658,6 +658,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
>   {
>   	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
>   	struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
> +	struct vhost_virtqueue *rvq = &rnvq->vq;
>   	struct vhost_virtqueue *tvq = &tnvq->vq;
>   	unsigned long uninitialized_var(endtime);
>   	int len = peek_head_len(rnvq, sk);
> @@ -677,7 +678,8 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
>   				*busyloop_intr = true;
>   				break;
>   			}
> -			if (sk_has_rx_data(sk) ||
> +			if ((sk_has_rx_data(sk) &&
> +			     !vhost_vq_avail_empty(&net->dev, rvq)) ||
>   			    !vhost_vq_avail_empty(&net->dev, tvq))
>   				break;
>   			cpu_relax();
> @@ -827,7 +829,6 @@ static void handle_rx(struct vhost_net *net)
>   

I thought below codes should belong to patch 3. Or I may miss something.

Thanks

>   	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
>   						      &busyloop_intr))) {
> -		busyloop_intr = false;
>   		sock_len += sock_hlen;
>   		vhost_len = sock_len + vhost_hlen;
>   		headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
> @@ -838,7 +839,9 @@ static void handle_rx(struct vhost_net *net)
>   			goto out;
>   		/* OK, now we need to know about added descriptors. */
>   		if (!headcount) {
> -			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> +			if (unlikely(busyloop_intr)) {
> +				vhost_poll_queue(&vq->poll);
> +			} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>   				/* They have slipped one in as we were
>   				 * doing that: check again. */
>   				vhost_disable_notify(&net->dev, vq);
> @@ -848,6 +851,7 @@ static void handle_rx(struct vhost_net *net)
>   			 * they refilled. */
>   			goto out;
>   		}
> +		busyloop_intr = false;
>   		if (nvq->rx_ring)
>   			msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
>   		/* On overrun, truncate and discard */

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v2 net-next 2/4] vhost_net: Avoid tx vring kicks during busyloop
From: Jason Wang @ 2018-07-03  9:04 UTC (permalink / raw)
  To: Toshiaki Makita, David S. Miller, Michael S. Tsirkin
  Cc: netdev, kvm, virtualization
In-Reply-To: <1530603094-2476-3-git-send-email-makita.toshiaki@lab.ntt.co.jp>



On 2018年07月03日 15:31, Toshiaki Makita wrote:
> Under heavy load vhost busypoll may run without suppressing
> notification. For example tx zerocopy callback can push tx work while
> handle_tx() is running, then busyloop exits due to vhost_has_work()
> condition and enables notification but immediately reenters handle_tx()
> because the pushed work was tx. In this case handle_tx() tries to
> disable notification again, but when using event_idx it by design
> cannot. Then busyloop will run without suppressing notification.
> Another example is the case where handle_tx() tries to enable
> notification but avail idx is advanced so disables it again. This case
> also leads to the same situation with event_idx.
>
> The problem is that once we enter this situation busyloop does not work
> under heavy load for considerable amount of time, because notification
> is likely to happen during busyloop and handle_tx() immediately enables
> notification after notification happens. Specifically busyloop detects
> notification by vhost_has_work() and then handle_tx() calls
> vhost_enable_notify(). Because the detected work was the tx work, it
> enters handle_tx(), and enters busyloop without suppression again.
> This is likely to be repeated, so with event_idx we are almost not able
> to suppress notification in this case.
>
> To fix this, poll the work instead of enabling notification when
> busypoll is interrupted by something. IMHO vhost_has_work() is kind of
> interruption rather than a signal to completely cancel the busypoll, so
> let's run busypoll after the necessary work is done.
>
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> ---
>   drivers/vhost/net.c | 35 ++++++++++++++++++++++-------------
>   1 file changed, 22 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 3939c50..811c0e5 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -396,13 +396,10 @@ static inline unsigned long busy_clock(void)
>   	return local_clock() >> 10;
>   }
>   
> -static bool vhost_can_busy_poll(struct vhost_dev *dev,
> -				unsigned long endtime)
> +static bool vhost_can_busy_poll(unsigned long endtime)
>   {
> -	return likely(!need_resched()) &&
> -	       likely(!time_after(busy_clock(), endtime)) &&
> -	       likely(!signal_pending(current)) &&
> -	       !vhost_has_work(dev);
> +	return likely(!need_resched() && !time_after(busy_clock(), endtime) &&
> +		      !signal_pending(current));
>   }
>   
>   static void vhost_net_disable_vq(struct vhost_net *n,
> @@ -434,7 +431,8 @@ static int vhost_net_enable_vq(struct vhost_net *n,
>   static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
>   				    struct vhost_virtqueue *vq,
>   				    struct iovec iov[], unsigned int iov_size,
> -				    unsigned int *out_num, unsigned int *in_num)
> +				    unsigned int *out_num, unsigned int *in_num,
> +				    bool *busyloop_intr)
>   {
>   	unsigned long uninitialized_var(endtime);
>   	int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> @@ -443,9 +441,15 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
>   	if (r == vq->num && vq->busyloop_timeout) {
>   		preempt_disable();
>   		endtime = busy_clock() + vq->busyloop_timeout;
> -		while (vhost_can_busy_poll(vq->dev, endtime) &&
> -		       vhost_vq_avail_empty(vq->dev, vq))
> +		while (vhost_can_busy_poll(endtime)) {
> +			if (vhost_has_work(vq->dev)) {
> +				*busyloop_intr = true;
> +				break;
> +			}
> +			if (!vhost_vq_avail_empty(vq->dev, vq))
> +				break;
>   			cpu_relax();
> +		}
>   		preempt_enable();
>   		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
>   				      out_num, in_num, NULL, NULL);
> @@ -501,20 +505,24 @@ static void handle_tx(struct vhost_net *net)
>   	zcopy = nvq->ubufs;
>   
>   	for (;;) {
> +		bool busyloop_intr;
> +
>   		/* Release DMAs done buffers first */
>   		if (zcopy)
>   			vhost_zerocopy_signal_used(net, vq);
>   
> -
> +		busyloop_intr = false;
>   		head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
>   						ARRAY_SIZE(vq->iov),
> -						&out, &in);
> +						&out, &in, &busyloop_intr);
>   		/* On error, stop handling until the next kick. */
>   		if (unlikely(head < 0))
>   			break;
>   		/* Nothing new?  Wait for eventfd to tell us they refilled. */
>   		if (head == vq->num) {
> -			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> +			if (unlikely(busyloop_intr)) {
> +				vhost_poll_queue(&vq->poll);
> +			} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>   				vhost_disable_notify(&net->dev, vq);
>   				continue;
>   			}
> @@ -663,7 +671,8 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
>   		preempt_disable();
>   		endtime = busy_clock() + tvq->busyloop_timeout;
>   
> -		while (vhost_can_busy_poll(&net->dev, endtime) &&
> +		while (vhost_can_busy_poll(endtime) &&
> +		       !vhost_has_work(&net->dev) &&
>   		       !sk_has_rx_data(sk) &&
>   		       vhost_vq_avail_empty(&net->dev, tvq))
>   			cpu_relax();

Acked-by: Jason Wang <jasowang@redhat.com>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v2 net-next 1/4] vhost_net: Rename local variables in vhost_net_rx_peek_head_len
From: Jason Wang @ 2018-07-03  9:03 UTC (permalink / raw)
  To: Toshiaki Makita, David S. Miller, Michael S. Tsirkin
  Cc: netdev, kvm, virtualization
In-Reply-To: <1530603094-2476-2-git-send-email-makita.toshiaki@lab.ntt.co.jp>



On 2018年07月03日 15:31, Toshiaki Makita wrote:
> So we can easily see which variable is for which, tx or rx.
>
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> ---
>   drivers/vhost/net.c | 34 +++++++++++++++++-----------------
>   1 file changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 29756d8..3939c50 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -647,39 +647,39 @@ static void vhost_rx_signal_used(struct vhost_net_virtqueue *nvq)
>   
>   static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
>   {
> -	struct vhost_net_virtqueue *rvq = &net->vqs[VHOST_NET_VQ_RX];
> -	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> -	struct vhost_virtqueue *vq = &nvq->vq;
> +	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
> +	struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
> +	struct vhost_virtqueue *tvq = &tnvq->vq;
>   	unsigned long uninitialized_var(endtime);
> -	int len = peek_head_len(rvq, sk);
> +	int len = peek_head_len(rnvq, sk);
>   
> -	if (!len && vq->busyloop_timeout) {
> +	if (!len && tvq->busyloop_timeout) {
>   		/* Flush batched heads first */
> -		vhost_rx_signal_used(rvq);
> +		vhost_rx_signal_used(rnvq);
>   		/* Both tx vq and rx socket were polled here */
> -		mutex_lock_nested(&vq->mutex, 1);
> -		vhost_disable_notify(&net->dev, vq);
> +		mutex_lock_nested(&tvq->mutex, 1);
> +		vhost_disable_notify(&net->dev, tvq);
>   
>   		preempt_disable();
> -		endtime = busy_clock() + vq->busyloop_timeout;
> +		endtime = busy_clock() + tvq->busyloop_timeout;
>   
>   		while (vhost_can_busy_poll(&net->dev, endtime) &&
>   		       !sk_has_rx_data(sk) &&
> -		       vhost_vq_avail_empty(&net->dev, vq))
> +		       vhost_vq_avail_empty(&net->dev, tvq))
>   			cpu_relax();
>   
>   		preempt_enable();
>   
> -		if (!vhost_vq_avail_empty(&net->dev, vq))
> -			vhost_poll_queue(&vq->poll);
> -		else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> -			vhost_disable_notify(&net->dev, vq);
> -			vhost_poll_queue(&vq->poll);
> +		if (!vhost_vq_avail_empty(&net->dev, tvq)) {
> +			vhost_poll_queue(&tvq->poll);
> +		} else if (unlikely(vhost_enable_notify(&net->dev, tvq))) {
> +			vhost_disable_notify(&net->dev, tvq);
> +			vhost_poll_queue(&tvq->poll);
>   		}
>   
> -		mutex_unlock(&vq->mutex);
> +		mutex_unlock(&tvq->mutex);
>   
> -		len = peek_head_len(rvq, sk);
> +		len = peek_head_len(rnvq, sk);
>   	}
>   
>   	return len;

Acked-by: Jason Wang <jasowang@redhat.com>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox