* Re: [virtio-dev] [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets [not found] ` <20220901021038.84751-3-gavinl@nvidia.com> @ 2022-09-07 2:17 ` Jason Wang 2022-09-07 5:31 ` Michael S. Tsirkin 2022-09-22 9:35 ` Michael S. Tsirkin 2 siblings, 0 replies; 29+ messages in thread From: Jason Wang @ 2022-09-07 2:17 UTC (permalink / raw) To: Gavin Li, stephen, davem, jesse.brandeburg, alexander.h.duyck, kuba, sridhar.samudrala, loseweigh, netdev, virtualization, virtio-dev, mst Cc: gavi 在 2022/9/1 10:10, Gavin Li 写道: > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big > packets even when GUEST_* offloads are not present on the device. > However, if guest GSO is not supported, it would be sufficient to > allocate segments to cover just up the MTU size and no further. > Allocating the maximum amount of segments results in a large waste of > buffer space in the queue, which limits the number of packets that can > be buffered and can result in reduced performance. > > Therefore, if guest GSO is not supported, use the MTU to calculate the > optimal amount of segments required. > > When guest offload is enabled at runtime, RQ already has packets of bytes > less than 64K. So when packet of 64KB arrives, all the packets of such > size will be dropped. and RQ is now not usable. > > So this means that during set_guest_offloads() phase, RQs have to be > destroyed and recreated, which requires almost driver reload. > > If VIRTIO_NET_F_CTRL_GUEST_OFFLOADS has been negotiated, then it should > always treat them as GSO enabled. > > Accordingly, for now the assumption is that if guest GSO has been > negotiated then it has been enabled, even if it's actually been disabled > at runtime through VIRTIO_NET_F_CTRL_GUEST_OFFLOADS. Nit: Actually, it's not the assumption but the behavior of the codes itself. Since we don't try to change guest offloading in probe so it's ok to check GSO via negotiated features? Thanks > > Below is the iperf TCP test results over a Mellanox NIC, using vDPA for > 1 VQ, queue size 1024, before and after the change, with the iperf > server running over the virtio-net interface. > > MTU(Bytes)/Bandwidth (Gbit/s) > Before After > 1500 22.5 22.4 > 9000 12.8 25.9 > > Signed-off-by: Gavin Li <gavinl@nvidia.com> > Reviewed-by: Gavi Teitz <gavi@nvidia.com> > Reviewed-by: Parav Pandit <parav@nvidia.com> > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> > --- > changelog: > v4->v5 > - Addressed comments from Michael S. Tsirkin > - Improve commit message > v3->v4 > - Addressed comments from Si-Wei > - Rename big_packets_sg_num with big_packets_num_skbfrags > v2->v3 > - Addressed comments from Si-Wei > - Simplify the condition check to enable the optimization > v1->v2 > - Addressed comments from Jason, Michael, Si-Wei. > - Remove the flag of guest GSO support, set sg_num for big packets and > use it directly > - Recalculate sg_num for big packets in virtnet_set_guest_offloads > - Replace the round up algorithm with DIV_ROUND_UP > --- > drivers/net/virtio_net.c | 37 ++++++++++++++++++++++++------------- > 1 file changed, 24 insertions(+), 13 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index f831a0290998..dbffd5f56fb8 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -225,6 +225,9 @@ struct virtnet_info { > /* I like... big packets and I cannot lie! 
*/ > bool big_packets; > > + /* number of sg entries allocated for big packets */ > + unsigned int big_packets_num_skbfrags; > + > /* Host will merge rx buffers for big packets (shake it! shake it!) */ > bool mergeable_rx_bufs; > > @@ -1331,10 +1334,10 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq, > char *p; > int i, err, offset; > > - sg_init_table(rq->sg, MAX_SKB_FRAGS + 2); > + sg_init_table(rq->sg, vi->big_packets_num_skbfrags + 2); > > - /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */ > - for (i = MAX_SKB_FRAGS + 1; i > 1; --i) { > + /* page in rq->sg[vi->big_packets_num_skbfrags + 1] is list tail */ > + for (i = vi->big_packets_num_skbfrags + 1; i > 1; --i) { > first = get_a_page(rq, gfp); > if (!first) { > if (list) > @@ -1365,7 +1368,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq, > > /* chain first in list head */ > first->private = (unsigned long)list; > - err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2, > + err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2, > first, gfp); > if (err < 0) > give_pages(rq, first); > @@ -3690,13 +3693,27 @@ static bool virtnet_check_guest_gso(const struct virtnet_info *vi) > virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO); > } > > +static void virtnet_set_big_packets_fields(struct virtnet_info *vi, const int mtu) > +{ > + bool guest_gso = virtnet_check_guest_gso(vi); > + > + /* If device can receive ANY guest GSO packets, regardless of mtu, > + * allocate packets of maximum size, otherwise limit it to only > + * mtu size worth only. > + */ > + if (mtu > ETH_DATA_LEN || guest_gso) { > + vi->big_packets = true; > + vi->big_packets_num_skbfrags = guest_gso ? MAX_SKB_FRAGS : DIV_ROUND_UP(mtu, PAGE_SIZE); > + } > +} > + > static int virtnet_probe(struct virtio_device *vdev) > { > int i, err = -ENOMEM; > struct net_device *dev; > struct virtnet_info *vi; > u16 max_queue_pairs; > - int mtu; > + int mtu = 0; > > /* Find if host supports multiqueue/rss virtio_net device */ > max_queue_pairs = 1; > @@ -3784,10 +3801,6 @@ static int virtnet_probe(struct virtio_device *vdev) > INIT_WORK(&vi->config_work, virtnet_config_changed_work); > spin_lock_init(&vi->refill_lock); > > - /* If we can receive ANY GSO packets, we must allocate large ones. */ > - if (virtnet_check_guest_gso(vi)) > - vi->big_packets = true; > - > if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) > vi->mergeable_rx_bufs = true; > > @@ -3853,12 +3866,10 @@ static int virtnet_probe(struct virtio_device *vdev) > > dev->mtu = mtu; > dev->max_mtu = mtu; > - > - /* TODO: size buffers correctly in this case. */ > - if (dev->mtu > ETH_DATA_LEN) > - vi->big_packets = true; > } > > + virtnet_set_big_packets_fields(vi, mtu); > + > if (vi->any_header_sg) > dev->needed_headroom = vi->hdr_len; > _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
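For readers skimming the diff above, here is a minimal user-space sketch of the sizing rule the patch introduces. It is not kernel code; the constants assume 4 KiB pages and the common MAX_SKB_FRAGS value of 17, so exact numbers may differ on other configurations.

    #include <stdio.h>

    #define PAGE_SIZE     4096   /* assumption: 4 KiB pages */
    #define ETH_DATA_LEN  1500
    #define MAX_SKB_FRAGS 17     /* assumption: typical value with 4 KiB pages */
    #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

    /* Mirrors virtnet_set_big_packets_fields(): keep the worst-case sizing
     * only when guest GSO is negotiated, otherwise size each big-packet
     * receive buffer from the MTU.
     */
    static unsigned int big_packet_frags(int mtu, int guest_gso)
    {
            if (!(mtu > ETH_DATA_LEN || guest_gso))
                    return 0;    /* big packets are not used at all */
            return guest_gso ? MAX_SKB_FRAGS : DIV_ROUND_UP(mtu, PAGE_SIZE);
    }

    int main(void)
    {
            printf("mtu 9000, no guest GSO: %u pages per buffer (was %u)\n",
                   big_packet_frags(9000, 0), MAX_SKB_FRAGS);
            printf("mtu 9000, guest GSO on: %u pages per buffer\n",
                   big_packet_frags(9000, 1));
            printf("mtu 1500, no guest GSO: %u (big packets unused)\n",
                   big_packet_frags(1500, 0));
            return 0;
    }

With a 9000-byte MTU and no guest GSO, each posted buffer drops from 17 chained pages to 3 (plus the two extra sg entries that add_recvbuf_big() uses for the header and the rest of the first page), which is where the queue-depth gain discussed later in the thread comes from.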
* Re: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets [not found] ` <20220901021038.84751-3-gavinl@nvidia.com> 2022-09-07 2:17 ` [virtio-dev] [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets Jason Wang @ 2022-09-07 5:31 ` Michael S. Tsirkin [not found] ` <0355d1e4-a3cf-5b16-8c7f-b39b1ec14ade@nvidia.com> 2022-09-22 9:35 ` Michael S. Tsirkin 2 siblings, 1 reply; 29+ messages in thread From: Michael S. Tsirkin @ 2022-09-07 5:31 UTC (permalink / raw) To: Gavin Li Cc: alexander.h.duyck, virtio-dev, sridhar.samudrala, jesse.brandeburg, gavi, virtualization, stephen, loseweigh, netdev, kuba, davem On Thu, Sep 01, 2022 at 05:10:38AM +0300, Gavin Li wrote: > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big > packets even when GUEST_* offloads are not present on the device. > However, if guest GSO is not supported, it would be sufficient to > allocate segments to cover just up the MTU size and no further. > Allocating the maximum amount of segments results in a large waste of > buffer space in the queue, which limits the number of packets that can > be buffered and can result in reduced performance. > > Therefore, if guest GSO is not supported, use the MTU to calculate the > optimal amount of segments required. > > When guest offload is enabled at runtime, RQ already has packets of bytes > less than 64K. So when packet of 64KB arrives, all the packets of such > size will be dropped. and RQ is now not usable. > > So this means that during set_guest_offloads() phase, RQs have to be > destroyed and recreated, which requires almost driver reload. > > If VIRTIO_NET_F_CTRL_GUEST_OFFLOADS has been negotiated, then it should > always treat them as GSO enabled. > > Accordingly, for now the assumption is that if guest GSO has been > negotiated then it has been enabled, even if it's actually been disabled > at runtime through VIRTIO_NET_F_CTRL_GUEST_OFFLOADS. > > Below is the iperf TCP test results over a Mellanox NIC, using vDPA for > 1 VQ, queue size 1024, before and after the change, with the iperf > server running over the virtio-net interface. > > MTU(Bytes)/Bandwidth (Gbit/s) > Before After > 1500 22.5 22.4 > 9000 12.8 25.9 > > Signed-off-by: Gavin Li <gavinl@nvidia.com> > Reviewed-by: Gavi Teitz <gavi@nvidia.com> > Reviewed-by: Parav Pandit <parav@nvidia.com> > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Which configurations were tested? Did you test devices without VIRTIO_NET_F_MTU ? > --- > changelog: > v4->v5 > - Addressed comments from Michael S. Tsirkin > - Improve commit message > v3->v4 > - Addressed comments from Si-Wei > - Rename big_packets_sg_num with big_packets_num_skbfrags > v2->v3 > - Addressed comments from Si-Wei > - Simplify the condition check to enable the optimization > v1->v2 > - Addressed comments from Jason, Michael, Si-Wei. > - Remove the flag of guest GSO support, set sg_num for big packets and > use it directly > - Recalculate sg_num for big packets in virtnet_set_guest_offloads > - Replace the round up algorithm with DIV_ROUND_UP > --- > drivers/net/virtio_net.c | 37 ++++++++++++++++++++++++------------- > 1 file changed, 24 insertions(+), 13 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index f831a0290998..dbffd5f56fb8 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -225,6 +225,9 @@ struct virtnet_info { > /* I like... big packets and I cannot lie! 
*/ > bool big_packets; > > + /* number of sg entries allocated for big packets */ > + unsigned int big_packets_num_skbfrags; > + > /* Host will merge rx buffers for big packets (shake it! shake it!) */ > bool mergeable_rx_bufs; > > @@ -1331,10 +1334,10 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq, > char *p; > int i, err, offset; > > - sg_init_table(rq->sg, MAX_SKB_FRAGS + 2); > + sg_init_table(rq->sg, vi->big_packets_num_skbfrags + 2); > > - /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */ > - for (i = MAX_SKB_FRAGS + 1; i > 1; --i) { > + /* page in rq->sg[vi->big_packets_num_skbfrags + 1] is list tail */ > + for (i = vi->big_packets_num_skbfrags + 1; i > 1; --i) { > first = get_a_page(rq, gfp); > if (!first) { > if (list) > @@ -1365,7 +1368,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq, > > /* chain first in list head */ > first->private = (unsigned long)list; > - err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2, > + err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2, > first, gfp); > if (err < 0) > give_pages(rq, first); > @@ -3690,13 +3693,27 @@ static bool virtnet_check_guest_gso(const struct virtnet_info *vi) > virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO); > } > > +static void virtnet_set_big_packets_fields(struct virtnet_info *vi, const int mtu) I'd rename this to just virtnet_set_big_packets. > +{ > + bool guest_gso = virtnet_check_guest_gso(vi); > + > + /* If device can receive ANY guest GSO packets, regardless of mtu, > + * allocate packets of maximum size, otherwise limit it to only > + * mtu size worth only. > + */ > + if (mtu > ETH_DATA_LEN || guest_gso) { > + vi->big_packets = true; > + vi->big_packets_num_skbfrags = guest_gso ? MAX_SKB_FRAGS : DIV_ROUND_UP(mtu, PAGE_SIZE); > + } > +} > + > static int virtnet_probe(struct virtio_device *vdev) > { > int i, err = -ENOMEM; > struct net_device *dev; > struct virtnet_info *vi; > u16 max_queue_pairs; > - int mtu; > + int mtu = 0; > > /* Find if host supports multiqueue/rss virtio_net device */ > max_queue_pairs = 1; > @@ -3784,10 +3801,6 @@ static int virtnet_probe(struct virtio_device *vdev) > INIT_WORK(&vi->config_work, virtnet_config_changed_work); > spin_lock_init(&vi->refill_lock); > > - /* If we can receive ANY GSO packets, we must allocate large ones. */ > - if (virtnet_check_guest_gso(vi)) > - vi->big_packets = true; > - > if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) > vi->mergeable_rx_bufs = true; > > @@ -3853,12 +3866,10 @@ static int virtnet_probe(struct virtio_device *vdev) > > dev->mtu = mtu; > dev->max_mtu = mtu; > - > - /* TODO: size buffers correctly in this case. */ > - if (dev->mtu > ETH_DATA_LEN) > - vi->big_packets = true; > } > > + virtnet_set_big_packets_fields(vi, mtu); > + If VIRTIO_NET_F_MTU is off, then mtu is uninitialized. You should move it to within if () above to fix. > if (vi->any_header_sg) > dev->needed_headroom = vi->hdr_len; > > -- > 2.31.1 _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
[parent not found: <0355d1e4-a3cf-5b16-8c7f-b39b1ec14ade@nvidia.com>]
* Re: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets [not found] ` <0355d1e4-a3cf-5b16-8c7f-b39b1ec14ade@nvidia.com> @ 2022-09-07 9:26 ` Michael S. Tsirkin 2022-09-07 14:08 ` Parav Pandit via Virtualization 0 siblings, 1 reply; 29+ messages in thread From: Michael S. Tsirkin @ 2022-09-07 9:26 UTC (permalink / raw) To: Gavin Li Cc: alexander.h.duyck, virtio-dev, sridhar.samudrala, jesse.brandeburg, gavi, virtualization, stephen, loseweigh, netdev, kuba, davem On Wed, Sep 07, 2022 at 04:08:54PM +0800, Gavin Li wrote: > > On 9/7/2022 1:31 PM, Michael S. Tsirkin wrote: > > External email: Use caution opening links or attachments > > > > > > On Thu, Sep 01, 2022 at 05:10:38AM +0300, Gavin Li wrote: > > > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big > > > packets even when GUEST_* offloads are not present on the device. > > > However, if guest GSO is not supported, it would be sufficient to > > > allocate segments to cover just up the MTU size and no further. > > > Allocating the maximum amount of segments results in a large waste of > > > buffer space in the queue, which limits the number of packets that can > > > be buffered and can result in reduced performance. actually how does this waste space? Is this because your device does not have INDIRECT? > > > > > > Therefore, if guest GSO is not supported, use the MTU to calculate the > > > optimal amount of segments required. > > > > > > When guest offload is enabled at runtime, RQ already has packets of bytes > > > less than 64K. So when packet of 64KB arrives, all the packets of such > > > size will be dropped. and RQ is now not usable. > > > > > > So this means that during set_guest_offloads() phase, RQs have to be > > > destroyed and recreated, which requires almost driver reload. > > > > > > If VIRTIO_NET_F_CTRL_GUEST_OFFLOADS has been negotiated, then it should > > > always treat them as GSO enabled. > > > > > > Accordingly, for now the assumption is that if guest GSO has been > > > negotiated then it has been enabled, even if it's actually been disabled > > > at runtime through VIRTIO_NET_F_CTRL_GUEST_OFFLOADS. > > > > > > Below is the iperf TCP test results over a Mellanox NIC, using vDPA for > > > 1 VQ, queue size 1024, before and after the change, with the iperf > > > server running over the virtio-net interface. > > > > > > MTU(Bytes)/Bandwidth (Gbit/s) > > > Before After > > > 1500 22.5 22.4 > > > 9000 12.8 25.9 is this buffer space? just the overhead of allocating/freeing the buffers? of using INDIRECT? > > > > > > Signed-off-by: Gavin Li <gavinl@nvidia.com> > > > Reviewed-by: Gavi Teitz <gavi@nvidia.com> > > > Reviewed-by: Parav Pandit <parav@nvidia.com> > > > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> > > > > Which configurations were tested? > I tested it with DPDK vDPA + qemu vhost. Do you mean the feature set of the > VM? yes > > Did you test devices without VIRTIO_NET_F_MTU ? > No. It will need code changes. Testing to make sure nothing broke should not require changes. > > > > > --- > > > changelog: > > > v4->v5 > > > - Addressed comments from Michael S. Tsirkin > > > - Improve commit message > > > v3->v4 > > > - Addressed comments from Si-Wei > > > - Rename big_packets_sg_num with big_packets_num_skbfrags > > > v2->v3 > > > - Addressed comments from Si-Wei > > > - Simplify the condition check to enable the optimization > > > v1->v2 > > > - Addressed comments from Jason, Michael, Si-Wei. 
> > > - Remove the flag of guest GSO support, set sg_num for big packets and > > > use it directly > > > - Recalculate sg_num for big packets in virtnet_set_guest_offloads > > > - Replace the round up algorithm with DIV_ROUND_UP > > > --- > > > drivers/net/virtio_net.c | 37 ++++++++++++++++++++++++------------- > > > 1 file changed, 24 insertions(+), 13 deletions(-) > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > index f831a0290998..dbffd5f56fb8 100644 > > > --- a/drivers/net/virtio_net.c > > > +++ b/drivers/net/virtio_net.c > > > @@ -225,6 +225,9 @@ struct virtnet_info { > > > /* I like... big packets and I cannot lie! */ > > > bool big_packets; > > > > > > + /* number of sg entries allocated for big packets */ > > > + unsigned int big_packets_num_skbfrags; > > > + > > > /* Host will merge rx buffers for big packets (shake it! shake it!) */ > > > bool mergeable_rx_bufs; > > > > > > @@ -1331,10 +1334,10 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq, > > > char *p; > > > int i, err, offset; > > > > > > - sg_init_table(rq->sg, MAX_SKB_FRAGS + 2); > > > + sg_init_table(rq->sg, vi->big_packets_num_skbfrags + 2); > > > > > > - /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */ > > > - for (i = MAX_SKB_FRAGS + 1; i > 1; --i) { > > > + /* page in rq->sg[vi->big_packets_num_skbfrags + 1] is list tail */ > > > + for (i = vi->big_packets_num_skbfrags + 1; i > 1; --i) { > > > first = get_a_page(rq, gfp); > > > if (!first) { > > > if (list) > > > @@ -1365,7 +1368,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq, > > > > > > /* chain first in list head */ > > > first->private = (unsigned long)list; > > > - err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2, > > > + err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2, > > > first, gfp); > > > if (err < 0) > > > give_pages(rq, first); > > > @@ -3690,13 +3693,27 @@ static bool virtnet_check_guest_gso(const struct virtnet_info *vi) > > > virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO); > > > } > > > > > > +static void virtnet_set_big_packets_fields(struct virtnet_info *vi, const int mtu) > > > > I'd rename this to just virtnet_set_big_packets. > ACK > > > > > > > +{ > > > + bool guest_gso = virtnet_check_guest_gso(vi); > > > + > > > + /* If device can receive ANY guest GSO packets, regardless of mtu, > > > + * allocate packets of maximum size, otherwise limit it to only > > > + * mtu size worth only. > > > + */ > > > + if (mtu > ETH_DATA_LEN || guest_gso) { > > > + vi->big_packets = true; > > > + vi->big_packets_num_skbfrags = guest_gso ? MAX_SKB_FRAGS : DIV_ROUND_UP(mtu, PAGE_SIZE); > > > + } > > > +} > > > + > > > static int virtnet_probe(struct virtio_device *vdev) > > > { > > > int i, err = -ENOMEM; > > > struct net_device *dev; > > > struct virtnet_info *vi; > > > u16 max_queue_pairs; > > > - int mtu; > > > + int mtu = 0; > > > > > > /* Find if host supports multiqueue/rss virtio_net device */ > > > max_queue_pairs = 1; > > > @@ -3784,10 +3801,6 @@ static int virtnet_probe(struct virtio_device *vdev) > > > INIT_WORK(&vi->config_work, virtnet_config_changed_work); > > > spin_lock_init(&vi->refill_lock); > > > > > > - /* If we can receive ANY GSO packets, we must allocate large ones. 
*/ > > > - if (virtnet_check_guest_gso(vi)) > > > - vi->big_packets = true; > > > - > > > if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) > > > vi->mergeable_rx_bufs = true; > > > > > > @@ -3853,12 +3866,10 @@ static int virtnet_probe(struct virtio_device *vdev) > > > > > > dev->mtu = mtu; > > > dev->max_mtu = mtu; > > > - > > > - /* TODO: size buffers correctly in this case. */ > > > - if (dev->mtu > ETH_DATA_LEN) > > > - vi->big_packets = true; > > > } > > > > > > + virtnet_set_big_packets_fields(vi, mtu); > > > + > > If VIRTIO_NET_F_MTU is off, then mtu is uninitialized. > > You should move it to within if () above to fix. > mtu was initialized to 0 at the beginning of probe if VIRTIO_NET_F_MTU is > off. > > In this case, big_packets_num_skbfrags will be set according to guest gso. > > If guest gso is supported, it will be set to MAX_SKB_FRAGS else zero---- do > you > > think this is a bug to be fixed? yes I think with no mtu this should behave as it did historically. > > > > > if (vi->any_header_sg) > > > dev->needed_headroom = vi->hdr_len; > > > > > > -- > > > 2.31.1 _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 9:26 ` Michael S. Tsirkin @ 2022-09-07 14:08 ` Parav Pandit via Virtualization 2022-09-07 14:29 ` Michael S. Tsirkin 0 siblings, 1 reply; 29+ messages in thread From: Parav Pandit via Virtualization @ 2022-09-07 14:08 UTC (permalink / raw) To: Michael S. Tsirkin, Gavin Li Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Wednesday, September 7, 2022 5:27 AM > > On Wed, Sep 07, 2022 at 04:08:54PM +0800, Gavin Li wrote: > > > > On 9/7/2022 1:31 PM, Michael S. Tsirkin wrote: > > > External email: Use caution opening links or attachments > > > > > > > > > On Thu, Sep 01, 2022 at 05:10:38AM +0300, Gavin Li wrote: > > > > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for > > > > big packets even when GUEST_* offloads are not present on the > device. > > > > However, if guest GSO is not supported, it would be sufficient to > > > > allocate segments to cover just up the MTU size and no further. > > > > Allocating the maximum amount of segments results in a large waste > > > > of buffer space in the queue, which limits the number of packets > > > > that can be buffered and can result in reduced performance. > > actually how does this waste space? Is this because your device does not > have INDIRECT? VQ is 256 entries deep. Driver posted total of 256 descriptors. Each descriptor points to a page of 4K. These descriptors are chained as 4K * 16. So total packets that can be serviced are 256/16 = 16. So effective queue depth = 16. So, when GSO is off, for 9K mtu, packet buffer needed = 3 pages. (12k). So, 13 descriptors (= 13 x 4K =52K) per packet buffer is wasted. After this improvement, these 13 descriptors are available, increasing the effective queue depth = 256/3 = 85. [..] > > > > > > > > MTU(Bytes)/Bandwidth (Gbit/s) > > > > Before After > > > > 1500 22.5 22.4 > > > > 9000 12.8 25.9 > > > is this buffer space? Above performance numbers are showing improvement in bandwidth. In Gbps/sec. > just the overhead of allocating/freeing the buffers? > of using INDIRECT? The effective queue depth is so small, device cannot receive all the packets at given bw-delay product. > > > > > > Which configurations were tested? > > I tested it with DPDK vDPA + qemu vhost. Do you mean the feature set > > of the VM? > The configuration of interest is mtu, not the backend. Which is different mtu as shown in above perf numbers. > > > Did you test devices without VIRTIO_NET_F_MTU ? > > No. It will need code changes. No. It doesn't need any code changes. This is misleading/vague. This patch doesn't have any relation to a device which doesn't offer VIRTIO_NET_F_MTU. Just the code restructuring is touching this area, that may require some existing tests. I assume virtio tree will have some automation tests for such a device? > > > > > > > > @@ -3853,12 +3866,10 @@ static int virtnet_probe(struct > > > > virtio_device *vdev) > > > > > > > > dev->mtu = mtu; > > > > dev->max_mtu = mtu; > > > > - > > > > - /* TODO: size buffers correctly in this case. */ > > > > - if (dev->mtu > ETH_DATA_LEN) > > > > - vi->big_packets = true; > > > > } > > > > > > > > + virtnet_set_big_packets_fields(vi, mtu); > > > > + > > > If VIRTIO_NET_F_MTU is off, then mtu is uninitialized. 
> > > You should move it to within if () above to fix. > > mtu was initialized to 0 at the beginning of probe if VIRTIO_NET_F_MTU > > is off. > > > > In this case, big_packets_num_skbfrags will be set according to guest gso. > > > > If guest gso is supported, it will be set to MAX_SKB_FRAGS else > > zero---- do you > > > > think this is a bug to be fixed? > > > yes I think with no mtu this should behave as it did historically. > Michael is right. It should behave as today. There is no new bug introduced by this patch. dev->mtu and dev->max_mtu is set only when VIRTIO_NET_F_MTU is offered with/without this patch. Please have mtu related fix/change in different patch. > > > > > > > if (vi->any_header_sg) > > > > dev->needed_headroom = vi->hdr_len; > > > > > > > > -- > > > > 2.31.1 _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
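Restating the arithmetic above as a worked example (assuming a 256-entry ring without INDIRECT, 4 KiB pages, a 9000-byte MTU, and the approximate 16-descriptor chain used in the thread):

    before the patch: 256 descriptors / ~16 per buffer (16 x 4K = 64K) = ~16 buffers outstanding
    after the patch:  256 descriptors /   3 per buffer ( 3 x 4K = 12K) = ~85 buffers outstanding

This matches the point that, at the given bandwidth-delay product, the device simply runs out of posted receive buffers before the driver can repost them.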
* Re: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 14:08 ` Parav Pandit via Virtualization @ 2022-09-07 14:29 ` Michael S. Tsirkin 2022-09-07 14:33 ` Parav Pandit via Virtualization 0 siblings, 1 reply; 29+ messages in thread From: Michael S. Tsirkin @ 2022-09-07 14:29 UTC (permalink / raw) To: Parav Pandit Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li On Wed, Sep 07, 2022 at 02:08:18PM +0000, Parav Pandit wrote: > > > From: Michael S. Tsirkin <mst@redhat.com> > > Sent: Wednesday, September 7, 2022 5:27 AM > > > > On Wed, Sep 07, 2022 at 04:08:54PM +0800, Gavin Li wrote: > > > > > > On 9/7/2022 1:31 PM, Michael S. Tsirkin wrote: > > > > External email: Use caution opening links or attachments > > > > > > > > > > > > On Thu, Sep 01, 2022 at 05:10:38AM +0300, Gavin Li wrote: > > > > > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for > > > > > big packets even when GUEST_* offloads are not present on the > > device. > > > > > However, if guest GSO is not supported, it would be sufficient to > > > > > allocate segments to cover just up the MTU size and no further. > > > > > Allocating the maximum amount of segments results in a large waste > > > > > of buffer space in the queue, which limits the number of packets > > > > > that can be buffered and can result in reduced performance. > > > > actually how does this waste space? Is this because your device does not > > have INDIRECT? > VQ is 256 entries deep. > Driver posted total of 256 descriptors. > Each descriptor points to a page of 4K. > These descriptors are chained as 4K * 16. So without indirect then? with indirect each descriptor can point to 16 entries. > So total packets that can be serviced are 256/16 = 16. > So effective queue depth = 16. > > So, when GSO is off, for 9K mtu, packet buffer needed = 3 pages. (12k). > So, 13 descriptors (= 13 x 4K =52K) per packet buffer is wasted. > > After this improvement, these 13 descriptors are available, increasing the effective queue depth = 256/3 = 85. > > [..] > > > > > > > > > > MTU(Bytes)/Bandwidth (Gbit/s) > > > > > Before After > > > > > 1500 22.5 22.4 > > > > > 9000 12.8 25.9 > > > > > > is this buffer space? > Above performance numbers are showing improvement in bandwidth. In Gbps/sec. > > > just the overhead of allocating/freeing the buffers? > > of using INDIRECT? > The effective queue depth is so small, device cannot receive all the packets at given bw-delay product. > > > > > > > > > Which configurations were tested? > > > I tested it with DPDK vDPA + qemu vhost. Do you mean the feature set > > > of the VM? > > > The configuration of interest is mtu, not the backend. > Which is different mtu as shown in above perf numbers. > > > > > Did you test devices without VIRTIO_NET_F_MTU ? > > > No. It will need code changes. > No. It doesn't need any code changes. This is misleading/vague. > > This patch doesn't have any relation to a device which doesn't offer VIRTIO_NET_F_MTU. > Just the code restructuring is touching this area, that may require some existing tests. > I assume virtio tree will have some automation tests for such a device? I have some automated tests but I also expect developer to do testing. 
> > > > > > > > > > @@ -3853,12 +3866,10 @@ static int virtnet_probe(struct > > > > > virtio_device *vdev) > > > > > > > > > > dev->mtu = mtu; > > > > > dev->max_mtu = mtu; > > > > > - > > > > > - /* TODO: size buffers correctly in this case. */ > > > > > - if (dev->mtu > ETH_DATA_LEN) > > > > > - vi->big_packets = true; > > > > > } > > > > > > > > > > + virtnet_set_big_packets_fields(vi, mtu); > > > > > + > > > > If VIRTIO_NET_F_MTU is off, then mtu is uninitialized. > > > > You should move it to within if () above to fix. > > > mtu was initialized to 0 at the beginning of probe if VIRTIO_NET_F_MTU > > > is off. > > > > > > In this case, big_packets_num_skbfrags will be set according to guest gso. > > > > > > If guest gso is supported, it will be set to MAX_SKB_FRAGS else > > > zero---- do you > > > > > > think this is a bug to be fixed? > > > > > > yes I think with no mtu this should behave as it did historically. > > > Michael is right. > It should behave as today. There is no new bug introduced by this patch. > dev->mtu and dev->max_mtu is set only when VIRTIO_NET_F_MTU is offered with/without this patch. > > Please have mtu related fix/change in different patch. > > > > > > > > > > if (vi->any_header_sg) > > > > > dev->needed_headroom = vi->hdr_len; > > > > > > > > > > -- > > > > > 2.31.1 _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 14:29 ` Michael S. Tsirkin @ 2022-09-07 14:33 ` Parav Pandit via Virtualization 2022-09-07 14:40 ` Michael S. Tsirkin 0 siblings, 1 reply; 29+ messages in thread From: Parav Pandit via Virtualization @ 2022-09-07 14:33 UTC (permalink / raw) To: Michael S. Tsirkin Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Wednesday, September 7, 2022 10:30 AM [..] > > > actually how does this waste space? Is this because your device does > > > not have INDIRECT? > > VQ is 256 entries deep. > > Driver posted total of 256 descriptors. > > Each descriptor points to a page of 4K. > > These descriptors are chained as 4K * 16. > > So without indirect then? with indirect each descriptor can point to 16 > entries. > With indirect, can it post 256 * 16 size buffers even though vq depth is 256 entries? I recall that total number of descriptors with direct/indirect descriptors is limited to vq depth. Was there some recent clarification occurred in the spec to clarify this? _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 14:33 ` Parav Pandit via Virtualization @ 2022-09-07 14:40 ` Michael S. Tsirkin 2022-09-07 16:12 ` Parav Pandit via Virtualization 0 siblings, 1 reply; 29+ messages in thread From: Michael S. Tsirkin @ 2022-09-07 14:40 UTC (permalink / raw) To: Parav Pandit Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li On Wed, Sep 07, 2022 at 02:33:02PM +0000, Parav Pandit wrote: > > > From: Michael S. Tsirkin <mst@redhat.com> > > Sent: Wednesday, September 7, 2022 10:30 AM > > [..] > > > > actually how does this waste space? Is this because your device does > > > > not have INDIRECT? > > > VQ is 256 entries deep. > > > Driver posted total of 256 descriptors. > > > Each descriptor points to a page of 4K. > > > These descriptors are chained as 4K * 16. > > > > So without indirect then? with indirect each descriptor can point to 16 > > entries. > > > With indirect, can it post 256 * 16 size buffers even though vq depth is 256 entries? > I recall that total number of descriptors with direct/indirect descriptors is limited to vq depth. > Was there some recent clarification occurred in the spec to clarify this? This would make INDIRECT completely pointless. So I don't think we ever had such a limitation. The only thing that comes to mind is this: A driver MUST NOT create a descriptor chain longer than the Queue Size of the device. but this limits individual chain length not the total length of all chains. We have an open bug that we forgot to include this requirement in the packed ring documentation. -- MST _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 14:40 ` Michael S. Tsirkin @ 2022-09-07 16:12 ` Parav Pandit via Virtualization 2022-09-07 18:15 ` Michael S. Tsirkin 0 siblings, 1 reply; 29+ messages in thread From: Parav Pandit via Virtualization @ 2022-09-07 16:12 UTC (permalink / raw) To: Michael S. Tsirkin Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Wednesday, September 7, 2022 10:40 AM > > On Wed, Sep 07, 2022 at 02:33:02PM +0000, Parav Pandit wrote: > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > Sent: Wednesday, September 7, 2022 10:30 AM > > > > [..] > > > > > actually how does this waste space? Is this because your device > > > > > does not have INDIRECT? > > > > VQ is 256 entries deep. > > > > Driver posted total of 256 descriptors. > > > > Each descriptor points to a page of 4K. > > > > These descriptors are chained as 4K * 16. > > > > > > So without indirect then? with indirect each descriptor can point to > > > 16 entries. > > > > > With indirect, can it post 256 * 16 size buffers even though vq depth is 256 > entries? > > I recall that total number of descriptors with direct/indirect descriptors is > limited to vq depth. > > > > Was there some recent clarification occurred in the spec to clarify this? > > > This would make INDIRECT completely pointless. So I don't think we ever > had such a limitation. > The only thing that comes to mind is this: > > A driver MUST NOT create a descriptor chain longer than the Queue > Size of > the device. > > but this limits individual chain length not the total length of all chains. > Right. I double checked in virtqueue_add_split() which doesn't count table entries towards desc count of VQ for indirect case. With indirect descriptors without this patch the situation is even worse with memory usage. Driver will allocate 64K * 256 = 16MB buffer per VQ, while needed (and used) buffer is only 2.3 Mbytes. _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
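Spelling out the memory figures above under the same assumptions (256-entry RQ, 4 KiB pages, 9000-byte MTU); the last line is an illustrative estimate, not a number quoted in the thread:

    posted per RQ at 64K per buffer (e.g. INDIRECT, unpatched): 256 x 16 pages x 4 KiB ~= 16 MB
    actually needed for 9000-byte packets:                      256 x ~9 KB            ~= 2.3 MB
    posted per RQ with this patch (3 pages per buffer):         256 x 12 KiB           ~= 3 MB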
* Re: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 16:12 ` Parav Pandit via Virtualization @ 2022-09-07 18:15 ` Michael S. Tsirkin 2022-09-07 19:06 ` Parav Pandit via Virtualization 0 siblings, 1 reply; 29+ messages in thread From: Michael S. Tsirkin @ 2022-09-07 18:15 UTC (permalink / raw) To: Parav Pandit Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li On Wed, Sep 07, 2022 at 04:12:47PM +0000, Parav Pandit wrote: > > > From: Michael S. Tsirkin <mst@redhat.com> > > Sent: Wednesday, September 7, 2022 10:40 AM > > > > On Wed, Sep 07, 2022 at 02:33:02PM +0000, Parav Pandit wrote: > > > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > > Sent: Wednesday, September 7, 2022 10:30 AM > > > > > > [..] > > > > > > actually how does this waste space? Is this because your device > > > > > > does not have INDIRECT? > > > > > VQ is 256 entries deep. > > > > > Driver posted total of 256 descriptors. > > > > > Each descriptor points to a page of 4K. > > > > > These descriptors are chained as 4K * 16. > > > > > > > > So without indirect then? with indirect each descriptor can point to > > > > 16 entries. > > > > > > > With indirect, can it post 256 * 16 size buffers even though vq depth is 256 > > entries? > > > I recall that total number of descriptors with direct/indirect descriptors is > > limited to vq depth. > > > > > > > Was there some recent clarification occurred in the spec to clarify this? > > > > > > This would make INDIRECT completely pointless. So I don't think we ever > > had such a limitation. > > The only thing that comes to mind is this: > > > > A driver MUST NOT create a descriptor chain longer than the Queue > > Size of > > the device. > > > > but this limits individual chain length not the total length of all chains. > > > Right. > I double checked in virtqueue_add_split() which doesn't count table entries towards desc count of VQ for indirect case. > > With indirect descriptors without this patch the situation is even worse with memory usage. > Driver will allocate 64K * 256 = 16MB buffer per VQ, while needed (and used) buffer is only 2.3 Mbytes. Yes. So just so we understand the reason for the performance improvement is this because of memory usage? Or is this because device does not have INDIRECT? Thanks, -- MST _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 18:15 ` Michael S. Tsirkin @ 2022-09-07 19:06 ` Parav Pandit via Virtualization 2022-09-07 19:11 ` Michael S. Tsirkin 0 siblings, 1 reply; 29+ messages in thread From: Parav Pandit via Virtualization @ 2022-09-07 19:06 UTC (permalink / raw) To: Michael S. Tsirkin Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Wednesday, September 7, 2022 2:16 PM > > On Wed, Sep 07, 2022 at 04:12:47PM +0000, Parav Pandit wrote: > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > Sent: Wednesday, September 7, 2022 10:40 AM > > > > > > On Wed, Sep 07, 2022 at 02:33:02PM +0000, Parav Pandit wrote: > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > > > Sent: Wednesday, September 7, 2022 10:30 AM > > > > > > > > [..] > > > > > > > actually how does this waste space? Is this because your > > > > > > > device does not have INDIRECT? > > > > > > VQ is 256 entries deep. > > > > > > Driver posted total of 256 descriptors. > > > > > > Each descriptor points to a page of 4K. > > > > > > These descriptors are chained as 4K * 16. > > > > > > > > > > So without indirect then? with indirect each descriptor can > > > > > point to > > > > > 16 entries. > > > > > > > > > With indirect, can it post 256 * 16 size buffers even though vq > > > > depth is 256 > > > entries? > > > > I recall that total number of descriptors with direct/indirect > > > > descriptors is > > > limited to vq depth. > > > > > > > > > > Was there some recent clarification occurred in the spec to clarify this? > > > > > > > > > This would make INDIRECT completely pointless. So I don't think we > > > ever had such a limitation. > > > The only thing that comes to mind is this: > > > > > > A driver MUST NOT create a descriptor chain longer than the Queue > > > Size of > > > the device. > > > > > > but this limits individual chain length not the total length of all chains. > > > > > Right. > > I double checked in virtqueue_add_split() which doesn't count table > entries towards desc count of VQ for indirect case. > > > > With indirect descriptors without this patch the situation is even worse > with memory usage. > > Driver will allocate 64K * 256 = 16MB buffer per VQ, while needed (and > used) buffer is only 2.3 Mbytes. > > Yes. So just so we understand the reason for the performance improvement > is this because of memory usage? Or is this because device does not have > INDIRECT? Because of shallow queue of 16 entries deep. With driver turn around time to repost buffers, device is idle without any RQ buffers. With this improvement, device has 85 buffers instead of 16 to receive packets. Enabling indirect in device can help at cost of 7x higher memory per VQ in the guest VM. _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 19:06 ` Parav Pandit via Virtualization @ 2022-09-07 19:11 ` Michael S. Tsirkin 2022-09-07 19:18 ` Parav Pandit via Virtualization 0 siblings, 1 reply; 29+ messages in thread From: Michael S. Tsirkin @ 2022-09-07 19:11 UTC (permalink / raw) To: Parav Pandit Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li On Wed, Sep 07, 2022 at 07:06:09PM +0000, Parav Pandit wrote: > > From: Michael S. Tsirkin <mst@redhat.com> > > Sent: Wednesday, September 7, 2022 2:16 PM > > > > On Wed, Sep 07, 2022 at 04:12:47PM +0000, Parav Pandit wrote: > > > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > > Sent: Wednesday, September 7, 2022 10:40 AM > > > > > > > > On Wed, Sep 07, 2022 at 02:33:02PM +0000, Parav Pandit wrote: > > > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > > > > Sent: Wednesday, September 7, 2022 10:30 AM > > > > > > > > > > [..] > > > > > > > > actually how does this waste space? Is this because your > > > > > > > > device does not have INDIRECT? > > > > > > > VQ is 256 entries deep. > > > > > > > Driver posted total of 256 descriptors. > > > > > > > Each descriptor points to a page of 4K. > > > > > > > These descriptors are chained as 4K * 16. > > > > > > > > > > > > So without indirect then? with indirect each descriptor can > > > > > > point to > > > > > > 16 entries. > > > > > > > > > > > With indirect, can it post 256 * 16 size buffers even though vq > > > > > depth is 256 > > > > entries? > > > > > I recall that total number of descriptors with direct/indirect > > > > > descriptors is > > > > limited to vq depth. > > > > > > > > > > > > > Was there some recent clarification occurred in the spec to clarify this? > > > > > > > > > > > > This would make INDIRECT completely pointless. So I don't think we > > > > ever had such a limitation. > > > > The only thing that comes to mind is this: > > > > > > > > A driver MUST NOT create a descriptor chain longer than the Queue > > > > Size of > > > > the device. > > > > > > > > but this limits individual chain length not the total length of all chains. > > > > > > > Right. > > > I double checked in virtqueue_add_split() which doesn't count table > > entries towards desc count of VQ for indirect case. > > > > > > With indirect descriptors without this patch the situation is even worse > > with memory usage. > > > Driver will allocate 64K * 256 = 16MB buffer per VQ, while needed (and > > used) buffer is only 2.3 Mbytes. > > > > Yes. So just so we understand the reason for the performance improvement > > is this because of memory usage? Or is this because device does not have > > INDIRECT? > > Because of shallow queue of 16 entries deep. but why is the queue just 16 entries? does the device not support indirect? because with indirect you get 256 entries, with 16 s/g each. > With driver turn around time to repost buffers, device is idle without any RQ buffers. > With this improvement, device has 85 buffers instead of 16 to receive packets. > > Enabling indirect in device can help at cost of 7x higher memory per VQ in the guest VM. 
_______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
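To summarize the descriptor accounting being compared here, using the thread's own approximate numbers:

    without INDIRECT: 256 ring entries / ~16-descriptor chains  = ~16 buffers postable at once
    with INDIRECT:    256 ring entries x 1 indirect table each  = 256 buffers postable at once
                      (at ~64 KB per buffer, roughly the 16 MB per RQ estimated earlier)

Either way, sizing the buffers from the MTU reduces what each buffer consumes; the open question in the thread is whether the measured gain comes from the deeper effective queue, the smaller memory footprint, or both.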
* RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 19:11 ` Michael S. Tsirkin @ 2022-09-07 19:18 ` Parav Pandit via Virtualization 2022-09-07 19:23 ` Michael S. Tsirkin 0 siblings, 1 reply; 29+ messages in thread From: Parav Pandit via Virtualization @ 2022-09-07 19:18 UTC (permalink / raw) To: Michael S. Tsirkin Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Wednesday, September 7, 2022 3:12 PM > > Because of shallow queue of 16 entries deep. > > but why is the queue just 16 entries? I explained the calculation in [1] about 16 entries. [1] https://lore.kernel.org/netdev/PH0PR12MB54812EC7F4711C1EA4CAA119DC419@PH0PR12MB5481.namprd12.prod.outlook.com/ > does the device not support indirect? > Yes, indirect feature bit is disabled on the device. > because with indirect you get 256 entries, with 16 s/g each. > Sure. I explained below that indirect comes with 7x memory cost that is not desired. (Ignored the table memory allocation cost and extra latency). Hence don't want to enable indirect in this scenario. This optimization also works with indirect with smaller indirect table. > > > With driver turn around time to repost buffers, device is idle without any > RQ buffers. > > With this improvement, device has 85 buffers instead of 16 to receive > packets. > > > > Enabling indirect in device can help at cost of 7x higher memory per VQ in > the guest VM. _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 19:18 ` Parav Pandit via Virtualization @ 2022-09-07 19:23 ` Michael S. Tsirkin 2022-09-07 19:27 ` Parav Pandit via Virtualization 0 siblings, 1 reply; 29+ messages in thread From: Michael S. Tsirkin @ 2022-09-07 19:23 UTC (permalink / raw) To: Parav Pandit Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li On Wed, Sep 07, 2022 at 07:18:06PM +0000, Parav Pandit wrote: > > > From: Michael S. Tsirkin <mst@redhat.com> > > Sent: Wednesday, September 7, 2022 3:12 PM > > > > Because of shallow queue of 16 entries deep. > > > > but why is the queue just 16 entries? > I explained the calculation in [1] about 16 entries. > > [1] https://lore.kernel.org/netdev/PH0PR12MB54812EC7F4711C1EA4CAA119DC419@PH0PR12MB5481.namprd12.prod.outlook.com/ > > > does the device not support indirect? > > > Yes, indirect feature bit is disabled on the device. OK that explains it. > > because with indirect you get 256 entries, with 16 s/g each. > > > Sure. I explained below that indirect comes with 7x memory cost that is not desired. > (Ignored the table memory allocation cost and extra latency). Oh sure, it's a waste. I wonder what effect the patch has on bandwidth with indirect enabled, though. > Hence don't want to enable indirect in this scenario. > This optimization also works with indirect with smaller indirect table. > > > > > > With driver turn around time to repost buffers, device is idle without any > > RQ buffers. > > > With this improvement, device has 85 buffers instead of 16 to receive > > packets. > > > > > > Enabling indirect in device can help at cost of 7x higher memory per VQ in > > the guest VM. _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 19:23 ` Michael S. Tsirkin @ 2022-09-07 19:27 ` Parav Pandit via Virtualization 2022-09-07 19:36 ` Michael S. Tsirkin 0 siblings, 1 reply; 29+ messages in thread From: Parav Pandit via Virtualization @ 2022-09-07 19:27 UTC (permalink / raw) To: Michael S. Tsirkin Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Wednesday, September 7, 2022 3:24 PM > > On Wed, Sep 07, 2022 at 07:18:06PM +0000, Parav Pandit wrote: > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > Sent: Wednesday, September 7, 2022 3:12 PM > > > > > > Because of shallow queue of 16 entries deep. > > > > > > but why is the queue just 16 entries? > > I explained the calculation in [1] about 16 entries. > > > > [1] > > > https://lore.kernel.org/netdev/PH0PR12MB54812EC7F4711C1EA4CAA119DC > 419@ > > PH0PR12MB5481.namprd12.prod.outlook.com/ > > > > > does the device not support indirect? > > > > > Yes, indirect feature bit is disabled on the device. > > OK that explains it. So can we proceed with v6 to contain (a) updated commit message and (b) function name change you suggested to drop _fields suffix? _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 19:27 ` Parav Pandit via Virtualization @ 2022-09-07 19:36 ` Michael S. Tsirkin 2022-09-07 19:37 ` Michael S. Tsirkin ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Michael S. Tsirkin @ 2022-09-07 19:36 UTC (permalink / raw) To: Parav Pandit Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li On Wed, Sep 07, 2022 at 07:27:16PM +0000, Parav Pandit wrote: > > > From: Michael S. Tsirkin <mst@redhat.com> > > Sent: Wednesday, September 7, 2022 3:24 PM > > > > On Wed, Sep 07, 2022 at 07:18:06PM +0000, Parav Pandit wrote: > > > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > > Sent: Wednesday, September 7, 2022 3:12 PM > > > > > > > > Because of shallow queue of 16 entries deep. > > > > > > > > but why is the queue just 16 entries? > > > I explained the calculation in [1] about 16 entries. > > > > > > [1] > > > > > https://lore.kernel.org/netdev/PH0PR12MB54812EC7F4711C1EA4CAA119DC > > 419@ > > > PH0PR12MB5481.namprd12.prod.outlook.com/ > > > > > > > does the device not support indirect? > > > > > > > Yes, indirect feature bit is disabled on the device. > > > > OK that explains it. > > So can we proceed with v6 to contain > (a) updated commit message and > (b) function name change you suggested to drop _fields suffix? (c) replace mtu = 0 with sensibly not calling the function when mtu is unknown. And I'd like commit log to include results of perf testing - with indirect feature on - with mtu feature off just to make sure nothing breaks. -- MST _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
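One possible shape for item (c), shown only as a hedged sketch of the idea rather than the actual v6 change: call the (renamed, per item (b)) sizing helper only when the device actually reported an MTU, and otherwise keep the historical probe-time behaviour.

    /* Hypothetical sketch in virtnet_probe(), not the real follow-up patch. */
    if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
            /* ... existing MTU validation and dev->mtu/dev->max_mtu setup ... */
            virtnet_set_big_packets(vi, mtu);
    } else if (virtnet_check_guest_gso(vi)) {
            /* No device MTU: behave exactly as before this patch. */
            vi->big_packets = true;
            vi->big_packets_num_skbfrags = MAX_SKB_FRAGS;
    }

How the actual respin handles this is up to the author; the sketch only illustrates the "do not call the function when the mtu is unknown" requirement.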
* Re: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 19:36 ` Michael S. Tsirkin @ 2022-09-07 19:37 ` Michael S. Tsirkin 2022-09-07 19:54 ` Parav Pandit via Virtualization 2022-09-07 19:51 ` Parav Pandit via Virtualization 2022-09-07 20:04 ` Parav Pandit via Virtualization 2 siblings, 1 reply; 29+ messages in thread From: Michael S. Tsirkin @ 2022-09-07 19:37 UTC (permalink / raw) To: Parav Pandit Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li On Wed, Sep 07, 2022 at 03:36:16PM -0400, Michael S. Tsirkin wrote: > On Wed, Sep 07, 2022 at 07:27:16PM +0000, Parav Pandit wrote: > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > Sent: Wednesday, September 7, 2022 3:24 PM > > > > > > On Wed, Sep 07, 2022 at 07:18:06PM +0000, Parav Pandit wrote: > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > > > Sent: Wednesday, September 7, 2022 3:12 PM > > > > > > > > > > Because of shallow queue of 16 entries deep. > > > > > > > > > > but why is the queue just 16 entries? > > > > I explained the calculation in [1] about 16 entries. > > > > > > > > [1] > > > > > > > https://lore.kernel.org/netdev/PH0PR12MB54812EC7F4711C1EA4CAA119DC > > > 419@ > > > > PH0PR12MB5481.namprd12.prod.outlook.com/ > > > > > > > > > does the device not support indirect? > > > > > > > > > Yes, indirect feature bit is disabled on the device. > > > > > > OK that explains it. > > > > So can we proceed with v6 to contain > > (a) updated commit message and > > (b) function name change you suggested to drop _fields suffix? > > (c) replace mtu = 0 with sensibly not calling the function > when mtu is unknown. > > > And I'd like commit log to include results of perf testing > - with indirect feature on > - with mtu feature off > just to make sure nothing breaks. and if possible a larger ring. 1k? > -- > MST _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 19:37 ` Michael S. Tsirkin @ 2022-09-07 19:54 ` Parav Pandit via Virtualization 0 siblings, 0 replies; 29+ messages in thread From: Parav Pandit via Virtualization @ 2022-09-07 19:54 UTC (permalink / raw) To: Michael S. Tsirkin Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Wednesday, September 7, 2022 3:38 PM > > and if possible a larger ring. 1k? What do you expect to see here for which test report should be added to commit log? What is special about 1k vs 512, 128 and 2k? is 1K default for some configuration? _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 19:36 ` Michael S. Tsirkin 2022-09-07 19:37 ` Michael S. Tsirkin @ 2022-09-07 19:51 ` Parav Pandit via Virtualization 2022-09-07 21:39 ` [virtio-dev] " Si-Wei Liu 2022-09-22 9:26 ` Michael S. Tsirkin 2022-09-07 20:04 ` Parav Pandit via Virtualization 2 siblings, 2 replies; 29+ messages in thread From: Parav Pandit via Virtualization @ 2022-09-07 19:51 UTC (permalink / raw) To: Michael S. Tsirkin Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Wednesday, September 7, 2022 3:36 PM > > On Wed, Sep 07, 2022 at 07:27:16PM +0000, Parav Pandit wrote: > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > Sent: Wednesday, September 7, 2022 3:24 PM > > > > > > On Wed, Sep 07, 2022 at 07:18:06PM +0000, Parav Pandit wrote: > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > > > Sent: Wednesday, September 7, 2022 3:12 PM > > > > > > > > > > Because of shallow queue of 16 entries deep. > > > > > > > > > > but why is the queue just 16 entries? > > > > I explained the calculation in [1] about 16 entries. > > > > > > > > [1] > > > > > > > > https://lore.kernel.org/netdev/PH0PR12MB54812EC7F4711C1EA4CAA119DC > > > 419@ > > > > PH0PR12MB5481.namprd12.prod.outlook.com/ > > > > > > > > > does the device not support indirect? > > > > > > > > > Yes, indirect feature bit is disabled on the device. > > > > > > OK that explains it. > > > > So can we proceed with v6 to contain > > (a) updated commit message and > > (b) function name change you suggested to drop _fields suffix? > > (c) replace mtu = 0 with sensibly not calling the function when mtu is > unknown. > > > And I'd like commit log to include results of perf testing > - with indirect feature on Which device do you suggest using for this test? > - with mtu feature off Why is this needed when it is not touching the area of mtu being not offered? > just to make sure nothing breaks. Not sure why you demand this. Can you please share the link to other patches that ensured that nothing breaks, for example I didn't see a similar "test ask" in v14 series [1]? What is so special about current patch of interest vs [1] that requires this special testing details in commit log, and it is not required in [1] or past patches? Do you have link to the tests done with synchronization tests by commit [2]? This will help to define test matrix for developers and internal regression and similar report in all subsequent patches like [1]. [1] https://lore.kernel.org/bpf/20220801063902.129329-41-xuanzhuo@linux.alibaba.com/ [2] 6213f07cb54 _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [virtio-dev] RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 19:51 ` Parav Pandit via Virtualization @ 2022-09-07 21:39 ` Si-Wei Liu 2022-09-07 22:11 ` Parav Pandit via Virtualization 2022-09-22 9:26 ` Michael S. Tsirkin 1 sibling, 1 reply; 29+ messages in thread From: Si-Wei Liu @ 2022-09-07 21:39 UTC (permalink / raw) To: Parav Pandit, Michael S. Tsirkin Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li On 9/7/2022 12:51 PM, Parav Pandit wrote: >> And I'd like commit log to include results of perf testing >> - with indirect feature on > Which device do you suggest using for this test? > You may use software vhost-net backend with and without fix to compare. Since this driver fix effectively lowers down the buffer size for the indirect=on case as well, it's a natural request to make sure no perf degradation is seen on devices with indirect descriptor enabled. I don't expect degradation though and think this patch should improve efficiency with less memory foot print. Cheers, -Siwei _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
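Put differently, with INDIRECT on each posted buffer still occupies a single ring slot, so the gain from MTU-based sizing there is mostly memory footprint rather than queue depth. A rough estimate under the same assumptions as above (4 KiB pages, MAX_SKB_FRAGS of 17, MTU 9000, a 1024-entry RQ; illustrative numbers, not measurements):

    /* Pages pinned per big-packet buffer: num_skbfrags chained pages plus the
     * first page holding the header (see add_recvbuf_big()).
     *   GSO sizing: (17 + 1) * 4 KiB = 72 KiB per buffer -> ~72 MiB per 1024-deep RQ
     *   MTU sizing: ( 3 + 1) * 4 KiB = 16 KiB per buffer -> ~16 MiB per 1024-deep RQ
     */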
* RE: [virtio-dev] RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 21:39 ` [virtio-dev] " Si-Wei Liu @ 2022-09-07 22:11 ` Parav Pandit via Virtualization 2022-09-07 22:57 ` Si-Wei Liu 0 siblings, 1 reply; 29+ messages in thread From: Parav Pandit via Virtualization @ 2022-09-07 22:11 UTC (permalink / raw) To: Si-Wei Liu, Michael S. Tsirkin Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li > From: Si-Wei Liu <si-wei.liu@oracle.com> > Sent: Wednesday, September 7, 2022 5:40 PM > > > On 9/7/2022 12:51 PM, Parav Pandit wrote: > >> And I'd like commit log to include results of perf testing > >> - with indirect feature on > > Which device do you suggest using for this test? > > > You may use software vhost-net backend with and without fix to compare. > Since this driver fix effectively lowers down the buffer size for the > indirect=on case as well, Do you have sample example for this? > it's a natural request to make sure no perf > degradation is seen on devices with indirect descriptor enabled. I don't > expect degradation though and think this patch should improve efficiency > with less memory foot print. > Any specific reason to discount test for the packed vq here because the change applies to packed vq too? It is counter intuitive to see degradation with smaller size buffers. But sure, code reviews can miss things that can bring regression for which it should be tested. I am not against the test itself. It is good thing to do more test coverage. What is puzzling me is, I fail to see test results not available in previous commits and cover letters, which is making this patch special for test coverage. Or a new trend begins with this specific patch onwards? For example, I don’t see a test results in commit [1], [2], [3] to indicate that no degradation is seen which heavily touches the lock in core data path. So want to know test reporting guidelines in the commit logs for this and other patches. [1] commit 5a2f966d0f [2] commit a7766ef18b33 [3] commit 22bc63c58e876 _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [virtio-dev] RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 22:11 ` Parav Pandit via Virtualization @ 2022-09-07 22:57 ` Si-Wei Liu 0 siblings, 0 replies; 29+ messages in thread From: Si-Wei Liu @ 2022-09-07 22:57 UTC (permalink / raw) To: Parav Pandit, Michael S. Tsirkin Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li On 9/7/2022 3:11 PM, Parav Pandit wrote: >> From: Si-Wei Liu <si-wei.liu@oracle.com> >> Sent: Wednesday, September 7, 2022 5:40 PM >> >> >> On 9/7/2022 12:51 PM, Parav Pandit wrote: >>>> And I'd like commit log to include results of perf testing >>>> - with indirect feature on >>> Which device do you suggest using for this test? >>> >> You may use software vhost-net backend with and without fix to compare. >> Since this driver fix effectively lowers down the buffer size for the >> indirect=on case as well, > Do you have sample example for this? ip link add link ens300f1 name macvtap1 address $mac type macvtap mode bridge ifindex1=`cat /sys/class/net/macvtap1/ifindex` qemu-system-x86_64 \ ... -netdev type=tap,id=hnet1,vhost=on,fd=25,vhostfd=27 \ -device virtio-net-pci,netdev=hnet1,id=vnet1,mac=$mac,host_mtu=9000,mrg_rxbuf=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,indirect_desc=on \ 25<>/dev/tap${ifindex1} 27<>/dev/vhost-net indirect_desc=on is not actually required as it's on by default if not specified. -Siwei _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
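With a backend set up along those lines, the before/after comparison can be driven the same way as the numbers in the commit message: an iperf server on the virtio-net interface inside the guest and a client on the traffic peer, repeated for each MTU under test. A minimal example, with the guest address left as a placeholder (iperf3 shown; classic iperf accepts the same -s/-c/-t switches):

    # inside the guest, on the virtio-net interface
    iperf3 -s
    # on the peer, once per MTU under test (1500 and 9000)
    iperf3 -c <guest-ip> -t 60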
* Re: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 19:51 ` Parav Pandit via Virtualization 2022-09-07 21:39 ` [virtio-dev] " Si-Wei Liu @ 2022-09-22 9:26 ` Michael S. Tsirkin 2022-09-22 10:07 ` Parav Pandit via Virtualization 1 sibling, 1 reply; 29+ messages in thread From: Michael S. Tsirkin @ 2022-09-22 9:26 UTC (permalink / raw) To: Parav Pandit Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li On Wed, Sep 07, 2022 at 07:51:38PM +0000, Parav Pandit wrote: > > > From: Michael S. Tsirkin <mst@redhat.com> > > Sent: Wednesday, September 7, 2022 3:36 PM > > > > On Wed, Sep 07, 2022 at 07:27:16PM +0000, Parav Pandit wrote: > > > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > > Sent: Wednesday, September 7, 2022 3:24 PM > > > > > > > > On Wed, Sep 07, 2022 at 07:18:06PM +0000, Parav Pandit wrote: > > > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > > > > Sent: Wednesday, September 7, 2022 3:12 PM > > > > > > > > > > > > Because of shallow queue of 16 entries deep. > > > > > > > > > > > > but why is the queue just 16 entries? > > > > > I explained the calculation in [1] about 16 entries. > > > > > > > > > > [1] > > > > > > > > > > > https://lore.kernel.org/netdev/PH0PR12MB54812EC7F4711C1EA4CAA119DC > > > > 419@ > > > > > PH0PR12MB5481.namprd12.prod.outlook.com/ > > > > > > > > > > > does the device not support indirect? > > > > > > > > > > > Yes, indirect feature bit is disabled on the device. > > > > > > > > OK that explains it. > > > > > > So can we proceed with v6 to contain > > > (a) updated commit message and > > > (b) function name change you suggested to drop _fields suffix? > > > > (c) replace mtu = 0 with sensibly not calling the function when mtu is > > unknown. > > > > > > > And I'd like commit log to include results of perf testing > > - with indirect feature on > Which device do you suggest using for this test? AFAIK most devices support INDIRECT, e.g. don't nvidia cards do this? > > - with mtu feature off > Why is this needed when it is not touching the area of mtu being not offered? I don't really like it that instead of checking the MTU feature bit everywhere the patch sets mtu variable to 0. Because of this it wasn't all that obvious that the patch did not affect !MTU performance (the code does change). Rereading it afresh I think it's ok. But explicit check for !MTU would be better imho making it obvious we do not need to test !MTU. -- MST _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
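One way to read the explicit-check suggestion, purely as a sketch and not the code that was eventually merged, is to key the call off the MTU feature bit instead of a sentinel value. That makes it obvious the !MTU case degenerates to the pre-patch ETH_DATA_LEN behaviour. Here virtnet_set_big_packets() stands for the helper from the patch with the _fields suffix dropped, as requested earlier in the thread:

    /* Sketch: avoid mtu = 0 by making the !MTU case explicit in probe. */
    if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU))
            virtnet_set_big_packets(vi, mtu);
    else
            virtnet_set_big_packets(vi, ETH_DATA_LEN);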
* RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-22 9:26 ` Michael S. Tsirkin @ 2022-09-22 10:07 ` Parav Pandit via Virtualization 0 siblings, 0 replies; 29+ messages in thread From: Parav Pandit via Virtualization @ 2022-09-22 10:07 UTC (permalink / raw) To: Michael S. Tsirkin Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Thursday, September 22, 2022 5:27 AM > > > > > > And I'd like commit log to include results of perf testing > > > - with indirect feature on > > Which device do you suggest using for this test? > > AFAIK most devices support INDIRECT, e.g. don't nvidia cards do this? The device of interest didn't have INDIRECT support. But the question is no more relevant as Gavin supplied the numbers with INDIRECT in commit log of v6. _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-07 19:36 ` Michael S. Tsirkin 2022-09-07 19:37 ` Michael S. Tsirkin 2022-09-07 19:51 ` Parav Pandit via Virtualization @ 2022-09-07 20:04 ` Parav Pandit via Virtualization 2 siblings, 0 replies; 29+ messages in thread From: Parav Pandit via Virtualization @ 2022-09-07 20:04 UTC (permalink / raw) To: Michael S. Tsirkin Cc: virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Wednesday, September 7, 2022 3:36 PM > > (c) replace mtu = 0 with sensibly not calling the function when mtu is > unknown. Even when mtu is zero, virtnet_set_big_packets() must be called to act on the gso bits. The current handling by virtnet_set_big_packets() seems simpler, taking care of both mtu and gso in one place. _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
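For reference, the mtu == 0 path through the v5 helper quoted earlier is already benign on its own; the point above is that the guest GSO handling must still run. A short trace of the existing code (no new behaviour, just walking through the patch):

    /* virtnet_set_big_packets_fields(vi, 0), i.e. VIRTIO_NET_F_MTU not negotiated:
     *   guest_gso == true  -> big_packets = true, big_packets_num_skbfrags = MAX_SKB_FRAGS
     *   guest_gso == false -> (0 > ETH_DATA_LEN || false) is false, big_packets stays
     *                         false, matching the pre-patch behaviour for a 1500-byte MTU
     */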
* Re: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets [not found] ` <20220901021038.84751-3-gavinl@nvidia.com> 2022-09-07 2:17 ` [virtio-dev] [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets Jason Wang 2022-09-07 5:31 ` Michael S. Tsirkin @ 2022-09-22 9:35 ` Michael S. Tsirkin 2022-09-22 10:04 ` Parav Pandit via Virtualization 2 siblings, 1 reply; 29+ messages in thread From: Michael S. Tsirkin @ 2022-09-22 9:35 UTC (permalink / raw) To: Gavin Li Cc: alexander.h.duyck, virtio-dev, sridhar.samudrala, jesse.brandeburg, gavi, virtualization, stephen, loseweigh, netdev, kuba, davem On Thu, Sep 01, 2022 at 05:10:38AM +0300, Gavin Li wrote: > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big > packets even when GUEST_* offloads are not present on the device. > However, if guest GSO is not supported, it would be sufficient to > allocate segments to cover just up the MTU size and no further. > Allocating the maximum amount of segments results in a large waste of > buffer space in the queue, which limits the number of packets that can > be buffered and can result in reduced performance. > > Therefore, if guest GSO is not supported, use the MTU to calculate the > optimal amount of segments required. > > When guest offload is enabled at runtime, RQ already has packets of bytes > less than 64K. So when packet of 64KB arrives, all the packets of such > size will be dropped. and RQ is now not usable. > > So this means that during set_guest_offloads() phase, RQs have to be > destroyed and recreated, which requires almost driver reload. > > If VIRTIO_NET_F_CTRL_GUEST_OFFLOADS has been negotiated, then it should > always treat them as GSO enabled. > > Accordingly, for now the assumption is that if guest GSO has been > negotiated then it has been enabled, even if it's actually been disabled > at runtime through VIRTIO_NET_F_CTRL_GUEST_OFFLOADS. > > Below is the iperf TCP test results over a Mellanox NIC, using vDPA for > 1 VQ, queue size 1024, before and after the change, with the iperf > server running over the virtio-net interface. > > MTU(Bytes)/Bandwidth (Gbit/s) > Before After > 1500 22.5 22.4 > 9000 12.8 25.9 > > Signed-off-by: Gavin Li <gavinl@nvidia.com> > Reviewed-by: Gavi Teitz <gavi@nvidia.com> > Reviewed-by: Parav Pandit <parav@nvidia.com> > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> OK I think the logic is correct, it's just a bit harder to read than necessary. Small improvement suggestions: > --- > changelog: > v4->v5 > - Addressed comments from Michael S. Tsirkin > - Improve commit message > v3->v4 > - Addressed comments from Si-Wei > - Rename big_packets_sg_num with big_packets_num_skbfrags > v2->v3 > - Addressed comments from Si-Wei > - Simplify the condition check to enable the optimization > v1->v2 > - Addressed comments from Jason, Michael, Si-Wei. > - Remove the flag of guest GSO support, set sg_num for big packets and > use it directly > - Recalculate sg_num for big packets in virtnet_set_guest_offloads > - Replace the round up algorithm with DIV_ROUND_UP > --- > drivers/net/virtio_net.c | 37 ++++++++++++++++++++++++------------- > 1 file changed, 24 insertions(+), 13 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index f831a0290998..dbffd5f56fb8 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -225,6 +225,9 @@ struct virtnet_info { > /* I like... big packets and I cannot lie! 
*/ > bool big_packets; > > + /* number of sg entries allocated for big packets */ > + unsigned int big_packets_num_skbfrags; > + > /* Host will merge rx buffers for big packets (shake it! shake it!) */ > bool mergeable_rx_bufs; > big_packets_num_skbfrags -> big_packet_num_skbfrags > @@ -1331,10 +1334,10 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq, > char *p; > int i, err, offset; > > - sg_init_table(rq->sg, MAX_SKB_FRAGS + 2); > + sg_init_table(rq->sg, vi->big_packets_num_skbfrags + 2); > > - /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */ > - for (i = MAX_SKB_FRAGS + 1; i > 1; --i) { > + /* page in rq->sg[vi->big_packets_num_skbfrags + 1] is list tail */ > + for (i = vi->big_packets_num_skbfrags + 1; i > 1; --i) { > first = get_a_page(rq, gfp); > if (!first) { > if (list) > @@ -1365,7 +1368,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq, > > /* chain first in list head */ > first->private = (unsigned long)list; > - err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2, > + err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2, > first, gfp); > if (err < 0) > give_pages(rq, first); > @@ -3690,13 +3693,27 @@ static bool virtnet_check_guest_gso(const struct virtnet_info *vi) > virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO); > } > > +static void virtnet_set_big_packets_fields(struct virtnet_info *vi, const int mtu) > +{ > + bool guest_gso = virtnet_check_guest_gso(vi); > + > + /* If device can receive ANY guest GSO packets, regardless of mtu, > + * allocate packets of maximum size, otherwise limit it to only > + * mtu size worth only. > + */ > + if (mtu > ETH_DATA_LEN || guest_gso) { > + vi->big_packets = true; > + vi->big_packets_num_skbfrags = guest_gso ? MAX_SKB_FRAGS : DIV_ROUND_UP(mtu, PAGE_SIZE); > + } > +} > + > static int virtnet_probe(struct virtio_device *vdev) > { > int i, err = -ENOMEM; > struct net_device *dev; > struct virtnet_info *vi; > u16 max_queue_pairs; > - int mtu; > + int mtu = 0; > I think it's better to drop this and instead just put the code where we already know the config. So: > /* Find if host supports multiqueue/rss virtio_net device */ > max_queue_pairs = 1; > @@ -3784,10 +3801,6 @@ static int virtnet_probe(struct virtio_device *vdev) > INIT_WORK(&vi->config_work, virtnet_config_changed_work); > spin_lock_init(&vi->refill_lock); > > - /* If we can receive ANY GSO packets, we must allocate large ones. */ > - if (virtnet_check_guest_gso(vi)) > - vi->big_packets = true; > - > if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) > vi->mergeable_rx_bufs = true; > > @@ -3853,12 +3866,10 @@ static int virtnet_probe(struct virtio_device *vdev) > > dev->mtu = mtu; > dev->max_mtu = mtu; > - > - /* TODO: size buffers correctly in this case. */ > - if (dev->mtu > ETH_DATA_LEN) > - vi->big_packets = true; /* Size buffers to fit mtu. */ if (mtu > ETH_DATA_LEN) { vi->big_packets = true; vi->big_packets_num_skbfrags = DIV_ROUND_UP(mtu, PAGE_SIZE); } > } > > + virtnet_set_big_packets_fields(vi, mtu); > + and here: /* If device can receive guest GSO packets, allocate buffers for * packets of maximum size, regardless of mtu. 
*/ if (virtnet_check_guest_gso(vi)) { vi->big_packets = true; vi->big_packets_num_skbfrags = MAX_SKB_FRAGS; } > if (vi->any_header_sg) > dev->needed_headroom = vi->hdr_len; > > -- > 2.31.1 _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-22 9:35 ` Michael S. Tsirkin @ 2022-09-22 10:04 ` Parav Pandit via Virtualization 2022-09-22 10:14 ` Michael S. Tsirkin 0 siblings, 1 reply; 29+ messages in thread From: Parav Pandit via Virtualization @ 2022-09-22 10:04 UTC (permalink / raw) To: Michael S. Tsirkin, Gavin Li Cc: alexander.h.duyck@intel.com, virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Thursday, September 22, 2022 5:35 AM > > On Thu, Sep 01, 2022 at 05:10:38AM +0300, Gavin Li wrote: > > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big > > packets even when GUEST_* offloads are not present on the device. > > However, if guest GSO is not supported, it would be sufficient to > > allocate segments to cover just up the MTU size and no further. > > Allocating the maximum amount of segments results in a large waste of > > buffer space in the queue, which limits the number of packets that can > > be buffered and can result in reduced performance. > > > > Therefore, if guest GSO is not supported, use the MTU to calculate the > > optimal amount of segments required. > > > > When guest offload is enabled at runtime, RQ already has packets of > > bytes less than 64K. So when packet of 64KB arrives, all the packets > > of such size will be dropped. and RQ is now not usable. > > > > So this means that during set_guest_offloads() phase, RQs have to be > > destroyed and recreated, which requires almost driver reload. > > > > If VIRTIO_NET_F_CTRL_GUEST_OFFLOADS has been negotiated, then it > > should always treat them as GSO enabled. > > > > Accordingly, for now the assumption is that if guest GSO has been > > negotiated then it has been enabled, even if it's actually been > > disabled at runtime through VIRTIO_NET_F_CTRL_GUEST_OFFLOADS. > > > > Below is the iperf TCP test results over a Mellanox NIC, using vDPA > > for > > 1 VQ, queue size 1024, before and after the change, with the iperf > > server running over the virtio-net interface. > > > > MTU(Bytes)/Bandwidth (Gbit/s) > > Before After > > 1500 22.5 22.4 > > 9000 12.8 25.9 > > > > Signed-off-by: Gavin Li <gavinl@nvidia.com> > > Reviewed-by: Gavi Teitz <gavi@nvidia.com> > > Reviewed-by: Parav Pandit <parav@nvidia.com> > > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> > > OK I think the logic is correct, it's just a bit harder to read than necessary. > Small improvement suggestions: > > > > --- > > changelog: > > v4->v5 > > - Addressed comments from Michael S. Tsirkin > > - Improve commit message > > v3->v4 > > - Addressed comments from Si-Wei > > - Rename big_packets_sg_num with big_packets_num_skbfrags > > v2->v3 > > - Addressed comments from Si-Wei > > - Simplify the condition check to enable the optimization > > v1->v2 > > - Addressed comments from Jason, Michael, Si-Wei. 
> > - Remove the flag of guest GSO support, set sg_num for big packets and > > use it directly > > - Recalculate sg_num for big packets in virtnet_set_guest_offloads > > - Replace the round up algorithm with DIV_ROUND_UP > > --- > > drivers/net/virtio_net.c | 37 ++++++++++++++++++++++++------------- > > 1 file changed, 24 insertions(+), 13 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index > > f831a0290998..dbffd5f56fb8 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -225,6 +225,9 @@ struct virtnet_info { > > /* I like... big packets and I cannot lie! */ > > bool big_packets; > > > > + /* number of sg entries allocated for big packets */ > > + unsigned int big_packets_num_skbfrags; > > + > > /* Host will merge rx buffers for big packets (shake it! shake it!) */ > > bool mergeable_rx_bufs; > > > > big_packets_num_skbfrags -> big_packet_num_skbfrags > > > @@ -1331,10 +1334,10 @@ static int add_recvbuf_big(struct virtnet_info > *vi, struct receive_queue *rq, > > char *p; > > int i, err, offset; > > > > - sg_init_table(rq->sg, MAX_SKB_FRAGS + 2); > > + sg_init_table(rq->sg, vi->big_packets_num_skbfrags + 2); > > > > - /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */ > > - for (i = MAX_SKB_FRAGS + 1; i > 1; --i) { > > + /* page in rq->sg[vi->big_packets_num_skbfrags + 1] is list tail */ > > + for (i = vi->big_packets_num_skbfrags + 1; i > 1; --i) { > > first = get_a_page(rq, gfp); > > if (!first) { > > if (list) > > @@ -1365,7 +1368,7 @@ static int add_recvbuf_big(struct virtnet_info > > *vi, struct receive_queue *rq, > > > > /* chain first in list head */ > > first->private = (unsigned long)list; > > - err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2, > > + err = virtqueue_add_inbuf(rq->vq, rq->sg, > > +vi->big_packets_num_skbfrags + 2, > > first, gfp); > > if (err < 0) > > give_pages(rq, first); > > @@ -3690,13 +3693,27 @@ static bool virtnet_check_guest_gso(const > struct virtnet_info *vi) > > virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO); } > > > > +static void virtnet_set_big_packets_fields(struct virtnet_info *vi, > > +const int mtu) { > > + bool guest_gso = virtnet_check_guest_gso(vi); > > + > > + /* If device can receive ANY guest GSO packets, regardless of mtu, > > + * allocate packets of maximum size, otherwise limit it to only > > + * mtu size worth only. > > + */ > > + if (mtu > ETH_DATA_LEN || guest_gso) { > > + vi->big_packets = true; > > + vi->big_packets_num_skbfrags = guest_gso ? > MAX_SKB_FRAGS : DIV_ROUND_UP(mtu, PAGE_SIZE); > > + } > > +} > > + > > static int virtnet_probe(struct virtio_device *vdev) { > > int i, err = -ENOMEM; > > struct net_device *dev; > > struct virtnet_info *vi; > > u16 max_queue_pairs; > > - int mtu; > > + int mtu = 0; > > > > I think it's better to drop this and instead just put the code > where we already know the config. So: > > > /* Find if host supports multiqueue/rss virtio_net device */ > > max_queue_pairs = 1; > > @@ -3784,10 +3801,6 @@ static int virtnet_probe(struct virtio_device > *vdev) > > INIT_WORK(&vi->config_work, virtnet_config_changed_work); > > spin_lock_init(&vi->refill_lock); > > > > - /* If we can receive ANY GSO packets, we must allocate large ones. 
> */ > > - if (virtnet_check_guest_gso(vi)) > > - vi->big_packets = true; > > - > > if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) > > vi->mergeable_rx_bufs = true; > > > > @@ -3853,12 +3866,10 @@ static int virtnet_probe(struct virtio_device > *vdev) > > > > dev->mtu = mtu; > > dev->max_mtu = mtu; > > - > > - /* TODO: size buffers correctly in this case. */ > > - if (dev->mtu > ETH_DATA_LEN) > > - vi->big_packets = true; > > /* Size buffers to fit mtu. */ > if (mtu > ETH_DATA_LEN) { > vi->big_packets = true; > vi->big_packets_num_skbfrags = DIV_ROUND_UP(mtu, > PAGE_SIZE); > } > How doing things twice is better i.e. when mtu is > ETH_DATA_LEN and gso is offered? It calculates big_packets variable twice. It also easier to read the code at single place where big_packets decision is taken. > > } > > > > + virtnet_set_big_packets_fields(vi, mtu); > > + > > and here: > /* If device can receive guest GSO packets, allocate buffers for > * packets of maximum size, regardless of mtu. > */ > > if (virtnet_check_guest_gso(vi)) { > vi->big_packets = true; > vi->big_packets_num_skbfrags = MAX_SKB_FRAGS; > } > > > > if (vi->any_header_sg) > > dev->needed_headroom = vi->hdr_len; > > > > -- > > 2.31.1 _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-22 10:04 ` Parav Pandit via Virtualization @ 2022-09-22 10:14 ` Michael S. Tsirkin 2022-09-22 10:29 ` Parav Pandit via Virtualization [not found] ` <20220922053458.66f31136@kernel.org> 0 siblings, 2 replies; 29+ messages in thread From: Michael S. Tsirkin @ 2022-09-22 10:14 UTC (permalink / raw) To: Parav Pandit Cc: alexander.h.duyck@intel.com, virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li On Thu, Sep 22, 2022 at 10:04:53AM +0000, Parav Pandit wrote: > > > From: Michael S. Tsirkin <mst@redhat.com> > > Sent: Thursday, September 22, 2022 5:35 AM > > > > On Thu, Sep 01, 2022 at 05:10:38AM +0300, Gavin Li wrote: > > > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big > > > packets even when GUEST_* offloads are not present on the device. > > > However, if guest GSO is not supported, it would be sufficient to > > > allocate segments to cover just up the MTU size and no further. > > > Allocating the maximum amount of segments results in a large waste of > > > buffer space in the queue, which limits the number of packets that can > > > be buffered and can result in reduced performance. > > > > > > Therefore, if guest GSO is not supported, use the MTU to calculate the > > > optimal amount of segments required. > > > > > > When guest offload is enabled at runtime, RQ already has packets of > > > bytes less than 64K. So when packet of 64KB arrives, all the packets > > > of such size will be dropped. and RQ is now not usable. > > > > > > So this means that during set_guest_offloads() phase, RQs have to be > > > destroyed and recreated, which requires almost driver reload. > > > > > > If VIRTIO_NET_F_CTRL_GUEST_OFFLOADS has been negotiated, then it > > > should always treat them as GSO enabled. > > > > > > Accordingly, for now the assumption is that if guest GSO has been > > > negotiated then it has been enabled, even if it's actually been > > > disabled at runtime through VIRTIO_NET_F_CTRL_GUEST_OFFLOADS. > > > > > > Below is the iperf TCP test results over a Mellanox NIC, using vDPA > > > for > > > 1 VQ, queue size 1024, before and after the change, with the iperf > > > server running over the virtio-net interface. > > > > > > MTU(Bytes)/Bandwidth (Gbit/s) > > > Before After > > > 1500 22.5 22.4 > > > 9000 12.8 25.9 > > > > > > Signed-off-by: Gavin Li <gavinl@nvidia.com> > > > Reviewed-by: Gavi Teitz <gavi@nvidia.com> > > > Reviewed-by: Parav Pandit <parav@nvidia.com> > > > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> > > > > OK I think the logic is correct, it's just a bit harder to read than necessary. > > Small improvement suggestions: > > > > > > > --- > > > changelog: > > > v4->v5 > > > - Addressed comments from Michael S. Tsirkin > > > - Improve commit message > > > v3->v4 > > > - Addressed comments from Si-Wei > > > - Rename big_packets_sg_num with big_packets_num_skbfrags > > > v2->v3 > > > - Addressed comments from Si-Wei > > > - Simplify the condition check to enable the optimization > > > v1->v2 > > > - Addressed comments from Jason, Michael, Si-Wei. 
> > > - Remove the flag of guest GSO support, set sg_num for big packets and > > > use it directly > > > - Recalculate sg_num for big packets in virtnet_set_guest_offloads > > > - Replace the round up algorithm with DIV_ROUND_UP > > > --- > > > drivers/net/virtio_net.c | 37 ++++++++++++++++++++++++------------- > > > 1 file changed, 24 insertions(+), 13 deletions(-) > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index > > > f831a0290998..dbffd5f56fb8 100644 > > > --- a/drivers/net/virtio_net.c > > > +++ b/drivers/net/virtio_net.c > > > @@ -225,6 +225,9 @@ struct virtnet_info { > > > /* I like... big packets and I cannot lie! */ > > > bool big_packets; > > > > > > + /* number of sg entries allocated for big packets */ > > > + unsigned int big_packets_num_skbfrags; > > > + > > > /* Host will merge rx buffers for big packets (shake it! shake it!) */ > > > bool mergeable_rx_bufs; > > > > > > > big_packets_num_skbfrags -> big_packet_num_skbfrags > > > > > @@ -1331,10 +1334,10 @@ static int add_recvbuf_big(struct virtnet_info > > *vi, struct receive_queue *rq, > > > char *p; > > > int i, err, offset; > > > > > > - sg_init_table(rq->sg, MAX_SKB_FRAGS + 2); > > > + sg_init_table(rq->sg, vi->big_packets_num_skbfrags + 2); > > > > > > - /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */ > > > - for (i = MAX_SKB_FRAGS + 1; i > 1; --i) { > > > + /* page in rq->sg[vi->big_packets_num_skbfrags + 1] is list tail */ > > > + for (i = vi->big_packets_num_skbfrags + 1; i > 1; --i) { > > > first = get_a_page(rq, gfp); > > > if (!first) { > > > if (list) > > > @@ -1365,7 +1368,7 @@ static int add_recvbuf_big(struct virtnet_info > > > *vi, struct receive_queue *rq, > > > > > > /* chain first in list head */ > > > first->private = (unsigned long)list; > > > - err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2, > > > + err = virtqueue_add_inbuf(rq->vq, rq->sg, > > > +vi->big_packets_num_skbfrags + 2, > > > first, gfp); > > > if (err < 0) > > > give_pages(rq, first); > > > @@ -3690,13 +3693,27 @@ static bool virtnet_check_guest_gso(const > > struct virtnet_info *vi) > > > virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO); } > > > > > > +static void virtnet_set_big_packets_fields(struct virtnet_info *vi, > > > +const int mtu) { > > > + bool guest_gso = virtnet_check_guest_gso(vi); > > > + > > > + /* If device can receive ANY guest GSO packets, regardless of mtu, > > > + * allocate packets of maximum size, otherwise limit it to only > > > + * mtu size worth only. > > > + */ > > > + if (mtu > ETH_DATA_LEN || guest_gso) { > > > + vi->big_packets = true; > > > + vi->big_packets_num_skbfrags = guest_gso ? > > MAX_SKB_FRAGS : DIV_ROUND_UP(mtu, PAGE_SIZE); > > > + } > > > +} > > > + > > > static int virtnet_probe(struct virtio_device *vdev) { > > > int i, err = -ENOMEM; > > > struct net_device *dev; > > > struct virtnet_info *vi; > > > u16 max_queue_pairs; > > > - int mtu; > > > + int mtu = 0; > > > > > > > I think it's better to drop this and instead just put the code > > where we already know the config. So: > > > > > /* Find if host supports multiqueue/rss virtio_net device */ > > > max_queue_pairs = 1; > > > @@ -3784,10 +3801,6 @@ static int virtnet_probe(struct virtio_device > > *vdev) > > > INIT_WORK(&vi->config_work, virtnet_config_changed_work); > > > spin_lock_init(&vi->refill_lock); > > > > > > - /* If we can receive ANY GSO packets, we must allocate large ones. 
> > */ > > > - if (virtnet_check_guest_gso(vi)) > > > - vi->big_packets = true; > > > - > > > if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) > > > vi->mergeable_rx_bufs = true; > > > > > > @@ -3853,12 +3866,10 @@ static int virtnet_probe(struct virtio_device > > *vdev) > > > > > > dev->mtu = mtu; > > > dev->max_mtu = mtu; > > > - > > > - /* TODO: size buffers correctly in this case. */ > > > - if (dev->mtu > ETH_DATA_LEN) > > > - vi->big_packets = true; > > > > /* Size buffers to fit mtu. */ > > if (mtu > ETH_DATA_LEN) { > > vi->big_packets = true; > > vi->big_packets_num_skbfrags = DIV_ROUND_UP(mtu, > > PAGE_SIZE); > > } > > > How doing things twice is better i.e. when mtu is > ETH_DATA_LEN and gso is offered? > It calculates big_packets variable twice. > > It also easier to read the code at single place where big_packets decision is taken. I guess it depends on what you want to keep in one place. I just wanted to reduce the testing burden on the submitter. What I proposed makes the functional change minimal. It's nitpicking to be frank. v6 arrived while I was traveling and I didn't notice it. I see Jason acked that so I guess I will just apply as is. Do you ack v6 too? > > > } > > > > > > + virtnet_set_big_packets_fields(vi, mtu); > > > + > > > > and here: > > /* If device can receive guest GSO packets, allocate buffers for > > * packets of maximum size, regardless of mtu. > > */ > > > > if (virtnet_check_guest_gso(vi)) { > > vi->big_packets = true; > > vi->big_packets_num_skbfrags = MAX_SKB_FRAGS; > > } > > > > > > > if (vi->any_header_sg) > > > dev->needed_headroom = vi->hdr_len; > > > > > > -- > > > 2.31.1 _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets 2022-09-22 10:14 ` Michael S. Tsirkin @ 2022-09-22 10:29 ` Parav Pandit via Virtualization [not found] ` <20220922053458.66f31136@kernel.org> 1 sibling, 0 replies; 29+ messages in thread From: Parav Pandit via Virtualization @ 2022-09-22 10:29 UTC (permalink / raw) To: Michael S. Tsirkin Cc: alexander.h.duyck@intel.com, virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, Gavin Li > From: Michael S. Tsirkin <mst@redhat.com> > Sent: Thursday, September 22, 2022 6:15 AM > > It's nitpicking to be frank. v6 arrived while I was traveling and I didn't notice it. > I see Jason acked that so I guess I will just apply as is. Do you ack v6 too? > Yes. I reviewed it. Gavin added reviewed-by. Thanks. _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
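Since v6 itself is not quoted in this thread, the shape it takes, per the points agreed above, is essentially the v5 helper with the _fields suffix dropped; the sketch below is a rough rendering, and the v6 posting remains the authoritative version:

    static void virtnet_set_big_packets(struct virtnet_info *vi, const int mtu)
    {
            bool guest_gso = virtnet_check_guest_gso(vi);

            /* If the device can receive ANY guest GSO packets, regardless of mtu,
             * allocate buffers of maximum size; otherwise size them to the mtu.
             */
            if (mtu > ETH_DATA_LEN || guest_gso) {
                    vi->big_packets = true;
                    vi->big_packets_num_skbfrags = guest_gso ? MAX_SKB_FRAGS : DIV_ROUND_UP(mtu, PAGE_SIZE);
            }
    }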
[parent not found: <20220922053458.66f31136@kernel.org>]
* RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets [not found] ` <20220922053458.66f31136@kernel.org> @ 2022-10-05 10:29 ` Parav Pandit via Virtualization 0 siblings, 0 replies; 29+ messages in thread From: Parav Pandit via Virtualization @ 2022-10-05 10:29 UTC (permalink / raw) To: Jakub Kicinski, Michael S. Tsirkin Cc: alexander.h.duyck@intel.com, virtio-dev@lists.oasis-open.org, sridhar.samudrala@intel.com, jesse.brandeburg@intel.com, Gavi Teitz, virtualization@lists.linux-foundation.org, stephen@networkplumber.org, loseweigh@gmail.com, netdev@vger.kernel.org, davem@davemloft.net, Gavin Li > From: Jakub Kicinski <kuba@kernel.org> > Sent: Thursday, September 22, 2022 6:05 PM > > On Thu, 22 Sep 2022 06:14:59 -0400 Michael S. Tsirkin wrote: > > It's nitpicking to be frank. v6 arrived while I was traveling and I > > didn't notice it. I see Jason acked that so I guess I will just apply > > as is. > > Oh, you wanna take it? The split on who takes virtio_net is a touch unclear > but if it makes more sense here to go via virtio feel free to slap my ack on the > v6. Michael, Did you get a chance to take this? I don't see it at https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/log/?h=linux-next _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread
Thread overview: 29+ messages
[not found] <20220901021038.84751-1-gavinl@nvidia.com>
[not found] ` <20220901021038.84751-3-gavinl@nvidia.com>
2022-09-07 2:17 ` [virtio-dev] [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets Jason Wang
2022-09-07 5:31 ` Michael S. Tsirkin
[not found] ` <0355d1e4-a3cf-5b16-8c7f-b39b1ec14ade@nvidia.com>
2022-09-07 9:26 ` Michael S. Tsirkin
2022-09-07 14:08 ` Parav Pandit via Virtualization
2022-09-07 14:29 ` Michael S. Tsirkin
2022-09-07 14:33 ` Parav Pandit via Virtualization
2022-09-07 14:40 ` Michael S. Tsirkin
2022-09-07 16:12 ` Parav Pandit via Virtualization
2022-09-07 18:15 ` Michael S. Tsirkin
2022-09-07 19:06 ` Parav Pandit via Virtualization
2022-09-07 19:11 ` Michael S. Tsirkin
2022-09-07 19:18 ` Parav Pandit via Virtualization
2022-09-07 19:23 ` Michael S. Tsirkin
2022-09-07 19:27 ` Parav Pandit via Virtualization
2022-09-07 19:36 ` Michael S. Tsirkin
2022-09-07 19:37 ` Michael S. Tsirkin
2022-09-07 19:54 ` Parav Pandit via Virtualization
2022-09-07 19:51 ` Parav Pandit via Virtualization
2022-09-07 21:39 ` [virtio-dev] " Si-Wei Liu
2022-09-07 22:11 ` Parav Pandit via Virtualization
2022-09-07 22:57 ` Si-Wei Liu
2022-09-22 9:26 ` Michael S. Tsirkin
2022-09-22 10:07 ` Parav Pandit via Virtualization
2022-09-07 20:04 ` Parav Pandit via Virtualization
2022-09-22 9:35 ` Michael S. Tsirkin
2022-09-22 10:04 ` Parav Pandit via Virtualization
2022-09-22 10:14 ` Michael S. Tsirkin
2022-09-22 10:29 ` Parav Pandit via Virtualization
[not found] ` <20220922053458.66f31136@kernel.org>
2022-10-05 10:29 ` Parav Pandit via Virtualization