Re: [PATCH net-next RFC 3/3] virtio-net: conditionally enable tx interrupt

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jason Wang <jasowang@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: kvm@vger.kernel.org, mst@redhat.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	linux-api@vger.kernel.org
Subject: Re: [PATCH net-next RFC 3/3] virtio-net: conditionally enable tx interrupt
Date: Mon, 13 Oct 2014 14:02:42 +0800	[thread overview]
Message-ID: <543B6B02.5000404@redhat.com> (raw)
In-Reply-To: <1413038899.9362.43.camel@edumazet-glaptop2.roam.corp.google.com>

On 10/11/2014 10:48 PM, Eric Dumazet wrote:
> On Sat, 2014-10-11 at 15:16 +0800, Jason Wang wrote:
>> We free transmitted packets in ndo_start_xmit() in the past to get better
>> performance in the past. One side effect is that skb_orphan() needs to be
>> called in ndo_start_xmit() which makes sk_wmem_alloc not accurate in
>> fact. For TCP protocol, this means several optimization could not work well
>> such as TCP small queue and auto corking. This can lead extra low
>> throughput of small packets stream.
>>
>> Thanks to the urgent descriptor support. This patch tries to solve this
>> issue by enable the tx interrupt selectively for stream packets. This means
>> we don't need to orphan TCP stream packets in ndo_start_xmit() but enable
>> tx interrupt for those packets. After we get tx interrupt, a tx napi was
>> scheduled to free those packets.
>>
>> With this method, sk_wmem_alloc of TCP socket were more accurate than in
>> the past which let TCP can batch more through TSQ and auto corking.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  drivers/net/virtio_net.c | 164 ++++++++++++++++++++++++++++++++++++-----------
>>  1 file changed, 128 insertions(+), 36 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 5810841..b450fc4 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -72,6 +72,8 @@ struct send_queue {
>>  
>>  	/* Name of the send queue: output.$index */
>>  	char name[40];
>> +
>> +	struct napi_struct napi;
>>  };
>>  
>>  /* Internal representation of a receive virtqueue */
>> @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
>>  	return p;
>>  }
>>  
>> +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
>> +{
>> +	struct sk_buff *skb;
>> +	unsigned int len;
>> +	struct virtnet_info *vi = sq->vq->vdev->priv;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> +	int sent = 0;
>> +
>> +	while (sent < budget &&
>> +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>> +		pr_debug("Sent skb %p\n", skb);
>> +
>> +		u64_stats_update_begin(&stats->tx_syncp);
>> +		stats->tx_bytes += skb->len;
>> +		stats->tx_packets++;
>> +		u64_stats_update_end(&stats->tx_syncp);
>> +
>> +		dev_kfree_skb_any(skb);
>> +		sent++;
>> +	}
>> +
> You could accumulate skb->len in a totlen var, and perform a single
>
> 	u64_stats_update_begin(&stats->tx_syncp);
> 	stats->tx_bytes += totlen;
> 	stats->tx_packets += sent;
> 	u64_stats_update_end(&stats->tx_syncp);
>
> after the loop.
>

Yes, will do this in a separated patch.
>> +	return sent;
>> +}
>> +
> ...
>
>> +
>> +static bool virtnet_skb_needs_intr(struct sk_buff *skb)
>> +{
>> +	union {
>> +		unsigned char *network;
>> +		struct iphdr *ipv4;
>> +		struct ipv6hdr *ipv6;
>> +	} hdr;
>> +	struct tcphdr *th = tcp_hdr(skb);
>> +	u16 payload_len;
>> +
>> +	hdr.network = skb_network_header(skb);
>> +
>> +	/* Only IPv4/IPv6 with TCP is supported */
> 	Oh well, yet another packet flow dissector :)
>
> 	If most packets were caught by your implementation, you could use it
> for fast patj and fallback to skb_flow_dissect() for encapsulated
> traffic.
>
> 	struct flow_keys keys;   
>
> 	if (!skb_flow_dissect(skb, &keys)) 
> 		return false;
>
> 	if (keys.ip_proto != IPPROTO_TCP)
> 		return false;
>
> 	then check __skb_get_poff() how to get th, and check if there is some
> payload...
>
>

Yes, but we don't know if most of packets were TCP or encapsulated TCP,
it depends on userspace application. If not, looks like
skb_flow_dissect() can bring some overhead, or it could be ignored?

skb_flow_dissect

WARNING: multiple messages have this Message-ID (diff)

From: Jason Wang <jasowang@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: rusty@rustcorp.com.au, mst@redhat.com,
	virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-api@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH net-next RFC 3/3] virtio-net: conditionally enable tx interrupt
Date: Mon, 13 Oct 2014 14:02:42 +0800	[thread overview]
Message-ID: <543B6B02.5000404@redhat.com> (raw)
In-Reply-To: <1413038899.9362.43.camel@edumazet-glaptop2.roam.corp.google.com>

On 10/11/2014 10:48 PM, Eric Dumazet wrote:
> On Sat, 2014-10-11 at 15:16 +0800, Jason Wang wrote:
>> We free transmitted packets in ndo_start_xmit() in the past to get better
>> performance in the past. One side effect is that skb_orphan() needs to be
>> called in ndo_start_xmit() which makes sk_wmem_alloc not accurate in
>> fact. For TCP protocol, this means several optimization could not work well
>> such as TCP small queue and auto corking. This can lead extra low
>> throughput of small packets stream.
>>
>> Thanks to the urgent descriptor support. This patch tries to solve this
>> issue by enable the tx interrupt selectively for stream packets. This means
>> we don't need to orphan TCP stream packets in ndo_start_xmit() but enable
>> tx interrupt for those packets. After we get tx interrupt, a tx napi was
>> scheduled to free those packets.
>>
>> With this method, sk_wmem_alloc of TCP socket were more accurate than in
>> the past which let TCP can batch more through TSQ and auto corking.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  drivers/net/virtio_net.c | 164 ++++++++++++++++++++++++++++++++++++-----------
>>  1 file changed, 128 insertions(+), 36 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 5810841..b450fc4 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -72,6 +72,8 @@ struct send_queue {
>>  
>>  	/* Name of the send queue: output.$index */
>>  	char name[40];
>> +
>> +	struct napi_struct napi;
>>  };
>>  
>>  /* Internal representation of a receive virtqueue */
>> @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
>>  	return p;
>>  }
>>  
>> +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
>> +{
>> +	struct sk_buff *skb;
>> +	unsigned int len;
>> +	struct virtnet_info *vi = sq->vq->vdev->priv;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> +	int sent = 0;
>> +
>> +	while (sent < budget &&
>> +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>> +		pr_debug("Sent skb %p\n", skb);
>> +
>> +		u64_stats_update_begin(&stats->tx_syncp);
>> +		stats->tx_bytes += skb->len;
>> +		stats->tx_packets++;
>> +		u64_stats_update_end(&stats->tx_syncp);
>> +
>> +		dev_kfree_skb_any(skb);
>> +		sent++;
>> +	}
>> +
> You could accumulate skb->len in a totlen var, and perform a single
>
> 	u64_stats_update_begin(&stats->tx_syncp);
> 	stats->tx_bytes += totlen;
> 	stats->tx_packets += sent;
> 	u64_stats_update_end(&stats->tx_syncp);
>
> after the loop.
>

Yes, will do this in a separated patch.
>> +	return sent;
>> +}
>> +
> ...
>
>> +
>> +static bool virtnet_skb_needs_intr(struct sk_buff *skb)
>> +{
>> +	union {
>> +		unsigned char *network;
>> +		struct iphdr *ipv4;
>> +		struct ipv6hdr *ipv6;
>> +	} hdr;
>> +	struct tcphdr *th = tcp_hdr(skb);
>> +	u16 payload_len;
>> +
>> +	hdr.network = skb_network_header(skb);
>> +
>> +	/* Only IPv4/IPv6 with TCP is supported */
> 	Oh well, yet another packet flow dissector :)
>
> 	If most packets were caught by your implementation, you could use it
> for fast patj and fallback to skb_flow_dissect() for encapsulated
> traffic.
>
> 	struct flow_keys keys;   
>
> 	if (!skb_flow_dissect(skb, &keys)) 
> 		return false;
>
> 	if (keys.ip_proto != IPPROTO_TCP)
> 		return false;
>
> 	then check __skb_get_poff() how to get th, and check if there is some
> payload...
>
>

Yes, but we don't know if most of packets were TCP or encapsulated TCP,
it depends on userspace application. If not, looks like
skb_flow_dissect() can bring some overhead, or it could be ignored?

skb_flow_dissect

next prev parent reply	other threads:[~2014-10-13  6:02 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-11  7:16 [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt Jason Wang
2014-10-11  7:16 ` Jason Wang
2014-10-11  7:16 ` [PATCH net-next RFC 1/3] virtio: support for urgent descriptors Jason Wang
2014-10-11  7:16   ` Jason Wang
2014-10-12  9:27   ` Michael S. Tsirkin
2014-10-12  9:27     ` Michael S. Tsirkin
2014-10-13  6:22     ` Jason Wang
2014-10-13  6:22       ` Jason Wang
2014-10-13  7:16       ` Michael S. Tsirkin
2014-10-13  7:16         ` Michael S. Tsirkin
2014-10-15  5:40   ` Rusty Russell
2014-10-15  5:40     ` Rusty Russell
2014-10-17  5:23     ` Jason Wang
2014-10-11  7:16 ` [PATCH net-next RFC 2/3] vhost: support " Jason Wang
2014-10-11  7:16   ` Jason Wang
2014-10-11  7:16 ` [PATCH net-next RFC 3/3] virtio-net: conditionally enable tx interrupt Jason Wang
2014-10-11  7:16   ` Jason Wang
2014-10-11 14:48   ` Eric Dumazet
2014-10-11 14:48     ` Eric Dumazet
2014-10-13  6:02     ` Jason Wang [this message]
2014-10-13  6:02       ` Jason Wang
2014-10-14 21:51   ` Michael S. Tsirkin
2014-10-14 21:51     ` Michael S. Tsirkin
2014-10-15  3:34     ` Jason Wang
2014-10-15  3:34       ` Jason Wang
2014-10-14 18:53 ` [PATCH net-next RFC 0/3] virtio-net: Conditionally " David Miller
     [not found] ` <1413011806-3813-1-git-send-email-jasowang-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-10-14 18:53   ` David Miller
2014-10-14 18:53     ` David Miller
2014-10-14 21:51     ` Michael S. Tsirkin
2014-10-14 21:51       ` Michael S. Tsirkin
2014-10-15  3:24       ` Jason Wang
2014-10-15  3:24         ` Jason Wang
2014-10-14 23:06     ` Michael S. Tsirkin
2014-10-14 23:06       ` Michael S. Tsirkin
2014-10-15  7:28       ` Jason Wang
2014-10-15  7:28         ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=543B6B02.5000404@redhat.com \
    --to=jasowang@redhat.com \
    --cc=eric.dumazet@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.