Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs


From: Eric Dumazet <eric.dumazet@gmail.com>
To: Harold Huang <baymaxhuang@gmail.com>, netdev@vger.kernel.org
Cc: jasowang@redhat.com, pabeni@redhat.com,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	open list <linux-kernel@vger.kernel.org>,
	"open list:XDP (eXpress Data Path)" <bpf@vger.kernel.org>,
	edumazet@google.com
Subject: Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs
Date: Sun, 27 Feb 2022 20:06:03 -0800
Message-ID: <c687e1d8-e36a-8f23-342a-22b2a1efb372@gmail.com>
In-Reply-To: <20220225090223.636877-1-baymaxhuang@gmail.com>


On 2/25/22 01:02, Harold Huang wrote:
> In tun, NAPI is already supported, so we can also use NAPI in the
> batched XDP buff path to accelerate packet processing. Moreover, once
> NAPI is used, GRO is supported as well. iperf shows that single-stream
> throughput improves from 4.5 Gbps to 9.2 Gbps. Additionally, 9.2 Gbps
> nearly reaches the line rate of the physical NIC, and about 15% of the
> CPU core running the vhost thread remains idle.
>
> Test topology:
>
> [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
>
> Iperf stream:
>
> Before:
> ...
> [  5]   5.00-6.00   sec   558 MBytes  4.68 Gbits/sec    0   1.50 MBytes
> [  5]   6.00-7.00   sec   556 MBytes  4.67 Gbits/sec    1   1.35 MBytes
> [  5]   7.00-8.00   sec   556 MBytes  4.67 Gbits/sec    2   1.18 MBytes
> [  5]   8.00-9.00   sec   559 MBytes  4.69 Gbits/sec    0   1.48 MBytes
> [  5]   9.00-10.00  sec   556 MBytes  4.67 Gbits/sec    1   1.33 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  5.39 GBytes  4.63 Gbits/sec   72          sender
> [  5]   0.00-10.04  sec  5.39 GBytes  4.61 Gbits/sec               receiver
>
> After:
> ...
> [  5]   5.00-6.00   sec  1.07 GBytes  9.19 Gbits/sec    0   1.55 MBytes
> [  5]   6.00-7.00   sec  1.08 GBytes  9.30 Gbits/sec    0   1.63 MBytes
> [  5]   7.00-8.00   sec  1.08 GBytes  9.25 Gbits/sec    0   1.72 MBytes
> [  5]   8.00-9.00   sec  1.08 GBytes  9.25 Gbits/sec   77   1.31 MBytes
> [  5]   9.00-10.00  sec  1.08 GBytes  9.24 Gbits/sec    0   1.48 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec  166          sender
> [  5]   0.00-10.04  sec  10.8 GBytes  9.24 Gbits/sec               receiver
> ....
>
> Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
> ---
> v1 -> v2
>   - fix commit messages
>   - add a queued flag to avoid unnecessary NAPI scheduling, as suggested by Jason
>
>   drivers/net/tun.c | 20 ++++++++++++++++----
>   1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index fed85447701a..c7d8b7c821d8 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2379,7 +2379,7 @@ static void tun_put_page(struct tun_page *tpage)
>   }
>   
>   static int tun_xdp_one(struct tun_struct *tun,
> -		       struct tun_file *tfile,
> +		       struct tun_file *tfile, int *queued,
>   		       struct xdp_buff *xdp, int *flush,
>   		       struct tun_page *tpage)
>   {
> @@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>   	struct virtio_net_hdr *gso = &hdr->gso;
>   	struct bpf_prog *xdp_prog;
>   	struct sk_buff *skb = NULL;
> +	struct sk_buff_head *queue;
>   	u32 rxhash = 0, act;
>   	int buflen = hdr->buflen;
>   	int err = 0;
> @@ -2464,7 +2465,15 @@ static int tun_xdp_one(struct tun_struct *tun,
>   	    !tfile->detached)
>   		rxhash = __skb_get_hash_symmetric(skb);
>   
> -	netif_receive_skb(skb);
> +	if (tfile->napi_enabled) {
> +		queue = &tfile->sk.sk_write_queue;
> +		spin_lock(&queue->lock);
> +		__skb_queue_tail(queue, skb);
> +		spin_unlock(&queue->lock);
> +		(*queued)++;
> +	} else {
> +		netif_receive_skb(skb);
> +	}
>   
>   	/* No need to disable preemption here since this function is
>   	 * always called with bh disabled
> @@ -2492,7 +2501,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>   	if (ctl && (ctl->type == TUN_MSG_PTR)) {
>   		struct tun_page tpage;
>   		int n = ctl->num;
> -		int flush = 0;
> +		int flush = 0, queued = 0;
>   
>   		memset(&tpage, 0, sizeof(tpage));
>   
> @@ -2501,12 +2510,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>   
>   		for (i = 0; i < n; i++) {
>   			xdp = &((struct xdp_buff *)ctl->ptr)[i];
> -			tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
> +			tun_xdp_one(tun, tfile, &queued, xdp, &flush, &tpage);


How big can n be?

BTW I could not find where m->msg_controllen was checked in tun_sendmsg().

struct tun_msg_ctl *ctl = m->msg_control;

if (ctl && (ctl->type == TUN_MSG_PTR)) {
     int n = ctl->num;  // can be set to values in [0..65535]

     for (i = 0; i < n; i++) {
         xdp = &((struct xdp_buff *)ctl->ptr)[i];


I really do not understand how we prevent malicious user space from 
crashing the kernel.
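
To make the concern concrete, here is a minimal userspace sketch of the
kind of validation being asked about. It is not kernel code and not a
proposed fix. The *_model types, the TUN_MSG_PTR_MODEL value, the
MAX_BATCH_MODEL cap and the ctl_is_valid() helper are all hypothetical
stand-ins chosen for illustration; only the type/num/ptr fields and the
16-bit range of num come from the excerpt above.

```c
#include <stddef.h>

/* Userspace stand-in for struct xdp_buff; the real layout differs. */
struct xdp_buff_model {
	void *data;
};

/* Stand-in carrying the three fields used in the excerpt above. */
struct tun_msg_ctl_model {
	unsigned short type;
	unsigned short num;	/* 16 bits wide, so 0..65535 */
	void *ptr;
};

#define TUN_MSG_PTR_MODEL 2	/* assumption: model value only */
#define MAX_BATCH_MODEL   64	/* assumption: hypothetical batch cap */

/*
 * Return 0 when the control message looks safe to walk, -1 otherwise.
 * controllen plays the role of m->msg_controllen.
 */
static int ctl_is_valid(const struct tun_msg_ctl_model *ctl, size_t controllen)
{
	/* The caller must hand over exactly one control structure. */
	if (!ctl || controllen != sizeof(*ctl))
		return -1;
	if (ctl->type != TUN_MSG_PTR_MODEL)
		return -1;
	/* Bound the loop count before indexing into ctl->ptr. */
	if (ctl->num == 0 || ctl->num > MAX_BATCH_MODEL)
		return -1;
	if (!ctl->ptr)
		return -1;
	return 0;
}
```

A check like this only bounds the loop count. It cannot prove that
ctl->ptr really points at num valid buffers, which is why it matters who
is allowed to reach this path at all.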



>   		}
>   
>   		if (flush)
>   			xdp_do_flush();
>   
> +		if (tfile->napi_enabled && queued > 0)
> +			napi_schedule(&tfile->napi);
> +
>   		rcu_read_unlock();
>   		local_bh_enable();
>
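
The queue-then-schedule pattern in the hunks above can be modeled in
plain userspace C. This is only a sketch under stated assumptions: the
pkt, pkt_queue and rx_ctx types and the rx_one()/rx_batch() helpers are
hypothetical, locking is omitted, and a counter stands in for
napi_schedule(). It shows that a batch of n packets triggers at most one
schedule call, and none when NAPI is disabled.

```c
#include <stddef.h>

struct pkt {
	struct pkt *next;
	int id;
};

struct pkt_queue {
	struct pkt *head;
	struct pkt *tail;
	size_t len;
};

struct rx_ctx {
	struct pkt_queue q;
	int napi_enabled;
	int schedule_calls;	/* stands in for napi_schedule() */
	int direct_rx;		/* stands in for netif_receive_skb() */
};

/* Append one packet to the tail of the queue. */
static void queue_tail(struct pkt_queue *q, struct pkt *p)
{
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->len++;
}

/* Per-packet step: defer to the queue when NAPI is on, else deliver. */
static void rx_one(struct rx_ctx *c, struct pkt *p, int *queued)
{
	if (c->napi_enabled) {
		queue_tail(&c->q, p);
		(*queued)++;
	} else {
		c->direct_rx++;
	}
}

/* Batch step: schedule once after the loop, only if something was queued. */
static void rx_batch(struct rx_ctx *c, struct pkt *pkts, size_t n)
{
	int queued = 0;
	size_t i;

	for (i = 0; i < n; i++)
		rx_one(c, &pkts[i], &queued);

	if (c->napi_enabled && queued > 0)
		c->schedule_calls++;
}
```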


Thread overview: 17+ messages
2022-02-24 10:38 [PATCH] tun: support NAPI to accelerate packet processing Harold Huang
2022-02-24 17:22 ` Paolo Abeni
2022-02-25  3:36   ` Harold Huang
2022-02-25  3:46 ` Jason Wang
2022-02-25  9:02 ` [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs Harold Huang
2022-02-28  2:15   ` Jason Wang
2022-02-28  4:06   ` Eric Dumazet [this message]
2022-02-28  4:20     ` Jason Wang
     [not found]       ` <CANn89iKLhhwGnmEyfZuEKjtt7OwTbVyDYcFUMDYoRpdXjbMwiA@mail.gmail.com>
2022-02-28  5:17         ` Jason Wang
2022-02-28  7:26           ` Harold Huang
2022-02-28  7:56             ` Jason Wang
2022-02-28  3:38 ` [PATCH net-next v3] " Harold Huang
2022-02-28  7:46   ` Jason Wang
2022-02-28 17:15     ` Stephen Hemminger
2022-03-01  1:47       ` Jason Wang
2022-03-01  1:58       ` Harold Huang
2022-03-02  1:40   ` patchwork-bot+netdevbpf
