From: Eric Dumazet <eric.dumazet@gmail.com>
To: Harold Huang <baymaxhuang@gmail.com>, netdev@vger.kernel.org
Cc: jasowang@redhat.com, pabeni@redhat.com,
    "David S. Miller" <davem@davemloft.net>,
    Jakub Kicinski <kuba@kernel.org>,
    Alexei Starovoitov <ast@kernel.org>,
    Daniel Borkmann <daniel@iogearbox.net>,
    Jesper Dangaard Brouer <hawk@kernel.org>,
    John Fastabend <john.fastabend@gmail.com>,
    open list <linux-kernel@vger.kernel.org>,
    "open list:XDP (eXpress Data Path)" <bpf@vger.kernel.org>,
    edumazet@google.com
Subject: Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs
Date: Sun, 27 Feb 2022 20:06:03 -0800
Message-ID: <c687e1d8-e36a-8f23-342a-22b2a1efb372@gmail.com>
In-Reply-To: <20220225090223.636877-1-baymaxhuang@gmail.com>

On 2/25/22 01:02, Harold Huang wrote:
> In tun, NAPI is supported and we can also use NAPI in the path of
> batched XDP buffs to accelerate packet processing. What is more, after
> we use NAPI, GRO is also supported. iperf shows that the throughput of
> a single stream improves from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> Gbps nearly reaches the line speed of the phy nic and there is still about
> 15% idle cpu core remaining on the vhost thread.
>
> Test topology:
>
> [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
>
> Iperf stream:
>
> Before:
> ...
> [ 5] 5.00-6.00 sec 558 MBytes 4.68 Gbits/sec 0 1.50 MBytes
> [ 5] 6.00-7.00 sec 556 MBytes 4.67 Gbits/sec 1 1.35 MBytes
> [ 5] 7.00-8.00 sec 556 MBytes 4.67 Gbits/sec 2 1.18 MBytes
> [ 5] 8.00-9.00 sec 559 MBytes 4.69 Gbits/sec 0 1.48 MBytes
> [ 5] 9.00-10.00 sec 556 MBytes 4.67 Gbits/sec 1 1.33 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 5.39 GBytes 4.63 Gbits/sec 72 sender
> [ 5] 0.00-10.04 sec 5.39 GBytes 4.61 Gbits/sec receiver
>
> After:
> ...
> [ 5] 5.00-6.00 sec 1.07 GBytes 9.19 Gbits/sec 0 1.55 MBytes
> [ 5] 6.00-7.00 sec 1.08 GBytes 9.30 Gbits/sec 0 1.63 MBytes
> [ 5] 7.00-8.00 sec 1.08 GBytes 9.25 Gbits/sec 0 1.72 MBytes
> [ 5] 8.00-9.00 sec 1.08 GBytes 9.25 Gbits/sec 77 1.31 MBytes
> [ 5] 9.00-10.00 sec 1.08 GBytes 9.24 Gbits/sec 0 1.48 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec 166 sender
> [ 5] 0.00-10.04 sec 10.8 GBytes 9.24 Gbits/sec receiver
> ....
>
> Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
> ---
> v1 -> v2
> - fix commit messages
> - add queued flag to avoid unnecessary napi scheduling, as suggested by Jason
>
> drivers/net/tun.c | 20 ++++++++++++++++----
> 1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index fed85447701a..c7d8b7c821d8 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2379,7 +2379,7 @@ static void tun_put_page(struct tun_page *tpage)
>  }
>
>  static int tun_xdp_one(struct tun_struct *tun,
> -                       struct tun_file *tfile,
> +                       struct tun_file *tfile, int *queued,
>                         struct xdp_buff *xdp, int *flush,
>                         struct tun_page *tpage)
>  {
> @@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>          struct virtio_net_hdr *gso = &hdr->gso;
>          struct bpf_prog *xdp_prog;
>          struct sk_buff *skb = NULL;
> +        struct sk_buff_head *queue;
>          u32 rxhash = 0, act;
>          int buflen = hdr->buflen;
>          int err = 0;
> @@ -2464,7 +2465,15 @@ static int tun_xdp_one(struct tun_struct *tun,
>              !tfile->detached)
>                  rxhash = __skb_get_hash_symmetric(skb);
>
> -        netif_receive_skb(skb);
> +        if (tfile->napi_enabled) {
> +                queue = &tfile->sk.sk_write_queue;
> +                spin_lock(&queue->lock);
> +                __skb_queue_tail(queue, skb);
> +                spin_unlock(&queue->lock);
> +                (*queued)++;
> +        } else {
> +                netif_receive_skb(skb);
> +        }
>
>          /* No need to disable preemption here since this function is
>           * always called with bh disabled
> @@ -2492,7 +2501,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>          if (ctl && (ctl->type == TUN_MSG_PTR)) {
>                  struct tun_page tpage;
>                  int n = ctl->num;
> -                int flush = 0;
> +                int flush = 0, queued = 0;
>
>                  memset(&tpage, 0, sizeof(tpage));
>
> @@ -2501,12 +2510,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>
>                  for (i = 0; i < n; i++) {
>                          xdp = &((struct xdp_buff *)ctl->ptr)[i];
> -                        tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
> +                        tun_xdp_one(tun, tfile, &queued, xdp, &flush, &tpage);

How big can n be?

BTW, I could not find where m->msg_controllen is checked in tun_sendmsg():

        struct tun_msg_ctl *ctl = m->msg_control;

        if (ctl && (ctl->type == TUN_MSG_PTR)) {
                int n = ctl->num; // can be set to values in [0..65535]

                for (i = 0; i < n; i++) {
                        xdp = &((struct xdp_buff *)ctl->ptr)[i];

I really do not understand how we prevent malicious user space from
crashing the kernel.
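
To make the concern concrete, here is a minimal userspace sketch of the kind of
validation one might expect before the loop walks ctl->ptr. It is not the tun.c
code: the struct, the constants and the helper name are hypothetical stand-ins,
and the batch limit of 64 is an assumption chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the kernel definitions. */
#define SKETCH_MSG_PTR   1
#define SKETCH_MAX_BATCH 64	/* assumed upper bound on the batch size */

struct sketch_msg_ctl {
	unsigned short type;
	unsigned short num;	/* 16-bit count: a caller can pass 0..65535 */
	void *ptr;		/* array of num buffers supplied by the caller */
};

/*
 * Accept the control buffer only if its length is exactly one descriptor,
 * the type matches, the array pointer is set and the element count stays
 * within the assumed batch limit.
 */
static bool sketch_ctl_is_sane(const void *control, size_t controllen)
{
	const struct sketch_msg_ctl *ctl = control;

	if (!ctl || controllen != sizeof(*ctl))
		return false;
	if (ctl->type != SKETCH_MSG_PTR)
		return false;
	if (!ctl->ptr || ctl->num > SKETCH_MAX_BATCH)
		return false;
	return true;
}
```

With a check of this shape, a control buffer of the wrong length or with an
oversized num is rejected before anything behind ctl->ptr is dereferenced.
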
>                  }
>
>                  if (flush)
>                          xdp_do_flush();
>
> +                if (tfile->napi_enabled && queued > 0)
> +                        napi_schedule(&tfile->napi);
> +
>                  rcu_read_unlock();
>                  local_bh_enable();
>