* [PATCH net v3 1/2] Revert "tuntap: add missing xdp flush"
@ 2018-02-22 9:36 Jason Wang
2018-02-22 9:36 ` [PATCH net v3 2/2] tuntap: correctly add the missing xdp flush Jason Wang
0 siblings, 1 reply; 4+ messages in thread
From: Jason Wang @ 2018-02-22 9:36 UTC
To: netdev, linux-kernel; +Cc: mst, christoffer.dall, sergei.shtylyov, Jason Wang
This reverts commit 762c330d670e3d4b795cf7a8d761866fdd1eef49. That
commit tried to batch packets for devmap, which results in
xdp_do_flush_map() being called in process context. Simply disabling
preemption may not be enough, since the process may move between
processors, which leads xdp_do_flush_map() to miss flushes on some of
them.
So simply revert the patch; a follow-up patch will add the xdp flush
correctly.
Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Fixes: 762c330d670e ("tuntap: add missing xdp flush")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/tun.c | 15 ---------------
1 file changed, 15 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b52258c..2823a4a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -181,7 +181,6 @@ struct tun_file {
 	struct tun_struct *detached;
 	struct ptr_ring tx_ring;
 	struct xdp_rxq_info xdp_rxq;
-	int xdp_pending_pkts;
 };
 
 struct tun_flow_entry {
@@ -1662,7 +1661,6 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 		case XDP_REDIRECT:
 			get_page(alloc_frag->page);
 			alloc_frag->offset += buflen;
-			++tfile->xdp_pending_pkts;
 			err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
 			if (err)
 				goto err_redirect;
@@ -1984,11 +1982,6 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	result = tun_get_user(tun, tfile, NULL, from,
 			      file->f_flags & O_NONBLOCK, false);
 
-	if (tfile->xdp_pending_pkts) {
-		tfile->xdp_pending_pkts = 0;
-		xdp_do_flush_map();
-	}
-
 	tun_put(tun);
 	return result;
 }
@@ -2325,13 +2318,6 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter,
 			   m->msg_flags & MSG_DONTWAIT,
 			   m->msg_flags & MSG_MORE);
-
-	if (tfile->xdp_pending_pkts >= NAPI_POLL_WEIGHT ||
-	    !(m->msg_flags & MSG_MORE)) {
-		tfile->xdp_pending_pkts = 0;
-		xdp_do_flush_map();
-	}
-
 	tun_put(tun);
 	return ret;
 }
@@ -3163,7 +3149,6 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 	sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
 
 	memset(&tfile->tx_ring, 0, sizeof(tfile->tx_ring));
-	tfile->xdp_pending_pkts = 0;
 
 	return 0;
 }
--
2.7.4
* [PATCH net v3 2/2] tuntap: correctly add the missing xdp flush
2018-02-22 9:36 [PATCH net v3 1/2] Revert "tuntap: add missing xdp flush" Jason Wang
@ 2018-02-22 9:36 ` Jason Wang
2018-02-22 17:46 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 4+ messages in thread
From: Jason Wang @ 2018-02-22 9:36 UTC
To: netdev, linux-kernel; +Cc: mst, christoffer.dall, sergei.shtylyov, Jason Wang
Commit 762c330d670e ("tuntap: add missing xdp flush") tried to fix the
devmap stall caused by a missed xdp flush by counting the pending xdp
redirected packets and flushing when the count exceeds NAPI_POLL_WEIGHT
or MSG_MORE is clear. This may lead to a BUG() since xdp_do_flush_map()
was called in process context with preemption enabled. Simply
disabling preemption may silence the warning but is not enough, since
the process may move between different CPUs during a batch, which
causes xdp_do_flush_map() to miss CPUs where the process ran
previously. Given the fallout, that commit was reverted. To fix
the issue correctly, we can simply call xdp_do_flush_map() immediately
after xdp_do_redirect(). A side effect is that this removes any
possibility of batching, which could be addressed in the future.
Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Fixes: 762c330d670e ("tuntap: add missing xdp flush")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/tun.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2823a4a..a363ea2 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1662,6 +1662,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 			get_page(alloc_frag->page);
 			alloc_frag->offset += buflen;
 			err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
+			xdp_do_flush_map();
 			if (err)
 				goto err_redirect;
 			rcu_read_unlock();
--
2.7.4
* Re: [PATCH net v3 2/2] tuntap: correctly add the missing xdp flush
2018-02-22 9:36 ` [PATCH net v3 2/2] tuntap: correctly add the missing xdp flush Jason Wang
@ 2018-02-22 17:46 ` Jesper Dangaard Brouer
2018-02-23 1:59 ` Jason Wang
0 siblings, 1 reply; 4+ messages in thread
From: Jesper Dangaard Brouer @ 2018-02-22 17:46 UTC
To: Jason Wang
Cc: brouer, netdev, linux-kernel, mst, christoffer.dall,
sergei.shtylyov
On Thu, 22 Feb 2018 17:36:46 +0800
Jason Wang <jasowang@redhat.com> wrote:
> Commit 762c330d670e ("tuntap: add missing xdp flush") tried to fix the
> devmap stall caused by a missed xdp flush by counting the pending xdp
> redirected packets and flushing when the count exceeds NAPI_POLL_WEIGHT
> or MSG_MORE is clear. This may lead to a BUG() since xdp_do_flush_map()
> was called in process context with preemption enabled. Simply
> disabling preemption may silence the warning but is not enough, since
> the process may move between different CPUs during a batch, which
> causes xdp_do_flush_map() to miss CPUs where the process ran
> previously. Given the fallout, that commit was reverted. To fix
> the issue correctly, we can simply call xdp_do_flush_map() immediately
> after xdp_do_redirect(). A side effect is that this removes any
> possibility of batching, which could be addressed in the future.
>
> Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
> Fixes: 762c330d670e ("tuntap: add missing xdp flush")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/net/tun.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 2823a4a..a363ea2 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1662,6 +1662,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  			get_page(alloc_frag->page);
>  			alloc_frag->offset += buflen;
>  			err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
> +			xdp_do_flush_map();
>  			if (err)
>  				goto err_redirect;
>  			rcu_read_unlock();
As you have noticed, xdp_do_redirect() + xdp_do_flush_map() rely
heavily on being executed in softirq/napi_schedule context.
In particular, the map infrastructure (devmap [1] and cpumap) requires
that the enqueue and flush operations happen on the same CPU (e.g.
devmap stores which devices need flushing in a this_cpu_ptr bitmap [1]).

What context is tun_build_skb() invoked under?

Even when you call xdp_do_redirect() and xdp_do_flush_map() right after
each other, are we sure we cannot be preempted here?

[1] https://github.com/torvalds/linux/blob/master/kernel/bpf/devmap.c#L209-L215
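
To make the same-CPU requirement above concrete, here is a minimal userspace sketch in C. It is only a toy model, not the devmap code: every name in it (`devq_percpu`, `devq_enqueue`, `devq_flush`, `devq_stranded`, the CPU and device counts) is hypothetical, and the per-CPU state is a plain array indexed by an explicit CPU number rather than real this_cpu_ptr data. It only illustrates that a flush run on one CPU cannot see frames queued on another.

```c
#include <string.h>

#define DEVQ_NR_CPUS 4	/* hypothetical CPU count for the model */
#define DEVQ_NR_DEVS 8	/* hypothetical number of map slots */

/* Per-CPU state, loosely modelled on a "needs flush" bitmap. */
struct devq_percpu {
	unsigned int flush_needed;		/* bit i set: device i has frames queued here */
	unsigned int queued[DEVQ_NR_DEVS];	/* frames queued per device on this CPU */
};

static struct devq_percpu devq_cpu[DEVQ_NR_CPUS];
static unsigned int devq_delivered[DEVQ_NR_DEVS];

/* Clear all model state. */
static void devq_reset(void)
{
	memset(devq_cpu, 0, sizeof(devq_cpu));
	memset(devq_delivered, 0, sizeof(devq_delivered));
}

/* Enqueue half of a redirect: touches only the state of `cpu`. */
static void devq_enqueue(int cpu, int dev)
{
	devq_cpu[cpu].queued[dev]++;
	devq_cpu[cpu].flush_needed |= 1u << dev;
}

/* Flush half: also touches only the state of the CPU it runs on. */
static void devq_flush(int cpu)
{
	struct devq_percpu *s = &devq_cpu[cpu];
	int dev;

	for (dev = 0; dev < DEVQ_NR_DEVS; dev++) {
		if (!(s->flush_needed & (1u << dev)))
			continue;
		devq_delivered[dev] += s->queued[dev];
		s->queued[dev] = 0;
	}
	s->flush_needed = 0;
}

/* Frames still queued on any CPU, i.e. frames a flush has missed. */
static unsigned int devq_stranded(void)
{
	unsigned int total = 0;
	int cpu, dev;

	for (cpu = 0; cpu < DEVQ_NR_CPUS; cpu++)
		for (dev = 0; dev < DEVQ_NR_DEVS; dev++)
			total += devq_cpu[cpu].queued[dev];
	return total;
}
```

In this model, `devq_enqueue(0, 3)` followed by `devq_flush(0)` leaves nothing stranded, while `devq_enqueue(0, 3)` followed by `devq_flush(1)` (the task having moved to CPU 1 in between) leaves one frame queued on CPU 0 that no later flush on CPU 1 will pick up.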
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
* Re: [PATCH net v3 2/2] tuntap: correctly add the missing xdp flush
2018-02-22 17:46 ` Jesper Dangaard Brouer
@ 2018-02-23 1:59 ` Jason Wang
0 siblings, 0 replies; 4+ messages in thread
From: Jason Wang @ 2018-02-23 1:59 UTC
To: Jesper Dangaard Brouer
Cc: netdev, linux-kernel, mst, christoffer.dall, sergei.shtylyov
On 2018-02-23 01:46, Jesper Dangaard Brouer wrote:
> On Thu, 22 Feb 2018 17:36:46 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> Commit 762c330d670e ("tuntap: add missing xdp flush") tried to fix the
>> devmap stall caused by a missed xdp flush by counting the pending xdp
>> redirected packets and flushing when the count exceeds NAPI_POLL_WEIGHT
>> or MSG_MORE is clear. This may lead to a BUG() since xdp_do_flush_map()
>> was called in process context with preemption enabled. Simply
>> disabling preemption may silence the warning but is not enough, since
>> the process may move between different CPUs during a batch, which
>> causes xdp_do_flush_map() to miss CPUs where the process ran
>> previously. Given the fallout, that commit was reverted. To fix
>> the issue correctly, we can simply call xdp_do_flush_map() immediately
>> after xdp_do_redirect(). A side effect is that this removes any
>> possibility of batching, which could be addressed in the future.
>>
>> Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
>> Fixes: 762c330d670e ("tuntap: add missing xdp flush")
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/net/tun.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 2823a4a..a363ea2 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1662,6 +1662,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>>  			get_page(alloc_frag->page);
>>  			alloc_frag->offset += buflen;
>>  			err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
>> +			xdp_do_flush_map();
>>  			if (err)
>>  				goto err_redirect;
>>  			rcu_read_unlock();
> As you have noticed, xdp_do_redirect() + xdp_do_flush_map() rely
> heavily on being executed in softirq/napi_schedule context.
> In particular, the map infrastructure (devmap [1] and cpumap) requires
> that the enqueue and flush operations happen on the same CPU (e.g.
> devmap stores which devices need flushing in a this_cpu_ptr bitmap [1]).
>
> What context is tun_build_skb() invoked under?
>
> Even when you call xdp_do_redirect() and xdp_do_flush_map() right after
> each other, are we sure we cannot be preempted here?
OK, I missed the fact that we can be preempted here with preemptible RCU.
Let me disable preemption here and post a V4.
Thanks
>
>
> [1] https://github.com/torvalds/linux/blob/master/kernel/bpf/devmap.c#L209-L215
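
The idea in the reply above (disable preemption around the redirect and the flush) can be illustrated with a small userspace sketch in C. This is not the V4 patch and not kernel code: `ctx_preempt_disable()`, `ctx_preempt_enable()`, `ctx_try_migrate()`, `ctx_redirect()`, `ctx_flush()` and `ctx_xmit()` are all hypothetical stand-ins, and the "scheduler" is just a function that moves the task unless the model's preempt count is non-zero. It only shows why keeping the task on one CPU between enqueue and flush avoids stranded frames.

```c
#include <stdbool.h>

#define CTX_NR_CPUS 2	/* hypothetical CPU count for the model */

static int ctx_current_cpu;			/* CPU the modelled task runs on */
static int ctx_preempt_count;			/* > 0: migration is blocked */
static unsigned int ctx_pending[CTX_NR_CPUS];	/* frames queued, not yet flushed */

/* Stand-ins for preempt_disable()/preempt_enable() in this model. */
static void ctx_preempt_disable(void) { ctx_preempt_count++; }
static void ctx_preempt_enable(void) { ctx_preempt_count--; }

/* Scheduler stand-in: moves the task only while it is preemptible. */
static bool ctx_try_migrate(int cpu)
{
	if (ctx_preempt_count > 0)
		return false;
	ctx_current_cpu = cpu;
	return true;
}

/* Enqueue and flush both act on the CPU the task is on right now. */
static void ctx_redirect(void) { ctx_pending[ctx_current_cpu]++; }
static void ctx_flush(void) { ctx_pending[ctx_current_cpu] = 0; }

/* Reset the model to a task running preemptibly on CPU 0. */
static void ctx_reset(void)
{
	int cpu;

	ctx_current_cpu = 0;
	ctx_preempt_count = 0;
	for (cpu = 0; cpu < CTX_NR_CPUS; cpu++)
		ctx_pending[cpu] = 0;
}

/*
 * Redirect then flush, with a migration attempt in between.
 * Returns the number of frames left queued on any CPU afterwards.
 */
static unsigned int ctx_xmit(bool protect, int migrate_to)
{
	unsigned int stranded = 0;
	int cpu;

	if (protect)
		ctx_preempt_disable();
	ctx_redirect();
	ctx_try_migrate(migrate_to);	/* ignored while protected */
	ctx_flush();
	if (protect)
		ctx_preempt_enable();

	for (cpu = 0; cpu < CTX_NR_CPUS; cpu++)
		stranded += ctx_pending[cpu];
	return stranded;
}
```

In this model, `ctx_xmit(false, 1)` leaves one frame stranded on CPU 0 because the flush runs on CPU 1, while `ctx_xmit(true, 1)` leaves none because the migration attempt is refused while the preempt count is raised.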