public inbox for kvm@vger.kernel.org
* [BUG] tuntap: using smp_processor_id() in preemptible from xdp flush
@ 2018-02-13 13:21 Christoffer Dall
  2018-02-13 14:15 ` Jason Wang
  0 siblings, 1 reply; 4+ messages in thread
From: Christoffer Dall @ 2018-02-13 13:21 UTC (permalink / raw)
  To: jasowang; +Cc: davem, Marc Zyngier, kvm

Hi Jason,

With v4.16-rc1 I see a lot of these when running my KVM/ARM test loop:

  BUG: using smp_processor_id() in preemptible [00000000] code: vhost-2877/2900
  caller is debug_smp_processor_id+0x1c/0x28
  CPU: 0 PID: 2900 Comm: vhost-2877 Not tainted 4.16.0-rc1 #1333
  Hardware name: APM X-Gene Mustang board (DT)
  Call trace:
   dump_backtrace+0x0/0x180
   show_stack+0x24/0x30
   dump_stack+0x8c/0xac
   check_preemption_disabled+0xf8/0x100
   debug_smp_processor_id+0x1c/0x28
   xdp_do_flush_map+0x24/0x48
   tun_sendmsg+0x90/0xa0
   handle_tx+0x254/0x548
   handle_tx_kick+0x20/0x30
   vhost_worker+0xc0/0x158
   kthread+0x104/0x130
   ret_from_fork+0x10/0x1c

I confirmed that reverting
  762c330d670e, "tuntap: add missing xdp flush", 2018-02-07
solves the problem for me.

I'm not at all familiar with this part of the kernel and not sure what
the proper fix is.  I'd be grateful if you could take a look and I'm
happy to help test etc.

Thanks,
-Christoffer


* Re: [BUG] tuntap: using smp_processor_id() in preemptible from xdp flush
  2018-02-13 13:21 [BUG] tuntap: using smp_processor_id() in preemptible from xdp flush Christoffer Dall
@ 2018-02-13 14:15 ` Jason Wang
  2018-02-13 14:35   ` Christoffer Dall
  0 siblings, 1 reply; 4+ messages in thread
From: Jason Wang @ 2018-02-13 14:15 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: davem, Marc Zyngier, kvm

[-- Attachment #1: Type: text/plain, Size: 1509 bytes --]



On 2018-02-13 21:21, Christoffer Dall wrote:
> Hi Jason,
>
> With v4.16-rc1 I see a lot of these when running my KVM/ARM test loop:
>
>    BUG: using smp_processor_id() in preemptible [00000000] code: vhost-2877/2900
>    caller is debug_smp_processor_id+0x1c/0x28
>    CPU: 0 PID: 2900 Comm: vhost-2877 Not tainted 4.16.0-rc1 #1333
>    Hardware name: APM X-Gene Mustang board (DT)
>    Call trace:
>     dump_backtrace+0x0/0x180
>     show_stack+0x24/0x30
>     dump_stack+0x8c/0xac
>     check_preemption_disabled+0xf8/0x100
>     debug_smp_processor_id+0x1c/0x28
>     xdp_do_flush_map+0x24/0x48
>     tun_sendmsg+0x90/0xa0
>     handle_tx+0x254/0x548
>     handle_tx_kick+0x20/0x30
>     vhost_worker+0xc0/0x158
>     kthread+0x104/0x130
>     ret_from_fork+0x10/0x1c
>
> I confirmed that reverting
>    762c330d670e, "tuntap: add missing xdp flush", 2018-02-07
> solves the problem for me.
>
> I'm not at all familiar with this part of the kernel and not sure what
> the proper fix is.  I'd be grateful if you could take a look and I'm
> happy to help test etc.
>
> Thanks,
> -Christoffer

Hi Christoffer:

Thanks for the report. It looks like this happened because I tried to 
batch flushes for the XDP devmap, which turns out to be hard to do 
safely here since the flush code assumes XDP is running inside NAPI 
(i.e. with preemption disabled). Simply disabling preemption would 
silence the warning but is not sufficient: the process may migrate 
between CPUs during a batch, so the flush would miss packets queued on 
a CPU the process ran on previously. The only option for now is to 
disable batching.

Please help to test the attached patch.

Thanks

[-- Attachment #2: 0001-tuntap-try-not-batch-packets-for-devmap.patch --]
[-- Type: text/x-patch, Size: 2798 bytes --]

From 289267a71bc417bd86aa0040d2f3822e9c5aee37 Mon Sep 17 00:00:00 2001
From: Jason Wang <jasowang@redhat.com>
Date: Tue, 13 Feb 2018 22:06:04 +0800
Subject: [PATCH] tuntap: try not batch packets for devmap

Commit 762c330d670e ("tuntap: add missing xdp flush") tries to fix the
devmap stall caused by a missed xdp flush by counting the pending xdp
redirected packets and flushing when the count exceeds NAPI_POLL_WEIGHT
or MSG_MORE is clear. This may trigger a BUG() since xdp_do_flush() is
called in process context with preemption enabled. Simply disabling
preemption may silence the warning but is not enough, since the process
may migrate between CPUs during a batch, which causes xdp_do_flush() to
miss CPUs the process ran on previously. For -net, fix this by simply
calling xdp_do_flush() immediately after xdp_do_redirect(); a side
effect is that this removes any possibility of batching, which could be
addressed in the future.

Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Fixes: 762c330d670e ("tuntap: add missing xdp flush")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c | 15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 17e496b..6a4cd97 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -181,7 +181,6 @@ struct tun_file {
 	struct tun_struct *detached;
 	struct ptr_ring tx_ring;
 	struct xdp_rxq_info xdp_rxq;
-	int xdp_pending_pkts;
 };
 
 struct tun_flow_entry {
@@ -1666,10 +1665,10 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 		case XDP_REDIRECT:
 			get_page(alloc_frag->page);
 			alloc_frag->offset += buflen;
-			++tfile->xdp_pending_pkts;
 			err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
 			if (err)
 				goto err_redirect;
+			xdp_do_flush_map();
 			rcu_read_unlock();
 			return NULL;
 		case XDP_TX:
@@ -1988,11 +1987,6 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	result = tun_get_user(tun, tfile, NULL, from,
 			      file->f_flags & O_NONBLOCK, false);
 
-	if (tfile->xdp_pending_pkts) {
-		tfile->xdp_pending_pkts = 0;
-		xdp_do_flush_map();
-	}
-
 	tun_put(tun);
 	return result;
 }
@@ -2330,12 +2324,6 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 			   m->msg_flags & MSG_DONTWAIT,
 			   m->msg_flags & MSG_MORE);
 
-	if (tfile->xdp_pending_pkts >= NAPI_POLL_WEIGHT ||
-	    !(m->msg_flags & MSG_MORE)) {
-		tfile->xdp_pending_pkts = 0;
-		xdp_do_flush_map();
-	}
-
 	tun_put(tun);
 	return ret;
 }
@@ -3167,7 +3155,6 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 	sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
 
 	memset(&tfile->tx_ring, 0, sizeof(tfile->tx_ring));
-	tfile->xdp_pending_pkts = 0;
 
 	return 0;
 }
-- 
2.7.4



* Re: [BUG] tuntap: using smp_processor_id() in preemptible from xdp flush
  2018-02-13 14:15 ` Jason Wang
@ 2018-02-13 14:35   ` Christoffer Dall
  2018-02-14  2:41     ` Jason Wang
  0 siblings, 1 reply; 4+ messages in thread
From: Christoffer Dall @ 2018-02-13 14:35 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, Marc Zyngier, kvm

On Tue, Feb 13, 2018 at 10:15:20PM +0800, Jason Wang wrote:
> 
> 
> On 2018-02-13 21:21, Christoffer Dall wrote:
> >With v4.16-rc1 I see a lot of these when running my KVM/ARM test loop:
> >
> >   BUG: using smp_processor_id() in preemptible [00000000] code: vhost-2877/2900
> >   caller is debug_smp_processor_id+0x1c/0x28
> >   CPU: 0 PID: 2900 Comm: vhost-2877 Not tainted 4.16.0-rc1 #1333
> >   Hardware name: APM X-Gene Mustang board (DT)
> >   Call trace:
> >    dump_backtrace+0x0/0x180
> >    show_stack+0x24/0x30
> >    dump_stack+0x8c/0xac
> >    check_preemption_disabled+0xf8/0x100
> >    debug_smp_processor_id+0x1c/0x28
> >    xdp_do_flush_map+0x24/0x48
> >    tun_sendmsg+0x90/0xa0
> >    handle_tx+0x254/0x548
> >    handle_tx_kick+0x20/0x30
> >    vhost_worker+0xc0/0x158
> >    kthread+0x104/0x130
> >    ret_from_fork+0x10/0x1c
> >
> >I confirmed that reverting
> >   762c330d670e, "tuntap: add missing xdp flush", 2018-02-07
> >solves the problem for me.
> >
> >I'm not at all familiar with this part of the kernel and not sure what
> >the proper fix is.  I'd be grateful if you could take a look and I'm
> >happy to help test etc.
> >
> 
> Thanks for the report. It looks like this happened because I tried to
> batch flushes for the XDP devmap, which turns out to be hard to do safely
> here since the flush code assumes XDP is running inside NAPI (i.e. with
> preemption disabled). Simply disabling preemption would silence the
> warning but is not sufficient: the process may migrate between CPUs
> during a batch, so the flush would miss packets queued on a CPU the
> process ran on previously. The only option for now is to disable batching.
> 
> Please help to test the attached patch.
> 
I can confirm that the patch solves the issue.  I put some stress on a
VM using tuntap by running netperf in TCP_STREAM, TCP_MAERTS, and TCP_RR
modes in the guest, and I didn't see any warnings:

Tested-by: Christoffer Dall <christoffer.dall@linaro.org>

Thanks,
-Christoffer


* Re: [BUG] tuntap: using smp_processor_id() in preemptible from xdp flush
  2018-02-13 14:35   ` Christoffer Dall
@ 2018-02-14  2:41     ` Jason Wang
  0 siblings, 0 replies; 4+ messages in thread
From: Jason Wang @ 2018-02-14  2:41 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: davem, Marc Zyngier, kvm



On 2018-02-13 22:35, Christoffer Dall wrote:
> On Tue, Feb 13, 2018 at 10:15:20PM +0800, Jason Wang wrote:
>>
>> On 2018-02-13 21:21, Christoffer Dall wrote:
>>> With v4.16-rc1 I see a lot of these when running my KVM/ARM test loop:
>>>
>>>    BUG: using smp_processor_id() in preemptible [00000000] code: vhost-2877/2900
>>>    caller is debug_smp_processor_id+0x1c/0x28
>>>    CPU: 0 PID: 2900 Comm: vhost-2877 Not tainted 4.16.0-rc1 #1333
>>>    Hardware name: APM X-Gene Mustang board (DT)
>>>    Call trace:
>>>     dump_backtrace+0x0/0x180
>>>     show_stack+0x24/0x30
>>>     dump_stack+0x8c/0xac
>>>     check_preemption_disabled+0xf8/0x100
>>>     debug_smp_processor_id+0x1c/0x28
>>>     xdp_do_flush_map+0x24/0x48
>>>     tun_sendmsg+0x90/0xa0
>>>     handle_tx+0x254/0x548
>>>     handle_tx_kick+0x20/0x30
>>>     vhost_worker+0xc0/0x158
>>>     kthread+0x104/0x130
>>>     ret_from_fork+0x10/0x1c
>>>
>>> I confirmed that reverting
>>>    762c330d670e, "tuntap: add missing xdp flush", 2018-02-07
>>> solves the problem for me.
>>>
>>> I'm not at all familiar with this part of the kernel and not sure what
>>> the proper fix is.  I'd be grateful if you could take a look and I'm
>>> happy to help test etc.
>>>
>> Thanks for the report. It looks like this happened because I tried to
>> batch flushes for the XDP devmap, which turns out to be hard to do safely
>> here since the flush code assumes XDP is running inside NAPI (i.e. with
>> preemption disabled). Simply disabling preemption would silence the
>> warning but is not sufficient: the process may migrate between CPUs
>> during a batch, so the flush would miss packets queued on a CPU the
>> process ran on previously. The only option for now is to disable batching.
>>
>> Please help to test the attached patch.
>>
> I can confirm that the patch solves the issue.  I put some stress on a
> VM using tuntap by running netperf in TCP_STREAM, TCP_MAERTS, and TCP_RR
> modes in the guest, and I didn't see any warnings:
>
> Tested-by: Christoffer Dall <christoffer.dall@linaro.org>
>
> Thanks,
> -Christoffer

Thanks for testing.

I will post a formal patch soon.

