From: Julien Grall <julien.grall@linaro.org>
To: David Vrabel <david.vrabel@citrix.com>, Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>,
xen-devel <xen-devel@lists.xen.org>
Subject: Re: rcu_sched self-detect stall when disable vif device
Date: Wed, 28 Jan 2015 17:27:49 +0000
Message-ID: <54C91C15.7030709@linaro.org>
In-Reply-To: <54C91712.3020806@citrix.com>
On 28/01/15 17:06, David Vrabel wrote:
> On 28/01/15 16:45, Julien Grall wrote:
>> On 27/01/15 16:53, Wei Liu wrote:
>>> On Tue, Jan 27, 2015 at 04:47:45PM +0000, Julien Grall wrote:
>>>> On 27/01/15 16:45, Wei Liu wrote:
>>>>> On Tue, Jan 27, 2015 at 04:03:52PM +0000, Julien Grall wrote:
>>>>>> Hi,
>>>>>>
>>>>>> While working on support for 64K pages in netfront, I got
>>>>>> an rcu_sched self-detected stall message. It happens when netback
>>>>>> is disabling the vif device due to an error.
>>>>>>
>>>>>> I'm using Linux 3.19-rc5 on Seattle (ARM64). Any idea why
>>>>>> the processor is stuck in xenvif_rx_queue_purge?
>>>>>>
>>>>>
>>>>> When you try to release an SKB, the core network code needs to enter
>>>>> an RCU critical region to clean up. dst_release, for one, calls call_rcu.
>>>>
>>>> But this message shouldn't happen under normal conditions or because
>>>> of netfront. Right?
>>>>
>>>
>>> I have never seen a report like this before, even when netfront is
>>> buggy.
>>
>> This only happens when preemption is not enabled (i.e.
>> CONFIG_PREEMPT_NONE in the config file) in the backend kernel.
>>
>> When the vif is disabled, the loop in xenvif_kthread_guest_rx turns
>> into an infinite loop. In my case, the code executed looks like:
>>
>>
>> 1. for (;;) {
>> 2.         xenvif_wait_for_rx_work(queue);
>> 3.
>> 4.         if (kthread_should_stop())
>> 5.                 break;
>> 6.
>> 7.         if (unlikely(vif->disabled && queue->id == 0)) {
>> 8.                 xenvif_carrier_off(vif);
>> 9.                 xenvif_rx_queue_purge(queue);
>> 10.                continue;
>> 11.        }
>> 12. }
>>
>> The wait on line 2 returns immediately because the vif is disabled
>> (see xenvif_have_rx_work).
>>
>> We are on queue 0, so the condition on line 7 is true. Therefore we
>> hit the continue on line 10 and go around again. And so on...
>>
>> On a platform where preemption is not enabled, this thread will never
>> yield the CPU to another thread (unless the domain is destroyed).
>
> I'm not sure why we have a continue in the vif->disabled case and not
> just a break. Can you try that?
So I applied this small patch:
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 908e65e..9448c6c 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -2110,7 +2110,7 @@ int xenvif_kthread_guest_rx(void *data)
 		if (unlikely(vif->disabled && queue->id == 0)) {
 			xenvif_carrier_off(vif);
 			xenvif_rx_queue_purge(queue);
-			continue;
+			break;
 		}
 
 		if (!skb_queue_empty(&queue->rx_queue))
While I no longer get the rcu_sched stall message, when I destroy the
guest the backend hits a NULL pointer dereference:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ffff800000a50000
[00000000] *pgd=00000083de82a003, *pud=00000083de82b003, *pmd=00000083de82c003, *pte=00600000e1110707
Internal error: Oops: 96000006 [#1] SMP
Modules linked in:
CPU: 4 PID: 34 Comm: xenwatch Not tainted 3.19.0-rc5-xen-seattle+ #13
Hardware name: AMD Seattle (RevA) Development Board (Overdrive) (DT)
task: ffff80001ea39480 ti: ffff80001ea78000 task.ti: ffff80001ea78000
PC is at exit_creds+0x18/0x70
LR is at __put_task_struct+0x3c/0xd4
pc : [<ffff8000000b2d94>] lr : [<ffff800000094990>] pstate: 80000145
sp : ffff80001ea7bc50
x29: ffff80001ea7bc50 x28: 0000000000000000
x27: 0000000000000000 x26: 0000000000000000
x25: 0000000000000000 x24: ffff80001eb3c840
x23: ffff80001eb3c840 x22: 000000000006c560
x21: ffff0000011f7000 x20: 0000000000000000
x19: ffff80001ba06680 x18: 0000ffffd2635bd0
x17: 0000ffff839e4074 x16: 00000000deadbeef
x15: ffffffffffffffff x14: 0ffffffffffffffe
x13: 0000000000000028 x12: 0000000000000010
x11: 0000000000000030 x10: 0101010101010101
x9 : ffff80001ea7b8e0 x8 : ffff7c01cf6e2740
x7 : 0000000000000000 x6 : 0000000000002fc9
x5 : 0000000000000000 x4 : 0000000000000001
x3 : 0000000000000000 x2 : ffff80001ba06690
x1 : 0000000000000000 x0 : 0000000000000000
Process xenwatch (pid: 34, stack limit = 0xffff80001ea78058)
Stack: (0xffff80001ea7bc50 to 0xffff80001ea7c000)
bc40: 1ea7bc70 ffff8000 00094990 ffff8000
bc60: 1ba06680 ffff8000 008b45a8 ffff8000 1ea7bc90 ffff8000 000b15f0 ffff8000
bc80: 1ba06680 ffff8000 005bcab8 ffff8000 1ea7bcc0 ffff8000 00541efc ffff8000
bca0: 011ed000 ffff0000 00000000 00000000 011f7000 ffff0000 00000006 00000000
bcc0: 1ea7bd00 ffff8000 00540984 ffff8000 1ce23680 ffff8000 00000006 00000000
bce0: 00752cf0 ffff8000 00000001 00000000 00752e38 ffff8000 1ea7bd98 ffff8000
bd00: 1ea7bd40 ffff8000 00540bcc ffff8000 1ce23680 ffff8000 1cce0c00 ffff8000
bd20: 00000000 00000000 1cce0c00 ffff8000 009b0288 ffff8000 1ea7be20 ffff8000
bd40: 1ea7bd70 ffff8000 0048011c ffff8000 1ce23700 ffff8000 1cf71000 ffff8000
bd60: 009a6258 ffff8000 00a36d38 00000000 1ea7bdb0 ffff8000 00480ea4 ffff8000
bd80: 1b89d800 ffff8000 009a62b0 ffff8000 009a6258 ffff8000 00a36d38 ffff8000
bda0: 00a36e30 ffff8000 0047f7c0 ffff8000 1ea7bdc0 ffff8000 0047f82c ffff8000
bdc0: 1ea7be30 ffff8000 000b1064 ffff8000 1ea48cc0 ffff8000 009dbfe8 ffff8000
bde0: 008552d8 ffff8000 00000000 00000000 0047f778 ffff8000 00000000 00000000
be00: 1ea7be30 ffff8000 00000000 ffff8000 1ea39480 ffff8000 000c75f8 ffff8000
be20: 1ea7be20 ffff8000 1ea7be20 ffff8000 00000000 00000000 00085930 ffff8000
be40: 000b0f88 ffff8000 1ea48cc0 ffff8000 00000000 00000000 00000000 00000000
be60: 00000000 00000000 1ea48cc0 ffff8000 00000000 00000000 00000000 00000000
be80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bea0: 1ea7bea0 ffff8000 1ea7bea0 ffff8000 00000000 ffff8000 00000000 00000000
bec0: 1ea7bec0 ffff8000 1ea7bec0 ffff8000 00000000 00000000 00000000 00000000
bee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bfa0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000005 00000000
bfe0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call trace:
[<ffff8000000b2d94>] exit_creds+0x18/0x70
[<ffff80000009498c>] __put_task_struct+0x38/0xd4
[<ffff8000000b15ec>] kthread_stop+0xc0/0x130
[<ffff800000541ef8>] xenvif_disconnect+0x58/0xd0
[<ffff800000540980>] set_backend_state+0x134/0x278
[<ffff800000540bc8>] frontend_changed+0x8c/0xec
[<ffff800000480118>] xenbus_otherend_changed+0x9c/0xa4
[<ffff800000480ea0>] frontend_changed+0xc/0x18
[<ffff80000047f828>] xenwatch_thread+0xb0/0x140
[<ffff8000000b1060>] kthread+0xd8/0xf0
Code: f9000bf3 aa0003f3 f9422401 f9422000 (b9400021)
---[ end trace af11d521ee530da8 ]---
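
For anyone who wants to poke at the control flow outside the kernel, here
is a minimal user-space sketch of the loop discussed above. It is only a
model under stated assumptions: the model_* names and the iteration budget
are hypothetical and not part of netback, the "wait" merely reports whether
the real thread would have slept, and only the queue 0 path from the
thread is covered. It shows that with "continue" the loop spins without
ever sleeping, and that with "break" it leaves after one pass. It does not
model kthread_stop() or the oops above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the netback structures. */
struct model_vif {
    bool disabled;
};

struct model_queue {
    struct model_vif *vif;
    unsigned int id;
    unsigned int purges;    /* how often the purge path was taken */
};

/* What the loop does once it sees the disabled vif on queue 0. */
enum on_disabled { ON_DISABLED_CONTINUE, ON_DISABLED_BREAK };

/*
 * Models the wait on line 2 of the quoted loop. Per the thread, a
 * disabled vif counts as pending work, so the wait returns immediately.
 * Returns true when the real thread would have gone to sleep.
 */
static bool model_wait_would_sleep(const struct model_queue *q)
{
    return !q->vif->disabled;
}

/*
 * Runs the modelled loop for at most `budget` iterations (a stand-in
 * for "forever") and returns how many back-to-back iterations ran
 * without the thread ever sleeping.
 */
static unsigned long model_rx_loop(struct model_queue *q,
                                   enum on_disabled policy,
                                   unsigned long budget)
{
    unsigned long spins = 0;

    assert(q != NULL && q->vif != NULL);

    while (spins < budget) {
        if (model_wait_would_sleep(q))
            break;          /* real thread would block; end the model */

        spins++;

        if (q->vif->disabled && q->id == 0) {
            q->purges++;    /* stands in for carrier_off + queue purge */
            if (policy == ON_DISABLED_BREAK)
                break;
            continue;       /* straight back to a wait that never sleeps */
        }
    }

    return spins;
}
```

Calling model_rx_loop() on a queue with id 0 and a disabled vif uses up the
whole budget under ON_DISABLED_CONTINUE, and returns after a single
iteration under ON_DISABLED_BREAK.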
Regards,
--
Julien Grall