From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Matthias <matthias.kannenberg@googlemail.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Is: RCU callback detects an RCU hang with Linux 3.12+ Was: Re: Status of FLR in Xen 4.4
Date: Fri, 27 Sep 2013 13:53:27 -0400 [thread overview]
Message-ID: <20130927175327.GA21364@phenom.dumpdata.com> (raw)
In-Reply-To: <CABoYbGriBmND6vYVbRO0wkOX+Fy3Ldz30gmGJzj9B=fiG-ySEw@mail.gmail.com>
On Fri, Sep 27, 2013 at 07:07:33PM +0200, Matthias wrote:
> Hi Konrad,
>
> good call! I was able to reproduce the error with the 3.12-rc2 kernel, got
> a lot of information with the new NMI traces (log attached), but since I'm
> not a xen hacker I don't really know how to continue from here. So I might
> add this to the original post and maybe someone can help me. After all the
> error persists for half a year now and besides 2 kernel version / .config
> Combinations (a 3.8.2 and a 3.6.something) I could never trace this issue
> back (even with bisecting the .config because at some point it seemed
> random).
Can you tell me a bit on how this happens? Is it happening after you
boot the machine? Does it happen after a specific workload?
It looks like something in the RCU is taking far too long and
the RCU callback mechanism starts complaining. The CPU0 is when the
RCU mechanism detects that something is off and starts sending NMI to
all CPUs. CPU2 is the only one that looks to be doing RCU callback:
NMI backtrace for cpu 1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.12.0-rc2 #2
Hardware name: System manufacturer System Product Name/Crosshair IV Formula, BIOS 3029 10/09/2012
task: ffff8800658da080 ti: ffff880065900000 task.ti: ffff880065900000
RIP: e030:[<ffffffff8125b2b2>] [<ffffffff8125b2b2>] cfb_imageblit+0x1b3/0x411
RSP: e02b:ffff88007de439f0 EFLAGS: 00000046
RAX: 0000000000000000 RBX: ffff88001e1c2800 RCX: 0000000000000003
RDX: 000000000000003b RSI: ffff88001e00614e RDI: 0000000000000000
RBP: 0000000000000013 R08: 0000000000000001 R09: ffffffff814655f0
R10: ffff88001e006116 R11: ffffc90014875710 R12: 000000000000000d
R13: 0000000000000000 R14: ffffc90014875714 R15: ffffc90014875000
FS: 00007fb294ab4900(0000) GS:ffff88007de40000(0000) knlGS:0000000000000000
CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00007fb29177a9a0 CR3: 000000000160c000 CR4: 0000000000000660
Stack:
0000000100aaaaaa 00000000000001d8 0000000000000000 0000000000aaaaaa
ffff8800532f0a40 ffff88001e1c2800 0000000000000001 ffff88001e1c2800
0000000000000000 ffff88007d424400 00000000ffff00ff 000000000000003b
Call Trace:
<IRQ> [<ffffffff81256ac4>] ? bit_putcs+0x352/0x39d
[<ffffffff81219825>] ? paravirt_read_tsc+0x5/0x8
[<ffffffff81256772>] ? bit_cursor+0x45d/0x45d
[<ffffffff812523a8>] ? fbcon_putcs+0xbd/0xcc
[<ffffffff812bc6b6>] ? vt_console_print+0x234/0x290
[<ffffffff810b336f>] ? call_console_drivers.constprop.18+0xb3/0xfc
[<ffffffff810b3c7d>] ? console_unlock+0x131/0x306
[<ffffffff810b420e>] ? vprintk_emit+0x3bc/0x3eb
[<ffffffff812c92f5>] ? paravirt_read_tsc+0x5/0x8
[<ffffffff812cae43>] ? add_interrupt_randomness+0x3f/0x15d
[<ffffffff813db9c8>] ? printk+0x4f/0x51
[<ffffffff810e4433>] ? rcu_check_callbacks+0x195/0x598 <==================
[<ffffffff810a3b50>] ? irqtime_account_process_tick.isra.2+0xd6/0x239
[<ffffffff810c232a>] ? tick_sched_do_timer+0x2e/0x2e
[<ffffffff81084c35>] ? update_process_times+0x30/0x5b
[<ffffffff810c2237>] ? tick_sched_handle+0x3e/0x4a
[<ffffffff810c235a>] ? tick_sched_timer+0x30/0x4c
[<ffffffff81098355>] ? __run_hrtimer+0x93/0x159
[<ffffffff81098b72>] ? hrtimer_interrupt+0xe3/0x1ca
[<ffffffff8103d8e4>] ? xen_timer_interrupt+0x31/0x13b
[<ffffffff81294c4c>] ? HYPERVISOR_event_channel_op+0xd/0x1d
[<ffffffff8103d79b>] ? xen_force_evtchn_callback+0x9/0xa
[<ffffffff8103df22>] ? check_events+0x12/0x20
[<ffffffff810b5b7a>] ? handle_irq_event_percpu+0x4d/0x1c5
[<ffffffff813e556e>] ? notifier_call_chain+0x32/0x52
[<ffffffff810b8287>] ? handle_percpu_irq+0x39/0x4c
[<ffffffff812951c0>] ? __xen_evtchn_do_upcall+0x107/0x2cb
[<ffffffff81219936>] ? delay_tsc+0x9c/0xc6
[<ffffffff81093ba0>] ? __rcu_read_unlock+0x33/0x51
[<ffffffff8129663a>] ? xen_evtchn_do_upcall+0x22/0x32
[<ffffffff813e897e>] ? xen_do_hypervisor_callback+0x1e/0x30
<EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[<ffffffff8103d768>] ? xen_safe_halt+0xc/0x13
[<ffffffff8104ae0b>] ? default_idle+0x14/0x3e
[<ffffffff810b53ee>] ? cpu_startup_entry+0x107/0x160
Code: fb 4c 89 d6 b9 08 00 00 00 ff cd 83 fd ff 74 32 44 0f be 2e 44 29 c1 8b 44 24 18 4d 8d 73 04 41 d3 fd 44 23 6c 24 04 43 23 04 a9 <41> 89 c5 41 31 fd 45 89 2b 85 c9 75 05 48 ff c6 b1 08 4d 89 f3
Which looks to be printing something on the VT console (which is running
in KMS mode as it uses framebuffer calls). So is there something on the
screen scrolling widly in a loop?
But then there are also complains about
INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 1.115 msecs
this taking too long. I am wondering if there is some time issue
on your box.
What version of Xen do you have?
>
>
> 2013/9/27 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> > On Thu, Sep 26, 2013 at 07:59:40PM +0200, Matthias wrote:
> > > I'm currently on a vanilla 3.8.2 kernel because this is the only >3.4
> > > kernel I found which doesn't give me this issue:
> > > http://lists.xen.org/archives/html/xen-users/2013-02/msg00114.html
> >
> > So v3.12 (or rather the latest and greaters of the Linus) has the mechanism
> > for the NMI - so you can actually see what is causing the stall.
> >
next prev parent reply other threads:[~2013-09-27 17:53 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-26 16:05 Status of FLR in Xen 4.4 Matthias
2013-09-26 16:16 ` Ian Campbell
2013-09-26 17:59 ` Matthias
2013-09-27 13:34 ` Konrad Rzeszutek Wilk
2013-09-27 17:07 ` Matthias
2013-09-27 17:28 ` Sander Eikelenboom
2013-09-27 19:19 ` Matthias
2013-09-27 19:33 ` Sander Eikelenboom
2013-09-27 19:48 ` Matthias
2013-09-27 20:06 ` Sander Eikelenboom
2013-09-27 17:53 ` Konrad Rzeszutek Wilk [this message]
2013-10-03 22:34 ` Is: RCU callback detects an RCU hang with Linux 3.12+ Was: " Matthias
2013-10-04 6:07 ` Pasi Kärkkäinen
2013-09-26 16:20 ` David Vrabel
2013-09-26 17:48 ` Ross Philipson
2013-09-26 18:01 ` David Vrabel
2013-09-26 18:41 ` Matthias
2013-09-26 19:13 ` Gordan Bobic
2013-09-27 12:26 ` Matthias
2013-09-27 13:27 ` Gordan Bobic
2013-09-27 13:48 ` Konrad Rzeszutek Wilk
2013-09-27 14:00 ` Gordan Bobic
2013-10-03 22:20 ` Matthias
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130927175327.GA21364@phenom.dumpdata.com \
--to=konrad.wilk@oracle.com \
--cc=Ian.Campbell@citrix.com \
--cc=matthias.kannenberg@googlemail.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).