* [PATCH v2 0/2] Xen: Fix Xen hypervisor panic during dumping timer info on huge machine.
@ 2016-10-12 7:58 Lan Tianyu
2016-10-12 7:58 ` [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler Lan Tianyu
2016-10-12 7:58 ` [PATCH v2 2/2] Xen/timer: Process softirq during dumping timer info Lan Tianyu
0 siblings, 2 replies; 9+ messages in thread
From: Lan Tianyu @ 2016-10-12 7:58 UTC (permalink / raw)
To: xen-devel, xen-devel
Cc: Lan Tianyu, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
ian.jackson, jbeulich
This patchset is to fix triggering NMI watchdog during dump timer info
on the huge machine with a mount of physical cpus. Detail please see
change log of Patch 1.
Previous discussion:
https://patchwork.kernel.org/patch/9328449/
Change since V1:
Add "async" param for handle_keypress() to identify
whether run nonirq keyhandler in tasklet or not. This is to
avoid processing debugkey sysctl asynchronously.
Lan Tianyu (2):
Xen/Keyhandler: Rework process of nonirq keyhandler
Xen/timer: Process softirq during dumping timer info
xen/common/keyhandler.c | 8 +++++---
xen/common/sysctl.c | 2 +-
xen/common/timer.c | 1 +
xen/drivers/char/console.c | 2 +-
xen/include/xen/keyhandler.h | 4 +++-
5 files changed, 11 insertions(+), 6 deletions(-)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler 2016-10-12 7:58 [PATCH v2 0/2] Xen: Fix Xen hypervisor panic during dumping timer info on huge machine Lan Tianyu @ 2016-10-12 7:58 ` Lan Tianyu 2016-10-12 13:19 ` Jan Beulich 2016-10-12 7:58 ` [PATCH v2 2/2] Xen/timer: Process softirq during dumping timer info Lan Tianyu 1 sibling, 1 reply; 9+ messages in thread From: Lan Tianyu @ 2016-10-12 7:58 UTC (permalink / raw) To: xen-devel, xen-devel Cc: Lan Tianyu, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3, ian.jackson, jbeulich Keyhandler may run for a long time in serial port driver's timer handler on the large machine with a lot of physical cpus(e,g dump_timerq()) when serial port driver works in the poll mode(via the exception mechanism). If a timer handler runs a long time, it will block nmi_timer_fn() to feed NMI watchdog and cause Xen hypervisor panic. Inserting process_pending_softirqs() in timer handler will not help. when timer interrupt arrives, timer subsystem calls all expired timer handlers before programming next timer interrupt. There is no timer interrupt arriving to trigger timer softirq during run a timer handler. This patch is to fix the issue to make nonirq keyhandler run in tasklet when receive debug key from serial port. Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> --- xen/common/keyhandler.c | 8 +++++--- xen/common/sysctl.c | 2 +- xen/drivers/char/console.c | 2 +- xen/include/xen/keyhandler.h | 4 +++- 4 files changed, 10 insertions(+), 6 deletions(-) diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c index 16de6e8..3d50041 100644 --- a/xen/common/keyhandler.c +++ b/xen/common/keyhandler.c @@ -75,19 +75,21 @@ static struct keyhandler { static void keypress_action(unsigned long unused) { - handle_keypress(keypress_key, NULL); + console_start_log_everything(); + key_table[keypress_key].fn(keypress_key); + console_end_log_everything(); } static DECLARE_TASKLET(keypress_tasklet, keypress_action, 0); -void handle_keypress(unsigned char key, struct cpu_user_regs *regs) +void handle_keypress(unsigned char key, struct cpu_user_regs *regs, bool async) { struct keyhandler *h; if ( key >= ARRAY_SIZE(key_table) || !(h = &key_table[key])->fn ) return; - if ( !in_irq() || h->irq_callback ) + if ( h->irq_callback || !async ) { console_start_log_everything(); h->irq_callback ? h->irq_fn(key, regs) : h->fn(key); diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c index 8aea6ef..1eb7bad 100644 --- a/xen/common/sysctl.c +++ b/xen/common/sysctl.c @@ -136,7 +136,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl) { if ( copy_from_guest_offset(&c, op->u.debug_keys.keys, i, 1) ) goto out; - handle_keypress(c, guest_cpu_user_regs()); + handle_keypress(c, guest_cpu_user_regs(), false); } ret = 0; copyback = 0; diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c index 55ae31a..184b523 100644 --- a/xen/drivers/char/console.c +++ b/xen/drivers/char/console.c @@ -347,7 +347,7 @@ static void switch_serial_input(void) static void __serial_rx(char c, struct cpu_user_regs *regs) { if ( xen_rx ) - return handle_keypress(c, regs); + return handle_keypress(c, regs, true); /* Deliver input to guest buffer, unless it is already full. */ if ( (serial_rx_prod-serial_rx_cons) != SERIAL_RX_SIZE ) diff --git a/xen/include/xen/keyhandler.h b/xen/include/xen/keyhandler.h index 06c05c8..e9595bd 100644 --- a/xen/include/xen/keyhandler.h +++ b/xen/include/xen/keyhandler.h @@ -46,7 +46,9 @@ void register_irq_keyhandler(unsigned char key, bool_t diagnostic); /* Inject a keypress into the key-handling subsystem. */ -extern void handle_keypress(unsigned char key, struct cpu_user_regs *regs); +extern void handle_keypress(unsigned char key, + struct cpu_user_regs *regs, + bool async); /* Scratch space is available for use of any keyhandler. */ extern char keyhandler_scratch[1024]; -- 1.7.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler 2016-10-12 7:58 ` [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler Lan Tianyu @ 2016-10-12 13:19 ` Jan Beulich 2016-10-12 14:30 ` Lan, Tianyu 0 siblings, 1 reply; 9+ messages in thread From: Jan Beulich @ 2016-10-12 13:19 UTC (permalink / raw) To: Lan Tianyu Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3, ian.jackson, xen-devel >>> On 12.10.16 at 09:58, <tianyu.lan@intel.com> wrote: > --- a/xen/drivers/char/console.c > +++ b/xen/drivers/char/console.c > @@ -347,7 +347,7 @@ static void switch_serial_input(void) > static void __serial_rx(char c, struct cpu_user_regs *regs) > { > if ( xen_rx ) > - return handle_keypress(c, regs); > + return handle_keypress(c, regs, true); I think it would be nice to pass true here only when in polling mode, unless you know or can deduce that the a similar problem also exists in IRQ mode. Perhaps you could simply move the !in_irq() here? (Of course the new function parameter would then want to be renamed.) Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler 2016-10-12 13:19 ` Jan Beulich @ 2016-10-12 14:30 ` Lan, Tianyu 2016-10-12 16:03 ` Jan Beulich 0 siblings, 1 reply; 9+ messages in thread From: Lan, Tianyu @ 2016-10-12 14:30 UTC (permalink / raw) To: Jan Beulich Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3, ian.jackson, xen-devel On 10/12/2016 9:19 PM, Jan Beulich wrote: >>>> On 12.10.16 at 09:58, <tianyu.lan@intel.com> wrote: >> --- a/xen/drivers/char/console.c >> +++ b/xen/drivers/char/console.c >> @@ -347,7 +347,7 @@ static void switch_serial_input(void) >> static void __serial_rx(char c, struct cpu_user_regs *regs) >> { >> if ( xen_rx ) >> - return handle_keypress(c, regs); >> + return handle_keypress(c, regs, true); > > I think it would be nice to pass true here only when in polling mode, > unless you know or can deduce that the a similar problem also exists > in IRQ mode. Perhaps you could simply move the !in_irq() here? That's a good idea. Thanks. >(Of course the new function parameter would then want to be renamed.) Since the issue happens when handle_keypress() runs in a timer handler, how about to name new parameter "intimer"? __serial_rx() is called in a timer handler or interrupt handler. Or do you have other suggestion? > > Jan > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler 2016-10-12 14:30 ` Lan, Tianyu @ 2016-10-12 16:03 ` Jan Beulich 2016-10-13 1:15 ` Lan Tianyu 0 siblings, 1 reply; 9+ messages in thread From: Jan Beulich @ 2016-10-12 16:03 UTC (permalink / raw) To: Tianyu Lan Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3, ian.jackson, xen-devel >>> On 12.10.16 at 16:30, <tianyu.lan@intel.com> wrote: > > On 10/12/2016 9:19 PM, Jan Beulich wrote: >>>>> On 12.10.16 at 09:58, <tianyu.lan@intel.com> wrote: >>> --- a/xen/drivers/char/console.c >>> +++ b/xen/drivers/char/console.c >>> @@ -347,7 +347,7 @@ static void switch_serial_input(void) >>> static void __serial_rx(char c, struct cpu_user_regs *regs) >>> { >>> if ( xen_rx ) >>> - return handle_keypress(c, regs); >>> + return handle_keypress(c, regs, true); >> >> I think it would be nice to pass true here only when in polling mode, >> unless you know or can deduce that the a similar problem also exists >> in IRQ mode. Perhaps you could simply move the !in_irq() here? > > That's a good idea. Thanks. > >>(Of course the new function parameter would then want to be renamed.) > > Since the issue happens when handle_keypress() runs in a timer handler, > how about to name new parameter "intimer"? __serial_rx() is called in a > timer handler or interrupt handler. Or do you have other suggestion? I think "intimer" can be confusing (to be mixed up with timer interrupt). How about "force_tasklet"? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler 2016-10-12 16:03 ` Jan Beulich @ 2016-10-13 1:15 ` Lan Tianyu 0 siblings, 0 replies; 9+ messages in thread From: Lan Tianyu @ 2016-10-13 1:15 UTC (permalink / raw) To: Jan Beulich Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3, ian.jackson, xen-devel On 2016年10月13日 00:03, Jan Beulich wrote: >>>> On 12.10.16 at 16:30, <tianyu.lan@intel.com> wrote: >> >> Since the issue happens when handle_keypress() runs in a timer handler, >> how about to name new parameter "intimer"? __serial_rx() is called in a >> timer handler or interrupt handler. Or do you have other suggestion? > > I think "intimer" can be confusing (to be mixed up with timer interrupt). > How about "force_tasklet"? OK. I will update. -- Best regards Tianyu Lan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2 2/2] Xen/timer: Process softirq during dumping timer info 2016-10-12 7:58 [PATCH v2 0/2] Xen: Fix Xen hypervisor panic during dumping timer info on huge machine Lan Tianyu 2016-10-12 7:58 ` [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler Lan Tianyu @ 2016-10-12 7:58 ` Lan Tianyu 2016-10-21 17:27 ` Wei Liu 1 sibling, 1 reply; 9+ messages in thread From: Lan Tianyu @ 2016-10-12 7:58 UTC (permalink / raw) To: xen-devel, xen-devel Cc: Lan Tianyu, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3, ian.jackson, jbeulich Dumping timer info may run for a long time on the huge machine with a lot of physical cpus. To avoid triggering NMI watchdog, add process_pending_softirqs() in the loop of dumping timer info. Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> --- xen/common/timer.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/xen/common/timer.c b/xen/common/timer.c index 29a60a9..ab6bca0 100644 --- a/xen/common/timer.c +++ b/xen/common/timer.c @@ -530,6 +530,7 @@ static void dump_timerq(unsigned char key) { ts = &per_cpu(timers, i); + process_pending_softirqs(); printk("CPU%02d:\n", i); spin_lock_irqsave(&ts->lock, flags); for ( j = 1; j <= GET_HEAP_SIZE(ts->heap); j++ ) -- 1.7.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2 2/2] Xen/timer: Process softirq during dumping timer info 2016-10-12 7:58 ` [PATCH v2 2/2] Xen/timer: Process softirq during dumping timer info Lan Tianyu @ 2016-10-21 17:27 ` Wei Liu 2016-10-22 3:52 ` Lan, Tianyu 0 siblings, 1 reply; 9+ messages in thread From: Wei Liu @ 2016-10-21 17:27 UTC (permalink / raw) To: Lan Tianyu Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3, ian.jackson, xen-devel, jbeulich, xen-devel On Wed, Oct 12, 2016 at 03:58:24PM +0800, Lan Tianyu wrote: > Dumping timer info may run for a long time on the huge machine with > a lot of physical cpus. To avoid triggering NMI watchdog, add > process_pending_softirqs() in the loop of dumping timer info. > > Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> > --- > xen/common/timer.c | 1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > diff --git a/xen/common/timer.c b/xen/common/timer.c > index 29a60a9..ab6bca0 100644 > --- a/xen/common/timer.c > +++ b/xen/common/timer.c > @@ -530,6 +530,7 @@ static void dump_timerq(unsigned char key) > { > ts = &per_cpu(timers, i); > > + process_pending_softirqs(); This is causing issues in ARM (x86 has a similar issue): Oct 20 01:43:31.410010 (XEN) Xen call trace: Oct 20 01:43:31.410048 (XEN) [<00233920>] process_pending_softirqs+0x34/0x5c (PC) Oct 20 01:43:31.417990 (XEN) [<00237c6c>] timer.c#dump_timerq+0x9c/0x1fc (LR) Oct 20 01:43:31.418030 (XEN) [<00218658>] handle_keypress+0xc0/0xf4 Oct 20 01:43:31.426001 (XEN) [<002490c8>] console.c#__serial_rx+0x4c/0x9c Oct 20 01:43:31.433970 (XEN) [<00249b74>] console.c#serial_rx+0xcc/0xe4 Oct 20 01:43:31.434007 (XEN) [<0024b6ec>] serial_rx_interrupt+0xcc/0xf8 Oct 20 01:43:31.441964 (XEN) [<0024ae54>] exynos4210-uart.c#exynos4210_uart_interrupt+0xf8/0x160 Oct 20 01:43:31.450001 (XEN) [<00256338>] do_IRQ+0x1a0/0x228 Oct 20 01:43:31.450040 (XEN) [<00254074>] gic_interrupt+0x58/0xfc Oct 20 01:43:31.457985 (XEN) [<00260f98>] do_trap_irq+0x24/0x38 Oct 20 01:43:31.458022 (XEN) [<00264970>] entry.o#return_from_trap+0/0x4 Oct 20 01:43:31.466010 (XEN) [<0030a240>] 0030a240 Oct 20 01:43:31.466044 (XEN) Oct 20 01:43:31.466066 (XEN) Oct 20 01:43:31.466099 (XEN) **************************************** Oct 20 01:43:31.473998 (XEN) Panic on CPU 0: Oct 20 01:43:31.474029 (XEN) Assertion '!in_irq() && local_irq_is_enabled()' failed at softirq.c:57 Oct 20 01:43:31.481982 (XEN) **************************************** See http://logs.test-lab.xenproject.org/osstest/logs/101571/test-armhf-armhf-libvirt/serial-arndale-bluewater.log I've reverted this patch in staging. Wei. > printk("CPU%02d:\n", i); > spin_lock_irqsave(&ts->lock, flags); > for ( j = 1; j <= GET_HEAP_SIZE(ts->heap); j++ ) > -- > 1.7.1 > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 2/2] Xen/timer: Process softirq during dumping timer info 2016-10-21 17:27 ` Wei Liu @ 2016-10-22 3:52 ` Lan, Tianyu 0 siblings, 0 replies; 9+ messages in thread From: Lan, Tianyu @ 2016-10-22 3:52 UTC (permalink / raw) To: Wei Liu Cc: sstabellini, George.Dunlap, andrew.cooper3, ian.jackson, xen-devel, jbeulich On 10/22/2016 1:27 AM, Wei Liu wrote: > On Wed, Oct 12, 2016 at 03:58:24PM +0800, Lan Tianyu wrote: >> Dumping timer info may run for a long time on the huge machine with >> a lot of physical cpus. To avoid triggering NMI watchdog, add >> process_pending_softirqs() in the loop of dumping timer info. >> >> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> >> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> >> --- >> xen/common/timer.c | 1 + >> 1 files changed, 1 insertions(+), 0 deletions(-) >> >> diff --git a/xen/common/timer.c b/xen/common/timer.c >> index 29a60a9..ab6bca0 100644 >> --- a/xen/common/timer.c >> +++ b/xen/common/timer.c >> @@ -530,6 +530,7 @@ static void dump_timerq(unsigned char key) >> { >> ts = &per_cpu(timers, i); >> >> + process_pending_softirqs(); > > This is causing issues in ARM (x86 has a similar issue): > > Oct 20 01:43:31.410010 (XEN) Xen call trace: > Oct 20 01:43:31.410048 (XEN) [<00233920>] process_pending_softirqs+0x34/0x5c (PC) > Oct 20 01:43:31.417990 (XEN) [<00237c6c>] timer.c#dump_timerq+0x9c/0x1fc (LR) > Oct 20 01:43:31.418030 (XEN) [<00218658>] handle_keypress+0xc0/0xf4 > Oct 20 01:43:31.426001 (XEN) [<002490c8>] console.c#__serial_rx+0x4c/0x9c > Oct 20 01:43:31.433970 (XEN) [<00249b74>] console.c#serial_rx+0xcc/0xe4 > Oct 20 01:43:31.434007 (XEN) [<0024b6ec>] serial_rx_interrupt+0xcc/0xf8 > Oct 20 01:43:31.441964 (XEN) [<0024ae54>] exynos4210-uart.c#exynos4210_uart_interrupt+0xf8/0x160 > Oct 20 01:43:31.450001 (XEN) [<00256338>] do_IRQ+0x1a0/0x228 > Oct 20 01:43:31.450040 (XEN) [<00254074>] gic_interrupt+0x58/0xfc > Oct 20 01:43:31.457985 (XEN) [<00260f98>] do_trap_irq+0x24/0x38 > Oct 20 01:43:31.458022 (XEN) [<00264970>] entry.o#return_from_trap+0/0x4 > Oct 20 01:43:31.466010 (XEN) [<0030a240>] 0030a240 > Oct 20 01:43:31.466044 (XEN) > Oct 20 01:43:31.466066 (XEN) > Oct 20 01:43:31.466099 (XEN) **************************************** > Oct 20 01:43:31.473998 (XEN) Panic on CPU 0: > Oct 20 01:43:31.474029 (XEN) Assertion '!in_irq() && local_irq_is_enabled()' failed at softirq.c:57 > Oct 20 01:43:31.481982 (XEN) **************************************** > > See > http://logs.test-lab.xenproject.org/osstest/logs/101571/test-armhf-armhf-libvirt/serial-arndale-bluewater.log > > I've reverted this patch in staging. > > Wei. dump_timerq() or other non-irq keyhandlers should not run in irq context and has sent out a fix patch. https://lists.xen.org/archives/html/xen-devel/2016-10/msg01391.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2016-10-22 3:52 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-10-12 7:58 [PATCH v2 0/2] Xen: Fix Xen hypervisor panic during dumping timer info on huge machine Lan Tianyu 2016-10-12 7:58 ` [PATCH v2 1/2] Xen/Keyhandler: Rework process of nonirq keyhandler Lan Tianyu 2016-10-12 13:19 ` Jan Beulich 2016-10-12 14:30 ` Lan, Tianyu 2016-10-12 16:03 ` Jan Beulich 2016-10-13 1:15 ` Lan Tianyu 2016-10-12 7:58 ` [PATCH v2 2/2] Xen/timer: Process softirq during dumping timer info Lan Tianyu 2016-10-21 17:27 ` Wei Liu 2016-10-22 3:52 ` Lan, Tianyu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).