From: Pratyush Anand <panand@redhat.com>
To: Dave Young <dyoung@redhat.com>
Cc: mingo@kernel.org, alexandre.belloni@free-electrons.com,
tglx@linutronix.de, hpa@zytor.com, x86@kernel.org,
rtc-linux@googlegroups.com, linux-kernel@vger.kernel.org,
prarit@redhat.com, dzickus@redhat.com, a.zummo@towertech.it
Subject: [rtc-linux] Re: [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered
Date: Tue, 30 Aug 2016 15:24:23 +0530
Message-ID: <20160830095423.GA7298@localhost.localdomain>
In-Reply-To: <20160830082230.GA7000@dhcp-128-65.nay.redhat.com>
Hi Dave,
On 30/08/2016:04:22:30 PM, Dave Young wrote:
> Hi, Pratyush
>
> On 08/16/16 at 08:55am, Pratyush Anand wrote:
> > We have observed on a few x86 machines with an rtc-cmos device that
> > hpet_rtc_interrupt() is called just after irq registration and before
> > cmos_do_probe() could call hpet_rtc_timer_init().
> >
> > So, neither hpet_default_delta nor hpet_t1_cmp is initialized by the time
> > the interrupt is raised in the given situation, and this results in an NMI
> > watchdog LOCKUP.
> >
> > It has only been observed sporadically on kdump secondary kernels.
> >
> > See the call trace:
> > ---<-snip->---
> > [ 27.913194] Kernel panic - not syncing: Watchdog detected hard LOCKUP on
> > cpu 0
> > [ 27.915371] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > 3.10.0-342.el7.x86_64 #1
> > [ 27.917503] Hardware name: HP ProLiant DL160 Gen8, BIOS J03 02/10/2014
> > [ 27.919455] ffffffff8186a728 0000000059c82488 ffff880034e05af0
> > ffffffff81637bd4
> > [ 27.921870] ffff880034e05b70 ffffffff8163144a 0000000000000010
> > ffff880034e05b80
> > [ 27.924257] ffff880034e05b20 0000000059c82488 0000000000000000
> > 0000000000000000
> > [ 27.926599] Call Trace:
> > [ 27.927352] <NMI> [<ffffffff81637bd4>] dump_stack+0x19/0x1b
> > [ 27.929080] [<ffffffff8163144a>] panic+0xd8/0x1e7
> > [ 27.930588] [<ffffffff8111d3e0>] ? restart_watchdog_hrtimer+0x50/0x50
> > [ 27.932502] [<ffffffff8111d4a2>] watchdog_overflow_callback+0xc2/0xd0
> > [ 27.934427] [<ffffffff811612c1>] __perf_event_overflow+0xa1/0x250
> > [ 27.936232] [<ffffffff81161d94>] perf_event_overflow+0x14/0x20
> > [ 27.937957] [<ffffffff81032ae8>] intel_pmu_handle_irq+0x1e8/0x470
> > [ 27.939799] [<ffffffff8164164b>] perf_event_nmi_handler+0x2b/0x50
> > [ 27.941649] [<ffffffff81640d99>] nmi_handle.isra.0+0x69/0xb0
> > [ 27.943348] [<ffffffff81640f49>] do_nmi+0x169/0x340
> > [ 27.944802] [<ffffffff816401d3>] end_repeat_nmi+0x1e/0x2e
> > [ 27.946424] [<ffffffff81056ee5>] ? hpet_rtc_interrupt+0x85/0x380
> > [ 27.948197] [<ffffffff81056ee5>] ? hpet_rtc_interrupt+0x85/0x380
> > [ 27.949992] [<ffffffff81056ee5>] ? hpet_rtc_interrupt+0x85/0x380
> > [ 27.951816] <<EOE>> <IRQ> [<ffffffff8108f5a3>] ?
> > run_timer_softirq+0x43/0x340
> > [ 27.954114] [<ffffffff8111e24e>] handle_irq_event_percpu+0x3e/0x1e0
> > [ 27.955962] [<ffffffff8111e42d>] handle_irq_event+0x3d/0x60
> > [ 27.957635] [<ffffffff811210c7>] handle_edge_irq+0x77/0x130
> > [ 27.959332] [<ffffffff8101704f>] handle_irq+0xbf/0x150
> > [ 27.960949] [<ffffffff8164a86f>] do_IRQ+0x4f/0xf0
> > [ 27.962434] [<ffffffff8163faed>] common_interrupt+0x6d/0x6d
> > [ 27.964101] <EOI> [<ffffffff8163f43b>] ?
> > _raw_spin_unlock_irqrestore+0x1b/0x40
> > [ 27.966308] [<fffff8111ff07>] __setup_irq+0x2a7/0x570
> > [ 28.067859] [<ffffffff81056e60>] ? hpet_cpuhp_notify+0x140/0x140
> > [ 28.069709] [<ffffffff8112032c>] request_threaded_irq+0xcc/0x170
> > [ 28.071585] [<ffffffff814b24a6>] cmos_do_probe+0x1e6/0x450
> > [ 28.073240] [<ffffffff814b2710>] ? cmos_do_probe+0x450/0x450
> > [ 28.074911] [<ffffffff814b27cb>] cmos_pnp_probe+0xbb/0xc0
> > [ 28.076533] [<ffffffff8139b245>] pnp_device_probe+0x65/0xd0
> > [ 28.078198] [<ffffffff813f8ca7>] driver_probe_device+0x87/0x390
> > [ 28.079971] [<ffffffff813f9083>] __driver_attach+0x93/0xa0
> > [ 28.081660] [<ffffffff813f8ff0>] ? __device_attach+0x40/0x40
> > [ 28.083662] [<ffffffff813f6a13>] bus_for_each_dev+0x73/0xc0
> > [ 28.085370] [<ffffffff813f86fe>] driver_attach+0x1e/0x20
> > [ 28.086974] [<ffffffff813f8250>] bus_add_driver+0x200/0x2d0
> > [ 28.088634] [<ffffffff81ade49a>] ? rtc_sysfs_init+0xe/0xe
> > [ 28.090349] [<ffffffff813f9704>] driver_register+0x64/0xf0
> > [ 28.091989] [<ffffffff8139b070>] pnp_register_driver+0x20/0x30
> > [ 28.093707] [<ffffffff81ade4ab>] cmos_init+0x11/0x71
> > ---<-snip->---
> >
> > The previous patch split hpet_rtc_timer_init into
> > hpet_rtc_timer_counter_init() and hpet_rtc_timer_enable().
> >
> > Therefore, this patch moved hpet_rtc_timer_counter_init() before IRQ
> > registration, so that we can gracefully handle such spurious interrupts.
> >
> > Without this patch, we were able to reproduce the problem within at most
> > 15 trials of kdump secondary kernel boot on an hp-dl160gen8 machine.
> > With this patch applied, more than 35 trials went fine.
> >
> > Signed-off-by: Pratyush Anand <panand@redhat.com>
> > [dzickus@redhat.com: edited the patch's summary]
> > Signed-off-by: Don Zickus <dzickus@redhat.com>
> > ---
> > drivers/rtc/rtc-cmos.c | 13 ++++++++++++-
> > 1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
> > index 43745cac0141..089d987f2638 100644
> > --- a/drivers/rtc/rtc-cmos.c
> > +++ b/drivers/rtc/rtc-cmos.c
> > @@ -129,6 +129,16 @@ static inline int hpet_rtc_dropped_irq(void)
> >  	return 0;
> >  }
> >
> > +static inline int hpet_rtc_timer_counter_init(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline int hpet_rtc_timer_enable(void)
> > +{
> > +	return 0;
> > +}
> > +
>
> Can these dummy functions go to /usr/include/linux/hpet.h along with
> the #ifdef etc.?
I kept them here because similar functions like hpet_set_alarm_time() were
already there. So, if you suggest that I should have an additional cleanup patch
first, which moves the existing #ifdef block to include/linux/hpet.h, and this
patch then adds its inlines in linux/hpet.h, then maybe I can take that up. But
I am not sure if there is something more to be done that would help the
MAINTAINER take it.
~Pratyush
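
For reference, the layout being discussed (stubs living in a shared header
under the #ifdef, instead of in rtc-cmos.c) could look roughly like the sketch
below. It is only an illustration under stated assumptions: the guard name
DEMO_HPET_EMULATE_RTC and the caller demo_probe() are hypothetical and not
taken from the kernel tree; only the two hpet_rtc_timer_*() names come from
the patch.

```c
#include <assert.h>

/*
 * DEMO_HPET_EMULATE_RTC is a hypothetical guard used only for this sketch;
 * it stands in for whatever config symbol the real header would test.
 */
#ifdef DEMO_HPET_EMULATE_RTC
/* Real implementations would be provided elsewhere. */
int hpet_rtc_timer_counter_init(void);
int hpet_rtc_timer_enable(void);
#else
/* No-op stubs: callers build unchanged when the feature is compiled out. */
static inline int hpet_rtc_timer_counter_init(void)
{
	return 0;
}

static inline int hpet_rtc_timer_enable(void)
{
	return 0;
}
#endif

/* Hypothetical caller: no #ifdef is needed at the call site. */
static int demo_probe(void)
{
	int ret = hpet_rtc_timer_counter_init();

	if (ret)
		return ret;
	return hpet_rtc_timer_enable();
}
```

With the guard undefined, demo_probe() compiles against the stubs and returns
0, which is the point of keeping the #ifdef in one place.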
>
> >  static inline int hpet_rtc_timer_init(void)
> >  {
> >  	return 0;
> > @@ -707,6 +717,7 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
> >  		goto cleanup1;
> >  	}
> >
> > +	hpet_rtc_timer_counter_init();
> >  	if (is_valid_irq(rtc_irq)) {
> >  		irq_handler_t rtc_cmos_int_handler;
> >
> > @@ -729,7 +740,7 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
> >  			goto cleanup1;
> >  		}
> >  	}
> > -	hpet_rtc_timer_init();
> > +	hpet_rtc_timer_enable();
> >
> >  	/* export at least the first block of NVRAM */
> >  	nvram.size = address_space - NVRAM_OFFSET;
> > --
> > 2.5.5
> >
>
> Thanks
> Dave
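
The ordering argument in the quoted commit message can be modelled in user
space. The sketch below is not kernel code: demo_request_irq(),
demo_irq_handler(), demo_probe() and the value 1024 are hypothetical, and the
"interrupt" is simply a handler call made during registration, mimicking an
edge that is already pending when the irq is requested. It shows only that
the handler sees initialized counters when hpet_rtc_timer_counter_init() runs
before registration, and uninitialized ones otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the software state named in the commit message. */
static unsigned long hpet_default_delta;
static bool counters_ready;
static bool timer_enabled;

static int hpet_rtc_timer_counter_init(void)
{
	hpet_default_delta = 1024;	/* arbitrary non-zero demo value */
	counters_ready = true;
	return 0;
}

static int hpet_rtc_timer_enable(void)
{
	timer_enabled = true;
	return 0;
}

/* Returns true if the handler found initialized state. */
static bool demo_irq_handler(void)
{
	return counters_ready && hpet_default_delta != 0;
}

/*
 * Models irq registration on a line that already has a pending edge:
 * the handler runs before registration returns.
 */
static bool demo_request_irq(bool (*handler)(void))
{
	return handler();
}

/* init_before_irq == true is the order the patch establishes. */
static bool demo_probe(bool init_before_irq)
{
	bool handled_ok;

	/* Start from the uninitialized state of a fresh boot. */
	hpet_default_delta = 0;
	counters_ready = false;
	timer_enabled = false;

	if (init_before_irq)
		hpet_rtc_timer_counter_init();
	handled_ok = demo_request_irq(demo_irq_handler);
	if (!init_before_irq)
		hpet_rtc_timer_counter_init();
	hpet_rtc_timer_enable();

	assert(timer_enabled);
	return handled_ok;
}
```

Calling demo_probe(true) models the patched order and demo_probe(false) the
old one; in the real driver the uninitialized case shows up as the hard
LOCKUP in the trace above, not as a boolean.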