From: Stephen Warren <swarren-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
To: Stephen Boyd <sboyd-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>,
Mark Rutland <Mark.Rutland-5wv7dgnIgG8@public.gmane.org>,
Marc Zyngier <marc.zyngier-5wv7dgnIgG8@public.gmane.org>
Cc: "linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
ARM kernel mailing list
<linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org>,
Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>,
John Stultz <john.stultz-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>,
Joseph Lo <josephl-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
Subject: CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
Date: Mon, 08 Jul 2013 11:36:21 -0600 [thread overview]
Message-ID: <51DAF895.1020700@wwwdotorg.org> (raw)
CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
commit 0647065 "clocksource: Add generic dummy timer driver" in
linux-next. Reverting that commit solves the issue.
The symptom is that ~10% of the time, when re-plugging CPU1 (in a 2-core
system, after unplugging it about 1 second before), I'll see the
following WARN trigger in clockevents_program_event():
> int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
> bool force)
> {
> unsigned long long clc;
> int64_t delta;
> int rc;
>
> if (unlikely(expires.tv64 < 0)) {
> WARN_ON_ONCE(1);
> return -ETIME;
> }
This appears to be because in tick_handle_periodic_broadcast(),
dev->next_event == KTIME_MAX. The system then hangs; I think that loop
just keeps adding tick_period onto next_event, which doesn't manage to
get to an acceptable value for a long time, if ever!
Do you have any idea why this could happen? I assume that during
switching between the dummy timer added by that patch, and the real
Tegra timer (drivers/clocksource/tegra20_timer.c) the Tegra timer's
dev->next_event is temporarily set to KTIME_MAX, but somehow the timer
IRQ handling goes off while the device is in this temporary state? The
timer core seems to take steps to prevent this though, i.e. callilng
spin_lock_irqsave() in places.
If I modify tick_handle_periodic_broadcast() to check for a negative
dev->next_event and simply return in that case, the system seems to work
fine, and I do see tick_handle_periodic_broadcast() being called at a
later time, so obviously something is coming along later and programming
the HW to generate additional events. On this HW, I believe struct
clock_event_device.set_next_event is being used to emulate the periodic
broadcast using a one-shot timer, rather than using the HW's native
periodic capability, probably due to CONFIG_NO_HZ.
Any hints greatly appreciated!
WARNING: multiple messages have this Message-ID (diff)
From: swarren@wwwdotorg.org (Stephen Warren)
To: linux-arm-kernel@lists.infradead.org
Subject: CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
Date: Mon, 08 Jul 2013 11:36:21 -0600 [thread overview]
Message-ID: <51DAF895.1020700@wwwdotorg.org> (raw)
CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
commit 0647065 "clocksource: Add generic dummy timer driver" in
linux-next. Reverting that commit solves the issue.
The symptom is that ~10% of the time, when re-plugging CPU1 (in a 2-core
system, after unplugging it about 1 second before), I'll see the
following WARN trigger in clockevents_program_event():
> int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
> bool force)
> {
> unsigned long long clc;
> int64_t delta;
> int rc;
>
> if (unlikely(expires.tv64 < 0)) {
> WARN_ON_ONCE(1);
> return -ETIME;
> }
This appears to be because in tick_handle_periodic_broadcast(),
dev->next_event == KTIME_MAX. The system then hangs; I think that loop
just keeps adding tick_period onto next_event, which doesn't manage to
get to an acceptable value for a long time, if ever!
Do you have any idea why this could happen? I assume that during
switching between the dummy timer added by that patch, and the real
Tegra timer (drivers/clocksource/tegra20_timer.c) the Tegra timer's
dev->next_event is temporarily set to KTIME_MAX, but somehow the timer
IRQ handling goes off while the device is in this temporary state? The
timer core seems to take steps to prevent this though, i.e. callilng
spin_lock_irqsave() in places.
If I modify tick_handle_periodic_broadcast() to check for a negative
dev->next_event and simply return in that case, the system seems to work
fine, and I do see tick_handle_periodic_broadcast() being called at a
later time, so obviously something is coming along later and programming
the HW to generate additional events. On this HW, I believe struct
clock_event_device.set_next_event is being used to emulate the periodic
broadcast using a one-shot timer, rather than using the HW's native
periodic capability, probably due to CONFIG_NO_HZ.
Any hints greatly appreciated!
next reply other threads:[~2013-07-08 17:36 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-08 17:36 Stephen Warren [this message]
2013-07-08 17:36 ` CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver Stephen Warren
[not found] ` <51DAF895.1020700-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2013-07-09 0:58 ` Stephen Boyd
2013-07-09 0:58 ` Stephen Boyd
[not found] ` <20130709005837.GC830-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2013-07-09 16:05 ` Stephen Warren
2013-07-09 16:05 ` Stephen Warren
[not found] ` <51DC34AD.30009-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2013-07-09 16:35 ` Stephen Boyd
2013-07-09 16:35 ` Stephen Boyd
[not found] ` <20130709163518.GD830-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2013-07-09 16:52 ` Stephen Warren
2013-07-09 16:52 ` Stephen Warren
[not found] ` <51DC3FD2.9070308-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2013-07-09 23:05 ` Stephen Boyd
2013-07-09 23:05 ` Stephen Boyd
[not found] ` <20130709230528.GE830-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2013-07-10 16:09 ` Stephen Warren
2013-07-10 16:09 ` Stephen Warren
[not found] ` <51DD872D.7020101-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2013-07-11 14:00 ` Stephen Boyd
2013-07-11 14:00 ` Stephen Boyd
2013-07-12 10:39 ` [tip:timers/urgent] tick: broadcast: Check broadcast mode on CPU hotplug tip-bot for Stephen Boyd
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51DAF895.1020700@wwwdotorg.org \
--to=swarren-3lzwwm7+weoh9zmkesr00q@public.gmane.org \
--cc=Mark.Rutland-5wv7dgnIgG8@public.gmane.org \
--cc=john.stultz-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org \
--cc=josephl-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org \
--cc=linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \
--cc=linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=marc.zyngier-5wv7dgnIgG8@public.gmane.org \
--cc=sboyd-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org \
--cc=tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.