From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Thomas Gleixner <tglx@linutronix.de>,
Richard Cochran <richardcochran@gmail.com>,
Prarit Bhargava <prarit@redhat.com>,
John Stultz <john.stultz@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>, Ingo Molnar <mingo@elte.hu>,
linux-kernel@vger.kernel.org, lttng-dev@lists.lttng.org
Subject: [RFC PATCH] timekeeping: introduce timekeeping_is_busy()
Date: Wed, 11 Sep 2013 11:08:53 -0400
Message-ID: <20130911150853.GA19800@Krystal>
Starting from commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40
"timekeeping: Hold timekeepering locks in do_adjtimex and hardpps"
(3.10 kernels), the timekeeper_seq write seqcount is held across calls
to __do_adjtimex(), which includes a call to notify_cmos_timer(), and
hence schedule_delayed_work().
This introduces a side effect for a set of tracepoints, mainly the
workqueue tracepoints: a tracer that hooks those tracepoints and reads
the current time with ktime_get() spins forever in
read_seqcount_begin() on the CPU that already holds the write seqcount,
causing a hard system LOCKUP such as:
WARNING: CPU: 6 PID: 2258 at kernel/watchdog.c:245 watchdog_overflow_callback+0x93/0x9e()
Watchdog detected hard LOCKUP on cpu 6
Modules linked in: lttng_probe_workqueue(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_s]
CPU: 6 PID: 2258 Comm: ntpd Tainted: G O 3.11.0 #158
Hardware name: Supermicro X7DAL/X7DAL, BIOS 6.00 12/03/2007
0000000000000000 ffffffff814f83eb ffffffff813b206a ffff88042fd87c78
ffffffff8106a07c 0000000000000000 ffffffff810c94c2 0000000000000000
ffff88041f31bc00 0000000000000000 ffff88042fd87d68 ffff88042fd87ef8
Call Trace:
<NMI> [<ffffffff813b206a>] ? dump_stack+0x41/0x51
[<ffffffff8106a07c>] ? warn_slowpath_common+0x79/0x92
[<ffffffff810c94c2>] ? watchdog_overflow_callback+0x93/0x9e
[<ffffffff8106a12d>] ? warn_slowpath_fmt+0x45/0x4a
[<ffffffff810c94c2>] ? watchdog_overflow_callback+0x93/0x9e
[<ffffffff810c942f>] ? watchdog_enable_all_cpus.part.2+0x31/0x31
[<ffffffff810ecc66>] ? __perf_event_overflow+0x12c/0x1ae
[<ffffffff810eab60>] ? perf_event_update_userpage+0x13/0xc2
[<ffffffff81016820>] ? intel_pmu_handle_irq+0x26a/0x2fd
[<ffffffff813b7a0b>] ? perf_event_nmi_handler+0x24/0x3d
[<ffffffff813b728f>] ? nmi_handle.isra.3+0x58/0x12f
[<ffffffff813b7a59>] ? perf_ibs_nmi_handler+0x35/0x35
[<ffffffff813b7404>] ? do_nmi+0x9e/0x2bc
[<ffffffff813b6af7>] ? end_repeat_nmi+0x1e/0x2e
[<ffffffff810a2a33>] ? read_seqcount_begin.constprop.4+0x8/0xf
[<ffffffff810a2a33>] ? read_seqcount_begin.constprop.4+0x8/0xf
[<ffffffff810a2a33>] ? read_seqcount_begin.constprop.4+0x8/0xf
<<EOE>> [<ffffffff810a2d6c>] ? ktime_get+0x23/0x5e
[<ffffffffa0314670>] ? lib_ring_buffer_clock_read.isra.28+0x1f/0x21 [lttng_ring_buffer_client_discard]
[<ffffffffa0314786>] ? lttng_event_reserve+0x112/0x3f3 [lttng_ring_buffer_client_discard]
[<ffffffffa045b1c5>] ? __event_probe__workqueue_queue_work+0x72/0xe0 [lttng_probe_workqueue]
[<ffffffff812ef7e9>] ? sock_aio_read.part.10+0x110/0x124
[<ffffffff81133a36>] ? do_sync_readv_writev+0x50/0x76
[<ffffffff8107d514>] ? __queue_work+0x1ab/0x265
[<ffffffff8107da7e>] ? queue_delayed_work_on+0x3f/0x4e
[<ffffffff810a473d>] ? __do_adjtimex+0x408/0x413
[<ffffffff810a3e9a>] ? do_adjtimex+0x98/0xee
[<ffffffff8106cec6>] ? SYSC_adjtimex+0x32/0x5d
[<ffffffff813bb74b>] ? tracesys+0xdd/0xe2
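For readers unfamiliar with the seqcount read side, the spin seen in the
trace above can be modeled in user space. The following is a minimal
single-threaded sketch, not kernel code: struct model_seqcount,
model_write_begin(), model_write_end() and model_read_begin_bounded()
are hypothetical names, and the spin bound exists only so that the model
terminates. The kernel's read_seqcount_begin() retries without bound
while the sequence is odd.

```c
#include <stdbool.h>

/* Simplified model of a sequence counter (not the kernel's seqcount_t). */
struct model_seqcount {
	unsigned int sequence;	/* odd while a write is in progress */
};

/* Entering the write section makes the sequence odd. */
static void model_write_begin(struct model_seqcount *s)
{
	s->sequence++;
}

/* Leaving the write section makes the sequence even again. */
static void model_write_end(struct model_seqcount *s)
{
	s->sequence++;
}

/*
 * Bounded model of the reader entry loop. A reader may only start once
 * the sequence is even. Returns true and stores the observed sequence
 * if the reader could start within max_spins attempts, false otherwise.
 */
static bool model_read_begin_bounded(const struct model_seqcount *s,
				     unsigned int max_spins,
				     unsigned int *seq_out)
{
	for (unsigned int i = 0; i < max_spins; i++) {
		unsigned int seq = s->sequence;

		if (!(seq & 1)) {
			*seq_out = seq;
			return true;
		}
		/* Writer in progress: the real reader would keep spinning. */
	}
	return false;
}
```

When the reader runs on the same CPU as the writer, from within the
write section, nothing can complete the write while the reader spins, so
the bounded model gives up and the unbounded kernel loop never returns.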
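To let tracers avoid this, record the CPU holding the timekeeper_seq
write seqcount in timekeeper_seq_owner and introduce
timekeeping_is_busy(), which returns nonzero when the current CPU is
that owner, i.e. when reading time from this context would spin forever.
LTTng uses ktime to have the same time base across kernel and
user space, so traces gathered from LTTng-modules and LTTng-UST can be
correlated. We plan on using ktime until a fast, scalable, and
fine-grained time source for tracing, usable across kernel and
user space and not relying on a read seqlock for kernel-level
synchronization, makes its way into the kernel.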
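A tracer could consult timekeeping_is_busy() before reading the clock.
The sketch below is a user-space model under stated assumptions: the
stub_* functions and variables stand in for timekeeping_is_busy(),
ktime_get() and the current CPU number, and tracer_clock_read() is a
hypothetical helper. What a real tracer does when the clock is
unavailable (drop the event, reuse an earlier timestamp) is left to the
caller and is not specified by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for kernel state; all names here are hypothetical. */
static int stub_seq_owner = -1;		/* CPU holding the write seqcount, -1 if none */
static int stub_this_cpu;		/* models raw_smp_processor_id() */
static uint64_t stub_now_ns = 1000;	/* models the value ktime_get() would return */

/* Models timekeeping_is_busy(): nonzero if this CPU owns the seqcount. */
static int stub_timekeeping_is_busy(void)
{
	return stub_seq_owner == stub_this_cpu;
}

/* Models a ktime_get() based clock read. */
static uint64_t stub_ktime_get_ns(void)
{
	return stub_now_ns;
}

/*
 * Read the trace clock only when it is safe to do so from this context.
 * Returns false, leaving *ts untouched, when the current CPU holds the
 * write seqcount and a read would therefore never complete.
 */
static bool tracer_clock_read(uint64_t *ts)
{
	if (stub_timekeeping_is_busy())
		return false;
	*ts = stub_ktime_get_ns();
	return true;
}
```

Note that the check only covers the self-deadlock case: if another CPU
holds the write seqcount, the reader still waits, but that writer can
make progress and the wait is bounded.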
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
diff --git a/include/linux/time.h b/include/linux/time.h
index d5d229b..f2cec6b 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -267,4 +267,11 @@ static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
a->tv_nsec = ns;
}
+/**
+ * timekeeping_is_busy - is timekeeping busy?
+ *
+ * Returns 0 if the current execution context can safely read time.
+ */
+extern int timekeeping_is_busy(void);
+
#endif
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 48b9fff..8ec2b5f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -34,6 +34,7 @@
static struct timekeeper timekeeper;
static DEFINE_RAW_SPINLOCK(timekeeper_lock);
static seqcount_t timekeeper_seq;
+static int timekeeper_seq_owner = -1;
static struct timekeeper shadow_timekeeper;
/* flag for if timekeeping is suspended */
@@ -1690,6 +1691,9 @@ int do_adjtimex(struct timex *txc)
getnstimeofday(&ts);
raw_spin_lock_irqsave(&timekeeper_lock, flags);
+ timekeeper_seq_owner = smp_processor_id();
+ /* store to timekeeper_seq_owner before seqcount */
+ barrier();
write_seqcount_begin(&timekeeper_seq);
orig_tai = tai = tk->tai_offset;
@@ -1701,6 +1705,9 @@ int do_adjtimex(struct timex *txc)
clock_was_set_delayed();
}
write_seqcount_end(&timekeeper_seq);
+ /* store to seqcount before timekeeper_seq_owner */
+ barrier();
+ timekeeper_seq_owner = -1;
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
return ret;
@@ -1715,11 +1722,17 @@ void hardpps(const struct timespec *phase_ts, const struct timespec *raw_ts)
unsigned long flags;
raw_spin_lock_irqsave(&timekeeper_lock, flags);
+ timekeeper_seq_owner = smp_processor_id();
+ /* store to timekeeper_seq_owner before seqcount */
+ barrier();
write_seqcount_begin(&timekeeper_seq);
__hardpps(phase_ts, raw_ts);
write_seqcount_end(&timekeeper_seq);
+ /* store to seqcount before timekeeper_seq_owner */
+ barrier();
+ timekeeper_seq_owner = -1;
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
}
EXPORT_SYMBOL(hardpps);
@@ -1737,3 +1750,11 @@ void xtime_update(unsigned long ticks)
do_timer(ticks);
write_sequnlock(&jiffies_lock);
}
+
+int timekeeping_is_busy(void)
+{
+ if (timekeeper_seq_owner == raw_smp_processor_id())
+ return 1;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(timekeeping_is_busy);
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com