From: Gregory Haskins <ghaskins@novell.com>
To: mingo@elte.hu, rostedt@goodmis.org, tglx@linutronix.de
Cc: ghaskins@novell.com, linux-kernel@vger.kernel.org,
	linux-rt-users@vger.kernel.org
Subject: [PATCH RT 1/2] seqlock: make sure that raw_seqlock_t retries readers while writes are pending
Date: Tue, 19 Aug 2008 05:19:18 -0400
Message-ID: <20080819091918.21725.39839.stgit@dev.haskins.net>
In-Reply-To: <20080819091817.21725.81831.stgit@dev.haskins.net>

The seqlock protocol is broken in -rt for raw_seqlock_t objects.  This
manifested in my 2.6.26-rt1 kernel as a 500ms (yes, millisecond) spike
which was traced out with ftrace/preemptirqsoff to be originating in
the HRT (hrtimer_interrupt, to be precise).  It would occasionally
spin processing the same CLOCK_MONOTONIC timer (the scheduler-tick)
in a tight loop with interrupts disabled.  Investigating, it turned out
that the time-basis recorded for "now" early in the interrupt was
momentarily moved 500ms in the future.  This caused all timers with
correct expiration times to appear to have expired a long time ago.
Even rescheduling the timer via hrtimer_forward ultimately placed the
timer in an "expired" state since the "now" basis was in the future.
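
To make the spin concrete, here is a toy user-space model of the effect.
It is only a sketch under stated assumptions, not the kernel code: the
names demo_forward() and demo_spin_iterations() are hypothetical, time is
plain nanosecond integers, and each callback is assumed to cost a fixed
amount of real time.  The loop keeps re-running the timer for as long as
its expiry is not past the bogus "now" basis.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: move *expires forward by whole intervals until it
 * lies strictly after 'now'.  Models the idea of forwarding a periodic
 * timer; returns the number of intervals skipped.
 */
static uint64_t demo_forward(int64_t *expires, int64_t now, int64_t interval)
{
	uint64_t overruns;

	assert(interval > 0);
	if (*expires > now)
		return 0;	/* not due yet relative to 'now' */
	overruns = (uint64_t)((now - *expires) / interval) + 1;
	*expires += (int64_t)overruns * interval;
	return overruns;
}

/*
 * Hypothetical model of the interrupt loop: 'basenow' is sampled once and
 * is 'skew' ns ahead of the real clock.  Each callback forwards the timer
 * against the real clock and costs 'cost' ns.  Returns how many times the
 * same timer is processed before its expiry finally passes 'basenow'.
 */
static uint64_t demo_spin_iterations(int64_t skew, int64_t interval,
				     int64_t cost, int64_t *elapsed)
{
	int64_t real_now = 0;
	int64_t basenow = real_now + skew;	/* bogus basis if skew > 0 */
	int64_t expires = real_now;
	uint64_t iterations = 0;

	assert(cost > 0);
	while (expires <= basenow) {
		demo_forward(&expires, real_now, interval);
		real_now += cost;
		iterations++;
	}
	*elapsed = real_now;
	return iterations;
}
```

With no skew the timer is handled once; with a 500ms skew the model keeps
spinning until roughly 500ms of real time has passed.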

So I began investigating how this time-basis (derived from ktime_get())
could have jumped forward.  I observed that ktime_get() readers were able
to successfully read a time value even while another core held a
write-lock on the xtime_lock.  Therefore the fundamental issue was
that ktime_get() was able to return transitional states of the
xtime/clocksource infrastructure, which is clearly not intended.

I root-caused the issue to the raw_seqlock_t implementation.  It was
missing support for retrying a reader if it finds a write-pending
flag.  Investigating further, I think I can speculate why.

Back in April, Ingo and Thomas checked in a fix to mainline for seqlocks,
referenced here:

	commit 88a411c07b6fedcfc97b8dc51ae18540bd2beda0
	Author: Ingo Molnar <mingo@elte.hu>
	Date:   Thu Apr 3 09:06:13 2008 +0200

	seqlock: livelock fix

	Thomas Gleixner debugged a particularly ugly seqlock related livelock:
	do not process the seq-read section if we know it beforehand that the
	test at the end of the section will fail ...

	Signed-off-by: Ingo Molnar <mingo@elte.hu>

Of course, mainline only has seqlock_t.  In -rt, we have both seqlock_t
and raw_seqlock_t.  It would appear that the merge-resolution for
commit 88a411c07b6 to the -rt branch inadvertently applied one hunk
of the fix to seqlock_t, and the other to raw_seqlock_t.  The normal
seqlocks now have two checks for retry, while the raw_seqlocks have none.
This lack of a check is what causes the protocol failure, which ultimately
caused the bad clock info and a latency spike.
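
For readers less familiar with the protocol, the following is a minimal
user-space sketch of the two checks a seqlock reader needs.  It is an
illustration only: the demo_* names are hypothetical, and it uses
sequentially consistent C11 atomics in place of the kernel's
smp_rmb()/smp_wmb() barriers.  The begin side refuses to start while the
count is odd (write pending), and the retry side rejects a read if the
count changed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a seqlock-protected clock; not a kernel type. */
struct demo_clock {
	atomic_uint sequence;	/* odd while a write is in progress */
	atomic_llong sec;
	atomic_llong nsec;
};

static void demo_write_begin(struct demo_clock *c)
{
	atomic_fetch_add(&c->sequence, 1);	/* count becomes odd */
}

static void demo_write_end(struct demo_clock *c)
{
	unsigned prev = atomic_fetch_add(&c->sequence, 1);

	assert(prev & 1u);	/* must pair with demo_write_begin() */
	(void)prev;
}

/* One attempt to open a read section; fails while a write is pending. */
static bool demo_read_try_begin(struct demo_clock *c, unsigned *seq)
{
	*seq = atomic_load(&c->sequence);
	return (*seq & 1u) == 0;
}

/* First check: spin until no write is pending (kernel: cpu_relax()). */
static unsigned demo_read_begin(struct demo_clock *c)
{
	unsigned seq;

	while (!demo_read_try_begin(c, &seq))
		;
	return seq;
}

/* Second check: the read is invalid if the count moved meanwhile. */
static bool demo_read_retry(struct demo_clock *c, unsigned seq)
{
	return atomic_load(&c->sequence) != seq;
}

static void demo_set(struct demo_clock *c, long long sec, long long nsec)
{
	demo_write_begin(c);
	atomic_store(&c->sec, sec);
	atomic_store(&c->nsec, nsec);
	demo_write_end(c);
}

static void demo_get(struct demo_clock *c, long long *sec, long long *nsec)
{
	unsigned seq;

	do {
		seq = demo_read_begin(c);
		*sec = atomic_load(&c->sec);
		*nsec = atomic_load(&c->nsec);
	} while (demo_read_retry(c, seq));
}
```

Without the check in demo_read_begin(), a reader that starts in the middle
of a write can only be caught by the retry test, and only if that test
also treats an odd starting count as a failure.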

This patch corrects the above condition by applying the conceptual change
from 88a411c07b6 to both seqlock_t and raw_seqlock_t equally.  The observed
problems with the HRT spike are confirmed to no longer be reproducible as
a result.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
---

 include/linux/seqlock.h |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index e6ecb46..345d726 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -145,7 +145,7 @@ static inline int __read_seqretry(seqlock_t *sl, unsigned iv)
 	int ret;
 
 	smp_rmb();
-	ret = (iv & 1) | (sl->sequence ^ iv);
+	ret = (sl->sequence != iv);
 	/*
 	 * If invalid then serialize with the writer, to make sure we
 	 * are not livelocking it:
@@ -228,8 +228,16 @@ static __always_inline int __write_tryseqlock_raw(raw_seqlock_t *sl)
 
 static __always_inline unsigned __read_seqbegin_raw(const raw_seqlock_t *sl)
 {
-	unsigned ret = sl->sequence;
+	unsigned ret;
+
+repeat:
+	ret = sl->sequence;
 	smp_rmb();
+	if (unlikely(ret & 1)) {
+		cpu_relax();
+		goto repeat;
+	}
+
 	return ret;
 }
 

