* [PATCH RT 0/2] Misc fixes for 2.6.26-rt1
@ 2008-08-19 9:19 Gregory Haskins
2008-08-19 9:19 ` [PATCH RT 1/2] seqlock: make sure that raw_seqlock_t retries readers while writes are pending Gregory Haskins
2008-08-19 9:19 ` [PATCH RT 2/2] ftrace: fix elevated preempt_count in wakeup-tracer Gregory Haskins
0 siblings, 2 replies; 7+ messages in thread
From: Gregory Haskins @ 2008-08-19 9:19 UTC (permalink / raw)
To: mingo, rostedt, tglx; +Cc: ghaskins, linux-kernel, linux-rt-users
Hi Ingo, Thomas, Steven,
Please consider the following patches for 2.6.26-rt2
---
Gregory Haskins (2):
ftrace: fix elevated preempt_count in wakeup-tracer
seqlock: make sure that raw_seqlock_t retries readers while writes are pending
include/linux/seqlock.h | 12 ++++++++++--
kernel/trace/trace_sched_wakeup.c | 2 ++
2 files changed, 12 insertions(+), 2 deletions(-)
* [PATCH RT 1/2] seqlock: make sure that raw_seqlock_t retries readers while writes are pending
2008-08-19 9:19 [PATCH RT 0/2] Misc fixes for 2.6.26-rt1 Gregory Haskins
@ 2008-08-19 9:19 ` Gregory Haskins
2008-08-19 9:19 ` [PATCH RT 2/2] ftrace: fix elevated preempt_count in wakeup-tracer Gregory Haskins
1 sibling, 0 replies; 7+ messages in thread
From: Gregory Haskins @ 2008-08-19 9:19 UTC (permalink / raw)
To: mingo, rostedt, tglx; +Cc: ghaskins, linux-kernel, linux-rt-users
The seqlock protocol is broken in -rt for raw_seqlock_t objects. This
manifested in my 2.6.26-rt1 kernel as a 500ms (yes, millisecond) spike
which was traced out with ftrace/preemptirqsoff to be originating in
the HRT (hrtimer_interrupt, to be precise). It would occasionally
spin processing the same CLOCK_MONOTONIC timer (the scheduler-tick)
in a tight loop with interrupts disabled. Investigating, it turned out
that the time-basis recorded for "now" early in the interrupt was
momentarily moved 500ms in the future. This caused all timers with
correct expiration times to appear to have expired a long time ago.
Even rescheduling the timer via hrtimer_forward ultimately placed the
timer in an "expired" state since the "now" basis was in the future.
So I began investigating how this time-basis (derived from ktime_get())
could have done this. I observed that ktime_get() readers were able to
successfully read a time value even while another core held a
write-lock on the xtime_lock. Therefore the fundamental issue was
that ktime_get was able to return transitional states of the
xtime/clocksource infrastructure, which is clearly not intended.
I root caused the issue to the raw_seqlock_t implementation. It was
missing support for retrying a reader if it finds a write-pending
flag. Investigating further, I think I can speculate why.
Back in April, Ingo and Thomas checked in a fix to mainline for seqlocks,
referenced here:
commit 88a411c07b6fedcfc97b8dc51ae18540bd2beda0
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Apr 3 09:06:13 2008 +0200
seqlock: livelock fix
Thomas Gleixner debugged a particularly ugly seqlock related livelock:
do not process the seq-read section if we know it beforehand that the
test at the end of the section will fail ...
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Of course, mainline only has seqlock_t. In -rt, we have both seqlock_t
and raw_seqlock_t. It would appear that the merge-resolution for
commit 88a411c07b6 to the -rt branch inadvertently applied one hunk
of the fix to seqlock_t, and the other to raw_seqlock_t. The normal
seqlocks now have two checks for retry, while the raw_seqlocks have none.
This lack of a check is what causes the protocol failure, which ultimately
caused the bad clock info and a latency spike.
This patch corrects the above condition by applying the conceptual change
from 88a411c07b6 to both seqlock_t and raw_seqlock_t equally. The observed
problems with the HRT spike are confirmed to no longer be reproducible
as a result.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
---
include/linux/seqlock.h | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index e6ecb46..345d726 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -145,7 +145,7 @@ static inline int __read_seqretry(seqlock_t *sl, unsigned iv)
int ret;
smp_rmb();
- ret = (iv & 1) | (sl->sequence ^ iv);
+ ret = (sl->sequence != iv);
/*
* If invalid then serialize with the writer, to make sure we
* are not livelocking it:
@@ -228,8 +228,16 @@ static __always_inline int __write_tryseqlock_raw(raw_seqlock_t *sl)
static __always_inline unsigned __read_seqbegin_raw(const raw_seqlock_t *sl)
{
- unsigned ret = sl->sequence;
+ unsigned ret;
+
+repeat:
+ ret = sl->sequence;
smp_rmb();
+ if (unlikely(ret & 1)) {
+ cpu_relax();
+ goto repeat;
+ }
+
return ret;
}
* [PATCH RT 2/2] ftrace: fix elevated preempt_count in wakeup-tracer
2008-08-19 9:19 [PATCH RT 0/2] Misc fixes for 2.6.26-rt1 Gregory Haskins
2008-08-19 9:19 ` [PATCH RT 1/2] seqlock: make sure that raw_seqlock_t retries readers while writes are pending Gregory Haskins
@ 2008-08-19 9:19 ` Gregory Haskins
2008-08-19 13:12 ` Peter Zijlstra
1 sibling, 1 reply; 7+ messages in thread
From: Gregory Haskins @ 2008-08-19 9:19 UTC (permalink / raw)
To: mingo, rostedt, tglx; +Cc: ghaskins, linux-kernel, linux-rt-users
Suggested by Steve Rostedt to fix an observed "+1" in the preempt-count
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---
kernel/trace/trace_sched_wakeup.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index c3a15bd..ae523fd 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -70,7 +70,9 @@ wakeup_tracer_call(unsigned long ip, unsigned long parent_ip)
if (task_cpu(wakeup_task) != cpu)
goto unlock;
+ preempt_enable_no_resched_notrace();
trace_function(tr, data, ip, parent_ip, flags);
+ preempt_disable_notrace();
unlock:
__raw_spin_unlock(&wakeup_lock);
* Re: [PATCH RT 2/2] ftrace: fix elevated preempt_count in wakeup-tracer
2008-08-19 9:19 ` [PATCH RT 2/2] ftrace: fix elevated preempt_count in wakeup-tracer Gregory Haskins
@ 2008-08-19 13:12 ` Peter Zijlstra
2008-08-19 13:21 ` Gregory Haskins
0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2008-08-19 13:12 UTC (permalink / raw)
To: Gregory Haskins; +Cc: mingo, rostedt, tglx, linux-kernel, linux-rt-users
On Tue, 2008-08-19 at 05:19 -0400, Gregory Haskins wrote:
> Suggested by Steve Rostedt to fix an observed "+1" in the preempt-count
>
> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
> ---
>
> kernel/trace/trace_sched_wakeup.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
> index c3a15bd..ae523fd 100644
> --- a/kernel/trace/trace_sched_wakeup.c
> +++ b/kernel/trace/trace_sched_wakeup.c
> @@ -70,7 +70,9 @@ wakeup_tracer_call(unsigned long ip, unsigned long parent_ip)
> if (task_cpu(wakeup_task) != cpu)
> goto unlock;
>
> + preempt_enable_no_resched_notrace();
> trace_function(tr, data, ip, parent_ip, flags);
> + preempt_disable_notrace();
Is preempt_count > 1 at all times here?
If not, it might drop to 0 and any interrupt might cause preemption -
and it's not obvious to me that that is actually correct.
Just asking, as neither the changelog nor the code fragment enlightens
me on the subject.
* Re: [PATCH RT 2/2] ftrace: fix elevated preempt_count in wakeup-tracer
2008-08-19 13:12 ` Peter Zijlstra
@ 2008-08-19 13:21 ` Gregory Haskins
2008-08-19 13:45 ` Steven Rostedt
0 siblings, 1 reply; 7+ messages in thread
From: Gregory Haskins @ 2008-08-19 13:21 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: mingo, rostedt, tglx, linux-kernel, linux-rt-users
Peter Zijlstra wrote:
> On Tue, 2008-08-19 at 05:19 -0400, Gregory Haskins wrote:
>
>> Suggested by Steve Rostedt to fix an observed "+1" in the preempt-count
>>
>> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
>> ---
>>
>> kernel/trace/trace_sched_wakeup.c | 2 ++
>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
>> index c3a15bd..ae523fd 100644
>> --- a/kernel/trace/trace_sched_wakeup.c
>> +++ b/kernel/trace/trace_sched_wakeup.c
>> @@ -70,7 +70,9 @@ wakeup_tracer_call(unsigned long ip, unsigned long parent_ip)
>> if (task_cpu(wakeup_task) != cpu)
>> goto unlock;
>>
>> + preempt_enable_no_resched_notrace();
>> trace_function(tr, data, ip, parent_ip, flags);
>> + preempt_disable_notrace();
>>
>
> Is preempt_count > 1 at all times here?
>
> If not, it might drop to 0 and any interrupt might cause preemption -
> and its not obvious to me that that is actually correct.
>
According to Steve, we are already in an interrupt-disabled section
here, but I will defer to him. He suggested I try this over an IRC
conversation when I noticed a strange wakeup trace, and it seems to have
solved the problem.
I'm really sending this patch more as a reminder to Steve that he was
going to fix this, rather than to have my patch accepted as is. Of course I
don't mind if it is accepted as is, and I can make the prologue/comments
more descriptive if necessary. But if Steve wants to do something like
fold this into his ftrace series, that is fine too. I just didn't want
it to be forgotten ;)
-Greg
* Re: [PATCH RT 2/2] ftrace: fix elevated preempt_count in wakeup-tracer
2008-08-19 13:21 ` Gregory Haskins
@ 2008-08-19 13:45 ` Steven Rostedt
2008-08-19 14:07 ` Peter Zijlstra
0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2008-08-19 13:45 UTC (permalink / raw)
To: Gregory Haskins; +Cc: Peter Zijlstra, mingo, tglx, linux-kernel, linux-rt-users
On Tue, 19 Aug 2008, Gregory Haskins wrote:
> >
> > Is preempt_count > 1 at all times here?
> >
> > If not, it might drop to 0 and any interrupt might cause preemption -
> > and its not obvious to me that that is actually correct.
> >
> According to Steve, we are already in an interrupt-disabled section here, but
> I will defer to him. He suggested I try this over an IRC conversation when I
> noticed a strange wakeup trace, and it seems to have solved the problem.
Exactly.
Peter,
We disable preemption and do a per_cpu check; if we are not nested, we
disable interrupts and grab the spin lock.
The issue that Gregory brought up was that the tracer would always show
that preemption was disabled, when it really wasn't. It was recording the
preempt disable that the trace function itself made. The answer was
simply to do as Greg did, and enable preemption (but interrupts are
still disabled), call the function tracer, then disable preemption again
(to match the outside preemption enabling).
>
> Im really sending this patch more of a reminder to Steve that he was going to
> fix this, rather than to accept my patch as is. Of course I don't mind if it
> is accepted as is, and I can make the prologue/comments more descriptive if
> necessary. But if Steve wants to do something like fold this into his ftrace
> series, that is fine too. I just didn't want it to be forgotten ;)
Greg, thanks, otherwise I would have probably forgotten it ;-)
-- Steve
* Re: [PATCH RT 2/2] ftrace: fix elevated preempt_count in wakeup-tracer
2008-08-19 13:45 ` Steven Rostedt
@ 2008-08-19 14:07 ` Peter Zijlstra
0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2008-08-19 14:07 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Gregory Haskins, mingo, tglx, linux-kernel, linux-rt-users
On Tue, 2008-08-19 at 09:45 -0400, Steven Rostedt wrote:
> On Tue, 19 Aug 2008, Gregory Haskins wrote:
> > >
> > > Is preempt_count > 1 at all times here?
> > >
> > > If not, it might drop to 0 and any interrupt might cause preemption -
> > > and its not obvious to me that that is actually correct.
> > >
> > According to Steve, we are already in an interrupt-disabled section here, but
> > I will defer to him. He suggested I try this over an IRC conversation when I
> > noticed a strange wakeup trace, and it seems to have solved the problem.
>
> Exactly.
>
> Peter,
>
> We disable preemption, do a per_cpu check, if we are not nested we grab
> disable interrupts and grab the spin lock.
>
> The issue that Gregory brought up, was that the tracer would always show
> that preemption was disabled, when it really wasn't. It was recording the
> preempt disabled that the trace function itself made. The answer was
> simply to do as Greg did, and enable preemption (but interrupts are
> still disabled), call the function tracer, then disable preemption again
> (to match the outside preemption enabling.
Ok good.