public inbox for linux-kernel@vger.kernel.org
* Re:[PATCH -v4] x86: enable preemption in delay
@ 2008-05-25 18:08 Marin Mitov
  2008-05-25 19:30 ` Steven Rostedt
  2008-05-25 22:21 ` Thomas Gleixner
  0 siblings, 2 replies; 23+ messages in thread
From: Marin Mitov @ 2008-05-25 18:08 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel

Hi Steven,

look at this patch, proposed by me and Ingo a few months ago.
I think it solves the problem you have found, but unfortunately
it was lost (never included in mainline).

http://lkml.org/lkml/2007/11/20/343

It is against the kernel at the moment the message was sent.
I think it could be applied to the current kernel too.

Best regards

Marin Mitov

* [PATCH] x86: enable preemption in delay
@ 2008-05-25  3:11 Steven Rostedt
  2008-05-25 12:44 ` Thomas Gleixner
  0 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2008-05-25  3:11 UTC (permalink / raw)
  To: LKML, linux-rt-users
  Cc: akpm, Ingo Molnar, Thomas Gleixner, Clark Williams,
	Peter Zijlstra, Luis Claudio R. Goncalves, Gregory Haskins,
	Andi Kleen, Linus Torvalds


The RT team has been searching for a nasty latency. This latency shows
up out of the blue and has been seen to be as big as 5ms!

Using ftrace I found the cause of the latency.

   pcscd-2995  3dNh1 52360300us : irq_exit (smp_apic_timer_interrupt)
   pcscd-2995  3dN.2 52360301us : idle_cpu (irq_exit)
   pcscd-2995  3dN.2 52360301us : rcu_irq_exit (irq_exit)
   pcscd-2995  3dN.1 52360771us : smp_apic_timer_interrupt (apic_timer_interrupt)
   pcscd-2995  3dN.1 52360771us : exit_idle (smp_apic_timer_interrupt)

Here's an example of a 400 us latency. pcscd took a timer interrupt and
returned with "need resched" set, but did not reschedule until after
the next interrupt came in at 52360771us, some 400us later!

At first I thought we somehow missed a preemption check in entry.S. But
I also noticed that this always seemed to happen during a __delay call.

   pcscd-2995  3dN.2 52360836us : rcu_irq_exit (irq_exit)
   pcscd-2995  3.N.. 52361265us : preempt_schedule (__delay)

Looking at the x86 delay, I found my problem.

In git commit 35d5d08a085c56f153458c3f5d8ce24123617faf, Andrew Morton
placed preempt_disable around the entire delay because TSCs are per-CPU
and may not be synchronized on SMP. Unfortunately, for those who care
about latencies this is devastating! Especially when we have callers
of mdelay(8).

Here I enable preemption during the loop and account for any time the
task migrates to a new CPU. The delay asked for may be extended a bit
by the migration, but delay only guarantees a minimum wait; delaying
longer is not an issue.
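The credit-and-restart step can be sanity-checked outside the kernel. Below is a small user-space simulation of the idea (my own sketch, not part of the patch): two per-CPU "TSCs" run at the same rate but from different bases, so a raw (now - bclock) across a migration is meaningless; crediting the time already served on the old CPU and restarting the measurement on the new one keeps the total wait at or above the requested tick count. The names tick(), delay_sim(), and the migrate_at parameter are invented for the sketch.

```c
/* User-space model of the patch's migration handling.  One call to
 * tick() advances simulated global time by one tick and returns the
 * calling "CPU"'s counter, which is global time plus a per-CPU base. */

enum { NCPUS = 2 };
static unsigned long base[NCPUS] = { 0, 1000000 };
static unsigned long time_now;   /* global simulated time, in ticks */
static int cur_cpu;

static unsigned long tick(void)
{
	time_now++;
	return base[cur_cpu] + time_now;
}

/* Spin for at least `loops` ticks.  `migrate_at` is the global time at
 * which the "scheduler" moves us to the other CPU (0 = never). */
static unsigned long delay_sim(unsigned long loops, unsigned long migrate_at)
{
	unsigned long start = time_now;
	unsigned long bclock, now;
	int cpu = cur_cpu;

	bclock = tick();
	do {
		now = tick();            /* read while still on `cpu` */
		if (time_now == migrate_at)
			cur_cpu ^= 1;    /* preemption window: migrated */
		if (cpu != cur_cpu) {
			/* `now` and `bclock` both came from the old CPU,
			 * so (now - bclock) is valid time served there.
			 * Credit it and restart on the new CPU. */
			if (now - bclock >= loops)
				break;
			loops -= now - bclock;
			cpu = cur_cpu;
			bclock = tick();
			now = tick();
		}
	} while (now - bclock < loops);
	return time_now - start;         /* ticks actually waited */
}
```

Running delay_sim(100, 10) migrates mid-wait and still spins for at least the 100 requested ticks (slightly more, due to the restart), which is exactly the "at least the amount of time" guarantee the patch relies on.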

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/x86/lib/delay_32.c |   23 ++++++++++++++++++++++-
 arch/x86/lib/delay_64.c |   26 +++++++++++++++++++++++---
 2 files changed, 45 insertions(+), 4 deletions(-)

Index: linux-compile.git/arch/x86/lib/delay_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/lib/delay_32.c	2008-05-24 22:30:49.000000000 -0400
+++ linux-compile.git/arch/x86/lib/delay_32.c	2008-05-24 22:31:43.000000000 -0400
@@ -44,12 +44,33 @@ static void delay_loop(unsigned long loo
 static void delay_tsc(unsigned long loops)
 {
 	unsigned long bclock, now;
+	int cpu;

-	preempt_disable();		/* TSC's are per-cpu */
+	preempt_disable();
+	cpu = smp_processor_id();
 	rdtscl(bclock);
 	do {
 		rep_nop();
 		rdtscl(now);
+		/* Allow RT tasks to run */
+		preempt_enable();
+		preempt_disable();
+		/*
+		 * It is possible that we moved to another CPU,
+		 * and since TSC's are per-cpu we need to
+		 * calculate that. The delay must guarantee that
+		 * we wait "at least" the amount of time. Being
+		 * moved to another CPU could make the wait longer
+		 * but we just need to make sure we waited long
+		 * enough. Rebalance the counter for this CPU.
+		 */
+		if (unlikely(cpu != smp_processor_id())) {
+			if ((now-bclock) >= loops)
+				break;
+			loops -= (now - bclock);
+			rdtscl(bclock);
+			rdtscl(now);
+		}
 	} while ((now-bclock) < loops);
 	preempt_enable();
 }
Index: linux-compile.git/arch/x86/lib/delay_64.c
===================================================================
--- linux-compile.git.orig/arch/x86/lib/delay_64.c	2008-05-24 22:30:49.000000000 -0400
+++ linux-compile.git/arch/x86/lib/delay_64.c	2008-05-24 22:31:43.000000000 -0400
@@ -31,14 +31,34 @@ int __devinit read_current_timer(unsigne
 void __delay(unsigned long loops)
 {
 	unsigned bclock, now;
+	int cpu;

-	preempt_disable();		/* TSC's are pre-cpu */
+	preempt_disable();
+	cpu = smp_processor_id();
 	rdtscl(bclock);
 	do {
 		rep_nop();
 		rdtscl(now);
-	}
-	while ((now-bclock) < loops);
+		/* Allow RT tasks to run */
+		preempt_enable();
+		preempt_disable();
+		/*
+		 * It is possible that we moved to another CPU,
+		 * and since TSC's are per-cpu we need to
+		 * calculate that. The delay must guarantee that
+		 * we wait "at least" the amount of time. Being
+		 * moved to another CPU could make the wait longer
+		 * but we just need to make sure we waited long
+		 * enough. Rebalance the counter for this CPU.
+		 */
+		if (unlikely(cpu != smp_processor_id())) {
+			if ((now-bclock) >= loops)
+				break;
+			loops -= (now - bclock);
+			rdtscl(bclock);
+			rdtscl(now);
+		}
+	} while ((now-bclock) < loops);
 	preempt_enable();
 }
 EXPORT_SYMBOL(__delay);



Thread overview: 23+ messages
2008-05-25 18:08 Re:[PATCH -v4] x86: enable preemption in delay Marin Mitov
2008-05-25 19:30 ` Steven Rostedt
2008-05-25 19:58   ` [PATCH " Marin Mitov
2008-05-25 22:21 ` Thomas Gleixner
2008-05-26  5:14   ` [PATCH " Marin Mitov
2008-06-01 16:01   ` [PATCH][resubmit] " Marin Mitov
2008-06-01 16:25     ` Andi Kleen
2008-06-01 17:17       ` Marin Mitov
2008-06-09 12:13     ` Ingo Molnar
2008-06-09 16:11       ` Marin Mitov
2008-06-09 16:16         ` Ingo Molnar
2008-06-15 17:58           ` Marin Mitov
2008-06-18  7:55             ` Ingo Molnar
2008-06-18 12:08               ` Gregory Haskins
2008-06-18 12:13                 ` Gregory Haskins
2008-06-18 12:16                 ` Peter Zijlstra
2008-06-18 12:25                   ` Gregory Haskins
2008-06-18 12:42                     ` Nick Piggin
2008-06-18 13:04                       ` Gregory Haskins
2008-06-18 16:16                       ` Steven Rostedt
2008-06-18 17:18                         ` Nick Piggin
2008-06-28 10:44               ` Pavel Machek
  -- strict thread matches above, loose matches on Subject: below --
2008-05-25  3:11 [PATCH] " Steven Rostedt
2008-05-25 12:44 ` Thomas Gleixner
2008-05-25 13:35   ` [PATCH -v2] " Steven Rostedt
2008-05-25 13:44     ` Steven Rostedt
2008-05-25 13:51       ` [PATCH -v3] " Steven Rostedt
2008-05-25 15:13         ` [PATCH -v4] " Steven Rostedt
