public inbox for linux-kernel@vger.kernel.org
From: Ingo Molnar <mingo@elte.hu>
To: David Miller <davem@davemloft.net>
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: Soft lockup regression from today's sched.git merge.
Date: Wed, 23 Apr 2008 11:40:07 +0200	[thread overview]
Message-ID: <20080423094007.GA23293@elte.hu> (raw)
In-Reply-To: <20080422.224212.32058127.davem@davemloft.net>


* David Miller <davem@davemloft.net> wrote:

> The code is fatally flawed, once __sync_cpu_clock() calls start 
> happening, they will happen on every cpu_clock() call.
> 
> So like my bisect showed from the get-go, these cpu_clock() changes 
> have major problems, so it was quite a mind boggling stretch to stick 
> a touch_softlockup_watchdog() call somewhere to try and fix this when 
> the guilty change in question didn't touch that area at all.
> :-(
> 
> Furthermore, this is an extremely expensive way to ensure monotonic 
> per-rq timestamps.  A global spinlock taken every 100000 ns on every 
> cpu?!?!  :-/
> 
> At least move any implication of "high speed" from the comments above 
> cpu_clock() if we're going to need something like this.  I have 128 
> cpus, that's 128 grabs of that spinlock every quantum.  My next system 
> I'm getting will have 256 cpus.  The expense of your solution 
> increases linearly with the number of cpus, which doesn't scale.

We are glad to get your feedback and will fix any and all bugs in this 
code as fast as we can. Let me also defend the code, as there are two 
factors you might not be aware of:

Firstly, cpu_clock() is only used in debug code, not in a fastpath. 
Using a global lock there was a conscious choice; it has been in -mm for 
two months and in linux-next as well for some time. I booted it on a 
64-way box with nohz enabled and it wasn't a visible problem in 
benchmarks.

The time_sync_thresh thing was indeed an afterthought and it was indeed 
buggy, but harmless to functionality. (That's why it went unnoticed - 
until you found the thinko - thanks for that.) The patches I sent today 
should largely address it.

Secondly, and perhaps more importantly, the nohz code _already_ uses a 
very heavy lock in the idle path, and always has! It is exactly the kind 
of global lock you complain about above, just much worse: it is used in 
the fastpath.

Every time irq_enter() is done on an idle CPU we do:

|  static void tick_do_update_jiffies64(ktime_t now)
|  {
|          unsigned long ticks = 0;
|          ktime_t delta;
|
|          /* Reevalute with xtime_lock held */
|          write_seqlock(&xtime_lock);

... that xtime_lock is a heavy global seqlock - just as heavy as any 
global spinlock. That's the price we pay for global jiffies and shoddy 
drivers that rely on it.

This has been in the fastpath unconditionally (not in debug code) and it 
was not reported as a problem before. The reason is that while this 
global lock truly sucks, it burns an _idle_ CPU's time, which mutes some 
of its overhead and makes it mostly "invisible" to any real performance 
measurement.

It's still not ideal, because it slightly increases irq latency, so it 
would be nice to improve upon it - but it's nothing new and it's not 
caused by these commits. Any ideas how to do it? I guess we could first 
check jiffies unlocked - that would mute much of the polling that goes 
on here. Something like the patch below. Hm?

	Ingo

------------------------->
Subject: nohz: reduce jiffies polling overhead
From: Ingo Molnar <mingo@elte.hu>
Date: Wed Apr 23 11:05:09 CEST 2008

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/time/tick-sched.c |    7 +++++++
 1 file changed, 7 insertions(+)

Index: linux/kernel/time/tick-sched.c
===================================================================
--- linux.orig/kernel/time/tick-sched.c
+++ linux/kernel/time/tick-sched.c
@@ -48,6 +48,13 @@ static void tick_do_update_jiffies64(kti
 	unsigned long ticks = 0;
 	ktime_t delta;
 
+	/*
+	 * Do a quick check without holding xtime_lock:
+	 */
+	delta = ktime_sub(now, last_jiffies_update);
+	if (delta.tv64 < tick_period.tv64)
+		return;
+
 	/* Reevalute with xtime_lock held */
 	write_seqlock(&xtime_lock);
 


Thread overview: 17+ messages
2008-04-22  8:59 Soft lockup regression from today's sched.git merge David Miller
2008-04-22  9:14 ` Ingo Molnar
2008-04-22 10:05   ` David Miller
2008-04-22 12:45     ` Peter Zijlstra
2008-05-06 22:41       ` Rafael J. Wysocki
2008-05-06 23:05         ` David Miller
2008-05-07  6:43           ` Ingo Molnar
2008-05-07 18:56             ` Rafael J. Wysocki
2008-04-23  8:50     ` [patch] softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds Ingo Molnar
2008-04-23 10:55       ` David Miller
2008-04-23 12:29         ` David Miller
2008-04-23 13:36           ` Ingo Molnar
2008-04-23 23:23             ` David Miller
2008-04-23  5:42   ` Soft lockup regression from today's sched.git merge David Miller
2008-04-23  7:32     ` Dhaval Giani
2008-04-23  7:51     ` Ingo Molnar
2008-04-23  9:40     ` Ingo Molnar [this message]
