All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: David Miller <davem@davemloft.net>
Cc: mingo@elte.hu, linux-kernel@vger.kernel.org, tglx@linutronix.de
Subject: Re: Soft lockup regression from today's sched.git merge.
Date: Tue, 22 Apr 2008 14:45:04 +0200	[thread overview]
Message-ID: <1208868304.7115.251.camel@twins> (raw)
In-Reply-To: <20080422.030519.259068348.davem@davemloft.net>

On Tue, 2008-04-22 at 03:05 -0700, David Miller wrote:

> BTW, I'm also getting cpu's wedged in the group aggregate code:
> 
> [  121.338742] TSTATE: 0000009980001602 TPC: 000000000054ea20 TNPC: 0000000000456828 Y: 00000000    Not tainted
> [  121.338778] TPC: <__first_cpu+0x4/0x28>
> [  121.338791] g0: 0000000000000000 g1: 0000000000000002 g2: 0000000000000000 g3: 0000000000000002
> [  121.338809] g4: fffff803fe9b24c0 g5: fffff8001587c000 g6: fffff803fe9d0000 g7: 00000000007b7260
> [  121.338827] o0: 0000000000000002 o1: 00000000007b7258 o2: 0000000000000000 o3: 00000000007b7800
> [  121.338845] o4: 0000000000845000 o5: 0000000000000400 sp: fffff803fe9d2ed1 ret_pc: 0000000000456820
> [  121.338879] RPC: <aggregate_group_shares+0x10c/0x16c>
> [  121.338893] l0: 0000000000000400 l1: 000000000000000d l2: 00000000000003ff l3: 0000000000000000
> [  121.338911] l4: 0000000000000000 l5: 0000000000000000 l6: fffff803fe9d0000 l7: 0000000080009002
> [  121.338928] i0: 0000000000801c20 i1: fffff800161ca508 i2: 00000000000001d8 i3: 0000000000000001
> [  121.338946] i4: fffff800161d9c00 i5: 0000000000000001 i6: fffff803fe9d2f91 i7: 0000000000456904
> [  121.338968] I7: <aggregate_get_down+0x84/0x13c>
> 
> I'm suspecting the deluge of cpumask changes that also went in today.
> 
> I guess I'll be bisecting all day tomorrow too :-/

Sadly both the cpumask changes and the group load-balancer came in the
same batch - so its all new code. Also, it looks like the cpumask
changes are before the load balance changes - so bisecting this will be
'fun'.

That said; the code in question looks like:

static
void aggregate_group_shares(struct task_group *tg, struct sched_domain *sd)
{
        unsigned long shares = 0;
        int i;

again:
        for_each_cpu_mask(i, sd->span)
                shares += tg->cfs_rq[i]->shares;

        /*
         * When the span doesn't have any shares assigned, but does have
         * tasks to run do a machine wide rebalance (should be rare).
         */
        if (unlikely(!shares && aggregate(tg, sd)->rq_weight)) {
                __aggregate_redistribute_shares(tg);
                goto again;
        }

        aggregate(tg, sd)->shares = shares;
}

and the __first_cpu() call is from the for_each_cpu_mask() afaict. and
sd->span should be good - that's not new. So I'm a bit puzzled.

But you say they get wedged - so the above trace is a snapshot (NMI,
sysrq-[tw] or the like) that could also mean they get wedged in this
'again' loop we have here.

I have two patches; the first will stick in a printk() to see if it is
indeed the loop in this function. The second is an attempt at breaking
out of it.

BTW, what does the sched_domain tree look like on that 128-cpu machine?

---
 kernel/printk.c |    2 --
 kernel/sched.c  |    1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

Index: linux-2.6-2/kernel/printk.c
===================================================================
--- linux-2.6-2.orig/kernel/printk.c
+++ linux-2.6-2/kernel/printk.c
@@ -1020,8 +1020,6 @@ void release_console_sem(void)
 	console_locked = 0;
 	up(&console_sem);
 	spin_unlock_irqrestore(&logbuf_lock, flags);
-	if (wake_klogd)
-		wake_up_klogd();
 }
 EXPORT_SYMBOL(release_console_sem);
 
Index: linux-2.6-2/kernel/sched.c
===================================================================
--- linux-2.6-2.orig/kernel/sched.c
+++ linux-2.6-2/kernel/sched.c
@@ -1710,6 +1710,7 @@ again:
 	 * tasks to run do a machine wide rebalance (should be rare).
 	 */
 	if (unlikely(!shares && aggregate(tg, sd)->rq_weight)) {
+		printk(KERN_EMERG "[%d] no sd shares\n", raw_smp_processor_id());
 		__aggregate_redistribute_shares(tg);
 		goto again;
 	}


---
 kernel/sched.c |   10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

Index: linux-2.6-2/kernel/sched.c
===================================================================
--- linux-2.6-2.orig/kernel/sched.c
+++ linux-2.6-2/kernel/sched.c
@@ -1661,9 +1661,9 @@ void aggregate_group_weight(struct task_
  */
 static void __aggregate_redistribute_shares(struct task_group *tg)
 {
-	int i, max_cpu = smp_processor_id();
+	int i;
 	unsigned long rq_weight = 0;
-	unsigned long shares, max_shares = 0, shares_rem = tg->shares;
+	unsigned long shares, shares_rem = tg->shares;
 
 	for_each_possible_cpu(i)
 		rq_weight += tg->cfs_rq[i]->load.weight;
@@ -1677,10 +1677,6 @@ static void __aggregate_redistribute_sha
 
 		tg->cfs_rq[i]->shares = shares;
 
-		if (shares > max_shares) {
-			max_shares = shares;
-			max_cpu = i;
-		}
 		shares_rem -= shares;
 	}
 
@@ -1689,7 +1685,7 @@ static void __aggregate_redistribute_sha
 	 * due to rounding down when computing the per-cpu shares.
 	 */
 	if (shares_rem)
-		tg->cfs_rq[max_cpu]->shares += shares_rem;
+		tg->cfs_rq[smp_process_id()]->shares += shares_rem;
 }
 
 /*



  reply	other threads:[~2008-04-22 12:45 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-04-22  8:59 Soft lockup regression from today's sched.git merge David Miller
2008-04-22  9:14 ` Ingo Molnar
2008-04-22 10:05   ` David Miller
2008-04-22 12:45     ` Peter Zijlstra [this message]
2008-05-06 22:41       ` Rafael J. Wysocki
2008-05-06 23:05         ` David Miller
2008-05-07  6:43           ` Ingo Molnar
2008-05-07 18:56             ` Rafael J. Wysocki
2008-04-23  8:50     ` [patch] softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds Ingo Molnar
2008-04-23 10:55       ` David Miller
2008-04-23 12:29         ` David Miller
2008-04-23 13:36           ` Ingo Molnar
2008-04-23 23:23             ` David Miller
2008-04-23  5:42   ` Soft lockup regression from today's sched.git merge David Miller
2008-04-23  7:32     ` Dhaval Giani
2008-04-23  7:51     ` Ingo Molnar
2008-04-23  9:40     ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1208868304.7115.251.camel@twins \
    --to=peterz@infradead.org \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.