From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xensource.com,
julie Sullivan <kernelmail.jms@gmail.com>,
linux-kernel@vger.kernel.org, chengxu@linux.vnet.ibm.com
Subject: Re: PROBLEM: 3.0-rc kernels unbootable since -rc3
Date: Tue, 12 Jul 2011 09:39:47 -0700
Message-ID: <20110712163947.GF2326@linux.vnet.ibm.com>
In-Reply-To: <20110712160324.GA1186@dumpdata.com>
On Tue, Jul 12, 2011 at 12:03:24PM -0400, Konrad Rzeszutek Wilk wrote:
> > > http://darnok.org/xen/cpu1.log
> >
> > OK, a fair amount of variety, then lots and lots of task_waking_fair(),
> > so I still feel good about asking you for the following.
> .. snip..
> > Hmmm... Given that this is persisting for many many seconds, it might
> > be better to check for at least 10,000,000 passes. In contrast, 1000
> > passes might elapse just waiting for a cache miss to complete.
>
> Changed it to that large number. This is the diff I used:
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 433491c..e185c04 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -1392,14 +1392,19 @@ static void task_waking_fair(struct task_struct *p)
>  	struct sched_entity *se = &p->se;
>  	struct cfs_rq *cfs_rq = cfs_rq_of(se);
>  	u64 min_vruntime;
> +	u64 loop_cnt = 0UL;
>
>  #ifndef CONFIG_64BIT
>  	u64 min_vruntime_copy;
> -
> +	loop_cnt = 0UL;
>  	do {
>  		min_vruntime_copy = cfs_rq->min_vruntime_copy;
>  		smp_rmb();
>  		min_vruntime = cfs_rq->min_vruntime;
> +		if (loop_cnt++ > 10000000) {
> +			printk(KERN_INFO "POKE!\n");
> +			loop_cnt = 0UL;
> +		}
>  	} while (min_vruntime != min_vruntime_copy);
>  #else
>  	min_vruntime = cfs_rq->min_vruntime;
>
> And the log is:
> http://darnok.org/xen/loop_cnt.log
>
> which seems to imply that we are indeed stuck in that loop
> forever.
It does indeed, thank you! Also it looks like interrupts are
disabled, and that timekeeping is similarly out of action.
> > Other possible causes include:
>
> What is really strange is that I can only reproduce this on 32-bit builds.
Not strange at all. If you have a 64-bit build, the function doesn't
have a loop. ;-)
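
For anyone following along, here is a rough userspace sketch of why the
32-bit path needs a retry loop at all. A 64-bit load is not guaranteed to be
a single access on a 32-bit CPU, so the reader compares min_vruntime against
a second copy and retries until the two agree. The struct vruntime_pair,
publish() and read_stable() names are made up for illustration, C11 fences
stand in for smp_wmb()/smp_rmb(), and the bounded retry count is an
assumption added so the sketch cannot hang. It is not the kernel code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the two cfs_rq fields discussed above. */
struct vruntime_pair {
	_Atomic uint64_t min_vruntime;
	_Atomic uint64_t min_vruntime_copy;
};

/* Writer side: store the value, then a write barrier, then the copy. */
static void publish(struct vruntime_pair *v, uint64_t val)
{
	atomic_store_explicit(&v->min_vruntime, val, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* plays the role of smp_wmb() */
	atomic_store_explicit(&v->min_vruntime_copy, val, memory_order_relaxed);
}

/*
 * Reader side: load the copy, then a read barrier, then the value, and
 * retry until they match.  Unlike the loop in the diff, this gives up
 * after max_tries passes and returns -1 instead of spinning forever.
 */
static int read_stable(struct vruntime_pair *v, unsigned long max_tries,
		       uint64_t *out)
{
	unsigned long i;

	for (i = 0; i < max_tries; i++) {
		uint64_t copy, val;

		copy = atomic_load_explicit(&v->min_vruntime_copy,
					    memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);	/* plays the role of smp_rmb() */
		val = atomic_load_explicit(&v->min_vruntime,
					   memory_order_relaxed);
		if (val == copy) {
			*out = val;
			return 0;
		}
	}
	return -1;	/* the two fields never agreed */
}
```

If the two fields never come to agree, for whatever reason, an unbounded
version of read_stable() spins indefinitely, which is the behavior the
"POKE!" messages are reporting. The sketch says nothing about why they
disagree in this case.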
> > o A mismatch between Xen's and RCU's ideas of how CONFIG_NO_HZ
> > works. If Xen thinks that the CPU is in CONFIG_NO_HZ's
> > dyntick-idle mode, but RCU thinks otherwise, the grace period
> > might stall.
>
> One sure way to figure this out is to disable CONFIG_NO_HZ right?
> Or will that take away task_waking_fair case as well?
Disabling CONFIG_NO_HZ would be an interesting test case.
> > o Problems due to portions of the code attempting to use
> > RCU read-side critical sections while in dyntick-idle mode.
> > Frederic Weisbecker has located some of these (though not yet
> > in Xen), and he has some diagnostics which may be found at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
> >
> > on branch eqscheck.2011.07.08a.
> >
> > You need to enable CONFIG_PROVE_RCU for these diagnostics to
> > be executed.
>
> Ok, let me try those too.
Thank you!
> > o As always, there might be bugs in RCU. ;-)
> >
> > But the loop in task_waking_fair() looks like the most prominent smoking
> > gun at the moment.
And could you also please try out the patch that I posted earlier?
Thanx, Paul