From: Frederic Weisbecker <fweisbec@gmail.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Boyer <jwboyer@redhat.com>,
linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: 3.0-git15 Atomic scheduling in pidmap_init
Date: Tue, 9 Aug 2011 13:35:18 +0200 [thread overview]
Message-ID: <20110809113514.GA27301@somewhere.redhat.com> (raw)
In-Reply-To: <20110808031014.GE2385@linux.vnet.ibm.com>
On Sun, Aug 07, 2011 at 08:10:14PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 08, 2011 at 04:55:07AM +0200, Frederic Weisbecker wrote:
> > On Sun, Aug 07, 2011 at 07:09:14PM -0700, Paul E. McKenney wrote:
> > > On Sat, Aug 06, 2011 at 01:12:18AM +0200, Frederic Weisbecker wrote:
> > > > On Fri, Aug 05, 2011 at 03:26:41PM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Aug 05, 2011 at 07:08:08PM +0200, Frederic Weisbecker wrote:
> > > > > > On Fri, Aug 05, 2011 at 10:22:45AM -0400, Josh Boyer wrote:
> > > > > > > On Thu, Aug 04, 2011 at 11:56:46PM -0700, Paul E. McKenney wrote:
> > > > > > > > > > I could be missing something obvious, but I don't see a way to avoid
> > > > > > > > > > using GFP_KERNEL without a lot of rip-up in the rest of the init path.
> > > > > > > > >
> > > > > > > > > As an aside, I bisected this back to:
> > > > > > > > >
> > > > > > > > > e8f7c70f44f sched: Make sleeping inside spinlock detection working in
> > > > > > > > > !CONFIG_PREEMPT
> > > > > > > >
> > > > > > > > OK, added Frederic on CC.
> > > > > > > >
> > > > > > > > > However, that doesn't seem all that helpful. The
> > > > > > > > > CONFIG_DEBUG_SPINLOCK_SLEEP option later got renamed to
> > > > > > > > > DEBUG_ATOMIC_SLEEP, and all it's doing is selecting PREEMPT_COUNT. At
> > > > > > > > > first glance, it seems this commit just allowed an issue that's been
> > > > > > > > > around for a while (benign or otherwise) to finally show up.
> > > > > > > > >
> > > > > > > > > (The Fedora kernel configs have CONFIG_PREEMPT_VOLUNTARY set, but not
> > > > > > > > > CONFIG_PREEMPT so PREEMPT_COUNT wasn't getting selected until this
> > > > > > > > > option did so.)
> > > > > > > >
> > > > > > > > Understood. So my question is "what is the real way to fix this?"
> > > > > > > > Within RCU, I would probably wrapper the calls to set_need_resched()
> > > > > > > > so that it checks for the scheduler being fully alive. Except for the
> > > > > > > > call from rcu_enter_nohz(), of course -- if that one is called before
> > > > > > > > the scheduler is ready, then that is a bug that needs to be fixed.
> > > > > > >
> > > > > > > By scheduler being fully alive, do you mean when rcu_scheduler_starting
> > > > > > > is called? Or do you mean the actual scheduler, because sched_init is
> > > > > > > called well before any of this happens.
> > > > > > >
> > > > > > > > Nevertheless, I am wondering if all of this isn't really papering over
> > > > > > > > some real problem somewhere. The way we get to this place is from people
> > > > > > > > registering RCU callbacks during early boot, which is OK in and of itself,
> > > > > > > > at least in moderation. But if someone is expecting those callbacks to be
> > > > > > > > invoked before the scheduler is fully set up and running multiple tasks,
> > > > > > > > they are going to be disappointed.
> > > > > > >
> > > > > > > Is there a way to dump what callbacks have been registered? As far as I
> > > > > > > can see, we call rcu_check_callbacks unconditionally when a timer
> > > > > > > interrupt is taken and that calls rcu_pending unconditionally as well.
> > > > > > > Before that, rcu_init is called which eventually sets up the per-cpu
> > > > > > > data via rcu_init_percpu_data and that sets rdp->qs_pending = 1.
> > > > > > > Until a quiescent state is reached __rcu_pending is going to try and
> > > > > > > force it, which is where the set_need_resched is called.
> > > > > > >
> > > > > > > Basically, I took what you said about wrapping set_need_resched and came
> > > > > > > up with the patch below. It gets rid of the oops from pidmap_init, but
> > > > > > > I need to test it a bit more. Would be happy to have feedback.
> > > > > > >
> > > > > > > josh
> > > > > > >
> > > > > > > ---
> > > > > > >
> > > > > > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > > > > > > index ba06207..8c6cb6e 100644
> > > > > > > --- a/kernel/rcutree.c
> > > > > > > +++ b/kernel/rcutree.c
> > > > > > > @@ -1681,8 +1681,14 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
> > > > > > > rdp->n_rp_qs_pending++;
> > > > > > > if (!rdp->preemptible &&
> > > > > > > ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs) - 1,
> > > > > > > - jiffies))
> > > > > > > - set_need_resched();
> > > > > > > + jiffies)) {
> > > > > > > + /* Make sure we're ready to mark the task as needing
> > > > > > > + * rescheduling otherwise we can trigger oopes early
> > > > > > > + * in the init path
> > > > > > > + */
> > > > > > > + if (rcu_scheduler_active)
> > > > > > > + set_need_resched();
> > > > > >
> > > > > > What about we avoid setting rdp->qs_pending = 1 for the CPU
> > > > > > that handles the boot?
> > > > >
> > > > > That sounds promising -- only checked at the beginning of a grace period,
> > > > > so not too much overhead. In contrast, __rcu_pending() is called
> > > > > multiple times per transition to dyntick-idle state.
> > > > >
> > > > > I will take a look at this.
> > > > >
> > > > > Thanx, Paul
> > > >
> > > > I actually thought it could be done from rcu_init_percpu_data(). This
> > > > is where we initialize the qs_pending to 1, which I believe is responsible
> > > > for that set_need_resched() from rcu soon after on boot.
> > >
> > > That would cover some of the situations, but...
> > >
> > > > It's possible we also have secondary offenders in places that enqueue
> > > > rcu callbacks in the boot. But if not, then we are fine with tweaking
> > > > that qs_pending on cpu boot.
> > >
> > > The first time we take a scheduling-clock interrupt on a CPU with a
> > > callback queued, we will also set qs_pending. Hence the need to also
> > > suppress the assignment in __note_new_gpnum(). Or better yet, just
> > > prevent new grace periods in cpu_needs_another_gp(). I believe that doing
> > > this will make it unnecessary to do anything in rcu_init_percpu_data().
> >
> > Yeah if we have callbacks enqueued during the boot then we need to have
> > a check in cpu_needs_another_gp().
> >
> > Now rcu_init_percpu_data() still sets rdp->qs_pending to 1, and that
> > is going to stay as is as long as preemption is disabled.
>
> But setting rdp->qs_pending to 1 in rcu_init_percpu_data() has no effect
> until a grace period starts. So, if grace periods are prevented from
> starting, no need to mess with rcu_init_percpu_data(). Especially given
> that rcu_init_percpu_data() is also used at late boot and runtime for
> CPU hotplug.
Ok.
>
> So I believe that it is sufficient to change cpu_needs_another_gp()
> to check for boot being far enough along to allow grace periods.
Yep, sounds good.
Thanks.
next prev parent reply other threads:[~2011-08-09 11:35 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-08-01 15:46 3.0-git15 Atomic scheduling in pidmap_init Josh Boyer
2011-08-04 11:46 ` Josh Boyer
2011-08-04 14:04 ` Paul E. McKenney
2011-08-04 15:06 ` Josh Boyer
2011-08-04 16:26 ` Paul E. McKenney
2011-08-04 17:31 ` Josh Boyer
2011-08-05 1:19 ` Josh Boyer
2011-08-05 6:56 ` Paul E. McKenney
2011-08-05 14:22 ` Josh Boyer
2011-08-05 17:08 ` Frederic Weisbecker
2011-08-05 22:26 ` Paul E. McKenney
2011-08-05 23:12 ` Frederic Weisbecker
2011-08-08 2:09 ` Paul E. McKenney
2011-08-08 2:55 ` Frederic Weisbecker
2011-08-08 3:10 ` Paul E. McKenney
2011-08-09 11:35 ` Frederic Weisbecker [this message]
2011-08-10 12:45 ` Josh Boyer
2011-08-10 14:53 ` Frederic Weisbecker
2011-08-10 15:03 ` Josh Boyer
2011-08-14 23:04 ` Paul E. McKenney
2011-08-15 14:04 ` Josh Boyer
2011-08-15 15:20 ` Paul E. McKenney
2011-08-17 22:37 ` Josh Boyer
2011-08-17 22:49 ` Paul E. McKenney
2011-08-17 23:02 ` Josh Boyer
2011-08-17 23:06 ` Frederic Weisbecker
2011-08-17 23:17 ` Josh Boyer
2011-08-18 18:35 ` Paul E. McKenney
2011-08-18 19:11 ` Josh Boyer
2011-08-18 21:00 ` Andrew Morton
2011-08-18 21:23 ` Paul E. McKenney
2011-08-18 21:55 ` Paul E. McKenney
2011-08-18 22:21 ` Josh Boyer
2011-08-18 23:01 ` Paul E. McKenney
2011-08-24 22:45 ` Frederic Weisbecker
2011-08-24 23:12 ` Paul E. McKenney
2011-08-24 23:34 ` Frederic Weisbecker
2011-08-24 23:57 ` Paul E. McKenney
2011-08-18 22:19 ` Josh Boyer
2011-08-18 23:16 ` Paul E. McKenney
2011-08-18 23:27 ` Andrew Morton
2011-08-19 0:38 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110809113514.GA27301@somewhere.redhat.com \
--to=fweisbec@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=jwboyer@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.