From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Josh Boyer <jwboyer@redhat.com>
Cc: linux-kernel@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>
Subject: Re: 3.0-git15 Atomic scheduling in pidmap_init
Date: Thu, 4 Aug 2011 09:26:58 -0700
Message-ID: <20110804162658.GZ13065@linux.vnet.ibm.com>
In-Reply-To: <20110804150603.GL2096@zod.bos.redhat.com>

On Thu, Aug 04, 2011 at 11:06:03AM -0400, Josh Boyer wrote:
> On Thu, Aug 04, 2011 at 07:04:38AM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 04, 2011 at 07:46:03AM -0400, Josh Boyer wrote:
> > > On Mon, Aug 1, 2011 at 11:46 AM, Josh Boyer <jwboyer@redhat.com> wrote:
> > > > We're seeing a scheduling while atomic backtrace in rawhide from pidmap_init
> > > > (https://bugzilla.redhat.com/show_bug.cgi?id=726877). While this seems
> > > > mostly harmless given that there isn't anything else to schedule to at
> > > > this point, I do wonder why things are marked as needing rescheduled so
> > > > early.
> > > >
> > > > We get to might_sleep through the might_sleep_if call in
> > > > slab_pre_alloc_hook because both kzalloc and KMEM_CACHE are called with
> > > > GFP_KERNEL. That eventually has a call chain like:
> > > >
> > > > might_resched->_cond_resched->should_resched
> > > >
> > > > which apparently returns true. Why the initial thread says it should
> > > > reschedule at this point, I'm not sure.
> > > >
> > > > I tried cheating by making the kzalloc call in pidmap_init use GFP_IOFS
> > > > instead of GFP_KERNEL to avoid the might_sleep_if call, and that worked
> > > > but I can't do the same for the kmalloc calls in kmem_cache_create, so
> > > > getting to the bottom of why should_resched is returning true seems to
> > > > be a better approach.
> > >
> > > A bit more info on this.
> > >
> > > What seems to be happening is that late_time_init is called, which
> > > gets around to calling hpet_time_init, which enables the HPET, and
> > > then calls setup_default_timer_irq. setup_default_timer_irq in
> > > arch/x86/kernel/time.c calls setup_irq with the timer_interrupt
> > > handler.
> > >
> > > At this point the timer interrupt hits, and then tick_handle_periodic is called
> > >
> > > timer int
> > > tick_handle_periodic -> tick_periodic -> update_process_times ->
> > > rcu_check_callbacks -> rcu_pending ->
> > > __rcu_pending -> set_need_resched (this is called around line 1685 in
> > > kernel/rcutree.c)
> > >
> > > So what's happening is that once the timer interrupt starts, RCU
> > > comes in and marks current as needing a reschedule. That in turn
> > > causes the slab_pre_alloc_hook -> might_sleep_if -> might_sleep ->
> > > might_resched -> _cond_resched path to trigger when pidmap_init
> > > later calls kzalloc, producing the oops below later in the init
> > > sequence. I believe we see this because of all the debugging options
> > > we have enabled in the kernel configs.
> > >
> > > This might be normal for all I know, but the oops is rather annoying.
> > > It seems RCU isn't in a quiescent state, we aren't preemptible yet,
> > > and it _really_ wants to reschedule things to make itself happy.
> > > Does anyone have any thoughts on how to either keep RCU from marking
> > > current as needing a reschedule so early, or otherwise work around
> > > the bug?
> >
> > The deal is that RCU realizes that RCU needs a quiescent state from
> > this CPU. The set_need_resched() is intended to cause one. But there
> > is not much point this early in boot, because the scheduler isn't going
> > to do anything anyway. I can prevent this with the following patch,
> > but isn't this same thing possible later at runtime?
>
> Possibly, but I'm not sure at the moment. The patch avoids the oops and
> I haven't seen another one in some brief runtime testing. (It needed
> trivial fixing to make it apply to current linus.)
>
> > You really do need to be able to handle set_need_resched() at random
> > times, and at first glance it appears to me that the warning could be
> > triggered at runtime as well. If so, the real fix is elsewhere, right?
> > Especially given that the patch imposes extra cost at runtime...
>
> From staring at it for a while, it seems to be a combination of being in
> atomic context according to the scheduler while things in early boot use
> GFP_KERNEL. At this point in the boot, that is perfectly legal, as it's
> not being called from an interrupt handler and the mm subsystem should
> be all set up, but we're still really early in boot and preempt is
> disabled.

Isn't preemption disabled at that point in boot? And isn't GFP_KERNEL
illegal within preemption-disabled regions?

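The two questions above can be illustrated with a toy user-space model in C. This is only a sketch, not kernel code: struct task_model, model_alloc(), and the outcome enum are hypothetical names, and the logic is a deliberate simplification of the slab_pre_alloc_hook -> might_sleep_if -> _cond_resched path described earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-task state; not the real kernel structures. */
struct task_model {
    bool need_resched;  /* set by the timer/RCU path described in the thread */
    int preempt_count;  /* nonzero models "preemption is disabled" */
};

enum alloc_outcome { ALLOC_OK, ALLOC_SCHED_WHILE_ATOMIC };

/*
 * Simplified model of an allocation that may or may not be allowed to sleep.
 * Only the combination "may sleep + reschedule requested + preemption
 * disabled" is reported as a problem, mirroring the thread's description.
 */
static enum alloc_outcome model_alloc(const struct task_model *t, bool may_sleep)
{
    assert(t->preempt_count >= 0);

    if (!may_sleep)             /* non-sleeping allocation skips the check */
        return ALLOC_OK;
    if (!t->need_resched)       /* nobody asked for a reschedule */
        return ALLOC_OK;
    if (t->preempt_count > 0)   /* would schedule with preemption disabled */
        return ALLOC_SCHED_WHILE_ATOMIC;
    return ALLOC_OK;            /* would reschedule normally and continue */
}
```

In this model, a task with need_resched set and a nonzero preempt_count gets ALLOC_SCHED_WHILE_ATOMIC only when may_sleep is true, which corresponds to the GFP_KERNEL case discussed above.
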
> As I mentioned before, KMEM_CACHE calls kmalloc with
> GFP_KERNEL and I don't think we want to change that.
>
> Once we're past early boot, I would expect that things running in true
> atomic context won't be calling KMEM_CACHE or using GFP_KERNEL. Or
> maybe I hope?
>
> I understand the desire to avoid another conditional, but I certainly
> don't have any other suggestions at the moment.

How about using GFP_ATOMIC for allocations done during that portion
of the boot path for which preemption is disabled?

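As a rough illustration of that suggestion, the flag choice could be modeled as below. This is a user-space sketch under stated assumptions: enum boot_phase, enum gfp_model, and pick_gfp() are hypothetical names invented for the example, and they only stand in for the kernel's GFP_KERNEL and GFP_ATOMIC flags.

```c
#include <assert.h>

/* Hypothetical boot phases; the real boot sequence is more fine-grained. */
enum boot_phase {
    BOOT_EARLY_NO_PREEMPT,  /* early boot, preemption still disabled */
    BOOT_SCHED_RUNNING      /* scheduler fully up, sleeping is allowed */
};

/* Hypothetical stand-ins for GFP_KERNEL and GFP_ATOMIC. */
enum gfp_model { GFP_MODEL_KERNEL, GFP_MODEL_ATOMIC };

/* Pick a non-sleeping allocation mode while preemption is disabled. */
static enum gfp_model pick_gfp(enum boot_phase phase)
{
    assert(phase == BOOT_EARLY_NO_PREEMPT || phase == BOOT_SCHED_RUNNING);

    return phase == BOOT_EARLY_NO_PREEMPT ? GFP_MODEL_ATOMIC
                                          : GFP_MODEL_KERNEL;
}
```

The trade-off is that callers in the early phase give up the ability to sleep while waiting for memory, in exchange for never entering the might_sleep path.
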
Thanx, Paul