From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Josh Boyer <jwboyer@redhat.com>
Cc: linux-kernel@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>
Subject: Re: 3.0-git15 Atomic scheduling in pidmap_init
Date: Thu, 4 Aug 2011 09:26:58 -0700
Message-ID: <20110804162658.GZ13065@linux.vnet.ibm.com>
In-Reply-To: <20110804150603.GL2096@zod.bos.redhat.com>
On Thu, Aug 04, 2011 at 11:06:03AM -0400, Josh Boyer wrote:
> On Thu, Aug 04, 2011 at 07:04:38AM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 04, 2011 at 07:46:03AM -0400, Josh Boyer wrote:
> > > On Mon, Aug 1, 2011 at 11:46 AM, Josh Boyer <jwboyer@redhat.com> wrote:
> > > > We're seeing a scheduling while atomic backtrace in rawhide from pidmap_init
> > > > (https://bugzilla.redhat.com/show_bug.cgi?id=726877). While this seems
> > > > mostly harmless given that there isn't anything else to schedule to at
> > > > this point, I do wonder why things are marked as needing rescheduled so
> > > > early.
> > > >
> > > > We get to might_sleep through the might_sleep_if call in
> > > > slab_pre_alloc_hook because both kzalloc and KMEM_CACHE are called with
> > > > GFP_KERNEL. That eventually has a call chain like:
> > > >
> > > > might_resched->_cond_resched->should_resched
> > > >
> > > > which apparently returns true. Why the initial thread says it should
> > > > reschedule at this point, I'm not sure.
> > > >
> > > > I tried cheating by making the kzalloc call in pidmap_init use GFP_IOFS
> > > > instead of GFP_KERNEL to avoid the might_sleep_if call, and that worked
> > > > but I can't do the same for the kmalloc calls in kmem_cache_create, so
> > > > getting to the bottom of why should_resched is returning true seems to
> > > > be a better approach.
> > >
> > > A bit more info on this.
> > >
> > > What seems to be happening is that late_time_init is called, which
> > > gets around to calling hpet_time_init, which enables the HPET, and
> > > then calls setup_default_timer_irq. setup_default_timer_irq in
> > > arch/x86/kernel/time.c calls setup_irq with the timer_interrupt
> > > handler.
> > >
> > > At this point the timer interrupt hits, and then tick_handle_periodic is called
> > >
> > > timer int
> > > tick_handle_periodic -> tick_periodic -> update_process_times ->
> > > rcu_check_callbacks -> rcu_pending ->
> > > __rcu_pending -> set_need_resched (this is called around line 1685 in
> > > kernel/rcutree.c)
> > >
> > > So what's happening is that once the timer interrupt starts, RCU is
> > > coming in and marking current as needing reschedule, and that in turn
> > > makes the slab_pre_alloc_hook -> might_sleep_if -> might_sleep ->
> > > might_resched -> _cond_resched to trigger when pidmap_init calls
> > > kzalloc later on and produce the oops below later on in the init
> > > sequence. I believe we see this because of all the debugging options
> > > we have enabled in the kernel configs.
> > >
> > > This might be normal for all I know, but the oops is rather annoying.
> > > It seems RCU isn't in a quiescent state, we aren't preemptible yet,
> > > and it _really_ wants to reschedule things to make itself happy.
> > > Anyone have any thoughts on how to either keep RCU from marking
> > > current as needing reschedule so early, or otherwise working around the
> > > bug?
> >
> > The deal is that RCU realizes that RCU needs a quiescent state from
> > this CPU. The set_need_resched() is intended to cause one. But there
> > is not much point this early in boot, because the scheduler isn't going
> > to do anything anyway. I can prevent this with the following patch,
> > but isn't this same thing possible later at runtime?
>
> Possibly, but I'm not sure at the moment. The patch avoids the oops and
> I haven't seen another one in some brief runtime testing. (Trivial
> fixing to make it apply to current linus.)
>
> > You really do need to be able to handle set_need_resched() at random
> > times, and at first glance it appears to me that the warning could be
> > triggered at runtime as well. If so, the real fix is elsewhere, right?
> > Especially given that the patch imposes extra cost at runtime...
>
> In staring at it for a while it seems to be a combination of being in
> atomic context according to the scheduler but things in early boot using
> GFP_KERNEL. At the point we're at in the boot, that is perfectly legal
> as it's not being called from an interrupt handler and the mm subsystem
> should be all setup, but we're still really early in boot and preempt is
> disabled.

Isn't preemption disabled at that point in boot? And isn't GFP_KERNEL
illegal within preemption-disabled regions?

> As I mentioned before, KMEM_CACHE calls kmalloc with
> GFP_KERNEL and I don't think we want to change that.
>
> Once we're past early boot, I would expect that things running in true
> atomic context won't be calling KMEM_CACHE or using GFP_KERNEL. Or
> maybe I hope?
>
> I understand the desire to avoid another conditional, but I certainly
> don't have any other suggestions at the moment.

How about doing GFP_ATOMIC on allocations done during that portion
of the boot path for which preemption is disabled?

Thanx, Paul
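
For readers following along, here is a rough userspace model of the check chain discussed in this thread and of the GFP_ATOMIC suggestion. It is only a sketch, not kernel code: every name beginning with model_ or MODEL_ is invented for illustration, and the three conditions are a simplification of might_sleep_if() -> _cond_resched() -> should_resched() as described in the quoted text above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's GFP flags (not the real values). */
#define MODEL_GFP_WAIT   0x1u            /* allocation is allowed to sleep */
#define MODEL_GFP_KERNEL MODEL_GFP_WAIT  /* sleeping allocation */
#define MODEL_GFP_ATOMIC 0x0u            /* non-sleeping allocation */

/* Hypothetical snapshot of the state the checks look at. */
struct model_task {
	int  preempt_count;  /* > 0 models "preemption disabled" */
	bool need_resched;   /* models the flag set from the timer tick by RCU */
};

/*
 * Returns true when an allocation with these flags would end up calling
 * into the scheduler while atomic, which is what produces the warning.
 */
static bool model_alloc_would_warn(const struct model_task *t, unsigned int gfp)
{
	if (!(gfp & MODEL_GFP_WAIT))
		return false;            /* might_sleep_if() condition is false */
	if (!t->need_resched)
		return false;            /* nothing asks for a reschedule */
	return t->preempt_count > 0;     /* would schedule with preemption off */
}

/* Pick non-sleeping flags whenever preemption is disabled. */
static unsigned int model_pick_gfp(const struct model_task *t)
{
	return t->preempt_count > 0 ? MODEL_GFP_ATOMIC : MODEL_GFP_KERNEL;
}
```

With preempt_count = 1 and need_resched set, the model reports a warning for MODEL_GFP_KERNEL but not for the flags returned by model_pick_gfp(). Whether the real allocation paths can choose their flags this way is exactly the open question in the thread.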