From: John Stultz <john.stultz@linaro.org>
To: NeilBrown <neilb@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Sander Eikelenboom <linux@eikelenboom.it>,
stefan.bader@canonical.com, rjw@sisk.pl,
Thomas Gleixner <tglx@linutronix.de>,
linux-kernel@vger.kernel.org
Subject: Re: Regression: ONE CPU fails bootup at Re: [3.2.0-RC7] BUG: unable to handle kernel NULL pointer dereference at 0000000000000598 [ 1.478005] IP: [<ffffffff8107a6c4>] queue_work_on+0x4/0x30
Date: Tue, 03 Jan 2012 16:53:00 -0800 [thread overview]
Message-ID: <1325638380.3037.69.camel@work-vm> (raw)
In-Reply-To: <20120104113155.27bf6e46@notabene.brown>
On Wed, 2012-01-04 at 11:31 +1100, NeilBrown wrote:
> On Tue, 03 Jan 2012 15:09:48 -0800 John Stultz <john.stultz@linaro.org> wrote:
> > >From the stack trace, we've kicked off a rtc_timer_do_work, probably
> > from the rtc_initialize_alarm() schedule_work call added in Neil's
> > patch. From there, we call __rtc_set_alarm -> cmos_set_alarm ->
> > cmos_rq_disable -> cmos_checkintr -> rtc_update_irq -> schedule_work.
> >
> > So, what it looks to me is that in cmos_checkintr, we grab the cmos->rtc
> > and pass that along. Unfortunately, since the cmos->rtc value isn't set
> > until after rtc_device_register() returns its null at that point. So
> > your patch isn't really fixing the issue, but just reducing the race
> > window for the second cpu to schedule the work.
> >
> > Sigh. I'd guess dropping the schedule_work call from
> > rtc_initialize_alarm() is the right approach (see below). When reviewing
> > Neil's patch it seemed like a good idea there, but it seems off to me
> > now.
> >
> > Neil, any thoughts on the following? Can you expand on the condition you
> > were worried about in around that call?
>
> If you set an alarm in the future, then shutdown and boot again after that
> time, then you will end up with a timer_queue node which is in the past.
Thanks for explaining this again.
Hrm. It seems the easy answer is to simply not add alarms that are in
the past. Further, I'm a bit perplexed, as if they are in the past, the
enabled flag shouldn't be set. __rtc_read_alarm() does check the
current time, so maybe we can make sure we don't return old values? I
guess I assumed __rtc_read_alarm() avoided returning stale values, but
apparently not.
> When this happens the queue gets stuck. That entry-in-the-past won't get
> removed until and interrupt happens and an interrupt won't happen because the
> RTC only triggers an interrupt when the alarm is "now".
>
> So you'll find that e.g. "hwclock" will always tell you that 'select' timed
> out.
>
> So we force the interrupt work to happen at the start just in case.
Unfortunately its too early.
> Did you see my proposed patch which converted those calls to do the work
> in-process rather than passing it to a worker-thread? I think that is a
> clean fix.
I don't think I saw it today. Was it from before the holidays?
Even so, at this point, I don't know if we have enough time for testing,
so I'm thinking we either just drop the problematic sched_work call or
revert the whole thing and try again for 3.3
thanks
-john
next prev parent reply other threads:[~2012-01-04 0:53 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-01-03 16:13 [3.2.0-RC7] BUG: unable to handle kernel NULL pointer dereference at 0000000000000598 [ 1.478005] IP: [<ffffffff8107a6c4>] queue_work_on+0x4/0x30 Sander Eikelenboom
2012-01-03 19:07 ` Regression: ONE CPU fails bootup at " Konrad Rzeszutek Wilk
2012-01-03 19:17 ` Sander Eikelenboom
2012-01-03 19:26 ` Stefan Bader
2012-01-03 20:11 ` Sander Eikelenboom
2012-01-03 20:10 ` Sander Eikelenboom
2012-01-03 22:33 ` Konrad Rzeszutek Wilk
2012-01-03 23:09 ` John Stultz
2012-01-04 0:31 ` NeilBrown
2012-01-04 0:53 ` John Stultz [this message]
2012-01-04 1:20 ` NeilBrown
2012-01-04 14:46 ` Konrad Rzeszutek Wilk
2012-01-04 15:12 ` Regression: ONE CPU fails bootup at Re: [3.2.0-RC7] BUG: unable to handle kernel NULL pointer dereference at 0000000000000598 " Stefan Bader
2012-01-05 22:03 ` NeilBrown
2012-01-04 8:17 ` Stefan Bader
2012-01-04 12:25 ` Stefan Bader
2012-01-04 13:17 ` Sander Eikelenboom
2012-01-04 18:33 ` John Stultz
2012-01-04 14:13 ` Stefan Bader
2012-01-06 20:41 ` John Stultz
2012-01-08 20:48 ` Sander Eikelenboom
2012-01-09 13:26 ` Stefan Bader
2012-01-04 18:35 ` John Stultz
2012-01-04 18:36 ` John Stultz
2012-01-04 18:50 ` Stefan Bader
2012-01-04 19:47 ` Konrad Rzeszutek Wilk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1325638380.3037.69.camel@work-vm \
--to=john.stultz@linaro.org \
--cc=konrad.wilk@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@eikelenboom.it \
--cc=neilb@suse.de \
--cc=rjw@sisk.pl \
--cc=stefan.bader@canonical.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox