All of lore.kernel.org
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: John Stultz <john.stultz@linaro.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Sander Eikelenboom <linux@eikelenboom.it>,
	stefan.bader@canonical.com, rjw@sisk.pl,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org
Subject: Re: Regression: ONE CPU fails bootup at Re: [3.2.0-RC7] BUG: unable to handle kernel NULL pointer dereference at 0000000000000598 [    1.478005] IP: [<ffffffff8107a6c4>] queue_work_on+0x4/0x30
Date: Wed, 4 Jan 2012 12:20:40 +1100	[thread overview]
Message-ID: <20120104122040.5a0fa6a3@notabene.brown> (raw)
In-Reply-To: <1325638380.3037.69.camel@work-vm>

[-- Attachment #1: Type: text/plain, Size: 3215 bytes --]

On Tue, 03 Jan 2012 16:53:00 -0800 John Stultz <john.stultz@linaro.org> wrote:

> On Wed, 2012-01-04 at 11:31 +1100, NeilBrown wrote:
> > On Tue, 03 Jan 2012 15:09:48 -0800 John Stultz <john.stultz@linaro.org> wrote:
> > > >From the stack trace, we've kicked off a rtc_timer_do_work, probably
> > > from the rtc_initialize_alarm() schedule_work call added in Neil's
> > > patch.  From there, we call __rtc_set_alarm -> cmos_set_alarm ->
> > > cmos_rq_disable -> cmos_checkintr -> rtc_update_irq -> schedule_work.
> > > 
> > > So, what it looks to me is that in cmos_checkintr, we grab the cmos->rtc
> > > and pass that along. Unfortunately, since the cmos->rtc value isn't set
> > > until after rtc_device_register() returns its null at that point. So
> > > your patch isn't really fixing the issue, but just reducing the race
> > > window for the second cpu to schedule the work.
> > > 
> > > Sigh. I'd guess dropping the schedule_work call from
> > > rtc_initialize_alarm() is the right approach (see below). When reviewing
> > > Neil's patch it seemed like a good idea there, but it seems off to me
> > > now.
> > > 
> > > Neil, any thoughts on the following? Can you expand on the condition you
> > > were worried about in around that call?
> > 
> > If you set an alarm in the future, then shutdown and boot again after that
> > time, then you will end up with a timer_queue node which is in the past.
> 
> Thanks for explaining this again.
> 
> Hrm. It seems the easy answer is to simply not add alarms that are in
> the past.  Further, I'm a bit perplexed, as if they are in the past, the
> enabled flag shouldn't be set.   __rtc_read_alarm() does check the
> current time, so maybe we can make sure we don't return old values? I
> guess I assumed __rtc_read_alarm() avoided returning stale values, but
> apparently not.

That would probably be a more robust approach.  Also it might make sense to
clean out old alarms whenever we are about to add a new one.

> 
> > When this happens the queue gets stuck.  That entry-in-the-past won't get
> > removed until and interrupt happens and an interrupt won't happen because the
> > RTC only triggers an interrupt when the alarm is "now".
> > 
> > So you'll find that e.g. "hwclock" will always tell you that 'select' timed
> > out.
> > 
> > So we force the interrupt work to happen at the start just in case.
> 
> Unfortunately its too early. 
> 
> > Did you see my proposed patch which converted those calls to do the work
> > in-process rather than passing it to a worker-thread?  I think that is a
> > clean fix.
> 
> I don't think I saw it today. Was it from before the holidays?

About 4 hours ago:
  Subject: Re: Patch Upstream: rtc: Expire alarms after the time is set.

> 
> Even so, at this point, I don't know if we have enough time for testing,
> so I'm thinking we either just drop the problematic sched_work call or
> revert the whole thing and try again for 3.3

I wouldn't object to that.  The bug only triggers in unusual circumstances
and is quite easy to work around so it is safer  to wait until we have a
really good fix.

Thanks,
NeilBrown


> 
> thanks
> -john
> 


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

  reply	other threads:[~2012-01-04  1:20 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-03 16:13 [3.2.0-RC7] BUG: unable to handle kernel NULL pointer dereference at 0000000000000598 [ 1.478005] IP: [<ffffffff8107a6c4>] queue_work_on+0x4/0x30 Sander Eikelenboom
2012-01-03 19:07 ` Regression: ONE CPU fails bootup at " Konrad Rzeszutek Wilk
2012-01-03 19:17   ` Sander Eikelenboom
2012-01-03 19:26   ` Stefan Bader
2012-01-03 20:11     ` Sander Eikelenboom
2012-01-03 20:10   ` Sander Eikelenboom
2012-01-03 22:33     ` Konrad Rzeszutek Wilk
2012-01-03 23:09   ` John Stultz
2012-01-04  0:31     ` NeilBrown
2012-01-04  0:53       ` John Stultz
2012-01-04  1:20         ` NeilBrown [this message]
2012-01-04 14:46           ` Konrad Rzeszutek Wilk
2012-01-04 15:12           ` Regression: ONE CPU fails bootup at Re: [3.2.0-RC7] BUG: unable to handle kernel NULL pointer dereference at 0000000000000598 " Stefan Bader
2012-01-05 22:03             ` NeilBrown
2012-01-04  8:17         ` Stefan Bader
2012-01-04 12:25           ` Stefan Bader
2012-01-04 13:17             ` Sander Eikelenboom
2012-01-04 18:33               ` John Stultz
2012-01-04 14:13             ` Stefan Bader
2012-01-06 20:41               ` John Stultz
2012-01-08 20:48                 ` Sander Eikelenboom
2012-01-09 13:26                 ` Stefan Bader
2012-01-04 18:35             ` John Stultz
2012-01-04 18:36           ` John Stultz
2012-01-04 18:50             ` Stefan Bader
2012-01-04 19:47             ` Konrad Rzeszutek Wilk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120104122040.5a0fa6a3@notabene.brown \
    --to=neilb@suse.de \
    --cc=john.stultz@linaro.org \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@eikelenboom.it \
    --cc=rjw@sisk.pl \
    --cc=stefan.bader@canonical.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.