public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Stefan Bader <stefan.bader@canonical.com>
To: John Stultz <john.stultz@linaro.org>
Cc: NeilBrown <neilb@suse.de>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Sander Eikelenboom <linux@eikelenboom.it>,
	rjw@sisk.pl, Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org
Subject: Re: Regression: ONE CPU fails bootup at Re: [3.2.0-RC7] BUG: unable to handle kernel NULL pointer dereference at 0000000000000598  1.478005] IP: [<ffffffff8107a6c4>] queue_work_on+0x4/0x30
Date: Mon, 09 Jan 2012 14:26:37 +0100	[thread overview]
Message-ID: <4F0AEB0D.4050403@canonical.com> (raw)
In-Reply-To: <1325882475.2290.106.camel@work-vm>

On 06.01.2012 21:41, John Stultz wrote:
> On Wed, 2012-01-04 at 15:13 +0100, Stefan Bader wrote:
>> On 04.01.2012 13:25, Stefan Bader wrote:
>>> On 04.01.2012 09:17, Stefan Bader wrote:
>>>> On 04.01.2012 01:53, John Stultz wrote:
>>>>> On Wed, 2012-01-04 at 11:31 +1100, NeilBrown wrote:
>>>>>> On Tue, 03 Jan 2012 15:09:48 -0800 John Stultz <john.stultz@linaro.org> wrote:
>>>>>>> >From the stack trace, we've kicked off a rtc_timer_do_work, probably
>>>>>>> from the rtc_initialize_alarm() schedule_work call added in Neil's
>>>>>>> patch.  From there, we call __rtc_set_alarm -> cmos_set_alarm ->
>>>>>>> cmos_rq_disable -> cmos_checkintr -> rtc_update_irq -> schedule_work.
>>>>
>>>> Sorry, I was off for the evening a while after sending this out. And I just
>>>> started, so a few thing I will be doing later but have not yet had time.
>>>>
>>>> Over night I had still be thinking on this and maybe one important fact I had
>>>> been ignoring. This really has only been observed on paravirt guests on Xen as
>>>> far as I know. And one thing that I should have pointed out is that
>>>>
>>>> [    0.792634] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
>>>> [    0.792725] rtc_cmos: probe of rtc_cmos failed with error -38
>>>>
>>>> So first the registration is done and the first line is the last thing printed
>>>> in the registration function. Then, and that line always comes after, the probe,
>>>> which looks like being done asynchronously, detects that the rtc is not
>>>> implemented. I would assume that this causes the rtc to be unregistered again
>>>> and that is probably the point where, under the right circumstances, the worker
>>>> triggered by the initialize alarm is trying to set another alarm. Probably while
>>>> some of the elements of the structure started to be torn down. I need to check
>>>> on that code path, yet. So right now its more a guess.
>>>>
>>>>>>>
>>>>>>> So, what it looks to me is that in cmos_checkintr, we grab the cmos->rtc
>>>>>>> and pass that along. Unfortunately, since the cmos->rtc value isn't set
>>>>>>> until after rtc_device_register() returns its null at that point. So
>>>>>>> your patch isn't really fixing the issue, but just reducing the race
>>>>>>> window for the second cpu to schedule the work.
>>>>>>>
>>>>>>> Sigh. I'd guess dropping the schedule_work call from
>>>>>>> rtc_initialize_alarm() is the right approach (see below). When reviewing
>>>>>>> Neil's patch it seemed like a good idea there, but it seems off to me
>>>>>>> now.
>>>>>>>
>>>>>>> Neil, any thoughts on the following? Can you expand on the condition you
>>>>>>> were worried about in around that call?
>>>>>>
>>>>>> If you set an alarm in the future, then shutdown and boot again after that
>>>>>> time, then you will end up with a timer_queue node which is in the past.
>>>>>
>>>>> Thanks for explaining this again.
>>>>>
>>>>> Hrm. It seems the easy answer is to simply not add alarms that are in
>>>>> the past.  Further, I'm a bit perplexed, as if they are in the past, the
>>>>> enabled flag shouldn't be set.   __rtc_read_alarm() does check the
>>>>> current time, so maybe we can make sure we don't return old values? I
>>>>> guess I assumed __rtc_read_alarm() avoided returning stale values, but
>>>>> apparently not.
>>>>>
>>>>>> When this happens the queue gets stuck.  That entry-in-the-past won't get
>>>>>> removed until and interrupt happens and an interrupt won't happen because the
>>>>>> RTC only triggers an interrupt when the alarm is "now".
>>>>>>
>>>>>> So you'll find that e.g. "hwclock" will always tell you that 'select' timed
>>>>>> out.
>>>>>>
>>>>>> So we force the interrupt work to happen at the start just in case.
>>>>>
>>>>> Unfortunately its too early. 
>>>>>
>>>>>> Did you see my proposed patch which converted those calls to do the work
>>>>>> in-process rather than passing it to a worker-thread?  I think that is a
>>>>>> clean fix.
>>>>>
>>>>> I don't think I saw it today. Was it from before the holidays?
>>>>>
>>>>
>>>> I fear I caused a bit of confusion there. Neil responded to my initial mail
>>>> which was done as a reply to the mail announcing this patch for stable (which
>>>> just was the first thread I could get hold of).
>>>> I will try Neil's patch as well. And in parallel try to see whether the theory I
>>>> had this night makes sense. If it does, then it is only indirectly that the work
>>>> is scheduled too early. In that case just the teardown needs to make sure that
>>>> no work is being run while removal. Well, maybe the question is whether there
>>>> should be a delay in running the irq work until the device really, really is
>>>> completely set up... But that sounds a bit more complicated.
>>>
>>> By now I tried Neil's proposed patch and unfortunately that makes things rather
>>> worse. I also played around with the idea of the unregistration race. Maybe
>>> there also is one (that cancel_work_sync should be called before unregistering
>>> the device) but definitely it is not what happens at least in the one CPU case.
>>> I added some more printk's and the crash happens before even the rtc core class
>>> has been fully registered. And no unregister is call has been made either.
>>>
>>> Which may point to execution of the irq worker (including a schedule_work)
>>> before the rtc-cmos parts are finished... Would explain why moving the
>>> initialize call further down does at least narrow the window for it to happen...
>>> The only thing I do not understand then is why that seems only to happen on Xen
>>> guests...
>>>
>>
>> Darn, guess I understand now... So cmos-rtc will call rtc_device_register from
>> within do_cmos_probe and set cmos_rtc.rtc with the pointer that
>> rtc_device_register returns.
>> But when having rtc_initialize_alarm being called earlier in
>> rtc_device_register, and that queues the work, there is a chance it will fire
>> even before that function returns.
>> And then cmos_checkintr will dereference the null pointer still stored in
>> cmos_rtc.rtc for calling rtc_update_irq...
> 
> Hey Stefan, Konrad, Sander, Neil,
> 	Thanks again for the bug reporting and sorry for the earlier trouble.
> So after the problematic patch was reverted for 3.2, I worked with Neil
> to try to address the issue in a better way. It looks like its working
> for Neil, so I was wondering if any of you could look over and maybe
> give the following tree a whirl:
> 	git://git.linaro.org/people/jstultz/linux.git dev/rtc-fixups
> 

Had a go with the patches applied with one HVM setup and two PVM setups (one and
4 vcpus) and have not seen any bad side effects.

Tested-by: Stefan Bader <stefan.bader@canonical.com>

> The shortlog is here:
> http://git.linaro.org/gitweb?p=people/jstultz/linux.git;a=shortlog;h=refs/heads/dev/rtc-fixups
> 
> If this avoids the issue for you, and no one sees any other problems,
> I'll queue these for 3.3
> 
> thanks
> -john
> 


  parent reply	other threads:[~2012-01-09 13:26 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-03 16:13 [3.2.0-RC7] BUG: unable to handle kernel NULL pointer dereference at 0000000000000598 [ 1.478005] IP: [<ffffffff8107a6c4>] queue_work_on+0x4/0x30 Sander Eikelenboom
2012-01-03 19:07 ` Regression: ONE CPU fails bootup at " Konrad Rzeszutek Wilk
2012-01-03 19:17   ` Sander Eikelenboom
2012-01-03 19:26   ` Stefan Bader
2012-01-03 20:11     ` Sander Eikelenboom
2012-01-03 20:10   ` Sander Eikelenboom
2012-01-03 22:33     ` Konrad Rzeszutek Wilk
2012-01-03 23:09   ` John Stultz
2012-01-04  0:31     ` NeilBrown
2012-01-04  0:53       ` John Stultz
2012-01-04  1:20         ` NeilBrown
2012-01-04 14:46           ` Konrad Rzeszutek Wilk
2012-01-04 15:12           ` Regression: ONE CPU fails bootup at Re: [3.2.0-RC7] BUG: unable to handle kernel NULL pointer dereference at 0000000000000598 " Stefan Bader
2012-01-05 22:03             ` NeilBrown
2012-01-04  8:17         ` Stefan Bader
2012-01-04 12:25           ` Stefan Bader
2012-01-04 13:17             ` Sander Eikelenboom
2012-01-04 18:33               ` John Stultz
2012-01-04 14:13             ` Stefan Bader
2012-01-06 20:41               ` John Stultz
2012-01-08 20:48                 ` Sander Eikelenboom
2012-01-09 13:26                 ` Stefan Bader [this message]
2012-01-04 18:35             ` John Stultz
2012-01-04 18:36           ` John Stultz
2012-01-04 18:50             ` Stefan Bader
2012-01-04 19:47             ` Konrad Rzeszutek Wilk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F0AEB0D.4050403@canonical.com \
    --to=stefan.bader@canonical.com \
    --cc=john.stultz@linaro.org \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@eikelenboom.it \
    --cc=neilb@suse.de \
    --cc=rjw@sisk.pl \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox