All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Bader <stefan.bader@canonical.com>
To: John Stultz <john.stultz@linaro.org>
Cc: NeilBrown <neilb@suse.de>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Sander Eikelenboom <linux@eikelenboom.it>,
	rjw@sisk.pl, Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org
Subject: Re: Regression: ONE CPU fails bootup at Re: [3.2.0-RC7] BUG: unable to handle kernel NULL pointer dereference at 0000000000000598  1.478005] IP: [<ffffffff8107a6c4>] queue_work_on+0x4/0x30
Date: Mon, 09 Jan 2012 14:26:37 +0100	[thread overview]
Message-ID: <4F0AEB0D.4050403@canonical.com> (raw)
In-Reply-To: <1325882475.2290.106.camel@work-vm>

On 06.01.2012 21:41, John Stultz wrote:
> On Wed, 2012-01-04 at 15:13 +0100, Stefan Bader wrote:
>> On 04.01.2012 13:25, Stefan Bader wrote:
>>> On 04.01.2012 09:17, Stefan Bader wrote:
>>>> On 04.01.2012 01:53, John Stultz wrote:
>>>>> On Wed, 2012-01-04 at 11:31 +1100, NeilBrown wrote:
>>>>>> On Tue, 03 Jan 2012 15:09:48 -0800 John Stultz <john.stultz@linaro.org> wrote:
>>>>>>> >From the stack trace, we've kicked off a rtc_timer_do_work, probably
>>>>>>> from the rtc_initialize_alarm() schedule_work call added in Neil's
>>>>>>> patch.  From there, we call __rtc_set_alarm -> cmos_set_alarm ->
>>>>>>> cmos_rq_disable -> cmos_checkintr -> rtc_update_irq -> schedule_work.
>>>>
>>>> Sorry, I was off for the evening a while after sending this out. And I just
>>>> started, so a few thing I will be doing later but have not yet had time.
>>>>
>>>> Over night I had still be thinking on this and maybe one important fact I had
>>>> been ignoring. This really has only been observed on paravirt guests on Xen as
>>>> far as I know. And one thing that I should have pointed out is that
>>>>
>>>> [    0.792634] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
>>>> [    0.792725] rtc_cmos: probe of rtc_cmos failed with error -38
>>>>
>>>> So first the registration is done and the first line is the last thing printed
>>>> in the registration function. Then, and that line always comes after, the probe,
>>>> which looks like being done asynchronously, detects that the rtc is not
>>>> implemented. I would assume that this causes the rtc to be unregistered again
>>>> and that is probably the point where, under the right circumstances, the worker
>>>> triggered by the initialize alarm is trying to set another alarm. Probably while
>>>> some of the elements of the structure started to be torn down. I need to check
>>>> on that code path, yet. So right now its more a guess.
>>>>
>>>>>>>
>>>>>>> So, what it looks to me is that in cmos_checkintr, we grab the cmos->rtc
>>>>>>> and pass that along. Unfortunately, since the cmos->rtc value isn't set
>>>>>>> until after rtc_device_register() returns its null at that point. So
>>>>>>> your patch isn't really fixing the issue, but just reducing the race
>>>>>>> window for the second cpu to schedule the work.
>>>>>>>
>>>>>>> Sigh. I'd guess dropping the schedule_work call from
>>>>>>> rtc_initialize_alarm() is the right approach (see below). When reviewing
>>>>>>> Neil's patch it seemed like a good idea there, but it seems off to me
>>>>>>> now.
>>>>>>>
>>>>>>> Neil, any thoughts on the following? Can you expand on the condition you
>>>>>>> were worried about in around that call?
>>>>>>
>>>>>> If you set an alarm in the future, then shutdown and boot again after that
>>>>>> time, then you will end up with a timer_queue node which is in the past.
>>>>>
>>>>> Thanks for explaining this again.
>>>>>
>>>>> Hrm. It seems the easy answer is to simply not add alarms that are in
>>>>> the past.  Further, I'm a bit perplexed, as if they are in the past, the
>>>>> enabled flag shouldn't be set.   __rtc_read_alarm() does check the
>>>>> current time, so maybe we can make sure we don't return old values? I
>>>>> guess I assumed __rtc_read_alarm() avoided returning stale values, but
>>>>> apparently not.
>>>>>
>>>>>> When this happens the queue gets stuck.  That entry-in-the-past won't get
>>>>>> removed until and interrupt happens and an interrupt won't happen because the
>>>>>> RTC only triggers an interrupt when the alarm is "now".
>>>>>>
>>>>>> So you'll find that e.g. "hwclock" will always tell you that 'select' timed
>>>>>> out.
>>>>>>
>>>>>> So we force the interrupt work to happen at the start just in case.
>>>>>
>>>>> Unfortunately its too early. 
>>>>>
>>>>>> Did you see my proposed patch which converted those calls to do the work
>>>>>> in-process rather than passing it to a worker-thread?  I think that is a
>>>>>> clean fix.
>>>>>
>>>>> I don't think I saw it today. Was it from before the holidays?
>>>>>
>>>>
>>>> I fear I caused a bit of confusion there. Neil responded to my initial mail
>>>> which was done as a reply to the mail announcing this patch for stable (which
>>>> just was the first thread I could get hold of).
>>>> I will try Neil's patch as well. And in parallel try to see whether the theory I
>>>> had this night makes sense. If it does, then it is only indirectly that the work
>>>> is scheduled too early. In that case just the teardown needs to make sure that
>>>> no work is being run while removal. Well, maybe the question is whether there
>>>> should be a delay in running the irq work until the device really, really is
>>>> completely set up... But that sounds a bit more complicated.
>>>
>>> By now I tried Neil's proposed patch and unfortunately that makes things rather
>>> worse. I also played around with the idea of the unregistration race. Maybe
>>> there also is one (that cancel_work_sync should be called before unregistering
>>> the device) but definitely it is not what happens at least in the one CPU case.
>>> I added some more printk's and the crash happens before even the rtc core class
>>> has been fully registered. And no unregister is call has been made either.
>>>
>>> Which may point to execution of the irq worker (including a schedule_work)
>>> before the rtc-cmos parts are finished... Would explain why moving the
>>> initialize call further down does at least narrow the window for it to happen...
>>> The only thing I do not understand then is why that seems only to happen on Xen
>>> guests...
>>>
>>
>> Darn, guess I understand now... So cmos-rtc will call rtc_device_register from
>> within do_cmos_probe and set cmos_rtc.rtc with the pointer that
>> rtc_device_register returns.
>> But when having rtc_initialize_alarm being called earlier in
>> rtc_device_register, and that queues the work, there is a chance it will fire
>> even before that function returns.
>> And then cmos_checkintr will dereference the null pointer still stored in
>> cmos_rtc.rtc for calling rtc_update_irq...
> 
> Hey Stefan, Konrad, Sander, Neil,
> 	Thanks again for the bug reporting and sorry for the earlier trouble.
> So after the problematic patch was reverted for 3.2, I worked with Neil
> to try to address the issue in a better way. It looks like its working
> for Neil, so I was wondering if any of you could look over and maybe
> give the following tree a whirl:
> 	git://git.linaro.org/people/jstultz/linux.git dev/rtc-fixups
> 

Had a go with the patches applied with one HVM setup and two PVM setups (one and
4 vcpus) and have not seen any bad side effects.

Tested-by: Stefan Bader <stefan.bader@canonical.com>

> The shortlog is here:
> http://git.linaro.org/gitweb?p=people/jstultz/linux.git;a=shortlog;h=refs/heads/dev/rtc-fixups
> 
> If this avoids the issue for you, and no one sees any other problems,
> I'll queue these for 3.3
> 
> thanks
> -john
> 


  parent reply	other threads:[~2012-01-09 13:26 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-03 16:13 [3.2.0-RC7] BUG: unable to handle kernel NULL pointer dereference at 0000000000000598 [ 1.478005] IP: [<ffffffff8107a6c4>] queue_work_on+0x4/0x30 Sander Eikelenboom
2012-01-03 19:07 ` Regression: ONE CPU fails bootup at " Konrad Rzeszutek Wilk
2012-01-03 19:17   ` Sander Eikelenboom
2012-01-03 19:26   ` Stefan Bader
2012-01-03 20:11     ` Sander Eikelenboom
2012-01-03 20:10   ` Sander Eikelenboom
2012-01-03 22:33     ` Konrad Rzeszutek Wilk
2012-01-03 23:09   ` John Stultz
2012-01-04  0:31     ` NeilBrown
2012-01-04  0:53       ` John Stultz
2012-01-04  1:20         ` NeilBrown
2012-01-04 14:46           ` Konrad Rzeszutek Wilk
2012-01-04 15:12           ` Regression: ONE CPU fails bootup at Re: [3.2.0-RC7] BUG: unable to handle kernel NULL pointer dereference at 0000000000000598 " Stefan Bader
2012-01-05 22:03             ` NeilBrown
2012-01-04  8:17         ` Stefan Bader
2012-01-04 12:25           ` Stefan Bader
2012-01-04 13:17             ` Sander Eikelenboom
2012-01-04 18:33               ` John Stultz
2012-01-04 14:13             ` Stefan Bader
2012-01-06 20:41               ` John Stultz
2012-01-08 20:48                 ` Sander Eikelenboom
2012-01-09 13:26                 ` Stefan Bader [this message]
2012-01-04 18:35             ` John Stultz
2012-01-04 18:36           ` John Stultz
2012-01-04 18:50             ` Stefan Bader
2012-01-04 19:47             ` Konrad Rzeszutek Wilk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F0AEB0D.4050403@canonical.com \
    --to=stefan.bader@canonical.com \
    --cc=john.stultz@linaro.org \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@eikelenboom.it \
    --cc=neilb@suse.de \
    --cc=rjw@sisk.pl \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.