linux-arm-kernel.lists.infradead.org archive mirror
* BUG: Null pointer dereference on booting TC2 with vexpress_defconfig
From: Christoffer Dall @ 2014-02-17 22:39 UTC
  To: linux-arm-kernel

Booting my TC2 using 3.14-rc3 and vexpress_defconfig causes a NULL
pointer dereference in schedule_work_on.

A quick look at the trace indicates that schedule_work() is called
before system_wq is initialized.

Further, a bisect seems to indicate that this call path is triggered by
the changes in this merge commit of Theodore Ts'o's random_for_linus
tag:
0891ad829d2a0501053703df66029e843e3b8365

(However, my bisect may not be 100% correct, as some of the commits
between 3.13 and 3.12 don't boot the TC2 with vexpress_defconfig,
specifically the ones after f9300eaaac1ca300083ad41937923a90cc3a2394,
which causes boot to halt after "ARM CCI driver probed").

Disabling CONFIG_ARCH_VEXPRESS_TC2_PM avoids the issue.

I'm not familiar enough with any of these code paths to quickly identify
what the issue could be.  Apologies if I missed a previous post about
this issue (I couldn't find anything but would be surprised if I'm the
only one doing vexpress_defconfig on a TC2).

Here are the full details of the error I'm seeing:

Unable to handle kernel NULL pointer dereference at virtual address 00000080
pgd = 80004000
[00000080] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc3 #463
task: ee460000 ti: ee446000 task.ti: ee446000
PC is at __queue_work+0x20/0x27c
LR is at queue_work_on+0x48/0x54
pc : [<80033d68>]    lr : [<80034078>]    psr: 200001d3
sp : ee447c60  ip : 00000000  fp : 00000570
r10: 00004000  r9 : 00007ffe  r8 : 00000008
r7 : 00000000  r6 : 00000000  r5 : 805ec388  r4 : 800001d3
r3 : 600001d3  r2 : 805ec388  r1 : 00000000  r0 : 00000008

Backtrace:

[<80033d68>] (__queue_work) from [<80034078>] (queue_work_on+0x48/0x54)
[<80034078>] (queue_work_on) from [<802748a4>] (credit_entropy_bits+0x1b0/0x248)
[<802748a4>] (credit_entropy_bits) from [<802756b4>] (add_interrupt_randomness+0x17c/0x1a)
[<802756b4>] (add_interrupt_randomness) from [<80058070>] (handle_irq_event_percpu+0x8c/0)
[<80058070>] (handle_irq_event_percpu) from [<800581c0>] (handle_irq_event+0x44/0x64)
[<800581c0>] (handle_irq_event) from [<8005afd0>] (handle_fasteoi_irq+0x7c/0x148)
[<8005afd0>] (handle_fasteoi_irq) from [<80057a38>] (generic_handle_irq+0x20/0x30)
[<80057a38>] (generic_handle_irq) from [<8000ec34>] (handle_IRQ+0x38/0x94)
[<8000ec34>] (handle_IRQ) from [<80008568>] (gic_handle_irq+0x28/0x5c)
[<80008568>] (gic_handle_irq) from [<80012040>] (__irq_svc+0x40/0x50)
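
The trace is consistent with that reading: a fault at virtual address 00000080 with an empty pgd entry is the usual signature of reading a structure member through a NULL base pointer, because the CPU computes base + offset and, with base == NULL, the faulting address is just the member's offset. In the register dump, r1 (the second argument register under the ARM calling convention, which would carry the wq argument of __queue_work()) is also 00000000, although registers may already have been reused 0x20 bytes into the function. The user-space sketch below assumes a hypothetical struct fake_desc with a member at offset 0x80; it is not the real struct workqueue_struct layout, and it computes the address with offsetof() instead of actually dereferencing NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor used only for illustration; this is NOT the
 * real struct workqueue_struct layout from kernel/workqueue.c. */
struct fake_desc {
	char other_fields[0x80];	/* filler so that 'flags' lands at 0x80 */
	unsigned int flags;
};

/* Address the CPU would compute for desc->flags from a base pointer
 * value. With base == 0 (e.g. a global pointer that has not been
 * initialized yet) the result is simply the member's offset, which is
 * what an Oops reports as the faulting virtual address. No pointer is
 * dereferenced here. */
static uintptr_t flags_address(uintptr_t base)
{
	return base + offsetof(struct fake_desc, flags);
}
```

For example, flags_address(0) evaluates to 0x80, the same address reported in the Oops.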


Thanks,
-- 
Christoffer

* BUG: Null pointer dereference on booting TC2 with vexpress_defconfig
From: Christoffer Dall @ 2014-02-18  0:33 UTC
  To: linux-arm-kernel

On Mon, Feb 17, 2014 at 02:39:07PM -0800, Christoffer Dall wrote:
> Booting my TC2 using 3.14-rc3 and vexpress_defconfig causes a NULL
> pointer dereference in schedule_work_on.
> 
> A quick look at the trace indicates that schedule_work() is called
> before system_wq is initialized.
> 
> Further, a bisect seems to indicate that this call path is triggered by
> the changes in this merge commit of Theodore Ts'o's random_for_linus
> tag:
> 0891ad829d2a0501053703df66029e843e3b8365
> 

Update: it is in fact this commit that is causing the trouble:
6265e169cd313d6f3aad3c33d0a5b0d9624f69f5

As far as I can tell, the problem is that an IRQ comes in early, before
the workqueue structures have been set up; add_interrupt_randomness()
calls credit_entropy_bits(), which calls schedule_work(), and then
everything breaks.
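
The sequence described above (an interrupt handler trying to queue deferred work before the queue it targets exists) can be modelled in user space. This is a minimal sketch in which every name is hypothetical: fake_system_wq stands in for system_wq, fake_schedule_work() for schedule_work(), and fake_irq_handler() for the add_interrupt_randomness() -> credit_entropy_bits() path. The NULL check in the handler is only one possible guard and is not presented as the fix that went upstream.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel APIs. */
struct fake_wq {
	int pending;			/* number of queued work items */
};

static struct fake_wq fake_wq_storage;
static struct fake_wq *fake_system_wq;	/* NULL until init, like system_wq early in boot */

/* Models workqueue initialization, which happens later in boot. */
static void fake_workqueue_init(void)
{
	fake_system_wq = &fake_wq_storage;
}

/* Models schedule_work(): dereferences the global queue unconditionally,
 * so calling it before fake_workqueue_init() would crash. */
static void fake_schedule_work(void)
{
	fake_system_wq->pending++;
}

/* Models the interrupt path, with a guard that skips deferring work
 * while the queue does not exist yet. Returns true if work was queued. */
static bool fake_irq_handler(void)
{
	if (fake_system_wq == NULL)
		return false;		/* too early: nothing is queued */
	fake_schedule_work();
	return true;
}
```

Calling fake_irq_handler() before fake_workqueue_init() returns false instead of crashing; after initialization the same call queues one item. Whether silently skipping the work is acceptable depends on the caller; real code might need to retry once workqueues are up.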

Simply removing the extra bit that pushes work to a workqueue doesn't help
either: the system just stops responding, because, I assume, the memory
allocation functions need that extra entropy.

So I'm wondering if the system is trying to take interrupts too early,
or what exactly is going on.  The interrupts are from the non-secure
arch timer, which I think the kernel is in complete control of at this
point, so it shouldn't be a bootloader issue.

Anyway, hope someone can help me out.

-Christoffer

* BUG: Null pointer dereference on booting TC2 with vexpress_defconfig
From: Sudeep Holla @ 2014-02-18 11:04 UTC
  To: linux-arm-kernel

On 17/02/14 22:39, Christoffer Dall wrote:
> Booting my TC2 using 3.14-rc3 and vexpress_defconfig causes a NULL
> pointer dereference in schedule_work_on.
> 
> A quick look at the trace indicates that schedule_work() is called
> before system_wq is initialized.
> 
> Further, a bisect seems to indicate that this call path is triggered by
> the changes in this merge commit of Theodore Ts'o's random_for_linus
> tag:
> 0891ad829d2a0501053703df66029e843e3b8365
> 
> (However, my bisect may not be 100% correct, as some of the commits
> between 3.13 and 3.12 don't boot the TC2 with vexpress_defconfig,
> specifically the ones after f9300eaaac1ca300083ad41937923a90cc3a2394,
> which causes boot to halt after "ARM CCI driver probed").
> 
Yes, that requires some changes in the defconfig to continue working across
these changes (mainly having all these new configs disabled).

> Disabling CONFIG_ARCH_VEXPRESS_TC2_PM avoids the issue.
> 
Are you disabling just this, or CONFIG_MCPM as well? Are the secondary CPUs
coming up?

> I'm not familiar enough with any of these code paths to quickly identify
> what the issue could be.  Apologies if I missed a previous post about
> this issue (I couldn't find anything but would be surprised if I'm the
> only one doing vexpress_defconfig on a TC2).
> 
I just tried it and it works fine. If CONFIG_MCPM and CONFIG_ARCH_VEXPRESS_TC2_PM
are enabled, some changes are required in the board configuration (for bootmon);
I assume you have made those changes.

Regards,
Sudeep

* BUG: Null pointer dereference on booting TC2 with vexpress_defconfig
From: Christoffer Dall @ 2014-02-18 16:33 UTC
  To: linux-arm-kernel

On 18 February 2014 03:04, Sudeep Holla <Sudeep.Holla@arm.com> wrote:
> On 17/02/14 22:39, Christoffer Dall wrote:
>> Booting my TC2 using 3.14-rc3 and vexpress_defconfig causes a NULL
>> pointer dereference in schedule_work_on.
>>
>> A quick look at the trace indicates that schedule_work() is called
>> before system_wq is initialized.
>>
>> Further, a bisect seems to indicate that this call path is triggered by
>> the changes in this merge commit of Theodore Ts'o's random_for_linus
>> tag:
>> 0891ad829d2a0501053703df66029e843e3b8365
>>
>> (However, my bisect may not be 100% correct, as some of the commits
>> between 3.13 and 3.12 don't boot the TC2 with vexpress_defconfig,
>> specifically the ones after f9300eaaac1ca300083ad41937923a90cc3a2394,
>> which causes boot to halt after "ARM CCI driver probed").
>>
> Yes, that requires some changes in the defconfig to continue working across
> these changes (mainly having all these new configs disabled).
>

So vexpress_defconfig is known to not work on TC2?

>> Disabling CONFIG_ARCH_VEXPRESS_TC2_PM avoids the issue.
>>
> Are you disabling just this, or CONFIG_MCPM as well? Are the secondary CPUs
> coming up?
>

Disabling CONFIG_ARCH_VEXPRESS_TC2_PM allows the system to boot, but
only with one CPU.

Disabling CONFIG_MCPM allows SMP boot as well.

>> I'm not familiar enough with any of these code paths to quickly identify
>> what the issue could be.  Apologies if I missed a previous post about
>> this issue (I couldn't find anything but would be surprised if I'm the
>> only one doing vexpress_defconfig on a TC2).
>>
> I just tried it and it works fine. If CONFIG_MCPM and CONFIG_ARCH_VEXPRESS_TC2_PM
> are enabled, some changes are required in the board configuration (for bootmon);
> I assume you have made those changes.
>

What works fine? With both configs enabled?

I didn't change anything on the boot monitor side.  Can you give me a
pointer to the specifics?  Was there an announcement about this
somewhere that I failed to locate?

Thanks for your help!

-Christoffer

* BUG: Null pointer dereference on booting TC2 with vexpress_defconfig
From: Sudeep Holla @ 2014-02-18 16:59 UTC
  To: linux-arm-kernel

On 18/02/14 16:33, Christoffer Dall wrote:
> On 18 February 2014 03:04, Sudeep Holla <Sudeep.Holla@arm.com> wrote:
>> On 17/02/14 22:39, Christoffer Dall wrote:
>>> Booting my TC2 using 3.14-rc3 and vexpress_defconfig causes a NULL
>>> pointer dereference in schedule_work_on.
>>>
>>> A quick look at the trace indicates that schedule_work() is called
>>> before system_wq is initialized.
>>>
>>> Further, a bisect seems to indicate that this call path is triggered by
>>> the changes in this merge commit of Theodore Ts'o's random_for_linus
>>> tag:
>>> 0891ad829d2a0501053703df66029e843e3b8365
>>>
>>> (However, my bisect may not be 100% correct, as some of the commits
>>> between 3.13 and 3.12 don't boot the TC2 with vexpress_defconfig,
>>> specifically the ones after f9300eaaac1ca300083ad41937923a90cc3a2394,
>>> which causes boot to halt after "ARM CCI driver probed").
>>>
>> Yes, that requires some changes in the defconfig to continue working across
>> these changes (mainly having all these new configs disabled).
>>
> 
> So vexpress_defconfig is known to not work on TC2?
>

I would say yes, before v3.13, for TC2 at least. Pawel's commit 81d6e719d1f8
("ARM: vexpress: Enable platform-specific options in defconfig") enabled several
VE-specific features.

>>> Disabling CONFIG_ARCH_VEXPRESS_TC2_PM avoids the issue.
>>>
>> Are you disabling just this, or CONFIG_MCPM as well? Are the secondary CPUs
>> coming up?
>>
> 
> Disabling CONFIG_ARCH_VEXPRESS_TC2_PM allows the system to boot, but
> only with one CPU.
> 
> Disabling CONFIG_MCPM allows SMP boot as well.
> 
Yes that's what I suspected.

>>> I'm not familiar enough with any of these code paths to quickly identify
>>> what the issue could be.  Apologies if I missed a previous post about
>>> this issue (I couldn't find anything but would be surprised if I'm the
>>> only one doing vexpress_defconfig on a TC2).
>>>
>> I just tried it and it works fine. If CONFIG_MCPM and CONFIG_ARCH_VEXPRESS_TC2_PM
>> are enabled, some changes are required in the board configuration (for bootmon);
>> I assume you have made those changes.
>>
> 
> What works fine? With both configs enabled?
> 
Yes with the default vexpress_defconfig as is in the mainline.

> I didn't change anything on the boot monitor side.  Can you give me a
> pointer to the specifics?  Was there an announcement about this
> somewhere that I failed to locate?
> 

You might have very old firmware that doesn't support per-CPU mailboxes, and
hence you can't enable CONFIG_MCPM. Refer to CFGREG48 in Section 3.3.2 of [1]
for details. You can grab the latest firmware in a single step from [2], under
the Firmware tab.
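
A toy model may help with the per-CPU mailbox point. It assumes a simplified scheme and is not the actual TC2 boot monitor behaviour or the CFGREG48 interface: with one shared release word, writing an entry address releases every waiting secondary at once, whereas with one mailbox per CPU the boot CPU can release a single CPU, which is presumably what MCPM-style per-CPU bring-up needs. All names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_FAKE_CPUS 4	/* hypothetical CPU count for the model */

/* Shared-flag model: one word that every secondary polls. */
static uintptr_t shared_release_addr;

/* Per-CPU model: one mailbox slot per CPU. */
static uintptr_t percpu_mailbox[NR_FAKE_CPUS];

/* What a polling secondary would observe in each model. */
static bool released_shared(int cpu)
{
	(void)cpu;		/* every CPU reads the same word */
	return shared_release_addr != 0;
}

static bool released_percpu(int cpu)
{
	return percpu_mailbox[cpu] != 0;
}

/* Boot CPU side: the shared model can only release everyone. */
static void release_all_shared(uintptr_t entry)
{
	shared_release_addr = entry;
}

/* Boot CPU side: the per-CPU model can target one CPU. */
static void release_one_percpu(int cpu, uintptr_t entry)
{
	percpu_mailbox[cpu] = entry;
}
```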

Regards,
Sudeep

[1] http://infocenter.arm.com/help/topic/com.arm.doc.ddi0503g/CHDCADED.html
[2] http://releases.linaro.org/14.01/openembedded/vexpress-lsk/

* BUG: Null pointer dereference on booting TC2 with vexpress_defconfig
From: Christoffer Dall @ 2014-02-18 21:32 UTC
  To: linux-arm-kernel

On Tue, Feb 18, 2014 at 04:59:23PM +0000, Sudeep Holla wrote:
> On 18/02/14 16:33, Christoffer Dall wrote:
> > On 18 February 2014 03:04, Sudeep Holla <Sudeep.Holla@arm.com> wrote:
> >> On 17/02/14 22:39, Christoffer Dall wrote:
> >>> Booting my TC2 using 3.14-rc3 and vexpress_defconfig causes a NULL
> >>> pointer dereference in schedule_work_on.
> >>>
> >>> A quick look at the trace indicates that schedule_work() is called
> >>> before system_wq is initialized.
> >>>
> >>> Further, a bisect seems to indicate that this call path is triggered by
> >>> the changes in this merge commit of Theodore Ts'o's random_for_linus
> >>> tag:
> >>> 0891ad829d2a0501053703df66029e843e3b8365
> >>>
> >>> (However, my bisect may not be 100% correct, as some of the commits
> >>> between 3.13 and 3.12 don't boot the TC2 with vexpress_defconfig,
> >>> specifically the ones after f9300eaaac1ca300083ad41937923a90cc3a2394,
> >>> which causes boot to halt after "ARM CCI driver probed").
> >>>
> >> Yes, that requires some changes in the defconfig to continue working across
> >> these changes (mainly having all these new configs disabled).
> >>
> > 
> > So vexpress_defconfig is known to not work on TC2?
> >
> 
> I would say yes, before v3.13, for TC2 at least. Pawel's commit 81d6e719d1f8
> ("ARM: vexpress: Enable platform-specific options in defconfig") enabled several
> VE-specific features.
> 
> >>> Disabling CONFIG_ARCH_VEXPRESS_TC2_PM avoids the issue.
> >>>
> >> Are you disabling just this, or CONFIG_MCPM as well? Are the secondary CPUs
> >> coming up?
> >>
> > 
> > Disabling CONFIG_ARCH_VEXPRESS_TC2_PM allows the system to boot, but
> > only with one CPU.
> > 
> > Disabling CONFIG_MCPM allows SMP boot as well.
> > 
> Yes that's what I suspected.
> 
> >>> I'm not familiar enough with any of these code paths to quickly identify
> >>> what the issue could be.  Apologies if I missed a previous post about
> >>> this issue (I couldn't find anything but would be surprised if I'm the
> >>> only one doing vexpress_defconfig on a TC2).
> >>>
> >> I just tried it and it works fine. If CONFIG_MCPM and CONFIG_ARCH_VEXPRESS_TC2_PM
> >> are enabled, some changes are required in the board configuration (for bootmon);
> >> I assume you have made those changes.
> >>
> > 
> > What works fine? With both configs enabled?
> > 
> Yes with the default vexpress_defconfig as is in the mainline.
> 
> > I didn't change anything on the boot monitor side.  Can you give me a
> > pointer to the specifics?  Was there an announcement about this
> > somewhere that I failed to locate?
> > 
> 
> You might have very old firmware that doesn't support per-CPU mailboxes, and
> hence you can't enable CONFIG_MCPM. Refer to CFGREG48 in Section 3.3.2 of [1]
> for details. You can grab the latest firmware in a single step from [2], under
> the Firmware tab.
> 
> [2] http://releases.linaro.org/14.01/openembedded/vexpress-lsk/

Thanks, unfortunately when I flash this newest firmware, I just see:

Setting DVI mode for VGA.
Releasing Daughterboard resets.
Switching MCC log to UART1.

And it never seems to proceed.  Any pointers to what I could be doing
wrong?

-Christoffer

* BUG: Null pointer dereference on booting TC2 with vexpress_defconfig
From: Sudeep Holla @ 2014-02-19 11:59 UTC
  To: linux-arm-kernel

On 18/02/14 21:32, Christoffer Dall wrote:
> On Tue, Feb 18, 2014 at 04:59:23PM +0000, Sudeep Holla wrote:
>> On 18/02/14 16:33, Christoffer Dall wrote:
>>> On 18 February 2014 03:04, Sudeep Holla <Sudeep.Holla@arm.com> wrote:
>>>> On 17/02/14 22:39, Christoffer Dall wrote:
>>>>> Booting my TC2 using 3.14-rc3 and vexpress_defconfig causes a NULL
>>>>> pointer dereference in schedule_work_on.
>>>>>
>>>>> A quick look at the trace indicates that schedule_work() is called
>>>>> before system_wq is initialized.
>>>>>
>>>>> Further, a bisect seems to indicate that this call path is triggered by
>>>>> the changes in this merge commit of Theodore Ts'o's random_for_linus
>>>>> tag:
>>>>> 0891ad829d2a0501053703df66029e843e3b8365
>>>>>
>>>>> (However, my bisect may not be 100% correct, as some of the commits
>>>>> between 3.13 and 3.12 don't boot the TC2 with vexpress_defconfig,
>>>>> specifically the ones after f9300eaaac1ca300083ad41937923a90cc3a2394,
>>>>> which causes boot to halt after "ARM CCI driver probed").
>>>>>
>>>> Yes, that requires some changes in the defconfig to continue working across
>>>> these changes (mainly having all these new configs disabled).
>>>>
>>>
>>> So vexpress_defconfig is known to not work on TC2?
>>>
>>
>> I would say yes, before v3.13, for TC2 at least. Pawel's commit 81d6e719d1f8
>> ("ARM: vexpress: Enable platform-specific options in defconfig") enabled several
>> VE-specific features.
>>
>>>>> Disabling CONFIG_ARCH_VEXPRESS_TC2_PM avoids the issue.
>>>>>
>>>> Are you disabling just this, or CONFIG_MCPM as well? Are the secondary CPUs
>>>> coming up?
>>>>
>>>
>>> Disabling CONFIG_ARCH_VEXPRESS_TC2_PM allows the system to boot, but
>>> only with one CPU.
>>>
>>> Disabling CONFIG_MCPM allows SMP boot as well.
>>>
>> Yes that's what I suspected.
>>
>>>>> I'm not familiar enough with any of these code paths to quickly identify
>>>>> what the issue could be.  Apologies if I missed a previous post about
>>>>> this issue (I couldn't find anything but would be surprised if I'm the
>>>>> only one doing vexpress_defconfig on a TC2).
>>>>>
>>>> I just tried it and it works fine. If CONFIG_MCPM and CONFIG_ARCH_VEXPRESS_TC2_PM
>>>> are enabled, some changes are required in the board configuration (for bootmon);
>>>> I assume you have made those changes.
>>>>
>>>
>>> What works fine? With both configs enabled?
>>>
>> Yes with the default vexpress_defconfig as is in the mainline.
>>
>>> I didn't change anything on the boot monitor side.  Can you give me a
>>> pointer to the specifics?  Was there an announcement about this
>>> somewhere that I failed to locate?
>>>
>>
>> You might have very old firmware that doesn't support per-CPU mailboxes, and
>> hence you can't enable CONFIG_MCPM. Refer to CFGREG48 in Section 3.3.2 of [1]
>> for details. You can grab the latest firmware in a single step from [2], under
>> the Firmware tab.
>>
>> [2] http://releases.linaro.org/14.01/openembedded/vexpress-lsk/
> 
> Thanks, unfortunately when I flash this newest firmware, I just see:
> 
> Setting DVI mode for VGA.
> Releasing Daughterboard resets.
> Switching MCC log to UART1.
> 
> And it never seems to proceed.  Any pointers to what I could be doing
> wrong?

I just tried dumping the exact contents from the git clone, and it boots fine to
the bootmon prompt for me.

BTW, after a chat with MarcZ, I learned that you enter the Linux kernel in
HYP mode. Last time I checked, that was broken, especially with the U-Boot and
MCPM combination. So IMO it's better to go back to disabling both CONFIG_MCPM and
CONFIG_ARCH_VEXPRESS_TC2_PM, with the old firmware, for now for all your KVM development.

However, the new firmware is backward compatible and should work without MCPM.

Regards,
Sudeep
