* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
[not found] ` <20171030110511.scfrdtlnf5lbdhu5@pd.tnic>
@ 2017-10-30 17:20 ` Linus Torvalds
2017-10-30 17:42 ` Borislav Petkov
2017-10-30 17:46 ` Linus Torvalds
0 siblings, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2017-10-30 17:20 UTC (permalink / raw)
To: Borislav Petkov, Len Brown, Tony Luck
Cc: Fengguang Wu, Tyler Baicar, Huang Ying, Chen Gong,
Linux Kernel Mailing List, Will Deacon, Rafael J. Wysocki,
Linux ACPI
On Mon, Oct 30, 2017 at 4:05 AM, Borislav Petkov <bp@suse.de> wrote:
>
> Looks like Tyler broke it:
>
> 77b246b32b2c ("acpi: apei: check for pending errors when probing GHES entries")
>
> and it went into 4.13 and -stable.
I think this whole driver is garbage.
It does ioremap_page_range() in both NMI and irq context.
The fact that it triggers at probe time is just pure luck, and is
probably because at that point we don't happen to have the page tables
for the ioremap set up yet, so it actually does an allocation, which
is what then causes the warning.
But we should have warned much earlier, and this code has apparently
never worked right.
The driver is COMPLETELY broken. It needs to do the ioremap not at
interrupt time, but when setting up the device, and outside a
spinlock.
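A minimal userspace sketch of the "map at setup time, only use the mapping at interrupt time" pattern described above. Everything here is hypothetical: `sim_dev`, `sim_probe()`, `sim_irq_read()` and `sim_remove()` are illustrative names, `calloc()` merely stands in for ioremap(), and the sketch assumes the region's address and size are known when the device is set up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a device with one register/status region. */
struct sim_dev {
    uint8_t *regs;    /* mapping, created once at probe time */
    size_t len;       /* size of the mapped region in bytes */
    int map_calls;    /* how many times a mapping was set up */
};

/* Probe runs in process context, where sleeping and allocating are fine. */
static int sim_probe(struct sim_dev *d, size_t len)
{
    d->regs = calloc(1, len);   /* stands in for ioremap() */
    if (!d->regs)
        return -1;
    d->len = len;
    d->map_calls++;
    return 0;
}

/* "Interrupt" path: no mapping, no allocation, just a bounded copy. */
static size_t sim_irq_read(const struct sim_dev *d, uint8_t *out, size_t n)
{
    assert(d->regs != NULL);    /* mapping must already exist */
    if (n > d->len)
        n = d->len;
    memcpy(out, d->regs, n);
    return n;
}

/* Teardown, again in process context. */
static void sim_remove(struct sim_dev *d)
{
    free(d->regs);
    d->regs = NULL;
    d->len = 0;
}
```

The point of the split is that the handler never touches the mapping machinery, so it has nothing that could sleep or take a lock, however often it runs.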
I think somebody must have known how broken this whole thing was,
because it literally uses a RAW spinlock, and I suspect the reason
for that is because lockdep complained about the breakage without it.
Reverting just the latest addition is not going to help. The breakage
is much more fundamental than that.
Note that doing this thing in NMI context is *really* wrong, because
the whole ioremap() code is definitely not NMI-safe. I don't think
it's irq-safe either.
I will add a "might_sleep()" to ioremap_page_range() itself, so that
we get this warning more reliably and much earlier. Right now it has
been hidden by the fact that most of the time the page tables
may be already allocated, but even then it's broken.
The only safe way to do that kind of access is likely using the FIXMAP model.
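For readers unfamiliar with the fixmap idea, here is a rough userspace model of it, not kernel code. The assumption is that a small number of virtual slots are reserved statically, so mapping a page at run time is just a store into an existing entry, with no allocation. The slot names, `SIM_FIX_BASE`, and the `sim_*` helpers are all made up for illustration; the sketch uses one slot per context so that an NMI arriving during an irq-time access cannot clobber the irq's mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slots: one per context that may need a temporary mapping. */
enum sim_fix_slot { SIM_FIX_IRQ, SIM_FIX_NMI, SIM_FIX_NR };

#define SIM_PAGE_SHIFT 12
#define SIM_PAGE_SIZE  (1UL << SIM_PAGE_SHIFT)
#define SIM_FIX_BASE   0xffff0000UL   /* made-up virtual base address */
#define SIM_PTE_VALID  1ULL           /* low bit marks a slot as in use */

/* One "PTE" per slot, reserved up front: nothing is allocated at map time. */
static uint64_t sim_fix_pte[SIM_FIX_NR];

/* Each slot has a fixed virtual address computed from its index. */
static uintptr_t sim_fix_to_virt(enum sim_fix_slot s)
{
    return SIM_FIX_BASE + ((uintptr_t)s << SIM_PAGE_SHIFT);
}

/* Point a free slot at the page containing phys; return its virtual address. */
static uintptr_t sim_set_fixmap(enum sim_fix_slot s, uint64_t phys)
{
    assert(s < SIM_FIX_NR);
    assert(!(sim_fix_pte[s] & SIM_PTE_VALID));   /* slot must be free */
    sim_fix_pte[s] = (phys & ~(uint64_t)(SIM_PAGE_SIZE - 1)) | SIM_PTE_VALID;
    return sim_fix_to_virt(s);
}

/* Release a slot; real code would also invalidate the TLB entry here. */
static void sim_clear_fixmap(enum sim_fix_slot s)
{
    assert(s < SIM_FIX_NR);
    sim_fix_pte[s] = 0;
}
```

Because both the table and the virtual addresses are fixed in advance, the map and unmap paths contain nothing that can sleep, which is the property the interrupt and NMI paths need.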
Linus
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
2017-10-30 17:20 ` [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150 Linus Torvalds
@ 2017-10-30 17:42 ` Borislav Petkov
2017-10-30 17:46 ` Linus Torvalds
1 sibling, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2017-10-30 17:42 UTC (permalink / raw)
To: Linus Torvalds
Cc: Len Brown, Tony Luck, Fengguang Wu, Tyler Baicar, Huang Ying,
Linux Kernel Mailing List, Will Deacon, Rafael J. Wysocki,
Linux ACPI
On Mon, Oct 30, 2017 at 10:20:40AM -0700, Linus Torvalds wrote:
> I think this whole driver is garbage.
This "driver" was supposed to implement the handle-hw-errors-in-fw glue
crap. The thing is, no one has been using it properly because there's
not even a single firmware vendor who has managed to produce a working
fw glue reporting errors properly. At least I haven't seen one.
Which means, no one is really using this. And now ARM starts using it for
real and shit hits fan.
And I'd love to fix it but finding a box which *actually* has a usable
GHES glue is, as I mentioned above, almost impossible so testing fixes
would be hard. I need to figure out something...
Thanks.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
2017-10-30 17:20 ` [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150 Linus Torvalds
2017-10-30 17:42 ` Borislav Petkov
@ 2017-10-30 17:46 ` Linus Torvalds
2017-10-30 17:49 ` Will Deacon
2017-10-30 20:14 ` Tyler Baicar
1 sibling, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2017-10-30 17:46 UTC (permalink / raw)
To: Borislav Petkov, Len Brown, Tony Luck
Cc: Fengguang Wu, Tyler Baicar, Huang Ying, Chen Gong,
Linux Kernel Mailing List, Will Deacon, Rafael J. Wysocki,
Linux ACPI
On Mon, Oct 30, 2017 at 10:20 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I will add a "might_sleep()" to ioremap_page_range() itself, so that
> we get this warning more reliably and much earlier. Right now it has
> been hidden by the fact that most of the time the page tables
> may be already allocated, but even then it's broken.
Done. It doesn't report anything for me, so _hopefully_ the GHES
driver is the only one that does games like this. See commit
b39ab98e2f47 ("Mark 'ioremap_page_range()' as possibly sleeping").
So now it should hopefully warn about this bad usage of page remapping
reliably, at least if you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.
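The difference between the old and new behaviour can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `sim_ctx`, `sim_might_sleep()`, `sim_remap_old()` and `sim_remap_new()` are hypothetical names, not the kernel's symbols, and "atomic context" is reduced to a simple counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the state that matters for the warning. */
struct sim_ctx {
    int atomic_depth;      /* >0 while in irq/NMI or holding a spinlock */
    bool tables_present;   /* page tables for the range already exist */
    int warnings;          /* number of "invalid context" reports so far */
};

/* Model of a might_sleep() annotation: report any atomic caller. */
static void sim_might_sleep(struct sim_ctx *c, const char *where)
{
    assert(c != NULL);
    if (c->atomic_depth > 0) {
        c->warnings++;
        fprintf(stderr,
                "BUG: sleeping function called from invalid context at %s\n",
                where);
    }
}

/* Model of a page-table allocation, which may sleep. */
static void sim_alloc_tables(struct sim_ctx *c)
{
    sim_might_sleep(c, "sim_alloc_tables");
    c->tables_present = true;
}

/* Old behaviour: the check is reached only when an allocation is needed. */
static void sim_remap_old(struct sim_ctx *c)
{
    if (!c->tables_present)
        sim_alloc_tables(c);
}

/* New behaviour: the entry point itself is annotated. */
static void sim_remap_new(struct sim_ctx *c)
{
    sim_might_sleep(c, "sim_remap_new");
    if (!c->tables_present)
        sim_alloc_tables(c);
}
```

With `tables_present` already true and `atomic_depth` at 1, `sim_remap_old()` stays silent while `sim_remap_new()` reports the caller, which mirrors why the problem went unnoticed until a call happened to need an allocation.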
Can somebody who has a working GHES setup (although Borislav seems to
think no such thing exists) verify?
This obviously won't _fix_ anything, but at least it should make it
clear it's not that recent change that broke things - that just
happened to expose it. And hopefully somebody who knows that driver
will do the proper fixmap thing (or just ioremap once at probe time,
rather than at run-time).
Linus
* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
2017-10-30 17:46 ` Linus Torvalds
@ 2017-10-30 17:49 ` Will Deacon
2017-10-30 18:00 ` Linus Torvalds
2017-10-30 20:14 ` Tyler Baicar
1 sibling, 1 reply; 13+ messages in thread
From: Will Deacon @ 2017-10-30 17:49 UTC (permalink / raw)
To: Linus Torvalds
Cc: Borislav Petkov, Len Brown, Tony Luck, Fengguang Wu, Tyler Baicar,
Huang Ying, Chen Gong, Linux Kernel Mailing List,
Rafael J. Wysocki, Linux ACPI
On Mon, Oct 30, 2017 at 10:46:31AM -0700, Linus Torvalds wrote:
> On Mon, Oct 30, 2017 at 10:20 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I will add a "might_sleep()" to ioremap_page_range() itself, so that
> > we get this warning more reliably and much earlier. Right now it has
> > been hidden by the fact that most of the time the page tables
> > may be already allocated, but even then it's broken.
>
> Done. It doesn't report anything for me, so _hopefully_ the GHES
> driver is the only one that does games like this. See commit
> b39ab98e2f47 ("Mark 'ioremap_page_range()' as possibly sleeping").
>
> So now it should hopefully warn about this bad usage of page remapping
> reliably, at least if you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.
>
> Can somebody who has a working GHES setup (although Borislav seems to
> think no such thing exists) verify?
>
> This obviously won't _fix_ anything, but at least it should make it
> clear it's not that recent change that broke things - that just
> happened to expose it. And hopefully somebody who knows that driver
> will do the proper fixmap thing (or just ioremap once at probe time,
> rather than at run-time).
FWIW, we discussed some of this back in 2015, because the TLB invalidation
looks busted to me too:
https://marc.info/?l=linux-kernel&m=145009681808308&w=2
Didn't go anywhere though...
Will
* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
2017-10-30 17:49 ` Will Deacon
@ 2017-10-30 18:00 ` Linus Torvalds
0 siblings, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2017-10-30 18:00 UTC (permalink / raw)
To: Will Deacon
Cc: Borislav Petkov, Len Brown, Tony Luck, Fengguang Wu, Tyler Baicar,
Huang Ying, Chen Gong, Linux Kernel Mailing List,
Rafael J. Wysocki, Linux ACPI
On Mon, Oct 30, 2017 at 10:49 AM, Will Deacon <will.deacon@arm.com> wrote:
>
> FWIW, we discussed some of this back in 2015, because the TLB invalidation
> looks busted to me too:
Yeah, I think the basic issue is that ioremap() is not supposed to map
*over* an existing mapping, it's designed to map pages into a new
mapping.
I think *every* other user of "ioremap_page_range()" is literally the
architecture-specific implementation of "ioremap()" (which does the
whole "allocate new VM area, then remap page range into that").
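The hazard of mapping over a live entry, as opposed to filling a fresh one, can be illustrated with a toy model. This is a sketch only: `sim_mmu` and the `sim_*` functions are hypothetical, the "TLB" is one cached value per page, a physical address of 0 is reserved to mean "empty", and it does not claim to reproduce what the GHES driver actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_NPAGES 8

/* Hypothetical model: a tiny page table plus a per-page translation cache. */
struct sim_mmu {
    uint64_t pte[SIM_NPAGES];   /* 0 means "not mapped" */
    uint64_t tlb[SIM_NPAGES];   /* 0 means "no cached translation" */
};

/* A lookup fills the cache from the page table on a miss. */
static uint64_t sim_translate(struct sim_mmu *m, unsigned vpn)
{
    assert(vpn < SIM_NPAGES);
    if (m->tlb[vpn] == 0)
        m->tlb[vpn] = m->pte[vpn];
    return m->tlb[vpn];
}

/* ioremap()-style use: only ever fill an entry that is currently empty. */
static bool sim_map_fresh(struct sim_mmu *m, unsigned vpn, uint64_t phys)
{
    assert(vpn < SIM_NPAGES);
    if (m->pte[vpn] != 0)
        return false;           /* refuse to map over an existing mapping */
    m->pte[vpn] = phys;
    return true;
}

/* Overwriting a live entry: the cached translation is now stale. */
static void sim_map_over(struct sim_mmu *m, unsigned vpn, uint64_t phys)
{
    assert(vpn < SIM_NPAGES);
    m->pte[vpn] = phys;
}

/* Explicit invalidation, needed after any change to a live entry. */
static void sim_flush(struct sim_mmu *m, unsigned vpn)
{
    assert(vpn < SIM_NPAGES);
    m->tlb[vpn] = 0;
}
```

In this model a fresh mapping can never be stale because nothing has been cached for it yet, whereas `sim_map_over()` keeps returning the old translation until `sim_flush()` is called.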
So the GHES driver use of this function really looks very wrong on so
many levels.
Checking..
Oh, git grep shows "drivers/pci/host/pci-tegra.c".
I'm afraid to even look into that file.
And pci_remap_iospace() looks potentially like a problem spot too -
but hopefully is done only at driver init time (but it could possibly
have the TLB issue).
Linus
* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
2017-10-30 17:46 ` Linus Torvalds
2017-10-30 17:49 ` Will Deacon
@ 2017-10-30 20:14 ` Tyler Baicar
2017-10-31 10:38 ` Will Deacon
[not found] ` <20171106224635.qopgsszwxzuitkpf@wfg-t540p.sh.intel.com>
1 sibling, 2 replies; 13+ messages in thread
From: Tyler Baicar @ 2017-10-30 20:14 UTC (permalink / raw)
To: Linus Torvalds, Borislav Petkov, Len Brown, Tony Luck
Cc: Fengguang Wu, Huang Ying, Chen Gong, Linux Kernel Mailing List,
Will Deacon, Rafael J. Wysocki, Linux ACPI, Timur Tabi
On 10/30/2017 1:46 PM, Linus Torvalds wrote:
> On Mon, Oct 30, 2017 at 10:20 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> I will add a "might_sleep()" to ioremap_page_range() itself, so that
>> we get this warning more reliably and much earlier. Right now it has
>> been hidden by the fact that most of the time the page tables
>> may be already allocated, but even then it's broken.
> Done. It doesn't report anything for me, so _hopefully_ the GHES
> driver is the only one that does games like this. See commit
> b39ab98e2f47 ("Mark 'ioremap_page_range()' as possibly sleeping").
>
> So now it should hopefully warn about this bad usage of page remapping
> reliably, at least if you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.
>
> Can somebody who has a working GHES setup (although Borislav seems to
> think no such thing exists) verify?
Hello Linus,
I have verified that this flags the error for me every time ghes_proc() is used.
But I also see it flagged in ARM PMU code:
[ 7.381153] BUG: sleeping function called from invalid context at mm/slab.h:420
[ 7.387625] in_atomic(): 0, irqs_disabled(): 128, pid: 11, name: cpuhp/0
[ 7.394310] CPU: 0 PID: 11 Comm: cpuhp/0 Not tainted 4.14.0-rc7 #46
[ 7.400559] Hardware name: Qualcomm Qualcomm Centriq(TM) 2400 Development Platform
[ 7.414361] Call trace:
[ 7.416797] [<ffff000008088b28>] dump_backtrace+0x0/0x270
[ 7.422175] [<ffff000008088dbc>] show_stack+0x24/0x30
[ 7.427211] [<ffff0000090d01f0>] dump_stack+0x98/0xb8
[ 7.432246] [<ffff00000810118c>] ___might_sleep+0x104/0x128
[ 7.437799] [<ffff000008101208>] __might_sleep+0x58/0x90
[ 7.443097] [<ffff000008254a7c>] kmem_cache_alloc_trace+0x224/0x280
[ 7.449347] [<ffff000008e9c938>] armpmu_alloc+0x30/0x168
[ 7.454639] [<ffff000008e9d15c>] arm_pmu_acpi_cpu_starting+0x114/0x148
[ 7.461151] [<ffff0000080d0f30>] cpuhp_invoke_callback+0xb8/0x760
[ 7.467226] [<ffff0000080d1ec4>] cpuhp_thread_fun+0xa4/0x1b8
[ 7.472872] [<ffff0000080f661c>] smpboot_thread_fn+0x174/0x250
[ 7.478684] [<ffff0000080f18ec>] kthread+0x114/0x140
[ 7.483632] [<ffff000008084774>] ret_from_fork+0x10/0x1c
For a GHES polling source:
[ 47.944596] BUG: sleeping function called from invalid context at lib/ioremap.c:164
[ 47.951290] in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/19
[ 47.958150] CPU: 19 PID: 0 Comm: swapper/19 Tainted: G W 4.14.0-rc7 #46
[ 47.958152] Hardware name: Qualcomm Qualcomm Centriq(TM) 2400 Development Platform
[ 47.958154] Call trace:
[ 47.958161] [<ffff000008088b28>] dump_backtrace+0x0/0x270
[ 47.958165] [<ffff000008088dbc>] show_stack+0x24/0x30
[ 47.958169] [<ffff0000090d01f0>] dump_stack+0x98/0xb8
[ 47.958174] [<ffff00000810118c>] ___might_sleep+0x104/0x128
[ 47.958177] [<ffff000008101208>] __might_sleep+0x58/0x90
[ 47.958180] [<ffff0000090d3d20>] ioremap_page_range+0x40/0x310
[ 47.958185] [<ffff0000086c5a98>] ghes_copy_tofrom_phys+0x1f8/0x240
[ 47.958188] [<ffff0000086c5da8>] ghes_proc+0xb0/0x8f0
[ 47.958190] [<ffff0000086c6ae8>] ghes_poll_func+0x20/0x40
[ 47.958196] [<ffff00000814b3dc>] call_timer_fn+0x3c/0x1b0
[ 47.958198] [<ffff00000814b638>] expire_timers+0xe8/0x170
[ 47.958201] [<ffff00000814b7fc>] run_timer_softirq+0x13c/0x188
[ 47.958203] [<ffff000008081964>] __do_softirq+0x144/0x33c
[ 47.958206] [<ffff0000080d6e78>] irq_exit+0xd0/0x108
[ 47.958210] [<ffff00000812dc44>] __handle_domain_irq+0x6c/0xc0
[ 47.958212] [<ffff000008081764>] gic_handle_irq+0xcc/0x188
For a GHES interrupt source:
[ 265.502603] BUG: sleeping function called from invalid context at lib/ioremap.c:164
[ 265.509296] in_atomic(): 1, irqs_disabled(): 128, pid: 3, name: kworker/0:0
[ 265.516242] CPU: 0 PID: 3 Comm: kworker/0:0 Tainted: G W 4.14.0-rc7 #46
[ 265.516244] Hardware name: Qualcomm Qualcomm Centriq(TM) 2400 Development Platform
[ 265.516251] Workqueue: kacpi_notify acpi_os_execute_deferred
[ 265.516254] Call trace:
[ 265.516258] [<ffff000008088b28>] dump_backtrace+0x0/0x270
[ 265.516261] [<ffff000008088dbc>] show_stack+0x24/0x30
[ 265.516264] [<ffff0000090d01f0>] dump_stack+0x98/0xb8
[ 265.516268] [<ffff00000810118c>] ___might_sleep+0x104/0x128
[ 265.516270] [<ffff000008101208>] __might_sleep+0x58/0x90
[ 265.516273] [<ffff0000090d3d20>] ioremap_page_range+0x40/0x310
[ 265.516277] [<ffff0000086c5a98>] ghes_copy_tofrom_phys+0x1f8/0x240
[ 265.516279] [<ffff0000086c5da8>] ghes_proc+0xb0/0x8f0
[ 265.516282] [<ffff0000086c6670>] ghes_notify_hed+0x50/0x90
[ 265.516286] [<ffff0000080f36a4>] notifier_call_chain+0x5c/0xa0
[ 265.516289] [<ffff0000080f3b80>] __blocking_notifier_call_chain+0x58/0xa0
[ 265.516291] [<ffff0000080f3c04>] blocking_notifier_call_chain+0x3c/0x50
[ 265.516293] [<ffff0000086c1140>] acpi_hed_notify+0x28/0x30
[ 265.516296] [<ffff000008678100>] acpi_device_notify+0x30/0x40
[ 265.516301] [<ffff000008691fb8>] acpi_ev_notify_dispatch+0x64/0x74
[ 265.516304] [<ffff00000867296c>] acpi_os_execute_deferred+0x24/0x38
[ 265.516308] [<ffff0000080ea748>] process_one_work+0x1f8/0x488
[ 265.516310] [<ffff0000080eaa30>] worker_thread+0x58/0x4a0
[ 265.516312] [<ffff0000080f18ec>] kthread+0x114/0x140
[ 265.516315] [<ffff000008084774>] ret_from_fork+0x10/0x1c
Thanks,
Tyler
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
2017-10-30 20:14 ` Tyler Baicar
@ 2017-10-31 10:38 ` Will Deacon
2017-10-31 12:29 ` Mark Rutland
[not found] ` <20171106224635.qopgsszwxzuitkpf@wfg-t540p.sh.intel.com>
1 sibling, 1 reply; 13+ messages in thread
From: Will Deacon @ 2017-10-31 10:38 UTC (permalink / raw)
To: Tyler Baicar
Cc: Linus Torvalds, Borislav Petkov, Len Brown, Tony Luck,
Fengguang Wu, Huang Ying, Chen Gong, Linux Kernel Mailing List,
Rafael J. Wysocki, Linux ACPI, Timur Tabi, mark.rutland
On Mon, Oct 30, 2017 at 04:14:15PM -0400, Tyler Baicar wrote:
> On 10/30/2017 1:46 PM, Linus Torvalds wrote:
> >On Mon, Oct 30, 2017 at 10:20 AM, Linus Torvalds
> ><torvalds@linux-foundation.org> wrote:
> >>I will add a "might_sleep()" to ioremap_page_range() itself, so that
> >>we get this warning more reliably and much earlier. Right now it has
> >>been hidden by the fact that most of the time the page tables
> >>may be already allocated, but even then it's broken.
> >Done. It doesn't report anything for me, so _hopefully_ the GHES
> >driver is the only one that does games like this. See commit
> >b39ab98e2f47 ("Mark 'ioremap_page_range()' as possibly sleeping").
> >
> >So now it should hopefully warn about this bad usage of page remapping
> >reliably, at least if you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.
> >
> >Can somebody who has a working GHES setup (although Borislav seems to
> >think no such thing exists) verify?
> Hello Linus,
>
> I have verified that this flags the error for me every time ghes_proc() is used.
> But I also see it flagged in ARM PMU code:
>
> [ 7.381153] BUG: sleeping function called from invalid context at mm/slab.h:420
> [ 7.387625] in_atomic(): 0, irqs_disabled(): 128, pid: 11, name: cpuhp/0
> [ 7.394310] CPU: 0 PID: 11 Comm: cpuhp/0 Not tainted 4.14.0-rc7 #46
> [ 7.400559] Hardware name: Qualcomm Qualcomm Centriq(TM) 2400 Development Platform
> [ 7.414361] Call trace:
> [ 7.416797] [<ffff000008088b28>] dump_backtrace+0x0/0x270
> [ 7.422175] [<ffff000008088dbc>] show_stack+0x24/0x30
> [ 7.427211] [<ffff0000090d01f0>] dump_stack+0x98/0xb8
> [ 7.432246] [<ffff00000810118c>] ___might_sleep+0x104/0x128
> [ 7.437799] [<ffff000008101208>] __might_sleep+0x58/0x90
> [ 7.443097] [<ffff000008254a7c>] kmem_cache_alloc_trace+0x224/0x280
> [ 7.449347] [<ffff000008e9c938>] armpmu_alloc+0x30/0x168
> [ 7.454639] [<ffff000008e9d15c>] arm_pmu_acpi_cpu_starting+0x114/0x148
> [ 7.461151] [<ffff0000080d0f30>] cpuhp_invoke_callback+0xb8/0x760
> [ 7.467226] [<ffff0000080d1ec4>] cpuhp_thread_fun+0xa4/0x1b8
> [ 7.472872] [<ffff0000080f661c>] smpboot_thread_fn+0x174/0x250
> [ 7.478684] [<ffff0000080f18ec>] kthread+0x114/0x140
> [ 7.483632] [<ffff000008084774>] ret_from_fork+0x10/0x1c
I know Mark was doing some fixes in the ACPI notifier code here, so I've
added him to CC.
Will
* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
2017-10-31 10:38 ` Will Deacon
@ 2017-10-31 12:29 ` Mark Rutland
0 siblings, 0 replies; 13+ messages in thread
From: Mark Rutland @ 2017-10-31 12:29 UTC (permalink / raw)
To: Will Deacon
Cc: Tyler Baicar, Linus Torvalds, Borislav Petkov, Len Brown,
Tony Luck, Fengguang Wu, Huang Ying, Chen Gong,
Linux Kernel Mailing List, Rafael J. Wysocki, Linux ACPI,
Timur Tabi
On Tue, Oct 31, 2017 at 10:38:33AM +0000, Will Deacon wrote:
> On Mon, Oct 30, 2017 at 04:14:15PM -0400, Tyler Baicar wrote:
> > On 10/30/2017 1:46 PM, Linus Torvalds wrote:
> > >On Mon, Oct 30, 2017 at 10:20 AM, Linus Torvalds
> > ><torvalds@linux-foundation.org> wrote:
> > >>I will add a "might_sleep()" to ioremap_page_range() itself, so that
> > >>we get this warning more reliably and much earlier. Right now it has
> > >>been hidden by the fact that most of the time the page tables
> > >>may be already allocated, but even then it's broken.
> > >Done. It doesn't report anything for me, so _hopefully_ the GHES
> > >driver is the only one that does games like this. See commit
> > >b39ab98e2f47 ("Mark 'ioremap_page_range()' as possibly sleeping").
> > >
> > >So now it should hopefully warn about this bad usage of page remapping
> > >reliably, at least if you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.
> > >
> > >Can somebody who has a working GHES setup (although Borislav seems to
> > >think no such thing exists) verify?
> > Hello Linus,
> >
> > I have verified that this flags the error for me every time ghes_proc() is used.
> > But I also see it flagged in ARM PMU code:
> >
> > [ 7.381153] BUG: sleeping function called from invalid context at mm/slab.h:420
> > [ 7.387625] in_atomic(): 0, irqs_disabled(): 128, pid: 11, name: cpuhp/0
> > [ 7.394310] CPU: 0 PID: 11 Comm: cpuhp/0 Not tainted 4.14.0-rc7 #46
> > [ 7.400559] Hardware name: Qualcomm Qualcomm Centriq(TM) 2400 Development Platform
> > [ 7.414361] Call trace:
> > [ 7.416797] [<ffff000008088b28>] dump_backtrace+0x0/0x270
> > [ 7.422175] [<ffff000008088dbc>] show_stack+0x24/0x30
> > [ 7.427211] [<ffff0000090d01f0>] dump_stack+0x98/0xb8
> > [ 7.432246] [<ffff00000810118c>] ___might_sleep+0x104/0x128
> > [ 7.437799] [<ffff000008101208>] __might_sleep+0x58/0x90
> > [ 7.443097] [<ffff000008254a7c>] kmem_cache_alloc_trace+0x224/0x280
> > [ 7.449347] [<ffff000008e9c938>] armpmu_alloc+0x30/0x168
> > [ 7.454639] [<ffff000008e9d15c>] arm_pmu_acpi_cpu_starting+0x114/0x148
> > [ 7.461151] [<ffff0000080d0f30>] cpuhp_invoke_callback+0xb8/0x760
> > [ 7.467226] [<ffff0000080d1ec4>] cpuhp_thread_fun+0xa4/0x1b8
> > [ 7.472872] [<ffff0000080f661c>] smpboot_thread_fn+0x174/0x250
> > [ 7.478684] [<ffff0000080f18ec>] kthread+0x114/0x140
> > [ 7.483632] [<ffff000008084774>] ret_from_fork+0x10/0x1c
>
> I know Mark was doing some fixes in the ACPI notifier code here, so I've
> added him to CC.
Sorry for the delay on this; I have a rather hideous fix that I'll clean
up and post shortly.
Thanks,
Mark.
* Re: [v4.14-rc8 ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at lib/ioremap.c:165
[not found] ` <20171106224635.qopgsszwxzuitkpf@wfg-t540p.sh.intel.com>
@ 2017-11-06 22:57 ` Linus Torvalds
2017-11-06 23:20 ` Fengguang Wu
2017-11-06 23:02 ` Borislav Petkov
1 sibling, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2017-11-06 22:57 UTC (permalink / raw)
To: Fengguang Wu, James Morse
Cc: Tyler Baicar, Borislav Petkov, Len Brown, Tony Luck, Huang Ying,
Chen Gong, Linux Kernel Mailing List, Will Deacon,
Rafael J. Wysocki, Linux ACPI, Timur Tabi, Mark Rutland
On Mon, Nov 6, 2017 at 2:46 PM, Fengguang Wu <fengguang.wu@intel.com> wrote:
>
> I can see that in RC8, too:
James Morse posted a new version of his series to fix this, and it's
gotten a few tests, but not a lot. Since you clearly have GHES support
on at least some of your machines, it might be worth adding that
series from James to 0day testing.
The patches look good to me, and I assume I'll be getting it
through Rafael from the ACPI tree (which is how the other ghes code
reaches me), but maybe by now for 4.15 with a stable backport.
The actual problem is definitely not new. Only the warning message.
So the code should work as well as it ever has, which may or may not
be saying a lot. It might be worth fixing for 4.14 just to not scare
people too much with messages, but at the same time it's not a
_functional_ regression.
Linus
* Re: [v4.14-rc8 ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at lib/ioremap.c:165
[not found] ` <20171106224635.qopgsszwxzuitkpf@wfg-t540p.sh.intel.com>
2017-11-06 22:57 ` [v4.14-rc8 ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at lib/ioremap.c:165 Linus Torvalds
@ 2017-11-06 23:02 ` Borislav Petkov
2017-11-06 23:04 ` Rafael J. Wysocki
2017-11-07 13:39 ` Fengguang Wu
1 sibling, 2 replies; 13+ messages in thread
From: Borislav Petkov @ 2017-11-06 23:02 UTC (permalink / raw)
To: Fengguang Wu, Rafael J. Wysocki
Cc: Tyler Baicar, Linus Torvalds, Len Brown, Tony Luck, Huang Ying,
Linux Kernel Mailing List, Will Deacon, Linux ACPI, Timur Tabi,
Mark Rutland
On Tue, Nov 07, 2017 at 06:46:35AM +0800, Fengguang Wu wrote:
> I can see that in RC8, too:
https://lkml.kernel.org/r/20171106184427.31905-1-james.morse@arm.com
Rafael, you could still queue them for the merge window next week - they
look pretty straightforward and low risk to me.
Thx.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
* Re: [v4.14-rc8 ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at lib/ioremap.c:165
2017-11-06 23:02 ` Borislav Petkov
@ 2017-11-06 23:04 ` Rafael J. Wysocki
2017-11-07 13:39 ` Fengguang Wu
1 sibling, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2017-11-06 23:04 UTC (permalink / raw)
To: Borislav Petkov
Cc: Fengguang Wu, Rafael J. Wysocki, Tyler Baicar, Linus Torvalds,
Len Brown, Tony Luck, Huang Ying, Linux Kernel Mailing List,
Will Deacon, Linux ACPI, Timur Tabi, Mark Rutland
On Tue, Nov 7, 2017 at 12:02 AM, Borislav Petkov <bp@suse.de> wrote:
> On Tue, Nov 07, 2017 at 06:46:35AM +0800, Fengguang Wu wrote:
>> I can see that in RC8, too:
>
> https://lkml.kernel.org/r/20171106184427.31905-1-james.morse@arm.com
>
> Rafael, you could still queue them for the merge window next week - they
> look pretty straightforward and low risk to me.
OK, I'll queue them up.
Thanks,
Rafael
* Re: [v4.14-rc8 ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at lib/ioremap.c:165
2017-11-06 22:57 ` [v4.14-rc8 ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at lib/ioremap.c:165 Linus Torvalds
@ 2017-11-06 23:20 ` Fengguang Wu
0 siblings, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2017-11-06 23:20 UTC (permalink / raw)
To: Linus Torvalds
Cc: James Morse, Tyler Baicar, Borislav Petkov, Len Brown, Tony Luck,
Huang Ying, Chen Gong, Linux Kernel Mailing List, Will Deacon,
Rafael J. Wysocki, Linux ACPI, Timur Tabi, Mark Rutland
On Mon, Nov 06, 2017 at 02:57:20PM -0800, Linus Torvalds wrote:
>On Mon, Nov 6, 2017 at 2:46 PM, Fengguang Wu <fengguang.wu@intel.com> wrote:
>>
>> I can see that in RC8, too:
>
>James Morse posted a new version of his series to fix this, and it's
>gotten a few tests, but not a lot. Since you clearly have GHES support
>on at least some of your machines, it might be worth adding that
>series from James to 0day testing.
Sure. I'll test Rafael's git tree including James' patches.
I can see the GHES warnings in a number of 0day machines:
- ivb44: Ivytown Ivy Bridge-EP, E5-2697 v2
HW details can be found in
https://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git/tree/hosts/ivb44
- lkp-bdw-ep6: Broadwell-EP, E5-2699 v4
- lkp-bdw-ex2: Broadwell-EX, E7-8890 v4
- lkp-skl-2sp3: Skylake
- lkp-skl-4sp1: Skylake
- lkp-avoton2: Atom
Regards,
Fengguang
* Re: [v4.14-rc8 ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at lib/ioremap.c:165
2017-11-06 23:02 ` Borislav Petkov
2017-11-06 23:04 ` Rafael J. Wysocki
@ 2017-11-07 13:39 ` Fengguang Wu
1 sibling, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2017-11-07 13:39 UTC (permalink / raw)
To: Borislav Petkov
Cc: Rafael J. Wysocki, Tyler Baicar, Linus Torvalds, Len Brown,
Tony Luck, Huang Ying, Linux Kernel Mailing List, Will Deacon,
Linux ACPI, Timur Tabi, Mark Rutland
On Tue, Nov 07, 2017 at 12:02:19AM +0100, Borislav Petkov wrote:
>On Tue, Nov 07, 2017 at 06:46:35AM +0800, Fengguang Wu wrote:
>> I can see that in RC8, too:
>
>https://lkml.kernel.org/r/20171106184427.31905-1-james.morse@arm.com
>
>Rafael, you could still queue them for the merge window next week - they
>look pretty straightforward and low risk to me.
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
I tried 100 boots with various test jobs and there are no
more GHES errors.
Thanks,
Fengguang