* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
From: Linus Torvalds @ 2017-10-30 17:20 UTC
To: Borislav Petkov, Len Brown, Tony Luck
Cc: Fengguang Wu, Tyler Baicar, Huang Ying, Chen Gong, Linux Kernel Mailing List, Will Deacon, Rafael J. Wysocki, Linux ACPI

On Mon, Oct 30, 2017 at 4:05 AM, Borislav Petkov <bp@suse.de> wrote:
>
> Looks like Tyler broke it:
>
> 77b246b32b2c ("acpi: apei: check for pending errors when probing GHES entries")
>
> and it went into 4.13 and -stable.

I think this whole driver is garbage.

It does ioremap_page_range() in both NMI and irq context. The fact that it triggers at probe time is just pure luck, and is probably because at that point we don't happen to have the page tables for the ioremap set up yet, so it actually does an allocation, which is what then causes the warning.

But we should have warned much earlier, and this code has apparently never worked right.

The driver is COMPLETELY broken. It needs to do the ioremap not at interrupt time, but when setting up the device, and outside a spinlock.

I think somebody must have known how broken this whole thing was, because it literally uses a RAW spinlock, and I suspect the reason for that is because lockdep complained about the breakage without it.

Reverting just the latest addition is not going to help. The breakage is much more fundamental than that.

Note that doing this thing in NMI context is *really* wrong, because the whole ioremap() code is definitely not NMI-safe. I don't think it's irq-safe either.

I will add a "might_sleep()" to ioremap_page_range() itself, so that we get this warning more reliably and much earlier. Right now it has been hidden by the fact that most of the time the page tables may be already allocated, but even then it's broken.

The only safe way to do that kind of access is likely using the FIXMAP model.

Linus
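The "map when setting up the device, not at interrupt time" advice above can be sketched with a small userspace model. This is only an illustration under stated assumptions, not kernel or GHES code: every name (model_dev, model_map, model_probe, and so on) is hypothetical, the "mapping" is an identity function, and the in_atomic flag merely stands in for irq/NMI context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace model only. Every name here is hypothetical. */
struct model_dev {
	const uint8_t *mapped;  /* established once, at probe time */
	size_t len;             /* size of the mapped region */
	int in_atomic;          /* nonzero models irq/NMI context */
	int bad_maps;           /* mappings attempted in atomic context */
};

/* Stand-in for a mapping call that may sleep; records misuse. */
static const uint8_t *model_map(struct model_dev *d, const uint8_t *phys)
{
	if (d->in_atomic)
		d->bad_maps++;
	return phys; /* identity "mapping" for the purposes of the model */
}

/* Probe runs in sleepable context: the right place to map. */
static void model_probe(struct model_dev *d, const uint8_t *region, size_t len)
{
	assert(!d->in_atomic);
	d->mapped = model_map(d, region);
	d->len = len;
}

/* Handler only reads through the mapping made at probe time. */
static size_t model_irq_handler(struct model_dev *d, uint8_t *out, size_t n)
{
	d->in_atomic = 1;
	if (n > d->len)
		n = d->len;
	memcpy(out, d->mapped, n);
	d->in_atomic = 0;
	return n;
}

/* The pattern being criticized: remap on every event, in atomic context. */
static size_t model_irq_handler_remap(struct model_dev *d, const uint8_t *phys,
				      uint8_t *out, size_t n)
{
	const uint8_t *p;

	d->in_atomic = 1;
	p = model_map(d, phys); /* counted as misuse by the model */
	memcpy(out, p, n);
	d->in_atomic = 0;
	return n;
}
```

In this model the probe-time variant never touches model_map() from the handler, so bad_maps stays at zero, while the remap-per-event variant increments it on every call. The FIXMAP approach mentioned above is a different technique and is not modeled here.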
* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
From: Borislav Petkov @ 2017-10-30 17:42 UTC
To: Linus Torvalds
Cc: Len Brown, Tony Luck, Fengguang Wu, Tyler Baicar, Huang Ying, Linux Kernel Mailing List, Will Deacon, Rafael J. Wysocki, Linux ACPI

On Mon, Oct 30, 2017 at 10:20:40AM -0700, Linus Torvalds wrote:
> I think this whole driver is garbage.

This "driver" was supposed to implement the handle-hw-errors-in-fw glue crap. The thing is, no one has been using it properly because there's not even a single firmware vendor who has managed to produce a working fw glue reporting errors properly. At least I haven't seen one.

Which means, no one is really using this. And now ARM starts using it for real and shit hits fan.

And I'd love to fix it but finding a box which *actually* has a usable GHES glue is, as I mentioned above, almost impossible so testing fixes would be hard.

I need to figure out something...

Thanks.

--
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
From: Linus Torvalds @ 2017-10-30 17:46 UTC
To: Borislav Petkov, Len Brown, Tony Luck
Cc: Fengguang Wu, Tyler Baicar, Huang Ying, Chen Gong, Linux Kernel Mailing List, Will Deacon, Rafael J. Wysocki, Linux ACPI

On Mon, Oct 30, 2017 at 10:20 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> I will add a "might_sleep()" to ioremap_page_range() itself, so that
> we get this warning more reliably and much earlier. Right now it has
> been hidden by the fact that most of the time the page tables
> may be already allocated, but even then it's broken.

Done. It doesn't report anything for me, so _hopefully_ the GHES driver is the only one that does games like this. See commit b39ab98e2f47 ("Mark 'ioremap_page_range()' as possibly sleeping").

So now it should hopefully warn about this bad usage of page remapping reliably, at least if you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.

Can somebody who has a working GHES setup (although Borislav seems to think no such thing exists) verify?

This obviously won't _fix_ anything, but at least it should make it clear it's not that recent change that broke things - that just happened to expose it. And hopefully somebody who knows that driver will do the proper fixmap thing (or just ioremap once at probe time, rather than at run-time).

Linus
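Why an entry-point annotation makes the warning reliable can be shown with a small userspace model. This is a sketch under stated assumptions, not the contents of commit b39ab98e2f47: all names are hypothetical stand-ins, the "atomic context" is just a counter, and the "page-table allocation" is a stub whose only behavior is to run the sleep check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only. Every name here is hypothetical, not a kernel API. */
static int model_atomic_depth; /* > 0 models irq/NMI/spinlock context */
static int model_warnings;     /* count of "sleeping function" reports */

static void model_enter_atomic(void)
{
	model_atomic_depth++;
}

static void model_leave_atomic(void)
{
	assert(model_atomic_depth > 0);
	model_atomic_depth--;
}

/* Stand-in for might_sleep(): report when called in atomic context. */
static void model_might_sleep(const char *where)
{
	if (model_atomic_depth > 0) {
		model_warnings++;
		fprintf(stderr,
			"BUG: sleeping function called from invalid context at %s\n",
			where);
	}
}

/* Stand-in for a page-table allocation, which may sleep. */
static void model_alloc_page_table(void)
{
	model_might_sleep("model_alloc_page_table");
}

/* Before: the check is reached only when an allocation is needed. */
static void model_remap_unannotated(bool tables_present)
{
	if (!tables_present)
		model_alloc_page_table();
}

/* After: annotated at entry, so every atomic caller is reported. */
static void model_remap_annotated(bool tables_present)
{
	model_might_sleep("model_remap_annotated");
	if (!tables_present)
		model_alloc_page_table();
}

/* Run one remap in modeled atomic context; return the warnings it produced. */
static int model_warnings_for(void (*remap)(bool), bool tables_present)
{
	int before = model_warnings;

	model_enter_atomic();
	remap(tables_present);
	model_leave_atomic();
	return model_warnings - before;
}
```

In the model, the unannotated variant stays silent whenever the page tables already exist, which mirrors how the misuse stayed hidden most of the time; the annotated variant reports every atomic caller regardless of allocation state.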
* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
From: Will Deacon @ 2017-10-30 17:49 UTC
To: Linus Torvalds
Cc: Borislav Petkov, Len Brown, Tony Luck, Fengguang Wu, Tyler Baicar, Huang Ying, Chen Gong, Linux Kernel Mailing List, Rafael J. Wysocki, Linux ACPI

On Mon, Oct 30, 2017 at 10:46:31AM -0700, Linus Torvalds wrote:
> On Mon, Oct 30, 2017 at 10:20 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I will add a "might_sleep()" to ioremap_page_range() itself, so that
> > we get this warning more reliably and much earlier. Right now it has
> > been hidden by the fact that most of the time the page tables
> > may be already allocated, but even then it's broken.
>
> Done. It doesn't report anything for me, so _hopefully_ the GHES
> driver is the only one that does games like this. See commit
> b39ab98e2f47 ("Mark 'ioremap_page_range()' as possibly sleeping").
>
> So now it should hopefully warn about this bad usage of page remapping
> reliably, at least if you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.
>
> Can somebody who has a working GHES setup (although Borislav seems to
> think no such thing exists) verify?
>
> This obviously won't _fix_ anything, but at least it should make it
> clear it's not that recent change that broke things - that just
> happened to expose it. And hopefully somebody who knows that driver
> will do the proper fixmap thing (or just ioremap once at probe time,
> rather than at run-time).

FWIW, we discussed some of this back in 2015, because the TLB invalidation looks busted to me too:

https://marc.info/?l=linux-kernel&m=145009681808308&w=2

Didn't go anywhere though...

Will
* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
From: Linus Torvalds @ 2017-10-30 18:00 UTC
To: Will Deacon
Cc: Borislav Petkov, Len Brown, Tony Luck, Fengguang Wu, Tyler Baicar, Huang Ying, Chen Gong, Linux Kernel Mailing List, Rafael J. Wysocki, Linux ACPI

On Mon, Oct 30, 2017 at 10:49 AM, Will Deacon <will.deacon@arm.com> wrote:
>
> FWIW, we discussed some of this back in 2015, because the TLB invalidation
> looks busted to me too:

Yeah, I think the basic issue is that ioremap() is not supposed to map *over* an existing mapping, it's designed to map pages into a new mapping.

I think *every* other user of "ioremap_page_range()" is literally the architecture-specific implementation of "ioremap()" (which does the whole "allocate new VM area, then remap page range into that").

So the GHES driver use of this function really looks very wrong on so many levels.

Checking.. Oh, git grep shows "drivers/pci/host/pci-tegra.c". I'm afraid to even look into that file.

And pci_remap_iospace() looks potentially like a problem spot too - but hopefully is done only at driver init time (but it could possibly have the TLB issue).

Linus
* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
From: Tyler Baicar @ 2017-10-30 20:14 UTC
To: Linus Torvalds, Borislav Petkov, Len Brown, Tony Luck
Cc: Fengguang Wu, Huang Ying, Chen Gong, Linux Kernel Mailing List, Will Deacon, Rafael J. Wysocki, Linux ACPI, Timur Tabi

On 10/30/2017 1:46 PM, Linus Torvalds wrote:
> On Mon, Oct 30, 2017 at 10:20 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> I will add a "might_sleep()" to ioremap_page_range() itself, so that
>> we get this warning more reliably and much earlier. Right now it has
>> been hidden by the fact that most of the time the page tables
>> may be already allocated, but even then it's broken.
> Done. It doesn't report anything for me, so _hopefully_ the GHES
> driver is the only one that does games like this. See commit
> b39ab98e2f47 ("Mark 'ioremap_page_range()' as possibly sleeping").
>
> So now it should hopefully warn about this bad usage of page remapping
> reliably, at least if you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.
>
> Can somebody who has a working GHES setup (although Borislav seems to
> think no such thing exists) verify?

Hello Linus,

I have verified that this flags the error for me every time ghes_proc() is used.
But I also see it flagged in ARM PMU code:

[ 7.381153] BUG: sleeping function called from invalid context at mm/slab.h:420
[ 7.387625] in_atomic(): 0, irqs_disabled(): 128, pid: 11, name: cpuhp/0
[ 7.394310] CPU: 0 PID: 11 Comm: cpuhp/0 Not tainted 4.14.0-rc7 #46
[ 7.400559] Hardware name: Qualcomm Qualcomm Centriq(TM) 2400 Development Platform
[ 7.414361] Call trace:
[ 7.416797] [<ffff000008088b28>] dump_backtrace+0x0/0x270
[ 7.422175] [<ffff000008088dbc>] show_stack+0x24/0x30
[ 7.427211] [<ffff0000090d01f0>] dump_stack+0x98/0xb8
[ 7.432246] [<ffff00000810118c>] ___might_sleep+0x104/0x128
[ 7.437799] [<ffff000008101208>] __might_sleep+0x58/0x90
[ 7.443097] [<ffff000008254a7c>] kmem_cache_alloc_trace+0x224/0x280
[ 7.449347] [<ffff000008e9c938>] armpmu_alloc+0x30/0x168
[ 7.454639] [<ffff000008e9d15c>] arm_pmu_acpi_cpu_starting+0x114/0x148
[ 7.461151] [<ffff0000080d0f30>] cpuhp_invoke_callback+0xb8/0x760
[ 7.467226] [<ffff0000080d1ec4>] cpuhp_thread_fun+0xa4/0x1b8
[ 7.472872] [<ffff0000080f661c>] smpboot_thread_fn+0x174/0x250
[ 7.478684] [<ffff0000080f18ec>] kthread+0x114/0x140
[ 7.483632] [<ffff000008084774>] ret_from_fork+0x10/0x1c

For a GHES polling source:

[ 47.944596] BUG: sleeping function called from invalid context at lib/ioremap.c:164
[ 47.951290] in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/19
[ 47.958150] CPU: 19 PID: 0 Comm: swapper/19 Tainted: G W 4.14.0-rc7 #46
[ 47.958152] Hardware name: Qualcomm Qualcomm Centriq(TM) 2400 Development Platform
[ 47.958154] Call trace:
[ 47.958161] [<ffff000008088b28>] dump_backtrace+0x0/0x270
[ 47.958165] [<ffff000008088dbc>] show_stack+0x24/0x30
[ 47.958169] [<ffff0000090d01f0>] dump_stack+0x98/0xb8
[ 47.958174] [<ffff00000810118c>] ___might_sleep+0x104/0x128
[ 47.958177] [<ffff000008101208>] __might_sleep+0x58/0x90
[ 47.958180] [<ffff0000090d3d20>] ioremap_page_range+0x40/0x310
[ 47.958185] [<ffff0000086c5a98>] ghes_copy_tofrom_phys+0x1f8/0x240
[ 47.958188] [<ffff0000086c5da8>] ghes_proc+0xb0/0x8f0
[ 47.958190] [<ffff0000086c6ae8>] ghes_poll_func+0x20/0x40
[ 47.958196] [<ffff00000814b3dc>] call_timer_fn+0x3c/0x1b0
[ 47.958198] [<ffff00000814b638>] expire_timers+0xe8/0x170
[ 47.958201] [<ffff00000814b7fc>] run_timer_softirq+0x13c/0x188
[ 47.958203] [<ffff000008081964>] __do_softirq+0x144/0x33c
[ 47.958206] [<ffff0000080d6e78>] irq_exit+0xd0/0x108
[ 47.958210] [<ffff00000812dc44>] __handle_domain_irq+0x6c/0xc0
[ 47.958212] [<ffff000008081764>] gic_handle_irq+0xcc/0x188

For a GHES interrupt source:

[ 265.502603] BUG: sleeping function called from invalid context at lib/ioremap.c:164
[ 265.509296] in_atomic(): 1, irqs_disabled(): 128, pid: 3, name: kworker/0:0
[ 265.516242] CPU: 0 PID: 3 Comm: kworker/0:0 Tainted: G W 4.14.0-rc7 #46
[ 265.516244] Hardware name: Qualcomm Qualcomm Centriq(TM) 2400 Development Platform
[ 265.516251] Workqueue: kacpi_notify acpi_os_execute_deferred
[ 265.516254] Call trace:
[ 265.516258] [<ffff000008088b28>] dump_backtrace+0x0/0x270
[ 265.516261] [<ffff000008088dbc>] show_stack+0x24/0x30
[ 265.516264] [<ffff0000090d01f0>] dump_stack+0x98/0xb8
[ 265.516268] [<ffff00000810118c>] ___might_sleep+0x104/0x128
[ 265.516270] [<ffff000008101208>] __might_sleep+0x58/0x90
[ 265.516273] [<ffff0000090d3d20>] ioremap_page_range+0x40/0x310
[ 265.516277] [<ffff0000086c5a98>] ghes_copy_tofrom_phys+0x1f8/0x240
[ 265.516279] [<ffff0000086c5da8>] ghes_proc+0xb0/0x8f0
[ 265.516282] [<ffff0000086c6670>] ghes_notify_hed+0x50/0x90
[ 265.516286] [<ffff0000080f36a4>] notifier_call_chain+0x5c/0xa0
[ 265.516289] [<ffff0000080f3b80>] __blocking_notifier_call_chain+0x58/0xa0
[ 265.516291] [<ffff0000080f3c04>] blocking_notifier_call_chain+0x3c/0x50
[ 265.516293] [<ffff0000086c1140>] acpi_hed_notify+0x28/0x30
[ 265.516296] [<ffff000008678100>] acpi_device_notify+0x30/0x40
[ 265.516301] [<ffff000008691fb8>] acpi_ev_notify_dispatch+0x64/0x74
[ 265.516304] [<ffff00000867296c>] acpi_os_execute_deferred+0x24/0x38
[ 265.516308] [<ffff0000080ea748>] process_one_work+0x1f8/0x488
[ 265.516310] [<ffff0000080eaa30>] worker_thread+0x58/0x4a0
[ 265.516312] [<ffff0000080f18ec>] kthread+0x114/0x140
[ 265.516315] [<ffff000008084774>] ret_from_fork+0x10/0x1c

Thanks,
Tyler

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
From: Will Deacon @ 2017-10-31 10:38 UTC
To: Tyler Baicar
Cc: Linus Torvalds, Borislav Petkov, Len Brown, Tony Luck, Fengguang Wu, Huang Ying, Chen Gong, Linux Kernel Mailing List, Rafael J. Wysocki, Linux ACPI, Timur Tabi, mark.rutland

On Mon, Oct 30, 2017 at 04:14:15PM -0400, Tyler Baicar wrote:
> On 10/30/2017 1:46 PM, Linus Torvalds wrote:
> >On Mon, Oct 30, 2017 at 10:20 AM, Linus Torvalds
> ><torvalds@linux-foundation.org> wrote:
> >>I will add a "might_sleep()" to ioremap_page_range() itself, so that
> >>we get this warning more reliably and much earlier. Right now it has
> >>been hidden by the fact that most of the time the page tables
> >>may be already allocated, but even then it's broken.
> >Done. It doesn't report anything for me, so _hopefully_ the GHES
> >driver is the only one that does games like this. See commit
> >b39ab98e2f47 ("Mark 'ioremap_page_range()' as possibly sleeping").
> >
> >So now it should hopefully warn about this bad usage of page remapping
> >reliably, at least if you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.
> >
> >Can somebody who has a working GHES setup (although Borislav seems to
> >think no such thing exists) verify?
> Hello Linus,
>
> I have verified that this flags the error for me every time ghes_proc() is used.
> But I also see it flagged in ARM PMU code:
>
> [ 7.381153] BUG: sleeping function called from invalid context at mm/slab.h:420
> [ 7.387625] in_atomic(): 0, irqs_disabled(): 128, pid: 11, name: cpuhp/0
> [ 7.394310] CPU: 0 PID: 11 Comm: cpuhp/0 Not tainted 4.14.0-rc7 #46
> [ 7.400559] Hardware name: Qualcomm Qualcomm Centriq(TM) 2400 Development
> Platform
> [ 7.414361] Call trace:
> [ 7.416797] [<ffff000008088b28>] dump_backtrace+0x0/0x270
> [ 7.422175] [<ffff000008088dbc>] show_stack+0x24/0x30
> [ 7.427211] [<ffff0000090d01f0>] dump_stack+0x98/0xb8
> [ 7.432246] [<ffff00000810118c>] ___might_sleep+0x104/0x128
> [ 7.437799] [<ffff000008101208>] __might_sleep+0x58/0x90
> [ 7.443097] [<ffff000008254a7c>] kmem_cache_alloc_trace+0x224/0x280
> [ 7.449347] [<ffff000008e9c938>] armpmu_alloc+0x30/0x168
> [ 7.454639] [<ffff000008e9d15c>] arm_pmu_acpi_cpu_starting+0x114/0x148
> [ 7.461151] [<ffff0000080d0f30>] cpuhp_invoke_callback+0xb8/0x760
> [ 7.467226] [<ffff0000080d1ec4>] cpuhp_thread_fun+0xa4/0x1b8
> [ 7.472872] [<ffff0000080f661c>] smpboot_thread_fn+0x174/0x250
> [ 7.478684] [<ffff0000080f18ec>] kthread+0x114/0x140
> [ 7.483632] [<ffff000008084774>] ret_from_fork+0x10/0x1c

I know Mark was doing some fixes in the ACPI notifier code here, so I've added him to CC.

Will
* Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
From: Mark Rutland @ 2017-10-31 12:29 UTC
To: Will Deacon
Cc: Tyler Baicar, Linus Torvalds, Borislav Petkov, Len Brown, Tony Luck, Fengguang Wu, Huang Ying, Chen Gong, Linux Kernel Mailing List, Rafael J. Wysocki, Linux ACPI, Timur Tabi

On Tue, Oct 31, 2017 at 10:38:33AM +0000, Will Deacon wrote:
> On Mon, Oct 30, 2017 at 04:14:15PM -0400, Tyler Baicar wrote:
> > On 10/30/2017 1:46 PM, Linus Torvalds wrote:
> > >On Mon, Oct 30, 2017 at 10:20 AM, Linus Torvalds
> > ><torvalds@linux-foundation.org> wrote:
> > >>I will add a "might_sleep()" to ioremap_page_range() itself, so that
> > >>we get this warning more reliably and much earlier. Right now it has
> > >>been hidden by the fact that most of the time the page tables
> > >>may be already allocated, but even then it's broken.
> > >Done. It doesn't report anything for me, so _hopefully_ the GHES
> > >driver is the only one that does games like this. See commit
> > >b39ab98e2f47 ("Mark 'ioremap_page_range()' as possibly sleeping").
> > >
> > >So now it should hopefully warn about this bad usage of page remapping
> > >reliably, at least if you have CONFIG_DEBUG_ATOMIC_SLEEP enabled.
> > >
> > >Can somebody who has a working GHES setup (although Borislav seems to
> > >think no such thing exists) verify?
> > Hello Linus,
> >
> > I have verified that this flags the error for me every time ghes_proc() is used.
> > But I also see it flagged in ARM PMU code:
> >
> > [ 7.381153] BUG: sleeping function called from invalid context at mm/slab.h:420
> > [ 7.387625] in_atomic(): 0, irqs_disabled(): 128, pid: 11, name: cpuhp/0
> > [ 7.394310] CPU: 0 PID: 11 Comm: cpuhp/0 Not tainted 4.14.0-rc7 #46
> > [ 7.400559] Hardware name: Qualcomm Qualcomm Centriq(TM) 2400 Development
> > Platform
> > [ 7.414361] Call trace:
> > [ 7.416797] [<ffff000008088b28>] dump_backtrace+0x0/0x270
> > [ 7.422175] [<ffff000008088dbc>] show_stack+0x24/0x30
> > [ 7.427211] [<ffff0000090d01f0>] dump_stack+0x98/0xb8
> > [ 7.432246] [<ffff00000810118c>] ___might_sleep+0x104/0x128
> > [ 7.437799] [<ffff000008101208>] __might_sleep+0x58/0x90
> > [ 7.443097] [<ffff000008254a7c>] kmem_cache_alloc_trace+0x224/0x280
> > [ 7.449347] [<ffff000008e9c938>] armpmu_alloc+0x30/0x168
> > [ 7.454639] [<ffff000008e9d15c>] arm_pmu_acpi_cpu_starting+0x114/0x148
> > [ 7.461151] [<ffff0000080d0f30>] cpuhp_invoke_callback+0xb8/0x760
> > [ 7.467226] [<ffff0000080d1ec4>] cpuhp_thread_fun+0xa4/0x1b8
> > [ 7.472872] [<ffff0000080f661c>] smpboot_thread_fn+0x174/0x250
> > [ 7.478684] [<ffff0000080f18ec>] kthread+0x114/0x140
> > [ 7.483632] [<ffff000008084774>] ret_from_fork+0x10/0x1c
>
> I know Mark was doing some fixes in the ACPI notifier code here, so I've
> added him to CC.

Sorry for the delay on this; I have a rather hideous fix that I'll clean up and post shortly.

Thanks,
Mark.
[parent not found: <20171106224635.qopgsszwxzuitkpf@wfg-t540p.sh.intel.com>]
* Re: [v4.14-rc8 ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at lib/ioremap.c:165
From: Linus Torvalds @ 2017-11-06 22:57 UTC
To: Fengguang Wu, James Morse
Cc: Tyler Baicar, Borislav Petkov, Len Brown, Tony Luck, Huang Ying, Chen Gong, Linux Kernel Mailing List, Will Deacon, Rafael J. Wysocki, Linux ACPI, Timur Tabi, Mark Rutland

On Mon, Nov 6, 2017 at 2:46 PM, Fengguang Wu <fengguang.wu@intel.com> wrote:
>
> I can see that in RC8, too:

James Morse posted a new version of his series to fix this, and it's gotten a few tests, but not a lot. Since you clearly have GHES support on at least some of your machines, it might be worth adding that series from James to 0day testing.

The patches look good to me, and I assume I'll be getting it through Rafael from the ACPI tree (which is how the other ghes code reaches me), but maybe by now for 4.15 with a stable backport.

The actual problem is definitely not new. Only the warning message. So the code should work as well as it ever has, which may or may not be saying a lot.

It might be worth fixing for 4.14 just to not scare people too much with messages, but at the same time it's not a _functional_ regression.

Linus
* Re: [v4.14-rc8 ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at lib/ioremap.c:165
From: Fengguang Wu @ 2017-11-06 23:20 UTC
To: Linus Torvalds
Cc: James Morse, Tyler Baicar, Borislav Petkov, Len Brown, Tony Luck, Huang Ying, Chen Gong, Linux Kernel Mailing List, Will Deacon, Rafael J. Wysocki, Linux ACPI, Timur Tabi, Mark Rutland

On Mon, Nov 06, 2017 at 02:57:20PM -0800, Linus Torvalds wrote:
>On Mon, Nov 6, 2017 at 2:46 PM, Fengguang Wu <fengguang.wu@intel.com> wrote:
>>
>> I can see that in RC8, too:
>
>James Morse posted a new version of his series to fix this, and it's
>gotten a few tests, but not a lot. Since you clearly have GHES support
>on at least some of your machines, it might be worth adding that
>series from James to 0day testing.

Sure. I'll test Rafael's git tree including James' patches.

I can see the GHES warnings in a number of 0day machines:

- ivb44: Ivytown Ivy Bridge-EP, E5-2697 v2
  HW details can be found in
  https://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git/tree/hosts/ivb44
- lkp-bdw-ep6: Broadwell-EP, E5-2699 v4
- lkp-bdw-ex2: Broadwell-EX, E7-8890 v4
- lkp-skl-2sp3: Skylake
- lkp-skl-4sp1: Skylake
- lkp-avoton2: Atom

Regards,
Fengguang
* Re: [v4.14-rc8 ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at lib/ioremap.c:165
From: Borislav Petkov @ 2017-11-06 23:02 UTC
To: Fengguang Wu, Rafael J. Wysocki
Cc: Tyler Baicar, Linus Torvalds, Len Brown, Tony Luck, Huang Ying, Linux Kernel Mailing List, Will Deacon, Linux ACPI, Timur Tabi, Mark Rutland

On Tue, Nov 07, 2017 at 06:46:35AM +0800, Fengguang Wu wrote:
> I can see that in RC8, too:

https://lkml.kernel.org/r/20171106184427.31905-1-james.morse@arm.com

Rafael, you could still queue them for the merge window next week - they look pretty straightforward and low risk to me.

Thx.

--
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
* Re: [v4.14-rc8 ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at lib/ioremap.c:165
From: Rafael J. Wysocki @ 2017-11-06 23:04 UTC
To: Borislav Petkov
Cc: Fengguang Wu, Rafael J. Wysocki, Tyler Baicar, Linus Torvalds, Len Brown, Tony Luck, Huang Ying, Linux Kernel Mailing List, Will Deacon, Linux ACPI, Timur Tabi, Mark Rutland

On Tue, Nov 7, 2017 at 12:02 AM, Borislav Petkov <bp@suse.de> wrote:
> On Tue, Nov 07, 2017 at 06:46:35AM +0800, Fengguang Wu wrote:
>> I can see that in RC8, too:
>
> https://lkml.kernel.org/r/20171106184427.31905-1-james.morse@arm.com
>
> Rafael, you could still queue them for the merge window next week - they
> look pretty straightforward and low risk to me.

OK, I'll queue them up.

Thanks,
Rafael
* Re: [v4.14-rc8 ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at lib/ioremap.c:165
From: Fengguang Wu @ 2017-11-07 13:39 UTC
To: Borislav Petkov
Cc: Rafael J. Wysocki, Tyler Baicar, Linus Torvalds, Len Brown, Tony Luck, Huang Ying, Linux Kernel Mailing List, Will Deacon, Linux ACPI, Timur Tabi, Mark Rutland

On Tue, Nov 07, 2017 at 12:02:19AM +0100, Borislav Petkov wrote:
>On Tue, Nov 07, 2017 at 06:46:35AM +0800, Fengguang Wu wrote:
>> I can see that in RC8, too:
>
>https://lkml.kernel.org/r/20171106184427.31905-1-james.morse@arm.com
>
>Rafael, you could still queue them for the merge window next week - they
>look pretty straightforward and low risk to me.

Tested-by: Fengguang Wu <fengguang.wu@intel.com>

I tried 100 boots with various test jobs and there are no more GHES errors.

Thanks,
Fengguang