* amd-iommu/x2apic: sleeping function called from invalid context
@ 2017-07-25 13:56 Artem Savkov
[not found] ` <20170725135618.hev4vj7w24gm3a5q-TUG+jSMfqtFQcClZ3XN9yxcY2uh10dtjAL8bYrjMMd8@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Artem Savkov @ 2017-07-25 13:56 UTC (permalink / raw)
To: Joerg Roedel, Thomas Gleixner; +Cc: iommu, x86, linux-kernel
Hi,
Commit 1c3c5ea "sched/core: Enable might_sleep() and smp_processor_id()
checks early" seems to have uncovered an issue with amd-iommu/x2apic.
Starting with that commit the following warning started to show up on AMD
systems during boot:
[ 0.140480] smpboot: Max logical packages: 6
[ 0.160000] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
[ 0.160000] in_atomic(): 0, irqs_disabled(): 1, pid: 1, name: swapper/0
[ 0.160000] no locks held by swapper/0/1.
[ 0.160000] irq event stamp: 304
[ 0.160000] hardirqs last enabled at (303): [<ffffffff818a87b6>] _raw_spin_unlock_irqrestore+0x36/0x60
[ 0.160000] hardirqs last disabled at (304): [<ffffffff8235d440>] enable_IR_x2apic+0x79/0x196
[ 0.160000] softirqs last enabled at (36): [<ffffffff818ae75f>] __do_softirq+0x35f/0x4ec
[ 0.160000] softirqs last disabled at (31): [<ffffffff810c1955>] irq_exit+0x105/0x120
[ 0.160000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc2.1.el7a.test.x86_64.debug #1
[ 0.160000] Hardware name: PowerEdge C6145 /040N24, BIOS 3.5.0 10/28/2014
[ 0.160000] Call Trace:
[ 0.160000] dump_stack+0x85/0xca
[ 0.160000] ___might_sleep+0x22a/0x260
[ 0.160000] __might_sleep+0x4a/0x80
[ 0.160000] __mutex_lock+0x58/0x960
[ 0.160000] ? iommu_completion_wait.part.17+0xb5/0x160
[ 0.160000] ? register_syscore_ops+0x1d/0x70
[ 0.160000] ? iommu_flush_all_caches+0x120/0x150
[ 0.160000] mutex_lock_nested+0x1b/0x20
[ 0.160000] register_syscore_ops+0x1d/0x70
[ 0.160000] state_next+0x119/0x910
[ 0.160000] iommu_go_to_state+0x29/0x30
[ 0.160000] amd_iommu_enable+0x13/0x23
[ 0.160000] irq_remapping_enable+0x1b/0x39
[ 0.160000] enable_IR_x2apic+0x91/0x196
[ 0.160000] default_setup_apic_routing+0x16/0x6e
[ 0.160000] native_smp_prepare_cpus+0x257/0x2d5
[ 0.160000] kernel_init_freeable+0x131/0x2a7
[ 0.160000] ? kernel_init+0xe/0x104
[ 0.160000] ? _raw_spin_unlock_irq+0x2c/0x40
[ 0.160000] ? rest_init+0xe0/0xe0
[ 0.160000] kernel_init+0xe/0x104
[ 0.160000] ret_from_fork+0x2a/0x40
[ 0.160010] Switched APIC routing to physical flat.
--
Regards,
Artem
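
The warning in the report is printed by a debug check that fires when a function that may sleep (here, a mutex lock) is called while interrupts are disabled or the context is atomic. A minimal userspace sketch of that idea follows. `struct ctx_model`, `model_sleep_is_invalid()` and `model_mutex_lock()` are hypothetical names invented for illustration and are not kernel symbols, and the kernel's real check considers more state than this model does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for kernel context state; not a real kernel type. */
struct ctx_model {
	int  preempt_depth;   /* models in_atomic(): non-zero means atomic */
	bool irqs_off;        /* models irqs_disabled() */
};

/* Returns true when a sleeping call would be flagged in this context. */
static bool model_sleep_is_invalid(const struct ctx_model *ctx)
{
	assert(ctx != NULL);
	return ctx->preempt_depth != 0 || ctx->irqs_off;
}

/*
 * Models the entry of a mutex-taking function such as
 * register_syscore_ops(). It only reports the problem; it never sleeps.
 * Returns 0 if the call is fine, -1 if it would trigger the warning.
 */
static int model_mutex_lock(const struct ctx_model *ctx, const char *caller)
{
	if (model_sleep_is_invalid(ctx)) {
		fprintf(stderr,
			"BUG (model): sleeping function called from invalid context in %s\n"
			"in_atomic(): %d, irqs_disabled(): %d\n",
			caller, ctx->preempt_depth != 0, ctx->irqs_off);
		return -1;
	}
	return 0;
}
```

With `preempt_depth = 0` and `irqs_off = true`, which matches the `in_atomic(): 0, irqs_disabled(): 1` line in the log, the model flags the call; with interrupts enabled it does not.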
* Re: amd-iommu/x2apic: sleeping function called from invalid context
[not found] ` <20170725135618.hev4vj7w24gm3a5q-TUG+jSMfqtFQcClZ3XN9yxcY2uh10dtjAL8bYrjMMd8@public.gmane.org>
@ 2017-07-26 10:42 ` Thomas Gleixner
2017-07-26 12:26 ` [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code Joerg Roedel
0 siblings, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2017-07-26 10:42 UTC (permalink / raw)
To: Artem Savkov
Cc: x86, iommu, linux-kernel

On Tue, 25 Jul 2017, Artem Savkov wrote:

> Hi,
>
> Commit 1c3c5ea "sched/core: Enable might_sleep() and smp_processor_id()
> checks early" seems to have uncovered an issue with amd-iommu/x2apic.
>
> Starting with that commit the following warning started to show up on AMD
> systems during boot:

> [ 0.160000] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747

> [ 0.160000] mutex_lock_nested+0x1b/0x20
> [ 0.160000] register_syscore_ops+0x1d/0x70
> [ 0.160000] state_next+0x119/0x910
> [ 0.160000] iommu_go_to_state+0x29/0x30
> [ 0.160000] amd_iommu_enable+0x13/0x23
> [ 0.160000] irq_remapping_enable+0x1b/0x39
> [ 0.160000] enable_IR_x2apic+0x91/0x196
> [ 0.160000] default_setup_apic_routing+0x16/0x6e
> [ 0.160000] native_smp_prepare_cpus+0x257/0x2d5

Yep, that's clearly stupid. The completely untested patch below should cure
the issue.

Thanks,

	tglx
8<---------------
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2440,7 +2440,6 @@ static int __init state_next(void)
 		break;
 	case IOMMU_ACPI_FINISHED:
 		early_enable_iommus();
-		register_syscore_ops(&amd_iommu_syscore_ops);
 		x86_platform.iommu_shutdown = disable_iommus;
 		init_state = IOMMU_ENABLED;
 		break;
@@ -2559,6 +2558,8 @@ static int __init amd_iommu_init(void)
 			for_each_iommu(iommu)
 				iommu_flush_all_caches(iommu);
 		}
+	} else {
+		register_syscore_ops(&amd_iommu_syscore_ops);
 	}
 
 	return ret;
* [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code
2017-07-26 10:42 ` Thomas Gleixner
@ 2017-07-26 12:26 ` Joerg Roedel
2017-07-26 13:23 ` Thomas Gleixner
2017-07-26 13:25 ` Artem Savkov
0 siblings, 2 replies; 6+ messages in thread
From: Joerg Roedel @ 2017-07-26 12:26 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Artem Savkov, x86, iommu, linux-kernel

Hi Artem, Thomas,

On Wed, Jul 26, 2017 at 12:42:49PM +0200, Thomas Gleixner wrote:
> On Tue, 25 Jul 2017, Artem Savkov wrote:
>
> > Hi,
> >
> > Commit 1c3c5ea "sched/core: Enable might_sleep() and smp_processor_id()
> > checks early" seems to have uncovered an issue with amd-iommu/x2apic.
> >
> > Starting with that commit the following warning started to show up on AMD
> > systems during boot:
>
> > [ 0.160000] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
>
> > [ 0.160000] mutex_lock_nested+0x1b/0x20
> > [ 0.160000] register_syscore_ops+0x1d/0x70
> > [ 0.160000] state_next+0x119/0x910
> > [ 0.160000] iommu_go_to_state+0x29/0x30
> > [ 0.160000] amd_iommu_enable+0x13/0x23
> > [ 0.160000] irq_remapping_enable+0x1b/0x39
> > [ 0.160000] enable_IR_x2apic+0x91/0x196
> > [ 0.160000] default_setup_apic_routing+0x16/0x6e
> > [ 0.160000] native_smp_prepare_cpus+0x257/0x2d5

Thanks for the report!

> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -2440,7 +2440,6 @@ static int __init state_next(void)
>  		break;
>  	case IOMMU_ACPI_FINISHED:
>  		early_enable_iommus();
> -		register_syscore_ops(&amd_iommu_syscore_ops);
>  		x86_platform.iommu_shutdown = disable_iommus;
>  		init_state = IOMMU_ENABLED;
>  		break;
> @@ -2559,6 +2558,8 @@ static int __init amd_iommu_init(void)
>  			for_each_iommu(iommu)
>  				iommu_flush_all_caches(iommu);
>  		}
> +	} else {
> +		register_syscore_ops(&amd_iommu_syscore_ops);
>  	}
>
>  	return ret;

Yes, that should fix it, but I think it's better to just move the
register_syscore_ops() call to a later initialization step, like in the
patch below. I tested it and will queue it to my iommu/fixes branch.

From 461242d7211c7777901b6ccdf349cc89235bd5da Mon Sep 17 00:00:00 2001
From: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
Date: Wed, 26 Jul 2017 14:17:55 +0200
Subject: [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code

The register_syscore_ops() function takes a mutex and might sleep. In
the IOMMU initialization code it is invoked during irq-remapping setup
already, where irqs are disabled.

This causes a schedule-while-atomic bug:

 BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
 in_atomic(): 0, irqs_disabled(): 1, pid: 1, name: swapper/0
 no locks held by swapper/0/1.
 irq event stamp: 304
 hardirqs last enabled at (303): [<ffffffff818a87b6>] _raw_spin_unlock_irqrestore+0x36/0x60
 hardirqs last disabled at (304): [<ffffffff8235d440>] enable_IR_x2apic+0x79/0x196
 softirqs last enabled at (36): [<ffffffff818ae75f>] __do_softirq+0x35f/0x4ec
 softirqs last disabled at (31): [<ffffffff810c1955>] irq_exit+0x105/0x120
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc2.1.el7a.test.x86_64.debug #1
 Hardware name: PowerEdge C6145 /040N24, BIOS 3.5.0 10/28/2014
 Call Trace:
  dump_stack+0x85/0xca
  ___might_sleep+0x22a/0x260
  __might_sleep+0x4a/0x80
  __mutex_lock+0x58/0x960
  ? iommu_completion_wait.part.17+0xb5/0x160
  ? register_syscore_ops+0x1d/0x70
  ? iommu_flush_all_caches+0x120/0x150
  mutex_lock_nested+0x1b/0x20
  register_syscore_ops+0x1d/0x70
  state_next+0x119/0x910
  iommu_go_to_state+0x29/0x30
  amd_iommu_enable+0x13/0x23

Fix it by moving the register_syscore_ops() call to the next
initialization step, which runs with irqs enabled.

Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
---
 drivers/iommu/amd_iommu_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5cc597b383c7..372303700566 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2440,11 +2440,11 @@ static int __init state_next(void)
 		break;
 	case IOMMU_ACPI_FINISHED:
 		early_enable_iommus();
-		register_syscore_ops(&amd_iommu_syscore_ops);
 		x86_platform.iommu_shutdown = disable_iommus;
 		init_state = IOMMU_ENABLED;
 		break;
 	case IOMMU_ENABLED:
+		register_syscore_ops(&amd_iommu_syscore_ops);
 		ret = amd_iommu_init_pci();
 		init_state = ret ? IOMMU_INIT_ERROR : IOMMU_PCI_INIT;
 		enable_iommus_v2();
-- 
2.13.1
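
The effect of the patch above can be sketched with a reduced userspace model of a two-step init state machine, where the first step runs with irqs disabled (irq-remapping setup) and the next step runs with irqs enabled, as the commit message states. Everything here is a hypothetical illustration: `enum model_state`, `struct model_init`, `model_state_next()` and `model_run()` are invented names, and the real `state_next()` has more states and does real work in each of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of the init state machine. */
enum model_state { M_ACPI_FINISHED, M_ENABLED, M_PCI_INIT };

struct model_init {
	enum model_state state;
	bool irqs_off;            /* context the current step runs in */
	bool registered;          /* have syscore ops been registered? */
	bool registered_atomic;   /* set if registration ran with irqs off */
};

/* Stand-in for register_syscore_ops(): records the calling context. */
static void model_register_syscore_ops(struct model_init *m)
{
	m->registered = true;
	if (m->irqs_off)
		m->registered_atomic = true;   /* the case the report hit */
}

/* One step of the model; register_early selects the pre-patch ordering. */
static void model_state_next(struct model_init *m, bool register_early)
{
	assert(m != NULL);
	switch (m->state) {
	case M_ACPI_FINISHED:
		if (register_early)
			model_register_syscore_ops(m);
		m->state = M_ENABLED;
		break;
	case M_ENABLED:
		if (!register_early)
			model_register_syscore_ops(m);
		m->state = M_PCI_INIT;
		break;
	case M_PCI_INIT:
		break;
	}
}

/*
 * Runs the first step with irqs off and the second with irqs on.
 * Returns true if registration happened and never ran with irqs off.
 */
static bool model_run(bool register_early)
{
	struct model_init m = { .state = M_ACPI_FINISHED };

	m.irqs_off = true;
	model_state_next(&m, register_early);
	m.irqs_off = false;
	model_state_next(&m, register_early);
	return m.registered && !m.registered_atomic;
}
```

In this model `model_run(true)` corresponds to the old ordering and reports a registration with irqs off, while `model_run(false)` corresponds to the patched ordering and does not.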
* Re: [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code
2017-07-26 12:26 ` [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code Joerg Roedel
@ 2017-07-26 13:23 ` Thomas Gleixner
2017-07-26 13:25 ` Artem Savkov
1 sibling, 0 replies; 6+ messages in thread
From: Thomas Gleixner @ 2017-07-26 13:23 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Artem Savkov, iommu, x86, linux-kernel

On Wed, 26 Jul 2017, Joerg Roedel wrote:
> Yes, that should fix it, but I think it's better to just move the
> register_syscore_ops() call to a later initialization step, like in the
> patch below. I tested it and will queue it to my iommu/fixes branch.

Fair enough.

Acked-by-me.
* Re: [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code
2017-07-26 12:26 ` [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code Joerg Roedel
2017-07-26 13:23 ` Thomas Gleixner
@ 2017-07-26 13:25 ` Artem Savkov
2017-07-26 13:41 ` Joerg Roedel
1 sibling, 1 reply; 6+ messages in thread
From: Artem Savkov @ 2017-07-26 13:25 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Thomas Gleixner, iommu, x86, linux-kernel

On Wed, Jul 26, 2017 at 02:26:14PM +0200, Joerg Roedel wrote:
> Hi Artem, Thomas,
>
> On Wed, Jul 26, 2017 at 12:42:49PM +0200, Thomas Gleixner wrote:
> > On Tue, 25 Jul 2017, Artem Savkov wrote:
> >
> > > Hi,
> > >
> > > Commit 1c3c5ea "sched/core: Enable might_sleep() and smp_processor_id()
> > > checks early" seems to have uncovered an issue with amd-iommu/x2apic.
> > >
> > > Starting with that commit the following warning started to show up on AMD
> > > systems during boot:
> >
> > > [ 0.160000] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
> >
> > > [ 0.160000] mutex_lock_nested+0x1b/0x20
> > > [ 0.160000] register_syscore_ops+0x1d/0x70
> > > [ 0.160000] state_next+0x119/0x910
> > > [ 0.160000] iommu_go_to_state+0x29/0x30
> > > [ 0.160000] amd_iommu_enable+0x13/0x23
> > > [ 0.160000] irq_remapping_enable+0x1b/0x39
> > > [ 0.160000] enable_IR_x2apic+0x91/0x196
> > > [ 0.160000] default_setup_apic_routing+0x16/0x6e
> > > [ 0.160000] native_smp_prepare_cpus+0x257/0x2d5
>
> Thanks for the report!
>
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -2440,7 +2440,6 @@ static int __init state_next(void)
> >  		break;
> >  	case IOMMU_ACPI_FINISHED:
> >  		early_enable_iommus();
> > -		register_syscore_ops(&amd_iommu_syscore_ops);
> >  		x86_platform.iommu_shutdown = disable_iommus;
> >  		init_state = IOMMU_ENABLED;
> >  		break;
> > @@ -2559,6 +2558,8 @@ static int __init amd_iommu_init(void)
> >  			for_each_iommu(iommu)
> >  				iommu_flush_all_caches(iommu);
> >  		}
> > +	} else {
> > +		register_syscore_ops(&amd_iommu_syscore_ops);
> >  	}
> >
> >  	return ret;
>
> Yes, that should fix it, but I think it's better to just move the
> register_syscore_ops() call to a later initialization step, like in the
> patch below. I tested it and will queue it to my iommu/fixes branch.

Checked it as well just in case, didn't see any issues. Thank you.

Reported-and-tested-by: Artem Savkov <asavkov@redhat.com>

-- 
Regards,
Artem
* Re: [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code
2017-07-26 13:25 ` Artem Savkov
@ 2017-07-26 13:41 ` Joerg Roedel
0 siblings, 0 replies; 6+ messages in thread
From: Joerg Roedel @ 2017-07-26 13:41 UTC (permalink / raw)
To: Artem Savkov; +Cc: Thomas Gleixner, iommu, x86, linux-kernel

On Wed, Jul 26, 2017 at 03:25:05PM +0200, Artem Savkov wrote:
> On Wed, Jul 26, 2017 at 02:26:14PM +0200, Joerg Roedel wrote:
> > Yes, that should fix it, but I think it's better to just move the
> > register_syscore_ops() call to a later initialization step, like in the
> > patch below. I tested it and will queue it to my iommu/fixes branch.
>
> Checked it as well just in case, didn't see any issues. Thank you.
>
> Reported-and-tested-by: Artem Savkov <asavkov@redhat.com>

Thanks for testing it! I added yours and Thomas' tags and applied the
patch to my tree. It should go upstream this week.

	Joerg