iommu.lists.linux-foundation.org archive mirror
* amd-iommu/x2apic: sleeping function called from invalid context
@ 2017-07-25 13:56 Artem Savkov
  0 siblings, 1 reply; 6+ messages in thread
From: Artem Savkov @ 2017-07-25 13:56 UTC (permalink / raw)
  To: Joerg Roedel, Thomas Gleixner; +Cc: iommu, x86, linux-kernel

Hi,

Commit 1c3c5ea "sched/core: Enable might_sleep() and smp_processor_id()
checks early" seems to have uncovered an issue with amd-iommu/x2apic.

Starting with that commit the following warning started to show up on AMD
systems during boot:

[    0.140480] smpboot: Max logical packages: 6 
[    0.160000] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747 
[    0.160000] in_atomic(): 0, irqs_disabled(): 1, pid: 1, name: swapper/0 
[    0.160000] no locks held by swapper/0/1. 
[    0.160000] irq event stamp: 304 
[    0.160000] hardirqs last  enabled at (303): [<ffffffff818a87b6>] _raw_spin_unlock_irqrestore+0x36/0x60 
[    0.160000] hardirqs last disabled at (304): [<ffffffff8235d440>] enable_IR_x2apic+0x79/0x196 
[    0.160000] softirqs last  enabled at (36): [<ffffffff818ae75f>] __do_softirq+0x35f/0x4ec 
[    0.160000] softirqs last disabled at (31): [<ffffffff810c1955>] irq_exit+0x105/0x120 
[    0.160000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc2.1.el7a.test.x86_64.debug #1 
[    0.160000] Hardware name:          PowerEdge C6145 /040N24, BIOS 3.5.0 10/28/2014 
[    0.160000] Call Trace: 
[    0.160000]  dump_stack+0x85/0xca 
[    0.160000]  ___might_sleep+0x22a/0x260 
[    0.160000]  __might_sleep+0x4a/0x80 
[    0.160000]  __mutex_lock+0x58/0x960 
[    0.160000]  ? iommu_completion_wait.part.17+0xb5/0x160 
[    0.160000]  ? register_syscore_ops+0x1d/0x70 
[    0.160000]  ? iommu_flush_all_caches+0x120/0x150 
[    0.160000]  mutex_lock_nested+0x1b/0x20 
[    0.160000]  register_syscore_ops+0x1d/0x70 
[    0.160000]  state_next+0x119/0x910 
[    0.160000]  iommu_go_to_state+0x29/0x30 
[    0.160000]  amd_iommu_enable+0x13/0x23 
[    0.160000]  irq_remapping_enable+0x1b/0x39 
[    0.160000]  enable_IR_x2apic+0x91/0x196 
[    0.160000]  default_setup_apic_routing+0x16/0x6e 
[    0.160000]  native_smp_prepare_cpus+0x257/0x2d5 
[    0.160000]  kernel_init_freeable+0x131/0x2a7 
[    0.160000]  ? kernel_init+0xe/0x104 
[    0.160000]  ? _raw_spin_unlock_irq+0x2c/0x40 
[    0.160000]  ? rest_init+0xe0/0xe0 
[    0.160000]  kernel_init+0xe/0x104 
[    0.160000]  ret_from_fork+0x2a/0x40 
[    0.160010] Switched APIC routing to physical flat. 

-- 
Regards,
  Artem
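The splat reports in_atomic(): 0 and irqs_disabled(): 1, so the check fired because interrupts were off, not because a lock was held. The sketch below is a simplified userspace model of why the same call went unreported before commit 1c3c5ea. As we understand that commit, it added a SYSTEM_SCHEDULING boot state and narrowed the boot-time exemption of the might_sleep() debug check from "not yet RUNNING" to "still BOOTING". All model_* names are hypothetical, and other exemptions in the real check (idle task, oops in progress, preempt offsets) are left out.

```c
#include <stdbool.h>

/* Hypothetical model of the kernel's boot states; assumes the 4.13
 * ordering BOOTING < SCHEDULING < RUNNING. */
enum model_system_state {
	MODEL_SYSTEM_BOOTING,
	MODEL_SYSTEM_SCHEDULING,
	MODEL_SYSTEM_RUNNING,
};

/* Returns true if the might_sleep() debug check would print
 * "BUG: sleeping function called from invalid context".
 * preempt_count > 0 stands in for in_atomic(). */
static bool model_might_sleep_warns(enum model_system_state state,
				    int preempt_count, bool irqs_disabled,
				    bool checks_enabled_early)
{
	/* Old rule: silent until the system is RUNNING.
	 * New rule (after 1c3c5ea, simplified): silent only while BOOTING. */
	bool silenced = checks_enabled_early ? state == MODEL_SYSTEM_BOOTING
					     : state != MODEL_SYSTEM_RUNNING;

	if (silenced)
		return false;
	return preempt_count > 0 || irqs_disabled;
}
```

With the context from the splat (scheduling already started, preempt count 0, interrupts disabled), the model stays silent under the old rule and warns under the new one, which matches the warning first appearing with that commit.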


* Re: amd-iommu/x2apic: sleeping function called from invalid context
@ 2017-07-26 10:42   ` Thomas Gleixner
  2017-07-26 12:26     ` [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code Joerg Roedel
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2017-07-26 10:42 UTC (permalink / raw)
  To: Artem Savkov
  Cc: x86, iommu, linux-kernel

On Tue, 25 Jul 2017, Artem Savkov wrote:

> Hi,
> 
> Commit 1c3c5ea "sched/core: Enable might_sleep() and smp_processor_id()
> checks early" seems to have uncovered an issue with amd-iommu/x2apic.
> 
> Starting with that commit the following warning started to show up on AMD
> systems during boot:
 
> [    0.160000] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747 

> [    0.160000]  mutex_lock_nested+0x1b/0x20 
> [    0.160000]  register_syscore_ops+0x1d/0x70 
> [    0.160000]  state_next+0x119/0x910 
> [    0.160000]  iommu_go_to_state+0x29/0x30 
> [    0.160000]  amd_iommu_enable+0x13/0x23 
> [    0.160000]  irq_remapping_enable+0x1b/0x39 
> [    0.160000]  enable_IR_x2apic+0x91/0x196 
> [    0.160000]  default_setup_apic_routing+0x16/0x6e 
> [    0.160000]  native_smp_prepare_cpus+0x257/0x2d5 

Yep, that's clearly stupid. The completely untested patch below should cure
the issue.

Thanks,

	tglx

8<---------------	
	
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2440,7 +2440,6 @@ static int __init state_next(void)
 		break;
 	case IOMMU_ACPI_FINISHED:
 		early_enable_iommus();
-		register_syscore_ops(&amd_iommu_syscore_ops);
 		x86_platform.iommu_shutdown = disable_iommus;
 		init_state = IOMMU_ENABLED;
 		break;
@@ -2559,6 +2558,8 @@ static int __init amd_iommu_init(void)
 			for_each_iommu(iommu)
 				iommu_flush_all_caches(iommu);
 		}
+	} else {
+		register_syscore_ops(&amd_iommu_syscore_ops);
 	}
 
 	return ret;


* [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code
  2017-07-26 10:42   ` Thomas Gleixner
@ 2017-07-26 12:26     ` Joerg Roedel
  2017-07-26 13:23       ` Thomas Gleixner
  2017-07-26 13:25       ` Artem Savkov
  0 siblings, 2 replies; 6+ messages in thread
From: Joerg Roedel @ 2017-07-26 12:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Artem Savkov, x86, iommu, linux-kernel

Hi Artem, Thomas,

On Wed, Jul 26, 2017 at 12:42:49PM +0200, Thomas Gleixner wrote:
> On Tue, 25 Jul 2017, Artem Savkov wrote:
> 
> > Hi,
> > 
> > Commit 1c3c5ea "sched/core: Enable might_sleep() and smp_processor_id()
> > checks early" seems to have uncovered an issue with amd-iommu/x2apic.
> > 
> > Starting with that commit the following warning started to show up on AMD
> > systems during boot:
>  
> > [    0.160000] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747 
> 
> > [    0.160000]  mutex_lock_nested+0x1b/0x20 
> > [    0.160000]  register_syscore_ops+0x1d/0x70 
> > [    0.160000]  state_next+0x119/0x910 
> > [    0.160000]  iommu_go_to_state+0x29/0x30 
> > [    0.160000]  amd_iommu_enable+0x13/0x23 
> > [    0.160000]  irq_remapping_enable+0x1b/0x39 
> > [    0.160000]  enable_IR_x2apic+0x91/0x196 
> > [    0.160000]  default_setup_apic_routing+0x16/0x6e 
> > [    0.160000]  native_smp_prepare_cpus+0x257/0x2d5

Thanks for the report!

> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -2440,7 +2440,6 @@ static int __init state_next(void)
>  		break;
>  	case IOMMU_ACPI_FINISHED:
>  		early_enable_iommus();
> -		register_syscore_ops(&amd_iommu_syscore_ops);
>  		x86_platform.iommu_shutdown = disable_iommus;
>  		init_state = IOMMU_ENABLED;
>  		break;
> @@ -2559,6 +2558,8 @@ static int __init amd_iommu_init(void)
>  			for_each_iommu(iommu)
>  				iommu_flush_all_caches(iommu);
>  		}
> +	} else {
> +		register_syscore_ops(&amd_iommu_syscore_ops);
>  	}
>  
>  	return ret;

Yes, that should fix it, but I think it's better to just move the
register_syscore_ops() call to a later initialization step, like in the
patch below. I tested it and will queue it to my iommu/fixes branch.

From 461242d7211c7777901b6ccdf349cc89235bd5da Mon Sep 17 00:00:00 2001
From: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
Date: Wed, 26 Jul 2017 14:17:55 +0200
Subject: [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization
 code

The register_syscore_ops() function takes a mutex and might
sleep. In the IOMMU initialization code it is already invoked
during irq-remapping setup, where irqs are disabled.

This causes a schedule-while-atomic bug:

 BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
 in_atomic(): 0, irqs_disabled(): 1, pid: 1, name: swapper/0
 no locks held by swapper/0/1.
 irq event stamp: 304
 hardirqs last  enabled at (303): [<ffffffff818a87b6>] _raw_spin_unlock_irqrestore+0x36/0x60
 hardirqs last disabled at (304): [<ffffffff8235d440>] enable_IR_x2apic+0x79/0x196
 softirqs last  enabled at (36): [<ffffffff818ae75f>] __do_softirq+0x35f/0x4ec
 softirqs last disabled at (31): [<ffffffff810c1955>] irq_exit+0x105/0x120
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc2.1.el7a.test.x86_64.debug #1
 Hardware name:          PowerEdge C6145 /040N24, BIOS 3.5.0 10/28/2014
 Call Trace:
  dump_stack+0x85/0xca
  ___might_sleep+0x22a/0x260
  __might_sleep+0x4a/0x80
  __mutex_lock+0x58/0x960
  ? iommu_completion_wait.part.17+0xb5/0x160
  ? register_syscore_ops+0x1d/0x70
  ? iommu_flush_all_caches+0x120/0x150
  mutex_lock_nested+0x1b/0x20
  register_syscore_ops+0x1d/0x70
  state_next+0x119/0x910
  iommu_go_to_state+0x29/0x30
  amd_iommu_enable+0x13/0x23

Fix it by moving the register_syscore_ops() call to the next
initialization step, which runs with irqs enabled.

Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
---
 drivers/iommu/amd_iommu_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5cc597b383c7..372303700566 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2440,11 +2440,11 @@ static int __init state_next(void)
 		break;
 	case IOMMU_ACPI_FINISHED:
 		early_enable_iommus();
-		register_syscore_ops(&amd_iommu_syscore_ops);
 		x86_platform.iommu_shutdown = disable_iommus;
 		init_state = IOMMU_ENABLED;
 		break;
 	case IOMMU_ENABLED:
+		register_syscore_ops(&amd_iommu_syscore_ops);
 		ret = amd_iommu_init_pci();
 		init_state = ret ? IOMMU_INIT_ERROR : IOMMU_PCI_INIT;
 		enable_iommus_v2();
-- 
2.13.1
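The patch moves the call one step later in the state_next() state machine. The sketch below is a hypothetical userspace model, not driver code: it keeps only four of the driver's states, models irqs_disabled() with a flag, and counts a might_sleep() splat whenever the stand-in for register_syscore_ops() runs with that flag set. It replays the two call sites implied by the trace and the commit message, assuming that amd_iommu_enable() advances to IOMMU_ENABLED with interrupts disabled and that amd_iommu_init() later finishes initialization with interrupts enabled.

```c
#include <stdbool.h>

/* Reduced, hypothetical subset of the driver's init states. */
enum model_init_state {
	M_ACPI_FINISHED,
	M_ENABLED,
	M_PCI_INIT,
	M_INITIALIZED,
};

struct model_boot {
	enum model_init_state state;
	bool irqs_disabled;       /* models irqs_disabled() */
	bool register_in_enabled; /* true: patched placement, false: original */
	bool syscore_registered;
	int might_sleep_bugs;     /* splats the debug check would print */
};

/* Stand-in for register_syscore_ops(): it takes a mutex, so it may sleep. */
static void model_register_syscore_ops(struct model_boot *b)
{
	if (b->irqs_disabled)
		b->might_sleep_bugs++;
	b->syscore_registered = true;
}

/* One transition, loosely following state_next(). */
static void model_state_next(struct model_boot *b)
{
	switch (b->state) {
	case M_ACPI_FINISHED:
		/* early_enable_iommus() would run here */
		if (!b->register_in_enabled)
			model_register_syscore_ops(b); /* original placement */
		b->state = M_ENABLED;
		break;
	case M_ENABLED:
		if (b->register_in_enabled)
			model_register_syscore_ops(b); /* patched placement */
		b->state = M_PCI_INIT; /* amd_iommu_init_pci() would run here */
		break;
	case M_PCI_INIT:
		b->state = M_INITIALIZED;
		break;
	case M_INITIALIZED:
		break;
	}
}

/* Simplified iommu_go_to_state(): step until the target is reached. */
static void model_go_to_state(struct model_boot *b, enum model_init_state target)
{
	while (b->state < target)
		model_state_next(b);
}

/* Replays both call sites; returns the splat count, or -1 if the
 * ops were never registered. */
static int model_replay_boot(bool register_in_enabled)
{
	struct model_boot b = { .state = M_ACPI_FINISHED,
				.register_in_enabled = register_in_enabled };

	/* enable_IR_x2apic() -> amd_iommu_enable(): interrupts are off */
	b.irqs_disabled = true;
	model_go_to_state(&b, M_ENABLED);

	/* later amd_iommu_init(): interrupts are enabled again */
	b.irqs_disabled = false;
	model_go_to_state(&b, M_INITIALIZED);

	return b.syscore_registered ? b.might_sleep_bugs : -1;
}
```

In this model, model_replay_boot(false) (original placement) reports one splat and model_replay_boot(true) (patched placement) reports none. Thomas's variant instead registers from amd_iommu_init() only when initialization succeeded; one visible difference is that with the patch above the ops stay registered even if a later step such as amd_iommu_init_pci() fails.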


* Re: [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code
  2017-07-26 12:26     ` [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code Joerg Roedel
@ 2017-07-26 13:23       ` Thomas Gleixner
  2017-07-26 13:25       ` Artem Savkov
  1 sibling, 0 replies; 6+ messages in thread
From: Thomas Gleixner @ 2017-07-26 13:23 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Artem Savkov, iommu, x86, linux-kernel

On Wed, 26 Jul 2017, Joerg Roedel wrote:
> Yes, that should fix it, but I think it's better to just move the
> register_syscore_ops() call to a later initialization step, like in the
> patch below. I tested it and will queue it to my iommu/fixes branch.

Fair enough. Acked-by-me.


* Re: [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code
  2017-07-26 12:26     ` [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code Joerg Roedel
  2017-07-26 13:23       ` Thomas Gleixner
@ 2017-07-26 13:25       ` Artem Savkov
  2017-07-26 13:41         ` Joerg Roedel
  1 sibling, 1 reply; 6+ messages in thread
From: Artem Savkov @ 2017-07-26 13:25 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Thomas Gleixner, iommu, x86, linux-kernel

On Wed, Jul 26, 2017 at 02:26:14PM +0200, Joerg Roedel wrote:
> Hi Artem, Thomas,
> 
> On Wed, Jul 26, 2017 at 12:42:49PM +0200, Thomas Gleixner wrote:
> > On Tue, 25 Jul 2017, Artem Savkov wrote:
> > 
> > > Hi,
> > > 
> > > Commit 1c3c5ea "sched/core: Enable might_sleep() and smp_processor_id()
> > > checks early" seems to have uncovered an issue with amd-iommu/x2apic.
> > > 
> > > Starting with that commit the following warning started to show up on AMD
> > > systems during boot:
> >  
> > > [    0.160000] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747 
> > 
> > > [    0.160000]  mutex_lock_nested+0x1b/0x20 
> > > [    0.160000]  register_syscore_ops+0x1d/0x70 
> > > [    0.160000]  state_next+0x119/0x910 
> > > [    0.160000]  iommu_go_to_state+0x29/0x30 
> > > [    0.160000]  amd_iommu_enable+0x13/0x23 
> > > [    0.160000]  irq_remapping_enable+0x1b/0x39 
> > > [    0.160000]  enable_IR_x2apic+0x91/0x196 
> > > [    0.160000]  default_setup_apic_routing+0x16/0x6e 
> > > [    0.160000]  native_smp_prepare_cpus+0x257/0x2d5
> 
> Thanks for the report!
> 
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -2440,7 +2440,6 @@ static int __init state_next(void)
> >  		break;
> >  	case IOMMU_ACPI_FINISHED:
> >  		early_enable_iommus();
> > -		register_syscore_ops(&amd_iommu_syscore_ops);
> >  		x86_platform.iommu_shutdown = disable_iommus;
> >  		init_state = IOMMU_ENABLED;
> >  		break;
> > @@ -2559,6 +2558,8 @@ static int __init amd_iommu_init(void)
> >  			for_each_iommu(iommu)
> >  				iommu_flush_all_caches(iommu);
> >  		}
> > +	} else {
> > +		register_syscore_ops(&amd_iommu_syscore_ops);
> >  	}
> >  
> >  	return ret;
> 
> Yes, that should fix it, but I think it's better to just move the
> register_syscore_ops() call to a later initialization step, like in the
> patch below. I tested it and will queue it to my iommu/fixes branch.

Checked it as well just in case, didn't see any issues. Thank you.

Reported-and-tested-by: Artem Savkov <asavkov@redhat.com>

-- 
Regards,
  Artem


* Re: [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code
  2017-07-26 13:25       ` Artem Savkov
@ 2017-07-26 13:41         ` Joerg Roedel
  0 siblings, 0 replies; 6+ messages in thread
From: Joerg Roedel @ 2017-07-26 13:41 UTC (permalink / raw)
  To: Artem Savkov; +Cc: Thomas Gleixner, iommu, x86, linux-kernel

On Wed, Jul 26, 2017 at 03:25:05PM +0200, Artem Savkov wrote:
> On Wed, Jul 26, 2017 at 02:26:14PM +0200, Joerg Roedel wrote:
> > Yes, that should fix it, but I think it's better to just move the
> > register_syscore_ops() call to a later initialization step, like in the
> > patch below. I tested it and will queue it to my iommu/fixes branch.
> 
> Checked it as well just in case, didn't see any issues. Thank you.
> 
> Reported-and-tested-by: Artem Savkov <asavkov@redhat.com>

Thanks for testing it! I added yours and Thomas' tags and applied the
patch to my tree. It should go upstream this week.


	Joerg


end of thread

Thread overview: 6+ messages
2017-07-25 13:56 amd-iommu/x2apic: sleeping function called from invalid context Artem Savkov
2017-07-26 10:42   ` Thomas Gleixner
2017-07-26 12:26     ` [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code Joerg Roedel
2017-07-26 13:23       ` Thomas Gleixner
2017-07-26 13:25       ` Artem Savkov
2017-07-26 13:41         ` Joerg Roedel
