Linux IOMMU Development
* dma_ops_domain_alloc causes kernel 4.1.0-next-20150626+ panic
@ 2015-06-29 17:44 George Wang
       [not found] ` <CAPBX1x+zagVVYebbXU0M7VkEaDkzvqBGnkt6PW_N42fRQRQ9Gg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: George Wang @ 2015-06-29 17:44 UTC (permalink / raw)
  To: joro-zLv9SwRftAIdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi,
  I am trying to run some tests on kernel 4.1.0-next-20150626+, but it
panics in amd_iommu_attach_dev. After some digging in amd_iommu.c,
I found the suspect code:

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index d3e5e9a..4f6da17 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1882,6 +1882,7 @@ static struct dma_ops_domain *dma_ops_domain_alloc(void)
                return NULL;

        spin_lock_init(&dma_dom->domain.lock);
+       mutex_init(&dma_dom->domain.api_lock);

When I initialize the api_lock, I get past this panic but then hit another problem.

Thanks,

George


* Re: dma_ops_domain_alloc causes kernel 4.1.0-next-20150626+ panic
       [not found] ` <CAPBX1x+zagVVYebbXU0M7VkEaDkzvqBGnkt6PW_N42fRQRQ9Gg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-06-29 19:34   ` Joerg Roedel
       [not found]     ` <20150629193402.GM18569-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Joerg Roedel @ 2015-06-29 19:34 UTC (permalink / raw)
  To: George Wang; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Jun 30, 2015 at 01:44:34AM +0800, George Wang wrote:
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index d3e5e9a..4f6da17 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -1882,6 +1882,7 @@ static struct dma_ops_domain *dma_ops_domain_alloc(void)
>                 return NULL;
> 
>         spin_lock_init(&dma_dom->domain.lock);
> +       mutex_init(&dma_dom->domain.api_lock);
> 
> When I initialize the api_lock, I get past this panic but then hit another problem.

How do you trigger this? The DMA-API domains are not used via the
IOMMU-API yet, so initializing the api_lock for them shouldn't matter.


	Joerg


* Re: dma_ops_domain_alloc causes kernel 4.1.0-next-20150626+ panic
       [not found]     ` <20150629193402.GM18569-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
@ 2015-06-30  3:55       ` George Wang
       [not found]         ` <CAPBX1xLA_GDeoi9wq-9A7njwzL3NBqJYYT_PqhwEzBAg=9=8kA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: George Wang @ 2015-06-30  3:55 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Jun 30, 2015 at 3:34 AM, Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote:
> On Tue, Jun 30, 2015 at 01:44:34AM +0800, George Wang wrote:
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index d3e5e9a..4f6da17 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -1882,6 +1882,7 @@ static struct dma_ops_domain *dma_ops_domain_alloc(void)
>>                 return NULL;
>>
>>         spin_lock_init(&dma_dom->domain.lock);
>> +       mutex_init(&dma_dom->domain.api_lock);
>>
>> When I initialize the api_lock, I get past this panic but then hit another problem.
>
> How do you trigger this? The DMA-API domains are not used via the
> IOMMU-API yet, so initializing the api_lock for them shouldn't matter.
>
>
>         Joerg
>

I don't know what triggers it; I just built the kernel, installed it,
and it panicked. The call trace is below:

[   11.687392] BUG: unable to handle kernel NULL pointer dereference
at           (null)
[   11.690196] IP: [<ffffffff813326ef>] __list_add+0x1f/0xc0
[   11.692026] PGD 0
[   11.692794] Oops: 0000 [#1] SMP
[   11.693939] Modules linked in:
[   11.694997] CPU: 11 PID: 1 Comm: swapper/0 Not tainted
4.1.0-next-20150626+ #6
[   11.697415] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 02/06/2014
[   11.699683] task: ffff880835888000 ti: ffff880236918000 task.ti:
ffff880236918000
[   11.702281] RIP: 0010:[<ffffffff813326ef>]  [<ffffffff813326ef>]
__list_add+0x1f/0xc0
[   11.704935] RSP: 0018:ffff88023691b968  EFLAGS: 00010246
[   11.706702] RAX: 00000000ffffffff RBX: ffff88023691b998 RCX: ffff880835888000
[   11.709199] RDX: ffff880634f58468 RSI: 0000000000000000 RDI: ffff88023691b998
[   11.711597] RBP: ffff88023691b988 R08: 0000000000000000 R09: ffff88023691bab8
[   11.714022] R10: 00000000000f0000 R11: ffff880000000000 R12: ffff880634f58468
[   11.716415] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff880634f58468
[   11.718909] FS:  0000000000000000(0000) GS:ffff880637d40000(0000)
knlGS:0000000000000000
[   11.721575] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   11.723541] CR2: 0000000000000000 CR3: 00000000019d4000 CR4: 00000000000406e0
[   11.725960] Stack:
[   11.726632]  0000000000001000 ffff880634f58460 ffff880634f58464
ffff880835888000
[   11.729440]  ffff88023691b9e8 ffffffff8168fde1 ffff88023691b9f8
ffffffff81318798
[   11.732131]  000000000000a1ff 000000008d3d0bb4 00002fdd3691b9e8
ffff880634f58460
[   11.734774] Call Trace:
[   11.735635]  [<ffffffff8168fde1>] __mutex_lock_slowpath+0x91/0x120
[   11.737676]  [<ffffffff81318798>] ? ida_simple_get+0x98/0x100
[   11.739682]  [<ffffffff8168fe93>] mutex_lock+0x23/0x37
[   11.741407]  [<ffffffff8143513a>] amd_iommu_map+0x4a/0x1b0
[   11.743293]  [<ffffffff8143081a>] iommu_map+0xfa/0x200
[   11.745025]  [<ffffffff81431587>] iommu_group_add_device+0x327/0x390
[   11.747184]  [<ffffffff814316fb>] iommu_group_get_for_dev+0x10b/0x1f0
[   11.849564]  [<ffffffff81436ac6>] amd_iommu_add_device+0x1b6/0x580
[   11.851645]  [<ffffffff8168d891>] ? __schedule+0xe1/0x890
[   11.85350883]  [<ffffffff814304db>] add_iommu_group+0x2b/0x50
[   11.857765]  [<ffffffff8144b40c>] bus_for_each_dev+0x6c/0xc0
[   11.859752]  [<ffffffff814311b4>] ? bus_set_iommu+0x54/0x100
[   11.861698]  [<ffffffff8143121e>] bus_set_iommu+0xbe/0x100
[   11.863485]  [<ffffffff81b77e46>] amd_iommu_init_api+0x17/0x19
[   11.865473]  [<ffffffff81b7993c>] state_next+0x57e/0x715
[   11.867212]  [<ffffffff81b37eec>] ? memblock_find_dma_reserve+0x177/0x177
[   11.869577]  [<ffffffff81b79aed>] iommu_go_to_state+0x1a/0x2d
[   11.871577]  [<ffffffff81b79b72>] amd_iommu_init+0x15/0xfc
[   11.873425]  [<ffffffff81b37eff>] pci_iommu_init+0x13/0x3e
[   11.875259]  [<ffffffff8100213d>] do_one_initcall+0xcd/0x1f0
[   11.877162]  [<ffffffff81098d00>] ? parse_args+0x220/0x470
[   11.879122]  [<ffffffff810bd548>] ? __wake_up+0x48/0x60
[   11.880872]  [<ffffffff81b2e349>] kernel_inia5/0x249
[   12.282919]  [<ffffffff81b2d9dd>] ? initcall_blacklist+0xb6/0xb6
[   12.285018]  [<ffffffff8167b9a0>] ? rest_init+0x80/0x80
[   12.286803]  [<ffffffff8167b9ae>] kernel_init+0xe/0xe0
[   12.288621]  [<ffffffff81691f5f>] ret_from_fork+0x3f/0x70
[   12.290761]  [<ffffffff8167b9a0>] ? rest_init+0x80/0x80
[   12.292516] Code: ff ff ff e9 31 ff ff ff 0f 1f 40 00 55 48 89 e5
41 55 49 89 f5 41 54 49 89 d4 53 48 89 fb 48 83 ec 08 4c 8b 42 08 49
39 f0 75 2e <4d> 8b 45 00 4d 39 c4 75 6c 4c 39 e3 74 42 4c 39 eb 74 3d
49 89
[   12.301447] RIP  [<ffffffff813326ef>] __list_add+0x1f/0xc0
[   12.303331]  RSP <ffff88023691b968>
[   12.304516] CR2: 0000000000000000
[   12.305657] ---[ end trace 20a8e3deaab91b75 ]---

I think the call path is
add_iommu_group->amd_iommu_add_device->init_iommu_group->
iommu_group_get_for_dev->iommu_group_add_device->
iommu_group_create_direct_mappings->iommu_map->amd_iommu_map->
mutex_lock(&domain->api_lock)

but the domain is initialized via
amd_iommu_domain_alloc->dma_ops_domain_alloc, which does not initialize
the api_lock of the protection_domain, so we get the panic.


Thanks,

Xu
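
To illustrate the failure mode described above, here is a minimal
userspace sketch (not kernel code). It assumes the domain structure is
zero-allocated and that the mutex keeps its waiters on a doubly linked
list, which is consistent with the __mutex_lock_slowpath -> __list_add
frames in the trace. The names toy_mutex, toy_mutex_init,
toy_list_add_tail and toy_mutex_lock are made up for this example.
Instead of dereferencing the NULL prev pointer, the sketch reports it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Toy mutex: count == 1 means unlocked, anything else means contended. */
struct toy_mutex {
	int count;
	struct list_head wait_list;	/* waiters are queued here */
};

/* What an init call has to do: mark unlocked, make the list point at itself. */
static void toy_mutex_init(struct toy_mutex *m)
{
	assert(m != NULL);
	m->count = 1;
	m->wait_list.next = &m->wait_list;
	m->wait_list.prev = &m->wait_list;
}

/*
 * Like list_add_tail(), but returns -1 where a real implementation
 * would dereference a NULL head->prev.
 */
static int toy_list_add_tail(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head->prev;

	if (prev == NULL)
		return -1;	/* uninitialized list head */

	new->next = head;
	new->prev = prev;
	prev->next = new;
	head->prev = new;
	return 0;
}

/* Returns 0 if the lock was taken or the waiter was queued, -1 on a bad list. */
static int toy_mutex_lock(struct toy_mutex *m, struct list_head *waiter)
{
	if (m->count == 1) {		/* fast path: lock was free */
		m->count = 0;
		return 0;
	}
	/* slow path: zeroed memory lands here because count == 0 */
	return toy_list_add_tail(waiter, &m->wait_list);
}
```

Calling toy_mutex_lock() on a zero-filled toy_mutex returns -1, which
corresponds to the point where the kernel oopsed. After
toy_mutex_init(), the same call takes the fast path and returns 0.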


* Re: dma_ops_domain_alloc causes kernel 4.1.0-next-20150626+ panic
       [not found]         ` <CAPBX1xLA_GDeoi9wq-9A7njwzL3NBqJYYT_PqhwEzBAg=9=8kA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-06-30  7:44           ` Joerg Roedel
       [not found]             ` <20150630074454.GO18569-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Joerg Roedel @ 2015-06-30  7:44 UTC (permalink / raw)
  To: George Wang; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Jun 30, 2015 at 11:55:24AM +0800, George Wang wrote:
> [   11.734774] Call Trace:
> [   11.735635]  [<ffffffff8168fde1>] __mutex_lock_slowpath+0x91/0x120
> [   11.737676]  [<ffffffff81318798>] ? ida_simple_get+0x98/0x100
> [   11.739682]  [<ffffffff8168fe93>] mutex_lock+0x23/0x37
> [   11.741407]  [<ffffffff8143513a>] amd_iommu_map+0x4a/0x1b0
> [   11.743293]  [<ffffffff8143081a>] iommu_map+0xfa/0x200
> [   11.745025]  [<ffffffff81431587>] iommu_group_add_device+0x327/0x390
> [   11.747184]  [<ffffffff814316fb>] iommu_group_get_for_dev+0x10b/0x1f0
> [   11.849564]  [<ffffffff81436ac6>] amd_iommu_add_device+0x1b6/0x580

Ah, your AMD IOMMU system probably has unity mappings defined in its
ACPI table. I don't have systems with unity mappings defined, so I
couldn't test this. What system are you running this test on (system or
mainboard vendor and type)?

Anyway, here is a patch that should fix this issue for you, can you
please test it?

From a83e7544c3bc1bd843478e0809cc9781e844fd08 Mon Sep 17 00:00:00 2001
From: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
Date: Tue, 30 Jun 2015 08:56:11 +0200
Subject: [PATCH] iommu/amd: Introduce protection_domain_init() function

This function contains the common parts between the
initialization of dma_ops_domains and usual protection
domains. This also fixes a long-standing bug which was
uncovered by recent changes, in which the api_lock was not
initialized for dma_ops_domains.

Reported-by: George Wang <xuw2015-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
---
 drivers/iommu/amd_iommu.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index c5677ed..cedbf00 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -116,6 +116,7 @@ struct kmem_cache *amd_iommu_irq_cache;
 
 static void update_domain(struct protection_domain *domain);
 static int alloc_passthrough_domain(void);
+static int protection_domain_init(struct protection_domain *domain);
 
 /****************************************************************************
  *
@@ -1880,12 +1881,9 @@ static struct dma_ops_domain *dma_ops_domain_alloc(void)
 	if (!dma_dom)
 		return NULL;
 
-	spin_lock_init(&dma_dom->domain.lock);
-
-	dma_dom->domain.id = domain_id_alloc();
-	if (dma_dom->domain.id == 0)
+	if (protection_domain_init(&dma_dom->domain))
 		goto free_dma_dom;
-	INIT_LIST_HEAD(&dma_dom->domain.dev_list);
+
 	dma_dom->domain.mode = PAGE_MODE_2_LEVEL;
 	dma_dom->domain.pt_root = (void *)get_zeroed_page(GFP_KERNEL);
 	dma_dom->domain.flags = PD_DMA_OPS_MASK;
@@ -2915,6 +2913,18 @@ static void protection_domain_free(struct protection_domain *domain)
 	kfree(domain);
 }
 
+static int protection_domain_init(struct protection_domain *domain)
+{
+	spin_lock_init(&domain->lock);
+	mutex_init(&domain->api_lock);
+	domain->id = domain_id_alloc();
+	if (!domain->id)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&domain->dev_list);
+
+	return 0;
+}
+
 static struct protection_domain *protection_domain_alloc(void)
 {
 	struct protection_domain *domain;
@@ -2923,12 +2933,8 @@ static struct protection_domain *protection_domain_alloc(void)
 	if (!domain)
 		return NULL;
 
-	spin_lock_init(&domain->lock);
-	mutex_init(&domain->api_lock);
-	domain->id = domain_id_alloc();
-	if (!domain->id)
+	if (protection_domain_init(domain))
 		goto out_err;
-	INIT_LIST_HEAD(&domain->dev_list);
 
 	add_domain_to_list(domain);
 
-- 
1.8.4.5
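
For context on the unity mappings mentioned at the top of this message:
these are address ranges that firmware asks to be mapped 1:1, so that
the I/O virtual address equals the physical address. The call path in
the report goes through iommu_group_create_direct_mappings into
iommu_map, which is why the api_lock is taken on a dma_ops domain only
on machines that define such ranges. The following is a rough userspace
sketch of that idea, not the kernel implementation. The type
toy_unity_range, the callback type toy_map_fn, the helper
toy_count_map and the 4 KiB page size are assumptions made for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL	/* assumed page size for this sketch */

/* Hypothetical description of one unity-mapped range; end is exclusive. */
struct toy_unity_range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical stand-in for a map call: (domain, iova, paddr, size). */
typedef int (*toy_map_fn)(void *domain, uint64_t iova, uint64_t paddr,
			  size_t size);

/* Map every page of the range with iova == paddr (identity mapping). */
static int toy_create_direct_mappings(void *domain,
				      const struct toy_unity_range *r,
				      toy_map_fn map)
{
	uint64_t start = r->start & ~(TOY_PAGE_SIZE - 1);
	uint64_t end = (r->end + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
	uint64_t addr;

	assert(map != NULL);
	for (addr = start; addr < end; addr += TOY_PAGE_SIZE) {
		int ret = map(domain, addr, addr, TOY_PAGE_SIZE);

		if (ret)
			return ret;	/* stop at the first failure */
	}
	return 0;
}

/* Example callback: counts pages and rejects non-identity requests. */
static int toy_count_map(void *domain, uint64_t iova, uint64_t paddr,
			 size_t size)
{
	unsigned int *pages = domain;

	(void)size;
	if (iova != paddr)
		return -1;
	(*pages)++;
	return 0;
}
```

In this sketch the callback is the place where a real driver would take
its per-domain lock, so any domain that can receive direct mappings
needs that lock initialized before the first device is added.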


* Re: dma_ops_domain_alloc causes kernel 4.1.0-next-20150626+ panic
       [not found]             ` <20150630074454.GO18569-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
@ 2015-07-01  5:20               ` George Wang
       [not found]                 ` <CAPBX1x+OK8EMwDsripY71jF44d73Qv0jBxyM+jJgPMzNVPTyaw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: George Wang @ 2015-07-01  5:20 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Jun 30, 2015 at 3:44 PM, Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote:
> On Tue, Jun 30, 2015 at 11:55:24AM +0800, George Wang wrote:
>> [   11.734774] Call Trace:
>> [   11.735635]  [<ffffffff8168fde1>] __mutex_lock_slowpath+0x91/0x120
>> [   11.737676]  [<ffffffff81318798>] ? ida_simple_get+0x98/0x100
>> [   11.739682]  [<ffffffff8168fe93>] mutex_lock+0x23/0x37
>> [   11.741407]  [<ffffffff8143513a>] amd_iommu_map+0x4a/0x1b0
>> [   11.743293]  [<ffffffff8143081a>] iommu_map+0xfa/0x200
>> [   11.745025]  [<ffffffff81431587>] iommu_group_add_device+0x327/0x390
>> [   11.747184]  [<ffffffff814316fb>] iommu_group_get_for_dev+0x10b/0x1f0
>> [   11.849564]  [<ffffffff81436ac6>] amd_iommu_add_device+0x1b6/0x580
>
> Ah, your AMD IOMMU system probably has unity mappings defined in its
> ACPI table. I don't have systems with unity mappings defined, so I
> couldn't test this. What system are you running this test on (system or
> mainboard vendor and type)?

I am not familiar with unity mappings; I will read up on them.
I ran lspci and dmidecode to get some information about my machine. I am
not sure whether it is useful to you.
If you need more information, please let me know.

[root@hp-dl385pg8-09 linux-next]# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890
Northbridge only dual slot (2x16) PCI-e GFX Hydra part (rev 02)
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD/ATI] RD990 I/O Memory
Management Unit (IOMMU)
00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI
to PCI bridge (PCI express gpp port B)
00:0a.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI
to PCI bridge (external gfx1 port A)
00:0c.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890S PCI
Express bridge for GPP2 port 1
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
--snip--

[root@hp-dl385pg8-09 linux-next]# dmidecode|grep -A16 "System Information"
System Information
    Manufacturer: HP
    Product Name: ProLiant DL385p Gen8
    Version: Not Specified
    Serial Number: 6CU428FNLL
    UUID: 32333536-3330-4336-5534-3238464E4C4C
    Wake-up Type: Power Switch
    SKU Number: 653203-B21
    Family: ProLiant

>
> Anyway, here is a patch that should fix this issue for you, can you
> please test it?

Thanks for your work. I applied this patch, and it works well for me.

Thanks,

George


* Re: dma_ops_domain_alloc causes kernel 4.1.0-next-20150626+ panic
       [not found]                 ` <CAPBX1x+OK8EMwDsripY71jF44d73Qv0jBxyM+jJgPMzNVPTyaw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-07-01  6:44                   ` Joerg Roedel
  0 siblings, 0 replies; 6+ messages in thread
From: Joerg Roedel @ 2015-07-01  6:44 UTC (permalink / raw)
  To: George Wang; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi George,

On Wed, Jul 01, 2015 at 01:20:59PM +0800, George Wang wrote:
> [root@hp-dl385pg8-09 linux-next]# dmidecode|grep -A16 "System Information"
> System Information
>     Manufacturer: HP
>     Product Name: ProLiant DL385p Gen8
>     Version: Not Specified
>     Serial Number: 6CU428FNLL
>     UUID: 32333536-3330-4336-5534-3238464E4C4C
>     Wake-up Type: Power Switch
>     SKU Number: 653203-B21
>     Family: ProLiant

Thanks for that info, so it's HP hardware that has it.

> Thanks for your work. I applied this patch, and it works well for me.

Thanks for testing, I will send the fix upstream asap.


	Joerg

