public inbox for linux-kernel@vger.kernel.org
* git-latest: kernel oops in IOMMU setup
@ 2009-01-08 20:05 Dirk Hohndel
  2009-01-08 21:41 ` Grant Grundler
  0 siblings, 1 reply; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-08 20:05 UTC (permalink / raw)
  To: Ingo Molnar, iommu, linux-pci, linux-kernel, Jesse Barnes,
	Arjan van de Ven


latest git from Linus. On a Thinkpad x200s with VT-d enabled (if I
disable VT-d, this of course goes away). 

The oops happens very early during boot in device_to_iommu (called
from domain_context_mapping_one).

Looking at the code dump and the disassembled function here's where
the error happens:

static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
{
        struct dmar_drhd_unit *drhd = NULL;
        int i;

        for_each_drhd_unit(drhd) {
                if (drhd->ignored)
                        continue;

                for (i = 0; i < drhd->devices_cnt; i++)
                        if (drhd->devices[i]->bus->number == bus &&
--> drhd->devices[0] is NULL
                            drhd->devices[i]->devfn == devfn)
                                return drhd->iommu;


Given how early this happens it's a little hard to provide logs, etc. I
literally used boot_delay=100 and wrote things down by hand (forgot my
digital camera) and then added printk's to verify.

please let me know what other data I should collect.

The system ran fine with the 2.6.28 release kernel.

/D

-- 
Dirk Hohndel
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-latest: kernel oops in IOMMU setup
  2009-01-08 20:05 git-latest: kernel oops in IOMMU setup Dirk Hohndel
@ 2009-01-08 21:41 ` Grant Grundler
  2009-01-08 21:56   ` Dirk Hohndel
  2009-01-09  0:58   ` Han, Weidong
  0 siblings, 2 replies; 14+ messages in thread
From: Grant Grundler @ 2009-01-08 21:41 UTC (permalink / raw)
  To: Dirk Hohndel
  Cc: Ingo Molnar, iommu, linux-pci, linux-kernel, Jesse Barnes,
	Arjan van de Ven

On Thu, Jan 08, 2009 at 12:05:38PM -0800, Dirk Hohndel wrote:
> 
> latest git from Linus. On a Thinkpad x200s with VT-d enabled (if I
> disable VT-d, this of course goes away). 
> 
> The oops happens very early during boot in device_to_iommu (called
> from domain_context_mapping_one).
> 
> Looking at the code dump and the disassembled function here's where
> the error happens:
> 
> static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
> {
>         struct dmar_drhd_unit *drhd = NULL;
>         int i;
> 
>         for_each_drhd_unit(drhd) {
>                 if (drhd->ignored)
>                         continue;
> 
>                 for (i = 0; i < drhd->devices_cnt; i++)
>                         if (drhd->devices[i]->bus->number == bus &&
> --> drhd->devices[0] is NULL
>                             drhd->devices[i]->devfn == devfn)
>                                 return drhd->iommu;
> 
> 
> Given how early this happens it's a little hard to provide logs, etc. I
> literally used delay_boot=100 and wrote things down by hand (forgot my
> digital camera) and then added printk's to verify).
> 
> please let me know what other data I should collect.

If you can, a back trace. Basically just need to know which caller
is tripping over this. But there can't be that many callers and they
are all in this file:
0 intel-iommu.c device_to_iommu            431 static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
1 intel-iommu.c domain_context_mapping_on 1471 iommu = device_to_iommu(bus, devfn);
2 intel-iommu.c domain_context_mapped     1593 iommu = device_to_iommu(pdev->bus->number, pdev->devfn);
3 intel-iommu.c domain_remove_dev_info    1684 iommu = device_to_iommu(info->bus, info->devfn);
4 intel-iommu.c vm_domain_remove_one_dev_ 2773 iommu = device_to_iommu(pdev->bus->number, pdev->devfn);
5 intel-iommu.c vm_domain_remove_one_dev_ 2803 if (device_to_iommu(info->bus, info->devfn) == iommu)
6 intel-iommu.c vm_domain_remove_all_dev_ 2836 iommu = device_to_iommu(info->bus, info->devfn);
7 intel-iommu.c intel_iommu_attach_device 3023 iommu = device_to_iommu(pdev->bus->number, pdev->devfn);

so it should be possible to figure out which one is called
before the dev is setup. It's unlikely to be anything with
"remove" in the name. :)

My guess is it's intel_iommu_attach_device being called "too early".

hth,
grant



> 
> The system ran fine with the 2.6.28 release kernel.
> 
> /D
> 
> -- 
> Dirk Hohndel
> Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-latest: kernel oops in IOMMU setup
  2009-01-08 21:41 ` Grant Grundler
@ 2009-01-08 21:56   ` Dirk Hohndel
  2009-01-09  0:58   ` Han, Weidong
  1 sibling, 0 replies; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-08 21:56 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Ingo Molnar, iommu, linux-pci, linux-kernel, Jesse Barnes,
	Arjan van de Ven

On Thu, 8 Jan 2009 14:41:16 -0700
Grant Grundler <grundler@parisc-linux.org> wrote:

> On Thu, Jan 08, 2009 at 12:05:38PM -0800, Dirk Hohndel wrote:
> > 
> > latest git from Linus. On a Thinkpad x200s with VT-d enabled (if I
> > disable VT-d, this of course goes away). 
> > 
> > The oops happens very early during boot in device_to_iommu (called
> > from domain_context_mapping_one).

Look here, that's where it's called from.
Do you want me to note down the complete backtrace?


> If you can, a back trace. Basically just need to know which caller
> is tripping over this. But there can't be that many callers and they
> are all in this file:
> ...
> so it should be possible to figure out which one is called
> before the dev is setup. It's unlikely to be anything with
> "remove" in the name. :)

correct - it's domain_context_mapping_one

-- 
Dirk Hohndel
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: git-latest: kernel oops in IOMMU setup
  2009-01-08 21:41 ` Grant Grundler
  2009-01-08 21:56   ` Dirk Hohndel
@ 2009-01-09  0:58   ` Han, Weidong
  2009-01-09  2:05     ` Dirk Hohndel
  1 sibling, 1 reply; 14+ messages in thread
From: Han, Weidong @ 2009-01-09  0:58 UTC (permalink / raw)
  To: 'Grant Grundler', 'Dirk Hohndel'
  Cc: 'linux-pci@vger.kernel.org',
	'linux-kernel@vger.kernel.org', 'Jesse Barnes',
	'iommu@lists.linux-foundation.org', 'Ingo Molnar',
	'Arjan van de Ven'

Grant Grundler wrote:
> On Thu, Jan 08, 2009 at 12:05:38PM -0800, Dirk Hohndel wrote:
>> 
>> latest git from Linus. On a Thinkpad x200s with VT-d enabled (if I
>> disable VT-d, this of course goes away).
>> 
>> The oops happens very early during boot in device_to_iommu (called
>> from domain_context_mapping_one).
>> 
>> Looking at the code dump and the disassembled function here's where
>> the error happens: 
>> 
>> static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
>> {
>>         struct dmar_drhd_unit *drhd = NULL;
>>         int i;
>> 
>>         for_each_drhd_unit(drhd) {
>>                 if (drhd->ignored)
>>                         continue;
>> 
>>                 for (i = 0; i < drhd->devices_cnt; i++)
>>                         if (drhd->devices[i]->bus->number == bus &&
>> --> drhd->devices[0] is NULL
>>                             drhd->devices[i]->devfn == devfn)
>>                                 return drhd->iommu;
>> 
>> 
>> Given how early this happens it's a little hard to provide logs,
>> etc. I literally used delay_boot=100 and wrote things down by hand
>> (forgot my digital camera) and then added printk's to verify).
>> 
>> please let me know what other data I should collect.
> 
> If you can, a back trace. Basically just need to know which caller
> is tripping over this. But there can't be that many callers and they
> are all in this file:
> 0 intel-iommu.c device_to_iommu            431 static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
> 1 intel-iommu.c domain_context_mapping_on 1471 iommu = device_to_iommu(bus, devfn);
> 2 intel-iommu.c domain_context_mapped     1593 iommu = device_to_iommu(pdev->bus->number, pdev->devfn);
> 3 intel-iommu.c domain_remove_dev_info    1684 iommu = device_to_iommu(info->bus, info->devfn);
> 4 intel-iommu.c vm_domain_remove_one_dev_ 2773 iommu = device_to_iommu(pdev->bus->number, pdev->devfn);
> 5 intel-iommu.c vm_domain_remove_one_dev_ 2803 if (device_to_iommu(info->bus, info->devfn) == iommu)
> 6 intel-iommu.c vm_domain_remove_all_dev_ 2836 iommu = device_to_iommu(info->bus, info->devfn);
> 7 intel-iommu.c intel_iommu_attach_device 3023 iommu = device_to_iommu(pdev->bus->number, pdev->devfn);
> 
> so it should be possible to figure out which one is called
> before the dev is setup. It's unlikely to be anything with
> "remove" in the name. :)
> 
> My guess is it's intel_iommu_attach_device being called "too early".

Yes, please get the call trace. By the time device_to_iommu() is called,
the DMAR table should already have been parsed from the ACPI tables and
registered, so device_to_iommu() should not fail unless it is called
before the DMAR table has been parsed and registered.

Regards,
Weidong

> 
> hth,
> grant
> 
> 
> hth,
> grant
> 
>> 
>> The system ran fine with the 2.6.28 release kernel.
>> 
>> /D
>> 
>> --
>> Dirk Hohndel
>> Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-latest: kernel oops in IOMMU setup
  2009-01-09  0:58   ` Han, Weidong
@ 2009-01-09  2:05     ` Dirk Hohndel
  2009-01-09  4:52       ` Dirk Hohndel
  0 siblings, 1 reply; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-09  2:05 UTC (permalink / raw)
  To: Han, Weidong
  Cc: 'Grant Grundler', 'linux-pci@vger.kernel.org',
	'linux-kernel@vger.kernel.org', 'Jesse Barnes',
	'iommu@lists.linux-foundation.org', 'Ingo Molnar',
	'Arjan van de Ven'

On Fri, 9 Jan 2009 08:58:46 +0800 "Han, Weidong"
<weidong.han@intel.com> wrote:
> >> 
> >> The oops happens very early during boot in device_to_iommu (called
> >> from domain_context_mapping_one).
> >> 
> >> Looking at the code dump and the disassembled function here's where
> >> the error happens: 
> >> 
> >> static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
> >> {
> >>         struct dmar_drhd_unit *drhd = NULL;
> >>         int i;
> >> 
> >>         for_each_drhd_unit(drhd) {
> >>                 if (drhd->ignored)
> >>                         continue;
> >> 
> >>                 for (i = 0; i < drhd->devices_cnt; i++)
> >>                         if (drhd->devices[i]->bus->number == bus &&
> >> --> drhd->devices[0] is NULL
> >>                             drhd->devices[i]->devfn == devfn)
> >>                                 return drhd->iommu;
> >> 
> >> 
> >> Given how early this happens it's a little hard to provide logs,
> >> etc. I literally used delay_boot=100 and wrote things down by hand
> >> (forgot my digital camera) and then added printk's to verify).
> >> 
> >> please let me know what other data I should collect.
> > 
> yes, pls get the call trace. When device_to_iommu() is called, DMAR
> should be already parsed from acpi table and registered, so
> device_to_iommu() should not fail unless it's called earlier than
> DMAR is parsed and registered.

I updated to Linus' latest git (as your description made me wonder if
the async stuff might play a role here). I still get an oops - but at
a different spot and the system no longer hangs - it partly recovers
(but things aren't too well - for example my USB keyboard / mouse don't
work anymore). 

Here's the oops:

Jan  8 17:51:00 dhohndel-mobl4 kernel: [   12.359578] ------------[ cut here ]------------
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   12.410579] WARNING: at arch/x86/mm/ioremap.c:240 __ioremap_caller+0x150/0x2bd()
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   12.461578] Hardware name: 7465CTO
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   12.512578] Modules linked in:
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   12.614579] Pid: 1, comm: swapper Not tainted 2.6.28 #12
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   12.665578] Call Trace:
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   12.767581]  [<ffffffff81038b49>] warn_slowpath+0xb1/0xed
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   12.869580]  [<ffffffff81028319>] ? change_page_attr_set_clr+0x13e/0x2e6
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   12.971580]  [<ffffffff810275b2>] __ioremap_caller+0x150/0x2bd
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   13.073581]  [<ffffffff81158363>] ? alloc_iommu+0x140/0x181
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   13.175580]  [<ffffffff810277f2>] ioremap_nocache+0x12/0x14
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   13.277580]  [<ffffffff81158363>] alloc_iommu+0x140/0x181
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   13.379581]  [<ffffffff8166a5d6>] dmar_table_init+0x115/0x265
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   13.481580]  [<ffffffff8165687b>] ? pci_iommu_init+0x0/0x17
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   13.583580]  [<ffffffff8166abb1>] intel_iommu_init+0x16/0x8f3
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   13.685581]  [<ffffffff813ce372>] ? mutex_lock+0x11/0x23
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   13.787581]  [<ffffffff813bb9d1>] ? sysctl_net_init+0x1b/0x1f
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   13.889580]  [<ffffffff8165687b>] ? pci_iommu_init+0x0/0x17
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   13.991580]  [<ffffffff81656884>] pci_iommu_init+0x9/0x17
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   14.093581]  [<ffffffff81009056>] _stext+0x56/0x12b
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   14.195581]  [<ffffffff81071220>] ? register_irq_proc+0xa3/0xbf
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   14.297582]  [<ffffffff810e0000>] ? proc_coredump_filter_write+0xe0/0xfe
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   14.399581]  [<ffffffff8164e673>] kernel_init+0x139/0x191
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   14.501581]  [<ffffffff8100d27a>] child_rip+0xa/0x20
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   14.603581]  [<ffffffff8164e53a>] ? kernel_init+0x0/0x191
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   14.705581]  [<ffffffff8100d270>] ? child_rip+0x0/0x20
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   14.756580] ---[ end trace 4eaa2a86a8e2da22 ]---
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   14.807580] IOMMU: can't map the region
Jan  8 17:51:00 dhohndel-mobl4 kernel: [   14.858580] DMAR:parse DMAR table failure.

later in the log file I find lots of these:

Jan  8 17:51:00 dhohndel-mobl4 kernel: [   40.403251] nommu_map_single: overflow 13a08b248+8 of device mask ffffffff

and finally

Jan  8 17:51:00 dhohndel-mobl4 kernel: [   66.777166] hub 4-0:1.0: unable to enumerate USB device on port 2

/D

-- 
Dirk Hohndel
Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-latest: kernel oops in IOMMU setup
  2009-01-09  2:05     ` Dirk Hohndel
@ 2009-01-09  4:52       ` Dirk Hohndel
  2009-01-09  6:53         ` Han, Weidong
  0 siblings, 1 reply; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-09  4:52 UTC (permalink / raw)
  To: Dirk Hohndel
  Cc: Han, Weidong, 'Grant Grundler',
	'linux-pci@vger.kernel.org',
	'linux-kernel@vger.kernel.org', 'Jesse Barnes',
	'iommu@lists.linux-foundation.org', 'Ingo Molnar',
	'Arjan van de Ven'

On Thu, 8 Jan 2009 18:05:15 -0800
Dirk Hohndel <hohndel@infradead.org> wrote:

> On Fri, 9 Jan 2009 08:58:46 +0800 "Han, Weidong"
> 
> I updated to Linus' latest git (as your description made me wonder if
> the async stuff might play a role here). I still get an oops - but at
> a different spot and the system no longer hangs - it partly recovers
> (but things aren't too well - for example my USB keyboard / mouse
> don't work anymore). 

Spoke too soon. Rebooted and had the same hard lockup again. This time
I had my camera within reach, so here's the trace:

device_to_iommu+0x33/0x73
domain_context_mapping_one+0x37/0x335
domain_context_mapping+0x25/0xa7
iommu_prepare_identity+0xd7/0xf3
intel_iommu_init+0x4e4/0x8f3
? mutex_lock
? sysctl_net_init
? pci_iommu_init
pci_iommu_init
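
Each resolved entry in the trace above has the form symbol+offset/size,
where offset is how far into the function the address lies and size is
the length of the function; entries starting with "?" are addresses found
on the stack that may not be part of the real call chain. A minimal
sketch of pulling those fields apart, assuming that layout (struct
trace_entry and parse_trace_entry() are made-up names for illustration,
not kernel or tool APIs):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for one "symbol+0xoff/0xsize" trace entry. */
struct trace_entry {
	char symbol[64];
	unsigned int offset;	/* bytes into the function */
	unsigned int size;	/* total length of the function */
};

/*
 * Parse one trace entry. Returns 0 on success, -1 if the line does not
 * carry all three fields (for example "? mutex_lock", which has no
 * offset/size in the hand-copied trace).
 */
static int parse_trace_entry(const char *line, struct trace_entry *out)
{
	assert(line != NULL && out != NULL);

	/* %x accepts an optional 0x prefix; the scanset stops at '+' or ' '. */
	if (sscanf(line, " %63[^+ ]+%x/%x",
		   out->symbol, &out->offset, &out->size) != 3)
		return -1;
	return 0;
}
```

For "device_to_iommu+0x33/0x73" this yields offset 0x33 within a function
0x73 bytes long, which is the number to look up in the disassembly.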

I also have stack, code and register values. Let me know if you need
them. Or I can just post the picture :-)

Again, very latest git tree, VT-d enabled.

/D

-- 
Dirk Hohndel
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: git-latest: kernel oops in IOMMU setup
  2009-01-09  4:52       ` Dirk Hohndel
@ 2009-01-09  6:53         ` Han, Weidong
  2009-01-09 15:08           ` Dirk Hohndel
  0 siblings, 1 reply; 14+ messages in thread
From: Han, Weidong @ 2009-01-09  6:53 UTC (permalink / raw)
  To: 'Dirk Hohndel'
  Cc: 'Grant Grundler', 'linux-pci@vger.kernel.org',
	'linux-kernel@vger.kernel.org', 'Jesse Barnes',
	'iommu@lists.linux-foundation.org', 'Ingo Molnar',
	'Arjan van de Ven'

Dirk Hohndel wrote:
> On Thu, 8 Jan 2009 18:05:15 -0800
> Dirk Hohndel <hohndel@infradead.org> wrote:
> 
>> On Fri, 9 Jan 2009 08:58:46 +0800 "Han, Weidong"
>> 
>> I updated to Linus' latest git (as your description made me wonder if
>> the async stuff might play a role here). I still get an oops - but at
>> a different spot and the system no longer hangs - it partly recovers
>> (but things aren't too well - for example my USB keyboard / mouse
>> don't work anymore).
> 
> Spoke too soon. Rebooted and had the same hard lockup again. This time
> I had my camera within reach, so here's the trace:
> 
> device_to_iommu+0x33/0x73
> domain_context_mapping_one+0x37/0x335
> domain_context_mapping+0x25/0xa7
> iommu_prepare_identity+0xd7/0xf3
> intel_iommu_init+0x4e4/0x8f3
> ? mutex_lock
> ? sysctl_net_init
> ? pci_iommu_init
> pci_iommu_init
> 
> I also have stack, code and register values. Let me know if you need
> them. Or I can just post the picture :-)
> 
> Again, very latest git tree, VT-d enabled.
> 
> /D

I tried latest git tree, it works for me. Above call trace looks right. 

Regards,
Weidong

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-latest: kernel oops in IOMMU setup
  2009-01-09  6:53         ` Han, Weidong
@ 2009-01-09 15:08           ` Dirk Hohndel
  2009-01-09 16:16             ` Zhao, Yu
  0 siblings, 1 reply; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-09 15:08 UTC (permalink / raw)
  To: Han, Weidong
  Cc: 'Grant Grundler', 'linux-pci@vger.kernel.org',
	'linux-kernel@vger.kernel.org', 'Jesse Barnes',
	'iommu@lists.linux-foundation.org', 'Ingo Molnar',
	'Arjan van de Ven'

On Fri, 9 Jan 2009 14:53:14 +0800
"Han, Weidong" <weidong.han@intel.com> wrote:

> Dirk Hohndel wrote:
> > On Thu, 8 Jan 2009 18:05:15 -0800
> > Dirk Hohndel <hohndel@infradead.org> wrote:
> > 
> >> On Fri, 9 Jan 2009 08:58:46 +0800 "Han, Weidong"
> >> 
> >> I updated to Linus' latest git (as your description made me wonder
> >> if the async stuff might play a role here). I still get an oops -
> >> but at a different spot and the system no longer hangs - it partly
> >> recovers (but things aren't too well - for example my USB
> >> keyboard / mouse don't work anymore).
> > 
> > Spoke too soon. Rebooted and had the same hard lockup again. This
> > time I had my camera within reach, so here's the trace:
> > 
> > device_to_iommu+0x33/0x73
> > domain_context_mapping_one+0x37/0x335
> > domain_context_mapping+0x25/0xa7
> > iommu_prepare_identity+0xd7/0xf3
> > intel_iommu_init+0x4e4/0x8f3
> > ? mutex_lock
> > ? sysctl_net_init
> > ? pci_iommu_init
> > pci_iommu_init
> > 
> > I also have stack, code and register values. Let me know if you need
> > them. Or I can just post the picture :-)
> > 
> > Again, very latest git tree, VT-d enabled.
> > 
> > /D
> 
> I tried latest git tree, it works for me. Above call trace looks
> right. 

Spent some more time reading the code. Can't quite claim to understand
all of it, yet, but I notice that most everywhere else drhd->devices[i]
is checked to be != NULL before it is accessed. Why is it safe not to
do that in device_to_iommu()?

Would the patch below be a valid fix? It stops my system from hanging at
boot. But I wonder if there is an assertion that if drhd->ignored is 0
then drhd->devices[0..drhd->devices_cnt] is known to be != NULL and
therefore this test is just hiding a bug somewhere else...
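
The guarded lookup can be tried out in user space. What follows is a
minimal sketch, not the kernel code: the mock_* types and
mock_device_to_iommu() are hypothetical stand-ins that model only the
fields the lookup touches, and a plain array of units replaces
for_each_drhd_unit(). It shows the comparison skipping an empty slot
instead of dereferencing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures. */
struct mock_bus { uint8_t number; };
struct mock_dev { struct mock_bus *bus; uint8_t devfn; };
struct mock_iommu { int id; };
struct mock_drhd {
	int ignored;
	int devices_cnt;
	struct mock_dev **devices;	/* individual slots may be NULL */
	struct mock_iommu *iommu;
};

/* Return the iommu of the first non-ignored unit listing (bus, devfn). */
static struct mock_iommu *mock_device_to_iommu(struct mock_drhd *units,
					       size_t nr_units,
					       uint8_t bus, uint8_t devfn)
{
	size_t u;
	int i;

	assert(units != NULL || nr_units == 0);

	for (u = 0; u < nr_units; u++) {
		struct mock_drhd *drhd = &units[u];

		if (drhd->ignored)
			continue;

		for (i = 0; i < drhd->devices_cnt; i++)
			if (drhd->devices[i] &&	/* skip empty slots */
			    drhd->devices[i]->bus->number == bus &&
			    drhd->devices[i]->devfn == devfn)
				return drhd->iommu;
	}
	return NULL;
}
```

With a unit whose slot 0 is NULL and whose slot 1 holds a real device,
the lookup still finds the device in slot 1. Without the first condition
it would fault on slot 0 before getting there.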

/D

Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
---
 drivers/pci/intel-iommu.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 235fb7a..3dfecb2 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -438,7 +438,8 @@ static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
 			continue;
 
 		for (i = 0; i < drhd->devices_cnt; i++)
-			if (drhd->devices[i]->bus->number == bus &&
+			if (drhd->devices[i] &&
+			    drhd->devices[i]->bus->number == bus &&
 			    drhd->devices[i]->devfn == devfn)
 				return drhd->iommu;
 
-- 
1.6.0.6


-- 
Dirk Hohndel
Intel Open Source Technology Center

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: git-latest: kernel oops in IOMMU setup
  2009-01-09 15:08           ` Dirk Hohndel
@ 2009-01-09 16:16             ` Zhao, Yu
  2009-01-09 16:34               ` Dirk Hohndel
  0 siblings, 1 reply; 14+ messages in thread
From: Zhao, Yu @ 2009-01-09 16:16 UTC (permalink / raw)
  To: Dirk Hohndel
  Cc: Han, Weidong, 'Grant Grundler',
	'linux-pci@vger.kernel.org',
	'linux-kernel@vger.kernel.org', 'Jesse Barnes',
	'iommu@lists.linux-foundation.org', 'Ingo Molnar',
	'Arjan van de Ven'

Dirk Hohndel wrote:
> On Fri, 9 Jan 2009 14:53:14 +0800
> "Han, Weidong" <weidong.han@intel.com> wrote:
> 
>> Dirk Hohndel wrote:
>>> On Thu, 8 Jan 2009 18:05:15 -0800
>>> Dirk Hohndel <hohndel@infradead.org> wrote:
>>>
>>>> On Fri, 9 Jan 2009 08:58:46 +0800 "Han, Weidong"
>>>>
>>>> I updated to Linus' latest git (as your description made me wonder
>>>> if the async stuff might play a role here). I still get an oops -
>>>> but at a different spot and the system no longer hangs - it partly
>>>> recovers (but things aren't too well - for example my USB
>>>> keyboard / mouse don't work anymore).
>>> Spoke too soon. Rebooted and had the same hard lockup again. This
>>> time I had my camera within reach, so here's the trace:
>>>
>>> device_to_iommu+0x33/0x73
>>> domain_context_mapping_one+0x37/0x335
>>> domain_context_mapping+0x25/0xa7
>>> iommu_prepare_identity+0xd7/0xf3
>>> intel_iommu_init+0x4e4/0x8f3
>>> ? mutex_lock
>>> ? sysctl_net_init
>>> ? pci_iommu_init
>>> pci_iommu_init
>>>
>>> I also have stack, code and register values. Let me know if you need
>>> them. Or I can just post the picture :-)
>>>
>>> Again, very latest git tree, VT-d enabled.
>>>
>>> /D
>> I tried latest git tree, it works for me. Above call trace looks
>> right. 
> 
> Spent some more time reading the code. Can't quite claim to understand
> all of it, yet, but I notice that most everywhere else drhd->devices[i]
> is checked to be != NULL before it is accessed. Why is it safe not to
> do that in device_to_iommu()?
> 
> Would the patch below be a valid fix? It stops my system from hanging at
> boot. But I wonder if there is an assertion that if drhd->ignored is 0
> then drhd->devices[0..drhd->device_cnt] is known to be != NULL and
> therefore this test is just hiding a bug somewhere else...
> 
> /D
> 
> Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
> ---
>  drivers/pci/intel-iommu.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index 235fb7a..3dfecb2 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -438,7 +438,8 @@ static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
>  			continue;
>  
>  		for (i = 0; i < drhd->devices_cnt; i++)
> -			if (drhd->devices[i]->bus->number == bus &&
> +			if (drhd->devices[i] &&
> +			    drhd->devices[i]->bus->number == bus &&
>  			    drhd->devices[i]->devfn == devfn)
>  				return drhd->iommu;
>  

Did you see following in the kernel message?
                 printk(KERN_WARNING PREFIX
                 "Device scope device [%04x:%02x:%02x.%02x] not found\n",
                 segment, scope->bus, path->dev, path->fn);

If yes, then
		Acked-by: Yu Zhao <yu.zhao@intel.com>
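
The warning quoted above suggests how an empty slot can end up in
drhd->devices[]: the firmware table lists a device that is not actually
present. The following is a minimal user-space sketch under that
assumption; the scope parser itself is not shown in this thread, so
mock_scope, mock_pdev, mock_find() and mock_parse_scopes() are all
hypothetical, and only the message format is taken from the printk
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: one device listed in the firmware table. */
struct mock_scope { uint16_t segment; uint8_t bus, dev, fn; };
/* Hypothetical: one device actually present in the system. */
struct mock_pdev { uint16_t segment; uint8_t bus, dev, fn; };

static const struct mock_pdev *mock_find(const struct mock_pdev *present,
					 size_t nr_present,
					 const struct mock_scope *s)
{
	size_t i;

	for (i = 0; i < nr_present; i++)
		if (present[i].segment == s->segment &&
		    present[i].bus == s->bus &&
		    present[i].dev == s->dev &&
		    present[i].fn == s->fn)
			return &present[i];
	return NULL;
}

/*
 * Fill slots[] from the listed scopes. A listed device that is not
 * present gets a warning and a NULL slot. Returns the number of
 * missing devices.
 */
static int mock_parse_scopes(const struct mock_scope *scopes, size_t cnt,
			     const struct mock_pdev *present,
			     size_t nr_present,
			     const struct mock_pdev **slots)
{
	size_t i;
	int missing = 0;

	assert(slots != NULL || cnt == 0);

	for (i = 0; i < cnt; i++) {
		slots[i] = mock_find(present, nr_present, &scopes[i]);
		if (!slots[i]) {
			fprintf(stderr,
				"Device scope device [%04x:%02x:%02x.%02x] not found\n",
				(unsigned int)scopes[i].segment,
				(unsigned int)scopes[i].bus,
				(unsigned int)scopes[i].dev,
				(unsigned int)scopes[i].fn);
			missing++;
		}
	}
	return missing;
}
```

Any later walk over slots[] then has to tolerate NULL entries, which is
what the check in the patch does.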

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-latest: kernel oops in IOMMU setup
  2009-01-09 16:16             ` Zhao, Yu
@ 2009-01-09 16:34               ` Dirk Hohndel
  2009-01-09 16:45                 ` Zhao, Yu
  0 siblings, 1 reply; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-09 16:34 UTC (permalink / raw)
  To: Zhao, Yu
  Cc: Han, Weidong, 'Grant Grundler',
	'linux-pci@vger.kernel.org',
	'linux-kernel@vger.kernel.org', 'Jesse Barnes',
	'iommu@lists.linux-foundation.org', 'Ingo Molnar',
	'Arjan van de Ven'

On Sat, 10 Jan 2009 00:16:22 +0800
"Zhao, Yu" <yu.zhao@intel.com> wrote:

> Dirk Hohndel wrote:
> > On Fri, 9 Jan 2009 14:53:14 +0800
> > "Han, Weidong" <weidong.han@intel.com> wrote:
> > 
> >> Dirk Hohndel wrote:
> >>> On Thu, 8 Jan 2009 18:05:15 -0800
> >>> Dirk Hohndel <hohndel@infradead.org> wrote:
> >>>
> >>>> On Fri, 9 Jan 2009 08:58:46 +0800 "Han, Weidong"
> >>>>
> >>>> I updated to Linus' latest git (as your description made me
> >>>> wonder if the async stuff might play a role here). I still get
> >>>> an oops - but at a different spot and the system no longer hangs
> >>>> - it partly recovers (but things aren't too well - for example
> >>>> my USB keyboard / mouse don't work anymore).
> >>> Spoke too soon. Rebooted and had the same hard lockup again. This
> >>> time I had my camera within reach, so here's the trace:
> >>>
> >>> device_to_iommu+0x33/0x73
> >>> domain_context_mapping_one+0x37/0x335
> >>> domain_context_mapping+0x25/0xa7
> >>> iommu_prepare_identity+0xd7/0xf3
> >>> intel_iommu_init+0x4e4/0x8f3
> >>> ? mutex_lock
> >>> ? sysctl_net_init
> >>> ? pci_iommu_init
> >>> pci_iommu_init
> >>>
> >>> I also have stack, code and register values. Let me know if you
> >>> need them. Or I can just post the picture :-)
> >>>
> >>> Again, very latest git tree, VT-d enabled.
> >>>
> >>> /D
> >> I tried latest git tree, it works for me. Above call trace looks
> >> right. 
> > 
> > Spent some more time reading the code. Can't quite claim to
> > understand all of it, yet, but I notice that most everywhere else
> > drhd->devices[i] is checked to be != NULL before it is accessed.
> > Why is it safe not to do that in device_to_iommu()?
> > 
> > Would the patch below be a valid fix? It stops my system from
> > hanging at boot. But I wonder if there is an assertion that if
> > drhd->ignored is 0 then drhd->devices[0..drhd->device_cnt] is known
> > to be != NULL and therefore this test is just hiding a bug
> > somewhere else...
> > 
> > /D
> > 
> > Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
> > ---
> >  drivers/pci/intel-iommu.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> > index 235fb7a..3dfecb2 100644
> > --- a/drivers/pci/intel-iommu.c
> > +++ b/drivers/pci/intel-iommu.c
> > @@ -438,7 +438,8 @@ static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
> >  			continue;
> >  
> >  		for (i = 0; i < drhd->devices_cnt; i++)
> > -			if (drhd->devices[i]->bus->number == bus &&
> > +			if (drhd->devices[i] &&
> > +			    drhd->devices[i]->bus->number == bus &&
> >  			    drhd->devices[i]->devfn == devfn)
> >  				return drhd->iommu;
> >  
> 
> Did you see following in the kernel message?
>                  printk(KERN_WARNING PREFIX
>                  "Device scope device [%04x:%02x:%02x.%02x] not
> found\n", segment, scope->bus, path->dev, path->fn);
> 
> If yes, then
> 		Acked-by: Yu Zhao <yu.zhao@intel.com>

Yes,

DMAR: Device scope device [0000:00:03:02] not found
DMAR: Device scope device [0000:00:03:02] not found
DMAR: Device scope device [0000:00:03:03] not found
DMAR: Device scope device [0000:00:03:03] not found

/D

-- 
Dirk Hohndel
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-latest: kernel oops in IOMMU setup
  2009-01-09 16:34               ` Dirk Hohndel
@ 2009-01-09 16:45                 ` Zhao, Yu
  2009-01-09 16:55                   ` Dirk Hohndel
  2009-01-09 16:58                   ` [PATCH] Prevent oops at boot with VT-d Dirk Hohndel
  0 siblings, 2 replies; 14+ messages in thread
From: Zhao, Yu @ 2009-01-09 16:45 UTC (permalink / raw)
  To: Dirk Hohndel
  Cc: Han, Weidong, 'Grant Grundler',
	'linux-pci@vger.kernel.org',
	'linux-kernel@vger.kernel.org', 'Jesse Barnes',
	'iommu@lists.linux-foundation.org', 'Ingo Molnar',
	'Arjan van de Ven'

Dirk Hohndel wrote:
> On Sat, 10 Jan 2009 00:16:22 +0800
> "Zhao, Yu" <yu.zhao@intel.com> wrote:
> 
>> Dirk Hohndel wrote:
>>> On Fri, 9 Jan 2009 14:53:14 +0800
>>> "Han, Weidong" <weidong.han@intel.com> wrote:
>>>
>>>> Dirk Hohndel wrote:
>>>>> On Thu, 8 Jan 2009 18:05:15 -0800
>>>>> Dirk Hohndel <hohndel@infradead.org> wrote:
>>>>>
>>>>>> On Fri, 9 Jan 2009 08:58:46 +0800 "Han, Weidong"
>>>>>>
>>>>>> I updated to Linus' latest git (as your description made me
>>>>>> wonder if the async stuff might play a role here). I still get
>>>>>> an oops - but at a different spot and the system no longer hangs
>>>>>> - it partly recovers (but things aren't too well - for example
>>>>>> my USB keyboard / mouse don't work anymore).
>>>>> Spoke too soon. Rebooted and had the same hard lockup again. This
>>>>> time I had my camera within reach, so here's the trace:
>>>>>
>>>>> device_to_iommu+0x33/0x73
>>>>> domain_context_mapping_one+0x37/0x335
>>>>> domain_context_mapping+0x25/0xa7
>>>>> iommu_prepare_identity+0xd7/0xf3
>>>>> intel_iommu_init+0x4e4/0x8f3
>>>>> ? mutex_lock
>>>>> ? sysctl_net_init
>>>>> ? pci_iommu_init
>>>>> pci_iommu_init
>>>>>
>>>>> I also have stack, code and register values. Let me know if you
>>>>> need them. Or I can just post the picture :-)
>>>>>
>>>>> Again, very latest git tree, VT-d enabled.
>>>>>
>>>>> /D
>>>> I tried latest git tree, it works for me. Above call trace looks
>>>> right. 
>>> Spent some more time reading the code. Can't quite claim to
>>> understand all of it, yet, but I notice that most everywhere else
>>> drhd->devices[i] is checked to be != NULL before it is accessed.
>>> Why is it safe not to do that in device_to_iommu()?
>>>
>>> Would the patch below be a valid fix? It stops my system from
>>> hanging at boot. But I wonder if there is an assertion that if
>>> drhd->ignored is 0 then drhd->devices[0..drhd->device_cnt] is known
>>> to be != NULL and therefore this test is just hiding a bug
>>> somewhere else...
>>>
>>> /D
>>>
>>> Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
>>> ---
>>>  drivers/pci/intel-iommu.c |    3 ++-
>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
>>> index 235fb7a..3dfecb2 100644
>>> --- a/drivers/pci/intel-iommu.c
>>> +++ b/drivers/pci/intel-iommu.c
>>> @@ -438,7 +438,8 @@ static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
>>>  			continue;
>>>  
>>>  		for (i = 0; i < drhd->devices_cnt; i++)
>>> -			if (drhd->devices[i]->bus->number == bus &&
>>> +			if (drhd->devices[i] &&
>>> +			    drhd->devices[i]->bus->number == bus &&
>>>  			    drhd->devices[i]->devfn == devfn)
>>>  				return drhd->iommu;
>>>  
>> Did you see following in the kernel message?
>>                  printk(KERN_WARNING PREFIX
>>                         "Device scope device [%04x:%02x:%02x.%02x] not found\n",
>>                         segment, scope->bus, path->dev, path->fn);
>>
>> If yes, then
>> 		Acked-by: Yu Zhao <yu.zhao@intel.com>
> 
> Yes,
> 
> DMAR: Device scope device [0000:00:03:02] not found
> DMAR: Device scope device [0000:00:03:02] not found
> DMAR: Device scope device [0000:00:03:03] not found
> DMAR: Device scope device [0000:00:03:03] not found

The laptop has a nasty BIOS; try updating it if you want to get rid of
this noise... assuming you are lucky enough :-)
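
A minimal userspace sketch of the situation discussed above, assuming that
a device scope entry the kernel cannot find leaves a NULL slot in
drhd->devices[]. All names here (fake_bus, fake_dev, fake_drhd,
lookup_iommu, iommu_id) are hypothetical stand-ins for illustration and are
not the kernel's definitions; only the shape of the guarded loop follows
the patch in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; only the fields used here. */
struct fake_bus {
	uint8_t number;
};

struct fake_dev {
	struct fake_bus *bus;
	uint8_t devfn;
};

struct fake_drhd {
	int ignored;
	int devices_cnt;
	struct fake_dev **devices;	/* assumed: a slot is NULL if the scoped device was not found */
	int iommu_id;			/* stands in for drhd->iommu */
};

/* Return the iommu_id of the unit covering (bus, devfn), or -1 if none matches. */
int lookup_iommu(struct fake_drhd *units, int nunits, uint8_t bus, uint8_t devfn)
{
	for (int u = 0; u < nunits; u++) {
		struct fake_drhd *drhd = &units[u];

		if (drhd->ignored)
			continue;

		for (int i = 0; i < drhd->devices_cnt; i++) {
			struct fake_dev *dev = drhd->devices[i];

			/* Skip empty slots instead of dereferencing them. */
			if (dev && dev->bus->number == bus && dev->devfn == devfn)
				return drhd->iommu_id;
		}
	}
	return -1;
}
```

With a unit whose first slot is NULL and whose second slot holds a real
device, the lookup still finds the second device and returns -1 for an
unknown devfn. Without the `dev &&` test, the first iteration would
dereference the NULL slot.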


* Re: git-latest: kernel oops in IOMMU setup
  2009-01-09 16:45                 ` Zhao, Yu
@ 2009-01-09 16:55                   ` Dirk Hohndel
  2009-01-09 16:58                   ` [PATCH] Prevent oops at boot with VT-d Dirk Hohndel
  1 sibling, 0 replies; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-09 16:55 UTC (permalink / raw)
  To: Zhao, Yu
  Cc: Han, Weidong, 'Grant Grundler',
	'linux-pci@vger.kernel.org',
	'linux-kernel@vger.kernel.org', 'Jesse Barnes',
	'iommu@lists.linux-foundation.org', 'Ingo Molnar',
	'Arjan van de Ven'

On Sat, 10 Jan 2009 00:45:31 +0800
"Zhao, Yu" <yu.zhao@intel.com> wrote:
> > Yes,
> > 
> > DMAR: Device scope device [0000:00:03:02] not found
> > DMAR: Device scope device [0000:00:03:02] not found
> > DMAR: Device scope device [0000:00:03:03] not found
> > DMAR: Device scope device [0000:00:03:03] not found
> 
> The laptop has a nasty BIOS; try updating it if you want to get rid
> of this noise... assuming you are lucky enough :-)

It's a Lenovo Thinkpad X200s - and I am running the latest BIOS (at
least according to their support website) :-(

 kernel: thinkpad_acpi: ThinkPad ACPI Extras v0.21
 kernel: thinkpad_acpi: http://ibm-acpi.sf.net/
 kernel: thinkpad_acpi: ThinkPad BIOS 6DET33WW (1.10 ), EC 7XHT21WW-1.03
 kernel: thinkpad_acpi: Lenovo ThinkPad X200s, model 7465CTO

/D

-- 
Dirk Hohndel
Intel Open Source Technology Center


* [PATCH] Prevent oops at boot with VT-d
  2009-01-09 16:45                 ` Zhao, Yu
  2009-01-09 16:55                   ` Dirk Hohndel
@ 2009-01-09 16:58                   ` Dirk Hohndel
  2009-01-11 15:25                     ` [Resend][PATCH] " Dirk Hohndel
  1 sibling, 1 reply; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-09 16:58 UTC (permalink / raw)
  To: Zhao, Yu
  Cc: Han, Weidong, 'Grant Grundler',
	'linux-pci@vger.kernel.org',
	'linux-kernel@vger.kernel.org', 'Jesse Barnes',
	'iommu@lists.linux-foundation.org', 'Ingo Molnar',
	'Arjan van de Ven'


Resending with appropriate Subject

Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
Acked-by: Yu Zhao <yu.zhao@intel.com>
---
 drivers/pci/intel-iommu.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 235fb7a..3dfecb2 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -438,7 +438,8 @@ static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
 			continue;
 
 		for (i = 0; i < drhd->devices_cnt; i++)
-			if (drhd->devices[i]->bus->number == bus &&
+			if (drhd->devices[i] &&
+			    drhd->devices[i]->bus->number == bus &&
 			    drhd->devices[i]->devfn == devfn)
 				return drhd->iommu;
 
-- 
1.6.0.6


* [Resend][PATCH] Prevent oops at boot with VT-d
  2009-01-09 16:58                   ` [PATCH] Prevent oops at boot with VT-d Dirk Hohndel
@ 2009-01-11 15:25                     ` Dirk Hohndel
  0 siblings, 0 replies; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-11 15:25 UTC (permalink / raw)
  To: David Woodhouse, 'Ingo Molnar'
  Cc: Zhao, Yu, 'linux-pci@vger.kernel.org',
	'linux-kernel@vger.kernel.org', 'Jesse Barnes',
	'iommu@lists.linux-foundation.org'

This wasn't included in 2.6.29-rc1.

With some broken BIOSes, the DMAR data structures are filled in
incorrectly when VT-d is enabled. device_to_iommu() can then hit a
NULL drhd->devices[i] and oops very early in boot. Skip NULL entries.


Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
Acked-by: Yu Zhao <yu.zhao@intel.com>
---
 drivers/pci/intel-iommu.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 235fb7a..3dfecb2 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -438,7 +438,8 @@ static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
                        continue;
 
                for (i = 0; i < drhd->devices_cnt; i++)
-                       if (drhd->devices[i]->bus->number == bus &&
+                       if (drhd->devices[i] &&
+                           drhd->devices[i]->bus->number == bus &&
                            drhd->devices[i]->devfn == devfn)
                                return drhd->iommu;
 
-- 
1.6.0.6
-- 
Dirk Hohndel
Intel Open Source Technology Center


Thread overview: 14+ messages
2009-01-08 20:05 git-latest: kernel oops in IOMMU setup Dirk Hohndel
2009-01-08 21:41 ` Grant Grundler
2009-01-08 21:56   ` Dirk Hohndel
2009-01-09  0:58   ` Han, Weidong
2009-01-09  2:05     ` Dirk Hohndel
2009-01-09  4:52       ` Dirk Hohndel
2009-01-09  6:53         ` Han, Weidong
2009-01-09 15:08           ` Dirk Hohndel
2009-01-09 16:16             ` Zhao, Yu
2009-01-09 16:34               ` Dirk Hohndel
2009-01-09 16:45                 ` Zhao, Yu
2009-01-09 16:55                   ` Dirk Hohndel
2009-01-09 16:58                   ` [PATCH] Prevent oops at boot with VT-d Dirk Hohndel
2009-01-11 15:25                     ` [Resend][PATCH] " Dirk Hohndel
