* git-latest: kernel oops in IOMMU setup
@ 2009-01-08 20:05 Dirk Hohndel
2009-01-08 21:41 ` Grant Grundler
0 siblings, 1 reply; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-08 20:05 UTC (permalink / raw)
To: Ingo Molnar, iommu, linux-pci, linux-kernel, Jesse Barnes,
Arjan van de Ven
latest git from Linus. On a Thinkpad x200s with VT-d enabled (if I
disable VT-d, this of course goes away).
The oops happens very early during boot in device_to_iommu (called
from domain_context_mapping_one).
Looking at the code dump and the disassembled function here's where
the error happens:
static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
{
	struct dmar_drhd_unit *drhd = NULL;
	int i;

	for_each_drhd_unit(drhd) {
		if (drhd->ignored)
			continue;

		for (i = 0; i < drhd->devices_cnt; i++)
			if (drhd->devices[i]->bus->number == bus &&
--> drhd->devices[0] is NULL
			    drhd->devices[i]->devfn == devfn)
				return drhd->iommu;

Given how early this happens it's a little hard to provide logs, etc. I
literally used delay_boot=100 and wrote things down by hand (forgot my
digital camera) and then added printk's to verify.
please let me know what other data I should collect.
The system ran fine with the 2.6.28 release kernel.
/D
--
Dirk Hohndel
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: git-latest: kernel oops in IOMMU setup
2009-01-08 20:05 git-latest: kernel oops in IOMMU setup Dirk Hohndel
@ 2009-01-08 21:41 ` Grant Grundler
2009-01-08 21:56 ` Dirk Hohndel
2009-01-09 0:58 ` Han, Weidong
0 siblings, 2 replies; 14+ messages in thread
From: Grant Grundler @ 2009-01-08 21:41 UTC (permalink / raw)
To: Dirk Hohndel
Cc: Ingo Molnar, iommu, linux-pci, linux-kernel, Jesse Barnes,
Arjan van de Ven

On Thu, Jan 08, 2009 at 12:05:38PM -0800, Dirk Hohndel wrote:
>
> latest git from Linus. On a Thinkpad x200s with VT-d enabled (if I
> disable VT-d, this of course goes away).
>
> The oops happens very early during boot in device_to_iommu (called
> from domain_context_mapping_one).
>
> Looking at the code dump and the disassembled function here's where
> the error happens:
>
> static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
> {
> 	struct dmar_drhd_unit *drhd = NULL;
> 	int i;
>
> 	for_each_drhd_unit(drhd) {
> 		if (drhd->ignored)
> 			continue;
>
> 		for (i = 0; i < drhd->devices_cnt; i++)
> 			if (drhd->devices[i]->bus->number == bus &&
> --> drhd->devices[0] is NULL
> 			    drhd->devices[i]->devfn == devfn)
> 				return drhd->iommu;
>
>
> Given how early this happens it's a little hard to provide logs, etc. I
> literally used delay_boot=100 and wrote things down by hand (forgot my
> digital camera) and then added printk's to verify).
>
> please let me know what other data I should collect.

If you can, a back trace. Basically just need to know which caller
is tripping over this. But there can't be that many callers and they
are all in this file:

0 intel-iommu.c device_to_iommu            431 static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
1 intel-iommu.c domain_context_mapping_on 1471 iommu = device_to_iommu(bus, devfn);
2 intel-iommu.c domain_context_mapped     1593 iommu = device_to_iommu(pdev->bus->number, pdev->devfn);
3 intel-iommu.c domain_remove_dev_info    1684 iommu = device_to_iommu(info->bus, info->devfn);
4 intel-iommu.c vm_domain_remove_one_dev_ 2773 iommu = device_to_iommu(pdev->bus->number, pdev->devfn);
5 intel-iommu.c vm_domain_remove_one_dev_ 2803 if (device_to_iommu(info->bus, info->devfn) == iommu)
6 intel-iommu.c vm_domain_remove_all_dev_ 2836 iommu = device_to_iommu(info->bus, info->devfn);
7 intel-iommu.c intel_iommu_attach_device 3023 iommu = device_to_iommu(pdev->bus->number, pdev->devfn);

so it should be possible to figure out which one is called
before the dev is setup. It's unlikely to be anything with
"remove" in the name. :)

My guess is it's intel_iommu_attach_device being called "too early".

hth,
grant

hth,
grant

>
> The system ran fine with the 2.6.28 release kernel.
>
> /D
>
> --
> Dirk Hohndel
> Intel Open Source Technology Center
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: git-latest: kernel oops in IOMMU setup
2009-01-08 21:41 ` Grant Grundler
@ 2009-01-08 21:56 ` Dirk Hohndel
2009-01-09 0:58 ` Han, Weidong
1 sibling, 0 replies; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-08 21:56 UTC (permalink / raw)
To: Grant Grundler
Cc: Ingo Molnar, iommu, linux-pci, linux-kernel, Jesse Barnes,
Arjan van de Ven

On Thu, 8 Jan 2009 14:41:16 -0700
Grant Grundler <grundler@parisc-linux.org> wrote:

> On Thu, Jan 08, 2009 at 12:05:38PM -0800, Dirk Hohndel wrote:
> >
> > latest git from Linus. On a Thinkpad x200s with VT-d enabled (if I
> > disable VT-d, this of course goes away).
> >
> > The oops happens very early during boot in device_to_iommu (called
> > from domain_context_mapping_one).

Look here, that's where it's called from. Do you want me to note down
the complete backtrace?

> If you can, a back trace. Basically just need to know which caller
> is tripping over this. But there can't be that many callers and they
> are all in this file:
> ...
> so it should be possible to figure out which one is called
> before the dev is setup. It's unlikely to be anything with
> "remove" in the name. :)

correct - it's context_mapping_one

--
Dirk Hohndel
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: git-latest: kernel oops in IOMMU setup
2009-01-08 21:41 ` Grant Grundler
2009-01-08 21:56 ` Dirk Hohndel
@ 2009-01-09 0:58 ` Han, Weidong
2009-01-09 2:05 ` Dirk Hohndel
1 sibling, 1 reply; 14+ messages in thread
From: Han, Weidong @ 2009-01-09 0:58 UTC (permalink / raw)
To: 'Grant Grundler', 'Dirk Hohndel'
Cc: 'linux-pci@vger.kernel.org', 'linux-kernel@vger.kernel.org',
'Jesse Barnes', 'iommu@lists.linux-foundation.org',
'Ingo Molnar', 'Arjan van de Ven'

Grant Grundler wrote:
> On Thu, Jan 08, 2009 at 12:05:38PM -0800, Dirk Hohndel wrote:
>>
>> latest git from Linus. On a Thinkpad x200s with VT-d enabled (if I
>> disable VT-d, this of course goes away).
>>
>> The oops happens very early during boot in device_to_iommu (called
>> from domain_context_mapping_one).
>>
>> Looking at the code dump and the disassembled function here's where
>> the error happens:
>>
>> static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn) {
>> 	struct dmar_drhd_unit *drhd = NULL;
>> 	int i;
>>
>> 	for_each_drhd_unit(drhd) {
>> 		if (drhd->ignored)
>> 			continue;
>>
>> 		for (i = 0; i < drhd->devices_cnt; i++)
>> 			if (drhd->devices[i]->bus->number == bus &&
>> --> drhd->devices[0] is NULL
>> 			    drhd->devices[i]->devfn == devfn)
>> 				return drhd->iommu;
>>
>>
>> Given how early this happens it's a little hard to provide logs,
>> etc. I literally used delay_boot=100 and wrote things down by hand
>> (forgot my digital camera) and then added printk's to verify).
>>
>> please let me know what other data I should collect.
>
> If you can, a back trace. Basically just need to know which caller
> is tripping over this. But there can't be that many callers and they
> are all in this file:
> 0 intel-iommu.c device_to_iommu 431 static struct
> intel_iommu *device_to_iommu(u8 bus, u8 devfn) 1 intel-iommu.c
> domain_context_mapping_on 1471 iommu = device_to_iommu(bus, devfn); 2
> intel-iommu.c domain_context_mapped 1593 iommu =
> device_to_iommu(pdev->bus->number, pdev->devfn); 3 intel-iommu.c
> domain_remove_dev_info 1684 iommu = device_to_iommu(info->bus,
> info->devfn); 4 intel-iommu.c vm_domain_remove_one_dev_ 2773 iommu =
> device_to_iommu(pdev->bus->number, pdev->devfn); 5 intel-iommu.c
> vm_domain_remove_one_dev_ 2803 if (device_to_iommu(info->bus,
> info->devfn) == iommu) 6 intel-iommu.c vm_domain_remove_all_dev_ 2836
> iommu = device_to_iommu(info->bus, info->devfn); 7 intel-iommu.c
> intel_iommu_attach_device 3023 iommu =
> device_to_iommu(pdev->bus->number, pdev->devfn);
>
> so it should be possible to figure out which one is called
> before the dev is setup. It's unlikely to be anything with
> "remove" in the name. :)
>
> My guess is it's intel_iommu_attach_device being called "too early".

yes, pls get the call trace. When device_to_iommu() is called, DMAR
should be already parsed from acpi table and registered, so
device_to_iommu() should not fail unless it's called earlier than
DMAR is parsed and registered.

Regards,
Weidong

>
> hth,
> grant
>
>
> hth,
> grant
>
>>
>> The system ran fine with the 2.6.28 release kernel.
>>
>> /D
>>
>> --
>> Dirk Hohndel
>> Intel Open Source Technology Center
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pci"
>> in the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/iommu
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: git-latest: kernel oops in IOMMU setup
2009-01-09 0:58 ` Han, Weidong
@ 2009-01-09 2:05 ` Dirk Hohndel
2009-01-09 4:52 ` Dirk Hohndel
0 siblings, 1 reply; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-09 2:05 UTC (permalink / raw)
To: Han, Weidong
Cc: 'Grant Grundler', 'linux-pci@vger.kernel.org',
'linux-kernel@vger.kernel.org', 'Jesse Barnes',
'iommu@lists.linux-foundation.org', 'Ingo Molnar',
'Arjan van de Ven'

On Fri, 9 Jan 2009 08:58:46 +0800
"Han, Weidong" <weidong.han@intel.com> wrote:

> >>
> >> The oops happens very early during boot in device_to_iommu (called
> >> from domain_context_mapping_one).
> >>
> >> Looking at the code dump and the disassembled function here's where
> >> the error happens:
> >>
> >> static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn) {
> >> 	struct dmar_drhd_unit *drhd = NULL;
> >> 	int i;
> >>
> >> 	for_each_drhd_unit(drhd) {
> >> 		if (drhd->ignored)
> >> 			continue;
> >>
> >> 		for (i = 0; i < drhd->devices_cnt; i++)
> >> 			if (drhd->devices[i]->bus->number == bus &&
> >> --> drhd->devices[0] is NULL
> >> 			    drhd->devices[i]->devfn == devfn)
> >> 				return drhd->iommu;
> >>
> >>
> >> Given how early this happens it's a little hard to provide logs,
> >> etc. I literally used delay_boot=100 and wrote things down by hand
> >> (forgot my digital camera) and then added printk's to verify).
> >>
> >> please let me know what other data I should collect.
> >
> yes, pls get the call trace. When device_to_iommu() is called, DMAR
> should be already parsed from acpi table and registered, so
> device_to_iommu() should not fail unless it's called earlier than
> DMAR is parsed and registered.

I updated to Linus' latest git (as your description made me wonder if
the async stuff might play a role here). I still get an oops - but at a
different spot and the system no longer hangs - it partly recovers (but
things aren't too well - for example my USB keyboard / mouse don't work
anymore).

Here's the oops:

Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.359578] ------------[ cut here ]------------
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.410579] WARNING: at arch/x86/mm/ioremap.c:240 __ioremap_caller+0x150/0x2bd()
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.461578] Hardware name: 7465CTO
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.512578] Modules linked in:
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.614579] Pid: 1, comm: swapper Not tainted 2.6.28 #12
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.665578] Call Trace:
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.767581] [<ffffffff81038b49>] warn_slowpath+0xb1/0xed
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.869580] [<ffffffff81028319>] ? change_page_attr_set_clr+0x13e/0x2e6
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.971580] [<ffffffff810275b2>] __ioremap_caller+0x150/0x2bd
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.073581] [<ffffffff81158363>] ? alloc_iommu+0x140/0x181
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.175580] [<ffffffff810277f2>] ioremap_nocache+0x12/0x14
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.277580] [<ffffffff81158363>] alloc_iommu+0x140/0x181
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.379581] [<ffffffff8166a5d6>] dmar_table_init+0x115/0x265
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.481580] [<ffffffff8165687b>] ? pci_iommu_init+0x0/0x17
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.583580] [<ffffffff8166abb1>] intel_iommu_init+0x16/0x8f3
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.685581] [<ffffffff813ce372>] ? mutex_lock+0x11/0x23
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.787581] [<ffffffff813bb9d1>] ? sysctl_net_init+0x1b/0x1f
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.889580] [<ffffffff8165687b>] ? pci_iommu_init+0x0/0x17
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.991580] [<ffffffff81656884>] pci_iommu_init+0x9/0x17
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.093581] [<ffffffff81009056>] _stext+0x56/0x12b
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.195581] [<ffffffff81071220>] ? register_irq_proc+0xa3/0xbf
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.297582] [<ffffffff810e0000>] ? proc_coredump_filter_write+0xe0/0xfe
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.399581] [<ffffffff8164e673>] kernel_init+0x139/0x191
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.501581] [<ffffffff8100d27a>] child_rip+0xa/0x20
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.603581] [<ffffffff8164e53a>] ? kernel_init+0x0/0x191
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.705581] [<ffffffff8100d270>] ? child_rip+0x0/0x20
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.756580] ---[ end trace 4eaa2a86a8e2da22 ]---
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.807580] IOMMU: can't map the region
Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.858580] DMAR:parse DMAR table failure.

later in the log file I find lots of these:

Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 40.403251] nommu_map_single: overflow 13a08b248+8 of device mask ffffffff

and finally

Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 66.777166] hub 4-0:1.0: unable to enumerate USB device on port 2

/D

--
Dirk Hohndel
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 14+ messages in thread
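
[Annotation] The "nommu_map_single: overflow 13a08b248+8 of device mask ffffffff" lines in the message above read like a buffer above 4 GiB being handed to a device with a 32-bit DMA mask after the IOMMU failed to come up. A minimal sketch of that arithmetic, assuming the check is simply "address plus size compared against the mask"; the function name fits_dma_mask is made up for illustration and is not a kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): returns true when the
 * end of the buffer, addr + size, does not exceed dma_mask.
 * The log line "overflow 13a08b248+8 of device mask ffffffff"
 * corresponds to this returning false.
 */
bool fits_dma_mask(uint64_t addr, uint64_t size, uint64_t dma_mask)
{
	assert(size > 0);	/* a zero-length mapping is not meaningful here */
	return addr + size <= dma_mask;
}
```

With the values from the log, 0x13a08b248 + 8 is well above 0xffffffff, so a device limited to 32-bit addressing cannot reach the buffer without an IOMMU or bounce buffering in between.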
* Re: git-latest: kernel oops in IOMMU setup
2009-01-09 2:05 ` Dirk Hohndel
@ 2009-01-09 4:52 ` Dirk Hohndel
2009-01-09 6:53 ` Han, Weidong
0 siblings, 1 reply; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-09 4:52 UTC (permalink / raw)
To: Dirk Hohndel
Cc: Han, Weidong, 'Grant Grundler', 'linux-pci@vger.kernel.org',
'linux-kernel@vger.kernel.org', 'Jesse Barnes',
'iommu@lists.linux-foundation.org', 'Ingo Molnar',
'Arjan van de Ven'

On Thu, 8 Jan 2009 18:05:15 -0800
Dirk Hohndel <hohndel@infradead.org> wrote:

> On Fri, 9 Jan 2009 08:58:46 +0800 "Han, Weidong"
>
> I updated to Linus' latest git (as your description made me wonder if
> the async stuff might play a role here). I still get an oops - but at
> a different spot and the system no longer hangs - it partly recovers
> (but things aren't too well - for example my USB keyboard / mouse
> don't work anymore).

Spoke too soon. Rebooted and had the same hard lockup again. This time
I had my camera within reach, so here's the trace:

device_to_iommu+0x33/0x73
domain_context_mapping_one+0x37/0x335
domain_context_mapping+0x25/0xa7
iommu_prepare_identity+0xd7/0xf3
intel_iommu_init+0x4e4/0x8f3
? mutex_lock
? sysctl_net_init
? pci_iommu_init
pci_iommu_init

I also have stack, code and register values. Let me know if you need
them. Or I can just post the picture :-)

Again, very latest git tree, VT-d enabled.

/D

--
Dirk Hohndel
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: git-latest: kernel oops in IOMMU setup
2009-01-09 4:52 ` Dirk Hohndel
@ 2009-01-09 6:53 ` Han, Weidong
2009-01-09 15:08 ` Dirk Hohndel
0 siblings, 1 reply; 14+ messages in thread
From: Han, Weidong @ 2009-01-09 6:53 UTC (permalink / raw)
To: 'Dirk Hohndel'
Cc: 'Grant Grundler', 'linux-pci@vger.kernel.org',
'linux-kernel@vger.kernel.org', 'Jesse Barnes',
'iommu@lists.linux-foundation.org', 'Ingo Molnar',
'Arjan van de Ven'

Dirk Hohndel wrote:
> On Thu, 8 Jan 2009 18:05:15 -0800
> Dirk Hohndel <hohndel@infradead.org> wrote:
>
>> On Fri, 9 Jan 2009 08:58:46 +0800 "Han, Weidong"
>>
>> I updated to Linus' latest git (as your description made me wonder if
>> the async stuff might play a role here). I still get an oops - but at
>> a different spot and the system no longer hangs - it partly recovers
>> (but things aren't too well - for example my USB keyboard / mouse
>> don't work anymore).
>
> Spoke too soon. Rebooted and had the same hard lockup again. This time
> I had my camera within reach, so here's the trace:
>
> device_to_iommu+0x33/0x73
> domain_context_mapping_one+0x37/0x335
> domain_context_mapping+0x25/0xa7
> iommu_prepare_identity+0xd7/0xf3
> intel_iommu_init+0x4e4/0x8f3
> ? mutex_lock
> ? sysctl_net_init
> ? pci_iommu_init
> pci_iommu_init
>
> I also have stack, code and register values. Let me know if you need
> them. Or I can just post the picture :-)
>
> Again, very latest git tree, VT-d enabled.
>
> /D

I tried latest git tree, it works for me. Above call trace looks right.

Regards,
Weidong
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: git-latest: kernel oops in IOMMU setup
2009-01-09 6:53 ` Han, Weidong
@ 2009-01-09 15:08 ` Dirk Hohndel
2009-01-09 16:16 ` Zhao, Yu
0 siblings, 1 reply; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-09 15:08 UTC (permalink / raw)
To: Han, Weidong
Cc: 'Grant Grundler', 'linux-pci@vger.kernel.org',
'linux-kernel@vger.kernel.org', 'Jesse Barnes',
'iommu@lists.linux-foundation.org', 'Ingo Molnar',
'Arjan van de Ven'

On Fri, 9 Jan 2009 14:53:14 +0800
"Han, Weidong" <weidong.han@intel.com> wrote:

> Dirk Hohndel wrote:
> > On Thu, 8 Jan 2009 18:05:15 -0800
> > Dirk Hohndel <hohndel@infradead.org> wrote:
> >
> >> On Fri, 9 Jan 2009 08:58:46 +0800 "Han, Weidong"
> >>
> >> I updated to Linus' latest git (as your description made me wonder
> >> if the async stuff might play a role here). I still get an oops -
> >> but at a different spot and the system no longer hangs - it partly
> >> recovers (but things aren't too well - for example my USB
> >> keyboard / mouse don't work anymore).
> >
> > Spoke too soon. Rebooted and had the same hard lockup again. This
> > time I had my camera within reach, so here's the trace:
> >
> > device_to_iommu+0x33/0x73
> > domain_context_mapping_one+0x37/0x335
> > domain_context_mapping+0x25/0xa7
> > iommu_prepare_identity+0xd7/0xf3
> > intel_iommu_init+0x4e4/0x8f3
> > ? mutex_lock
> > ? sysctl_net_init
> > ? pci_iommu_init
> > pci_iommu_init
> >
> > I also have stack, code and register values. Let me know if you need
> > them. Or I can just post the picture :-)
> >
> > Again, very latest git tree, VT-d enabled.
> >
> > /D
>
> I tried latest git tree, it works for me. Above call trace looks
> right.

Spent some more time reading the code. Can't quite claim to understand
all of it, yet, but I notice that most everywhere else drhd->devices[i]
is checked to be != NULL before it is accessed. Why is it safe not to
do that in device_to_iommu()?

Would the patch below be a valid fix? It stops my system from hanging at
boot. But I wonder if there is an assertion that if drhd->ignored is 0
then drhd->devices[0..drhd->devices_cnt] is known to be != NULL and
therefore this test is just hiding a bug somewhere else...

/D

Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
---
 drivers/pci/intel-iommu.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 235fb7a..3dfecb2 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -438,7 +438,8 @@ static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
 			continue;
 
 		for (i = 0; i < drhd->devices_cnt; i++)
-			if (drhd->devices[i]->bus->number == bus &&
+			if (drhd->devices[i] &&
+			    drhd->devices[i]->bus->number == bus &&
 			    drhd->devices[i]->devfn == devfn)
 				return drhd->iommu;
 
--
1.6.0.6

--
Dirk Hohndel
Intel Open Source Technology Center
^ permalink raw reply related [flat|nested] 14+ messages in thread
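
[Annotation] The guard proposed in the patch above can be tried outside the kernel. A minimal user-space sketch, assuming simplified stand-in structures (drhd_model, pci_dev_model and the others are hypothetical and are not the kernel's types) and a plain array walk in place of for_each_drhd_unit():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; only the fields the lookup touches. */
struct bus_model     { uint8_t number; };
struct pci_dev_model { struct bus_model *bus; uint8_t devfn; };
struct iommu_model   { int id; };

struct drhd_model {
	int ignored;                    /* unit is skipped when non-zero */
	int devices_cnt;                /* number of slots in devices[]  */
	struct pci_dev_model **devices; /* slots may be NULL             */
	struct iommu_model *iommu;
};

/*
 * Same shape as the patched loop: a NULL slot is skipped instead of
 * being dereferenced. Returns NULL when no unit claims the device.
 */
struct iommu_model *lookup_iommu(struct drhd_model *units, size_t n_units,
				 uint8_t bus, uint8_t devfn)
{
	assert(units != NULL || n_units == 0);

	for (size_t u = 0; u < n_units; u++) {
		struct drhd_model *drhd = &units[u];

		if (drhd->ignored)
			continue;

		for (int i = 0; i < drhd->devices_cnt; i++) {
			struct pci_dev_model *dev = drhd->devices[i];

			if (dev && dev->bus->number == bus &&
			    dev->devfn == devfn)
				return drhd->iommu;
		}
	}
	return NULL;
}
```

Dropping the "dev &&" test and giving the function a unit whose first slot is NULL reproduces the dereference described at the top of the thread. The sketch says nothing about whether a NULL slot should exist in the first place, which is the open question in the message.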
* Re: git-latest: kernel oops in IOMMU setup
2009-01-09 15:08 ` Dirk Hohndel
@ 2009-01-09 16:16 ` Zhao, Yu
2009-01-09 16:34 ` Dirk Hohndel
0 siblings, 1 reply; 14+ messages in thread
From: Zhao, Yu @ 2009-01-09 16:16 UTC (permalink / raw)
To: Dirk Hohndel
Cc: Han, Weidong, 'Grant Grundler', 'linux-pci@vger.kernel.org',
'linux-kernel@vger.kernel.org', 'Jesse Barnes',
'iommu@lists.linux-foundation.org', 'Ingo Molnar',
'Arjan van de Ven'

Dirk Hohndel wrote:
> On Fri, 9 Jan 2009 14:53:14 +0800
> "Han, Weidong" <weidong.han@intel.com> wrote:
>
>> Dirk Hohndel wrote:
>>> On Thu, 8 Jan 2009 18:05:15 -0800
>>> Dirk Hohndel <hohndel@infradead.org> wrote:
>>>
>>>> On Fri, 9 Jan 2009 08:58:46 +0800 "Han, Weidong"
>>>>
>>>> I updated to Linus' latest git (as your description made me wonder
>>>> if the async stuff might play a role here). I still get an oops -
>>>> but at a different spot and the system no longer hangs - it partly
>>>> recovers (but things aren't too well - for example my USB
>>>> keyboard / mouse don't work anymore).
>>> Spoke too soon. Rebooted and had the same hard lockup again. This
>>> time I had my camera within reach, so here's the trace:
>>>
>>> device_to_iommu+0x33/0x73
>>> domain_context_mapping_one+0x37/0x335
>>> domain_context_mapping+0x25/0xa7
>>> iommu_prepare_identity+0xd7/0xf3
>>> intel_iommu_init+0x4e4/0x8f3
>>> ? mutex_lock
>>> ? sysctl_net_init
>>> ? pci_iommu_init
>>> pci_iommu_init
>>>
>>> I also have stack, code and register values. Let me know if you need
>>> them. Or I can just post the picture :-)
>>>
>>> Again, very latest git tree, VT-d enabled.
>>>
>>> /D
>> I tried latest git tree, it works for me. Above call trace looks
>> right.
>
> Spent some more time reading the code. Can't quite claim to understand
> all of it, yet, but I notice that most everywhere else drhd->devices[i]
> is checked to be != NULL before it is accessed. Why is it safe not to
> do that in device_to_iommu()?
>
> Would the patch below be a valid fix? It stops my system from hanging at
> boot. But I wonder if there is an assertion that if drhd->ignored is 0
> then drhd->devices[0..drhd->devices_cnt] is known to be != NULL and
> therefore this test is just hiding a bug somewhere else...
>
> /D
>
> Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
> ---
> drivers/pci/intel-iommu.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index 235fb7a..3dfecb2 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -438,7 +438,8 @@ static struct intel_iommu *device_to_iommu(u8 bus,
> u8 devfn) continue;
>
> 		for (i = 0; i < drhd->devices_cnt; i++)
> -			if (drhd->devices[i]->bus->number == bus &&
> +			if (drhd->devices[i] &&
> +			    drhd->devices[i]->bus->number == bus &&
> 			    drhd->devices[i]->devfn == devfn)
> 				return drhd->iommu;
>

Did you see following in the kernel message?
	printk(KERN_WARNING PREFIX
		"Device scope device [%04x:%02x:%02x.%02x] not found\n",
		segment, scope->bus, path->dev, path->fn);

If yes, then
Acked-by: Yu Zhao <yu.zhao@intel.com>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: git-latest: kernel oops in IOMMU setup
2009-01-09 16:16 ` Zhao, Yu
@ 2009-01-09 16:34 ` Dirk Hohndel
2009-01-09 16:45 ` Zhao, Yu
0 siblings, 1 reply; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-09 16:34 UTC (permalink / raw)
To: Zhao, Yu
Cc: Han, Weidong, 'Grant Grundler', 'linux-pci@vger.kernel.org',
'linux-kernel@vger.kernel.org', 'Jesse Barnes',
'iommu@lists.linux-foundation.org', 'Ingo Molnar',
'Arjan van de Ven'

On Sat, 10 Jan 2009 00:16:22 +0800
"Zhao, Yu" <yu.zhao@intel.com> wrote:

> Dirk Hohndel wrote:
> > On Fri, 9 Jan 2009 14:53:14 +0800
> > "Han, Weidong" <weidong.han@intel.com> wrote:
> >
> >> Dirk Hohndel wrote:
> >>> On Thu, 8 Jan 2009 18:05:15 -0800
> >>> Dirk Hohndel <hohndel@infradead.org> wrote:
> >>>
> >>>> On Fri, 9 Jan 2009 08:58:46 +0800 "Han, Weidong"
> >>>>
> >>>> I updated to Linus' latest git (as your description made me
> >>>> wonder if the async stuff might play a role here). I still get
> >>>> an oops - but at a different spot and the system no longer hangs
> >>>> - it partly recovers (but things aren't too well - for example
> >>>> my USB keyboard / mouse don't work anymore).
> >>> Spoke too soon. Rebooted and had the same hard lockup again. This
> >>> time I had my camera within reach, so here's the trace:
> >>>
> >>> device_to_iommu+0x33/0x73
> >>> domain_context_mapping_one+0x37/0x335
> >>> domain_context_mapping+0x25/0xa7
> >>> iommu_prepare_identity+0xd7/0xf3
> >>> intel_iommu_init+0x4e4/0x8f3
> >>> ? mutex_lock
> >>> ? sysctl_net_init
> >>> ? pci_iommu_init
> >>> pci_iommu_init
> >>>
> >>> I also have stack, code and register values. Let me know if you
> >>> need them. Or I can just post the picture :-)
> >>>
> >>> Again, very latest git tree, VT-d enabled.
> >>>
> >>> /D
> >> I tried latest git tree, it works for me. Above call trace looks
> >> right.
> >
> > Spent some more time reading the code. Can't quite claim to
> > understand all of it, yet, but I notice that most everywhere else
> > drhd->devices[i] is checked to be != NULL before it is accessed.
> > Why is it safe not to do that in device_to_iommu()?
> >
> > Would the patch below be a valid fix? It stops my system from
> > hanging at boot. But I wonder if there is an assertion that if
> > drhd->ignored is 0 then drhd->devices[0..drhd->devices_cnt] is known
> > to be != NULL and therefore this test is just hiding a bug
> > somewhere else...
> >
> > /D
> >
> > Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
> > ---
> > drivers/pci/intel-iommu.c | 3 ++-
> > 1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> > index 235fb7a..3dfecb2 100644
> > --- a/drivers/pci/intel-iommu.c
> > +++ b/drivers/pci/intel-iommu.c
> > @@ -438,7 +438,8 @@ static struct intel_iommu *device_to_iommu(u8
> > bus, u8 devfn) continue;
> >
> > 		for (i = 0; i < drhd->devices_cnt; i++)
> > -			if (drhd->devices[i]->bus->number == bus &&
> > +			if (drhd->devices[i] &&
> > +			    drhd->devices[i]->bus->number == bus &&
> > 			    drhd->devices[i]->devfn == devfn)
> > 				return drhd->iommu;
> >
>
> Did you see following in the kernel message?
> printk(KERN_WARNING PREFIX
> "Device scope device [%04x:%02x:%02x.%02x] not
> found\n", segment, scope->bus, path->dev, path->fn);
>
> If yes, then
> Acked-by: Yu Zhao <yu.zhao@intel.com>

Yes,

DMAR: Device scope device [0000:00:03:02] not found
DMAR: Device scope device [0000:00:03:02] not found
DMAR: Device scope device [0000:00:03:03] not found
DMAR: Device scope device [0000:00:03:03] not found

/D

--
Dirk Hohndel
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 14+ messages in thread
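
[Annotation] The "not found" warnings confirmed in the message above hint at how a NULL entry can end up inside devices[] while devices_cnt still counts it. A hedged sketch of that situation, assuming the scope parser sizes the array from what the BIOS table lists and can only fill slots for devices that actually exist. Every name here (pci_id, find_present, fill_scope) is hypothetical and none of it is the kernel's DMAR parser:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bus/device/function triple for one PCI device. */
struct pci_id { unsigned bus, dev, fn; };

/* Returns the matching entry among the devices that exist, or NULL. */
static const struct pci_id *find_present(const struct pci_id *present,
					 size_t n_present, struct pci_id want)
{
	for (size_t i = 0; i < n_present; i++)
		if (present[i].bus == want.bus &&
		    present[i].dev == want.dev &&
		    present[i].fn == want.fn)
			return &present[i];
	return NULL;
}

/*
 * Fills slots[0..n_scope) from the scope entries listed by the
 * firmware. A listed device that does not exist gets a warning and a
 * NULL slot, yet the returned count still includes it.
 */
size_t fill_scope(const struct pci_id *scope, size_t n_scope,
		  const struct pci_id *present, size_t n_present,
		  const struct pci_id **slots)
{
	assert(slots != NULL);

	for (size_t i = 0; i < n_scope; i++) {
		slots[i] = find_present(present, n_present, scope[i]);
		if (!slots[i])
			fprintf(stderr,
				"Device scope device [%04x:%02x:%02x.%02x] not found\n",
				0u, scope[i].bus, scope[i].dev, scope[i].fn);
	}
	return n_scope;
}
```

Any later loop that runs up to the returned count therefore has to tolerate NULL slots, which is what the added check in device_to_iommu() does.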
* Re: git-latest: kernel oops in IOMMU setup
2009-01-09 16:34 ` Dirk Hohndel
@ 2009-01-09 16:45 ` Zhao, Yu
2009-01-09 16:55 ` Dirk Hohndel
2009-01-09 16:58 ` [PATCH] Prevent oops at boot with VT-d Dirk Hohndel
0 siblings, 2 replies; 14+ messages in thread
From: Zhao, Yu @ 2009-01-09 16:45 UTC (permalink / raw)
To: Dirk Hohndel
Cc: Han, Weidong, 'Grant Grundler', 'linux-pci@vger.kernel.org',
'linux-kernel@vger.kernel.org', 'Jesse Barnes',
'iommu@lists.linux-foundation.org', 'Ingo Molnar',
'Arjan van de Ven'

Dirk Hohndel wrote:
> On Sat, 10 Jan 2009 00:16:22 +0800
> "Zhao, Yu" <yu.zhao@intel.com> wrote:
>
>> Dirk Hohndel wrote:
>>> On Fri, 9 Jan 2009 14:53:14 +0800
>>> "Han, Weidong" <weidong.han@intel.com> wrote:
>>>
>>>> Dirk Hohndel wrote:
>>>>> On Thu, 8 Jan 2009 18:05:15 -0800
>>>>> Dirk Hohndel <hohndel@infradead.org> wrote:
>>>>>
>>>>>> On Fri, 9 Jan 2009 08:58:46 +0800 "Han, Weidong"
>>>>>>
>>>>>> I updated to Linus' latest git (as your description made me
>>>>>> wonder if the async stuff might play a role here). I still get
>>>>>> an oops - but at a different spot and the system no longer hangs
>>>>>> - it partly recovers (but things aren't too well - for example
>>>>>> my USB keyboard / mouse don't work anymore).
>>>>> Spoke too soon. Rebooted and had the same hard lockup again. This
>>>>> time I had my camera within reach, so here's the trace:
>>>>>
>>>>> device_to_iommu+0x33/0x73
>>>>> domain_context_mapping_one+0x37/0x335
>>>>> domain_context_mapping+0x25/0xa7
>>>>> iommu_prepare_identity+0xd7/0xf3
>>>>> intel_iommu_init+0x4e4/0x8f3
>>>>> ? mutex_lock
>>>>> ? sysctl_net_init
>>>>> ? pci_iommu_init
>>>>> pci_iommu_init
>>>>>
>>>>> I also have stack, code and register values. Let me know if you
>>>>> need them. Or I can just post the picture :-)
>>>>>
>>>>> Again, very latest git tree, VT-d enabled.
>>>>>
>>>>> /D
>>>> I tried latest git tree, it works for me. Above call trace looks
>>>> right.
>>> Spent some more time reading the code. Can't quite claim to
>>> understand all of it, yet, but I notice that most everywhere else
>>> drhd->devices[i] is checked to be != NULL before it is accessed.
>>> Why is it safe not to do that in device_to_iommu()?
>>>
>>> Would the patch below be a valid fix? It stops my system from
>>> hanging at boot. But I wonder if there is an assertion that if
>>> drhd->ignored is 0 then drhd->devices[0..drhd->devices_cnt] is known
>>> to be != NULL and therefore this test is just hiding a bug
>>> somewhere else...
>>>
>>> /D
>>>
>>> Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
>>> ---
>>> drivers/pci/intel-iommu.c | 3 ++-
>>> 1 files changed, 2 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
>>> index 235fb7a..3dfecb2 100644
>>> --- a/drivers/pci/intel-iommu.c
>>> +++ b/drivers/pci/intel-iommu.c
>>> @@ -438,7 +438,8 @@ static struct intel_iommu *device_to_iommu(u8
>>> bus, u8 devfn) continue;
>>>
>>> 		for (i = 0; i < drhd->devices_cnt; i++)
>>> -			if (drhd->devices[i]->bus->number == bus &&
>>> +			if (drhd->devices[i] &&
>>> +			    drhd->devices[i]->bus->number == bus &&
>>> 			    drhd->devices[i]->devfn == devfn)
>>> 				return drhd->iommu;
>>>
>> Did you see following in the kernel message?
>> printk(KERN_WARNING PREFIX
>> "Device scope device [%04x:%02x:%02x.%02x] not
>> found\n", segment, scope->bus, path->dev, path->fn);
>>
>> If yes, then
>> Acked-by: Yu Zhao <yu.zhao@intel.com>
>
> Yes,
>
> DMAR: Device scope device [0000:00:03:02] not found
> DMAR: Device scope device [0000:00:03:02] not found
> DMAR: Device scope device [0000:00:03:03] not found
> DMAR: Device scope device [0000:00:03:03] not found

The laptop has a nasty bios, try to update it if you want to get rid
of these noises... assuming you are lucky enough :-)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: git-latest: kernel oops in IOMMU setup
2009-01-09 16:45 ` Zhao, Yu
@ 2009-01-09 16:55 ` Dirk Hohndel
2009-01-09 16:58 ` [PATCH] Prevent oops at boot with VT-d Dirk Hohndel
1 sibling, 0 replies; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-09 16:55 UTC (permalink / raw)
To: Zhao, Yu
Cc: Han, Weidong, 'Grant Grundler', 'linux-pci@vger.kernel.org',
'linux-kernel@vger.kernel.org', 'Jesse Barnes',
'iommu@lists.linux-foundation.org', 'Ingo Molnar',
'Arjan van de Ven'

On Sat, 10 Jan 2009 00:45:31 +0800
"Zhao, Yu" <yu.zhao@intel.com> wrote:

> > Yes,
> >
> > DMAR: Device scope device [0000:00:03:02] not found
> > DMAR: Device scope device [0000:00:03:02] not found
> > DMAR: Device scope device [0000:00:03:03] not found
> > DMAR: Device scope device [0000:00:03:03] not found
>
> The laptop has a nasty bios, try to update it if you want to get rid
> of these noises... assuming you are lucky enough :-)

It's a Lenovo Thinkpad X200s - and I am running the latest BIOS (at
least according to their support website) :-(

kernel: thinkpad_acpi: ThinkPad ACPI Extras v0.21
kernel: thinkpad_acpi: http://ibm-acpi.sf.net/
kernel: thinkpad_acpi: ThinkPad BIOS 6DET33WW (1.10 ), EC 7XHT21WW-1.03
kernel: thinkpad_acpi: Lenovo ThinkPad X200s, model 7465CTO

/D

--
Dirk Hohndel
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] Prevent oops at boot with VT-d
2009-01-09 16:45 ` Zhao, Yu
2009-01-09 16:55 ` Dirk Hohndel
@ 2009-01-09 16:58 ` Dirk Hohndel
2009-01-11 15:25 ` [Resend][PATCH] " Dirk Hohndel
1 sibling, 1 reply; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-09 16:58 UTC (permalink / raw)
To: Zhao, Yu
Cc: Han, Weidong, 'Grant Grundler', 'linux-pci@vger.kernel.org',
'linux-kernel@vger.kernel.org', 'Jesse Barnes',
'iommu@lists.linux-foundation.org', 'Ingo Molnar',
'Arjan van de Ven'

Resending with appropriate Subject

Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
Acked-by: Yu Zhao <yu.zhao@intel.com>
---
 drivers/pci/intel-iommu.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 235fb7a..3dfecb2 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -438,7 +438,8 @@ static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
 			continue;
 
 		for (i = 0; i < drhd->devices_cnt; i++)
-			if (drhd->devices[i]->bus->number == bus &&
+			if (drhd->devices[i] &&
+			    drhd->devices[i]->bus->number == bus &&
 			    drhd->devices[i]->devfn == devfn)
 				return drhd->iommu;
 
--
1.6.0.6
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [Resend][PATCH] Prevent oops at boot with VT-d
2009-01-09 16:58 ` [PATCH] Prevent oops at boot with VT-d Dirk Hohndel
@ 2009-01-11 15:25 ` Dirk Hohndel
0 siblings, 0 replies; 14+ messages in thread
From: Dirk Hohndel @ 2009-01-11 15:25 UTC (permalink / raw)
To: David Woodhouse, 'Ingo Molnar'
Cc: Zhao, Yu, 'linux-pci@vger.kernel.org',
'linux-kernel@vger.kernel.org', 'Jesse Barnes',
'iommu@lists.linux-foundation.org'

This wasn't included in 2.6.29-rc1

With some broken BIOSs when VT-d is enabled, the data structures are
filled incorrectly. This can cause a NULL pointer dereference in very
early boot.

Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
Acked-by: Yu Zhao <yu.zhao@intel.com>
---
 drivers/pci/intel-iommu.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 235fb7a..3dfecb2 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -438,7 +438,8 @@ static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
 			continue;
 
 		for (i = 0; i < drhd->devices_cnt; i++)
-			if (drhd->devices[i]->bus->number == bus &&
+			if (drhd->devices[i] &&
+			    drhd->devices[i]->bus->number == bus &&
 			    drhd->devices[i]->devfn == devfn)
 				return drhd->iommu;
 
--
1.6.0.6

--
Dirk Hohndel
Intel Open Source Technology Center
^ permalink raw reply related [flat|nested] 14+ messages in thread