* 3.6.11 AMD-Vi: Completion-Wait loop timed out
@ 2013-01-20 10:33 Udo van den Heuvel
  2013-01-20 10:36 ` Borislav Petkov
  0 siblings, 1 reply; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-20 10:33 UTC (permalink / raw)
  To: linux-kernel

Hello,

See below for a part of the logging on this F2A85X-UP4 with AMD
a10-5800k. Box was raid checking I guess.

Jan 20 03:42:08 s3 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="3031" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:18 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:18 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:18 s3 kernel: ------------[ cut here ]------------
Jan 20 04:11:18 s3 kernel: WARNING: at drivers/iommu/amd_iommu.c:1104 __domain_flush_pages+0x1ad/0x1b0()
Jan 20 04:11:18 s3 kernel: Hardware name: To be filled by O.E.M.
Jan 20 04:11:18 s3 kernel: Modules linked in: vfat fat usb_storage pwc udf crc_itu_t nfsv3 nfs bnep bluetooth fuse cpufreq_userspace nf_conntrack_netbios_ns eeprom nf_conntrack_broadcast ipt_REJECT ip6t_REJECT it87 iptable_filter hwmon_vid xt_tcpudp ipt_MASQUERADE nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf_defrag_ipv4 nf_conntrack ip6table_filter ip_tables ip6_tables x_tables dm_mirror dm_region_hash dm_log ext2 snd_usb_audio snd_usbmidi_lib snd_hwdep snd_rawmidi snd_hda_codec_realtek videobuf2_vmalloc videobuf2_memops videobuf2_core cdc_ether hid_generic videodev binfmt_misc radeon cfbfillrect snd_hda_intel cfbimgblt snd_hda_codec fbcon bitblit cfbcopyarea snd_seq softcursor i2c_algo_bit snd_seq_device font backlight powernow_k8 mperf drm_kms_helper kvm_amd ttm snd_pcm kvm drm fb snd_page_alloc snd_timer snd fbdev k10temp microcode evdev i2c_piix4 xhci_hcd button nfsd exportfs auth_rpcgss nfs_acl lockd sunrpc autofs4 usbhid ehci_hcd ohci_hcd sr_mod cdrom [last unloaded
Jan 20 04:11:18 s3 kernel: : pwc]
Jan 20 04:11:18 s3 kernel: Pid: 506, comm: irq/43-ahci Not tainted 3.6.11 #19
Jan 20 04:11:18 s3 kernel: Call Trace:
Jan 20 04:11:18 s3 kernel: [<ffffffff8103c679>] ? warn_slowpath_common+0x79/0xc0
Jan 20 04:11:18 s3 kernel: [<ffffffff8134598d>] ? __domain_flush_pages+0x1ad/0x1b0
Jan 20 04:11:18 s3 kernel: [<ffffffff81345c2b>] ? __unmap_single.isra.24+0xdb/0x110
Jan 20 04:11:18 s3 kernel: [<ffffffff81346615>] ? unmap_sg+0x55/0xb0
Jan 20 04:11:18 s3 kernel: [<ffffffff812a9a61>] ? ata_sg_clean+0x61/0xd0
Jan 20 04:11:18 s3 kernel: [<ffffffff812b039d>] ? ata_scsi_qc_complete+0x5d/0x420
Jan 20 04:11:18 s3 kernel: [<ffffffff812a9cc0>] ? __ata_qc_complete+0x40/0x130
Jan 20 04:11:18 s3 kernel: [<ffffffff812aa05a>] ? ata_qc_complete_multiple+0x7a/0xc0
Jan 20 04:11:30 s3 kernel: [<ffffffff812c1d2f>] ? ahci_interrupt+0xaf/0x710
Jan 20 04:11:30 s3 kernel: [<ffffffff8109c8e0>] ? irq_thread_fn+0x40/0x40
Jan 20 04:11:30 s3 kernel: [<ffffffff8109c903>] ? irq_forced_thread_fn+0x23/0x50
Jan 20 04:11:30 s3 kernel: [<ffffffff8109c67b>] ? irq_thread+0x11b/0x180
Jan 20 04:11:30 s3 kernel: [<ffffffff81060d8c>] ? __wake_up_common+0x4c/0x80
Jan 20 04:11:30 s3 kernel: [<ffffffff8109c7e0>] ? irq_finalize_oneshot+0x100/0x100
Jan 20 04:11:30 s3 kernel: [<ffffffff8109c560>] ? wake_threads_waitq+0x50/0x50
Jan 20 04:11:30 s3 kernel: [<ffffffff81058d75>] ? kthread+0x85/0x90
Jan 20 04:11:30 s3 kernel: [<ffffffff8141ec34>] ? kernel_thread_helper+0x4/0x10
Jan 20 04:11:30 s3 kernel: [<ffffffff81058cf0>] ? kthread_freezable_should_stop+0x50/0x50
Jan 20 04:11:30 s3 kernel: [<ffffffff8141ec30>] ? gs_change+0xb/0xb
Jan 20 04:11:30 s3 kernel: ---[ end trace 73ac82546fadadb1 ]---
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: AMD-Vi: Completion-Wait loop timed out
Jan 20 04:11:30 s3 kernel: ------------[ cut here ]------------

And many more of the WARNINGs.

What went wrong? How to fix?

Kind regards,
Udo

^ permalink raw reply [flat|nested] 38+ messages in thread
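For a log like the one above, a small script can show how the timeouts cluster per second, which helps when comparing reports. This is only a sketch: it assumes the rsyslog line layout seen in the report ("Mon DD HH:MM:SS host kernel: ..."), and the names TIMEOUT_RE and count_timeouts are made up for illustration.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
# Assumption: the classic "Mon DD HH:MM:SS host kernel:" prefix is in use.
TIMEOUT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"AMD-Vi: Completion-Wait loop timed out\s*$"
)


def count_timeouts(lines):
    """Return a Counter mapping each syslog timestamp to its timeout count."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            counts[match.group("stamp")] += 1
    return counts


# A few lines taken from the report; the "cut here" line must not be counted.
sample = [
    "Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out",
    "Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out",
    "Jan 20 04:11:18 s3 kernel: AMD-Vi: Completion-Wait loop timed out",
    "Jan 20 04:11:18 s3 kernel: ------------[ cut here ]------------",
]

for stamp, n in sorted(count_timeouts(sample).items()):
    print(stamp, n)
```

To run it against a real log, pass an open file object instead of the sample list, since a file iterates line by line.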
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out
  2013-01-20 10:33 3.6.11 AMD-Vi: Completion-Wait loop timed out Udo van den Heuvel
@ 2013-01-20 10:36 ` Borislav Petkov
  2013-01-20 10:40 ` Udo van den Heuvel
  0 siblings, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2013-01-20 10:36 UTC (permalink / raw)
  To: Udo van den Heuvel; +Cc: linux-kernel, Jörg Rödel

I know just the guy, CCed. :-)

On Sun, Jan 20, 2013 at 11:33:19AM +0100, Udo van den Heuvel wrote:
> Hello,
>
> See below for a part of the logging on this F2A85X-UP4 with AMD
> a10-5800k. Box was raid checking I guess.
>
> Jan 20 04:11:17 s3 kernel: AMD-Vi: Completion-Wait loop timed out
> [...]
> Jan 20 04:11:18 s3 kernel: WARNING: at drivers/iommu/amd_iommu.c:1104
> __domain_flush_pages+0x1ad/0x1b0()
> [...]
>
> And many more of the WARNINGs.
>
> What went wrong? How to fix?
>
> Kind regards,
> Udo

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out
  2013-01-20 10:36 ` Borislav Petkov
@ 2013-01-20 10:40 ` Udo van den Heuvel
  2013-01-20 11:19 ` Jörg Rödel
  0 siblings, 1 reply; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-20 10:40 UTC (permalink / raw)
  To: Borislav Petkov, Jörg Rödel; +Cc: linux-kernel

Hello,

On 2013-01-20 11:36, Borislav Petkov wrote:
> I know just the guy, CCed. :-)

Thanks for the quick response!
I found this similar case:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1073384

Kind regards,
Udo

^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out
  2013-01-20 10:40 ` Udo van den Heuvel
@ 2013-01-20 11:19 ` Jörg Rödel
  2013-01-20 11:25 ` Udo van den Heuvel
  0 siblings, 1 reply; 38+ messages in thread
From: Jörg Rödel @ 2013-01-20 11:19 UTC (permalink / raw)
  To: Udo van den Heuvel; +Cc: Borislav Petkov, linux-kernel

On Sun, Jan 20, 2013 at 11:40:20AM +0100, Udo van den Heuvel wrote:
> Hello,
>
> On 2013-01-20 11:36, Borislav Petkov wrote:
> > I know just the guy, CCed. :-)
>
> Thanks for the quick response!
> I found this similar case:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1073384

Yes, this is a Hardware issue for which the BIOS does not apply the
workaround. The only solution for now is to disable the IOMMU on the
Trinity based chips. Unfortunately I don't have access to the hardware
any longer to write a workaround in the AMD IOMMU driver.

The question is what to do now, I tend to disable the IOMMU if a
Trinity chip is detected. This is not the first report of this problem
I encountered.

Regards,

Joerg

^ permalink raw reply [flat|nested] 38+ messages in thread
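The message above does not say how the IOMMU would be disabled. One route, going by the mainline kernel parameter documentation rather than anything stated in this thread, is to boot with amd_iommu=off; whether a given BIOS also offers a switch is board-specific. The sketch below only checks whether a command line (for example the contents of /proc/cmdline) carries that parameter. The function name and the sample command line are hypothetical.

```python
def amd_iommu_disabled(cmdline):
    """Return True if the kernel command line contains amd_iommu=off."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Assumption: only the plain "off" value is treated as disabling;
        # other values of amd_iommu= are deliberately ignored here.
        if key == "amd_iommu" and sep == "=" and value == "off":
            return True
    return False


# Hypothetical command line; on a running system read /proc/cmdline instead.
sample = "BOOT_IMAGE=/vmlinuz-3.6.11 root=/dev/sda1 ro amd_iommu=off"
print(amd_iommu_disabled(sample))
print(amd_iommu_disabled("root=/dev/sda1 ro quiet"))
```

The check is read-only on purpose: changing boot parameters is a bootloader matter and differs per distribution.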
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out
  2013-01-20 11:19 ` Jörg Rödel
@ 2013-01-20 11:25 ` Udo van den Heuvel
  2013-01-20 11:40 ` Jörg Rödel
  0 siblings, 1 reply; 38+ messages in thread
From: Udo van den Heuvel @ 2013-01-20 11:25 UTC (permalink / raw)
  To: Jörg Rödel; +Cc: Borislav Petkov, linux-kernel

Hello Jörg,

On 2013-01-20 12:19, Jörg Rödel wrote:
> On Sun, Jan 20, 2013 at 11:40:20AM +0100, Udo van den Heuvel wrote:
>> Hello,
>>
>> On 2013-01-20 11:36, Borislav Petkov wrote:
>>> I know just the guy, CCed. :-)
>>
>> Thanks for the quick response!
>> I found this similar case:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1073384
>
> Yes, this is a Hardware issue for which the BIOS does not apply the
> workaround.

Hardware issue? What is wrong, or what is happening?

I have this:

# dmesg|grep IOMMU
[ 0.000000] ACPI: IVRS 000000009dd12420 00070 (v02 AMD AMDIOMMU 00000001 AMD 00000000)
[ 0.000000] Please enable the IOMMU option in the BIOS setup
[ 1.125636] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40

So kernel says I have no IOMMU but still one is found? (!?)

> The only solution for now is to disable the IOMMU on the
> Trinity based chips.

In PC-BIOS I assume?
I did not yet find an option, but this is the first occurrence.
Can the BIOS vendor fix this? If so: please explain so I can contact
Gigabyte (motherboard manufacturer)

> The question is what to do now, I tend to disable the IOMMU if a
> Trinity chip is detected. This is not the first report of this problem
> I encountered.

I know, see the URL I posted.
What is the impact of disabling the IOMMU?

Udo

^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out
  2013-01-20 11:25 ` Udo van den Heuvel
@ 2013-01-20 11:40 ` Jörg Rödel
  2013-01-20 11:48 ` Borislav Petkov
  0 siblings, 1 reply; 38+ messages in thread
From: Jörg Rödel @ 2013-01-20 11:40 UTC (permalink / raw)
  To: Udo van den Heuvel; +Cc: Borislav Petkov, linux-kernel

On Sun, Jan 20, 2013 at 12:25:07PM +0100, Udo van den Heuvel wrote:
> Hello Jörg,
>
> Hardware issue? What is wrong, or what is happening?

I think it falls under Erratum 455 (which does not mention IOMMU
specifically). Point is, there is a hardware workaround for this to make
the IOMMU work, but your BIOS does not enable it.

> I have this:
>
> # dmesg|grep IOMMU
> [ 0.000000] ACPI: IVRS 000000009dd12420 00070 (v02 AMD AMDIOMMU
> 00000001 AMD 00000000)
> [ 0.000000] Please enable the IOMMU option in the BIOS setup
> [ 1.125636] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
>
> So kernel says I have no IOMMU but still one is found? (!?)

The "Please enable IOMMU ..." line tells you about the GART, not the AMD
IOMMU. This is a frequent source of confusion; we probably should remove
that line. Trinity has no GART, so there is nothing to find :)

> In PC-BIOS I assume?
> I did not yet find an option, but this is the first occurrence.
> Can the BIOS vendor fix this? If so: please explain so I can contact
> Gigabyte (motherboard manufacturer)

Yes, the BIOS vendor can fix this issue. They need to disable NB clock
gating for the IOMMU.

> What is the impact of disabling the IOMMU?

Well, it has some security impact and if you have more than 4GB of RAM
maybe also some slight performance impact due to DMA bounce buffering.
But that's still better than a system that stops working after some time.

Joerg

^ permalink raw reply [flat|nested] 38+ messages in thread
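The GART versus AMD IOMMU distinction explained above can be made concrete with a tiny classifier over dmesg output. It is a sketch under stated assumptions: the function name classify and the three labels are invented here, and the only rules encoded are the two the message gives (lines prefixed with "AMD-Vi:" come from the AMD IOMMU driver, the "Please enable the IOMMU option" hint is about the GART).

```python
import re

# dmesg prefixes each line with a "[ seconds.micros]" timestamp.
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")
GART_HINT = "Please enable the IOMMU option in the BIOS setup"


def classify(line):
    """Label a dmesg line as 'amd-iommu', 'gart' or 'other'."""
    msg = TIMESTAMP_RE.sub("", line.strip())
    # Only messages that start with the driver prefix are counted here.
    if msg.startswith("AMD-Vi:"):
        return "amd-iommu"
    if msg.startswith(GART_HINT):
        return "gart"
    return "other"


# The three lines Udo posted from "dmesg|grep IOMMU".
sample = [
    "[ 0.000000] ACPI: IVRS 000000009dd12420 00070 (v02 AMD AMDIOMMU 00000001 AMD 00000000)",
    "[ 0.000000] Please enable the IOMMU option in the BIOS setup",
    "[ 1.125636] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40",
]
for line in sample:
    print(classify(line), "<-", line)
```

Grepping for "AMD-Vi" rather than "IOMMU", as done later in the thread, avoids the GART line in the first place.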
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-20 11:40 ` Jörg Rödel @ 2013-01-20 11:48 ` Borislav Petkov 2013-01-20 11:50 ` Borislav Petkov ` (3 more replies) 0 siblings, 4 replies; 38+ messages in thread From: Borislav Petkov @ 2013-01-20 11:48 UTC (permalink / raw) To: Jörg Rödel Cc: Udo van den Heuvel, linux-kernel, Boris Ostrovsky, Jacob Shin On Sun, Jan 20, 2013 at 12:40:11PM +0100, Jörg Rödel wrote: > Yes, the BIOS vendor can fix this issue. They need to disable NB clock > gating for the IOMMU. Right, Udo, you can try Gigabyte first. Btw, can't we add a quirk to disable NB clock gating? Maybe Boris and Jacob could help. CCed. Guys, the error description is at http://marc.info/?l=linux-kernel&m=135867802432660 Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-20 11:48 ` Borislav Petkov @ 2013-01-20 11:50 ` Borislav Petkov 2013-01-20 11:59 ` Udo van den Heuvel 2013-01-20 11:52 ` Udo van den Heuvel ` (2 subsequent siblings) 3 siblings, 1 reply; 38+ messages in thread From: Borislav Petkov @ 2013-01-20 11:50 UTC (permalink / raw) To: Udo van den Heuvel Cc: Jörg Rödel, linux-kernel, Boris Ostrovsky, Jacob Shin On Sun, Jan 20, 2013 at 12:48:28PM +0100, Borislav Petkov wrote: > On Sun, Jan 20, 2013 at 12:40:11PM +0100, Jörg Rödel wrote: > > Yes, the BIOS vendor can fix this issue. They need to disable NB clock > > gating for the IOMMU. > > Right, Udo, you can try Gigabyte first. Btw, you're running the latest BIOS from them, I assume? -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-20 11:50 ` Borislav Petkov @ 2013-01-20 11:59 ` Udo van den Heuvel 2013-01-20 12:24 ` Borislav Petkov 0 siblings, 1 reply; 38+ messages in thread From: Udo van den Heuvel @ 2013-01-20 11:59 UTC (permalink / raw) To: Borislav Petkov, Jörg Rödel, linux-kernel, Boris Ostrovsky, Jacob Shin On 2013-01-20 12:50, Borislav Petkov wrote: >> Right, Udo, you can try Gigabyte first. > > Btw, you're running the latest BIOS from them, I assume? Nope. But I am beyond their first released BIOS, I am running one of their beta BIOSes. I am two beta updates behind current with F3g. They list as description for BIOS F3k: 1. Beta BIOS 2. Modify option of APU and memory voltage 3. Modify option of CPU PWM switch rate 4. Modify memory compatibility 5. Modify ET6 compatibility Udo ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-20 11:59 ` Udo van den Heuvel @ 2013-01-20 12:24 ` Borislav Petkov 0 siblings, 0 replies; 38+ messages in thread From: Borislav Petkov @ 2013-01-20 12:24 UTC (permalink / raw) To: Udo van den Heuvel Cc: Jörg Rödel, linux-kernel, Boris Ostrovsky, Jacob Shin On Sun, Jan 20, 2013 at 12:59:59PM +0100, Udo van den Heuvel wrote: > On 2013-01-20 12:50, Borislav Petkov wrote: > >> Right, Udo, you can try Gigabyte first. > > > > Btw, you're running the latest BIOS from them, I assume? > > Nope. But I am beyond their first released BIOS, I am running one of > their beta BIOSes. I am two beta updates behind current with F3g. > They list as description for BIOS F3k: > > 1. Beta BIOS > 2. Modify option of APU and memory voltage > 3. Modify option of CPU PWM switch rate > 4. Modify memory compatibility > 5. Modify ET6 compatibility Yeah, fixes lists are not always exhaustive, especially with BIOS. You could try the latest if you can downgrade it easily if something breaks with F3k. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-20 11:48 ` Borislav Petkov 2013-01-20 11:50 ` Borislav Petkov @ 2013-01-20 11:52 ` Udo van den Heuvel 2013-01-20 11:57 ` Jörg Rödel 2013-01-21 16:04 ` Jacob Shin 3 siblings, 0 replies; 38+ messages in thread From: Udo van den Heuvel @ 2013-01-20 11:52 UTC (permalink / raw) To: Borislav Petkov, Jörg Rödel, linux-kernel, Boris Ostrovsky, Jacob Shin On 2013-01-20 12:48, Borislav Petkov wrote: > On Sun, Jan 20, 2013 at 12:40:11PM +0100, Jörg Rödel wrote: >> Yes, the BIOS vendor can fix this issue. They need to disable NB clock >> gating for the IOMMU. > > Right, Udo, you can try Gigabyte first. I just did so and referred to the kernel.org bugzilla. > Btw, can't we add a quirk to disable NB clock gating? Maybe Boris and > Jacob could help. CCed. That would be most helpful! Of course I can help testing but the issue happened only 1 time so far. The person from https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1073384 had more bad luck in experiencing the issue. Thanks, Udo ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-20 11:48 ` Borislav Petkov 2013-01-20 11:50 ` Borislav Petkov 2013-01-20 11:52 ` Udo van den Heuvel @ 2013-01-20 11:57 ` Jörg Rödel 2013-01-21 13:09 ` Borislav Petkov 2013-01-21 14:37 ` Boris Ostrovsky 2013-01-21 16:04 ` Jacob Shin 3 siblings, 2 replies; 38+ messages in thread From: Jörg Rödel @ 2013-01-20 11:57 UTC (permalink / raw) To: Borislav Petkov, Udo van den Heuvel, linux-kernel, Boris Ostrovsky, Jacob Shin On Sun, Jan 20, 2013 at 12:48:28PM +0100, Borislav Petkov wrote: > On Sun, Jan 20, 2013 at 12:40:11PM +0100, Jörg Rödel wrote: > > Yes, the BIOS vendor can fix this issue. They need to disable NB clock > > gating for the IOMMU. > > Right, Udo, you can try Gigabyte first. > > Btw, can't we add a quirk to disable NB clock gating? Maybe Boris and > Jacob could help. CCed. BorisO is no longer with AMD afaik. I wrote an email to Sherry and Suravee and asked them to either send me hardware to write the fix on my own or to send a fix for the issue. Let's see what happens... Joerg ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-20 11:57 ` Jörg Rödel @ 2013-01-21 13:09 ` Borislav Petkov 2013-01-21 14:10 ` Udo van den Heuvel ` (2 more replies) 2013-01-21 14:37 ` Boris Ostrovsky 1 sibling, 3 replies; 38+ messages in thread From: Borislav Petkov @ 2013-01-21 13:09 UTC (permalink / raw) To: Jörg Rödel Cc: Udo van den Heuvel, linux-kernel, Boris Ostrovsky, Jacob Shin On Sun, Jan 20, 2013 at 12:57:55PM +0100, Jörg Rödel wrote: > BorisO is no longer with AMD afaik. Why am I not surprised... > I wrote an email to Sherry and Suravee and asked them to either send > me hardware to write the fix on my own or to send a fix for the issue. > Let's see what happens... Btw, while we're at it, here's some more h0rkage from my PD box: [ 0.220022] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table [ 0.220078] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table [ 0.220132] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table [ 0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s) -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-21 13:09 ` Borislav Petkov @ 2013-01-21 14:10 ` Udo van den Heuvel 2013-01-21 14:55 ` Borislav Petkov 2013-01-21 15:10 ` Jörg Rödel 2013-04-21 1:03 ` Jake 2 siblings, 1 reply; 38+ messages in thread From: Udo van den Heuvel @ 2013-01-21 14:10 UTC (permalink / raw) To: Borislav Petkov, Jörg Rödel, linux-kernel, Boris Ostrovsky, Jacob Shin On 2013-01-21 14:09, Borislav Petkov wrote: >> Let's see what happens... > > Btw, while we're at it, here's some more h0rkage from my PD box: > > [ 0.220022] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table > [ 0.220078] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table > [ 0.220132] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table > [ 0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s) I have: # dmesg|grep -i amd-vi [ 1.125636] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40 [ 1.125701] AMD-Vi: Extended features: PreF PPR GT IA [ 1.131725] AMD-Vi: Lazy IO/TLB flushing enabled Is that 'OK'? Udo ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-21 14:10 ` Udo van den Heuvel @ 2013-01-21 14:55 ` Borislav Petkov 0 siblings, 0 replies; 38+ messages in thread From: Borislav Petkov @ 2013-01-21 14:55 UTC (permalink / raw) To: Udo van den Heuvel Cc: Jörg Rödel, linux-kernel, Boris Ostrovsky, Jacob Shin On Mon, Jan 21, 2013 at 03:10:19PM +0100, Udo van den Heuvel wrote: > On 2013-01-21 14:09, Borislav Petkov wrote: > >> Let's see what happens... > > > > Btw, while we're at it, here's some more h0rkage from my PD box: > > > > [ 0.220022] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table > > [ 0.220078] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table > > [ 0.220132] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table > > [ 0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s) > > I have: > > # dmesg|grep -i amd-vi > [ 1.125636] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40 > [ 1.125701] AMD-Vi: Extended features: PreF PPR GT IA > [ 1.131725] AMD-Vi: Lazy IO/TLB flushing enabled > > Is that 'OK'? That's simply dumping the IOMMU extended features and yes, it is ok. Mine happen when enabling CONFIG_IRQ_REMAP and they're somewhat related. Anyways, I decided to show them to Joerg so that he's aware. Btw, you could try enabling that on your machine and see whether IRQ remapping works there. Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-21 13:09 ` Borislav Petkov 2013-01-21 14:10 ` Udo van den Heuvel @ 2013-01-21 15:10 ` Jörg Rödel 2013-01-21 15:32 ` Borislav Petkov 2013-04-21 1:03 ` Jake 2 siblings, 1 reply; 38+ messages in thread From: Jörg Rödel @ 2013-01-21 15:10 UTC (permalink / raw) To: Borislav Petkov, Udo van den Heuvel, linux-kernel, Boris Ostrovsky, Jacob Shin On Mon, Jan 21, 2013 at 02:09:42PM +0100, Borislav Petkov wrote: > [ 0.220022] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table > [ 0.220078] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table > [ 0.220132] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table > [ 0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s) Yes, that are BIOS bugs too that prevent interrupt remapping to function reliably. But the good thing is that these bugs can be detected easily to enable a workaround :-) Joerg ^ permalink raw reply [flat|nested] 38+ messages in thread
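Since these firmware bugs announce themselves in the boot log, spotting them from user space is straightforward as well. The following is a rough sketch, not the kernel's own detection logic: it assumes the exact message wording quoted in this thread, and the names summarize and MISSING_IOAPIC_RE are illustrative.

```python
import re

# dmesg prefixes each line with a "[ seconds.micros]" timestamp.
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# Assumption: message wording as quoted in the thread.
MISSING_IOAPIC_RE = re.compile(r"AMD-Vi: IOAPIC\[(\d+)\] not in IVRS table")


def summarize(lines):
    """Return (IOAPIC ids missing from IVRS, interrupt remapping disabled?)."""
    missing = []
    remap_disabled = False
    for line in lines:
        msg = TIMESTAMP_RE.sub("", line.strip())
        match = MISSING_IOAPIC_RE.search(msg)
        if match:
            missing.append(int(match.group(1)))
        if "AMD-Vi: Disabling interrupt remapping" in msg:
            remap_disabled = True
    return missing, remap_disabled


# The four lines Borislav posted from his machine.
sample = [
    "[ 0.220022] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table",
    "[ 0.220078] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table",
    "[ 0.220132] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table",
    "[ 0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s)",
]
print(summarize(sample))
```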
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-21 15:10 ` Jörg Rödel @ 2013-01-21 15:32 ` Borislav Petkov 2013-01-21 15:34 ` Udo van den Heuvel 0 siblings, 1 reply; 38+ messages in thread From: Borislav Petkov @ 2013-01-21 15:32 UTC (permalink / raw) To: Jörg Rödel Cc: Udo van den Heuvel, linux-kernel, Boris Ostrovsky, Jacob Shin On Mon, Jan 21, 2013 at 04:10:00PM +0100, Jörg Rödel wrote: > On Mon, Jan 21, 2013 at 02:09:42PM +0100, Borislav Petkov wrote: > > [ 0.220022] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table > > [ 0.220078] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table > > [ 0.220132] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table > > [ 0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s) > > Yes, that are BIOS bugs too that prevent interrupt remapping to function > reliably. But the good thing is that these bugs can be detected easily > to enable a workaround :-) Well, I'm all ready to test stuff since this is the latest ASUS BIOS and I'm not even going to ask them to fix it there based on previous experience with them :-). -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-21 15:32 ` Borislav Petkov @ 2013-01-21 15:34 ` Udo van den Heuvel 0 siblings, 0 replies; 38+ messages in thread From: Udo van den Heuvel @ 2013-01-21 15:34 UTC (permalink / raw) To: Borislav Petkov, Jörg Rödel, linux-kernel, Boris Ostrovsky, Jacob Shin On 2013-01-21 16:32, Borislav Petkov wrote: >>> [ 0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s) >> >> Yes, that are BIOS bugs too that prevent interrupt remapping to function >> reliably. But the good thing is that these bugs can be detected easily >> to enable a workaround :-) > > Well, I'm all ready to test stuff since this is the latest ASUS BIOS > and I'm not even going to ask them to fix it there based on previous > experience with them :-). I too am ready to test but the Gigabyte case is still open and unanswered. Udo ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-21 13:09 ` Borislav Petkov 2013-01-21 14:10 ` Udo van den Heuvel 2013-01-21 15:10 ` Jörg Rödel @ 2013-04-21 1:03 ` Jake 2013-04-21 21:47 ` Borislav Petkov 2 siblings, 1 reply; 38+ messages in thread From: Jake @ 2013-04-21 1:03 UTC (permalink / raw) To: linux-kernel Borislav Petkov <bp <at> alien8.de> writes: > > On Sun, Jan 20, 2013 at 12:57:55PM +0100, Jörg Rödel wrote: > > BorisO is no longer with AMD afaik. > > Why am I not surprised... > > > I wrote an email to Sherry and Suravee and asked them to either send > > me hardware to write the fix on my own or to send a fix for the issue. > > Let's see what happens... > > Btw, while we're at it, here's some more h0rkage from my PD box: > > [ 0.220022] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table > [ 0.220078] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table > [ 0.220132] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table > [ 0.220187] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s) > hello, I've never posted to this type of message board before so I hope I'm not out of order. I found my way here while trying to solve a shutdown problem in a new linux install (Arch). I've noticed for some time that I have the same lines as above in my dmesg as well as: ACPI BIOS Bug: Warning: Optional FADT field Pm2CintrolBlock has no address or length: 0x0000000000000000/0x1 (20121018/tbfadt-589) I have no idea whether this is related to my shutdown problem or not and what I was hoping is if someone would advise me how to find help for my problem. I have tried all the normal routes for my distro (forum and irc) repeatedly, and googled my weeping eyes out - but to no avail. Any advice would be greatly appreciated. Thanks Jake ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-04-21 1:03 ` Jake @ 2013-04-21 21:47 ` Borislav Petkov 0 siblings, 0 replies; 38+ messages in thread From: Borislav Petkov @ 2013-04-21 21:47 UTC (permalink / raw) To: Jake; +Cc: linux-kernel On Sun, Apr 21, 2013 at 01:03:16AM +0000, Jake wrote: > ACPI BIOS Bug: Warning: Optional FADT field Pm2CintrolBlock has no address or length: 0x0000000000000000/0x1 (20121018/tbfadt-589) I have the same one: [ 0.000000] ACPI BIOS Bug: Warning: Optional FADT field Pm2ControlBlock has zero address or length: 0x0000000000000000/0x1 (20130117/tbfadt-599) We probably have the same ASUS crap for a board. > I have no idea whether this is related to my shutdown problem or not and I don't think so as I can suspend/resume/shutdown my box just fine. :) > what I was hoping is if someone would advise me how to find help for > my problem. I have tried all the normal routes for my distro (forum > and irc) repeatedly, and googled my weeping eyes out - but to no > avail. Have you tried the upstream kernel yet? I hear 3.9-rc8 will be out tomorrow :-) HTH. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-20 11:57 ` Jörg Rödel 2013-01-21 13:09 ` Borislav Petkov @ 2013-01-21 14:37 ` Boris Ostrovsky 2013-01-21 14:44 ` Udo van den Heuvel 2013-01-21 14:47 ` Jörg Rödel 1 sibling, 2 replies; 38+ messages in thread From: Boris Ostrovsky @ 2013-01-21 14:37 UTC (permalink / raw) To: Jörg Rödel Cc: Borislav Petkov, Udo van den Heuvel, linux-kernel, Jacob Shin On 01/20/2013 06:57 AM, Jörg Rödel wrote: > On Sun, Jan 20, 2013 at 12:48:28PM +0100, Borislav Petkov wrote: >> On Sun, Jan 20, 2013 at 12:40:11PM +0100, Jörg Rödel wrote: >>> Yes, the BIOS vendor can fix this issue. They need to disable NB clock >>> gating for the IOMMU. >> >> Right, Udo, you can try Gigabyte first. >> >> Btw, can't we add a quirk to disable NB clock gating? Maybe Boris and >> Jacob could help. CCed. Are you talking about erratum 746? -boris ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-21 14:37 ` Boris Ostrovsky @ 2013-01-21 14:44 ` Udo van den Heuvel 2013-01-21 14:47 ` Jörg Rödel 1 sibling, 0 replies; 38+ messages in thread From: Udo van den Heuvel @ 2013-01-21 14:44 UTC (permalink / raw) To: Boris Ostrovsky Cc: Jörg Rödel, Borislav Petkov, linux-kernel, Jacob Shin On 2013-01-21 15:37, Boris Ostrovsky wrote: >>> Btw, can't we add a quirk to disable NB clock gating? Maybe Boris and >>> Jacob could help. CCed. > > > Are you talking about erratum 746? Link please? If we have a link I could add that to the Gigabyte case. Udo ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-21 14:37 ` Boris Ostrovsky 2013-01-21 14:44 ` Udo van den Heuvel @ 2013-01-21 14:47 ` Jörg Rödel 1 sibling, 0 replies; 38+ messages in thread From: Jörg Rödel @ 2013-01-21 14:47 UTC (permalink / raw) To: Boris Ostrovsky Cc: Borislav Petkov, Udo van den Heuvel, linux-kernel, Jacob Shin Hi Boris, On Mon, Jan 21, 2013 at 09:37:31AM -0500, Boris Ostrovsky wrote: > Are you talking about erratum 746? The problems seen here are not about PPR failures, so it is not particularly this erratum, but the workaround looks similar. Joerg ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-20 11:48 ` Borislav Petkov ` (2 preceding siblings ...) 2013-01-20 11:57 ` Jörg Rödel @ 2013-01-21 16:04 ` Jacob Shin 2013-01-21 22:35 ` Suravee Suthikulpanit 3 siblings, 1 reply; 38+ messages in thread From: Jacob Shin @ 2013-01-21 16:04 UTC (permalink / raw) To: Borislav Petkov, Jörg Rödel, Udo van den Heuvel, linux-kernel, Boris Ostrovsky Cc: Suravee.Suthikulpanit On Sun, Jan 20, 2013 at 12:48:28PM +0100, Borislav Petkov wrote: > On Sun, Jan 20, 2013 at 12:40:11PM +0100, Jörg Rödel wrote: > > Yes, the BIOS vendor can fix this issue. They need to disable NB clock > > gating for the IOMMU. > > Right, Udo, you can try Gigabyte first. > > Btw, can't we add a quirk to disable NB clock gating? Maybe Boris and > Jacob could help. CCed. Hi, yes we will try and reproduce the NB clock gating issue on our end and submit a patch ASAP. And Boris P., I think your IOAPIC not in IVRS issue we've also seen something similar recently (on Xen), so we'll attempt to tackle that one too afterwards. -Jacob > > Guys, the error description is at > http://marc.info/?l=linux-kernel&m=135867802432660 > > Thanks. > > -- > Regards/Gruss, > Boris. > > Sent from a fat crate under my desk. Formatting is fine. > -- > ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-21 16:04 ` Jacob Shin @ 2013-01-21 22:35 ` Suravee Suthikulpanit 2013-01-22 3:22 ` Udo van den Heuvel 2013-01-22 14:13 ` Udo van den Heuvel 0 siblings, 2 replies; 38+ messages in thread From: Suravee Suthikulpanit @ 2013-01-21 22:35 UTC (permalink / raw) To: Jacob Shin Cc: Borislav Petkov, Jörg Rödel, Udo van den Heuvel, linux-kernel, Boris Ostrovsky Udo, I am trying to debug the issue but need to check one thing on your system. Would you please try the following and check the output value on your system? # setpci -s 00:00.02 F0.w=90 # setpci -s 00:00.02 F4.w Thank you, Suravee On Mon, 2013-01-21 at 10:04 -0600, Jacob Shin wrote: > On Sun, Jan 20, 2013 at 12:48:28PM +0100, Borislav Petkov wrote: > > On Sun, Jan 20, 2013 at 12:40:11PM +0100, Jörg Rödel wrote: > > > Yes, the BIOS vendor can fix this issue. They need to disable NB clock > > > gating for the IOMMU. > > > > Right, Udo, you can try Gigabyte first. > > > > Btw, can't we add a quirk to disable NB clock gating? Maybe Boris and > > Jacob could help. CCed. > > Hi, yes we will try and reproduce the NB clock gating issue on our > end and submit a patch ASAP. > > And Boris P., I think your IOAPIC not in IVRS issue we've also seen > something similar recently (on Xen), so we'll attempt to tackle that > one too afterwards. > > -Jacob > > > > > Guys, the error description is at > > http://marc.info/?l=linux-kernel&m=135867802432660 > > > > Thanks. > > > > -- > > Regards/Gruss, > > Boris. > > > > Sent from a fat crate under my desk. Formatting is fine. > > -- > > ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-21 22:35 ` Suravee Suthikulpanit @ 2013-01-22 3:22 ` Udo van den Heuvel 2013-01-22 14:13 ` Udo van den Heuvel 1 sibling, 0 replies; 38+ messages in thread From: Udo van den Heuvel @ 2013-01-22 3:22 UTC (permalink / raw) To: suravee.suthikulpanit Cc: Jacob Shin, Borislav Petkov, Jörg Rödel, linux-kernel, Boris Ostrovsky On 2013-01-21 23:35, Suravee Suthikulpanit wrote: > Would you please try the following and check the output value > on your system? > > # setpci -s 00:00.02 F0.w=90 > # setpci -s 00:00.02 F4.w # setpci -s 00:00.02 F0.w=90 # setpci -s 00:00.02 F4.w 0050 # Udo ^ permalink raw reply [flat|nested] 38+ messages in thread
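Archive note: the two setpci commands above select an indirect register (index 0x90 written to offset 0xF0) and read it back through offset 0xF4. A minimal sketch of how the returned value might be interpreted follows. It assumes the erratum 746 workaround is signalled by bit 2 of that register; the bit position and both function names are assumptions for illustration and do not appear in the thread. The thread itself only states that a readback of 0050 means the BIOS did not apply the workaround.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed (not stated in the thread): bit 2 of the register selected by
 * index 0x90 marks the erratum 746 workaround as applied. */
#define ERRATUM_746_BIT (1u << 2)

/* Hypothetical helper: true if the value read from offset 0xF4 has the
 * assumed workaround bit set. */
static bool erratum_746_workaround_applied(uint32_t f4_value)
{
	return (f4_value & ERRATUM_746_BIT) != 0;
}

/* Hypothetical helper: the value that would be written back to set the
 * assumed workaround bit while leaving all other bits untouched. */
static uint32_t erratum_746_apply(uint32_t f4_value)
{
	uint32_t patched = f4_value | ERRATUM_746_BIT;

	assert(erratum_746_workaround_applied(patched));
	return patched;
}
```

Under that assumption, the value 0050 reported above has the bit clear, which is consistent with the later diagnosis in this thread that the BIOS lacks the workaround.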
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-21 22:35 ` Suravee Suthikulpanit 2013-01-22 3:22 ` Udo van den Heuvel @ 2013-01-22 14:13 ` Udo van den Heuvel 2013-01-22 14:36 ` Boris Ostrovsky 1 sibling, 1 reply; 38+ messages in thread From: Udo van den Heuvel @ 2013-01-22 14:13 UTC (permalink / raw) To: suravee.suthikulpanit Cc: Jacob Shin, Borislav Petkov, Jörg Rödel, linux-kernel, Boris Ostrovsky Gigabyte demonstrate that using ESX 5i IOMMU works fine. (with pictures attached). What can we bring against that? Udo ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-22 14:13 ` Udo van den Heuvel @ 2013-01-22 14:36 ` Boris Ostrovsky 2013-01-22 15:16 ` Jörg Rödel ` (2 more replies) 0 siblings, 3 replies; 38+ messages in thread From: Boris Ostrovsky @ 2013-01-22 14:36 UTC (permalink / raw) To: Udo van den Heuvel Cc: suravee.suthikulpanit, Jacob Shin, Borislav Petkov, Jörg Rödel, linux-kernel On 01/22/2013 09:13 AM, Udo van den Heuvel wrote: > Gigabyte demonstrate that using ESX 5i IOMMU works fine. (with pictures > attached). There are no attachments to your message. I am not sure that 5i supports IOMMU (but I may well be wrong). > > What can we bring against that? How reproducible is the problem that you are seeing? -boris ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-22 14:36 ` Boris Ostrovsky @ 2013-01-22 15:16 ` Jörg Rödel 2013-01-22 15:27 ` Udo van den Heuvel 2013-01-31 15:42 ` Udo van den Heuvel 2 siblings, 0 replies; 38+ messages in thread From: Jörg Rödel @ 2013-01-22 15:16 UTC (permalink / raw) To: Boris Ostrovsky Cc: Udo van den Heuvel, suravee.suthikulpanit, Jacob Shin, Borislav Petkov, linux-kernel On Tue, Jan 22, 2013 at 09:36:34AM -0500, Boris Ostrovsky wrote: > > > On 01/22/2013 09:13 AM, Udo van den Heuvel wrote: > >Gigabyte demonstrate that using ESX 5i IOMMU works fine. (with pictures > >attached). > > There are no attachments to your message. > > I am not sure that 5i supports IOMMU (but I may well be wrong). Virtualization use-cases don't change the page-tables for the IOMMU very often. So there is less need to flush the IO-TLB and IOMMU command processing is utilized only from time to time. In Linux however the page-tables change all the time and there is a much higher load on the IOMMU command buffer which makes it much more likely to trigger the hardware problem. Joerg ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-22 14:36 ` Boris Ostrovsky 2013-01-22 15:16 ` Jörg Rödel @ 2013-01-22 15:27 ` Udo van den Heuvel 2013-01-22 16:12 ` Boris Ostrovsky 2013-01-31 15:42 ` Udo van den Heuvel 2 siblings, 1 reply; 38+ messages in thread From: Udo van den Heuvel @ 2013-01-22 15:27 UTC (permalink / raw) To: Boris Ostrovsky Cc: suravee.suthikulpanit, Jacob Shin, Borislav Petkov, Jörg Rödel, linux-kernel On 2013-01-22 15:36, Boris Ostrovsky wrote: >> Gigabyte demonstrate that using ESX 5i IOMMU works fine. (with pictures >> attached). > > There are no attachments to your message. Correct, Gigabyte did send them via their support web-interface. Do you need to see them? They just show IOMMU enabled or similar. >> What can we bring against that? > > How reproducible is the problem that you are seeing? Seen once over here. Correlated with raid-check. Udo ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-22 15:27 ` Udo van den Heuvel @ 2013-01-22 16:12 ` Boris Ostrovsky 2013-01-22 16:29 ` Udo van den Heuvel 0 siblings, 1 reply; 38+ messages in thread From: Boris Ostrovsky @ 2013-01-22 16:12 UTC (permalink / raw) To: Udo van den Heuvel Cc: suravee.suthikulpanit, Jacob Shin, Borislav Petkov, Jörg Rödel, linux-kernel On 01/22/2013 10:27 AM, Udo van den Heuvel wrote: > On 2013-01-22 15:36, Boris Ostrovsky wrote: >>> Gigabyte demonstrate that using ESX 5i IOMMU works fine. (with pictures >>> attached). >> >> There are no attachments to your message. > > Correct, Gigabyte did send them via their support web-interface. > Do you need to see them? They just show IOMMU enabled or similar. No, I thought you ran this yourself. > >>> What can we bring against that? >> >> How reproducible is the problem that you are seeing? > > Seen once over here. Correlated with raid-check. Then the answer from Gigabyte doesn't prove anything. You can also boot Linux without seeing this problem in most cases. Your BIOS does not have the required erratum workaround. We will provide a patch to close that hole but since the problem is not easily reproducible (and the erratum is also not easy to trigger) it may be difficult to say whether it really helped with your problem. -boris ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-22 16:12 ` Boris Ostrovsky @ 2013-01-22 16:29 ` Udo van den Heuvel 2013-01-22 23:29 ` Suravee Suthikulanit 0 siblings, 1 reply; 38+ messages in thread From: Udo van den Heuvel @ 2013-01-22 16:29 UTC (permalink / raw) To: Boris Ostrovsky Cc: suravee.suthikulpanit, Jacob Shin, Borislav Petkov, Jörg Rödel, linux-kernel On 2013-01-22 17:12, Boris Ostrovsky wrote: >> Seen once over here. Correlated with raid-check. > > Then the answer from Gigabyte doesn't prove anything. You can also boot > Linux without seeing this problem in most cases. That was my situation until the first time it hit. > Your BIOS does not have the required erratum workaround. We will provide > a patch to close that hole but since the problem is not easily > reproducible (and the erratum is also not easy to trigger) it may be > difficult to say whether it really helped with your problem. Can we think of certain loads/actions/etc that could help trigger the issue? Then if reproducing is easier we can better say if stuff is actually fixed after the workaround. Udo ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-22 16:29 ` Udo van den Heuvel @ 2013-01-22 23:29 ` Suravee Suthikulanit 2013-01-23 14:19 ` Udo van den Heuvel 2013-01-23 14:23 ` Udo van den Heuvel 0 siblings, 2 replies; 38+ messages in thread From: Suravee Suthikulanit @ 2013-01-22 23:29 UTC (permalink / raw) To: Udo van den Heuvel Cc: Boris Ostrovsky, Jacob Shin, Borislav Petkov, Jörg Rödel, linux-kernel On 1/22/2013 10:29 AM, Udo van den Heuvel wrote: > On 2013-01-22 17:12, Boris Ostrovsky wrote: >> Your BIOS does not have the required erratum workaround. We will provide >> a patch to close that hole but since the problem is not easily >> reproducible (and the erratum is also not easy to trigger) it may be >> difficult to say whether it really helped with your problem. Udo, I sent out a patch (http://marc.info/?l=linux-kernel&m=135889686523524&w=2) which should implement the workaround for AMD processor family 15h model 10-1Fh erratum 746 in the IOMMU driver. In your case, the output from "setpci -s 00:00.02 F4.w" is "0050", which tells me that the BIOS doesn't implement the workaround. After patching, you should see the following message in "dmesg". "AMD-Vi: Applying erratum 746 for IOMMU at 0000:00:00.2" > Can we think of certain loads/actions/etc that could help trigger the issue? > Then if reproducing is easier we can better say if stuff is actually > fixed after the workaround. > > Udo Looking at the original kernel message, it seems that the kernel timed out while waiting for the IOMMU to finish executing the "COMPLETION_WAIT" command. In this particular case, it is issued as part of "__domain_flush_pages()" while trying to send the "INVALIDATE_IOMMU_PAGE" command to the IOMMU, but the command buffer was getting full and the kernel needed to wait for the command buffer to free up. However, the kernel message does not tell us exactly what caused the IOMMU to lock up in the first place.
From what I have observed, a workload with high disk traffic should trigger a large number of "INVALIDATE_IOMMU_PAGE" commands. However, this doesn't automatically issue a "COMPLETION_WAIT" command. The following patch slightly modifies the code to always issue "COMPLETION_WAIT" after every command. This should help increase the chance of reproducing the issue.

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index c1c74e0..d05b1f9 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1016,6 +1016,7 @@ static int iommu_queue_command_sync(struct amd_iommu *iommu,
 				    struct iommu_cmd *cmd,
 				    bool sync)
 {
+#if 0
 	u32 left, tail, head, next_tail;
 	unsigned long flags;
 
@@ -1052,6 +1053,40 @@ again:
 
 	spin_unlock_irqrestore(&iommu->lock, flags);
 
+#else
+	u32 tail;
+	unsigned long flags;
+
+	WARN_ON(iommu->cmd_buf_size & CMD_BUFFER_UNINITIALIZED);
+	printk (KERN_DEBUG "AMD-Vi: iommu_queue_command_sync: iommu_queue_command_sync"
+		" data[0]:%#x data[1]:%#x data[2]:%#x data[3]:%#x\n",
+		cmd->data[0], cmd->data[1], cmd->data[2], cmd->data[3] );
+
+	spin_lock_irqsave(&iommu->lock, flags);
+
+	tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
+	copy_cmd_to_buffer(iommu, cmd, tail);
+
+	spin_unlock_irqrestore(&iommu->lock, flags);
+
+	// Sending completion_wait command
+	{
+		struct iommu_cmd sync_cmd;
+		volatile u64 sem = 0;
+		int ret;
+
+		spin_lock_irqsave(&iommu->lock, flags);
+
+		tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
+		build_completion_wait(&sync_cmd, (u64)&sem);
+		copy_cmd_to_buffer(iommu, &sync_cmd, tail);
+
+		spin_unlock_irqrestore(&iommu->lock, flags);
+
+		if ((ret = wait_on_sem(&sem)) != 0)
+			return ret;
+	}
+#endif
 	return 0;
 }

^ permalink raw reply related [flat|nested] 38+ messages in thread
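Archive note: the message above describes a command buffer that the driver fills and the IOMMU drains, with a COMPLETION_WAIT command used to learn when everything queued before it has been processed. The following is a minimal userspace sketch of that mechanism, not the kernel's implementation. All names, the ring size and the poll limit are hypothetical, and the "hardware" is simulated by a function called from the polling loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 8u        /* hypothetical ring size, must be a power of two */
#define RING_MASK    (RING_ENTRIES - 1u)
#define LOOP_TIMEOUT 100000u   /* hypothetical number of polls before giving up */

enum cmd_type { CMD_INVALIDATE_PAGES, CMD_COMPLETION_WAIT };

struct cmd {
	enum cmd_type type;
	volatile uint64_t *sem;  /* written by the simulated hardware for CMD_COMPLETION_WAIT */
};

struct ring {
	struct cmd slots[RING_ENTRIES];
	uint32_t head;    /* next slot the simulated hardware consumes */
	uint32_t tail;    /* next slot the driver fills */
	bool hw_stalled;  /* models an IOMMU that stopped fetching commands */
};

/* Queue one command; returns false when the ring is full. */
static bool ring_queue(struct ring *r, struct cmd c)
{
	assert((RING_ENTRIES & RING_MASK) == 0);
	if (((r->tail + 1u) & RING_MASK) == r->head)
		return false;
	r->slots[r->tail] = c;
	r->tail = (r->tail + 1u) & RING_MASK;
	return true;
}

/* Simulated hardware: consume one command unless stalled or empty. */
static void hw_step(struct ring *r)
{
	if (r->hw_stalled || r->head == r->tail)
		return;
	if (r->slots[r->head].type == CMD_COMPLETION_WAIT)
		*r->slots[r->head].sem = 1;
	r->head = (r->head + 1u) & RING_MASK;
}

/* Queue a completion-wait and poll its semaphore.
 * Returns 0 once the semaphore is written, -1 on timeout or a full ring.
 * On timeout the stale command stays in the ring; a real driver would
 * have to deal with that, this sketch does not. */
static int ring_flush(struct ring *r)
{
	volatile uint64_t sem = 0;
	struct cmd wait = { CMD_COMPLETION_WAIT, &sem };

	if (!ring_queue(r, wait))
		return -1;
	for (uint32_t i = 0; i < LOOP_TIMEOUT; i++) {
		if (sem)
			return 0;
		hw_step(r);
	}
	return -1;
}
```

With `hw_stalled` set, `ring_flush()` runs out of polls and returns -1, which is the situation the "Completion-Wait loop timed out" message reports: the wait itself is only the symptom, and it says nothing about why the hardware stopped consuming commands.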
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-22 23:29 ` Suravee Suthikulanit @ 2013-01-23 14:19 ` Udo van den Heuvel 2013-01-23 15:00 ` Suravee Suthikulpanit 2013-01-23 14:23 ` Udo van den Heuvel 1 sibling, 1 reply; 38+ messages in thread From: Udo van den Heuvel @ 2013-01-23 14:19 UTC (permalink / raw) To: Suravee Suthikulanit Cc: Boris Ostrovsky, Jacob Shin, Borislav Petkov, Jörg Rödel, linux-kernel On 2013-01-23 00:29, Suravee Suthikulanit wrote: > I sent out a patch > (http://marc.info/?l=linux-kernel&m=135889686523524&w=2) which should > implement > the workaround for AMD processor family 15h model 10-1Fh erratum 746 in > the IOMMU driver. > In your case, the output from "setpci -s 00:00.02 F4.w" is "0050", which > tells me that the BIOS doesn't > implement the workaround. After patching, you should see the following > message in "dmesg". > > "AMD-Vi: Applying erratum 746 for IOMMU at 0000:00:00.2" Thanks! I'll check for that after these messages. > The following patch slightly modifies > the code to always issue "COMPLETION_WAIT" after every command. This > should help increase the chance of reproducing > the issue. Should I test with these two patches together? Or should I apply the first one first and then see what the second can help? Kind regards, Udo ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-23 14:19 ` Udo van den Heuvel @ 2013-01-23 15:00 ` Suravee Suthikulpanit 0 siblings, 0 replies; 38+ messages in thread From: Suravee Suthikulpanit @ 2013-01-23 15:00 UTC (permalink / raw) To: Udo van den Heuvel Cc: Boris Ostrovsky, Jacob Shin, Borislav Petkov, Jörg Rödel, linux-kernel On 1/23/2013 8:19 AM, Udo van den Heuvel wrote: > On 2013-01-23 00:29, Suravee Suthikulanit wrote: >> I sent out a patch >> (http://marc.info/?l=linux-kernel&m=135889686523524&w=2) which should >> implement >> the workaround for AMD processor family 15h model 10-1Fh erratum 746 in >> the IOMMU driver. >> In your case, the output from "setpci -s 00:00.02 F4.w" is "0050", which >> tells me that the BIOS doesn't >> implement the workaround. After patching, you should see the following >> message in "dmesg". >> >> "AMD-Vi: Applying erratum 746 for IOMMU at 0000:00:00.2" > Thanks! > I'll check for that after these messages. > >> The following patch slightly modifies >> the code to always issue "COMPLETION_WAIT" after every command. This >> should help increase the chance of reproducing >> the issue. > Should I test with these two patches together? > Or should I apply the first one first and then see what the second can help? Please try the first one first. If the issue doesn't reproduce, you can use the second patch to try to trigger it. Thank you, Suravee > > > Kind regards, > Udo > ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-22 23:29 ` Suravee Suthikulanit 2013-01-23 14:19 ` Udo van den Heuvel @ 2013-01-23 14:23 ` Udo van den Heuvel 2013-01-23 15:01 ` Suravee Suthikulpanit 1 sibling, 1 reply; 38+ messages in thread From: Udo van den Heuvel @ 2013-01-23 14:23 UTC (permalink / raw) To: Suravee Suthikulanit Cc: Boris Ostrovsky, Jacob Shin, Borislav Petkov, Jörg Rödel, linux-kernel On 2013-01-23 00:29, Suravee Suthikulanit wrote: > message in "dmesg". > > "AMD-Vi: Applying erratum 746 for IOMMU at 0000:00:00.2" [ 1.091733] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40 I assume that is correct. Kind regards, Udo ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-23 14:23 ` Udo van den Heuvel @ 2013-01-23 15:01 ` Suravee Suthikulpanit 0 siblings, 0 replies; 38+ messages in thread From: Suravee Suthikulpanit @ 2013-01-23 15:01 UTC (permalink / raw) To: Udo van den Heuvel Cc: Boris Ostrovsky, Jacob Shin, Borislav Petkov, Jörg Rödel, linux-kernel On 1/23/2013 8:23 AM, Udo van den Heuvel wrote: > On 2013-01-23 00:29, Suravee Suthikulanit wrote: >> message in "dmesg". >> >> "AMD-Vi: Applying erratum 746 for IOMMU at 0000:00:00.2" This is expected. Regards, Suravee > [ 1.091733] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40 > > I assume that is correct. > > Kind regards, > Udo > > ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out 2013-01-22 14:36 ` Boris Ostrovsky 2013-01-22 15:16 ` Jörg Rödel 2013-01-22 15:27 ` Udo van den Heuvel @ 2013-01-31 15:42 ` Udo van den Heuvel 2 siblings, 0 replies; 38+ messages in thread From: Udo van den Heuvel @ 2013-01-31 15:42 UTC (permalink / raw) To: Boris Ostrovsky Cc: suravee.suthikulpanit, Jacob Shin, Borislav Petkov, Jörg Rödel, linux-kernel On 2013-01-22 15:36, Boris Ostrovsky wrote: > > > On 01/22/2013 09:13 AM, Udo van den Heuvel wrote: >> Gigabyte demonstrate that using ESX 5i IOMMU works fine. I forwarded the mailing list links with the patch(es) to Gigabyte support and they forwarded the info to the BIOS_team. (to be continued) Udo ^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2013-04-21 21:47 UTC | newest]

Thread overview: 38+ messages
2013-01-20 10:33 3.6.11 AMD-Vi: Completion-Wait loop timed out Udo van den Heuvel
2013-01-20 10:36 ` Borislav Petkov
2013-01-20 10:40 ` Udo van den Heuvel
2013-01-20 11:19 ` Jörg Rödel
2013-01-20 11:25 ` Udo van den Heuvel
2013-01-20 11:40 ` Jörg Rödel
2013-01-20 11:48 ` Borislav Petkov
2013-01-20 11:50 ` Borislav Petkov
2013-01-20 11:59 ` Udo van den Heuvel
2013-01-20 12:24 ` Borislav Petkov
2013-01-20 11:52 ` Udo van den Heuvel
2013-01-20 11:57 ` Jörg Rödel
2013-01-21 13:09 ` Borislav Petkov
2013-01-21 14:10 ` Udo van den Heuvel
2013-01-21 14:55 ` Borislav Petkov
2013-01-21 15:10 ` Jörg Rödel
2013-01-21 15:32 ` Borislav Petkov
2013-01-21 15:34 ` Udo van den Heuvel
2013-04-21 1:03 ` Jake
2013-04-21 21:47 ` Borislav Petkov
2013-01-21 14:37 ` Boris Ostrovsky
2013-01-21 14:44 ` Udo van den Heuvel
2013-01-21 14:47 ` Jörg Rödel
2013-01-21 16:04 ` Jacob Shin
2013-01-21 22:35 ` Suravee Suthikulpanit
2013-01-22 3:22 ` Udo van den Heuvel
2013-01-22 14:13 ` Udo van den Heuvel
2013-01-22 14:36 ` Boris Ostrovsky
2013-01-22 15:16 ` Jörg Rödel
2013-01-22 15:27 ` Udo van den Heuvel
2013-01-22 16:12 ` Boris Ostrovsky
2013-01-22 16:29 ` Udo van den Heuvel
2013-01-22 23:29 ` Suravee Suthikulanit
2013-01-23 14:19 ` Udo van den Heuvel
2013-01-23 15:00 ` Suravee Suthikulpanit
2013-01-23 14:23 ` Udo van den Heuvel
2013-01-23 15:01 ` Suravee Suthikulpanit
2013-01-31 15:42 ` Udo van den Heuvel