* regression ioatdma 3.3
From: William Dauchy @ 2012-01-27 13:31 UTC
To: xen-devel

Hello,

I have some troubles loading the IOATDMA module under xen4.1.2 and a
linux dom0 3.3

CONFIG_INTEL_IOATDMA=m
CONFIG_IGB=y

It was working with linux 3.1.5. The regression seems to be since
linux 3.2. I tried to do a `git bisect` but I'm facing other
regressions which make the debug harder.

Here is the call trace when loading the module in dom0:

dca service started, version 1.12.1
ioatdma: Intel(R) QuickData Technology Driver 4.00
ioatdma 0000:00:16.0: enabling device (0000 -> 0002)
xen: registering gsi 43 triggering 0 polarity 1
xen: --> pirq=43 -> irq=43 (gsi=43)
------------[ cut here ]------------
kernel BUG at /linux-3.3/drivers/dma/ioat/dma_v2.c:163!
invalid opcode: 0000 [#1] SMP
Modules linked in: ioatdma(+) dca ebt_ip6 ebt_dnat ebt_ip ebtable_broute ebt_snat ebtable_nat ebtable_filter ebtables bridge stp llc ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod button

Pid: 0, comm: swapper/0 Not tainted 3.3.0-dom0-6357-i386+ #24 Dell C6100 /0D61XP
EIP: 0061:[<f512c524>] EFLAGS: 00010246 CPU: 0
EIP is at __cleanup+0x154/0x160 [ioatdma]
EAX: 00000000 EBX: e9dd44c0 ECX: 769ed7af EDX: 00000002
ESI: e90fe48c EDI: 00000002 EBP: eb40bf9c ESP: eb40bf7c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process swapper/0 (pid: 0, ti=eb40a000 task=c1401ee0 task.ti=c13fc000)
Stack:
 eadc8040 00010002 18ae4040 00000002 0000bf9c e90fe48c e90fe4bc 00000006
 eb40bfb0 f512c770 18ae4040 00000000 e90fe4f8 eb40bfc0 c10347cb 00000001
 00000018 eb40bff8 c10350ab 00000000 00000000 00000000 00000000 00000000
Call Trace:
 [<f512c770>] ioat2_cleanup_event+0x30/0x50 [ioatdma]
 [<c10347cb>] tasklet_action+0x9b/0xb0
 [<c10350ab>] __do_softirq+0x7b/0x110
 [<c1035030>] ? irq_enter+0x70/0x70
 <IRQ>
 [<c1034e7e>] ? irq_exit+0x6e/0xa0
 [<c11bde70>] ? xen_evtchn_do_upcall+0x20/0x30
 [<c1322907>] ? xen_do_upcall+0x7/0xc
 [<c10013a7>] ? hypercall_page+0x3a7/0x1000
 [<c1006172>] ? xen_safe_halt+0x12/0x20
 [<c1010582>] ? default_idle+0x32/0x60
 [<c1008596>] ? cpu_idle+0x66/0xa0
 [<c130bd58>] ? rest_init+0x58/0x60
 [<c14237d2>] ? start_kernel+0x2e4/0x2ea
 [<c142331d>] ? kernel_init+0x11b/0x11b
 [<c14230ba>] ? i386_start_kernel+0xa9/0xb0
 [<c1426abb>] ? xen_start_kernel+0x5a2/0x5aa
Code: 00 e8 41 7a f0 cb 8b 15 40 1a 40 c1 8d 14 10 8d 46 3c e8 60 ea f0 cb 83 c4 14 5b 5e 5f c9 c3 31 d2 89 df 31 c0 eb a2 84 c0 75 b5 <0f> 0b eb fe 90 8d b4 26 00 00 00 00 55 89 e5 57 56 53 89 c3 83
EIP: [<f512c524>] __cleanup+0x154/0x160 [ioatdma] SS:ESP 0069:eb40bf7c
---[ end trace 902e93593e49fa50 ]---
Kernel panic - not syncing: Fatal exception in interrupt

Does anybody have any clue?

Regards,
--
William
* Re: regression ioatdma 3.3
From: Konrad Rzeszutek Wilk @ 2012-01-27 14:47 UTC
To: William Dauchy; +Cc: xen-devel

On Fri, Jan 27, 2012 at 02:31:55PM +0100, William Dauchy wrote:
> Hello,
>
> I have some troubles loading the IOATDMA module under xen4.1.2 and a
> linux dom0 3.3

So you are using the rc1 version? What exact git commit are you using?

>
> CONFIG_INTEL_IOATDMA=m
> CONFIG_IGB=y
>
> It was working with linux 3.1.5. The regression seems to be since
> linux 3.2. I tried to do a `git bisect` but I'm facing other

3.2 you say? This below is 3.3?

> regressions which make the debug harder.

Such as?

>
> Here is the call trace when loading the module in dom0:

Is the problem present with baremetal (same exact kernel?)

Do you see this if you run a 64-bit dom0?

>
> dca service started, version 1.12.1
> ioatdma: Intel(R) QuickData Technology Driver 4.00
> ioatdma 0000:00:16.0: enabling device (0000 -> 0002)
> xen: registering gsi 43 triggering 0 polarity 1
> xen: --> pirq=43 -> irq=43 (gsi=43)
> ------------[ cut here ]------------
> kernel BUG at /linux-3.3/drivers/dma/ioat/dma_v2.c:163!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: ioatdma(+) dca ebt_ip6 ebt_dnat ebt_ip
> ebtable_broute ebt_snat ebtable_nat ebtable_filter ebtables bridge stp
> llc ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod
> button
>
> Pid: 0, comm: swapper/0 Not tainted 3.3.0-dom0-6357-i386+ #24 Dell
> C6100 /0D61XP
> EIP: 0061:[<f512c524>] EFLAGS: 00010246 CPU: 0
> EIP is at __cleanup+0x154/0x160 [ioatdma]
> EAX: 00000000 EBX: e9dd44c0 ECX: 769ed7af EDX: 00000002
> ESI: e90fe48c EDI: 00000002 EBP: eb40bf9c ESP: eb40bf7c
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> Process swapper/0 (pid: 0, ti=eb40a000 task=c1401ee0 task.ti=c13fc000)
> Stack:
> eadc8040 00010002 18ae4040 00000002 0000bf9c e90fe48c e90fe4bc 00000006
> eb40bfb0 f512c770 18ae4040 00000000 e90fe4f8 eb40bfc0 c10347cb 00000001
> 00000018 eb40bff8 c10350ab 00000000 00000000 00000000 00000000 00000000
> Call Trace:
> [<f512c770>] ioat2_cleanup_event+0x30/0x50 [ioatdma]
> [<c10347cb>] tasklet_action+0x9b/0xb0
> [<c10350ab>] __do_softirq+0x7b/0x110
> [<c1035030>] ? irq_enter+0x70/0x70
> <IRQ>
> [<c1034e7e>] ? irq_exit+0x6e/0xa0
> [<c11bde70>] ? xen_evtchn_do_upcall+0x20/0x30
> [<c1322907>] ? xen_do_upcall+0x7/0xc
> [<c10013a7>] ? hypercall_page+0x3a7/0x1000
> [<c1006172>] ? xen_safe_halt+0x12/0x20
> [<c1010582>] ? default_idle+0x32/0x60
> [<c1008596>] ? cpu_idle+0x66/0xa0
> [<c130bd58>] ? rest_init+0x58/0x60
> [<c14237d2>] ? start_kernel+0x2e4/0x2ea
> [<c142331d>] ? kernel_init+0x11b/0x11b
> [<c14230ba>] ? i386_start_kernel+0xa9/0xb0
> [<c1426abb>] ? xen_start_kernel+0x5a2/0x5aa
> Code: 00 e8 41 7a f0 cb 8b 15 40 1a 40 c1 8d 14 10 8d 46 3c e8 60 ea
> f0 cb 83 c4 14 5b 5e 5f c9 c3 31 d2 89 df 31 c0 eb a2 84 c0 75 b5 <0f>
> 0b eb fe 90 8d b4 26 00 00 00 00 55 89 e5 57 56 53 89 c3 83
> EIP: [<f512c524>] __cleanup+0x154/0x160 [ioatdma] SS:ESP 0069:eb40bf7c
> ---[ end trace 902e93593e49fa50 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
>
>
> Does anybody have any clue?
>
> Regards,
> --
> William
* Re: regression ioatdma 3.3
From: William Dauchy @ 2012-01-27 15:02 UTC
To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On Fri, Jan 27, 2012 at 3:47 PM, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
> So you are using the rc1 version? What exact git commit are you using?

I pulled the last revision 74ea15d

> 3.2 you say? This below is 3.3?

Yes. I was using a 3.1 kernel. After an upgrade to 3.2 I got the problem
and thought it was good to report the problem with the last 3.3-rc
kernel.

> Is the problem present with baremetal (same exact kernel?)

I indeed tested with a baremetal kernel and didn't get any problem. So
it seems to come from a Xen problem.

> Do you see this if you run a 64-bit dom0?

I didn't test this.

--
William
* Re: regression ioatdma 3.3
From: Jonathan Nieder @ 2012-02-19 22:31 UTC
To: Konrad Rzeszutek Wilk; +Cc: Thomas Goirand, xen-devel, William Dauchy

forwarded 660554 http://thread.gmane.org/gmane.comp.emulators.xen.devel/121604
quit

(cc-ing Thomas, since he ran into the same bug)

Hi,

Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 27, 2012 at 02:31:55PM +0100, William Dauchy wrote:

>> I have some troubles loading the IOATDMA module under xen4.1.2 and a
>> linux dom0 3.3
>
> So you are using the rc1 version? What exact git commit are you using?

Broken:

    v3.2.6 + Debian patches (zigo)
    v3.3-rc2~22 (William)

Not broken:

    v3.1.8 + Debian patches, presumably (zigo)
    v3.1.5 (William)

[...]
>> Here is the call trace when loading the module in dom0:
>
> Is the problem present with baremetal (same exact kernel?)

No.

> Do you see this if you run a 64-bit dom0?

I'm guessing not, just based on the crazy coincidence that both
reports were with 32-bit kernels. But who knows. ;-)

[...]
>> kernel BUG at /linux-3.3/drivers/dma/ioat/dma_v2.c:163!
>> invalid opcode: 0000 [#1] SMP
>> Modules linked in: ioatdma(+) dca ebt_ip6 ebt_dnat ebt_ip
>> ebtable_broute ebt_snat ebtable_nat ebtable_filter ebtables bridge stp
>> llc ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod
>> button
>>
>> Pid: 0, comm: swapper/0 Not tainted 3.3.0-dom0-6357-i386+ #24 Dell C6100 /0D61XP

This is

    active = ioat2_ring_active(ioat);
    for (i = 0; i < active && !seen_current; i++) {
        ...
        if (tx->phys == phys_complete)
            seen_current = true;
    }
    ...
    BUG_ON(active && !seen_current); /* no active descs have written a completion? */

Any hints for tracking it down?

Thanks,
Jonathan
* Re: regression ioatdma 3.3
From: Jonathan Nieder @ 2012-02-20 18:16 UTC
To: Konrad Rzeszutek Wilk; +Cc: Thomas Goirand, xen-devel, William Dauchy

> Konrad Rzeszutek Wilk wrote:

>> Do you see this if you run a 64-bit dom0?

Looks like no. Thomas reports[1]:

> I just tried with the amd64 kernel and Xen, and I didn't see any issue.
>
> However, it is important that Xen 4.1 + Linux 3.2 works with a 32 bits
> kernel, because that is the most optimized configuration (eg: 64 bits
> hypervisor, 32 bits kernel and 32 bits userland).

Maybe Andres's patches are relevant.

Hope that helps,
Jonathan

[1] http://bugs.debian.org/660554#25
* Re: regression ioatdma 3.3
From: Thomas Goirand @ 2012-02-25 7:46 UTC
To: Jonathan Nieder; +Cc: Konrad Rzeszutek Wilk, xen-devel, William Dauchy

On 02/21/2012 02:16 AM, Jonathan Nieder wrote:
>> I just tried with the amd64 kernel and Xen, and I didn't see any issue.
>>
>> However, it is important that Xen 4.1 + Linux 3.2 works with a 32 bits
>> kernel, because that is the most optimized configuration (eg: 64 bits
>> hypervisor, 32 bits kernel and 32 bits userland).
>>
> Maybe Andres's patches are relevant.
>
> Hope that helps,
> Jonathan
>
> [1] http://bugs.debian.org/660554#25
>

Hi,

Which patch are you referring to?

Is there anything I can do to help testing/investigating this?
Should this be reported in the LKML?
How can I find who's the author of this driver?

Thomas Goirand (zigo)
* Re: regression ioatdma 3.3
From: William Dauchy @ 2012-02-25 21:13 UTC
To: Thomas Goirand; +Cc: Jonathan Nieder, Konrad Rzeszutek Wilk, xen-devel

Hi Thomas,

On Sat, Feb 25, 2012 at 8:46 AM, Thomas Goirand <thomas@goirand.fr> wrote:
> How can I find who's the author of this driver?

I don't think the problem is related to the driver itself, because it
is working without xen.
I'm also looking for hints to fix the problem.

--
William
* Re: ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
From: Jonathan Nieder @ 2012-03-02 5:57 UTC
To: Dan Williams; +Cc: Thomas Goirand, Konrad Rzeszutek Wilk, xen-devel, William Dauchy, Maciej Sosnowski, pkg-xen-devel, linux-kernel

Hi Dan,

Thomas and William (cc-ed) have been having trouble loading the
ioatdma driver on a 32-bit Xen dom0. The module loads automatically
at boot time and trips

    BUG_ON(active && !seen_current); /* no active descs have written a completion? */

from drivers/dma/ioat/dma_v2.c. That check has been present since
5cbafa65b92e (ioat2,3: convert to a true ring buffer, 2009-08-26).
The bug is probably in Xen code and seems to be a regression (the bug
is present in 3.2 but not 3.1.8).

Thomas Goirand wrote:
> On 03/01/2012 11:53 PM, Bastian Blank wrote:
>> On Thu, Mar 01, 2012 at 06:02:15PM +0800, Thomas Goirand wrote:

>>> Any clue why I don't see crashes without Xen, with a
>>> 64 bits kernel, or with earlier versions of Linux (eg: 3.1 for example)?
>>
>> xen/i386 uses a different memory model to anything else, this may be a
>> problem.
[...]
> Replacing BUG_ON by a WARN_ON, and adding #define DEBUG 1 on top of
> dma_v2.c, my kernel booted, and I had the attached dmesg output.
>
> Blacklisting the ioatdma kernel module of course, solved the issue.
>
> I hope that helps, please let me know if I should do more to help. If
> you need access to my server, that's possible (I use it only for
> packaging XCP and some tests...).

I don't expect you to debug this Xen-specific bug, but I'm wondering:
is there any reason this check has to be a BUG_ON instead of a
WARN_ON? If there is some way to recover when the impossible happens,
that would make using and debugging the kernel a little easier.

Curious,
Jonathan
* Re: ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
From: Dan Williams @ 2012-03-02 6:42 UTC
To: Jonathan Nieder; +Cc: xen-devel, pkg-xen-devel, Maciej Sosnowski, linux-kernel, William Dauchy, Konrad Rzeszutek Wilk, Thomas Goirand

[replying from phone]

WARN_ON may work, but then the kernel may be subject to random hangs
from missed i/o completions.

Is xen32 using vt-d? Just wondering if writes from the ioat device are
getting misdirected.

--
Dan

On Mar 1, 2012 9:57 PM, "Jonathan Nieder" <jrnieder@gmail.com> wrote:
> Hi Dan,
>
> Thomas and William (cc-ed) have been having trouble loading the
> ioatdma driver on a 32-bit Xen dom0. The module loads automatically
> at boot time and trips
>
>     BUG_ON(active && !seen_current); /* no active descs have written a completion? */
>
> from drivers/dma/ioat/dma_v2.c. That check has been present since
> 5cbafa65b92e (ioat2,3: convert to a true ring buffer, 2009-08-26).
> The bug is probably in Xen code and seems to be a regression (the bug
> is present in 3.2 but not 3.1.8).
>
> Thomas Goirand wrote:
> > On 03/01/2012 11:53 PM, Bastian Blank wrote:
> >> On Thu, Mar 01, 2012 at 06:02:15PM +0800, Thomas Goirand wrote:
>
> >>> Any clue why I don't see crashes without Xen, with a
> >>> 64 bits kernel, or with earlier versions of Linux (eg: 3.1 for example)?
> >>
> >> xen/i386 uses a different memory model to anything else, this may be a
> >> problem.
> [...]
> > Replacing BUG_ON by a WARN_ON, and adding #define DEBUG 1 on top of
> > dma_v2.c, my kernel booted, and I had the attached dmesg output.
> >
> > Blacklisting the ioatdma kernel module of course, solved the issue.
> >
> > I hope that helps, please let me know if I should do more to help. If
> > you need access to my server, that's possible (I use it only for
> > packaging XCP and some tests...).
>
> I don't expect you to debug this Xen-specific bug, but I'm wondering:
> is there any reason this check has to be a BUG_ON instead of a
> WARN_ON? If there is some way to recover when the impossible happens,
> that would make using and debugging the kernel a little easier.
>
> Curious,
> Jonathan
* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
From: Bastian Blank @ 2012-03-02 16:21 UTC
To: Dan Williams; +Cc: xen-devel, pkg-xen-devel, Maciej Sosnowski, linux-kernel, Jonathan Nieder, William Dauchy, Konrad Rzeszutek Wilk

On Thu, Mar 01, 2012 at 10:42:44PM -0800, Dan Williams wrote:
> WARN_ON may work, but then kernel may be subject to random hangs from missed
> i/o completions.

Why is that? Currently it just dies if it was triggered via interrupt
and for some reason no active descriptor was found.

> Is xen32 using vt-d?

Yes. Xen can use VT-D.

> Just wondering if writes from ioat
> device are getting misdirected.

How do VT-D and ioat interact?

Bastian

--
Too much of anything, even love, isn't necessarily a good thing.
		-- Kirk, "The Trouble with Tribbles", stardate 4525.6
* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
From: Dan Williams @ 2012-03-02 16:44 UTC
To: Dan Williams, Jonathan Nieder, xen-devel, pkg-xen-devel, Maciej Sosnowski, linux-kernel, William Dauchy, Konrad Rzeszutek Wilk, Dave Jiang

On Fri, Mar 2, 2012 at 8:21 AM, Bastian Blank <waldi@debian.org> wrote:
> On Thu, Mar 01, 2012 at 10:42:44PM -0800, Dan Williams wrote:
>> WARN_ON may work, but then kernel may be subject to random hangs from missed
>> i/o completions.

...actually descriptors completing too early.

>
> Why is that? Currently it just dies if it was triggered via interrupt and
> for some reason no active descriptor was found.

No, it's not the case that "no active descriptor was found". The
channel is walking through the submitted descriptor chain to catch up
with what was last posted to 'phys_complete'. It expects to stop when
seeing phys_complete, but if it never finds it the driver ends up
completing the entire pending ring.

The BUG_ON is there because the driver has just completed every
descriptor in the chain, and if the kernel was depending on proper
descriptor ordering it may have just violated it. So I take it back,
we can't go to WARN_ON, because the state of the system is compromised
and we need to bring it to a halt. That said, the code is likely
failing in the self test, so the system is probably fine, but if this
happened in the network or raid layer it is potentially fatal.

>> Is xen32 using vt-d?
>
> Yes. Xen can use VT-D.
>
>> Just wondering if writes from ioat
>> device are getting misdirected.
>
> How do VT-D and ioat interact?
>

Same as any other pci bus mastering device, via dma_map to get an
io-virtual address.
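[Editorial sketch] The ring walk Dan describes above can be modelled in
a few lines of user-space C. This is not the driver code: `struct desc`,
`walk_ring`, and the plain array are hypothetical stand-ins for the
descriptor ring, and only the loop shape and the final condition follow
the snippet Jonathan quoted earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a ring entry; only the bus address matters here. */
struct desc {
    uint64_t phys;   /* address the hardware would report on completion */
    bool completed;  /* set when the walk retires this descriptor */
};

/*
 * Walk `active` pending descriptors in order, retiring each one, and stop
 * once the descriptor whose address equals phys_complete has been retired.
 * Returns true when the BUG_ON condition (active && !seen_current) would
 * fire, i.e. the whole pending ring was retired without finding a match.
 */
static bool walk_ring(struct desc *ring, size_t active, uint64_t phys_complete)
{
    bool seen_current = false;

    assert(ring != NULL || active == 0);

    for (size_t i = 0; i < active && !seen_current; i++) {
        ring[i].completed = true;
        if (ring[i].phys == phys_complete)
            seen_current = true;
    }
    return active && !seen_current;
}
```

With a matching address the walk stops early and later descriptors stay
pending; with an address that matches nothing, every pending descriptor
is marked completed, which is the ordering hazard Dan gives as the
reason for halting.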
* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
From: Bastian Blank @ 2012-03-02 17:57 UTC
To: Dan Williams; +Cc: Jonathan Nieder, xen-devel, pkg-xen-devel, Maciej Sosnowski, linux-kernel, William Dauchy, Konrad Rzeszutek Wilk, Dave Jiang

On Fri, Mar 02, 2012 at 08:44:00AM -0800, Dan Williams wrote:
> On Fri, Mar 2, 2012 at 8:21 AM, Bastian Blank <waldi@debian.org> wrote:
> > On Thu, Mar 01, 2012 at 10:42:44PM -0800, Dan Williams wrote:
> >> WARN_ON may work, but then kernel may be subject to random hangs from missed
> >> i/o completions.
> ...actually descriptors completing too early.

The interrupt happens while the module is still loading, so most likely
directly after enabling them. There should be no request in flight yet.

What puzzles me is the mix of different data types in the ioatdma
driver:

| u64 completion = *chan->completion;
| unsigned long phys_complete = completion & ~0x3f;

The state is 64 bits long, but is down-converted to a 32-bit value
without any check.

phys_complete (a 32-bit value) gets compared to struct
dma_async_tx_descriptor.phys, which is defined as dma_addr_t, a _64_ bit
value.

Bastian

--
... The prejudices people feel about each other disappear when they get
to know each other.
		-- Kirk, "Elaan of Troyius", stardate 4372.5
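[Editorial sketch] The type mix Bastian points out can be reproduced
outside the kernel. The code below is a user-space model, not driver
code: `uint32_t` stands in for `unsigned long` on a 32-bit build,
`uint64_t` for a 64-bit `dma_addr_t`, and the function names and sample
addresses are made up for illustration. It only shows what would happen
if a completion address really did lie above 4 GiB, which the thread has
not confirmed at this point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The driver masks the completion word with ~0x3f, clearing the low six bits. */
#define COMPLETION_ADDR_MASK (~UINT64_C(0x3f))

/* Models `unsigned long phys_complete = completion & ~0x3f;` on a 32-bit build. */
static uint32_t phys_complete_narrow(uint64_t completion)
{
    return (uint32_t)(completion & COMPLETION_ADDR_MASK);
}

/* Same masking, but the result keeps all 64 bits. */
static uint64_t phys_complete_wide(uint64_t completion)
{
    return completion & COMPLETION_ADDR_MASK;
}

/* Comparison as the cleanup loop would see it with the narrow value. */
static bool matches_narrow(uint64_t desc_phys, uint64_t completion)
{
    return desc_phys == phys_complete_narrow(completion);
}

/* Comparison with the full-width value. */
static bool matches_wide(uint64_t desc_phys, uint64_t completion)
{
    return desc_phys == phys_complete_wide(completion);
}

/* Usage: one address below 4 GiB and one above, each with a low bit set. */
static void demo_truncation(void)
{
    const uint64_t low  = UINT64_C(0x23456780);
    const uint64_t high = UINT64_C(0x123456780);

    assert(matches_narrow(low, low | 1));    /* fits in 32 bits: found */
    assert(!matches_narrow(high, high | 1)); /* upper bits lost: never found */
    assert(matches_wide(high, high | 1));    /* full width: found */
}
```

Under that assumption the narrow comparison can never succeed for a
descriptor above 4 GiB, so a loop that waits for a match would run off
the end of the pending ring.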
* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
From: Dan Williams @ 2012-03-02 19:31 UTC
To: Dan Williams, Jonathan Nieder, xen-devel, pkg-xen-devel, Maciej Sosnowski, linux-kernel, William Dauchy, Konrad Rzeszutek Wilk, Dave Jiang

On Fri, Mar 2, 2012 at 9:57 AM, Bastian Blank <waldi@debian.org> wrote:
> On Fri, Mar 02, 2012 at 08:44:00AM -0800, Dan Williams wrote:
>> On Fri, Mar 2, 2012 at 8:21 AM, Bastian Blank <waldi@debian.org> wrote:
>> > On Thu, Mar 01, 2012 at 10:42:44PM -0800, Dan Williams wrote:
>> >> WARN_ON may work, but then kernel may be subject to random hangs from missed
>> >> i/o completions.
>> ...actually descriptors completing too early.
>
> The interrupt happens while the module is still loading, so most likely
> directly after enabling them. There should be no request in flight yet.
>
> What puzzles me is the mix of different data types in the ioatdma
> driver:
>
> | u64 completion = *chan->completion;
> | unsigned long phys_complete = completion & ~0x3f;
>
> The state is 64bit long, but is down converted to a 32bit value without
> anything.
>
> phys_complete (a 32 bit value) gets compared to struct
> dma_async_tx_descriptor.phys, which is defined as dma_addr_t, a _64_ bit
> value.

The assumption is that the driver's control structures are not in high
memory so all address values will only have 32-bits of valid data, but
maybe xen32 changes that assumption?

Can you send the log of the driver load with debug enabled?

diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c
index a4d6cb0..82472de 100644
--- a/drivers/dma/ioat/dma.c
+++ b/drivers/dma/ioat/dma.c
@@ -24,7 +24,7 @@
  * This driver supports an Intel I/OAT DMA engine, which does asynchronous
  * copy operations.
  */
-
+#define DEBUG
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/slab.h>
diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c
index 5d65f83..da337e7 100644
--- a/drivers/dma/ioat/dma_v2.c
+++ b/drivers/dma/ioat/dma_v2.c
@@ -24,7 +24,7 @@
  * This driver supports an Intel I/OAT DMA engine (versions >= 2), which
  * does asynchronous data movement and checksumming operations.
  */
-
+#define DEBUG
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/slab.h>
* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
From: Bastian Blank @ 2012-03-02 20:08 UTC
To: Dan Williams; +Cc: Jonathan Nieder, xen-devel, pkg-xen-devel, Maciej Sosnowski, linux-kernel, William Dauchy, Konrad Rzeszutek Wilk, Dave Jiang

On Fri, Mar 02, 2012 at 11:31:56AM -0800, Dan Williams wrote:
> On Fri, Mar 2, 2012 at 9:57 AM, Bastian Blank <waldi@debian.org> wrote:
> > phys_complete (a 32 bit value) gets compared to struct
> > dma_async_tx_descriptor.phys, which is defined as dma_addr_t, a _64_ bit
> > value.
> The assumption is that the driver's control structures are not in high
> memory so all address values will only have 32-bits of valid data,

Can you back that up by some kernel documentation? There is a reason why
pci_alloc_pool uses dma_addr_t to store the address and _not_ unsigned
long. These are physical addresses, nothing the kernel can access
directly without a mapping.

> but
> maybe xen32 changes that assumption?

Xen changes a lot of things in the memory management. This includes
that physical != machine addresses, where i915 failed horribly.

> Can you send the log of the driver load with debug enabled?

No, I don't have that hardware.

Bastian

--
Each kiss is as the first.
		-- Miramanee, Kirk's wife, "The Paradise Syndrome",
		   stardate 4842.6
* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
From: Dan Williams @ 2012-03-02 20:16 UTC
To: Dan Williams, Jonathan Nieder, xen-devel, pkg-xen-devel, Maciej Sosnowski, linux-kernel, William Dauchy, Konrad Rzeszutek Wilk, Dave Jiang

On Fri, Mar 2, 2012 at 12:08 PM, Bastian Blank <waldi@debian.org> wrote:
> On Fri, Mar 02, 2012 at 11:31:56AM -0800, Dan Williams wrote:
>> On Fri, Mar 2, 2012 at 9:57 AM, Bastian Blank <waldi@debian.org> wrote:
>> > phys_complete (a 32 bit value) gets compared to struct
>> > dma_async_tx_descriptor.phys, which is defined as dma_addr_t, a _64_ bit
>> > value.
>> The assumption is that the driver's control structures are not in high
>> memory so all address values will only have 32-bits of valid data,
>
> Can you back that up by some kernel documentation? There is a reason why
> pci_alloc_pool uses dma_addr_t to store the address and _not_ unsigned
> long. These are physical addresses, nothing the kernel can access
> directly without a mapping.

High memory can only be accessed with kmap(), so the assumption is
that dma_alloc never gives a buffer address above 32-bits on a 32-bit
build. Yes, if HIGHMEM64G is set dma_addr_t becomes 64-bit, but that
is only to access high memory mapped application buffers via dma_map.
I'm not aware of any documentation in this area.

I don't mind bumping up the size if xen32 is changing the above
assumptions, but I'd want confirmation that this is the failure
scenario.
* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
From: Bastian Blank @ 2012-03-02 20:56 UTC
To: Dan Williams; +Cc: Jonathan Nieder, xen-devel, pkg-xen-devel, Maciej Sosnowski, linux-kernel, William Dauchy, Konrad Rzeszutek Wilk, Dave Jiang

On Fri, Mar 02, 2012 at 12:16:47PM -0800, Dan Williams wrote:
> On Fri, Mar 2, 2012 at 12:08 PM, Bastian Blank <waldi@debian.org> wrote:
> > On Fri, Mar 02, 2012 at 11:31:56AM -0800, Dan Williams wrote:
> >> On Fri, Mar 2, 2012 at 9:57 AM, Bastian Blank <waldi@debian.org> wrote:
> >> > phys_complete (a 32 bit value) gets compared to struct
> >> > dma_async_tx_descriptor.phys, which is defined as dma_addr_t, a _64_ bit
> >> > value.
> >> The assumption is that the driver's control structures are not in high
> >> memory so all address values will only have 32-bits of valid data,
> > Can you back that up by some kernel documentation? There is a reason why
> > pci_alloc_pool uses dma_addr_t to store the address and _not_ unsigned
> > long. These are physical addresses, nothing the kernel can access
> > directly without a mapping.
> High memory can only be accessed with kmap(), so the assumption is
> that dma_alloc never gives a buffer address above 32-bits on a 32-bit
> build. Yes, if HIGHMEM64G is set dma_addr_t becomes 64-bit, but that
> is only to access high memory mapped application buffers via dma_map.

All memory needs to be mapped. Linux just has a default mapping of 1GiB
of the memory handy. However this is irrelevant for the physical DMA
addresses we talk about.

I assume these devices have a DMA mask of 2^64, so they can address
memory above the 4GiB. And the kernel will happily assign this memory if
necessary or useful.

> I'm not aware of any documentation in this area.

There is; the header files qualify as documentation.

> I don't mind bumping up the size if xen32 is changing the above
> assumptions, but I'd want confirmation that this is the failure
> scenario.

At least it looks pretty wrong to remove four bits from a given address
just for fun.

Bastian

--
Respect is a rational process
		-- McCoy, "The Galileo Seven", stardate 2822.3
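[Editorial sketch] Bastian's point about DMA masks can be illustrated
with a small user-space model. `DMA_BIT_MASK` below has the same shape
as the macro of that name in the kernel's `linux/dma-mapping.h`;
`addr_fits_mask`, `addr_fits_u32`, and the sample address are
hypothetical. Whether the ioat device really runs with a 64-bit mask in
this setup is Bastian's assumption, not something the thread verifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All ones for n == 64, otherwise the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: true when addr is reachable under the given mask. */
static bool addr_fits_mask(uint64_t addr, uint64_t mask)
{
    return (addr & ~mask) == 0;
}

/* Hypothetical helper: true when addr survives a round trip through 32 bits. */
static bool addr_fits_u32(uint64_t addr)
{
    return (uint64_t)(uint32_t)addr == addr;
}

/* Usage: an address above 4 GiB checked against both mask widths. */
static void demo_mask(void)
{
    const uint64_t above_4g = UINT64_C(0x123456780);

    assert(addr_fits_mask(above_4g, DMA_BIT_MASK(64)));  /* legal with a 64-bit mask */
    assert(!addr_fits_mask(above_4g, DMA_BIT_MASK(32))); /* not with a 32-bit mask */
    assert(!addr_fits_u32(above_4g));                    /* and not in a 32-bit variable */
}
```

The sketch only restates the disagreement: an address that is valid for
a device with a 64-bit mask need not fit in a 32-bit variable, so
whether truncation is harmless depends on where the allocator actually
places the driver's buffers.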
* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
From: Dan Williams @ 2012-03-02 21:17 UTC
To: Dan Williams, Jonathan Nieder, xen-devel, pkg-xen-devel, Maciej Sosnowski, linux-kernel, William Dauchy, Konrad Rzeszutek Wilk, Dave Jiang

On Fri, Mar 2, 2012 at 12:56 PM, Bastian Blank <waldi@debian.org> wrote:
> On Fri, Mar 02, 2012 at 12:16:47PM -0800, Dan Williams wrote:
>> On Fri, Mar 2, 2012 at 12:08 PM, Bastian Blank <waldi@debian.org> wrote:
>> > On Fri, Mar 02, 2012 at 11:31:56AM -0800, Dan Williams wrote:
>> >> On Fri, Mar 2, 2012 at 9:57 AM, Bastian Blank <waldi@debian.org> wrote:
>> >> > phys_complete (a 32 bit value) gets compared to struct
>> >> > dma_async_tx_descriptor.phys, which is defined as dma_addr_t, a _64_ bit
>> >> > value.
>> >> The assumption is that the driver's control structures are not in high
>> >> memory so all address values will only have 32-bits of valid data,
>> > Can you back that up by some kernel documentation? There is a reason why
>> > pci_alloc_pool uses dma_addr_t to store the address and _not_ unsigned
>> > long. These are physical addresses, nothing the kernel can access
>> > directly without a mapping.
>> High memory can only be accessed with kmap(), so the assumption is
>> that dma_alloc never gives a buffer address above 32-bits on a 32-bit
>> build. Yes, if HIGHMEM64G is set dma_addr_t becomes 64-bit, but that
>> is only to access high memory mapped application buffers via dma_map.
>
> All memory needs to be mapped. Linux just has a default mapping of 1GiB
> of the memory handy. However this is irrelevant for the physical DMA
> addresses we talk about.

I'm not sure you understand how himem works or we're talking past each other.
* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
From: Thomas Goirand @ 2012-03-05 15:26 UTC
To: Dan Williams; +Cc: Jonathan Nieder, xen-devel, pkg-xen-devel, Maciej Sosnowski, linux-kernel, William Dauchy, Konrad Rzeszutek Wilk, Dave Jiang

On 03/03/2012 03:31 AM, Dan Williams wrote:
> Can you send the log of the driver load with debug enabled?
>
> diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c
> index a4d6cb0..82472de 100644
> --- a/drivers/dma/ioat/dma.c
> +++ b/drivers/dma/ioat/dma.c
> @@ -24,7 +24,7 @@
>   * This driver supports an Intel I/OAT DMA engine, which does asynchronous
>   * copy operations.
>   */
> -
> +#define DEBUG
>  #include <linux/init.h>
>  #include <linux/module.h>
>  #include <linux/slab.h>
> diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c
> index 5d65f83..da337e7 100644
> --- a/drivers/dma/ioat/dma_v2.c
> +++ b/drivers/dma/ioat/dma_v2.c
> @@ -24,7 +24,7 @@
>   * This driver supports an Intel I/OAT DMA engine (versions >= 2), which
>   * does asynchronous data movement and checksumming operations.
>   */
> -
> +#define DEBUG
>  #include <linux/init.h>
>  #include <linux/module.h>
>  #include <linux/slab.h>

I will do my best to provide it ASAP. Should I compile with BUG_ON so
you see it crashing, as per the original code, or just with WARN_ON, so
you also see further things in dmesg?

Thomas
* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-05 15:26 ` Thomas Goirand
@ 2012-03-05 15:38 ` Dan Williams
  2012-03-06 9:20 ` Thomas Goirand
  0 siblings, 1 reply; 29+ messages in thread
From: Dan Williams @ 2012-03-05 15:38 UTC (permalink / raw)
  To: Thomas Goirand
  Cc: Jonathan Nieder, xen-devel, pkg-xen-devel, Maciej Sosnowski, linux-kernel, William Dauchy, Konrad Rzeszutek Wilk, Dave Jiang

On Mon, Mar 5, 2012 at 7:26 AM, Thomas Goirand <zigo@debian.org> wrote:
> I will do my best to provide it ASAP. Should I compile with BUG_ON so
> you see it crashing, as per the original code, or just with WARN_ON, so
> you also see further things in dmesg?

Yes, replacing with a WARN_ON might allow it to skid after the crash
and give a bit more information.

Thank you for grabbing this info.

--
Dan

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-05 15:38 ` Dan Williams
@ 2012-03-06 9:20 ` Thomas Goirand
  2012-03-06 10:33 ` Bastian Blank
  2012-03-06 14:14 ` Dan Williams
  0 siblings, 2 replies; 29+ messages in thread
From: Thomas Goirand @ 2012-03-06 9:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: xen-devel, Dave Jiang, pkg-xen-devel, Maciej Sosnowski, linux-kernel, Jonathan Nieder, William Dauchy, Konrad Rzeszutek Wilk

[-- Attachment #1: Type: text/plain, Size: 919 bytes --]

On 03/05/2012 11:38 PM, Dan Williams wrote:
> On Mon, Mar 5, 2012 at 7:26 AM, Thomas Goirand <zigo@debian.org> wrote:
>> I will do my best to provide it ASAP. Should I compile with BUG_ON so
>> you see it crashing, as per the original code, or just with WARN_ON, so
>> you also see further things in dmesg?
>
> Yes, replacing with a WARN_ON might allow it to skid after the crash
> and give a bit more information.
>
> Thank you for grabbing this info.
>
> --
> Dan

Hi Dan,

Please find attached the log that you asked me, with WARN_ON instead of
BUG_ON, and with the 2 #define DEBUG in dma.c and dma_v2.c.

Let me know if you want me to do more, or if you want to have access to
my server (in which case, provide me a public ssh key and sign your
email with PGP).

Thomas

P.S: I compressed the dmesg.txt because on debian lists if a message is
>= 40K, it requires administrator moderation, which I want to avoid.

[-- Attachment #2: dmesg.txt.gz --]
[-- Type: application/x-gzip, Size: 20336 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-06 9:20 ` Thomas Goirand
@ 2012-03-06 10:33 ` Bastian Blank
  2012-03-06 14:14 ` Dan Williams
  1 sibling, 0 replies; 29+ messages in thread
From: Bastian Blank @ 2012-03-06 10:33 UTC (permalink / raw)
  To: Thomas Goirand
  Cc: Dan Williams, xen-devel, Dave Jiang, pkg-xen-devel, Maciej Sosnowski, linux-kernel, Jonathan Nieder, William Dauchy, Konrad Rzeszutek Wilk

On Tue, Mar 06, 2012 at 05:20:54PM +0800, Thomas Goirand wrote:
> Please find attached the log that you asked me, with WARN_ON instead of
> BUG_ON, and with the 2 #define DEBUG in dma.c and dma_v2.c.

| ioatdma 0000:00:16.1: desc[1]: (0x27ea6d040->0x27ea6d080) cookie: 0 flags: 0x0 ctl: 0x0 (op: 0 int_en: 0 compl: 0)
| ioatdma 0000:00:16.1: desc[1]: (0x27ea6d040->0x27ea6d080) cookie: 0 flags: 0x31 ctl: 0x9 (op: 0 int_en: 1 compl: 1)

*counting* 9 hex digits, aka > 2^32. What did I say?

Bastian

--
The joys of love made her human and the agonies of love destroyed her.
		-- Spock, "Requiem for Methuselah", stardate 5842.8

^ permalink raw reply	[flat|nested] 29+ messages in thread
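The digit count can be checked mechanically. What follows is only a minimal userspace sketch, not driver code: the helper names hex_digits(), exceeds_32_bits() and check_logged_address() are made up for illustration, and the only input taken from the thread is the descriptor address 0x27ea6d040 printed in the log quoted in the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: number of hex digits needed to print addr. */
static unsigned int hex_digits(uint64_t addr)
{
	unsigned int n = 1;

	while (addr >>= 4)
		n++;
	return n;
}

/* Hypothetical helper: true if addr cannot be stored in 32 bits. */
static bool exceeds_32_bits(uint64_t addr)
{
	return addr > UINT32_MAX;
}

/* Self-check against the descriptor address from the quoted log. */
static void check_logged_address(void)
{
	assert(hex_digits(0x27ea6d040ULL) == 9);  /* nine hex digits */
	assert(exceeds_32_bits(0x27ea6d040ULL));  /* so it is above 4GB */
	assert(!exceeds_32_bits(0xffffffffULL));  /* largest 32-bit value */
}
```

Calling check_logged_address() from a small main() is enough to run the check; any address with more than eight hex digits fails to fit in a 32-bit variable.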
* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-06 9:20 ` Thomas Goirand
  2012-03-06 10:33 ` Bastian Blank
@ 2012-03-06 14:14 ` Dan Williams
  2012-03-06 14:39 ` Ian Campbell
  2012-03-11 22:06 ` Jonathan Nieder
  1 sibling, 2 replies; 29+ messages in thread
From: Dan Williams @ 2012-03-06 14:14 UTC (permalink / raw)
  To: Thomas Goirand
  Cc: xen-devel, Dave Jiang, pkg-xen-devel, Maciej Sosnowski, linux-kernel, Jonathan Nieder, William Dauchy, Konrad Rzeszutek Wilk

On Tue, Mar 6, 2012 at 1:20 AM, Thomas Goirand <zigo@debian.org> wrote:
> On 03/05/2012 11:38 PM, Dan Williams wrote:
>> On Mon, Mar 5, 2012 at 7:26 AM, Thomas Goirand <zigo@debian.org> wrote:
>>> I will do my best to provide it ASAP. Should I compile with BUG_ON so
>>> you see it crashing, as per the original code, or just with WARN_ON, so
>>> you also see further things in dmesg?
>>
>> Yes, replacing with a WARN_ON might allow it to skid after the crash
>> and give a bit more information.
>>
>> Thank you for grabbing this info.
>>
>> --
>> Dan
>
> Hi Dan,
>
> Please find attached the log that you asked me, with WARN_ON instead of
> BUG_ON, and with the 2 #define DEBUG in dma.c and dma_v2.c.
>

[ 9.276817] ioatdma 0000:00:16.4: desc[0]: (0x300cc7000->0x300cc7040) cookie: 0 flags: 0x2 ctl: 0x29 (op: 0 int_en: 1 compl: 1)
...
[ 9.276832] ioatdma 0000:00:16.4: ioat_get_current_completion: phys_complete: 0xcc7000

Thanks, this clearly shows that our descriptors are above 4GB and that
the driver truncates the completion word.

Is this new behavior for xen?

Before you had mentioned that non-xen 32-bit builds don't fail. Can
you send me the .config from those two cases (offlist if they are too
large)? I'm looking for what config option enables this so I can
quote it in the patch to increase the size of phys_complete.
Certainly this changes my assumptions of what address ranges
GFP_KERNEL memory will be located in.

--
Dan

^ permalink raw reply	[flat|nested] 29+ messages in thread
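The two log values in the message above are related by plain integer truncation. The following is a simplified userspace sketch under stated assumptions, not the driver's code: uint32_t stands in for unsigned long on the 32-bit build, the names truncate_completion(), completion_matches() and check_logged_truncation() are hypothetical, and any status bits the real completion word may carry are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: unsigned long is 32 bits wide on the failing build.
 * uint32_t models that width so the sketch behaves the same on any host. */
typedef uint32_t ulong32_sketch_t;

/* Storing a 64-bit completion address in a 32-bit variable keeps only
 * the low 32 bits. */
static ulong32_sketch_t truncate_completion(uint64_t completion)
{
	return (ulong32_sketch_t)completion;
}

/* Simplified stand-in for matching the stored completion against a
 * descriptor's full-width DMA address. */
static bool completion_matches(uint64_t desc_phys, uint64_t completion)
{
	return desc_phys == truncate_completion(completion);
}

/* Self-check using the addresses from the quoted log. */
static void check_logged_truncation(void)
{
	/* 0x300cc7000 loses its top digit and becomes 0xcc7000. */
	assert(truncate_completion(0x300cc7000ULL) == 0xcc7000u);
	/* The truncated value no longer matches the descriptor address. */
	assert(!completion_matches(0x300cc7000ULL, 0x300cc7000ULL));
	/* Below 4GB nothing is lost, so the comparison still works. */
	assert(completion_matches(0xcc7000ULL, 0xcc7000ULL));
}
```

With addresses below 4GB the truncation is invisible, which would be consistent with the same kernel working on bare metal and failing only where descriptor addresses land above 4GB.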
* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-06 14:14 ` Dan Williams
@ 2012-03-06 14:39 ` Ian Campbell
  2012-03-13 16:49 ` [Xen-devel] " Konrad Rzeszutek Wilk
  2012-03-11 22:06 ` Jonathan Nieder
  1 sibling, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2012-03-06 14:39 UTC (permalink / raw)
  To: Dan Williams
  Cc: xen-devel, Dave Jiang, pkg-xen-devel, Thomas Goirand, Maciej Sosnowski, linux-kernel, Jonathan Nieder, William Dauchy, Konrad Rzeszutek Wilk

On Tue, 2012-03-06 at 06:14 -0800, Dan Williams wrote:
> [ 9.276817] ioatdma 0000:00:16.4: desc[0]:
> (0x300cc7000->0x300cc7040) cookie: 0 flags: 0x2 ctl: 0x29 (op: 0
> int_en: 1 compl: 1)
> ...
> [ 9.276832] ioatdma 0000:00:16.4: ioat_get_current_completion:
> phys_complete: 0xcc7000
>
> Thanks, this clearly shows that our descriptors are above 4GB and that
> the driver truncates the completion word.
>
> Is this new behavior for xen?

Xen makes a distinction between physical addresses and DMA addresses and
the latter can potentially be anywhere in the machine's real address
space while the former is what GFP_KERNEL etc controls.

You are using pci_pool_alloc which is the correct API to use for these
things since its purpose is to handle cases where PHYS != DMA addr by
exposing the DMA address to the caller. As part of that you should also
be using dma_addr_t for DMA addresses since that is the type which is
defined to handle the appropriate DMA address size on the platform.

I think this DMA!=PHYS can also be true of some non-x86 architectures
without Xen too but I guess ioat is quite x86 specific? In any case it
is wrong, or at least non-portable, to use unsigned long for these
addresses even though it happens on x86 that physaddr == dma addr
(usually).

Ian.

--
Ian Campbell

Start every day off with a smile and get it over with.
		-- W. C. Fields

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [Xen-devel] [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-06 14:39 ` Ian Campbell
@ 2012-03-13 16:49 ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-03-13 16:49 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Dan Williams, xen-devel, Dave Jiang, pkg-xen-devel, Thomas Goirand, Maciej Sosnowski, linux-kernel, Jonathan Nieder, William Dauchy, Konrad Rzeszutek Wilk

On Tue, Mar 06, 2012 at 06:39:12AM -0800, Ian Campbell wrote:
> On Tue, 2012-03-06 at 06:14 -0800, Dan Williams wrote:
> > [ 9.276817] ioatdma 0000:00:16.4: desc[0]:
> > (0x300cc7000->0x300cc7040) cookie: 0 flags: 0x2 ctl: 0x29 (op: 0
> > int_en: 1 compl: 1)
> > ...
> > [ 9.276832] ioatdma 0000:00:16.4: ioat_get_current_completion:
> > phys_complete: 0xcc7000
> >
> > Thanks, this clearly shows that our descriptors are above 4GB and that
> > the driver truncates the completion word.
> >
> > Is this new behavior for xen?
>
> Xen makes a distinction between physical addresses and DMA addresses and
> the latter can potentially be anywhere in the machine's real address
> space while the former is what GFP_KERNEL etc controls.
>
> You are using pci_pool_alloc which is the correct API to use for these
> things since its purpose is to handle cases where PHYS != DMA addr by
> exposing the DMA address to the caller. As part of that you should also
> be using dma_addr_t for DMA addresses since that is the type which is
> defined to handle the appropriate DMA address size on the platform.
>
> I think this DMA!=PHYS can also be true of some non-x86 architectures

Especially SPARC.

> without Xen too but I guess ioat is quite x86 specific? In any case it
> is wrong, or at least non-portable, to use unsigned long for these
> addresses even though it happens on x86 that physaddr == dma addr
> (usually).

I think with the Intel VT-d that can be different. The bus addresses
returned do seem different.

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-06 14:14 ` Dan Williams
  2012-03-06 14:39 ` Ian Campbell
@ 2012-03-11 22:06 ` Jonathan Nieder
  2012-03-23 23:55 ` Dan Williams
  1 sibling, 1 reply; 29+ messages in thread
From: Jonathan Nieder @ 2012-03-11 22:06 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Goirand, xen-devel, Dave Jiang, pkg-xen-devel, Maciej Sosnowski, linux-kernel, William Dauchy, Konrad Rzeszutek Wilk

Hi Dan,

Dan Williams wrote:

> Before you had mentioned that non-xen 32-bit builds don't fail. Can
> you send me the .config from those two cases (offlist if they are too
> large)?

The failing and non-failing kernels are identical. It is the
environment in which they are run that is different.

Running the kernel on bare metal works fine, while booting as a dom0
from the xen hypervisor triggers the assertion failure.[1]

.config: [2]

Hope that helps,
Jonathan

[1] http://thread.gmane.org/gmane.comp.emulators.xen.devel/121604/focus=121615
[2] http://alioth.debian.org/~jrnieder-guest/temp/config-3.2.0-2-686-pae

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-11 22:06 ` Jonathan Nieder
@ 2012-03-23 23:55 ` Dan Williams
  2012-03-24 1:29 ` William Dauchy
  2012-03-24 2:25 ` William Dauchy
  0 siblings, 2 replies; 29+ messages in thread
From: Dan Williams @ 2012-03-23 23:55 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Thomas Goirand, xen-devel, Dave Jiang, pkg-xen-devel, Maciej Sosnowski, linux-kernel, William Dauchy, Konrad Rzeszutek Wilk

Subject: ioat: fix size of 'completion' for Xen
From: Dan Williams <dan.j.williams@intel.com>

Starting with v3.2 Jonathan reports that Xen crashes loading the ioatdma
driver. A debug run shows:

  ioatdma 0000:00:16.4: desc[0]: (0x300cc7000->0x300cc7040) cookie: 0 flags: 0x2 ctl: 0x29 (op: 0 int_en: 1 compl: 1)
  ...
  ioatdma 0000:00:16.4: ioat_get_current_completion: phys_complete: 0xcc7000

...which shows that in this environment GFP_KERNEL memory may be backed
by a 64-bit dma address. This breaks the driver's assumption that an
unsigned long should be able to contain the physical address for
descriptor memory. Switch to dma_addr_t which beyond being the right
size, is the true type for the data i.e. an io-virtual address
indicating the engine's last processed descriptor.

[stable: 3.2+]
Cc: <stable@vger.kernel.org>
Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
On Sun, 2012-03-11 at 17:06 -0500, Jonathan Nieder wrote:
Hi Dan,
>
> Dan Williams wrote:
>
> > Before you had mentioned that non-xen 32-bit builds don't fail. Can
> > you send me the .config from those two cases (offlist if they are too
> > large)?
>
> The failing and non-failing kernels are identical. It is the
> environment in which they are run that is different.
>
> Running the kernel on bare metal works fine, while booting as a dom0
> from the xen hypervisor triggers the assertion failure.[1]
>
> .config: [2]
>
> Hope that helps,
> Jonathan

Thanks for the debug help, does this patch fix the issue for you?

 drivers/dma/ioat/dma.c    |   16 ++++++++--------
 drivers/dma/ioat/dma.h    |    6 +++---
 drivers/dma/ioat/dma_v2.c |    8 ++++----
 drivers/dma/ioat/dma_v3.c |    8 ++++----
 4 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c
index a4d6cb0..6595180 100644
--- a/drivers/dma/ioat/dma.c
+++ b/drivers/dma/ioat/dma.c
@@ -548,9 +548,9 @@ void ioat_dma_unmap(struct ioat_chan_common *chan, enum dma_ctrl_flags flags,
 			   PCI_DMA_TODEVICE, flags, 0);
 }
 
-unsigned long ioat_get_current_completion(struct ioat_chan_common *chan)
+dma_addr_t ioat_get_current_completion(struct ioat_chan_common *chan)
 {
-	unsigned long phys_complete;
+	dma_addr_t phys_complete;
 	u64 completion;
 
 	completion = *chan->completion;
@@ -571,7 +571,7 @@ unsigned long ioat_get_current_completion(struct ioat_chan_common *chan)
 }
 
 bool ioat_cleanup_preamble(struct ioat_chan_common *chan,
-			   unsigned long *phys_complete)
+			   dma_addr_t *phys_complete)
 {
 	*phys_complete = ioat_get_current_completion(chan);
 	if (*phys_complete == chan->last_completion)
@@ -582,14 +582,14 @@ bool ioat_cleanup_preamble(struct ioat_chan_common *chan,
 	return true;
 }
 
-static void __cleanup(struct ioat_dma_chan *ioat, unsigned long phys_complete)
+static void __cleanup(struct ioat_dma_chan *ioat, dma_addr_t phys_complete)
 {
 	struct ioat_chan_common *chan = &ioat->base;
 	struct list_head *_desc, *n;
 	struct dma_async_tx_descriptor *tx;
 
-	dev_dbg(to_dev(chan), "%s: phys_complete: %lx\n",
-		 __func__, phys_complete);
+	dev_dbg(to_dev(chan), "%s: phys_complete: %llx\n",
+		 __func__, (unsigned long long) phys_complete);
 	list_for_each_safe(_desc, n, &ioat->used_desc) {
 		struct ioat_desc_sw *desc;
 
@@ -655,7 +655,7 @@ static void __cleanup(struct ioat_dma_chan *ioat, unsigned long phys_complete)
 static void ioat1_cleanup(struct ioat_dma_chan *ioat)
 {
 	struct ioat_chan_common *chan = &ioat->base;
-	unsigned long phys_complete;
+	dma_addr_t phys_complete;
 
 	prefetch(chan->completion);
 
@@ -701,7 +701,7 @@ static void ioat1_timer_event(unsigned long data)
 		mod_timer(&chan->timer, jiffies + COMPLETION_TIMEOUT);
 		spin_unlock_bh(&ioat->desc_lock);
 	} else if (test_bit(IOAT_COMPLETION_PENDING, &chan->state)) {
-		unsigned long phys_complete;
+		dma_addr_t phys_complete;
 
 		spin_lock_bh(&ioat->desc_lock);
 		/* if we haven't made progress and we have already
diff --git a/drivers/dma/ioat/dma.h b/drivers/dma/ioat/dma.h
index 5216c8a..8bebddd 100644
--- a/drivers/dma/ioat/dma.h
+++ b/drivers/dma/ioat/dma.h
@@ -88,7 +88,7 @@ struct ioatdma_device {
 struct ioat_chan_common {
 	struct dma_chan common;
 	void __iomem *reg_base;
-	unsigned long last_completion;
+	dma_addr_t last_completion;
 	spinlock_t cleanup_lock;
 	dma_cookie_t completed_cookie;
 	unsigned long state;
@@ -333,7 +333,7 @@ int __devinit ioat_dma_self_test(struct ioatdma_device *device);
 void __devexit ioat_dma_remove(struct ioatdma_device *device);
 struct dca_provider * __devinit ioat_dca_init(struct pci_dev *pdev,
 					      void __iomem *iobase);
-unsigned long ioat_get_current_completion(struct ioat_chan_common *chan);
+dma_addr_t ioat_get_current_completion(struct ioat_chan_common *chan);
 void ioat_init_channel(struct ioatdma_device *device,
 		       struct ioat_chan_common *chan, int idx);
 enum dma_status ioat_dma_tx_status(struct dma_chan *c, dma_cookie_t cookie,
@@ -341,7 +341,7 @@ enum dma_status ioat_dma_tx_status(struct dma_chan *c, dma_cookie_t cookie,
 void ioat_dma_unmap(struct ioat_chan_common *chan, enum dma_ctrl_flags flags,
 		    size_t len, struct ioat_dma_descriptor *hw);
 bool ioat_cleanup_preamble(struct ioat_chan_common *chan,
-			   unsigned long *phys_complete);
+			   dma_addr_t *phys_complete);
 void ioat_kobject_add(struct ioatdma_device *device, struct kobj_type *type);
 void ioat_kobject_del(struct ioatdma_device *device);
 extern const struct sysfs_ops ioat_sysfs_ops;
diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c
index 5d65f83..cb8864d 100644
--- a/drivers/dma/ioat/dma_v2.c
+++ b/drivers/dma/ioat/dma_v2.c
@@ -126,7 +126,7 @@ static void ioat2_start_null_desc(struct ioat2_dma_chan *ioat)
 	spin_unlock_bh(&ioat->prep_lock);
 }
 
-static void __cleanup(struct ioat2_dma_chan *ioat, unsigned long phys_complete)
+static void __cleanup(struct ioat2_dma_chan *ioat, dma_addr_t phys_complete)
 {
 	struct ioat_chan_common *chan = &ioat->base;
 	struct dma_async_tx_descriptor *tx;
@@ -178,7 +178,7 @@ static void __cleanup(struct ioat2_dma_chan *ioat, unsigned long phys_complete)
 static void ioat2_cleanup(struct ioat2_dma_chan *ioat)
 {
 	struct ioat_chan_common *chan = &ioat->base;
-	unsigned long phys_complete;
+	dma_addr_t phys_complete;
 
 	spin_lock_bh(&chan->cleanup_lock);
 	if (ioat_cleanup_preamble(chan, &phys_complete))
@@ -259,7 +259,7 @@ int ioat2_reset_sync(struct ioat_chan_common *chan, unsigned long tmo)
 static void ioat2_restart_channel(struct ioat2_dma_chan *ioat)
 {
 	struct ioat_chan_common *chan = &ioat->base;
-	unsigned long phys_complete;
+	dma_addr_t phys_complete;
 
 	ioat2_quiesce(chan, 0);
 	if (ioat_cleanup_preamble(chan, &phys_complete))
@@ -274,7 +274,7 @@ void ioat2_timer_event(unsigned long data)
 	struct ioat_chan_common *chan = &ioat->base;
 
 	if (test_bit(IOAT_COMPLETION_PENDING, &chan->state)) {
-		unsigned long phys_complete;
+		dma_addr_t phys_complete;
 		u64 status;
 
 		status = ioat_chansts(chan);
diff --git a/drivers/dma/ioat/dma_v3.c b/drivers/dma/ioat/dma_v3.c
index f519c93..2dbf32b 100644
--- a/drivers/dma/ioat/dma_v3.c
+++ b/drivers/dma/ioat/dma_v3.c
@@ -256,7 +256,7 @@ static bool desc_has_ext(struct ioat_ring_ent *desc)
  * The difference from the dma_v2.c __cleanup() is that this routine
  * handles extended descriptors and dma-unmapping raid operations.
  */
-static void __cleanup(struct ioat2_dma_chan *ioat, unsigned long phys_complete)
+static void __cleanup(struct ioat2_dma_chan *ioat, dma_addr_t phys_complete)
 {
 	struct ioat_chan_common *chan = &ioat->base;
 	struct ioat_ring_ent *desc;
@@ -314,7 +314,7 @@ static void __cleanup(struct ioat2_dma_chan *ioat, unsigned long phys_complete)
 static void ioat3_cleanup(struct ioat2_dma_chan *ioat)
 {
 	struct ioat_chan_common *chan = &ioat->base;
-	unsigned long phys_complete;
+	dma_addr_t phys_complete;
 
 	spin_lock_bh(&chan->cleanup_lock);
 	if (ioat_cleanup_preamble(chan, &phys_complete))
@@ -333,7 +333,7 @@ static void ioat3_cleanup_event(unsigned long data)
 static void ioat3_restart_channel(struct ioat2_dma_chan *ioat)
 {
 	struct ioat_chan_common *chan = &ioat->base;
-	unsigned long phys_complete;
+	dma_addr_t phys_complete;
 
 	ioat2_quiesce(chan, 0);
 	if (ioat_cleanup_preamble(chan, &phys_complete))
@@ -348,7 +348,7 @@ static void ioat3_timer_event(unsigned long data)
 	struct ioat_chan_common *chan = &ioat->base;
 
 	if (test_bit(IOAT_COMPLETION_PENDING, &chan->state)) {
-		unsigned long phys_complete;
+		dma_addr_t phys_complete;
 		u64 status;
 
 		status = ioat_chansts(chan);

^ permalink raw reply related	[flat|nested] 29+ messages in thread
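One detail of the patch above is the debug print: the format changes from %lx to %llx and the value is cast to unsigned long long, so the same statement is correct whether the address type is 32 or 64 bits wide. A minimal userspace sketch of that pattern follows; it is not kernel code, dma_addr_sketch_t is an assumed 64-bit stand-in for dma_addr_t, and format_phys_complete() and check_formatting() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed stand-in for the kernel's dma_addr_t. It is fixed at 64 bits
 * here; in the kernel the width depends on the configuration. */
typedef uint64_t dma_addr_sketch_t;

/* Print the address with %llx after casting to unsigned long long, the
 * same pattern as the patched dev_dbg() call. Returns snprintf()'s
 * result, i.e. the number of characters written when buf is large enough. */
static int format_phys_complete(char *buf, size_t len, dma_addr_sketch_t addr)
{
	return snprintf(buf, len, "phys_complete: %llx",
			(unsigned long long)addr);
}

/* Self-check with the descriptor address from the quoted log. */
static void check_formatting(void)
{
	char buf[64];
	int n = format_phys_complete(buf, sizeof(buf), 0x300cc7000ULL);

	assert(n == 24);         /* 15-character prefix plus 9 hex digits */
	assert(buf[15] == '3');  /* the top digit is preserved */
	assert(buf[24] == '\0');
}
```

The cast matters because a printf-style format must match the promoted argument type exactly; passing a 64-bit value to %lx on a 32-bit build would be undefined behavior rather than a clean truncation.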
* Re: ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-23 23:55 ` Dan Williams
@ 2012-03-24 1:29 ` William Dauchy
  2012-03-24 2:25 ` William Dauchy
  1 sibling, 0 replies; 29+ messages in thread
From: William Dauchy @ 2012-03-24 1:29 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jonathan Nieder, Thomas Goirand, xen-devel, Dave Jiang, pkg-xen-devel, Maciej Sosnowski, linux-kernel, Konrad Rzeszutek Wilk

On Sat, Mar 24, 2012 at 12:55 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> Starting with v3.2 Jonathan reports that Xen crashes loading the ioatdma
> driver. A debug run shows:

Please note that I reported the crash a bit earlier
http://lists.xen.org/archives/html/xen-devel/2012-01/msg02408.html

I will test this patch as soon as possible. Thanks for your work.

Reported-by: William Dauchy <wdauchy@gmail.com>

Regards,
--
William

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-23 23:55 ` Dan Williams
  2012-03-24 1:29 ` William Dauchy
@ 2012-03-24 2:25 ` William Dauchy
  2012-03-24 3:34 ` Williams, Dan J
  1 sibling, 1 reply; 29+ messages in thread
From: William Dauchy @ 2012-03-24 2:25 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jonathan Nieder, Thomas Goirand, xen-devel, Dave Jiang, pkg-xen-devel, Maciej Sosnowski, linux-kernel, Konrad Rzeszutek Wilk

Hi Dan,

On Sat, Mar 24, 2012 at 12:55 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> Thanks for the debug help, does this patch fix the issue for you?

I successfully tested your patch and it works fine. Thanks again for your work.

Reported-by: William Dauchy <wdauchy@gmail.com>
Tested-by: William Dauchy <wdauchy@gmail.com>

Best regards,
--
William

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-24 2:25 ` William Dauchy
@ 2012-03-24 3:34 ` Williams, Dan J
  0 siblings, 0 replies; 29+ messages in thread
From: Williams, Dan J @ 2012-03-24 3:34 UTC (permalink / raw)
  To: William Dauchy
  Cc: Jonathan Nieder, Thomas Goirand, xen-devel, Dave Jiang, pkg-xen-devel, Maciej Sosnowski, linux-kernel, Konrad Rzeszutek Wilk

On Fri, Mar 23, 2012 at 7:25 PM, William Dauchy <wdauchy@gmail.com> wrote:
> Hi Dan,
>
> On Sat, Mar 24, 2012 at 12:55 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> Thanks for the debug help, does this patch fix the issue for you?
>
> I successfully tested your patch and it works fine. Thanks again for your work.
>
> Reported-by: William Dauchy <wdauchy@gmail.com>
> Tested-by: William Dauchy <wdauchy@gmail.com>

Great, thanks for the test.

--
Dan

^ permalink raw reply	[flat|nested] 29+ messages in thread