* Dom0 crash with apache bench (ab)
@ 2015-07-28 13:09 Christoffer Dall
  2015-07-28 14:50 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 13+ messages in thread
From: Christoffer Dall @ 2015-07-28 13:09 UTC
To: xen-devel

Hi,

I've been doing some performance comparisons lately, and wanted to compare
the performance overhead of using Xen with apache bench, but unfortunately
the Dom0 kernel crashes when hitting it with ab from a remote machine.
Most other workloads seem to be stable; however, I do see similar crashes
if hitting Dom0 mysql with a mysql benchmark with a high level of
parallelism.

I use a 10G Mellanox MX354A Dual port FDR CX3 adapter for networking on a
Dell PowerEdge R320 system with a Xeon E5-2450 and 16 GB of RAM.

Interestingly, we had a similar-looking issue on arm64 recently, but that
was fixed with an APM-specific fix to the hypervisor, so I am guessing this
is unrelated, see:
http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg02731.html
and the fix:
http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=50dcb3de603927db2fd87ba09e29c817415aaa44

I have tried with several Linux versions, v3.13, v3.18, v4.0-rc4, and v4.1,
same issue. I have tried with Xen 4.5-0 release, and the Ubuntu packaged
Xen 4.4 release, same issue.

Examples of crash:
http://pastebin.ubuntu.com/11953498/
http://pastebin.ubuntu.com/11953443/

Running DomU with a bridge and running ab against apache running in a DomU
also causes the system to crash.

Note: The server also has an embedded 1G Broadcom NIC (although not
suitable for testing due to it being 1G and on a control network), and
using that for the test does not cause a system crash, so this points to
some difficulties with the Mellanox device and Xen.

Any ideas or advice are greatly appreciated, thanks.

-Christoffer

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: Dom0 crash with apache bench (ab)
  2015-07-28 13:09 Dom0 crash with apache bench (ab) Christoffer Dall
@ 2015-07-28 14:50 ` Konrad Rzeszutek Wilk
  2015-07-28 14:55 ` Ian Campbell
  0 siblings, 1 reply; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-07-28 14:50 UTC
To: Christoffer Dall; +Cc: xen-devel

On Tue, Jul 28, 2015 at 03:09:31PM +0200, Christoffer Dall wrote:
> [...]
> I have tried with several Linux versions, v3.13, v3.18, v4.0-rc4, and v4.1,
> same issue. I have tried with Xen 4.5-0 release, and the Ubuntu packaged
> Xen 4.4 release, same issue.
>
> Examples of crash:
> http://pastebin.ubuntu.com/11953498/
> http://pastebin.ubuntu.com/11953443/

4.0-rc4?

Have you tried 4.1?
* Re: Dom0 crash with apache bench (ab)
  2015-07-28 14:50 ` Konrad Rzeszutek Wilk
@ 2015-07-28 14:55 ` Ian Campbell
  2015-07-28 15:00 ` Christoffer Dall
  0 siblings, 1 reply; 13+ messages in thread
From: Ian Campbell @ 2015-07-28 14:55 UTC
To: Konrad Rzeszutek Wilk, Christoffer Dall; +Cc: xen-devel

On Tue, 2015-07-28 at 10:50 -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 28, 2015 at 03:09:31PM +0200, Christoffer Dall wrote:
> > [...]
> > I have tried with several Linux versions, v3.13, v3.18, v4.0-rc4, and
> > v4.1, same issue. I have tried with Xen 4.5-0 release, and the Ubuntu
> > packaged Xen 4.4 release, same issue.
> >
> > Examples of crash:
> > http://pastebin.ubuntu.com/11953498/
> > http://pastebin.ubuntu.com/11953443/
>
> 4.0-rc4?
>
> Have you tried 4.1?

According to the previous paragraph, yes he has.
* Re: Dom0 crash with apache bench (ab)
  2015-07-28 14:55 ` Ian Campbell
@ 2015-07-28 15:00 ` Christoffer Dall
  2015-07-31 10:24 ` Stefano Stabellini
  0 siblings, 1 reply; 13+ messages in thread
From: Christoffer Dall @ 2015-07-28 15:00 UTC
To: Ian Campbell; +Cc: xen-devel

On Tue, Jul 28, 2015 at 4:55 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Tue, 2015-07-28 at 10:50 -0400, Konrad Rzeszutek Wilk wrote:
> > [...]
> > 4.0-rc4?
> >
> > Have you tried 4.1?
>
> According to the previous paragraph, yes he has.

Yes, I have. Just to clarify, I used 4.0-rc4 because that's a branch which
contained arm64 PCI support and has been used for other measurements, so
this was simply my 'working tree'.

Thanks,
-Christoffer
* Re: Dom0 crash with apache bench (ab)
  2015-07-28 15:00 ` Christoffer Dall
@ 2015-07-31 10:24 ` Stefano Stabellini
  2015-07-31 10:28 ` David Vrabel
  0 siblings, 1 reply; 13+ messages in thread
From: Stefano Stabellini @ 2015-07-31 10:24 UTC
To: Christoffer Dall; +Cc: David Vrabel, Wei Liu, Ian Campbell, xen-devel

This is a Linux Dom0 crash on x86 (Dell PowerEdge R320, Xeon E5-2450),
CC'ing relevant people. As you can see from the links below the crash
is:

[ 253.619326] Call Trace:
[ 253.619330] <IRQ>
[ 253.619332] [<ffffffff815d7c25>] ? skb_copy_ubufs+0xa5/0x230
[ 253.619347] [<ffffffff815e8525>] __netif_receive_skb_core+0x6f5/0x940
[ 253.619353] [<ffffffff815e8788>] __netif_receive_skb+0x18/0x60
[ 253.619360] [<ffffffff815e87f8>] netif_receive_skb_internal+0x28/0x90
[ 253.619366] [<ffffffff815e91f5>] napi_gro_frags+0x125/0x1a0
[ 253.619378] [<ffffffffa01b1173>] mlx4_en_process_rx_cq+0x753/0xb50 [mlx4_en]
[ 253.619387] [<ffffffffa01b1657>] mlx4_en_poll_rx_cq+0x97/0x160 [mlx4_en]
[ 253.619393] [<ffffffff815e8bcd>] net_rx_action+0x13d/0x2f0
[ 253.619400] [<ffffffff8109fdea>] __do_softirq+0xda/0x1f0
[ 253.619406] [<ffffffff810a013d>] irq_exit+0x9d/0xb0
[ 253.619412] [<ffffffff813e3825>] xen_evtchn_do_upcall+0x35/0x50
[ 253.619420] [<ffffffff816c7bce>] xen_do_hypervisor_callback+0x1e/0x40
[ 253.619423] <EOI>
[ 253.619426] [<ffffffff811a7870>] ? shrink_dcache_for_umount+0x90/0x90
[ 253.619437] [<ffffffff811a7ad9>] ? d_alloc_pseudo+0x9/0x10
[ 253.619443] [<ffffffff815cbbed>] ? sock_alloc_file+0x4d/0x120
[ 253.619448] [<ffffffff815cdf78>] ? SYSC_accept4+0xb8/0x200
[ 253.619454] [<ffffffff811d0377>] ? SyS_epoll_wait+0x87/0xe0
[ 253.619459] [<ffffffff815cf5c9>] ? SyS_accept4+0x9/0x10
[ 253.619465] [<ffffffff816c630d>] ? system_call_fastpath+0x16/0x1b
[ 253.619469] Code: 4e 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 e8 6b fc ff ff eb e1 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c
[ 253.619513] RIP [<ffffffff81318b0d>] __memcpy+0xd/0x110
[ 253.619520] RSP <ffff88006b823c60>
[ 253.619524] ---[ end trace ba5d35a466b03856 ]---

On Tue, 28 Jul 2015, Christoffer Dall wrote:
> On Tue, Jul 28, 2015 at 4:55 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Tue, 2015-07-28 at 10:50 -0400, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jul 28, 2015 at 03:09:31PM +0200, Christoffer Dall wrote:
> > > [...]
> > > Examples of crash:
> > > http://pastebin.ubuntu.com/11953498/
> > > http://pastebin.ubuntu.com/11953443/
> >
> > 4.0-rc4?
> >
> > Have you tried 4.1?
>
> According to the previous paragraph, yes he has.
>
> Yes, I have. Just to clarify, I used 4.0-rc4 because that's a branch which contained arm64 PCI support and has
> been used for other measurements, so this was simply my 'working tree'.
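[Editorial note] The call trace above can be summarized mechanically. The following is only a rough sketch, not a tool anyone in this thread used: it parses frames of the form `[<addr>] symbol+0xoff/0xsize [module]` and reports which ones belong to a loadable module. The regular expression and the helper name `parse_frames` are assumptions made for illustration; the sample lines are copied from the trace in this message.

```python
import re

# One frame of a Linux call trace, e.g.
#   [<ffffffffa01b1173>] mlx4_en_process_rx_cq+0x753/0xb50 [mlx4_en]
# A leading "?" marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Return one dict per call-trace frame found in `text` (hypothetical helper)."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "module": m.group("module"),  # None for built-in kernel code
            "reliable": m.group("unreliable") is None,
        })
    return frames


# Sample lines copied from the trace quoted in this thread.
SAMPLE = """
[ 253.619332] [<ffffffff815d7c25>] ? skb_copy_ubufs+0xa5/0x230
[ 253.619366] [<ffffffff815e91f5>] napi_gro_frags+0x125/0x1a0
[ 253.619378] [<ffffffffa01b1173>] mlx4_en_process_rx_cq+0x753/0xb50 [mlx4_en]
[ 253.619387] [<ffffffffa01b1657>] mlx4_en_poll_rx_cq+0x97/0x160 [mlx4_en]
[ 253.619412] [<ffffffff813e3825>] xen_evtchn_do_upcall+0x35/0x50
"""

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        origin = frame["module"] or "kernel"
        marker = "" if frame["reliable"] else " (unreliable)"
        print(f"{origin:10s} {frame['symbol']}{marker}")
```

Listing frames by origin this way makes it easy to see that the receive path passes through the mlx4_en module before reaching the generic networking code, which is the observation the replies that follow build on.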
* Re: Dom0 crash with apache bench (ab)
  2015-07-31 10:24 ` Stefano Stabellini
@ 2015-07-31 10:28 ` David Vrabel
  2015-07-31 13:17 ` Christoffer Dall
  0 siblings, 1 reply; 13+ messages in thread
From: David Vrabel @ 2015-07-31 10:28 UTC
To: Stefano Stabellini, Christoffer Dall; +Cc: Wei Liu, Ian Campbell, xen-devel

On 31/07/15 11:24, Stefano Stabellini wrote:
> This is a Linux Dom0 crash on x86 (Dell PowerEdge R320, Xeon E5-2450),
> CC'ing relevant people. As you can see from the links below the crash
> is:
>
> [ 253.619326] Call Trace:
> [ 253.619330] <IRQ>
> [ 253.619332] [<ffffffff815d7c25>] ? skb_copy_ubufs+0xa5/0x230
> [ 253.619347] [<ffffffff815e8525>] __netif_receive_skb_core+0x6f5/0x940
> [ 253.619353] [<ffffffff815e8788>] __netif_receive_skb+0x18/0x60
> [ 253.619360] [<ffffffff815e87f8>] netif_receive_skb_internal+0x28/0x90
> [ 253.619366] [<ffffffff815e91f5>] napi_gro_frags+0x125/0x1a0
> [ 253.619378] [<ffffffffa01b1173>] mlx4_en_process_rx_cq+0x753/0xb50 [mlx4_en]
> [ 253.619387] [<ffffffffa01b1657>] mlx4_en_poll_rx_cq+0x97/0x160 [mlx4_en]

What makes you think this is Xen specific? I suggest raising this with
the mlx4 maintainers.

David

> [...]
* Re: Dom0 crash with apache bench (ab)
  2015-07-31 10:28 ` David Vrabel
@ 2015-07-31 13:17 ` Christoffer Dall
  2015-09-14 12:40 ` Christoffer Dall
  0 siblings, 1 reply; 13+ messages in thread
From: Christoffer Dall @ 2015-07-31 13:17 UTC
To: David Vrabel; +Cc: xen-devel, Wei Liu, Ian Campbell, Stefano Stabellini

On Fri, Jul 31, 2015 at 12:28 PM, David Vrabel <david.vrabel@citrix.com> wrote:
> On 31/07/15 11:24, Stefano Stabellini wrote:
> > This is a Linux Dom0 crash on x86 (Dell PowerEdge R320, Xeon E5-2450),
> > CC'ing relevant people. As you can see from the links below the crash
> > is:
> >
> > [ 253.619326] Call Trace:
> > [ 253.619330] <IRQ>
> > [ 253.619332] [<ffffffff815d7c25>] ? skb_copy_ubufs+0xa5/0x230
> > [ 253.619347] [<ffffffff815e8525>] __netif_receive_skb_core+0x6f5/0x940
> > [ 253.619353] [<ffffffff815e8788>] __netif_receive_skb+0x18/0x60
> > [ 253.619360] [<ffffffff815e87f8>] netif_receive_skb_internal+0x28/0x90
> > [ 253.619366] [<ffffffff815e91f5>] napi_gro_frags+0x125/0x1a0
> > [ 253.619378] [<ffffffffa01b1173>] mlx4_en_process_rx_cq+0x753/0xb50 [mlx4_en]
> > [ 253.619387] [<ffffffffa01b1657>] mlx4_en_poll_rx_cq+0x97/0x160 [mlx4_en]
>
> What makes you think this is Xen specific? I suggest raising this with
> the mlx4 maintainers.

Linux native and KVM guests (same hw, same kernel version+config) run just
fine under the same workload.

Thanks,
-Christoffer
* Re: Dom0 crash with apache bench (ab)
  2015-07-31 13:17 ` Christoffer Dall
@ 2015-09-14 12:40 ` Christoffer Dall
  2015-09-14 15:11 ` Konrad Rzeszutek Wilk
  2015-09-14 15:20 ` Ian Campbell
  0 siblings, 2 replies; 13+ messages in thread
From: Christoffer Dall @ 2015-09-14 12:40 UTC
To: Christoffer Dall
Cc: Ian Campbell, xen-devel, Wei Liu, David Vrabel, Stefano Stabellini

On Fri, Jul 31, 2015 at 03:17:56PM +0200, Christoffer Dall wrote:
> On Fri, Jul 31, 2015 at 12:28 PM, David Vrabel <david.vrabel@citrix.com>
> wrote:
> > [...]
> > What makes you think this is Xen specific? I suggest raising this with
> > the mlx4 maintainers.
>
> Linux native and KVM guests (same hw, same kernel version+config) run just
> fine under the same workload.

Ping?

From the fact that bare-metal and KVM work fine with this hardware I
still think it's reasonable to assume that it's a Xen issue and not a
mlx4 issue.

Is this completely flawed?

Thanks,
-Christoffer
* Re: Dom0 crash with apache bench (ab)
  2015-09-14 12:40 ` Christoffer Dall
@ 2015-09-14 15:11 ` Konrad Rzeszutek Wilk
  2015-09-14 15:20 ` Ian Campbell
  1 sibling, 0 replies; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-14 15:11 UTC
To: Christoffer Dall
Cc: Wei Liu, Ian Campbell, Stefano Stabellini, xen-devel, Christoffer Dall, David Vrabel

On Mon, Sep 14, 2015 at 02:40:08PM +0200, Christoffer Dall wrote:
> [...]
> Ping?
>
> From the fact that bare-metal and KVM works fine with this hardware I
> still think it's reasonable to assume that it's a Xen issue and not a
> mlx4 issue.
>
> Is this completely flawed?

I have a feeling it is an mlx4 issue but you don't easily reproduce it
under baremetal.

Is there any way you could boot baremetal with 'iommu=soft swiotlb=force'
to see if you can reproduce it under those conditions?

thanks!
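[Editorial note] When trying the suggestion above, it helps to confirm that the booted kernel actually received both parameters. The following is a minimal sketch under stated assumptions: the helper names `parse_cmdline` and `swiotlb_forced` are hypothetical, quoting inside the command line is ignored, and treating `swiotlb=<number>,force` as equivalent to `swiotlb=force` is an assumption about the parameter's syntax that should be checked against the kernel's own parameter documentation.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def swiotlb_forced(cmdline):
    """True if both 'iommu=soft' and a forced swiotlb appear on the command line."""
    params = parse_cmdline(cmdline)
    swiotlb = params.get("swiotlb") or ""
    # Assumption: "force" may follow a slab count, as in "swiotlb=65536,force".
    return params.get("iommu") == "soft" and "force" in swiotlb.split(",")


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line of the running Linux kernel.
        with open("/proc/cmdline") as f:
            running = f.read()
    except OSError:
        running = ""
    print("software bounce buffering forced:", swiotlb_forced(running))
```

Running it on the bare-metal test boot before starting the benchmark avoids drawing conclusions from a run in which the bootloader silently dropped the options.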
* Re: Dom0 crash with apache bench (ab)
  2015-09-14 12:40 ` Christoffer Dall
  2015-09-14 15:11 ` Konrad Rzeszutek Wilk
@ 2015-09-14 15:20 ` Ian Campbell
  2015-09-14 16:16 ` Christoffer Dall
  2015-09-28 20:53 ` Christoffer Dall
  1 sibling, 2 replies; 13+ messages in thread
From: Ian Campbell @ 2015-09-14 15:20 UTC
To: Christoffer Dall, Christoffer Dall
Cc: xen-devel, Wei Liu, David Vrabel, Stefano Stabellini

On Mon, 2015-09-14 at 14:40 +0200, Christoffer Dall wrote:
> [...]
> Ping?
>
> From the fact that bare-metal and KVM works fine with this hardware I
> still think it's reasonable to assume that it's a Xen issue and not a
> mlx4 issue.
>
> Is this completely flawed?

My (somewhat educated) guess is that this is to do with the difference
between (pseudo-)physical addresses and machine (AKA real-physical)
addresses when running under Xen.

The way this often shows up is in drivers which do not make correct use of
the kernel's DMA APIs but which happen to work on native x86 because
physical==bus address on x86.

Sometimes booting natively with 'iommu=soft swiotlb=force' can expose these
sorts of issues.

You are running 64-bit so I don't think the recent "config: Enable
NEED_DMA_MAP_STATE by default when SWIOTLB is selected" is likely to be
relevant (it's already unconditionally on for 64-bit).

The trace appears to be on rx from a physical nic; there shouldn't be any
magic Xen stuff (granted pages etc) getting themselves into that path at
all. If it were tx then maybe it might be an issue with foreign pages. In
any case I think you are able to repro with just dom0, i.e. never having
started a domU, is that right?

Ian.
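[Editorial note] The distinction drawn above between pseudo-physical and machine addresses can be illustrated with a toy model. This is only a sketch: it is not how Xen or the Linux DMA API is implemented, the frame numbers are made up, and every name in it (`Domain`, `dma_addr_naive`, `dma_addr_translated`, `machine_contiguous`) is hypothetical. It shows why code that hands a device the guest's own physical address happens to work when the mapping is the identity, and stops working when it is not.

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT


class Domain:
    """Toy guest: maps pseudo-physical frame numbers to machine frame numbers."""

    def __init__(self, p2m):
        self.p2m = dict(p2m)

    def phys_to_machine(self, paddr):
        pfn, offset = divmod(paddr, PAGE_SIZE)
        return self.p2m[pfn] * PAGE_SIZE + offset


def dma_addr_naive(domain, paddr):
    # Buggy pattern: assumes physical == bus address and skips translation.
    return paddr


def dma_addr_translated(domain, paddr):
    # What a mapping layer has to do when the two address spaces differ.
    return domain.phys_to_machine(paddr)


def machine_contiguous(domain, paddr, length):
    """True if the buffer occupies consecutive machine frames."""
    first = paddr // PAGE_SIZE
    last = (paddr + length - 1) // PAGE_SIZE
    mfns = [domain.p2m[pfn] for pfn in range(first, last + 1)]
    return all(b == a + 1 for a, b in zip(mfns, mfns[1:]))


# Identity mapping stands in for native x86; the second mapping is invented.
native = Domain({pfn: pfn for pfn in range(4)})
xen_like = Domain({0: 7, 1: 3, 2: 9, 3: 1})

buf = 2 * PAGE_SIZE + 0x80  # a buffer in pseudo-physical frame 2

if __name__ == "__main__":
    for name, dom in (("native", native), ("xen-like", xen_like)):
        print(name,
              hex(dma_addr_naive(dom, buf)),
              hex(dma_addr_translated(dom, buf)),
              machine_contiguous(dom, 0, 2 * PAGE_SIZE))
```

In the identity case the naive and translated addresses agree, so the shortcut goes unnoticed. In the second case they differ, and a buffer that is contiguous in pseudo-physical space is scattered across machine frames, which is the kind of situation where a bounce-buffering layer has to step in.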
* Re: Dom0 crash with apache bench (ab)
  2015-09-14 15:20 ` Ian Campbell
@ 2015-09-14 16:16 ` Christoffer Dall
  2015-09-28 20:53 ` Christoffer Dall
  1 sibling, 0 replies; 13+ messages in thread
From: Christoffer Dall @ 2015-09-14 16:16 UTC
To: Ian Campbell; +Cc: xen-devel, Wei Liu, David Vrabel, Stefano Stabellini

On Mon, Sep 14, 2015 at 5:20 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> [...]
> My (somewhat educated) guess is that this is to do with the difference
> between (pseudo-)physical addresses and machine (AKA real-physical)
> addresses when running under Xen.
>
> The way this often shows up is in drivers which do not make correct use of
> the kernel's DMA APIs but which happen to work on native x86 because
> physical==bus address on x86.
>
> Sometimes booting natively with 'iommu=soft swiotlb=force' can expose these
> sorts of issues.

I'll give this a try.

> You are running 64-bit so I don't think the recent "config: Enable
> NEED_DMA_MAP_STATE by default when SWIOTLB is selected" is likely to be
> relevant (it's already unconditionally on for 64-bit).
>
> The trace appears to be on rx from a physical nic, there shouldn't be any
> magic Xen stuff (granted pages etc) getting themselves into that path at
> all. If it were tx then maybe it might be an issue with foreign pages. In
> any case I think you are able to repro with just dom0, i.e. never having
> started a domU, is that right?

As far as I remember and as far as I can interpret my own e-mail, yes.

Thanks for the feedback. I'll try the suggested approaches, also try
v4.3-rc1, and take it up with the mlx4 maintainers if I still see the
issue.

-Christoffer
* Re: Dom0 crash with apache bench (ab) 2015-09-14 15:20 ` Ian Campbell 2015-09-14 16:16 ` Christoffer Dall @ 2015-09-28 20:53 ` Christoffer Dall 2015-09-30 15:12 ` Konrad Rzeszutek Wilk 1 sibling, 1 reply; 13+ messages in thread From: Christoffer Dall @ 2015-09-28 20:53 UTC (permalink / raw) To: Ian Campbell; +Cc: xen-devel, Wei Liu, David Vrabel, Stefano Stabellini [-- Attachment #1.1: Type: text/plain, Size: 2877 bytes --] On Mon, Sep 14, 2015 at 5:20 PM, Ian Campbell <ian.campbell@citrix.com> wrote: > On Mon, 2015-09-14 at 14:40 +0200, Christoffer Dall wrote: > > On Fri, Jul 31, 2015 at 03:17:56PM +0200, Christoffer Dall wrote: > > > On Fri, Jul 31, 2015 at 12:28 PM, David Vrabel < > david.vrabel@citrix.com > > > > > > > wrote: > > > > > > > On 31/07/15 11:24, Stefano Stabellini wrote: > > > > > This is a Linux Dom0 crash on x86 (Dell PowerEdge R320, Xeon E5 > > > > > -2450), > > > > > CC'ing relevant people. As you can see from the links below the > > > > > crash > > > > > is: > > > > > > > > > > [ 253.619326] Call Trace: > > > > > [ 253.619330] <IRQ> > > > > > [ 253.619332] [<ffffffff815d7c25>] ? skb_copy_ubufs+0xa5/0x230 > > > > > [ 253.619347] [<ffffffff815e8525>] > > > > > __netif_receive_skb_core+0x6f5/0x940 > > > > > [ 253.619353] [<ffffffff815e8788>] __netif_receive_skb+0x18/0x60 > > > > > [ 253.619360] [<ffffffff815e87f8>] > > > > > netif_receive_skb_internal+0x28/0x90 > > > > > [ 253.619366] [<ffffffff815e91f5>] napi_gro_frags+0x125/0x1a0 > > > > > [ 253.619378] [<ffffffffa01b1173>] > > > > > mlx4_en_process_rx_cq+0x753/0xb50 > > > > [mlx4_en] > > > > > [ 253.619387] [<ffffffffa01b1657>] mlx4_en_poll_rx_cq+0x97/0x160 > > > > [mlx4_en] > > > > > > > > What makes you think this is Xen specific? I suggest raising this > > > > the > > > > the mlx4 maintainers. > > > > > > > > > > > Linux native and KVM guests (same hw, same kernel version+config) run > > > just > > > fine under the same workload. > > > > > Ping? 
> >
> > From the fact that bare-metal and KVM work fine with this hardware I
> > still think it's reasonable to assume that it's a Xen issue and not a
> > mlx4 issue.
> >
> > Is this completely flawed?
>
> My (somewhat educated) guess is that this is to do with the difference
> between (pseudo-)physical addresses and machine (AKA real-physical)
> addresses when running under Xen.
>
> The way this often shows up is in drivers which do not make correct use of
> the kernel's DMA APIs but which happen to work on native x86 because
> physical==bus address on x86.
>
> Sometimes booting natively with 'iommu=soft swiotlb=force' can expose these
> sorts of issues.
>

Indeed it does, on both v4.0 and v4.3-rc2.

> You are running 64-bit so I don't think the recent "config: Enable
> NEED_DMA_MAP_STATE by default when SWIOTLB is selected" is likely to be
> relevant (it's already unconditionally on for 64-bit).
>
> The trace appears to be on rx from a physical nic, there shouldn't be any
> magic Xen stuff (granted pages etc) getting themselves into that path at
> all. If it were tx then maybe it might be an issue with foreign pages. In
> any case I think you are able to repro with just dom0, i.e. never having
> started a domU, is that right?
>

Yes, I can reproduce on Dom0.

I will send this to the Mellanox people.

Thanks,
-Christoffer

* Re: Dom0 crash with apache bench (ab)
  2015-09-28 20:53 ` Christoffer Dall
@ 2015-09-30 15:12 ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-30 15:12 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: David Vrabel, Wei Liu, Ian Campbell, Stefano Stabellini, xen-devel

On Mon, Sep 28, 2015 at 10:53:33PM +0200, Christoffer Dall wrote:
> On Mon, Sep 14, 2015 at 5:20 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
>
> > On Mon, 2015-09-14 at 14:40 +0200, Christoffer Dall wrote:
> > > On Fri, Jul 31, 2015 at 03:17:56PM +0200, Christoffer Dall wrote:
> > > > On Fri, Jul 31, 2015 at 12:28 PM, David Vrabel <david.vrabel@citrix.com> wrote:
> > > >
> > > > > On 31/07/15 11:24, Stefano Stabellini wrote:
> > > > > > This is a Linux Dom0 crash on x86 (Dell PowerEdge R320, Xeon E5-2450),
> > > > > > CC'ing relevant people. As you can see from the links below the crash
> > > > > > is:
> > > > > >
> > > > > > [ 253.619326] Call Trace:
> > > > > > [ 253.619330] <IRQ>
> > > > > > [ 253.619332] [<ffffffff815d7c25>] ? skb_copy_ubufs+0xa5/0x230
> > > > > > [ 253.619347] [<ffffffff815e8525>] __netif_receive_skb_core+0x6f5/0x940
> > > > > > [ 253.619353] [<ffffffff815e8788>] __netif_receive_skb+0x18/0x60
> > > > > > [ 253.619360] [<ffffffff815e87f8>] netif_receive_skb_internal+0x28/0x90
> > > > > > [ 253.619366] [<ffffffff815e91f5>] napi_gro_frags+0x125/0x1a0
> > > > > > [ 253.619378] [<ffffffffa01b1173>] mlx4_en_process_rx_cq+0x753/0xb50 [mlx4_en]
> > > > > > [ 253.619387] [<ffffffffa01b1657>] mlx4_en_poll_rx_cq+0x97/0x160 [mlx4_en]
> > > > >
> > > > > What makes you think this is Xen specific? I suggest raising this with
> > > > > the mlx4 maintainers.
> > > > >
> > > > Linux native and KVM guests (same hw, same kernel version+config) run just
> > > > fine under the same workload.
> > > >
> > > Ping?
> > >
> > > From the fact that bare-metal and KVM work fine with this hardware I
> > > still think it's reasonable to assume that it's a Xen issue and not a
> > > mlx4 issue.
> > >
> > > Is this completely flawed?
> >
> > My (somewhat educated) guess is that this is to do with the difference
> > between (pseudo-)physical addresses and machine (AKA real-physical)
> > addresses when running under Xen.
> >
> > The way this often shows up is in drivers which do not make correct use of
> > the kernel's DMA APIs but which happen to work on native x86 because
> > physical==bus address on x86.
> >
> > Sometimes booting natively with 'iommu=soft swiotlb=force' can expose these
> > sorts of issues.
> >
>
> Indeed it does, on both v4.0 and v4.3-rc2.

Yeeey!

> > You are running 64-bit so I don't think the recent "config: Enable
> > NEED_DMA_MAP_STATE by default when SWIOTLB is selected" is likely to be
> > relevant (it's already unconditionally on for 64-bit).
> >
> > The trace appears to be on rx from a physical nic, there shouldn't be any
> > magic Xen stuff (granted pages etc) getting themselves into that path at
> > all. If it were tx then maybe it might be an issue with foreign pages. In
> > any case I think you are able to repro with just dom0, i.e. never having
> > started a domU, is that right?
> >
> Yes, I can reproduce on Dom0.
>
> I will send this to the Mellanox people.

Thank you :-) Though please do keep us (or at least me) CC'd, this is an
interesting bug.

> Thanks,
> -Christoffer

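[Editorial illustration, not part of the original messages.] The
physical-versus-machine address point made in the thread can be sketched
with a small user-space toy model. Everything in it (ToyMachine,
dma_result, the two p2m tables) is hypothetical and invented for this
note; it is not Xen or mlx4 code. The only real kernel name mentioned is
dma_map_single(), and the model merely stands in for it. The sketch
assumes a driver that hands the device a physical address directly
instead of asking the DMA API for a bus address.

```python
PAGE_SIZE = 4096


class ToyMachine:
    """Hypothetical model: RAM indexed by machine address, plus a p2m table."""

    def __init__(self, p2m):
        self.p2m = p2m   # pseudo-physical frame number -> machine frame number
        self.ram = {}    # machine address -> stored value

    def phys_to_machine(self, phys_addr):
        pfn, offset = divmod(phys_addr, PAGE_SIZE)
        return self.p2m[pfn] * PAGE_SIZE + offset

    def cpu_write(self, phys_addr, value):
        # In this toy model every CPU access goes through the p2m table,
        # so software running on the CPU never notices the remapping.
        self.ram[self.phys_to_machine(phys_addr)] = value

    def device_read(self, bus_addr):
        # A DMA-capable device only ever sees machine (bus) addresses.
        return self.ram.get(bus_addr)


def dma_result(p2m, use_dma_api):
    """Return what the device reads from a buffer the CPU has just filled."""
    machine = ToyMachine(p2m)
    buf_phys = 1 * PAGE_SIZE + 64
    machine.cpu_write(buf_phys, "packet")
    if use_dma_api:
        # Stand-in for what a call such as dma_map_single() provides:
        # a bus address that is valid from the device's point of view.
        bus_addr = machine.phys_to_machine(buf_phys)
    else:
        # Buggy shortcut: assumes physical == bus address.
        bus_addr = buf_phys
    return machine.device_read(bus_addr)


NATIVE_P2M = {0: 0, 1: 1}    # identity mapping: physical == machine
XEN_LIKE_P2M = {0: 7, 1: 3}  # frames scattered across machine memory

if __name__ == "__main__":
    for name, p2m in (("native", NATIVE_P2M), ("xen-like", XEN_LIKE_P2M)):
        print(name, dma_result(p2m, False), dma_result(p2m, True))
```

With the identity table the shortcut happens to work, which is why such a
bug can go unnoticed on bare metal. With the scattered table the device
reads from the wrong machine address unless the translation step is used.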
end of thread (last message 2015-09-30 15:12 UTC)

Thread overview: 13+ messages
2015-07-28 13:09 Dom0 crash with apache bench (ab) Christoffer Dall
2015-07-28 14:50 ` Konrad Rzeszutek Wilk
2015-07-28 14:55 ` Ian Campbell
2015-07-28 15:00 ` Christoffer Dall
2015-07-31 10:24 ` Stefano Stabellini
2015-07-31 10:28 ` David Vrabel
2015-07-31 13:17 ` Christoffer Dall
2015-09-14 12:40 ` Christoffer Dall
2015-09-14 15:11 ` Konrad Rzeszutek Wilk
2015-09-14 15:20 ` Ian Campbell
2015-09-14 16:16 ` Christoffer Dall
2015-09-28 20:53 ` Christoffer Dall
2015-09-30 15:12 ` Konrad Rzeszutek Wilk