* DMA trouble with current xen-sparse
@ 2005-10-28 19:21 Stephen C. Tweedie
2005-10-29 8:55 ` Keir Fraser
2005-10-30 9:52 ` Muli Ben-Yehuda
0 siblings, 2 replies; 18+ messages in thread
From: Stephen C. Tweedie @ 2005-10-28 19:21 UTC (permalink / raw)
To: xen-devel
Hi,
I've been trying to get current xen-sparse up and running on a 2-cpu box
and have had a number of problems. One has been that networking is
completely unstable: I get kernel panics under the slightest network
load.
The trouble is that this is a 1G box, so its memory is not large enough
to automatically enable the swiotlb. (arch/xen/i386/kernel/swiotlb.c
enables swiotlb automatically for dom0 only if there's at least 2G of
memory.) And the first time we get a pci_map_single() request for a
dom0-contiguous region which crosses a page boundary, we hit the BUG_ON
at arch/xen/i386/kernel/pci-dma.c:270 due to dma_map_single() checking:
IOMMU_BUG_ON(range_straddles_page_boundary(ptr, size));
And this happens *instantly* on any loaded tcp connection on my e1000
NIC. All I need to do to kill the box is to ssh in and type "find\n".
Instant dom0 death after the ssh client receives about a dozen lines of
output. The stack trace is appended below.
The PCI mapping documentation certainly says that pci_map_single() needs
to be able to map a single region, not just a single page. If it can't,
then I suspect we really need to enable swiotlb by default, because
we'll just be unstable without it.
The kernel panics after this with "Fatal DMA error! Please use
'swiotlb=force'". But of course the default for Xen is to instantly
reboot at this point before the error is visible. And even after
catching the message with serial console, I found that "swiotlb=force"
*also* dies on this box, with
(XEN) (file=memory.c, line=57) Could not allocate order=14 extent: id=0 flags=0
(0 of 1)
kernel BUG at arch/xen/i386/mm/hypervisor.c:354
(xen_create_contiguous_region)!
[<c011a77d>] xen_create_contiguous_region+0x26d/0x2b0
[<c0112596>] swiotlb_init_with_default_size+0x86/0x1c0
[<c0112735>] swiotlb_init+0x65/0xa0
because we don't have a large enough zone at boot time to create the
64MB swiotlb.
Booting with "swiotlb=force swiotlb=8m" works around both of these bugs
and allows me to boot; fortunately things are much more stable after I
get this far.
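As a rough illustration of the sizes involved: an order=14 extent is 2^14 contiguous pages, which with 4 KiB pages is the 64MB swiotlb mentioned above. The sketch below is not code from the Xen tree; the helper names are hypothetical, and it assumes 4 KiB pages and that "swiotlb=8m" requests an 8 MB buffer allocated as a single contiguous extent (the thread does not spell out the parameter's units).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u  /* assumption: 4 KiB pages, as on i386 */

/* Bytes covered by one contiguous extent of the given order (2^order pages).
 * Hypothetical helper for illustration only. */
static uint64_t sketch_extent_bytes(unsigned int order)
{
    assert(order < 64u - SKETCH_PAGE_SHIFT);
    return (uint64_t)1 << (order + SKETCH_PAGE_SHIFT);
}

/* Smallest order whose extent holds at least `bytes` bytes. */
static unsigned int sketch_order_for_bytes(uint64_t bytes)
{
    unsigned int order = 0;

    assert(bytes <= ((uint64_t)1 << 62));
    while (sketch_extent_bytes(order) < bytes)
        order++;
    return order;
}
```

Under those assumptions, sketch_extent_bytes(14) is 64 MiB, while an 8 MiB buffer needs only order 11, a much smaller contiguous region to find at boot time.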
Cheers,
Stephen
---
kernel BUG at arch/xen/i386/kernel/pci-dma.c:270 (dma_map_single)!
[<c010ecd6>] dma_map_single+0xf6/0x160
[<f49cd40b>] e1000_xmit_frame+0x40b/0xd30 [e1000]
[<c0313510>] qdisc_restart+0x100/0x2f0
[<c03241d0>] ip_finish_output2+0x0/0x250
[<c030d594>] nf_hook_slow+0x64/0x110
[<c03010ff>] dev_queue_xmit+0x9f/0x340
[<c032404c>] ip_finish_output+0x15c/0x2e0
[<c03241d0>] ip_finish_output2+0x0/0x250
[<c0324947>] ip_queue_xmit+0x2b7/0x560
[<c0323ec0>] dst_output+0x0/0x30
[<c0155bf2>] poison_obj+0x32/0x60
[<c0155408>] dbg_redzone1+0x18/0x60
[<c0155e06>] check_poison_obj+0x26/0x1c0
[<c0155bf2>] poison_obj+0x32/0x60
[<c0155408>] dbg_redzone1+0x18/0x60
[<c0157dbc>] cache_alloc_debugcheck_after+0x4c/0x1b0
[<c0336e24>] tcp_transmit_skb+0x3d4/0x810
[<c02fab10>] skb_clone+0x20/0x1d0
[<c0337efd>] tcp_write_xmit+0x10d/0x330
[<c0334943>] __tcp_data_snd_check+0xa3/0xe0
[<c02fa961>] kfree_skbmem+0x21/0x30
[<c0335069>] tcp_rcv_established+0x2a9/0x910
[<f4b3f036>] ipt_hook+0x36/0x40 [iptable_filter]
[<c033ef5a>] tcp_v4_do_rcv+0xfa/0x150
[<c033f8d5>] tcp_v4_rcv+0x925/0x980
[<c030d594>] nf_hook_slow+0x64/0x110
[<c03208d0>] ip_local_deliver_finish+0x0/0x270
[<c03206bc>] ip_local_deliver+0xdc/0x2f0
[<c03208d0>] ip_local_deliver_finish+0x0/0x270
[<c0320f0e>] ip_rcv+0x3ce/0x5b0
[<c03210f0>] ip_rcv_finish+0x0/0x320
[<c0301be0>] netif_receive_skb+0x250/0x310
[<f49cf3ae>] e1000_clean_rx_irq+0x13e/0x5d0 [e1000]
[<f49ce8a2>] e1000_clean+0x52/0x1c0 [e1000]
[<c0301f2c>] net_rx_action+0xdc/0x220
[<c0128f4a>] __do_softirq+0x8a/0x120
[<c012905d>] do_softirq+0x7d/0x80
[<c010ee22>] do_IRQ+0x22/0x30
[<c01049be>] evtchn_do_upcall+0x9e/0xe0
[<c010a2f0>] hypervisor_callback+0x2c/0x34
[<c0107b30>] xen_idle+0x40/0x80
[<c0107bd4>] cpu_idle+0x64/0xb0
[<c0436a4f>] start_kernel+0x1af/0x210
[<c0436380>] unknown_bootoption+0x0/0x220
* Re: DMA trouble with current xen-sparse
2005-10-28 19:21 Stephen C. Tweedie
@ 2005-10-29 8:55 ` Keir Fraser
2005-10-30 9:52 ` Muli Ben-Yehuda
1 sibling, 0 replies; 18+ messages in thread
From: Keir Fraser @ 2005-10-29 8:55 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: xen-devel
On 28 Oct 2005, at 20:21, Stephen C. Tweedie wrote:
> The trouble is that this is a 1G box, so its memory is not large enough
> to automatically enable the swiotlb. (arch/xen/i386/kernel/swiotlb.c
> enables swiotlb automatically for dom0 only if there's at least 2G of
> memory.) And the first time we get a pci_map_single() request for a
> dom0-contiguous region which crosses a page boundary, we hit the BUG_ON
> at arch/xen/i386/kernel/pci-dma.c:270 due to dma_map_single() checking:
>
> IOMMU_BUG_ON(range_straddles_page_boundary(ptr, size));
>
> And this happens *instantly* on any loaded tcp connection on my e1000
> NIC. All I need to do to kill the box is to ssh in and type "find\n".
> Instant dom0 death after the ssh client receives about a dozen lines of
> output. The stack trace is appended below.
Is the network interface set up to use jumbo frames? Otherwise I
wouldn't expect alloc_skb() to allocate a data area that straddles a
page boundary, since the allocation will come from one of the
sub-page-sized power-of-two kmem caches.
If the problem is jumbo frames, we might need to add a hook to
alloc_skb(). Using swiotlb will suck hugely.
-- Keir
* Re: DMA trouble with current xen-sparse
2005-10-28 19:21 Stephen C. Tweedie
2005-10-29 8:55 ` Keir Fraser
@ 2005-10-30 9:52 ` Muli Ben-Yehuda
1 sibling, 0 replies; 18+ messages in thread
From: Muli Ben-Yehuda @ 2005-10-30 9:52 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: xen-devel
On Fri, Oct 28, 2005 at 03:21:20PM -0400, Stephen C. Tweedie wrote:
> Hi,
>
> I've been trying to get current xen-sparse up and running on a 2-cpu box
> and have had a number of problems. One has been that networking is
> completely unstable: I get kernel panics under the slightest network
> load.
FYI, I opened bugzilla #373 to track this issue.
Cheers,
Muli
--
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/
* RE: DMA trouble with current xen-sparse
@ 2005-11-02 15:32 Ian Pratt
2005-11-02 15:36 ` Stephen Tweedie
0 siblings, 1 reply; 18+ messages in thread
From: Ian Pratt @ 2005-11-02 15:32 UTC (permalink / raw)
To: Keir Fraser, Stephen C. Tweedie; +Cc: xen-devel
> On 28 Oct 2005, at 20:21, Stephen C. Tweedie wrote:
>
> > The trouble is that this is a 1G box, so its memory is not large
> > enough to automatically enable the swiotlb.
> > (arch/xen/i386/kernel/swiotlb.c enables swiotlb automatically for
> > dom0 only if there's at least 2G of memory.) And the first time we
> > get a pci_map_single() request for a dom0-contiguous region which
> > crosses a page boundary, we hit the BUG_ON at
> > arch/xen/i386/kernel/pci-dma.c:270 due to dma_map_single() checking:
Does your card support TSO? What revision e1000 is it?
Please can you try turning it off with:
ethtool -K eth0 tso off
If TSO is the problem we'll come up with a better fix than using
swiotlb.
Thanks,
Ian
* Re: DMA trouble with current xen-sparse
2005-11-02 15:32 Ian Pratt
@ 2005-11-02 15:36 ` Stephen Tweedie
2005-11-02 15:59 ` Daniel Veillard
0 siblings, 1 reply; 18+ messages in thread
From: Stephen Tweedie @ 2005-11-02 15:36 UTC (permalink / raw)
To: Ian Pratt; +Cc: xen-devel
Hi,
On Wed, Nov 02, 2005 at 03:32:58PM -0000, Ian Pratt wrote:
> Does your card support TSO? What revision e1000 is it?
Yes, and I'll check on Friday once I'm back from travelling (but it is
a very recent box.)
> Please can you try turning it off with:
> ethtool -K eth0 tso off
I already tried that and it did not help. I've also tried both gcc32
and gcc4 with no success.
Cheers,
Stephen
* Re: DMA trouble with current xen-sparse
2005-11-02 15:36 ` Stephen Tweedie
@ 2005-11-02 15:59 ` Daniel Veillard
2005-11-02 17:12 ` Keir Fraser
0 siblings, 1 reply; 18+ messages in thread
From: Daniel Veillard @ 2005-11-02 15:59 UTC (permalink / raw)
To: Ian Pratt; +Cc: xen-devel
On Wed, Nov 02, 2005 at 10:36:17AM -0500, Stephen Tweedie wrote:
> Hi,
>
> On Wed, Nov 02, 2005 at 03:32:58PM -0000, Ian Pratt wrote:
>
> > Does your card support TSO? What revision e1000 is it?
>
> Yes, and I'll check on Friday once I'm back from travelling (but it is
> a very recent box.)
I am seeing the exact same problem with my Dell Latitude D800 laptop using
Ethernet controller: Broadcom Corporation NetXtreme BCM5705M Gigabit Ethernet (rev 01)
This is a relatively common and not so recent configuration.
> > Please can you try turning it off with:
> > ethtool -K eth0 tso off
>
> I already tried that and it did not help. I've also tried both gcc32
> and gcc4 with no success.
[root@localhost ~]# ethtool -K eth0 tso off
Cannot set device tcp segmentation offload settings: Operation not supported
too bad ...
with 'swiotlb=force swiotlb=8m' kernel parameters the box is stable,
without it very basic network access can crash it (say 'locate lib' over ssh)
and then the whole system reboots.
100% reproducible for me, and without crazy hardware :-)
Hope this helps,
Daniel
--
Daniel Veillard | Red Hat http://redhat.com/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
* Re: DMA trouble with current xen-sparse
2005-11-02 15:59 ` Daniel Veillard
@ 2005-11-02 17:12 ` Keir Fraser
2005-11-02 23:04 ` Daniel Veillard
0 siblings, 1 reply; 18+ messages in thread
From: Keir Fraser @ 2005-11-02 17:12 UTC (permalink / raw)
To: veillard; +Cc: Ian Pratt, xen-devel
On 2 Nov 2005, at 15:59, Daniel Veillard wrote:
> [root@localhost ~]# ethtool -K eth0 tso off
> Cannot set device tcp segmentation offload settings: Operation not
> supported
>
> too bad ...
> with 'swiotlb=force swiotlb=8m' kernel parameters the box is stable,
> without it very basic network access can crash it (say 'locate lib'
> over ssh)
> and then the whole system reboots.
>
> 100% reproducible for me, and without crazy hardware :-)
It'd be interesting to know what form of skbuffs get sent to the driver
when this happens. e.g., how big is the skbuff data area, is the skbuff
fragmented, etc.
-- Keir
* Re: DMA trouble with current xen-sparse
2005-11-02 17:12 ` Keir Fraser
@ 2005-11-02 23:04 ` Daniel Veillard
2005-11-03 2:45 ` Vincent Hanquez
0 siblings, 1 reply; 18+ messages in thread
From: Daniel Veillard @ 2005-11-02 23:04 UTC (permalink / raw)
To: Keir Fraser; +Cc: Ian Pratt, xen-devel
On Wed, Nov 02, 2005 at 05:12:27PM +0000, Keir Fraser wrote:
>
> On 2 Nov 2005, at 15:59, Daniel Veillard wrote:
>
> >[root@localhost ~]# ethtool -K eth0 tso off
> >Cannot set device tcp segmentation offload settings: Operation not
> >supported
> >
> > too bad ...
> > with 'swiotlb=force swiotlb=8m' kernel parameters the box is stable,
> >without it very basic network access can crash it (say 'locate lib'
> >over ssh)
> >and then the whole system reboots.
> >
> > 100% reproducible for me, and without crazy hardware :-)
>
> It'd be interesting to know what form of skbuffs get sent to the driver
> when this happens. e.g., how big is the skbuff data area, is the skbuff
> fragmented, etc.
I'm not a kernel hacker, but if you give me a patch displaying that
information at the IOMMU_BUG_ON pointed out by Stephen, I will gladly rebuild
and try to reboot over it to give you the information (I have no serial
console, so a hint on avoiding the instant reboot of the dom0 would help).
Oh yeah, it's just dom0 on top of the hypervisor, no domU even started.
Daniel
--
Daniel Veillard | Red Hat http://redhat.com/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
* Re: DMA trouble with current xen-sparse
2005-11-02 23:04 ` Daniel Veillard
@ 2005-11-03 2:45 ` Vincent Hanquez
2005-11-03 14:51 ` Daniel Veillard
0 siblings, 1 reply; 18+ messages in thread
From: Vincent Hanquez @ 2005-11-03 2:45 UTC (permalink / raw)
To: Daniel Veillard; +Cc: Ian Pratt, xen-devel
On Wed, Nov 02, 2005 at 06:04:25PM -0500, Daniel Veillard wrote:
> I'm not a kernel hacker, but if you give me a patch displaying that
> information at the IOMMU_BUG_ON pointed out by Stephen, I will gladly rebuild
> and try to reboot over it to give you the information (I have no serial
> console, so a hint on avoiding the instant reboot of the dom0 would help).
> Oh yeah, it's just dom0 on top of the hypervisor, no domU even started.
Hi Daniel,
Could you try the following patch, just to get a bit more information
about the pointer and the size?
diff -r ca2e91ab4311 linux-2.6-xen-sparse/arch/xen/i386/kernel/pci-dma.c
--- a/linux-2.6-xen-sparse/arch/xen/i386/kernel/pci-dma.c Thu Nov 3 01:45:07 2005
+++ b/linux-2.6-xen-sparse/arch/xen/i386/kernel/pci-dma.c Wed Nov 2 21:32:34 2005
@@ -267,6 +267,8 @@
dma = swiotlb_map_single(dev, ptr, size, direction);
} else {
dma = virt_to_bus(ptr);
+ if (range_straddles_page_boundary(ptr, size))
+ printk("ptr: %p %zu\n", ptr, size);
IOMMU_BUG_ON(range_straddles_page_boundary(ptr, size));
IOMMU_BUG_ON(address_needs_mapping(dev, dma));
}
Sticking a while (1); after the printk would help you avoid the reboot,
something like:
if (range_straddles_page_boundary(ptr, size)) {
	printk("ptr: %p %zu\n", ptr, size);
	while (1);
}
Cheers,
--
Vincent Hanquez
* Re: DMA trouble with current xen-sparse
2005-11-03 2:45 ` Vincent Hanquez
@ 2005-11-03 14:51 ` Daniel Veillard
0 siblings, 0 replies; 18+ messages in thread
From: Daniel Veillard @ 2005-11-03 14:51 UTC (permalink / raw)
To: Vincent Hanquez; +Cc: Ian Pratt, xen-devel
On Thu, Nov 03, 2005 at 03:45:27AM +0100, Vincent Hanquez wrote:
> On Wed, Nov 02, 2005 at 06:04:25PM -0500, Daniel Veillard wrote:
> > I'm not a kernel hacker, but if you give me a patch displaying that
> > information at the IOMMU_BUG_ON pointed out by Stephen, I will gladly rebuild
> > and try to reboot over it to give you the information (I have no serial
> > console, so a hint on avoiding the instant reboot of the dom0 would help).
> > Oh yeah, it's just dom0 on top of the hypervisor, no domU even started.
>
> Hi Daniel,
Hi, Salut :-)
> could you try the following patch just to have a bit more information
> about the pointer and the size ?
[...]
> stick a while (1) ; after the printk would help you to avoid the reboot
> something like:
Sure, it took a bit of time to recompile the kernel (I haven't done this for
years) and it crashed as expected. Here is the info:
ptr: f160ed8e 1514
The size looks like a full Ethernet frame, i.e. 1500 bytes of payload, two
6-byte Ethernet addresses and the 2 bytes for the Ethernet type (1514 in
total). That looks kosher to me, but clearly it is not aligned.
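To see why this particular buffer trips the check, here is a minimal sketch of a page-straddle test. It is a hypothetical stand-in, not the actual range_straddles_page_boundary() from the Xen tree, and it assumes 4 KiB pages. For ptr f160ed8e the offset within the page is 0xd8e = 3470, and 3470 + 1514 = 4984, which runs past the 4096-byte page end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages, as on i386 */

/* Hypothetical stand-in for the kernel's range_straddles_page_boundary():
 * returns true when the byte range [addr, addr + size) touches more than
 * one page. */
static bool sketch_straddles_page(uintptr_t addr, size_t size)
{
    assert(size > 0);

    /* Offset of the first byte within its page. */
    uintptr_t offset = addr & (SKETCH_PAGE_SIZE - 1);

    return offset + size > SKETCH_PAGE_SIZE;
}
```

With the reported values, sketch_straddles_page(0xf160ed8e, 1514) is true, whereas the same 1514 bytes starting at a 2KB-aligned address such as 0xf160e800 would stay inside one page.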
Daniel
--
Daniel Veillard | Red Hat http://redhat.com/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
* RE: DMA trouble with current xen-sparse
@ 2005-11-04 14:50 Ian Pratt
2005-11-07 21:28 ` Stephen C. Tweedie
0 siblings, 1 reply; 18+ messages in thread
From: Ian Pratt @ 2005-11-04 14:50 UTC (permalink / raw)
To: veillard, Vincent Hanquez; +Cc: xen-devel
> Sure, took a bit of time to recompile the kernel (I didn't do
> this for years) and it crashed as expected, here are the info:
>
> ptr: f160ed8e 1514
>
> the size looks a full ethernet frame, i.e. 1500 of payload, 2
> ethernet addresses and the 2bytes for the ethernet type, that
> looks kosher to me but clearly it is not aligned.
Please can you try using either our -xen or -xen0 kernel config. I
strongly suspect there's something in your config that is breaking this
for you, just not sure what.
(NB: make sure you 'rm dist/install/boot/config*' to stop 'make world'
from grabbing your old config)
Best,
Ian
* RE: DMA trouble with current xen-sparse
@ 2005-11-07 13:51 Ian Pratt
2005-11-07 14:15 ` Daniel Veillard
0 siblings, 1 reply; 18+ messages in thread
From: Ian Pratt @ 2005-11-07 13:51 UTC (permalink / raw)
To: veillard, Vincent Hanquez; +Cc: xen-devel
> Sure, took a bit of time to recompile the kernel (I didn't do
> this for years) and it crashed as expected, here are the info:
>
> ptr: f160ed8e 1514
>
> the size looks a full ethernet frame, i.e. 1500 of payload, 2
> ethernet addresses and the 2bytes for the ethernet type, that
> looks kosher to me but clearly it is not aligned.
This allocation isn't aligned to the next power of 2 boundary ---
usually 1514 byte allocations are 2KB aligned.
You're not enabling some experimental option in your config that changes
the alignment of slab allocations, are you?
Ian
* Re: DMA trouble with current xen-sparse
2005-11-07 13:51 Ian Pratt
@ 2005-11-07 14:15 ` Daniel Veillard
0 siblings, 0 replies; 18+ messages in thread
From: Daniel Veillard @ 2005-11-07 14:15 UTC (permalink / raw)
To: Ian Pratt, Stephen Tweedie; +Cc: xen-devel, Vincent Hanquez
On Mon, Nov 07, 2005 at 01:51:35PM -0000, Ian Pratt wrote:
>
> > Sure, took a bit of time to recompile the kernel (I didn't do
> > this for years) and it crashed as expected, here are the info:
> >
> > ptr: f160ed8e 1514
> >
> > the size looks a full ethernet frame, i.e. 1500 of payload, 2
> > ethernet addresses and the 2bytes for the ethernet type, that
> > looks kosher to me but clearly it is not aligned.
>
> This allocation isn't aligned to the next power of 2 boundary ---
> usually 1514 byte allocations are 2KB aligned.
>
> You're not enabling some experimental option in your config that changes
> the alignment of slab allocations are you?
Hi Ian,
Sorry for not responding to your previous message. The point is that I don't
really know those kernel internals offhand myself. Stephen can certainly
provide a more informed answer. I checked our kernel config, and I see
CONFIG_DEBUG_SLAB=y
set in our kernel-2.6.12-i686-hypervisor.config. Checking all the other
DEBUG options which might be relevant, I only found
CONFIG_DEBUG_KERNEL, CONFIG_DEBUG_HIGHMEM and CONFIG_DEBUG_INFO enabled.
CONFIG_DEBUG_DRIVER is not set. The Xen options are:
CONFIG_XEN=y
CONFIG_ARCH_XEN=y
CONFIG_NO_IDLE_HZ=y
CONFIG_XEN_WRITABLE_PAGETABLES=y
# CONFIG_XEN_SHADOW_MODE is not set
CONFIG_XEN_SCRUB_PAGES=y
CONFIG_FOREIGN_PAGES=y
CONFIG_HAVE_ARCH_DEV_ALLOC_SKB=y
CONFIG_XEN_BLKDEV_GRANT=y
# CONFIG_XEN_BLKDEV_TAP_BE is not set
# CONFIG_XEN_BLKDEV_TAP is not set
# CONFIG_XEN_NETDEV_GRANT_TX is not set
# CONFIG_XEN_NETDEV_GRANT_RX is not set
# CONFIG_SMP_ALTERNATIVES is not set
CONFIG_X86=y
# CONFIG_X86_64 is not set
CONFIG_XENARCH="i386"
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y
# CONFIG_M686 is not set
Daniel
--
Daniel Veillard | Red Hat http://redhat.com/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
* RE: DMA trouble with current xen-sparse
2005-11-04 14:50 DMA trouble with current xen-sparse Ian Pratt
@ 2005-11-07 21:28 ` Stephen C. Tweedie
2005-11-08 6:41 ` Daniel Veillard
0 siblings, 1 reply; 18+ messages in thread
From: Stephen C. Tweedie @ 2005-11-07 21:28 UTC (permalink / raw)
To: Ian Pratt; +Cc: xen-devel, Vincent Hanquez, veillard
Hi,
On Fri, 2005-11-04 at 14:50 +0000, Ian Pratt wrote:
> > Sure, took a bit of time to recompile the kernel (I didn't do
> > this for years) and it crashed as expected, here are the info:
> >
> > ptr: f160ed8e 1514
> >
> > the size looks a full ethernet frame, i.e. 1500 of payload, 2
> > ethernet addresses and the 2bytes for the ethernet type, that
> > looks kosher to me but clearly it is not aligned.
>
> Please can you try using either our -xen or -xen0 kernel config. I
> strongly suspect there's something in your config that is breaking this
> for you, just not sure what.
I just tried to build it; it would not boot. That was building the
2.6.12 xen-sparse w/ gcc4; retrying with gcc32 now.
But I suspect that the problem is CONFIG_DEBUG_SLAB. That sets up slab
redzoning which checks for buffer overruns. One consequence is that
cached objects grow very slightly --- enough that the 2k kmalloc cache
gets created with 3 objects per order-1 slab, i.e. all MTU-sized frames
are going to be allocated from an 8k slab and one in three will straddle
the page boundary.
I may not have time to verify that today, but it sounds like a likely
explanation for what we're seeing.
NB. even without redzoning, the slab allocator will try both order-0 and
order-1 slab sizes to see what minimises the wasted space in a slab, so
any subsystem that's doing its own allocation of objects from a pool
outside kmalloc may hit a size that creates these page-straddling
caches.
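The slab arithmetic described above can be sketched as follows. This is a simplified model, not the allocator's real layout code: it assumes 4 KiB pages, packs objects back to back from the start of the slab, and ignores the slab header and cache colouring. The helper names are hypothetical, and so is the 24-byte debug overhead used in the note below, since the thread does not give the exact figure.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* How many objects of obj_size fit back to back in a slab of slab_bytes.
 * Simplification: ignores the slab header and cache colouring. */
static size_t sketch_objs_per_slab(size_t slab_bytes, size_t obj_size)
{
    assert(obj_size > 0);
    return slab_bytes / obj_size;
}

/* How many of those objects have their first and last byte on
 * different pages, i.e. cross a page boundary. */
static size_t sketch_straddling_objs(size_t slab_bytes, size_t obj_size)
{
    size_t n = sketch_objs_per_slab(slab_bytes, obj_size);
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        size_t first = i * obj_size;
        size_t last = first + obj_size - 1;

        if (first / SKETCH_PAGE_SIZE != last / SKETCH_PAGE_SIZE)
            count++;
    }
    return count;
}
```

In this model an 8k slab holds four plain 2048-byte objects and none of them crosses a page. Growing each object to 2048 + 24 = 2072 bytes leaves room for only three, and the middle one (bytes 2072 to 4143) crosses the 4096 boundary, matching the "one in three" estimate. Whether the DMA-mapped data area inside such an object also crosses the boundary depends on where the skb data starts within it.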
There's a hacky quick-fix, which is to change
#define BREAK_GFP_ORDER_HI 1
from 1 to 0 in mm/slab.c. But that's just going to waste more slab
cache space for many caches. Without that change, the fact is that an
important debugging option is creating cross-page objects routinely, and
that the slab allocator can create such objects quite normally even
without that option; so it may end up being something that Xen just has
to deal with.
--Stephen
* RE: DMA trouble with current xen-sparse
@ 2005-11-07 23:03 Ian Pratt
0 siblings, 0 replies; 18+ messages in thread
From: Ian Pratt @ 2005-11-07 23:03 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: xen-devel, Vincent Hanquez, veillard
> from 1 to 0 in mm/slab.c. But that's just going to waste more slab
> cache space for many caches. Without that change, the fact
> is that an
> important debugging option is creating cross-page objects
> routinely, and that the slab allocator can create such
> objects quite normally even without that option; so it may
> end up being something that Xen just has to deal with.
The best xen fix for this is for us to hook alloc_skb (rather than just
dev_alloc_skb). This will enable us to solve the jumbo frames issue too.
Ian
* Re: DMA trouble with current xen-sparse
2005-11-07 21:28 ` Stephen C. Tweedie
@ 2005-11-08 6:41 ` Daniel Veillard
2005-11-08 15:55 ` Keir Fraser
0 siblings, 1 reply; 18+ messages in thread
From: Daniel Veillard @ 2005-11-08 6:41 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Ian Pratt, xen-devel, Vincent Hanquez
On Mon, Nov 07, 2005 at 04:28:59PM -0500, Stephen C. Tweedie wrote:
> Hi,
>
> On Fri, 2005-11-04 at 14:50 +0000, Ian Pratt wrote:
>
> > > Sure, took a bit of time to recompile the kernel (I didn't do
> > > this for years) and it crashed as expected, here are the info:
> > >
> > > ptr: f160ed8e 1514
> > >
> > > the size looks a full ethernet frame, i.e. 1500 of payload, 2
> > > ethernet addresses and the 2bytes for the ethernet type, that
> > > looks kosher to me but clearly it is not aligned.
> >
> > Please can you try using either our -xen or -xen0 kernel config. I
> > strongly suspect there's something in your config that is breaking this
> > for you, just not sure what.
>
> I just tried to build it; it would not boot. That was building the
> 2.6.12 xen-sparse w/ gcc4; retrying with gcc32 now.
>
> But I suspect that the problem is CONFIG_DEBUG_SLAB. That sets up slab
> redzoning which checks for buffer overruns. One consequence is that
Just to confirm that CONFIG_DEBUG_SLAB is the one exposing the issue.
I recompiled the exact same kernel with just that option turned off
and the tg3 driver does not seem to hang anymore.
Daniel
--
Daniel Veillard | Red Hat http://redhat.com/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
* Re: DMA trouble with current xen-sparse
2005-11-08 15:55 ` Keir Fraser
@ 2005-11-08 15:25 ` Stephen C. Tweedie
0 siblings, 0 replies; 18+ messages in thread
From: Stephen C. Tweedie @ 2005-11-08 15:25 UTC (permalink / raw)
To: Keir Fraser; +Cc: Ian Pratt, xen-devel, Vincent Hanquez, veillard
Hi,
On Tue, 2005-11-08 at 15:55 +0000, Keir Fraser wrote:
> > Just to confirm that CONFIG_DEBUG_SLAB is the one exposing the issue.
> > I recompiled the exact same kernel with just that option turned off
> > and the tg3 driver does not seem to hang anymore.
>
> This is now fixed in our tree (changeset 7700:98bcd8fbd5e3). Should get
> pushed to the public repository in an hour or two...
Thanks; I'll have a look at that when it shows up. My main test box at
work just died, though, so it might be a while before I can test it out
properly.
Cheers,
Stephen
* Re: DMA trouble with current xen-sparse
2005-11-08 6:41 ` Daniel Veillard
@ 2005-11-08 15:55 ` Keir Fraser
2005-11-08 15:25 ` Stephen C. Tweedie
0 siblings, 1 reply; 18+ messages in thread
From: Keir Fraser @ 2005-11-08 15:55 UTC (permalink / raw)
To: veillard; +Cc: xen-devel, Vincent Hanquez, Ian Pratt
On 8 Nov 2005, at 06:41, Daniel Veillard wrote:
>> I just tried to build it; it would not boot. That was building the
>> 2.6.12 xen-sparse w/ gcc4; retrying with gcc32 now.
>>
>> But I suspect that the problem is CONFIG_DEBUG_SLAB. That sets up
>> slab
>> redzoning which checks for buffer overruns. One consequence is that
>
> Just to confirm that CONFIG_DEBUG_SLAB is the one exposing the issue.
> I recompiled the exact same kernel with just that option turned off
> and the tg3 driver does not seem to hang anymore.
This is now fixed in our tree (changeset 7700:98bcd8fbd5e3). Should get
pushed to the public repository in an hour or two...
-- Keir