netdev.vger.kernel.org archive mirror
* Re: [E1000-devel] Page Allocation Failure with e1000 using jumbo frame
       [not found] <1124326404.5546.215.camel@localhost.localdomain>
@ 2005-08-19 16:51 ` Jesse Brandeburg
  2005-08-19 17:01   ` Ming Zhang
                     ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Jesse Brandeburg @ 2005-08-19 16:51 UTC (permalink / raw)
  To: Ming Zhang; +Cc: E1000, iet-dev, netdev

included netdev...


On Wed, 17 Aug 2005, Ming Zhang wrote:

> 
> Hi folks
> 
> We ran into this problem when running jumbo frames with iSCSI over e1000.
> MTU 1500 is fine, while jumbo frames reliably reproduce this error.
> 
> When we hit this error, as reported on the IET list, the box still has >600MB
> of RAM free. Also, the slab is not heavily used.
> 
> Any idea on this?


So, what we do know is that your kernel memory manager is (for some 
unknown reason) having trouble finding contiguous chunks of 2^3 pages 
(32kB) of memory.  This occurs because we currently need to give our 
hardware 16kB of contiguous memory (we actually have a patch for this in 
internal testing), which means that when we do dev_alloc_skb, it allocates 
16kB + 16 bytes reserve, plus 2 bytes NET_IP_ALIGN, which takes us into 
the 32kB slab (power-of-two roundup).
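
As a rough illustration of the arithmetic above, the following sketch maps the request size to a slab size and a page order. It assumes 4kB pages (as on i386); next_pow2 and page_order are made-up helper names for this example, not kernel functions. The result lines up with the "order:3" in the quoted dmesg output.

```python
PAGE_SIZE = 4096  # assumption: 4 kB pages (i386)


def next_pow2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def page_order(nbytes):
    """Page order needed to back one power-of-two slab object of nbytes."""
    slab_size = max(next_pow2(nbytes), PAGE_SIZE)
    pages = slab_size // PAGE_SIZE
    return pages.bit_length() - 1  # log2 of a power of two


# 16 kB buffer + 16 bytes reserve + 2 bytes NET_IP_ALIGN
request = 16 * 1024 + 16 + 2
slab = next_pow2(request)

print(request, slab, page_order(request))  # 16402 32768 3
```

The 18 extra bytes are what push a 16kB buffer into the 32kB slab, and so into an 8-page contiguous allocation.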

I guess we need to approach the memory manager guys and ask them why the 
current kernels are having so much trouble getting contiguous memory. 
Also, recently thanks to David Miller's discussions on socket charge, we 
understand that we're getting hit hard by using such a big buffer.

Netdev any ideas?

Jesse


> attached partial email on iet list.
> 
> -----------------------------------
> In either case it only seems to be an issue with jumbo frames.  I'm 
> also reasonably sure the box isn't actually running out of memory - 
> I'm running the same test right this minute on the actual disk array 
> with MTU=1500 on both ends and my target box has 642988k free. 
> Running it again with MTU=9000 generates the error right away and I 
> see 642696k free during this test run.
> 
> >> generate the Page Allocation Failure messages after running for a few
> >> minutes on a 'real' device and almost instantly using 'nullio'.
> >>
> > and this happen instantly? can u check the RAM usage? also can u check
> > the slabtop? maybe here are memory leak?
> 
> I'm not familiar enough with slabtop to interpret its output but 
> here's the summary section:
> 
> Active / Total Objects (% used)    : 250586 / 281397 (89.1%)
> Active / Total Slabs (% used)      : 6301 / 6310 (99.9%)
> Active / Total Caches (% used)     : 83 / 124 (66.9%)
> Active / Total Size (% used)       : 30053.05K / 33686.83K (89.2%)
> Minimum / Average / Maximum Object : 0.01K / 0.12K / 128.00K
> 
> Bryn
> 
> 
> 
> the detail msg.
> 
> -------- Forwarded Message --------
> > From: Bryn Hughes <linux@nashira.ca>
> > To: iet-dev <iscsitarget-devel@lists.sourceforge.net>
> > Subject: [Iscsitarget-devel] Page Allocation Failure
> > Date: Tue, 16 Aug 2005 14:49:29 -0700
> > Today I started doing testing with a larger volume and jumbo frames.
> > I'm running bonnie with a 8G test set on a 100G volume and since I
> > enabled jumbo frames I've been seeing this in dmesg over and over
> > again:
> >
> >
> > istd1: page allocation failure. order:3, mode:0x20
> > [<c0148b33>] __alloc_pages+0x343/0x3c0
> > [<c011cd58>] scheduler_tick+0x198/0x400
> > [<c014b602>] kmem_getpages+0x32/0xa0
> > [<c014c35e>] cache_grow+0xbe/0x180
> > [<c014c58d>] cache_alloc_refill+0x16d/0x240
> > [<c014c95a>] __kmalloc+0x6a/0x70
> > [<c02ba3e0>] alloc_skb+0x40/0xf0
> > [<f924471e>] e1000_alloc_rx_buffers+0x6e/0x3a0 [e1000]
> > [<f9243ee8>] e1000_clean_rx_irq+0x1e8/0x4d0 [e1000]
> > [<c02df856>] ip_queue_xmit+0x306/0x540
> > [<f92436da>] e1000_clean+0x9a/0x150 [e1000]
> > [<c02c0cf9>] net_rx_action+0xd9/0x1b0
> > [<c01267d2>] __do_softirq+0x82/0x100
> > [<c0126885>] do_softirq+0x35/0x40
> > [<c01066db>] do_IRQ+0x3b/0x70
> > [<c0104b86>] common_interrupt+0x1a/0x20
> > [<c0205a0c>] __copy_user_intel+0x2c/0xb0
> > [<c0205b9a>] __copy_to_user_ll+0x5a/0x80
> > [<c0205c76>] copy_to_user+0x36/0x60
> > [<c02bc689>] memcpy_toiovec+0x29/0x50
> > [<c02bcc5b>] skb_copy_datagram_iovec+0x4b/0x210
> > [<c02f7690>] tcp_v4_do_rcv+0xf0/0x110
> > [<c02b93e4>] __release_sock+0x54/0x70
> > [<c02e62c3>] tcp_recvmsg+0x4b3/0x750
> > [<c0148601>] buffered_rmqueue+0xf1/0x210
> > [<c01488b5>] __alloc_pages+0xc5/0x3c0
> > [<c02b9d38>] sock_common_recvmsg+0x48/0x70
> > [<c02b61e1>] sock_recvmsg+0x131/0x170
> > [<c02f0566>] tcp_transmit_skb+0x426/0x760
> > [<c012a2e7>] __mod_timer+0xf7/0x130
> > [<c02b993c>] sk_reset_timer+0xc/0x20
> > [<c02f12e6>] tcp_write_xmit+0x176/0x2c0
> > [<c0136120>] autoremove_wake_function+0x0/0x50
> > [<c03055a5>] inet_ioctl+0x45/0xf0
> > [<f9379d00>] is_data_available+0x30/0x50 [iscsi_trgt]
> > [<f9379e33>] do_recv+0xc3/0x1c0 [iscsi_trgt]
> > [<c030535a>] inet_sendmsg+0x4a/0x70
> > [<f9243ee8>] e1000_clean_rx_irq+0x1e8/0x4d0 [e1000]
> > [<c02c0cf9>] net_rx_action+0xd9/0x1b0
> > [<c0136120>] autoremove_wake_function+0x0/0x50
> > [<c01267d2>] __do_softirq+0x82/0x100
> > [<c014820a>] rmqueue_bulk+0x7a/0x90
> > [<c0148085>] prep_new_page+0x55/0x60
> > [<c0148601>] buffered_rmqueue+0xf1/0x210
> > [<c0162b38>] do_readv_writev+0x1e8/0x250
> > [<c01488b5>] __alloc_pages+0xc5/0x3c0
> > [<f93777c8>] cmnd_recv_pdu+0xb8/0x210 [iscsi_trgt]
> > [<f937607c>] tio_add_pages+0x7c/0xe0 [iscsi_trgt]
> > [<f9378375>] scsi_cmnd_start+0x3c5/0x490 [iscsi_trgt]
> > [<f937754f>] cmnd_insert_hash+0xff/0x140 [iscsi_trgt]
> > [<c02b9d86>] sock_common_setsockopt+0x26/0x40
> > [<c011b470>] try_to_wake_up+0x2b0/0x300
> > [<f937a38f>] recv+0x36f/0x420 [iscsi_trgt]
> > [<f937adb3>] process_io+0x23/0xa0 [iscsi_trgt]
> > [<f937b109>] istd+0x99/0x100 [iscsi_trgt]
> > [<f937b070>] istd+0x0/0x100 [iscsi_trgt]
> > [<c0135c65>] kthread+0xa5/0xf0
> > [<c0135bc0>] kthread+0x0/0xf0
> > [<c0102455>] kernel_thread_helper+0x5/0x10
> >
> >
> > Any idea what would be causing this?
> >
> >
> > SLES9/kernel 2.6.12.3
> > IET 0.4.11
> > e1000 6.1.16
> >
> 
> 
> 
> 
> -------------------------------------------------------
> SF.Net email is Sponsored by the Better Software Conference & EXPO
> September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
> Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
> Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
> _______________________________________________
> E1000-devel mailing list
> E1000-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> 
> 
>



* Re: Page Allocation Failure with e1000 using jumbo frame
  2005-08-19 16:51 ` [E1000-devel] Page Allocation Failure with e1000 using jumbo frame Jesse Brandeburg
@ 2005-08-19 17:01   ` Ming Zhang
  2005-08-19 17:33     ` Page Allocation Failure with e1000 using jumboframe Jesse Brandeburg
  2005-08-19 17:02   ` Page Allocation Failure with e1000 using jumbo frame Andi Kleen
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Ming Zhang @ 2005-08-19 17:01 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: E1000, iet-dev, netdev

This was first reported on the IET list; I then redid the test with a
vanilla 2.6.12.4 kernel and everything went fine.

So I suspect this is a special case caused by the vendor kernel.

Is this 32KB ATOMIC RAM allocation request only made in the jumbo frame
case? The regular MTU case works fine.

ming






* Re: Page Allocation Failure with e1000 using jumbo frame
  2005-08-19 16:51 ` [E1000-devel] Page Allocation Failure with e1000 using jumbo frame Jesse Brandeburg
  2005-08-19 17:01   ` Ming Zhang
@ 2005-08-19 17:02   ` Andi Kleen
  2005-08-19 18:10     ` [E1000-devel] " Martin Josefsson
  2005-08-19 17:03   ` Nivedita Singhvi
  2005-08-20  1:43   ` [E1000-devel] " Michael Iatrou
  3 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2005-08-19 17:02 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: Ming Zhang, E1000, iet-dev, netdev

> I guess we need to approach the memory manager guys and ask them why the 
> current kernels are having so much trouble getting contiguous memory. 

Because memory fragments.

The only long term reliable way is to not allocate buffers > PAGE_SIZE.
The stack supports paged skbs for that.
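
A toy illustration of the fragmentation point (a sketch only: the page layout is invented, and it checks only for a contiguous run, whereas the real buddy allocator also requires alignment). Most of the memory is free, yet no run of 8 contiguous pages exists, so an order-3 request cannot succeed while single-page requests still can.

```python
def largest_free_run(free):
    """Length of the longest run of consecutive free pages."""
    best = run = 0
    for is_free in free:
        run = run + 1 if is_free else 0
        best = max(best, run)
    return best


def has_contiguous(free, order):
    """Necessary condition for an order-N allocation: 2**N free pages in a row."""
    return largest_free_run(free) >= 2 ** order


TOTAL_PAGES = 1024
# Invented worst case: every fourth page holds a long-lived allocation.
free = [(i % 4) != 0 for i in range(TOTAL_PAGES)]

free_pages = sum(free)
print(free_pages, largest_free_run(free))  # 768 3
print(has_contiguous(free, 3))             # False
print(has_contiguous(free, 0))             # True
```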

-Andi



* Re: Page Allocation Failure with e1000 using jumbo frame
  2005-08-19 16:51 ` [E1000-devel] Page Allocation Failure with e1000 using jumbo frame Jesse Brandeburg
  2005-08-19 17:01   ` Ming Zhang
  2005-08-19 17:02   ` Page Allocation Failure with e1000 using jumbo frame Andi Kleen
@ 2005-08-19 17:03   ` Nivedita Singhvi
  2005-08-20  1:43   ` [E1000-devel] " Michael Iatrou
  3 siblings, 0 replies; 14+ messages in thread
From: Nivedita Singhvi @ 2005-08-19 17:03 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: Ming Zhang, E1000, iet-dev, netdev

Jesse Brandeburg wrote:

> included netdev...
> 
> 
> On Wed, 17 Aug 2005, Ming Zhang wrote:
> 
>>
>> Hi folks
>>
>> We ran into this problem when running jumbo frame with iscsi over e1000.
>> the MTU1500 is fine while jumbo frame can stably reproduce this error.
>>
>> when meet this error, as reported in iet list, the box still has >600MB
>> ram free. also the slab is not heavily used.
>>
>> any idea on this?
> 
> 
> 
> So, what we do know is that your kernel memory manager is (for whatever 
> unknown reason) having trouble finding contiguous 2^3 pages (32kB) 
> chunks of memory.  This occurs because we need to give our hardware 16kB 
> contiguous (currently, we actually have a patch for this in internal 
> testing) which means that when we do dev_alloc_skb, it allocates 16kB + 
> 16 bytes reserve, plus 2 bytes NET_IP_ALIGN, which takes us into the 
> 32kB slab (power of two roundup)
> 
> I guess we need to approach the memory manager guys and ask them why the 
> current kernels are having so much trouble getting contiguous memory. 
> Also, recently thanks to David Miller's discussions on socket charge, we 
> understand that we're getting hit hard by using such a big buffer.
> 
> Netdev any ideas?

Interesting you should mention this, I just mentioned it
in the context of another thread like 3 seconds ago.

I've talked to Martin Bligh in the past about having the
VM do something sane about these to improve the situation - I'll
go kick, er, ping him again...

The current status is "try not to do that" (large contig allocs).

If anyone has any good ideas, speak up :)

thanks,
Nivedita






* Re: Page Allocation Failure with e1000 using jumboframe
  2005-08-19 17:01   ` Ming Zhang
@ 2005-08-19 17:33     ` Jesse Brandeburg
  2005-08-19 17:42       ` [E1000-devel] " Andi Kleen
                         ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Jesse Brandeburg @ 2005-08-19 17:33 UTC (permalink / raw)
  To: Ming Zhang; +Cc: Brandeburg, Jesse, E1000, iet-dev, netdev

On Fri, 19 Aug 2005, Ming Zhang wrote:
> This is first reported on IET list and then i redo the test with vanilla
> 2.6.12.4 kernel and everything went fine.
> 
> so i suspect if there are special case caused by vendor kernel.
> 
> is this 32KB ATOMIC ram allocation request only available in jumbo frame
> case? since the regular MTU case goes fine.
>

Ahh, okay.  I'm pretty sure that SuSE made some changes (not sure what) to 
memory management.

the formula for the size that the current e1000 looks for is something 
like

a = MTU roundup to next power of 2
a += 2 (skb_reserve(NET_IP_ALIGN))
a += 16 (skb_reserve 16 by __dev_alloc_skb)

so, for the standard MTU, a = 2048 + 2 + 16, or 2066.
We request (a) from the slab, which does a power-of-2 roundup,
so the skb comes from the 4k (single page) slab.
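
The formula can be sketched as follows, purely as an illustration. It follows the steps as written in this message (the real driver may round a slightly different quantity), and rx_request, slab_bucket and DEV_ALLOC_PAD are names invented for this example.

```python
NET_IP_ALIGN = 2    # bytes reserved to align the IP header (value from this thread)
DEV_ALLOC_PAD = 16  # bytes reserved by __dev_alloc_skb (value from this thread)


def next_pow2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def rx_request(mtu):
    """Bytes the driver asks for, per the formula in this message."""
    return next_pow2(mtu) + NET_IP_ALIGN + DEV_ALLOC_PAD


def slab_bucket(nbytes):
    """Power-of-two slab size that ends up serving the request."""
    return next_pow2(nbytes)


for mtu in (1500, 9000):
    req = rx_request(mtu)
    bucket = slab_bucket(req)
    print(mtu, req, bucket, bucket - req)
# 1500 2066 4096 2030
# 9000 16402 32768 16366
```

The last column is the slack left in each slab object: about 2k for the standard MTU and about 16k for a 9000-byte MTU.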

Jesse

PS we have a driver in test that won't do the large contig allocations any 
more.




* Re: [E1000-devel] Page Allocation Failure with e1000 using jumboframe
  2005-08-19 17:33     ` Page Allocation Failure with e1000 using jumboframe Jesse Brandeburg
@ 2005-08-19 17:42       ` Andi Kleen
  2005-08-19 17:51         ` Jesse Brandeburg
  2005-08-19 17:52       ` Ming Zhang
  2005-08-20  1:46       ` Michael Iatrou
  2 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2005-08-19 17:42 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: Ming Zhang, E1000, iet-dev, netdev

> Ahh, okay.  I'm pretty sure that SuSE did some changes (not sure what) to 
> memory management.

I don't think so.

> 
> the formula for the size that the current e1000 looks for is something 
> like
> 
> a = MTU roundup to next power of 2
> a += 2 (skb_reserve(NET_IP_ALIGN))
> a += 16 (skb_reserve 16 by __dev_alloc_skb)
> 
> so, a = 2048 + 2 + 16, or 2066
> request (a) from slab, which does a power of 2 roundup
> so the skb comes from the 4k (single page) slab for standard mtu.

That's very suboptimal because you're wasting nearly 2k. It would 
be better if you allocated 4k or exactly 2k

-Andi



* Re: [E1000-devel] Page Allocation Failure with e1000 using jumboframe
  2005-08-19 17:42       ` [E1000-devel] " Andi Kleen
@ 2005-08-19 17:51         ` Jesse Brandeburg
  2005-08-19 18:01           ` Andi Kleen
  0 siblings, 1 reply; 14+ messages in thread
From: Jesse Brandeburg @ 2005-08-19 17:51 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Brandeburg, Jesse, Ming Zhang, E1000, iet-dev, netdev

On Fri, 19 Aug 2005, Andi Kleen wrote:
> > Ahh, okay.  I'm pretty sure that SuSE did some changes (not sure what) to
> > memory management.
> 
> I don't think so.

I could certainly be mistaken.  The difference I saw was that suse kernels 
recycle the same skb pointers back to our driver, and the redhat kernels 
seem to march through a much larger range before the values repeat.  This 
is all observation based, so I may be completely wrong on this issue.

> > the formula for the size that the current e1000 looks for is something
> > like
> >
> > a = MTU roundup to next power of 2
> > a += 2 (skb_reserve(NET_IP_ALIGN))
> > a += 16 (skb_reserve 16 by __dev_alloc_skb)
> >
> > so, a = 2048 + 2 + 16, or 2066
> > request (a) from slab, which does a power of 2 roundup
> > so the skb comes from the 4k (single page) slab for standard mtu.
> 
> That's very suboptimal because you're wasting nearly 2k. It would
> be better if you allocated 4k or exactly 2k

We have to give the full 2k to hardware, unfortunately, which means 
mapping the full 2k.  We do the skb reserve because of cache/alignment 
effects, which show a (big) hit in performance if we don't align the IP 
header.  Yes, I know that dword-unaligned DMA really hurts on some arches, 
but that's why the arch can #define NET_IP_ALIGN 0.

If that's the case, then we're left asking the question: who uses those 16 
bytes that are skb_reserved by __dev_alloc_skb?

Jesse



* Re: Page Allocation Failure with e1000 using jumboframe
  2005-08-19 17:33     ` Page Allocation Failure with e1000 using jumboframe Jesse Brandeburg
  2005-08-19 17:42       ` [E1000-devel] " Andi Kleen
@ 2005-08-19 17:52       ` Ming Zhang
  2005-08-20  1:46       ` Michael Iatrou
  2 siblings, 0 replies; 14+ messages in thread
From: Ming Zhang @ 2005-08-19 17:52 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: E1000, iet-dev, netdev

On Fri, 2005-08-19 at 10:33 -0700, Jesse Brandeburg wrote:
> On Fri, 19 Aug 2005, Ming Zhang wrote:
> > This is first reported on IET list and then i redo the test with vanilla
> > 2.6.12.4 kernel and everything went fine.
> > 
> > so i suspect if there are special case caused by vendor kernel.
> > 
> > is this 32KB ATOMIC ram allocation request only available in jumbo frame
> > case? since the regular MTU case goes fine.
> >
> 
> Ahh, okay.  I'm pretty sure that SuSE did some changes (not sure what) to 
> memory management.
> 
> the formula for the size that the current e1000 looks for is something 
> like
> 
> a = MTU roundup to next power of 2
> a += 2 (skb_reserve(NET_IP_ALIGN))
> a += 16 (skb_reserve 16 by __dev_alloc_skb)
> 
> so, a = 2048 + 2 + 16, or 2066
> request (a) from slab, which does a power of 2 roundup
> so the skb comes from the 4k (single page) slab for standard mtu.
> 
that is wasteful.


> Jesse
> 
> PS we have a driver in test that won't do the large contig allocations any 
> more.
> 

then wait to see this. :P






* Re: [E1000-devel] Page Allocation Failure with e1000 using jumboframe
  2005-08-19 17:51         ` Jesse Brandeburg
@ 2005-08-19 18:01           ` Andi Kleen
  2005-08-19 19:07             ` Ming Zhang
  2005-08-19 21:10             ` Jesse Brandeburg
  0 siblings, 2 replies; 14+ messages in thread
From: Andi Kleen @ 2005-08-19 18:01 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: Andi Kleen, Ming Zhang, E1000, iet-dev, netdev

> I could certainly be mistaken.  The difference I saw was that suse kernels 
> recycle the same skb pointers back to our driver, and the redhat kernels 
> seem to march through a much larger range before the values repeat.  This 
> is all observation based, so I may be completely wrong on this issue.

Maybe you're comparing 2.4 to 2.6 or different 2.6s. But I don't 
think there are any SUSE specific patches affecting this.

> 
> >> the formula for the size that the current e1000 looks for is something
> >> like
> >>
> >> a = MTU roundup to next power of 2
> >> a += 2 (skb_reserve(NET_IP_ALIGN))
> >> a += 16 (skb_reserve 16 by __dev_alloc_skb)
> >>
> >> so, a = 2048 + 2 + 16, or 2066
> >> request (a) from slab, which does a power of 2 roundup
> >> so the skb comes from the 4k (single page) slab for standard mtu.
> >
> >That's very suboptimal because you're wasting nearly 2k. It would
> >be better if you allocated 4k or exactly 2k
> 
> We have to give the full 2k to hardware, unfortunately, which means 
> mapping the full 2k.  We do the skb reserve because of cache/alignment 
> effects, which show a (big) hit in performance if we don't align the IP 
> header.  Yes, I know that dword-unaligned DMA really hurts on some arches, 
> but that's why the arch can #define NET_IP_ALIGN 0.

What is the requirement of your hardware? Power-of-two alignment or 
power-of-two size?  If the latter, does it really trash the data behind it?

But surely it doesn't use all of the 2k for the 1.5k MTU, so it
would be good if you could fit the header alignment in there
and only get the exact needed amount from the underlying allocator.


> 
> if thats the case, then we're left asking the question, who uses that 16 
> bytes that are skb_reserved by __dev_alloc_skb???

Nothing, except maybe routing to a different class of link layer that
needs bigger headers (e.g. PPP).  Even then it's just a performance
optimization to avoid a skb copy.

I suppose it would be possible to keep track of the largest supported
hard_header_len of all devices and if it's all identical don't add the 
16 bytes.
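
A minimal sketch of that idea, under stated assumptions: rx_headroom is a hypothetical helper, not an existing kernel function, and the second header length below is an invented value standing in for a link layer with bigger headers. Only the Ethernet header length of 14 bytes is a real figure.

```python
DEFAULT_PAD = 16  # the 16 bytes currently reserved by __dev_alloc_skb


def rx_headroom(hard_header_lens, pad=DEFAULT_PAD):
    """Extra headroom to reserve, given every registered device's hard_header_len.

    If all devices share one header length, a packet can never be routed to
    a link layer needing a bigger header, so the pad can be skipped.
    """
    if len(set(hard_header_lens)) <= 1:
        return 0
    return pad


print(rx_headroom([14, 14]))  # 0: only Ethernet devices
print(rx_headroom([14, 22]))  # 16: 22 is a made-up larger header length
```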

-Andi



* Re: [E1000-devel] Page Allocation Failure with e1000 using jumbo frame
  2005-08-19 17:02   ` Page Allocation Failure with e1000 using jumbo frame Andi Kleen
@ 2005-08-19 18:10     ` Martin Josefsson
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Josefsson @ 2005-08-19 18:10 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jesse Brandeburg, Ming Zhang, E1000, iet-dev, netdev


On Fri, 2005-08-19 at 19:02 +0200, Andi Kleen wrote:
> > I guess we need to approach the memory manager guys and ask them why the 
> > current kernels are having so much trouble getting contiguous memory. 
> 
> Because memory fragments.
> 
> The only long term reliable way is to not allocate buffers > PAGE_SIZE.
> The stack supports paged skbs for that.

And the e1000 supports receiving/transmitting packets containing several
buffers according to the SDM, it just uses more descriptors in the rx/tx
rings.
Maybe it should allocate PAGE_SIZE large buffers, a 9kB packet would
then use 3 such buffers.

And it of course means more and smaller DMA transfers than when using
larger buffers.
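
The buffer count is just a ceiling division. A small sketch, assuming 4kB pages; buffers_needed is a name made up for this example.

```python
PAGE_SIZE = 4096  # assumption: 4 kB pages


def buffers_needed(frame_len, buf_size=PAGE_SIZE):
    """Number of buf_size receive buffers (descriptors) one frame occupies."""
    return -(-frame_len // buf_size)  # ceiling division


print(buffers_needed(9000))  # 3
print(buffers_needed(1500))  # 1
```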

-- 
/Martin


* Re: [E1000-devel] Page Allocation Failure with e1000 using jumboframe
  2005-08-19 18:01           ` Andi Kleen
@ 2005-08-19 19:07             ` Ming Zhang
  2005-08-19 21:10             ` Jesse Brandeburg
  1 sibling, 0 replies; 14+ messages in thread
From: Ming Zhang @ 2005-08-19 19:07 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jesse Brandeburg, E1000, iet-dev, netdev

Sorry, the person who found this problem is running SUSE Linux, but with
a vanilla kernel. So this is a generic problem, not SUSE-specific.

I apologize for my confusion earlier. :P

Ming


On Fri, 2005-08-19 at 20:01 +0200, Andi Kleen wrote:
> > I could certainly be mistaken.  The difference I saw was that suse kernels 
> > recycle the same skb pointers back to our driver, and the redhat kernels 
> > seem to march through a much larger range before the values repeat.  This 
> > is all observation based, so I may be completely wrong on this issue.
> 
> Maybe you're comparing 2.4 to 2.6 or different 2.6s. But I don't 
> think there are any SUSE specific patches affecting this.
> 
> > 
> > >> the formula for the size that the current e1000 looks for is something
> > >> like
> > >>
> > >> a = MTU roundup to next power of 2
> > >> a += 2 (skb_reserve(NET_IP_ALIGN))
> > >> a += 16 (skb_reserve 16 by __dev_alloc_skb)
> > >>
> > >> so, a = 2048 + 2 + 16, or 2066
> > >> request (a) from slab, which does a power of 2 roundup
> > >> so the skb comes from the 4k (single page) slab for standard mtu.
> > >
> > >That's very suboptimal because you're wasting nearly 2k. It would
> > >be better if you allocated 4k or exactly 2k
> > 
> > We have to give the full 2k to hardware, unfortunately, which means 
> > mapping the full 2k.  We do the skb reserve because of cache/alignment 
> > effects which show a (big) hit in performance if we don't align the IP 
> > header.  Yes, I know that dword-unaligned DMA really hurts on some arches, 
> > but that's why the arch can #define NET_IP_ALIGN 0.
> 
> What is the requirement of your hardware? Power-of-two alignment or 
> power-of-two size?  If the latter, does it really trash the data behind it?
> 
> But surely it doesn't use all of the 2k for the 1.5k MTU, so it
> would be good if you could fit the header alignment in there
> and only get the exact needed amount from the underlying allocator.
> 
> 
> > 
> > If that's the case, then we're left asking the question: who uses those 16 
> > bytes that are skb_reserved by __dev_alloc_skb?
> 
> Nothing, except maybe routing to a different class of link layer that
> needs bigger headers (e.g. PPP).  Even then it's just a performance
> optimization to avoid a skb copy.
> 
> I suppose it would be possible to keep track of the largest supported
> hard_header_len of all devices and if it's all identical don't add the 
> 16 bytes.
> 
> -Andi



-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf


* Re: [E1000-devel] Page Allocation Failure with e1000 using jumboframe
  2005-08-19 18:01           ` Andi Kleen
  2005-08-19 19:07             ` Ming Zhang
@ 2005-08-19 21:10             ` Jesse Brandeburg
  1 sibling, 0 replies; 14+ messages in thread
From: Jesse Brandeburg @ 2005-08-19 21:10 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Brandeburg, Jesse, Ming Zhang, E1000, iet-dev, netdev

On Fri, 19 Aug 2005, Andi Kleen wrote:
> > >> the formula for the size that the current e1000 looks for is something
> > >> like
> > >>
> > >> a = MTU roundup to next power of 2
> > >> a += 2 (skb_reserve(NET_IP_ALIGN))
> > >> a += 16 (skb_reserve 16 by __dev_alloc_skb)
> > >>
> > >> so, a = 2048 + 2 + 16, or 2066
> > >> request (a) from slab, which does a power of 2 roundup
> > >> so the skb comes from the 4k (single page) slab for standard mtu.
> > >
> > >That's very suboptimal because you're wasting nearly 2k. It would
> > >be better if you allocated 4k or exactly 2k
> >
> > We have to give the full 2k to hardware, unfortunately, which means
> > mapping the full 2k.  We do the skb reserve because of cache/alignment
> > effects which show a (big) hit in performance if we don't align the IP
> > header.  Yes, I know that dword-unaligned DMA really hurts on some arches,
> > but that's why the arch can #define NET_IP_ALIGN 0.
> 
> What is the requirement of your hardware? Power-of-two alignment or
> power-of-two size?  If the latter, does it really trash the data behind it?

Our PCI/PCI-X hardware requires the full 2k (power-of-2 size), and it can 
trash the data behind it even in the 1500 MTU case, because a frame larger 
than 1518 bytes can be received and we could fill a whole descriptor and 
overflow into the next. See the next paragraph.

> But surely it doesn't use all of the 2k for the 1.5k MTU, so it
> would be good if you could fit the header alignment in there
> and only get the exact needed amount from the underlying allocator.

That would be ideal.  Depending on the memory constraints of the system, 
we could set RCTL.LPE=0 (clearing long packet enable forces the hardware 
to drop packets larger than 1522 bytes), which would then let us 
cheat/optimize for the 1500 MTU case.  I guess we would then just call 
alloc_skb directly instead of the dev version.
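
A rough sketch of the size arithmetic being discussed, using the simplified
formula quoted above (power-of-two slab roundup, 2 bytes NET_IP_ALIGN, 16
bytes reserved by __dev_alloc_skb). The second calculation is hypothetical:
it assumes that with RCTL.LPE=0 the buffer could be sized to the 1522-byte
maximum frame and allocated without the 16-byte reserve, which nobody in
this thread has tested. The helper names are made up for illustration.

```python
NET_IP_ALIGN = 2        # bytes reserved so the IP header ends up aligned
DEV_ALLOC_RESERVE = 16  # bytes reserved by __dev_alloc_skb, per the thread


def next_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def slab_size(buf_len, reserve):
    """Slab object size for buf_len + reserve, assuming power-of-two slabs."""
    return next_pow2(buf_len + reserve)


# Current behaviour as described: MTU 1500 rounds up to a 2048-byte buffer.
current = slab_size(next_pow2(1500), NET_IP_ALIGN + DEV_ALLOC_RESERVE)

# Hypothetical LPE=0 case: 1522-byte buffer, alignment only, no 16-byte reserve.
hypothetical = slab_size(1522, NET_IP_ALIGN)

print("current:", current, "bytes; hypothetical:", hypothetical, "bytes")
```

Under these assumptions the request drops from the 4k slab to the 2k slab.
The sketch ignores any extra per-skb overhead the allocator adds on top of
the requested size.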

> > If that's the case, then we're left asking the question: who uses those 16
> > bytes that are skb_reserved by __dev_alloc_skb?
> 
> Nothing, except maybe routing to a different class of link layer that
> needs bigger headers (e.g. PPP).  Even then it's just a performance
> optimization to avoid a skb copy.
> 
> I suppose it would be possible to keep track of the largest supported
> hard_header_len of all devices and if it's all identical don't add the
> 16 bytes.

hmm, all this is pie in the sky until we can get some of this tested. 
Unfortunately those resources are busy here :-(

Jesse





* Re: [E1000-devel] Page Allocation Failure with e1000 using jumbo frame
  2005-08-19 16:51 ` [E1000-devel] Page Allocation Failure with e1000 using jumbo frame Jesse Brandeburg
                     ` (2 preceding siblings ...)
  2005-08-19 17:03   ` Nivedita Singhvi
@ 2005-08-20  1:43   ` Michael Iatrou
  3 siblings, 0 replies; 14+ messages in thread
From: Michael Iatrou @ 2005-08-20  1:43 UTC (permalink / raw)
  To: e1000-devel; +Cc: Jesse Brandeburg, Ming Zhang, iet-dev, netdev

When the date was Friday 19 August 2005 19:51, Jesse Brandeburg wrote:

> included netdev...
>
> On Wed, 17 Aug 2005, Ming Zhang wrote:
> > Hi folks
> >
> > We ran into this problem when running jumbo frame with iscsi over e1000.
> > the MTU1500 is fine while jumbo frame can stably reproduce this error.
> >
> > when meet this error, as reported in iet list, the box still has >600MB
> > ram free. also the slab is not heavily used.
> >
> > any idea on this?
>
> So, what we do know is that your kernel memory manager is (for whatever
> unknown reason) having trouble finding contiguous 2^3 pages (32kB) chunks
> of memory.  This occurs because we need to give our hardware 16kB
> contiguous (currently, we actually have a patch for this in internal
> testing) which means that when we do dev_alloc_skb, it allocates 16kB + 16
> bytes reserve, plus 2 bytes NET_IP_ALIGN, which takes us into the 32kB
> slab (power of two roundup)

I don't get it: for MTU = 9000 you already have to allocate 2^3 pages, but 
testing shows that the problem only occurs for MTU > 3*PAGE_SIZE:

http://members.hellug.gr/iatrou/plain_ip_mtu.png

(reference to previous discussion)
http://sourceforge.net/mailarchive/forum.php?forum_id=12401&max_rows=25&style=ultimate&viewmonth=200505
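
To make the puzzle concrete, here is a small sketch that applies the
simplified formula quoted earlier in the thread (MTU rounded up to a power
of two, plus 2 + 16 bytes, then a power-of-two slab roundup) to a range of
MTUs. It assumes 4 kB pages, and the function names are made up for
illustration. It only restates the formula; it does not model the allocator.

```python
PAGE_SIZE = 4096  # assumption: 4 kB pages
RESERVE = 2 + 16  # NET_IP_ALIGN plus the __dev_alloc_skb reserve


def next_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def alloc_order(mtu):
    """Page order (log2 of the page count) implied by the quoted formula."""
    size = next_pow2(next_pow2(mtu) + RESERVE)
    pages = max(1, size // PAGE_SIZE)
    return (pages - 1).bit_length()


for mtu in (1500, 4096, 8192, 9000, 12288, 12289, 16000):
    print("MTU", mtu, "-> order", alloc_order(mtu))
```

By this formula MTU 9000 and any MTU just above 3*PAGE_SIZE (12288) both
need an order-3 allocation, so the formula alone does not explain why
failures would start only above 3*PAGE_SIZE.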

-- 
 Michael Iatrou
 Electrical and Computer Engineering Dept.
 University of Patras, Greece



* Re: Page Allocation Failure with e1000 using jumboframe
  2005-08-19 17:33     ` Page Allocation Failure with e1000 using jumboframe Jesse Brandeburg
  2005-08-19 17:42       ` [E1000-devel] " Andi Kleen
  2005-08-19 17:52       ` Ming Zhang
@ 2005-08-20  1:46       ` Michael Iatrou
  2 siblings, 0 replies; 14+ messages in thread
From: Michael Iatrou @ 2005-08-20  1:46 UTC (permalink / raw)
  To: e1000-devel; +Cc: Jesse Brandeburg, Ming Zhang, iet-dev, netdev

When the date was Friday 19 August 2005 20:33, Jesse Brandeburg wrote:

> PS we have a driver in test that won't do the large contig allocations any
> more.

In fact, I tested a version of these drivers about 3 months ago, and not only 
did they not solve the problem, but the throughput also decreased!
Is there a newer version? I would gladly test it...

(previous report)
http://sourceforge.net/mailarchive/forum.php?thread_id=7237829&forum_id=12401

-- 
 Michael Iatrou
 Electrical and Computer Engineering Dept.
 University of Patras, Greece



end of thread, other threads:[~2005-08-20  1:46 UTC | newest]

Thread overview: 14+ messages
     [not found] <1124326404.5546.215.camel@localhost.localdomain>
2005-08-19 16:51 ` [E1000-devel] Page Allocation Failure with e1000 using jumbo frame Jesse Brandeburg
2005-08-19 17:01   ` Ming Zhang
2005-08-19 17:33     ` Page Allocation Failure with e1000 using jumboframe Jesse Brandeburg
2005-08-19 17:42       ` [E1000-devel] " Andi Kleen
2005-08-19 17:51         ` Jesse Brandeburg
2005-08-19 18:01           ` Andi Kleen
2005-08-19 19:07             ` Ming Zhang
2005-08-19 21:10             ` Jesse Brandeburg
2005-08-19 17:52       ` Ming Zhang
2005-08-20  1:46       ` Michael Iatrou
2005-08-19 17:02   ` Page Allocation Failure with e1000 using jumbo frame Andi Kleen
2005-08-19 18:10     ` [E1000-devel] " Martin Josefsson
2005-08-19 17:03   ` Nivedita Singhvi
2005-08-20  1:43   ` [E1000-devel] " Michael Iatrou
