Re: 2.6.1[78] page allocation failure. order:3, mode:0x20

* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
       [not found] <Pine.LNX.4.64.0609220655550.13396@diagnostix.dwd.de>
@ 2006-09-22  7:42 ` Andrew Morton
  2006-09-22 12:03   ` Holger Kiehl
  2006-09-22 17:10   ` Auke Kok
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2006-09-22  7:42 UTC (permalink / raw)
  To: Holger Kiehl; +Cc: linux-kernel, linux-net, netdev

On Fri, 22 Sep 2006 07:27:18 +0000 (GMT)
Holger Kiehl <Holger.Kiehl@dwd.de> wrote:

> I get some of the "page allocation failure" errors. My hardware is 4 CPU
> Opteron with one quad + one dual intel e1000 cards. Kernel is plain 2.6.18
> and for two cards MTU is set to 9000.
> 
>     Sep 21 21:03:15 athena kernel: vsftpd: page allocation failure. order:3, mode:0x20
>     Sep 21 21:03:15 athena kernel:
>     Sep 21 21:03:15 athena kernel: Call Trace:
>     Sep 21 21:03:15 athena kernel:  <IRQ> [<ffffffff8024e516>] __alloc_pages+0x282/0x29b
>     Sep 21 21:03:15 athena kernel:  [<ffffffff8807aa93>] :ip_tables:ipt_do_table+0x1eb/0x318
>     Sep 21 21:03:15 athena kernel:  [<ffffffff8026614b>] cache_grow+0x134/0x33d
>     Sep 21 21:03:15 athena kernel:  [<ffffffff8026664c>] cache_alloc_refill+0x189/0x1d7
>     Sep 21 21:03:15 athena kernel:  [<ffffffff80266724>] __kmalloc+0x8a/0x94
>     Sep 21 21:03:15 athena kernel:  [<ffffffff803b5438>] __alloc_skb+0x5c/0x123
>     Sep 21 21:03:15 athena kernel:  [<ffffffff803b5f2e>] __netdev_alloc_skb+0x12/0x2d
>     Sep 21 21:03:15 athena kernel:  [<ffffffff8033cb22>] e1000_alloc_rx_buffers+0x6f/0x2f3
>     Sep 21 21:03:15 athena kernel:  [<ffffffff803d1234>] ip_local_deliver+0x173/0x23b
>     Sep 21 21:03:15 athena kernel:  [<ffffffff8033d29a>] e1000_clean_rx_irq+0x4f4/0x514

Is OK, it's just a warning and it is expected - the kernel will recover.

I'm half-inclined to shut the warning up by sticking a __GFP_NOWARN in there.

But on the other hand, that warning is handy sometimes.  How come kmalloc
decided to request a 32k hunk of memory when the MTU size is only 9k?  Is
the driver doing something dumb?

	else if (max_frame <= E1000_RXBUFFER_8192)
		adapter->rx_buffer_len = E1000_RXBUFFER_8192;
	else if (max_frame <= E1000_RXBUFFER_16384)
		adapter->rx_buffer_len = E1000_RXBUFFER_16384;

It sure is.

This is going to cause a 9000-byte MTU to use a 16384-byte allocation.
e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
which causes the slab allocator to request 32768 bytes.  All for a 9kbyte skb.



* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
  2006-09-22  7:42 ` 2.6.1[78] page allocation failure. order:3, mode:0x20 Andrew Morton
@ 2006-09-22 12:03   ` Holger Kiehl
  2006-09-22 12:12     ` Evgeniy Polyakov
  2006-09-22 17:10   ` Auke Kok
  1 sibling, 1 reply; 11+ messages in thread
From: Holger Kiehl @ 2006-09-22 12:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-net, netdev

On Fri, 22 Sep 2006, Andrew Morton wrote:

> On Fri, 22 Sep 2006 07:27:18 +0000 (GMT)
> Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>
>> I get some of the "page allocation failure" errors. My hardware is 4 CPU
>> Opteron with one quad + one dual intel e1000 cards. Kernel is plain 2.6.18
>> and for two cards MTU is set to 9000.
>>
>>     Sep 21 21:03:15 athena kernel: vsftpd: page allocation failure. order:3, mode:0x20
>>     Sep 21 21:03:15 athena kernel:
>>     Sep 21 21:03:15 athena kernel: Call Trace:
>>     Sep 21 21:03:15 athena kernel:  <IRQ> [<ffffffff8024e516>] __alloc_pages+0x282/0x29b
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff8807aa93>] :ip_tables:ipt_do_table+0x1eb/0x318
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff8026614b>] cache_grow+0x134/0x33d
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff8026664c>] cache_alloc_refill+0x189/0x1d7
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff80266724>] __kmalloc+0x8a/0x94
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff803b5438>] __alloc_skb+0x5c/0x123
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff803b5f2e>] __netdev_alloc_skb+0x12/0x2d
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff8033cb22>] e1000_alloc_rx_buffers+0x6f/0x2f3
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff803d1234>] ip_local_deliver+0x173/0x23b
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff8033d29a>] e1000_clean_rx_irq+0x4f4/0x514
>
> Is OK, it's just a warning and it is expected - the kernel will recover.
>
> I'm half-inclined to shut the warning up by sticking a __GFP_NOWARN in there.
>
> But on the other hand, that warning is handy sometimes.  How come kmalloc
> decided to request a 32k hunk of memory when the MTU size is only 9k?  Is
> the driver doing something dumb?
>
> 	else if (max_frame <= E1000_RXBUFFER_8192)
> 		adapter->rx_buffer_len = E1000_RXBUFFER_8192;
> 	else if (max_frame <= E1000_RXBUFFER_16384)
> 		adapter->rx_buffer_len = E1000_RXBUFFER_16384;
>
> It sure is.
>
> This is going to cause a 9000-byte MTU to use a 16384-byte allocation.
> e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
> which causes the slab allocator to request 32768 bytes.  All for a 9kbyte skb.
>
I searched the list, which I should have done before asking (I was not sure
if this was due to the e1000) and found this

    http://www.ussg.iu.edu/hypermail/linux/kernel/0608.0/0942.html

discussion from 3rd August. As I read it, people are still trying to find a
solution; in the meantime one should set /proc/sys/vm/min_free_kbytes to
65000 or higher, to ensure that the driver gets enough unfragmented memory.

Holger



* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
  2006-09-22 12:03   ` Holger Kiehl
@ 2006-09-22 12:12     ` Evgeniy Polyakov
  0 siblings, 0 replies; 11+ messages in thread
From: Evgeniy Polyakov @ 2006-09-22 12:12 UTC (permalink / raw)
  To: Holger Kiehl; +Cc: Andrew Morton, linux-kernel, linux-net, netdev

On Fri, Sep 22, 2006 at 12:03:11PM +0000, Holger Kiehl (Holger.Kiehl@dwd.de) wrote:
> >This is going to cause a 9000-byte MTU to use a 16384-byte allocation.
> >e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
> >which causes the slab allocator to request 32768 bytes.  All for a 9kbyte 
> >skb.
> >
> I searched the list, which I should have done before asking (I was not sure
> if this was due to the e1000) and found this
> 
>    http://www.ussg.iu.edu/hypermail/linux/kernel/0608.0/0942.html
> 
> discussion from 3rd August. As a summary I read that people are trying to find
> a solution, in the meantime one should set /proc/sys/vm/min_free_kbytes to
> 65000 or higher, to ensure that the driver gets enough unfragmented memory.

There is no solution yet (although the e1000 memory management problem is
one of the reasons I created the memory tree allocator), only workarounds,
one of which you described above.

The e1000 hardware does not support setting an arbitrary maximum transfer
size; it only allows powers of two (and about 1500), so it does require 16k
of memory for a 9k frame (plus the network skb allocation path adds a little,
which turns it into a 32k request because of the power-of-two problem).

It was suggested to the Intel folks that they either use fragments in one
skb or wait until network developers invent something new, but there are no
patches from them (yet, hopefully).

It is not only an e1000 problem - expecting even 8k-12k allocations to
succeed after startup is definitely the wrong approach.

> Holger

-- 
	Evgeniy Polyakov


* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
  2006-09-22  7:42 ` 2.6.1[78] page allocation failure. order:3, mode:0x20 Andrew Morton
  2006-09-22 12:03   ` Holger Kiehl
@ 2006-09-22 17:10   ` Auke Kok
  2006-09-23  4:50     ` Andrew Morton
  1 sibling, 1 reply; 11+ messages in thread
From: Auke Kok @ 2006-09-22 17:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Holger Kiehl, linux-kernel, linux-net, netdev, Ronciak, John

Andrew Morton wrote:
> On Fri, 22 Sep 2006 07:27:18 +0000 (GMT)
> Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> 
>> I get some of the "page allocation failure" errors. My hardware is 4 CPU
>> Opteron with one quad + one dual intel e1000 cards. Kernel is plain 2.6.18
>> and for two cards MTU is set to 9000.
>>
>>     Sep 21 21:03:15 athena kernel: vsftpd: page allocation failure. order:3, mode:0x20
>>     Sep 21 21:03:15 athena kernel:
>>     Sep 21 21:03:15 athena kernel: Call Trace:
>>     Sep 21 21:03:15 athena kernel:  <IRQ> [<ffffffff8024e516>] __alloc_pages+0x282/0x29b
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff8807aa93>] :ip_tables:ipt_do_table+0x1eb/0x318
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff8026614b>] cache_grow+0x134/0x33d
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff8026664c>] cache_alloc_refill+0x189/0x1d7
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff80266724>] __kmalloc+0x8a/0x94
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff803b5438>] __alloc_skb+0x5c/0x123
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff803b5f2e>] __netdev_alloc_skb+0x12/0x2d
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff8033cb22>] e1000_alloc_rx_buffers+0x6f/0x2f3
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff803d1234>] ip_local_deliver+0x173/0x23b
>>     Sep 21 21:03:15 athena kernel:  [<ffffffff8033d29a>] e1000_clean_rx_irq+0x4f4/0x514
> 
> Is OK, it's just a warning and it is expected - the kernel will recover.
> 
> I'm half-inclined to shut the warning up by sticking a __GFP_NOWARN in there.
> 
> But on the other hand, that warning is handy sometimes.  How come kmalloc
> decided to request a 32k hunk of memory when the MTU size is only 9k?  Is
> the driver doing something dumb?
> 
> 	else if (max_frame <= E1000_RXBUFFER_8192)
> 		adapter->rx_buffer_len = E1000_RXBUFFER_8192;
> 	else if (max_frame <= E1000_RXBUFFER_16384)
> 		adapter->rx_buffer_len = E1000_RXBUFFER_16384;
> 
> It sure is.
> 
> This is going to cause a 9000-byte MTU to use a 16384-byte allocation.
> e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
> which causes the slab allocator to request 32768 bytes.  All for a 9kbyte skb.

I wonder if we can't account for NET_IP_ALIGN when selecting bufsize, to get
rid of at least one allocation order before we call netdev_alloc_skb(). This
should make 9k frames only kmalloc(16384) and thus stay within the 16k
boundary. I hope.

Completely untested: don't commit :)

Auke

---

e1000: account for NET_IP_ALIGN when calculating bufsiz

Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to 
reduce slab allocation by half.

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index bb0d129..20b1f39 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -1144,7 +1144,7 @@ #endif

  	pci_read_config_word(pdev, PCI_COMMAND, &hw->pci_cmd_word);

-	adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+	adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE + NET_IP_ALIGN;
  	adapter->rx_ps_bsize0 = E1000_RXBUFFER_128;
  	hw->max_frame_size = netdev->mtu +
  			     ENET_HEADER_SIZE + ETHERNET_FCS_SIZE;
@@ -3234,26 +3234,27 @@ #define MAX_STD_JUMBO_FRAME_SIZE 9234
  	 * larger slab size
  	 * i.e. RXBUFFER_2048 --> size-4096 slab */

-	if (max_frame <= E1000_RXBUFFER_256)
+	if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_256)
  		adapter->rx_buffer_len = E1000_RXBUFFER_256;
-	else if (max_frame <= E1000_RXBUFFER_512)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_512)
  		adapter->rx_buffer_len = E1000_RXBUFFER_512;
-	else if (max_frame <= E1000_RXBUFFER_1024)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_1024)
  		adapter->rx_buffer_len = E1000_RXBUFFER_1024;
-	else if (max_frame <= E1000_RXBUFFER_2048)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_2048)
  		adapter->rx_buffer_len = E1000_RXBUFFER_2048;
-	else if (max_frame <= E1000_RXBUFFER_4096)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_4096)
  		adapter->rx_buffer_len = E1000_RXBUFFER_4096;
-	else if (max_frame <= E1000_RXBUFFER_8192)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_8192)
  		adapter->rx_buffer_len = E1000_RXBUFFER_8192;
-	else if (max_frame <= E1000_RXBUFFER_16384)
+	else
  		adapter->rx_buffer_len = E1000_RXBUFFER_16384;

  	/* adjust allocation if LPE protects us, and we aren't using SBP */
  	if (!adapter->hw.tbi_compatibility_on &&
  	    ((max_frame == MAXIMUM_ETHERNET_FRAME_SIZE) ||
  	     (max_frame == MAXIMUM_ETHERNET_VLAN_SIZE)))
-		adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+		adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE
+		                         + NET_IP_ALIGN;

  	netdev->mtu = new_mtu;

@@ -4076,7 +4076,8 @@ e1000_alloc_rx_buffers(struct e1000_adap
  	struct e1000_buffer *buffer_info;
  	struct sk_buff *skb;
  	unsigned int i;
-	unsigned int bufsz = adapter->rx_buffer_len + NET_IP_ALIGN;
+	/* we have already accounted for NET_IP_ALIGN */
+	unsigned int bufsz = adapter->rx_buffer_len;

  	i = rx_ring->next_to_use;
  	buffer_info = &rx_ring->buffer_info[i];


* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
  2006-09-22 17:10   ` Auke Kok
@ 2006-09-23  4:50     ` Andrew Morton
  2006-09-23  5:25       ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2006-09-23  4:50 UTC (permalink / raw)
  To: Auke Kok; +Cc: Holger Kiehl, linux-kernel, linux-net, netdev, Ronciak, John

On Fri, 22 Sep 2006 10:10:36 -0700
Auke Kok <auke-jan.h.kok@intel.com> wrote:

> I wonder if we can't account for NET_IP_ALIGN when selecting bufsize, to get
> rid of at least 1 order size before we netdev_alloc_skb. This should make 9k
> frames only kmalloc(16384) and thus stay within the 16k boundary. I hope.
> 
> Completely untested: don't commit :)
> 

I did - I think we want this patch.

> 
> e1000: account for NET_IP_ALIGN when calculating bufsiz
> 
> Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to 
> reduce slab allocation by half.

Could we please do whatever is needed to get this blessed and merged?  This
is such a common problem on such a common driver that I would suggest that
we want this in 2.6.18.x as well.  At least, I'd expect distributors to
ship this fix (they're nuts if they don't) and so it makes sense to deliver
it from kernel.org.


* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
  2006-09-23  4:50     ` Andrew Morton
@ 2006-09-23  5:25       ` David Miller
  2006-09-23  5:33         ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2006-09-23  5:25 UTC (permalink / raw)
  To: akpm
  Cc: auke-jan.h.kok, Holger.Kiehl, linux-kernel, linux-net, netdev,
	john.ronciak

From: Andrew Morton <akpm@osdl.org>
Date: Fri, 22 Sep 2006 21:50:00 -0700

> On Fri, 22 Sep 2006 10:10:36 -0700
> Auke Kok <auke-jan.h.kok@intel.com> wrote:
> 
> > e1000: account for NET_IP_ALIGN when calculating bufsiz
> > 
> > Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to 
> > reduce slab allocation by half.
> 
> Could we please do whatever is needed to get this blessed and merged?  This
> is such a common problem on such a common driver that I would suggest that
> we want this in 2.6.18.x as well.  At least, I'd expect distributors to
> ship this fix (they're nuts if they don't) and so it makes sense to deliver
> it from kernel.org.

The NET_IP_ALIGN existed not just for fun :)  There are ramifications
for removing it.



* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
  2006-09-23  5:25       ` David Miller
@ 2006-09-23  5:33         ` Andrew Morton
  2006-09-23 18:50           ` Auke Kok
  2006-09-24 15:26           ` Evgeniy Polyakov
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2006-09-23  5:33 UTC (permalink / raw)
  To: David Miller
  Cc: auke-jan.h.kok, Holger.Kiehl, linux-kernel, linux-net, netdev,
	john.ronciak

On Fri, 22 Sep 2006 22:25:07 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@osdl.org>
> Date: Fri, 22 Sep 2006 21:50:00 -0700
> 
> > On Fri, 22 Sep 2006 10:10:36 -0700
> > Auke Kok <auke-jan.h.kok@intel.com> wrote:
> > 
> > > e1000: account for NET_IP_ALIGN when calculating bufsiz
> > > 
> > > Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to 
> > > reduce slab allocation by half.
> > 
> > Could we please do whatever is needed to get this blessed and merged?  This
> > is such a common problem on such a common driver that I would suggest that
> > we want this in 2.6.18.x as well.  At least, I'd expect distributors to
> > ship this fix (they're nuts if they don't) and so it makes sense to deliver
> > it from kernel.org.
> 
> The NET_IP_ALIGN existed not just for fun :)  There are ramifications
> for removing it.

It's still there, isn't it?

For the 9k MTU case, for example, we end up allocating 16384-byte skbs
instead of 32768-byte ones.


diff -puN drivers/net/e1000/e1000_main.c~e1000-account-for-net_ip_align-when-calculating-bufsiz drivers/net/e1000/e1000_main.c
--- a/drivers/net/e1000/e1000_main.c~e1000-account-for-net_ip_align-when-calculating-bufsiz
+++ a/drivers/net/e1000/e1000_main.c
@@ -1101,7 +1101,7 @@ e1000_sw_init(struct e1000_adapter *adap
 
 	pci_read_config_word(pdev, PCI_COMMAND, &hw->pci_cmd_word);
 
-	adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+	adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE + NET_IP_ALIGN;
 	adapter->rx_ps_bsize0 = E1000_RXBUFFER_128;
 	hw->max_frame_size = netdev->mtu +
 			     ENET_HEADER_SIZE + ETHERNET_FCS_SIZE;
@@ -3163,26 +3163,27 @@ e1000_change_mtu(struct net_device *netd
 	 * larger slab size
 	 * i.e. RXBUFFER_2048 --> size-4096 slab */
 
-	if (max_frame <= E1000_RXBUFFER_256)
+	if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_256)
 		adapter->rx_buffer_len = E1000_RXBUFFER_256;
-	else if (max_frame <= E1000_RXBUFFER_512)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_512)
 		adapter->rx_buffer_len = E1000_RXBUFFER_512;
-	else if (max_frame <= E1000_RXBUFFER_1024)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_1024)
 		adapter->rx_buffer_len = E1000_RXBUFFER_1024;
-	else if (max_frame <= E1000_RXBUFFER_2048)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_2048)
 		adapter->rx_buffer_len = E1000_RXBUFFER_2048;
-	else if (max_frame <= E1000_RXBUFFER_4096)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_4096)
 		adapter->rx_buffer_len = E1000_RXBUFFER_4096;
-	else if (max_frame <= E1000_RXBUFFER_8192)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_8192)
 		adapter->rx_buffer_len = E1000_RXBUFFER_8192;
-	else if (max_frame <= E1000_RXBUFFER_16384)
+	else
 		adapter->rx_buffer_len = E1000_RXBUFFER_16384;
 
 	/* adjust allocation if LPE protects us, and we aren't using SBP */
 	if (!adapter->hw.tbi_compatibility_on &&
 	    ((max_frame == MAXIMUM_ETHERNET_FRAME_SIZE) ||
 	     (max_frame == MAXIMUM_ETHERNET_VLAN_SIZE)))
-		adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+		adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE +
+					NET_IP_ALIGN;
 
 	netdev->mtu = new_mtu;
 
@@ -4002,7 +4003,8 @@ e1000_alloc_rx_buffers(struct e1000_adap
 	struct e1000_buffer *buffer_info;
 	struct sk_buff *skb;
 	unsigned int i;
-	unsigned int bufsz = adapter->rx_buffer_len + NET_IP_ALIGN;
+	/* we have already accounted for NET_IP_ALIGN */
+	unsigned int bufsz = adapter->rx_buffer_len;
 
 	i = rx_ring->next_to_use;
 	buffer_info = &rx_ring->buffer_info[i];
_



* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
  2006-09-23  5:33         ` Andrew Morton
@ 2006-09-23 18:50           ` Auke Kok
  2006-09-23 20:03             ` David Miller
  2006-09-24 15:26           ` Evgeniy Polyakov
  1 sibling, 1 reply; 11+ messages in thread
From: Auke Kok @ 2006-09-23 18:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, Holger.Kiehl, linux-kernel, linux-net, netdev,
	john.ronciak

Andrew Morton wrote:
> On Fri, 22 Sep 2006 22:25:07 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
> 
>> From: Andrew Morton <akpm@osdl.org>
>> Date: Fri, 22 Sep 2006 21:50:00 -0700
>>
>>> On Fri, 22 Sep 2006 10:10:36 -0700
>>> Auke Kok <auke-jan.h.kok@intel.com> wrote:
>>>
>>>> e1000: account for NET_IP_ALIGN when calculating bufsiz
>>>>
>>>> Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to 
>>>> reduce slab allocation by half.
>>> Could we please do whatever is needed to get this blessed and merged?  This
>>> is such a common problem on such a common driver that I would suggest that
>>> we want this in 2.6.18.x as well.  At least, I'd expect distributors to
>>> ship this fix (they're nuts if they don't) and so it makes sense to deliver
>>> it from kernel.org.
>> The NET_IP_ALIGN existed not just for fun :)  There are ramifications
>> for removing it.
> 
> It's still there, isn't it?
> 
> For the 9k MTU case, for example, we end up allocating 16384-byte skbs
> instead of 32768-byte ones.

Yes, the only thing I'm doing is accounting for the 2 bytes one step earlier.
It works fine for the general case and I tested it too, but I am not too sure
about the corner cases, as the hardware has no notion of MTU at all and could
possibly overwrite by two bytes. I think my patch actually gives the hardware
two bytes too much now, so we're on the other (safe) side of that problem, but
I have to verify this first, of course.

I'll be wrestling with this on Monday with Jesse and try to nail it down.

Auke


* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
  2006-09-23 18:50           ` Auke Kok
@ 2006-09-23 20:03             ` David Miller
  0 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2006-09-23 20:03 UTC (permalink / raw)
  To: auke-jan.h.kok
  Cc: akpm, Holger.Kiehl, linux-kernel, linux-net, netdev, john.ronciak

From: Auke Kok <auke-jan.h.kok@intel.com>
Date: Sat, 23 Sep 2006 11:50:34 -0700

> Andrew Morton wrote:
> > It's still there, isn't it?
> > 
> > For the 9k MTU case, for example, we end up allocating 16384-byte skbs
> > instead of 32768-byte ones.
> 
> yes, the only thing I'm doing is accounting for the 2 bytes one step earlier.

Ok, I'm fine with this patch unless it causes some regression that hasn't
been discovered yet :-)


* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
  2006-09-23  5:33         ` Andrew Morton
  2006-09-23 18:50           ` Auke Kok
@ 2006-09-24 15:26           ` Evgeniy Polyakov
  2006-09-24 21:15             ` Auke Kok
  1 sibling, 1 reply; 11+ messages in thread
From: Evgeniy Polyakov @ 2006-09-24 15:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, auke-jan.h.kok, Holger.Kiehl, linux-kernel,
	linux-net, netdev, john.ronciak

On Fri, Sep 22, 2006 at 10:33:48PM -0700, Andrew Morton (akpm@osdl.org) wrote:
> > The NET_IP_ALIGN existed not just for fun :)  There are ramifications
> > for removing it.
> 
> It's still there, isn't it?
> 
> For the 9k MTU case, for example, we end up allocating 16384-byte skbs
> instead of 32768-byte ones.

This patch will not help - netdev_alloc_skb() adds an additional
NET_SKB_PAD and then alloc_skb() adds sizeof(struct skb_shared_info).
And even if you account for them in adapter->rx_buffer_len, the chip can
still overwrite that area (in the thread mentioned earlier in this e-mail
thread I posted such a patch and received a dump of the sizes the chip
receives - there were a lot of _different_ ones which were too close to the
limit).

-- 
	Evgeniy Polyakov


* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
  2006-09-24 15:26           ` Evgeniy Polyakov
@ 2006-09-24 21:15             ` Auke Kok
  0 siblings, 0 replies; 11+ messages in thread
From: Auke Kok @ 2006-09-24 21:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Evgeniy Polyakov, David Miller, Holger.Kiehl, linux-kernel,
	linux-net, netdev, john.ronciak

Evgeniy Polyakov wrote:
> On Fri, Sep 22, 2006 at 10:33:48PM -0700, Andrew Morton (akpm@osdl.org) wrote:
>>> The NET_IP_ALIGN existed not just for fun :)  There are ramifications
>>> for removing it.
>> It's still there, isn't it?
>>
>> For the 9k MTU case, for example, we end up allocating 16384-byte skbs
>> instead of 32768-byte ones.
> 
> This patch will not help - netdev_alloc_skb() adds additional
> NET_SKB_PAD and then alloc_skb() adds sizeof(struct skb_shared_info).
> And even if you account for them in adapter->rx_buffer_len, chip still can
> overwrite that area (in the thread mentioned in this e-mail thread
> before I posted such patch and received a dump of sizes chip receives -
> there were a lot of _different_ ones which were too close to the limit).

I just did the math on it and it does not work out as I wanted it to: we're
basically moving to the next larger buffer size 2 MTU bytes earlier, undoing
any benefit completely.

There is not much that can fix this issue, since the hardware will always
receive into power-of-2 buffers and DMA that back in its entirety, so we must
always claim space for NET_IP_ALIGN and NET_SKB_PAD on top of the power-of-2
bufsz. For the 9kb MTU case (16kb hw bufsz), we're stuck with 32kb slab
allocations. Bummer.

Andrew, please drop this patch.

Auke


