* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
[not found] <Pine.LNX.4.64.0609220655550.13396@diagnostix.dwd.de>
@ 2006-09-22 7:42 ` Andrew Morton
2006-09-22 12:03 ` Holger Kiehl
2006-09-22 17:10 ` Auke Kok
0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2006-09-22 7:42 UTC (permalink / raw)
To: Holger Kiehl; +Cc: linux-kernel, linux-net, netdev
On Fri, 22 Sep 2006 07:27:18 +0000 (GMT)
Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> I get some of the "page allocation failure" errors. My hardware is 4 CPU
> Opteron with one quad + one dual intel e1000 cards. Kernel is plain 2.6.18
> and for two cards MTU is set to 9000.
>
> Sep 21 21:03:15 athena kernel: vsftpd: page allocation failure. order:3, mode:0x20
> Sep 21 21:03:15 athena kernel:
> Sep 21 21:03:15 athena kernel: Call Trace:
> Sep 21 21:03:15 athena kernel: <IRQ> [<ffffffff8024e516>] __alloc_pages+0x282/0x29b
> Sep 21 21:03:15 athena kernel: [<ffffffff8807aa93>] :ip_tables:ipt_do_table+0x1eb/0x318
> Sep 21 21:03:15 athena kernel: [<ffffffff8026614b>] cache_grow+0x134/0x33d
> Sep 21 21:03:15 athena kernel: [<ffffffff8026664c>] cache_alloc_refill+0x189/0x1d7
> Sep 21 21:03:15 athena kernel: [<ffffffff80266724>] __kmalloc+0x8a/0x94
> Sep 21 21:03:15 athena kernel: [<ffffffff803b5438>] __alloc_skb+0x5c/0x123
> Sep 21 21:03:15 athena kernel: [<ffffffff803b5f2e>] __netdev_alloc_skb+0x12/0x2d
> Sep 21 21:03:15 athena kernel: [<ffffffff8033cb22>] e1000_alloc_rx_buffers+0x6f/0x2f3
> Sep 21 21:03:15 athena kernel: [<ffffffff803d1234>] ip_local_deliver+0x173/0x23b
> Sep 21 21:03:15 athena kernel: [<ffffffff8033d29a>] e1000_clean_rx_irq+0x4f4/0x514
Is OK, it's just a warning and it is expected - the kernel will recover.
I'm half-inclined to shut the warning up by sticking a __GFP_NOWARN in there.
But on the other hand, that warning is handy sometimes. How come kmalloc
decided to request a 32k hunk of memory when the MTU size is only 9k? Is
the driver doing something dumb?
else if (max_frame <= E1000_RXBUFFER_8192)
adapter->rx_buffer_len = E1000_RXBUFFER_8192;
else if (max_frame <= E1000_RXBUFFER_16384)
adapter->rx_buffer_len = E1000_RXBUFFER_16384;
It sure is.
This is going to cause a 9000-byte MTU to use a 16384-byte allocation.
e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
which causes the slab allocator to request 32768 bytes. All for a 9kbyte skb.
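
The arithmetic above can be reproduced with a small user-space sketch. This is
not driver code: rx_buffer_len_for() and pow2_size_class() are hypothetical
helpers, the bucket list is inferred from the E1000_RXBUFFER_* names, the
32-byte smallest class is an assumption, and the real kmalloc size classes are
modeled only as plain powers of two.

```c
#include <stddef.h>

/* Receive buffer sizes matching the E1000_RXBUFFER_* names (powers of two). */
static const size_t rx_buckets[] = { 256, 512, 1024, 2048, 4096, 8192, 16384 };

/* Mirrors the if/else ladder: the smallest bucket that holds max_frame.
 * Returns 0 if the frame does not fit in any bucket. */
size_t rx_buffer_len_for(size_t max_frame)
{
    for (size_t i = 0; i < sizeof(rx_buckets) / sizeof(rx_buckets[0]); i++)
        if (max_frame <= rx_buckets[i])
            return rx_buckets[i];
    return 0;
}

/* Simplified model of a power-of-two size-class allocator:
 * round the request up to the next power of two. */
size_t pow2_size_class(size_t request)
{
    size_t cls = 32; /* assumed smallest class, for illustration only */
    while (cls < request)
        cls <<= 1;
    return cls;
}
```

Assuming max_frame is the 9000-byte MTU plus 14 bytes of Ethernet header and
4 bytes of FCS (9018), rx_buffer_len_for(9018) selects the 16384 bucket, and
pow2_size_class(16384 + 2) lands in the 32768 class.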
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
2006-09-22 7:42 ` 2.6.1[78] page allocation failure. order:3, mode:0x20 Andrew Morton
@ 2006-09-22 12:03 ` Holger Kiehl
2006-09-22 12:12 ` Evgeniy Polyakov
2006-09-22 17:10 ` Auke Kok
1 sibling, 1 reply; 11+ messages in thread
From: Holger Kiehl @ 2006-09-22 12:03 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-net, netdev
On Fri, 22 Sep 2006, Andrew Morton wrote:
> On Fri, 22 Sep 2006 07:27:18 +0000 (GMT)
> Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>
>> I get some of the "page allocation failure" errors. My hardware is 4 CPU
>> Opteron with one quad + one dual intel e1000 cards. Kernel is plain 2.6.18
>> and for two cards MTU is set to 9000.
>>
>> Sep 21 21:03:15 athena kernel: vsftpd: page allocation failure. order:3, mode:0x20
>> Sep 21 21:03:15 athena kernel:
>> Sep 21 21:03:15 athena kernel: Call Trace:
>> Sep 21 21:03:15 athena kernel: <IRQ> [<ffffffff8024e516>] __alloc_pages+0x282/0x29b
>> Sep 21 21:03:15 athena kernel: [<ffffffff8807aa93>] :ip_tables:ipt_do_table+0x1eb/0x318
>> Sep 21 21:03:15 athena kernel: [<ffffffff8026614b>] cache_grow+0x134/0x33d
>> Sep 21 21:03:15 athena kernel: [<ffffffff8026664c>] cache_alloc_refill+0x189/0x1d7
>> Sep 21 21:03:15 athena kernel: [<ffffffff80266724>] __kmalloc+0x8a/0x94
>> Sep 21 21:03:15 athena kernel: [<ffffffff803b5438>] __alloc_skb+0x5c/0x123
>> Sep 21 21:03:15 athena kernel: [<ffffffff803b5f2e>] __netdev_alloc_skb+0x12/0x2d
>> Sep 21 21:03:15 athena kernel: [<ffffffff8033cb22>] e1000_alloc_rx_buffers+0x6f/0x2f3
>> Sep 21 21:03:15 athena kernel: [<ffffffff803d1234>] ip_local_deliver+0x173/0x23b
>> Sep 21 21:03:15 athena kernel: [<ffffffff8033d29a>] e1000_clean_rx_irq+0x4f4/0x514
>
> Is OK, it's just a warning and it is expected - the kernel will recover.
>
> I'm half-inclined to shut the warning up by sticking a __GFP_NOWARN in there.
>
> But on the other hand, that warning is handy sometimes. How come kmalloc
> decided to request a 32k hunk of memory when the MTU size is only 9k? Is
> the driver doing something dumb?
>
> else if (max_frame <= E1000_RXBUFFER_8192)
> adapter->rx_buffer_len = E1000_RXBUFFER_8192;
> else if (max_frame <= E1000_RXBUFFER_16384)
> adapter->rx_buffer_len = E1000_RXBUFFER_16384;
>
> It sure is.
>
> This is going to cause a 9000-byte MTU to use a 16384-byte allocation.
> e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
> which causes the slab allocator to request 32768 bytes. All for a 9kbyte skb.
>
I searched the list, which I should have done before asking (I was not sure
whether this was due to the e1000), and found this discussion from 3rd August:
http://www.ussg.iu.edu/hypermail/linux/kernel/0608.0/0942.html
In summary, people are still trying to find a solution; in the meantime one
should set /proc/sys/vm/min_free_kbytes to 65000 or higher to make sure that
the driver gets enough unfragmented memory.
Holger
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
2006-09-22 12:03 ` Holger Kiehl
@ 2006-09-22 12:12 ` Evgeniy Polyakov
0 siblings, 0 replies; 11+ messages in thread
From: Evgeniy Polyakov @ 2006-09-22 12:12 UTC (permalink / raw)
To: Holger Kiehl; +Cc: Andrew Morton, linux-kernel, linux-net, netdev
On Fri, Sep 22, 2006 at 12:03:11PM +0000, Holger Kiehl (Holger.Kiehl@dwd.de) wrote:
> >This is going to cause a 9000-byte MTU to use a 16384-byte allocation.
> >e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
> >which causes the slab allocator to request 32768 bytes. All for a 9kbyte
> >skb.
> >
> I searched the list, which I should have done before asking (I was not sure
> whether this was due to the e1000), and found this discussion from 3rd August:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0608.0/0942.html
>
> In summary, people are still trying to find a solution; in the meantime one
> should set /proc/sys/vm/min_free_kbytes to 65000 or higher to make sure that
> the driver gets enough unfragmented memory.
There is no solution yet (although the e1000 memory management problem is one
of the reasons I created the memory tree allocator), only workarounds, one of
which you described above.
The e1000 hardware does not support setting an arbitrary maximum transfer
size; it only allows powers of two (and about 1500), so it does require 16k of
memory for a 9k frame. The network skb allocation path then adds a little on
top, which turns this into a 32k request because of the power-of-two rounding.
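
The link between that 32k request and the "order:3" in the log can be sketched
as follows. order_for() is a hypothetical helper and 4096-byte pages are an
assumption (as on x86-64); this only illustrates the rounding and is not the
page allocator's code.

```c
#include <stddef.h>

/* Smallest order such that (page_size << order) >= bytes,
 * i.e. how many times one page must be doubled to cover the request. */
unsigned int order_for(size_t bytes, size_t page_size)
{
    unsigned int order = 0;
    size_t span = page_size;

    while (span < bytes) {
        span <<= 1;
        order++;
    }
    return order;
}
```

With 4096-byte pages, a 16384-byte request is order 2, while anything even
slightly larger, such as 16386 bytes, needs 32768 contiguous bytes and
therefore order 3.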
It was suggested to the Intel folks that they either use fragments in one skb
or wait until network developers invent something new, but there are no
patches from them (yet, hopefully).
This is not an e1000-only problem: expecting even an 8k-12k allocation to
succeed at any time other than startup is definitely the wrong approach.
> Holger
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
2006-09-22 7:42 ` 2.6.1[78] page allocation failure. order:3, mode:0x20 Andrew Morton
2006-09-22 12:03 ` Holger Kiehl
@ 2006-09-22 17:10 ` Auke Kok
2006-09-23 4:50 ` Andrew Morton
1 sibling, 1 reply; 11+ messages in thread
From: Auke Kok @ 2006-09-22 17:10 UTC (permalink / raw)
To: Andrew Morton
Cc: Holger Kiehl, linux-kernel, linux-net, netdev, Ronciak, John
Andrew Morton wrote:
> On Fri, 22 Sep 2006 07:27:18 +0000 (GMT)
> Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>
>> I get some of the "page allocation failure" errors. My hardware is 4 CPU
>> Opteron with one quad + one dual intel e1000 cards. Kernel is plain 2.6.18
>> and for two cards MTU is set to 9000.
>>
>> Sep 21 21:03:15 athena kernel: vsftpd: page allocation failure. order:3, mode:0x20
>> Sep 21 21:03:15 athena kernel:
>> Sep 21 21:03:15 athena kernel: Call Trace:
>> Sep 21 21:03:15 athena kernel: <IRQ> [<ffffffff8024e516>] __alloc_pages+0x282/0x29b
>> Sep 21 21:03:15 athena kernel: [<ffffffff8807aa93>] :ip_tables:ipt_do_table+0x1eb/0x318
>> Sep 21 21:03:15 athena kernel: [<ffffffff8026614b>] cache_grow+0x134/0x33d
>> Sep 21 21:03:15 athena kernel: [<ffffffff8026664c>] cache_alloc_refill+0x189/0x1d7
>> Sep 21 21:03:15 athena kernel: [<ffffffff80266724>] __kmalloc+0x8a/0x94
>> Sep 21 21:03:15 athena kernel: [<ffffffff803b5438>] __alloc_skb+0x5c/0x123
>> Sep 21 21:03:15 athena kernel: [<ffffffff803b5f2e>] __netdev_alloc_skb+0x12/0x2d
>> Sep 21 21:03:15 athena kernel: [<ffffffff8033cb22>] e1000_alloc_rx_buffers+0x6f/0x2f3
>> Sep 21 21:03:15 athena kernel: [<ffffffff803d1234>] ip_local_deliver+0x173/0x23b
>> Sep 21 21:03:15 athena kernel: [<ffffffff8033d29a>] e1000_clean_rx_irq+0x4f4/0x514
>
> Is OK, it's just a warning and it is expected - the kernel will recover.
>
> I'm half-inclined to shut the warning up by sticking a __GFP_NOWARN in there.
>
> But on the other hand, that warning is handy sometimes. How come kmalloc
> decided to request a 32k hunk of memory when the MTU size is only 9k? Is
> the driver doing something dumb?
>
> else if (max_frame <= E1000_RXBUFFER_8192)
> adapter->rx_buffer_len = E1000_RXBUFFER_8192;
> else if (max_frame <= E1000_RXBUFFER_16384)
> adapter->rx_buffer_len = E1000_RXBUFFER_16384;
>
> It sure is.
>
> This is going to cause a 9000-byte MTU to use a 16384-byte allocation.
> e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
> which causes the slab allocator to request 32768 bytes. All for a 9kbyte skb.
I wonder if we can't account for NET_IP_ALIGN when selecting bufsize, to get
rid of at least one allocation order before we call netdev_alloc_skb(). This
should make 9k frames only kmalloc(16384) and thus stay within the 16k
boundary. I hope.
Completely untested: don't commit :)
Auke
---
e1000: account for NET_IP_ALIGN when calculating bufsiz
Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to
reduce slab allocation by half.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index bb0d129..20b1f39 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -1144,7 +1144,7 @@ #endif
pci_read_config_word(pdev, PCI_COMMAND, &hw->pci_cmd_word);
- adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+ adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE + NET_IP_ALIGN;
adapter->rx_ps_bsize0 = E1000_RXBUFFER_128;
hw->max_frame_size = netdev->mtu +
ENET_HEADER_SIZE + ETHERNET_FCS_SIZE;
@@ -3234,26 +3234,27 @@ #define MAX_STD_JUMBO_FRAME_SIZE 9234
* larger slab size
* i.e. RXBUFFER_2048 --> size-4096 slab */
- if (max_frame <= E1000_RXBUFFER_256)
+ if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_256)
adapter->rx_buffer_len = E1000_RXBUFFER_256;
- else if (max_frame <= E1000_RXBUFFER_512)
+ else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_512)
adapter->rx_buffer_len = E1000_RXBUFFER_512;
- else if (max_frame <= E1000_RXBUFFER_1024)
+ else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_1024)
adapter->rx_buffer_len = E1000_RXBUFFER_1024;
- else if (max_frame <= E1000_RXBUFFER_2048)
+ else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_2048)
adapter->rx_buffer_len = E1000_RXBUFFER_2048;
- else if (max_frame <= E1000_RXBUFFER_4096)
+ else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_4096)
adapter->rx_buffer_len = E1000_RXBUFFER_4096;
- else if (max_frame <= E1000_RXBUFFER_8192)
+ else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_8192)
adapter->rx_buffer_len = E1000_RXBUFFER_8192;
- else if (max_frame <= E1000_RXBUFFER_16384)
+ else
adapter->rx_buffer_len = E1000_RXBUFFER_16384;
/* adjust allocation if LPE protects us, and we aren't using SBP */
if (!adapter->hw.tbi_compatibility_on &&
((max_frame == MAXIMUM_ETHERNET_FRAME_SIZE) ||
(max_frame == MAXIMUM_ETHERNET_VLAN_SIZE)))
- adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+ adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE
+ + NET_IP_ALIGN;
netdev->mtu = new_mtu;
@@ -4076,7 +4076,8 @@ e1000_alloc_rx_buffers(struct e1000_adap
struct e1000_buffer *buffer_info;
struct sk_buff *skb;
unsigned int i;
- unsigned int bufsz = adapter->rx_buffer_len + NET_IP_ALIGN;
+ /* we have already accounted for NET_IP_ALIGN */
+ unsigned int bufsz = adapter->rx_buffer_len;
i = rx_ring->next_to_use;
buffer_info = &rx_ring->buffer_info[i];
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
2006-09-22 17:10 ` Auke Kok
@ 2006-09-23 4:50 ` Andrew Morton
2006-09-23 5:25 ` David Miller
0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2006-09-23 4:50 UTC (permalink / raw)
To: Auke Kok; +Cc: Holger Kiehl, linux-kernel, linux-net, netdev, Ronciak, John
On Fri, 22 Sep 2006 10:10:36 -0700
Auke Kok <auke-jan.h.kok@intel.com> wrote:
> I wonder if we can't account for NET_IP_ALIGN when selecting bufsize, to get
> rid of at least one allocation order before we call netdev_alloc_skb(). This
> should make 9k frames only kmalloc(16384) and thus stay within the 16k
> boundary. I hope.
>
> Completely untested: don't commit :)
>
I did - I think we want this patch.
>
> e1000: account for NET_IP_ALIGN when calculating bufsiz
>
> Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to
> reduce slab allocation by half.
Could we please do whatever is needed to get this blessed and merged? This
is such a common problem on such a common driver that I would suggest that
we want this in 2.6.18.x as well. At least, I'd expect distributors to
ship this fix (they're nuts if they don't) and so it makes sense to deliver
it from kernel.org.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
2006-09-23 4:50 ` Andrew Morton
@ 2006-09-23 5:25 ` David Miller
2006-09-23 5:33 ` Andrew Morton
0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2006-09-23 5:25 UTC (permalink / raw)
To: akpm
Cc: auke-jan.h.kok, Holger.Kiehl, linux-kernel, linux-net, netdev,
john.ronciak
From: Andrew Morton <akpm@osdl.org>
Date: Fri, 22 Sep 2006 21:50:00 -0700
> On Fri, 22 Sep 2006 10:10:36 -0700
> Auke Kok <auke-jan.h.kok@intel.com> wrote:
>
> > e1000: account for NET_IP_ALIGN when calculating bufsiz
> >
> > Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to
> > reduce slab allocation by half.
>
> Could we please do whatever is needed to get this blessed and merged? This
> is such a common problem on such a common driver that I would suggest that
> we want this in 2.6.18.x as well. At least, I'd expect distributors to
> ship this fix (they're nuts if they don't) and so it makes sense to deliver
> it from kernel.org.
The NET_IP_ALIGN existed not just for fun :) There are ramifications
for removing it.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
2006-09-23 5:25 ` David Miller
@ 2006-09-23 5:33 ` Andrew Morton
2006-09-23 18:50 ` Auke Kok
2006-09-24 15:26 ` Evgeniy Polyakov
0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2006-09-23 5:33 UTC (permalink / raw)
To: David Miller
Cc: auke-jan.h.kok, Holger.Kiehl, linux-kernel, linux-net, netdev,
john.ronciak
On Fri, 22 Sep 2006 22:25:07 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:
> From: Andrew Morton <akpm@osdl.org>
> Date: Fri, 22 Sep 2006 21:50:00 -0700
>
> > On Fri, 22 Sep 2006 10:10:36 -0700
> > Auke Kok <auke-jan.h.kok@intel.com> wrote:
> >
> > > e1000: account for NET_IP_ALIGN when calculating bufsiz
> > >
> > > Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to
> > > reduce slab allocation by half.
> >
> > Could we please do whatever is needed to get this blessed and merged? This
> > is such a common problem on such a common driver that I would suggest that
> > we want this in 2.6.18.x as well. At least, I'd expect distributors to
> > ship this fix (they're nuts if they don't) and so it makes sense to deliver
> > it from kernel.org.
>
> The NET_IP_ALIGN existed not just for fun :) There are ramifications
> for removing it.
It's still there, isn't it?
For the 9k MTU case, for example, we end up allocating 16384-byte skbs
instead of 32768-byte ones.
diff -puN drivers/net/e1000/e1000_main.c~e1000-account-for-net_ip_align-when-calculating-bufsiz drivers/net/e1000/e1000_main.c
--- a/drivers/net/e1000/e1000_main.c~e1000-account-for-net_ip_align-when-calculating-bufsiz
+++ a/drivers/net/e1000/e1000_main.c
@@ -1101,7 +1101,7 @@ e1000_sw_init(struct e1000_adapter *adap
pci_read_config_word(pdev, PCI_COMMAND, &hw->pci_cmd_word);
- adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+ adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE + NET_IP_ALIGN;
adapter->rx_ps_bsize0 = E1000_RXBUFFER_128;
hw->max_frame_size = netdev->mtu +
ENET_HEADER_SIZE + ETHERNET_FCS_SIZE;
@@ -3163,26 +3163,27 @@ e1000_change_mtu(struct net_device *netd
* larger slab size
* i.e. RXBUFFER_2048 --> size-4096 slab */
- if (max_frame <= E1000_RXBUFFER_256)
+ if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_256)
adapter->rx_buffer_len = E1000_RXBUFFER_256;
- else if (max_frame <= E1000_RXBUFFER_512)
+ else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_512)
adapter->rx_buffer_len = E1000_RXBUFFER_512;
- else if (max_frame <= E1000_RXBUFFER_1024)
+ else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_1024)
adapter->rx_buffer_len = E1000_RXBUFFER_1024;
- else if (max_frame <= E1000_RXBUFFER_2048)
+ else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_2048)
adapter->rx_buffer_len = E1000_RXBUFFER_2048;
- else if (max_frame <= E1000_RXBUFFER_4096)
+ else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_4096)
adapter->rx_buffer_len = E1000_RXBUFFER_4096;
- else if (max_frame <= E1000_RXBUFFER_8192)
+ else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_8192)
adapter->rx_buffer_len = E1000_RXBUFFER_8192;
- else if (max_frame <= E1000_RXBUFFER_16384)
+ else
adapter->rx_buffer_len = E1000_RXBUFFER_16384;
/* adjust allocation if LPE protects us, and we aren't using SBP */
if (!adapter->hw.tbi_compatibility_on &&
((max_frame == MAXIMUM_ETHERNET_FRAME_SIZE) ||
(max_frame == MAXIMUM_ETHERNET_VLAN_SIZE)))
- adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+ adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE +
+ NET_IP_ALIGN;
netdev->mtu = new_mtu;
@@ -4002,7 +4003,8 @@ e1000_alloc_rx_buffers(struct e1000_adap
struct e1000_buffer *buffer_info;
struct sk_buff *skb;
unsigned int i;
- unsigned int bufsz = adapter->rx_buffer_len + NET_IP_ALIGN;
+ /* we have already accounted for NET_IP_ALIGN */
+ unsigned int bufsz = adapter->rx_buffer_len;
i = rx_ring->next_to_use;
buffer_info = &rx_ring->buffer_info[i];
_
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
2006-09-23 5:33 ` Andrew Morton
@ 2006-09-23 18:50 ` Auke Kok
2006-09-23 20:03 ` David Miller
2006-09-24 15:26 ` Evgeniy Polyakov
1 sibling, 1 reply; 11+ messages in thread
From: Auke Kok @ 2006-09-23 18:50 UTC (permalink / raw)
To: Andrew Morton
Cc: David Miller, Holger.Kiehl, linux-kernel, linux-net, netdev,
john.ronciak
Andrew Morton wrote:
> On Fri, 22 Sep 2006 22:25:07 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
>
>> From: Andrew Morton <akpm@osdl.org>
>> Date: Fri, 22 Sep 2006 21:50:00 -0700
>>
>>> On Fri, 22 Sep 2006 10:10:36 -0700
>>> Auke Kok <auke-jan.h.kok@intel.com> wrote:
>>>
>>>> e1000: account for NET_IP_ALIGN when calculating bufsiz
>>>>
>>>> Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to
>>>> reduce slab allocation by half.
>>> Could we please do whatever is needed to get this blessed and merged? This
>>> is such a common problem on such a common driver that I would suggest that
>>> we want this in 2.6.18.x as well. At least, I'd expect distributors to
>>> ship this fix (they're nuts if they don't) and so it makes sense to deliver
>>> it from kernel.org.
>> The NET_IP_ALIGN existed not just for fun :) There are ramifications
>> for removing it.
>
> It's still there, isn't it?
>
> For the 9k MTU case, for example, we end up allocating 16384-byte skbs
> instead of 32768-byte ones.
Yes, the only thing I'm doing is accounting for the 2 bytes one step earlier.
It works fine for the general case and I tested it too, but I am not too sure
about the corner cases, as the hardware has no notion of MTU at all and could
possibly overwrite by two bytes. I think my patch actually gives the hardware
two bytes too much now, so we're on the other (safe) side of that problem, but
I have to verify this first of course.
I'll be wrestling with this on Monday with Jesse to try to nail it down.
Auke
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
2006-09-23 18:50 ` Auke Kok
@ 2006-09-23 20:03 ` David Miller
0 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2006-09-23 20:03 UTC (permalink / raw)
To: auke-jan.h.kok
Cc: akpm, Holger.Kiehl, linux-kernel, linux-net, netdev, john.ronciak
From: Auke Kok <auke-jan.h.kok@intel.com>
Date: Sat, 23 Sep 2006 11:50:34 -0700
> Andrew Morton wrote:
> > It's still there, isn't it?
> >
> > For the 9k MTU case, for example, we end up allocating 16384-byte skbs
> > instead of 32768-byte ones.
>
> Yes, the only thing I'm doing is accounting for the 2 bytes one step earlier.
Ok, I'm fine with this patch unless it causes some regression that hasn't
been discovered yet :-)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
2006-09-23 5:33 ` Andrew Morton
2006-09-23 18:50 ` Auke Kok
@ 2006-09-24 15:26 ` Evgeniy Polyakov
2006-09-24 21:15 ` Auke Kok
1 sibling, 1 reply; 11+ messages in thread
From: Evgeniy Polyakov @ 2006-09-24 15:26 UTC (permalink / raw)
To: Andrew Morton
Cc: David Miller, auke-jan.h.kok, Holger.Kiehl, linux-kernel,
linux-net, netdev, john.ronciak
On Fri, Sep 22, 2006 at 10:33:48PM -0700, Andrew Morton (akpm@osdl.org) wrote:
> > The NET_IP_ALIGN existed not just for fun :) There are ramifications
> > for removing it.
>
> It's still there, isn't it?
>
> For the 9k MTU case, for example, we end up allocating 16384-byte skbs
> instead of 32768-byte ones.
This patch will not help: netdev_alloc_skb() adds an additional NET_SKB_PAD
and then alloc_skb() adds sizeof(struct skb_shared_info).
And even if you account for them in adapter->rx_buffer_len, the chip can still
overwrite that area. In the thread mentioned earlier in this e-mail thread I
posted such a patch and received a dump of the sizes the chip receives; there
were a lot of _different_ ones which were too close to the limit.
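
A rough sketch of how those additions stack up on top of the 16k hardware
buffer. The SKETCH_* constants are placeholders chosen for illustration only
(the real NET_SKB_PAD and sizeof(struct skb_shared_info) depend on kernel
version and configuration), and skb_data_request() is a hypothetical helper,
not a kernel function.

```c
#include <stddef.h>

/* Illustrative values only; the real ones depend on kernel version/config. */
#define SKETCH_NET_IP_ALIGN  2    /* the "two bytes" discussed in this thread */
#define SKETCH_NET_SKB_PAD   16   /* assumed padding added by netdev_alloc_skb() */
#define SKETCH_SHINFO_SIZE   256  /* assumed stand-in for sizeof(struct skb_shared_info) */

/* Total bytes requested for the skb data area, given the hardware buffer
 * length and whether the driver still adds NET_IP_ALIGN on top of it. */
size_t skb_data_request(size_t rx_buffer_len, int driver_adds_ip_align)
{
    size_t len = rx_buffer_len;

    if (driver_adds_ip_align)
        len += SKETCH_NET_IP_ALIGN;
    len += SKETCH_NET_SKB_PAD;   /* added in the netdev_alloc_skb() path */
    len += SKETCH_SHINFO_SIZE;   /* added in the alloc_skb() path */
    return len;
}

/* Non-zero if the request still fits a 16384-byte allocation. */
int fits_in_16k(size_t request)
{
    return request <= 16384;
}
```

Under these assumed values, a 16384-byte hardware buffer exceeds 16384 bytes
with or without the driver's own NET_IP_ALIGN, so the request rounds up to
32768 either way.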
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
2006-09-24 15:26 ` Evgeniy Polyakov
@ 2006-09-24 21:15 ` Auke Kok
0 siblings, 0 replies; 11+ messages in thread
From: Auke Kok @ 2006-09-24 21:15 UTC (permalink / raw)
To: Andrew Morton
Cc: Evgeniy Polyakov, David Miller, Holger.Kiehl, linux-kernel,
linux-net, netdev, john.ronciak
Evgeniy Polyakov wrote:
> On Fri, Sep 22, 2006 at 10:33:48PM -0700, Andrew Morton (akpm@osdl.org) wrote:
>>> The NET_IP_ALIGN existed not just for fun :) There are ramifications
>>> for removing it.
>> It's still there, isn't it?
>>
>> For the 9k MTU case, for example, we end up allocating 16384-byte skbs
>> instead of 32768-byte ones.
>
> This patch will not help: netdev_alloc_skb() adds an additional NET_SKB_PAD
> and then alloc_skb() adds sizeof(struct skb_shared_info).
> And even if you account for them in adapter->rx_buffer_len, the chip can still
> overwrite that area. In the thread mentioned earlier in this e-mail thread I
> posted such a patch and received a dump of the sizes the chip receives; there
> were a lot of _different_ ones which were too close to the limit.
I just did the math and it does not work out the way I wanted: we basically
spill over into the next larger buffer size 2 MTU bytes earlier, which undoes
any benefit completely.
There is not much that can fix this issue, since the hardware always receives
into power-of-two-sized buffers and DMAs them back in their entirety, so we
must always add room for NET_IP_ALIGN and NET_SKB_PAD on top of the
power-of-two bufsz. For the 9kB MTU case (16kB hardware bufsz), we're stuck
with 32kB slab allocations. Bummer.
Andrew, please drop this patch.
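
The constraint can be sketched like this, assuming the driver reserves
NET_IP_ALIGN bytes at the front of each receive allocation and the hardware
may fill its whole power-of-two buffer. room_for_dma() and dma_may_overrun()
are hypothetical helpers for illustration, not driver functions.

```c
#include <stddef.h>

#define SKETCH_NET_IP_ALIGN 2 /* the two alignment bytes discussed above */

/* Bytes left for the NIC after the alignment offset is reserved
 * at the front of an allocation of alloc_len bytes. */
size_t room_for_dma(size_t alloc_len)
{
    return alloc_len > SKETCH_NET_IP_ALIGN ? alloc_len - SKETCH_NET_IP_ALIGN : 0;
}

/* Non-zero if a NIC that may fill a hw_buf_len-byte buffer
 * could write past the end of the allocation. */
int dma_may_overrun(size_t alloc_len, size_t hw_buf_len)
{
    return room_for_dma(alloc_len) < hw_buf_len;
}
```

In this model an allocation of exactly 16384 bytes leaves only 16382 bytes for
a 16384-byte hardware buffer, while 16384 + 2 leaves enough room, which is why
the extra bytes have to come on top of the power-of-two size.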
Auke
^ permalink raw reply [flat|nested] 11+ messages in thread