* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
  [not found] <Pine.LNX.4.64.0609220655550.13396@diagnostix.dwd.de>
From: Andrew Morton @ 2006-09-22 7:42 UTC
To: Holger Kiehl; +Cc: linux-kernel, linux-net, netdev

On Fri, 22 Sep 2006 07:27:18 +0000 (GMT)
Holger Kiehl <Holger.Kiehl@dwd.de> wrote:

> I get some of the "page allocation failure" errors. My hardware is 4 CPU
> Opteron with one quad + one dual intel e1000 cards. Kernel is plain 2.6.18
> and for two cards MTU is set to 9000.
>
> Sep 21 21:03:15 athena kernel: vsftpd: page allocation failure. order:3, mode:0x20
> Sep 21 21:03:15 athena kernel:
> Sep 21 21:03:15 athena kernel: Call Trace:
> Sep 21 21:03:15 athena kernel:  <IRQ> [<ffffffff8024e516>] __alloc_pages+0x282/0x29b
> Sep 21 21:03:15 athena kernel:  [<ffffffff8807aa93>] :ip_tables:ipt_do_table+0x1eb/0x318
> Sep 21 21:03:15 athena kernel:  [<ffffffff8026614b>] cache_grow+0x134/0x33d
> Sep 21 21:03:15 athena kernel:  [<ffffffff8026664c>] cache_alloc_refill+0x189/0x1d7
> Sep 21 21:03:15 athena kernel:  [<ffffffff80266724>] __kmalloc+0x8a/0x94
> Sep 21 21:03:15 athena kernel:  [<ffffffff803b5438>] __alloc_skb+0x5c/0x123
> Sep 21 21:03:15 athena kernel:  [<ffffffff803b5f2e>] __netdev_alloc_skb+0x12/0x2d
> Sep 21 21:03:15 athena kernel:  [<ffffffff8033cb22>] e1000_alloc_rx_buffers+0x6f/0x2f3
> Sep 21 21:03:15 athena kernel:  [<ffffffff803d1234>] ip_local_deliver+0x173/0x23b
> Sep 21 21:03:15 athena kernel:  [<ffffffff8033d29a>] e1000_clean_rx_irq+0x4f4/0x514

Is OK, it's just a warning and it is expected - the kernel will recover.

I'm half-inclined to shut the warning up by sticking a __GFP_NOWARN in there.

But on the other hand, that warning is handy sometimes.  How come kmalloc
decided to request a 32k hunk of memory when the MTU size is only 9k?  Is
the driver doing something dumb?

	else if (max_frame <= E1000_RXBUFFER_8192)
		adapter->rx_buffer_len = E1000_RXBUFFER_8192;
	else if (max_frame <= E1000_RXBUFFER_16384)
		adapter->rx_buffer_len = E1000_RXBUFFER_16384;

It sure is.

This is going to cause a 9000-byte MTU to use a 16384-byte allocation.
e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
which causes the slab allocator to request 32768 bytes.  All for a 9kbyte skb.
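The arithmetic in the message above can be sketched in a few lines of
userspace C. This is a simplified model, not driver code. The helper names
pick_rx_buffer_len() and round_up_pow2() are hypothetical, the bucket values
are assumed to equal the numbers in the E1000_RXBUFFER_* names, and the
allocator is assumed to round large requests up to the next power of two.

```c
#include <stddef.h>

/* Receive-buffer buckets. Assumption: each E1000_RXBUFFER_<n> equals <n>. */
static const unsigned int rx_buckets[] = {
    256, 512, 1024, 2048, 4096, 8192, 16384
};

/* Hypothetical helper: smallest bucket that can hold max_frame, 0 if none. */
unsigned int pick_rx_buffer_len(unsigned int max_frame)
{
    size_t i;

    for (i = 0; i < sizeof(rx_buckets) / sizeof(rx_buckets[0]); i++)
        if (max_frame <= rx_buckets[i])
            return rx_buckets[i];
    return 0;
}

/* Hypothetical model of a power-of-two size class: round n up to 2^k. */
unsigned int round_up_pow2(unsigned int n)
{
    unsigned int p = 1;

    while (p < n)
        p <<= 1;
    return p;
}
```

Assuming max_frame is the MTU plus a 14-byte Ethernet header and a 4-byte FCS,
a 9000-byte MTU gives pick_rx_buffer_len(9018), which selects the 16384
bucket. Adding the two alignment bytes and calling round_up_pow2(16384 + 2)
then lands in the 32768 size class, which is the request seen in the trace.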
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
From: Holger Kiehl @ 2006-09-22 12:03 UTC
To: Andrew Morton; +Cc: linux-kernel, linux-net, netdev

On Fri, 22 Sep 2006, Andrew Morton wrote:

> On Fri, 22 Sep 2006 07:27:18 +0000 (GMT)
> Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>
>> I get some of the "page allocation failure" errors. My hardware is 4 CPU
>> Opteron with one quad + one dual intel e1000 cards. Kernel is plain 2.6.18
>> and for two cards MTU is set to 9000.
>>
>> Sep 21 21:03:15 athena kernel: vsftpd: page allocation failure. order:3, mode:0x20
>> Sep 21 21:03:15 athena kernel:
>> Sep 21 21:03:15 athena kernel: Call Trace:
>> Sep 21 21:03:15 athena kernel:  <IRQ> [<ffffffff8024e516>] __alloc_pages+0x282/0x29b
>> Sep 21 21:03:15 athena kernel:  [<ffffffff8807aa93>] :ip_tables:ipt_do_table+0x1eb/0x318
>> Sep 21 21:03:15 athena kernel:  [<ffffffff8026614b>] cache_grow+0x134/0x33d
>> Sep 21 21:03:15 athena kernel:  [<ffffffff8026664c>] cache_alloc_refill+0x189/0x1d7
>> Sep 21 21:03:15 athena kernel:  [<ffffffff80266724>] __kmalloc+0x8a/0x94
>> Sep 21 21:03:15 athena kernel:  [<ffffffff803b5438>] __alloc_skb+0x5c/0x123
>> Sep 21 21:03:15 athena kernel:  [<ffffffff803b5f2e>] __netdev_alloc_skb+0x12/0x2d
>> Sep 21 21:03:15 athena kernel:  [<ffffffff8033cb22>] e1000_alloc_rx_buffers+0x6f/0x2f3
>> Sep 21 21:03:15 athena kernel:  [<ffffffff803d1234>] ip_local_deliver+0x173/0x23b
>> Sep 21 21:03:15 athena kernel:  [<ffffffff8033d29a>] e1000_clean_rx_irq+0x4f4/0x514
>
> Is OK, it's just a warning and it is expected - the kernel will recover.
>
> I'm half-inclined to shut the warning up by sticking a __GFP_NOWARN in there.
>
> But on the other hand, that warning is handy sometimes.  How come kmalloc
> decided to request a 32k hunk of memory when the MTU size is only 9k?  Is
> the driver doing something dumb?
>
> 	else if (max_frame <= E1000_RXBUFFER_8192)
> 		adapter->rx_buffer_len = E1000_RXBUFFER_8192;
> 	else if (max_frame <= E1000_RXBUFFER_16384)
> 		adapter->rx_buffer_len = E1000_RXBUFFER_16384;
>
> It sure is.
>
> This is going to cause a 9000-byte MTU to use a 16384-byte allocation.
> e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
> which causes the slab allocator to request 32768 bytes.  All for a 9kbyte skb.
>
I searched the list, which I should have done before asking (I was not sure
if this was due to the e1000), and found this

   http://www.ussg.iu.edu/hypermail/linux/kernel/0608.0/0942.html

discussion from 3rd August. As a summary, I read that people are trying to
find a solution; in the meantime one should set /proc/sys/vm/min_free_kbytes
to 65000 or higher, to ensure that the driver gets enough unfragmented memory.

Holger
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
From: Evgeniy Polyakov @ 2006-09-22 12:12 UTC
To: Holger Kiehl; +Cc: Andrew Morton, linux-kernel, linux-net, netdev

On Fri, Sep 22, 2006 at 12:03:11PM +0000, Holger Kiehl (Holger.Kiehl@dwd.de) wrote:
> >This is going to cause a 9000-byte MTU to use a 16384-byte allocation.
> >e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
> >which causes the slab allocator to request 32768 bytes.  All for a 9kbyte
> >skb.
> >
> I searched the list, which I should have done before asking (I was not sure
> if this was due to the e1000), and found this
>
>    http://www.ussg.iu.edu/hypermail/linux/kernel/0608.0/0942.html
>
> discussion from 3rd August. As a summary, I read that people are trying to
> find a solution; in the meantime one should set /proc/sys/vm/min_free_kbytes
> to 65000 or higher, to ensure that the driver gets enough unfragmented memory.

There is no solution yet (although the e1000 memory management problem is one
of the reasons I created the memory tree allocator), only workarounds, one of
which you described above.

The e1000 hardware does not support setting an arbitrary maximum transfer
size; it only allows powers of two (and about 1500), so it does require 16k
of memory for a 9k frame (plus the network skb allocation path adds a little,
which turns it into a 32k request due to the power-of-two problem).

It was suggested that the Intel folks either use fragments in one skb (or
wait until network developers invent something new), but there are no patches
from them (yet, hopefully).

It is not only an e1000 problem - expecting even an 8k-12k allocation to
succeed other than at startup is definitely the wrong way.

> Holger

--
	Evgeniy Polyakov
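The "order:3" in the reported warning follows from the 32k request described
above. A minimal sketch of that relation, assuming 4096-byte pages (an
assumption about the reporter's machine) and a hypothetical helper name
alloc_order():

```c
/*
 * Hypothetical helper: smallest order such that (page_size << order)
 * covers size. An order-n allocation is 2^n physically contiguous pages.
 */
unsigned int alloc_order(unsigned long size, unsigned long page_size)
{
    unsigned int order = 0;
    unsigned long span = page_size;

    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

With 4096-byte pages, alloc_order(32768, 4096) is 3, i.e. eight contiguous
pages, while alloc_order(16384, 4096) is 2. The two extra bytes in a
16386-byte request are already enough to push it to order 3.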
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
From: Auke Kok @ 2006-09-22 17:10 UTC
To: Andrew Morton; +Cc: Holger Kiehl, linux-kernel, linux-net, netdev, Ronciak, John

Andrew Morton wrote:
> On Fri, 22 Sep 2006 07:27:18 +0000 (GMT)
> Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>
>> I get some of the "page allocation failure" errors. My hardware is 4 CPU
>> Opteron with one quad + one dual intel e1000 cards. Kernel is plain 2.6.18
>> and for two cards MTU is set to 9000.
>>
>> Sep 21 21:03:15 athena kernel: vsftpd: page allocation failure. order:3, mode:0x20
>> Sep 21 21:03:15 athena kernel:
>> Sep 21 21:03:15 athena kernel: Call Trace:
>> Sep 21 21:03:15 athena kernel:  <IRQ> [<ffffffff8024e516>] __alloc_pages+0x282/0x29b
>> Sep 21 21:03:15 athena kernel:  [<ffffffff8807aa93>] :ip_tables:ipt_do_table+0x1eb/0x318
>> Sep 21 21:03:15 athena kernel:  [<ffffffff8026614b>] cache_grow+0x134/0x33d
>> Sep 21 21:03:15 athena kernel:  [<ffffffff8026664c>] cache_alloc_refill+0x189/0x1d7
>> Sep 21 21:03:15 athena kernel:  [<ffffffff80266724>] __kmalloc+0x8a/0x94
>> Sep 21 21:03:15 athena kernel:  [<ffffffff803b5438>] __alloc_skb+0x5c/0x123
>> Sep 21 21:03:15 athena kernel:  [<ffffffff803b5f2e>] __netdev_alloc_skb+0x12/0x2d
>> Sep 21 21:03:15 athena kernel:  [<ffffffff8033cb22>] e1000_alloc_rx_buffers+0x6f/0x2f3
>> Sep 21 21:03:15 athena kernel:  [<ffffffff803d1234>] ip_local_deliver+0x173/0x23b
>> Sep 21 21:03:15 athena kernel:  [<ffffffff8033d29a>] e1000_clean_rx_irq+0x4f4/0x514
>
> Is OK, it's just a warning and it is expected - the kernel will recover.
>
> I'm half-inclined to shut the warning up by sticking a __GFP_NOWARN in there.
>
> But on the other hand, that warning is handy sometimes.  How come kmalloc
> decided to request a 32k hunk of memory when the MTU size is only 9k?  Is
> the driver doing something dumb?
>
> 	else if (max_frame <= E1000_RXBUFFER_8192)
> 		adapter->rx_buffer_len = E1000_RXBUFFER_8192;
> 	else if (max_frame <= E1000_RXBUFFER_16384)
> 		adapter->rx_buffer_len = E1000_RXBUFFER_16384;
>
> It sure is.
>
> This is going to cause a 9000-byte MTU to use a 16384-byte allocation.
> e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
> which causes the slab allocator to request 32768 bytes.  All for a 9kbyte skb.

I wonder if we can't account for NET_IP_ALIGN when selecting bufsize, to get
rid of at least 1 order size before we netdev_alloc_skb. This should make 9k
frames only kmalloc(16384) and thus stay within the 16k boundary. I hope.

Completely untested: don't commit :)

Auke

---

e1000: account for NET_IP_ALIGN when calculating bufsiz

Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to
reduce slab allocation by half.

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index bb0d129..20b1f39 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -1144,7 +1144,7 @@ #endif
 
 	pci_read_config_word(pdev, PCI_COMMAND, &hw->pci_cmd_word);
 
-	adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+	adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE + NET_IP_ALIGN;
 	adapter->rx_ps_bsize0 = E1000_RXBUFFER_128;
 	hw->max_frame_size = netdev->mtu +
 			     ENET_HEADER_SIZE + ETHERNET_FCS_SIZE;
@@ -3234,26 +3234,27 @@ #define MAX_STD_JUMBO_FRAME_SIZE 9234
 	 *  larger slab size
 	 *  i.e. RXBUFFER_2048 --> size-4096 slab */
 
-	if (max_frame <= E1000_RXBUFFER_256)
+	if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_256)
 		adapter->rx_buffer_len = E1000_RXBUFFER_256;
-	else if (max_frame <= E1000_RXBUFFER_512)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_512)
 		adapter->rx_buffer_len = E1000_RXBUFFER_512;
-	else if (max_frame <= E1000_RXBUFFER_1024)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_1024)
 		adapter->rx_buffer_len = E1000_RXBUFFER_1024;
-	else if (max_frame <= E1000_RXBUFFER_2048)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_2048)
 		adapter->rx_buffer_len = E1000_RXBUFFER_2048;
-	else if (max_frame <= E1000_RXBUFFER_4096)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_4096)
 		adapter->rx_buffer_len = E1000_RXBUFFER_4096;
-	else if (max_frame <= E1000_RXBUFFER_8192)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_8192)
 		adapter->rx_buffer_len = E1000_RXBUFFER_8192;
-	else if (max_frame <= E1000_RXBUFFER_16384)
+	else
 		adapter->rx_buffer_len = E1000_RXBUFFER_16384;
 
 	/* adjust allocation if LPE protects us, and we aren't using SBP */
 	if (!adapter->hw.tbi_compatibility_on &&
 	    ((max_frame == MAXIMUM_ETHERNET_FRAME_SIZE) ||
 	     (max_frame == MAXIMUM_ETHERNET_VLAN_SIZE)))
-		adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+		adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE +
+		                         NET_IP_ALIGN;
 
 	netdev->mtu = new_mtu;
 
@@ -4076,7 +4076,8 @@ e1000_alloc_rx_buffers(struct e1000_adap
 	struct e1000_buffer *buffer_info;
 	struct sk_buff *skb;
 	unsigned int i;
-	unsigned int bufsz = adapter->rx_buffer_len + NET_IP_ALIGN;
+	/* we have already accounted for NET_IP_ALIGN */
+	unsigned int bufsz = adapter->rx_buffer_len;
 
 	i = rx_ring->next_to_use;
 	buffer_info = &rx_ring->buffer_info[i];
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
From: Andrew Morton @ 2006-09-23 4:50 UTC
To: Auke Kok; +Cc: Holger Kiehl, linux-kernel, linux-net, netdev, Ronciak, John

On Fri, 22 Sep 2006 10:10:36 -0700
Auke Kok <auke-jan.h.kok@intel.com> wrote:

> I wonder if we can't account for NET_IP_ALIGN when selecting bufsize, to get
> rid of at least 1 order size before we netdev_alloc_skb. This should make 9k
> frames only kmalloc(16384) and thus stay within the 16k boundary. I hope.
>
> Completely untested: don't commit :)
>

I did - I think we want this patch.

>
> e1000: account for NET_IP_ALIGN when calculating bufsiz
>
> Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to
> reduce slab allocation by half.

Could we please do whatever is needed to get this blessed and merged?  This
is such a common problem on such a common driver that I would suggest that
we want this in 2.6.18.x as well.  At least, I'd expect distributors to
ship this fix (they're nuts if they don't) and so it makes sense to deliver
it from kernel.org.
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
From: David Miller @ 2006-09-23 5:25 UTC
To: akpm; +Cc: auke-jan.h.kok, Holger.Kiehl, linux-kernel, linux-net, netdev, john.ronciak

From: Andrew Morton <akpm@osdl.org>
Date: Fri, 22 Sep 2006 21:50:00 -0700

> On Fri, 22 Sep 2006 10:10:36 -0700
> Auke Kok <auke-jan.h.kok@intel.com> wrote:
>
> > e1000: account for NET_IP_ALIGN when calculating bufsiz
> >
> > Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to
> > reduce slab allocation by half.
>
> Could we please do whatever is needed to get this blessed and merged?  This
> is such a common problem on such a common driver that I would suggest that
> we want this in 2.6.18.x as well.  At least, I'd expect distributors to
> ship this fix (they're nuts if they don't) and so it makes sense to deliver
> it from kernel.org.

The NET_IP_ALIGN existed not just for fun :)  There are ramifications
for removing it.
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
From: Andrew Morton @ 2006-09-23 5:33 UTC
To: David Miller; +Cc: auke-jan.h.kok, Holger.Kiehl, linux-kernel, linux-net, netdev, john.ronciak

On Fri, 22 Sep 2006 22:25:07 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@osdl.org>
> Date: Fri, 22 Sep 2006 21:50:00 -0700
>
> > On Fri, 22 Sep 2006 10:10:36 -0700
> > Auke Kok <auke-jan.h.kok@intel.com> wrote:
> >
> > > e1000: account for NET_IP_ALIGN when calculating bufsiz
> > >
> > > Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to
> > > reduce slab allocation by half.
> >
> > Could we please do whatever is needed to get this blessed and merged?  This
> > is such a common problem on such a common driver that I would suggest that
> > we want this in 2.6.18.x as well.  At least, I'd expect distributors to
> > ship this fix (they're nuts if they don't) and so it makes sense to deliver
> > it from kernel.org.
>
> The NET_IP_ALIGN existed not just for fun :)  There are ramifications
> for removing it.

It's still there, isn't it?

For the 9k MTU case, for example, we end up allocating 16384-byte skbs
instead of 32768-byte ones.


diff -puN drivers/net/e1000/e1000_main.c~e1000-account-for-net_ip_align-when-calculating-bufsiz drivers/net/e1000/e1000_main.c
--- a/drivers/net/e1000/e1000_main.c~e1000-account-for-net_ip_align-when-calculating-bufsiz
+++ a/drivers/net/e1000/e1000_main.c
@@ -1101,7 +1101,7 @@ e1000_sw_init(struct e1000_adapter *adap
 
 	pci_read_config_word(pdev, PCI_COMMAND, &hw->pci_cmd_word);
 
-	adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+	adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE + NET_IP_ALIGN;
 	adapter->rx_ps_bsize0 = E1000_RXBUFFER_128;
 	hw->max_frame_size = netdev->mtu +
 			     ENET_HEADER_SIZE + ETHERNET_FCS_SIZE;
@@ -3163,26 +3163,27 @@ e1000_change_mtu(struct net_device *netd
 	 *  larger slab size
 	 *  i.e. RXBUFFER_2048 --> size-4096 slab */
 
-	if (max_frame <= E1000_RXBUFFER_256)
+	if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_256)
 		adapter->rx_buffer_len = E1000_RXBUFFER_256;
-	else if (max_frame <= E1000_RXBUFFER_512)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_512)
 		adapter->rx_buffer_len = E1000_RXBUFFER_512;
-	else if (max_frame <= E1000_RXBUFFER_1024)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_1024)
 		adapter->rx_buffer_len = E1000_RXBUFFER_1024;
-	else if (max_frame <= E1000_RXBUFFER_2048)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_2048)
 		adapter->rx_buffer_len = E1000_RXBUFFER_2048;
-	else if (max_frame <= E1000_RXBUFFER_4096)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_4096)
 		adapter->rx_buffer_len = E1000_RXBUFFER_4096;
-	else if (max_frame <= E1000_RXBUFFER_8192)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_8192)
 		adapter->rx_buffer_len = E1000_RXBUFFER_8192;
-	else if (max_frame <= E1000_RXBUFFER_16384)
+	else
 		adapter->rx_buffer_len = E1000_RXBUFFER_16384;
 
 	/* adjust allocation if LPE protects us, and we aren't using SBP */
 	if (!adapter->hw.tbi_compatibility_on &&
 	    ((max_frame == MAXIMUM_ETHERNET_FRAME_SIZE) ||
 	     (max_frame == MAXIMUM_ETHERNET_VLAN_SIZE)))
-		adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+		adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE +
+		                         NET_IP_ALIGN;
 
 	netdev->mtu = new_mtu;
 
@@ -4002,7 +4003,8 @@ e1000_alloc_rx_buffers(struct e1000_adap
 	struct e1000_buffer *buffer_info;
 	struct sk_buff *skb;
 	unsigned int i;
-	unsigned int bufsz = adapter->rx_buffer_len + NET_IP_ALIGN;
+	/* we have already accounted for NET_IP_ALIGN */
+	unsigned int bufsz = adapter->rx_buffer_len;
 
 	i = rx_ring->next_to_use;
 	buffer_info = &rx_ring->buffer_info[i];
_
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
From: Auke Kok @ 2006-09-23 18:50 UTC
To: Andrew Morton; +Cc: David Miller, Holger.Kiehl, linux-kernel, linux-net, netdev, john.ronciak

Andrew Morton wrote:
> On Fri, 22 Sep 2006 22:25:07 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
>
>> From: Andrew Morton <akpm@osdl.org>
>> Date: Fri, 22 Sep 2006 21:50:00 -0700
>>
>>> On Fri, 22 Sep 2006 10:10:36 -0700
>>> Auke Kok <auke-jan.h.kok@intel.com> wrote:
>>>
>>>> e1000: account for NET_IP_ALIGN when calculating bufsiz
>>>>
>>>> Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to
>>>> reduce slab allocation by half.
>>> Could we please do whatever is needed to get this blessed and merged?  This
>>> is such a common problem on such a common driver that I would suggest that
>>> we want this in 2.6.18.x as well.  At least, I'd expect distributors to
>>> ship this fix (they're nuts if they don't) and so it makes sense to deliver
>>> it from kernel.org.
>> The NET_IP_ALIGN existed not just for fun :)  There are ramifications
>> for removing it.
>
> It's still there, isn't it?
>
> For the 9k MTU case, for example, we end up allocating 16384-byte skbs
> instead of 32768-byte ones.

Yes, the only thing I'm doing is accounting for the 2 bytes one step earlier.
It works fine for the general case and I tested it too, but I am not too sure
about the corner cases, as the hardware has no notion of MTU at all and could
possibly overwrite by two bytes. I think my patch actually gives the hardware
two bytes too much now, so we're on the other (safe) side of that problem,
but I have to verify this first of course.

I'll be wrestling with this on Monday with Jesse and try to nail it down.

Auke
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
From: David Miller @ 2006-09-23 20:03 UTC
To: auke-jan.h.kok; +Cc: akpm, Holger.Kiehl, linux-kernel, linux-net, netdev, john.ronciak

From: Auke Kok <auke-jan.h.kok@intel.com>
Date: Sat, 23 Sep 2006 11:50:34 -0700

> Andrew Morton wrote:
> > It's still there, isn't it?
> >
> > For the 9k MTU case, for example, we end up allocating 16384-byte skbs
> > instead of 32768-byte ones.
>
> Yes, the only thing I'm doing is accounting for the 2 bytes one step earlier.

Ok, I'm fine with this patch unless it causes some regression
that hasn't been discovered yet :-)
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
From: Evgeniy Polyakov @ 2006-09-24 15:26 UTC
To: Andrew Morton; +Cc: David Miller, auke-jan.h.kok, Holger.Kiehl, linux-kernel, linux-net, netdev, john.ronciak

On Fri, Sep 22, 2006 at 10:33:48PM -0700, Andrew Morton (akpm@osdl.org) wrote:
> > The NET_IP_ALIGN existed not just for fun :)  There are ramifications
> > for removing it.
>
> It's still there, isn't it?
>
> For the 9k MTU case, for example, we end up allocating 16384-byte skbs
> instead of 32768-byte ones.

This patch will not help - netdev_alloc_skb() adds additional
NET_SKB_PAD and then alloc_skb() adds sizeof(struct skb_shared_info).
And even if you account for them in adapter->rx_buffer_len, the chip can still
overwrite that area (in the thread mentioned earlier in this e-mail thread,
I posted such a patch and received a dump of the sizes the chip receives -
there were a lot of _different_ ones which were too close to the limit).

--
	Evgeniy Polyakov
* Re: 2.6.1[78] page allocation failure. order:3, mode:0x20
From: Auke Kok @ 2006-09-24 21:15 UTC
To: Andrew Morton; +Cc: Evgeniy Polyakov, David Miller, Holger.Kiehl, linux-kernel, linux-net, netdev, john.ronciak

Evgeniy Polyakov wrote:
> On Fri, Sep 22, 2006 at 10:33:48PM -0700, Andrew Morton (akpm@osdl.org) wrote:
>>> The NET_IP_ALIGN existed not just for fun :)  There are ramifications
>>> for removing it.
>> It's still there, isn't it?
>>
>> For the 9k MTU case, for example, we end up allocating 16384-byte skbs
>> instead of 32768-byte ones.
>
> This patch will not help - netdev_alloc_skb() adds additional
> NET_SKB_PAD and then alloc_skb() adds sizeof(struct skb_shared_info).
> And even if you account for them in adapter->rx_buffer_len, the chip can still
> overwrite that area (in the thread mentioned earlier in this e-mail thread,
> I posted such a patch and received a dump of the sizes the chip receives -
> there were a lot of _different_ ones which were too close to the limit).

I just did the math on it and it does not compute as I wanted it to: we're
basically flowing to the next larger buffer size 2 MTU bytes earlier, undoing
any benefit completely.

There is not much that can fix this issue, since the hardware will always
receive in 2-order buffers and DMA that back in its entirety, so we must
always claim size for NET_IP_ALIGN and NET_SKB_PAD after the 2-order bufsz.
For the 9kb MTU case (16kb hw bufsz), we're stuck with 32kb slab allocations.

Bummer.

Andrew, please drop this patch.

Auke
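The conclusion of the last two messages can be checked with a small userspace
model. It is only a sketch under stated assumptions: bucket values are taken
to equal the numbers in the E1000_RXBUFFER_* names, the allocator is modelled
as rounding up to the next power of two, NET_IP_ALIGN is taken as 2 (the
"two bytes" mentioned earlier in the thread), and the combined size of
NET_SKB_PAD plus sizeof(struct skb_shared_info) is passed in as a
hypothetical "overhead" parameter because its real value is not given here.
All function names are made up for the illustration.

```c
#include <stddef.h>

#define NET_IP_ALIGN 2 /* userspace stand-in; value assumed from the thread */

static const unsigned int buckets[] = {
    256, 512, 1024, 2048, 4096, 8192, 16384
};
#define NBUCKETS (sizeof(buckets) / sizeof(buckets[0]))

/*
 * Bucket selection. slack is 0 for the original comparison and
 * NET_IP_ALIGN for the patched one. Simplification: frames larger than
 * the last bucket also map to it, as in the patch's final "else".
 */
unsigned int bucket_for(unsigned int max_frame, unsigned int slack)
{
    size_t i;

    for (i = 0; i < NBUCKETS; i++)
        if (max_frame + slack <= buckets[i])
            return buckets[i];
    return buckets[NBUCKETS - 1];
}

/* Power-of-two size class model. */
unsigned int pow2_ceil(unsigned int n)
{
    unsigned int p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Original driver: alignment bytes are added on top of the bucket. */
unsigned int slab_bytes_original(unsigned int max_frame, unsigned int overhead)
{
    return pow2_ceil(bucket_for(max_frame, 0) + NET_IP_ALIGN + overhead);
}

/* Patched driver: alignment bytes are folded into the bucket choice. */
unsigned int slab_bytes_patched(unsigned int max_frame, unsigned int overhead)
{
    return pow2_ceil(bucket_for(max_frame, NET_IP_ALIGN) + overhead);
}
```

For a 9018-byte frame, slab_bytes_patched(9018, 0) stays at 16384, which is
what the patch hoped for, but any non-zero overhead such as
slab_bytes_patched(9018, 16) is back at 32768, the same as
slab_bytes_original(9018, 0). The earlier bucket switch is also visible:
bucket_for(8191, 0) is 8192 while bucket_for(8191, NET_IP_ALIGN) is 16384.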