netdev.vger.kernel.org archive mirror
* bna alloc_pages() order 2 failure in bnad.c bnad_rxq_refill_page()
@ 2014-09-27 23:44 Eric Wheeler
  2014-09-29 16:28 ` Stephen Hemminger
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Wheeler @ 2014-09-27 23:44 UTC (permalink / raw)
  To: netdev; +Cc: rmody

Hello all,

We're using the 10gbe bna card and sometimes we get pages and pages of 
alloc_pages() failure backtraces like below.  (The maintainer 
rmody@brocade.com does not appear to have an active email at brocade, but 
cc'ing again just in case.)

It looks like bnad_rxq_refill_page() in bnad.c is allocating for the 
receive queue but fails.  We've already tried bumping vm.min_free_kbytes 
and vm.zone_reclaim_mode but it doesn't appear to help.

Suggestions?

Would it be appropriate to convert alloc_pages() to a mempool 
implementation?

-Eric

[135367.300669] swapper/5: page allocation failure: order:2, mode:0x4020
[135367.300903] CPU: 5 PID: 0 Comm: swapper/5 Tainted: GF          O 3.14.18 #1
[135367.301137] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0c 10/17/2013
[135367.301557]  0000000000000002 ffff88082fd43a78 ffffffff8159e704 0000000000000010
[135367.301986]  0000000000004020 ffff88082fd43b08 ffffffff8113b17b 0000000200000000
[135367.302417]  0000000000000000 00000000fffffffc 0000000000000004 0000003000000000
[135367.302858] Call Trace:
[135367.303077]  <IRQ> [<ffffffff8159e704>] dump_stack+0x49/0x5d
[135367.303310]  [<ffffffff8113b17b>] warn_alloc_failed+0xeb/0x150
[135367.303536]  [<ffffffff8113e2f0>] __alloc_pages_slowpath+0x4a0/0x7e0
[135367.303769]  [<ffffffff8113e8e2>] __alloc_pages_nodemask+0x2b2/0x2c0
[135367.303996]  [<ffffffff81181232>] alloc_pages_current+0xb2/0x170 
[135367.304226]  [<ffffffffa0239cb4>] bnad_rxq_refill_page+0x154/0x1e0 [bna] <---
[135367.304459]  [<ffffffffa023c282>] bnad_cq_process+0x462/0x840 [bna]
[135367.304921]  [<ffffffffa023c6af>] bnad_napi_poll_rx+0x4f/0xc0 [bna]
[135367.305144]  [<ffffffff814dcf4c>] net_rx_action+0xfc/0x280
[135367.305364]  [<ffffffff8105ba83>] __do_softirq+0xf3/0x2c0
[135367.305589]  [<ffffffff8105bd5d>] irq_exit+0xbd/0xd0
[135367.305811]  [<ffffffff815ae697>] do_IRQ+0x67/0x110
[135367.306036]  [<ffffffff815a39ad>] common_interrupt+0x6d/0x6d
[135367.306261]  <EOI> [<ffffffff81486275>] ? cpuidle_enter_state+0x55/0xd0
[135367.306501]  [<ffffffff8148626b>] ? cpuidle_enter_state+0x4b/0xd0
[135367.306719]  [<ffffffff814863b7>] cpuidle_idle_call+0xc7/0x160
[135367.306938]  [<ffffffff8100d73e>] arch_cpu_idle+0xe/0x30
[135367.307156]  [<ffffffff810b187e>] cpu_idle_loop+0x9e/0x250
[135367.307376]  [<ffffffff810b1aa0>] cpu_startup_entry+0x70/0x80
[135367.307598]  [<ffffffff810350d2>] start_secondary+0xd2/0xe0



--
Eric Wheeler, President           eWheeler, Inc. dba Global Linux Security
888-LINUX26 (888-546-8926)        Fax: 503-716-3878           PO Box 25107
www.GlobalLinuxSecurity.pro       Linux since 1996!     Portland, OR 97298

^ permalink raw reply	[flat|nested] 16+ messages in thread
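
For readers decoding the first line of the log above ("order:2, mode:0x4020"), here is a minimal sketch. It assumes a 4 KiB base page (x86) and gfp flag values as recalled from 3.14-era kernels; both are assumptions, not something the log states, and the flag values differ in later kernels. The helper names are hypothetical. Under those assumptions an order-2 request is 4 contiguous pages (16384 bytes), and 0x4020 is fully explained by the flags the driver passes, GFP_ATOMIC | __GFP_COMP.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages, as on x86. */
#define ASSUMED_PAGE_SIZE 4096UL
/* Assumption: the buddy allocator tracks orders 0..10. */
#define ASSUMED_MAX_ORDER 11u
/* Assumption: 3.14-era flag values (GFP_ATOMIC was just __GFP_HIGH then). */
#define ASSUMED_GFP_HIGH 0x0020u
#define ASSUMED_GFP_COMP 0x4000u

/* Bytes covered by one physically contiguous block of the given order. */
static unsigned long order_to_bytes(unsigned int order)
{
    assert(order < ASSUMED_MAX_ORDER);
    return ASSUMED_PAGE_SIZE << order;
}

/* Bits of a gfp mask that the two assumed flags do not account for. */
static unsigned int gfp_unexplained_bits(unsigned int mode)
{
    return mode & ~(ASSUMED_GFP_HIGH | ASSUMED_GFP_COMP);
}
```

Calling order_to_bytes(2) gives the size of each receive buffer the driver asked for, and gfp_unexplained_bits(0x4020) returning zero means no other flag (in particular, none that would allow sleeping or reclaim) was set.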

* Re: bna alloc_pages() order 2 failure in bnad.c bnad_rxq_refill_page()
  2014-09-27 23:44 bna alloc_pages() order 2 failure in bnad.c bnad_rxq_refill_page() Eric Wheeler
@ 2014-09-29 16:28 ` Stephen Hemminger
  2014-09-30  5:23   ` Shahed Shaikh
  0 siblings, 1 reply; 16+ messages in thread
From: Stephen Hemminger @ 2014-09-29 16:28 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: netdev, rmody

On Sat, 27 Sep 2014 16:44:22 -0700 (PDT)
Eric Wheeler <nevdev@lists.ewheeler.net> wrote:

> Hello all,
> 
> We're using the 10gbe bna card and sometimes we get pages and pages of 
> alloc_pages() failure backtraces like below.  (The maintainer 
> rmody@brocade.com does not appear to have an active email at brocade, but 
> cc'ing again just in case.)
> 
> It looks like bnad_rxq_refill_page() in bnad.c is allocating for the 
> receive queue but fails.  We've already tried bumping vm.min_free_kbytes 
> and vm.zone_reclaim_mode but it doesn't appear to help.
> 
> Suggestions?
> 
> Would it be appropriate to convert alloc_pages() to a mempool 
> implementation?
> 
> -Eric

Brocade sold the NIC hardware business off to Qlogic.
I told them to update MAINTAINERS but they haven't submitted a patch yet.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: bna alloc_pages() order 2 failure in bnad.c bnad_rxq_refill_page()
  2014-09-29 16:28 ` Stephen Hemminger
@ 2014-09-30  5:23   ` Shahed Shaikh
  2014-10-07  1:57     ` [PATCH net] bna: page allocation during interrupts to use a mempool Eric Wheeler
  0 siblings, 1 reply; 16+ messages in thread
From: Shahed Shaikh @ 2014-09-30  5:23 UTC (permalink / raw)
  To: Stephen Hemminger, Eric Wheeler; +Cc: netdev, rmody@brocade.com, Rasesh Mody

+Rasesh, Maintainer of bna driver.

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Stephen Hemminger
> Sent: Monday, September 29, 2014 9:59 PM
> To: Eric Wheeler
> Cc: netdev; rmody@brocade.com
> Subject: Re: bna alloc_pages() order 2 failure in bnad.c
> bnad_rxq_refill_page()
>
> On Sat, 27 Sep 2014 16:44:22 -0700 (PDT) Eric Wheeler
> <nevdev@lists.ewheeler.net> wrote:
>
> > Hello all,
> >
> > We're using the 10gbe bna card and sometimes we get pages and pages of
> > alloc_pages() failure backtraces like below.  (The maintainer
> > rmody@brocade.com does not appear to have an active email at brocade,
> > but cc'ing again just in case.)
> >
> > It looks like bnad_rxq_refill_page() in bnad.c is allocating for the
> > receive queue but fails.  We've already tried bumping
> > vm.min_free_kbytes and vm.zone_reclaim_mode but it doesn't appear to
> help.
> >
> > Suggestions?
> >
> > Would it be appropriate to convert alloc_pages() to a mempool
> > implementation?
> >
> > -Eric
>
> Brocade sold the NIC hardware business off to Qlogic.
> I told them to update MAINTAINERS but they haven't submitted a patch yet.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net] bna: page allocation during interrupts to use a mempool.
  2014-10-07  1:57     ` [PATCH net] bna: page allocation during interrupts to use a mempool Eric Wheeler
@ 2014-10-07  1:52       ` Eric Dumazet
  2014-10-07  3:05         ` Eric Wheeler
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2014-10-07  1:52 UTC (permalink / raw)
  To: Eric Wheeler
  Cc: Shahed Shaikh, Stephen Hemminger, netdev, rmody@brocade.com,
	Rasesh Mody

On Mon, 2014-10-06 at 18:57 -0700, Eric Wheeler wrote:
> This patch fixes an order:2 memory allocation error backtrace by 
> guaranteeing that memory is available during simultaneous high memory 
> pressure and packet rates when using 9k jumbo frames.
> 
> Tests between two systems (one patched, one not) succeeded with ~1TB of 
> data transferred over DRBD.  As expected, the unpatched host gave 
> warn_alloc_failed's, and the patched host worked correctly.  This patch 
> increases kernel memory usage by 32 order-2 allocation when this module is 
> loaded (512k on x86) which should be negligible on hosts that use 10GbE 
> cards.

This is highly suspect to me.

Most likely yet another truesize lie.

At a first glance, bnad_cq_setup_skb_frags() is buggy here :

skb->truesize += totlen;

With lies like this, the system can OOM very fast.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH net] bna: page allocation during interrupts to use a mempool.
  2014-09-30  5:23   ` Shahed Shaikh
@ 2014-10-07  1:57     ` Eric Wheeler
  2014-10-07  1:52       ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Wheeler @ 2014-10-07  1:57 UTC (permalink / raw)
  To: Shahed Shaikh; +Cc: Stephen Hemminger, netdev, rmody@brocade.com, Rasesh Mody

This patch fixes an order:2 memory allocation error backtrace by 
guaranteeing that memory is available during simultaneous high memory 
pressure and packet rates when using 9k jumbo frames.

Tests between two systems (one patched, one not) succeeded with ~1TB of 
data transferred over DRBD.  As expected, the unpatched host gave 
warn_alloc_failed's, and the patched host worked correctly.  This patch 
increases kernel memory usage by 32 order-2 allocations when this module is 
loaded (512 KiB on x86 with 4 KiB pages), which should be negligible on 
hosts that use 10GbE cards.

Fixes:
[<ffffffff8113b17b>] warn_alloc_failed+0xeb/0x150
[<ffffffff8113e2f0>] __alloc_pages_slowpath+0x4a0/0x7e0
[<ffffffff8113e8e2>] __alloc_pages_nodemask+0x2b2/0x2c0
[<ffffffff81181232>] alloc_pages_current+0xb2/0x170
[<ffffffffa0239cb4>] bnad_rxq_refill_page+0x154/0x1e0 [bna]
[<ffffffffa023c282>] bnad_cq_process+0x462/0x840 [bna]
[<ffffffffa023c6af>] bnad_napi_poll_rx+0x4f/0xc0 [bna]
[<ffffffff814dcf4c>] net_rx_action+0xfc/0x280
[<ffffffff8105ba83>] __do_softirq+0xf3/0x2c0
[<ffffffff8105bd5d>] irq_exit+0xbd/0xd0
[<ffffffff815ae697>] do_IRQ+0x67/0x110
[<ffffffff815a39ad>] common_interrupt+0x6d/0x6d
<EOI>
[<ffffffff81486275>] ? cpuidle_enter_state+0x55/0xd0
[<ffffffff8148626b>] ? cpuidle_enter_state+0x4b/0xd0
[<ffffffff814863b7>] cpuidle_idle_call+0xc7/0x160
[<ffffffff8100d73e>] arch_cpu_idle+0xe/0x30
[<ffffffff810b187e>] cpu_idle_loop+0x9e/0x250
[<ffffffff810b1aa0>] cpu_startup_entry+0x70/0x80
[<ffffffff810350d2>] start_secondary+0xd2/0xe0

Signed-off-by: Eric Wheeler <netdev@lists.ewheeler.net>
---
  drivers/net/ethernet/brocade/bna/bnad.c |   27 ++++++++++++++++++++++++---
  1 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index 3a77f9e..8906ad1 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -26,6 +26,7 @@
  #include <linux/ip.h>
  #include <linux/prefetch.h>
  #include <linux/module.h>
+#include <linux/mempool.h>

  #include "bnad.h"
  #include "bna.h"
@@ -58,6 +59,8 @@ static struct mutex bnad_list_mutex;
  static LIST_HEAD(bnad_list);
  static const u8 bnad_bcast_addr[] =  {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};

+static mempool_t *rxq_mempool_o2 = NULL;
+
  /*
   * Local MACROS
   */
@@ -321,7 +324,7 @@ bnad_rxq_cleanup_page(struct bnad *bnad, struct bnad_rx_unmap *unmap)
  	dma_unmap_page(&bnad->pcidev->dev,
  			dma_unmap_addr(&unmap->vector, dma_addr),
  			unmap->vector.len, DMA_FROM_DEVICE);
-	put_page(unmap->page);
+	mempool_free(unmap->page, rxq_mempool_o2);
  	unmap->page = NULL;
  	dma_unmap_addr_set(&unmap->vector, dma_addr, 0);
  	unmap->vector.len = 0;
@@ -380,8 +383,11 @@ bnad_rxq_refill_page(struct bnad *bnad, struct bna_rcb *rcb, u32 nalloc)
  		unmap = &unmap_q->unmap[prod];

  		if (unmap_q->reuse_pi < 0) {
-			page = alloc_pages(GFP_ATOMIC | __GFP_COMP,
-					unmap_q->alloc_order);
+			if (unmap_q->alloc_order == 2)
+				page = mempool_alloc(rxq_mempool_o2, GFP_ATOMIC | __GFP_COMP);
+			else
+				page = alloc_pages(GFP_ATOMIC | __GFP_COMP,
+						unmap_q->alloc_order);
  			page_offset = 0;
  		} else {
  			prev = &unmap_q->unmap[unmap_q->reuse_pi];
@@ -3861,6 +3867,16 @@ static struct pci_driver bnad_pci_driver = {
  	.remove = bnad_pci_remove,
  };

+void *bnad_rxq_mempool_alloc_o2(gfp_t gfp_mask, void *pool_data)
+{
+	return (void*) alloc_pages(gfp_mask, 2);
+}
+
+void bnad_rxq_mempool_free_o2(void *page, void *pool_data)
+{
+	put_page((struct page*)page);
+}
+
  static int __init
  bnad_module_init(void)
  {
@@ -3878,6 +3894,10 @@ bnad_module_init(void)
  		return err;
  	}

+	rxq_mempool_o2 = mempool_create(32, /* how many do we really need? */
+		bnad_rxq_mempool_alloc_o2,
+		bnad_rxq_mempool_free_o2, NULL);
+
  	return 0;
  }

@@ -3886,6 +3906,7 @@ bnad_module_exit(void)
  {
  	pci_unregister_driver(&bnad_pci_driver);
  	release_firmware(bfi_fw);
+	mempool_destroy(rxq_mempool_o2);
  }

  module_init(bnad_module_init);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread
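
The "512k on x86" figure in the patch description can be checked with a short calculation. This is a sketch assuming 4 KiB pages; the function name is hypothetical and the code is plain userspace C, not part of the patch.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages, as on x86. */
#define ASSUMED_PAGE_SIZE 4096UL

/* Memory held in reserve by a pool of nr_elements blocks of the given order. */
static unsigned long pool_reserve_bytes(unsigned int nr_elements,
                                        unsigned int order)
{
    assert(order < 11u); /* assumed upper bound on allocation order */
    return (unsigned long)nr_elements * (ASSUMED_PAGE_SIZE << order);
}
```

With the patch's values, pool_reserve_bytes(32, 2) is 32 * 16384 = 524288 bytes, which is 512 KiB.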

* Re: [PATCH net] bna: page allocation during interrupts to use a mempool.
  2014-10-07  1:52       ` Eric Dumazet
@ 2014-10-07  3:05         ` Eric Wheeler
  2014-10-07  3:15           ` Eric Dumazet
  2014-10-07  4:04           ` Eric Dumazet
  0 siblings, 2 replies; 16+ messages in thread
From: Eric Wheeler @ 2014-10-07  3:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Shahed Shaikh, Stephen Hemminger, netdev, Rasesh Mody

On Mon, 6 Oct 2014, Eric Dumazet wrote:
> On Mon, 2014-10-06 at 18:57 -0700, Eric Wheeler wrote:
>> This patch fixes an order:2 memory allocation error backtrace by
>> guaranteeing that memory is available during simultaneous high memory
>> pressure and packet rates when using 9k jumbo frames.
>
> This is highly suspect to me.
> Most likely yet another truesize lie.
> At a first glance, bnad_cq_setup_skb_frags() is buggy here :
> skb->truesize += totlen;

skb->truesize wasn't part of my patch, can you explain in more detail what 
you suggest a better fix might be? If you write a quick patch I can test 
it.

The patch in question implements a simple mempool for the interrupt page 
allocs---which definitely fixes the problem even if a better solution 
might exist.  I have no problem giving up a small amount of memory to 
guarantee page allocs in the interrupt handler.

It would be great to see this patch pushed through since it does fix the 
problem---at least until we can come up with a better fix.  I'm happy to 
test if you can send a patch.

-Eric

>
> With this kind of lies, system can OOM very fast.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net] bna: page allocation during interrupts to use a mempool.
  2014-10-07  3:05         ` Eric Wheeler
@ 2014-10-07  3:15           ` Eric Dumazet
  2014-10-08  0:48             ` Eric Wheeler
  2014-10-07  4:04           ` Eric Dumazet
  1 sibling, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2014-10-07  3:15 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Shahed Shaikh, Stephen Hemminger, netdev, Rasesh Mody

On Mon, 2014-10-06 at 20:05 -0700, Eric Wheeler wrote:
> On Mon, 6 Oct 2014, Eric Dumazet wrote:
> > On Mon, 2014-10-06 at 18:57 -0700, Eric Wheeler wrote:
> >> This patch fixes an order:2 memory allocation error backtrace by
> >> guaranteeing that memory is available during simultaneous high memory
> >> pressure and packet rates when using 9k jumbo frames.
> >
> > This is highly suspect to me.
> > Most likely yet another truesize lie.
> > At a first glance, bnad_cq_setup_skb_frags() is buggy here :
> > skb->truesize += totlen;
> 
> skb->truesize wasn't part of my patch, can you explain in more detail what 
> you suggest a better fix might be? If you write a quick patch I can test 
> it.
> 
> The patch in question implements a simple mempool for the interrupt page 
> allocs---which definitely fixes the problem even if a better solution 
> might exist.  I have no problem giving up a small amount of memory to 
> guarantee page allocs in the interrupt handler.
> 
> It would be great to see this patch pushed through since it does fix the 
> problem---at least until we can come up with a better fix.  I'm happy to 
> test if you can send a patch.

It seems many drivers assume that a frame of 1000 bytes
consumes 1000 bytes of memory.

In reality the driver allocated more memory, because it cannot predict
how many bytes are going to be received from the network.

Lying about skb->truesize (underestimating the real memory cost)
prevents the networking stack from making appropriate memory checks.

Here, it seems clear to me the following fix is needed, at the very minimum.

And it might be that something better is needed: MTU=9000 might force
the driver to allocate 16384 bytes per frame, not 9018.

So it is possible that unmap->vector.len needs to be changed to the real
size of the memory region (for example: PAGE_SIZE << 2)


diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index ffc92a41d75be550d27698af6ca3e600d9a146fe..ce867219e2ceaf33b17595b67bae99e964d5a6b6 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -550,6 +550,7 @@ bnad_cq_setup_skb_frags(struct bna_rcb *rcb, struct sk_buff *skb,
 				dma_unmap_addr(&unmap->vector, dma_addr),
 				unmap->vector.len, DMA_FROM_DEVICE);
 
+		skb->truesize += unmap->vector.len;
 		len = (vec == nvecs) ?
 			last_fraglen : unmap->vector.len;
 		totlen += len;
@@ -563,7 +564,6 @@ bnad_cq_setup_skb_frags(struct bna_rcb *rcb, struct sk_buff *skb,
 
 	skb->len += totlen;
 	skb->data_len += totlen;
-	skb->truesize += totlen;
 }
 
 static inline void

^ permalink raw reply related	[flat|nested] 16+ messages in thread
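
To make the difference between the two accounting schemes in the diff above concrete, here is a simplified userspace model. It assumes a frame is spread over fixed-size receive vectors with only the last one partially filled, mirroring the loop shape visible in the diff; the struct and function names are hypothetical and this is not the driver's code.

```c
#include <assert.h>

/* The truesize contribution a frame would report under each scheme. */
struct truesize_cmp {
    unsigned long by_bytes_received; /* old: skb->truesize += totlen */
    unsigned long by_buffer_size;    /* new: += unmap->vector.len per vector */
};

/* Model one received frame of frame_len bytes landing in buffers of
 * vector_len bytes each. */
static struct truesize_cmp account_frame(unsigned long frame_len,
                                         unsigned long vector_len)
{
    struct truesize_cmp r = { 0, 0 };
    unsigned long nvecs, last_fraglen, vec;

    assert(frame_len > 0 && vector_len > 0);
    nvecs = (frame_len + vector_len - 1) / vector_len;
    last_fraglen = frame_len - (nvecs - 1) * vector_len;

    for (vec = 1; vec <= nvecs; vec++) {
        /* Only the final vector is partially filled. */
        unsigned long len = (vec == nvecs) ? last_fraglen : vector_len;

        r.by_bytes_received += len;
        r.by_buffer_size += vector_len;
    }
    return r;
}
```

For a 9018-byte frame in a single 16384-byte buffer, the old scheme reports 9018 bytes while the memory actually pinned is 16384 bytes. The gap grows for small frames: a 1000-byte frame in the same buffer is under-reported by more than a factor of 16.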

* Re: [PATCH net] bna: page allocation during interrupts to use a mempool.
  2014-10-07  3:05         ` Eric Wheeler
  2014-10-07  3:15           ` Eric Dumazet
@ 2014-10-07  4:04           ` Eric Dumazet
  1 sibling, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2014-10-07  4:04 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Shahed Shaikh, Stephen Hemminger, netdev, Rasesh Mody

On Mon, 2014-10-06 at 20:05 -0700, Eric Wheeler wrote:

> The patch in question implements a simple mempool for the interrupt page 
> allocs---which definitely fixes the problem even if a better solution 
> might exist.  I have no problem giving up a small amount of memory to 
> guarantee page allocs in the interrupt handler.

You have no such guarantee.

Once a page is consumed by some networking frame, this frame can be
sitting in some socket receive queue. The page can't be reused by the driver.

If you receive a (small) burst of 100 frames, your 32-element mempool will be
depleted anyway.

Even if you use a mempool with 10000 elements, there is no guarantee; it
really depends on the number of sockets and their SO_RCVBUF limits.

If your NIC depends on having order-2 pages available for its operation,
then I am afraid it cannot possibly work.

Memory will eventually be fragmented and order-2 allocations will fail.

Run your patch with 100 concurrent TCP flows, add some losses to force
usage of the out-of-order queue, and you'll get errors quite fast.

^ permalink raw reply	[flat|nested] 16+ messages in thread
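
The depletion argument above can be illustrated with a toy model of a fixed reserve. This is a userspace sketch, not the kernel mempool API: it assumes the worst case where the underlying page allocator always fails, so only the reserve can satisfy requests, and all names are hypothetical.

```c
#include <assert.h>

/* A fixed reserve of preallocated buffers. */
struct reserve_pool {
    unsigned int capacity;  /* buffers preallocated at init */
    unsigned int available; /* buffers currently in the reserve */
};

/* Take one buffer; returns 1 on success, 0 if the reserve is empty. */
static int pool_take(struct reserve_pool *p)
{
    if (p->available == 0)
        return 0;
    p->available--;
    return 1;
}

/* Return one buffer to the reserve. */
static void pool_give_back(struct reserve_pool *p)
{
    assert(p->available < p->capacity);
    p->available++;
}

/* Count how many frames of a burst get a buffer. If frames_held is
 * nonzero, every frame keeps its buffer (as when it sits in a socket
 * receive queue); otherwise the buffer is returned immediately. */
static unsigned int frames_served(unsigned int pool_size,
                                  unsigned int burst, int frames_held)
{
    struct reserve_pool pool = { pool_size, pool_size };
    unsigned int served = 0, i;

    for (i = 0; i < burst; i++) {
        if (!pool_take(&pool))
            continue; /* allocation failure: frame gets no buffer */
        served++;
        if (!frames_held)
            pool_give_back(&pool);
    }
    return served;
}
```

In this model a burst of 100 held frames against a 32-element reserve serves only 32 of them, whereas the same burst succeeds fully when buffers are returned right away. A larger reserve only moves the threshold; how long buffers stay held depends on the receivers, which is the point made in the message above.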

* Re: [PATCH net] bna: page allocation during interrupts to use a mempool.
  2014-10-07  3:15           ` Eric Dumazet
@ 2014-10-08  0:48             ` Eric Wheeler
  2014-10-08  1:28               ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Wheeler @ 2014-10-08  0:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Shahed Shaikh, Stephen Hemminger, netdev, Rasesh Mody

On Mon, 6 Oct 2014, Eric Dumazet wrote:
> On Mon, 2014-10-06 at 20:05 -0700, Eric Wheeler wrote:
>> On Mon, 6 Oct 2014, Eric Dumazet wrote:
>>> On Mon, 2014-10-06 at 18:57 -0700, Eric Wheeler wrote:
>>>> This patch fixes an order:2 memory allocation error backtrace by
>>>> guaranteeing that memory is available during simultaneous high memory
>>>> pressure and packet rates when using 9k jumbo frames.
>>>
>>> This is highly suspect to me.
>>> Most likely yet another truesize lie.
>>> At a first glance, bnad_cq_setup_skb_frags() is buggy here :
>>> skb->truesize += totlen;
>
> It seems many drivers make this assumption that a frame of 1000 bytes
> consumes 1000 bytes of memory.
>
> Reality is that driver allocated more memory, because it can not predict
> how many bytes are going to be received from the network.
>
> By lying on skb->truesize (underestimating real memory cost), this
> prevents networking stack making appropriate memory checks.
>
> Here, it seems clear to me the following fix is needed, at very minimum.
>
> And it might be that something better is needed :  MTU=9000 might force
> the driver to allocate 16384 bytes per frame, not 9018
>
> So it is possible that unmap->vector.len needs to be changed to the real
> size of memory region (for example : PAGE_SIZE << 2)

Just += unmap->vector.len still did not work (same backtrace), so I've 
rebuilt with PAGE_SIZE<<2 and so far so good.  I'll let it run all night 
and see if we get any problems.

-Eric

>
>
> diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
> index ffc92a41d75be550d27698af6ca3e600d9a146fe..ce867219e2ceaf33b17595b67bae99e964d5a6b6 100644
> --- a/drivers/net/ethernet/brocade/bna/bnad.c
> +++ b/drivers/net/ethernet/brocade/bna/bnad.c
> @@ -550,6 +550,7 @@ bnad_cq_setup_skb_frags(struct bna_rcb *rcb, struct sk_buff *skb,
> 				dma_unmap_addr(&unmap->vector, dma_addr),
> 				unmap->vector.len, DMA_FROM_DEVICE);
>
> +		skb->truesize += unmap->vector.len;
> 		len = (vec == nvecs) ?
> 			last_fraglen : unmap->vector.len;
> 		totlen += len;
> @@ -563,7 +564,6 @@ bnad_cq_setup_skb_frags(struct bna_rcb *rcb, struct sk_buff *skb,
>
> 	skb->len += totlen;
> 	skb->data_len += totlen;
> -	skb->truesize += totlen;
> }
>
> static inline void

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net] bna: page allocation during interrupts to use a mempool.
  2014-10-08  0:48             ` Eric Wheeler
@ 2014-10-08  1:28               ` Eric Dumazet
  2014-10-08 19:01                 ` Eric Wheeler
                                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Eric Dumazet @ 2014-10-08  1:28 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Shahed Shaikh, Stephen Hemminger, netdev, Rasesh Mody

On Tue, 2014-10-07 at 17:48 -0700, Eric Wheeler wrote:

> Just += unmap->vector.len still did not work (same backtrace), so I've 
> rebuilt with PAGE_SIZE<<2 and so far so good.  I'll let it run all night 
> and see if we get any problems.

Further inspection of the driver told me that unmap->vector.len should
be 16384 already (the same as PAGE_SIZE << 2).
(set at line 304, drivers/net/ethernet/brocade/bna/bnad.c)

So you might hit memory fragmentation issues.

Do you have CONFIG_COMPACTION=y in your .config ?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net] bna: page allocation during interrupts to use a mempool.
  2014-10-08  1:28               ` Eric Dumazet
@ 2014-10-08 19:01                 ` Eric Wheeler
  2014-10-08 19:04                   ` Eric Dumazet
  2014-10-08 19:07                 ` Eric Wheeler
  2014-10-08 21:28                 ` Rasesh Mody
  2 siblings, 1 reply; 16+ messages in thread
From: Eric Wheeler @ 2014-10-08 19:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Shahed Shaikh, Stephen Hemminger, netdev, Rasesh Mody

On Tue, 7 Oct 2014, Eric Dumazet wrote:

> On Tue, 2014-10-07 at 17:48 -0700, Eric Wheeler wrote:
>
>> Just += unmap->vector.len still did not work (same backtrace), so I've
>> rebuilt with PAGE_SIZE<<2 and so far so good.  I'll let it run all night
>> and see if we get any problems.
>
> Further inspection of the driver told me that unmap->vector.len should
> be 16384 already. (same than PAGE_SIZE << 2)
> (set at line 304, drivers/net/ethernet/brocade/bna/bnad.c)
>
> So you might hit memory fragmentation issues.
>
> Do you have CONFIG_COMPACTION=y in your .config ?

We're still having the backtrace.

Here's nstat -a :

IpInReceives                    305249739          0.0
IpInAddrErrors                  1100               0.0
IpInDelivers                    305232139          0.0
IpOutRequests                   256712718          0.0
IpOutNoRoutes                   197                0.0
IcmpInErrors                    10241              0.0
IcmpInCsumErrors                4200               0.0
IcmpInTimeExcds                 9753               0.0
IcmpInEchoReps                  247                0.0
IcmpInTimestamps                241                0.0
IcmpOutErrors                   10325              0.0
IcmpOutTimeExcds                9792               0.0
IcmpOutEchoReps                 286                0.0
IcmpOutTimestamps               247                0.0
IcmpMsgInType0                  241                0.0
IcmpMsgInType3                  9753               0.0
IcmpMsgInType8                  247                0.0
IcmpMsgOutType0                 247                0.0
IcmpMsgOutType3                 9792               0.0
IcmpMsgOutType8                 286                0.0
TcpActiveOpens                  9585               0.0
TcpPassiveOpens                 1233               0.0
TcpAttemptFails                 8119               0.0
TcpEstabResets                  165                0.0
TcpInSegs                       305269357          0.0
TcpOutSegs                      745563105          0.0
TcpRetransSegs                  9118               0.0
TcpOutRsts                      74110              0.0
UdpInDatagrams                  81957              0.0
UdpOutDatagrams                 141536             0.0
Ip6InReceives                   348168             0.0
Ip6InNoRoutes                   1                  0.0
Ip6InDelivers                   303971             0.0
Ip6OutRequests                  168344             0.0
Ip6InMcastPkts                  183943             0.0
Ip6OutMcastPkts                 7976               0.0
Ip6InOctets                     867026250          0.0
Ip6OutOctets                    843515112          0.0
Ip6InMcastOctets                23885658           0.0
Ip6OutMcastOctets               733912             0.0
Ip6InNoECTPkts                  348168             0.0
Icmp6InMsgs                     139747             0.0
Icmp6OutMsgs                    4120               0.0
Icmp6InGroupMembQueries         3306               0.0
Icmp6InRouterAdvertisements     135646             0.0
Icmp6InNeighborAdvertisements   795                0.0
Icmp6OutRouterSolicits          187                0.0
Icmp6OutNeighborSolicits        77                 0.0
Icmp6OutMLDv2Reports            3856               0.0
Icmp6InType130                  3306               0.0
Icmp6InType134                  135646             0.0
Icmp6InType136                  795                0.0
Icmp6OutType133                 187                0.0
Icmp6OutType135                 77                 0.0
Icmp6OutType143                 3856               0.0
TcpExtSyncookiesFailed          5                  0.0
TcpExtPruneCalled               3694               0.0
TcpExtTW                        882                0.0
TcpExtPAWSEstab                 3                  0.0
TcpExtDelayedACKs               974539             0.0
TcpExtDelayedACKLocked          21365              0.0
TcpExtDelayedACKLost            2071               0.0
TcpExtTCPPrequeued              84806172           0.0
TcpExtTCPDirectCopyFromBacklog  687333144          0.0
TcpExtTCPDirectCopyFromPrequeue 2733255078         0.0
TcpExtTCPPrequeueDropped        915                0.0
TcpExtTCPHPHits                 85972078           0.0
TcpExtTCPHPHitsToUser           790722             0.0
TcpExtTCPPureAcks               119419354          0.0
TcpExtTCPHPAcks                 72305521           0.0
TcpExtTCPRenoRecovery           36                 0.0
TcpExtTCPSackRecovery           431                0.0
TcpExtTCPFACKReorder            8                  0.0
TcpExtTCPSACKReorder            51                 0.0
TcpExtTCPTSReorder              41                 0.0
TcpExtTCPFullUndo               67                 0.0
TcpExtTCPPartialUndo            62                 0.0
TcpExtTCPDSACKUndo              176                0.0
TcpExtTCPLossUndo               11                 0.0
TcpExtTCPLostRetransmit         10                 0.0
TcpExtTCPSackFailures           6                  0.0
TcpExtTCPFastRetrans            715                0.0
TcpExtTCPForwardRetrans         411                0.0
TcpExtTCPSlowStartRetrans       495                0.0
TcpExtTCPTimeouts               3517               0.0
TcpExtTCPLossProbes             2143               0.0
TcpExtTCPLossProbeRecovery      908                0.0
TcpExtTCPRenoRecoveryFail       3                  0.0
TcpExtTCPSchedulerFailed        245                0.0
TcpExtTCPRcvCollapsed           21549              0.0
TcpExtTCPDSACKOldSent           1891               0.0
TcpExtTCPDSACKRecv              1662               0.0
TcpExtTCPAbortOnData            28                 0.0
TcpExtTCPAbortOnClose           242                0.0
TcpExtTCPAbortOnTimeout         18                 0.0
TcpExtTCPDSACKIgnoredNoUndo     928                0.0
TcpExtTCPSpuriousRTOs           5                  0.0
TcpExtTCPSackShifted            479                0.0
TcpExtTCPSackMerged             498                0.0
TcpExtTCPSackShiftFallback      2934               0.0
TcpExtIPReversePathFilter       9407               0.0
TcpExtTCPRetransFail            582                0.0
TcpExtTCPRcvCoalesce            23709443           0.0
TcpExtTCPOFOQueue               19162              0.0
TcpExtTCPChallengeACK           8098               0.0
TcpExtTCPSpuriousRtxHostQueues  2932               0.0
TcpExtTCPAutoCorking            56870637           0.0
IpExtInMcastPkts                87288              0.0
IpExtOutMcastPkts               146507             0.0
IpExtInBcastPkts                16682              0.0
IpExtInOctets                   54665312000        0.0
IpExtOutOctets                  3365913113553      0.0
IpExtInMcastOctets              2765660            0.0
IpExtOutMcastOctets             4688240            0.0
IpExtInBcastOctets              3651661            0.0
IpExtInNoECTPkts                313364477          0.0





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net] bna: page allocation during interrupts to use a mempool.
  2014-10-08 19:01                 ` Eric Wheeler
@ 2014-10-08 19:04                   ` Eric Dumazet
  2014-10-09  0:46                     ` Eric Wheeler
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2014-10-08 19:04 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Shahed Shaikh, Stephen Hemminger, netdev, Rasesh Mody

On Wed, 2014-10-08 at 12:01 -0700, Eric Wheeler wrote:
> On Tue, 7 Oct 2014, Eric Dumazet wrote:
> 
> > On Tue, 2014-10-07 at 17:48 -0700, Eric Wheeler wrote:
> >
> >> Just += unmap->vector.len still did not work (same backtrace), so I've
> >> rebuilt with PAGE_SIZE<<2 and so far so good.  I'll let it run all night
> >> and see if we get any problems.
> >
> > Further inspection of the driver told me that unmap->vector.len should
> > be 16384 already. (same as PAGE_SIZE << 2)
> > (set at line 304, drivers/net/ethernet/brocade/bna/bnad.c)
> >
> > So you might hit memory fragmentation issues.
> >
> > Do you have CONFIG_COMPACTION=y in your .config ?
> 
> We're still having the backtrace.

What is the output of 

free
cat /proc/sys/vm/min_free_kbytes
cat /proc/buddyinfo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net] bna: page allocation during interrupts to use a mempool.
  2014-10-08  1:28               ` Eric Dumazet
  2014-10-08 19:01                 ` Eric Wheeler
@ 2014-10-08 19:07                 ` Eric Wheeler
  2014-10-08 21:28                 ` Rasesh Mody
  2 siblings, 0 replies; 16+ messages in thread
From: Eric Wheeler @ 2014-10-08 19:07 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Shahed Shaikh, Stephen Hemminger, netdev, Rasesh Mody

On Tue, 7 Oct 2014, Eric Dumazet wrote:

> On Tue, 2014-10-07 at 17:48 -0700, Eric Wheeler wrote:
>
>> Just += unmap->vector.len still did not work (same backtrace), so I've
>> rebuilt with PAGE_SIZE<<2 and so far so good.  I'll let it run all night
>> and see if we get any problems.
>
> Further inspection of the driver told me that unmap->vector.len should
> be 16384 already. (same as PAGE_SIZE << 2)
> (set at line 304, drivers/net/ethernet/brocade/bna/bnad.c)
>
> So you might hit memory fragmentation issues.
>
> Do you have CONFIG_COMPACTION=y in your .config ?

yes.

-e
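
[Editor's note] The size-to-order arithmetic discussed above (a 16384-byte
buffer being the same as PAGE_SIZE << 2, hence the "order:2" in the failure
message) can be checked with a short sketch. It assumes 4 KiB pages, which is
not stated in the thread; order_for_size() is a hypothetical helper written
for illustration and is not the driver's or the kernel's code.

```python
# Assumption: 4 KiB pages (PAGE_SHIFT = 12); adjust for other page sizes.
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT


def order_for_size(size):
    """Return the smallest order such that (PAGE_SIZE << order) >= size.

    Hypothetical helper for illustration only; size must be positive.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # Number of whole pages needed, rounded up.
    pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT
    # Round the page count up to a power of two and take its log2.
    return (pages - 1).bit_length()


# The buffer length quoted in the thread.
vector_len = 16384
print("PAGE_SIZE << 2 =", PAGE_SIZE << 2)
print("order for", vector_len, "bytes =", order_for_size(vector_len))
```

Under that assumption each receive buffer needs four physically contiguous
pages, which is why fragmentation matters even when plenty of memory is free.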


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH net] bna: page allocation during interrupts to use a mempool.
  2014-10-08  1:28               ` Eric Dumazet
  2014-10-08 19:01                 ` Eric Wheeler
  2014-10-08 19:07                 ` Eric Wheeler
@ 2014-10-08 21:28                 ` Rasesh Mody
  2014-10-09  0:50                   ` Eric Wheeler
  2 siblings, 1 reply; 16+ messages in thread
From: Rasesh Mody @ 2014-10-08 21:28 UTC (permalink / raw)
  To: Eric Dumazet, Eric Wheeler; +Cc: Shahed Shaikh, Stephen Hemminger, netdev

Hi Eric Wheeler, Eric Dumazet,

We'll try to reproduce the issue in-house to find out more about the root cause of the failure and work on a possible solution.

Thanks,
-Rasesh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net] bna: page allocation during interrupts to use a mempool.
  2014-10-08 19:04                   ` Eric Dumazet
@ 2014-10-09  0:46                     ` Eric Wheeler
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Wheeler @ 2014-10-09  0:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Shahed Shaikh, Stephen Hemminger, netdev, Rasesh Mody

>>> Further inspection of the driver told me that unmap->vector.len should
>>> be 16384 already. (same as PAGE_SIZE << 2)
>>> (set at line 304, drivers/net/ethernet/brocade/bna/bnad.c)
>>>
>>> So you might hit memory fragmentation issues.
>>>
>>> Do you have CONFIG_COMPACTION=y in your .config ?
>>
>> We're still having the backtrace.
>
> What is the output of
>
> free
> cat /proc/sys/vm/min_free_kbytes
> cat /proc/buddyinfo


[root@hv2 ~]# cat /proc/sys/vm/min_free_kbytes
262144

[root@hv2 ~]# cat /proc/buddyinfo
Node 0, zone      DMA      1      2      1      1      1      1      1      0      1      1      3
Node 0, zone    DMA32   2221  28306    771   2964    148      9      4      1      0      0      1
Node 0, zone   Normal 956207  99272      0      0      0      0      0      0      0      0      1

[root@hv2 ~]# free
              total       used       free     shared    buffers     cached
Mem:      33051552   28058192    4993360          0    3058040     158224
-/+ buffers/cache:   24841928    8209624
Swap:      8388604      21540    8367064

[root@hv2 ~]# free -m
              total       used       free     shared    buffers     cached
Mem:         32276      27401       4875          0       2986        154
-/+ buffers/cache:      24260       8016
Swap:         8191         21       8170
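
[Editor's note] The /proc/buddyinfo output above can be read more easily with
a small script. Each column after the zone name is the number of free blocks
of order 0, 1, 2, ... (block size = page size << order). The sketch below
parses the pasted sample and counts the free blocks that could satisfy an
order-2 request in each zone. The function names are hypothetical, and the
4 KiB page size used for the free-memory estimate is an assumption.

```python
import re

# Sample copied from the /proc/buddyinfo output in this message.
SAMPLE = """\
Node 0, zone      DMA      1      2      1      1      1      1      1      0      1      1      3
Node 0, zone    DMA32   2221  28306    771   2964    148      9      4      1      0      0      1
Node 0, zone   Normal 956207  99272      0      0      0      0      0      0      0      0      1
"""

PAGE_KIB = 4  # assumption: 4 KiB pages


def parse_buddyinfo(text):
    """Map (node, zone) -> list of free block counts, indexed by order."""
    zones = {}
    for line in text.splitlines():
        m = re.match(r"Node\s+(\d+),\s+zone\s+(\S+)\s+(.*)", line)
        if m:
            counts = [int(n) for n in m.group(3).split()]
            zones[(int(m.group(1)), m.group(2))] = counts
    return zones


def free_blocks_at_least(counts, order):
    """Number of free blocks whose order is >= the requested order."""
    return sum(counts[order:])


def free_kib(counts):
    """Total free memory in the zone, in KiB."""
    return sum(n << o for o, n in enumerate(counts)) * PAGE_KIB


zones = parse_buddyinfo(SAMPLE)
for (node, zone), counts in zones.items():
    print(f"node {node} zone {zone}: "
          f"{free_blocks_at_least(counts, 2)} free blocks of order >= 2, "
          f"{free_kib(counts)} KiB free in total")
```

On this sample the Normal zone has a large amount of free memory but almost
all of it sits in order-0 and order-1 blocks, which is consistent with the
fragmentation suspected earlier in the thread. To run it against a live
system, pass the contents of /proc/buddyinfo instead of SAMPLE.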

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH net] bna: page allocation during interrupts to use a mempool.
  2014-10-08 21:28                 ` Rasesh Mody
@ 2014-10-09  0:50                   ` Eric Wheeler
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Wheeler @ 2014-10-09  0:50 UTC (permalink / raw)
  To: Rasesh Mody; +Cc: Eric Dumazet, Shahed Shaikh, Stephen Hemminger, netdev

On Wed, 8 Oct 2014, Rasesh Mody wrote:

> Hi Eric Wheeler, Eric Dumazet,
>
> We'll try to reproduce the issue in-house to find out more about the root
> cause of the failure and work on a possible solution.

Thanks Rasesh,

FYI, we're using the Ethernet part of this card (no FCoE use here); I believe
it is the 1020 version:

02:00.0 Fibre Channel: Brocade Communications Systems, Inc. 1010/1020/1007/1741 10Gbps CNA (rev 01)
02:00.1 Fibre Channel: Brocade Communications Systems, Inc. 1010/1020/1007/1741 10Gbps CNA (rev 01)
02:00.2 Ethernet controller: Brocade Communications Systems, Inc. 1010/1020/1007/1741 10Gbps CNA (rev 01)
02:00.3 Ethernet controller: Brocade Communications Systems, Inc. 1010/1020/1007/1741 10Gbps CNA (rev 01)


02:00.2 Ethernet controller: Brocade Communications Systems, Inc. 1010/1020/1007/1741 10Gbps CNA (rev 01)
         Subsystem: Brocade Communications Systems, Inc. 1010/1020/1007/1741 10Gbps CNA - LL
         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
         Latency: 0, Cache Line Size: 64 bytes
         Interrupt: pin A routed to IRQ 19
         Region 0: Memory at df540000 (64-bit, non-prefetchable) [size=256K]
         Region 2: Memory at df604000 (64-bit, non-prefetchable) [size=16K]
         Expansion ROM at df200000 [disabled] [size=1M]
         Capabilities: [40] Power Management version 3
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
         Capabilities: [50] MSI-X: Enable+ Count=256 Masked-
                 Vector table: BAR=2 offset=00000000
                 PBA: BAR=2 offset=00002000
         Capabilities: [60] Express (v2) Endpoint, MSI 00
                 DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <256ns, L1 <64us
                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                         MaxPayload 128 bytes, MaxReadReq 512 bytes
                 DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                 LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Latency L0 <256ns, L1 <64us
                         ClockPM- Surprise- LLActRep- BwNot-
                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                 LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                 DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
         Capabilities: [a0] Vital Product Data
                 No end tag found
         Capabilities: [100 v1] Advanced Error Reporting
                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                 CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                 AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
         Capabilities: [180 v1] Power Budgeting <?>
         Capabilities: [190 v1] Alternative Routing-ID Interpretation (ARI)
                 ARICap: MFVC- ACS-, Next Function: 3
                 ARICtl: MFVC- ACS-, Function Group: 0
         Kernel driver in use: bna
         Kernel modules: bna


--
Eric Wheeler, President           eWheeler, Inc. dba Global Linux Security
888-LINUX26 (888-546-8926)        Fax: 503-716-3878           PO Box 25107
www.GlobalLinuxSecurity.pro       Linux since 1996!     Portland, OR 97298


>
> Thanks,
> -Rasesh

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-10-09  0:34 UTC | newest]

Thread overview: 16+ messages
2014-09-27 23:44 bna alloc_pages() order 2 failure in bnad.c bnad_rxq_refill_page() Eric Wheeler
2014-09-29 16:28 ` Stephen Hemminger
2014-09-30  5:23   ` Shahed Shaikh
2014-10-07  1:57     ` [PATCH net] bna: page allocation during interrupts to use a mempool Eric Wheeler
2014-10-07  1:52       ` Eric Dumazet
2014-10-07  3:05         ` Eric Wheeler
2014-10-07  3:15           ` Eric Dumazet
2014-10-08  0:48             ` Eric Wheeler
2014-10-08  1:28               ` Eric Dumazet
2014-10-08 19:01                 ` Eric Wheeler
2014-10-08 19:04                   ` Eric Dumazet
2014-10-09  0:46                     ` Eric Wheeler
2014-10-08 19:07                 ` Eric Wheeler
2014-10-08 21:28                 ` Rasesh Mody
2014-10-09  0:50                   ` Eric Wheeler
2014-10-07  4:04           ` Eric Dumazet
