Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure

netdev.vger.kernel.org archive mirror

* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
       [not found] <bug-16083-10286@https.bugzilla.kernel.org/>
@ 2010-06-03 20:02 ` Andrew Morton
  2010-06-03 21:13   ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2010-06-03 20:02 UTC
  To: David S. Miller; +Cc: netdev

On Mon, 31 May 2010 15:55:12 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=16083
> 
>            Summary: swapper: Page allocation failure
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 2.6.34
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: akpm@linux-foundation.org
>         ReportedBy: sgunderson@bigfoot.com
>         Regression: No
> 
> 
> Hi,
> 
> Since upgrading from a Q9450 to 2xE5520 (and upgrading from 2.6.34-rc-something
> to 2.6.34), I've started seeing these:
> 
> [605882.372418] swapper: page allocation failure. order:2, mode:0x4020
> [605882.378981] Pid: 0, comm: swapper Not tainted 2.6.34 #1
> [605882.384617] Call Trace:
> [605882.387499]  <IRQ>  [<ffffffff81096d5a>] __alloc_pages_nodemask+0x5b0/0x629
> [605882.395068]  [<ffffffff81096de5>] __get_free_pages+0x12/0x4f
> [605882.401103]  [<ffffffff810bdeb4>] __kmalloc_track_caller+0x4c/0x156
> [605882.407817]  [<ffffffff81245986>] ? sock_alloc_send_pskb+0xdd/0x32d
> [605882.414556]  [<ffffffff8124a515>] __alloc_skb+0x66/0x15b

I wonder if we should switch __alloc_skb() over to __GFP_NOWARN. 
People keep on reporting events such as the above, and nobody's
getting any value from this.

Downsides:

- the change would tend to deprive MM developers of prompt "hey you
  broke it again" notifications.

- if a system is getting enough allocation failures to impact
  throughput, the operators won't *know* that it's happening, and so
  they won't make the changes necessary to reduce the frequency of
  memory allocation failures.

If these are likely to be a problem, perhaps networking could provide
some other form of "hey, you keep on running out of memory"
notification, if it doesn't already do so.

Thoughts?



* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
  2010-06-03 20:02 ` [Bugme-new] [Bug 16083] New: swapper: Page allocation failure Andrew Morton
@ 2010-06-03 21:13   ` Eric Dumazet
  2010-06-03 21:37     ` Eric Dumazet
  2010-06-03 21:39     ` Andrew Morton
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2010-06-03 21:13 UTC
  To: Andrew Morton; +Cc: David S. Miller, netdev

On Thursday 03 June 2010 at 13:02 -0700, Andrew Morton wrote:
> I wonder if we should switch __alloc_skb() over to __GFP_NOWARN. 
> People keep on reporting events such as the above, and nobody's
> getting any value from this.
> 

Then we could use __GFP_NOWARN for all allocations in the kernel. Why
is networking so special?

> Downsides:
> 
> - the change would tend to deprive MM developers of prompt "hey you
>   broke it again" notifications.
> 
> - if a system is getting enough allocation failures to impact
>   throughput, the operators won't *know* that it's happening, and so
>   they won't make the changes necessary to reduce the frequency of
>   memory allocation failures.
> 

We should have SNMP counter increments 

> If these are likely to be a problem, perhaps networking could provide
> some other form of "hey, you keep on running out of memory"
> notification, if it doesn't already do so.
> 
> Thoughts?
> 

Order-2 ATOMIC allocations?

skb = mld_newpack(dev, dev->mtu);

Let's face it: this cannot work in the long term.

MTU=9000 on a system with 4K pages... Oh well...

maybe net/ipv6/mcast.c should cap dev->mtu to PAGE_SIZE-128 or
something, so that order-0 allocations are done.





* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
  2010-06-03 21:13   ` Eric Dumazet
@ 2010-06-03 21:37     ` Eric Dumazet
  2010-06-03 22:01       ` Andrew Morton
  2010-06-05 10:04       ` David Miller
  2010-06-03 21:39     ` Andrew Morton
  1 sibling, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2010-06-03 21:37 UTC
  To: Andrew Morton; +Cc: David S. Miller, netdev

On Thursday 03 June 2010 at 23:13 +0200, Eric Dumazet wrote:

> MTU=9000 on a system with 4K pages... Oh well...
> 
> maybe net/ipv6/mcast.c should cap dev->mtu to PAGE_SIZE-128 or
> something, so that order-0 allocations are done.
> 
> 

Something like this patch (completely untested):

[PATCH] ipv6: avoid high order allocations

With mtu=9000, mld_newpack() uses order-2 GFP_ATOMIC allocations, which
are very unreliable on machines where PAGE_SIZE=4K.

Limit allocated skbs to at most one page (order-0 allocations).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv6/mcast.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 59f1881..3484794 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1356,7 +1356,10 @@ static struct sk_buff *mld_newpack(struct net_device *dev, int size)
 		     IPV6_TLV_PADN, 0 };
 
 	/* we assume size > sizeof(ra) here */
-	skb = sock_alloc_send_skb(sk, size + LL_ALLOCATED_SPACE(dev), 1, &err);
+	size += LL_ALLOCATED_SPACE(dev);
+	/* limit our allocations to order-0 page */
+	size = min(size, SKB_MAX_ORDER(0, 0));
+	skb = sock_alloc_send_skb(sk, size, 1, &err);
 
 	if (!skb)
 		return NULL;




* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
  2010-06-03 21:13   ` Eric Dumazet
  2010-06-03 21:37     ` Eric Dumazet
@ 2010-06-03 21:39     ` Andrew Morton
  2010-06-03 21:58       ` Eric Dumazet
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2010-06-03 21:39 UTC
  To: Eric Dumazet; +Cc: David S. Miller, netdev

On Thu, 03 Jun 2010 23:13:23 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thursday 03 June 2010 at 13:02 -0700, Andrew Morton wrote:
> > I wonder if we should switch __alloc_skb() over to __GFP_NOWARN. 
> > People keep on reporting events such as the above, and nobody's
> > getting any value from this.
> > 
> 
> Then we could use __GFP_NOWARN for all allocations in the kernel. Why
> is networking so special?

Because this failure is known and is expected to occur sometimes and we
know that networking knows how to recover from it.

This removes most of the value from the warning.  The warning's there
to tell us about potentially buggy code, and to tell us why an
immediately-following oops happened.  Not applicable with alloc_skb()!

I mean, it's just not telling us anything very useful and it's alarming
users and is consuming effort.

> > Downsides:
> > 
> > - the change would tend to deprive MM developers of prompt "hey you
> >   broke it again" notifications.
> > 
> > - if a system is getting enough allocation failures to impact
> >   throughput, the operators won't *know* that it's happening, and so
> >   they won't make the changes necessary to reduce the frequency of
> >   memory allocation failures.
> > 
> 
> We should have SNMP counter increments 

I was thinking maybe a rate-limited printk every minute or so "12 skb
allocation failures since ...".  Dunno.

One of the problems with the current warning is that it looks like an oops.
In fact reporters regularly _call_ it "an oops".  Something less alarming
and more specific would be more helpful here.

> > If these are likely to be a problem, perhaps networking could provide
> > some other form of "hey, you keep on running out of memory"
> > notification, if it doesn't already do so.
> > 
> > Thoughts?
> > 
> 
> Order-2 ATOMIC allocations?
> 
> skb = mld_newpack(dev, dev->mtu);
> 
> Let's face it: this cannot work in the long term.
> 
> MTU=9000 on a system with 4K pages... Oh well...
> 
> maybe net/ipv6/mcast.c should cap dev->mtu to PAGE_SIZE-128 or
> something, so that order-0 allocations are done.

Well.  The presence of this warning does serve to remind us how sucky
e1000[e] is :(

I'm not particularly fussed either way - I'm just wondering if you guys
think this thing meets the noise-to-benefit test...



* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
  2010-06-03 21:39     ` Andrew Morton
@ 2010-06-03 21:58       ` Eric Dumazet
  2010-06-03 22:11         ` Andrew Morton
  2010-06-05  9:59         ` David Miller
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2010-06-03 21:58 UTC
  To: Andrew Morton; +Cc: David S. Miller, netdev

On Thursday 03 June 2010 at 14:39 -0700, Andrew Morton wrote:

> Well.  The presence of this warning does serve to remind us how sucky
> e1000[e] is :(
> 
> I'm not particularly fussed either way - I'm just wondering if you guys
> think this thing meets the noise-to-benefit test...
> 

Well, in this particular case, I think it's a genuine bug in the ipv6
code, not in the e1000[e] driver :)

It allocates "a priori" dev->mtu sized skbs that the caller fills with
maybe one hundred bytes.

With MTU=9000, this means order-2 allocations. In an ideal world, it
would be fine, but in practice, we know only fools can trust high order
allocations.

Since the code is prepared to chain skbs, just make sure we cap allocations
to smaller units (up to 0xe80 bytes on a 64-bit kernel).

So in this particular case, the bugzilla report can point to a real
problem in our stack.

Had failed allocations been silent, we probably would never have
noticed.

Hmm...





* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
  2010-06-03 21:37     ` Eric Dumazet
@ 2010-06-03 22:01       ` Andrew Morton
  2010-06-05 10:04       ` David Miller
  1 sibling, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2010-06-03 22:01 UTC
  To: Eric Dumazet; +Cc: David S. Miller, netdev

On Thu, 03 Jun 2010 23:37:16 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thursday 03 June 2010 at 23:13 +0200, Eric Dumazet wrote:
> 
> > MTU=9000 on a system with 4K pages... Oh well...
> > 
> > maybe net/ipv6/mcast.c should cap dev->mtu to PAGE_SIZE-128 or
> > something, so that order-0 allocations are done.
> > 
> > 
> 
> Something like this patch (completely untested):
> 
> [PATCH] ipv6: avoid high order allocations
> 
> With mtu=9000, mld_newpack() uses order-2 GFP_ATOMIC allocations, which
> are very unreliable on machines where PAGE_SIZE=4K.
> 
> Limit allocated skbs to at most one page (order-0 allocations).
> 

Maybe - I wouldn't know how desirable that is from the
impact-on-efficiency POV.  But I think most failures I've seen are for
regular old TCP/IPv4.  Often with e1000, which does larger-than-needed
allocations for (iirc) weird alignment requirements.

> ---
>  net/ipv6/mcast.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
> index 59f1881..3484794 100644
> --- a/net/ipv6/mcast.c
> +++ b/net/ipv6/mcast.c
> @@ -1356,7 +1356,10 @@ static struct sk_buff *mld_newpack(struct net_device *dev, int size)
>  		     IPV6_TLV_PADN, 0 };
>  
>  	/* we assume size > sizeof(ra) here */
> -	skb = sock_alloc_send_skb(sk, size + LL_ALLOCATED_SPACE(dev), 1, &err);
> +	size += LL_ALLOCATED_SPACE(dev);
> +	/* limit our allocations to order-0 page */
> +	size = min(size, SKB_MAX_ORDER(0, 0));
> +	skb = sock_alloc_send_skb(sk, size, 1, &err);
>  
>  	if (!skb)
>  		return NULL;

An alternative which retains any performance benefit from the order-2
allocation would be:

	p = alloc_pages(__GFP_NOWARN|..., 2);
	if (!p)
		p = alloc_pages(..., 0);

if you see what I mean.

This would also fix any retry/timeout-related stalls which people might
experience when the order-2 allocation fails, so it might make things
better in general.




* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
  2010-06-03 21:58       ` Eric Dumazet
@ 2010-06-03 22:11         ` Andrew Morton
  2010-06-05  9:59         ` David Miller
  1 sibling, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2010-06-03 22:11 UTC
  To: Eric Dumazet; +Cc: David S. Miller, netdev

On Thu, 03 Jun 2010 23:58:02 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thursday 03 June 2010 at 14:39 -0700, Andrew Morton wrote:
> 
> > Well.  The presence of this warning does serve to remind us how sucky
> > e1000[e] is :(
> > 
> > I'm not particularly fussed either way - I'm just wondering if you guys
> > think this thing meets the noise-to-benefit test...
> > 
> 
> Well, in this particular case, I think it's a genuine bug in the ipv6
> code, not in the e1000[e] driver :)
> 
> It allocates "a priori" dev->mtu sized skbs that the caller fills with
> maybe one hundred bytes.
> 
> With MTU=9000, this means order-2 allocations. In an ideal world, it
> would be fine, but in practice, we know only fools can trust high order
> allocations.
> 
> Since the code is prepared to chain skbs, just make sure we cap allocations
> to smaller units (up to 0xe80 bytes on a 64-bit kernel).
> 
> So in this particular case, the bugzilla report can point to a real
> problem in our stack.
> 
> Had failed allocations been silent, we probably would never have
> noticed.
> 
> Hmm...
> 

heh, in that case I guess we should leave it there.

But I do tend to ignore such reports, or fob them off with some
boilerplate.  It was pure luck that the one I chose as an example
happened to be interesting.



* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
  2010-06-03 21:58       ` Eric Dumazet
  2010-06-03 22:11         ` Andrew Morton
@ 2010-06-05  9:59         ` David Miller
  1 sibling, 0 replies; 9+ messages in thread
From: David Miller @ 2010-06-05  9:59 UTC
  To: eric.dumazet; +Cc: akpm, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 03 Jun 2010 23:58:02 +0200

> So in this particular case, the bugzilla report can point to a real
> problem in our stack.
> 
> Had failed allocations been silent, we probably would never have
> noticed.

Agreed.

Instead, we'd be scratching our heads over complaints about IGMP/MLD
reports not going out.


* Re: [Bugme-new] [Bug 16083] New: swapper: Page allocation failure
  2010-06-03 21:37     ` Eric Dumazet
  2010-06-03 22:01       ` Andrew Morton
@ 2010-06-05 10:04       ` David Miller
  1 sibling, 0 replies; 9+ messages in thread
From: David Miller @ 2010-06-05 10:04 UTC
  To: eric.dumazet; +Cc: akpm, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 03 Jun 2010 23:37:16 +0200

> [PATCH] ipv6: avoid high order allocations
> 
> With mtu=9000, mld_newpack() uses order-2 GFP_ATOMIC allocations, which
> are very unreliable on machines where PAGE_SIZE=4K.
> 
> Limit allocated skbs to at most one page (order-0 allocations).
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

This looks perfectly fine to me, except I had to change
min(...) to min_t(int, ...) to avoid a compile warning.

Thanks Eric!

