Re: [Bugme-new] [Bug 13084] New: page allocation failure. order:0, mode:0x20


* Re: [Bugme-new] [Bug 13084] New: page allocation failure. order:0, mode:0x20
       [not found] <bug-13084-10286@http.bugzilla.kernel.org/>
@ 2009-04-13 19:42 ` Andrew Morton
  2009-04-13 20:06   ` Brandeburg, Jesse
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2009-04-13 19:42 UTC (permalink / raw)
  To: reeve.yang
  Cc: bugzilla-daemon, netdev, jeffrey.t.kirsher, jesse.brandeburg,
	bruce.w.allan, peter.p.waskiewicz.jr, john.ronciak

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Mon, 13 Apr 2009 19:27:27 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=13084
> 
>            Summary: page allocation failure. order:0, mode:0x20
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 2.6.17.4
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Page Allocator
>         AssignedTo: akpm@linux-foundation.org
>         ReportedBy: reeve.yang@gmail.com
>         Regression: No
> 
> 
> Created an attachment (id=20964)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=20964)
> kernel config file.
> 
> The system is an Intel Xeon quad-core with 8G of physical RAM. When it's under
> UDP load, e.g., DNS queries, the box gets stuck: it cannot be pinged or logged
> into. Checking syslog, I see the following traceback from various
> daemons/processes. The network controller is an E1000 82571 with NAPI enabled
> in the kernel.
>
> page allocation failure. order:0, mode:0x20

This is very common.  e1000 attempts to do large memory allocations
from within interrupt context and the page allocator cannot satisfy the
allocation and is not allowed to do the necessary work to make the
allocation attempt succeed.  It's the same with all net drivers, but
e1000 is especially prone, apparently because of hardware suckiness.
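
For readers decoding the "order:0, mode:0x20" line, a minimal sketch follows. The flag values are recalled from include/linux/gfp.h in 2.6-era kernels and are an assumption here, as is the 4 KiB page size; decode_gfp and order_to_bytes are hypothetical helper names. Under those assumptions, 0x20 is __GFP_HIGH alone (GFP_ATOMIC in that era): without __GFP_WAIT the allocator may not sleep or reclaim, which matches the "not allowed to do the necessary work" description above. Order 0 is a single page.

```python
# Flag values as recalled from include/linux/gfp.h in 2.6-era kernels.
# They are an assumption here; check the header of the kernel you run.
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",   # allocator may sleep and reclaim
    0x20: "__GFP_HIGH",   # may dip into emergency reserves
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

PAGE_SIZE = 4096  # assumed: x86 with 4 KiB pages


def decode_gfp(mode):
    """Return the names of the known flags set in a gfp mode mask."""
    names = [name for bit, name in sorted(GFP_FLAGS.items()) if mode & bit]
    # Report any bits this table does not know about as a raw hex value.
    unknown = mode & ~sum(GFP_FLAGS)
    if unknown:
        names.append(hex(unknown))
    return names


def order_to_bytes(order, page_size=PAGE_SIZE):
    """An order-N request asks for 2**N physically contiguous pages."""
    return (1 << order) * page_size


print(decode_gfp(0x20))
print(order_to_bytes(0))
```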

However the networking stack should just drop the packet and the system
will recover.

Your report is unclear.  Yes, the machine wedges up under the UDP load. 
But does it recover when the other machine stops spraying UDP packets
at this machine?  It _should_ recover.  If it does not, we have a bug
somewhere.

The usual workaround for these problems is to increase the value in
/proc/sys/vm/min_free_kbytes.
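
The workaround above can be sketched as a small script. This is an illustration only: the helper names are hypothetical, any target value is an assumption rather than a recommendation from this thread, and writing to the file requires root. The path argument exists so the functions can be tried against an ordinary file first.

```python
from pathlib import Path

# Real tunable named in the thread; writing to it requires root.
MIN_FREE_KBYTES = Path("/proc/sys/vm/min_free_kbytes")


def read_min_free_kbytes(path=MIN_FREE_KBYTES):
    """Return the current reserve in kB."""
    return int(Path(path).read_text().strip())


def raise_min_free_kbytes(target_kb, path=MIN_FREE_KBYTES):
    """Raise the reserve to target_kb if it is currently lower.

    Never lowers the value. Returns the value in effect afterwards.
    """
    current = read_min_free_kbytes(path)
    if target_kb <= current:
        return current
    Path(path).write_text(f"{target_kb}\n")
    return read_min_free_kbytes(path)
```

Calling raise_min_free_kbytes(16384) as root would, for example, set a 16 MB reserve if the current one is smaller. The change does not survive a reboot unless it is also set through the system's sysctl configuration.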

2.6.17 is fairly old.  If we need to do additional work on this report
then we'll be asking you to test something more recent - ideally
2.6.29.

Thanks.


* Re: [Bugme-new] [Bug 13084] New: page allocation failure. order:0, mode:0x20
  2009-04-13 19:42 ` [Bugme-new] [Bug 13084] New: page allocation failure. order:0, mode:0x20 Andrew Morton
@ 2009-04-13 20:06   ` Brandeburg, Jesse
  2009-04-13 20:11     ` Reeve Yang
  2009-04-13 20:41     ` David Miller
  0 siblings, 2 replies; 4+ messages in thread
From: Brandeburg, Jesse @ 2009-04-13 20:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: reeve.yang@gmail.com, bugzilla-daemon@bugzilla.kernel.org,
	netdev@vger.kernel.org, Kirsher, Jeffrey T, Allan, Bruce W,
	Waskiewicz Jr, Peter P, Ronciak, John

On Mon, 13 Apr 2009, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Mon, 13 Apr 2009 19:27:27 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=13084
> > 
> >            Summary: page allocation failure. order:0, mode:0x20
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 2.6.17.4
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: high
> >           Priority: P1
> >          Component: Page Allocator
> >         AssignedTo: akpm@linux-foundation.org
> >         ReportedBy: reeve.yang@gmail.com
> >         Regression: No
> > 
> > 
> > Created an attachment (id=20964)
> >  --> (http://bugzilla.kernel.org/attachment.cgi?id=20964)
> > kernel config file.
> > 
> > The system is an Intel Xeon quad-core with 8G of physical RAM. When it's under
> > UDP load, e.g., DNS queries, the box gets stuck: it cannot be pinged or logged
> > into. Checking syslog, I see the following traceback from various
> > daemons/processes. The network controller is an E1000 82571 with NAPI enabled
> > in the kernel.
> >
> > page allocation failure. order:0, mode:0x20
> 
> This is very common.  e1000 attempts to do large memory allocations
> from within interrupt context and the page allocator cannot satisfy the
> allocation and is not allowed to do the necessary work to make the
> allocation attempt succeed.  It's the same with all net drivers, but
> e1000 is especially prone, apparently because of hardware suckiness.

In jumbo mode, Andrew's statement is true, but order:0 allocation 
failures are just normal networking load causing the memory allocator 
to run out of free pages. This seems much less frequent in newer 
kernels.
 
> However the networking stack should just drop the packet and the system
> will recover.

I think at that point the kernel gets quite busy printing warnings about 
how much it is out of memory.

> Your report is unclear.  Yes, the machine wedges up under the UDP load. 
> But does it recover when the other machine stops spraying UDP packets
> at this machine?  It _should_ recover.  If it does not, we have a bug
> somewhere.

In this case kmem_cache_alloc, called by the route_dst code, is failing 
to get memory. Maybe someone on netdev can comment on whether this has 
been fixed along the way.
 
> The usual workaround for these problems is to increase the value in
> /proc/sys/vm/min_free_kbytes.

This should help a lot, in my experience.

> 2.6.17 is fairly old.  If we need to do additional work on this report
> then we'll be asking you to test something more recent - ideally
> 2.6.29.

If you must run 2.6.17, then you might want to try the e1000e driver (*not 
e1000*) from sourceforge for your 82571.

Otherwise I will also soon be asking you to try a newer kernel.


* Re: [Bugme-new] [Bug 13084] New: page allocation failure. order:0, mode:0x20
  2009-04-13 20:06   ` Brandeburg, Jesse
@ 2009-04-13 20:11     ` Reeve Yang
  2009-04-13 20:41     ` David Miller
  1 sibling, 0 replies; 4+ messages in thread
From: Reeve Yang @ 2009-04-13 20:11 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: Andrew Morton, bugzilla-daemon@bugzilla.kernel.org,
	netdev@vger.kernel.org, Kirsher, Jeffrey T, Allan, Bruce W,
	Waskiewicz Jr, Peter P, Ronciak, John

Here is the memory snapshot taken while the problem is happening:

MemTotal:      8307844 kB
MemFree:       6091208 kB
Buffers:          6524 kB
Cached:        1121528 kB
SwapCached:          0 kB
Active:        1361052 kB
Inactive:        25784 kB
HighTotal:     7470464 kB
HighFree:      6083688 kB
LowTotal:       837380 kB
LowFree:          7520 kB
SwapTotal:     2047992 kB
SwapFree:      2047992 kB
Dirty:          744488 kB
Writeback:           0 kB
Mapped:         285532 kB
Slab:           797500 kB
CommitLimit:   6201912 kB
Committed_AS:   459788 kB
PageTables:       3532 kB
VmallocTotal:   118776 kB
VmallocUsed:      2432 kB
VmallocChunk:   116084 kB

You can see I have lots of physical RAM available. LowFree is
dropping at about 10 MB per second.
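
The arithmetic behind this snapshot can be sketched as follows. It is an illustration under stated assumptions: the "10M" drain rate is read as 10 MiB per second, slab memory is assumed to sit in low memory (as it does on 32-bit highmem kernels), and parse_meminfo and lowmem_summary are hypothetical helper names. Only the fields needed for the calculation are copied from the snapshot above.

```python
# Fields copied from the /proc/meminfo snapshot in this message.
SNAPSHOT = """\
MemTotal:      8307844 kB
MemFree:       6091208 kB
HighTotal:     7470464 kB
HighFree:      6083688 kB
LowTotal:       837380 kB
LowFree:          7520 kB
Slab:           797500 kB
"""


def parse_meminfo(text):
    """Parse 'Name:   value kB' lines into a dict of integers (kB)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def lowmem_summary(fields, drain_kb_per_s=10 * 1024):
    """Summarize low-memory pressure; drain rate defaults to 10 MiB/s."""
    low_total = fields["LowTotal"]
    low_free = fields["LowFree"]
    return {
        "low_free_pct": 100.0 * low_free / low_total,
        "slab_pct_of_low": 100.0 * fields["Slab"] / low_total,
        "seconds_to_empty": low_free / drain_kb_per_s,
    }


info = parse_meminfo(SNAPSHOT)
for key, value in lowmem_summary(info).items():
    print(f"{key}: {value:.2f}")
```

The point of the calculation is that MemFree is dominated by high memory, while kernel allocations such as network buffers and slab objects must come from low memory, where less than 1% is free and slab accounts for roughly 95% of LowTotal.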



* Re: [Bugme-new] [Bug 13084] New: page allocation failure. order:0, mode:0x20
  2009-04-13 20:06   ` Brandeburg, Jesse
  2009-04-13 20:11     ` Reeve Yang
@ 2009-04-13 20:41     ` David Miller
  1 sibling, 0 replies; 4+ messages in thread
From: David Miller @ 2009-04-13 20:41 UTC (permalink / raw)
  To: jesse.brandeburg
  Cc: akpm, reeve.yang, bugzilla-daemon, netdev, jeffrey.t.kirsher,
	bruce.w.allan, peter.p.waskiewicz.jr, john.ronciak

From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
Date: Mon, 13 Apr 2009 13:06:04 -0700 (Pacific Daylight Time)

> On Mon, 13 Apr 2009, Andrew Morton wrote:
>> Your report is unclear.  Yes, the machine wedges up under the UDP load. 
>> But does it recover when the other machine stops spraying UDP packets
>> at this machine?  It _should_ recover.  If it does not, we have a bug
>> somewhere.
> 
> In this case kmem_cache_alloc, called by the route_dst code, is failing 
> to get memory. Maybe someone on netdev can comment on whether this has 
> been fixed along the way.

Although I have some level of tolerance, there is zero way I'm
going to analyze anything on 2.6.17 kernels nor am I going to
encourage other core networking developers to waste their time
on this either.

It is easily the case that we've fixed tens of thousands of
VM and networking bugs since then, if not more.


