linux-kernel.vger.kernel.org archive mirror
* Vanilla-Kernel 3 - page allocation failure
@ 2011-10-18 10:25 Philipp Herz - Profihost AG
  2011-10-18 11:32 ` Thadeu Lima de Souza Cascardo
  2011-10-18 15:51 ` Andi Kleen
  0 siblings, 2 replies; 23+ messages in thread
From: Philipp Herz - Profihost AG @ 2011-10-18 10:25 UTC (permalink / raw)
  To: linux-kernel

After updating the kernel (x86_64) to stable version 3, a few
messages appear in the kernel log, such as

kworker/0:1: page allocation failure: order:1, mode:0x20
mysql: page allocation failure: order:1, mode:0x20
php5: page allocation failure: order:1, mode:0x20

Searching the net showed that these messages have been known to occur
since 2004.

Some people were able to get rid of them by setting 
/proc/sys/vm/min_free_kbytes to a high enough value. This does not help 
in our case.


Is there a kernel command-line argument to avoid these messages?

According to mm/page_alloc.c, these messages are only warnings and
would not appear if __GFP_NOWARN was set in 'gfp_mask' in the
function warn_alloc_failed.

How does this mask get set? Is it set by the "external" process knocking 
at the memory manager?

What is the magic behind the 'order' and 'mode'?

I'm not a subscriber, so please CC me a copy of messages related to the 
subject. I'm not sure if I can help much by looking at the inside of the 
kernel, but I will try my best to answer any questions concerning this 
issue.

Best regards, Philipp


* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-18 10:25 Vanilla-Kernel 3 - page allocation failure Philipp Herz - Profihost AG
@ 2011-10-18 11:32 ` Thadeu Lima de Souza Cascardo
  2011-10-18 12:07   ` Philipp Herz - Profihost AG
  2011-10-18 15:51 ` Andi Kleen
  1 sibling, 1 reply; 23+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2011-10-18 11:32 UTC (permalink / raw)
  To: Philipp Herz - Profihost AG; +Cc: linux-kernel

On Tue, Oct 18, 2011 at 12:25:03PM +0200, Philipp Herz - Profihost AG wrote:
> After updating kernel (x86_64) to stable version 3 there are a few
> messages appearing in the kernel log such as
> 
> kworker/0:1: page allocation failure: order:1, mode:0x20
> mysql: page allocation failure: order:1, mode:0x20
> php5: page allocation failure: order:1, mode:0x20
> 
> Searching the net showed that these messages are known to occur since 2004.
> 
> Some people were able to get rid of them by setting
> /proc/sys/vm/min_free_kbytes to a high enough value. This does not
> help in our case.
> 
> 
> Is there a kernel command line argument to avoid these messages?
> 
> As of mm/page_alloc.c these messages are marked to be only warning
> messages and would not appear if 'gfp_mask' was set to __GFP_NOWARN
> in function warn_alloc_failed.
> 
> How does this mask get set? Is it set by the "external" process
> knocking at the memory manager?
> 

Hello, Philipp.

This happens when the kernel tries to allocate memory, sometimes in
response to a request from user space, but also in other contexts. For
example, an interrupt handler in a network driver may try to allocate
memory. In that context, it will use GFP_ATOMIC as the mask. The most
common flags in the kernel are GFP_KERNEL and GFP_ATOMIC.

> What is the magic behind the 'order' and 'mode'?
> 

The order is the binary log of the number of pages requested. So, order 1
allocations are 2 pages, order 4 would be 16 pages, for example.
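
A minimal sketch of that arithmetic, assuming 4 KiB pages as on x86_64
(the function names are made up for this illustration and are not kernel
interfaces):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86_64


def order_to_pages(order: int) -> int:
    """Number of contiguous pages requested by an allocation of this order."""
    return 1 << order


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size in bytes of an allocation of this order."""
    return order_to_pages(order) * page_size


for order in (0, 1, 4):
    print(order, order_to_pages(order), order_to_bytes(order))
```

The order:1 in the messages above is therefore a request for two
physically contiguous pages.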

The mode is, in fact, the gfp flags. 0x20 is GFP_ATOMIC. This kind of
allocation cannot do IO or access the filesystem. Also, it cannot wait
for memory to be reclaimed from cache.

This warning is usually followed by some statistics about memory use
in your system. Please post it to give more information about this
situation.

I have seen some of this happen when lots of cache is used by some
filesystems. Perhaps some tweaking of the vm sysctl options may help,
but I can't point to any magic tweak right now.

Regards,
Cascardo.


* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-18 11:32 ` Thadeu Lima de Souza Cascardo
@ 2011-10-18 12:07   ` Philipp Herz - Profihost AG
  2011-10-18 12:38     ` Thadeu Lima de Souza Cascardo
  0 siblings, 1 reply; 23+ messages in thread
From: Philipp Herz - Profihost AG @ 2011-10-18 12:07 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo; +Cc: linux-kernel

Hello Cascardo,

thanks for your detailed answer!

I have uploaded two call traces to pastebin for further investigation.

Maybe this can help you.

* http://pastebin.com/Psg2dGYC (kworker)
* http://pastebin.com/pPFjZqxL (php5)

Regards,
Philipp



* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-18 12:07   ` Philipp Herz - Profihost AG
@ 2011-10-18 12:38     ` Thadeu Lima de Souza Cascardo
  2011-10-18 13:24       ` Philipp Herz - Profihost AG
  0 siblings, 1 reply; 23+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2011-10-18 12:38 UTC (permalink / raw)
  To: Philipp Herz - Profihost AG; +Cc: linux-kernel

On Tue, Oct 18, 2011 at 02:07:38PM +0200, Philipp Herz - Profihost AG wrote:
> Hello Cascardo,
> 
> thanks for your detailed answer!
> 
> I have uploaded two call traces to pastebin for further investigation.
> 
> Maybe this can help you.
> 
> * http://pastebin.com/Psg2dGYC (kworker)
> * http://pastebin.com/pPFjZqxL (php5)
> 
> Regards,
> Philipp
> 

Hello, Philipp.

That only tells us that you have a TCP workload in your system. This is
the subsystem that is trying to allocate memory. However, we do not know
why the allocation fails. Usually, after the stack dump, there are some
statistics about memory. I have seen that these may be suppressed if you
have a NUMA system with lots of nodes. Check NODES_SHIFT
(CONFIG_NODES_SHIFT) in your config. If it's greater than 8, that
output may have been suppressed.
But you may have just ignored the statistics because of the stack dump.
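
A hedged sketch of that check: it scans kernel config text for
CONFIG_NODES_SHIFT and compares it with the threshold of 8 mentioned
above. Where the config text comes from (for example a file under /boot)
is an assumption and is left to the caller; the function names are
hypothetical.

```python
import re


def nodes_shift(config_text):
    """Return the CONFIG_NODES_SHIFT value from config text, or None if absent."""
    m = re.search(r"^CONFIG_NODES_SHIFT=(\d+)$", config_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def stats_may_be_suppressed(config_text):
    """True if the shift is above 8, the threshold mentioned in this thread."""
    shift = nodes_shift(config_text)
    return shift is not None and shift > 8


# Sample config fragment made up for this illustration.
sample = "CONFIG_NUMA=y\nCONFIG_NODES_SHIFT=10\n"
print(nodes_shift(sample), stats_may_be_suppressed(sample))
```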

Regards,
Cascardo.


* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-18 12:38     ` Thadeu Lima de Souza Cascardo
@ 2011-10-18 13:24       ` Philipp Herz - Profihost AG
  2011-10-18 14:35         ` Thadeu Lima de Souza Cascardo
  0 siblings, 1 reply; 23+ messages in thread
From: Philipp Herz - Profihost AG @ 2011-10-18 13:24 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo; +Cc: linux-kernel

Hello Cascardo

 > Usually, after the stack dump, there is some
 > statistics about memory.
Yes, I have seen this in other posts as well.

 > I have seen that these may be suppressed
 > if you have a NUMA system with lots of nodes.
Yes, in our case it seems to be suppressed.

 > Check for NODE_SHIFT in your
 > config. If it's greater than 8, that output may have been suppressed.
CONFIG_NODES_SHIFT=10 will be the answer.

Is there any way to get those stats without recompiling the kernel?

 > But you may have just ignored the statistics because of the
 > stack dump.
No, I was also wondering why others do have these ;-)

Regards,
Philipp


* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-18 13:24       ` Philipp Herz - Profihost AG
@ 2011-10-18 14:35         ` Thadeu Lima de Souza Cascardo
  2011-10-19  6:45           ` Philipp Herz - Profihost AG
  0 siblings, 1 reply; 23+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2011-10-18 14:35 UTC (permalink / raw)
  To: Philipp Herz - Profihost AG; +Cc: linux-kernel

On Tue, Oct 18, 2011 at 03:24:44PM +0200, Philipp Herz - Profihost AG wrote:
> Hello Cascardo
> 
> > Usually, after the stack dump, there is some
> > statistics about memory.
> Yes, i have seen this in other posts as well.
> 
> > I have seen that these may be suppressed
> > if you have a NUMA system with lots of nodes.
> Yes, in our case it seems to be suppressed.
> 
> > Check for NODE_SHIFT in your
> > config. If it's greater than 8, that output may have been suppressed.
> CONFIG_NODES_SHIFT=10 will be the answer.
> 
> Is there any way to get those stats without recompiling the kernel?
> 
> > But you may have just ignored the statistics because of the
> > stack dump.
> No, i was also wondering why other do have these ;-)
> 
> Regards,
> Philipp
> 

echo m > /proc/sysrq-trigger

will show you that same output, but not at the time the allocation
failure happens. It may still show you the condition of memory on your
nodes.

I am not that well versed in the VM. It just happens that I had very
similar issues lately and was trying to understand it a little more. I
still have to solve these issues myself.

In my case, the workload is IO bound on extX filesystems and I see that
other systems have these failures due to this memory pressure. Usually,
after stopping the workload and unmounting the filesystems, I get most
of the memory in the system freed.

Most of the failures are from GFP_ATOMIC allocations, because those
won't reclaim memory, and they won't succeed if free memory is already
below the threshold. Setting this threshold to a lower value
like it was suggested (min_free_kbytes) would have helped, but, then,
this allows whatever is putting pressure on your memory to also allocate
below the threshold and you end up in the same situation (or a worse
one).
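
To illustrate why an order-1 atomic request can fail while plenty of
single pages are free, here is a simplified model of a per-zone watermark
check. It is a sketch only: it is loosely modeled on the kind of check
the page allocator does, it ignores the lowmem reserve, and the halving
and quartering of the threshold for atomic requests is an assumption of
this model rather than a statement about any particular kernel version.

```python
def watermark_ok(nr_free, order, mark, atomic=False):
    """Simplified zone watermark check (illustrative model, not kernel code).

    nr_free[o] is the number of free blocks of order o in the zone,
    mark is the watermark in pages.
    """
    free_pages = sum(count << o for o, count in enumerate(nr_free))
    free_pages -= (1 << order) - 1
    minimum = mark
    if atomic:
        # Assumption: atomic requests may dip deeper into the reserve.
        minimum -= minimum // 2
        minimum -= minimum // 4
    if free_pages <= minimum:
        return False
    for o in range(order):
        # Blocks smaller than the request cannot satisfy it.
        free_pages -= nr_free[o] << o
        # Require fewer free pages at each higher order.
        minimum >>= 1
        if free_pages <= minimum:
            return False
    return True


# 3000 free pages, all of them isolated order-0 blocks (made-up numbers).
fragmented = [3000] + [0] * 10
print(watermark_ok(fragmented, 0, 2000, atomic=True))
print(watermark_ok(fragmented, 1, 2000, atomic=True))
```

In this model the order-0 request passes and the order-1 request does
not, even though the total amount of free memory is the same.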

Does your workload work better on a previous version? I had problems
using something like 2.6.32.

Regards,
Cascardo.


* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-18 10:25 Vanilla-Kernel 3 - page allocation failure Philipp Herz - Profihost AG
  2011-10-18 11:32 ` Thadeu Lima de Souza Cascardo
@ 2011-10-18 15:51 ` Andi Kleen
  2011-10-18 17:02   ` Dave Jones
                     ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Andi Kleen @ 2011-10-18 15:51 UTC (permalink / raw)
  To: p.herz; +Cc: linux-kernel

Philipp Herz - Profihost AG <p.herz@profihost.ag> writes:

> After updating kernel (x86_64) to stable version 3 there are a few
> messages appearing in the kernel log such as
>
> kworker/0:1: page allocation failure: order:1, mode:0x20
> mysql: page allocation failure: order:1, mode:0x20
> php5: page allocation failure: order:1, mode:0x20

You just ran out of memory.

The problem here seems to be that the kernel is unable to communicate
in a language you can understand. 

How do you think the message should have been phrased to make the 
issue more clear?

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only


* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-18 15:51 ` Andi Kleen
@ 2011-10-18 17:02   ` Dave Jones
  2011-10-18 18:59     ` Andi Kleen
  2011-10-19  1:58   ` David Rientjes
  2011-10-20 21:11   ` Valdis.Kletnieks
  2 siblings, 1 reply; 23+ messages in thread
From: Dave Jones @ 2011-10-18 17:02 UTC (permalink / raw)
  To: Andi Kleen; +Cc: p.herz, linux-kernel

On Tue, Oct 18, 2011 at 08:51:54AM -0700, Andi Kleen wrote:
 > Philipp Herz - Profihost AG <p.herz@profihost.ag> writes:
 > 
 > > After updating kernel (x86_64) to stable version 3 there are a few
 > > messages appearing in the kernel log such as
 > >
 > > kworker/0:1: page allocation failure: order:1, mode:0x20
 > > mysql: page allocation failure: order:1, mode:0x20
 > > php5: page allocation failure: order:1, mode:0x20
 > 
 > You just ran out of memory.
 > 
 > The problem here seems to be that the kernel is unable to communicate
 > in a language you can understand. 
 > 
 > How do you think the message should have been phrased to make the 
 > issue more clear?

We get reports like this fairly regularly, usually accompanied by
"But I had lots of free memory and/or swap!"

The order/mode stuff is completely opaque to end-users, who have no
clue that there are different types of memory, and exhausting one type
can happen even when plenty of other memory is free.

I've been toying with the idea of hacking up a patch to turn those mode
flags into printing things like "mode:GFP_ATOMIC|GFP_NOIO" instead though, as I can
never remember those flags off the top of my head.
Still won't help end-users, but it would at least speed up diagnosing reports.
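
A user-space sketch of what such decoding could look like. The bit
values below are an assumption, recalled from the 3.0-era
include/linux/gfp.h; they differ between kernel versions, so check them
against the headers of the kernel in question. Only the low bits are
listed, and anything unknown is printed as hex.

```python
# Assumed 3.0-era bit values; verify against include/linux/gfp.h.
GFP_BITS = [
    (0x01, "__GFP_DMA"),
    (0x02, "__GFP_HIGHMEM"),
    (0x04, "__GFP_DMA32"),
    (0x08, "__GFP_MOVABLE"),
    (0x10, "__GFP_WAIT"),
    (0x20, "__GFP_HIGH"),
    (0x40, "__GFP_IO"),
    (0x80, "__GFP_FS"),
    (0x100, "__GFP_COLD"),
    (0x200, "__GFP_NOWARN"),
]


def decode_gfp(mode):
    """Turn a numeric gfp mask into a '|'-separated list of flag names."""
    names = [name for bit, name in GFP_BITS if mode & bit]
    known = sum(bit for bit, _ in GFP_BITS)
    rest = mode & ~known
    if rest:
        names.append(hex(rest))  # bits this table does not know about
    return "|".join(names) if names else "0"


print(decode_gfp(0x20))
print(decode_gfp(0xd0))
```

Under these assumed values, the mode:0x20 from the report decodes to
__GFP_HIGH alone, with none of the wait, IO or FS bits set.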

	Dave
 


* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-18 17:02   ` Dave Jones
@ 2011-10-18 18:59     ` Andi Kleen
  0 siblings, 0 replies; 23+ messages in thread
From: Andi Kleen @ 2011-10-18 18:59 UTC (permalink / raw)
  To: Dave Jones, Andi Kleen, p.herz, linux-kernel

> We get reports like this fairly regularly, usually accompanied by
> "But I had lots of free memory and/or swap!"

I think the backtrace is also really bad. It just makes it look like a crash.
It just cries "please report me", even though there's usually no good
reason for it.

I understand it can be useful sometimes for debugging, but most of the
time it is unnecessary and just confusing. One good thing probably
would be some heuristic to see when to print the backtrace, and don't
print it in common situations.

> 
> The order/mode stuff is completely opaque to end-users, who have no
> clue that there are different types of memory, and exhausting one type
> can happen even when plenty of other memory is free.

order should probably be replaced with a user readable size, agreed.

order:2 = "16 KB"

[note if anybody wants to reply now it should be "16 KiB", don't bother;
i'll ignore you]
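
A sketch of that rewording, done in user space on the existing log line
rather than in the kernel. It assumes 4 KB pages and the exact message
format quoted in this thread; the regular expression and function name
are made up for this illustration.

```python
import re

# Matches lines such as:
#   mysql: page allocation failure: order:1, mode:0x20
LINE_RE = re.compile(
    r"^(?P<task>.+?): page allocation failure: "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def readable(line, page_kb=4):
    """Rewrite the order field as a size in KB; return None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    size_kb = (1 << int(m.group("order"))) * page_kb
    return (f"{m.group('task')}: page allocation failure: "
            f"{size_kb} KB, mode:{m.group('mode')}")


print(readable("kworker/0:1: page allocation failure: order:1, mode:0x20"))
```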
> 
> I've been toying with the idea of hacking up a patch to turn those mode
> flags into printing things like "mode:GFP_ATOMIC|GFP_NOIO" instead though, as I can
> never remember those flags off the top of my head.
> Still won't help end-users, but it would at least speed up diagnosing reports.

Better decode it: "from interrupt handler", "inside a file system"

Unfortunately there's no flag for GFP_ATOMIC used outside an interrupt
handler, by code with broken locking abusing it. Perhaps there should be.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-18 15:51 ` Andi Kleen
  2011-10-18 17:02   ` Dave Jones
@ 2011-10-19  1:58   ` David Rientjes
  2011-10-24  6:33     ` Philipp Herz - Profihost AG
  2011-10-20 21:11   ` Valdis.Kletnieks
  2 siblings, 1 reply; 23+ messages in thread
From: David Rientjes @ 2011-10-19  1:58 UTC (permalink / raw)
  To: Andi Kleen; +Cc: p.herz, linux-kernel

On Tue, 18 Oct 2011, Andi Kleen wrote:

> Philipp Herz - Profihost AG <p.herz@profihost.ag> writes:
> 
> > After updating kernel (x86_64) to stable version 3 there are a few
> > messages appearing in the kernel log such as
> >
> > kworker/0:1: page allocation failure: order:1, mode:0x20
> > mysql: page allocation failure: order:1, mode:0x20
> > php5: page allocation failure: order:1, mode:0x20
> 
> You just ran out of memory.
> 

He ran out of order-1 physically contiguous memory and was unable to 
compact or reclaim because of the atomic context.
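
One way to look at contiguity from user space is /proc/buddyinfo, which
lists the number of free blocks per order for each zone. The sketch
below parses text in that format; the sample line is made up, and
reading the real file is left to the caller.

```python
def parse_buddyinfo(text):
    """Map (node, zone) to a list of free block counts indexed by order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: Node <n>, zone <name> <order 0> <order 1> ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(x) for x in parts[4:]]
    return zones


def free_blocks_at_least(counts, order):
    """Number of free blocks that can satisfy a request of this order."""
    return sum(counts[order:])


# Made-up sample: many free single pages, nothing contiguous.
sample = "Node 0, zone   Normal   2048      0      0      0      0      0"
counts = parse_buddyinfo(sample)[(0, "Normal")]
print(free_blocks_at_least(counts, 0), free_blocks_at_least(counts, 1))
```

A zone that looks like this sample has free memory, but no block that
an order-1 request could use without compaction or reclaim.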

Philipp, based on your pastes from another post, it's evident you're using 
CONFIG_SLAB and, unfortunately, it's not possible to change to single 
page allocations (which would only result in a page allocation failure if 
you were completely out of memory) without recompiling.

You have a couple options:

 - recompile with BREAK_GFP_ORDER_HI redefined to 0 in mm/slab.c, or

 - recompile with CONFIG_SLUB instead of CONFIG_SLAB.

It's very possible that neither of these will help, but it will tell you 
whether you need to go out and buy more RAM or not.  If you try to 
recompile with BREAK_GFP_ORDER_HI, these may turn into order-0 
allocations.  If you can't reboot, send the output of 
/proc/<pid>/net/protocols where <pid> is the pid of one of the above tasks 
(kworker, mysql, php5) when they are running and we'll know.

 [ Changing slab_break_gfp_order should really be a CONFIG_SLAB command-
   line option.  It can't be runtime because slab depends on the order for
   caches remaining constant, but we can certainly change it on boot. ]

If you try CONFIG_SLUB instead of CONFIG_SLAB, you can pass 
slub_max_order=0 on the command line and see if it helps.


* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-18 14:35         ` Thadeu Lima de Souza Cascardo
@ 2011-10-19  6:45           ` Philipp Herz - Profihost AG
  0 siblings, 0 replies; 23+ messages in thread
From: Philipp Herz - Profihost AG @ 2011-10-19  6:45 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo; +Cc: linux-kernel

Hello Cascardo,

 > echo m > /proc/sysrq-trigger
Thanks,
I have pasted another Call Trace including memory stats at

* http://pastebin.com/vjLHuqtk

Not sure if memory stats are close enough to the call trace event.

If not, do we have to recompile the kernel to get call traces and memory 
stats at the same time?

 > Do your workload works better on a previous version? I had problems
 > using something like 2.6.32.
Yes,
the kernel version was 2.6.32.40 before, and those messages never
appeared.

Regards,
Philipp

Am 18.10.2011 16:35, schrieb Thadeu Lima de Souza Cascardo:
> On Tue, Oct 18, 2011 at 03:24:44PM +0200, Philipp Herz - Profihost AG wrote:
>> Hello Cascardo
>>
>>> Usually, after the stack dump, there is some
>>> statistics about memory.
>> Yes, i have seen this in other posts as well.
>>
>>> I have seen that these may be suppressed
>>> if you have a NUMA system with lots of nodes.
>> Yes, in our case it seems to be suppressed.
>>
>>> Check for NODE_SHIFT in your
>>> config. If it's greater than 8, that output may have been suppressed.
>> CONFIG_NODES_SHIFT=10 will be the answer.
>>
>> Is there any way to get those stats without recompiling the kernel?
>>
>>> But you may have just ignored the statistics because of the
>>> stack dump.
>> No, i was also wondering why other do have these ;-)
>>
>> Regards,
>> Philipp
>>
>
> echo m>  /proc/sysrq-trigger
>
> will show you that same output, but not at the time the memory failure
> happens. It may still show you what is the condition of memory on your
> nodes.
>
> I am not that much versed in the VM. It just happens that I had very
> similar issues lately and was trying to understand it a little more. I
> still have to solve these issues myself.
>
> In my case, the workload is IO bound on extX filesystems and I see that
> other systems have these failures due to this memory pressure. Usually,
> after stopping the workload and unmounting the filesystems, I get most
> of the memory in the system freed.
>
> Most of the failures are from GFP_ATOMIC allocations, because those
> won't reclaim memory, but they won't allocate if there is only freed
> memory below the threshold. Setting this threshold to a lower value
> like it was suggested (min_free_kbytes) would have helped, but, then,
> this allows whatever is putting pressure on your memory to also allocate
> below the threshold and you end up in the same situation (or a worse
> one).
>
> Do your workload works better on a previous version? I had problems
> using something like 2.6.32.
>
> Regards,
> Cascardo.
>
>> On 18.10.2011 14:38, Thadeu Lima de Souza Cascardo wrote:
>>> On Tue, Oct 18, 2011 at 02:07:38PM +0200, Philipp Herz - Profihost AG wrote:
>>>> Hello Cascardo,
>>>>
>>>> thanks for your detailed answer!
>>>>
>>>> I have uploaded two call traces to pastebin for further investigation.
>>>>
>>>> Maybe this can help you.
>>>>
>>>> * http://pastebin.com/Psg2dGYC (kworker)
>>>> * http://pastebin.com/pPFjZqxL (php5)
>>>>
>>>> Regards,
>>>> Philipp
>>>>
>>>
>>> Hello, Philipp.
>>>
>>> That only tells us that you have a TCP workload in your system. This is
>>> the subsystem that is trying to allocate memory. However, we do not know
>>> why there is failure. Usually, after the stack dump, there is some
>>> statistics about memory. I have seen that these may be suppressed if you
>>> have a NUMA system with lots of nodes. Check for NODE_SHIFT in your
>>> config. If it's greater than 8, that output may have been suppressed.
>>> But you may have just ignored the statistics because of the stack dump.
>>>
>>> Regards,
>>> Cascardo.
>>>
>>>>
>>>> On 18.10.2011 13:32, Thadeu Lima de Souza Cascardo wrote:
>>>>> On Tue, Oct 18, 2011 at 12:25:03PM +0200, Philipp Herz - Profihost AG wrote:
>>>>>> After updating kernel (x86_64) to stable version 3 there are a few
>>>>>> messages appearing in the kernel log such as
>>>>>>
>>>>>> kworker/0:1: page allocation failure: order:1, mode:0x20
>>>>>> mysql: page allocation failure: order:1, mode:0x20
>>>>>> php5: page allocation failure: order:1, mode:0x20
>>>>>>
>>>>>> Searching the net showed that these messages are known to occur since 2004.
>>>>>>
>>>>>> Some people were able to get rid of them by setting
>>>>>> /proc/sys/vm/min_free_kbytes to a high enough value. This does not
>>>>>> help in our case.
>>>>>>
>>>>>>
>>>>>> Is there a kernel command line argument to avoid these messages?
>>>>>>
>>>>>> As of mm/page_alloc.c these messages are marked to be only warning
>>>>>> messages and would not appear if 'gpf_mask' was set to __GFP_NOWARN
>>>>>> in function warn_alloc_failed.
>>>>>>
>>>>>> How does this mask get set? Is it set by the "external" process
>>>>>> knocking at the memory manager?
>>>>>>
>>>>>
>>>>> Hello, Philipp.
>>>>>
>>>>> This happens when kernel tries to allocate memory, sometimes in response
>>>>> to some request by the user space, but also in other contexts. For
>>>>> example, an interrupt by a network driver may try to allocate memory. In
>>>>> this context, it will use GFP_ATOMIC as a mask, for example. The most
>>>>> usual flags in the kernel are GFP_KERNEL and GFP_ATOMIC.
>>>>>
>>>>>> What is the magic behind the 'order' and 'mode'?
>>>>>>
>>>>>
>>>>> The order is the binary log of the number of pages requested. So, order 1
>>>>> allocations are 2 pages, order 4 would be 16 pages, for example.
>>>>>
>>>>> The mode is, in fact, gfp_flags. 0x20 is GFP_ATOMIC. This kind of
>>>>> allocation cannot do IO or access the filesystem. Also, it cannot wait
>>>>> for reclaim memory from cache.
>>>>>
>>>>> This warning is usually followed by some statistics about memory use
>>>>> in your system. Please post it to give more information about this
>>>>> situation.
>>>>>
>>>>> I have watched some of this happen when lots of cache is used by some
>>>>> filesystems. Perhaps, some tweaking of the vm sysctl options may help,
>>>>> but I can't point to any magic tweaking right now.
>>>>>
>>>>> Regards,
>>>>> Cascardo.
>>>>>
>>>>>> I'm not a subscriber, so please CC me a copy of messages related to
>>>>>> the subject. I'm not sure if I can help much by looking at the
>>>>>> inside of the kernel, but I will try my best to answer any questions
>>>>>> concerning this issue.
>>>>>>
>>>>>> Best regards, Philipp
>>>>>
>>>>
>>>
>>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-18 15:51 ` Andi Kleen
  2011-10-18 17:02   ` Dave Jones
  2011-10-19  1:58   ` David Rientjes
@ 2011-10-20 21:11   ` Valdis.Kletnieks
  2011-10-21  6:36     ` Philipp Herz - Profihost AG
  2 siblings, 1 reply; 23+ messages in thread
From: Valdis.Kletnieks @ 2011-10-20 21:11 UTC (permalink / raw)
  To: Andi Kleen; +Cc: p.herz, linux-kernel

On Tue, 18 Oct 2011 08:51:54 PDT, Andi Kleen said:
> Philipp Herz - Profihost AG <p.herz@profihost.ag> writes:
> 
> > After updating kernel (x86_64) to stable version 3 there are a few
> > messages appearing in the kernel log such as
> >
> > kworker/0:1: page allocation failure: order:1, mode:0x20
> > mysql: page allocation failure: order:1, mode:0x20
> > php5: page allocation failure: order:1, mode:0x20
> 
> You just ran out of memory.

I read it as "Why is this happening when the previous kernel didn't have
this issue?", which is a *much* more complicated question...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-20 21:11   ` Valdis.Kletnieks
@ 2011-10-21  6:36     ` Philipp Herz - Profihost AG
  0 siblings, 0 replies; 23+ messages in thread
From: Philipp Herz - Profihost AG @ 2011-10-21  6:36 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Andi Kleen, linux-kernel

> On Tue, 18 Oct 2011 08:51:54 PDT, Andi Kleen said:
>> Philipp Herz - Profihost AG<p.herz@profihost.ag>  writes:
>>
>>> After updating kernel (x86_64) to stable version 3 there are a few
>>> messages appearing in the kernel log such as
>>>
>>> kworker/0:1: page allocation failure: order:1, mode:0x20
>>> mysql: page allocation failure: order:1, mode:0x20
>>> php5: page allocation failure: order:1, mode:0x20
>>
>> You just ran out of memory.
>
> I read it as "Why is this happening when the previous kernel didn't have
> this issue?", which is a *much* more complicated question...

Exactly, that was the intention of my post.

It would be nice to know whether these messages require us to focus on
memory statistics.

Looking at our monitoring data, the system does have enough memory
available.

How are processes affected when they run into situations where the
kernel memory manager reports a "page allocation failure"?

Is it just a warning/debugging message that is new in the current kernel
version 3?

How can we resolve the situation and get rid of these messages? By

- just suppressing them,
- increasing the system's memory, or
- changing the kernel and/or the kernel runtime config?

Kind regards,
Philipp

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-19  1:58   ` David Rientjes
@ 2011-10-24  6:33     ` Philipp Herz - Profihost AG
  2011-10-24  7:03       ` Eric Dumazet
  2011-10-26 20:26       ` David Rientjes
  0 siblings, 2 replies; 23+ messages in thread
From: Philipp Herz - Profihost AG @ 2011-10-24  6:33 UTC (permalink / raw)
  To: David Rientjes; +Cc: Andi Kleen, linux-kernel


On 19.10.2011 03:58, David Rientjes wrote:
> On Tue, 18 Oct 2011, Andi Kleen wrote:
>
>> Philipp Herz - Profihost AG<p.herz@profihost.ag>  writes:
>>
>>> After updating kernel (x86_64) to stable version 3 there are a few
>>> messages appearing in the kernel log such as
>>>
>>> kworker/0:1: page allocation failure: order:1, mode:0x20
>>> mysql: page allocation failure: order:1, mode:0x20
>>> php5: page allocation failure: order:1, mode:0x20
>>
>> You just ran out of memory.
>>
>
> He ran out of order-1 physically contiguous memory and was unable to
> compact or reclaim because of the atomic context.
>
> Philipp, based on your pastes from another post, it's evident you're using
> CONFIG_SLAB and, unfortunately, it's not possible to change to single
> page allocations (which would only result in a page allocation failure if
> you were completely out of memory) without recompiling.
>
> You have a couple options:
>
>   - recompile with BREAK_GFP_ORDER_HI redefined to 0 in mm/slab.c, or
>
>   - recompile with CONFIG_SLUB instead of CONFIG_SLAB.
>
> It's very possible that neither of these will help, but it will tell you
> whether you need to go out and buy more RAM or not.  If you try to
> recompile with BREAK_GFP_ORDER_HI, these may turn into order-0
> allocations.  If you can't reboot, send the output of
> /proc/<pid>/net/protocols where<pid>  is the pid of one of the above tasks
> (kworker, mysql, php5) when they are running and we'll know.
>
>   [ Changing slab_break_gfp_order should really be a CONFIG_SLAB command-
>     line option.  It can't be runtime because slab depends on the order for
>     caches remaining constant, but we can certainly change it on boot. ]
>
> If you try CONFIG_SLUB instead of CONFIG_SLAB, you can pass
> slub_max_order=0 on the command line and see if it helps.

Hi David,

we have recompiled the kernel of one machine with CONFIG_SLUB instead of
CONFIG_SLAB, but it is showing a similar message.

Now it's showing a failure with "order:5, mode:0x4020".

Call trace can be found at:
* http://pastebin.com/uGJiwvG1
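
The order and mode fields in these messages can be decoded by hand. Below is a minimal Python sketch, assuming 4 KiB pages and gfp bit values as recalled from include/linux/gfp.h of the 3.0 era (the bits differ between kernel versions, so check your own tree). The names GFP_BITS, decode_mode and order_to_bytes are made up for this illustration.

```python
# Bit values as recalled from include/linux/gfp.h around Linux 3.0.
# They are an assumption here and differ between kernel versions.
GFP_BITS = {
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x200: "__GFP_NOWARN",
    0x4000: "__GFP_COMP",
}

PAGE_SIZE = 4096  # x86_64 base page size


def decode_mode(mode):
    """Return names of known bits set in mode, plus any unknown remainder in hex."""
    names = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    unknown = mode & ~sum(GFP_BITS)  # the keys are distinct bits, so sum == OR
    if unknown:
        names.append(hex(unknown))
    return names


def order_to_bytes(order):
    """Size in bytes of a physically contiguous block of 2**order pages."""
    return PAGE_SIZE << order


for order, mode in [(1, 0x20), (5, 0x4020)]:
    print(f"order:{order} -> {1 << order} pages, {order_to_bytes(order)} bytes; "
          f"mode:{mode:#x} -> {' | '.join(decode_mode(mode))}")
```

Under these assumptions, order:5 means 32 contiguous pages (128 KiB), and neither mode has the __GFP_NOWARN bit set, which is consistent with the warning being printed.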

Comparing with kernel 2.6.32 (mm/page_alloc.c), page allocation seems to be
handled the same way.

Do you have an idea why these (warning) messages never appear when running
2.6.32?

Regards,
Philipp

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-24  6:33     ` Philipp Herz - Profihost AG
@ 2011-10-24  7:03       ` Eric Dumazet
  2011-10-24  7:21         ` Philipp Herz - Profihost AG
  2011-10-26 20:26       ` David Rientjes
  1 sibling, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2011-10-24  7:03 UTC (permalink / raw)
  To: p.herz; +Cc: David Rientjes, Andi Kleen, linux-kernel

On Monday, 24 October 2011 at 08:33 +0200, Philipp Herz - Profihost AG
wrote:
> On 19.10.2011 03:58, David Rientjes wrote:
> > On Tue, 18 Oct 2011, Andi Kleen wrote:
> >
> >> Philipp Herz - Profihost AG<p.herz@profihost.ag>  writes:
> >>
> >>> After updating kernel (x86_64) to stable version 3 there are a few
> >>> messages appearing in the kernel log such as
> >>>
> >>> kworker/0:1: page allocation failure: order:1, mode:0x20
> >>> mysql: page allocation failure: order:1, mode:0x20
> >>> php5: page allocation failure: order:1, mode:0x20
> >>
> >> You just ran out of memory.
> >>
> >
> > He ran out of order-1 physically contiguous memory and was unable to
> > compact or reclaim because of the atomic context.
> >
> > Philipp, based on your pastes from another post, it's evident you're using
> > CONFIG_SLAB and, unfortunately, it's not possible to change to single
> > page allocations (which would only result in a page allocation failure if
> > you were completely out of memory) without recompiling.
> >
> > You have a couple options:
> >
> >   - recompile with BREAK_GFP_ORDER_HI redefined to 0 in mm/slab.c, or
> >
> >   - recompile with CONFIG_SLUB instead of CONFIG_SLAB.
> >
> > It's very possible that neither of these will help, but it will tell you
> > whether you need to go out and buy more RAM or not.  If you try to
> > recompile with BREAK_GFP_ORDER_HI, these may turn into order-0
> > allocations.  If you can't reboot, send the output of
> > /proc/<pid>/net/protocols where<pid>  is the pid of one of the above tasks
> > (kworker, mysql, php5) when they are running and we'll know.
> >
> >   [ Changing slab_break_gfp_order should really be a CONFIG_SLAB command-
> >     line option.  It can't be runtime because slab depends on the order for
> >     caches remaining constant, but we can certainly change it on boot. ]
> >
> > If you try CONFIG_SLUB instead of CONFIG_SLAB, you can pass
> > slub_max_order=0 on the command line and see if it helps.
> 
> Hi David,
> 
> we have recompiled the kernel of one machine with CONFIG_SLUB instead of 
> CONFIG_SLAB, but it is showing similar message.
> 
> Now it's showing failure at "order:5, mode:0x4020".
> 
> Call trace can be found at:
> * http://pastebin.com/uGJiwvG1
> 
> Comparing kernel 2.6.32 (mm/page_alloc.c) there seams to be the same way 
> of dealing with page allocation.
> 
> Do you have an idea why these (warning) messages do never appear running 
> 2.6.32?

Your tg3 has a firmware limitation, and some skbs using fragments have
to be reallocated using a single and contiguous area of memory.

The initial skb delivered by the TCP stack only uses order-0 pages, but the
reallocated one, being 64K, can be order-5.

You can avoid this with the following tuning:

ethtool -K eth0 sg off
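
The arithmetic behind "64K can be order-5" can be sketched in Python, assuming 4 KiB pages. A buffer of exactly 64 KiB fits in an order-4 block, so the order-5 request presumably comes from per-buffer overhead pushing the total just past 64 KiB. That reading, and the OVERHEAD value below, are assumptions for illustration and are not stated in the thread.

```python
PAGE_SIZE = 4096  # x86_64 base page size


def get_order(size):
    """Smallest order such that PAGE_SIZE << order >= size."""
    pages = -(-size // PAGE_SIZE)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


# Hypothetical per-buffer overhead in bytes; the value is illustrative only.
OVERHEAD = 512

print(get_order(4096))                  # 0: a single page
print(get_order(64 * 1024))             # 4: exactly 16 pages
print(get_order(64 * 1024 + OVERHEAD))  # 5: 17 pages round up to 32
```

Because the allocator only hands out power-of-two blocks, even a small overshoot doubles the contiguous memory required, from 64 KiB to 128 KiB.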




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-24  7:03       ` Eric Dumazet
@ 2011-10-24  7:21         ` Philipp Herz - Profihost AG
  2011-10-24  8:01           ` Eric Dumazet
  0 siblings, 1 reply; 23+ messages in thread
From: Philipp Herz - Profihost AG @ 2011-10-24  7:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Rientjes, Andi Kleen, s.priebe, linux-kernel

On 24.10.2011 09:03, Eric Dumazet wrote:
> On Monday, 24 October 2011 at 08:33 +0200, Philipp Herz - Profihost AG
> wrote:
>> On 19.10.2011 03:58, David Rientjes wrote:
>>> On Tue, 18 Oct 2011, Andi Kleen wrote:
>>>
>>>> Philipp Herz - Profihost AG<p.herz@profihost.ag>   writes:
>>>>
>>>>> After updating kernel (x86_64) to stable version 3 there are a few
>>>>> messages appearing in the kernel log such as
>>>>>
>>>>> kworker/0:1: page allocation failure: order:1, mode:0x20
>>>>> mysql: page allocation failure: order:1, mode:0x20
>>>>> php5: page allocation failure: order:1, mode:0x20
>>>>
>>>> You just ran out of memory.
>>>>
>>>
>>> He ran out of order-1 physically contiguous memory and was unable to
>>> compact or reclaim because of the atomic context.
>>>
>>> Philipp, based on your pastes from another post, it's evident you're using
>>> CONFIG_SLAB and, unfortunately, it's not possible to change to single
>>> page allocations (which would only result in a page allocation failure if
>>> you were completely out of memory) without recompiling.
>>>
>>> You have a couple options:
>>>
>>>    - recompile with BREAK_GFP_ORDER_HI redefined to 0 in mm/slab.c, or
>>>
>>>    - recompile with CONFIG_SLUB instead of CONFIG_SLAB.
>>>
>>> It's very possible that neither of these will help, but it will tell you
>>> whether you need to go out and buy more RAM or not.  If you try to
>>> recompile with BREAK_GFP_ORDER_HI, these may turn into order-0
>>> allocations.  If you can't reboot, send the output of
>>> /proc/<pid>/net/protocols where<pid>   is the pid of one of the above tasks
>>> (kworker, mysql, php5) when they are running and we'll know.
>>>
>>>    [ Changing slab_break_gfp_order should really be a CONFIG_SLAB command-
>>>      line option.  It can't be runtime because slab depends on the order for
>>>      caches remaining constant, but we can certainly change it on boot. ]
>>>
>>> If you try CONFIG_SLUB instead of CONFIG_SLAB, you can pass
>>> slub_max_order=0 on the command line and see if it helps.
>>
>> Hi David,
>>
>> we have recompiled the kernel of one machine with CONFIG_SLUB instead of
>> CONFIG_SLAB, but it is showing similar message.
>>
>> Now it's showing failure at "order:5, mode:0x4020".
>>
>> Call trace can be found at:
>> * http://pastebin.com/uGJiwvG1
>>
>> Comparing kernel 2.6.32 (mm/page_alloc.c) there seams to be the same way
>> of dealing with page allocation.
>>
>> Do you have an idea why these (warning) messages do never appear running
>> 2.6.32?
>
> Your tg3 has a firmware limitation, and some skbs using fragments have
> to be reallocated using a single and contiguous area of memory.
>
> Initial skb delivered by tcp stack only uses order-0 pages, but the
> reallocated one, being 64K, can be order-5
>
> You can avoid this by following tuning :
>
> ethtool -K eth0 sg off
>

ok,

does that mean that there was no firmware limitation with kernel 2.6.32 
or that the tg3 module has any "disable warnings" flag matching 
__GFP_NOWARN?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-24  7:21         ` Philipp Herz - Profihost AG
@ 2011-10-24  8:01           ` Eric Dumazet
  2011-10-24  8:19             ` Philipp Herz - Profihost AG
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2011-10-24  8:01 UTC (permalink / raw)
  To: p.herz; +Cc: David Rientjes, Andi Kleen, s.priebe, linux-kernel

On Monday, 24 October 2011 at 09:21 +0200, Philipp Herz - Profihost AG
wrote:

> does that mean that there was no firmware limitation with kernel 2.6.32 
> or that the tg3 module has any "disable warnings" flag matching 
> __GFP_NOWARN?
> 

There is no __GFP_NOWARN trick on tg3.

We tend to prefer being notified of a memory problem instead of
hiding it ...

By the way, apparently this driver drops the frame and doesn't increment
the tx_dropped device counter. A patch will follow.

Could you post your full dmesg ?



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-24  8:01           ` Eric Dumazet
@ 2011-10-24  8:19             ` Philipp Herz - Profihost AG
  2011-10-24  8:29               ` Eric Dumazet
  0 siblings, 1 reply; 23+ messages in thread
From: Philipp Herz - Profihost AG @ 2011-10-24  8:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Rientjes, Andi Kleen, s.priebe, linux-kernel

On 24.10.2011 10:01, Eric Dumazet wrote:
> On Monday, 24 October 2011 at 09:21 +0200, Philipp Herz - Profihost AG
> wrote:
>
>> does that mean that there was no firmware limitation with kernel 2.6.32
>> or that the tg3 module has any "disable warnings" flag matching
>> __GFP_NOWARN?
>>
>
> There is no __GFP_NOWARN trick on tg3.
>
> We tend to prefer to be notified of a memory problem, instead of
> hide ...
Yes,
that's exactly what we would like to understand in comparison to the
behavior of kernel 2.6.32:

why does this notification show up now when it never did before?

>
> By the way, apparently this driver drops the frame and doesnt increase
> tx_dropped device counter. A patch will follow.
fine

>
> Could you post your full dmesg ?
>
>
Currently I cannot provide any further information because the server has
been restarted.

What exactly are you looking for?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-24  8:19             ` Philipp Herz - Profihost AG
@ 2011-10-24  8:29               ` Eric Dumazet
  2011-10-24  8:36                 ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2011-10-24  8:29 UTC (permalink / raw)
  To: p.herz; +Cc: David Rientjes, Andi Kleen, s.priebe, linux-kernel

On Monday, 24 October 2011 at 10:19 +0200, Philipp Herz - Profihost AG
wrote:
> Currently i can not provide any further information, 'cause server has 
> been restarted.
> 
> What exactly are you looking for?

A dmesg for 2.6.32 is OK. I am looking for:

- the memory layout (tg3 has workarounds for 4G crossing), and
- the exact tg3 chip and tg3 messages.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-24  8:29               ` Eric Dumazet
@ 2011-10-24  8:36                 ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 23+ messages in thread
From: Stefan Priebe - Profihost AG @ 2011-10-24  8:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: p.herz, David Rientjes, Andi Kleen, linux-kernel

On 24.10.2011 10:29, Eric Dumazet wrote:
> On Monday, 24 October 2011 at 10:19 +0200, Philipp Herz - Profihost AG
> wrote:
>> Currently i can not provide any further information, 'cause server has
>> been restarted.
>>
>> What exactly are you looking for?
>
> A dmesg for 2.6.32 is OK, I look for :
>
> - memory layout (tg3 has workarounds for 4G crossing),
> - and exact tg3 chip, tg3 messages...

Here is a dmesg from 2.6.38 where this message also does not occur:
http://pastebin.com/raw.php?i=HjDEKVcp

Stefan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-24  6:33     ` Philipp Herz - Profihost AG
  2011-10-24  7:03       ` Eric Dumazet
@ 2011-10-26 20:26       ` David Rientjes
  2011-10-27  7:13         ` Philipp Herz - Profihost AG
  1 sibling, 1 reply; 23+ messages in thread
From: David Rientjes @ 2011-10-26 20:26 UTC (permalink / raw)
  To: Philipp Herz - Profihost AG; +Cc: Andi Kleen, linux-kernel

On Mon, 24 Oct 2011, Philipp Herz - Profihost AG wrote:

> we have recompiled the kernel of one machine with CONFIG_SLUB instead of
> CONFIG_SLAB, but it is showing similar message.
> 
> Now it's showing failure at "order:5, mode:0x4020".
> 
> Call trace can be found at:
> * http://pastebin.com/uGJiwvG1
> 
> Comparing kernel 2.6.32 (mm/page_alloc.c) there seams to be the same way of
> dealing with page allocation.
> 
> Do you have an idea why these (warning) messages do never appear running
> 2.6.32?
> 

Do you have CONFIG_COMPACTION enabled?  Perhaps the difference is the
deprecation of lumpy reclaim between 2.6.35 and 2.6.38, with
defragmentation being done by memory compaction instead.

It won't be triggered synchronously in this context since it's a 
GFP_ATOMIC allocation, which is why it emits a page allocation failure in 
the first place, but it will show whether defragmentation is the issue or 
you're just simply low on memory.
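
Whether CONFIG_COMPACTION and related options are enabled can be checked without rebuilding. The Python sketch below assumes the running kernel's config is available at /proc/config.gz or /boot/config-<release>; neither location is guaranteed, and parse_kconfig and read_running_config are hypothetical helper names.

```python
import gzip
import os


def parse_kconfig(text):
    """Map CONFIG_* names to their values; unset options are absent from the result."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def read_running_config():
    """Return the config text from the first readable candidate, or None."""
    candidates = ["/proc/config.gz", "/boot/config-" + os.uname().release]
    for path in candidates:
        opener = gzip.open if path.endswith(".gz") else open
        try:
            with opener(path, "rt") as f:
                return f.read()
        except OSError:
            continue
    return None


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("no kernel config found")
    else:
        cfg = parse_kconfig(text)
        for name in ("CONFIG_COMPACTION", "CONFIG_TRANSPARENT_HUGEPAGE",
                     "CONFIG_SLAB", "CONFIG_SLUB", "CONFIG_NODES_SHIFT"):
            print(name, "=", cfg.get(name, "not set"))
```

Lines of the form "# CONFIG_FOO is not set" are skipped, so a disabled option simply shows up as "not set" in the output.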

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-26 20:26       ` David Rientjes
@ 2011-10-27  7:13         ` Philipp Herz - Profihost AG
  2011-10-27 20:08           ` David Rientjes
  0 siblings, 1 reply; 23+ messages in thread
From: Philipp Herz - Profihost AG @ 2011-10-27  7:13 UTC (permalink / raw)
  To: David Rientjes; +Cc: Andi Kleen, linux-kernel, s.priebe

On 26.10.2011 22:26, David Rientjes wrote:
> On Mon, 24 Oct 2011, Philipp Herz - Profihost AG wrote:
>
>> we have recompiled the kernel of one machine with CONFIG_SLUB instead of
>> CONFIG_SLAB, but it is showing similar message.
>>
>> Now it's showing failure at "order:5, mode:0x4020".
>>
>> Call trace can be found at:
>> * http://pastebin.com/uGJiwvG1
>>
>> Comparing kernel 2.6.32 (mm/page_alloc.c) there seams to be the same way of
>> dealing with page allocation.
>>
>> Do you have an idea why these (warning) messages do never appear running
>> 2.6.32?
>>
>
> Do you have CONFIG_COMPACTION enabled?  Perhaps this is a difference in
> the deprecation of lumpy reclaim between 2.6.35 and 2.6.38 and
> defragmentation being done by memory compaction instead.
Yes, CONFIG_COMPACTION is enabled as a dependency of
TRANSPARENT_HUGEPAGE, which differs from our earlier configs.

>
> It won't be triggered synchronously in this context since it's a
> GFP_ATOMIC allocation, which is why it emits a page allocation failure in
> the first place, but it will show whether defragmentation is the issue or
> you're just simply low on memory.
Do you mean that "memory compaction" should be turned off again?

How can I see the difference between a "defragmentation issue" and "low
memory"? I did not get this point.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Vanilla-Kernel 3 - page allocation failure
  2011-10-27  7:13         ` Philipp Herz - Profihost AG
@ 2011-10-27 20:08           ` David Rientjes
  0 siblings, 0 replies; 23+ messages in thread
From: David Rientjes @ 2011-10-27 20:08 UTC (permalink / raw)
  To: Philipp Herz - Profihost AG; +Cc: Andi Kleen, linux-kernel, s.priebe

On Thu, 27 Oct 2011, Philipp Herz - Profihost AG wrote:

> > It won't be triggered synchronously in this context since it's a
> > GFP_ATOMIC allocation, which is why it emits a page allocation failure in
> > the first place, but it will show whether defragmentation is the issue or
> > you're just simply low on memory.
> Do you mean that "memory compaction" should be turned off again?
> 

No, I mean the difference since 2.6.32 might be that lumpy reclaim was
deprecated, so it no longer leaves high-order pages free for allocations in
atomic context, where no balancing or migration can be done synchronously.

> How can I see the difference between "defragmentation issue" and "low memory"?
> I did not get this point.
> 

Look at unusable_index and extfrag_index in debugfs, if you've enabled and
mounted it.  You could also take a look at tweaking
/proc/sys/vm/extfrag_threshold, see Documentation/sysctl/vm.txt.
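
A small Python sketch of reading these files follows. It assumes debugfs is mounted at /sys/kernel/debug, that the files live under extfrag/ there, and that each line has the layout "Node N, zone NAME v0 v1 ..." with one value per order. The sample values are made up, and parse_frag_index is a hypothetical helper name.

```python
def parse_frag_index(text):
    """Parse unusable_index/extfrag_index text into {(node, zone): [value per order]}."""
    result = {}
    for line in text.splitlines():
        parts = line.replace(",", " ").split()
        # Expected shape: Node <n> zone <name> <v0> <v1> ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        result[(int(parts[1]), parts[3])] = [float(v) for v in parts[4:]]
    return result


# Made-up sample in the assumed layout; real data would be read from, e.g.,
# /sys/kernel/debug/extfrag/unusable_index (path is an assumption, needs root).
SAMPLE = """\
Node 0, zone      DMA 0.000 0.000 0.000 0.001 0.005 0.013
Node 0, zone   Normal 0.000 0.105 0.342 0.671 0.894 1.000
"""

for (node, zone), values in parse_frag_index(SAMPLE).items():
    print(f"node {node} {zone}: order-1 {values[1]:.3f}, order-5 {values[5]:.3f}")
```

As a rough reading, an unusable_index close to 1 for a given order while plenty of memory is still free points at fragmentation rather than a plain lack of memory.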

^ permalink raw reply	[flat|nested] 23+ messages in thread

Thread overview: 23+ messages
2011-10-18 10:25 Vanilla-Kernel 3 - page allocation failure Philipp Herz - Profihost AG
2011-10-18 11:32 ` Thadeu Lima de Souza Cascardo
2011-10-18 12:07   ` Philipp Herz - Profihost AG
2011-10-18 12:38     ` Thadeu Lima de Souza Cascardo
2011-10-18 13:24       ` Philipp Herz - Profihost AG
2011-10-18 14:35         ` Thadeu Lima de Souza Cascardo
2011-10-19  6:45           ` Philipp Herz - Profihost AG
2011-10-18 15:51 ` Andi Kleen
2011-10-18 17:02   ` Dave Jones
2011-10-18 18:59     ` Andi Kleen
2011-10-19  1:58   ` David Rientjes
2011-10-24  6:33     ` Philipp Herz - Profihost AG
2011-10-24  7:03       ` Eric Dumazet
2011-10-24  7:21         ` Philipp Herz - Profihost AG
2011-10-24  8:01           ` Eric Dumazet
2011-10-24  8:19             ` Philipp Herz - Profihost AG
2011-10-24  8:29               ` Eric Dumazet
2011-10-24  8:36                 ` Stefan Priebe - Profihost AG
2011-10-26 20:26       ` David Rientjes
2011-10-27  7:13         ` Philipp Herz - Profihost AG
2011-10-27 20:08           ` David Rientjes
2011-10-20 21:11   ` Valdis.Kletnieks
2011-10-21  6:36     ` Philipp Herz - Profihost AG
