netdev.vger.kernel.org archive mirror
* order 7 allocations from xt_recent
@ 2013-01-03 16:43 Dave Jones
  2013-01-03 16:55 ` Eric Dumazet
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Jones @ 2013-01-03 16:43 UTC
  To: netdev; +Cc: h.reindl, Fedora Kernel Team

We had a report from a user that shows this code trying
to do enormous allocations, which isn't going to work too well..

iptables: page allocation failure: order:7, mode:0xc0d0
Pid: 2822, comm: iptables Not tainted 3.6.10-2.fc17.x86_64 #1
Call Trace:
 [<ffffffff8113130b>] warn_alloc_failed+0xeb/0x150
 [<ffffffff81616576>] ? __alloc_pages_direct_compact+0x17e/0x190
 [<ffffffff81135196>] __alloc_pages_nodemask+0x736/0x990
 [<ffffffff811710e0>] alloc_pages_current+0xb0/0x120
 [<ffffffff8113022a>] __get_free_pages+0x2a/0x80
 [<ffffffff811786d9>] kmalloc_order_trace+0x39/0xb0
 [<ffffffff8117ae3a>] __kmalloc+0x16a/0x1a0
 [<ffffffff8118aa7c>] ? mem_cgroup_bad_page_check+0x1c/0x30
 [<ffffffff81134563>] ? get_page_from_freelist+0x453/0x950
 [<ffffffffa007696e>] recent_mt_check.isra.6+0x16e/0x2c0 [xt_recent]
 [<ffffffffa0076b4b>] recent_mt_check_v0+0x6b/0xa0 [xt_recent]
 [<ffffffff8153fdda>] xt_check_match+0xaa/0x1e0
 [<ffffffff8153f3ab>] ? xt_find_match+0x11b/0x130
 [<ffffffff8153f3ab>] ? xt_find_match+0x11b/0x130
 [<ffffffff8159257c>] check_match+0x3c/0x50
 [<ffffffff81593ccb>] translate_table+0x39b/0x5b0
 [<ffffffff815956f3>] do_ipt_set_ctl+0x133/0x200
 [<ffffffff8153e10b>] nf_setsockopt+0x6b/0x90
 [<ffffffff8161f236>] ? _raw_spin_lock_bh+0x16/0x40
 [<ffffffff8154e41f>] ip_setsockopt+0x8f/0xa0
 [<ffffffff8156f49d>] raw_setsockopt+0x1d/0x30
 [<ffffffff814fcf14>] sock_common_setsockopt+0x14/0x20
 [<ffffffff814fc23c>] sys_setsockopt+0x7c/0xe0
 [<ffffffff816270e9>] system_call_fastpath+0x16/0x1b

which looks like it's this..

        t = kzalloc(sizeof(*t) + sizeof(t->iphash[0]) * ip_list_hash_size,
                    GFP_KERNEL);

Which is initialised thus..

        ip_list_hash_size = 1 << fls(ip_list_tot);

And ip_list_tot is 10000 in this case. Hmm ?
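
The arithmetic behind the order:7 request can be sketched in userspace C. This is only an illustration: fls_sketch() stands in for the kernel's fls(), and the 16-byte bucket size (sizeof(struct list_head) on x86_64), the 4096-byte page size and the header size passed by the caller are assumptions that the thread does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's fls(): 1-based position of the
 * most significant set bit, or 0 when x is 0. */
static int fls_sketch(unsigned int x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Mirrors "ip_list_hash_size = 1 << fls(ip_list_tot)" from the thread. */
static unsigned int hash_size_for(unsigned int ip_list_tot)
{
	/* Keep the shift below the width of unsigned int. */
	assert(ip_list_tot < (1u << 31));
	return 1u << fls_sketch(ip_list_tot);
}

/* Smallest page order whose block (page_size << order) can hold `bytes`. */
static int order_for(size_t bytes, size_t page_size)
{
	size_t block = page_size;
	int order = 0;

	assert(page_size > 0);
	while (block < bytes) {
		block <<= 1;
		order++;
	}
	return order;
}

/* ASSUMPTIONS (not stated in the thread): 16-byte buckets, 4096-byte
 * pages, and `header_bytes` as a placeholder for sizeof(*t). */
static int table_alloc_order(unsigned int ip_list_tot, size_t header_bytes)
{
	size_t bytes = header_bytes + (size_t)16 * hash_size_for(ip_list_tot);

	return order_for(bytes, 4096);
}
```

With ip_list_tot = 10000, fls_sketch() returns 14, so the table gets 16384 buckets, which is exactly 256 kB at 16 bytes each. Any non-empty header pushes the request past 256 kB, so table_alloc_order(10000, 64) returns 7 (the 64 is a made-up header size), which matches the order:7 in the log.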


Complete report and setup described in his bug report at https://bugzilla.redhat.com/show_bug.cgi?id=890715

	Dave


* Re: order 7 allocations from xt_recent
  2013-01-03 16:43 order 7 allocations from xt_recent Dave Jones
@ 2013-01-03 16:55 ` Eric Dumazet
  2013-01-03 17:11   ` Dave Jones
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2013-01-03 16:55 UTC
  To: Dave Jones; +Cc: netdev, h.reindl, Fedora Kernel Team

On Thu, 2013-01-03 at 11:43 -0500, Dave Jones wrote:
> We had a report from a user that shows this code trying
> to do enormous allocations, which isn't going to work too well..
> 
>  ...
> 
> which looks like it's this..
> 
>         t = kzalloc(sizeof(*t) + sizeof(t->iphash[0]) * ip_list_hash_size,
>                     GFP_KERNEL);
> 
> Which is initialised thus..
> 
>         ip_list_hash_size = 1 << fls(ip_list_tot);
> 
> And ip_list_tot is 10000 in this case. Hmm ?
> 
> 
> Complete report and setup described in his bug report at https://bugzilla.redhat.com/show_bug.cgi?id=890715
> 
> 	Dave
> 

Yes, we had a report and a patch :

http://comments.gmane.org/gmane.linux.network/248216

I'll send it in a more formal way.

Thanks


* Re: order 7 allocations from xt_recent
  2013-01-03 16:55 ` Eric Dumazet
@ 2013-01-03 17:11   ` Dave Jones
  2013-01-03 17:26     ` Dave Jones
  2013-01-03 18:02     ` Eric Dumazet
  0 siblings, 2 replies; 7+ messages in thread
From: Dave Jones @ 2013-01-03 17:11 UTC
  To: Eric Dumazet; +Cc: netdev, h.reindl, Fedora Kernel Team

On Thu, Jan 03, 2013 at 08:55:04AM -0800, Eric Dumazet wrote:
 > On Thu, 2013-01-03 at 11:43 -0500, Dave Jones wrote:
 > > We had a report from a user that shows this code trying
 > > to do enormous allocations, which isn't going to work too well..
 > >  ...
 > > Which is initialised thus..
 > > 
 > >         ip_list_hash_size = 1 << fls(ip_list_tot);
 > > 
 > > And ip_list_tot is 10000 in this case. Hmm ?
 > > 
 > > Complete report and setup described in his bug report at https://bugzilla.redhat.com/show_bug.cgi?id=890715
 > 
 > Yes, we had a report and a patch :
 > 
 > http://comments.gmane.org/gmane.linux.network/248216
 > 
 > I'll send it in a more formal way.

Ah! Excellent.

That 'check size and vmalloc/kmalloc accordingly' thing seems to be a pattern
that comes up time and time again.  Is it worth maybe making a more generic
version of that instead of open-coding it each time it comes up ?
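
For readers unfamiliar with the pattern, here is a minimal userspace sketch of "check size and pick the allocator". It is not the patch referenced above. All names are hypothetical, both stand-in allocators simply call calloc(), and the one-page threshold is an assumption; in kernel code the two branches would be a kmalloc-family call and a vmalloc-family call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u	/* ASSUMPTION: 4kB pages */

enum backing { BACKING_NONE, BACKING_CONTIG, BACKING_VIRT };

/* Hypothetical wrapper that remembers which allocator was used. */
struct sized_buf {
	void *ptr;
	enum backing how;
};

/* Stand-ins for a physically contiguous and a virtually contiguous
 * zeroing allocator. Both are plain calloc() in this sketch. */
static void *contig_zalloc(size_t n) { return calloc(1, n); }
static void *virt_zalloc(size_t n) { return calloc(1, n); }

/* Small requests use the contiguous allocator; anything larger than a
 * page avoids high-order allocations by using the virtual one. */
static struct sized_buf sized_zalloc(size_t size)
{
	struct sized_buf b = { NULL, BACKING_NONE };

	assert(size > 0);
	if (size <= SKETCH_PAGE_SIZE) {
		b.ptr = contig_zalloc(size);
		if (b.ptr)
			b.how = BACKING_CONTIG;
	} else {
		b.ptr = virt_zalloc(size);
		if (b.ptr)
			b.how = BACKING_VIRT;
	}
	return b;
}

/* Both stand-ins use calloc(), so free() is correct here. Real kernel
 * code would have to call the free routine matching the allocator. */
static void sized_free(struct sized_buf *b)
{
	free(b->ptr);
	b->ptr = NULL;
	b->how = BACKING_NONE;
}
```

Under these assumptions a 128-byte request takes the contiguous branch, while a table of the size seen in this report (a little over 256 kB) takes the virtual branch and never asks for an order-7 block.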

thanks,

	Dave


* Re: order 7 allocations from xt_recent
  2013-01-03 17:11   ` Dave Jones
@ 2013-01-03 17:26     ` Dave Jones
  2013-01-03 18:00       ` Eric Dumazet
  2013-01-03 18:02     ` Eric Dumazet
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Jones @ 2013-01-03 17:26 UTC
  To: Eric Dumazet; +Cc: netdev, h.reindl, Fedora Kernel Team

On Thu, Jan 03, 2013 at 12:11:15PM -0500, Dave Jones wrote:
 > On Thu, Jan 03, 2013 at 08:55:04AM -0800, Eric Dumazet wrote:
 >  > On Thu, 2013-01-03 at 11:43 -0500, Dave Jones wrote:
 >  > > We had a report from a user that shows this code trying
 >  > > to do enormous allocations, which isn't going to work too well..
 >  > >  ...
 >  > > Which is initialised thus..
 >  > > 
 >  > >         ip_list_hash_size = 1 << fls(ip_list_tot);
 >  > > 
 >  > > And ip_list_tot is 10000 in this case. Hmm ?
 >  > > 
 >  > > Complete report and setup described in his bug report at https://bugzilla.redhat.com/show_bug.cgi?id=890715
 >  > 
 >  > Yes, we had a report and a patch :
 >  > 
 >  > http://comments.gmane.org/gmane.linux.network/248216
 >  > 
 >  > I'll send it in a more formal way.
 > 
 > Ah! Excellent.
 > 
 > That 'check size and vmalloc/kmalloc accordingly' thing seems to be a pattern
 > that comes up time and time again.  Is it worth maybe making a more generic
 > version of that instead of open-coding it each time it comes up ?

Something else that I'm puzzled by.

In the report above, it failed to allocate 512kb, but..

Node 0 Normal: 2388*4kB 347*8kB 1029*16kB 3512*32kB 29*64kB 2*128kB 1*256kB 5*512kB 1*1024kB 0*2048kB 0*4096kB = 147128kB
                                                                            ^^^^^^^^^^^^^^^^

Shouldn't the allocator have been able to satisfy that anyway ?
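
To put numbers on the free list quoted above, this small userspace sketch sums the per-order counts. The counts are copied from the "Node 0 Normal" line; the helper name is made up, 4kB base pages are assumed, and the code does not model the allocator's watermark or reserve checks, so it says nothing about why the request failed.

```c
#include <assert.h>
#include <stddef.h>

/* Free-block counts per order, copied from the "Node 0 Normal" line:
 * index 0 is 4kB blocks, index 7 is 512kB blocks, index 10 is 4096kB. */
static const unsigned long node0_normal[] = {
	2388, 347, 1029, 3512, 29, 2, 1, 5, 1, 0, 0
};
#define NODE0_ORDERS (sizeof(node0_normal) / sizeof(node0_normal[0]))

/* Free kB held in blocks of at least `min_order`, assuming 4kB pages. */
static unsigned long free_kb_from_order(const unsigned long *counts,
					size_t norders, size_t min_order)
{
	unsigned long kb = 0;
	size_t o;

	assert(counts != NULL);
	for (o = min_order; o < norders; o++)
		kb += counts[o] * (4ul << o);
	return kb;
}
```

Summing from order 0 reproduces the 147128kB total in the report, while summing from order 7 gives 3584kB, so only about 2.4% of the zone's free memory sits in blocks large enough for a 512kB request.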

	Dave


* Re: order 7 allocations from xt_recent
  2013-01-03 17:26     ` Dave Jones
@ 2013-01-03 18:00       ` Eric Dumazet
  2013-01-03 19:51         ` Reindl Harald
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2013-01-03 18:00 UTC
  To: Dave Jones; +Cc: netdev, h.reindl, Fedora Kernel Team

On Thu, 2013-01-03 at 12:26 -0500, Dave Jones wrote:
> On Thu, Jan 03, 2013 at 12:11:15PM -0500, Dave Jones wrote:
>  > On Thu, Jan 03, 2013 at 08:55:04AM -0800, Eric Dumazet wrote:
>  >  > On Thu, 2013-01-03 at 11:43 -0500, Dave Jones wrote:
>  >  > > We had a report from a user that shows this code trying
>  >  > > to do enormous allocations, which isn't going to work too well..
>  >  > >  ...
>  >  > > Which is initialised thus..
>  >  > > 
>  >  > >         ip_list_hash_size = 1 << fls(ip_list_tot);
>  >  > > 
>  >  > > And ip_list_tot is 10000 in this case. Hmm ?
>  >  > > 
>  >  > > Complete report and setup described in his bug report at https://bugzilla.redhat.com/show_bug.cgi?id=890715
>  >  > 
>  >  > Yes, we had a report and a patch :
>  >  > 
>  >  > http://comments.gmane.org/gmane.linux.network/248216
>  >  > 
>  >  > I'll send it in a more formal way.
>  > 
>  > Ah! Excellent.
>  > 
>  > That 'check size and vmalloc/kmalloc accordingly' thing seems to be a pattern
>  > that comes up time and time again.  Is it worth maybe making a more generic
>  > version of that instead of open-coding it each time it comes up ?
> 
> Something else that I'm puzzled by.
> 
> In the report above, it failed to allocate 512kb, but..
> 
> Node 0 Normal: 2388*4kB 347*8kB 1029*16kB 3512*32kB 29*64kB 2*128kB 1*256kB 5*512kB 1*1024kB 0*2048kB 0*4096kB = 147128kB
>                                                                             ^^^^^^^^^^^^^^^^
> 
> Shouldn't the allocator have been able to satisfy that anyway ?
> 
> 	Dave
> 

Might be something related to the CONFIG_COMPACTION=y and lumpy reclaim
removal ?

Anyway, we keep a fraction of memory for ATOMIC allocations.


* Re: order 7 allocations from xt_recent
  2013-01-03 17:11   ` Dave Jones
  2013-01-03 17:26     ` Dave Jones
@ 2013-01-03 18:02     ` Eric Dumazet
  1 sibling, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2013-01-03 18:02 UTC
  To: Dave Jones; +Cc: netdev, h.reindl, Fedora Kernel Team

On Thu, 2013-01-03 at 12:11 -0500, Dave Jones wrote:

> That 'check size and vmalloc/kmalloc accordingly' thing seems to be a pattern
> that comes up time and time again.  Is it worth maybe making a more generic
> version of that instead of open-coding it each time it comes up ?

We had numerous discussions and patch submissions in the past, and this
went nowhere. I am not sure anybody wants to spend more cycles on this.

It's sad, but true.


* Re: order 7 allocations from xt_recent
  2013-01-03 18:00       ` Eric Dumazet
@ 2013-01-03 19:51         ` Reindl Harald
  0 siblings, 0 replies; 7+ messages in thread
From: Reindl Harald @ 2013-01-03 19:51 UTC
  To: Eric Dumazet; +Cc: Dave Jones, netdev, Fedora Kernel Team

On 03.01.2013 19:00, Eric Dumazet wrote:
> On Thu, 2013-01-03 at 12:26 -0500, Dave Jones wrote:
>> On Thu, Jan 03, 2013 at 12:11:15PM -0500, Dave Jones wrote:
>>  > On Thu, Jan 03, 2013 at 08:55:04AM -0800, Eric Dumazet wrote:
>>  >  > On Thu, 2013-01-03 at 11:43 -0500, Dave Jones wrote:
>>  >  > > We had a report from a user that shows this code trying
>>  >  > > to do enormous allocations, which isn't going to work too well..
>>  >  > >  ...
>>  >  > > Which is initialised thus..
>>  >  > > 
>>  >  > >         ip_list_hash_size = 1 << fls(ip_list_tot);
>>  >  > > 
>>  >  > > And ip_list_tot is 10000 in this case. Hmm ?
>>  >  > > 
>>  >  > > Complete report and setup described in his bug report at https://bugzilla.redhat.com/show_bug.cgi?id=890715
>>  >  > 
>>  >  > Yes, we had a report and a patch :
>>  >  > 
>>  >  > http://comments.gmane.org/gmane.linux.network/248216
>>  >  > 
>>  >  > I'll send it in a more formal way.
>>  > 
>>  > Ah! Excellent.
>>  > 
>>  > That 'check size and vmalloc/kmalloc accordingly' thing seems to be a pattern
>>  > that comes up time and time again.  Is it worth maybe making a more generic
>>  > version of that instead of open-coding it each time it comes up ?
>>
>> Something else that I'm puzzled by.
>>
>> In the report above, it failed to allocate 512kb, but..
>>
>> Node 0 Normal: 2388*4kB 347*8kB 1029*16kB 3512*32kB 29*64kB 2*128kB 1*256kB 5*512kB 1*1024kB 0*2048kB 0*4096kB = 147128kB
>>                                                                             ^^^^^^^^^^^^^^^^
>>
>> Shouldn't the allocator have been able to satisfy that anyway ?
>>
> 
> Might be something related to the CONFIG_COMPACTION=y and lumpy reclaim
> removal ?
> 
> Anyway, we keep a fraction of memory for ATOMIC allocations

On this machine, "vm.min_free_kbytes" is even set to 128 MB.
However, something is going terribly wrong if page-cache pages lead to
stack traces about failed memory allocations.

vm.swappiness = 0
vm.overcommit_memory = 1
vm.overcommit_ratio = 60
vm.vfs_cache_pressure = 30
vm.dirty_background_ratio = 15
vm.dirty_ratio = 40
vm.dirty_expire_centisecs = 1500
vm.dirty_writeback_centisecs = 1500
vm.zone_reclaim_mode = 0
vm.min_free_kbytes = 131072

