* order 7 allocations from xt_recent
From: Dave Jones @ 2013-01-03 16:43 UTC
To: netdev; +Cc: h.reindl, Fedora Kernel Team
We had a report from a user that shows this code trying
to do enormous allocations, which isn't going to work too well..
iptables: page allocation failure: order:7, mode:0xc0d0
Pid: 2822, comm: iptables Not tainted 3.6.10-2.fc17.x86_64 #1
Call Trace:
[<ffffffff8113130b>] warn_alloc_failed+0xeb/0x150
[<ffffffff81616576>] ? __alloc_pages_direct_compact+0x17e/0x190
[<ffffffff81135196>] __alloc_pages_nodemask+0x736/0x990
[<ffffffff811710e0>] alloc_pages_current+0xb0/0x120
[<ffffffff8113022a>] __get_free_pages+0x2a/0x80
[<ffffffff811786d9>] kmalloc_order_trace+0x39/0xb0
[<ffffffff8117ae3a>] __kmalloc+0x16a/0x1a0
[<ffffffff8118aa7c>] ? mem_cgroup_bad_page_check+0x1c/0x30
[<ffffffff81134563>] ? get_page_from_freelist+0x453/0x950
[<ffffffffa007696e>] recent_mt_check.isra.6+0x16e/0x2c0 [xt_recent]
[<ffffffffa0076b4b>] recent_mt_check_v0+0x6b/0xa0 [xt_recent]
[<ffffffff8153fdda>] xt_check_match+0xaa/0x1e0
[<ffffffff8153f3ab>] ? xt_find_match+0x11b/0x130
[<ffffffff8153f3ab>] ? xt_find_match+0x11b/0x130
[<ffffffff8159257c>] check_match+0x3c/0x50
[<ffffffff81593ccb>] translate_table+0x39b/0x5b0
[<ffffffff815956f3>] do_ipt_set_ctl+0x133/0x200
[<ffffffff8153e10b>] nf_setsockopt+0x6b/0x90
[<ffffffff8161f236>] ? _raw_spin_lock_bh+0x16/0x40
[<ffffffff8154e41f>] ip_setsockopt+0x8f/0xa0
[<ffffffff8156f49d>] raw_setsockopt+0x1d/0x30
[<ffffffff814fcf14>] sock_common_setsockopt+0x14/0x20
[<ffffffff814fc23c>] sys_setsockopt+0x7c/0xe0
[<ffffffff816270e9>] system_call_fastpath+0x16/0x1b
which looks like it's this..
t = kzalloc(sizeof(*t) + sizeof(t->iphash[0]) * ip_list_hash_size,
GFP_KERNEL);
Which is initialised thus..
ip_list_hash_size = 1 << fls(ip_list_tot);
And ip_list_tot is 10000 in this case. Hmm ?
Complete report and setup described in his bug report at https://bugzilla.redhat.com/show_bug.cgi?id=890715
Dave
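The size of that request can be reproduced with a little arithmetic. The sketch below is a userspace model, not kernel code: fls_model() stands in for the kernel's fls(), the 16-byte bucket size assumes that each iphash[] entry is a two-pointer list head on x86_64, and header_bytes is a hypothetical stand-in for sizeof(*t), whose real value is not given in this thread. With ip_list_tot = 10000, fls() returns 14, the table gets 16384 buckets, and the buckets alone occupy exactly 256 kB, so any non-empty header pushes the request to the next power of two: 512 kB, or order 7 with 4 kB pages.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's fls(): 1-based index of the
 * highest set bit, or 0 when x is 0. */
static int fls_model(unsigned int x)
{
    int bit = 0;

    while (x) {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Bytes requested for the table: header plus (1 << fls(tot)) buckets.
 * header_bytes and bucket_bytes are assumptions supplied by the caller. */
static size_t table_bytes(unsigned int ip_list_tot, size_t header_bytes,
                          size_t bucket_bytes)
{
    size_t buckets = (size_t)1 << fls_model(ip_list_tot);

    return header_bytes + buckets * bucket_bytes;
}

/* Smallest order such that (page_size << order) covers size. */
static int order_for_size(size_t size, size_t page_size)
{
    int order = 0;
    size_t span = page_size;

    assert(page_size > 0);
    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

Calling order_for_size(table_bytes(10000, 1, 16), 4096) gives 7, matching the "order:7" in the failure message, while the buckets on their own (header_bytes = 0) would still fit in order 6.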
* Re: order 7 allocations from xt_recent
From: Eric Dumazet @ 2013-01-03 16:55 UTC
To: Dave Jones; +Cc: netdev, h.reindl, Fedora Kernel Team
On Thu, 2013-01-03 at 11:43 -0500, Dave Jones wrote:
> We had a report from a user that shows this code trying
> to do enormous allocations, which isn't going to work too well..
>
> iptables: page allocation failure: order:7, mode:0xc0d0
> Pid: 2822, comm: iptables Not tainted 3.6.10-2.fc17.x86_64 #1
> Call Trace:
> [<ffffffff8113130b>] warn_alloc_failed+0xeb/0x150
> [<ffffffff81616576>] ? __alloc_pages_direct_compact+0x17e/0x190
> [<ffffffff81135196>] __alloc_pages_nodemask+0x736/0x990
> [<ffffffff811710e0>] alloc_pages_current+0xb0/0x120
> [<ffffffff8113022a>] __get_free_pages+0x2a/0x80
> [<ffffffff811786d9>] kmalloc_order_trace+0x39/0xb0
> [<ffffffff8117ae3a>] __kmalloc+0x16a/0x1a0
> [<ffffffff8118aa7c>] ? mem_cgroup_bad_page_check+0x1c/0x30
> [<ffffffff81134563>] ? get_page_from_freelist+0x453/0x950
> [<ffffffffa007696e>] recent_mt_check.isra.6+0x16e/0x2c0 [xt_recent]
> [<ffffffffa0076b4b>] recent_mt_check_v0+0x6b/0xa0 [xt_recent]
> [<ffffffff8153fdda>] xt_check_match+0xaa/0x1e0
> [<ffffffff8153f3ab>] ? xt_find_match+0x11b/0x130
> [<ffffffff8153f3ab>] ? xt_find_match+0x11b/0x130
> [<ffffffff8159257c>] check_match+0x3c/0x50
> [<ffffffff81593ccb>] translate_table+0x39b/0x5b0
> [<ffffffff815956f3>] do_ipt_set_ctl+0x133/0x200
> [<ffffffff8153e10b>] nf_setsockopt+0x6b/0x90
> [<ffffffff8161f236>] ? _raw_spin_lock_bh+0x16/0x40
> [<ffffffff8154e41f>] ip_setsockopt+0x8f/0xa0
> [<ffffffff8156f49d>] raw_setsockopt+0x1d/0x30
> [<ffffffff814fcf14>] sock_common_setsockopt+0x14/0x20
> [<ffffffff814fc23c>] sys_setsockopt+0x7c/0xe0
> [<ffffffff816270e9>] system_call_fastpath+0x16/0x1b
>
> which looks like it's this..
>
> t = kzalloc(sizeof(*t) + sizeof(t->iphash[0]) * ip_list_hash_size,
> GFP_KERNEL);
>
> Which is initialised thus..
>
> ip_list_hash_size = 1 << fls(ip_list_tot);
>
> And ip_list_tot is 10000 in this case. Hmm ?
>
> Complete report and setup described in his bug report at https://bugzilla.redhat.com/show_bug.cgi?id=890715
>
> Dave
>
Yes, we had a report and a patch :
http://comments.gmane.org/gmane.linux.network/248216
I'll send it in a more formal way.
Thanks
* Re: order 7 allocations from xt_recent
From: Dave Jones @ 2013-01-03 17:11 UTC
To: Eric Dumazet; +Cc: netdev, h.reindl, Fedora Kernel Team
On Thu, Jan 03, 2013 at 08:55:04AM -0800, Eric Dumazet wrote:
> On Thu, 2013-01-03 at 11:43 -0500, Dave Jones wrote:
> > We had a report from a user that shows this code trying
> > to do enormous allocations, which isn't going to work too well..
> > ...
> > Which is initialised thus..
> >
> > ip_list_hash_size = 1 << fls(ip_list_tot);
> >
> > And ip_list_tot is 10000 in this case. Hmm ?
> >
> > Complete report and setup described in his bug report at https://bugzilla.redhat.com/show_bug.cgi?id=890715
>
> Yes, we had a report and a patch :
>
> http://comments.gmane.org/gmane.linux.network/248216
>
> I'll send it in a more formal way.
Ah! Excellent.
That 'check size and vmalloc/kmalloc accordingly' thing seems to be a pattern
that comes up time and time again. Is it worth maybe making a more generic
version of that instead of open-coding it each time it comes up ?
thanks,
Dave
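The 'check size and vmalloc/kmalloc accordingly' pattern mentioned here can be sketched in userspace. This is a model of the decision logic only, not the patch linked above and not a kernel API: model_zalloc(), model_free(), struct model_buf and the one-page threshold are all hypothetical names and choices, and both paths are backed by calloc(). In kernel code the small path would typically be kzalloc() and the large path vzalloc(), with the matching kfree() or vfree() on release.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size and switch-over point */

/* Which allocator the request would have been routed to. */
enum alloc_path { PATH_NONE, PATH_CONTIGUOUS, PATH_VIRTUAL };

struct model_buf {
    void *mem;
    enum alloc_path path;
};

/* Zeroed allocation that picks a path from the request size.
 * Small requests model a physically contiguous allocation (kzalloc);
 * larger ones model a virtually contiguous one (vzalloc). */
static struct model_buf model_zalloc(size_t size)
{
    struct model_buf buf = { NULL, PATH_NONE };

    assert(size > 0);
    buf.mem = calloc(1, size);
    if (buf.mem)
        buf.path = (size <= MODEL_PAGE_SIZE) ? PATH_CONTIGUOUS
                                             : PATH_VIRTUAL;
    return buf;
}

/* Release along the path recorded at allocation time. In the kernel
 * this is where kfree() or vfree() would be chosen. */
static void model_free(struct model_buf *buf)
{
    free(buf->mem);
    buf->mem = NULL;
    buf->path = PATH_NONE;
}
```

Recording the path in the handle is one possible design; kernel code can instead ask is_vmalloc_addr() at free time, which avoids carrying extra state. Either way the caller sees a single allocate/free pair, which is the generic helper being asked about.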
* Re: order 7 allocations from xt_recent
From: Dave Jones @ 2013-01-03 17:26 UTC
To: Eric Dumazet; +Cc: netdev, h.reindl, Fedora Kernel Team
On Thu, Jan 03, 2013 at 12:11:15PM -0500, Dave Jones wrote:
> On Thu, Jan 03, 2013 at 08:55:04AM -0800, Eric Dumazet wrote:
> > On Thu, 2013-01-03 at 11:43 -0500, Dave Jones wrote:
> > > We had a report from a user that shows this code trying
> > > to do enormous allocations, which isn't going to work too well..
> > > ...
> > > Which is initialised thus..
> > >
> > > ip_list_hash_size = 1 << fls(ip_list_tot);
> > >
> > > And ip_list_tot is 10000 in this case. Hmm ?
> > >
> > > Complete report and setup described in his bug report at https://bugzilla.redhat.com/show_bug.cgi?id=890715
> >
> > Yes, we had a report and a patch :
> >
> > http://comments.gmane.org/gmane.linux.network/248216
> >
> > I'll send it in a more formal way.
>
> Ah! Excellent.
>
> That 'check size and vmalloc/kmalloc accordingly' thing seems to be a pattern
> that comes up time and time again. Is it worth maybe making a more generic
> version of that instead of open-coding it each time it comes up ?
Something else that I'm puzzled by.
In the report above, it failed to allocate 512kb, but..
Node 0 Normal: 2388*4kB 347*8kB 1029*16kB 3512*32kB 29*64kB 2*128kB 1*256kB 5*512kB 1*1024kB 0*2048kB 0*4096kB = 147128kB
                                                                            ^^^^^^^^^^^^^^^^
Shouldn't the allocator have been able to satisfy that anyway ?
Dave
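The free-list line in this message can be checked mechanically. The sketch below hard-codes the block counts from that line (order 0 is the 4kB column, order 10 the 4096kB column) and sums them; the array and function names are made up for the illustration. It confirms the 147128kB total and shows how little of it sits in blocks of 512kB or more.

```c
#include <assert.h>
#include <stddef.h>

/* Free block counts for "Node 0 Normal", order 0 (4kB) to order 10
 * (4096kB), copied from the report quoted in the thread. */
static const unsigned long nr_free_normal[] = {
    2388, 347, 1029, 3512, 29, 2, 1, 5, 1, 0, 0
};

#define NR_ORDERS (sizeof(nr_free_normal) / sizeof(nr_free_normal[0]))

/* Total free memory, in kB, held in blocks of at least min_order. */
static unsigned long free_kb_from_order(size_t min_order)
{
    unsigned long kb = 0;

    for (size_t o = min_order; o < NR_ORDERS; o++)
        kb += nr_free_normal[o] * (4ul << o);
    return kb;
}

/* Number of free blocks of at least min_order. */
static unsigned long blocks_from_order(size_t min_order)
{
    unsigned long n = 0;

    for (size_t o = min_order; o < NR_ORDERS; o++)
        n += nr_free_normal[o];
    return n;
}
```

free_kb_from_order(0) reproduces the 147128kB total, while free_kb_from_order(7) is only 3584kB, spread over six blocks. So suitable blocks did exist, which is exactly why the failure looks surprising.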
* Re: order 7 allocations from xt_recent
From: Eric Dumazet @ 2013-01-03 18:00 UTC
To: Dave Jones; +Cc: netdev, h.reindl, Fedora Kernel Team
On Thu, 2013-01-03 at 12:26 -0500, Dave Jones wrote:
> On Thu, Jan 03, 2013 at 12:11:15PM -0500, Dave Jones wrote:
> > On Thu, Jan 03, 2013 at 08:55:04AM -0800, Eric Dumazet wrote:
> > > On Thu, 2013-01-03 at 11:43 -0500, Dave Jones wrote:
> > > > We had a report from a user that shows this code trying
> > > > to do enormous allocations, which isn't going to work too well..
> > > > ...
> > > > Which is initialised thus..
> > > >
> > > > ip_list_hash_size = 1 << fls(ip_list_tot);
> > > >
> > > > And ip_list_tot is 10000 in this case. Hmm ?
> > > >
> > > > Complete report and setup described in his bug report at https://bugzilla.redhat.com/show_bug.cgi?id=890715
> > >
> > > Yes, we had a report and a patch :
> > >
> > > http://comments.gmane.org/gmane.linux.network/248216
> > >
> > > I'll send it in a more formal way.
> >
> > Ah! Excellent.
> >
> > That 'check size and vmalloc/kmalloc accordingly' thing seems to be a pattern
> > that comes up time and time again. Is it worth maybe making a more generic
> > version of that instead of open-coding it each time it comes up ?
>
> Something else that I'm puzzled by.
>
> In the report above, it failed to allocate 512kb, but..
>
> Node 0 Normal: 2388*4kB 347*8kB 1029*16kB 3512*32kB 29*64kB 2*128kB 1*256kB 5*512kB 1*1024kB 0*2048kB 0*4096kB = 147128kB
>                                                                             ^^^^^^^^^^^^^^^^
>
> Shouldn't the allocator have been able to satisfy that anyway ?
>
> Dave
>
Might be something related to the CONFIG_COMPACTION=y and lumpy reclaim
removal ?
Anyway, we keep a fraction of memory for ATOMIC allocations.
* Re: order 7 allocations from xt_recent
From: Reindl Harald @ 2013-01-03 19:51 UTC
To: Eric Dumazet; +Cc: Dave Jones, netdev, Fedora Kernel Team
On 03.01.2013 19:00, Eric Dumazet wrote:
> On Thu, 2013-01-03 at 12:26 -0500, Dave Jones wrote:
>> On Thu, Jan 03, 2013 at 12:11:15PM -0500, Dave Jones wrote:
>> > On Thu, Jan 03, 2013 at 08:55:04AM -0800, Eric Dumazet wrote:
>> > > On Thu, 2013-01-03 at 11:43 -0500, Dave Jones wrote:
>> > > > We had a report from a user that shows this code trying
>> > > > to do enormous allocations, which isn't going to work too well..
>> > > > ...
>> > > > Which is initialised thus..
>> > > >
>> > > > ip_list_hash_size = 1 << fls(ip_list_tot);
>> > > >
>> > > > And ip_list_tot is 10000 in this case. Hmm ?
>> > > >
>> > > > Complete report and setup described in his bug report at https://bugzilla.redhat.com/show_bug.cgi?id=890715
>> > >
>> > > Yes, we had a report and a patch :
>> > >
>> > > http://comments.gmane.org/gmane.linux.network/248216
>> > >
>> > > I'll send it in a more formal way.
>> >
>> > Ah! Excellent.
>> >
>> > That 'check size and vmalloc/kmalloc accordingly' thing seems to be a pattern
>> > that comes up time and time again. Is it worth maybe making a more generic
>> > version of that instead of open-coding it each time it comes up ?
>>
>> Something else that I'm puzzled by.
>>
>> In the report above, it failed to allocate 512kb, but..
>>
>> Node 0 Normal: 2388*4kB 347*8kB 1029*16kB 3512*32kB 29*64kB 2*128kB 1*256kB 5*512kB 1*1024kB 0*2048kB 0*4096kB = 147128kB
>>                                                                             ^^^^^^^^^^^^^^^^
>>
>> Shouldn't the allocator have been able to satisfy that anyway ?
>>
>
> Might be something related to the CONFIG_COMPACTION=y and lumpy reclaim
> removal ?
>
> Anyway, we keep a fraction of memory for ATOMIC allocations
On that machine "vm.min_free_kbytes" is even set to 128 MB.
However, something goes terribly wrong if cache pages lead to
stack traces about failed memory allocation.
vm.swappiness = 0
vm.overcommit_memory = 1
vm.overcommit_ratio = 60
vm.vfs_cache_pressure = 30
vm.dirty_background_ratio = 15
vm.dirty_ratio = 40
vm.dirty_expire_centisecs = 1500
vm.dirty_writeback_centisecs = 1500
vm.zone_reclaim_mode = 0
vm.min_free_kbytes = 131072
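One way to see how a reserve can refuse a request even though large blocks are free is a simplified per-order watermark test. The function below is a userspace sketch written from a general understanding of how kernels of that era checked watermarks; it is not quoted from kernel source, it ignores allocation flags and lowmem reserves, and the names are hypothetical. The zone's real watermark does not appear in this thread. The 32768-page figure used in the usage note is only vm.min_free_kbytes = 131072 converted to 4 kB pages, which would be an upper bound for a single zone, so the result illustrates the mechanism and is not a diagnosis of this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_ORDERS 11

/* Free block counts for "Node 0 Normal", order 0 to order 10,
 * copied from the report quoted in the thread. */
static const long nr_free_normal[MODEL_NR_ORDERS] = {
    2388, 347, 1029, 3512, 29, 2, 1, 5, 1, 0, 0
};

/* Simplified watermark test: pages in blocks smaller than the request
 * are discounted order by order, while the required reserve is halved
 * at each step. mark_pages is an assumed reserve, in pages. */
static bool watermark_ok_model(const long *nr_free, int order,
                               long mark_pages)
{
    long free_pages = 0;
    long min = mark_pages;

    assert(order >= 0 && order < MODEL_NR_ORDERS);
    for (int o = 0; o < MODEL_NR_ORDERS; o++)
        free_pages += nr_free[o] << o;

    free_pages -= (1L << order) - 1;
    if (free_pages <= min)
        return false;

    for (int o = 0; o < order; o++) {
        /* Blocks of this order cannot serve the larger request. */
        free_pages -= nr_free[o] << o;
        /* Require a smaller reserve among the larger blocks. */
        min >>= 1;
        if (free_pages <= min)
            return false;
    }
    return true;
}
```

With no reserve at all, watermark_ok_model(nr_free_normal, 7, 0) is true, because six suitable blocks exist. With the hypothetical 32768-page reserve the same call returns false: once the 4kB to 32kB blocks are discounted, only 1361 pages remain against a required 2048. An order-0 request with the same reserve still passes, which fits the observation that only the large allocation failed.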
* Re: order 7 allocations from xt_recent
From: Eric Dumazet @ 2013-01-03 18:02 UTC
To: Dave Jones; +Cc: netdev, h.reindl, Fedora Kernel Team
On Thu, 2013-01-03 at 12:11 -0500, Dave Jones wrote:
> That 'check size and vmalloc/kmalloc accordingly' thing seems to be a pattern
> that comes up time and time again. Is it worth maybe making a more generic
> version of that instead of open-coding it each time it comes up ?
We had numerous discussions and patch submissions in the past, and this
went nowhere.
I am not sure anybody wants to spend more cycles on this.
It's sad, but true.