* iptables/tc: page allocation failures question
From: Miroslav Kratochvil @ 2012-11-03 10:27 UTC
To: netdev

Hello everyone,

I've got several Linux boxes that do mostly routing and traffic
shaping. The load isn't dramatic: around 100Mbit of traffic shaped
over an HFSC qdisc with ~10k classes/filters.

Recently I started seeing messages like this in dmesg:

iptables: page allocation failure: order:9, mode:0xc0d0

tc: page allocation failure (....)

(full messages are attached below)

As I understand it, this means the kernel couldn't allocate memory to
execute the given command. It is usually triggered by commands like
'tc class add' or 'iptables -A something'.

The boxes, on the other hand, still have plenty of free memory
(alloc+buffers+cache fill around 400MB of the 2 gigs available, swap is
empty). I guess the problem is that the allocation is constrained by
something (like GFP_ATOMIC, or being limited to low memory). Is this
true? If so, is there any way to avoid such a constraint?

What also worries me is that once a box starts hitting memory
allocation failures, I've been unable to make it stop. Even if I
delete all qdiscs/iptables entries, clear every cache I know about
and restart most of userspace, which should free a good amount of
memory, nothing can be added back.

I'm attaching the dmesg of the failure below. Could anyone comment on
this, or point me to what can cause this behavior? Is there any better
debug output that could clarify this?

Thanks in advance,
Mirek Kratochvil

Now, how it happens:

# iptables -A FORWARD -s 192.168.0.0/24 -m recent -p tcp --name somelist --set
/// it waits around 10 seconds here
iptables: Memory allocation problem.
# dmesg
[3391568.085980] iptables: page allocation failure: order:9, mode:0xc0d0
[3391568.085983] Pid: 21115, comm: iptables Not tainted 3.1.1 #3
[3391568.085985] Call Trace:
[3391568.085991] [<ffffffff810a4172>] ? warn_alloc_failed+0xf2/0x140
[3391568.085993] [<ffffffff810b3fc0>] ? next_zone+0x30/0x40
[3391568.085995] [<ffffffff810a6fe0>] ? page_alloc_cpu_notify+0x40/0x40
[3391568.085997] [<ffffffff810a69f9>] ? __alloc_pages_nodemask+0x609/0x790
[3391568.086000] [<ffffffff810d3740>] ? alloc_pages_current+0xa0/0x110
[3391568.086002] [<ffffffff810a3c79>] ? __get_free_pages+0x9/0x40
[3391568.086006] [<ffffffff814d40c5>] ? recent_mt_check+0x165/0x2a0
[3391568.086007] [<ffffffff814d1ac2>] ? xt_check_match+0xb2/0x1f0
[3391568.086010] [<ffffffff8109f758>] ? find_get_page+0x18/0xa0
[3391568.086015] [<ffffffff815bfc69>] ? _raw_spin_lock_bh+0x9/0x20
[3391568.086024] [<ffffffff815be9b9>] ? mutex_lock_interruptible+0x9/0x40
[3391568.086026] [<ffffffff814d12f0>] ? xt_find_match+0xb0/0x110
[3391568.086029] [<ffffffff815226fe>] ? translate_table+0x29e/0x640
[3391568.086031] [<ffffffff815237ba>] ? do_ipt_set_ctl+0x13a/0x1b0
[3391568.086036] [<ffffffff814cdb0a>] ? nf_sockopt+0x5a/0xb0
[3391568.086039] [<ffffffff814cdb9a>] ? nf_setsockopt+0x1a/0x20
[3391568.086044] [<ffffffff814e2933>] ? ip_setsockopt+0x93/0xc0
[3391568.086048] [<ffffffff81485e02>] ? sockfd_lookup_light+0x22/0x90
[3391568.086051] [<ffffffff81488d7d>] ? sys_setsockopt+0x6d/0xd0
[3391568.086054] [<ffffffff815c043b>] ? system_call_fastpath+0x16/0x1b
[3391568.086057] Mem-Info:
[3391568.086058] Node 0 DMA per-cpu:
[3391568.086061] CPU 0: hi: 0, btch: 1 usd: 0
[3391568.086063] CPU 1: hi: 0, btch: 1 usd: 0
[3391568.086064] CPU 2: hi: 0, btch: 1 usd: 0
[3391568.086065] CPU 3: hi: 0, btch: 1 usd: 0
[3391568.086066] Node 0 DMA32 per-cpu:
[3391568.086067] CPU 0: hi: 186, btch: 31 usd: 0
[3391568.086068] CPU 1: hi: 186, btch: 31 usd: 0
[3391568.086069] CPU 2: hi: 186, btch: 31 usd: 25
[3391568.086071] CPU 3: hi: 186, btch: 31 usd: 0
[3391568.086075] active_anon:10259 inactive_anon:3453 isolated_anon:0
[3391568.086076] active_file:261 inactive_file:269 isolated_file:4
[3391568.086077] unevictable:0 dirty:0 writeback:3640 unstable:0
[3391568.086078] free:447441 slab_reclaimable:892 slab_unreclaimable:41475
[3391568.086079] mapped:4747 shmem:4488 pagetables:694 bounce:0
[3391568.086081] Node 0 DMA free:11104kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15648kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:128kB slab_unreclaimable:4664kB kernel_stack:8kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[3391568.086091] lowmem_reserve[]: 0 1978 1978 1978
[3391568.086096] Node 0 DMA32 free:1778660kB min:5668kB low:7084kB high:8500kB active_anon:41036kB inactive_anon:13812kB active_file:1044kB inactive_file:1076kB unevictable:0kB isolated(anon):0kB isolated(file):144kB present:2026068kB mlocked:0kB dirty:0kB writeback:14560kB mapped:18988kB shmem:17952kB slab_reclaimable:3440kB slab_unreclaimable:161236kB kernel_stack:704kB pagetables:2776kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:288 all_unreclaimable? no
[3391568.086107] lowmem_reserve[]: 0 0 0 0
[3391568.086112] Node 0 DMA: 18*4kB 7*8kB 10*16kB 12*32kB 9*64kB 9*128kB 8*256kB 3*512kB 3*1024kB 1*2048kB 0*4096kB = 11104kB
[3391568.086119] Node 0 DMA32: 13255*4kB 14135*8kB 14291*16kB 11013*32kB 6973*64kB 2976*128kB 614*256kB 88*512kB 2*1024kB 0*2048kB 0*4096kB = 1778660kB
[3391568.086139] 8553 total pagecache pages
[3391568.086140] 3528 pages in swap cache
[3391568.086142] Swap cache stats: add 666897, delete 663369, find 317604632/317669671
[3391568.086144] Free swap = 973916kB
[3391568.086146] Total swap = 1048572kB
[3391568.091300] 522224 pages RAM
[3391568.091302] 14130 pages reserved
[3391568.091302] 14125 pages shared
[3391568.091303] 55020 pages non-shared
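A note on reading the per-order free lists at the end of the dump above: an order-n block is 2^n contiguous pages, so with the 4kB pages shown in the dump an order-9 request needs one free 2048kB block. The "Node 0 DMA32" line reports 0*2048kB and 0*4096kB, which is consistent with the order-9 failure even though about 1.7GB is free in total. The following is a minimal userspace sketch of that arithmetic, not kernel code; the counts are copied from the dump, and the names dma32_free, total_free_kb and largest_free_order are hypothetical helpers introduced only for illustration.

```c
#include <stddef.h>

/* Free block counts per order (order 0 = 4kB ... order 10 = 4096kB),
 * copied from the "Node 0 DMA32:" line of the dump above. */
static const unsigned long dma32_free[] = {
    13255, 14135, 14291, 11013, 6973, 2976, 614, 88, 2, 0, 0
};
#define NR_ORDERS (sizeof(dma32_free) / sizeof(dma32_free[0]))

/* Total free memory in kB: one block of order n holds (4kB << n). */
static unsigned long total_free_kb(const unsigned long *counts, size_t n)
{
    unsigned long kb = 0;

    for (size_t order = 0; order < n; order++)
        kb += counts[order] * (4UL << order);
    return kb;
}

/* Highest order that still has at least one free block, or -1 if none. */
static int largest_free_order(const unsigned long *counts, size_t n)
{
    for (size_t order = n; order-- > 0; )
        if (counts[order] != 0)
            return (int)order;
    return -1;
}
```

Called as total_free_kb(dma32_free, NR_ORDERS), the first helper gives 1778660, the same total the kernel printed. largest_free_order(dma32_free, NR_ORDERS) gives 8, i.e. the biggest free block in DMA32 is 1024kB, one order short of what the failing request asked for.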
* Re: iptables/tc: page allocation failures question
From: Eric Dumazet @ 2012-11-03 11:34 UTC
To: Miroslav Kratochvil; +Cc: netdev

On Sat, 2012-11-03 at 11:27 +0100, Miroslav Kratochvil wrote:
> Recently I started seeing messages like this in dmesg:
>
> iptables: page allocation failure: order:9, mode:0xc0d0
>
> tc: page allocation failure (....)
> [...]

You apparently load the xt_recent module with a big ip_list_tot value
(default is 100), so kzalloc() wants an order-9 page (2MB of contiguous
RAM), and it fails.

I guess the following patch should solve your problem:

diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c
index 4635c9b..ceebd8b 100644
--- a/net/netfilter/xt_recent.c
+++ b/net/netfilter/xt_recent.c
@@ -29,6 +29,7 @@
 #include <linux/skbuff.h>
 #include <linux/inet.h>
 #include <linux/slab.h>
+#include <linux/vmalloc.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>

@@ -310,6 +311,14 @@ out:
 	return ret;
 }

+static void recent_table_free(void *addr)
+{
+	if (is_vmalloc_addr(addr))
+		vfree(addr);
+	else
+		kfree(addr);
+}
+
 static int recent_mt_check(const struct xt_mtchk_param *par,
 			   const struct xt_recent_mtinfo_v1 *info)
 {
@@ -322,6 +331,7 @@ static int recent_mt_check(const struct xt_mtchk_param *par,
 #endif
 	unsigned int i;
 	int ret = -EINVAL;
+	size_t sz;

 	if (unlikely(!hash_rnd_inited)) {
 		get_random_bytes(&hash_rnd, sizeof(hash_rnd));
@@ -360,8 +370,11 @@ static int recent_mt_check(const struct xt_mtchk_param *par,
 		goto out;
 	}

-	t = kzalloc(sizeof(*t) + sizeof(t->iphash[0]) * ip_list_hash_size,
-		    GFP_KERNEL);
+	sz = sizeof(*t) + sizeof(t->iphash[0]) * ip_list_hash_size;
+	if (sz <= PAGE_SIZE)
+		t = kzalloc(sz, GFP_KERNEL);
+	else
+		t = vzalloc(sz);
 	if (t == NULL) {
 		ret = -ENOMEM;
 		goto out;
@@ -377,14 +390,14 @@ static int recent_mt_check(const struct xt_mtchk_param *par,
 	uid = make_kuid(&init_user_ns, ip_list_uid);
 	gid = make_kgid(&init_user_ns, ip_list_gid);
 	if (!uid_valid(uid) || !gid_valid(gid)) {
-		kfree(t);
+		recent_table_free(t);
 		ret = -EINVAL;
 		goto out;
 	}
 	pde = proc_create_data(t->name, ip_list_perms, recent_net->xt_recent,
 		  &recent_mt_fops, t);
 	if (pde == NULL) {
-		kfree(t);
+		recent_table_free(t);
 		ret = -ENOMEM;
 		goto out;
 	}
@@ -434,7 +447,7 @@ static void recent_mt_destroy(const struct xt_mtdtor_param *par)
 		remove_proc_entry(t->name, recent_net->xt_recent);
 #endif
 		recent_table_flush(t);
-		kfree(t);
+		recent_table_free(t);
 	}
 	mutex_unlock(&recent_mutex);
 }
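To make the size reasoning behind the patch above concrete, here is a small userspace sketch of how a byte size maps to a page order and which branch of the patch's size test it would take. It is written in the spirit of the kernel's get_order() but is not kernel code. Everything named here is an assumption for illustration: the helpers order_for_size, table_size and uses_vzalloc are hypothetical, the 4kB page size is taken from the dump earlier in the thread, and the example figures used below (a 256-byte header, 16-byte buckets, 65536 buckets) are made-up values that happen to land in the order-9 range; the thread does not state the real ones.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4kB pages, as in the dump */

/* Smallest order whose block (page size << order) can hold `bytes`. */
static unsigned int order_for_size(size_t bytes)
{
    unsigned int order = 0;
    size_t block = SKETCH_PAGE_SIZE;

    while (block < bytes) {
        block <<= 1;
        order++;
    }
    return order;
}

/* Same shape as the patch's `sz`: a header plus one bucket per hash slot. */
static size_t table_size(size_t header_bytes, size_t bucket_bytes,
                         size_t nr_buckets)
{
    return header_bytes + bucket_bytes * nr_buckets;
}

/* Mirrors the patch's test: nonzero means the vzalloc() branch is taken. */
static int uses_vzalloc(size_t sz)
{
    return sz > SKETCH_PAGE_SIZE;
}
```

With the made-up figures, table_size(256, 16, 65536) is just over 1MB, so order_for_size() returns 9: a physically contiguous allocation would have to round up to a 2MB block. uses_vzalloc() is nonzero for that size, so with the patch the table would come from vzalloc(), which does not need physically contiguous pages.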
* Re: iptables/tc: page allocation failures question
From: Miroslav Kratochvil @ 2012-11-03 15:24 UTC
To: Eric Dumazet; +Cc: netdev

Hi,

Thanks for the patch! I think it will fix the problem. I patched one of
the production boxes and will see if it breaks again; it usually
happens after a day or two.

Anyway, more questions:

- My problem sometimes happens even when there are no big xt_recent
  allocations happening (just TC/HFSC). Therefore:

  1] Is it possible that something similarly big gets allocated in
     HFSC? I didn't actually find anything that would, so...

  2] Is it possible that fragmentation of the kmalloc/kfree zone can
     cause similar problems? (Well, it's 10k filters + 10k classes +
     filter hash table infrastructure, and it is still being
     rewritten/restructured by the management software...)

- Is there some decent way to fix this without manually patching all
  production kernels? A magic kernel parameter that would convert a
  failing kmalloc to vmalloc? A sysctl to prevent exhausting the
  memory? Or, at least, something that would reset the failing
  machine's memory to a state other than "everything fails"?

Sorry for asking too many questions, but I feel it'd be unwise to let
it behave this way... :]

Thanks,
-mk

On Sat, Nov 3, 2012 at 12:34 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> You apparently load the xt_recent module with a big ip_list_tot value
> (default is 100), so kzalloc() wants an order-9 page (2MB of contiguous
> RAM), and it fails.
>
> I guess the following patch should solve your problem:
> [...]