From: Nikolay Borisov
Subject: Re: [PATCH v2] netfilter: ipset: Fix sleeping memory allocation in atomic context
Date: Thu, 15 Oct 2015 16:41:56 +0300
Message-ID: <561FAD24.4050708@kyup.com>
References: <1444906569-9131-1-git-send-email-kernel@kyup.com> <1444915978.4200.4.camel@edumazet-glaptop2.roam.corp.google.com>
In-Reply-To: <1444915978.4200.4.camel@edumazet-glaptop2.roam.corp.google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
To: Eric Dumazet
Cc: kadlec@blackhole.kfki.hu, pablo@netfilter.org, davem@davemloft.net, netfilter-devel@vger.kernel.org, netdev@vger.kernel.org, operations@siteground.com
Sender: netdev-owner@vger.kernel.org

On 10/15/2015 04:32 PM, Eric Dumazet wrote:
> On Thu, 2015-10-15 at 13:56 +0300, Nikolay Borisov wrote:
>> Commit 00590fdd5be0 introduced RCU locking in list type and in
>> doing so introduced a memory allocation in list_set_add, which
>> results in the following splat:
>>
>> BUG: sleeping function called from invalid context at mm/page_alloc.c:2759
>> in_atomic(): 1, irqs_disabled(): 0, pid: 9664, name: ipset
>> CPU: 18 PID: 9664 Comm: ipset Tainted: G O 3.12.47-clouder3 #1
>> Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
>> 0000000000000002 ffff881fd14273c8 ffffffff8163d891 ffff881fcb4264b0
>> ffff881fcb4260c0 ffff881fd14273e8 ffffffff810ba5bf ffff881fd1427558
>> 0000000000000000 ffff881fd1427568 ffffffff81142b33 ffff881f00000000
>> Call Trace:
>> [] dump_stack+0x58/0x7f
>> [] __might_sleep+0xdf/0x110
>> [] __alloc_pages_nodemask+0x243/0xc20
>> [] alloc_pages_current+0xbe/0x170
>> [] new_slab+0x295/0x340
>> [] __slab_alloc+0x2c0/0x5a0
>> [] ? __schedule+0x2dc/0x760
>> [] __kmalloc+0x11b/0x230
>> [] ? ip_set_get_byname+0xec/0x100 [ip_set]
>> [] list_set_uadd+0x16b/0x314 [ip_set_list_set]
>> [] ? _raw_write_unlock_bh+0x28/0x30
>> [] list_set_uadt+0x21c/0x320 [ip_set_list_set]
>> [] ? list_set_create+0x1a0/0x1a0 [ip_set_list_set]
>> [] call_ad+0x82/0x200 [ip_set]
>> [] ? find_set_type+0x51/0xa0 [ip_set]
>> [] ? nla_parse+0xf5/0x130
>> [] ip_set_uadd+0x20e/0x2d0 [ip_set]
>> [] ? ip_set_create+0x2a3/0x450 [ip_set]
>> [] ? ip_set_udel+0x2e0/0x2e0 [ip_set]
>> [] nfnetlink_rcv_msg+0x31e/0x330
>> [] ? nfnetlink_rcv_msg+0x41/0x330
>> [] ? nfnl_lock+0x30/0x30
>> [] netlink_rcv_skb+0xa9/0xd0
>> [] nfnetlink_rcv+0x15/0x20
>> [] netlink_unicast+0x10f/0x190
>> [] netlink_sendmsg+0x2c0/0x660
>> [] sock_sendmsg+0x90/0xc0
>> [] ? move_addr_to_user+0xa3/0xc0
>> [] ? ___sys_recvmsg+0x182/0x300
>> [] SYSC_sendto+0x134/0x180
>> [] ? mntput+0x21/0x30
>> [] ? __kfree_skb+0x3f/0xa0
>> [] SyS_sendto+0xe/0x10
>> [] system_call_fastpath+0x16/0x1b
>>
>> The call chain leading to this is as follow:
>> call_ad -> list_set_uadt -> list_set_uadd -> kzalloc(, GFP_KERNEL).
>> And since GFP_KERNEL allows initiating direct reclaim thus
>> potentially sleeping in the allocation path, this leads to the
>> aforementioned splat.
>>
>> To fix it change the allocation type to GFP_ATOMIC, to
>> correctly reflect that it is occuring in an atomic context.
>>
>> Fixes: 00590fdd5be0 ("netfilter: ipset: Introduce RCU locking in list type")
>>
>> Acked-by: Jozsef Kadlecsik
>> Signed-off-by: Nikolay Borisov
>> ---
>>
>> Changes since V1:
>> * Added acked-by
>> * Fixed patch header
>>
>> net/netfilter/ipset/ip_set_list_set.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c
>> index a1fe537..5a30ce6 100644
>> --- a/net/netfilter/ipset/ip_set_list_set.c
>> +++ b/net/netfilter/ipset/ip_set_list_set.c
>> @@ -297,7 +297,7 @@ list_set_uadd(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>>  		      ip_set_timeout_expired(ext_timeout(n, set))))
>>  		n = NULL;
>>
>> -	e = kzalloc(set->dsize, GFP_KERNEL);
>> +	e = kzalloc(set->dsize, GFP_ATOMIC);
>>  	if (!e)
>>  		return -ENOMEM;
>>  	e->id = d->id;
>
> This patch looks very bogus to me.
>
> Could we fix the root cause please ?
>
> Root cause is that somewhere in this controlling path, an erroneous
> rcu_read_lock() is used, while it is very probably not needed, as
> controlling path should be protected by a mutex, which definitely is
> sane, because it allows us to perform GFP_KERNEL allocations and being
> preempted.
>
> Why are we using rcu_read_lock() in list_set_list() ?
>
> This looks as yet another bit of 'let us throw
> rcu_read_lock()/rcu_read_unlock() pairs' all over the places because it
> feels so good.

I did check the call paths and there is no rcu_read_lock() call in
list_set_uadt()/list_set_uadd(). On the contrary, this "write" operation
on the list is serialised in call_ad() via the set->lock spinlock.

What am I missing here?
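For illustration only, here is a minimal userspace C model of the situation described above. It is not kernel code and not part of the patch. Every name prefixed with model_ is hypothetical. The only assumption carried over from the thread is that the add path allocates while a spinlock taken in call_ad() is held, which is what makes the context atomic.

```c
/* Userspace model only: none of these are the kernel's real definitions. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the two allocation modes discussed above. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

/* Models the preempt count: non-zero means "atomic, must not sleep". */
static int model_preempt_count;

/* Counts attempts to make a possibly sleeping allocation while atomic. */
static int model_splat_count;

/* Taking the modelled spinlock makes the context atomic. */
static void model_spin_lock_bh(void)
{
	model_preempt_count++;
}

static void model_spin_unlock_bh(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

static bool model_in_atomic(void)
{
	return model_preempt_count != 0;
}

/* Zeroing allocator that records a "splat" for MODEL_GFP_KERNEL in atomic context. */
static void *model_kzalloc(size_t size, enum model_gfp flags)
{
	if (flags == MODEL_GFP_KERNEL && model_in_atomic())
		model_splat_count++;	/* where __might_sleep() would complain */
	return calloc(1, size);
}

/* Models the add path: the allocation happens with the lock held. */
static int model_add_entry(size_t dsize, enum model_gfp flags)
{
	int ret = 0;
	void *e;

	model_spin_lock_bh();
	e = model_kzalloc(dsize, flags);
	if (!e)
		ret = -1;
	free(e);
	model_spin_unlock_bh();
	return ret;
}
```

In this model, calling model_add_entry() with MODEL_GFP_KERNEL records one splat, while MODEL_GFP_ATOMIC records none. The model says nothing about whether the lock itself is the right choice, which is the open question in this thread.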