From: Nikolay Borisov <kernel@kyup.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: kadlec@blackhole.kfki.hu, pablo@netfilter.org,
	davem@davemloft.net, netfilter-devel@vger.kernel.org,
	netdev@vger.kernel.org, operations@siteground.com
Subject: Re: [PATCH v2] netfilter: ipset: Fix sleeping memory allocation in atomic context
Date: Thu, 15 Oct 2015 17:49:37 +0300
Message-ID: <561FBD01.8020906@kyup.com>
In-Reply-To: <1444919523.4200.16.camel@edumazet-glaptop2.roam.corp.google.com>



On 10/15/2015 05:32 PM, Eric Dumazet wrote:
> On Thu, 2015-10-15 at 16:41 +0300, Nikolay Borisov wrote:
>>
>> On 10/15/2015 04:32 PM, Eric Dumazet wrote:
>>> On Thu, 2015-10-15 at 13:56 +0300, Nikolay Borisov wrote:
>>>> Commit 00590fdd5be0 introduced RCU locking in list type and in
>>>> doing so introduced a memory allocation in list_set_add, which
>>>> results in the following splat:
>>>>
>>>> BUG: sleeping function called from invalid context at mm/page_alloc.c:2759
>>>> in_atomic(): 1, irqs_disabled(): 0, pid: 9664, name: ipset
>>>> CPU: 18 PID: 9664 Comm: ipset Tainted: G           O 3.12.47-clouder3 #1
>>>> Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
>>>>  0000000000000002 ffff881fd14273c8 ffffffff8163d891 ffff881fcb4264b0
>>>>  ffff881fcb4260c0 ffff881fd14273e8 ffffffff810ba5bf ffff881fd1427558
>>>>  0000000000000000 ffff881fd1427568 ffffffff81142b33 ffff881f00000000
>>>> Call Trace:
>>>>  [<ffffffff8163d891>] dump_stack+0x58/0x7f
>>>>  [<ffffffff810ba5bf>] __might_sleep+0xdf/0x110
>>>>  [<ffffffff81142b33>] __alloc_pages_nodemask+0x243/0xc20
>>>>  [<ffffffff81181c6e>] alloc_pages_current+0xbe/0x170
>>>>  [<ffffffff81188315>] new_slab+0x295/0x340
>>>>  [<ffffffff81189a40>] __slab_alloc+0x2c0/0x5a0
>>>>  [<ffffffff8164000c>] ? __schedule+0x2dc/0x760
>>>>  [<ffffffff8118a71b>] __kmalloc+0x11b/0x230
>>>>  [<ffffffffa02bd0ac>] ? ip_set_get_byname+0xec/0x100 [ip_set]
>>>>  [<ffffffffa02d23fb>] list_set_uadd+0x16b/0x314 [ip_set_list_set]
>>>>  [<ffffffff81642148>] ? _raw_write_unlock_bh+0x28/0x30
>>>>  [<ffffffffa02d1cfc>] list_set_uadt+0x21c/0x320 [ip_set_list_set]
>>>>  [<ffffffffa02d2290>] ? list_set_create+0x1a0/0x1a0 [ip_set_list_set]
>>>>  [<ffffffffa02be242>] call_ad+0x82/0x200 [ip_set]
>>>>  [<ffffffffa02bb171>] ? find_set_type+0x51/0xa0 [ip_set]
>>>>  [<ffffffff8133f275>] ? nla_parse+0xf5/0x130
>>>>  [<ffffffffa02be8ae>] ip_set_uadd+0x20e/0x2d0 [ip_set]
>>>>  [<ffffffffa02be013>] ? ip_set_create+0x2a3/0x450 [ip_set]
>>>>  [<ffffffffa02be6a0>] ? ip_set_udel+0x2e0/0x2e0 [ip_set]
>>>>  [<ffffffff815b316e>] nfnetlink_rcv_msg+0x31e/0x330
>>>>  [<ffffffff815b2e91>] ? nfnetlink_rcv_msg+0x41/0x330
>>>>  [<ffffffff815b2e50>] ? nfnl_lock+0x30/0x30
>>>>  [<ffffffff815ae179>] netlink_rcv_skb+0xa9/0xd0
>>>>  [<ffffffff815b2d45>] nfnetlink_rcv+0x15/0x20
>>>>  [<ffffffff815ade5f>] netlink_unicast+0x10f/0x190
>>>>  [<ffffffff815aedb0>] netlink_sendmsg+0x2c0/0x660
>>>>  [<ffffffff81567f00>] sock_sendmsg+0x90/0xc0
>>>>  [<ffffffff81565b03>] ? move_addr_to_user+0xa3/0xc0
>>>>  [<ffffffff81568552>] ? ___sys_recvmsg+0x182/0x300
>>>>  [<ffffffff81568064>] SYSC_sendto+0x134/0x180
>>>>  [<ffffffff811c4e01>] ? mntput+0x21/0x30
>>>>  [<ffffffff81572d2f>] ? __kfree_skb+0x3f/0xa0
>>>>  [<ffffffff815680be>] SyS_sendto+0xe/0x10
>>>>  [<ffffffff816434b2>] system_call_fastpath+0x16/0x1b
>>>>
>>>> The call chain leading to this is as follows:
>>>> call_ad -> list_set_uadt -> list_set_uadd -> kzalloc(, GFP_KERNEL).
>>>> And since GFP_KERNEL allows initiating direct reclaim thus
>>>> potentially sleeping in the allocation path, this leads to the
>>>> aforementioned splat.
>>>>
>>>> To fix it, change the allocation type to GFP_ATOMIC, to
>>>> correctly reflect that it is occurring in an atomic context.
>>>>
>>>> Fixes: 00590fdd5be0 ("netfilter: ipset: Introduce RCU locking in list type")
>>>>
>>>> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
>>>> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
>>>> ---
>>>>
>>>> Changes since V1: 
>>>>  * Added acked-by 
>>>>  * Fixed patch header 
>>>>
>>>>  net/netfilter/ipset/ip_set_list_set.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c
>>>> index a1fe537..5a30ce6 100644
>>>> --- a/net/netfilter/ipset/ip_set_list_set.c
>>>> +++ b/net/netfilter/ipset/ip_set_list_set.c
>>>> @@ -297,7 +297,7 @@ list_set_uadd(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>>>>  	      ip_set_timeout_expired(ext_timeout(n, set))))
>>>>  		n =  NULL;
>>>>  
>>>> -	e = kzalloc(set->dsize, GFP_KERNEL);
>>>> +	e = kzalloc(set->dsize, GFP_ATOMIC);
>>>>  	if (!e)
>>>>  		return -ENOMEM;
>>>>  	e->id = d->id;
>>>
>>> This patch looks very bogus to me.
>>>
>>> Could we fix the root cause please ?
>>>
>>> Root cause is that somewhere in this controlling path, an erroneous
>>> rcu_read_lock() is used, while it is very probably not needed, as
>>> controlling path should be protected by a mutex, which definitely is
>>> sane, because it allows us to perform GFP_KERNEL allocations and being
>>> preempted.
>>>
>>> Why are we using rcu_read_lock() in list_set_list() ?
>>>
>>> This looks as yet another bit of 'let us throw
>>> rcu_read_lock()/rcu_read_unlock() pairs' all over the places because it
>>> feels so good.
>>
>> I did check the call paths and there isn't an rcu_read_lock called in
>> list_set_uadt/list_set_uadd. On the contrary, this "write" operation to
>> the list is being serialised in call_ad() via set->lock spin_lock.
>>
>> What am I missing here?
> 
> I was not complaining to you, but to Jozsef ;)
> 
> Looking at commit 00590fdd5be0d7 terse changelog, we have to look at
> whole commit, and find suspicious rcu_read_lock()
> 
> Apparently, before this commit, allocations were safe, in process
> context (and using GFP_KERNEL), but after the commit, we have the
> unfortunate side effect of having potential burst of allocations
> done from bh context, using the special reserves dedicated to GFP_ATOMIC
> true users (processing packets from softirq handlers)

I guess the spinlock is used because of the gc code, which is
responsible for reaping entries whose timeout has elapsed. All of this
happens in set_cleanup_entries. The gc runs from a timer, i.e. in a
context that is not allowed to sleep, so given the current state of
things I don't think converting to a mutex is feasible unless the gc
mechanism is reworked.
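
For illustration only, here is a minimal userspace sketch of the
situation discussed in this thread: call_ad() holds set->lock while
the add handler allocates. It is a model, not kernel code. Every name
ending in _model, the atomic_depth counter and add_under_lock() are
hypothetical stand-ins, and calloc() merely takes the place of the
real allocator. It only shows why a GFP_KERNEL-style allocation trips
the might-sleep check under the lock while a GFP_ATOMIC-style one
does not.

```c
/* Userspace model only: none of the *_model names exist in the kernel. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum gfp_model { GFP_KERNEL_MODEL, GFP_ATOMIC_MODEL };

static int atomic_depth;     /* stands in for the kernel's atomic-context tracking */
static int sleep_violations; /* how many times the might-sleep check fired */

static void spin_lock_bh_model(void)
{
	atomic_depth++;
}

static void spin_unlock_bh_model(void)
{
	assert(atomic_depth > 0);
	atomic_depth--;
}

/* Mirrors the idea behind __might_sleep(): complain when called while atomic. */
static void might_sleep_model(void)
{
	if (atomic_depth > 0) {
		sleep_violations++;
		fprintf(stderr,
			"BUG: sleeping function called from invalid context\n");
	}
}

static void *kzalloc_model(size_t size, enum gfp_model flags)
{
	/* A GFP_KERNEL allocation may enter direct reclaim and sleep. */
	if (flags == GFP_KERNEL_MODEL)
		might_sleep_model();
	return calloc(1, size);
}

/* Models call_ad() taking set->lock before the add handler allocates. */
static int add_under_lock(enum gfp_model flags)
{
	void *e;

	spin_lock_bh_model();
	e = kzalloc_model(64, flags);
	spin_unlock_bh_model();

	if (!e)
		return -1;
	free(e);
	return 0;
}
```

Calling add_under_lock(GFP_KERNEL_MODEL) bumps sleep_violations once,
while add_under_lock(GFP_ATOMIC_MODEL) leaves it unchanged. The model
says nothing about the cost Eric points out, namely that GFP_ATOMIC
draws on reserves meant for softirq packet processing.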

> 
> I hate when we add more GFP_ATOMIC allocations all over the places,
> as this increases the risk of depleting memory reserves.
> 
> Can the spinlock be converted to a mutex ? If not, then instead of
> putting a stack trace in your changelog, a good explanation would be
> more useful.
> 
> Thanks !
> 
> 
> 
