From: Zefan Li <lizefan@huawei.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Roman Gushchin" <guro@fb.com>,
"Linux Kernel Network Developers" <netdev@vger.kernel.org>,
"Cameron Berkenpas" <cam@neo-zeon.de>,
"Peter Geis" <pgwipeout@gmail.com>,
"Lu Fengqi" <lufq.fnst@cn.fujitsu.com>,
"Daniël Sonck" <dsonck92@gmail.com>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Tejun Heo" <tj@kernel.org>
Subject: Re: [Patch net] cgroup: fix cgroup_sk_alloc() for sk_clone_lock()
Date: Sat, 20 Jun 2020 08:51:31 +0800
Message-ID: <35480172-c77e-fb67-7559-04576f375ea6@huawei.com>
In-Reply-To: <459be87d-0272-9ea9-839a-823b01e354b6@huawei.com>
On 2020/6/20 8:45, Zefan Li wrote:
> On 2020/6/20 3:51, Cong Wang wrote:
>> On Thu, Jun 18, 2020 at 11:40 PM Zefan Li <lizefan@huawei.com> wrote:
>>>
>>> On 2020/6/19 5:09, Cong Wang wrote:
>>>> On Thu, Jun 18, 2020 at 12:36 PM Roman Gushchin <guro@fb.com> wrote:
>>>>>
>>>>> On Thu, Jun 18, 2020 at 12:19:13PM -0700, Cong Wang wrote:
>>>>>> On Wed, Jun 17, 2020 at 6:44 PM Zefan Li <lizefan@huawei.com> wrote:
>>>>>>>
>>>>>>> Cc: Roman Gushchin <guro@fb.com>
>>>>>>>
>>>>>>> Thanks for fixing this.
>>>>>>>
>>>>>>> On 2020/6/17 2:03, Cong Wang wrote:
>>>>>>>> When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
>>>>>>>> copied, so the cgroup refcnt must be taken too. And, unlike the
>>>>>>>> sk_alloc() path, sock_update_netprioidx() is not called here.
>>>>>>>> Therefore, it is safe and necessary to grab the cgroup refcnt
>>>>>>>> even when cgroup_sk_alloc is disabled.
>>>>>>>>
>>>>>>>> sk_clone_lock() is in BH context anyway, the in_interrupt()
>>>>>>>> would terminate this function if called there. And for sk_alloc()
>>>>>>>> skcd->val is always zero. So it's safe to factor out the code
>>>>>>>> to make it more readable.
>>>>>>>>
>>>>>>>> Fixes: 090e28b229af92dc5b ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups")
>>>>>>>
>>>>>>> but I don't think the bug was introduced by this commit, because there
>>>>>>> are already calls to cgroup_sk_alloc_disable() in write_priomap() and
>>>>>>> write_classid(), which can be triggered by writing to ifpriomap or
>>>>>>> classid in cgroupfs. This commit just made it much easier to happen
>>>>>>> with systemd involved.
>>>>>>>
>>>>>>> I think it's 4bfc0bb2c60e2f4c ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself"),
>>>>>>> which added cgroup_bpf_get() in cgroup_sk_alloc().
>>>>>>
>>>>>> Good point.
>>>>>>
>>>>>> I took a deeper look; it looks like commit d979a39d7242e06
>>>>>> is the one to blame, because it is the first commit that began to
>>>>>> hold cgroup refcnt in cgroup_sk_alloc().
>>>>>
>>>>> I agree, it seems that the issue is not related to bpf and probably
>>>>> can be reproduced without CONFIG_CGROUP_BPF. d979a39d7242e06 indeed
>>>>> seems closer to the origin.
>>>>
>>>> Yeah, I will update the Fixes tag and send V2.
>>>>
>>>
>>> Commit d979a39d7242e06 looks innocent to me. With this commit when cgroup_sk_alloc
>>> is disabled and then a socket is cloned the cgroup refcnt will not be incremented,
>>> but this is fine, because when the socket is to be freed:
>>>
>>> sk_prot_free()
>>>   cgroup_sk_free()
>>>     cgroup_put(sock_cgroup_ptr(skcd)) == cgroup_put(&cgrp_dfl_root.cgrp)
>>>
>>> cgroup_put() does nothing for the default root cgroup, so nothing bad will happen.
>>
>> But skcd->val can be a pointer to a non-root cgroup:
>
> It returns a non-root cgroup when cgroup_sk_alloc is not disabled. The bug happens
> when cgroup_sk_alloc is disabled.
>
And please read those recent bug reports: they all happened when bpf cgroup was in use,
and there was no bpf cgroup when d979a39d7242e06 was merged into mainline.