Re: [PATCH stable] memcg: add refcnt for pcpu stock to avoid UAF problem in drain_all_stock()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Michal Hocko <mhocko@suse.com>
To: "GONG, Ruiqi" <gongruiqi1@huawei.com>
Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeelb@google.com>,
	Muchun Song <muchun.song@linux.dev>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	Wang Weiyang <wangweiyang2@huawei.com>,
	Xiu Jianfeng <xiujianfeng@huawei.com>
Subject: Re: [PATCH stable] memcg: add refcnt for pcpu stock to avoid UAF problem in drain_all_stock()
Date: Wed, 21 Feb 2024 09:38:24 +0100	[thread overview]
Message-ID: <ZdW2gMajIw_cUN2-@tiehlicka> (raw)
In-Reply-To: <20240221081801.69764-1-gongruiqi1@huawei.com>

On Wed 21-02-24 16:18:01, GONG, Ruiqi wrote:
> commit 1a3e1f40962c445b997151a542314f3c6097f8c3 upstream.

I think it would be good to mention that this is only a partial backport
and also explain why to do a partial rather than the full one.

> There was a kernel panic happened on an in-house environment running
> 3.10, and the same problem was reproduced on 4.19:
> 
> general protection fault: 0000 [#1] SMP PTI
> CPU: 1 PID: 2085 Comm: bash Kdump: loaded Tainted: G             L    4.19.90+ #7
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> RIP: 0010 drain_all_stock+0xad/0x140
> Code: 00 00 4d 85 ff 74 2c 45 85 c9 74 27 4d 39 fc 74 42 41 80 bc 24 28 04 00 00 00 74 17 49 8b 04 24 49 8b 17 48 8b 88 90 02 00 00 <48> 39 8a 90 02 00 00 74 02 eb 86 48 63 88 3c 01 00 00 39 8a 3c 01
> RSP: 0018:ffffa7efc5813d70 EFLAGS: 00010202
> RAX: ffff8cb185548800 RBX: ffff8cb89f420160 RCX: ffff8cb1867b6000
> RDX: babababababababa RSI: 0000000000000001 RDI: 0000000000231876
> RBP: 0000000000000000 R08: 0000000000000415 R09: 0000000000000002
> R10: 0000000000000000 R11: 0000000000000001 R12: ffff8cb186f89040
> R13: 0000000000020160 R14: 0000000000000001 R15: ffff8cb186b27040
> FS:  00007f4a308d3740(0000) GS:ffff8cb89f440000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffe4d634a68 CR3: 000000010b022000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  mem_cgroup_force_empty_write+0x31/0xb0
>  cgroup_file_write+0x60/0x140
>  ? __check_object_size+0x136/0x147
>  kernfs_fop_write+0x10e/0x190
>  __vfs_write+0x37/0x1b0
>  ? selinux_file_permission+0xe8/0x130
>  ? security_file_permission+0x2e/0xb0
>  vfs_write+0xb6/0x1a0
>  ksys_write+0x57/0xd0
>  do_syscall_64+0x63/0x250
>  ? async_page_fault+0x8/0x30
>  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
> Modules linked in: ...
> 
> It is found that in case of stock->nr_pages == 0, the memcg on
> stock->cached could be freed due to its refcnt decreased to 0, which
> made stock->cached become a dangling pointer. It could cause a UAF
> problem in drain_all_stock() in the following concurrent scenario. Note
> that drain_all_stock() doesn't disable irq but only preemption.
> 
> CPU1                             CPU2
> ==============================================================================
> stock->cached = memcgA (freed)
>                                  drain_all_stock(memcgB)
>                                   rcu_read_lock()
>                                   memcg = CPU1's stock->cached (memcgA)
>                                   (interrupted)
> refill_stock(memcgC)
>  drain_stock(memcgA)
>  stock->cached = memcgC
>  stock->nr_pages += xxx (> 0)
>                                   stock->nr_pages > 0
>                                   mem_cgroup_is_descendant(memcgA, memcgB) [UAF]
>                                   rcu_read_unlock()
> 
> This problem is, unintenionally, fixed at 5.9, where commit 1a3e1f40962c
> ("mm: memcontrol: decouple reference counting from page accounting")
> adds memcg refcnt for stock. Therefore affected LTS versions include
> 4.19 and 5.4.
> 
> For 4.19, memcg's css offline process doesn't call drain_all_stock(). so
> it's easier for the released memcg to be left on the stock. For 5.4,
> although mem_cgroup_css_offline() does call drain_all_stock(), but the
> flushing could be skipped when stock->nr_pages happens to be 0, and
> besides the async draining could be delayed and take place after the UAF
> problem has happened.
> 
> Fix this problem by adding (and decreasing) memcg's refcnt when memcg is
> put onto (and removed from) stock, just like how commit 1a3e1f40962c
> ("mm: memcontrol: decouple reference counting from page accounting")
> does. After all, "being on the stock" is a kind of reference with
> regards to memcg. As such, it's guaranteed that a css on stock would not
> be freed.

What does prevent from the following?

refill_stock(memcgC)		drain_all_stock(memcgB)
  drain_stock(memcgA)		  rcu_read_lock()
    css_put(old->css)		  memcgA = stock->cached
                                  mem_cgroup_is_descendant(memcgA, memcgB) UAF
    stock->cached = NULL

> 
> Cc: stable@vger.kernel.org      # 4.19 5.4
> Fixes: cdec2e4265df ("memcg: coalesce charging via percpu storage")
> Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
> ---
>  mm/memcontrol.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5a366cf79821..8c04296df1c7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2015,6 +2015,9 @@ static void drain_stock(struct memcg_stock_pcp *stock)
>  {
>  	struct mem_cgroup *old = stock->cached;
>  
> +	if (!old)
> +		return;
> +
>  	if (stock->nr_pages) {
>  		page_counter_uncharge(&old->memory, stock->nr_pages);
>  		if (do_memsw_account())
> @@ -2022,6 +2025,8 @@ static void drain_stock(struct memcg_stock_pcp *stock)
>  		css_put_many(&old->css, stock->nr_pages);
>  		stock->nr_pages = 0;
>  	}
> +
> +	css_put(&old->css);
>  	stock->cached = NULL;
>  }
>  
> @@ -2057,6 +2062,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
>  	stock = this_cpu_ptr(&memcg_stock);
>  	if (stock->cached != memcg) { /* reset if necessary */
>  		drain_stock(stock);
> +		css_get(&memcg->css);
>  		stock->cached = memcg;
>  	}
>  	stock->nr_pages += nr_pages;
> -- 
> 2.25.1
> 

-- 
Michal Hocko
SUSE Labs

next prev parent reply	other threads:[~2024-02-21  8:38 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-21  8:18 [PATCH stable] memcg: add refcnt for pcpu stock to avoid UAF problem in drain_all_stock() GONG, Ruiqi
2024-02-21  8:38 ` Michal Hocko [this message]
2024-02-21  9:50   ` Gong Ruiqi
2024-02-21 10:08     ` Michal Hocko
2024-02-21 10:22       ` Michal Hocko
2024-02-22  2:01       ` Gong Ruiqi
2024-02-21  8:40 ` Greg KH
2024-02-21  8:48   ` Michal Hocko
2024-02-21 10:33     ` Greg KH
2024-02-21 10:48       ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZdW2gMajIw_cUN2-@tiehlicka \
    --to=mhocko@suse.com \
    --cc=cgroups@vger.kernel.org \
    --cc=gongruiqi1@huawei.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=stable@vger.kernel.org \
    --cc=wangweiyang2@huawei.com \
    --cc=xiujianfeng@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.