From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, "GONG, Ruiqi" <gongruiqi1@huawei.com>,
	Michal Hocko <mhocko@suse.com>,
	GONG
Subject: [PATCH 4.19 06/52] memcg: add refcnt for pcpu stock to avoid UAF problem in drain_all_stock()
Date: Tue, 27 Feb 2024 14:25:53 +0100
Message-ID: <20240227131548.736821200@linuxfoundation.org>
In-Reply-To: <20240227131548.514622258@linuxfoundation.org>

4.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "GONG, Ruiqi" <gongruiqi1@huawei.com>

commit 1a3e1f40962c445b997151a542314f3c6097f8c3 upstream.

NOTE: This is a partial backport. Only the refcnt between memcg and
stock is needed to fix the problem stated below, and backporting just
that part lets multiple stable versions share the same code.

A kernel panic occurred in an in-house environment running 3.10, and
the same problem was reproduced on 4.19:

general protection fault: 0000 [#1] SMP PTI
CPU: 1 PID: 2085 Comm: bash Kdump: loaded Tainted: G             L    4.19.90+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010 drain_all_stock+0xad/0x140
Code: 00 00 4d 85 ff 74 2c 45 85 c9 74 27 4d 39 fc 74 42 41 80 bc 24 28 04 00 00 00 74 17 49 8b 04 24 49 8b 17 48 8b 88 90 02 00 00 <48> 39 8a 90 02 00 00 74 02 eb 86 48 63 88 3c 01 00 00 39 8a 3c 01
RSP: 0018:ffffa7efc5813d70 EFLAGS: 00010202
RAX: ffff8cb185548800 RBX: ffff8cb89f420160 RCX: ffff8cb1867b6000
RDX: babababababababa RSI: 0000000000000001 RDI: 0000000000231876
RBP: 0000000000000000 R08: 0000000000000415 R09: 0000000000000002
R10: 0000000000000000 R11: 0000000000000001 R12: ffff8cb186f89040
R13: 0000000000020160 R14: 0000000000000001 R15: ffff8cb186b27040
FS:  00007f4a308d3740(0000) GS:ffff8cb89f440000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe4d634a68 CR3: 000000010b022000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 mem_cgroup_force_empty_write+0x31/0xb0
 cgroup_file_write+0x60/0x140
 ? __check_object_size+0x136/0x147
 kernfs_fop_write+0x10e/0x190
 __vfs_write+0x37/0x1b0
 ? selinux_file_permission+0xe8/0x130
 ? security_file_permission+0x2e/0xb0
 vfs_write+0xb6/0x1a0
 ksys_write+0x57/0xd0
 do_syscall_64+0x63/0x250
 ? async_page_fault+0x8/0x30
 entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Modules linked in: ...

It was found that when stock->nr_pages == 0, the memcg in stock->cached
could be freed once its refcnt dropped to 0, leaving stock->cached as a
dangling pointer. This can cause a UAF in drain_all_stock() in the
following concurrent scenario. Note that drain_all_stock() disables only
preemption, not irqs.

CPU1                             CPU2
==============================================================================
stock->cached = memcgA (freed)
                                 drain_all_stock(memcgB)
                                  rcu_read_lock()
                                  memcg = CPU1's stock->cached (memcgA)
                                  (interrupted)
refill_stock(memcgC)
 drain_stock(memcgA)
 stock->cached = memcgC
 stock->nr_pages += xxx (> 0)
                                  stock->nr_pages > 0
                                  mem_cgroup_is_descendant(memcgA, memcgB) [UAF]
                                  rcu_read_unlock()
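
As an illustration only, the following user-space sketch models how the
stock can end up pointing at a released memcg before this patch. It is
not kernel code: every toy_* name is hypothetical, the refcnt is a plain
integer, and "freeing" is modelled by setting a flag. It assumes that,
before the fix, the only references travelling with the stock are the
per-page ones, so a stock with nr_pages == 0 pins nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; all names are hypothetical. */
struct toy_memcg {
	long refcnt;	/* models the css reference count */
	bool freed;	/* set instead of actually freeing the object */
};

struct toy_stock {
	struct toy_memcg *cached;
	unsigned int nr_pages;
};

static void toy_css_get_many(struct toy_memcg *m, long n)
{
	m->refcnt += n;
}

static void toy_css_put_many(struct toy_memcg *m, long n)
{
	assert(m->refcnt >= n);
	m->refcnt -= n;
	if (m->refcnt == 0)
		m->freed = true;
}

/* Pre-patch drain: only the per-page references are dropped. */
static void toy_drain_stock_unpatched(struct toy_stock *s)
{
	struct toy_memcg *old = s->cached;

	if (s->nr_pages) {
		toy_css_put_many(old, s->nr_pages);
		s->nr_pages = 0;
	}
	s->cached = NULL;
}

/* Pre-patch refill: the caller's per-page references move into the stock. */
static void toy_refill_stock_unpatched(struct toy_stock *s,
				       struct toy_memcg *m,
				       unsigned int nr_pages)
{
	if (s->cached != m) {
		toy_drain_stock_unpatched(s);
		s->cached = m;
	}
	s->nr_pages += nr_pages;
}

/* Charging from the stock hands the per-page references to the pages. */
static bool toy_consume_stock(struct toy_stock *s, struct toy_memcg *m,
			      unsigned int nr_pages)
{
	if (s->cached != m || s->nr_pages < nr_pages)
		return false;
	s->nr_pages -= nr_pages;
	return true;
}

/* Returns true if the stock ends up pointing at a "freed" memcg. */
bool toy_stock_left_dangling(void)
{
	struct toy_memcg a = { .refcnt = 1, .freed = false }; /* base ref */
	struct toy_stock s = { .cached = NULL, .nr_pages = 0 };

	toy_css_get_many(&a, 4);		/* refs for 4 precharged pages */
	toy_refill_stock_unpatched(&s, &a, 4);
	if (!toy_consume_stock(&s, &a, 4))	/* stock nr_pages is now 0 */
		return false;
	toy_css_put_many(&a, 4);		/* the 4 pages are uncharged */
	toy_css_put_many(&a, 1);		/* base reference dropped */

	return s.cached == &a && a.freed && s.nr_pages == 0;
}
```

In this model the stock still caches memcgA after its last reference is
gone, which corresponds to the "stock->cached = memcgA (freed)" starting
state on CPU1 above.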

This problem was unintentionally fixed in 5.9, where commit
1a3e1f40962c ("mm: memcontrol: decouple reference counting from page
accounting") added a memcg refcnt for the stock. The affected LTS
versions therefore include 4.19 and 5.4.

In 4.19, memcg's css offline process doesn't call drain_all_stock(), so
it's easier for the released memcg to be left on the stock. In 5.4,
mem_cgroup_css_offline() does call drain_all_stock(), but the flushing
can be skipped when stock->nr_pages happens to be 0, and the async
draining can be delayed until after the UAF has already happened.

Fix this problem by taking (and dropping) a reference on the memcg when
it is put onto (and removed from) the stock, just as commit 1a3e1f40962c
("mm: memcontrol: decouple reference counting from page accounting")
does. After all, "being on the stock" is a kind of reference with regard
to the memcg. This guarantees that a css on the stock is not freed.
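
The effect of the extra reference can be sketched in user space as
follows. This is a simplified model under stated assumptions, not the
kernel implementation: the toy_* names are hypothetical, the refcnt is a
plain integer, page_counter accounting is left out, and "freeing" only
sets a flag. The drain and refill helpers mirror the shape of the
patched drain_stock() and refill_stock().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; all names are hypothetical. */
struct toy_memcg {
	long refcnt;	/* models the css reference count */
	bool freed;	/* set instead of actually freeing the object */
};

struct toy_stock {
	struct toy_memcg *cached;
	unsigned int nr_pages;
};

static void toy_css_get_many(struct toy_memcg *m, long n)
{
	m->refcnt += n;
}

static void toy_css_put_many(struct toy_memcg *m, long n)
{
	assert(m->refcnt >= n);
	m->refcnt -= n;
	if (m->refcnt == 0)
		m->freed = true;
}

static void toy_css_get(struct toy_memcg *m) { toy_css_get_many(m, 1); }
static void toy_css_put(struct toy_memcg *m) { toy_css_put_many(m, 1); }

/* Patched drain: also drops the reference held for being on the stock. */
static void toy_drain_stock(struct toy_stock *s)
{
	struct toy_memcg *old = s->cached;

	if (!old)
		return;

	if (s->nr_pages) {
		toy_css_put_many(old, s->nr_pages);
		s->nr_pages = 0;
	}

	toy_css_put(old);
	s->cached = NULL;
}

/* Patched refill: takes a reference when a memcg is put onto the stock. */
static void toy_refill_stock(struct toy_stock *s, struct toy_memcg *m,
			     unsigned int nr_pages)
{
	if (s->cached != m) {
		toy_drain_stock(s);
		toy_css_get(m);
		s->cached = m;
	}
	s->nr_pages += nr_pages;
}

/* Charging from the stock hands the per-page references to the pages. */
static bool toy_consume_stock(struct toy_stock *s, struct toy_memcg *m,
			      unsigned int nr_pages)
{
	if (s->cached != m || s->nr_pages < nr_pages)
		return false;
	s->nr_pages -= nr_pages;
	return true;
}

/* Returns true if memcg "a" stays alive for as long as it is cached. */
bool toy_stock_pins_memcg(void)
{
	struct toy_memcg a = { .refcnt = 1, .freed = false }; /* base ref */
	struct toy_stock s = { .cached = NULL, .nr_pages = 0 };

	toy_css_get_many(&a, 4);	/* refs for 4 precharged pages */
	toy_refill_stock(&s, &a, 4);	/* stock takes its own reference */
	if (!toy_consume_stock(&s, &a, 4))
		return false;
	toy_css_put_many(&a, 4);	/* the 4 pages are uncharged */
	toy_css_put(&a);		/* base reference dropped */

	/* Only the stock's reference remains, so "a" is not freed. */
	return s.cached == &a && !a.freed && a.refcnt == 1;
}
```

In this model memcgA is released only when toy_drain_stock() replaces it
on the stock, so a cached pointer never outlives its reference.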

It is worth noting that refill_stock() runs in an irq-disabled context,
so the drain_stock() patched with css_put() does not actually free
memcgA until the end of refill_stock(): the css is freed via RCU, and
the grace period has not yet ended. On CPU2, the access to CPU1's
stock->cached is protected by rcu_read_lock(), so it sees either NULL in
stock->cached or a memcgA that is still valid.

Cc: stable@vger.kernel.org      # 4.19 5.4
Fixes: cdec2e4265df ("memcg: coalesce charging via percpu storage")
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/memcontrol.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2015,6 +2015,9 @@ static void drain_stock(struct memcg_sto
 {
 	struct mem_cgroup *old = stock->cached;
 
+	if (!old)
+		return;
+
 	if (stock->nr_pages) {
 		page_counter_uncharge(&old->memory, stock->nr_pages);
 		if (do_memsw_account())
@@ -2022,6 +2025,8 @@ static void drain_stock(struct memcg_sto
 		css_put_many(&old->css, stock->nr_pages);
 		stock->nr_pages = 0;
 	}
+
+	css_put(&old->css);
 	stock->cached = NULL;
 }
 
@@ -2057,6 +2062,7 @@ static void refill_stock(struct mem_cgro
 	stock = this_cpu_ptr(&memcg_stock);
 	if (stock->cached != memcg) { /* reset if necessary */
 		drain_stock(stock);
+		css_get(&memcg->css);
 		stock->cached = memcg;
 	}
 	stock->nr_pages += nr_pages;


