From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Mukesh Ojha <quic_mojha@quicinc.com>,
Ziwei Dai <ziwei.dai@unisoc.com>,
"Uladzislau Rezki (Sony)" <urezki@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Suren Baghdasaryan <surenb@google.com>
Subject: [PATCH 5.10 75/89] rcu/kvfree: Avoid freeing new kfree_rcu() memory after old grace period
Date: Mon, 19 Jun 2023 12:31:03 +0200
Message-ID: <20230619102141.680726056@linuxfoundation.org>
In-Reply-To: <20230619102138.279161276@linuxfoundation.org>
From: Ziwei Dai <ziwei.dai@unisoc.com>
commit 5da7cb193db32da783a3f3e77d8b639989321d48 upstream.
Memory passed to kvfree_rcu() that is to be freed is tracked by a
per-CPU kfree_rcu_cpu structure, which in turn contains pointers
to kvfree_rcu_bulk_data structures that contain pointers to memory
that has not yet been handed to RCU, along with a kfree_rcu_cpu_work
structure that tracks the memory that has already been handed to RCU.
These structures track three categories of memory: (1) Memory for
kfree(), (2) Memory for kvfree(), and (3) Memory for both that arrived
during an OOM episode. The first two categories are tracked in a
cache-friendly manner involving a dynamically allocated page of pointers
(the aforementioned kvfree_rcu_bulk_data structures), while the third
uses a simple (but decidedly cache-unfriendly) linked list through the
rcu_head structures in each block of memory.
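As a rough illustration of these three categories, the layout can be modeled in user-space C as shown here. This is a simplified sketch, not the kernel's definitions: every model_ name and the model_pending_categories() helper are hypothetical, only the field names bkvhead[] and head are taken from the diff, and the locks, work items, and counters of the real structures are omitted.

```c
#include <stddef.h>

#define FREE_N_CHANNELS 2	/* index 0: kfree() ptrs, index 1: kvfree() (vmalloc) ptrs */

/* Simplified stand-in for kvfree_rcu_bulk_data: a block of pointers. */
struct model_bulk_data {
	size_t nr_records;
	struct model_bulk_data *next;
	void *records[8];	/* arbitrary size for this model only */
};

/* Simplified stand-in for rcu_head, embedded in each block to be freed. */
struct model_rcu_head {
	struct model_rcu_head *next;
};

/* Per-CPU memory that has not yet been handed to RCU. */
struct model_krc {
	struct model_bulk_data *bkvhead[FREE_N_CHANNELS];	/* categories 1 and 2 */
	struct model_rcu_head *head;				/* category 3: linked list */
};

/* Count how many of the three categories currently hold memory. */
static int model_pending_categories(const struct model_krc *krcp)
{
	int i, n = 0;

	for (i = 0; i < FREE_N_CHANNELS; i++)
		if (krcp->bkvhead[i])
			n++;
	if (krcp->head)
		n++;
	return n;
}
```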
On a given CPU, these three categories are handled as a unit, with that
CPU's kfree_rcu_cpu_work structure having one pointer for each of the
three categories. Clearly, new memory for a given category cannot be
placed in the corresponding kfree_rcu_cpu_work structure until any old
memory has had its grace period elapse and thus has been removed. And
the kfree_rcu_monitor() function does in fact check for this.
Except that the kfree_rcu_monitor() function checks these pointers one
at a time. This means that if the previous kfree_rcu() memory passed
to RCU had only category 1 and the current one has only category 2, the
kfree_rcu_monitor() function will send that current category-2 memory
along immediately. This can result in memory being freed too soon,
that is, out from under unsuspecting RCU readers.
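The per-category check can be reproduced in a small user-space model. The condition in old_may_queue() is copied from the lines this patch removes, but the model_ structures, the function names, and the scenario helper are hypothetical and keep only the pointers that the check inspects.

```c
#include <stdbool.h>
#include <stddef.h>

#define FREE_N_CHANNELS 2

/* Hypothetical, stripped-down models of the two kernel structures. */
struct model_krc  { void *bkvhead[FREE_N_CHANNELS]; void *head; };
struct model_krwp { void *bkvhead_free[FREE_N_CHANNELS]; void *head_free; };

/* Pre-fix condition: each category is tested on its own. */
static bool old_may_queue(const struct model_krc *krcp,
			  const struct model_krwp *krwp)
{
	return (krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
	       (krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
	       (krcp->head && !krwp->head_free);
}

/* Scenario from the text: category 1 is in flight, only category 2 is new. */
static bool old_check_in_problem_case(void)
{
	int in_flight, fresh;	/* dummy objects, used only for their addresses */
	struct model_krc krcp = { .bkvhead = { NULL, &fresh }, .head = NULL };
	struct model_krwp krwp = { .bkvhead_free = { &in_flight, NULL },
				   .head_free = NULL };

	/* Slot 1 of the batch is empty, so the new memory is let through. */
	return old_may_queue(&krcp, &krwp);
}
```

In this model old_check_in_problem_case() returns true, which corresponds to the category-2 memory being attached to a batch whose grace period was requested before that memory was queued.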
To see this, consider the following sequence of events, in which:
o	Task A on CPU 0 calls rcu_read_lock(), then uses "from_cset",
	then is preempted.
o	CPU 1 calls kfree_rcu(cset, rcu_head) in order to free "from_cset"
	after a later grace period.  Except that "from_cset" is instead
	freed as soon as the previous grace period ends, that is, almost
	immediately.  Task A then resumes and references a member of
	"from_cset", after which nothing good happens.
In full detail:
CPU 0                           CPU 1
----------------------          ----------------------
count_memcg_event_mm()
|rcu_read_lock()  <---
|mem_cgroup_from_task()
|// css_set_ptr is the "from_cset" mentioned on CPU 1
|css_set_ptr = rcu_dereference((task)->cgroups)
|// Hard irq comes, current task is scheduled out.
                                cgroup_attach_task()
                                |cgroup_migrate()
                                |cgroup_migrate_execute()
                                |css_set_move_task(task, from_cset, to_cset, true)
                                |cgroup_move_task(task, to_cset)
                                |rcu_assign_pointer(.., to_cset)
                                |...
                                |cgroup_migrate_finish()
                                |put_css_set_locked(from_cset)
                                |from_cset->refcount reaches 0
                                |kfree_rcu(cset, rcu_head) // free from_cset after new gp
                                |add_ptr_to_bulk_krc_lock()
                                |schedule_delayed_work(&krcp->monitor_work, ..)
                                kfree_rcu_monitor()
                                |krcp->bulk_head[0]'s work attached to krwp->bulk_head_free[]
                                |queue_rcu_work(system_wq, &krwp->rcu_work)
                                |if rwork->rcu.work is not in WORK_STRUCT_PENDING_BIT state,
                                |call_rcu(&rwork->rcu, rcu_work_rcufn) <--- request new gp
                                // A previous call_rcu(.., rcu_work_rcufn) is still pending.
                                // Its gp ends, so rcu_work_rcufn() is called.
                                rcu_work_rcufn()
                                |__queue_work(.., rwork->wq, &rwork->work);
                                |kfree_rcu_work()
                                |krwp->bulk_head_free[0] bulk is freed before new gp end!!!
                                |The "from_cset" is freed before new gp end.
// The task resumes some time later.
|css_set_ptr->subsys[(subsys_id)] <--- kernel crash, because css_set_ptr has been freed.
This commit therefore causes kfree_rcu_monitor() to refrain from moving
kfree_rcu() memory to the kfree_rcu_cpu_work structure until the RCU
grace period has completed for all three categories.
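The all-or-nothing rule can be sketched in the same kind of user-space model. The two helpers mirror need_offload_krc() and need_wait_for_krwp_work() from the diff, but the model_ structures and the new_may_queue() wrapper are hypothetical and exist only to show the combined condition.

```c
#include <stdbool.h>
#include <stddef.h>

#define FREE_N_CHANNELS 2

/* Hypothetical, stripped-down models of the two kernel structures. */
struct model_krc  { void *bkvhead[FREE_N_CHANNELS]; void *head; };
struct model_krwp { void *bkvhead_free[FREE_N_CHANNELS]; void *head_free; };

/* Mirrors need_offload_krc(): is anything waiting to be handed to RCU? */
static bool model_need_offload(const struct model_krc *krcp)
{
	int i;

	for (i = 0; i < FREE_N_CHANNELS; i++)
		if (krcp->bkvhead[i])
			return true;
	return krcp->head != NULL;
}

/* Mirrors need_wait_for_krwp_work(): is any category still in flight? */
static bool model_need_wait(const struct model_krwp *krwp)
{
	int i;

	for (i = 0; i < FREE_N_CHANNELS; i++)
		if (krwp->bkvhead_free[i])
			return true;
	return krwp->head_free != NULL;
}

/* Post-fix condition: a batch is reused only when it is completely idle. */
static bool new_may_queue(const struct model_krc *krcp,
			  const struct model_krwp *krwp)
{
	return !model_need_wait(krwp) && model_need_offload(krcp);
}
```

With category 1 still in flight and only category 2 pending, new_may_queue() returns false, so in this model the new memory stays in the per-CPU structure until the batch has drained.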
v2: Use helper function instead of inserted code block at kfree_rcu_monitor().
Fixes: 34c881745549 ("rcu: Support kfree_bulk() interface in kfree_rcu()")
Fixes: 5f3c8d620447 ("rcu/tree: Maintain separate array for vmalloc ptrs")
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Ziwei Dai <ziwei.dai@unisoc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
kernel/rcu/tree.c | 49 +++++++++++++++++++++++++++++++++++--------------
1 file changed, 35 insertions(+), 14 deletions(-)
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3281,6 +3281,30 @@ static void kfree_rcu_work(struct work_s
}
}
+static bool
+need_offload_krc(struct kfree_rcu_cpu *krcp)
+{
+ int i;
+
+ for (i = 0; i < FREE_N_CHANNELS; i++)
+ if (krcp->bkvhead[i])
+ return true;
+
+ return !!krcp->head;
+}
+
+static bool
+need_wait_for_krwp_work(struct kfree_rcu_cpu_work *krwp)
+{
+ int i;
+
+ for (i = 0; i < FREE_N_CHANNELS; i++)
+ if (krwp->bkvhead_free[i])
+ return true;
+
+ return !!krwp->head_free;
+}
+
/*
* Schedule the kfree batch RCU work to run in workqueue context after a GP.
*
@@ -3298,16 +3322,13 @@ static inline bool queue_kfree_rcu_work(
for (i = 0; i < KFREE_N_BATCHES; i++) {
krwp = &(krcp->krw_arr[i]);
- /*
- * Try to detach bkvhead or head and attach it over any
- * available corresponding free channel. It can be that
- * a previous RCU batch is in progress, it means that
- * immediately to queue another one is not possible so
- * return false to tell caller to retry.
- */
- if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
- (krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
- (krcp->head && !krwp->head_free)) {
+ // Try to detach bulk_head or head and attach it, only when
+ // all channels are free. Any channel is not free means at krwp
+ // there is on-going rcu work to handle krwp's free business.
+ if (need_wait_for_krwp_work(krwp))
+ continue;
+
+ if (need_offload_krc(krcp)) {
// Channel 1 corresponds to SLAB ptrs.
// Channel 2 corresponds to vmalloc ptrs.
for (j = 0; j < FREE_N_CHANNELS; j++) {
@@ -3334,12 +3355,12 @@ static inline bool queue_kfree_rcu_work(
*/
queue_rcu_work(system_wq, &krwp->rcu_work);
}
-
- // Repeat if any "free" corresponding channel is still busy.
- if (krcp->bkvhead[0] || krcp->bkvhead[1] || krcp->head)
- repeat = true;
}
+ // Repeat if any "free" corresponding channel is still busy.
+ if (need_offload_krc(krcp))
+ repeat = true;
+
return !repeat;
}