From: Uladzislau Rezki <urezki@gmail.com>
To: Ziwei Dai <ziwei.dai@unisoc.com>
Cc: urezki@gmail.com, paulmck@kernel.org, frederic@kernel.org,
quic_neeraju@quicinc.com, josh@joshtriplett.org,
rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
jiangshanlai@gmail.com, joel@joelfernandes.org,
rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
shuang.wang@unisoc.com, yifan.xin@unisoc.com, ke.wang@unisoc.com,
xuewen.yan@unisoc.com, zhiguo.niu@unisoc.com,
zhaoyang.huang@unisoc.com
Subject: Re: [PATCH V2] rcu: Make sure new krcp free business is handled after the wanted rcu grace period.
Date: Fri, 31 Mar 2023 17:01:52 +0200 [thread overview]
Message-ID: <ZCb14DWFDKyuuhyi@pc638.lan> (raw)
In-Reply-To: <1680266529-28429-1-git-send-email-ziwei.dai@unisoc.com>
On Fri, Mar 31, 2023 at 08:42:09PM +0800, Ziwei Dai wrote:
> In kfree_rcu_monitor(), new free business at krcp is attached to any free
> channel at krwp. kfree_rcu_monitor() is responsible to make sure new free
> business is handled after the rcu grace period. But if there is any none-free
> channel at krwp already, that means there is an on-going rcu work,
> which will cause the kvfree_call_rcu()-triggered free business is done
> before the wanted rcu grace period ends.
>
> This commit ignore krwp which has non-free channel at kfree_rcu_monitor(),
> to fix the issue that kvfree_call_rcu() loses effectiveness.
>
> Below is the css_set obj "from_cset" use-after-free case caused by
> kvfree_call_rcu() losing effectiveness.
> CPU 0 calls rcu_read_lock(), then use "from_cset", then hard irq comes,
> the task is schedule out.
> CPU 1 calls kfree_rcu(cset, rcu_head), willing to free "from_cset" after new gp.
> But "from_cset" is freed right after current gp end. "from_cset" is reallocated.
> CPU 0 's task arrives back, references "from_cset"'s member, which causes crash.
>
> CPU 0 CPU 1
> count_memcg_event_mm()
> |rcu_read_lock() <---
> |mem_cgroup_from_task()
> |// css_set_ptr is the "from_cset" mentioned on CPU 1
> |css_set_ptr = rcu_dereference((task)->cgroups)
> |// Hard irq comes, current task is scheduled out.
>
> cgroup_attach_task()
> |cgroup_migrate()
> |cgroup_migrate_execute()
> |css_set_move_task(task, from_cset, to_cset, true)
> |cgroup_move_task(task, to_cset)
> |rcu_assign_pointer(.., to_cset)
> |...
> |cgroup_migrate_finish()
> |put_css_set_locked(from_cset)
> |from_cset->refcount return 0
> |kfree_rcu(cset, rcu_head) // means to free from_cset after new gp
> |add_ptr_to_bulk_krc_lock()
> |schedule_delayed_work(&krcp->monitor_work, ..)
>
> kfree_rcu_monitor()
> |krcp->bulk_head[0]'s work attached to krwp->bulk_head_free[]
> |queue_rcu_work(system_wq, &krwp->rcu_work)
> |if rwork->rcu.work is not in WORK_STRUCT_PENDING_BIT state,
> |call_rcu(&rwork->rcu, rcu_work_rcufn) <--- request a new gp
>
> // There is a perious call_rcu(.., rcu_work_rcufn)
> // gp end, rcu_work_rcufn() is called.
> rcu_work_rcufn()
> |__queue_work(.., rwork->wq, &rwork->work);
>
> |kfree_rcu_work()
> |krwp->bulk_head_free[0] bulk is freed before new gp end!!!
> |The "from_cset" is freed before new gp end.
>
> // the task is scheduled in after many ms.
> |css_set_ptr->subsys[(subsys_id) <--- Caused kernel crash, because css_set_ptr is freed.
>
> v2: Use helper function instead of inserted code block at kfree_rcu_monitor().
>
> Fixes: c014efeef76a ("rcu: Add multiple in-flight batches of kfree_rcu() work")
> Signed-off-by: Ziwei Dai <ziwei.dai@unisoc.com>
> ---
> kernel/rcu/tree.c | 27 +++++++++++++++++++--------
> 1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8e880c0..7b95ee9 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3024,6 +3024,18 @@ static void kfree_rcu_work(struct work_struct *work)
> return !!READ_ONCE(krcp->head);
> }
>
> +static bool
> +need_wait_for_krwp_work(struct kfree_rcu_cpu_work *krwp)
> +{
> + int i;
> +
> + for (i = 0; i < FREE_N_CHANNELS; i++)
> + if (!list_empty(&krwp->bulk_head_free[i]))
> + return true;
> +
> + return !!krwp->head_free;
> +}
> +
> static int krc_count(struct kfree_rcu_cpu *krcp)
> {
> int sum = atomic_read(&krcp->head_count);
> @@ -3107,15 +3119,14 @@ static void kfree_rcu_monitor(struct work_struct *work)
> for (i = 0; i < KFREE_N_BATCHES; i++) {
> struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]);
>
> - // Try to detach bulk_head or head and attach it over any
> - // available corresponding free channel. It can be that
> - // a previous RCU batch is in progress, it means that
> - // immediately to queue another one is not possible so
> - // in that case the monitor work is rearmed.
> - if ((!list_empty(&krcp->bulk_head[0]) && list_empty(&krwp->bulk_head_free[0])) ||
> - (!list_empty(&krcp->bulk_head[1]) && list_empty(&krwp->bulk_head_free[1])) ||
> - (READ_ONCE(krcp->head) && !krwp->head_free)) {
> + // Try to detach bulk_head or head and attach it, only when
> + // all channels are free. Any channel is not free means at krwp
> + // there is on-going rcu work to handle krwp's free business.
> + if (need_wait_for_krwp_work(krwp))
> + continue;
>
> + // kvfree_rcu_drain_ready() might handle this krcp, if so give up.
> + if (need_offload_krc(krcp)) {
> // Channel 1 corresponds to the SLAB-pointer bulk path.
> // Channel 2 corresponds to vmalloc-pointer bulk path.
> for (j = 0; j < FREE_N_CHANNELS; j++) {
> --
> 1.9.1
>
It looks correct to me. I will test it over weekend.
--
Uladzislau Rezki
next prev parent reply other threads:[~2023-03-31 15:02 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-31 12:42 [PATCH V2] rcu: Make sure new krcp free business is handled after the wanted rcu grace period Ziwei Dai
2023-03-31 15:01 ` Uladzislau Rezki [this message]
2023-04-03 7:28 ` Uladzislau Rezki
2023-04-03 14:01 ` Mukesh Ojha
2023-04-03 22:58 ` Paul E. McKenney
2023-04-04 2:49 ` 答复: " 代子为 (Ziwei Dai)
2023-04-04 3:23 ` Paul E. McKenney
2023-04-05 17:39 ` Joel Fernandes
2023-04-05 18:12 ` Joel Fernandes
2023-04-05 18:46 ` Paul E. McKenney
2023-04-06 1:38 ` 答复: " 代子为 (Ziwei Dai)
2023-04-06 3:46 ` Paul E. McKenney
2023-04-06 4:44 ` Uladzislau Rezki
2023-04-06 14:13 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZCb14DWFDKyuuhyi@pc638.lan \
--to=urezki@gmail.com \
--cc=frederic@kernel.org \
--cc=jiangshanlai@gmail.com \
--cc=joel@joelfernandes.org \
--cc=josh@joshtriplett.org \
--cc=ke.wang@unisoc.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=paulmck@kernel.org \
--cc=quic_neeraju@quicinc.com \
--cc=rcu@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=shuang.wang@unisoc.com \
--cc=xuewen.yan@unisoc.com \
--cc=yifan.xin@unisoc.com \
--cc=zhaoyang.huang@unisoc.com \
--cc=zhiguo.niu@unisoc.com \
--cc=ziwei.dai@unisoc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.