From: James Morse <james.morse@arm.com>
To: Valentin Schneider <valentin.schneider@arm.com>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
Fenghua Yu <fenghua.yu@intel.com>,
Reinette Chatre <reinette.chatre@intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH 2/2] x86/intel_rdt: Plug task_work vs task_struct {rmid,closid} update race
Date: Fri, 20 Nov 2020 14:53:16 +0000 [thread overview]
Message-ID: <c3370c24-98f2-dac8-e390-9c80a3321cf0@arm.com> (raw)
In-Reply-To: <20201118180030.22764-3-valentin.schneider@arm.com>
Hi Valentin,
On 18/11/2020 18:00, Valentin Schneider wrote:
> Upon moving a task to a new control / monitor group, said task's {closid,
> rmid} fields are updated *after* triggering the move_myself() task_work
> callback. This can cause said callback to miss the update, e.g. if the
> triggering thread got preempted before fiddling with task_struct, or if the
> targeted task was already on its way to return to userspace.
So, if move_myself() runs after task_work_add() but before tsk is written to.
Sounds fun!
> Update the task_struct's {closid, rmid} tuple *before* invoking
> task_work_add(). As they can happen concurrently, wrap {closid, rmid}
> accesses with READ_ONCE() and WRITE_ONCE(). Highlight the required ordering
> with a pair of comments.
... and this one is if move_myself() or __resctrl_sched_in() runs while tsk is being
written to on another CPU. It might get torn values, or multiple-reads get different values.
The READ_ONCE/WRITE_ONCEry would have been easier to read as a separate patch as you touch
all sites, and move/change some of them.
Regardless:
Reviewed-by: James Morse <james.morse@arm.com>
I don't 'get' memory-ordering, so one curiosity below:
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index b6b5b95df833..135a51529f70 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -524,11 +524,13 @@ static void move_myself(struct callback_head *head)
> * If resource group was deleted before this task work callback
> * was invoked, then assign the task to root group and free the
> * resource group.
> + *
> + * See pairing atomic_inc() in __rdtgroup_move_task()
> */
> if (atomic_dec_and_test(&rdtgrp->waitcount) &&
> (rdtgrp->flags & RDT_DELETED)) {
> - current->closid = 0;
> - current->rmid = 0;
> + WRITE_ONCE(current->closid, 0);
> + WRITE_ONCE(current->rmid, 0);
> kfree(rdtgrp);
> }
>
> @@ -553,14 +555,32 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
> /*
> * Take a refcount, so rdtgrp cannot be freed before the
> * callback has been invoked.
> + *
> + * Also ensures above {closid, rmid} writes are observed by
> + * move_myself(), as it can run immediately after task_work_add().
> + * Otherwise old values may be loaded, and the move will only actually
> + * happen at the next context switch.
But __resctrl_sched_in() can still occur at anytime and READ_ONCE() a pair of values that
don't go together?
I don't think this is a problem for RDT as with old-rmid the task was a member of that
monitor-group previously, and 'freed' rmid are kept in limbo for a while after.
(old-closid is the same as the task having not schedule()d since the change, which is fine).
For MPAM, this is more annoying as changing just the closid may put the task in a
monitoring group that never existed, meaning its surprise dirty later.
If this all makes sense, I guess the fix (for much later) is to union closid/rmid, and
WRITE_ONCE() them together where necessary.
(I've made a note for when I next pass that part of the MPAM tree)
> + *
> + * Pairs with atomic_dec() in move_myself().
> */
> atomic_inc(&rdtgrp->waitcount);
> +
> ret = task_work_add(tsk, &callback->work, TWA_RESUME);
> if (ret) {
> /*
Thanks!
James
next prev parent reply other threads:[~2020-11-20 14:53 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-18 18:00 [PATCH 0/2] x86/intel_rdt: task_work vs task_struct rmid/closid write race Valentin Schneider
2020-11-18 18:00 ` [PATCH 1/2] x86/intel_rdt: Check monitor group vs control group membership earlier Valentin Schneider
2020-11-20 14:53 ` James Morse
2020-11-20 15:53 ` Valentin Schneider
2020-11-18 18:00 ` [PATCH 2/2] x86/intel_rdt: Plug task_work vs task_struct {rmid,closid} update race Valentin Schneider
2020-11-20 14:53 ` James Morse [this message]
2020-11-20 15:54 ` Valentin Schneider
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c3370c24-98f2-dac8-e390-9c80a3321cf0@arm.com \
--to=james.morse@arm.com \
--cc=bp@alien8.de \
--cc=fenghua.yu@intel.com \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=reinette.chatre@intel.com \
--cc=tglx@linutronix.de \
--cc=valentin.schneider@arm.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox