From: Reinette Chatre <reinette.chatre@intel.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>, <x86@kernel.org>,
Fenghua Yu <fenghuay@nvidia.com>,
Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
Peter Newman <peternewman@google.com>,
James Morse <james.morse@arm.com>,
Babu Moger <babu.moger@amd.com>,
"Drew Fustini" <dfustini@baylibre.com>,
Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>,
<linux-kernel@vger.kernel.org>, <patches@lists.linux.dev>
Subject: Re: [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain()
Date: Tue, 5 May 2026 14:26:11 -0700
Message-ID: <7fad1d7d-c892-416e-b97a-a230fd43f2a4@intel.com>
In-Reply-To: <afoesuWB8RezVLrN@agluck-desk3>
Hi Tony,
On 5/5/26 9:45 AM, Luck, Tony wrote:
> On Mon, May 04, 2026 at 09:39:40PM -0700, Reinette Chatre wrote:
...
> Here's the scenario:
>
> There is just one CPU left in a domain. The mbm_over worker is woken on
> that CPU just as user requests to take that CPU offline (executing on
> another CPU).
>
> There is a race. mbm_handle_overflow() has begun execution, but the
> offline process has taken cpus_write_lock() so it blocks and sleeps
> waiting for cpus_read_lock().
>
> The offline process calls:
>
> resctrl_arch_offline_cpu()
> -> resctrl_offline_cpu()
> -> cancel_delayed_work(&d->mbm_over); /* not the _sync version */
Here I expect d->mbm_work_cpu to be set to this one CPU that is left in the domain.
> -> mbm_setup_overflow_handler(d, 0, cpu);
> Finds there are other CPUs in the domain
I do not think that this will find other CPUs in the domain here since
the scenario starts with there being only one CPU left in the domain and it is
currently running the overflow handler. From what I can tell, in this scenario,
mbm_setup_overflow_handler() will return without scheduling the overflow handler
on any CPU.
After mbm_setup_overflow_handler() returns I expect d->mbm_work_cpu to be set to
nr_cpu_ids.
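For reference, this expectation is based on my reading of
mbm_setup_overflow_handler() in fs/resctrl/monitor.c. A sketch (abbreviated,
not verbatim, with the mount/capability checks omitted):

void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom,
                                unsigned long delay_ms, int exclude_cpu)
{
        unsigned long delay = msecs_to_jiffies(delay_ms);
        int cpu;

        /* Pick any housekeeping CPU in the domain, except exclude_cpu. */
        cpu = cpumask_any_housekeeping(&dom->hdr.cpu_mask, exclude_cpu);
        dom->mbm_work_cpu = cpu;

        /*
         * When exclude_cpu is the last CPU in the domain there is no
         * candidate left: cpu == nr_cpu_ids and nothing is scheduled.
         */
        if (cpu < nr_cpu_ids)
                schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
}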
This detail does not impact the race you are highlighting though.
> -> domain_remove_cpu()
> -> domain_remove_cpu_mon()
> clears bit for this CPU from hdr->cpu_mask and finds mask is empty
> -> resctrl_offline_mon_domain()
> -> cancel_delayed_work(&d->mbm_over); /* Again! Still not _sync version */
> -> domain_destroy_l3_mon_state()
> -> kfree(d->mbm_states[idx]); /* file system state */
> -> removes domain from L3 resource domain list
> -> l3_mon_domain_free()
> kfree(hw_dom->arch_mbm_states[idx]) /* architecture state */
> kfree(hw_dom)
>
> Offline process continues. One of the things it does is checks for
> orphans on the run queue of the now deceased CPU and adopts them.
>
> Finally the offline process completes and does cpus_write_unlock()
>
> Now mbm_handle_overflow() can continue. But it is on the wrong CPU
> and has a pointer to a work_struct that was freed by the offline
> process.
Thank you for the explanation, I did not consider this scenario. From what I
understand, folks encountering this scenario should also encounter a splat
from the MBM overflow handler from all the places where it calls
smp_processor_id(). Well, actually, only if they are running with
CONFIG_DEBUG_PREEMPT enabled, because of the
debug_smp_processor_id()->check_preemption_disabled()->is_percpu_thread()
check.
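The chain I mean, sketched from my reading of lib/smp_processor_id.c (not
verbatim):

unsigned int check_preemption_disabled(const char *what1, const char *what2)
{
        int this_cpu = raw_smp_processor_id();

        if (likely(preempt_count()))
                goto out;

        if (irqs_disabled())
                goto out;

        /* A kthread bound to a single CPU is considered safe ... */
        if (is_percpu_thread())
                goto out;

        /* ... otherwise print the "BUG: using smp_processor_id()" splat. */
        ...
out:
        return this_cpu;
}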
...
>> I still think that using get_mon_domain_from_cpu() in the workers could work here.
>> Here is the idea more specifically for the MBM overflow handler:
>>
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 88f1fa0b9d8d..7186d6d02d6e 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -856,7 +856,9 @@ void mbm_handle_overflow(struct work_struct *work)
>> goto out_unlock;
>>
>> r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>> - d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work);
>> + d = get_mon_domain_from_cpu(smp_processor_id(), r);
>> + if (!d)
>> + goto out_unlock;
>
> This will get a domain, but it will be for the wrong CPU.
If it was migrated, yes. From what I understand, when this happens the worker
stops being a "per-CPU thread", so how about something like:
mbm_handle_overflow()
{
        ...
        cpus_read_lock();
        mutex_lock(&rdtgroup_mutex);

        /*
         * Overflow handler was migrated in the race while its CPU went
         * offline and is no longer running on the intended CPU.
         */
        if (!is_percpu_thread())
                goto out_unlock;
        ...
out_unlock:
        mutex_unlock(&rdtgroup_mutex);
        cpus_read_unlock();
}
I am not clear on whether there is more than one race here now. From the flow
you describe it seems that when mbm_handle_overflow() runs on its intended CPU
it can assume that its associated domain has not been removed and thus that it
is running with a valid work_struct? More specifically, if is_percpu_thread()
is true then "d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work)"
will work?
I am trying to match this race involving the CPU hotplug lock with the race
described in the original changelog that involves rdtgroup_mutex here ...
>
> Maybe it doesn't hurt for it to perform an extra set of mbm_update()
> calls on that CPU?
The extra mbm_update() calls seem ok. An extra call to the limbo handler should
be ok also. One possible issue is the impact on the software controller, which
assumes it is being called once per second. It looks like an extra call can be
avoided though?
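Regarding the once per second assumption, this is how I understand the
bandwidth computation that feeds the software controller, sketched from
mbm_bw_count() in fs/resctrl/monitor.c (not verbatim):

        /*
         * Bandwidth is the byte-count delta between two overflow handler
         * runs, scaled to MB. The result only equals MBps because the
         * handler is assumed to run once per second, so an extra run
         * would report a misleadingly small value for that interval.
         */
        cur_bytes = rr->val;
        bytes = cur_bytes - m->prev_bw_bytes;
        m->prev_bw_bytes = cur_bytes;
        m->prev_bw = bytes / SZ_1M;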
>
> It will then do:
>
> d->mbm_work_cpu = cpumask_any_housekeeping(&d->hdr.cpu_mask,
> RESCTRL_PICK_ANY_CPU);
> schedule_delayed_work_on(d->mbm_work_cpu, &d->mbm_over, delay);
>
> This "wrong" domain already has a worker ... Will this just reset the
> timeout to the new "delay" value? Possibly also to a different CPU?
queue_delayed_work_on()'s comments mention "Return: %false if @work was
already on a queue", which I interpret as the existing (correct) worker not
being impacted. I am not familiar with these corner cases though.
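As I read include/linux/workqueue.h, schedule_delayed_work_on() is a thin
wrapper, so that Return: note should apply directly:

static inline bool schedule_delayed_work_on(int cpu, struct delayed_work *dwork,
                                            unsigned long delay)
{
        /* %false if @dwork was already pending: nothing is modified then. */
        return queue_delayed_work_on(cpu, system_wq, dwork, delay);
}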
Reinette