From: "Luck, Tony" <tony.luck@intel.com>
To: Reinette Chatre <reinette.chatre@intel.com>
Cc: Borislav Petkov <bp@alien8.de>, <x86@kernel.org>,
Fenghua Yu <fenghuay@nvidia.com>,
Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
Peter Newman <peternewman@google.com>,
James Morse <james.morse@arm.com>,
Babu Moger <babu.moger@amd.com>,
"Drew Fustini" <dfustini@baylibre.com>,
Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>,
<linux-kernel@vger.kernel.org>, <patches@lists.linux.dev>
Subject: Re: [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain()
Date: Wed, 6 May 2026 12:48:51 -0700
Message-ID: <afubI4kYrdWUXGUR@agluck-desk3>
In-Reply-To: <d065f7b7-daac-4e45-b7c9-69175dfb43a7@intel.com>
On Wed, May 06, 2026 at 11:24:30AM -0700, Reinette Chatre wrote:
... trimmed discussion on how we got here ...
> schedule_delayed_work_on() will schedule the work, but will do so on the CPU going
> offline. It does not seem as though schedule_delayed_work_on() should be used at all
> if the worker is currently running. As an alternative, when resctrl finds that it cannot
> cancel the work, it can avoid attempting to reschedule the work and instead just
> set rdt_l3_mon_domain::mbm_work_cpu to nr_cpu_ids to signal that this domain needs a
> worker to be scheduled, with the scheduling to be done by the exiting worker.
>
> Combining the previous ideas with the results from experiments, I think the following
> may address the problem for the MBM overflow handler. It is not expanded to include the
> limbo handler and is untested:
Initial testing looks good. I added a big mdelay() in mbm_handle_overflow()
before cpus_read_lock() to make it easy to hit the case where cancel_delayed_work()
fails. I tested both the "still have remaining CPUs in the domain" and the "this is
the last CPU" cases, each with cancel_delayed_work() succeeding and failing.
It looks to me as though resctrl_offline_cpu() handles this completely, so
the additional cancel_delayed_work() calls in resctrl_offline_mon_domain()
aren't needed.
Do you agree that those can be deleted?
I'll look at fixing the cqm_limbo path in the same style.
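For anyone following along, the four cases above can be walked through with a
small userspace model. This is only a sketch and not kernel code: struct
model_domain, NR_CPU_IDS, and every model_*() helper are hypothetical stand-ins
that mimic the decision logic of the quoted diff, assuming that
mbm_setup_overflow_handler() stores nr_cpu_ids in mbm_work_cpu when no other
CPU is available.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's nr_cpu_ids. */
#define NR_CPU_IDS 8u

/* Hypothetical model of the fields of rdt_l3_mon_domain that matter here. */
struct model_domain {
	unsigned int cpu_mask;		/* bit n set: CPU n is online in the domain */
	unsigned int mbm_work_cpu;	/* CPU owning the worker, or NR_CPU_IDS */
	bool work_pending;		/* queued but not running: cancel succeeds */
};

/* Lowest online CPU in the domain other than 'exclude', else NR_CPU_IDS. */
unsigned int model_pick_cpu(const struct model_domain *d, unsigned int exclude)
{
	for (unsigned int cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if ((d->cpu_mask & (1u << cpu)) && cpu != exclude)
			return cpu;
	return NR_CPU_IDS;
}

/* Mimics cancel_delayed_work(): fails when the work is already running. */
bool model_cancel(struct model_domain *d)
{
	bool was_pending = d->work_pending;

	d->work_pending = false;
	return was_pending;
}

/* Mimics the MBM branch of resctrl_offline_cpu() in the quoted diff. */
void model_offline_cpu(struct model_domain *d, unsigned int cpu)
{
	assert(cpu < NR_CPU_IDS);

	if (cpu == d->mbm_work_cpu) {
		if (model_cancel(d)) {
			/* Work was only queued: move it to another CPU now. */
			d->mbm_work_cpu = model_pick_cpu(d, cpu);
			d->work_pending = d->mbm_work_cpu != NR_CPU_IDS;
		} else {
			/* Worker is running: leave a marker for it to act on. */
			d->mbm_work_cpu = NR_CPU_IDS;
		}
	}
	d->cpu_mask &= ~(1u << cpu);
}

/*
 * Mimics the !is_percpu_thread() branch of the worker: reschedule every
 * domain carrying the marker. Returns how many domains were rescheduled.
 */
int model_worker_exit(struct model_domain *doms, int ndoms)
{
	int rescheduled = 0;

	for (int i = 0; i < ndoms; i++) {
		struct model_domain *d = &doms[i];

		if (d->mbm_work_cpu != NR_CPU_IDS)
			continue;
		d->mbm_work_cpu = model_pick_cpu(d, NR_CPU_IDS);
		if (d->mbm_work_cpu != NR_CPU_IDS) {
			d->work_pending = true;
			rescheduled++;
		}
	}
	return rescheduled;
}
```

In the "last CPU" case the real domain is freed and dropped from the list, so
the model represents that by passing an empty list to model_worker_exit().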
>
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 9fd901c78dc6..2e54042b7ee9 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -852,6 +852,30 @@ void mbm_handle_overflow(struct work_struct *work)
>  		goto out_unlock;
>
>  	r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> +
> +	/*
> +	 * Worker was blocked waiting for the CPU it was running on to go
> +	 * offline. Handle two scenarios:
> +	 * - Worker was running on the last CPU of a domain. The domain and
> +	 *   thus the work_struct has been freed so do not attempt to obtain
> +	 *   domain via container_of(). All remaining domains have overflow
> +	 *   handlers so the loop will not find any domains needing an
> +	 *   overflow handler. Just exit.
> +	 * - Worker was running on CPU that just went offline with other
> +	 *   CPUs in domain still running and available to take over the
> +	 *   worker. Offline handler could not schedule a new worker on
> +	 *   another CPU in the domain but signaled that this needs to be
> +	 *   done by setting mbm_work_cpu to nr_cpu_ids. Find the domain
> +	 *   that needs a worker and schedule it now.
> +	 */
> +	if (!is_percpu_thread()) {
> +		list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +			if (d->mbm_work_cpu == nr_cpu_ids)
> +				mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL, RESCTRL_PICK_ANY_CPU);
> +		}
> +		goto out_unlock;
> +	}
> +
>  	d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work);
>
>  	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 02f87c4bc03c..cc8620ace7ed 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -4539,8 +4539,19 @@ void resctrl_offline_cpu(unsigned int cpu)
>  	d = get_mon_domain_from_cpu(cpu, l3);
>  	if (d) {
>  		if (resctrl_is_mbm_enabled() && cpu == d->mbm_work_cpu) {
> -			cancel_delayed_work(&d->mbm_over);
> -			mbm_setup_overflow_handler(d, 0, cpu);
> +			if (cancel_delayed_work(&d->mbm_over)) {
> +				mbm_setup_overflow_handler(d, 0, cpu);
> +			} else {
> +				/*
> +				 * Unable to schedule work on new CPU if it
> +				 * is currently running since the re-schedule
> +				 * will just force new work to run on
> +				 * current CPU. Mark domain's worker as
> +				 * needing to be rescheduled to be handled
> +				 * by worker itself.
> +				 */
> +				d->mbm_work_cpu = nr_cpu_ids;
> +			}
>  		}
>  		if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) &&
>  		    cpu == d->cqm_work_cpu && has_busy_rmid(d)) {
>
>
-Tony