From: "Luck, Tony" <tony.luck@intel.com>
To: Reinette Chatre <reinette.chatre@intel.com>
Cc: Borislav Petkov <bp@alien8.de>, <x86@kernel.org>,
Fenghua Yu <fenghuay@nvidia.com>,
Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
Peter Newman <peternewman@google.com>,
James Morse <james.morse@arm.com>,
Babu Moger <babu.moger@amd.com>,
"Drew Fustini" <dfustini@baylibre.com>,
Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>,
<linux-kernel@vger.kernel.org>, <patches@lists.linux.dev>
Subject: Re: [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain()
Date: Wed, 6 May 2026 12:48:51 -0700
Message-ID: <afubI4kYrdWUXGUR@agluck-desk3>
In-Reply-To: <d065f7b7-daac-4e45-b7c9-69175dfb43a7@intel.com>
On Wed, May 06, 2026 at 11:24:30AM -0700, Reinette Chatre wrote:
... trimmed discussion on how we got here ...
> schedule_delayed_work_on() will schedule the work, but will do so on the CPU going
> offline. It does not seem as though schedule_delayed_work_on() should be used at all
> if the worker is currently running. As an alternative, when resctrl finds that it
> cannot cancel the work it can avoid attempting to reschedule the work and instead just
> set rdt_l3_mon_domain::mbm_work_cpu to nr_cpu_ids to signal that this domain needs a
> worker to be scheduled, and that this is to be done by the exiting worker.
>
> Combining the previous ideas with the results from experiments, I think the following
> may address the problem for the MBM overflow handler. It is not expanded to include the
> limbo handler and is untested:
Initial testing looks good. I added a big mdelay() in mbm_handle_overflow()
before cpus_read_lock() to make it easy to hit the case where cancel_delayed_work()
fails, and tested both the "still have remaining CPUs in the domain" and "this is
the last CPU" cases for both success and failure of cancel_delayed_work().
It looks to me as though resctrl_offline_cpu() handles this completely and
the additional cancel_delayed_work() calls in resctrl_offline_mon_domain()
aren't needed. Do you agree that those can be deleted?
I'll look at fixing the cqm_limbo path in the same style.
>
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 9fd901c78dc6..2e54042b7ee9 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -852,6 +852,30 @@ void mbm_handle_overflow(struct work_struct *work)
> goto out_unlock;
>
> r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> +
> +	/*
> +	 * The worker was blocked waiting for the CPU it was running on to
> +	 * go offline. Handle two scenarios:
> +	 * - The worker was running on the last CPU of a domain. The domain,
> +	 *   and thus the work_struct, has been freed, so do not attempt to
> +	 *   obtain the domain via container_of(). All remaining domains
> +	 *   have overflow handlers, so the loop will not find any domain
> +	 *   needing an overflow handler. Just exit.
> +	 * - The worker was running on a CPU that went offline while other
> +	 *   CPUs in the domain remain online and can take over the worker.
> +	 *   The offline handler could not schedule a new worker on another
> +	 *   CPU in the domain, but signaled that this needs to be done by
> +	 *   setting mbm_work_cpu to nr_cpu_ids. Find the domain that needs
> +	 *   a worker and schedule it now.
> +	 */
> + if (!is_percpu_thread()) {
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + if (d->mbm_work_cpu == nr_cpu_ids)
> + mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL, RESCTRL_PICK_ANY_CPU);
> + }
> + goto out_unlock;
> + }
> +
> d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work);
>
> list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 02f87c4bc03c..cc8620ace7ed 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -4539,8 +4539,19 @@ void resctrl_offline_cpu(unsigned int cpu)
> d = get_mon_domain_from_cpu(cpu, l3);
> if (d) {
> if (resctrl_is_mbm_enabled() && cpu == d->mbm_work_cpu) {
> - cancel_delayed_work(&d->mbm_over);
> - mbm_setup_overflow_handler(d, 0, cpu);
> + if (cancel_delayed_work(&d->mbm_over)) {
> + mbm_setup_overflow_handler(d, 0, cpu);
> + } else {
> +			/*
> +			 * Cannot move the work to a new CPU while it is
> +			 * still running: rescheduling now would just queue
> +			 * the new work on the current (outgoing) CPU.
> +			 * Mark the domain's worker as needing a reschedule,
> +			 * to be done by the worker itself as it exits.
> +			 */
> + d->mbm_work_cpu = nr_cpu_ids;
> + }
> }
> if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) &&
> cpu == d->cqm_work_cpu && has_busy_rmid(d)) {
>
>
-Tony