public inbox for linux-kernel@vger.kernel.org
From: Reinette Chatre <reinette.chatre@intel.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>, <x86@kernel.org>,
	Fenghua Yu <fenghuay@nvidia.com>,
	Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
	Peter Newman <peternewman@google.com>,
	James Morse <james.morse@arm.com>,
	Babu Moger <babu.moger@amd.com>,
	"Drew Fustini" <dfustini@baylibre.com>,
	Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>,
	<linux-kernel@vger.kernel.org>, <patches@lists.linux.dev>
Subject: Re: [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain()
Date: Wed, 6 May 2026 14:45:16 -0700	[thread overview]
Message-ID: <217d306e-78dd-4762-8c82-88d6bab9de44@intel.com> (raw)
In-Reply-To: <afubI4kYrdWUXGUR@agluck-desk3>

Hi Tony,

On 5/6/26 12:48 PM, Luck, Tony wrote:
> On Wed, May 06, 2026 at 11:24:30AM -0700, Reinette Chatre wrote:
> 
> ... trimmed discussion on how we got here ...
> 
>> schedule_delayed_work_on() will schedule the work, but will do so on the CPU going
>> offline. It does not seem as though schedule_delayed_work_on() should be used at all
>> if the worker is currently running. As an alternative, when it finds that it cannot
>> cancel the work, resctrl can avoid attempting to reschedule the work and instead just
>> set rdt_l3_mon_domain::mbm_work_cpu to nr_cpu_ids to signal that this domain needs a
>> worker to be scheduled, and leave that to the exiting work.
>>
>> Combining the previous ideas with the results from the experiments, I think the
>> following may address the problem for the MBM overflow handler; it is not expanded to
>> include the limbo handler and is untested:
> 
> Initial testing seems good. I added a big mdelay() in mbm_handle_overflow()
> before cpus_read_lock() to make it easy to hit the case where cancel_delayed_work()
> fails. Tested both the "still have remaining CPUs in the domain" and "this is the
> last CPU" case for both success and failure of cancel_delayed_work().

Thank you very much for the testing.
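
For archive readers, the handoff scheme quoted above can be sketched as follows. This is an illustrative userspace sketch, not resctrl code: NR_CPU_IDS, struct mon_domain, offline_handoff(), and worker_must_reschedule() are stand-ins for nr_cpu_ids, struct rdt_l3_mon_domain, and the real CPU-hotplug and worker-exit paths.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPU_IDS 8U  /* stand-in for the kernel's nr_cpu_ids */

struct mon_domain {
	unsigned int mbm_work_cpu;  /* stand-in for rdt_l3_mon_domain::mbm_work_cpu */
};

/*
 * CPU-offline path: if the pending work was cancelled, reschedule it
 * directly on a surviving CPU. If cancel_delayed_work() failed (the
 * worker is already running), do NOT call schedule_delayed_work_on():
 * set the sentinel so the exiting worker reschedules itself instead.
 */
static void offline_handoff(struct mon_domain *d, bool cancel_succeeded,
			    unsigned int new_cpu)
{
	if (cancel_succeeded)
		d->mbm_work_cpu = new_cpu;
	else
		d->mbm_work_cpu = NR_CPU_IDS;  /* sentinel: worker must reschedule */
}

/* Worker-exit path: a sentinel value means the offline path could not
 * cancel the work, so the worker must pick a CPU and reschedule itself. */
static bool worker_must_reschedule(const struct mon_domain *d)
{
	return d->mbm_work_cpu >= NR_CPU_IDS;
}
```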

> 
> It looks to me that resctrl_offline_cpu() handles this completely and
> the additional cancel_delayed_work() calls from resctrl_offline_mon_domain()
> aren't needed.
> 
> Do you agree that those can be deleted?

Good catch. I cannot think of a scenario where those calls are still needed. The new
flow does open up some new scenarios, for example when the last *two* CPUs of a domain go
offline while the worker is blocked on cpus_read_lock() without getting an opportunity
to transition. Even then, with the MBM overflow handling code in resctrl_offline_cpu()
skipped entirely for one of the CPUs, the cancel_delayed_work() in
resctrl_offline_mon_domain() seems unnecessary to me.

Could you perhaps ask the AI agent that assisted with the original patch whether it can
find any corner cases?

Unrelated to this question, but perhaps worth a mention in the fix: this work focuses on
fixing resctrl so that the worker itself does not access freed memory. To complement this
it may be worthwhile to highlight that it is safe for the work_struct itself to be freed
while the work is running (but blocked on cpus_read_lock()), based on the following
comment in kernel/workqueue.c:process_one_work():
"It is permissible to free the struct work_struct from inside the function that is called
from it ..."
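
That guarantee can be illustrated with a minimal userspace analogy (this is not kernel code; struct work, run_work(), and self_freeing_fn() are hypothetical stand-ins): a callback may free the structure that embeds its own bookkeeping, provided the dispatcher does not touch that structure after invoking the function.

```c
#include <stdlib.h>

struct work {
	void (*func)(struct work *);
	int *done;  /* out-parameter so completion can be observed */
};

static void self_freeing_fn(struct work *w)
{
	int *done = w->done;  /* copy out what is needed before freeing */

	free(w);   /* safe: the dispatcher uses no fields of *w after func() returns */
	*done = 1;
}

static void run_work(struct work *w)
{
	w->func(w);  /* dispatcher must not dereference w after this call */
}
```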

> 
> I'll look at fixing the cqm_limbo path in the same style.

Thank you very much.

Reinette



Thread overview: 16+ messages
2026-05-01 21:36 [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain() Tony Luck
2026-05-04 15:11 ` Reinette Chatre
2026-05-04 22:50   ` Luck, Tony
2026-05-05  4:39     ` Reinette Chatre
2026-05-05 16:45       ` Luck, Tony
2026-05-05 21:26         ` Reinette Chatre
2026-05-05 23:07           ` Luck, Tony
2026-05-06 18:24             ` Reinette Chatre
2026-05-06 19:48               ` Luck, Tony
2026-05-06 21:45                 ` Reinette Chatre [this message]
2026-05-06 22:11                   ` Luck, Tony
2026-05-06 22:28                     ` Reinette Chatre
2026-05-06 23:14                       ` Luck, Tony
2026-05-06 20:02               ` Luck, Tony
2026-05-06 20:33                 ` Reinette Chatre
2026-05-06 20:52                   ` Luck, Tony
