public inbox for linux-kernel@vger.kernel.org
From: Reinette Chatre <reinette.chatre@intel.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>, "x86@kernel.org" <x86@kernel.org>,
	"Fenghua Yu" <fenghuay@nvidia.com>,
	"Wieczor-Retman, Maciej" <maciej.wieczor-retman@intel.com>,
	Peter Newman <peternewman@google.com>,
	James Morse <james.morse@arm.com>,
	Babu Moger <babu.moger@amd.com>,
	"Drew Fustini" <dfustini@baylibre.com>,
	Dave Martin <Dave.Martin@arm.com>,
	"Chen, Yu C" <yu.c.chen@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"patches@lists.linux.dev" <patches@lists.linux.dev>
Subject: Re: [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain()
Date: Wed, 6 May 2026 20:42:09 -0700	[thread overview]
Message-ID: <528caf7e-b548-4e80-9ec2-70697073a14d@intel.com> (raw)
In-Reply-To: <SJ1PR11MB608396D4D57AD6C1E079A4B7FC3F2@SJ1PR11MB6083.namprd11.prod.outlook.com>

Hi Tony,

On 5/6/26 4:14 PM, Luck, Tony wrote:
>>>> Unrelated to this question but may be worth a mention in the fix is that this work focuses
>>>> on fixing resctrl to not access freed memory from the worker itself. To complement this it may
>>>> be worthwhile to highlight that it is safe for the work_struct itself to be deleted while the
>>>> work is running (but blocked on cpus_read_lock()) based on the following comment from
>>>> kernel/workqueue.c:process_one_work():
>>>> "It is permissible to free the struct work_struct from inside the function that is called
>>>> from it ..."
>>>
>>> Scope increased from just the use-after-free when the domain was deleted. The case
>>> for taking the current worker CPU offline doesn't involve a use-after-free. It just results
>>> in running the worker on the wrong CPU for one iteration.
>>>
>>> Deleting the work_struct inside the called function is different from some agent deleting
>>> the work_struct while the worker is running.
>>
>> Right. I interpret this to mean that judging the safety of work_struct removal should consider not
>> only the workqueue API itself but also external agents that may access the work_struct after its
removal. The current fix addresses access to the removed work_struct from within the worker itself, while I
>> interpret the workqueue API to guarantee that there will be no access to work_struct during or
>> after worker execution. The fix under development thus makes it possible to safely remove the
>> domain even if a worker belonging to it is executing and blocked on cpus_read_lock(). Do you
>> see any remaining issues here?
> 
> OK. I'll add something to the commit message.
> 
> I asked my original AI about this fix. It claimed to find problems relating to kernel using the work_struct
> after return from the function. Pasting in that comment you gave me from process_one_work() about
> it being OK to free the work_struct made it reconsider and retract.
> 
> Another AI (using a copy of the sashiko rules) has found an issue with our reliance on is_percpu_thread()
> 
> The problem is the ordering of hotplug callbacks.
> 
> resctrl_arch_offline_cpu() runs early because it is in the CPUHP_AP_ONLINE_DYN class. AI claims
> that cpus_write_lock() is released after running this, but before running workqueue_offline_cpu() in the
> CPUHP_AP_WORKQUEUE_ONLINE class.
> 
> So our worker may obtain cpus_read_lock() while not yet having lost its is_percpu_thread() status.

Your message is not clear to me. Do you agree with the AI here, and thus claim that there remains an issue?

Are you suggesting that the original race explained in
https://lore.kernel.org/lkml/afoesuWB8RezVLrN@agluck-desk3/ is not accurate? 

I am not able to see how the CPU hotplug write lock is released in the middle of all the AP cleanup
handlers. When looking at _cpu_down() I see:

	_cpu_down()
	{
		...
		cpus_write_lock();

		/*
		 * Run all the AP teardown handlers on the CPU going down -
		 * this covers every state above CPUHP_TEARDOWN_CPU, which
		 * includes CPUHP_AP_WORKQUEUE_ONLINE and CPUHP_AP_ONLINE_DYN.
		 */

		/*
		 * Run the rest of the cleanups on another CPU.
		 */

		cpus_write_unlock();
	}

You claim that cpus_write_lock() is dropped in this flow.

To test this I enabled tracing and see the following when offlining CPU #38 running the
overflow handler:

offline is triggered on CPU #1, which takes the CPU hotplug write lock:
  1)               |  _cpu_down() {
  1)               |    percpu_down_write() {  <<<<<<<<<<======== CPU hotplug write lock acquired here
  1) # 9155.999 us |    }
  1)               |  /* cpuhp_enter: cpu: 0038 target: 144 step: 236 (cpuhp_kick_ap_work) */

...
execution moves to the CPU being offlined (#38), from where the different AP offline callbacks are called:
 38)               |  cpuhp_thread_fun() {
 38)               |  /* cpuhp_enter: cpu: 0038 target: 144 step: 235 (sched_cpu_deactivate) */
 38)               |  /* cpuhp_exit:  cpu: 0038  state: 234 step: 235 ret: 0 */
 38) * 20632.54 us |  }
 38)               |  cpuhp_thread_fun() {
 38)               |  /* cpuhp_enter: cpu: 0038 target: 144 step: 214 (rapl_cpu_down_prep [intel_rapl_msr]) */
 38)               |  /* cpuhp_exit:  cpu: 0038  state: 213 step: 214 ret: 0 */
 38)   3.171 us    |  }
 38)               |  cpuhp_thread_fun() {
 38)               |  /* cpuhp_enter: cpu: 0038 target: 144 step: 213 (pkg_thermal_cpu_offline [x86_pkg_temp_thermal]) */
 38)               |  /* cpuhp_exit:  cpu: 0038  state: 212 step: 213 ret: 0 */
 38)   2.378 us    |  }

... this includes resctrl ...
 38)               |  cpuhp_thread_fun() {
 38)               |  /* cpuhp_enter: cpu: 0038 target: 144 step: 209 (resctrl_arch_offline_cpu) */
 38)               |    resctrl_arch_offline_cpu() {
 38)               |      resctrl_offline_cpu() {
 38)               |        /* workqueue_queue_work: work struct=00000000ed014eff function=mbm_handle_overflow workqueue=events req_cpu=39 cpu=39 */
 38) # 5920.866 us |      }
 38) # 5927.396 us |    }
 38)               |  /* cpuhp_exit:  cpu: 0038  state: 208 step: 209 ret: 0 */
 38) # 5929.182 us |  }

... and the workqueues ...
 38)               |  cpuhp_thread_fun() {
 38)               |  /* cpuhp_enter: cpu: 0038 target: 144 step: 187 (workqueue_offline_cpu) */
 38)               |    workqueue_offline_cpu() {
 38)   3.724 us    |      unbind_worker();
 38)   2.312 us    |      unbind_worker();
 38)   1.701 us    |      unbind_worker();
 38)   1.681 us    |      unbind_worker();
 38) ! 226.852 us  |    }
 38)               |  /* cpuhp_exit:  cpu: 0038  state: 186 step: 187 ret: 0 */
 38) ! 229.393 us  |  }

....
eventually this all finishes and _cpu_down() completes, releasing the CPU hotplug write lock:

 73)               |  /* cpuhp_exit:  cpu: 0038  state:   6 step:   7 ret: 0 */
 73)               |  /* cpuhp_enter: cpu: 0038 target:   0 step:   2 (x86_pmu_dead_cpu) */
 73)               |  /* cpuhp_exit:  cpu: 0038  state:   1 step:   2 ret: 0 */
 73)   5.038 us    |    percpu_up_write(); <<<<<<<<<<======== CPU hotplug write lock released here
 73)               |    cpus_read_lock() {
 73)   0.474 us    |      __percpu_down_read();
 73)   1.420 us    |    }
 73) * 62023.47 us |  } /* _cpu_down */

In the trace that included all CPUs I only see one instance of percpu_down_write() called when
_cpu_down() starts and one instance of percpu_up_write() when _cpu_down() exits.

You claim that the CPU hotplug write lock is released before workqueue_offline_cpu() is called.
I am not able to verify this either in the code or in the traces generated when offlining a CPU.
Could you please help me understand your claim?

Reinette


Thread overview: 17+ messages
2026-05-01 21:36 [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain() Tony Luck
2026-05-04 15:11 ` Reinette Chatre
2026-05-04 22:50   ` Luck, Tony
2026-05-05  4:39     ` Reinette Chatre
2026-05-05 16:45       ` Luck, Tony
2026-05-05 21:26         ` Reinette Chatre
2026-05-05 23:07           ` Luck, Tony
2026-05-06 18:24             ` Reinette Chatre
2026-05-06 19:48               ` Luck, Tony
2026-05-06 21:45                 ` Reinette Chatre
2026-05-06 22:11                   ` Luck, Tony
2026-05-06 22:28                     ` Reinette Chatre
2026-05-06 23:14                       ` Luck, Tony
2026-05-07  3:42                         ` Reinette Chatre [this message]
2026-05-06 20:02               ` Luck, Tony
2026-05-06 20:33                 ` Reinette Chatre
2026-05-06 20:52                   ` Luck, Tony
