public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: James Morse <james.morse@arm.com>
To: Reinette Chatre <reinette.chatre@intel.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	H Peter Anvin <hpa@zytor.com>, Babu Moger <Babu.Moger@amd.com>,
	shameerali.kolothum.thodi@huawei.com,
	D Scott Phillips OS <scott@os.amperecomputing.com>,
	carl@os.amperecomputing.com, lcherian@marvell.com,
	bobo.shaobowang@huawei.com, tan.shaopeng@fujitsu.com,
	baolin.wang@linux.alibaba.com,
	Jamie Iles <quic_jiles@quicinc.com>,
	Xin Hao <xhao@linux.alibaba.com>,
	peternewman@google.com, dfustini@baylibre.com,
	amitsinght@marvell.com, David Hildenbrand <david@redhat.com>,
	Rex Nie <rex.nie@jaguarmicro.com>,
	Dave Martin <dave.martin@arm.com>, Koba Ko <kobak@nvidia.com>,
	Shanker Donthineni <sdonthineni@nvidia.com>,
	fenghuay@nvidia.com, Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>,
	Tony Luck <tony.luck@intel.com>
Subject: Re: [PATCH v7 33/49] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point
Date: Fri, 7 Mar 2025 17:54:01 +0000	[thread overview]
Message-ID: <0d290ca1-20cf-4165-aa4e-62da582a5461@arm.com> (raw)
In-Reply-To: <053d8a62-022b-4bf8-8e47-651e7c3a2d59@intel.com>

Hi Reinette,

On 07/03/2025 04:45, Reinette Chatre wrote:
> On 2/28/25 11:58 AM, James Morse wrote:
>> resctrl_exit() was intended for use when the 'resctrl' module was unloaded.
>> resctrl can't be built as a module, and the kernfs helpers are not exported
>> so this is unlikely to change. MPAM has an error interrupt which indicates
>> the MPAM driver has gone haywire. Should this occur tasks could run with
>> the wrong control values, leading to bad performance for important tasks.
>> The MPAM driver needs a way to tell resctrl that no further configuration
>> should be attempted.
>>
>> Using resctrl_exit() for this leaves the system in a funny state as
>> resctrl is still mounted, but cannot be un-mounted because the sysfs
>> directory that is typically used has been removed. Dave Martin suggests
>> this may cause systemd trouble in the future as not all filesystems
>> can be unmounted.
>>
>> Add calls to remove all the files and directories in resctrl, and
>> remove the sysfs_remove_mount_point() call that leaves the system
>> in a funny state. When triggered, this causes all the resctrl files
>> to disappear. resctrl can be unmounted, but not mounted again.

> (copying v6 discussion here)

> On 3/6/25 11:28 AM, James Morse wrote:
>> On 01/03/2025 02:35, Reinette Chatre wrote:
>>> On 2/28/25 11:54 AM, James Morse wrote:
>>>> On 20/02/2025 04:42, Reinette Chatre wrote:
> 
>>>>> It is difficult for me to follow the kernfs reference counting required
>>>>> to make this work. Specifically, the root kn is "destroyed" here but it
>>>>> is required to stick around until unmount when the rest of the files
>>>>> are removed.
>>>>
>>>> This drops resctrl's reference to all of the files, which would make the files disappear.
>>>> unmount is what calls kernfs_kill_sb(), which gets rid of the root of the filesystem.
>>>
>>> My concern is mostly with the kernfs_remove() calls in the rdt_kill_sb()->rmdir_all_sub()
>>> flow. For example:
>>> 	kernfs_remove(kn_info);
>>> 	kernfs_remove(kn_mongrp);
>>> 	kernfs_remove(kn_mondata);
>>>
>>> As I understand the above require the destroyed root to still be around.

>> Right - because rdt_get_tree() has these global pointers into the hierarchy, but doesn't
>> take a reference. rmdir_all_sub() relies on always being called before
>> rdtgroup_destroy_root().

> Is this a known issue then?

Its just a subtle thing that resctrl was relying on, and I didn't spot. Thanks for
pointing it out!


> Since I am not able to use your test I created something new
> after thinking there would be no response to my comment and indeed on unmount:
> [  293.707228] BUG: KASAN: slab-use-after-free in kernfs_remove+0x87/0xa0
> [  293.714718] Read of size 8 at addr ff11000309d88f30 by task umount/3793
>> The point hack would be for rdtgroup_destroy_root() to NULL out those global pointers, (I
>> note they are left dangling) - that would make a subsequent call to rmdir_all_sub() harmless.
>>
>> A better fix would be to pull out all the filesystem relevant parts from rdt_kill_sb(),
>> make that safe for multiple calls and get resctrl_exit() to call that.
>> A call to rdt_kill_sb() after resctrl_exit() would just cleanup the super-block.
>> This will leave things in a more predictable state.


> Why just the filesystem relevant parts?

The stated aim is to prevent any further configuration of the hardware.
We can change that to 'unmount as much as possible'. I didn't think of it like that as I
couldn't find an in-kernel way of triggering a umount(). (and the mount may be copied into
numerous namespaces)


> Although, you also state "resctrl_exit() would just
> cleanup the super-block" that sounds like you are thinking about pulling out all reset work.
> This sounds reasonable to me. It really feels more appropriate to do proper cleanup and
> not just wipe the root while leaving everything else underneath it.

After this discussion, my new proposal is to do this:
-------%<-------
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 0d74a6d98dba..cee4604a59c0 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3063,6 +3063,21 @@ static void rmdir_all_sub(void)
        kernfs_remove(kn_mondata);
 }

+static void resctrl_fs_teardown(void)
+{
+       lockdep_assert_held(&rdtgroup_mutex);
+
+       /* Cleared by rdtgroup_destroy_root() */
+       if (!rdtgroup_default.kn)
+               return;
+
+       rmdir_all_sub();
+       rdt_pseudo_lock_release();
+       rdtgroup_default.mode = RDT_MODE_SHAREABLE;
+       schemata_list_destroy();
+       closid_exit();
+       rdtgroup_destroy_root();
+}
+
 static void rdt_kill_sb(struct super_block *sb)
 {
        struct rdt_resource *r;
@@ -3076,11 +3091,8 @@ static void rdt_kill_sb(struct super_block *sb)
        for_each_alloc_capable_rdt_resource(r)
                resctrl_arch_reset_all_ctrls(r);

-       rmdir_all_sub();
-       rdt_pseudo_lock_release();
-       rdtgroup_default.mode = RDT_MODE_SHAREABLE;
-       schemata_list_destroy();
-       rdtgroup_destroy_root();
+       resctrl_fs_teardown();
+
        if (resctrl_arch_alloc_capable())
                resctrl_arch_disable_alloc();
        if (resctrl_arch_mon_capable())
                resctrl_arch_disable_mon();
-       closid_exit();
        resctrl_mounted = false;
        kernfs_kill_sb(sb);
        mutex_unlock(&rdtgroup_mutex);
-------%<-------

and call resctrl_fs_teardown() from resctrl_exit().

This leaves a bunch of resctrl_arch_ calls behind, the reason is its the arch code that
calls resctrl_exit(), so it probably doesn't need to be told to disable things. For MPAM,
the work of resctrl_arch_reset_all_ctrls() will already have been done - and all these
calls will fail because the MPAM driver is blocking further access to the hardware.


Thanks,

James

  reply	other threads:[~2025-03-07 17:54 UTC|newest]

Thread overview: 149+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-28 19:58 [PATCH v7 00/49] x86/resctrl: Move the resctrl filesystem code to /fs/resctrl James Morse
2025-02-28 19:58 ` [PATCH v7 01/49] x86/resctrl: Fix allocation of cleanest CLOSID on platforms with no monitors James Morse
2025-03-06 20:56   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 02/49] x86/resctrl: Add a helper to avoid reaching into the arch code resource list James Morse
2025-03-06 20:58   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 03/49] x86/resctrl: Remove fflags from struct rdt_resource James Morse
2025-03-06 21:00   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 04/49] x86/resctrl: Use schema type to determine how to parse schema values James Morse
2025-03-06 21:00   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 05/49] x86/resctrl: Use schema type to determine the schema format string James Morse
2025-03-06 21:01   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 06/49] x86/resctrl: Remove data_width and the tabular format James Morse
2025-03-06 21:02   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 07/49] x86/resctrl: Add max_bw to struct resctrl_membw James Morse
2025-03-06 21:02   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 08/49] x86/resctrl: Generate default_ctrl instead of sharing it James Morse
2025-03-06 21:04   ` Fenghua Yu
2025-03-07  4:33   ` Reinette Chatre
2025-03-07 18:08     ` James Morse
2025-02-28 19:58 ` [PATCH v7 09/49] x86/resctrl: Add helper for setting CPU default properties James Morse
2025-03-06 21:04   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 10/49] x86/resctrl: Remove rdtgroup from update_cpu_closid_rmid() James Morse
2025-03-06 21:04   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 11/49] x86/resctrl: Expose resctrl fs's init function to the rest of the kernel James Morse
2025-03-06 21:05   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 12/49] x86/resctrl: Move rdt_find_domain() to be visible to arch and fs code James Morse
2025-03-06 21:08   ` Fenghua Yu
2025-03-07  4:34   ` Reinette Chatre
2025-03-07 19:35     ` James Morse
2025-02-28 19:58 ` [PATCH v7 13/49] x86/resctrl: Move resctrl types to a separate header James Morse
2025-03-06 21:09   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 14/49] x86/resctrl: Add an arch helper to reset one resource James Morse
2025-03-06 21:10   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 15/49] x86/resctrl: Move monitor exit work to a resctrl exit call James Morse
2025-03-06 21:28   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 16/49] x86/resctrl: Move monitor init work to a resctrl init call James Morse
2025-03-06 21:29   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 17/49] x86/resctrl: Rewrite and move the for_each_*_rdt_resource() walkers James Morse
2025-03-06 21:29   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 18/49] x86/resctrl: Move the is_mbm_*_enabled() helpers to asm/resctrl.h James Morse
2025-03-06 21:30   ` Fenghua Yu
2025-03-07  4:35   ` Reinette Chatre
2025-02-28 19:58 ` [PATCH v7 19/49] x86/resctrl: Add resctrl_arch_is_evt_configurable() to abstract BMEC James Morse
2025-03-06 21:32   ` Fenghua Yu
2025-03-07  4:37   ` Reinette Chatre
2025-03-10 12:26     ` James Morse
2025-02-28 19:58 ` [PATCH v7 20/49] x86/resctrl: Change mon_event_config_{read,write}() to be arch helpers James Morse
2025-03-06 21:32   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 21/49] x86/resctrl: Move mba_mbps_default_event init to filesystem code James Morse
2025-03-06 21:33   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 22/49] x86/resctrl: Move mbm_cfg_mask to struct rdt_resource James Morse
2025-03-06 21:34   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 23/49] x86/resctrl: Add resctrl_arch_ prefix to pseudo lock functions James Morse
2025-03-06 21:34   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 24/49] x86/resctrl: Allow an architecture to disable pseudo lock James Morse
2025-03-06 21:35   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 25/49] x86/resctrl: Make prefetch_disable_bits belong to the arch code James Morse
2025-03-06 21:36   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 26/49] x86/resctrl: Make resctrl_arch_pseudo_lock_fn() take a plr James Morse
2025-03-06 21:36   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 27/49] x86/resctrl: Move RFTYPE flags to be managed by resctrl James Morse
2025-03-06 21:37   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 28/49] x86/resctrl: Handle throttle_mode for SMBA resources James Morse
2025-03-06 21:37   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 29/49] x86/resctrl: Move get_config_index() to a header James Morse
2025-03-06 22:00   ` Fenghua Yu
2025-03-07  4:37   ` Reinette Chatre
2025-02-28 19:58 ` [PATCH v7 30/49] x86/resctrl: Move get_{mon,ctrl}_domain_from_cpu() to live with their callers James Morse
2025-03-06 22:08   ` Fenghua Yu
2025-03-07  4:38   ` Reinette Chatre
2025-02-28 19:58 ` [PATCH v7 31/49] x86/resctrl: Remove the limit on the number of CLOSID James Morse
2025-03-06 22:11   ` Fenghua Yu
2025-03-07  4:40   ` Reinette Chatre
2025-03-10 14:20     ` James Morse
2025-02-28 19:58 ` [PATCH v7 32/49] x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_" James Morse
2025-03-06 22:11   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 33/49] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point James Morse
2025-03-07  4:45   ` Reinette Chatre
2025-03-07 17:54     ` James Morse [this message]
2025-03-07 19:04       ` Reinette Chatre
2025-03-10 18:15         ` James Morse
2025-02-28 19:58 ` [PATCH v7 34/49] x86/resctrl: Drop __init/__exit on assorted symbols James Morse
2025-03-06 22:30   ` Fenghua Yu
2025-02-28 19:58 ` [PATCH v7 35/49] x86/resctrl: Move is_mba_sc() out of core.c James Morse
2025-03-06 22:31   ` Fenghua Yu
2025-02-28 19:59 ` [PATCH v7 36/49] x86/resctrl: Add end-marker to the resctrl_event_id enum James Morse
2025-03-06 22:31   ` Fenghua Yu
2025-02-28 19:59 ` [PATCH v7 37/49] x86/resctrl: Expand the width of dom_id by replacing mon_data_bits James Morse
2025-03-07  5:03   ` Reinette Chatre
2025-03-12 18:04     ` James Morse
2025-03-13 15:25       ` Reinette Chatre
2025-03-25  0:52         ` Luck, Tony
2025-04-03 19:16           ` Luck, Tony
2025-04-04 16:53             ` James Morse
2025-03-07 10:17   ` Amit Singh Tomar
2025-03-12 18:04     ` James Morse
2025-02-28 19:59 ` [PATCH v7 38/49] x86/resctrl: Remove a newline to avoid confusing the code move script James Morse
2025-03-06 22:41   ` Fenghua Yu
2025-02-28 19:59 ` [PATCH v7 39/49] x86/resctrl: Split trace.h James Morse
2025-03-07  5:04   ` Reinette Chatre
2025-02-28 19:59 ` [PATCH v7 40/49] fs/resctrl: Add boiler plate for external resctrl code James Morse
2025-03-07  2:16   ` Fenghua Yu
2025-03-14 12:01     ` James Morse
2025-03-07  5:05   ` Reinette Chatre
2025-02-28 19:59 ` [PATCH v7 41/49] x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl James Morse
2025-03-07  2:19   ` Fenghua Yu
2025-03-07  5:05   ` Reinette Chatre
2025-02-28 19:59 ` [PATCH v7 42/49] x86/resctrl: Squelch whitespace anomalies in resctrl core code James Morse
2025-03-07  2:20   ` Fenghua Yu
2025-03-07  5:06   ` Reinette Chatre
2025-02-28 19:59 ` [PATCH v7 43/49] x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context() James Morse
2025-03-07  2:23   ` Fenghua Yu
2025-03-07  5:06   ` Reinette Chatre
2025-02-28 19:59 ` [PATCH v7 44/49] x86/resctrl: Relax some asm #includes James Morse
2025-03-07  2:51   ` Fenghua Yu
     [not found]   ` <d4bba1b8-2faa-4d9b-becf-f10011351652@nvidia.com>
2025-03-14 12:00     ` James Morse
2025-02-28 19:59 ` [PATCH v7 45/49] x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl James Morse
2025-03-06 20:35   ` Fenghua Yu
2025-03-14 17:42     ` James Morse
2025-03-06 20:45   ` Fenghua Yu
2025-02-28 19:59 ` [PATCH v7 46/49] x86,fs/resctrl: Remove duplicated trace header files James Morse
2025-03-07  2:32   ` Fenghua Yu
2025-03-14 17:42     ` James Morse
2025-02-28 19:59 ` [PATCH v7 47/49] fs/resctrl: Remove unnecessary includes James Morse
2025-03-07  2:37   ` Fenghua Yu
2025-03-14 17:42     ` James Morse
2025-03-07  5:07   ` Reinette Chatre
2025-02-28 19:59 ` [PATCH v7 48/49] fs/resctrl: Change internal.h's header guard macros James Morse
2025-03-07  2:44   ` Fenghua Yu
2025-02-28 19:59 ` [PATCH v7 49/49] x86,fs/resctrl: Move resctrl.rst to live under Documentation/filesystems James Morse
2025-03-07  2:45   ` Fenghua Yu
2025-03-03 10:14 ` [PATCH v7 00/49] x86/resctrl: Move the resctrl filesystem code to /fs/resctrl Peter Newman
2025-03-11 12:16   ` James Morse
2025-03-04 21:47 ` Luck, Tony
2025-03-05 16:35   ` Reinette Chatre
2025-03-05 20:47     ` Moger, Babu
2025-03-06 11:47 ` Shaopeng Tan (Fujitsu)
2025-03-11 12:26   ` James Morse
2025-03-07 10:27 ` Amit Singh Tomar
2025-03-07 13:50   ` Amit Singh Tomar
2025-03-07 16:38     ` Shanker Donthineni
2025-03-07 16:50       ` Shanker Donthineni
2025-03-11 18:33         ` James Morse
2025-03-11 18:33     ` James Morse
2025-03-10 14:07 ` Moger, Babu
2025-03-11 16:18   ` James Morse
2025-03-10 19:09 ` Borislav Petkov
2025-03-11 18:38   ` James Morse
2025-03-11 20:06     ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0d290ca1-20cf-4165-aa4e-62da582a5461@arm.com \
    --to=james.morse@arm.com \
    --cc=Babu.Moger@amd.com \
    --cc=amitsinght@marvell.com \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=bobo.shaobowang@huawei.com \
    --cc=bp@alien8.de \
    --cc=carl@os.amperecomputing.com \
    --cc=dave.martin@arm.com \
    --cc=david@redhat.com \
    --cc=dfustini@baylibre.com \
    --cc=fenghuay@nvidia.com \
    --cc=hpa@zytor.com \
    --cc=kobak@nvidia.com \
    --cc=lcherian@marvell.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peternewman@google.com \
    --cc=quic_jiles@quicinc.com \
    --cc=reinette.chatre@intel.com \
    --cc=rex.nie@jaguarmicro.com \
    --cc=scott@os.amperecomputing.com \
    --cc=sdonthineni@nvidia.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=tan.shaopeng@fujitsu.com \
    --cc=tan.shaopeng@jp.fujitsu.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --cc=xhao@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox