From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 3259F2701CC for ; Thu, 24 Apr 2025 09:15:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745486124; cv=none; b=XLKTIc9dARhVxYyHLVzOIUHaK9MdvW7iLRBZF18OGQ/TRQccGsR9i5N0f0etlOYlthLv3vjEgtzxoQ4XNxIB6zgsz1ijgtrg/rS3o55G0LanJe0fxGK+3OR1XjiwaJt0fxQxSYtlJHEynTlvWfc1YgLTgxA6/DZfC42/aQqECGo= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745486124; c=relaxed/simple; bh=Jrfga0n+DIwSYFtPo/KzGVOxGL6d9C0zVE201GuiIjw=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=ZnqzRngaLXUKvCxS4Lg2tpgOjuGkk5VeO3XmNEsE34lq7qy8/K9A5b/4mFSZBr0DUDOrl7RzhrU8fCyX+Hi5+X2014NPaaYPZZAeb4tX5odcds/LeKugCtctmasB45+fWXLoJr9v5mB1NPQIydqPicEF8LM+RYF4KTXsMEbJEwk= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 977AD1063; Thu, 24 Apr 2025 02:15:17 -0700 (PDT) Received: from [10.1.196.43] (eglon.cambridge.arm.com [10.1.196.43]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id D27EE3F66E; Thu, 24 Apr 2025 02:15:18 -0700 (PDT) Message-ID: Date: Thu, 24 Apr 2025 10:15:17 +0100 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v8 04/21] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point To: Reinette Chatre , x86@kernel.org, linux-kernel@vger.kernel.org Cc: Thomas Gleixner , Ingo Molnar , Borislav Petkov , H Peter Anvin , Babu Moger , shameerali.kolothum.thodi@huawei.com, D Scott Phillips OS , carl@os.amperecomputing.com, lcherian@marvell.com, bobo.shaobowang@huawei.com, tan.shaopeng@fujitsu.com, baolin.wang@linux.alibaba.com, Jamie Iles , Xin Hao , peternewman@google.com, dfustini@baylibre.com, amitsinght@marvell.com, David Hildenbrand , Rex Nie , Dave Martin , Koba Ko , Shanker Donthineni , fenghuay@nvidia.com, Shaopeng Tan , Tony Luck References: <20250411164229.23413-1-james.morse@arm.com> <20250411164229.23413-5-james.morse@arm.com> <81a54d21-40af-438e-8139-322597a7506e@intel.com> Content-Language: en-GB From: James Morse In-Reply-To: <81a54d21-40af-438e-8139-322597a7506e@intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Hi Reinette, On 16/04/2025 01:25, Reinette Chatre wrote: > On 4/11/25 9:42 AM, James Morse wrote: >> resctrl_exit() was intended for use when the 'resctrl' module was unloaded. >> resctrl can't be built as a module, and the kernfs helpers are not exported >> so this is unlikely to change. MPAM has an error interrupt which indicates >> the MPAM driver has gone haywire. Should this occur tasks could run with >> the wrong control values, leading to bad performance for important tasks. >> In this scenario the MPAM driver will reset the hardware, but it needs >> a way to tell resctrl that no further configuration should be attempted. >> >> In particular, moving tasks between control or monitor groups does not >> interact with the architecture code, so there is no opportunity for the >> arch code to indicate that the hardware is no-longer functioning. >> >> Using resctrl_exit() for this leaves the system in a funny state as >> resctrl is still mounted, but cannot be un-mounted because the sysfs >> directory that is typically used has been removed. Dave Martin suggests >> this may cause systemd trouble in the future as not all filesystems >> can be unmounted. >> >> Add calls to remove all the files and directories in resctrl, and >> remove the sysfs_remove_mount_point() call that leaves the system >> in a funny state. When triggered, this causes all the resctrl files >> to disappear. resctrl can be unmounted, but not mounted again. > The caveat here is that resctrl pretends to be mounted (resctrl_mounted == true) > but there is nothing there. The undocumented part of this is that for this > to work resctrl fs depends (a lot) on the architecture's callbacks to know > if they are being called after a resctrl_exit() call so that they return data > that will direct resctrl fs behavior to safest exit for those > resctrl fs flows that are still possible after a resctrl_exit(). Not ideal > layering. It was the arch code that called resctrl_exit() - there is no other path into it. I don't think its a problem for the arch code to also know to return an error. I haven't found anything where which error is returned actually matter - so there is no 'direction', only errors. I agree the documentation can be improved. > I understand from a previous comment [1] that one of the Arm "tricks" is to > offline all domains. This seems to be a good "catch all" to ensure that at least > current flows of concern are not running anymore. Yup, that is necessary to stop the limbo and overflow workers for trying to read the counters - which is a waste of time. > Considering this, > what if there is a new resctrl_error_exit() that does something like below? > > void resctrl_error_exit(void) > { > mutex_lock(&rdtgroup_mutex); > WARN_ON_ONCE(resctrl_new_function_returns_true_if_any_resource_has_a_control_or_monitor_domain()); > resctrl_fs_teardown(); > mutex_unlock(&rdtgroup_mutex); > resctrl_exit(); > } Makes sense - the alternative would be to dig around to cancel the limbo/overflow work, and a subsequent CPU-online might start them again. > I do not see this as requiring anything new from architecture but instead > making what Arm already does a requirement and keeping existing behavior? I agree. > This leaves proc_resctrl_show() that relies on resctrl_mounted but as I see > the resctrl_fs_cleanup() will remove all resource groups that should result > in the output being as it will be if resctrl is not mounted. No dependence > on architecture callbacks returning resctrl_exit() aware data here. Great - I'd missed that one, >> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c >> index fdf2616c7ca0..3f9c37637d7e 100644 >> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c >> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c >> @@ -4416,11 +4429,26 @@ int __init resctrl_init(void) >> return ret; >> } >> >> +/** >> + * resctrl_exit() - Remove the resctrl filesystem and free resources. >> + * >> + * Called by the architecture code in response to a fatal error. >> + * Resctrl files and structures are removed from kernfs to prevent further >> + * configuration. > > Please write with imperative tone. For example, "Remove resctrl files and structures ..." > >> + */ >> void __exit resctrl_exit(void) >> { >> + mutex_lock(&rdtgroup_mutex); >> + resctrl_fs_teardown(); >> + mutex_unlock(&rdtgroup_mutex); >> + >> debugfs_remove_recursive(debugfs_resctrl); > > Is it possible for the fatal error handling to trigger multiple calls here? > To protect against multiple calls causing issues debugfs_resctrl can be set to NULL here. It's not, the driver keeps track of whether resctrl_init() had been called, and only calls resctrl_exit() once. But I agree it would be better to make it robust to this. >> unregister_filesystem(&rdt_fs_type); > > unregister_filesystem() seems to handle an already-unregistered filesystem. > >> - sysfs_remove_mount_point(fs_kobj, "resctrl"); >> + >> + /* >> + * The sysfs mount point added by resctrl_init() is not removed so that >> + * it can be used to umount resctrl. >> + */ > > (needs imperative) > >> >> resctrl_mon_resource_exit(); >> } Thanks, James