Linux-ARM-Kernel Archive on lore.kernel.org
From: James Morse <james.morse@arm.com>
To: Zeng Heng <zengheng4@huawei.com>,
	ben.horgan@arm.com, Dave.Martin@arm.com,
	tan.shaopeng@jp.fujitsu.com, reinette.chatre@intel.com,
	fenghuay@nvidia.com, tglx@kernel.org, will@kernel.org,
	hpa@zytor.com, bp@alien8.de, babu.moger@amd.com,
	dave.hansen@linux.intel.com, mingo@redhat.com,
	tony.luck@intel.com, gshan@redhat.com, catalin.marinas@arm.com
Cc: linux-arm-kernel@lists.infradead.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, wangkefeng.wang@huawei.com
Subject: Re: [PATCH v8 next 01/10] fs/resctrl: Fix MPAM Partid parsing errors by preserving CDP state during umount
Date: Thu, 14 May 2026 18:06:19 +0100
Message-ID: <0efe369b-8a01-45b1-bd0f-4fdd4ebbb18a@arm.com>
In-Reply-To: <20260413085405.1166412-2-zengheng4@huawei.com>

Hi Zeng,

I think this should be a separate patch as it's fixing a problem, not adding a feature.
It's not actually relevant to the rest of the series.

On 13/04/2026 09:53, Zeng Heng wrote:
> This patch fixes a pre-existing issue in the resctrl filesystem teardown
> sequence where premature clearing of cdp_enabled could lead to MPAM Partid
> parsing errors.

resctrl changes need to go via tip, which has a bunch of rules about commit messages,
see Documentation/process/maintainer-tip.rst

You end up with a structure describing the current state, e.g:
| When resctrl is umounted it disables CDP,

what the problem is, e.g:
| CLOSID remain in the limbo list, and the mbm monitors continue to be read
| after umount. MPAM changes the meaning of CLOSID when CDP is enabled/disabled,
| resulting in out of bounds accesses.

Then, what you do about it, here you are:
| Throwing away the limbo list on umount.

(I don't suggest you take this wording - it's just an example)

"this patch" is a phrase to avoid, acronyms like CLOSID need capitalising, etc.


> The closid to partid conversion logic inherently depends on the global
> cdp_enabled state. However, rdt_disable_ctx() clears this flag early in
> the umount path, while free_rmid() operations will reference after that.
> This creates a window where partid parsing operates with inconsistent CDP
> state, potentially makes monitor reads with wrong partid mapping.
> 
> Additionally, rmid_entry remaining in limbo between mount sessions may
> trigger potential partid out-of-range errors, leading to MPAM fault
> interrupts and subsequent MPAM disablement.

Can you give more details on this? I assume that going from CDP disabled to
enabled means MPAM doubles the CLOSID from the stale limbo list, making it
out of range.
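
To make that assumption concrete, here is a toy userspace sketch. It is not the
MPAM driver's code: the toy_* names and the limit of 8 PARTIDs are hypothetical,
and the "two PARTIDs per CLOSID when CDP is on" mapping is only the doubling
assumed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hardware limit for this sketch: PARTIDs 0..7 are valid. */
#define TOY_NUM_PARTID 8u

enum toy_cdp_type { TOY_CDP_DATA, TOY_CDP_CODE };

/*
 * Toy CLOSID -> PARTID mapping (an assumption, not the driver's helper):
 * with CDP off a CLOSID is used as-is, with CDP on each CLOSID takes a
 * data/code pair of PARTIDs.
 */
static uint32_t toy_closid_to_partid(uint32_t closid, bool cdp_enabled,
				     enum toy_cdp_type type)
{
	assert(type == TOY_CDP_DATA || type == TOY_CDP_CODE);

	if (!cdp_enabled)
		return closid;

	return closid * 2 + (type == TOY_CDP_CODE ? 1 : 0);
}

/* True if the PARTID is one the (toy) hardware implements. */
static bool toy_partid_in_range(uint32_t partid)
{
	return partid < TOY_NUM_PARTID;
}
```

In this model, CLOSID 6 recorded while CDP was disabled maps to PARTID 6, which
is in range. If the same stale entry is interpreted after a remount with CDP
enabled, it maps to PARTID 12, which is not.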


> Reorder rdt_kill_sb() to delay rdt_disable_ctx() until after
> rmdir_all_sub() and resctrl_fs_teardown() complete. This ensures
> all rmid-related operations finish with correct CDP state.


> Introduce rdt_flush_limbo() to flush and cancel limbo work before the
> filesystem teardown completes.

So, discard the state in the hope we don't need it again.
What happens if the filesystem is mounted again quickly afterwards?
Surely we get noisy bandwidth results for ~minutes afterwards?


> An alternative approach would be to cancel limbo work on umount

Sounds like a move in the right direction - having bits of resctrl still
taking CPU time when it's not in use is surprising.

I'd love to eventually remove the limbo worker and have the RMID alloc code
search the limbo list for a clean RMID when a control/monitor group is created.
By deferring the work as late as possible, we do less work overall.
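
A rough userspace sketch of that idea, to show the shape rather than a design:
every name here (toy_rmid_entry, toy_alloc_rmid, the occupancy field and the
threshold parameter) is hypothetical, and the stored occupancy value stands in
for whatever counter read decides that an RMID is clean.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_rmid_state { TOY_RMID_FREE, TOY_RMID_LIMBO, TOY_RMID_IN_USE };

struct toy_rmid_entry {
	enum toy_rmid_state state;
	uint64_t occupancy;	/* stand-in for a cache-occupancy counter read */
};

/*
 * Allocate an RMID without a background limbo worker.
 * Free entries are preferred. Limbo entries are only examined here, at
 * allocation time, and are reused once their occupancy is at or below
 * the threshold. Returns the table index, or -1 if nothing is usable.
 */
static int toy_alloc_rmid(struct toy_rmid_entry *tbl, size_t n,
			  uint64_t threshold)
{
	size_t i;

	assert(tbl != NULL);

	for (i = 0; i < n; i++) {
		if (tbl[i].state == TOY_RMID_FREE) {
			tbl[i].state = TOY_RMID_IN_USE;
			return (int)i;
		}
	}

	/* No free entry: look for a limbo entry that has become clean. */
	for (i = 0; i < n; i++) {
		if (tbl[i].state == TOY_RMID_LIMBO &&
		    tbl[i].occupancy <= threshold) {
			tbl[i].state = TOY_RMID_IN_USE;
			return (int)i;
		}
	}

	return -1;
}
```

The limbo scan only runs when a group is created and no free entry exists, so
nothing is polled while resctrl is idle or unmounted.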


> and restart it on remount with remaked bitmap.
> However, this would require substantial changes in the resctrl layer to
> handle CDP state transitions across mount sessions,

This would be necessary if the limbo timer was stopped on umount too.
It also covers cases where you kexec and re-mount resctrl.

I think this is a good idea. I agree it's more work.


> which is beyond the
> scope of the reqpartid feature work this patchset focuses on.

Was it a mistake to include it in this series then?


> The current
> fix addresses the immediate correctness issue with minimal churn.

I'm not a fan of papering over problems in resctrl. Could we do it properly
by rebuilding the limbo list at mount time as you suggested above?



Thanks,

James


Thread overview: 21+ messages
2026-04-13  8:53 [PATCH v8 next 00/10] arm_mpam: Introduce Narrow-PARTID feature Zeng Heng
2026-04-13  8:53 ` [PATCH v8 next 01/10] fs/resctrl: Fix MPAM Partid parsing errors by preserving CDP state during umount Zeng Heng
2026-05-14 17:06   ` James Morse [this message]
2026-04-13  8:53 ` [PATCH v8 next 02/10] arm_mpam: Add intPARTID and reqPARTID support for Narrow-PARTID feature Zeng Heng
2026-05-14 17:06   ` James Morse
2026-04-13  8:53 ` [PATCH v8 next 03/10] arm_mpam: Disable reqPARTID expansion when Narrow-PARTID is unavailable Zeng Heng
2026-05-14 17:06   ` James Morse
2026-04-13  8:53 ` [PATCH v8 next 04/10] arm_mpam: Refactor rmid to reqPARTID/PMG mapping Zeng Heng
2026-05-14 17:07   ` James Morse
2026-04-13  8:54 ` [PATCH v8 next 05/10] arm_mpam: Propagate control group config to sub-monitoring groups Zeng Heng
2026-04-13  8:54 ` [PATCH v8 next 06/10] arm_mpam: Add boot parameter to limit mpam_intpartid_max Zeng Heng
2026-04-13  8:54 ` [PATCH v8 next 07/10] fs/resctrl: Add rmid_entry state helpers Zeng Heng
2026-04-13  8:54 ` [PATCH v8 next 08/10] arm_mpam: Implement dynamic reqPARTID allocation for monitoring groups Zeng Heng
2026-04-13  8:54 ` [PATCH v8 next 09/10] fs/resctrl: Wire up rmid expansion and reclaim functions Zeng Heng
2026-04-13  8:54 ` [PATCH v8 next 10/10] arm_mpam: Add mpam_sync_config() for dynamic rmid expansion Zeng Heng
2026-04-16  6:29 ` [PATCH v8 next 00/10] arm_mpam: Introduce Narrow-PARTID feature Shaopeng Tan (Fujitsu)
2026-04-20  7:31 ` Zeng Heng
2026-04-28  4:20   ` Shaopeng Tan (Fujitsu)
2026-04-29  9:47     ` Zeng Heng
2026-04-29 10:59 ` Zeng Heng
2026-05-14 17:06 ` James Morse
