* Re: [PATCH] hwmon: (asus-ec-sensors) add ROG STRIX B650E-E GAMING WIFI
From: Eugene Shalygin @ 2026-04-09 8:18 UTC (permalink / raw)
To: Veronika Kossmann
Cc: Guenter Roeck, Veronika Kossmann, Veronika Kossmann,
Jonathan Corbet, Shuah Khan, linux-hwmon, linux-doc, linux-kernel
In-Reply-To: <25bbdd98-656e-407a-ada7-da2bdacb1aea@rxtx.cx>
Hey Veronika,
On Wed, 8 Apr 2026 at 22:29, Veronika Kossmann <nanodesu@rxtx.cx> wrote:
>
> Of course:
>
> $ sensors asusec-isa-000a
> asusec-isa-000a
> Adapter: ISA adapter
> CPU: +37.0°C
> Motherboard: +38.0°C
> VRM: +51.0°C
>
> These are relevant to actual temperatures.
>
Thanks! So, there is no output for CPU current and chipset
temperature. Could you, please, test that CPU current displays
reasonable values with the following additional change:
diff --git a/asus-ec-sensors.c b/asus-ec-sensors.c
index 47e6c2db8b97..4a0b80012a6d 100644
--- a/asus-ec-sensors.c
+++ b/asus-ec-sensors.c
@@ -284,6 +284,7 @@ static const struct ec_sensor_info sensors_family_amd_600[] = {
EC_SENSOR("VRM", hwmon_temp, 1, 0x00, 0x33),
[ec_sensor_temp_t_sensor] =
EC_SENSOR("T_Sensor", hwmon_temp, 1, 0x00, 0x36),
+ [ec_sensor_curr_cpu] = EC_SENSOR("CPU", hwmon_curr, 1, 0x00, 0xf4),
[ec_sensor_fan_cpu_opt] =
EC_SENSOR("CPU_Opt", hwmon_fan, 2, 0x00, 0xb0),
[ec_sensor_temp_water_in] =
At least it should correlate with CPU load.
And we need to replace SENSOR_SET_TEMP_CHIPSET_CPU_MB with
SENSOR_TEMP_CPU | SENSOR_TEMP_MB.
Cheers,
Eugene
^ permalink raw reply related
* Re: [PATCH mm-unstable v15 03/13] mm/khugepaged: generalize __collapse_huge_page_* for mTHP support
From: David Hildenbrand (Arm) @ 2026-04-09 8:14 UTC (permalink / raw)
To: Nico Pache
Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
willy, yang, ying.huang, ziy, zokeefe
In-Reply-To: <CAA1CXcA8nE2PZrB4J1gV5v16PeQ7X2AiwjJ3gO1Q8hW7tyTtPQ@mail.gmail.com>
On 4/8/26 21:48, Nico Pache wrote:
> On Thu, Mar 12, 2026 at 2:56 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 3/12/26 21:36, David Hildenbrand (Arm) wrote:
>>>
>>> Okay, now I am confused. Why are you not taking care of
>>> collapse_scan_pmd() in the same context?
>>>
>>> Because if you make sure that we properly check against a max_ptes_swap
>>> similar as in the style above, we'd rule out swapin right from the start?
>>>
>>> Also, I would expect that all other parameters in there are similarly
>>> handled?
>>>
>>
>> Okay, I think you should add the following:
>
> Hey! Thanks for all your reviews here.
>
> For multiple reasons, here is the solution I developed:
>
> Add a patch before the generalize __collapse.. patch that reworks the
> max_ptes* handling and introduces the helpers (no functional changes).
I assume that's roughly the patch I shared below? If so, sounds good to me.
--
Cheers,
David
^ permalink raw reply
* Re: [PATCH 0/1] Documentation: leds: leds-class: Document keyboard backlight LED class naming
From: Kate Hsuan @ 2026-04-09 6:43 UTC (permalink / raw)
To: Hans de Goede, Lee Jones, Pavel Machek, Jonathan Corbet,
Shuah Khan
Cc: Rishit Bansal, Carlos Ferreira, Edip Hazuri, Mustafa Ekşi,
Xavier Bestel, linux-leds, linux-doc
In-Reply-To: <20260406174638.320135-1-johannes.goede@oss.qualcomm.com>
Hi Hans,
On 4/7/26 1:46 AM, Hans de Goede wrote:
> Hi All,
>
> Over the last couple of years there have been several attempts to add
> upstream kernel support for controlling keyboard backlights consisting of
> a small number of backlight zones, think e.g. : "main", "cursor" and
> "keypad" zones.
>
> All of these attempts have gotten or are stuck on the lack of consensus on
> a userspace API (1) for controlling such zoned keyboard backlights.
>
> Previous discussion can be summarized as there being consensus that
> these backlights should be represented as (multi-color) LED class devices
> with one LED class device per zone, mirroring the existing use of
> a LED class device for controlling single zone keyboard backlights.
>
> The only thing which really still needs to be agreed upon is a naming
> scheme for the per zone LED class devices so that userspace can detect:
>
> 1. That the function of these is to control a zoned keyboard backlight.
> 2. How to group the per zone devices together for a single keyboard.
>
> The single patch in this series documents the currently undocumented naming
> scheme for single zone keyboard backlights and extends this with a naming
> scheme to use for multi-zone keyboard backlights.
>
> This is sent out as a separate patch rather than as part of a series
> implementing this in the hope to get multiple drivers which are in
> the process of being upstreamed unstuck wrt the LED class naming problem.
>
> Drivers which need this are:
>
> 1. HP WMI laptop driver Omen gaming keyboards backlight control support:
> First 2023 attempt:
> https://lore.kernel.org/platform-driver-x86/20230131235027.36304-1-rishitbansal0@gmail.com/
> Later 2024 attempt which includes an earlier version of this doc patch:
> https://lore.kernel.org/platform-driver-x86/20240719100011.16656-1-carlosmiguelferreira.2003@gmail.com/
> Current ongoing 2026 attempt:
> https://lore.kernel.org/platform-driver-x86/20260304105831.119349-3-edip@medip.dev/
>
> 2. Casper Excalibur laptop driver (inc. multi-zone kbd backlight control):
> https://lore.kernel.org/platform-driver-x86/20240806205001.191551-2-mustafa.eskieksi@gmail.com/
> This one unfortunately seems to have stalled.
>
> 3. Logitech G710/G710+ gaming keyboards HID driver:
> https://lore.kernel.org/linux-input/20260402075239.3829699-1-xav@bes.tel/
> Posted a week ago, needs an agreement on the LED class dev naming scheme
> to continue.
>
> Regards,
>
> Hans
>
>
> 1) The lack of such an API may not always have been the sole reason these
> drivers have gotten stuck, but it was always a factor.
>
>
> Carlos Ferreira (1):
> Documentation: leds: leds-class: Document keyboard backlight LED class
> naming
>
> Documentation/leds/leds-class.rst | 63 +++++++++++++++++++++++++++++++
> 1 file changed, 63 insertions(+)
>
Thank you for your work.
The kbd_zoned_backlight naming is pretty useful for upper-layer apps, such
as upower.
It gives additional information about the location of the keyboard
backlight LED and allows upower to expose APIs with the zone
information to user space. It also improves the user experience of
keyboard backlight control.
Acked-by: Kate Hsuan <hpa@redhat.com>
^ permalink raw reply
* Re: [PATCH 3/4] docs/zh_CN: update rust/quick-start.rst translation
From: Dongliang Mu @ 2026-04-09 5:37 UTC (permalink / raw)
To: Gary Guo, Ben Guo, Alex Shi, Yanteng Si, Jonathan Corbet
Cc: linux-doc, linux-kernel, rust-for-linux
In-Reply-To: <DHNYKCR34P1F.1EZ3D0A8UB8S5@garyguo.net>
On 4/9/26 1:43 AM, Gary Guo wrote:
> On Wed Apr 8, 2026 at 5:51 PM BST, Ben Guo wrote:
>> On 4/8/26 7:33 PM, Gary Guo wrote:
>>> Hi Ben,
>>>
>>> Thanks for updating the doc translation. There have been new changes to
>>> quick-start.rst on rust-next, could you update the translation to be based
>>> on that, please?
>>>
>>> Thanks,
>>> Gary
>> Hi Gary,
>>
>> Thanks for the review. This series is based on the Chinese documentation
>> maintainer's tree (alexs/linux.git docs-next), which does not yet have
>> the latest quick-start.rst changes from the Rust-for-Linux rust-next
>> tree.
>>
>> Would it be better to wait until those changes land in our base tree
>> and then resend with the updated translation? Or would you prefer a
>> different approach?
>>
>> Thanks,
>> Ben
> I don't see the issue of sending translation of the latest quick-start.rst even
> if it's not in your base yet. By the time the changes land upstream, the
> original quick-start.rst would already be there.
Hi Gary,
Let’s wait for the rust-next changes to land upstream first, then I’ll
ask Ben Guo to sync that commit. Otherwise, the Chinese translation
would not match the original English doc, which would confuse readers.
We have checktransupdate.py in place for monitoring the updates in
English documents.
Dongliang Mu
>
> Best,
> Gary
^ permalink raw reply
* Re: [PATCH net-next V5 00/12] devlink: add per-port resource support
From: patchwork-bot+netdevbpf @ 2026-04-09 3:10 UTC (permalink / raw)
To: Tariq Toukan
Cc: edumazet, kuba, pabeni, andrew+netdev, davem, horms,
donald.hunter, jiri, corbet, skhan, saeedm, leon, mbloch, shuah,
matttbe, chuck.lever, cjubran, ohartoov, moshe, dtatulea,
daniel.zahka, shshitrit, cratiu, jacob.e.keller, parav,
ajayachandra, shayd, kees, danielj, netdev, linux-kernel,
linux-doc, linux-rdma, linux-kselftest, gal
In-Reply-To: <20260407194107.148063-1-tariqt@nvidia.com>
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 7 Apr 2026 22:40:55 +0300 you wrote:
> Hi,
>
> This series by Or adds devlink per-port resource support.
> See detailed description by Or below [1].
>
> Regards,
> Tariq
>
> [...]
Here is the summary with links:
- [net-next,V5,01/12] devlink: Refactor resource functions to be generic
https://git.kernel.org/netdev/net-next/c/7be3163c49b2
- [net-next,V5,02/12] devlink: Add port-level resource registration infrastructure
https://git.kernel.org/netdev/net-next/c/6f38acfed5ed
- [net-next,V5,03/12] net/mlx5: Register SF resource on PF port representor
https://git.kernel.org/netdev/net-next/c/4be8326d817e
- [net-next,V5,04/12] netdevsim: Add devlink port resource registration
https://git.kernel.org/netdev/net-next/c/085b234b28cc
- [net-next,V5,05/12] devlink: Add dump support for device-level resources
https://git.kernel.org/netdev/net-next/c/11636b550eea
- [net-next,V5,06/12] devlink: Include port resources in resource dump dumpit
https://git.kernel.org/netdev/net-next/c/810b76394d69
- [net-next,V5,07/12] devlink: Add port-specific option to resource dump doit
https://git.kernel.org/netdev/net-next/c/7511ff14f30d
- [net-next,V5,08/12] selftest: netdevsim: Add devlink port resource doit test
https://git.kernel.org/netdev/net-next/c/396135377104
- [net-next,V5,09/12] devlink: Document port-level resources and full dump
https://git.kernel.org/netdev/net-next/c/170e160a0e7c
- [net-next,V5,10/12] devlink: Add resource scope filtering to resource dump
https://git.kernel.org/netdev/net-next/c/1bc45341a6ea
- [net-next,V5,11/12] selftest: netdevsim: Add resource dump and scope filter test
https://git.kernel.org/netdev/net-next/c/2a8e91235254
- [net-next,V5,12/12] devlink: Document resource scope filtering
https://git.kernel.org/netdev/net-next/c/78c327c1728d
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH v2] Documentation/kernel-parameters: fix architecture alignment for pt, nopt, and nobypass
From: Li,Rongqing(ACG CCN) @ 2026-04-09 2:18 UTC (permalink / raw)
To: Jonathan Corbet, Andrew Morton, Borislav Petkov, Randy Dunlap,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Shuah Khan, Peter Zijlstra, Feng Tang, Pawan Gupta, Dapeng Mi,
Kees Cook, Marco Elver, Paul E . McKenney, Askar Safin,
Bjorn Helgaas, Sohil Mehta
In-Reply-To: <20260330105957.2271-1-lirongqing@baidu.com>
> Subject: [PATCH v2] Documentation/kernel-parameters: fix architecture alignment
> for pt, nopt, and nobypass
>
> From: Li RongQing <lirongqing@baidu.com>
>
> Commit ab0e7f20768a ("Documentation: Merge x86-specific boot options doc
> into kernel-parameters.txt") introduced a formatting regression where
> architecture tags were placed on separate lines with broken indentation.
> This caused the 'nopt' [X86] parameter to appear as if it belonged to the
> [PPC/POWERNV] section.
>
> Furthermore, since the main 'iommu=' parameter heading already specifies it is
> for [X86, EARLY], the subsequent standalone [X86] tags for 'pt', 'nopt', and the
> AMD GART options are redundant and clutter the documentation.
>
> Clean up the formatting by removing these redundant tags and properly
> attributing the 'nobypass' option to [PPC/POWERNV].
>
Ping
thanks
[Li,Rongqing]
> Fixes: ab0e7f20768a ("Documentation: Merge x86-specific boot options doc
> into kernel-parameters.txt")
> Acked-by: Randy Dunlap <rdunlap@infradead.org>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Shuah Khan <skhan@linuxfoundation.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Borislav Petkov (AMD) <bp@alien8.de>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Feng Tang <feng.tang@linux.alibaba.com>
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Cc: Kees Cook <kees@kernel.org>
> Cc: Marco Elver <elver@google.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Askar Safin <safinaskar@gmail.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Sohil Mehta <sohil.mehta@intel.com>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 03a5506..5253c23 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2615,15 +2615,11 @@ Kernel parameters
> Intel machines). This can be used to prevent the usage
> of an available hardware IOMMU.
>
> - [X86]
> pt
> - [X86]
> nopt
> - [PPC/POWERNV]
> - nobypass
> + nobypass [PPC/POWERNV]
> Disable IOMMU bypass, using IOMMU for PCI devices.
>
> - [X86]
> AMD Gart HW IOMMU-specific options:
>
> <size>
> --
> 2.9.4
^ permalink raw reply
* Re: [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide
From: Xavier HSINYUAN @ 2026-04-09 2:17 UTC (permalink / raw)
To: Daniel Gröber
Cc: ralf, antonio, corbet, davem, edumazet, horms, kuba, linux-doc,
linux-kernel, netdev, pabeni, skhan
In-Reply-To: <fldksy7obiaonlcxrjcbnfkfmaup27t3fq3ktubd7sx35fsswx@hjmchh6sr7rw>
Hi Daniel,
> Indeed, the JSON is just wrong and --do dev-set is missing. However
> `--family ipxlat` works for me and looking at the code is basically the
> same as specifying --spec.
>
> Could you try this:
>
> $ JSON='{"ifindex": '"$IID"', "config": {"xlat-prefix6": { "prefix": "'$ADDR_HEX'", "prefix-len": 96}}}'
> $ ./tools/net/ynl/pyynl/cli.py --family ipxlat --do dev-set --json "$JSON"
This looks good to me now. `--family ipxlat` is fine with me if this runs
from the source tree.
> I worry once we start with that we're really just re-stating what's already
> extensively documented in the RFCs.
>
> How about a reference to RFC 7915 Appendix A? This has a full bidirectional
> end-to-end example of how translation operates:
> https://datatracker.ietf.org/doc/html/rfc7915#appendix-A
>
> Admittedly using a /96 prefix (which the appendix doesn't) would make it
> easier to grok whats going on. Not sure that's reason enough to get into
> more detailed examples here.
A reference to RFC 7915 Appendix A sounds good to me. Still, a short /96
mapping example would help readers quickly see how the translation works
before reading the full RFC, and would make the following NAT64 section
easier to follow as well.
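
A minimal sketch of such a /96 mapping, assuming the RFC 6052 address format in which a /96 prefix carries the IPv4 address unchanged in the low 32 bits. The helper names are illustrative only and are not part of ipxlat or its netlink interface; the prefix and address are the well-known example values from RFC 6052.

```python
import ipaddress


def embed_ipv4_in_prefix96(prefix: str, ipv4: str) -> ipaddress.IPv6Address:
    """Map an IPv4 address into a /96 translation prefix (RFC 6052 layout)."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    # With a /96 prefix the 32 IPv4 bits simply fill the low 32 bits.
    return ipaddress.IPv6Address(int(net.network_address) | int(v4))


def extract_ipv4_from_prefix96(prefix: str, ipv6: str) -> ipaddress.IPv4Address:
    """Recover the embedded IPv4 address from an address inside the /96 prefix."""
    net = ipaddress.IPv6Network(prefix)
    addr = ipaddress.IPv6Address(ipv6)
    if net.prefixlen != 96 or addr not in net:
        raise ValueError("address is not inside the given /96 prefix")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


mapped = embed_ipv4_in_prefix96("64:ff9b::/96", "192.0.2.33")
print(mapped)                                                   # 64:ff9b::c000:221
print(extract_ipv4_from_prefix96("64:ff9b::/96", str(mapped)))  # 192.0.2.33
```

Something this short in the guide would show the two directions of the mapping without restating the RFC.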
Best regards,
Xavier
^ permalink raw reply
* Re: [PATCH net-next] docs: netdev: improve wording of reviewer guidance
From: patchwork-bot+netdevbpf @ 2026-04-09 2:10 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
skhan, workflows, linux-doc
In-Reply-To: <20260406175334.3153451-1-kuba@kernel.org>
Hello:
This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 6 Apr 2026 10:53:34 -0700 you wrote:
> Reword the reviewer guidance based on behavior we see on the list.
> Steer folks:
> - towards sending tags
> - away from process issues.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>
> [...]
Here is the summary with links:
- [net-next] docs: netdev: improve wording of reviewer guidance
https://git.kernel.org/netdev/net-next/c/bd5c24e4001d
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH] crash: Support high memory reservation for range syntax
From: Youling Tang @ 2026-04-09 1:55 UTC (permalink / raw)
To: Baoquan He, Sourabh Jain
Cc: Andrew Morton, Jonathan Corbet, Vivek Goyal, Dave Young, kexec,
linux-kernel, linux-doc, Youling Tang
In-Reply-To: <adZYpnwOxgvFMLaT@MiWiFi-R3L-srv>
Hi, Baoquan
On 4/8/26 21:32, Baoquan He wrote:
> On 04/08/26 at 10:01am, Sourabh Jain wrote:
>> Hello Youling,
>>
>> On 04/04/26 13:11, Youling Tang wrote:
>>> From: Youling Tang <tangyouling@kylinos.cn>
>>>
>>> The crashkernel range syntax (range1:size1[,range2:size2,...]) allows
>>> automatic size selection based on system RAM, but it always reserves
>>> from low memory. When a large crashkernel is selected, this can
>>> consume most of the low memory, causing subsequent hardware
>>> hotplug or drivers requiring low memory to fail due to allocation
>>> failures.
>>
>> Support for high crashkernel reservation has been added to
>> address the above problem.
>>
>> However, high crashkernel reservation is not supported with
>> range-based crashkernel kernel command-line arguments.
>> For example: crashkernel=0M-1G:100M,1G-4G:160M,4G-8G:192M
>>
>> Many users, including some distributions, use range-based
>> crashkernel configuration. So, adding support for high crashkernel
>> reservation with range-based configuration would be useful.
> Sorry for late response. And I have to say sorry because I have some
> negative tendency on this change.
>
> We use crashkernel=xM|G and crashkernel=range1:size1[,range2:size2,...]
> as default setting, so that people only need to set suggested amount
> of memory. While crashkernel=,high|low is for advanced user to customize
> their crashkernel value. In that case, user knows what's high memory and
> low memory, and how much is needed separately to achieve their goal, e.g
> saving low memory, taking away more high memory.
>
> To be honest, the above grammars sound simple, right? I believe both of you
> know very well how complicated the current crashkernel code is. I would
> suggest not letting them become more and more complicated by extending
> the grammar further and further. Unless you meet an unavoidable issue with
> the existing grammar.
>
> Here comes my question, do you meet an unavoidable issue with the existing
> grammar when you use crashkernel=range1:size1[,range2:size2,...] and
> think it's not satisfactory, and at the same time crashkernel=,high|low
> can't meet your demand either?
Yes, regular users generally don't know about high memory and low memory,
and probably don't know how much crashkernel memory should be reserved
either. They mostly just use the default crashkernel parameters configured
by the distribution.
For advanced users, the current grammar is sufficient, because
'crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset],>boundary'
can definitely be replaced with 'crashkernel=size,high'.
The main purpose of this patch is to provide distributions with a more
reasonable default parameter configuration (satisfying most requirements),
without having to set different distribution default parameters for
different scenarios (physical machines, virtual machines) and different
machine models.
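
For readers less familiar with the existing range syntax, here is a rough sketch of how a size is picked from crashkernel=range1:size1[,range2:size2,...] based on system RAM. It assumes each range is start-inclusive and end-exclusive and that an empty end means "no upper bound"; it ignores any @offset suffix and the ",>boundary" extension proposed in this patch. The function names are made up for illustration and do not mirror the kernel parser.

```python
import re

_SUFFIX = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text: str) -> int:
    """Parse a size such as '160M' or '4G' into bytes."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    return int(match.group(1)) * _SUFFIX[match.group(2)]


def select_crashkernel_size(spec: str, system_ram: int) -> int:
    """Return the reservation size of the first range containing system_ram.

    Returns 0 when no range matches, i.e. nothing would be reserved.
    """
    for entry in spec.split(","):
        range_part, size_part = entry.split(":")
        start_text, end_text = range_part.split("-")
        start = parse_size(start_text)
        end = parse_size(end_text) if end_text else None  # open-ended range
        if system_ram >= start and (end is None or system_ram < end):
            return parse_size(size_part)
    return 0


spec = "0M-1G:100M,1G-4G:160M,4G-8G:192M"
print(select_crashkernel_size(spec, 2 << 30) >> 20)  # 160 (MiB) on a 2 GiB machine
```

With this spec a 16 GiB machine matches no range, which is one reason distribution defaults usually end with an open-ended range.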
Thanks,
Youling.
>
> Thanks
> Baoquan
>
^ permalink raw reply
* Re: [RFC PATCH v3 00/10] mm/damon: introduce DAMOS failed region quota charge ratio
From: SeongJae Park @ 2026-04-09 0:00 UTC (permalink / raw)
To: Bijan Tabatabai
Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, Brendan Higgins,
David Gow, David Hildenbrand, Jonathan Corbet, Lorenzo Stoakes,
Michal Hocko, Mike Rapoport, Shuah Khan, Shuah Khan,
Suren Baghdasaryan, Vlastimil Babka, damon, kunit-dev, linux-doc,
linux-kernel, linux-kselftest, linux-mm
In-Reply-To: <20260408165001.8473-1-bijan311@gmail.com>
On Wed, 8 Apr 2026 11:48:27 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:
> On Mon, 6 Apr 2026 18:05:22 -0700 SeongJae Park <sj@kernel.org> wrote:
>
> Hi SJ,
>
> > TL; DR: Let users set different DAMOS quota charge ratios for DAMOS
> > action failed regions, for deterministic and consistent DAMOS action
> > progress.
> >
> > Common Reports: Unexpectedly Slow DAMOS
> > =======================================
> >
> > One common issue report that we get from DAMON users is that DAMOS
> > action applying progress speed is sometimes much slower than expected.
> > And one common root cause is that the DAMOS quota is exceeded by the
> > action applying failed memory regions.
> >
> > For example, a group of users tried to run DAMOS-based proactive memory
> > reclamation (DAMON_RECLAIM) with 100 MiB per second DAMOS quota. They
> > ran it on a system having no active workload which means all memory of
> > the system is cold. The expectation was that the system will show 100
> > MiB per second reclamation until (nearly) all memory is reclaimed. But
> > what they found is that the speed is quite inconsistent and sometimes it
> > becomes much slower than the expectation, sometimes even no reclamation
> > at all for about tens of seconds. The upper limit of the speed (100 MiB
> > per second) was being kept as expected, though.
> >
> > By monitoring the qt_exceeds (number of DAMOS quota exceed events) DAMOS
> > stat, we found DAMOS quota is always exceeded when the speed is slow. By
> > monitoring sz_tried and sz_applied (the total amount of DAMOS action
> > tried memory and succeeded memory) DAMOS stats together, we found the
> > reclamation attempts nearly always failed when the speed is slow.
> >
> > DAMOS quota charges DAMOS action tried regions regardless of the
> > successfulness of the try. Hence in the example reported case, there
> > was unreclaimable memory spread around the system memory. Sometimes
> > nearly 100 MiB of memory that DAMOS tried to reclaim in the given quota
> > interval was reclaimable, and therefore showed nearly 100 MiB per second
> > speed. Sometimes nearly 99 MiB of memory that DAMOS was trying to
> > reclaim in the given quota interval was unreclaimable, and therefore
> > showing only about 1 MiB per second reclaim speed.
> >
> > We explained it is an expected behavior of the feature rather than a
> > bug, as DAMOS quota is there for only the upper-limit of the speed. The
> > users agreed and later reported a huge win from the adoption of
> > DAMON_RECLAIM on their products.
>
> Thanks for this series. This is a problem I have come across and am looking
> forward to seeing this land.
Thank you for acknowledging. I'm hoping this lands in 7.2-rc1.
[...]
> > DAMOS Action Failed Region Quota Charge Ratio
> > =============================================
> >
> > Let users set the charge ratio for the action-failed memory, for more
> > optimal and deterministic use of DAMOS. It allows users to specify the
> > numerator and the denominator of the ratio for flexible setup. For
> > example, let's suppose the numerator and the denominator are set to 1
> > and 4,096, respectively. The ratio is 1 / 4,096. A DAMOS scheme action
> > is applied to 5 GiB memory. For 1 GiB of the memory, the action is
> > succeeded. For the rest (4 GiB), the action is failed. Then, only 1
> > GiB and 1 MiB quota is charged.
> >
> > The optimal charge ratio will depend on the use case and
> > system/workload. I'd recommend starting from setting the numerator as 1
> > and the denominator as PAGE_SIZE and tune based on the results, because
> > many DAMOS actions are applied at page level.
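
A minimal sketch of the charge arithmetic in the quoted example. It assumes the charged amount is the succeeded size plus the failed size scaled by numerator/denominator using integer division; the function and variable names are illustrative and are not taken from the DAMON source.

```python
GIB = 1 << 30
MIB = 1 << 20


def charged_quota(sz_applied: int, sz_failed: int,
                  numerator: int, denominator: int) -> int:
    """Bytes charged to the quota for one round of applying a DAMOS action.

    Succeeded bytes are charged in full; failed bytes are charged at
    numerator/denominator of their size.
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return sz_applied + sz_failed * numerator // denominator


# The example from the cover letter: 5 GiB tried, 1 GiB succeeded, 4 GiB failed.
charged = charged_quota(1 * GIB, 4 * GIB, numerator=1, denominator=4096)
print(charged == 1 * GIB + 1 * MIB)  # True: 4 GiB / 4096 is exactly 1 MiB
```

A ratio of 1/1 charges failed regions in full, which in this sketch gives 5 GiB for the same input.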
>
> This makes sense, but the quota is also considered when setting the minimum
> allowable score in damos_adjust_quota(), which, to my understanding, assumes
> that all of a region's data will have the action applied. If an action fails for
> a significant amount of the memory, a lower score than what was calculated in
> damos_adjust_quota() could be valid. If that's the case, the scheme would be
> applied to fewer regions than strictly necessary.
Good point, you are right.
>
> As you mention above, this is not a correctness issue because the quota only
> guarantees an upper limit on the amount of data the scheme is applied to.
I agree.
> Additionally, it may very well be true that what I listed above would not be
> very noticeable in practice.
I guess it is hopefully true, for the following reason.
The score for each region is calculated as a weighted sum of the access
frequency and the age of the region. To avoid a DAMOS action being repeatedly
applied to only a few regions, we reset the age of regions after a DAMOS action
is applied to the region, regardless of action failure. So, periodically the
score of regions containing memory the action cannot be applied to will get
low, making no big impact on the minimum score threshold calculation.
But real data could say something different. I will be happy to be proven
wrong by real data. :)
> I just thought this was worth pointing out as
> something to think about.
Indeed. Thank you for pointing this out. Nonetheless, this is not a new issue
introduced by this patch series. And the impact is not clear at the moment. I
will be happy to revisit this in parallel to this patch series.
Thanks,
SJ
[...]
^ permalink raw reply
* Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Reinette Chatre @ 2026-04-08 23:41 UTC (permalink / raw)
To: Moger, Babu, Babu Moger, corbet@lwn.net, tony.luck@intel.com,
Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: skhan@linuxfoundation.org, x86@kernel.org, hpa@zytor.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
vschneid@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com,
akpm@linux-foundation.org, pmladek@suse.com,
rdunlap@infradead.org, dapeng1.mi@linux.intel.com,
kees@kernel.org, elver@google.com, paulmck@kernel.org,
lirongqing@baidu.com, safinaskar@gmail.com, fvdl@google.com,
seanjc@google.com, pawan.kumar.gupta@linux.intel.com,
xin@zytor.com, tiala@microsoft.com, chang.seok.bae@intel.com,
Lendacky, Thomas, elena.reshetova@intel.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-coco@lists.linux.dev, kvm@vger.kernel.org,
eranian@google.com, peternewman@google.com
In-Reply-To: <20aaacfb-9601-4343-a5d5-f3df6152155b@amd.com>
Hi Babu,
On 4/8/26 4:07 PM, Moger, Babu wrote:
> On 4/8/2026 4:24 PM, Reinette Chatre wrote:
>> On 4/8/26 1:45 PM, Babu Moger wrote:
...
>>> The modes "global_assign_ctrl_inherit_mon_per_cpu" and "global_assign_ctrl_assign_mon_per_cpu" represent the actual PLZA modes.
>>>
>>> Both of these modes introduce new files kernel_mode_cpus/ and kernel_mode_cpus_list in the resctrl group.
>>
>> Right. To be specific when the user changes the mode to either "global_assign_ctrl_inherit_mon_per_cpu" or
>> "global_assign_ctrl_assign_mon_per_cpu" the new files will be created in the default resource group with
>> associated setting applied globally at that time.
>
> If, at that point, "info/kernel_mode_assignment" points to // (the default group), is that correct?
I see "info/kernel_mode_assignment" pointing to default group as the only
option right after a mode switch away from "inherit_ctrl_and_mon".
To elaborate, the current idea is that the mode within info/kernel_mode determines
which, if any, control files are presented to user space.
Assuming that the system boots up with:
# cat info/kernel_mode
[inherit_ctrl_and_mon]
global_assign_ctrl_inherit_mon_per_cpu
global_assign_ctrl_assign_mon_per_cpu
In above scenario "info/kernel_mode_assignment" does not exist (is not visible to
user space).
When the user switches to either "global_assign_ctrl_inherit_mon_per_cpu" or
"global_assign_ctrl_assign_mon_per_cpu" then "info/kernel_mode_assignment" is created
(or made visible to user space) and is expected to point to default group.
User can change the group using "info/kernel_mode_assignment" at this point.
If the current scenario is below ...
# cat info/kernel_mode
[global_assign_ctrl_inherit_mon_per_cpu]
inherit_ctrl_and_mon
global_assign_ctrl_assign_mon_per_cpu
... then "info/kernel_mode_assignment" will exist but what it should contain if
user switches mode at this point may be up for discussion.
option 1)
When user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
the resource group in "info/kernel_mode_assignment" is reset to the
default group and all CPUs PLZA state reset to match. The kernel_mode_cpus
and kernel_mode_cpuslist files become visible in default resource group
and they contain "all online CPUs".
option 2)
When user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
the resource group in "info/kernel_mode_assignment" is kept and all
CPUs PLZA state set to match it while also keeping the current
values of that resource group's kernel_mode_cpus and kernel_mode_cpuslist
files.
I am leaning towards "option 1" to keep it consistent with a switch from
"inherit_ctrl_and_mon" and to be deterministic about how a mode is started with
a clean slate. What are your thoughts? What would be the use case where a user
would want to switch between "global_assign_ctrl_inherit_mon_per_cpu" and
"global_assign_ctrl_assign_mon_per_cpu" to just switch rmid_en on and off?
> And if "info/kernel_mode_assignment" points to a different group
> (for example, test//), then the kernel_mode_cpus/ and
> kernel_mode_cpus_list files will be created only under the test//
> group. Is that correct?
I expect that if "info/kernel_mode_assignment" exists then the group
listed within contains kernel_mode_cpus and kernel_mode_cpuslist.
How the group ends up in "info/kernel_mode_assignment" could result
from mode change or from write by user space.
Reinette
^ permalink raw reply
* Re: [PATCH v10 12/21] gpu: nova-core: mm: Add unified page table entry wrapper enums
From: John Hubbard @ 2026-04-08 23:13 UTC (permalink / raw)
To: Joel Fernandes, Eliot Courtney, linux-kernel
Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
Alex Gaynor, Boqun Feng, Alistair Popple, Timur Tabi, Edwin Peer,
Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi, joel,
linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <da8d03f8-0294-417b-b684-2c20d577f94a@nvidia.com>
On 4/8/26 9:58 AM, Joel Fernandes wrote:
> On 4/8/2026 9:26 AM, Eliot Courtney wrote:
>> On Tue Apr 7, 2026 at 10:59 PM JST, Joel Fernandes wrote:
>>> On 4/7/2026 9:42 AM, Eliot Courtney wrote:
>>>> On Tue Apr 7, 2026 at 6:55 AM JST, Joel Fernandes wrote:
...
>> [1]: https://github.com/Edgeworth/linux/commits/review/nova-mm-v10/
> First, thanks for the effort. I looked through this, it's pretty much what I
> had before when I used traits. I don't think it is better to be honest. In
> fact your version is worse, it adds many new types and things like the
> following which I did not need before.
Hi Joel and all,
I also looked through Eliot's above attempt carefully, and actually
liked it a lot (sorry! haha):
* It cleans up the code. The initial working version was readable, but
also had lots of noise on the screen: match statements and pairs of
v2/v3 statements.
And interestingly, the mmu_version was, in effect, sporadically
implementing a Trait-based approach. But because it is custom,
readers don't benefit as much as they would with Traits, which
tell you immediately how things are structured.
Joel, I am passionately in agreement with your principles: code must
be readable on the screen.
In this case, though, Traits make the code considerably more readable,
especially if one makes the very reasonable assumption that readers are
thoroughly accustomed to dealing with Rust traits.
>
> To put it mildly, the following suggestion should not be anywhere near my code:
>
lol I understand, believe me. But this is short and not too bad, really.
> /// Type-erased MMU-specific [`Vmm`] implementations.
Type erasure remains a semi-exotic thing, IMHO. As such, another
sentence to elaborate on this would be a nice touch.
> enum VmmInner {
> /// `Vmm` implementation for MMU v2.
> V2(VmmImpl<MmuV2>),
> /// `Vmm` implementation for MMU v3.
> V3(VmmImpl<MmuV3>),
> }
>
> /// MMU-specific [`Vmm`] implementation.
> struct VmmImpl<M: Mmu> {
>
> Seriously, I have to pass on this. :-)
>
> And, you unfortunately seem to have ignored my point about requiring 4 NEW
> traits (Mmu, PteOps, PdeOps, DualPdeOps etc), which I did not need before.
> So you're making the code much much worse than before actually. We don't
> new traits and types pointlessly.
They are not pointless.
However! What I think would be nice is: do a new v11 with approximately
this approach, and then we can beat it into being as readable as
possible.
thanks,
--
John Hubbard
^ permalink raw reply
* Re: [PATCH v8 0/2] PCI: s390: Expose the UID as an arch specific PCI slot attribute
From: Vasily Gorbik @ 2026-04-08 23:12 UTC (permalink / raw)
To: Niklas Schnelle
Cc: Bjorn Helgaas, Jonathan Corbet, Lukas Wunner, Shuah Khan,
Farhan Ali, Alexander Gordeev, Christian Borntraeger,
Gerald Schaefer, Gerd Bayer, Heiko Carstens, Julian Ruess,
Matthew Rosato, Peter Oberparleiter, Ramesh Errabolu,
Sven Schnelle, linux-doc, linux-kernel, linux-pci, linux-s390,
Randy Dunlap
In-Reply-To: <20260407-uid_slot-v8-0-15ae4409d2ce@linux.ibm.com>
On Tue, Apr 07, 2026 at 03:24:44PM +0200, Niklas Schnelle wrote:
> Add a mechanism for architecture specific attributes on
> PCI slots in order to add the user-defined ID (UID) as an s390 specific
> PCI slot attribute. First though improve some issues with the s390 specific
> documentation of PCI sysfs attributes noticed during development.
> Niklas Schnelle (2):
> docs: s390/pci: Improve and update PCI documentation
> PCI: s390: Expose the UID as an arch specific PCI slot attribute
>
> Documentation/arch/s390/pci.rst | 151 +++++++++++++++++++++++++++-------------
> arch/s390/include/asm/pci.h | 4 ++
> arch/s390/pci/pci_sysfs.c | 20 ++++++
> drivers/pci/slot.c | 13 +++-
> 4 files changed, 140 insertions(+), 48 deletions(-)
Applied to s390 tree, thank you!
^ permalink raw reply
* Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Moger, Babu @ 2026-04-08 23:07 UTC (permalink / raw)
To: Reinette Chatre, Babu Moger, corbet@lwn.net, tony.luck@intel.com,
Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: skhan@linuxfoundation.org, x86@kernel.org, hpa@zytor.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
vschneid@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com,
akpm@linux-foundation.org, pmladek@suse.com,
rdunlap@infradead.org, dapeng1.mi@linux.intel.com,
kees@kernel.org, elver@google.com, paulmck@kernel.org,
lirongqing@baidu.com, safinaskar@gmail.com, fvdl@google.com,
seanjc@google.com, pawan.kumar.gupta@linux.intel.com,
xin@zytor.com, tiala@microsoft.com, chang.seok.bae@intel.com,
Lendacky, Thomas, elena.reshetova@intel.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-coco@lists.linux.dev, kvm@vger.kernel.org,
eranian@google.com, peternewman@google.com
In-Reply-To: <72297351-2954-4318-81b6-7de409e5552c@intel.com>
Hi Reinette,
On 4/8/2026 4:24 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 4/8/26 1:45 PM, Babu Moger wrote:
>> On 4/7/26 23:45, Reinette Chatre wrote:
>>> On 4/7/26 6:01 PM, Babu Moger wrote:
>
>>>> That said, I’m open to not having a dedicated group if we can still support all the features that PLZA provides without it.
>>>
>>> I find that enabling user space to share CLOSID/RMID between user space
>>> and kernel space to indeed support what PLZA provides. I think I am missing
>>> something here since below proposal again attempts to isolate a resource group
>>> (CLOSID) for kernel work.
>>
>> No. I dont want to isolate a group just for PLZA. All I am saying
>> is, we should provide option to create a dedicated group if the user
>> wants to do it.
> I agree. I do not see resctrl needing to do anything to accomplish this though. If
> the user wants a group dedicated to kernel mode/PLZA then all that is needed is for the
> user not to assign any tasks to this group, either via changes to the group's tasks file
> or via the group's cpus/cpus_list files.
>
>>>>
>>>> The mode can simply be determined on a per-group basis. We can
>>>> introduce two new files—kernel_mode_cpus and
>>>> kernel_mode_cpus_list—within each resctrl group when kmode (or
>>>> PLZA) is supported.
>>>
>>> I think having these files in every resource group is confusing since user can only interact
>>> with these files in one resource group for current PLZA. Why not *just* have the files in the
>>> resource group that matches the group in info/kernel_mode_assignment?
>>
>> The default group can also serve as the PLZA group.
>>
>> #cat info/kernel_mode_assignment
>> //
>>
>> At this point, the (kmode_cpus / kmode_cpus_list) files will exist in the default group:
>>
>> Then user changes the PLZA group to "test".
>>
>> #echo "test//" > info/kernel_mode_assignment
>>
>> At this point, we expect the files "(kmode_cpus/kmode_cpus_list)" to be visible in "test//" group.
>>
>> One open question is whether we should remove the visibility of these files from the default group. It’s unclear if we can safely do this dynamically.
>>
>> An alternative approach would be to always keep the files present, but allow access to them only for groups that are listed in "info/kernel_mode_assignment".
>
> The files appearing/disappearing is just how the user experiences the resctrl fs interface.
> Within resctrl the files could indeed always exist but resctrl can use the kernfs_show()
> API to show/hide them as needed. Similar to resctrl_bmec_files_show() that you created.
> Allowing/removing access becomes complicated because user space can always do a chmod
> to change permissions that resctrl would need to handle.
>
> I do not know if there are sharp corners here when thinking about strange scenarios where
> user opens a file before resctrl changes visibility or permissions and then user space
> interacts with the file. This may be worthwhile to test no matter which mechanism is used.
>
>>>> Files and behavior:
>>>> - cpus / cpus_list:
>>>>
>>>> CPUs listed here use the same allocation for both user and kernel space.
>>>
>>> Both user and kernel space?
>>
>> As it stands today, the CPU list is written to MSR_PQR_ASSOC, resulting in the same allocation for both user and kernel within a given CLOS.
>>
>> Kernel-mode allocation changes only if specific CPUs are included in the kmode_cpus list.
>
> ack.
>
>>>> There is no change to the current semantics of these files.
>>>> If these files are empty, the group effectively becomes a PLZA-dedicated group.
>>>
>>> I do not see it this way. If the cpu/cpus_list files are empty then it means that the
>>> tasks in the group will use their own CLOSID/RMID for user space allocation and
>>> monitoring. What allocations/monitoring is used by tasks when in kernel mode depends
>>> on whether the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpuslist
>>> file. If the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpuslist
>>> file then it will inherit whatever the PQR_PLZA setting of that CPU which is the allocation
>>> associated with the resource group to which that kernel_mode_cpus/kernel_mode_cpuslist belongs.
>>> If the CPU the task is running on cannot be found in kernel_mode_cpus/kernel_mode_cpuslist
>>> then its kernel work will inherit its user space allocations and monitoring.
>>>
>>
>> Yes. that is correct. I think our understanding is correct, but our implementation ideas are different it seems.
>
> While we have been sharing different ideas I have tried to be clear on *why* I made
> certain choices and attempted to provide specific feedback to your ideas. If you find
> your plan to be better then please respond to my feedback about it to help me understand
> why that may be the better solution. If you find your solution is better then could you please
> describe it with detail? At this time I do not have a clear understanding of what you propose.
>
> ...
>>
>> Let me make sure I understand what you mentioned earlier. Copied the text below from the thread for the context:
>>
>> https://lore.kernel.org/lkml/3305c18e-9e50-4df0-b9f1-c61028628967@intel.com/
>> =====================================================================
>>
>> Please consider the intent of this file when thinking about names. The idea is that "info/kernel_mode"
>> specifies the "mode" of how kernel work is handled and it determines the configuration files used in that
>> mode as well as the syntax when interacting with those files. By renaming "kernel_mode_assignment" to
>> "kmode_groups" it implicitly requires all future kernel mode enhancements to need some data related to "groups".
>>
>> In summary, I think this can be simplified by introducing just two new files in info/ that enables the
>> user to (a) select and (b) configure the "kernel mode". To start there can be just two modes,
>> global_assign_ctrl_inherit_mon_per_cpu and global_assign_ctrl_assign_mon_per_cpu.
>> global_assign_ctrl_inherit_mon_per_cpu mode requires a control group in kernel_mode_assignment while
>> global_assign_ctrl_assign_mon_per_cpu requires a control and monitoring group.
>>
>> The resource group in info/kernel_mode_assignment gets two additional files "kernel_mode_cpus" and
>> "kernel_mode_cpus_list" that contains the CPUs enabled with the kernel mode configuration, by default
>> it will be all online CPUs. The resource group can continue to be used to manage allocations of and
>> monitor user space tasks. Specifically, the "cpus", "cpus_list", and "tasks" files remain.
>>
>> A user wanting just "global" settings will get just that when writing the group to
>> info/kernel_mode_assignment. A user wanting "per CPU" settings can follow the
>> info/kernel_mode_assignment setting with changes to that resource group's kernel_mode_cpus/kernel_mode_cpus_list
>> files. Any task running on a CPU that is *not* in kernel_mode_cpus/kernel_mode_cpus_list can be
>> expected to inherit both CLOSID and RMID from user space for all kernel work.
>>
>> ======================================================================
>>
>> Let me try to get few clarification on things here.
>>
>> # cat info/kernel_mode
>> [inherit_ctrl_and_mon]
>> global_assign_ctrl_inherit_mon_per_cpu
>> global_assign_ctrl_assign_mon_per_cpu
>>
>> My understanding of "inherit_ctrl_and_mon" is that the kernel
>> inherits both the CLOS and the RMID from user space. Basically both
>> user and kernel uses same CLOSID and RMID. This reflects the current
>> behavior (without PLZA) correct? This would correspond to the
>
> Correct.
>
>> default group when resctrl is mounted.
>
>>
>> The modes "global_assign_ctrl_inherit_mon_per_cpu" and "global_assign_ctrl_assign_mon_per_cpu" represent the actual PLZA modes.
>>
>> Both of these modes introduce new files kernel_mode_cpus/ and kernel_mode_cpus_list in the resctrl group.
>
> Right. To be specific when the user changes the mode to either "global_assign_ctrl_inherit_mon_per_cpu" or
> "global_assign_ctrl_assign_mon_per_cpu" the new files will be created in the default resource group with
> associated setting applied globally at that time.
If, at that point, "info/kernel_mode_assignment" points to // (the
default group), is that correct?
And if "info/kernel_mode_assignment" points to a different group (for
example, test//), then the kernel_mode_cpus/ and kernel_mode_cpus_list
files will be created only under the test// group. Is that correct?
Thanks
Babu
^ permalink raw reply
* Re: allowing '-' instead of ':' in kernel-doc descriptions
From: Randy Dunlap @ 2026-04-08 22:44 UTC (permalink / raw)
To: Mauro Carvalho Chehab; +Cc: Jonathan Corbet, Linux Documentation
In-Reply-To: <dskdc44um6l6sw43uazfpzmsv4tkesog7sro22qkvzxyflvurt@pwhb3rs44ga7>
Hi,
[modified Subject & recipients]
On 11/13/25 2:32 AM, Mauro Carvalho Chehab wrote:
> On Thu, Nov 13, 2025 at 03:49:27AM -0500, Michael S. Tsirkin wrote:
>> On Thu, Nov 13, 2025 at 12:55:37PM +1100, Stephen Rothwell wrote:
>>> Hi all,
>>>
>>> Today's linux-next build (htmldocs) produced these warnings:
>>>
>>> WARNING: /home/sfr/kernels/next/next/include/linux/virtio_config.h:174 duplicate section name 'Return'
>>> WARNING: /home/sfr/kernels/next/next/include/linux/virtio_config.h:184 duplicate section name 'Return'
>>> WARNING: /home/sfr/kernels/next/next/include/linux/virtio_config.h:190 duplicate section name 'Return'
>>>
>>> Introduced by commit
>>>
>>> bee8c7c24b73 ("virtio: introduce map ops in virtio core")
>>>
>>> but is probably a bug in our scripts as those lines above have "Returns:"
>>> in them, not "Return:".
>>>
>>> These have turned up now since a bug was fixed that was repressing a
>>> lot of warnings.
>>
>> Indeed. But the rest of header says Returns ... without : so I will just
>> fix this one to do the same. I also fixed other issues in the comments
>> in this header while I was at it. Will post shortly.
>
> That's the best approach. We could instead change the new section detection
> regex to accept just one space at most:
>
> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> index f7dbb0868367..bab0ec3abe31 100644
> --- a/scripts/lib/kdoc/kdoc_parser.py
> +++ b/scripts/lib/kdoc/kdoc_parser.py
> @@ -46,7 +46,7 @@ doc_decl = doc_com + KernRe(r'(\w+)', cache=False)
> known_section_names = 'description|context|returns?|notes?|examples?'
> known_sections = KernRe(known_section_names, flags = re.I)
> doc_sect = doc_com + \
> - KernRe(r'\s*(@[.\w]+|@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$',
> + KernRe(r'\s?(@[.\w]+|@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$',
> flags=re.I, cache=False)
>
> doc_content = doc_com_body + KernRe(r'(.*)', cache=False)
>
> (patch not tested)
>
> But, if we do so, someone has to check if this won't cause regressions
> elsewhere. I'm almost sure a change like that will break something...
Following up:
I've been testing this patch for about 3 months now.
The only problems that I have seen with it are these:
(in linux-next-20260408)
WARNING: ../drivers/pci/msi/api.c:102 duplicate section name 'Return'
WARNING: ../mm/damon/core.c:1472 duplicate section name 'Return'
WARNING: ../mm/damon/core.c:1472 duplicate section name 'Return'
WARNING: ../include/uapi/drm/i915_drm.h:2403 duplicate section name 'Return'
WARNING: ../include/uapi/drm/i915_drm.h:2403 duplicate section name 'Return'
WARNING: ../include/uapi/drm/i915_drm.h:2403 duplicate section name 'Return'
WARNING: ../drivers/gpu/drm/drm_atomic_helper.c:3546 duplicate section name 'Return'
WARNING: ../drivers/gpu/drm/drm_atomic_helper.c:3710 duplicate section name 'Return'
WARNING: ../drivers/gpu/drm/drm_of.c:382 duplicate section name 'Return'
WARNING: ../drivers/gpu/drm/drm_of.c:432 duplicate section name 'Return'
WARNING: ../drivers/gpu/drm/drm_gem.c:900 duplicate section name 'Return'
WARNING: ../include/linux/w1.h:115 duplicate section name 'Return'
WARNING: ../include/linux/w1.h:115 duplicate section name 'Return'
--
~Randy
^ permalink raw reply
* [PATCH v2] doc: watchdog: fix typos etc.
From: Randy Dunlap @ 2026-04-08 21:35 UTC (permalink / raw)
To: linux-kernel
Cc: Randy Dunlap, Andrew Morton, Jonathan Corbet, Shuah Khan,
linux-doc, Björn Persson
Correct typos in lockup-watchdogs.rst.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
---
v2: corrections from Björn (Thanks)
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-doc@vger.kernel.org
Cc: Björn Persson <Bjorn@xn--rombobjrn-67a.se>
Documentation/admin-guide/lockup-watchdogs.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- linux-next-20260406.orig/Documentation/admin-guide/lockup-watchdogs.rst
+++ linux-next-20260406/Documentation/admin-guide/lockup-watchdogs.rst
@@ -41,7 +41,7 @@ is a trade-off between fast response to
Implementation
==============
-The soft and hard lockup detectors are built around a hrtimer.
+The soft and hard lockup detectors are built around an hrtimer.
In addition, the softlockup detector regularly schedules a job, and
the hard lockup detector might use Perf/NMI events on architectures
that support it.
@@ -49,7 +49,7 @@ that support it.
Frequency and Heartbeats
------------------------
-The core of the detectors in a hrtimer. It servers multiple purpose:
+The core of the detectors is an hrtimer. It serves multiple purposes:
- schedules watchdog job for the softlockup detector
- bumps the interrupt counter for hardlockup detectors (heartbeat)
^ permalink raw reply
* Re: [PATCH] doc: watchdog: fix typos etc.
From: Randy Dunlap @ 2026-04-08 21:28 UTC (permalink / raw)
To: Björn Persson
Cc: Andrew Morton, Jonathan Corbet, Shuah Khan, linux-doc,
linux-kernel
In-Reply-To: <20260408205611.0f7e38de@tag.xn--rombobjrn-67a.se>
On 4/8/26 11:56 AM, Björn Persson wrote:
> Randy Dunlap wrote:
>> -Similarly to the softlockup case, the current stack trace is displayed
>> +Similar to the softlockup case, the current stack trace is displayed
>
> "Similarly" modifies "is displayed", so the adverbial form is correct.
>
>> -The core of the detectors in a hrtimer. It servers multiple purpose:
>> +The core of the detectors is an hrtimer. It servers multiple purposes:
>
> And "servers" should be "serves".
Thank you.
Andrew, I'll send a v2 patch.
--
~Randy
^ permalink raw reply
* Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Reinette Chatre @ 2026-04-08 21:24 UTC (permalink / raw)
To: Babu Moger, corbet@lwn.net, tony.luck@intel.com,
Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: skhan@linuxfoundation.org, x86@kernel.org, hpa@zytor.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
vschneid@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com,
akpm@linux-foundation.org, pmladek@suse.com,
rdunlap@infradead.org, dapeng1.mi@linux.intel.com,
kees@kernel.org, elver@google.com, paulmck@kernel.org,
lirongqing@baidu.com, safinaskar@gmail.com, fvdl@google.com,
seanjc@google.com, pawan.kumar.gupta@linux.intel.com,
xin@zytor.com, tiala@microsoft.com, chang.seok.bae@intel.com,
Lendacky, Thomas, elena.reshetova@intel.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-coco@lists.linux.dev, kvm@vger.kernel.org,
eranian@google.com, peternewman@google.com
In-Reply-To: <0ae2b267-4527-4251-9136-6afdc3fc97a5@amd.com>
Hi Babu,
On 4/8/26 1:45 PM, Babu Moger wrote:
> On 4/7/26 23:45, Reinette Chatre wrote:
>> On 4/7/26 6:01 PM, Babu Moger wrote:
>>> That said, I’m open to not having a dedicated group if we can still support all the features that PLZA provides without it.
>>
>> I find that enabling user space to share CLOSID/RMID between user space
>> and kernel space to indeed support what PLZA provides. I think I am missing
>> something here since below proposal again attempts to isolate a resource group
>> (CLOSID) for kernel work.
>
> No. I dont want to isolate a group just for PLZA. All I am saying
> is, we should provide option to create a dedicated group if the user
> wants to do it.
I agree. I do not see resctrl needing to do anything to accomplish this though. If
the user wants a group dedicated to kernel mode/PLZA then all that is needed is for the
user not to assign any tasks to this group, either via changes to the group's tasks file
or via the group's cpus/cpus_list files.
>>>
>>> The mode can simply be determined on a per-group basis. We can
>>> introduce two new files—kernel_mode_cpus and
>>> kernel_mode_cpus_list—within each resctrl group when kmode (or
>>> PLZA) is supported.
>>
>> I think having these files in every resource group is confusing since user can only interact
>> with these files in one resource group for current PLZA. Why not *just* have the files in the
>> resource group that matches the group in info/kernel_mode_assignment?
>
> The default group can also serve as the PLZA group.
>
> #cat info/kernel_mode_assignment
> //
>
> At this point, the (kmode_cpus / kmode_cpus_list) files will exist in the default group:
>
> Then user changes the PLZA group to "test".
>
> #echo "test//" > info/kernel_mode_assignment
>
> At this point, we expect the files "(kmode_cpus/kmode_cpus_list)" to be visible in "test//" group.
>
> One open question is whether we should remove the visibility of these files from the default group. It’s unclear if we can safely do this dynamically.
>
> An alternative approach would be to always keep the files present, but allow access to them only for groups that are listed in "info/kernel_mode_assignment".
The files appearing/disappearing is just how the user experiences the resctrl fs interface.
Within resctrl the files could indeed always exist but resctrl can use the kernfs_show()
API to show/hide them as needed. Similar to resctrl_bmec_files_show() that you created.
Allowing/removing access becomes complicated because user space can always do a chmod
to change permissions that resctrl would need to handle.
I do not know if there are sharp corners here when thinking about strange scenarios where
user opens a file before resctrl changes visibility or permissions and then user space
interacts with the file. This may be worthwhile to test no matter which mechanism is used.
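The show/hide idea above can be sketched as a toy model. This is only
an illustration in Python, not resctrl code: the class KmodeFiles and
its methods are hypothetical names, and the real mechanism would be
kernfs_show() on files that always exist. It assumes exactly one group
is listed in info/kernel_mode_assignment at a time.

```python
class KmodeFiles:
    """Toy model: every group owns the two files, one group shows them."""

    # File names as proposed in this thread.
    FILES = ("kernel_mode_cpus", "kernel_mode_cpus_list")

    def __init__(self, groups, assigned="//"):
        # The files exist in every group but start out hidden.
        self.visible = {group: False for group in groups}
        self.assigned = None
        self.assign(assigned)

    def assign(self, group):
        """Model a write of 'group' to info/kernel_mode_assignment."""
        if group not in self.visible:
            raise KeyError(group)
        if self.assigned is not None:
            # Hide the files in the previously assigned group.
            self.visible[self.assigned] = False
        self.visible[group] = True
        self.assigned = group

    def listing(self, group):
        """Return the kernel-mode files user space would see in 'group'."""
        return list(self.FILES) if self.visible[group] else []
```

In this model the files move from the default group to "test//" when
the assignment changes. It does not cover the open-file-before-hide
scenario mentioned above.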
>>> Files and behavior:
>>> - cpus / cpus_list:
>>>
>>> CPUs listed here use the same allocation for both user and kernel space.
>>
>> Both user and kernel space?
>
> As it stands today, the CPU list is written to MSR_PQR_ASSOC, resulting in the same allocation for both user and kernel within a given CLOS.
>
> Kernel-mode allocation changes only if specific CPUs are included in the kmode_cpus list.
ack.
>>> There is no change to the current semantics of these files.
>>> If these files are empty, the group effectively becomes a PLZA-dedicated group.
>>
>> I do not see it this way. If the cpu/cpus_list files are empty then it means that the
>> tasks in the group will use their own CLOSID/RMID for user space allocation and
>> monitoring. What allocations/monitoring is used by tasks when in kernel mode depends
>> on whether the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpuslist
>> file. If the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpuslist
>> file then it will inherit whatever the PQR_PLZA setting of that CPU which is the allocation
>> associated with the resource group to which that kernel_mode_cpus/kernel_mode_cpuslist belongs.
>> If the CPU the task is running on cannot be found in kernel_mode_cpus/kernel_mode_cpuslist
>> then its kernel work will inherit its user space allocations and monitoring.
>>
>
> Yes. that is correct. I think our understanding is correct, but our implementation ideas are different it seems.
While we have been sharing different ideas I have tried to be clear on *why* I made
certain choices and attempted to provide specific feedback to your ideas. If you find
your plan to be better then please respond to my feedback about it to help me understand
why that may be the better solution. If you find your solution is better then could you please
describe it with detail? At this time I do not have a clear understanding of what you propose.
...
>
> Let me make sure I understand what you mentioned earlier. Copied the text below from the thread for the context:
>
> https://lore.kernel.org/lkml/3305c18e-9e50-4df0-b9f1-c61028628967@intel.com/
> =====================================================================
>
> Please consider the intent of this file when thinking about names. The idea is that "info/kernel_mode"
> specifies the "mode" of how kernel work is handled and it determines the configuration files used in that
> mode as well as the syntax when interacting with those files. By renaming "kernel_mode_assignment" to
> "kmode_groups" it implicitly requires all future kernel mode enhancements to need some data related to "groups".
>
> In summary, I think this can be simplified by introducing just two new files in info/ that enables the
> user to (a) select and (b) configure the "kernel mode". To start there can be just two modes,
> global_assign_ctrl_inherit_mon_per_cpu and global_assign_ctrl_assign_mon_per_cpu.
> global_assign_ctrl_inherit_mon_per_cpu mode requires a control group in kernel_mode_assignment while
> global_assign_ctrl_assign_mon_per_cpu requires a control and monitoring group.
>
> The resource group in info/kernel_mode_assignment gets two additional files "kernel_mode_cpus" and
> "kernel_mode_cpus_list" that contains the CPUs enabled with the kernel mode configuration, by default
> it will be all online CPUs. The resource group can continue to be used to manage allocations of and
> monitor user space tasks. Specifically, the "cpus", "cpus_list", and "tasks" files remain.
>
> A user wanting just "global" settings will get just that when writing the group to
> info/kernel_mode_assignment. A user wanting "per CPU" settings can follow the
> info/kernel_mode_assignment setting with changes to that resource group's kernel_mode_cpus/kernel_mode_cpus_list
> files. Any task running on a CPU that is *not* in kernel_mode_cpus/kernel_mode_cpus_list can be
> expected to inherit both CLOSID and RMID from user space for all kernel work.
>
> ======================================================================
>
> Let me try to get few clarification on things here.
>
> # cat info/kernel_mode
> [inherit_ctrl_and_mon]
> global_assign_ctrl_inherit_mon_per_cpu
> global_assign_ctrl_assign_mon_per_cpu
>
> My understanding of "inherit_ctrl_and_mon" is that the kernel
> inherits both the CLOS and the RMID from user space. Basically both
> user and kernel uses same CLOSID and RMID. This reflects the current
> behavior (without PLZA) correct? This would correspond to the
Correct.
> default group when resctrl is mounted.
>
> The modes "global_assign_ctrl_inherit_mon_per_cpu" and "global_assign_ctrl_assign_mon_per_cpu" represent the actual PLZA modes.
>
> Both of these modes introduce new files kernel_mode_cpus/ and kernel_mode_cpus_list in the resctrl group.
Right. To be specific when the user changes the mode to either "global_assign_ctrl_inherit_mon_per_cpu" or
"global_assign_ctrl_assign_mon_per_cpu" the new files will be created in the default resource group with
associated setting applied globally at that time.
>
> When the user echoes a group name into info/kernel_mode_assignment, PLZA is applied globally across all CPUs. This is default behavior.
>
> If the user wants PLZA to apply only to a specific subset of CPUs, then the kernel_mode_cpus or kernel_mode_cpus_list files need to be updated accordingly.
>
> global_assign_ctrl_inherit_mon_per_cpu : The group needs to be a CTRL_MON group. This mode uses rmid_en=0 when writing PLZA MSR.
>
> global_assign_ctrl_assign_mon_per_cpu: The group needs to be a CTRL_MON/MON group. This mode uses rmid_en=1 when writing PLZA MSR.
>
> Did I get it right?
This is my understanding also, yes.
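To make the summary concrete, here is a minimal sketch of how the
CLOSID/RMID used for kernel work could be resolved under the three
modes. It is a Python model for illustration only: Group and
kernel_ids() are hypothetical names, and reading rmid_en=0 as "RMID
stays inherited from user space" is my interpretation of the mode
name, not a statement about the hardware.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class Group:
    """Hypothetical stand-in for a resctrl resource group."""
    name: str
    closid: int
    rmid: int
    # CPUs listed in the group's proposed kernel_mode_cpus file.
    kernel_mode_cpus: Set[int] = field(default_factory=set)


def kernel_ids(mode: str, cpu: int, task_group: Group,
               kmode_group: Optional[Group]) -> Tuple[int, int]:
    """Return the (CLOSID, RMID) used for a task's kernel work on 'cpu'."""
    user = (task_group.closid, task_group.rmid)

    # Current behavior: kernel work inherits both IDs from user space.
    if mode == "inherit_ctrl_and_mon" or kmode_group is None:
        return user

    # CPUs outside kernel_mode_cpus keep inheriting from user space.
    if cpu not in kmode_group.kernel_mode_cpus:
        return user

    if mode == "global_assign_ctrl_inherit_mon_per_cpu":
        # Assumed rmid_en=0: CLOSID is assigned, RMID is inherited.
        return (kmode_group.closid, task_group.rmid)

    if mode == "global_assign_ctrl_assign_mon_per_cpu":
        # Assumed rmid_en=1: both CLOSID and RMID are assigned.
        return (kmode_group.closid, kmode_group.rmid)

    raise ValueError(f"unknown kernel mode: {mode}")
```

The "global" default corresponds to kernel_mode_cpus containing all
online CPUs. Restricting that set gives the per-CPU behavior.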
Reinette
^ permalink raw reply
* Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Babu Moger @ 2026-04-08 20:45 UTC (permalink / raw)
To: Reinette Chatre, corbet@lwn.net, tony.luck@intel.com,
Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: skhan@linuxfoundation.org, x86@kernel.org, hpa@zytor.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
vschneid@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com,
akpm@linux-foundation.org, pmladek@suse.com,
rdunlap@infradead.org, dapeng1.mi@linux.intel.com,
kees@kernel.org, elver@google.com, paulmck@kernel.org,
lirongqing@baidu.com, safinaskar@gmail.com, fvdl@google.com,
seanjc@google.com, pawan.kumar.gupta@linux.intel.com,
xin@zytor.com, tiala@microsoft.com, chang.seok.bae@intel.com,
Lendacky, Thomas, elena.reshetova@intel.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-coco@lists.linux.dev, kvm@vger.kernel.org,
eranian@google.com, peternewman@google.com
In-Reply-To: <efc269f8-bf98-4f12-8d76-1fee564be84c@intel.com>
Hi Reinette,
On 4/7/26 23:45, Reinette Chatre wrote:
> Hi Babu,
>
> On 4/7/26 6:01 PM, Babu Moger wrote:
>> Hi Reinette,
>>
>> On 4/7/26 12:48, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 4/6/26 3:45 PM, Babu Moger wrote:
>>>> Hi Reinette,
>>>>
>>>> Sorry for the late response. I was trying to get confirmation about the use case.
>>>
>>> No problem. I appreciate that you did this so that we can make sure resctrl supports
>>> needed use cases.
>>>
>>>>
>>>> On 3/31/26 17:24, Reinette Chatre wrote:
>>>>> On 3/30/26 11:46 AM, Babu Moger wrote:
>>>>>> On 3/27/26 17:11, Reinette Chatre wrote:
>>>>>>> On 3/26/26 10:12 AM, Babu Moger wrote:
>>>>>>>> On 3/24/26 17:51, Reinette Chatre wrote:
>>>>>>>>> On 3/12/26 1:36 PM, Babu Moger wrote:
>>>
>>>>> can have domains that span different CPUs. There thus seem to be a built in assumption of what a "domain"
>>>>> means for PQR_PLZA_ASSOC so it sounds to me as though, instead of saying that "PQR_PLZA_ASSOC needs
>>>>> to be the same in QoS domain" it may be more accurate to, for example, say that "PQR_PLZA_ASSOC has L3 scope"?
>>>>
>>>> Yes.
>>>
>>> Above is about L3 scope ...
>>
>> Yes. The scope for PQR_PLZA_ASSOC is L3.
>>
>> Is that what you are asking here?
>
> I was trying to point out that there appears to be a mismatch between the actual scope and
> the planned implementation. As highlighted below during the discussion about "global" this is
> fine with me and I just wanted to confirm that this matches your intentions.
Ack.
>
>>
>>>
>>>>>
>>>>> This seems to be what this implementation does since it hardcodes PQR_PLZA_ASSOC scope to the L3
>>>>> resource but that creates dependency to the L3 resource that would make PLZA unusable if, for example,
>>>>> the user boots with "rdt=!l3cat" while wanting to use PLZA to manage MBA allocations when in kernel?
>>>>
>>>> Yes. that is correct. It should not be attached to one resource. We need to change it to global scope.
>>>
>>> Can I interpret "global scope" as "all online CPUs"? Doing so will simplify
>>
>> Yes. That is correct.
>>
>>
>>> supporting this feature. It does not sound practical for a user wanting to assign
>>> different resource groups to kernel work done in different domains ... the guidance should
>>> instead be to just set the allocations of one resource group to what is needed in the different
>>> domains? There may be more flexibility when supporting per-domain RMIDs though but so far
>>> it sounds as though the focus is global. We can consider what needs to be done to support
>>> some type of "per-domain" assignment as an exercise to check whether the current interface could support it
>>> in the future.
>>
>> Yes. Makes sense.
>>
>>>
>
> ...
>
>>>> The PLZA MSR is updated when user changes the association to the
>>>> file. No context switch code changes are needed. This will be
>>>> dedicated group. The current resctrl group files, "cpus, cpus_list
>>>
>>> Why does this have to be a dedicated group? One of the conclusions from v1
>>> discussion was that the "PLZA group" need *not* be a dedicated group. I repeated that
>>> in my earlier response that I left quoted above. You did not respond to these
>>> conclusions and statements in this regard while you keep coming back to this
>>> needing to be a dedicated group without providing a motivation to do so.
>>> Could you please elaborate why a dedicated group is required?
>>
>> If the same group applies identical limits to both user and kernel
>> space, it essentially behaves like a current resctrl group. In that
>> sense, it’s not really a PLZA group. PLZA’s key value is the ability
>> to separate allocations between user space and kernel space. A
>
> The plan has never been to force identical allocations for user and kernel
> space since that would go against this feature entirely. Even so, just as
> user and kernel space cannot be forced to have identical allocations they
> also cannot be forced to have different allocations. Specifically,
> a task *can* use the same CLOSID for user and kernel space work just as easily
> as it can use *different* CLOSID for user and kernel space work. There
> should not be any CLOSID reserved just for kernel work. Or am I missing something?
No. You are not missing anything.
>
>> single CPU can belong to two groups: one group manages the user-
>> space allocation for that CPU, while another manages the kernel-mode
>> allocation.
>
> Exactly. This is why it is important to have two files for this CPU association
> within a resource group. The cpus/cpus_list file continues to be used as today
> while the new kernel_mode_cpus/kernel_mode_cpus_list is used for kernel work.
> With this a task can be associated with any resource group for its user space
> allocations but when it runs on one of the CPUs within kernel_mode_cpus then
> its kernel work will be done with allocations of the resource group the
> kernel_mode_cpus file belongs to, which may or may not be the same
> resource group that the user space task belongs to.
Yes. Exactly.
>
>> This approach also simplifies file handling, which is another reason
>> I prefer it.
>
> I *think* we have different interpretations of "dedicated group":
> It sounds as though you interpret "dedicated group" as a way that enforces
> the same allocations to user space and kernel work.
> I interpret "dedicated group" essentially as a CLOSID reserved for kernel
> work. Since I do not see that resctrl should dedicate a CLOSID/resource group
> for kernel work I have been pushing against such "dedicated group".
Actually, our understanding is the same. I am probably not explaining it
well. I hope we get there soon.
>
>> That said, I’m open to not having a dedicated group if we can still support all the features that PLZA provides without it.
>
> I find that enabling user space to share CLOSID/RMID between user space
> and kernel space does indeed support what PLZA provides. I think I am missing
> something here since below proposal again attempts to isolate a resource group
> (CLOSID) for kernel work.
No. I don't want to isolate a group just for PLZA. All I am saying is, we
should provide an option to create a dedicated group if the user wants to
do so.
>
>>>> Add a file, "info/kmode_monitor", to describe how kmode is monitored.
>>>>
>>>> # cat info/kmode_monitor
>>>> [inherit_ctrl_and_mon] <- Kernel uses the same CLOSID/RMID as user. Default option for the "global"
>>>> assign_ctrl_inherit_mon <- One CLOSID for all kernel work; RMID inherited from user.
>>>> assign_ctrl_assign_mon <- One resource group (CLOSID+RMID) for all kernel work. Default option for "cpu" type.
>>>
>>> My first thought is that the naming is confusing. resctrl has a very strong relationship between
>>> "RMID" and "monitoring" so naming a file "monitor" that deals with allocation/ctrl/CLOSID is
>>> potentially confusing.
>>>
>>> Apart from that, while I think I understand where you are going by separating the mode into
>>> two files I am concerned about future complications needing to accommodate all different
>>> combinations of the (now) essentially two modes. My preference is thus to keep this simple by
>>> keeping the mode within one file.
>>>
>>> Even so, when stepping back, it does not really look like we need to separate the "global"
>>> and "per CPU" modes. We could just have a single "per CPU" mode and the "global" is just
>>> its default of "all CPUs", no?
>>
>> Yes. That is correct.
>>
>>>
>>> Consider, for example, the implementation just consisting of:
>>>
>>> # cat info/kernel_mode
>>> [inherit_ctrl_and_mon]
>>> global_assign_ctrl_inherit_mon_per_cpu
>>> global_assign_ctrl_assign_mon_per_cpu
>>>
>>>>
>>>> Rename “kernel_mode_assignment” to “kmode_group” to assign the specific group to kmode. This file usage is same as before.
>>>>
>>>> #cat info/kmode_groups (Renamed "kernel_mode_assignment")
>>>> //
>>>
>>> Please consider the intent of this file when thinking about names. The idea is that "info/kernel_mode"
>>> specifies the "mode" of how kernel work is handled and it determines the configuration files used in that
>>> mode as well as the syntax when interacting with those files. By renaming "kernel_mode_assignment" to
>>> "kmode_groups" it implicitly requires all future kernel mode enhancements to need some data related to "groups".
>>>
>>> In summary, I think this can be simplified by introducing just two new files in info/ that enables the
>>> user to (a) select and (b) configure the "kernel mode". To start there can be just two modes,
>>> global_assign_ctrl_inherit_mon_per_cpu and global_assign_ctrl_assign_mon_per_cpu.
>>> global_assign_ctrl_inherit_mon_per_cpu mode requires a control group in kernel_mode_assignment while
>>> global_assign_ctrl_assign_mon_per_cpu requires a control and monitoring group.
>>>
>>> The resource group in info/kernel_mode_assignment gets two additional files "kernel_mode_cpus" and
>>> "kernel_mode_cpus_list" that contains the CPUs enabled with the kernel mode configuration, by default
>>> it will be all online CPUs. The resource group can continue to be used to manage allocations of and
>>> monitor user space tasks. Specifically, the "cpus", "cpus_list", and "tasks" files remain.
>>>
>>> A user wanting just "global" settings will get just that when writing the group to
>>> info/kernel_mode_assignment. A user wanting "per CPU" settings can follow the
>>> info/kernel_mode_assignment setting with changes to that resource group's kernel_mode_cpus/kernel_mode_cpus_list
>>> files. Any task running on a CPU that is *not* in kernel_mode_cpus/kernel_mode_cpus_list can be
>>> expected to inherit both CLOSID and RMID from user space for all kernel work.
>>
>> After further consideration, I don’t think the info/kernel_mode file
>> is necessary. There’s no need to enforce a specific mode for all the
>> PLZA groups. Avoiding this constraint makes the design more
>> flexible, particularly as we move toward supporting multiple PLZA
>> groups in the future. MPAM already appears capable of handling more
>> than one group—for example, one group could use
>> inherit_ctrl_and_mon, while another could use
>> global_assign_ctrl_inherit_mon_per_cpu.
>
> You are looking ahead at future capabilities for which we do not know all requirements
> at this time. I think it is very good to consider how things may progress and your example
> of MPAM is of course on point. I believe the current design does consider this progression.
> Please see https://lore.kernel.org/lkml/2ab556af-095b-422b-9396-f845c6fd0342@intel.com/
> (search for "per_group_assign_ctrl_assign_mon"). In that exploration per-group assignment
> is actually accomplished with global files. I thus think we should not make such a big
> architectural decision that does not benefit the immediate feature using partial information.
> As it is, a "info/kernel_mode" gives the flexibility to expand to, if needed, configuration
> files within a resource group. That is why the intention is to associate the mode within
> info/kernel_mode with the presence/absence of info/kernel_mode_assignment (search for
> "Visibility depends on active mode in info/kernel_mode" in linked email) since in the
> future resctrl may need to enable a mode that needs configuration files within each
> resource group and when enabling such mode the per-resource group files will appear
> instead of the global info/kernel_mode_assignment.
>
>>
>> The mode can simply be determined on a per-group basis. We can introduce two new files—kernel_mode_cpus and kernel_mode_cpus_list—within each resctrl group when kmode (or PLZA) is supported.
>
> I think having these files in every resource group is confusing since user can only interact
> with these files in one resource group for current PLZA. Why not *just* have the files in the
> resource group that matches the group in info/kernel_mode_assignment?
The default group can also serve as the PLZA group.
#cat info/kernel_mode_assignment
//
At this point, the kmode_cpus / kmode_cpus_list files will exist in
the default group.
Then the user changes the PLZA group to "test".
#echo "test//" > info/kernel_mode_assignment
At this point, we expect the files "(kmode_cpus/kmode_cpus_list)" to be
visible in "test//" group.
One open question is whether we should remove the visibility of these
files from the default group. It’s unclear if we can safely do this
dynamically.
An alternative approach would be to always keep the files present, but
allow access to them only for groups that are listed in
"info/kernel_mode_assignment".
>>
>> The info/kernel_mode_assignment file would indicate which resctrl
>> group (or groups) is used for PLZA. The files kernel_mode_cpus and
>> kernel_mode_cpus_list would indicate how PLZA is applied within
>> each group.
>
> The "how PLZA is applied" should be learned from info/kernel_mode where user
> space learns whether RMID is inherited or not. I find kernel_mode_cpus
> and kernel_mode_cpus_list to be just for configuration and just found in the
> resource group listed in info/kernel_mode_assignment.
ok.
>
>>
>> Files and behavior:
>> - cpus / cpus_list:
>>
>> CPUs listed here use the same allocation for both user and kernel space.
>
> Both user and kernel space?
As it stands today, the CPU list is written to MSR_PQR_ASSOC, resulting
in the same allocation for both user and kernel within a given CLOS.
Kernel-mode allocation changes only if specific CPUs are included in the
kmode_cpus list.
> Monitoring would depend on info/kernel_mode_assignment ("inherit_mon")
> and kernel space allocation would depend on whether the CPU on which the task runs
> can be found in kernel_mode_cpus, no?
Yes. that is correct.
>
>
>> There is no change to the current semantics of these files.
>> If these files are empty, the group effectively becomes a PLZA-dedicated group.
>
> I do not see it this way. If the cpus/cpus_list files are empty then it means that the
> tasks in the group will use their own CLOSID/RMID for user space allocation and
> monitoring. What allocations/monitoring is used by tasks when in kernel mode depends
> on whether the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpus_list
> file. If the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpus_list
> file then it will inherit the PQR_PLZA setting of that CPU, which is the allocation
> associated with the resource group to which that kernel_mode_cpus/kernel_mode_cpus_list belongs.
> If the CPU the task is running on cannot be found in kernel_mode_cpus/kernel_mode_cpus_list
> then its kernel work will inherit its user space allocations and monitoring.
>
Yes, that is correct. I think our understanding is the same, but our
implementation ideas seem to differ.
>>
>> - kernel_mode_cpus / kernel_mode_cpus_list:
>>
>> These files determine whether a separate kernel allocation is applied.
>> If empty, user and kernel share the same allocation.
>> If non-empty, the kernel uses a separate allocation.
>>
>> The group can be a CTRL_MON or MON group. Based on the type of the group, the CLOSID and RMID will be used to enable PLZA. If it is MON, then rmid_en = 1 when writing the PLZA MSR.
>
> This will be difficult to get right since CTRL_MON groups also have RMID assigned.
>
>> Here’s the proposed flow:
>>
>> # mount -t resctrl resctrl /sys/fs/resctrl/
>> # cd /sys/fs/resctrl/
>> # cat info/kernel_mode_assignment
>> //
>>
>> By default, the root (default) group is PLZA-enabled when resctrl is mounted. All CPUs use CLOSID 0 for both user and kernel-mode allocation.
>>
>> # cat cpus_list
>> 1-64
>> # cat kmode_cpus_list
>> 1-64
>>
>> Next, create a new group for PLZA:
>>
>> # mkdir plza_group
>>
>> # echo "plza_group//" > info/kernel_mode_assignment
>>
>> At this point, plza_group becomes the new PLZA-enabled group, and the PLZA-related MSRs are updated accordingly.
>
> It really looks like you are getting back to trying to dedicate a resource group to
> kernel work and that is not something that resctrl should enforce.
>
>>
>> # cat plza_group/cpus_list
>> <empty>
>>
>> # cat plza_group/kmode_cpus_list
>> 1-64
>>
>> The user can then update kmode_cpus_list to apply PLZA only to a specific subset of CPUs, if desired.
>>
>>
>> What do you think of this approach?
>
> It is difficult to predict what the "next" PLZA will actually end up looking like and I find resctrl creating a complicated
> interface to support this to be risky. Instead I would prefer to focus on efficiently supporting what PLZA can do today
> and make it extensible. Apart from that I find the implicit interface, "If it is MON, then rmid_en = 1" to be too
> architecture specific for a generic interface while also not able to accurately capture user's intent (i.e. user may
> indeed, for example, want "a CTRL_MON group to have rmid_en = 1"). Finally, I am just so confused about why the implementations
> keep needing to dedicate a resource group/CLOSID to kernel work.
Let me make sure I understand what you mentioned earlier. Copied the
text below from the thread for the context:
https://lore.kernel.org/lkml/3305c18e-9e50-4df0-b9f1-c61028628967@intel.com/
=====================================================================
Please consider the intent of this file when thinking about names. The
idea is that "info/kernel_mode"
specifies the "mode" of how kernel work is handled and it determines the
configuration files used in that
mode as well as the syntax when interacting with those files. By
renaming "kernel_mode_assignment" to
"kmode_groups" it implicitly requires all future kernel mode
enhancements to need some data related to "groups".
In summary, I think this can be simplified by introducing just two new
files in info/ that enables the
user to (a) select and (b) configure the "kernel mode". To start there
can be just two modes,
global_assign_ctrl_inherit_mon_per_cpu and
global_assign_ctrl_assign_mon_per_cpu.
global_assign_ctrl_inherit_mon_per_cpu mode requires a control group in
kernel_mode_assignment while
global_assign_ctrl_assign_mon_per_cpu requires a control and monitoring
group.
The resource group in info/kernel_mode_assignment gets two additional
files "kernel_mode_cpus" and
"kernel_mode_cpus_list" that contains the CPUs enabled with the kernel
mode configuration, by default
it will be all online CPUs. The resource group can continue to be used
to manage allocations of and
monitor user space tasks. Specifically, the "cpus", "cpus_list", and
"tasks" files remain.
A user wanting just "global" settings will get just that when writing
the group to
info/kernel_mode_assignment. A user wanting "per CPU" settings can
follow the
info/kernel_mode_assignment setting with changes to that resource
group's kernel_mode_cpus/kernel_mode_cpus_list
files. Any task running on a CPU that is *not* in
kernel_mode_cpus/kernel_mode_cpus_list can be
expected to inherit both CLOSID and RMID from user space for all kernel
work.
======================================================================
Let me try to get a few clarifications on things here.
# cat info/kernel_mode
[inherit_ctrl_and_mon]
global_assign_ctrl_inherit_mon_per_cpu
global_assign_ctrl_assign_mon_per_cpu
My understanding of "inherit_ctrl_and_mon" is that the kernel inherits
both the CLOSID and the RMID from user space. Basically, both user and
kernel use the same CLOSID and RMID. This reflects the current behavior
(without PLZA), correct? This would correspond to the default group when
resctrl is mounted.
The modes "global_assign_ctrl_inherit_mon_per_cpu" and
"global_assign_ctrl_assign_mon_per_cpu" represent the actual PLZA modes.
Both of these modes introduce the new files kernel_mode_cpus and
kernel_mode_cpus_list in the resctrl group.
When the user echoes a group name into info/kernel_mode_assignment, PLZA
is applied globally across all CPUs. This is the default behavior.
If the user wants PLZA to apply only to a specific subset of CPUs, then
the kernel_mode_cpus or kernel_mode_cpus_list files need to be updated
accordingly.
global_assign_ctrl_inherit_mon_per_cpu: The group needs to be a CTRL_MON
group. This mode uses rmid_en=0 when writing the PLZA MSR.
global_assign_ctrl_assign_mon_per_cpu: The group needs to be a
CTRL_MON/MON group. This mode uses rmid_en=1 when writing the PLZA MSR.
Did I get it right?
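To make the question concrete, here is a minimal userspace sketch of the
selection rule as summarized above. It is not resctrl code: the enum,
struct and function names are hypothetical, kernel_mode_cpus is modelled as
a 64-bit mask, and the mapping of rmid_en=0/1 to "inherit RMID from user"
versus "use the group's RMID" is my reading of the summary, not a confirmed
hardware description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names mirroring the modes proposed for info/kernel_mode. */
enum kernel_mode {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
};

struct ids {
	uint32_t closid;
	uint32_t rmid;
};

/*
 * Pick the CLOSID/RMID used for the kernel work of a task running on 'cpu'.
 *   user       - IDs the task uses for its user space work
 *   kgrp       - IDs of the group named in info/kernel_mode_assignment
 *   kmode_cpus - model of that group's kernel_mode_cpus (bit n == CPU n)
 */
struct ids kernel_ids(enum kernel_mode mode, struct ids user, struct ids kgrp,
		      uint64_t kmode_cpus, unsigned int cpu)
{
	bool cpu_enabled = cpu < 64 && (kmode_cpus & (UINT64_C(1) << cpu));

	/* Default mode, or CPU not listed: kernel work inherits everything. */
	if (mode == INHERIT_CTRL_AND_MON || !cpu_enabled)
		return user;

	/* rmid_en=0 case: CLOSID from the assigned group, RMID from user. */
	if (mode == GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU)
		return (struct ids){ .closid = kgrp.closid, .rmid = user.rmid };

	/* rmid_en=1 case: both CLOSID and RMID from the assigned group. */
	return kgrp;
}
```

With kernel_mode_cpus left at its default of all online CPUs this reduces
to the "global" behavior; clearing a CPU's bit makes kernel work on that CPU
fall back to the task's user space IDs.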
Thanks
Babu
^ permalink raw reply
* Re: [PATCH] hwmon: (asus-ec-sensors) add ROG STRIX B650E-E GAMING WIFI
From: Veronika Kossmann @ 2026-04-08 20:28 UTC (permalink / raw)
To: Eugene Shalygin, Guenter Roeck
Cc: Veronika Kossmann, Veronika Kossmann, Jonathan Corbet, Shuah Khan,
linux-hwmon, linux-doc, linux-kernel
In-Reply-To: <CAB95QATxrJa0koMq=BCjnXvLHJ5boRBUA+76FwqWJhmhEi-Tqg@mail.gmail.com>
On 4/4/26 10:12, Eugene Shalygin wrote:
> On Sat, 4 Apr 2026 at 06:38, Guenter Roeck <linux@roeck-us.net> wrote:
>> Sashiko has a problem with this patch:
> I must admit now, that these _SET macros were a bad idea, it turned
> out to be too easy to misread. I'm going to remove them.
>
> Veronika, could you, please, show us the output from sensors with this
> version of the code?
>
> Cheers,
> Eugene
Of course:
$sensors asusec-isa-000a
asusec-isa-000a
Adapter: ISA adapter
CPU: +37.0°C
Motherboard: +38.0°C
VRM: +51.0°C
These are relevant to actual temperatures.
Best wishes,
Veronika
^ permalink raw reply
* [PATCH 1/2] KVM: arm64: Add KVM_CAP_ARM_DISABLE_EXITS for WFI/WFE passthrough
From: David Woodhouse @ 2026-04-08 20:23 UTC (permalink / raw)
To: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Marc Zyngier,
Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, kvm, linux-doc, linux-kernel,
linux-arm-kernel, kvmarm, linux-kselftest, Colton Lewis,
Jing Zhang, David Woodhouse
In-Reply-To: <20260408202557.2102476-1-dwmw2@infradead.org>
From: David Woodhouse <dwmw@amazon.co.uk>
Add a per-VM capability to allow userspace to disable WFI and/or WFE
trapping, modelled after x86's KVM_CAP_X86_DISABLE_EXITS. When the
corresponding flag is set, the trap is unconditionally cleared
regardless of the global kvm-arm.wf{i,e}_trap_policy setting.
The existing kernel command line parameters provide a system-wide
override, but a per-VM capability allows the VMM to make the decision
per guest.
This is useful for hypervisors running a combination of dedicated
pinned vCPUs which want to avoid the cost of trapping WFI/WFE, as
well as overcommitted floating instances where it is necessary.
As with the x86 equivalent, KVM_CHECK_EXTENSION returns the bitmask of
supported exit disables.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
Documentation/virt/kvm/api.rst | 28 ++++++++++++++++++++++++++++
arch/arm64/include/asm/kvm_host.h | 4 ++++
arch/arm64/kvm/arm.c | 20 ++++++++++++++++++++
include/uapi/linux/kvm.h | 6 ++++++
4 files changed, 58 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 032516783e96..e3b3bd9edeec 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8902,6 +8902,34 @@ helpful if user space wants to emulate instructions which are not
This capability can be enabled dynamically even if VCPUs were already
created and are running.
+7.47 KVM_CAP_ARM_DISABLE_EXITS
+------------------------------
+
+:Architecture: arm64
+:Target: VM
+:Parameters: args[0] is a bitmask of exits to disable
+:Returns: 0 on success, -EINVAL if unsupported bits are set.
+
+Valid bits in args[0]:
+
+ - ``KVM_ARM_DISABLE_EXITS_WFI``: Disable trapping of WFI (Wait For
+ Interrupt) instructions. The guest WFI will execute natively instead
+ of causing a VM exit.
+
+ - ``KVM_ARM_DISABLE_EXITS_WFE``: Disable trapping of WFE (Wait For
+ Event) instructions. The guest WFE will execute natively instead of
+ causing a VM exit.
+
+When a bit is set, the corresponding trap is unconditionally cleared for
+all vCPUs in the VM, overriding the system-wide ``kvm-arm.wfi_trap_policy``
+and ``kvm-arm.wfe_trap_policy`` kernel parameters.
+
+Disabling exits is a one-way operation: once an exit type is disabled for
+a VM, it cannot be re-enabled. Calling this ioctl with args[0] = 0 is a
+no-op.
+
+``KVM_CHECK_EXTENSION`` returns the bitmask of exits that can be disabled.
+
8. Other capabilities.
======================
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 70cb9cfd760a..a1bb025c641f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -312,6 +312,10 @@ struct kvm_arch {
size_t nested_mmus_size;
int nested_mmus_next;
+ /* Per-VM WFI/WFE trap overrides; set via KVM_CAP_ARM_DISABLE_EXITS */
+ bool wfi_in_guest;
+ bool wfe_in_guest;
+
/* Interrupt controller */
struct vgic_dist vgic;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 410ffd41fd73..326a99fea753 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -178,6 +178,17 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
mutex_unlock(&kvm->lock);
break;
+ case KVM_CAP_ARM_DISABLE_EXITS:
+ if (cap->args[0] & ~KVM_ARM_DISABLE_VALID_EXITS) {
+ r = -EINVAL;
+ break;
+ }
+ if (cap->args[0] & KVM_ARM_DISABLE_EXITS_WFI)
+ kvm->arch.wfi_in_guest = true;
+ if (cap->args[0] & KVM_ARM_DISABLE_EXITS_WFE)
+ kvm->arch.wfe_in_guest = true;
+ r = 0;
+ break;
case KVM_CAP_ARM_SEA_TO_USER:
r = 0;
set_bit(KVM_ARCH_FLAG_EXIT_SEA, &kvm->arch.flags);
@@ -379,6 +390,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_SEA_TO_USER:
r = 1;
break;
+ case KVM_CAP_ARM_DISABLE_EXITS:
+ r = KVM_ARM_DISABLE_VALID_EXITS;
+ break;
case KVM_CAP_SET_GUEST_DEBUG2:
return KVM_GUESTDBG_VALID_MASK;
case KVM_CAP_ARM_SET_DEVICE_ADDR:
@@ -610,6 +624,9 @@ static void vcpu_set_pauth_traps(struct kvm_vcpu *vcpu)
static bool kvm_vcpu_should_clear_twi(struct kvm_vcpu *vcpu)
{
+ if (vcpu->kvm->arch.wfi_in_guest)
+ return true;
+
if (unlikely(kvm_wfi_trap_policy != KVM_WFX_NOTRAP_SINGLE_TASK))
return kvm_wfi_trap_policy == KVM_WFX_NOTRAP;
@@ -621,6 +638,9 @@ static bool kvm_vcpu_should_clear_twi(struct kvm_vcpu *vcpu)
static bool kvm_vcpu_should_clear_twe(struct kvm_vcpu *vcpu)
{
+ if (vcpu->kvm->arch.wfe_in_guest)
+ return true;
+
if (unlikely(kvm_wfe_trap_policy != KVM_WFX_NOTRAP_SINGLE_TASK))
return kvm_wfe_trap_policy == KVM_WFX_NOTRAP;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 80364d4dbebb..694cf699ed0a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -669,6 +669,11 @@ struct kvm_ioeventfd {
#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
#define KVM_X86_DISABLE_EXITS_APERFMPERF (1 << 4)
+#define KVM_ARM_DISABLE_EXITS_WFI (1 << 0)
+#define KVM_ARM_DISABLE_EXITS_WFE (1 << 1)
+#define KVM_ARM_DISABLE_VALID_EXITS (KVM_ARM_DISABLE_EXITS_WFI | \
+ KVM_ARM_DISABLE_EXITS_WFE)
+
/* for KVM_ENABLE_CAP */
struct kvm_enable_cap {
/* in */
@@ -989,6 +994,7 @@ struct kvm_enable_cap {
#define KVM_CAP_ARM_SEA_TO_USER 245
#define KVM_CAP_S390_USER_OPEREXEC 246
#define KVM_CAP_S390_KEYOP 247
+#define KVM_CAP_ARM_DISABLE_EXITS 248
struct kvm_irq_routing_irqchip {
__u32 irqchip;
--
2.51.0
^ permalink raw reply related
* [PATCH 2/2] KVM: arm64: selftests: Add KVM_CAP_ARM_DISABLE_EXITS UAPI test
From: David Woodhouse @ 2026-04-08 20:23 UTC (permalink / raw)
To: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Marc Zyngier,
Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, kvm, linux-doc, linux-kernel,
linux-arm-kernel, kvmarm, linux-kselftest, Colton Lewis,
Jing Zhang, David Woodhouse
In-Reply-To: <20260408202557.2102476-1-dwmw2@infradead.org>
From: David Woodhouse <dwmw@amazon.co.uk>
Test the KVM_CAP_ARM_DISABLE_EXITS capability interface:
- KVM_CHECK_EXTENSION reports KVM_ARM_DISABLE_EXITS_WFI and KVM_ARM_DISABLE_EXITS_WFE
- KVM_ENABLE_CAP succeeds with valid flags (WFI, WFE, both, zero)
- KVM_ENABLE_CAP fails with EINVAL for unknown flags
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/arm64/disable_exits.c | 48 +++++++++++++++++++
2 files changed, 49 insertions(+)
create mode 100644 tools/testing/selftests/kvm/arm64/disable_exits.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 878d7cb92555..d8e7ff122445 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -179,6 +179,7 @@ TEST_GEN_PROGS_arm64 += arm64/vgic_irq
TEST_GEN_PROGS_arm64 += arm64/vgic_lpi_stress
TEST_GEN_PROGS_arm64 += arm64/vgic_group_iidr
TEST_GEN_PROGS_arm64 += arm64/vgic_group_v2
+TEST_GEN_PROGS_arm64 += arm64/disable_exits
TEST_GEN_PROGS_arm64 += arm64/vpmu_counter_access
TEST_GEN_PROGS_arm64 += arm64/no-vgic-v3
TEST_GEN_PROGS_arm64 += arm64/idreg-idst
diff --git a/tools/testing/selftests/kvm/arm64/disable_exits.c b/tools/testing/selftests/kvm/arm64/disable_exits.c
new file mode 100644
index 000000000000..27fe6c9297b2
--- /dev/null
+++ b/tools/testing/selftests/kvm/arm64/disable_exits.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * disable_exits.c - Test KVM_CAP_ARM_DISABLE_EXITS UAPI
+ *
+ * Verify that KVM_CHECK_EXTENSION reports the valid exit disable mask
+ * and that KVM_ENABLE_CAP accepts valid flags and rejects invalid ones.
+ */
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+int main(int argc, char *argv[])
+{
+ struct kvm_vm *vm;
+ int r;
+
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_ARM_DISABLE_EXITS));
+
+ r = kvm_check_cap(KVM_CAP_ARM_DISABLE_EXITS);
+ TEST_ASSERT(r & KVM_ARM_DISABLE_EXITS_WFI,
+ "KVM_CHECK_EXTENSION should report WFI: got 0x%x", r);
+ TEST_ASSERT(r & KVM_ARM_DISABLE_EXITS_WFE,
+ "KVM_CHECK_EXTENSION should report WFE: got 0x%x", r);
+
+ vm = vm_create(1);
+
+ /* Valid: disable WFI trapping */
+ vm_enable_cap(vm, KVM_CAP_ARM_DISABLE_EXITS, KVM_ARM_DISABLE_EXITS_WFI);
+
+ /* Valid: disable WFE trapping */
+ vm_enable_cap(vm, KVM_CAP_ARM_DISABLE_EXITS, KVM_ARM_DISABLE_EXITS_WFE);
+
+ /* Valid: disable both */
+ vm_enable_cap(vm, KVM_CAP_ARM_DISABLE_EXITS,
+ KVM_ARM_DISABLE_EXITS_WFI | KVM_ARM_DISABLE_EXITS_WFE);
+
+ /* Valid: no exits disabled (no-op) */
+ vm_enable_cap(vm, KVM_CAP_ARM_DISABLE_EXITS, 0);
+
+ /* Invalid: unknown bit set */
+ r = __vm_enable_cap(vm, KVM_CAP_ARM_DISABLE_EXITS, 1ULL << 31);
+ TEST_ASSERT(r == -1 && errno == EINVAL,
+ "Unknown flags should fail with EINVAL: got %d errno %d",
+ r, errno);
+
+ kvm_vm_free(vm);
+ return 0;
+}
--
2.51.0
^ permalink raw reply related
* [PATCH 0/2] KVM: arm64: Add per-VM WFI/WFE exit disable capability
From: David Woodhouse @ 2026-04-08 20:23 UTC (permalink / raw)
To: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Marc Zyngier,
Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, kvm, linux-doc, linux-kernel,
linux-arm-kernel, kvmarm, linux-kselftest, Colton Lewis,
Jing Zhang, David Woodhouse
Add KVM_CAP_ARM_DISABLE_EXITS, modelled after the existing x86
KVM_CAP_X86_DISABLE_EXITS, to allow userspace to disable WFI and/or
WFE trapping on a per-VM basis.
KVM already has system-wide kernel command line parameters
(kvm-arm.wfi_trap_policy and kvm-arm.wfe_trap_policy, added in
0b5afe05377d) to control WFx trapping. However, these are global and
set at boot time. A per-VM capability allows the VMM to make the
decision per guest — for example, disabling WFI trapping for
latency-sensitive VMs with pinned vCPUs while keeping it enabled for
overcommitted guests on the same host.
When a flag is set via KVM_ENABLE_CAP, the corresponding trap is
unconditionally cleared, overriding the system-wide policy. When the
flag is not set, the system policy (including the default
single-task heuristic) applies as before.
As with the x86 equivalent, disabling exits is a one-way operation
per VM.
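The precedence and one-way behaviour described above can be sketched as a
small userspace model. This is not the kernel code: struct vm_model, the
policy enum and the function names are hypothetical stand-ins, the bit
values are copied from the uapi hunk in patch 1/2, and the single-task
heuristic is reduced to a boolean input.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in the uapi hunk of patch 1/2. */
#define KVM_ARM_DISABLE_EXITS_WFI	(1 << 0)
#define KVM_ARM_DISABLE_EXITS_WFE	(1 << 1)
#define KVM_ARM_DISABLE_VALID_EXITS	(KVM_ARM_DISABLE_EXITS_WFI | \
					 KVM_ARM_DISABLE_EXITS_WFE)

/* Hypothetical stand-in for the two bools added to struct kvm_arch. */
struct vm_model {
	bool wfi_in_guest;
	bool wfe_in_guest;
};

/* Hypothetical stand-in for the system-wide trap policy values. */
enum wfx_policy { POLICY_SINGLE_TASK, POLICY_NOTRAP, POLICY_TRAP };

/* Model of KVM_ENABLE_CAP: reject unknown bits, only ever set flags. */
int model_enable_cap(struct vm_model *vm, uint64_t arg0)
{
	if (arg0 & ~(uint64_t)KVM_ARM_DISABLE_VALID_EXITS)
		return -EINVAL;
	if (arg0 & KVM_ARM_DISABLE_EXITS_WFI)
		vm->wfi_in_guest = true;
	if (arg0 & KVM_ARM_DISABLE_EXITS_WFE)
		vm->wfe_in_guest = true;
	return 0;	/* arg0 == 0 is a no-op; nothing is ever cleared */
}

/* Model of the WFI decision: the per-VM flag wins over the global policy. */
bool model_should_clear_twi(const struct vm_model *vm, enum wfx_policy policy,
			    bool single_task)
{
	if (vm->wfi_in_guest)
		return true;
	if (policy != POLICY_SINGLE_TASK)
		return policy == POLICY_NOTRAP;
	return single_task;	/* placeholder for the default heuristic */
}
```

In this model a VM that has set the WFI bit never traps WFI, even under a
"trap" policy, while a VM that has not set it behaves exactly as the
system-wide policy dictates.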
Tested on Graviton 3 (Neoverse-V1) metal.
David Woodhouse (2):
KVM: arm64: Add KVM_CAP_ARM_DISABLE_EXITS for WFI/WFE passthrough
KVM: arm64: selftests: Add KVM_CAP_ARM_DISABLE_EXITS UAPI test
Documentation/virt/kvm/api.rst | 28 +++++++++++++
arch/arm64/include/asm/kvm_host.h | 4 ++
arch/arm64/kvm/arm.c | 20 ++++++++++
include/uapi/linux/kvm.h | 6 +++
tools/testing/selftests/kvm/Makefile.kvm | 1 +
tools/testing/selftests/kvm/arm64/disable_exits.c | 48 +++++++++++++++++++++++
6 files changed, 107 insertions(+)
create mode 100644 tools/testing/selftests/kvm/arm64/disable_exits.c
^ permalink raw reply
* Re: [PATCH v10 12/21] gpu: nova-core: mm: Add unified page table entry wrapper enums
From: Joel Fernandes @ 2026-04-08 20:19 UTC (permalink / raw)
To: Alexandre Courbot, Eliot Courtney, Danilo Krummrich
Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
Alex Gaynor, Boqun Feng, John Hubbard, Alistair Popple,
Timur Tabi, Edwin Peer, Andrea Righi, Andy Ritger, Zhi Wang,
Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi, joel,
linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <DHNKYBM159T9.2UUQ7CU0RN0BU@nvidia.com>
Hi Alex, Eliot, Danilo,
Thanks for taking a look. Let me respond to the specific points below.
On Wed, 08 Apr 2026, Alexandre Courbot wrote:
> After a quick look I'd say that having a trait here would actually be
> *good* for correctness and maintainability.
>
> The current design implies that every operation on a page table (most
> likely using the walker) goes through a branching point. Just looking at
> `PtWalk::read_pte_at_level`, there are already at least 6
> `if version == 2 { } else { }` branches that all resolve to the same
> result. Include walking down the PDEs and you have at least a dozen of
> these just to resolve a virtual address. I know CPUs are fast, but this
> is still wasted cycles for no good reason.
I did some measurements and there is no noticeable difference between the two
approaches. I ran perf and loaded nova with self-tests running. The extra
potential branching is lost in the noise. In both cases, loading nova and
running the self-tests has ~119.7M branch instructions on my Ampere. The total
instruction count is also identical (~615M).
I measured like this:
perf stat -e \
branches,branch-misses,cache-references,cache-misses,instructions,cycles -- \
modprobe nova_core
So I think the branching argument is not a strong one. I also did more
measurements and the dominant cost is MMIO. Page table walks are done during
map prep and execute. A TLB flush alone costs ~1.4 microseconds, and a PRAMIN
BAR0 write to update a PTE takes about 1 microsecond. Considering this, I don't
think the extra branching argument holds (even without branch prediction and
speculation).
Also some branches cannot be eliminated even with parameterization:
    if level == self.mmu_version.dual_pde_level() {
        // 128-bit dual PDE read
    } else {
        // Regular 64-bit PDE read
    }
This isn't really a version branch -- it's a structural branch that
distinguishes between 64-bit PDE and 128-bit dual PDE entries. Any MMU
version with a dual PDE level would need this same distinction.
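To illustrate, here is a minimal sketch, assuming a hypothetical `Mmu` trait
with a `DUAL_PDE_LEVEL` constant. The names and the level values are
placeholders and are not taken from the nova code. Even with the version
resolved at compile time, the per-level check is still there:

```rust
/// Hypothetical per-version description of the MMU layout.
pub trait Mmu {
    /// Level whose entries are 128-bit dual PDEs (placeholder values below).
    const DUAL_PDE_LEVEL: usize;
}

pub struct MmuV2;
pub struct MmuV3;

impl Mmu for MmuV2 {
    const DUAL_PDE_LEVEL: usize = 3; // placeholder, not checked against hardware
}

impl Mmu for MmuV3 {
    const DUAL_PDE_LEVEL: usize = 4; // placeholder, not checked against hardware
}

/// Size in bytes of the directory entry read at `level`.
/// The comparison depends on the runtime `level`, so making the walker
/// generic over `M` does not remove this branch.
pub fn pde_read_size<M: Mmu>(level: usize) -> usize {
    if level == M::DUAL_PDE_LEVEL {
        16 // 128-bit dual PDE read
    } else {
        8 // regular 64-bit PDE read
    }
}
```

Only the constant on the right-hand side of the comparison becomes a
compile-time value; the branch on `level` is the same in both designs.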
I also did code-generation size analysis (see diff of code used below):
Code generation analysis:
Module .ko size: Before: 511,792 bytes After: 524,464 bytes (+2.5%)
.text section: Before: 112,620 bytes After: 116,628 bytes (+4,008 bytes)
The +4K .text growth is the monomorphization cost: every generic function
is compiled twice (once for MmuV2, once for MmuV3).
> If you use a trait here, and make `PtWalk` generic against it, you can
> optimize this away. We had a similar situation when we introduced Turing
> support and the v2 ucode header, and tried both approaches: the
> trait-based one was slightly shorter, and arguably more readable.
Actually, I was the one who suggested traits for the Falcon ucode descriptor;
see this thread [1]. So basically you and Eliot are telling me to do what I
suggested in [1]. :-) However, I disagree that it is the right choice for this code.
[1] https://lore.kernel.org/all/20251117231028.GA1095236@joelbox2/
I think the two cases are quite different in complexity:
The falcon ucode descriptor is essentially a set of flat field accessors
and a few params (imem_sec_load_params, dmem_load_params).
The trait has ~10 simple getter methods. There's no multi-level hierarchy,
no walker, and no generic propagation.
The MMU page table case is structurally different. Making PtWalk generic
over an Mmu trait would require:
- PtWalk<M: Mmu> (the walker)
- Plus all the associated types: M::Pte, M::Pde, M::DualPde each
needing their own trait bounds
And we would also need:
- Vmm<M: Mmu> (which creates PtWalk)
- BarUser<M: Mmu> (which creates Vmm)
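A rough sketch of how the type parameter would propagate through these layers,
assuming a hypothetical `Mmu` trait. The entry types are placeholder integers
and the struct bodies are empty shells, not the real nova types:

```rust
use std::marker::PhantomData;

/// Hypothetical version trait; each associated type needs its own bounds.
pub trait Mmu {
    type Pte: Copy + Default;
    type Pde: Copy + Default;
    type DualPde: Copy + Default;
}

pub struct MmuV2;

impl Mmu for MmuV2 {
    // Placeholder representations, not the real entry types.
    type Pte = u64;
    type Pde = u64;
    type DualPde = u128;
}

/// The walker is generic over the MMU version...
pub struct PtWalk<M: Mmu> {
    pub _mmu: PhantomData<M>,
}

/// ...so the Vmm that creates it must be generic too...
pub struct Vmm<M: Mmu> {
    pub walker: PtWalk<M>,
}

/// ...and so must the BAR user that creates the Vmm.
pub struct BarUser<M: Mmu> {
    pub vmm: Vmm<M>,
}

impl<M: Mmu> BarUser<M> {
    pub fn new() -> Self {
        BarUser {
            vmm: Vmm {
                walker: PtWalk { _mmu: PhantomData },
            },
        }
    }

    /// Even a trivial helper at the top layer has to name `M`.
    pub fn empty_pte(&self) -> M::Pte {
        M::Pte::default()
    }
}
```

A caller then has to pick the version in the type, for example
`BarUser::<MmuV2>::new()`, or wrap the two instantiations in an enum.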
I am also against making Vmm an enum as Eliot suggested:
    enum Vmm {
        V2(VmmInner<MmuV2>),
        V3(VmmInner<MmuV3>),
    }
That moves the version complexity up to the reader. Code complexity IMO should
decrease as we go up abstractions, making it easier for users (Vmm/Bar).
If you look at the changes in vmm.rs needed to handle version dispatch there [2]:
Added: +109
Removed: -28
[2]
https://github.com/Edgeworth/linux/commit/3627af550b61256184d589e7ec666c1108971f0e
The main benefit of my approach is that version-specific dispatch complexity is
completely isolated inside MmuVersion, which makes the code outside of
pagetable.rs much more readable, with no need to parametrize anything and no
code size increase. I think that is worth considering.
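As a minimal sketch of that isolation (the method names and the PFN widths
here are illustrative assumptions, not the actual pagetable.rs API), the
version match lives in one place and the users stay non-generic:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MmuVersion {
    V2,
    V3,
}

impl MmuVersion {
    /// PFN width in bits. The values are illustrative only.
    pub fn pfn_bits(self) -> u32 {
        // The only place that knows about version differences.
        match self {
            MmuVersion::V2 => 25,
            MmuVersion::V3 => 40,
        }
    }
}

/// Non-generic: no type parameter leaks to the users of `Vmm`.
pub struct Vmm {
    mmu_version: MmuVersion,
}

impl Vmm {
    pub fn new(mmu_version: MmuVersion) -> Self {
        Vmm { mmu_version }
    }

    /// Version-agnostic caller code: it asks MmuVersion and never matches
    /// on the version itself.
    pub fn pfn_fits(&self, pfn: u64) -> bool {
        pfn < (1u64 << self.mmu_version.pfn_bits())
    }
}
```

Code that holds a `Vmm` never spells out V2 or V3; it only calls methods that
delegate to `MmuVersion`.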
> But the main argument to use a trait here IMO is that it enables
> associated types and constants. That's particularly critical since some
> equivalent fields have different lengths between v2 and v3. An
> associated `Bounded` type for these would force the caller to validate
> the length of these fields before calling a non-fallible operation,
> which is exactly the level of caution that we want when dealing with
> page tables.
I think Bounded validation is orthogonal to the dispatch model.
We can add Bounded to the current design without restructuring
into traits. For example:
// In ver2::Pte
pub fn new_vram(pfn: Bounded<Pfn, 25>, writable: bool) -> Self { ... }
// In ver3::Pte
pub fn new_vram(pfn: Bounded<Pfn, 40>, writable: bool) -> Self { ... }
The unified Pte enum wrapper already dispatches to the correct
version-specific constructor, which would enforce the correct Bounded
constraint for that version.
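A self-contained sketch of the idea, using a stand-in `BoundedU64` type rather
than the kernel crate's `Bounded` (whose exact API I am not reproducing here).
`PteV2`/`PteV3` and the bit layout are placeholders, not the hardware format:

```rust
/// Stand-in for a bounded integer: a u64 known to fit in `BITS` bits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BoundedU64<const BITS: u32>(u64);

impl<const BITS: u32> BoundedU64<BITS> {
    /// Fallible constructor: the caller must handle out-of-range values here.
    pub fn try_new(value: u64) -> Option<Self> {
        // checked_shr returns None when BITS >= 64, where every u64 fits.
        let fits = value.checked_shr(BITS).map_or(true, |rest| rest == 0);
        if fits {
            Some(Self(value))
        } else {
            None
        }
    }

    pub fn get(self) -> u64 {
        self.0
    }
}

pub struct PteV2(pub u64);
pub struct PteV3(pub u64);

impl PteV2 {
    /// Infallible: the 25-bit limit is already proven by the argument type.
    pub fn new_vram(pfn: BoundedU64<25>, writable: bool) -> Self {
        // Placeholder layout: valid in bit 0, writable in bit 1, PFN from bit 8.
        PteV2((pfn.get() << 8) | (u64::from(writable) << 1) | 1)
    }
}

impl PteV3 {
    /// Same shape, but with the wider 40-bit PFN.
    pub fn new_vram(pfn: BoundedU64<40>, writable: bool) -> Self {
        PteV3((pfn.get() << 8) | (u64::from(writable) << 1) | 1)
    }
}
```

Passing a `BoundedU64<40>` to `PteV2::new_vram` is a compile error, and a PFN
that does not fit is rejected by `try_new` before any PTE is built. Neither
property depends on the walker being generic.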
> In order to fully benefit from it, we will need the bitfield macro from
> the `kernel` crate so the PDE/PTE fields can be `Bounded`, I will try to
> make it available quickly in a patch that you can depend on.
That would be great, and I'd be happy to integrate Bounded validation once
the macro is available. I just don't think we need to restructure the
dispatch model in order to benefit from it.
> But long story short, and although I need to dive deeper into the code,
> this looks like a good candidate for using a trait and associated types.
The walker code (walk.rs) is already version-agnostic and reads cleanly.
The version dispatch is encapsulated behind method calls, not exposed as
inline if/else blocks.
Generic propagation (or version-specific dispatch at higher levels) adds more
complexity at higher layers.
The diff I used for my testing and for the data above is linked below [3]. I
don't really see a net readability win there (IMO, it is a net loss in
readability).
[3]
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=trait-pt-dispatch&id=5eb0e98af11ba608ff4d0f7a06065ee863f5066a
thanks,
--
Joel Fernandes
^ permalink raw reply
* [PATCH v13 35/36] docs/dyndbg: add classmap info to howto
From: Jim Cromie @ 2026-04-08 20:02 UTC (permalink / raw)
To: linux-kernel; +Cc: gregkh, jbaron, louis.chauvet, Jim Cromie, linux-doc
In-Reply-To: <20260408200211.43821-1-jim.cromie@gmail.com>
Describe the 3 API macros providing dynamic_debug's classmaps
DYNAMIC_DEBUG_CLASSMAP_DEFINE - create & export a classmap
DYNAMIC_DEBUG_CLASSMAP_USE - refer to exported map
DYNAMIC_DEBUG_CLASSMAP_PARAM - bind control param to the classmap
DYNAMIC_DEBUG_CLASSMAP_PARAM_REF + use module's storage - __drm_debug
NB: The _DEFINE & _USE model makes the user dependent on the definer,
just like EXPORT_SYMBOL(__drm_debug) already does.
cc: linux-doc@vger.kernel.org
Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
---
.../admin-guide/dynamic-debug-howto.rst | 132 ++++++++++++++++--
1 file changed, 122 insertions(+), 10 deletions(-)
diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst b/Documentation/admin-guide/dynamic-debug-howto.rst
index 0a42b9de55ac..734be0b5fe9a 100644
--- a/Documentation/admin-guide/dynamic-debug-howto.rst
+++ b/Documentation/admin-guide/dynamic-debug-howto.rst
@@ -146,6 +146,9 @@ keywords are::
"1-30" is valid range but "1 - 30" is not.
+Keywords
+--------
+
The meanings of each keyword are:
func
@@ -194,16 +197,6 @@ format
format "nfsd: SETATTR" // a neater way to match a format with whitespace
format 'nfsd: SETATTR' // yet another way to match a format with whitespace
-class
- The given class_name is validated against each module, which may
- have declared a list of known class_names. If the class_name is
- found for a module, callsite & class matching and adjustment
- proceeds. Examples::
-
- class DRM_UT_KMS # a DRM.debug category
- class JUNK # silent non-match
- // class TLD_* # NOTICE: no wildcard in class names
-
line
The given line number or range of line numbers is compared
against the line number of each ``pr_debug()`` callsite. A single
@@ -218,6 +211,25 @@ line
line -1605 // the 1605 lines from line 1 to line 1605
line 1600- // all lines from line 1600 to the end of the file
+class
+
+ The given class_name is validated against each module, which may
+ have declared a list of class_names it accepts. If the class_name
+ is accepted by a module, callsite & class matching and adjustment
+ proceeds. Examples::
+
+ class DRM_UT_KMS # a drm.debug category
+ class JUNK # silent non-match
+ // class TLD_* # NOTICE: no wildcard in class names
+
+.. note::
+
+ Unlike other keywords, classes are "name-to-change", not
+ "omitting-constraint-allows-change". See Dynamic Debug Classmaps
+
+Flags
+-----
+
The flags specification comprises a change operation followed
by one or more flag characters. The change operation is one
of the characters::
@@ -239,6 +251,11 @@ The flags are::
l Include line number
d Include call trace
+.. note::
+
+ * To query without changing, use ``+_`` or ``-_``.
+ * To clear all flags, use ``=_`` or ``-fslmpt``.
+
For ``print_hex_dump_debug()`` and ``print_hex_dump_bytes()``, only
the ``p`` flag has meaning, other flags are ignored.
@@ -395,3 +412,98 @@ just a shortcut for ``print_hex_dump(KERN_DEBUG)``.
For ``print_hex_dump_debug()``/``print_hex_dump_bytes()``, format string is
its ``prefix_str`` argument, if it is constant string; or ``hexdump``
in case ``prefix_str`` is built dynamically.
+
+.. _dyndbg-classmaps:
+
+Dynamic Debug Classmaps
+=======================
+
+The "class" keyword selects prdbgs based on author supplied,
+domain-oriented names. This complements the nested-scope keywords:
+module, file, function, line.
+
+The main difference from the others: classes must be named to be
+changed. This protects them from unintended overwrite::
+
+ # IOW this cannot undo any drm.debug settings
+ :#> ddcmd -p
+
+This protection is needed; /sys/module/drm/parameters/debug is ABI.
+drm.debug is authoritative when dyndbg is not used, dyndbg-under-DRM
+is an implementation detail, and must not behave erratically, just
+because another admin fed >control something unrelated.
+
+So each class must be enabled individually (no wildcards)::
+
+ :#> ddcmd class DRM_UT_CORE +p
+ :#> ddcmd class DRM_UT_KMS +p
+ # or more selectively
+ :#> ddcmd class DRM_UT_CORE module drm +p
+
+That makes direct >control wordy and annoying, but it is a secondary
+interface; it is not intended to replace the ABI, just slide in
+underneath and reimplement the guaranteed behavior. So DRM would keep
+using the convenient way, and be able to trust it::
+
+ :#> echo 0x1ff > /sys/module/drm/parameters/debug
+
+That said, since the sysfs/kparam is the ABI, if the author omits the
+CLASSMAP_PARAM, there's no ABI to guard, and he probably wants a less
+pedantic >control interface. In this case, protection is dropped.
+
+Dynamic Debug Classmap API
+==========================
+
+DYNAMIC_DEBUG_CLASSMAP_DEFINE(clname,type,_base,classnames) - this maps
+classnames (a list of strings) onto class-ids consecutively, starting
+at _base.
+
+DYNAMIC_DEBUG_CLASSMAP_USE(clname) & _USE_(clname,_base) - modules
+call this to refer to the var _DEFINEd elsewhere (and exported).
+
+DYNAMIC_DEBUG_CLASSMAP_PARAM(clname) - creates the sysfs/kparam,
+maps/exposes bits 0..N as class-names.
+
+Classmaps are opt-in: modules invoke _DEFINE or _USE to authorize
+dyndbg to update those named classes. "class FOO" queries are
+validated against the classes defined or used by the module, this
+finds the classid to alter; classes are not directly selectable by
+their classid.
+
+Classnames are global in scope, so subsystems (module-groups) should
+prepend a subsystem name; unqualified names like "CORE" are discouraged.
+
+NB: It is an inherent API limitation (due to class_id's int type) that
+the following are possible:
+
+ // these errors should be caught in review
+ __pr_debug_cls(0, "fake DRM_UT_CORE msg"); // this works
+ __pr_debug_cls(62, "un-known classid msg"); // this compiles, does nothing
+
+There are 2 types of classmaps:
+
+* DD_CLASS_TYPE_DISJOINT_BITS: classes are independent, like drm.debug
+* DD_CLASS_TYPE_LEVEL_NUM: classes are relative, ordered (V3 > V2)
+
+DYNAMIC_DEBUG_CLASSMAP_PARAM - modelled after module_param_cb, it
+refers to a DEFINEd classmap, and associates it to the param's
+data-store. This state is then applied to DEFINEr and USEr modules
+when they're modprobed.
+
+The PARAM interface also enforces the DD_CLASS_TYPE_LEVEL_NUM relation
+amongst the contained classnames; all classes are independent in the
+control parser itself. There is no implied meaning in names like "V4"
+or "PL_ERROR" vs "PL_WARNING".
+
+Modules or subsystems (drm & drivers) can define multiple classmaps,
+as long as they (all the classmaps) share the limited 0..62
+per-module-group _class_id range, without overlap.
+
+If a module encounters a conflict between 2 classmaps it is _USEing or
+_DEFINEing, it can invoke the extended _USE_(name,_base) macro to
+de-conflict the respective ranges.
+
+``#define DEBUG`` will enable all pr_debugs in scope, including any
+class'd ones. This won't be reflected in the PARAM readback value,
+but the class'd pr_debug callsites can be forced off by toggling the
+classmap-kparam all-on then all-off.
--
2.53.0
^ permalink raw reply related