Linux Documentation
* Re: [PATCH v10 12/21] gpu: nova-core: mm: Add unified page table entry wrapper enums
From: Gary Guo @ 2026-04-09 11:02 UTC
  To: Joel Fernandes, Alexandre Courbot, Eliot Courtney,
	Danilo Krummrich
  Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alex Gaynor, Boqun Feng, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi, joel,
	linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <2f004511-61d1-4197-84b6-cddcdd275e55@nvidia.com>

On Wed Apr 8, 2026 at 9:19 PM BST, Joel Fernandes wrote:
> Hi Alex, Eliot, Danilo,
>
> Thanks for taking a look. Let me respond to the specific points below.
>
> On Wed, 08 Apr 2026, Alexandre Courbot wrote:
>> After a quick look I'd say that having a trait here would actually be
>> *good* for correctness and maintainability.
>>
>> The current design implies that every operation on a page table (most
>> likely using the walker) goes through a branching point. Just looking at
>> `PtWalk::read_pte_at_level`, there are already at least 6
>> `if version == 2 { } else { }` branches that all resolve to the same
>> result. Include walking down the PDEs and you have at least a dozen of
>> these just to resolve a virtual address. I know CPUs are fast, but this
>> is still wasted cycles for no good reason.
>
> I did some measurements and there is no noticeable difference between the two
> approaches. I ran perf and loaded nova with the self-tests running. The extra
> potential branching is lost in the noise. In both cases, loading nova and
> running the self-tests executes ~119.7M branch instructions on my Ampere. The
> total instruction count is also identical (~615M).
>
> I measured like this:
>
>     perf stat -e branches,branch-misses,cache-references,cache-misses,instructions,cycles -- modprobe nova_core
>
> So I think the branching argument is not a strong one. I also took more
> measurements, and the dominant cost is MMIO. Page table walks are done during
> map preparation and execution; a TLB flush alone costs ~1.4 microseconds, and
> each PRAMIN (BAR0) write of a PTE takes about 1 microsecond. Given this, I
> don't think the extra-branching argument holds (even without branch
> prediction and speculation).
>
> Also some branches cannot be eliminated even with parameterization:
>
>     if level == self.mmu_version.dual_pde_level() {
>         // 128-bit dual PDE read
>     } else {
>         // Regular 64-bit PDE read
>     }
>
> This isn't really a version branch -- it's a structural branch that
> distinguishes between 64-bit PDE and 128-bit dual PDE entries. Any MMU
> version with a dual PDE level would need this same distinction.
>
> I also did code-generation size analysis (see diff of code used below):
>
> Code generation analysis:
>
>   Module .ko size:   Before: 511,792 bytes   After: 524,464 bytes  (+2.5%)
>   .text section:     Before: 112,620 bytes   After: 116,628 bytes  (+4,008 bytes)
>
>   The +4K .text growth is the monomorphization cost: every generic function
>   is compiled twice (once for MmuV2, once for MmuV3).
>
>> If you use a trait here, and make `PtWalk` generic against it, you can
>> optimize this away. We had a similar situation when we introduced Turing
>> support and the v2 ucode header, and tried both approaches: the
>> trait-based one was slightly shorter, and arguably more readable.
>
> Actually I was the one who suggested traits for Falcon ucode descriptor if you
> see this thread [1]. So basically you and Eliot are telling me to do what I
> suggested in [1]. :-) However, I disagree that it is the right choice for this code.
>
> [1] https://lore.kernel.org/all/20251117231028.GA1095236@joelbox2/
>
> I think the two cases are quite different in complexity:
>
> The falcon ucode descriptor is essentially a set of flat field accessors
> and a few params (imem_sec_load_params, dmem_load_params).
> The trait has ~10 simple getter methods. There's no multi-level hierarchy,
> no walker, and no generic propagation.
>
> The MMU page table case is structurally different. Making PtWalk generic
> over an Mmu trait would require:
>
>   - PtWalk<M: Mmu> (the walker)
>   - Plus all the associated types: M::Pte, M::Pde, M::DualPde each
>     needing their own trait bounds
>
> And we would also need:
>   - Vmm<M: Mmu> (which creates PtWalk)
>   - BarUser<M: Mmu> (which creates Vmm)
>
> I am also against making Vmm an enum as Eliot suggested:
>        enum Vmm {
>            V2(VmmInner<MmuV2>),
>            V3(VmmInner<MmuV3>),
>        }
>
> That moves the version complexity up to the reader. Code complexity IMO should
> decrease as we go up abstractions, making it easier for users (Vmm/Bar).
>
> If you look at the changes in vmm.rs to handle version dispatch there [2]:
> Added: +109
> Removed: -28
>
> [2]
> https://github.com/Edgeworth/linux/commit/3627af550b61256184d589e7ec666c1108971f0e
>
> The main benefit of my approach is that version-specific dispatch complexity is
> completely isolated inside MmuVersion, making the code outside of
> pagetable.rs much more readable, without having to parametrize anything and
> without a code size increase. I think that is worth considering.
>
>> But the main argument to use a trait here IMO is that it enables
>> associated types and constants. That's particularly critical since some
>> equivalent fields have different lengths between v2 and v3. An
>> associated `Bounded` type for these would force the caller to validate
>> the length of these fields before calling a non-fallible operation,
>> which is exactly the level of caution that we want when dealing with
>> page tables.
>
> I think Bounded validation is orthogonal to the dispatch model.
> We can add Bounded to the current design without restructuring
> into traits. For example:
>
>     // In ver2::Pte
>     pub fn new_vram(pfn: Bounded<Pfn, 25>, writable: bool) -> Self { ... }
>
>     // In ver3::Pte
>     pub fn new_vram(pfn: Bounded<Pfn, 40>, writable: bool) -> Self { ... }
>
> The unified Pte enum wrapper already dispatches to the correct
> version-specific constructor, which would enforce the correct Bounded
> constraint for that version.
>
>> In order to fully benefit from it, we will need the bitfield macro from
>> the `kernel` crate so the PDE/PTE fields can be `Bounded`, I will try to
>> make it available quickly in a patch that you can depend on.
>
> That would be great, and I'd be happy to integrate Bounded validation once
> the macro is available. I just don't think we need to restructure the
> dispatch model in order to benefit from it.
>
>> But long story short, and although I need to dive deeper into the code,
>> this looks like a good candidate for using a trait and associated types.
>
> The walker code (walk.rs) is already version-agnostic and reads cleanly.
> The version dispatch is encapsulated behind method calls, not exposed as
> inline if/else blocks.
>
> Generic propagation (or version-specific dispatch at higher levels) adds more
> complexity at higher layers.
>
> Enclosed below [3] is the diff I used for my testing with the data. I don't
> really see a net readability win there (IMO, it is a net loss in readability).
>
> [3]
> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=trait-pt-dispatch&id=5eb0e98af11ba608ff4d0f7a06065ee863f5066a

IMO this diff has got me quite in favour of the trait approach.

I was about to propose a similar trait approach some versions ago (or maybe I
already had?) but didn't, due to the eventual need for `match`-like dispatch
(like you had with `vmm_dispatch`). Your code, however, made that look not as
bad as I thought it would be.
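The shape being discussed (a version enum at the top, generic code underneath, and one `match`-like dispatch helper) can be sketched roughly as follows. This is a minimal illustration under assumptions: `Mmu`, `VmmImpl`, `VmmInner`, the `VERSION` constant and the `vmm_dispatch!` macro body are hypothetical stand-ins, not the code from either branch.

```rust
use std::marker::PhantomData;

/// Hypothetical marker trait for one page-table format.
trait Mmu {
    const VERSION: u32;
}

struct MmuV2;
struct MmuV3;

impl Mmu for MmuV2 {
    const VERSION: u32 = 2;
}

impl Mmu for MmuV3 {
    const VERSION: u32 = 3;
}

/// Format-specific implementation; compiled once per `M`.
struct VmmImpl<M: Mmu> {
    mapped_pages: usize,
    _mmu: PhantomData<M>,
}

impl<M: Mmu> VmmImpl<M> {
    fn new() -> Self {
        Self { mapped_pages: 0, _mmu: PhantomData }
    }

    /// No runtime version test here: `M::VERSION` is a compile-time constant.
    fn map_page(&mut self) -> u32 {
        self.mapped_pages += 1;
        M::VERSION
    }
}

/// Type-erased wrapper: the one place that knows both formats exist.
enum VmmInner {
    V2(VmmImpl<MmuV2>),
    V3(VmmImpl<MmuV3>),
}

/// Runs `$body` against whichever variant is active, binding it to `$v`.
macro_rules! vmm_dispatch {
    ($inner:expr, $v:ident => $body:expr) => {
        match $inner {
            VmmInner::V2($v) => $body,
            VmmInner::V3($v) => $body,
        }
    };
}

pub struct Vmm {
    inner: VmmInner,
}

impl Vmm {
    /// Picks the format once, e.g. when the GPU is probed (hypothetical API).
    pub fn new(version: u32) -> Option<Self> {
        let inner = match version {
            2 => VmmInner::V2(VmmImpl::new()),
            3 => VmmInner::V3(VmmImpl::new()),
            _ => return None,
        };
        Some(Self { inner })
    }

    pub fn map_page(&mut self) -> u32 {
        vmm_dispatch!(&mut self.inner, v => v.map_page())
    }

    pub fn mapped_pages(&self) -> usize {
        vmm_dispatch!(&self.inner, v => v.mapped_pages)
    }
}
```

The only runtime `match` lives in the macro; everything inside `VmmImpl<M>` is compiled once per format.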

Best,
Gary

* Re: [PATCH v10 12/21] gpu: nova-core: mm: Add unified page table entry wrapper enums
From: Danilo Krummrich @ 2026-04-09 11:00 UTC
  To: Joel Fernandes
  Cc: John Hubbard, Eliot Courtney, linux-kernel, Miguel Ojeda,
	Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Dave Airlie,
	Daniel Almeida, Koen Koning, dri-devel, rust-for-linux,
	Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Vivi Rodrigo, Tvrtko Ursulin, Rui Huang, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alex Gaynor, Boqun Feng, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, Alexey Ivanov,
	linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <1775730646.3752.4760@nvidia.com>

On Thu Apr 9, 2026 at 12:33 PM CEST, Joel Fernandes wrote:
> Since it is 3 against 1 here, I rest my case :-).

That's not how I'd view it. :)

Anyways, in case I'm included in "3", that's not my position. My point was to
ensure we keep discussing advantages and disadvantages on their merits, as I
think you both have good points.

> I am still in disagreement since I do not see much benefit (that is why I said
> pointless above).

That is fair -- in this case please explain why the advantages pointed out by
others are not worth it, propose something that picks up the best of both
worlds, etc.

You can also turn it around and ask people whether they can tweak their counter
proposal to get rid of specific parts you dislike for a reason.

IOW, keep the ball rolling, so we can come up with the best possible solution.

* Re: [PATCH v10 12/21] gpu: nova-core: mm: Add unified page table entry wrapper enums
From: Alexandre Courbot @ 2026-04-09 10:56 UTC
  To: Joel Fernandes
  Cc: Eliot Courtney, Danilo Krummrich, linux-kernel, Miguel Ojeda,
	Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Dave Airlie,
	Daniel Almeida, Koen Koning, dri-devel, rust-for-linux,
	Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alex Gaynor, Boqun Feng, John Hubbard, Alistair Popple,
	Timur Tabi, Edwin Peer, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi, joel,
	linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <2f004511-61d1-4197-84b6-cddcdd275e55@nvidia.com>

On Thu Apr 9, 2026 at 5:19 AM JST, Joel Fernandes wrote:
> Hi Alex, Eliot, Danilo,
>
> Thanks for taking a look. Let me respond to the specific points below.
>
> On Wed, 08 Apr 2026, Alexandre Courbot wrote:
>> After a quick look I'd say that having a trait here would actually be
>> *good* for correctness and maintainability.
>>
>> The current design implies that every operation on a page table (most
>> likely using the walker) goes through a branching point. Just looking at
>> `PtWalk::read_pte_at_level`, there are already at least 6
>> `if version == 2 { } else { }` branches that all resolve to the same
>> result. Include walking down the PDEs and you have at least a dozen of
>> these just to resolve a virtual address. I know CPUs are fast, but this
>> is still wasted cycles for no good reason.
>
> I did some measurements and there is no noticeable difference between the two
> approaches. I ran perf and loaded nova with the self-tests running. The extra
> potential branching is lost in the noise. In both cases, loading nova and
> running the self-tests executes ~119.7M branch instructions on my Ampere. The
> total instruction count is also identical (~615M).

That's expected - as I said, CPUs are fast - and that's also not my
point. My issue is that we are doing countless tests that all resolve to
the same code path, one that is already known at probe time.
That's a huge code smell.

When we create the GPU, we know whether we will be using v2 or v3 page
tables. That we need to test that again 12 times per address resolution
is a design issue, irrespective of performance. There are 24 version
match sites in patch 12 alone.

And that's precisely a good justification for using monomorphization. v2
and v3 are technically two different page table implementations (they
even have their own distinct module in your series), we just use
generics to factorize the (source) code a bit.
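A minimal sketch of that idea, assuming a hypothetical `Mmu` trait whose geometry constants are illustrative only (they are not the real v2/v3 layouts): the walker is generic, so its loop contains no version test, and the single runtime decision is made where the format is first known.

```rust
use std::marker::PhantomData;

/// Hypothetical per-format interface; each page-table version implements it once.
trait Mmu {
    /// Number of levels walked to reach a PTE (illustrative values only).
    const LEVELS: usize;
    /// Virtual-address bits consumed per level (illustrative).
    const BITS_PER_LEVEL: u32;
    const PAGE_SHIFT: u32 = 12;

    /// Table index at `level` for `va`; level 0 is the root.
    fn index(va: u64, level: usize) -> usize {
        let below = (Self::LEVELS - 1 - level) as u32;
        let shift = Self::PAGE_SHIFT + Self::BITS_PER_LEVEL * below;
        ((va >> shift) & ((1u64 << Self::BITS_PER_LEVEL) - 1)) as usize
    }
}

struct MmuV2;
struct MmuV3;

impl Mmu for MmuV2 {
    const LEVELS: usize = 4;
    const BITS_PER_LEVEL: u32 = 9;
}

impl Mmu for MmuV3 {
    const LEVELS: usize = 5;
    const BITS_PER_LEVEL: u32 = 9;
}

/// Generic walker: monomorphized per format, no `if version == 2` inside.
struct PtWalk<M: Mmu> {
    _mmu: PhantomData<M>,
}

impl<M: Mmu> PtWalk<M> {
    fn new() -> Self {
        Self { _mmu: PhantomData }
    }

    /// Per-level table indices for `va`, root first.
    fn indices(&self, va: u64) -> Vec<usize> {
        (0..M::LEVELS).map(|level| M::index(va, level)).collect()
    }
}

/// The one runtime decision, e.g. taken once at probe time.
fn walk_for_version(version: u32, va: u64) -> Option<Vec<usize>> {
    match version {
        2 => Some(PtWalk::<MmuV2>::new().indices(va)),
        3 => Some(PtWalk::<MmuV3>::new().indices(va)),
        _ => None,
    }
}
```

Each `PtWalk<M>` instantiation is compiled separately, which is the kind of .text growth Joel measured.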

>
> I measured like this:
>
>     perf stat -e branches,branch-misses,cache-references,cache-misses,instructions,cycles -- modprobe nova_core
>
> So I think the branching argument is not a strong one. I also took more
> measurements, and the dominant cost is MMIO. Page table walks are done during
> map preparation and execution; a TLB flush alone costs ~1.4 microseconds, and
> each PRAMIN (BAR0) write of a PTE takes about 1 microsecond. Given this, I
> don't think the extra-branching argument holds (even without branch
> prediction and speculation).
>
> Also some branches cannot be eliminated even with parameterization:
>
>     if level == self.mmu_version.dual_pde_level() {
>         // 128-bit dual PDE read
>     } else {
>         // Regular 64-bit PDE read
>     }
>
> This isn't really a version branch -- it's a structural branch that
> distinguishes between 64-bit PDE and 128-bit dual PDE entries. Any MMU
> version with a dual PDE level would need this same distinction.

The dual PDE level should be an associated constant - you still need to
do the test, but note that you would also do it if there was only a
single page table version. It's orthogonal to whether we use a trait or
not here.
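To illustrate, a sketch assuming a hypothetical `Mmu` trait with a `DUAL_PDE_LEVEL` associated constant (the level numbers are assumed for illustration, not taken from the hardware documentation): the `level` comparison remains, but it compares against a per-format compile-time constant instead of querying a runtime version.

```rust
/// Hypothetical trait: each format states where its 128-bit dual PDE lives.
trait Mmu {
    const LEVELS: usize;
    const DUAL_PDE_LEVEL: usize;
}

struct MmuV2;
struct MmuV3;

// Illustrative values only.
impl Mmu for MmuV2 {
    const LEVELS: usize = 5;
    const DUAL_PDE_LEVEL: usize = 3;
}

impl Mmu for MmuV3 {
    const LEVELS: usize = 6;
    const DUAL_PDE_LEVEL: usize = 4;
}

/// Size in bytes of one entry at `level`.
///
/// This is still a runtime branch on `level` (a structural test that a
/// single-format driver would need too), but the constant it compares
/// against is fixed per format at compile time.
fn entry_size<M: Mmu>(level: usize) -> usize {
    assert!(level < M::LEVELS, "level out of range");
    if level == M::DUAL_PDE_LEVEL {
        16 // 128-bit dual PDE
    } else {
        8 // regular 64-bit PDE or PTE
    }
}
```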

>
> I also did code-generation size analysis (see diff of code used below):
>
> Code generation analysis:
>
>   Module .ko size:   Before: 511,792 bytes   After: 524,464 bytes  (+2.5%)
>   .text section:     Before: 112,620 bytes   After: 116,628 bytes  (+4,008 bytes)
>
>   The +4K .text growth is the monomorphization cost: every generic function
>   is compiled twice (once for MmuV2, once for MmuV3).

I would say this is working as intended then.

>
>> If you use a trait here, and make `PtWalk` generic against it, you can
>> optimize this away. We had a similar situation when we introduced Turing
>> support and the v2 ucode header, and tried both approaches: the
>> trait-based one was slightly shorter, and arguably more readable.
>
> Actually I was the one who suggested traits for Falcon ucode descriptor if you
> see this thread [1]. So basically you and Eliot are telling me to do what I
> suggested in [1]. :-) However, I disagree that it is the right choice for this code.
>
> [1] https://lore.kernel.org/all/20251117231028.GA1095236@joelbox2/
>
> I think the two cases are quite different in complexity:

Exactly. The complexity is different (this one involves multiple traits
and associated types) but the pattern is the same, and that's a pattern
traits are designed to address. If we were supposed to stop applying it
when things go beyond a certain level of complexity, the designers of
Rust would not have bothered adding things like associated types.

These traits are nothing new, they simply formalize a reality that
already exists in your code, which is that each version of the page
table needs to implement a given set of methods. It's already there with
the version doing dispatches, only it is not articulated clearly to the
reader. So in that respect, having traits makes the code *more* readable,
imho.

>
> The falcon ucode descriptor is essentially a set of flat field accessors
> and a few params (imem_sec_load_params, dmem_load_params).
> The trait has ~10 simple getter methods. There's no multi-level hierarchy,
> no walker, and no generic propagation.
>
> The MMU page table case is structurally different. Making PtWalk generic
> over an Mmu trait would require:
>
>   - PtWalk<M: Mmu> (the walker)
>   - Plus all the associated types: M::Pte, M::Pde, M::DualPde each
>     needing their own trait bounds
>
> And we would also need:
>   - Vmm<M: Mmu> (which creates PtWalk)
>   - BarUser<M: Mmu> (which creates Vmm)
>
> I am also against making Vmm an enum as Eliot suggested:
>        enum Vmm {
>            V2(VmmInner<MmuV2>),
>            V3(VmmInner<MmuV3>),
>        }
>
> That moves the version complexity up to the reader. Code complexity IMO should
> decrease as we go up abstractions, making it easier for users (Vmm/Bar).
>
> If you look at the changes in vmm.rs to handle version dispatch there [2]:
> Added: +109
> Removed: -28
>
> [2]
> https://github.com/Edgeworth/linux/commit/3627af550b61256184d589e7ec666c1108971f0e
>
> The main benefit of my approach is that version-specific dispatch complexity is
> completely isolated inside MmuVersion, making the code outside of
> pagetable.rs much more readable, without having to parametrize anything and
> without a code size increase. I think that is worth considering.
>
>> But the main argument to use a trait here IMO is that it enables
>> associated types and constants. That's particularly critical since some
>> equivalent fields have different lengths between v2 and v3. An
>> associated `Bounded` type for these would force the caller to validate
>> the length of these fields before calling a non-fallible operation,
>> which is exactly the level of caution that we want when dealing with
>> page tables.
>
> I think Bounded validation is orthogonal to the dispatch model.
> We can add Bounded to the current design without restructuring
> into traits. For example:
>
>     // In ver2::Pte
>     pub fn new_vram(pfn: Bounded<Pfn, 25>, writable: bool) -> Self { ... }
>
>     // In ver3::Pte
>     pub fn new_vram(pfn: Bounded<Pfn, 40>, writable: bool) -> Self { ... }
>
> The unified Pte enum wrapper already dispatches to the correct
> version-specific constructor, which would enforce the correct Bounded
> constraint for that version.

But then what type does the `new_vram` dispatch method take? Generic
code lets us expose the expected `Bounded` type to the caller, which can
do the proper validation. This is a small example, but I expect this
pattern to come up in other parts of the code as well.
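One possible answer, sketched under assumptions: `BoundedU64<BITS>` below is a hypothetical stand-in for the kernel's `Bounded` type (not its real API), and the PTE bit positions are made up; only the 25-bit and 40-bit widths come from the `new_vram` example quoted above. With an associated `Pfn` type, generic code names the exact type the caller must construct, and validation happens once, in `try_new`.

```rust
/// Stand-in for a bounded integer: a `u64` known to fit in `BITS` bits.
/// Hypothetical; not the kernel crate's `Bounded` API.
#[derive(Clone, Copy, Debug, PartialEq)]
struct BoundedU64<const BITS: u32>(u64);

impl<const BITS: u32> BoundedU64<BITS> {
    /// The only fallible step: callers validate here, once.
    fn try_new(value: u64) -> Option<Self> {
        if BITS >= 64 || value >> BITS == 0 {
            Some(Self(value))
        } else {
            None
        }
    }

    fn get(self) -> u64 {
        self.0
    }
}

/// Hypothetical per-format interface with an associated PFN type.
trait Mmu {
    type Pfn: Copy;

    /// Infallible: `pfn` was already validated when it was constructed.
    fn new_vram_pte(pfn: Self::Pfn, writable: bool) -> u64;
}

struct MmuV2;
struct MmuV3;

// Widths follow the `new_vram` example in this thread; bit positions are invented.
impl Mmu for MmuV2 {
    type Pfn = BoundedU64<25>;

    fn new_vram_pte(pfn: Self::Pfn, writable: bool) -> u64 {
        (pfn.get() << 8) | ((writable as u64) << 1) | 1
    }
}

impl Mmu for MmuV3 {
    type Pfn = BoundedU64<40>;

    fn new_vram_pte(pfn: Self::Pfn, writable: bool) -> u64 {
        (pfn.get() << 12) | ((writable as u64) << 1) | 1
    }
}

/// Generic callers see exactly which bounded type they must produce.
fn map_vram<M: Mmu>(pfn: M::Pfn) -> u64 {
    M::new_vram_pte(pfn, true)
}
```

Passing a `BoundedU64<40>` to `map_vram::<MmuV2>` fails to compile, a check that an enum wrapper taking a raw `u64` could only perform at runtime.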

>
>> In order to fully benefit from it, we will need the bitfield macro from
>> the `kernel` crate so the PDE/PTE fields can be `Bounded`, I will try to
>> make it available quickly in a patch that you can depend on.
>
> That would be great, and I'd be happy to integrate Bounded validation once
> the macro is available. I just don't think we need to restructure the
> dispatch model in order to benefit from it.

I'll finish the series and hopefully send it a bit later today. That's
another significant rework for the series (sorry about that) but it
should be worth the effort for the added correctness.

* Re: [PATCH v10 12/21] gpu: nova-core: mm: Add unified page table entry wrapper enums
From: Joel Fernandes @ 2026-04-09 10:33 UTC
  To: John Hubbard
  Cc: Joel Fernandes, Eliot Courtney, linux-kernel, Miguel Ojeda,
	Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Vivi Rodrigo, Tvrtko Ursulin, Rui Huang, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alex Gaynor, Boqun Feng, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, Alexey Ivanov,
	linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <42dd707f-e23a-4725-8b6f-08ca346b0143@nvidia.com>

> On Apr 8, 2026, at 7:13 PM, John Hubbard <jhubbard@nvidia.com> wrote:
> 
> On 4/8/26 9:58 AM, Joel Fernandes wrote:
>>> On 4/8/2026 9:26 AM, Eliot Courtney wrote:
>>> On Tue Apr 7, 2026 at 10:59 PM JST, Joel Fernandes wrote:
>>>> On 4/7/2026 9:42 AM, Eliot Courtney wrote:
>>>>> On Tue Apr 7, 2026 at 6:55 AM JST, Joel Fernandes wrote:
> ...>> [1]: https://github.com/Edgeworth/linux/commits/review/nova-mm-v10/
>> First, thanks for the effort. I looked through this; it's pretty much what I
>> had before when I used traits. I don't think it is better to be honest. In
>> fact your version is worse, it adds many new types and things like the
>> following which I did not need before.
> 
> Hi Joel and all,
> 
> I also looked through Eliot's above attempt carefully, and actually
> liked it a lot (sorry! haha):
> 
> * It cleans up the code. The initial working version was readable, but
>  also had lots of noise on the screen: match statements and pairs of
>  v2/v3 statements.
> 
>  And interestingly, the mmu_version was, in effect, sporadically
>  implementing a Trait-based approach. But because it is custom,
>  readers don't benefit as much as they would with Traits, which
>  tell you immediately how things are structured.
> 
> Joel, I am passionately in agreement with your principles: code must
> be readable on the screen.
> 
> In this case, though, Traits make it considerably more readable,
> especially if one makes the very reasonable assumption that readers are
> thoroughly accustomed to dealing with Rust traits.
> 
>> 
>> To put it mildly, the following suggestion should not be anywhere near my code:
>> 
> 
> lol I understand, believe me. But this is short and not too bad, really.
> 
>> /// Type-erased MMU-specific [`Vmm`] implementations.
> 
> Type erasure remains a semi-exotic thing, IMHO. As such, another
> sentence to elaborate on this would be a nice touch.
> 
>> enum VmmInner {
>>    /// `Vmm` implementation for MMU v2.
>>    V2(VmmImpl<MmuV2>),
>>    /// `Vmm` implementation for MMU v3.
>>    V3(VmmImpl<MmuV3>),
>> }
>> 
>> /// MMU-specific [`Vmm`] implementation.
>> struct VmmImpl<M: Mmu> {
>> 
>> Seriously, I have to pass on this. :-)
>> 
>> And, you unfortunately seem to have ignored my point about requiring 4 NEW
>> traits (Mmu, PteOps, PdeOps, DualPdeOps etc), which I did not need before.
>> So you're making the code much, much worse than before, actually. We
>> shouldn't add new traits and types pointlessly.
> 
> They are not pointless.
> 
> However! What I think would be nice is: do a new v11 with approximately
> this approach, and then we can beat it into being as readable as
> possible.

Since it is 3 against 1 here, I rest my case :-). I am still in
disagreement since I do not see much benefit (that is why I said
pointless above). Actually it is not even about readability, that is
subjective (and I haven’t heard most people say parametrizing code for
the sake of it makes it more readable anyway). It is that the code gen
is worse, and the complexity is just moved to a higher level in the
code, not removed. So what are we getting out of this really, other than
more boiler plate in higher layers of the code that did not exist
before? Not performance, not better generated code. Really nothing. See
all the data points in my previous reply.

Note that if the mmu version threading bothers everyone so much, we can
also pass down chipset instead and let the walker deal with determining
versioning. Would that be better?

But otherwise and since you guys asked, here comes a parameterized v11... ;-).
  (Coming next week since this week I’m working on IRQ handling).

thanks,

-- 
Joel Fernandes

* Re: [PATCH 3/4] docs/zh_CN: update rust/quick-start.rst translation
From: Gary Guo @ 2026-04-09 10:03 UTC
  To: Dongliang Mu, Gary Guo, Ben Guo, Alex Shi, Yanteng Si,
	Jonathan Corbet
  Cc: linux-doc, linux-kernel, rust-for-linux
In-Reply-To: <d7e81015-f17e-4ab9-a9e5-d2ac6dd82e7b@hust.edu.cn>

On Thu Apr 9, 2026 at 6:37 AM BST, Dongliang Mu wrote:
>
> On 4/9/26 1:43 AM, Gary Guo wrote:
>> On Wed Apr 8, 2026 at 5:51 PM BST, Ben Guo wrote:
>>> On 4/8/26 7:33 PM, Gary Guo wrote:
>>>> Hi Ben,
>>>>
>>>> Thanks on updating the doc translation. There has been new changes to
>>>> quick-start.rst on rust-next, could you update the translation to base on that
>>>> please?
>>>>
>>>> Thanks,
>>>> Gary
>>> Hi Gary,
>>>
>>> Thanks for the review. This series is based on the Chinese documentation
>>> maintainer's tree (alexs/linux.git docs-next), which does not yet have
>>> the latest quick-start.rst changes from the Rust-for-Linux rust-next
>>> tree.
>>>
>>> Would it be better to wait until those changes land in our base tree
>>> and then resend with the updated translation? Or would you prefer a
>>> different approach?
>>>
>>> Thanks,
>>> Ben
>> I don't see an issue with sending a translation of the latest quick-start.rst
>> even if it's not in your base yet. By the time the changes land upstream, the
>> original quick-start.rst would already be there.
>
> Hi Gary,
>
> Let’s wait for the rust-next changes to land upstream first, then I’ll
> ask Ben Guo to sync that commit. Otherwise, the Chinese translation
> would not match the original English doc, which would confuse readers.
>
> We have checktransupdate.py in place for monitoring the updates in 
> English documents.
>
> Dongliang Mu

Given that you have tools to catch this, I'm also okay with this patch landing
as is, with a follow up translation when the new quick-start.rst lands upstream.

Acked-by: Gary Guo <gary@garyguo.net> # Rust

Thanks,
Gary

* Re: [PATCH net-next v2 05/14] libie: add bookkeeping support for control queue messages
From: Paolo Abeni @ 2026-04-09  9:07 UTC
  To: Tony Nguyen, davem, kuba, edumazet, andrew+netdev, netdev
  Cc: Phani R Burra, larysa.zaremba, przemyslaw.kitszel,
	aleksander.lobakin, sridhar.samudrala, anjali.singhai,
	michal.swiatkowski, maciej.fijalkowski, emil.s.tantilov,
	madhu.chittim, joshua.a.hay, jacob.e.keller,
	jayaprakash.shanmugam, jiri, horms, corbet, richardcochran,
	linux-doc, Bharath R, Samuel Salin, Aleksandr Loktionov
In-Reply-To: <20260403194938.3577011-6-anthony.l.nguyen@intel.com>

On 4/3/26 9:49 PM, Tony Nguyen wrote:
> +static bool
> +libie_ctlq_xn_process_recv(struct libie_ctlq_xn_recv_params *params,
> +			   struct libie_ctlq_msg *ctlq_msg)
> +{
> +	struct libie_ctlq_xn_manager *xnm = params->xnm;
> +	struct libie_ctlq_xn *xn;
> +	u16 msg_cookie, xn_index;
> +	struct kvec *response;
> +	int status;
> +	u16 data;
> +
> +	data = ctlq_msg->sw_cookie;
> +	xn_index = FIELD_GET(LIBIE_CTLQ_XN_INDEX_M, data);
> +	msg_cookie = FIELD_GET(LIBIE_CTLQ_XN_COOKIE_M, data);
> +	status = ctlq_msg->chnl_retval ? -EFAULT : 0;
> +
> +	xn = &xnm->ring[xn_index];
> +	if (ctlq_msg->chnl_opcode != xn->virtchnl_opcode ||
> +	    msg_cookie != xn->cookie)
> +		return false;
> +
> +	spin_lock(&xn->xn_lock);

Sashiko says:

---
Because the cookie and opcode are checked before acquiring the lock, is
it possible for the transaction to time out, be returned to the free
list, and get reallocated for a new message before the lock is acquired?
If that happens, could the old delayed response falsely complete the
newly allocated transaction since the identifiers are not re-verified
inside the lock?
---
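The fix pattern the question points at is to repeat the identity check after taking the lock (in the C code above, moving the opcode/cookie comparison below `spin_lock(&xn->xn_lock)`). A minimal Rust sketch of that pattern; `XnSlot` and `XnState` are hypothetical stand-ins for `struct libie_ctlq_xn`, not its real layout.

```rust
use std::sync::Mutex;

/// Hypothetical transaction slot.
struct XnSlot {
    state: Mutex<XnState>,
}

struct XnState {
    opcode: u32,
    cookie: u16,
    completed: bool,
}

impl XnSlot {
    /// Completes the slot only if the reply still matches it.
    ///
    /// The opcode/cookie comparison happens while the lock is held, so a
    /// timeout followed by reuse of the slot cannot make a stale reply match.
    fn complete_if_matches(&self, opcode: u32, cookie: u16) -> bool {
        let mut st = self.state.lock().unwrap();
        if st.opcode != opcode || st.cookie != cookie || st.completed {
            return false;
        }
        st.completed = true;
        true
    }

    /// Models the slot being recycled for a new message after a timeout.
    fn reuse(&self, opcode: u32, cookie: u16) {
        let mut st = self.state.lock().unwrap();
        *st = XnState { opcode, cookie, completed: false };
    }
}
```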

> +/**
> + * libie_xn_check_async_timeout - Check for asynchronous message timeouts
> + * @xnm: Xn transaction manager
> + *
> + * Call the corresponding callback to notify the caller about the timeout.
> + */
> +static void libie_xn_check_async_timeout(struct libie_ctlq_xn_manager *xnm)
> +{
> +	u32 idx;
> +
> +	for_each_clear_bit(idx, xnm->free_xns_bm, LIBIE_CTLQ_MAX_XN_ENTRIES) {

Sashiko says:

---
This iterates over the bitmap without holding the lock. Concurrently,
other paths modify this bitmap using non-atomic bitwise operations like
__clear_bit() and __set_bit() under the lock. Will this cause torn reads
or data races that might lead the timeout handler to skip valid
transactions or examine invalid ones?
---
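One way to avoid the unlocked scan is to take the lock, copy the bitmap, and iterate over the copy. A Rust sketch with hypothetical names (`XnManager`, `free_bm`), assuming from the name `free_xns_bm` that a set bit means the slot is free.

```rust
use std::sync::Mutex;

const MAX_XN: usize = 64;

/// Hypothetical manager: bit set = slot free, bit clear = slot in use.
struct XnManager {
    free_bm: Mutex<u64>,
}

impl XnManager {
    /// In-use slot indices taken from a consistent snapshot.
    ///
    /// The bitmap is copied while the lock is held, so allocate/free paths
    /// (which also take the lock) are never observed half-done.
    fn busy_slots(&self) -> Vec<usize> {
        let snapshot = *self.free_bm.lock().unwrap();
        (0..MAX_XN)
            .filter(|&i| (snapshot & (1u64 << i)) == 0)
            .collect()
    }

    /// Claims the lowest free slot, if any.
    fn alloc(&self) -> Option<usize> {
        let mut bm = self.free_bm.lock().unwrap();
        if *bm == 0 {
            return None;
        }
        let idx = bm.trailing_zeros() as usize;
        *bm &= !(1u64 << idx);
        Some(idx)
    }
}
```

A snapshot can go stale as soon as the lock is dropped, so each slot's own state still needs rechecking under its lock before a timeout is acted on.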


> +		params->ctlq_msg->sw_cookie = cookie;
> +		params->ctlq_msg->send_mem = *dma_mem;
> +		params->ctlq_msg->data_len = buf_len;
> +		params->ctlq_msg->chnl_opcode = params->chnl_opcode;
> +		ret = libie_ctlq_send(params->ctlq, params->ctlq_msg, 1);
> +	}
> +
> +	if (ret && !libie_cp_can_send_onstack(buf_len))
> +		libie_cp_unmap_dma_mem(dev, dma_mem);

Sashiko says:

---
When libie_ctlq_send() fails here, the DMA memory is unmapped and the
buffer is freed by the caller. However, the software tracking ring at
tx_msg[next_to_use] still contains the populated send_mem details and a
non-zero data_len.

During driver teardown, libie_ctlq_xn_send_clean() is invoked with
params->force = true, which processes the ring without checking the
hardware completion bit. Could this cause the cleanup routine to process
the failed slot again, resulting in a double-free and double-unmap?
---
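A sketch of the rollback the question implies: on send failure, clear the slot's buffer and length before returning, so a later forced clean finds nothing to free. In C this would roughly correspond to zeroing the slot's `send_mem` and `data_len`; in this Rust sketch `Option::take` hands ownership back to the caller. `TxRing`, `TxSlot` and `DmaBuf` are hypothetical stand-ins for the driver's tracking ring.

```rust
/// Hypothetical DMA buffer handle; dropping it models unmap + free.
#[derive(Debug)]
struct DmaBuf {
    len: usize,
}

/// One slot of the software TX tracking ring.
#[derive(Default)]
struct TxSlot {
    send_mem: Option<DmaBuf>,
    data_len: usize,
}

struct TxRing {
    slots: Vec<TxSlot>,
    next_to_use: usize,
}

impl TxRing {
    /// `hw_ok` models whether the hardware send succeeded.
    fn send(&mut self, buf: DmaBuf, hw_ok: bool) -> Result<(), DmaBuf> {
        let slot = &mut self.slots[self.next_to_use];
        slot.data_len = buf.len;
        slot.send_mem = Some(buf);
        if hw_ok {
            self.next_to_use = (self.next_to_use + 1) % self.slots.len();
            Ok(())
        } else {
            // Roll the slot back and return the buffer to the caller.
            slot.data_len = 0;
            Err(slot.send_mem.take().expect("buffer was just stored"))
        }
    }

    /// Forced teardown clean: frees whatever each slot still owns.
    fn force_clean(&mut self) -> usize {
        let mut freed = 0;
        for slot in &mut self.slots {
            if slot.send_mem.take().is_some() {
                freed += 1;
            }
            slot.data_len = 0;
        }
        freed
    }
}
```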

There are more remarks on the following patch, please have a look.

Also, it would be very helpful if you could help triage such an
(overwhelming) amount of feedback on future submissions, explicitly
commenting on the ML. Sashiko tends to be quite noisy on device driver code.

Thanks,

Paolo


* Re: [PATCH v3 0/2] docs: advanced search with benchmark harness
From: Rito Rhymes @ 2026-04-09  9:04 UTC
  To: Randy Dunlap, Rito Rhymes; +Cc: linux-doc, linux-kernel
In-Reply-To: <b3c87ef0-5faf-4bda-90e4-e6b24419e0c0@infradead.org>

> I like it. I think it's useful -- the old search could give a bit too
> much output. The search result tabs (groups) are helpful.

Thanks for taking the time to test it out again and give feedback.
I'm glad you see potential utility for it.

> I mostly use 'grep' for searching Documentation/ and I expect lots
> of other developers also do that (if they bother to look).

That's definitely what I expect kernel hackers to default to.

I'd like to get a clearer sense of your perspective; it may be shared by
others too, and I can weigh it against my own assumptions here.

So my question framed for you is:

You know a particular concept you want to look up, but you do not know
the exact file, and related words repeat a lot across the source.

Could you imagine yourself going through grep results, not quickly
finding what you need, burning mental bandwidth and then deciding:
"let me just go on docs.kernel.org real quick, hit the advanced search,
and see what I find"?

Is that something you could actually see ever happening?

Maybe even, in that type of situation, eventually defaulting to that
mode first to avoid spending time scanning through noisy grep results.

Or is grep and staying in the terminal a comfortable enough place to
remain even when the results are not very fruitful and the time spent
there is not especially efficient?

Or does that situation just not come up often enough to justify a
separate mental workflow for it outside the grep norm?

> I do notice under the Pages tab that all of the pages listed say
> "Summary unavailable." I don't know what should be there instead of
> that message.

It's supposed to be populated with an excerpt from the page related
to the search criterion; 2-3 lines of text or so.

I encountered that same issue after an incremental rebuild, doing a
full rebuild fixed it.

Could you please confirm if it works after a full rebuild?

Rito

* Re: [PATCH net-next v2 02/14] libie: add PCI device initialization helpers to libie
From: Paolo Abeni @ 2026-04-09  8:56 UTC (permalink / raw)
  To: Tony Nguyen, davem, kuba, edumazet, andrew+netdev, netdev
  Cc: Phani R Burra, larysa.zaremba, przemyslaw.kitszel,
	aleksander.lobakin, sridhar.samudrala, anjali.singhai,
	michal.swiatkowski, maciej.fijalkowski, emil.s.tantilov,
	madhu.chittim, joshua.a.hay, jacob.e.keller,
	jayaprakash.shanmugam, jiri, horms, corbet, richardcochran,
	linux-doc, bhelgaas, linux-pci, Bharath R, Samuel Salin,
	Aleksandr Loktionov
In-Reply-To: <20260403194938.3577011-3-anthony.l.nguyen@intel.com>

On 4/3/26 9:49 PM, Tony Nguyen wrote:
> +	mr = libie_find_mmio_region(&mmio_info->mmio_list, offset, size,
> +				    bar_idx);
> +	if (mr) {
> +		pci_warn(pdev,
> +			 "Mapping of BAR%u (offset=%llu, size=%llu) intersecting region (offset=%llu, size=%llu) already exists\n",
> +			 bar_idx, (unsigned long long)mr->offset,
> +			 (unsigned long long)mr->size,
> +			 (unsigned long long)offset, (unsigned long long)size);
> +		return mr->offset <= offset &&
> +		       mr->offset + mr->size >= offset + size;

Sashiko says:

---
Does returning true here without creating a new tracking object leave
the new mapping tied to the original mapping's lifetime?
If the driver unmaps the original region, iounmap() is called and the
tracking object is freed. Any cached virtual address pointers to the
sub-region would then become a use-after-free, and subsequent queries
for the sub-region would fail.
---
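A minimal sketch of one way to address that lifetime concern, assuming a simple per-mapping reference count: a request fully contained in an existing mapping takes a reference instead of returning bare success, and the mapping is torn down only when the last user drops it. The struct and helper names are hypothetical illustrations, not libie's API, and plain userspace C is used (free() standing in for iounmap()) so the sketch is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tracked MMIO mapping; not the libie structure. */
struct mmio_region {
	uint64_t offset;
	uint64_t size;
	unsigned int refcount;	/* number of users sharing this mapping */
};

/* True if [offset, offset + size) lies entirely within @mr (overflow ignored). */
static bool region_contains(const struct mmio_region *mr,
			    uint64_t offset, uint64_t size)
{
	return mr->offset <= offset &&
	       mr->offset + mr->size >= offset + size;
}

/*
 * Reuse an existing mapping for a contained sub-range by taking a
 * reference, so the mapping stays alive while the sub-range user exists.
 * Returns false (and takes no reference) if the request is not covered.
 */
static bool region_get_subrange(struct mmio_region *mr,
				uint64_t offset, uint64_t size)
{
	if (!region_contains(mr, offset, size))
		return false;
	mr->refcount++;
	return true;
}

/* Drop one reference; release the mapping only for the last user. */
static bool region_put(struct mmio_region *mr)
{
	assert(mr->refcount > 0);
	if (--mr->refcount)
		return false;
	free(mr);	/* a driver would iounmap() and free its tracker here */
	return true;
}
```

With this design, unmapping the original region only drops its own reference, so cached pointers into a sub-range stay valid until that user also calls region_put().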

/P


^ permalink raw reply

* Re: [PATCH] hwmon: (asus-ec-sensors) add ROG STRIX B650E-E GAMING WIFI
From: Eugene Shalygin @ 2026-04-09  8:18 UTC (permalink / raw)
  To: Veronika Kossmann
  Cc: Guenter Roeck, Veronika Kossmann, Veronika Kossmann,
	Jonathan Corbet, Shuah Khan, linux-hwmon, linux-doc, linux-kernel
In-Reply-To: <25bbdd98-656e-407a-ada7-da2bdacb1aea@rxtx.cx>

Hey Veronika,

On Wed, 8 Apr 2026 at 22:29, Veronika Kossmann <nanodesu@rxtx.cx> wrote:
>
> Of course:
>
> $sensors asusec-isa-000a
> asusec-isa-000a
> Adapter: ISA adapter
> CPU:          +37.0°C
> Motherboard:  +38.0°C
> VRM:          +51.0°C
>
> These are relevant to actual temperatures.
>

Thanks! So, there is no output for CPU current or chipset
temperature. Could you please test whether CPU current shows
reasonable values with the following additional change:

diff --git a/asus-ec-sensors.c b/asus-ec-sensors.c
index 47e6c2db8b97..4a0b80012a6d 100644
--- a/asus-ec-sensors.c
+++ b/asus-ec-sensors.c
@@ -284,6 +284,7 @@ static const struct ec_sensor_info sensors_family_amd_600[] = {
   EC_SENSOR("VRM", hwmon_temp, 1, 0x00, 0x33),
 [ec_sensor_temp_t_sensor] =
   EC_SENSOR("T_Sensor", hwmon_temp, 1, 0x00, 0x36),
+ [ec_sensor_curr_cpu] = EC_SENSOR("CPU", hwmon_curr, 1, 0x00, 0xf4),
 [ec_sensor_fan_cpu_opt] =
   EC_SENSOR("CPU_Opt", hwmon_fan, 2, 0x00, 0xb0),
 [ec_sensor_temp_water_in] =

At least it should correlate with CPU load.

And we need to replace SENSOR_SET_TEMP_CHIPSET_CPU_MB with
SENSOR_TEMP_CPU | SENSOR_TEMP_MB.
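For illustration, a sketch of what that replacement changes, assuming (as its name suggests) that SENSOR_SET_TEMP_CHIPSET_CPU_MB is the OR of the chipset, CPU and motherboard temperature bits. The bit positions are hypothetical stand-ins for the driver's own enum, and VRM is included only because it shows up in the sensors output above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions; the driver defines its own sensor enum. */
enum ec_sensor_bit {
	ec_sensor_temp_chipset,
	ec_sensor_temp_cpu,
	ec_sensor_temp_mb,
	ec_sensor_temp_vrm,
};

#define SENSOR_BIT(n)        (UINT32_C(1) << (n))
#define SENSOR_TEMP_CHIPSET  SENSOR_BIT(ec_sensor_temp_chipset)
#define SENSOR_TEMP_CPU      SENSOR_BIT(ec_sensor_temp_cpu)
#define SENSOR_TEMP_MB       SENSOR_BIT(ec_sensor_temp_mb)
#define SENSOR_TEMP_VRM      SENSOR_BIT(ec_sensor_temp_vrm)

/* Assumed definition of the combined set. */
#define SENSOR_SET_TEMP_CHIPSET_CPU_MB \
	(SENSOR_TEMP_CHIPSET | SENSOR_TEMP_CPU | SENSOR_TEMP_MB)

/* Board sensor mask before and after the suggested replacement. */
#define BOARD_SENSORS_BEFORE (SENSOR_SET_TEMP_CHIPSET_CPU_MB | SENSOR_TEMP_VRM)
#define BOARD_SENSORS_AFTER  (SENSOR_TEMP_CPU | SENSOR_TEMP_MB | SENSOR_TEMP_VRM)

/* The only difference is the chipset temperature, which this board does not report. */
static_assert((BOARD_SENSORS_BEFORE ^ BOARD_SENSORS_AFTER) == SENSOR_TEMP_CHIPSET,
	      "replacement should drop only the chipset sensor");
```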

Cheers,
Eugene

^ permalink raw reply related

* Re: [PATCH mm-unstable v15 03/13] mm/khugepaged: generalize __collapse_huge_page_* for mTHP support
From: David Hildenbrand (Arm) @ 2026-04-09  8:14 UTC (permalink / raw)
  To: Nico Pache
  Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
	akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe
In-Reply-To: <CAA1CXcA8nE2PZrB4J1gV5v16PeQ7X2AiwjJ3gO1Q8hW7tyTtPQ@mail.gmail.com>

On 4/8/26 21:48, Nico Pache wrote:
> On Thu, Mar 12, 2026 at 2:56 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 3/12/26 21:36, David Hildenbrand (Arm) wrote:
>>>
>>> Okay, now I am confused. Why are you not taking care of
>>> collapse_scan_pmd() in the same context?
>>>
>>> Because if you make sure that we properly check against a max_ptes_swap
>>> similar to the style above, we'd rule out swapin right from the start?
>>>
>>> Also, I would expect that all other parameters in there are similarly
>>> handled?
>>>
>>
>> Okay, I think you should add the following:
> 
> Hey! Thanks for all your reviews here.
> 
> For multiple reasons, here is the solution I developed:
> 
> Add a patch before the generalize __collapse.. patch that reworks the
> max_ptes* handling and introduces the helpers (no functional changes).

I assume that's roughly the patch I shared below? If so, sounds good to me.

-- 
Cheers,

David

^ permalink raw reply

* Re: [PATCH 0/1] Documentation: leds: leds-class: Document keyboard backlight LED class naming
From: Kate Hsuan @ 2026-04-09  6:43 UTC (permalink / raw)
  To: Hans de Goede, Lee Jones, Pavel Machek, Jonathan Corbet,
	Shuah Khan
  Cc: Rishit Bansal, Carlos Ferreira, Edip Hazuri, Mustafa Ekşi,
	Xavier Bestel, linux-leds, linux-doc
In-Reply-To: <20260406174638.320135-1-johannes.goede@oss.qualcomm.com>

Hi Hans,

On 4/7/26 1:46 AM, Hans de Goede wrote:
> Hi All,
>
> Over the last couple of years there have been several attempts to add
> upstream kernel support for controlling keyboard backlights consisting of
> a small number of backlight zones, think e.g.: "main", "cursor" and
> "keypad" zones.
>
> All of these attempts got stuck, or are still stuck, on the lack of consensus
> on a userspace API (1) for controlling such zoned keyboard backlights.
>
> Previous discussion can be summarized as there being consensus that
> these backlights should be represented as (multi-color) LED class devices
> with one LED class device per zone, mirroring the existing use of
> an LED class device for controlling single zone keyboard backlights.
>
> The only thing which really still needs to be agreed upon is a naming
> scheme for the per zone LED class devices so that userspace can detect:
>
> 1. That the function of these is to control a zoned keyboard backlight.
> 2. How to group the per zone devices together for a single keyboard.
>
> The single patch in this series documents the currently undocumented naming
> scheme for single zone keyboard backlights and extends this with a naming
> scheme to use for multi-zone keyboard backlights.
>
> This is sent out as a separate patch rather than as part of a series
> implementing this, in the hope of getting multiple drivers which are in
> the process of being upstreamed unstuck with respect to the LED class naming problem.
>
> Drivers which need this are:
>
> 1. HP WMI laptop driver Omen gaming keyboards backlight control support:
> First 2023 attempt:
> https://lore.kernel.org/platform-driver-x86/20230131235027.36304-1-rishitbansal0@gmail.com/
> Later 2024 attempt which includes an earlier version of this doc patch:
> https://lore.kernel.org/platform-driver-x86/20240719100011.16656-1-carlosmiguelferreira.2003@gmail.com/
> Current ongoing 2026 attempt:
> https://lore.kernel.org/platform-driver-x86/20260304105831.119349-3-edip@medip.dev/
>
> 2. Casper Excalibur laptop driver (inc. multi-zone kbd backlight control):
> https://lore.kernel.org/platform-driver-x86/20240806205001.191551-2-mustafa.eskieksi@gmail.com/
> This one unfortunately seems to have stalled.
>
> 3. Logitech G710/G710+ gaming keyboards HID driver:
> https://lore.kernel.org/linux-input/20260402075239.3829699-1-xav@bes.tel/
> Posted a week ago, needs an agreement on the LED class dev naming scheme
> to continue.
>
> Regards,
>
> Hans
>
>
> 1) The lack of such an API may not always have been the sole reason these
> drivers have gotten stuck, but it was always a factor.
>
>
> Carlos Ferreira (1):
>    Documentation: leds: leds-class: Document keyboard backlight LED class
>      naming
>
>   Documentation/leds/leds-class.rst | 63 +++++++++++++++++++++++++++++++
>   1 file changed, 63 insertions(+)
>
Thank you for your work.

kbd_zoned_backlight is quite useful for upper-layer apps, such
as upower.
It gives additional information about the location of the keyboard
backlight LED and allows upower to expose APIs with the zone
information to user space. It also improves the user experience of
keyboard backlight control.

Acked-by: Kate Hsuan <hpa@redhat.com>


^ permalink raw reply

* Re: [PATCH 3/4] docs/zh_CN: update rust/quick-start.rst translation
From: Dongliang Mu @ 2026-04-09  5:37 UTC (permalink / raw)
  To: Gary Guo, Ben Guo, Alex Shi, Yanteng Si, Jonathan Corbet
  Cc: linux-doc, linux-kernel, rust-for-linux
In-Reply-To: <DHNYKCR34P1F.1EZ3D0A8UB8S5@garyguo.net>


On 4/9/26 1:43 AM, Gary Guo wrote:
> On Wed Apr 8, 2026 at 5:51 PM BST, Ben Guo wrote:
>> On 4/8/26 7:33 PM, Gary Guo wrote:
>>> Hi Ben,
>>>
>>> Thanks for updating the doc translation. There have been new changes to
>>> quick-start.rst on rust-next; could you update the translation to be based
>>> on that, please?
>>>
>>> Thanks,
>>> Gary
>> Hi Gary,
>>
>> Thanks for the review. This series is based on the Chinese documentation
>> maintainer's tree (alexs/linux.git docs-next), which does not yet have
>> the latest quick-start.rst changes from the Rust-for-Linux rust-next
>> tree.
>>
>> Would it be better to wait until those changes land in our base tree
>> and then resend with the updated translation? Or would you prefer a
>> different approach?
>>
>> Thanks,
>> Ben
> I don't see an issue with sending a translation of the latest quick-start.rst even
> if it's not in your base yet. By the time the changes land upstream, the
> original quick-start.rst would already be there.

Hi Gary,

Let’s wait for the rust-next changes to land upstream first; then I’ll
ask Ben Guo to sync that commit. Otherwise, the Chinese translation
would not match the original English doc, which would confuse readers.

We have checktransupdate.py in place to monitor updates to the
English documents.

Dongliang Mu


>
> Best,
> Gary


^ permalink raw reply

* Re: [PATCH net-next V5 00/12] devlink: add per-port resource support
From: patchwork-bot+netdevbpf @ 2026-04-09  3:10 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: edumazet, kuba, pabeni, andrew+netdev, davem, horms,
	donald.hunter, jiri, corbet, skhan, saeedm, leon, mbloch, shuah,
	matttbe, chuck.lever, cjubran, ohartoov, moshe, dtatulea,
	daniel.zahka, shshitrit, cratiu, jacob.e.keller, parav,
	ajayachandra, shayd, kees, danielj, netdev, linux-kernel,
	linux-doc, linux-rdma, linux-kselftest, gal
In-Reply-To: <20260407194107.148063-1-tariqt@nvidia.com>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 7 Apr 2026 22:40:55 +0300 you wrote:
> Hi,
> 
> This series by Or adds devlink per-port resource support.
> See detailed description by Or below [1].
> 
> Regards,
> Tariq
> 
> [...]

Here is the summary with links:
  - [net-next,V5,01/12] devlink: Refactor resource functions to be generic
    https://git.kernel.org/netdev/net-next/c/7be3163c49b2
  - [net-next,V5,02/12] devlink: Add port-level resource registration infrastructure
    https://git.kernel.org/netdev/net-next/c/6f38acfed5ed
  - [net-next,V5,03/12] net/mlx5: Register SF resource on PF port representor
    https://git.kernel.org/netdev/net-next/c/4be8326d817e
  - [net-next,V5,04/12] netdevsim: Add devlink port resource registration
    https://git.kernel.org/netdev/net-next/c/085b234b28cc
  - [net-next,V5,05/12] devlink: Add dump support for device-level resources
    https://git.kernel.org/netdev/net-next/c/11636b550eea
  - [net-next,V5,06/12] devlink: Include port resources in resource dump dumpit
    https://git.kernel.org/netdev/net-next/c/810b76394d69
  - [net-next,V5,07/12] devlink: Add port-specific option to resource dump doit
    https://git.kernel.org/netdev/net-next/c/7511ff14f30d
  - [net-next,V5,08/12] selftest: netdevsim: Add devlink port resource doit test
    https://git.kernel.org/netdev/net-next/c/396135377104
  - [net-next,V5,09/12] devlink: Document port-level resources and full dump
    https://git.kernel.org/netdev/net-next/c/170e160a0e7c
  - [net-next,V5,10/12] devlink: Add resource scope filtering to resource dump
    https://git.kernel.org/netdev/net-next/c/1bc45341a6ea
  - [net-next,V5,11/12] selftest: netdevsim: Add resource dump and scope filter test
    https://git.kernel.org/netdev/net-next/c/2a8e91235254
  - [net-next,V5,12/12] devlink: Document resource scope filtering
    https://git.kernel.org/netdev/net-next/c/78c327c1728d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH v2] Documentation/kernel-parameters: fix architecture alignment for pt, nopt, and nobypass
From: Li,Rongqing(ACG CCN) @ 2026-04-09  2:18 UTC (permalink / raw)
  To: Jonathan Corbet, Andrew Morton, Borislav Petkov, Randy Dunlap,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
  Cc: Shuah Khan, Peter Zijlstra, Feng Tang, Pawan Gupta, Dapeng Mi,
	Kees Cook, Marco Elver, Paul E . McKenney, Askar Safin,
	Bjorn Helgaas, Sohil Mehta
In-Reply-To: <20260330105957.2271-1-lirongqing@baidu.com>

> Subject: [PATCH v2] Documentation/kernel-parameters: fix architecture alignment
> for pt, nopt, and nobypass
> 
> From: Li RongQing <lirongqing@baidu.com>
> 
> Commit ab0e7f20768a ("Documentation: Merge x86-specific boot options doc
> into kernel-parameters.txt") introduced a formatting regression where
> architecture tags were placed on separate lines with broken indentation.
> This caused the 'nopt' [X86] parameter to appear as if it belonged to the
> [PPC/POWERNV] section.
> 
> Furthermore, since the main 'iommu=' parameter heading already specifies it is
> for [X86, EARLY], the subsequent standalone [X86] tags for 'pt', 'nopt', and the
> AMD GART options are redundant and clutter the documentation.
> 
> Clean up the formatting by removing these redundant tags and properly
> attributing the 'nobypass' option to [PPC/POWERNV].
> 


Ping

thanks

[Li,Rongqing] 



> Fixes: ab0e7f20768a ("Documentation: Merge x86-specific boot options doc
> into kernel-parameters.txt")
> Acked-by: Randy Dunlap <rdunlap@infradead.org>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Shuah Khan <skhan@linuxfoundation.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Borislav Petkov (AMD) <bp@alien8.de>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Feng Tang <feng.tang@linux.alibaba.com>
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Cc: Kees Cook <kees@kernel.org>
> Cc: Marco Elver <elver@google.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Askar Safin <safinaskar@gmail.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Sohil Mehta <sohil.mehta@intel.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt
> b/Documentation/admin-guide/kernel-parameters.txt
> index 03a5506..5253c23 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2615,15 +2615,11 @@ Kernel parameters
>  			Intel machines). This can be used to prevent the usage
>  			of an available hardware IOMMU.
> 
> -			[X86]
>  		pt
> -			[X86]
>  		nopt
> -			[PPC/POWERNV]
> -		nobypass
> +		nobypass	[PPC/POWERNV]
>  			Disable IOMMU bypass, using IOMMU for PCI devices.
> 
> -		[X86]
>  		AMD Gart HW IOMMU-specific options:
> 
>  		<size>
> --
> 2.9.4


^ permalink raw reply

* Re: [RFC net-next 15/15] Documentation: networking: add ipxlat translator guide
From: Xavier HSINYUAN @ 2026-04-09  2:17 UTC (permalink / raw)
  To: Daniel Gröber
  Cc: ralf, antonio, corbet, davem, edumazet, horms, kuba, linux-doc,
	linux-kernel, netdev, pabeni, skhan
In-Reply-To: <fldksy7obiaonlcxrjcbnfkfmaup27t3fq3ktubd7sx35fsswx@hjmchh6sr7rw>

Hi Daniel,

> Indeed, the JSON is just wrong and --do dev-set is missing. However,
> `--family ipxlat` works for me and, looking at the code, it is basically the
> same as specifying --spec.
> 
> Could you try this:
>
>    $ JSON='{"ifindex": '"$IID"', "config": {"xlat-prefix6": { "prefix": "'$ADDR_HEX'", "prefix-len": 96}}}'
>    $ ./tools/net/ynl/pyynl/cli.py --family ipxlat --do dev-set --json "$JSON"

This looks good to me now. `--family ipxlat` is fine with me if this runs
from the source tree.

> I worry once we start with that we're really just re-stating what's already
> extensively documented in the RFCs.
> 
> How about a reference to RFC 7915 Appendix A? This has a full bidirectional
> end-to-end example of how translation operates:
> https://datatracker.ietf.org/doc/html/rfc7915#appendix-A
>
> Admittedly using a /96 prefix (which the appendix doesn't) would make it
> easier to grok what's going on. Not sure that's reason enough to get into
> more detailed examples here.

A reference to RFC 7915 Appendix A sounds good to me. Still, a short /96
mapping example would help readers quickly see how the translation works
before reading the full RFC, and would make the following NAT64 section
easier to follow as well.
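A sketch of what such a /96 example could show, following the RFC 6052 address format: with a /96 prefix, the IPv4 address simply occupies the last 32 bits of the IPv6 address. The helper names are hypothetical and not taken from ipxlat; the RFC 6052 well-known prefix 64:ff9b::/96 is used only as an example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/*
 * RFC 6052, /96 prefix: bytes 0..11 come from the prefix and the IPv4
 * address fills bytes 12..15. s_addr is already in network byte order.
 */
static void ipv4_to_ipv6_p96(const struct in6_addr *prefix,
			     const struct in_addr *v4,
			     struct in6_addr *out)
{
	memcpy(out->s6_addr, prefix->s6_addr, 12);
	memcpy(out->s6_addr + 12, &v4->s_addr, 4);
}

/* Reverse mapping: 0 on success, -1 if @addr is not inside @prefix/96. */
static int ipv6_p96_to_ipv4(const struct in6_addr *prefix,
			    const struct in6_addr *addr,
			    struct in_addr *v4)
{
	if (memcmp(addr->s6_addr, prefix->s6_addr, 12) != 0)
		return -1;
	memcpy(&v4->s_addr, addr->s6_addr + 12, 4);
	return 0;
}
```

With 64:ff9b::/96, 192.0.2.33 maps to 64:ff9b::c000:221 (also written 64:ff9b::192.0.2.33), the reverse mapping recovers 192.0.2.33, and an address outside the prefix such as 2001:db8::1 is rejected.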

Best regards,
Xavier

^ permalink raw reply

* Re: [PATCH net-next] docs: netdev: improve wording of reviewer guidance
From: patchwork-bot+netdevbpf @ 2026-04-09  2:10 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
	skhan, workflows, linux-doc
In-Reply-To: <20260406175334.3153451-1-kuba@kernel.org>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon,  6 Apr 2026 10:53:34 -0700 you wrote:
> Reword the reviewer guidance based on behavior we see on the list.
> Steer folks:
>  - towards sending tags
>  - away from process issues.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> 
> [...]

Here is the summary with links:
  - [net-next] docs: netdev: improve wording of reviewer guidance
    https://git.kernel.org/netdev/net-next/c/bd5c24e4001d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH] crash: Support high memory reservation for range syntax
From: Youling Tang @ 2026-04-09  1:55 UTC (permalink / raw)
  To: Baoquan He, Sourabh Jain
  Cc: Andrew Morton, Jonathan Corbet, Vivek Goyal, Dave Young, kexec,
	linux-kernel, linux-doc, Youling Tang
In-Reply-To: <adZYpnwOxgvFMLaT@MiWiFi-R3L-srv>

Hi, Baoquan

On 4/8/26 21:32, Baoquan He wrote:
> On 04/08/26 at 10:01am, Sourabh Jain wrote:
>> Hello Youling,
>>
>> On 04/04/26 13:11, Youling Tang wrote:
>>> From: Youling Tang <tangyouling@kylinos.cn>
>>>
>>> The crashkernel range syntax (range1:size1[,range2:size2,...]) allows
>>> automatic size selection based on system RAM, but it always reserves
>>> from low memory. When a large crashkernel is selected, this can
>>> consume most of the low memory, causing subsequent hardware
>>> hotplug or drivers requiring low memory to fail due to allocation
>>> failures.
>>
>> Support for high crashkernel reservation has been added to
>> address the above problem.
>>
>> However, high crashkernel reservation is not supported with
>> range-based crashkernel kernel command-line arguments.
>> For example: crashkernel=0M-1G:100M,1G-4G:160M,4G-8G:192M
>>
>> Many users, including some distributions, use range-based
>> crashkernel configuration. So, adding support for high crashkernel
>> reservation with range-based configuration would be useful.
> Sorry for the late response. And I have to say sorry, because I am leaning
> against this change.
>
> We use crashkernel=xM|G and crashkernel=range1:size1[,range2:size2,...]
> as the default setting, so that people only need to set the suggested amount
> of memory, while crashkernel=,high|low is for advanced users to customize
> their crashkernel value. In that case, the user knows what high memory and
> low memory are, and how much of each is needed to achieve their goal, e.g.
> saving low memory by taking more high memory.
>
> To be honest, the above grammar sounds simple, right? I believe both of you
> know very well how complicated the current crashkernel code is. I would
> suggest not letting it become more and more complicated by extending
> the grammar further and further, unless you hit an unavoidable issue with
> the existing grammar.
>
> Here comes my question: do you hit an unavoidable issue with the existing
> grammar when you use crashkernel=range1:size1[,range2:size2,...] and
> find it unsatisfactory, while at the same time crashkernel=,high|low
> can't meet your demand either?

Yes, regular users generally don't know about high memory and low memory,
and probably don't know how much crashkernel memory should be reserved
either. They mostly just use the default crashkernel parameters configured
by the distribution.

For advanced users, the current grammar is sufficient, because
'crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset],>boundary'
can definitely be replaced with 'crashkernel=size,high'.

The main purpose of this patch is to provide distributions with a more
reasonable default parameter configuration (satisfying most requirements),
without having to set different distribution default parameters for
different scenarios (physical machines, virtual machines) and different
machine models.
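For reference, a minimal sketch of how the range syntax picks a size, assuming the documented semantics that each range's start is inclusive and its end exclusive, and that the first matching range is used. It models only size selection (with the command line already parsed), not where the memory is reserved, which is what the patch changes; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1M (UINT64_C(1) << 20)
#define SZ_1G (UINT64_C(1) << 30)

/* One already-parsed "start-end:size" entry. */
struct ck_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
	uint64_t size;	/* amount to reserve when system RAM falls in range */
};

/*
 * Return the reservation size for @system_ram: the first range with
 * start <= system_ram < end wins. 0 means no range matched, so nothing
 * is reserved in this model.
 */
static uint64_t ck_select_size(const struct ck_range *ranges, size_t n,
			       uint64_t system_ram)
{
	for (size_t i = 0; i < n; i++) {
		if (system_ram >= ranges[i].start && system_ram < ranges[i].end)
			return ranges[i].size;
	}
	return 0;
}

/* crashkernel=0M-1G:100M,1G-4G:160M,4G-8G:192M */
static const struct ck_range example[] = {
	{ 0,         1 * SZ_1G, 100 * SZ_1M },
	{ 1 * SZ_1G, 4 * SZ_1G, 160 * SZ_1M },
	{ 4 * SZ_1G, 8 * SZ_1G, 192 * SZ_1M },
};
```

With the example quoted above, a 2 GiB machine gets 160 MiB, a machine with exactly 4 GiB falls into 4G-8G and gets 192 MiB, and a 16 GiB machine matches no range.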

Thanks,
Youling.
>
> Thanks
> Baoquan
>

^ permalink raw reply

* Re: [RFC PATCH v3 00/10] mm/damon: introduce DAMOS failed region quota charge ratio
From: SeongJae Park @ 2026-04-09  0:00 UTC (permalink / raw)
  To: Bijan Tabatabai
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, Brendan Higgins,
	David Gow, David Hildenbrand, Jonathan Corbet, Lorenzo Stoakes,
	Michal Hocko, Mike Rapoport, Shuah Khan, Shuah Khan,
	Suren Baghdasaryan, Vlastimil Babka, damon, kunit-dev, linux-doc,
	linux-kernel, linux-kselftest, linux-mm
In-Reply-To: <20260408165001.8473-1-bijan311@gmail.com>

On Wed,  8 Apr 2026 11:48:27 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:

> On Mon,  6 Apr 2026 18:05:22 -0700 SeongJae Park <sj@kernel.org> wrote:
> 
> Hi SJ,
> 
> > TL; DR: Let users set different DAMOS quota charge ratios for DAMOS
> > action failed regions, for deterministic and consistent DAMOS action
> > progress.
> > 
> > Common Reports: Unexpectedly Slow DAMOS
> > =======================================
> > 
> > One common issue report that we get from DAMON users is that DAMOS
> > action applying progress speed is sometimes much slower than expected.
> > And one common root cause is that the DAMOS quota is exceeded by the
> > action applying failed memory regions.
> > 
> > For example, a group of users tried to run DAMOS-based proactive memory
> > reclamation (DAMON_RECLAIM) with 100 MiB per second DAMOS quota.  They
> > ran it on a system having no active workload which means all memory of
> > the system is cold.  The expectation was that the system will show 100
> > MiB per second reclamation until (nearly) all memory is reclaimed. But
> > what they found is that the speed is quite inconsistent and sometimes it
> > becomes very slower than the expectation, sometimes even no reclamation
> > at all for about tens of seconds.  The upper limit of the speed (100 MiB
> > per second) was being kept as expected, though.
> > 
> > By monitoring the qt_exceeds (number of DAMOS quota exceed events) DAMOS
> > stat, we found DAMOS quota is always exceeded when the speed is slow. By
> > monitoring sz_tried and sz_applied (the total amount of DAMOS action
> > tried memory and succeeded memory) DAMOS stats together, we found the
> > reclamation attempts nearly always failed when the speed is slow.
> > 
> > DAMOS quota charges DAMOS action tried regions regardless of whether
> > the attempt succeeds.  In the reported case, unreclaimable memory was
> > spread around the system memory.  Sometimes nearly all of the 100 MiB
> > of memory that DAMOS tried to reclaim in a given quota interval was
> > reclaimable, so the speed was nearly 100 MiB per second.  Sometimes
> > nearly 99 MiB of memory that DAMOS was trying to reclaim in a given
> > quota interval was unreclaimable, so the reclaim speed was only about
> > 1 MiB per second.
> > 
> > We explained it is an expected behavior of the feature rather than a
> > bug, as DAMOS quota is there for only the upper-limit of the speed.  The
> > users agreed and later reported a huge win from the adoption of
> > DAMON_RECLAIM on their products.
> 
> Thanks for this series. I have run into this problem myself and am looking
> forward to seeing this land.

Thank you for acknowledging.  I'm hoping for this to land in 7.2-rc1.

[...]
> > DAMOS Action Failed Region Quota Charge Ratio
> > =============================================
> > 
> > Let users set the charge ratio for the action-failed memory, for more
> > optimal and deterministic use of DAMOS.  It allows users to specify the
> > numerator and the denominator of the ratio for flexible setup.  For
> > example, let's suppose the numerator and the denominator are set to 1
> > and 4,096, respectively.  The ratio is 1 / 4,096.  A DAMOS scheme action
> > is applied to 5 GiB of memory.  For 1 GiB of the memory, the action
> > succeeds.  For the rest (4 GiB), the action fails.  Then only 1 GiB +
> > 1 MiB of quota is charged (1 GiB + 4 GiB / 4,096).
> > 
> > The optimal charge ratio will depend on the use case and
> > system/workload.  I'd recommend starting by setting the numerator to 1
> > and the denominator to PAGE_SIZE and tuning based on the results,
> > because many DAMOS actions are applied at page level.
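The charging arithmetic in the example above can be sketched as follows; damos_charge() is a hypothetical illustration of the described ratio, not DAMON's implementation. Multiplying before dividing avoids truncating the ratio itself to zero, at the cost of a possible overflow for very large failed sizes times numerators.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Quota charged for one action attempt: successfully applied bytes are
 * charged in full, failed bytes are scaled by numerator / denominator.
 * Integer division truncates, so a small failed amount may charge nothing.
 */
static uint64_t damos_charge(uint64_t sz_applied, uint64_t sz_failed,
			     uint64_t numerator, uint64_t denominator)
{
	assert(denominator != 0);
	return sz_applied + sz_failed * numerator / denominator;
}
```

A ratio of 1/1 reproduces the current behavior of charging every tried byte.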
> 
> This makes sense, but the quota is also considered when setting the minimum
> allowable score in damos_adjust_quota(), which, to my understanding, assumes
> that the action will be applied to all of a region's data. If an action fails for
> a significant amount of the memory, a lower score than what was calculated in
> damos_adjust_quota() could be valid. If that's the case, the scheme would be
> applied to fewer regions than strictly necessary.

Good point, you are right.

> 
> As you mention above, this is not a correctness issue because the quota only
> guarantees an upper limit on the amount of data the scheme is applied to.

I agree.

> Additionally, it may very well be true that what I listed above would not be
> very noticeable in practice.

I hope that is true, for the following reason.

The score for each region is calculated as a weighted sum of the access
frequency and the age of the region.  To avoid the DAMOS action being
repeatedly applied to only a few regions, we reset the age of a region after
a DAMOS action is applied to it, regardless of whether the action failed.  So
the scores of regions containing memory the action cannot be applied to
periodically drop, which keeps their impact on the minimum score threshold
calculation small.

But real data could say something different.  I will be happy to be proven
wrong by real data. :)

> I just thought this was worth pointing out as
> something to think about.

Indeed.  Thank you for pointing this out.  Nonetheless, this is not a new issue
introduced by this patch series, and its impact is not clear at the moment.  I
will be happy to revisit this in parallel with this patch series.


Thanks,
SJ

[...]

^ permalink raw reply

* Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Reinette Chatre @ 2026-04-08 23:41 UTC (permalink / raw)
  To: Moger, Babu, Babu Moger, corbet@lwn.net, tony.luck@intel.com,
	Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
  Cc: skhan@linuxfoundation.org, x86@kernel.org, hpa@zytor.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com,
	akpm@linux-foundation.org, pmladek@suse.com,
	rdunlap@infradead.org, dapeng1.mi@linux.intel.com,
	kees@kernel.org, elver@google.com, paulmck@kernel.org,
	lirongqing@baidu.com, safinaskar@gmail.com, fvdl@google.com,
	seanjc@google.com, pawan.kumar.gupta@linux.intel.com,
	xin@zytor.com, tiala@microsoft.com, chang.seok.bae@intel.com,
	Lendacky, Thomas, elena.reshetova@intel.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-coco@lists.linux.dev, kvm@vger.kernel.org,
	eranian@google.com, peternewman@google.com
In-Reply-To: <20aaacfb-9601-4343-a5d5-f3df6152155b@amd.com>

Hi Babu,

On 4/8/26 4:07 PM, Moger, Babu wrote:
> On 4/8/2026 4:24 PM, Reinette Chatre wrote:
>> On 4/8/26 1:45 PM, Babu Moger wrote:
...

>>> The modes "global_assign_ctrl_inherit_mon_per_cpu" and "global_assign_ctrl_assign_mon_per_cpu" represent the actual PLZA modes.
>>>
>>> Both of these modes introduce new files kernel_mode_cpus/ and kernel_mode_cpus_list in the resctrl group.
>>
>> Right. To be specific when the user changes the mode to either "global_assign_ctrl_inherit_mon_per_cpu" or
>> "global_assign_ctrl_assign_mon_per_cpu" the new files will be created in the default resource group with
>> associated setting applied globally at that time.
> 
> If, at that point, "info/kernel_mode_assignment" points to // (the default group), is that correct?

I see "info/kernel_mode_assignment" pointing to default group as the only
option right after a mode switch away from "inherit_ctrl_and_mon".

To elaborate, the current idea is that the mode within info/kernel_mode determines
which, if any, control files are presented to user space.
Assuming that the system boots up with:
	# cat info/kernel_mode
	[inherit_ctrl_and_mon]
	global_assign_ctrl_inherit_mon_per_cpu
	global_assign_ctrl_assign_mon_per_cpu

In above scenario "info/kernel_mode_assignment" does not exist (is not visible to
user space).

When the user switches to either "global_assign_ctrl_inherit_mon_per_cpu" or
"global_assign_ctrl_assign_mon_per_cpu" then "info/kernel_mode_assignment" is created
(or made visible to user space) and is expected to point to the default group.
User can change the group using "info/kernel_mode_assignment" at this point.

If the current scenario is below ...
	# cat info/kernel_mode
	[global_assign_ctrl_inherit_mon_per_cpu]
	inherit_ctrl_and_mon
	global_assign_ctrl_assign_mon_per_cpu

... then "info/kernel_mode_assignment" will exist but what it should contain if
user switches mode at this point may be up for discussion.

option 1)
When the user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
the resource group in "info/kernel_mode_assignment" is reset to the
default group and all CPUs' PLZA state is reset to match. The kernel_mode_cpus
and kernel_mode_cpuslist files become visible in the default resource group
and contain "all online CPUs".

option 2)
When the user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
the resource group in "info/kernel_mode_assignment" is kept and all
CPUs' PLZA state is set to match it, while the current values of that
resource group's kernel_mode_cpus and kernel_mode_cpuslist files are
also kept.

I am leaning towards "option 1" to keep it consistent with a switch from
"inherit_ctrl_and_mon" and to be deterministic about starting a mode with
a clean slate. What are your thoughts? What would be a use case where a user
would want to switch between "global_assign_ctrl_inherit_mon_per_cpu" and
"global_assign_ctrl_assign_mon_per_cpu" just to switch rmid_en on and off?

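To make "option 1" concrete, a small state model under stated assumptions: resource groups are plain integer ids with 0 as the default group, CPUs are a 64-bit mask, and in inherit_ctrl_and_mon mode the assignment and per-CPU files are modeled as absent (-1 and an empty mask). None of these names are resctrl code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the modes listed in info/kernel_mode. */
enum kernel_mode {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
};

#define DEFAULT_GROUP  0	/* id of the default resource group */
#define NO_ASSIGNMENT  (-1)	/* info/kernel_mode_assignment not visible */

struct kernel_mode_state {
	enum kernel_mode mode;
	int assigned_group;		/* group named in info/kernel_mode_assignment */
	uint64_t kernel_mode_cpus;	/* CPUs whose PLZA state uses that group */
};

/* "Option 1": every mode switch starts from a clean slate. */
static void kernel_mode_switch(struct kernel_mode_state *s,
			       enum kernel_mode new_mode, uint64_t online_cpus)
{
	assert(online_cpus != 0);	/* at least one CPU is online */

	s->mode = new_mode;
	if (new_mode == INHERIT_CTRL_AND_MON) {
		/* No assignment file and no per-CPU files in this mode. */
		s->assigned_group = NO_ASSIGNMENT;
		s->kernel_mode_cpus = 0;
		return;
	}
	/* Reset to the default group covering all online CPUs. */
	s->assigned_group = DEFAULT_GROUP;
	s->kernel_mode_cpus = online_cpus;
}
```

Under "option 2", a switch between the two global_assign_* modes would instead keep whatever group and CPU mask user space had last written.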

> And if "info/kernel_mode_assignment" points to a different group
> (for example, test//), then the kernel_mode_cpus/ and
> kernel_mode_cpus_list files will be created only under the test//
> group. Is that correct?

I expect that if "info/kernel_mode_assignment" exists then the group
listed within contains kernel_mode_cpus and kernel_mode_cpuslist.
The group can end up in "info/kernel_mode_assignment" either as the result
of a mode change or of a write by user space.

Reinette



* Re: [PATCH v10 12/21] gpu: nova-core: mm: Add unified page table entry wrapper enums
From: John Hubbard @ 2026-04-08 23:13 UTC (permalink / raw)
  To: Joel Fernandes, Eliot Courtney, linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
	Alex Gaynor, Boqun Feng, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi, joel,
	linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <da8d03f8-0294-417b-b684-2c20d577f94a@nvidia.com>

On 4/8/26 9:58 AM, Joel Fernandes wrote:
> On 4/8/2026 9:26 AM, Eliot Courtney wrote:
>> On Tue Apr 7, 2026 at 10:59 PM JST, Joel Fernandes wrote:
>>> On 4/7/2026 9:42 AM, Eliot Courtney wrote:
>>>> On Tue Apr 7, 2026 at 6:55 AM JST, Joel Fernandes wrote:
...
>> [1]: https://github.com/Edgeworth/linux/commits/review/nova-mm-v10/
> First, thanks for the effort. I looked through this, it's pretty much what I
> had before when I used traits. I don't think it is better, to be honest. In
> fact your version is worse: it adds many new types and things like the
> following which I did not need before.

Hi Joel and all,

I also looked through Eliot's above attempt carefully, and actually
liked it a lot (sorry! haha):

* It cleans up the code. The initial working version was readable, but
  also had lots of noise on the screen: match statements and pairs of
  v2/v3 statements.

  And interestingly, the mmu_version was, in effect, sporadically
  implementing a Trait-based approach. But because it is custom,
  readers don't benefit as much as they would with Traits, which
  tell you immediately how things are structured.

Joel, I am passionately in agreement with your principles: code must
be readable on the screen.

In this case, though, Traits make the code considerably more readable,
especially if one makes the very reasonable assumption that readers are
thoroughly accustomed to dealing with Rust traits.

> 
> To put it mildly, the following suggestion should not be anywhere near my code:
> 

lol I understand, believe me. But this is short and not too bad, really.

> /// Type-erased MMU-specific [`Vmm`] implementations.

Type erasure remains a semi-exotic thing, IMHO. As such, another
sentence to elaborate on this would be a nice touch.

> enum VmmInner {
>     /// `Vmm` implementation for MMU v2.
>     V2(VmmImpl<MmuV2>),
>     /// `Vmm` implementation for MMU v3.
>     V3(VmmImpl<MmuV3>),
> }
> 
> /// MMU-specific [`Vmm`] implementation.
> struct VmmImpl<M: Mmu> {
> 
> Seriously, I have to pass on this. :-)
> 
> And, you unfortunately seem to have ignored my point about requiring 4 NEW
> traits (Mmu, PteOps, PdeOps, DualPdeOps etc), which I did not need before.
> So you're making the code much much worse than before actually. We don't
> new traits and types pointlessly.

They are not pointless.

However! What I think would be nice is: do a new v11 with approximately
this approach, and then we can beat it into being as readable as 
possible.
 

thanks,
-- 
John Hubbard



* Re: [PATCH v8 0/2] PCI: s390: Expose the UID as an arch specific PCI slot attribute
From: Vasily Gorbik @ 2026-04-08 23:12 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Bjorn Helgaas, Jonathan Corbet, Lukas Wunner, Shuah Khan,
	Farhan Ali, Alexander Gordeev, Christian Borntraeger,
	Gerald Schaefer, Gerd Bayer, Heiko Carstens, Julian Ruess,
	Matthew Rosato, Peter Oberparleiter, Ramesh Errabolu,
	Sven Schnelle, linux-doc, linux-kernel, linux-pci, linux-s390,
	Randy Dunlap
In-Reply-To: <20260407-uid_slot-v8-0-15ae4409d2ce@linux.ibm.com>

On Tue, Apr 07, 2026 at 03:24:44PM +0200, Niklas Schnelle wrote:
> Add a mechanism for architecture specific attributes on
> PCI slots in order to add the user-defined ID (UID) as an s390 specific
> PCI slot attribute. First though improve some issues with the s390 specific
> documentation of PCI sysfs attributes noticed during development. 

> Niklas Schnelle (2):
>       docs: s390/pci: Improve and update PCI documentation
>       PCI: s390: Expose the UID as an arch specific PCI slot attribute
> 
>  Documentation/arch/s390/pci.rst | 151 +++++++++++++++++++++++++++-------------
>  arch/s390/include/asm/pci.h     |   4 ++
>  arch/s390/pci/pci_sysfs.c       |  20 ++++++
>  drivers/pci/slot.c              |  13 +++-
>  4 files changed, 140 insertions(+), 48 deletions(-)

Applied to s390 tree, thank you!


* Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Moger, Babu @ 2026-04-08 23:07 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet@lwn.net, tony.luck@intel.com,
	Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
  Cc: skhan@linuxfoundation.org, x86@kernel.org, hpa@zytor.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com,
	akpm@linux-foundation.org, pmladek@suse.com,
	rdunlap@infradead.org, dapeng1.mi@linux.intel.com,
	kees@kernel.org, elver@google.com, paulmck@kernel.org,
	lirongqing@baidu.com, safinaskar@gmail.com, fvdl@google.com,
	seanjc@google.com, pawan.kumar.gupta@linux.intel.com,
	xin@zytor.com, tiala@microsoft.com, chang.seok.bae@intel.com,
	Lendacky, Thomas, elena.reshetova@intel.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-coco@lists.linux.dev, kvm@vger.kernel.org,
	eranian@google.com, peternewman@google.com
In-Reply-To: <72297351-2954-4318-81b6-7de409e5552c@intel.com>

Hi Reinette,

On 4/8/2026 4:24 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 4/8/26 1:45 PM, Babu Moger wrote:
>> On 4/7/26 23:45, Reinette Chatre wrote:
>>> On 4/7/26 6:01 PM, Babu Moger wrote:
> 
>>>> That said, I’m open to not having a dedicated group if we can still support all the features that PLZA provides without it.
>>>
>>> I find that enabling user space to share CLOSID/RMID between user space
>>> and kernel space does indeed support what PLZA provides. I think I am missing
>>> something here since the proposal below again attempts to isolate a resource group
>>> (CLOSID) for kernel work.
>>
>> No. I don't want to isolate a group just for PLZA. All I am saying
>> is, we should provide an option to create a dedicated group if the user
>> wants to do it.
> I agree. I do not see resctrl needing to do anything to accomplish this though. If
> the user wants a group dedicated to kernel mode/PLZA then all that is needed is for the
> user not to assign any tasks to this group, either via changes to the group's tasks file
> or via the group's cpus/cpus_list files.
> 
>>>>
>>>> The mode can simply be determined on a per-group basis. We can
>>>> introduce two new files—kernel_mode_cpus and
>>>> kernel_mode_cpus_list—within each resctrl group when kmode (or
>>>> PLZA) is supported.
>>>
>>> I think having these files in every resource group is confusing since user can only interact
>>> with these files in one resource group for current PLZA. Why not *just* have the files in the
>>> resource group that matches the group in info/kernel_mode_assignment?
>>
>> The default group can also serve as the PLZA group.
>>
>> #cat info/kernel_mode_assignment
>> //
>>
>> At this point, the (kmode_cpus / kmode_cpus_list) files will exist in the default group:
>>
>> Then user changes the PLZA group to "test".
>>
>> #echo "test//" > info/kernel_mode_assignment
>>
>> At this point, we expect the files "(kmode_cpus/kmode_cpus_list)" to be visible in "test//" group.
>>
>> One open question is whether we should remove the visibility of these files from the default group. It’s unclear if we can safely do this dynamically.
>>
>> An alternative approach would be to always keep the files present, but allow access to them only for groups that are listed in "info/kernel_mode_assignment".
> 
> The files appearing/disappearing is just how the user experiences the resctrl fs interface.
> Within resctrl the files could indeed always exist but resctrl can use the kernfs_show()
> API to show/hide them as needed. Similar to resctrl_bmec_files_show() that you created.
> Allowing/removing access becomes complicated because user space can always do a chmod
> to change permissions that resctrl would need to handle.
> 
> I do not know if there are sharp corners here when thinking about strange scenarios where
> user space opens a file before resctrl changes visibility or permissions and then
> interacts with the file. This may be worthwhile to test no matter which mechanism is used.
> 
>>>> Files and behavior:
>>>> - cpus / cpus_list:
>>>>
>>>> CPUs listed here use the same allocation for both user and kernel space.
>>>
>>> Both user and kernel space?
>>
>> As it stands today, the CPU list is written to MSR_PQR_ASSOC, resulting in the same allocation for both user and kernel within a given CLOS.
>>
>> Kernel-mode allocation changes only if specific CPUs are included in the kmode_cpus list.
> 
> ack.
> 
>>>> There is no change to the current semantics of these files.
>>>> If these files are empty, the group effectively becomes a PLZA-dedicated group.
>>>
>>> I do not see it this way. If the cpu/cpus_list files are empty then it means that the
>>> tasks in the group will use their own CLOSID/RMID for user space allocation and
>>> monitoring. What allocations/monitoring is used by tasks when in kernel mode depends
>>> on whether the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpuslist
>>> file. If the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpuslist
>>> file then it will inherit the PQR_PLZA setting of that CPU, which is the allocation
>>> associated with the resource group to which that kernel_mode_cpus/kernel_mode_cpuslist belongs.
>>> If the CPU the task is running on cannot be found in kernel_mode_cpus/kernel_mode_cpuslist
>>> then its kernel work will inherit its user space allocations and monitoring.
>>>
>>
>> Yes, that is correct. I think our understanding is correct, but our implementation ideas seem to differ.
> 
> While we have been sharing different ideas I have tried to be clear on *why* I made
> certain choices and attempted to provide specific feedback to your ideas. If you find
> your plan to be better then please respond to my feedback about it to help me understand
> why that may be the better solution. If you find your solution is better then could you please
> describe it in detail? At this time I do not have a clear understanding of what you propose.
> 
> ...
>>
>> Let me make sure I understand what you mentioned earlier. Copied the text below from the thread for the context:
>>
>> https://lore.kernel.org/lkml/3305c18e-9e50-4df0-b9f1-c61028628967@intel.com/
>> =====================================================================
>>
>> Please consider the intent of this file when thinking about names. The idea is that "info/kernel_mode"
>> specifies the "mode" of how kernel work is handled and it determines the configuration files used in that
>> mode as well as the syntax when interacting with those files. By renaming "kernel_mode_assignment" to
>> "kmode_groups" it implicitly requires all future kernel mode enhancements to need some data related to "groups".
>>
>> In summary, I think this can be simplified by introducing just two new files in info/ that enables the
>> user to (a) select and (b) configure the "kernel mode". To start there can be just two modes,
>> global_assign_ctrl_inherit_mon_per_cpu and global_assign_ctrl_assign_mon_per_cpu.
>> global_assign_ctrl_inherit_mon_per_cpu mode requires a control group in kernel_mode_assignment while
>> global_assign_ctrl_assign_mon_per_cpu requires a control and monitoring group.
>>
>> The resource group in info/kernel_mode_assignment gets two additional files "kernel_mode_cpus" and
>> "kernel_mode_cpus_list" that contains the CPUs enabled with the kernel mode configuration, by default
>> it will be all online CPUs. The resource group can continue to be used to manage allocations of and
>> monitor user space tasks. Specifically, the "cpus", "cpus_list", and "tasks" files remain.
>>
>> A user wanting just "global" settings will get just that when writing the group to
>> info/kernel_mode_assignment. A user wanting "per CPU" settings can follow the
>> info/kernel_mode_assignment setting with changes to that resource group's kernel_mode_cpus/kernel_mode_cpus_list
>> files. Any task running on a CPU that is *not* in kernel_mode_cpus/kernel_mode_cpus_list can be
>> expected to inherit both CLOSID and RMID from user space for all kernel work.
>>
>> ======================================================================
>>
>> Let me try to get a few clarifications here.
>>
>> # cat info/kernel_mode
>>    [inherit_ctrl_and_mon]
>>    global_assign_ctrl_inherit_mon_per_cpu
>>    global_assign_ctrl_assign_mon_per_cpu
>>
>> My understanding of "inherit_ctrl_and_mon" is that the kernel
>> inherits both the CLOS and the RMID from user space. Basically both
>> user and kernel use the same CLOSID and RMID. This reflects the current
>> behavior (without PLZA), correct? This would correspond to the
> 
> Correct.
> 
>> default group when resctrl is mounted.
> 
>>
>> The modes "global_assign_ctrl_inherit_mon_per_cpu" and "global_assign_ctrl_assign_mon_per_cpu" represent the actual PLZA modes.
>>
>> Both of these modes introduce new files kernel_mode_cpus/ and kernel_mode_cpus_list in the resctrl group.
> 
> Right. To be specific when the user changes the mode to either "global_assign_ctrl_inherit_mon_per_cpu" or
> "global_assign_ctrl_assign_mon_per_cpu" the new files will be created in the default resource group with
> associated setting applied globally at that time.

At that point, "info/kernel_mode_assignment" points to // (the
default group), is that correct?

And if "info/kernel_mode_assignment" points to a different group (for 
example, test//), then the kernel_mode_cpus/ and kernel_mode_cpus_list 
files will be created only under the test// group. Is that correct?
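
The per-CPU rule described earlier in this thread (kernel work on a CPU listed in kernel_mode_cpus uses the assigned group; on any other CPU it inherits the task's own CLOSID/RMID) can be summarized as a small decision function. This is a hypothetical Python model, not resctrl code: `Alloc`, `kernel_alloc`, and how each mode picks CLOSID and RMID are assumptions read from the mode names and the quoted description (a control group only for the "inherit_mon" mode, a control and monitoring group for the "assign_mon" mode).

```python
from typing import FrozenSet, NamedTuple


class Alloc(NamedTuple):
    closid: int  # control (allocation) group
    rmid: int    # monitoring group


def kernel_alloc(cpu: int, task: Alloc, mode: str, group: Alloc,
                 kernel_mode_cpus: FrozenSet[int]) -> Alloc:
    """CLOSID/RMID used for a task's kernel work on `cpu` (hypothetical model).

    `task` is the task's own user-space allocation; `group` is the resource
    group named in info/kernel_mode_assignment.
    """
    if mode == "inherit_ctrl_and_mon" or cpu not in kernel_mode_cpus:
        # No PLZA on this CPU: kernel work inherits both CLOSID and RMID.
        return task
    if mode == "global_assign_ctrl_inherit_mon_per_cpu":
        # Control comes from the assigned group, monitoring is inherited.
        return Alloc(group.closid, task.rmid)
    if mode == "global_assign_ctrl_assign_mon_per_cpu":
        # Both control and monitoring come from the assigned group.
        return group
    raise ValueError(f"unknown kernel mode: {mode}")
```

The model makes the point about empty cpus/cpus_list files visible: nothing in the decision depends on those files, only on the running CPU and kernel_mode_cpus.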

Thanks
Babu



* Re: allowing '-' instead of ':' in kernel-doc descriptions
From: Randy Dunlap @ 2026-04-08 22:44 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: Jonathan Corbet, Linux Documentation
In-Reply-To: <dskdc44um6l6sw43uazfpzmsv4tkesog7sro22qkvzxyflvurt@pwhb3rs44ga7>

Hi,
[modified Subject & recipients]

On 11/13/25 2:32 AM, Mauro Carvalho Chehab wrote:
> On Thu, Nov 13, 2025 at 03:49:27AM -0500, Michael S. Tsirkin wrote:
>> On Thu, Nov 13, 2025 at 12:55:37PM +1100, Stephen Rothwell wrote:
>>> Hi all,
>>>
>>> Today's linux-next build (htmldocs) produced these warnings:
>>>
>>> WARNING: /home/sfr/kernels/next/next/include/linux/virtio_config.h:174 duplicate section name 'Return'
>>> WARNING: /home/sfr/kernels/next/next/include/linux/virtio_config.h:184 duplicate section name 'Return'
>>> WARNING: /home/sfr/kernels/next/next/include/linux/virtio_config.h:190 duplicate section name 'Return'
>>>
>>> Introduced by commit
>>>
>>>   bee8c7c24b73 ("virtio: introduce map ops in virtio core")
>>>
>>> but is probably a bug in our scripts as those lines above have "Returns:"
>>> in them, not "Return:".
>>>
>>> These have turned up now since a bug was fixed that was repressing a
>>> lot of warnings.
>>
>> Indeed. But the rest of header says Returns ... without : so I will just
>> fix this one to do the same. I also fixed other issues in the comments
>> in this header while I was at it. Will post shortly.
> 
> That's the best approach. We could instead change the new section detection
> regex to accept just one space at most:
> 
>     diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
>     index f7dbb0868367..bab0ec3abe31 100644
>     --- a/scripts/lib/kdoc/kdoc_parser.py
>     +++ b/scripts/lib/kdoc/kdoc_parser.py
>     @@ -46,7 +46,7 @@ doc_decl = doc_com + KernRe(r'(\w+)', cache=False)
>      known_section_names = 'description|context|returns?|notes?|examples?'
>      known_sections = KernRe(known_section_names, flags = re.I)
>      doc_sect = doc_com + \
>     -    KernRe(r'\s*(@[.\w]+|@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$',
>     +    KernRe(r'\s?(@[.\w]+|@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$',
>                 flags=re.I, cache=False)
>  
>      doc_content = doc_com_body + KernRe(r'(.*)', cache=False)
> 
> (patch not tested)
> 
> But, if we do so, someone has to check if this won't cause regressions
> elsewhere. I'm almost sure a change like that will break something...
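
The effect of the one-character change can be seen by compiling the pattern tail from the diff with Python's `re` module. This sketch assumes the leading " * " comment prefix has already been stripped; in kdoc_parser.py the fragment is concatenated onto `doc_com`, whose own whitespace handling also affects the result, so this isolates the fragment and is not a test of the parser itself. `OLD`, `NEW` and `section_start` are illustrative names.

```python
import re

# Section-name alternation copied from the quoted kdoc_parser.py diff.
known_section_names = 'description|context|returns?|notes?|examples?'

# Pattern tail before (\s*) and after (\s?) the proposed change.
# Assumption: the " * " comment prefix has already been removed from the line.
OLD = re.compile(r'\s*(@[.\w]+|@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$', re.I)
NEW = re.compile(r'\s?(@[.\w]+|@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$', re.I)


def section_start(pattern, text):
    """Return the section name if `text` would open a new section, else None."""
    m = pattern.match(text)
    return m.group(1) if m else None


for line in ["Returns: 0 on success", " Return: a pointer", "    Returns: nested text"]:
    print(repr(line), section_start(OLD, line), section_start(NEW, line))
```

Only the deeply indented line differs: `\s*` lets it open a new section, while `\s?` rejects it and leaves it as body text of the current section.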

Following up:

I've been testing this patch for about 3 months now.
The only problems that I have seen with it are these:
(in linux-next-20260408)


WARNING: ../drivers/pci/msi/api.c:102 duplicate section name 'Return'
WARNING: ../mm/damon/core.c:1472 duplicate section name 'Return'
WARNING: ../mm/damon/core.c:1472 duplicate section name 'Return'
WARNING: ../include/uapi/drm/i915_drm.h:2403 duplicate section name 'Return'
WARNING: ../include/uapi/drm/i915_drm.h:2403 duplicate section name 'Return'
WARNING: ../include/uapi/drm/i915_drm.h:2403 duplicate section name 'Return'
WARNING: ../drivers/gpu/drm/drm_atomic_helper.c:3546 duplicate section name 'Return'
WARNING: ../drivers/gpu/drm/drm_atomic_helper.c:3710 duplicate section name 'Return'
WARNING: ../drivers/gpu/drm/drm_of.c:382 duplicate section name 'Return'
WARNING: ../drivers/gpu/drm/drm_of.c:432 duplicate section name 'Return'
WARNING: ../drivers/gpu/drm/drm_gem.c:900 duplicate section name 'Return'
WARNING: ../include/linux/w1.h:115 duplicate section name 'Return'
WARNING: ../include/linux/w1.h:115 duplicate section name 'Return'


-- 
~Randy



* [PATCH v2] doc: watchdog: fix typos etc.
From: Randy Dunlap @ 2026-04-08 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Randy Dunlap, Andrew Morton, Jonathan Corbet, Shuah Khan,
	linux-doc, Björn Persson

Correct typos in lockup-watchdogs.rst.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
---
v2: corrections from Björn (Thanks)

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-doc@vger.kernel.org
Cc: Björn Persson <Bjorn@xn--rombobjrn-67a.se>

 Documentation/admin-guide/lockup-watchdogs.rst |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- linux-next-20260406.orig/Documentation/admin-guide/lockup-watchdogs.rst
+++ linux-next-20260406/Documentation/admin-guide/lockup-watchdogs.rst
@@ -41,7 +41,7 @@ is a trade-off between fast response to
 Implementation
 ==============
 
-The soft and hard lockup detectors are built around a hrtimer.
+The soft and hard lockup detectors are built around an hrtimer.
 In addition, the softlockup detector regularly schedules a job, and
 the hard lockup detector might use Perf/NMI events on architectures
 that support it.
@@ -49,7 +49,7 @@ that support it.
 Frequency and Heartbeats
 ------------------------
 
-The core of the detectors in a hrtimer. It servers multiple purpose:
+The core of the detectors is an hrtimer. It serves multiple purposes:
 
 - schedules watchdog job for the softlockup detector
 - bumps the interrupt counter for hardlockup detectors (heartbeat)


* Re: [PATCH] doc: watchdog: fix typos etc.
From: Randy Dunlap @ 2026-04-08 21:28 UTC (permalink / raw)
  To: Björn Persson
  Cc: Andrew Morton, Jonathan Corbet, Shuah Khan, linux-doc,
	linux-kernel
In-Reply-To: <20260408205611.0f7e38de@tag.xn--rombobjrn-67a.se>



On 4/8/26 11:56 AM, Björn Persson wrote:
> Randy Dunlap wrote:
>> -Similarly to the softlockup case, the current stack trace is displayed
>> +Similar to the softlockup case, the current stack trace is displayed
> 
> "Similarly" modifies "is displayed", so the adverbial form is correct.
> 
>> -The core of the detectors in a hrtimer. It servers multiple purpose:
>> +The core of the detectors is an hrtimer. It servers multiple purposes:
> 
> And "servers" should be "serves".

Thank you.

Andrew, I'll send a v2 patch.

-- 
~Randy



