Linux Documentation
 help / color / mirror / Atom feed
* Re: [PATCH 05/33] rust: remove `RUSTC_HAS_COERCE_POINTEE` and simplify code
From: Tamir Duberstein @ 2026-04-01 22:59 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Nathan Chancellor, Nicolas Schier, Danilo Krummrich,
	Andreas Hindborg, Catalin Marinas, Will Deacon, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Courbot, David Airlie,
	Simona Vetter, Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Trevor Gross, rust-for-linux,
	linux-kbuild, Lorenzo Stoakes, Vlastimil Babka, Liam R . Howlett,
	Uladzislau Rezki, linux-block, linux-arm-kernel, Alexandre Ghiti,
	linux-riscv, nouveau, dri-devel, Rae Moar, linux-kselftest,
	kunit-dev, Nick Desaulniers, Bill Wendling, Justin Stitt, llvm,
	linux-kernel, Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-6-ojeda@kernel.org>

On Wed, 01 Apr 2026 13:45:12 +0200, Miguel Ojeda <ojeda@kernel.org> wrote:
> With the Rust version bump in place, the `RUSTC_HAS_COERCE_POINTEE`
> Kconfig (automatic) option is always true.
> 
> Thus remove the option and simplify the code.
> 
> In particular, this includes removing our use of the predecessor unstable
> features we used with Rust < 1.84.0 (`coerce_unsized`, `dispatch_from_dyn`
> and `unsize`).
> 
> [...]

Reviewed-by: Tamir Duberstein <tamird@kernel.org>

-- 
Tamir Duberstein <tamird@kernel.org>

^ permalink raw reply

* Re: [PATCH 04/33] rust: remove `RUSTC_HAS_SLICE_AS_FLATTENED` and simplify code
From: Tamir Duberstein @ 2026-04-01 22:59 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Nathan Chancellor, Nicolas Schier, Danilo Krummrich,
	Andreas Hindborg, Catalin Marinas, Will Deacon, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Courbot, David Airlie,
	Simona Vetter, Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Trevor Gross, rust-for-linux,
	linux-kbuild, Lorenzo Stoakes, Vlastimil Babka, Liam R . Howlett,
	Uladzislau Rezki, linux-block, linux-arm-kernel, Alexandre Ghiti,
	linux-riscv, nouveau, dri-devel, Rae Moar, linux-kselftest,
	kunit-dev, Nick Desaulniers, Bill Wendling, Justin Stitt, llvm,
	linux-kernel, Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-5-ojeda@kernel.org>

On Wed, 01 Apr 2026 13:45:11 +0200, Miguel Ojeda <ojeda@kernel.org> wrote:
> With the Rust version bump in place, the `RUSTC_HAS_SLICE_AS_FLATTENED`
> Kconfig (automatic) option is always true.
> 
> Thus remove the option and simplify the code.
> 
> In particular, this includes removing the `slice` module which contained
> the temporary slice helpers, i.e. the `AsFlattened` extension trait and
> its `impl`s.
> 
> [...]

Reviewed-by: Tamir Duberstein <tamird@kernel.org>

-- 
Tamir Duberstein <tamird@kernel.org>

^ permalink raw reply

* Re: [PATCH 03/33] rust: simplify `RUSTC_VERSION` Kconfig conditions
From: Tamir Duberstein @ 2026-04-01 22:59 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Nathan Chancellor, Nicolas Schier, Danilo Krummrich,
	Andreas Hindborg, Catalin Marinas, Will Deacon, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Courbot, David Airlie,
	Simona Vetter, Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Trevor Gross, rust-for-linux,
	linux-kbuild, Lorenzo Stoakes, Vlastimil Babka, Liam R . Howlett,
	Uladzislau Rezki, linux-block, linux-arm-kernel, Alexandre Ghiti,
	linux-riscv, nouveau, dri-devel, Rae Moar, linux-kselftest,
	kunit-dev, Nick Desaulniers, Bill Wendling, Justin Stitt, llvm,
	linux-kernel, Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-4-ojeda@kernel.org>

On Wed, 01 Apr 2026 13:45:10 +0200, Miguel Ojeda <ojeda@kernel.org> wrote:
> With the Rust version bump in place, several Kconfig conditions based on
> `RUSTC_VERSION` are always true.
> 
> Thus simplify them.
> 
> The minimum supported major LLVM version by our new Rust minimum version
> is now LLVM 18, instead of LLVM 16. However, there are no possible
> cleanups for `RUSTC_LLVM_VERSION`.
> 
> [...]

Reviewed-by: Tamir Duberstein <tamird@kernel.org>

-- 
Tamir Duberstein <tamird@kernel.org>

^ permalink raw reply

* Re: [PATCH 02/33] rust: bump Clippy's MSRV and clean `incompatible_msrv` allows
From: Tamir Duberstein @ 2026-04-01 22:59 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Nathan Chancellor, Nicolas Schier, Danilo Krummrich,
	Andreas Hindborg, Catalin Marinas, Will Deacon, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Courbot, David Airlie,
	Simona Vetter, Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Trevor Gross, rust-for-linux,
	linux-kbuild, Lorenzo Stoakes, Vlastimil Babka, Liam R . Howlett,
	Uladzislau Rezki, linux-block, linux-arm-kernel, Alexandre Ghiti,
	linux-riscv, nouveau, dri-devel, Rae Moar, linux-kselftest,
	kunit-dev, Nick Desaulniers, Bill Wendling, Justin Stitt, llvm,
	linux-kernel, Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-3-ojeda@kernel.org>

On Wed, 01 Apr 2026 13:45:09 +0200, Miguel Ojeda <ojeda@kernel.org> wrote:
> Following the Rust compiler bump, we can now update Clippy's MSRV we
> set in the configuration, which will improve the diagnostics it generates.
> 
> Thus do so and clean a few of the `allow`s that are not needed anymore.

Reviewed-by: Tamir Duberstein <tamird@kernel.org>

-- 
Tamir Duberstein <tamird@kernel.org>

^ permalink raw reply

* Re: [PATCH 01/33] rust: bump Rust minimum supported version to 1.85.0 (Debian Trixie)
From: Tamir Duberstein @ 2026-04-01 22:59 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Nathan Chancellor, Nicolas Schier, Danilo Krummrich,
	Andreas Hindborg, Catalin Marinas, Will Deacon, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Courbot, David Airlie,
	Simona Vetter, Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Trevor Gross, rust-for-linux,
	linux-kbuild, Lorenzo Stoakes, Vlastimil Babka, Liam R . Howlett,
	Uladzislau Rezki, linux-block, linux-arm-kernel, Alexandre Ghiti,
	linux-riscv, nouveau, dri-devel, Rae Moar, linux-kselftest,
	kunit-dev, Nick Desaulniers, Bill Wendling, Justin Stitt, llvm,
	linux-kernel, Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-2-ojeda@kernel.org>

On Wed, 01 Apr 2026 13:45:08 +0200, Miguel Ojeda <ojeda@kernel.org> wrote:
> As proposed in the past in e.g. LPC 2025 and the Maintainers Summit [1],
> we are going to follow Debian Stable's Rust versions as our minimum
> supported version.
> 
> Debian Trixie was released with a Rust 1.85.0 toolchain [2], which it
> still uses to this day [3] (i.e. no update to Rust 1.85.1).
> 
> [...]

Acked-by: Tamir Duberstein <tamird@kernel.org>

-- 
Tamir Duberstein <tamird@kernel.org>

^ permalink raw reply

* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Ackerley Tng @ 2026-04-01 22:46 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jroedel, jthoughton, oupton, pankaj.gupta,
	qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
	tabba, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, Paolo Bonzini, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm
In-Reply-To: <cqkyz4zxosmev6lnbasa32ed5jjfljrx3gr6plyjfhrtbysuwr@rg5noy6y6ayu>

Michael Roth <michael.roth@amd.com> writes:

> On Thu, Mar 26, 2026 at 03:24:19PM -0700, Ackerley Tng wrote:
>> For shared to private conversions, if refcounts on any of the folios
>> within the range are elevated, fail the conversion with -EAGAIN.
>>
>> At the point of shared to private conversion, all folios in range are
>> also unmapped. The filemap_invalidate_lock() is held, so no faulting
>> can occur. Hence, from that point on, only transient refcounts can be
>> taken on the folios associated with that guest_memfd.
>>
>> Hence, it is safe to do the conversion from shared to private.
>>
>> After conversion is complete, refcounts may become elevated, but that
>> is fine since users of transient refcounts don't actually access
>> memory.
>>
>> For private to shared conversions, there are no refcount checks, since
>> the guest is the only user of private pages, and guest_memfd will be the
>> only holder of refcounts on private pages.
>
> I think KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES deserves some mention in
> the commit log.
>

Will update this in the next revision. Thanks!

>>
>>
>> [...snip...]
>>
>>
>> +Set attributes for a range of offsets within a guest_memfd to
>> +KVM_MEMORY_ATTRIBUTE_PRIVATE to limit the specified guest_memfd backed
>> +memory range for guest_use. Even if KVM_CAP_GUEST_MEMFD_MMAP is
>> +supported, after a successful call to set
>> +KVM_MEMORY_ATTRIBUTE_PRIVATE, the requested range will not be mappable
>> +into host userspace and will only be mappable by the guest.
>> +
>> +To allow the range to be mappable into host userspace again, call
>> +KVM_SET_MEMORY_ATTRIBUTES2 on the guest_memfd again with
>> +KVM_MEMORY_ATTRIBUTE_PRIVATE unset.
>> +
>> +If this ioctl returns -EAGAIN, the offset of the page with unexpected
>> +refcounts will be returned in `error_offset`. This can occur if there
>> +are transient refcounts on the pages, taken by other parts of the
>> +kernel.
>
> That's only true for the guest_memfd ioctl, for KVM ioctl these new
> fields and r/w behavior are basically ignored. So you might need to be
> clearer on which fields/behavior are specific to guest_memfd like in
> the preceeding paragraphs..
>

Yes, will update in the next revision, thanks!

> ..or maybe it's better to do the opposite and just have a blanket 'for
> now, all newly-described behavior pertains only to usage via a
> guest_memfd ioctl, and for KVM ioctls only the fields/behaviors common
> with KVM_SET_MEMORY_ATTRIBUTES are applicable.', since it doesn't seem
> like vm_memory_attributes=1 is long for this world and that's the only
> case where KVM memory attribute ioctls seem relevant.
>
> But then it makes me wonder, if we adopt the semantics I mentioned
> earlier and have KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES advertise both
> the gmem ioctl support as well as the struct kvm_memory_attributes2
> support, if we should even advertise KVM_CAP_MEMORY_ATTRIBUTES2 at all
> as part of this series.
>

Read your other email as well, thanks for reviewing!

It makes sense, hope this captures what you suggested. In v5,

If vm_memory_attributes == 1:
    (KVM_CAP_MEMORY_ATTRIBUTES2 will be removed (will return 0))

If vm_memory_attributes == 0 aka attributes are tracked by guest_memfd:
    KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES2 will return valid attributes
    (KVM_CAP_MEMORY_ATTRIBUTES2 will be removed (will return 0))

So yup, KVM_CAP_MEMORY_ATTRIBUTES2 will not even be #defined at all.

>> +
>> +Userspace is expected to figure out how to remove all known refcounts
>> +on the shared pages, such as refcounts taken by get_user_pages(), and
>> +try the ioctl again. A possible source of these long term refcounts is
>> +if the guest_memfd memory was pinned in IOMMU page tables.
>
> One might read this to mean error_offset is used purely for the EAGAIN
> case, so it might be worth touching on the other cases as well.
>

Will update this in the next revision.

> -Mike
>
>>
>> [...snip...]
>>

^ permalink raw reply

* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Ackerley Tng @ 2026-04-01 22:38 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jroedel, jthoughton, oupton, pankaj.gupta,
	qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
	tabba, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, Paolo Bonzini, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm
In-Reply-To: <2r4mmfiuisw26qymahnbh2oxqkkrywqev477kc4rlkcyx7tels@c7ple7kdgpo3>

Michael Roth <michael.roth@amd.com> writes:

>
> [...snip...]
>
>>  static unsigned long kvm_get_vm_memory_attributes(struct kvm *kvm, gfn_t gfn)
>>  {
>> @@ -2635,6 +2625,8 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
>>  		return -EINVAL;
>>  	if (!PAGE_ALIGNED(attrs->address) || !PAGE_ALIGNED(attrs->size))
>>  		return -EINVAL;
>> +	if (attrs->error_offset)
>> +		return -EINVAL;
>>  	for (i = 0; i < ARRAY_SIZE(attrs->reserved); i++) {
>>  		if (attrs->reserved[i])
>>  			return -EINVAL;
>> @@ -4983,6 +4975,11 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>>  		return 1;
>>  	case KVM_CAP_GUEST_MEMFD_FLAGS:
>>  		return kvm_gmem_get_supported_flags(kvm);
>> +	case KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES:
>> +		if (vm_memory_attributes)
>> +			return 0;
>> +
>> +		return kvm_supported_mem_attributes(kvm);
>
> Based on the discussion from the PUCK call this morning,

Thanks for copying the discussion here, I'll start attending PUCK to
catch those discussions too :)

> it sounds like it
> would be a good idea to limit kvm_supported_mem_attributes() to only
> reporting KVM_MEMORY_ATTRIBUTE_PRIVATE if the underlying CoCo
> implementation has all the necessary enablement to support in-place
> conversion via guest_memfd. In the case of SNP, there is a
> documentation/parameter check in snp_launch_update() that needs to be
> relaxed in order for userspace to be able to pass in a NULL 'src'
> parameter (since, for in-place conversion, it would be initialized in place
> as shared memory prior to the call, since by the time kvm_gmem_poulate()
> it will have been set to private and therefore cannot be faulted in via
> GUP (and if it could, we'd be unecessarily copying the src back on top
> of itself since src/dst are the same).

Could this be a separate thing? If I'm understanding you correctly, it's
not strictly a requirement for snp_launch_update() to first support a
NULL 'src' parameter before this series lands.

Without this series, the startup procedure is to have memory set up in
non-guest_memfd shared memory, and then snp_launch_update()-ed into
guest_memfd private memory.

With this series, it is a little troublesome, but the startup procedure
can still set up memory in guest_memfd shared memory, then copy
everything out to some temporary memory, then set guest_memfd memory to
private, then snp_launch_update() the temporary memory into guest_memfd
private memory.

We would be unnecessarily copying the src (now in some temporary memory)
back onto itself. Can that be a separate patch series?

Btw, if snp_launch_update() is going to accept a NULL src parameter and
launch-update the src in-place:

+ Will userspace have to set that memory to private before calling launch
  update?
    + If yes, then would we need some other mode of conversion that is
      not ZERO and not quite PRESERVE (since PRESERVE is defined as that
      the guest will see what the host wrote post-encryption, but it
      sounds like launch update is doing the encryption)
+ Or should launch update be called when that memory is shared? Will
  launch update then also set that memory to private in guest_memfd?

>
> So maybe there should be an arch hook to check a whitelist of VM types
> that support KVM_MEMORY_ATTRIBUTE_PRIVATE when vm_memory_attributes=0,
> and if we decide to enable it for SNP as part of this series you could
> include the 1-2 patches needed there, or I could enable the SNP support
> separately as a small series and I guess that would then become a prereq
> for the SNP self-tests?
>
> Not sure if additional enablement is needed for TDX or not before
> KVM_MEMORY_ATTRIBUTE_PRIVATE would be advertised, but similar
> considerations there.
>
> -Mike
>
>>  #endif
>>  	default:
>>  		break;
>>
>> --
>> 2.53.0.1018.g2bb0e51243-goog
>>

^ permalink raw reply

* Re: [PATCH RFC v4 07/44] KVM: guest_memfd: Only prepare folios for private pages
From: Ackerley Tng @ 2026-04-01 21:43 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jroedel, jthoughton, oupton, pankaj.gupta,
	qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
	tabba, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, Paolo Bonzini, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm
In-Reply-To: <s4dbqrv2c6yzt4nsflfarnggtl25xlz6mzg74tfeg3eskceno6@6l5hpfmcbju3>

Michael Roth <michael.roth@amd.com> writes:

> On Wed, Apr 01, 2026 at 07:05:16AM -0700, Ackerley Tng wrote:
>> Ackerley Tng <ackerleytng@google.com> writes:
>>
>> > All-shared guest_memfd used to be only supported for non-CoCo VMs where
>> > preparation doesn't apply. INIT_SHARED is about to be supported for
>> > non-CoCo VMs in a later patch in this series.
>> >
>> > In addition, KVM_SET_MEMORY_ATTRIBUTES2 is about to be supported in
>> > guest_memfd in a later patch in this series.
>> >
>> > This means that the kvm fault handler may now call kvm_gmem_get_pfn() on a
>> > shared folio for a CoCo VM where preparation applies.
>> >
>> > Add a check to make sure that preparation is only performed for private
>> > folios.
>> >
>> > Preparation will be undone on freeing (see kvm_gmem_free_folio()) and on
>> > conversion to shared.
>> >
>> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>> > ---
>> >  virt/kvm/guest_memfd.c | 9 ++++++---
>> >  1 file changed, 6 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> > index b6ffa8734175d..d414ebfcb4c19 100644
>> > --- a/virt/kvm/guest_memfd.c
>> > +++ b/virt/kvm/guest_memfd.c
>> > @@ -900,6 +900,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>> >  		     int *max_order)
>> >  {
>> >  	pgoff_t index = kvm_gmem_get_index(slot, gfn);
>> > +	struct inode *inode;
>> >  	struct folio *folio;
>> >  	int r = 0;
>> >
>> > @@ -907,7 +908,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>> >  	if (!file)
>> >  		return -EFAULT;
>> >
>> > -	filemap_invalidate_lock_shared(file_inode(file)->i_mapping);
>> > +	inode = file_inode(file);
>> > +	filemap_invalidate_lock_shared(inode->i_mapping);
>> >
>> >  	folio = __kvm_gmem_get_pfn(file, slot, index, pfn, max_order);
>> >  	if (IS_ERR(folio)) {
>> > @@ -920,7 +922,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>> >  		folio_mark_uptodate(folio);
>> >  	}
>> >
>> > -	r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
>> > +	if (kvm_gmem_is_private_mem(inode, index))
>> > +		r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
>>
>> Michael, I might have misunderstood you at the last guest_memfd call:
>> sev_gmem_prepare() doesn't prepare a page for being a shared page,
>> right? Does this work? That prepare is only called to "make private"?
>
> Hmm, I guess your guest_memfd-inplace-conversion-v4 branch is out of sync with
> these patches?
>

My bad, it was. I just force-pushed to github to synchronize them with
this patch series.

> I have the below local patch based on top of that for SNP-specific enablement,
> which is basically identically, so suffice to say: yes, this should work
> for SNP :) If any architecture pops up that needs to do some prep in
> advance of mapping shared pages, then we could potentially plumb the
> shared/private flag through to the arch-specific prep hook, as was also
> suggested on the call, but it doesn't seem like that's needed by any
> users for now.
>

Thanks for checking :)

> -Mike
>
>   Author: Michael Roth <michael.roth@amd.com>
>   Date:   Mon Oct 27 07:58:32 2025 -0500
>
>       KVM: guest_memfd: Don't prepare shared folios
>
>       In the current guest_memfd logic, "preparation" is only used currently
>       to describe the additional work of putting a guest_memfd page into an
>       architecturally-defined "private" state, such as updating RMP table
>       entries for SEV-SNP guests. As such, there's no input to the
>       corresponding kvm_arch_gmem_prepare() hooks as to whether a page is
>       being prepared/accessed as shared or as private, so "preparation" will
>       end up being erroneously done on pages that were supposed to remain in a
>       shared state. Rather than plumb through the additional information
>       needed to distinguish between shared vs. private preparation, just
>       continue to only do preparation on private pages, as was the case prior
>       to support for GUEST_MEMFD_FLAG_MMAP being introduced.
>
>       Signed-off-by: Michael Roth <michael.roth@amd.com>
>
>   diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>   index 3acc6d983449..4869e59e4fc5 100644
>   --- a/virt/kvm/guest_memfd.c
>   +++ b/virt/kvm/guest_memfd.c
>   @@ -1249,7 +1249,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>                   folio_mark_uptodate(folio);
>           }
>
>   -       r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
>   +       if (!kvm_gmem_is_shared_mem(file_inode(file), index))
>   +               r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
>
>           folio_unlock(folio);
>
>>
>> >
>> >  	folio_unlock(folio);
>> >
>> > @@ -930,7 +933,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>> >  		folio_put(folio);
>> >
>> >  out:
>> > -	filemap_invalidate_unlock_shared(file_inode(file)->i_mapping);
>> > +	filemap_invalidate_unlock_shared(inode->i_mapping);
>> >  	return r;
>> >  }
>> >  EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_get_pfn);
>> >
>> > --
>> > 2.53.0.1018.g2bb0e51243-goog

^ permalink raw reply

* Re: [PATCH 5/5] types: Add standard __ob_trap and __ob_wrap scalar types
From: Kees Cook @ 2026-04-01 21:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Justin Stitt, Miguel Ojeda, Nathan Chancellor,
	Andrew Morton, Andy Shevchenko, Arnd Bergmann, Mark Rutland,
	Matthew Wilcox (Oracle), Suren Baghdasaryan, Thomas Gleixner,
	Finn Thain, Geert Uytterhoeven, Thomas Weißschuh, llvm,
	Marco Elver, Jonathan Corbet, Nicolas Schier, Greg Kroah-Hartman,
	linux-kernel, kasan-dev, linux-hardening, linux-doc, linux-kbuild
In-Reply-To: <20260401093820.GX3738786@noisy.programming.kicks-ass.net>

On Wed, Apr 01, 2026 at 11:38:20AM +0200, Peter Zijlstra wrote:
> On Tue, Mar 31, 2026 at 01:31:16PM -0700, Kees Cook wrote:
> 
> > int func()
> > {
> > 	...
> > 	u8 __ob_trap product = 5;
> > 	...
> > 	product = a * b; // if store is truncated, goto __overflow
> > 	...
> > 	return product;
> > 
> > __overflow:
> > 	pr_info("%u\n", product); // shows "5"
> > 	return -1;
> > }
> > 
> > (Isn't this just an implicit "try"?)
> 
> So I like this implicit try with a default label, and mostly I expect
> this will be fine.
> 
> But as Linus already mentioned, sometimes you might want more. Could we
> perhaps also have an explicit version, something along the lines of:
> 
> int func()
> {
> 	int __ob_trap size;
> 
> 	size = try(count * flex_size, __mul_overflow);
> 	size = try(size + base_size, __add_overflow);
> 
> 	obj = kzalloc(size,...);
> 
> }
> 
> where we have something like:
> 
> #define try(stmt, _label) ({		\
> 	__label __overflow; 		\
> 	if (0) {			\
> __overflow:				\
> 		goto _label;		\
> 	}				\
> 	stmt; })
> 
> That is, have the overflow trapped and confined in the
> statement-expression by using the overflow label as a local label and
> use this little trampoline to re-direct to a custom label.

Yeah, that should work, and gives us a nice way to create handler
overrides. We've have to make sure the "locally defined" labels (with
__label__) and __ob_trap worked together sanely.

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Sean Christopherson @ 2026-04-01 21:12 UTC (permalink / raw)
  To: Michael Roth
  Cc: Ackerley Tng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, ira.weiny, jmattson, jroedel, jthoughton, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
	pratyush, suzuki.poulose, aneesh.kumar, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm
In-Reply-To: <2r4mmfiuisw26qymahnbh2oxqkkrywqev477kc4rlkcyx7tels@c7ple7kdgpo3>

On Wed, Apr 01, 2026, Michael Roth wrote:
> On Thu, Mar 26, 2026 at 03:24:19PM -0700, Ackerley Tng wrote:
> >  #ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> >  static unsigned long kvm_get_vm_memory_attributes(struct kvm *kvm, gfn_t gfn)
> >  {
> > @@ -2635,6 +2625,8 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> >  		return -EINVAL;
> >  	if (!PAGE_ALIGNED(attrs->address) || !PAGE_ALIGNED(attrs->size))
> >  		return -EINVAL;
> > +	if (attrs->error_offset)
> > +		return -EINVAL;
> >  	for (i = 0; i < ARRAY_SIZE(attrs->reserved); i++) {
> >  		if (attrs->reserved[i])
> >  			return -EINVAL;
> > @@ -4983,6 +4975,11 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> >  		return 1;
> >  	case KVM_CAP_GUEST_MEMFD_FLAGS:
> >  		return kvm_gmem_get_supported_flags(kvm);
> > +	case KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES:
> > +		if (vm_memory_attributes)
> > +			return 0;
> > +
> > +		return kvm_supported_mem_attributes(kvm);
> 
> Based on the discussion from the PUCK call this morning, it sounds like it
> would be a good idea to limit kvm_supported_mem_attributes() to only
> reporting KVM_MEMORY_ATTRIBUTE_PRIVATE if the underlying CoCo
> implementation has all the necessary enablement to support in-place
> conversion via guest_memfd. In the case of SNP, there is a
> documentation/parameter check in snp_launch_update() that needs to be
> relaxed in order for userspace to be able to pass in a NULL 'src'
> parameter (since, for in-place conversion, it would be initialized in place
> as shared memory prior to the call, since by the time kvm_gmem_poulate()
> it will have been set to private and therefore cannot be faulted in via
> GUP (and if it could, we'd be unecessarily copying the src back on top
> of itself since src/dst are the same).
> 
> So maybe there should be an arch hook to check a whitelist of VM types
> that support KVM_MEMORY_ATTRIBUTE_PRIVATE when vm_memory_attributes=0,
> and if we decide to enable it for SNP as part of this series you could
> include the 1-2 patches needed there, or I could enable the SNP support
> separately as a small series and I guess that would then become a prereq
> for the SNP self-tests?

If it's trivial-ish, my preference would be to include SNP as part of this series,
_before_ KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES is exposed to userspace.  I think
that would avoid the need for pivoting on the VM type?  I.e. don't advertise
support until all VM types play nice.

> Not sure if additional enablement is needed for TDX or not before
> KVM_MEMORY_ATTRIBUTE_PRIVATE would be advertised, but similar
> considerations there.

^ permalink raw reply

* Re: [PATCH RFC v4 25/44] KVM: selftests: Test basic single-page conversion flow
From: Sean Christopherson @ 2026-04-01 21:08 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jroedel, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
	pratyush, suzuki.poulose, aneesh.kumar, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm
In-Reply-To: <CAEvNRgE6Tn81Yddgbjqs-gs491NzpppjbDHKzpmdPCxgSPeUPQ@mail.gmail.com>

On Tue, Mar 31, 2026, Ackerley Tng wrote:
> Ackerley Tng <ackerleytng@google.com> writes:

Please trim your replies (even more, since you did trim a little).

> > +static void run_guest_do_rmw(struct kvm_vcpu *vcpu, loff_t pgoff,
> > +			     char expected_val, char write_val)
> > +{
> > +	struct ucall uc;
> > +	int r;
> > +
> > +	guest_data.mem = (void *)GUEST_MEMFD_SHARING_TEST_GVA + pgoff * page_size;
> > +	guest_data.expected_val = expected_val;
> > +	guest_data.write_val = write_val;
> > +	sync_global_to_guest(vcpu->vm, guest_data);
> > +
> > +	do {
> > +		r = __vcpu_run(vcpu);
> > +	} while (r == -1 && errno == EINTR);
> > +
> > +	TEST_ASSERT_EQ(r, 0);
> 
> TEST_ASSERT_EQ() ends up calling exit() on failures, which skips
> FIXTURE_TEARDOWN().
> 
> Other than the explicit assertions not working with the
> kselftest_harness, kvm selftest library functions like vm_mem_add() also
> call TEST_ASSERT, which doesn't play nice with kselftest_harness.
> 
> Any suggestions for this? Should we use the kselftest framework with
> these tests?
> 
> (I ran into this issue while trying to test something else, where I
> needed FIXTURE_TEARDOWN() to clean up system state.)
> 
> Or is it "okay" in this case since FIXTURE_TEARDOWN() only cleans up
> stuff that would happen if the program exits anyway?

Can you see if any of the ideas in https://lore.kernel.org/all/ZjUwqEXPA5QVItyX@google.com
would help?  Converting more tests to TAP+FIXTURE is still on my wish list, I've
just never been able to carve out cycles to see it through.

^ permalink raw reply

* Re: [PATCH RFC v4 08/44] KVM: Introduce KVM_SET_MEMORY_ATTRIBUTES2
From: Sean Christopherson @ 2026-04-01 21:04 UTC (permalink / raw)
  To: Michael Roth
  Cc: Ackerley Tng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, ira.weiny, jmattson, jroedel, jthoughton, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
	pratyush, suzuki.poulose, aneesh.kumar, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm
In-Reply-To: <cxmtst7txfodp6lo4ipue3trohx2ge7nkagkfzfixfdnlf5qlo@e3jw3tmbe272>

On Tue, Mar 31, 2026, Michael Roth wrote:
> On Thu, Mar 26, 2026 at 03:24:17PM -0700, Ackerley Tng wrote:
> > Introduce a "version 2" of KVM_SET_MEMORY_ATTRIBUTES to support returning
> > information back to userspace.
> 
> Hi Ackerley,
> 
> Not trying to bikeshed below, but I'm working on getting related QEMU
> patches cleaned up to post soon and was working through some of the new
> uAPI bits, and plumbing some of these capabilities in seems a little
> awkward in a couple places so wondering if we should revisit how some of
> this API is defined...
> 
> > 
> > This new ioctl and structure will, in a later patch, be shared as a
> > guest_memfd ioctl, where the padding in the new kvm_memory_attributes2
> > structure will be for writing the response from the guest_memfd ioctl to
> > userspace.
> > 
> > A new ioctl is necessary for these reasons:
> > 
> > 1. KVM_SET_MEMORY_ATTRIBUTES is currently a write-only ioctl and does not
> >    allow userspace to read fields. There's nothing in code (yet?) that
> >    validates this, but using _IOWR for consistency would be prudent.
> > 
> > 2. KVM_SET_MEMORY_ATTRIBUTES, when used as a guest_memfd ioctl, will need
> >    an additional field to provide userspace with more error details.
> > 
> > Alternatively, a completely new ioctl could be defined, unrelated to
> > KVM_SET_MEMORY_ATTRIBUTES, but using the same ioctl number and struct for
> > the vm and guest_memfd ioctls streamlines the interface for userspace. In
> > addition, any memory attributes, implemented on the vm or guest_memfd
> > ioctl, can be easily shared with the other.
> > 
> > Add KVM_CAP_MEMORY_ATTRIBUTES2 to indicate that struct
> > kvm_memory_attributes2 exists and can be used either with
> > KVM_SET_MEMORY_ATTRIBUTES2 via the vm or guest_memfd ioctl.
> 
> The guest_memfd support for the KVM_SET_MEMORY_ATTRIBUTES2 ioctl isn't
> added until patch #10, and to scan for it you sort of need to infer it
> via KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES reporting non-zero (i.e.
> KVM_MEMORY_ATTRIBUTE_PRIVATE), so it's confusing to state that
> KVM_CAP_MEMORY_ATTRIBUTES2 means you can use the struct via a guest_memfd
> ioctl.
> 
> I think the above is trying to simply say that the corresponding struct
> exists, and remain agnostic about how it can be used. But if that were
> the case, there would be no way to know when KVM_SET_MEMORY_ATTRIBUTES2 is
> available in the first place, so in the case of KVM ioctls at least,
> KVM_CAP_MEMORY_ATTRIBUTES2 is advertising both the struct and the ioctl,
> whereas for guest_memfd it's only advertising the struct and not saying
> anything about whether a similar gmem ioctl is available to use it.

+1 to everything Mike said.

> Instead, maybe they should both have the same semantics:
> 
>   KVM_CAP_MEMORY_ATTRIBUTES2: *SET_ATTRIBUTES* ioctl exists for KVM that utilizes
>     struct kvm_memory_attributes2
> 
>   KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES: *SET_ATTRIBUTES* ioctl exists for
>     guest_memfd that utilizes struct kvm_memory_attributes2
> 
> In which case you would leave out any mention of guest_memfd here as far as
> the documentation does, and then in patch #10 you could modify it to be
> something like:
> 
>    4.145 KVM_SET_MEMORY_ATTRIBUTES2
>    ---------------------------------
> 
>   -:Capability: KVM_CAP_MEMORY_ATTRIBUTES2
>   +:Capability: KVM_CAP_MEMORY_ATTRIBUTES2, KVM_GUEST_MEMFD_CAP_MEMORY_ATTRIBUTES
>   -:Architectures: x86
>   +:Architectures: all
>   -:Type: vm ioctl
>   +:Type: vm, guest_memfd ioctl
>    :Parameters: struct kvm_memory_attributes2 (in/out)
>    :Returns: 0 on success, <0 on error

As discussed at PUCK, I think we should omit KVM_CAP_MEMORY_ATTRIBUTES2 and
vm-scoped support entirely until it's needed (which may be never).

> and *then* add in your mentions of how the usage/fields differ for
> guest_memfd/KVM_GUEST_MEMFD_CAP_MEMORY_ATTRIBUTES case vs. KVM ioctls.
> 
> This avoids needing to issue 2 checks for the guest_memfd variant vs. 1
> for KVM, but more importantly avoids subtle differences in how these
> similarly-named capabilities are used/documented that might cause
> unecessary confusion.

^ permalink raw reply

* Re: [PATCH 3/5] compiler_attributes: Add overflow_behavior macros __ob_trap and __ob_wrap
From: Kees Cook @ 2026-04-01 20:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Justin Stitt, Miguel Ojeda, Marco Elver, Andrey Konovalov,
	Andrey Ryabinin, Jonathan Corbet, Shuah Khan, Miguel Ojeda,
	Nathan Chancellor, kasan-dev, linux-doc, llvm, Linus Torvalds,
	Nicolas Schier, Arnd Bergmann, Greg Kroah-Hartman, Andrew Morton,
	linux-kernel, linux-hardening, linux-kbuild
In-Reply-To: <20260401203053.GC3254421@noisy.programming.kicks-ass.net>

On Wed, Apr 01, 2026 at 10:30:53PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 01, 2026 at 01:21:17PM -0700, Kees Cook wrote:
> > On Wed, Apr 01, 2026 at 11:08:15AM +0200, Peter Zijlstra wrote:
> > > On Tue, Mar 31, 2026 at 12:52:10PM -0700, Kees Cook wrote:
> > > 
> > > > I think for this series, __ob_trap/__ob_wrap is what should be used.
> > > > 
> > > > And for other folks, the background here is that we originally wanted
> > > > to use macros for "__trap" and "__wrap", but the powerpc C compiler
> > > > (both Clang and GCC) have a builtin macro named "__trap" already. So
> > > > I switched to just using the Clang-native type qualifier. We can use
> > > > the attribute style too, but there was a lot of confusion during the
> > > > Clang development phases where people kept forgetting this was a type
> > > > qualifier, not an attribute (i.e. the attribute is an internal alias
> > > > for the qualifier, and the qualifier is a new type).
> > > 
> > > Since you mention qualifiers...
> > > 
> > > What is the result of __typeof_unqual__(int __ob_trap) ?
> > 
> > Hmm, it seems like "const" doesn't get peeled off. That can be fixed, if
> > that's needed?
> > 
> > 'typeof_unqual(int)' (aka 'int')
> > 'typeof_unqual(__ob_trap int)' (aka '__ob_trap int')
> > 'typeof_unqual(const int)' (aka 'int')
> > 'typeof_unqual(__ob_trap const int)' (aka '__ob_trap const int')
> 
> So how can something be called a qualifier if unqual doesn't strip it?
> 
> (We might already have had this discussion, but I can't find the answer
> in the LLVM documentation page and didn't search our previous
> correspondence on this).

I'll let Justin answer this correctly, but I suspect I'm using the wrong
name for things. I think I should have said "keyword"? The mechanism
produces a distinct type, so it's not actually a "qualifier", but the
intent is that they are distinct types (this came from a long long
discussion with the LLVM devs, etc), but that they have implicit casts
to the non-obt scalars, but Justin knows more on this.

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH 5/5] types: Add standard __ob_trap and __ob_wrap scalar types
From: Kees Cook @ 2026-04-01 20:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Justin Stitt, Miguel Ojeda, Nathan Chancellor,
	Andrew Morton, Andy Shevchenko, Arnd Bergmann, Mark Rutland,
	Matthew Wilcox (Oracle), Suren Baghdasaryan, Thomas Gleixner,
	Finn Thain, Geert Uytterhoeven, Thomas Weißschuh, llvm,
	Marco Elver, Jonathan Corbet, Nicolas Schier, Greg Kroah-Hartman,
	linux-kernel, kasan-dev, linux-hardening, linux-doc, linux-kbuild
In-Reply-To: <20260401083137.GT3738786@noisy.programming.kicks-ass.net>

On Wed, Apr 01, 2026 at 10:31:37AM +0200, Peter Zijlstra wrote:
> On Tue, Mar 31, 2026 at 01:31:16PM -0700, Kees Cook wrote:
> 
> (still slowly digesting the thread)
> 
> > Yeah, as you mentioned earlier, I'd agree that nesting is rarely
> > useful. The only thing I'd want to be careful about is ordering/scope. I
> > *think* it would just operate as a "goto" and things like the cleanup.h
> > handlers wouldn't be involved: they operate when a scope is crossed
> > like before. And I think the overflow result wouldn't be represented
> > anywhere. i.e. the wrapped/truncated value wouldn't be stored:
> > 
> > int func()
> > {
> > 	...
> > 	u8 __ob_trap product = 5;
> > 	...
> > 	product = a * b; // if store is truncated, goto __overflow
> > 	...
> > 	return product;
> > 
> > __overflow:
> > 	pr_info("%u\n", product); // shows "5"
> > 	return -1;
> > }
> 
> Note that there is a 'fun' problem with this in combination with
> cleanup.h.
> 
> Something like:
> 
> int func()
> {
> 	u8 __ob_trap prod = 0;
> 
> 	scoped_guard (mutex, &my_lock) {
> 		prod = a * b;
> 	}
> 
> 	return prod;
> 
> __overflow:
> 	// whatever
> 	return -1;
> }
> 
> is fine. *HOWEVER*, something like:
> 
> int func()
> {
> 	int __ob_trap size = base + count * extra;
> 	int err;
> 
> 	struct my_obj *obj __cleanup(kfree) = kzalloc(size, GFP_KERNEL);
> 
> 	err = my_obj_init(obj);
> 	if (err)
> 		return ERR_PTR(err);
> 
> 	return_ptr(obj);
> 
> __overflow:
> 	// what now..
> 	return NULL;
> }
> 
> is most terribly broken. Specifically, the goto will jump into the scope
> of obj -- and that is not allowed.

Right, this has been my primary concern about having an implicit "goto"
sprinkled basically anywhere into the code flow. However, it does seem
that initialization checking is aware of the problem:

void func(void)
{
        unsigned long __ob_trap value = ({ goto weird; 256; });
        size_t outcome = 99;

        outcome = get_outcome();

	pr_info("outcome: %zu\n", outcome);
        return;
weird:
        pr_info("value: %lu\n", value);
        pr_info("outcome: %zu\n", outcome);
}

../drivers/misc/lkdtm/bugs.c:1059:35: warning: variable 'outcome' is uninitialized when used here [-Wuninitialized]
 1059 |         pr_info("outcome: %zu\n", outcome);
      |                                   ^~~~~~~
../drivers/misc/lkdtm/bugs.c:1021:2: note: variable 'outcome' is declared here
 1021 |         size_t outcome = 99;
      |         ^
../drivers/misc/lkdtm/bugs.c:1058:33: warning: variable 'value' is uninitialized when used here [-Wuninitialized]
 1058 |         pr_info("value: %lu\n", value);
      |                                 ^~~~~
../drivers/misc/lkdtm/bugs.c:1020:2: note: variable 'value' is declared here
 1020 |         unsigned long __ob_trap value = ({ goto weird; 256; });
      |         ^

But most importantly, if I add a cleanup after it, it gets rejected:

        unsigned long __ob_trap value = ({ goto weird; 256; });
        size_t outcome = 99;
        u8 *obj __cleanup(kfree) = kzalloc(33, GFP_KERNEL);
	...

../drivers/misc/lkdtm/bugs.c:1021:37: error: cannot jump from this goto statement to its label
 1021 |         unsigned long __ob_trap value = ({ goto weird; 256; });
      |                                            ^
../drivers/misc/lkdtm/bugs.c:1023:6: note: jump bypasses initialization of variable with __attribute__((cleanup))
 1023 |         u8 *obj __cleanup(kfree) = kzalloc(33, GFP_KERNEL);
      |             ^


(Though I would note that GCC does _not_ refuse the jump when there is a
cleanup; it only see the other two uninitialized values.)

So that makes it not totally broken, but it does make it a bit fragile.



Another concern I have is dealing with older compilers and how to
"hide" the label and its code. e.g. if I remove the "goto" from above:

../drivers/misc/lkdtm/bugs.c:1060:1: warning: label 'weird' defined but not used [-Wunused-label]
 1060 | weird:
      | ^~~~~

Oddly, the unreachable code isn't a problem, so we could just wrap the
label is some macro like:

#define force_label(x) if (0) goto x; x

force_label(weird):
        pr_info("value: %lu\n", value);
        pr_info("outcome: %zu\n", outcome);


-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH 3/5] compiler_attributes: Add overflow_behavior macros __ob_trap and __ob_wrap
From: Peter Zijlstra @ 2026-04-01 20:30 UTC (permalink / raw)
  To: Kees Cook
  Cc: Justin Stitt, Miguel Ojeda, Marco Elver, Andrey Konovalov,
	Andrey Ryabinin, Jonathan Corbet, Shuah Khan, Miguel Ojeda,
	Nathan Chancellor, kasan-dev, linux-doc, llvm, Linus Torvalds,
	Nicolas Schier, Arnd Bergmann, Greg Kroah-Hartman, Andrew Morton,
	linux-kernel, linux-hardening, linux-kbuild
In-Reply-To: <202604011313.AD471BC8@keescook>

On Wed, Apr 01, 2026 at 01:21:17PM -0700, Kees Cook wrote:
> On Wed, Apr 01, 2026 at 11:08:15AM +0200, Peter Zijlstra wrote:
> > On Tue, Mar 31, 2026 at 12:52:10PM -0700, Kees Cook wrote:
> > 
> > > I think for this series, __ob_trap/__ob_wrap is what should be used.
> > > 
> > > And for other folks, the background here is that we originally wanted
> > > to use macros for "__trap" and "__wrap", but the powerpc C compiler
> > > (both Clang and GCC) have a builtin macro named "__trap" already. So
> > > I switched to just using the Clang-native type qualifier. We can use
> > > the attribute style too, but there was a lot of confusion during the
> > > Clang development phases where people kept forgetting this was a type
> > > qualifier, not an attribute (i.e. the attribute is an internal alias
> > > for the qualifier, and the qualifier is a new type).
> > 
> > Since you mention qualifiers...
> > 
> > What is the result of __typeof_unqual__(int __ob_trap) ?
> 
> Hmm, it seems like "const" doesn't get peeled off. That can be fixed, if
> that's needed?
> 
> 'typeof_unqual(int)' (aka 'int')
> 'typeof_unqual(__ob_trap int)' (aka '__ob_trap int')
> 'typeof_unqual(const int)' (aka 'int')
> 'typeof_unqual(__ob_trap const int)' (aka '__ob_trap const int')

So how can something be called a qualifier if unqual doesn't strip it?

(We might already have had this discussion, but I can't find the answer
in the LLVM documentation page and didn't search our previous
correspondence on this).


^ permalink raw reply

* Re: [PATCH 5/5] types: Add standard __ob_trap and __ob_wrap scalar types
From: Kees Cook @ 2026-04-01 20:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Justin Stitt, Miguel Ojeda, Nathan Chancellor,
	Andrew Morton, Andy Shevchenko, Arnd Bergmann, Mark Rutland,
	Matthew Wilcox (Oracle), Suren Baghdasaryan, Thomas Gleixner,
	Finn Thain, Geert Uytterhoeven, Thomas Weißschuh, llvm,
	Marco Elver, Jonathan Corbet, Nicolas Schier, Greg Kroah-Hartman,
	linux-kernel, kasan-dev, linux-hardening, linux-doc, linux-kbuild
In-Reply-To: <20260401085706.GU3738786@noisy.programming.kicks-ass.net>

On Wed, Apr 01, 2026 at 10:57:06AM +0200, Peter Zijlstra wrote:
> On Tue, Mar 31, 2026 at 01:31:16PM -0700, Kees Cook wrote:
> 
> > int func()
> > {
> > 	...
> > 	u8 __ob_trap product = 5;
> > 	...
> > 	product = a * b; // if store is truncated, goto __overflow
> > 	...
> > 	return product;
> > 
> > __overflow:
> > 	pr_info("%u\n", product); // shows "5"
> 
> I'm confused by this 'product is still 5' thing. It seems to me that
> making this happen will, in general, require more instructions/registers
> than allowing the old value to be clobbered and have product be the
> truncated result of whatever overflow.

Yeah, that's fair. That's what happens to "out" already with the existing
check_{op}_overflow(a, b, &out) builtins, and what happens right now
with the "while (var--)" exception (var will wrap even as an __ob_trap).

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH 3/5] compiler_attributes: Add overflow_behavior macros __ob_trap and __ob_wrap
From: Kees Cook @ 2026-04-01 20:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Justin Stitt, Miguel Ojeda, Marco Elver, Andrey Konovalov,
	Andrey Ryabinin, Jonathan Corbet, Shuah Khan, Miguel Ojeda,
	Nathan Chancellor, kasan-dev, linux-doc, llvm, Linus Torvalds,
	Nicolas Schier, Arnd Bergmann, Greg Kroah-Hartman, Andrew Morton,
	linux-kernel, linux-hardening, linux-kbuild
In-Reply-To: <20260401090815.GV3738786@noisy.programming.kicks-ass.net>

On Wed, Apr 01, 2026 at 11:08:15AM +0200, Peter Zijlstra wrote:
> On Tue, Mar 31, 2026 at 12:52:10PM -0700, Kees Cook wrote:
> 
> > I think for this series, __ob_trap/__ob_wrap is what should be used.
> > 
> > And for other folks, the background here is that we originally wanted
> > to use macros for "__trap" and "__wrap", but the powerpc C compiler
> > (both Clang and GCC) have a builtin macro named "__trap" already. So
> > I switched to just using the Clang-native type qualifier. We can use
> > the attribute style too, but there was a lot of confusion during the
> > Clang development phases where people kept forgetting this was a type
> > qualifier, not an attribute (i.e. the attribute is an internal alias
> > for the qualifier, and the qualifier is a new type).
> 
> Since you mention qualifiers...
> 
> What is the result of __typeof_unqual__(int __ob_trap) ?

Hmm, it seems like "const" doesn't get peeled off. That can be fixed, if
that's needed?

'typeof_unqual(int)' (aka 'int')
'typeof_unqual(__ob_trap int)' (aka '__ob_trap int')
'typeof_unqual(const int)' (aka 'int')
'typeof_unqual(__ob_trap const int)' (aka '__ob_trap const int')

-Kees

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH v5] PCI: s390: Expose the UID as an arch specific PCI slot attribute
From: Randy Dunlap @ 2026-04-01 20:20 UTC (permalink / raw)
  To: Niklas Schnelle, Bjorn Helgaas, Jonathan Corbet, Lukas Wunner,
	Shuah Khan
  Cc: Alexander Gordeev, Christian Borntraeger, Gerald Schaefer,
	Gerd Bayer, Heiko Carstens, Julian Ruess, Matthew Rosato,
	Peter Oberparleiter, Ramesh Errabolu, Sven Schnelle,
	Vasily Gorbik, linux-doc, linux-kernel, linux-pci, linux-s390
In-Reply-To: <20260401-uid_slot-v5-1-e73036c74bf6@linux.ibm.com>



On 4/1/26 7:50 AM, Niklas Schnelle wrote:
> On s390, an individual PCI function can generally be identified by two
> identifiers, the FID and the UID. Which identifier is used depends on
> the scope and the platform configuration.
> 
> The first identifier, the FID, is always available and identifies a PCI
> device uniquely within a machine. The FID may be virtualized by
> hypervisors, but on the LPAR level, the machine scope makes it
> impossible to create the same configuration based on FIDs on two
> different LPARs of the same machine, and difficult to reuse across
> machines.
> 
> Such matching LPAR configurations are useful, though, allowing
> standardized setups and booting a Linux installation on different LPARs.
> To this end the UID, or user-defined identifier, was introduced. While
> it is only guaranteed to be unique within an LPAR and only if indicated
> by firmware, it allows users to replicate PCI device setups.
> 
> On s390, which uses a machine hypervisor, a per PCI function hotplug
> model is used. The shortcoming with the UID then is, that it is not
> visible to the user without first attaching the PCI function and
> accessing the "uid" device attribute. The FID, on the other hand, is
> used as the slot name and is thus known even with the PCI function in
> standby.
> 
> Remedy this shortcoming by providing the UID as an attribute on the slot
> allowing the user to identify a PCI function based on the UID without
> having to first attach it. Do this via a macro mechanism analogous to
> what was introduced by commit 265baca69a07 ("s390/pci: Stop usurping
> pdev->dev.groups") for the PCI device attributes.
> 
> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
> Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> ---
> Note: I considered adding the UID as a generic slot index attribute
> analogous to the PCI device index attribute (SMBIOS index / s390 UID)
> but decided against it as this seems rather s390 specific.
> 
> v4->v5:
> - Rebase on v7.0-rc6
> - Reworded note
> - Add documentation for the new attribute
> - Link to v4: https://lore.kernel.org/r/20251217-uid_slot-v4-1-559ceb59ba69@linux.ibm.com
> 
> v3->v4:
> - Rebase on v6.19-rc1
> - Collect R-bs
> - Link to v3: https://lore.kernel.org/r/20251015-uid_slot-v3-1-44389895c1bb@linux.ibm.com
> ---
> 
> ---
>  Documentation/arch/s390/pci.rst |  7 +++++++
>  arch/s390/include/asm/pci.h     |  4 ++++
>  arch/s390/pci/pci_sysfs.c       | 20 ++++++++++++++++++++
>  drivers/pci/slot.c              | 13 ++++++++++++-
>  4 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/arch/s390/pci.rst b/Documentation/arch/s390/pci.rst
> index d5755484d8e75c7bf67a350e61bbe04f0452a2fa..03afb57ece4df90a75597cb34c1f048c2e162b67 100644
> --- a/Documentation/arch/s390/pci.rst
> +++ b/Documentation/arch/s390/pci.rst
> @@ -55,6 +55,13 @@ Entries specific to zPCI functions and entries that hold zPCI information.
>  
>    - /sys/bus/pci/slots/XXXXXXXX/power
>  
> +  In addition to using the FID as the name of the slot the slot directory
> +  also contains the following s390 specific slot attributes

                                 s390-specific

Use a period (full stop, '.') at the end of the sentence above.

> +
> +  - uid
> +    The User-defined identifier (UID) of the function which may be configured
> +    by this slot. See also the corresponding attribute of the device.

These 3 lines are run together in html output. Maybe add a ':' or '-' after
uid?

> +>    A physical function that currently supports a virtual function cannot be
>    powered off until all virtual functions are removed with:
>    echo 0 > /sys/bus/pci/devices/XXXX:XX:XX.X/sriov_numvf


-- 
~Randy


^ permalink raw reply

* Re: [PATCH net-next V9 02/14] devlink: Add helpers to lock nested-in instances
From: Jacob Keller @ 2026-04-01 20:18 UTC (permalink / raw)
  To: Cosmin Ratiu, Tariq Toukan, kuba@kernel.org
  Cc: allison.henderson@oracle.com, jiri@resnulli.us, Moshe Shemesh,
	davem@davemloft.net, daniel.zahka@gmail.com,
	donald.hunter@gmail.com, netdev@vger.kernel.org,
	matttbe@kernel.org, pabeni@redhat.com, horms@kernel.org,
	Parav Pandit, corbet@lwn.net, razor@blackwall.org, Dragos Tatulea,
	linux-kernel@vger.kernel.org, willemb@google.com, Jiri Pirko,
	Adithya Jayachandran, Dan Jurgens, leon@kernel.org,
	kees@kernel.org, vadim.fedorenko@linux.dev, Saeed Mahameed,
	shuah@kernel.org, andrew+netdev@lunn.ch, Mark Bloch,
	Shahar Shitrit, Carolina Jubran, Nimrod Oren,
	daniel@iogearbox.net, minhquangbui99@gmail.com, dw@davidwei.uk,
	skhan@linuxfoundation.org, Petr Machata, edumazet@google.com,
	antonio@openvpn.net, mst@redhat.com,
	linux-kselftest@vger.kernel.org, linux-rdma@vger.kernel.org,
	Shay Drori, sdf@fomichev.me, chuck.lever@oracle.com, Gal Pressman,
	joe@dama.to, linux-doc@vger.kernel.org
In-Reply-To: <4fcbf36799b5bfd5c0b68b0127f4f67aef00fdde.camel@nvidia.com>

On 4/1/2026 3:22 AM, Cosmin Ratiu wrote:
> On Tue, 2026-03-31 at 16:55 -0700, Jacob Keller wrote:
>> On 3/31/2026 5:20 AM, Cosmin Ratiu wrote:
>>> On Mon, 2026-03-30 at 19:08 -0700, Jakub Kicinski wrote:
>>>> This is an AI-generated review of your patch. The human sending
>>>> this
>>>> email has considered the AI review valid, or at least plausible.
>>>> ---
>>>> devlink: Add helpers to lock nested-in instances
>>>>
>>>> Upcoming code will need to obtain a reference to locked nested-in
>>>> devlink instances. Add helpers to lock, obtain an already locked
>>>> reference and unlock/unref the nested-in instance.
>>>>
>>>>> diff --git a/net/devlink/core.c b/net/devlink/core.c
>>>>> index eeb6a71f5f56e..db11248df7120 100644
>>>>> --- a/net/devlink/core.c
>>>>> +++ b/net/devlink/core.c
>>>>
>>>> [ ... ]
>>>>
>>>>> +/* Returns the nested in devlink object and validates its lock
>>>>> is
>>>>> held. */
>>>>> +struct devlink *devlink_nested_in_get_locked(struct
>>>>> devlink_rel
>>>>> *rel)
>>>>> +{
>>>>> +	struct devlink *devlink;
>>>>> +	unsigned long index;
>>>>> +
>>>>> +	if (!rel)
>>>>> +		return NULL;
>>>>> +	index = rel->nested_in.devlink_index;
>>>>> +	devlink = xa_find(&devlinks, &index, index,
>>>>> DEVLINK_REGISTERED);
>>>>> +	if (devlink)
>>>>> +		devl_assert_locked(devlink);
>>>>
>>>> Can this dereference a freed pointer if the lock is not actually
>>>> held?
>>>>
>>>> xa_find() acquires and releases the RCU read lock internally.
>>>> When it
>>>> returns,
>>>> the devlink pointer is no longer protected by RCU or a reference
>>>> count.
>>>>
>>>> If a caller invokes this helper without holding the devlink lock
>>>> (which is the
>>>> exact scenario the assertion is designed to detect), another
>>>> thread
>>>> could
>>>> concurrently unregister and free the devlink.
>>>>
>>>> If the RCU grace period expires before the assertion executes,
>>>> devl_assert_locked() would dereference freed memory. Should the
>>>> lookup and
>>>> assertion be enclosed within an explicit rcu_read_lock() block to
>>>> safely
>>>> assert the lock?
>>>
>>> This function is meant to be invoked by callers already holding all
>>> locks from a given instance up the nesting tree. Calling it outside
>>> this context could indeed lead to a race as described, where
>>> another
>>> entity unregisters a devlink about-to-be-asserted on.
>>>
>>
>> Hmm. I'm struggling to follow this. If you already expect the parent
>> to
>> hold the nested devlink's lock, it must have a pointer to this
>> devlink
>> instance. In that case, why would you even need
>> devlink_nested_in_get_locked in the first place?
> 
> After some more intense staring, I realized that intermediate instances
> don't actually need to be locked, only the ancestor needs to. With that
> in mind, the code get simplified:
> - devlink_nested_in_get_locked and devlink_nested_in_put_unlock can be
> removed.
> - recursive unlocking in devl_rate_unlock is gone.
> 
That seems like it would be better, as long as we can prove correctness
of every access. Thanks!

-Jake

^ permalink raw reply

* [linux-next:master 1471/11049] htmldocs: Warning: Documentation/devicetree/bindings/mfd/motorola-cpcap.txt references a file that doesn't exist: Documentation/devicetree/bindings/regulator/cpcap-regulator.txt
From: kernel test robot @ 2026-04-01 20:14 UTC (permalink / raw)
  To: Svyatoslav Ryhel; +Cc: oe-kbuild-all, Mark Brown, Rob Herring (Arm), linux-doc

Hi Svyatoslav,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head:   bd0f139e5fc11182777b81cefc3893ea508544ec
commit: 5a8ffc5dca9c096fe9c8879fa3a2faff723fbb8a [1471/11049] regulator: dt-bindings: cpcap-regulator: convert to DT schema
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
docutils: docutils (Docutils 0.21.2, Python 3.13.5, on linux)
reproduce: (https://download.01.org/0day-ci/archive/20260401/202604012234.rkuyW7h8-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604012234.rkuyW7h8-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Warning: Documentation/devicetree/bindings/mfd/motorola-cpcap.txt references a file that doesn't exist: Documentation/devicetree/bindings/regulator/cpcap-regulator.txt
   Warning: Documentation/devicetree/bindings/mfd/motorola-cpcap.txt references a file that doesn't exist: Documentation/devicetree/bindings/rtc/cpcap-rtc.txt
>> Warning: Documentation/devicetree/bindings/regulator/motorola,cpcap-regulator.yaml references a file that doesn't exist: Documentation/devicetree/bindings/mfd/motorola,cpcap.yaml
   Warning: Documentation/devicetree/bindings/regulator/siliconmitus,sm5703-regulator.yaml references a file that doesn't exist: Documentation/devicetree/bindings/mfd/siliconmitus,sm5703.yaml
   Warning: Documentation/devicetree/bindings/rtc/motorola,cpcap-rtc.yaml references a file that doesn't exist: Documentation/devicetree/bindings/mfd/motorola,cpcap.yaml
   Warning: Documentation/doc-guide/parse-headers.rst references a file that doesn't exist: Documentation/userspace-api/media/Makefile
   Warning: Documentation/leds/leds-lp5812.rst references a file that doesn't exist: Documentation/ABI/testing/sysfs-class-led-multicolor.rst
   Warning: Documentation/translations/it_IT/doc-guide/parse-headers.rst references a file that doesn't exist: Documentation/userspace-api/media/Makefile

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH 3/5] compiler_attributes: Add overflow_behavior macros __ob_trap and __ob_wrap
From: Kees Cook @ 2026-04-01 19:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vincent Mailhol, Justin Stitt, Marco Elver, Andrey Konovalov,
	Andrey Ryabinin, Jonathan Corbet, Shuah Khan, Miguel Ojeda,
	Nathan Chancellor, kasan-dev, linux-doc, llvm, Linus Torvalds,
	Nicolas Schier, Arnd Bergmann, Greg Kroah-Hartman, Andrew Morton,
	linux-kernel, linux-hardening, linux-kbuild
In-Reply-To: <20260401092027.GW3738786@noisy.programming.kicks-ass.net>

On Wed, Apr 01, 2026 at 11:20:27AM +0200, Peter Zijlstra wrote:
> On Wed, Apr 01, 2026 at 09:19:51AM +0200, Vincent Mailhol wrote:
> > Le 31/03/2026 à 18:37, Kees Cook a écrit :
> 
> > > +  - Saturate (explicitly hold the maximum or minimum representable value)
> > 
> > I just wanted to ask how much consideration was put into this last
> > "saturate" option.
> > 
> > When speaking of "safe" as in "functional safety" this seems a good
> > option to me. The best option is of course proper handling, but as
> > discussed, we are speaking of the scenario in which the code is already
> > buggy and which is the fallout option doing the least damage.
> > 
> > What I have in mind is a new __ob_saturate type qualifier. Something like:
> > 
> > 	void foo(int num)
> > 	{
> > 		int __ob_saturate saturate_var = num;
> > 	
> > 		saturate_var += 42;
> > 	}
> > 
> > would just print a warning and continue execution, thus solving the
> > trapping issue. The above code would generate something equivalent to that:
> > 
> > 	void foo(int num)
> > 	{
> > 		int __ob_saturate saturate_var = num;
> > 	
> > 		if (check_add_overflow(saturate_var, increment,
> > 				       &saturate_var) {
> > 			WARN(true, "saturation occurred");
> > 			saturate_var = type_max(saturate_var);
> > 	}
> 
> So I would like to second this option as being interesting.
> 
> But while pondering it, I did want to note that all of the options, with
> the exception of __ob_wrap (which is effectively what we have today for
> *everything*), will be 'interesting' to compose with _Atomic, another
> one of these qualifiers.
> 
> Now, in the kernel we don't use _Atomic, so strictly speaking I don't
> care ;-) But here goes...
> 
> Something like _Atomic int __ob_wrap, is trivial and good.
> 
> _Atomic int __ob_trap is either doable or impossible depending on how
> you define the result to be on 'trap'. Specifically, the semantics
> proposed where it keeps the old value makes it impossible.
> 
> And _Atomic int __ob_saturate is equally 'challenging', since the
> fundamental thing of 'reset to min/max on under/over-flow' is rather
> a non-atomic kind of thing. Look at the trouble we went through with
> refcount_t to sort of make this work.

Yeah, this is mainly why we didn't spend time working on an
__ob_saturate implementation: the primary place we want it in Linux is
already solved with all the refcount_t work. That said, if the behavior
could be replicated using a future __ob_saturate, that would be very
nice. :)

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH 3/5] compiler_attributes: Add overflow_behavior macros __ob_trap and __ob_wrap
From: Kees Cook @ 2026-04-01 19:42 UTC (permalink / raw)
  To: Vincent Mailhol
  Cc: Peter Zijlstra, Justin Stitt, Marco Elver, Andrey Konovalov,
	Andrey Ryabinin, Jonathan Corbet, Shuah Khan, Miguel Ojeda,
	Nathan Chancellor, kasan-dev, linux-doc, llvm, Linus Torvalds,
	Nicolas Schier, Arnd Bergmann, Greg Kroah-Hartman, Andrew Morton,
	linux-kernel, linux-hardening, linux-kbuild
In-Reply-To: <bd0a4235-a7f0-4624-802c-aa49a9d13f29@kernel.org>

On Wed, Apr 01, 2026 at 09:19:51AM +0200, Vincent Mailhol wrote:
> Many thanks for this series. Great work and I am ready it with a lot of
> interest!

Yay! Glad to have folks looking at it all.

> I just wanted to ask how much consideration was put into this last
> "saturate" option.
> 
> When speaking of "safe" as in "functional safety" this seems a good
> option to me. The best option is of course proper handling, but as
> discussed, we are speaking of the scenario in which the code is already
> buggy and which is the fallout option doing the least damage.

Right -- harm reduction. :)

> What I have in mind is a new __ob_saturate type qualifier. Something like:
> 
> 	void foo(int num)
> 	{
> 		int __ob_saturate saturate_var = num;
> 	
> 		saturate_var += 42;
> 	}
> 
> would just print a warning and continue execution, thus solving the
> trapping issue. The above code would generate something equivalent to that:
> 
> 	void foo(int num)
> 	{
> 		int __ob_saturate saturate_var = num;
> 	
> 		if (check_add_overflow(saturate_var, increment,
> 				       &saturate_var) {
> 			WARN(true, "saturation occurred");
> 			saturate_var = type_max(saturate_var);
> 	}

Right, yes. Note that __ob_saturate is entirely unimplemented, but we
wanted to leave the door open for other Overflow Behaviors. (It was
tricky enough to even get the semantics worked out from wrap and trap,
so we wanted to get to a distinct first step landed first.)

For the "warn" part with __ob_trap, we borrowed the Sanitizer
infrastructure since architecturally it's in exactly the same places
that __ob_trap needs to be checking, and already has everything
available. In the case of __ob_saturate, it would only be informational.
(Arguably, there should be no "warn" at all, as it's the "expected"
behavior, just like __ob_wrap has no "warn" on wrap-around. But it seems
sensible to me to make that available by enabling the sanitizers too.)

> People using those saturating integers could then later check that the
> value is still in bound.
> 
> This is basically what your size_add() from overflow.h is already doing.
> If an overflow occurred, the allocation the addition does not trap, it
> just saturates and let the allocation functions properly handle the issue.

Right.

> The saturation can neutralize many security attacks and can mitigate
> some safety issues. Think of the Ariane 5 rocket launch: a saturation
> could have prevented the unintended fireworks.
> 
> The caveat I can think of is that the old overflow check pattern becomes
> invalid. Doing:
> 
> 	if (saturate_var + increment < increment)
> 
> is now bogus and would need to be caught if possible by static analysis.
> So those saturating integers will only be usable in newly written code
> and could not be easily retrofitted.

In theory, the "ignored patterns" (or "idiom exclusions") would already
allow this to continue to behave correctly, though it may be worth trying
to figure out if this is "correct" or not.

> > +In the C standard, three basic types can be involved in arithmetic, and each
> > +has a default strategy for solving the overflow problem:
> > +
> > +  - Signed overflow is undefined
> > +  - Unsigned overflow explicitly wraps around
> > +  - Pointer overflow is undefined
> 
> Nitpick: the C standard uses different definitions than yours. In the
> standard:
> 
>   - overflow is *always* undefined
>   - unsigned integer wraparound
>   - signed integer overflow
> 
> The nuance is that in the standard unsigned integers do not overflow,
> they just wraparound.

I guess that's technically true, but for understanding the "overflow
resolution" properties (from a mathematical perspective), the question
is "what happens when a value cannot be represented by the bit pattern
of the storage?" But I think we understand each other here. :)
So given that under C, signed is undefined and unsigned in wraparound,
this is how we ended up phrasing it.

> I am not asking you to change your terminology, but it could be good to
> state in your document that your definition of overflow differs from the
> standard's definition. Maybe a terminology section could help.

I'm open to whatever you think would make this more clear. :)

-Kees

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH v8 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
From: Pawan Gupta @ 2026-04-01 18:52 UTC (permalink / raw)
  To: David Laight
  Cc: Borislav Petkov, x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, David Kaplan, Sean Christopherson, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	Andy Lutomirski, Thomas Gleixner, Ingo Molnar, David Ahern,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Paolo Bonzini,
	Jonathan Corbet, linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf,
	netdev, linux-doc
In-Reply-To: <20260401100200.5b347628@pumpkin>

On Wed, Apr 01, 2026 at 10:02:00AM +0100, David Laight wrote:
> > > As well as swapping %al <-> %ah try changing the outer loop decrement to
> > > 	sub $0x100, %ax
> > > since %al is zero that will set the z flag the same.  
> > 
> > Unfortunately, using "sub $0x100, %ax"(with %al as inner loop) isn't better
> > than just using "sub $1, %ah" in the outer loop:
> > 
> >   Event                     %al inner      + sub %ax       Delta
> >   ----------------------  -------------  -------------  ----------
> >   cycles                    776,775,020    813,372,036     +4.7%
> >   instructions/cycle               1.23           1.17     -4.5%
> >   branch-misses               4,792,502      7,610,323    +58.8%
> >   uops_issued.any           768,019,010    827,465,137     +7.7%
> >   time elapsed                 0.1627s        0.1707s      +4.9%
> 
> That is even more interesting.
> The 'sub %ax' version has more uops and more branch-misses.
> Looks like the extra cost of the %ah access is less than the cost
> of the extra mis-predicted branches.
> 
> Makes me wonder where a version that uses %cl fits?
> (Or use a zero-extending read and %eax/%ecx - likely to be the same.)
> I'll bet 'one beer' that is nearest the 'sub %ax' version.

%cl didn't make a noticeable difference, but ...

    Event                      %al/%ah        %al/%cl        Delta
                             (inner/outer)  (inner/outer)
    ----------------------  -------------  -------------  ----------
    cycles                    776,380,149    778,294,183     +0.2%
    instructions/cycle               1.23           1.22     -0.4%
    branch-misses               4,986,437      5,679,599    +13.9%
    uops_issued.any           773,223,387    765,724,878     -1.0%
    time elapsed                 0.1631s        0.1637s      +0.4%

... there are meaningful gains with 32-bit registers:

    Event                      %al/%ah        %eax/%ecx      Delta
                             (inner/outer)  (inner/outer)
    ----------------------  -------------  -------------  ----------
    cycles                    776,380,149    706,331,177     -9.0%
    instructions/cycle               1.23           1.35     +9.9%
    branch-misses               4,986,437      6,089,306    +22.1%
    uops_issued.any           773,223,387    774,539,522     +0.2%
    time elapsed                 0.1631s        0.1482s      -9.1%

These values are for userspace tests with immediates. Next, I will test how
they perform with memory loads in kernel. Before we finalize these uarch
nuances needs to be tested on a variety of CPUs.

^ permalink raw reply

* [PATCH net-next V4 12/12] devlink: Document resource scope filtering
From: Tariq Toukan @ 2026-04-01 18:49 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Simon Horman, Donald Hunter, Jiri Pirko, Jonathan Corbet,
	Shuah Khan, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Shuah Khan, Chuck Lever, Matthieu Baerts (NGI0),
	Carolina Jubran, Or Har-Toov, Moshe Shemesh, Dragos Tatulea,
	Shahar Shitrit, Daniel Zahka, Jacob Keller, Cosmin Ratiu,
	Parav Pandit, Shay Drori, Adithya Jayachandran, Kees Cook,
	Daniel Jurgens, netdev, linux-kernel, linux-doc, linux-rdma,
	linux-kselftest, Gal Pressman
In-Reply-To: <20260401184947.135205-1-tariqt@nvidia.com>

From: Or Har-Toov <ohartoov@nvidia.com>

Document the scope parameter for devlink resource show, which allows
filtering the dump to device-level or port-level resources only.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../networking/devlink/devlink-resource.rst   | 35 +++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/Documentation/networking/devlink/devlink-resource.rst b/Documentation/networking/devlink/devlink-resource.rst
index 9839c1661315..47eec8f875b4 100644
--- a/Documentation/networking/devlink/devlink-resource.rst
+++ b/Documentation/networking/devlink/devlink-resource.rst
@@ -109,3 +109,38 @@ To show resources for a specific port:
     $ devlink resource show pci/0000:03:00.0/196608
     pci/0000:03:00.0/196608:
       name max_SFs size 128 unit entry dpipe_tables none
+
+Resource Scope Filtering
+========================
+
+When dumping resources for all devices, ``devlink resource show`` accepts
+an optional ``scope`` parameter to restrict the response to device-level
+resources, port-level resources, or both (the default).
+
+To dump only device-level resources across all devices:
+
+.. code:: shell
+
+    $ devlink resource show scope dev
+    pci/0000:03:00.0:
+      name max_local_SFs size 128 unit entry dpipe_tables none
+      name max_external_SFs size 128 unit entry dpipe_tables none
+    pci/0000:03:00.1:
+      name max_local_SFs size 128 unit entry dpipe_tables none
+      name max_external_SFs size 128 unit entry dpipe_tables none
+
+To dump only port-level resources across all devices:
+
+.. code:: shell
+
+    $ devlink resource show scope port
+    pci/0000:03:00.0/196608:
+      name max_SFs size 128 unit entry dpipe_tables none
+    pci/0000:03:00.0/196609:
+      name max_SFs size 128 unit entry dpipe_tables none
+    pci/0000:03:00.1/196708:
+      name max_SFs size 128 unit entry dpipe_tables none
+    pci/0000:03:00.1/196709:
+      name max_SFs size 128 unit entry dpipe_tables none
+
+Note that port-level resources are read-only.
-- 
2.44.0


^ permalink raw reply related

* [PATCH net-next V4 11/12] selftest: netdevsim: Add resource dump and scope filter test
From: Tariq Toukan @ 2026-04-01 18:49 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Simon Horman, Donald Hunter, Jiri Pirko, Jonathan Corbet,
	Shuah Khan, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Shuah Khan, Chuck Lever, Matthieu Baerts (NGI0),
	Carolina Jubran, Or Har-Toov, Moshe Shemesh, Dragos Tatulea,
	Shahar Shitrit, Daniel Zahka, Jacob Keller, Cosmin Ratiu,
	Parav Pandit, Shay Drori, Adithya Jayachandran, Kees Cook,
	Daniel Jurgens, netdev, linux-kernel, linux-doc, linux-rdma,
	linux-kselftest, Gal Pressman
In-Reply-To: <20260401184947.135205-1-tariqt@nvidia.com>

From: Or Har-Toov <ohartoov@nvidia.com>

Add resource_dump_test() which verifies dumping resources for all
devices and ports, and tests that scope=dev returns only device-level
resources and scope=port returns only port resources.

Skip if userspace does not support the scope parameter.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../drivers/net/netdevsim/devlink.sh          | 52 ++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
index 2e63d02fae4b..8118cc211590 100755
--- a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
@@ -5,7 +5,7 @@ lib_dir=$(dirname $0)/../../../net/forwarding
 
 ALL_TESTS="fw_flash_test params_test  \
 	   params_default_test regions_test reload_test \
-	   netns_reload_test resource_test \
+	   netns_reload_test resource_test resource_dump_test \
 	   port_resource_doit_test dev_info_test \
 	   empty_reporter_test dummy_reporter_test rate_test"
 NUM_NETIFS=0
@@ -483,6 +483,56 @@ resource_test()
 	log_test "resource test"
 }
 
+resource_dump_test()
+{
+	RET=0
+
+	local port_jq
+	local dev_jq
+	local dl_jq
+	local count
+
+	dl_jq="with_entries(select(.key | startswith(\"$DL_HANDLE\")))"
+	port_jq="[.[] | $dl_jq | keys |"
+	port_jq+=" map(select(test(\"/.+/\"))) | length] | add"
+	dev_jq="[.[] | $dl_jq | keys |"
+	dev_jq+=" map(select(test(\"/.+/\")|not)) | length] | add"
+
+	if ! devlink resource help 2>&1 | grep -q "scope"; then
+		echo "SKIP: devlink resource show not supported"
+		return
+	fi
+
+	devlink resource show > /dev/null 2>&1
+	check_err $? "Failed to dump all resources"
+
+	count=$(cmd_jq "devlink resource show -j" "$port_jq")
+	[ "$count" -gt "0" ]
+	check_err $? "missing port resources in resource dump"
+
+	count=$(cmd_jq "devlink resource show -j" "$dev_jq")
+	[ "$count" -gt "0" ]
+	check_err $? "missing device resources in resource dump"
+
+	count=$(cmd_jq "devlink resource show scope dev -j" "$dev_jq")
+	[ "$count" -gt "0" ]
+	check_err $? "dev scope missing device resources"
+
+	count=$(cmd_jq "devlink resource show scope dev -j" "$port_jq")
+	[ "$count" -eq "0" ]
+	check_err $? "dev scope returned port resources"
+
+	count=$(cmd_jq "devlink resource show scope port -j" "$port_jq")
+	[ "$count" -gt "0" ]
+	check_err $? "port scope missing port resources"
+
+	count=$(cmd_jq "devlink resource show scope port -j" "$dev_jq")
+	[ "$count" -eq "0" ]
+	check_err $? "port scope returned device resources"
+
+	log_test "resource dump test"
+}
+
 info_get()
 {
 	local name=$1
-- 
2.44.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox