Linux Documentation


* Re: [PATCH v10 21/21] gpu: nova-core: Use runtime BAR1 size instead of hardcoded 256MB
From: Joel Fernandes @ 2026-04-15 20:23 UTC
  To: Eliot Courtney
  Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Dave Airlie, Daniel Almeida, Koen Koning,
	dri-devel, rust-for-linux, Nikola Djukic, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jonathan Corbet, Alex Deucher, Christian Koenig, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, Huang Rui,
	Matthew Auld, Matthew Brost, Lucas De Marchi, Thomas Hellstrom,
	Helge Deller, Alex Gaynor, Boqun Feng, John Hubbard,
	Alistair Popple, Timur Tabi, Edwin Peer, Alexandre Courbot,
	Andrea Righi, Andy Ritger, Zhi Wang, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, alexeyi, joel, linux-doc, amd-gfx,
	intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <DHIFPXAGCWU7.300NRYPR06KDG@nvidia.com>

On Thu, Apr 02, 2026 at 02:54:30PM +0900, Eliot Courtney wrote:
> On Wed Apr 1, 2026 at 6:20 AM JST, Joel Fernandes wrote:
> > From: Zhi Wang <zhiw@nvidia.com>
> >
> > Remove the hardcoded BAR1_SIZE = SZ_256M constant. On GPUs like L40 the
> > BAR1 aperture is larger than 256MB; using a hardcoded size prevents large
> > BAR1 from working and mapping it would fail.
> >
> > Signed-off-by: Zhi Wang <zhiw@nvidia.com>
> > Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> > ---
> >  drivers/gpu/nova-core/driver.rs | 8 ++------
> >  drivers/gpu/nova-core/gpu.rs    | 7 +------
> >  2 files changed, 3 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
> > index b1aafaff0cee..6f95f8672158 100644
> > --- a/drivers/gpu/nova-core/driver.rs
> > +++ b/drivers/gpu/nova-core/driver.rs
> > @@ -13,10 +13,7 @@
> >          Vendor, //
> >      },
> >      prelude::*,
> > -    sizes::{
> > -        SZ_16M,
> > -        SZ_256M, //
> > -    },
> > +    sizes::SZ_16M,
> >      sync::{
> >          atomic::{
> >              Atomic,
> > @@ -40,7 +37,6 @@ pub(crate) struct NovaCore {
> >  }
> >  
> >  const BAR0_SIZE: usize = SZ_16M;
> > -pub(crate) const BAR1_SIZE: usize = SZ_256M;
> >  
> >  // For now we only support Ampere which can use up to 47-bit DMA addresses.
> >  //
> > @@ -51,7 +47,7 @@ pub(crate) struct NovaCore {
> >  const GPU_DMA_BITS: u32 = 47;
> >  
> >  pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
> > -pub(crate) type Bar1 = pci::Bar<BAR1_SIZE>;
> > +pub(crate) type Bar1 = pci::Bar;
> >  
> >  kernel::pci_device_table!(
> >      PCI_TABLE,
> > diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> > index 8206ec015b26..ba6f1f6f0485 100644
> > --- a/drivers/gpu/nova-core/gpu.rs
> > +++ b/drivers/gpu/nova-core/gpu.rs
> > @@ -353,16 +353,11 @@ pub(crate) fn run_selftests(
> >  
> >      #[cfg(CONFIG_NOVA_MM_SELFTESTS)]
> >      fn run_mm_selftests(self: Pin<&mut Self>, pdev: &pci::Device<device::Bound>) -> Result {
> > -        use crate::driver::BAR1_SIZE;
> > -
> >          // PRAMIN aperture self-tests.
> >          crate::mm::pramin::run_self_test(pdev.as_ref(), self.mm.pramin(), self.spec.chipset)?;
> >  
> >          // BAR1 self-tests.
> > -        let bar1 = Arc::pin_init(
> > -            pdev.iomap_region_sized::<BAR1_SIZE>(1, c"nova-core/bar1"),
> > -            GFP_KERNEL,
> > -        )?;
> > +        let bar1 = Arc::pin_init(pdev.iomap_region(1, c"nova-core/bar1"), GFP_KERNEL)?;
> >          let bar1_access = bar1.access(pdev.as_ref())?;
> >  
> >          crate::mm::bar_user::run_self_test(
> 
> Can we move this directly after patch 17 which adds the fixed bar1? Or
> alternatively fold it in while preserving Zhi's attribution (I am not
> sure what the conventional method for this is).

Generally, squashing with attribution works, so I will do that.
(Zhi did tell me he would be OK with that as well in this instance.)

thanks,

--
Joel Fernandes




* Re: [PATCH 0/6] hugetlb: normalize exported interfaces to use base-page indices
From: jane.chu @ 2026-04-15 19:40 UTC
  To: Oscar Salvador
  Cc: akpm, david, muchun.song, lorenzo.stoakes, Liam.Howlett, vbabka,
	rppt, surenb, mhocko, corbet, skhan, hughd, baolin.wang, peterx,
	linux-mm, linux-doc, linux-kernel
In-Reply-To: <ad9GS_hRv5hbLfl3@localhost.localdomain>



On 4/15/2026 1:03 AM, Oscar Salvador wrote:
> On Thu, Apr 09, 2026 at 05:41:51PM -0600, Jane Chu wrote:
>> This series stems from a discussion with David. [1]
>> The series makes a small cleanup to a few hugetlb interfaces used
>> outside the subsystem by standardizing them on base-page indices.
>> Hopefully this makes the interface semantics a bit more coherent with
>> the rest of mm, while the internal hugetlb code continues to use hugepage
>> indices where that remains the more natural fit.
>>
>> It is based off mm-stable, 3/30/2026, b2c31180b9d6.
>>
>> [1] https://lore.kernel.org/linux-mm/9ec9edd1-0f4c-4da2-ae78-0e7b251a9e25@kernel.org/
> 
> It seems you have some trailing whitespace issues:
> 
> Applying: hugetlb: open-code hugetlb folio lookup index conversion
> .git/rebase-apply/patch:64: trailing whitespace.
> 	pgoff_t index = start >> PAGE_SHIFT;
> 
> Applying: hugetlb: make hugetlb_fault_mutex_hash() take PAGE_SIZE index
> .git/rebase-apply/patch:161: trailing whitespace.
> 	key[1] = index >> huge_page_order(hstate_inode(mapping->host));
> 
> Applying: hugetlb: drop vma_hugecache_offset() in favor of linear_page_index()
> .git/rebase-apply/patch:44: trailing whitespace.
> 	start = linear_page_index(vma, vma->vm_start);
> .git/rebase-apply/patch:46: trailing whitespace.
> 	end = linear_page_index(vma, vma->vm_end);
> 
> Applying: hugetlb: pass hugetlb reservation ranges in base-page indices
> .git/rebase-apply/patch:237: trailing whitespace.
> 		next_index = index + pages_per_huge_page(h)
> 
> 

Sorry, will remove them in v2.

thanks!
-jane
> 



* Re: [PATCH V10 00/10] famfs: port into fuse
From: Gregory Price @ 2026-04-15 19:40 UTC
  To: Joanne Koong
  Cc: Matthew Wilcox, Miklos Szeredi, David Hildenbrand (Arm),
	Darrick J. Wong, John Groves, Bernd Schubert, John Groves,
	Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
	Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang, Jan Kara,
	Alexander Viro, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <CAJnrk1Z+uNjn+BcmpciqPZhxYXEJ5Zgh=uNCxt17WTkdOubbog@mail.gmail.com>

On Wed, Apr 15, 2026 at 10:12:54AM -0700, Joanne Koong wrote:
> On Wed, Apr 15, 2026 at 8:32 AM Gregory Price <gourry@gourry.net> wrote:
> >
> > My initial take is that it's a real concern a "bug" in a BPF program
> > could let userland map arbitrary memory into userland page tables, and
> > such an extension would not be a quick fix to the FAMFS problem.
> 
> If you're concerned about arbitrary addresses in the bpf path, you
> should be equally concerned about the FUSE_GET_FMAP path that's in
> this series, because they're functionally identical. The kernel trusts
> userspace-provided addresses in both cases. If that's acceptable for
> this series then it's acceptable for bpf too. You can't reject bpf on
> security grounds without also rejecting the current approach.
> 

To be clear, I'm not rejecting it.  I'm saying (!) that's something that
needs a careful look.

It's a novel interaction and a new ops structure. I don't think it's in
any way unfair to point out there will (and should) be questions outside
the scope of FAMFS.

> Please take a look at the famfs bpf program [1] and compare that to
> the logic in patch 6 in this series [2]. In both cases, iomap->addr
> gets set to the address that was earlier specified by the userspace
> famfs server. In the non-bpf path, the userspace server passes this
> address through a FUSE_GET_FMAP request. In the bpf path, the
> userspace server passes this address by updating the bpf hashmap from
> userspace. There is no functional difference. Also btw, this is one of
> the cases that I was referring to about the bpf path being more
> helpful - in the bpf path, we avoid having to add a FUSE_FMAP opcode
> to fuse (which will be used by no other server) and famfs gets to skip
> 2 extra context-switches that the FUSE_FMAP path otherwise entails.
> 

The question isn't about the functional differences between the FAMFS
static code or a BPF blob doing the same thing - the question is what
the new ops structure introduces for the general case that wasn't
there before.

We have to reason about the BPF extension separately from the context of
FAMFS - as it's a general interface now (forever :P).

~Gregory


* Re: [PATCH 6/6] hugetlb: pass hugetlb reservation ranges in base-page indices
From: jane.chu @ 2026-04-15 19:39 UTC
  To: Oscar Salvador
  Cc: akpm, david, muchun.song, lorenzo.stoakes, Liam.Howlett, vbabka,
	rppt, surenb, mhocko, corbet, skhan, hughd, baolin.wang, peterx,
	linux-mm, linux-doc, linux-kernel
In-Reply-To: <ad9F5duupm8Rn-Yw@localhost.localdomain>



On 4/15/2026 1:01 AM, Oscar Salvador wrote:
> On Thu, Apr 09, 2026 at 05:41:57PM -0600, Jane Chu wrote:
>> hugetlb_reserve_pages() consumes indices in hugepage granularity although
>> some callers naturally compute offsets in PAGE_SIZE units.
>>
>> Teach the reservation helpers to accept base-page index ranges and
>> convert to hugepage indices internally before operating on the
>> reservation map. This keeps the internal representation unchanged while
>> making the API contract more uniform for callers.
>>
>> Update hugetlbfs and memfd call sites to pass base-page indices, and
>> adjust the documentation to describe the new calling convention. Add
>> alignment warnings in hugetlb_reserve_pages() to catch invalid ranges
>> early.
>>
>> No functional changes.
>>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> ---
>>   Documentation/mm/hugetlbfs_reserv.rst | 12 +++++------
>>   fs/hugetlbfs/inode.c                  | 29 ++++++++++++---------------
>>   mm/hugetlb.c                          | 26 ++++++++++++++++--------
>>   mm/memfd.c                            |  9 +++++----
>>   4 files changed, 42 insertions(+), 34 deletions(-)
>>
>> diff --git a/Documentation/mm/hugetlbfs_reserv.rst b/Documentation/mm/hugetlbfs_reserv.rst
>> index a49115db18c7..60a52b28f0b4 100644
>> --- a/Documentation/mm/hugetlbfs_reserv.rst
>> +++ b/Documentation/mm/hugetlbfs_reserv.rst
>> @@ -112,8 +112,8 @@ flag was specified in either the shmget() or mmap() call.  If NORESERVE
>>   was specified, then this routine returns immediately as no reservations
>>   are desired.
>>   
>> -The arguments 'from' and 'to' are huge page indices into the mapping or
>> -underlying file.  For shmget(), 'from' is always 0 and 'to' corresponds to
>> +The arguments 'from' and 'to' are base page indices into the mapping or
>> +underlying file. For shmget(), 'from' is always 0 and 'to' corresponds to
>>   the length of the segment/mapping.  For mmap(), the offset argument could
>>   be used to specify the offset into the underlying file.  In such a case,
>>   the 'from' and 'to' arguments have been adjusted by this offset.
>> @@ -136,10 +136,10 @@ to indicate this VMA owns the reservations.
>>   
>>   The reservation map is consulted to determine how many huge page reservations
>>   are needed for the current mapping/segment.  For private mappings, this is
>> -always the value (to - from).  However, for shared mappings it is possible that
>> -some reservations may already exist within the range (to - from).  See the
>> -section :ref:`Reservation Map Modifications <resv_map_modifications>`
>> -for details on how this is accomplished.
>> +always the number of huge pages covered by the range [from, to).  However,
>> +for shared mappings it is possible that some reservations may already exist
>> +within the range [from, to).  See the section :ref:`Reservation Map Modifications
>> +<resv_map_modifications>` for details on how this is accomplished.
>>   
>>   The mapping may be associated with a subpool.  If so, the subpool is consulted
>>   to ensure there is sufficient space for the mapping.  It is possible that the
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index a72d46ff7980..ec05ed30b70f 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -157,10 +157,8 @@ static int hugetlbfs_file_mmap_prepare(struct vm_area_desc *desc)
>>   	if (inode->i_flags & S_PRIVATE)
>>   		vma_flags_set(&vma_flags, VMA_NORESERVE_BIT);
>>   
>> -	if (hugetlb_reserve_pages(inode,
>> -			desc->pgoff >> huge_page_order(h),
>> -			len >> huge_page_shift(h), desc,
>> -			vma_flags) < 0)
>> +	if (hugetlb_reserve_pages(inode, desc->pgoff, len >> PAGE_SHIFT, desc,
>> +				  vma_flags) < 0)
> 
> Ok, this is something I have thought about every time I looked into the
> hugetlb reserve code: I think we should really start using meaningful
> names for 'from' and 'to', and pass those to hugetlb_reserve_pages().
> "desc->pgoff" and "len >> PAGE_SHIFT" are, meh, not very descriptive. There
> are not that many places we need to touch, and we would gain in clarity.
> The same goes for hugetlb_unreserve_pages(), of course.

Indeed, will try to work on that in v2.
> 
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 47ef41b6fb2e..eb4ab5bd0c9f 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -6532,10 +6532,11 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
>>   }
> [...]
>> @@ -6558,6 +6560,12 @@ long hugetlb_reserve_pages(struct inode *inode,
>>   		return -EINVAL;
>>   	}
>>   
>> +	VM_WARN_ON(!IS_ALIGNED(from, 1UL << huge_page_order(h)));
>> +	VM_WARN_ON(!IS_ALIGNED(to,   1UL << huge_page_order(h)));
> 
> If we want to scream if someone passes us unaligned indices, we might
> want to do the same in hugetlb_unreserve_pages() ?

Sure.
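
For readers following the index change, here is a minimal userspace sketch of
the conversion and the alignment check discussed above. It is not the kernel
code: the toy_* names are hypothetical, and "order" stands in for what
huge_page_order(h) would return (9 for a 2 MiB huge page with 4 KiB base
pages is an assumption used only in the self-test).

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long toy_pgoff_t;	/* stand-in for pgoff_t */

/* True if a base-page index sits on a huge page boundary. */
bool toy_index_aligned(toy_pgoff_t index, unsigned int order)
{
	unsigned long pages_per_huge_page = 1UL << order;

	return (index & (pages_per_huge_page - 1)) == 0;
}

/* Convert a base-page index to a huge page index. */
toy_pgoff_t toy_to_huge_index(toy_pgoff_t index, unsigned int order)
{
	return index >> order;
}

/* Number of huge pages covered by the base-page range [from, to). */
long toy_huge_pages_in_range(toy_pgoff_t from, toy_pgoff_t to,
			     unsigned int order)
{
	/* Both ends must be aligned, mirroring the proposed warnings. */
	assert(toy_index_aligned(from, order));
	assert(toy_index_aligned(to, order));

	return (long)(toy_to_huge_index(to, order) -
		      toy_to_huge_index(from, order));
}

/* Small self-test; order 9 is an assumed example value. */
void toy_hugetlb_self_test(void)
{
	assert(toy_index_aligned(512, 9));
	assert(!toy_index_aligned(100, 9));
	assert(toy_huge_pages_in_range(512, 1536, 9) == 2);
}
```

Calling toy_hugetlb_self_test() from a main() exercises the sketch. The point
is only that callers pass PAGE_SIZE-unit indices and the shift by the huge
page order happens in one place.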
> 
>> diff --git a/mm/memfd.c b/mm/memfd.c
>> index 56c8833c4195..59c174c7533c 100644
>> --- a/mm/memfd.c
>> +++ b/mm/memfd.c
>> @@ -80,14 +80,15 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t index)
>>   		struct inode *inode = file_inode(memfd);
>>   		struct hstate *h = hstate_file(memfd);
>>   		long nr_resv;
>> -		pgoff_t idx;
>> +		pgoff_t next_index;
>>   		int err = -ENOMEM;
>>   
>>   		gfp_mask = htlb_alloc_mask(h);
>>   		gfp_mask &= ~(__GFP_HIGHMEM | __GFP_MOVABLE);
>> -		idx = index >> huge_page_order(h);
>> +		next_index = index + pages_per_huge_page(h);
> 
> Trailing white space.

My bad, should have checked.

Thanks!
-jane

> 
> 



* Re: [PATCH net-next v4 04/13] devlink: allow to use devlink index as a command handle
From: Geert Uytterhoeven @ 2026-04-15 19:04 UTC
  To: jiri
  Cc: andrew+netdev, chuck.lever, cjubran, corbet, daniel.zahka, davem,
	donald.hunter, edumazet, horms, kuba, leon, linux-doc, linux-rdma,
	linux-trace-kernel, mathieu.desnoyers, matttbe, mbloch, mhiramat,
	mschmidt, netdev, pabeni, przemyslaw.kitszel, rostedt, saeedm,
	skhan, tariqt, linux-kernel
In-Reply-To: <20260312100407.551173-5-jiri@resnulli.us>

On Thu, 12 Mar 2026, Jiri Pirko wrote:
> Currently devlink instances are addressed by a bus_name/dev_name tuple.
> Allow the newly introduced DEVLINK_ATTR_INDEX to be used as
> an alternative handle for all devlink commands.
> 
> When DEVLINK_ATTR_INDEX is present in the request, use it for a direct
> xarray lookup instead of iterating over all instances comparing
> bus_name/dev_name strings.
> 
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>

Thanks for your patch, which is now commit d85a8af57da87196 ("devlink:
allow to use devlink index as a command handle").

This has a rather large impact on kernel size.
For e.g. m68k/atari_defconfig, bloat-o-meter reports:

    add/remove: 4/1 grow/shrink: 72/1 up/down: 65804/-76 (65728)
    Function                                     old     new   delta
    devlink_trap_policer_get_dump_nl_policy       24    1480   +1456
    devlink_trap_group_get_dump_nl_policy         24    1480   +1456
    devlink_trap_get_dump_nl_policy               24    1480   +1456
    devlink_selftests_get_nl_policy               24    1480   +1456
    devlink_sb_tc_pool_bind_get_dump_nl_policy      24    1480   +1456
    devlink_sb_port_pool_get_dump_nl_policy       24    1480   +1456
    devlink_sb_pool_get_dump_nl_policy            24    1480   +1456
    devlink_sb_get_dump_nl_policy                 24    1480   +1456
    devlink_resource_dump_nl_policy               24    1480   +1456
    devlink_region_get_dump_nl_policy             24    1480   +1456
    devlink_rate_get_dump_nl_policy               24    1480   +1456
    devlink_port_get_dump_nl_policy               24    1480   +1456
    devlink_param_get_dump_nl_policy              24    1480   +1456
    devlink_linecard_get_dump_nl_policy           24    1480   +1456
    devlink_info_get_nl_policy                    24    1480   +1456
    devlink_get_nl_policy                         24    1480   +1456
    devlink_eswitch_get_nl_policy                 24    1480   +1456
    devlink_dpipe_headers_get_nl_policy           24    1480   +1456
    devlink_port_unsplit_nl_policy                32    1480   +1448
    devlink_port_param_set_nl_policy              32    1480   +1448
    devlink_port_param_get_nl_policy              32    1480   +1448
    devlink_port_get_do_nl_policy                 32    1480   +1448
    devlink_port_del_nl_policy                    32    1480   +1448
    devlink_notify_filter_set_nl_policy           32    1480   +1448
    devlink_health_reporter_get_dump_nl_policy      32    1480   +1448
    devlink_port_split_nl_policy                  80    1480   +1400
    devlink_sb_occ_snapshot_nl_policy             96    1480   +1384
    devlink_sb_occ_max_clear_nl_policy            96    1480   +1384
    devlink_sb_get_do_nl_policy                   96    1480   +1384
    devlink_sb_port_pool_get_do_nl_policy        144    1480   +1336
    devlink_sb_pool_get_do_nl_policy             144    1480   +1336
    devlink_sb_pool_set_nl_policy                168    1480   +1312
    devlink_sb_port_pool_set_nl_policy           176    1480   +1304
    devlink_sb_tc_pool_bind_set_nl_policy        184    1480   +1296
    devlink_sb_tc_pool_bind_get_do_nl_policy     184    1480   +1296
    devlink_dpipe_table_get_nl_policy            240    1480   +1240
    devlink_dpipe_entries_get_nl_policy          240    1480   +1240
    devlink_dpipe_table_counters_set_nl_policy     272    1480   +1208
    devlink_eswitch_set_nl_policy                504    1480    +976
    devlink_resource_set_nl_policy               544    1480    +936
    devlink_param_get_do_nl_policy               656    1480    +824
    devlink_region_get_do_nl_policy              712    1480    +768
    devlink_region_new_nl_policy                 744    1480    +736
    devlink_region_del_nl_policy                 744    1480    +736
    devlink_health_reporter_test_nl_policy       928    1480    +552
    devlink_health_reporter_recover_nl_policy     928    1480    +552
    devlink_health_reporter_get_do_nl_policy     928    1480    +552
    devlink_health_reporter_dump_get_nl_policy     928    1480    +552
    devlink_health_reporter_dump_clear_nl_policy     928    1480    +552
    devlink_health_reporter_diagnose_nl_policy     928    1480    +552
    devlink_trap_get_do_nl_policy               1048    1480    +432
    devlink_trap_set_nl_policy                  1056    1480    +424
    devlink_trap_group_get_do_nl_policy         1088    1480    +392
    devlink_trap_policer_get_do_nl_policy       1144    1480    +336
    devlink_trap_group_set_nl_policy            1144    1480    +336
    devlink_trap_policer_set_nl_policy          1160    1480    +320
    devlink_port_set_nl_policy                  1168    1480    +312
    devlink_flash_update_nl_policy              1224    1480    +256
    devlink_reload_nl_policy                    1248    1480    +232
    devlink_port_new_nl_policy                  1320    1480    +160
    devlink_rate_get_do_nl_policy               1352    1480    +128
    devlink_rate_del_nl_policy                  1352    1480    +128
    devlink_linecard_get_do_nl_policy           1376    1480    +104
    __devlinks_xa_find_get                         -      96     +96
    devlink_linecard_set_nl_policy              1392    1480     +88
    devlink_selftests_run_nl_policy             1416    1480     +64
    devlink_get_from_attrs_lock                  262     314     +52
    devlink_region_read_nl_policy               1440    1480     +40
    devlink_rate_set_nl_policy                  1448    1480     +32
    devlink_rate_new_nl_policy                  1448    1480     +32
    devlinks_xa_lookup_get                         -      30     +30
    devlink_health_reporter_set_nl_policy       1456    1480     +24
    devlink_attr_index_range                       -      16     +16
    devlink_param_set_nl_policy                 1472    1480      +8
    devlink_nl_dumpit                            276     282      +6
    __initcall__kmod_core__670_573_devlink_init4       -       4      +4
    __initcall__kmod_core__670_561_devlink_init4       4       -      -4
    devlinks_xa_find_get                          96      24     -72
    Total: Before=5203976, After=5269704, chg +1.26%

> --- a/net/devlink/netlink_gen.c
> +++ b/net/devlink/netlink_gen.c
> @@ -11,6 +11,11 @@
>  
>  #include <uapi/linux/devlink.h>
>  
> +/* Integer value ranges */
> +static const struct netlink_range_validation devlink_attr_index_range = {
> +	.max	= U32_MAX,
> +};
> +
>  /* Sparse enums validation callbacks */
>  static int
>  devlink_attr_param_type_validate(const struct nlattr *attr,
> @@ -56,37 +61,42 @@ const struct nla_policy devlink_dl_selftest_id_nl_policy[DEVLINK_ATTR_SELFTEST_I
>  };
>  
>  /* DEVLINK_CMD_GET - do */
> -static const struct nla_policy devlink_get_nl_policy[DEVLINK_ATTR_DEV_NAME + 1] = {
> +static const struct nla_policy devlink_get_nl_policy[DEVLINK_ATTR_INDEX + 1] = {

Unrelated to this change, but the explicit sizing of these arrays is not
needed, as the compiler will take care of that.

>  	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
>  	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
> +	[DEVLINK_ATTR_INDEX] = NLA_POLICY_FULL_RANGE(NLA_UINT, &devlink_attr_index_range),

This array, and many others below, are sparse, with large gaps (up to
1456 or 2912 bytes on 32-bit resp. 64-bit systems) before the last
entries.
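
A minimal standalone sketch of the effect, for illustration only: struct
toy_policy is a hypothetical stand-in for struct nla_policy, and the
attribute numbers (in particular 184) are assumptions picked to give a large
gap, not values taken from the uapi header. It also shows that an array
declared without an explicit size is sized from its highest designated index.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a policy entry; the real struct layout may differ. */
struct toy_policy {
	unsigned char type;
	unsigned short len;
	const void *validation;
};

/* Hypothetical attribute numbers: two low ones and one far higher. */
enum {
	TOY_ATTR_BUS_NAME = 1,
	TOY_ATTR_DEV_NAME = 2,
	TOY_ATTR_INDEX = 184,
};

/* No explicit size: the compiler uses the highest index plus one. */
static const struct toy_policy toy_dense[] = {
	[TOY_ATTR_BUS_NAME] = { .type = 1 },
	[TOY_ATTR_DEV_NAME] = { .type = 1 },
};

/* Same table with one high-numbered attribute added. */
static const struct toy_policy toy_sparse[] = {
	[TOY_ATTR_BUS_NAME] = { .type = 1 },
	[TOY_ATTR_DEV_NAME] = { .type = 1 },
	[TOY_ATTR_INDEX] = { .type = 2 },
};

static_assert(sizeof(toy_dense) < sizeof(toy_sparse),
	      "one high index grows the whole table");

#define TOY_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

size_t toy_dense_entries(void)
{
	return TOY_ARRAY_SIZE(toy_dense);
}

size_t toy_sparse_entries(void)
{
	return TOY_ARRAY_SIZE(toy_sparse);
}

/* Count the zero-filled entries that carry no policy at all. */
size_t toy_sparse_unused_entries(void)
{
	size_t i, unused = 0;

	for (i = 0; i < TOY_ARRAY_SIZE(toy_sparse); i++)
		if (toy_sparse[i].type == 0)
			unused++;
	return unused;
}
```

With these assumed numbers the dense table has 3 entries and the sparse one
has 185, of which 182 are zero-filled. The byte cost is that count times
sizeof(struct toy_policy), which depends on the target's pointer size.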

>  };
>  
>  /* DEVLINK_CMD_PORT_GET - do */
> -static const struct nla_policy devlink_port_get_do_nl_policy[DEVLINK_ATTR_PORT_INDEX + 1] = {
> +static const struct nla_policy devlink_port_get_do_nl_policy[DEVLINK_ATTR_INDEX + 1] = {
>  	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
>  	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
> +	[DEVLINK_ATTR_INDEX] = NLA_POLICY_FULL_RANGE(NLA_UINT, &devlink_attr_index_range),

Shouldn't this be inserted at the end, as DEVLINK_ATTR_INDEX >
DEVLINK_ATTR_PORT_INDEX, for readability?

>  	[DEVLINK_ATTR_PORT_INDEX] = { .type = NLA_U32, },
>  };

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


* Re: [PATCH bpf] bpf,tcp: avoid infinite recursion in BPF_SOCK_OPS_HDR_OPT_LEN_CB
From: Martin KaFai Lau @ 2026-04-15 18:55 UTC
  To: Jiayuan Chen
  Cc: bpf, Quan Sun, Yinhao Hu, Kaiyan Mei, Dongliang Mu, Eric Dumazet,
	Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern,
	netdev, linux-doc, linux-kernel
In-Reply-To: <20260414105702.248310-1-jiayuan.chen@linux.dev>

On Tue, Apr 14, 2026 at 06:57:00PM +0800, Jiayuan Chen wrote:
> A BPF_PROG_TYPE_SOCK_OPS program can set BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG
> to inject custom TCP header options. When the kernel builds a TCP packet,
> it calls tcp_established_options() to calculate the header size, which
> invokes bpf_skops_hdr_opt_len() to trigger the BPF_SOCK_OPS_HDR_OPT_LEN_CB
> callback.
> 
> If the BPF program calls bpf_setsockopt(TCP_NODELAY) inside this callback,
> __tcp_sock_set_nodelay() will call tcp_push_pending_frames(), which calls
> tcp_current_mss(), which calls tcp_established_options() again,
> re-triggering the same BPF callback. This creates an infinite recursion
> that exhausts the kernel stack and causes a panic.
> 
> BPF_SOCK_OPS_HDR_OPT_LEN_CB
>   -> bpf_setsockopt(TCP_NODELAY)
> 	-> tcp_push_pending_frames()
> 	  -> tcp_current_mss()
> 		-> tcp_established_options()
> 		  -> bpf_skops_hdr_opt_len()
>                            /* infinite recursion */
> 			-> BPF_SOCK_OPS_HDR_OPT_LEN_CB
> 
> A similar reentrancy issue exists for TCP congestion control, which is
> guarded by tp->bpf_chg_cc_inprogress. Adopt the same approach: introduce
> tp->bpf_hdr_opt_len_cb_inprogress, set it before invoking the callback in
> bpf_skops_hdr_opt_len(), and check it in sol_tcp_sockopt() to reject
> bpf_setsockopt(TCP_NODELAY) calls that would trigger
> tcp_push_pending_frames() and cause the recursion.
> 
> Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> Reported-by: Dongliang Mu <dzm91@hust.edu.cn>
> Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/

Thanks for the report and fixes suggested across different threads.

Using has_current_bpf_ctx() to avoid tcp_push_pending_frames() should
work but it may change the expectation for bpf_setsockopt(TCP_NODELAY).
e.g. A bpf_tcp_iter does bpf_setsockopt(TCP_NODELAY).

Adding another bit in the tcp_sock is not ideal either. I agree with
Alexei that it is better to reuse the existing bit if we go down this path.
We also need to audit more closely whether there are cases where two different
types of bpf progs may call bpf_setsockopt(), e.g.
bpf_tcp_iter does bpf_setsockopt(TCP_CONGESTION) to switch
to a bpf_tcp_cc and the new bpf_tcp_cc->init() will also do
bpf_setsockopt(xxx) which then will be rejected.

Another possible fix: bpf_setsockopt(TCP_NODELAY) is always broken
for BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB unless
the bpf prog is doing some maneuver to avoid the recursion. Thus,
this use case is basically broken as is, and I also don't see a use case
for bpf_setsockopt(TCP_NODELAY) when writing headers.
How about checking the bpf_sock->op, level, and optname in
bpf_sock_ops_setsockopt() and return -EOPNOTSUPP?
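
A rough userspace sketch of that check, to make the proposal concrete. This
is not the kernel code: the toy_* names and constants are hypothetical
stand-ins for bpf_sock->op, SOL_TCP and TCP_NODELAY, and where such a check
would actually live in bpf_sock_ops_setsockopt() is left open.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the sock_ops callback types. */
enum toy_sock_op {
	TOY_OP_OTHER,
	TOY_OP_HDR_OPT_LEN_CB,
	TOY_OP_WRITE_HDR_OPT_CB,
};

/* Hypothetical stand-ins for the socket level and option names. */
#define TOY_SOL_TCP		6
#define TOY_TCP_NODELAY		1
#define TOY_TCP_CONGESTION	13

/* True for the two callbacks that run while header options are built. */
static bool toy_in_hdr_opt_cb(enum toy_sock_op op)
{
	return op == TOY_OP_HDR_OPT_LEN_CB || op == TOY_OP_WRITE_HDR_OPT_CB;
}

/*
 * Reject TCP_NODELAY only from the header option callbacks, where it
 * would push pending frames and re-enter the same callback.
 * Returns 0 when the call may proceed.
 */
int toy_sock_ops_setsockopt_check(enum toy_sock_op op, int level, int optname)
{
	if (toy_in_hdr_opt_cb(op) && level == TOY_SOL_TCP &&
	    optname == TOY_TCP_NODELAY)
		return -EOPNOTSUPP;

	return 0;
}

/* Small self-test covering the rejected and the allowed combinations. */
void toy_sock_ops_self_test(void)
{
	assert(toy_sock_ops_setsockopt_check(TOY_OP_HDR_OPT_LEN_CB,
					     TOY_SOL_TCP,
					     TOY_TCP_NODELAY) == -EOPNOTSUPP);
	assert(toy_sock_ops_setsockopt_check(TOY_OP_OTHER, TOY_SOL_TCP,
					     TOY_TCP_NODELAY) == 0);
}
```

Unlike a per-socket in-progress bit, a check keyed on the callback type
leaves other program types (for example an iterator calling
bpf_setsockopt(TCP_NODELAY)) unaffected in this sketch.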


* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Michael Roth @ 2026-04-15 18:20 UTC
  To: Ackerley Tng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jroedel, jthoughton, oupton, pankaj.gupta,
	qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
	tabba, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, Paolo Bonzini, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm
In-Reply-To: <eiiecl7jvywvqb4drq7cchmcabcrdka25wxr77uavxqineeedm@rfcnhdz6xoxf>

On Tue, Apr 14, 2026 at 06:37:00PM -0500, Michael Roth wrote:
> On Wed, Apr 01, 2026 at 03:38:12PM -0700, Ackerley Tng wrote:
> > Michael Roth <michael.roth@amd.com> writes:
> > 
> > >
> > > [...snip...]
> > >
> > >>  static unsigned long kvm_get_vm_memory_attributes(struct kvm *kvm, gfn_t gfn)
> > >>  {
> > >> @@ -2635,6 +2625,8 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> > >>  		return -EINVAL;
> > >>  	if (!PAGE_ALIGNED(attrs->address) || !PAGE_ALIGNED(attrs->size))
> > >>  		return -EINVAL;
> > >> +	if (attrs->error_offset)
> > >> +		return -EINVAL;
> > >>  	for (i = 0; i < ARRAY_SIZE(attrs->reserved); i++) {
> > >>  		if (attrs->reserved[i])
> > >>  			return -EINVAL;
> > >> @@ -4983,6 +4975,11 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> > >>  		return 1;
> > >>  	case KVM_CAP_GUEST_MEMFD_FLAGS:
> > >>  		return kvm_gmem_get_supported_flags(kvm);
> > >> +	case KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES:
> > >> +		if (vm_memory_attributes)
> > >> +			return 0;
> > >> +
> > >> +		return kvm_supported_mem_attributes(kvm);
> > >
> > > Based on the discussion from the PUCK call this morning,
> > 
> > Thanks for copying the discussion here, I'll start attending PUCK to
> > catch those discussions too :)
> > 
> > > it sounds like it
> > > would be a good idea to limit kvm_supported_mem_attributes() to only
> > > reporting KVM_MEMORY_ATTRIBUTE_PRIVATE if the underlying CoCo
> > > implementation has all the necessary enablement to support in-place
> > > conversion via guest_memfd. In the case of SNP, there is a
> > > documentation/parameter check in snp_launch_update() that needs to be
> > > relaxed in order for userspace to be able to pass in a NULL 'src'
> > > parameter (since, for in-place conversion, it would be initialized in place
> > > as shared memory prior to the call; by the time kvm_gmem_populate() runs,
> > > it will have been set to private and therefore cannot be faulted in via
> > > GUP, and if it could, we'd be unnecessarily copying the src back on top
> > > of itself since src/dst are the same).
> > 
> > Could this be a separate thing? If I'm understanding you correctly, it's
> > not strictly a requirement for snp_launch_update() to first support a
> > NULL 'src' parameter before this series lands.
> 
> I think we are already sync'd up on this during PUCK, but for the benefit
> of others: Sean pointed out that if we don't then we'll need to add yet
> another capability so userspace can determine when it can actually do
> in-place conversion for SNP.

(in-place conversion for SNP during pre-launch/populate phase, I meant)

> 
> Right now, this series effectively advertises in place conversion at the
> point where KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES reports
> 'KVM_MEMORY_ATTRIBUTE_PRIVATE', so I slightly reworked the series to
> include the snp_launch_update() change prior to that point in time in
> the series. Thanks to prereqs and changes/requirements you've already
> pulled in, it's just one additional patch now:
> 
>  KVM: SEV: Make 'uaddr' parameter optional for KVM_SEV_SNP_LAUNCH_UPDATE 
> 
> I also did some minor updates (prefixed with a "[squash]" tag) to advertise
> the KVM_SET_MEMORY_ATTRIBUTES2_PRESERVED flag so it can be used by

Though I'm not sure how we deal with it if SNP/TDX at some point become
capable of using the PRESERVED flag *after* populate... but maybe that's
too unlikely to worry about? If we wanted to address it though, we could
have both PRESERVED and PRESERVED_BEFORE_LAUNCH so they can be
enumerated separately from the start.

> userspace for SNP/TDX in the kvm_gmem_populate() path as agreed upon
> during PUCK.
> 
> The branch is here, with the patches moved to where I think they
> should remain (or be squashed in for the [squash] ones):
> 
>   https://github.com/AMDESE/linux/commits/guest_memfd-inplace-conversion-v4-snp2/
> 
> I've also updated the QEMU patches to use the agreed-upon API flow and
> pushed them here:
> 
>   https://github.com/AMDESE/qemu/commits/snp-inplace-for-v4-wip2/
> 
> To start an SNP guest with in-place conversion:
> 
>   qemu-system-x86 \
>   -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
>   -object sev-snp-guest,id=sev0,...,convert-in-place=true \
>   -object memory-backend-memfd,id=ram1,size=16G,share=true,reserve=false

Sorry, that should've been:

  -object memory-backend-guest-memfd,id=ram1,size=16G,share=true,reserve=false

> 
> To start a normal non-CoCo guest backed by guest_memfd with shared memory:
> 
>   qemu-system-x86 \
>   -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
>   -object memory-backend-memfd,id=ram1,size=16G,share=true,reserve=false

and:

  -object memory-backend-guest-memfd,id=ram1,size=16G,share=true,reserve=false

(and both require kvm.vm_memory_attributes=0)

-Mike

> 
> Thanks,
> 
> Mike


* [PATCH] docs: proc: fix minor grammar and formatting issues
From: Myro @ 2026-04-15 17:52 UTC (permalink / raw)
  To: corbet; +Cc: linux-doc, linux-kernel, linux-fsdevel, Myro

Fix missing "from" in "prevent <pid> --from-- being reused" and
add spacing in vm_area_struct range notation for readability.

No functional changes. :)
---
 Documentation/filesystems/proc.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 873761087f8d..d828006bd91c 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -118,7 +118,7 @@ PTRACE_MODE_ATTACH permissions; CAP_PERFMON capability does not grant access
 to /proc/PID/mem for other processes.
 
 Note that an open file descriptor to /proc/<pid> or to any of its
-contained files or subdirectories does not prevent <pid> being reused
+contained files or subdirectories does not prevent <pid> from being reused
 for some other process in the event that <pid> exits. Operations on
 open /proc/<pid> file descriptors corresponding to dead processes
 never act on any new process that the kernel may, through chance, have
@@ -2199,7 +2199,7 @@ the process is maintaining.  Example output::
      | lr-------- 1 root root 64 Jan 27 11:24 400000-41a000 -> /usr/bin/ls
 
 The name of a link represents the virtual memory bounds of a mapping, i.e.
-vm_area_struct::vm_start-vm_area_struct::vm_end.
+vm_area_struct::vm_start - vm_area_struct::vm_end.
 
 The main purpose of the map_files is to retrieve a set of memory mapped
 files in a fast way instead of parsing /proc/<pid>/maps or
-- 
2.53.0



* Re: [PATCH v4 02/13] dt-bindings: leds: document Samsung S2M series PMIC RGB LED device
From: Kaustabh Chakraborty @ 2026-04-15 17:30 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Kaustabh Chakraborty
  Cc: Lee Jones, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, MyungJoo Ham, Chanwoo Choi, Sebastian Reichel,
	André Draszik, Alexandre Belloni, Jonathan Corbet,
	Shuah Khan, Nam Tran, Łukasz Lebiedziński, linux-leds,
	devicetree, linux-kernel, linux-pm, linux-samsung-soc, linux-rtc,
	linux-doc
In-Reply-To: <20260415-sensible-kiwi-of-argument-44d6ed@quoll>

On 2026-04-15 09:03 +02:00, Krzysztof Kozlowski wrote:
> On Tue, Apr 14, 2026 at 12:02:54PM +0530, Kaustabh Chakraborty wrote:
>> +description: |
>> +  The Samsung S2M series PMIC RGB LED is a three-channel LED device with
>> +  8-bit brightness control for each channel, typically used as status
>> +  indicators in mobile phones.
>> +
>> +  This is a part of device tree bindings for S2M and S5M family of Power
>> +  Management IC (PMIC).
>> +
>> +  See also Documentation/devicetree/bindings/mfd/samsung,s2mps11.yaml for
>> +  additional information and example.
>> +
>> +allOf:
>> +  - $ref: common.yaml#
>
> Rob's comment is still valid:
> 1. How do you address one of three LEDs in non-RGB case?
> 2. Where is multi-color?

Yes, multi-color should have been added here.

>
> And based on this alone without other properties, I say this should be
> part of top-level schema.  Separate node is fine, but no need for
> separate binding.

BTW, for loading the sub-device driver via platform (as it won't be a
separate binding) the driver *must* be built-in. Although not related to
bindings, this seems counter-intuitive. I see the same problem with the
PMIC charger.

>
> Best regards,
> Krzysztof



* Re: [PATCH V10 00/10] famfs: port into fuse
From: Joanne Koong @ 2026-04-15 17:12 UTC (permalink / raw)
  To: Gregory Price
  Cc: Matthew Wilcox, Miklos Szeredi, David Hildenbrand (Arm),
	Darrick J. Wong, John Groves, Bernd Schubert, John Groves,
	Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
	Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang, Jan Kara,
	Alexander Viro, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <ad-vnqRrUGs9n0N8@gourry-fedora-PF4VCD3F>

On Wed, Apr 15, 2026 at 8:32 AM Gregory Price <gourry@gourry.net> wrote:
>
> On Wed, Apr 15, 2026 at 04:10:00PM +0100, Matthew Wilcox wrote:
> > On Wed, Apr 15, 2026 at 04:04:50PM +0200, Miklos Szeredi wrote:
> > > On Wed, 15 Apr 2026 at 15:35, Gregory Price <gourry@gourry.net> wrote:
> > >
> > > > This was my first reaction when I realized the BPF program would be
> > > > controlling iomap return value in the fault path.  Big ol' (!)  popped
> > > > up over my head.
> > >
> > > I'm wondering which part of this triggers the big (!).
> > >
> > > BPF program being run in the fault path?
> > >
> > > Or that the return value from the BPF function is used as iomap?
> > >
> > > Or something else?
> >
> > If a BPF program controls what memory address a fault now allows access
> > to, who validates that this is a memory address within the purview of
> > the BPF program, and not, say, the address of the kernel page tables?
> >
> > (I have done no looking to determine if this is already considered)
>
> From an initial look at the existing bpf ops structures, I do not see
> any other struct with a similar (obvious) pattern - so it's not clear to
> me such a concern has been exposed elsewhere or directly addressed.
>
> There is a verifier step for the BPF program that in theory would
> validate the range matches the DAX ranges, but I think that only
> validates the types are right and only on load - I think the BPF
> program itself would be the address validator, which is a strong no.
>
> BPF folks please correct me if I'm off base here.
>
> My initial take is that it's a real concern that a "bug" in a BPF program
> could let userland map arbitrary memory into userland page tables, and
> such an extension would not be a quick fix to the FAMFS problem.

If you're concerned about arbitrary addresses in the bpf path, you
should be equally concerned about the FUSE_GET_FMAP path that's in
this series, because they're functionally identical. The kernel trusts
userspace-provided addresses in both cases. If that's acceptable for
this series then it's acceptable for bpf too. You can't reject bpf on
security grounds without also rejecting the current approach.

Please take a look at the famfs bpf program [1] and compare that to
the logic in patch 6 in this series [2]. In both cases, iomap->addr
gets set to the address that was earlier specified by the userspace
famfs server. In the non-bpf path, the userspace server passes this
address through a FUSE_GET_FMAP request. In the bpf path, the
userspace server passes this address by updating the bpf hashmap from
userspace. There is no functional difference. Also btw, this is one of
the cases that I was referring to about the bpf path being more
helpful - in the bpf path, we avoid having to add a FUSE_FMAP opcode
to fuse (which will be used by no other server) and famfs gets to skip
2 extra context-switches that the FUSE_FMAP path otherwise entails.
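
As a rough illustration of the comparison above (a sketch only, not code
from the series or from the bpf example): both paths end up resolving a
file offset through a table that the userspace server filled in. The
struct sketch_extent type, sketch_resolve() and SKETCH_NO_ADDR are
hypothetical names invented here; how the table reaches the kernel (a
FUSE_GET_FMAP reply or a bpf hashmap update) is deliberately left out,
since the lookup does not depend on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record: maps a file range onto a device address. */
struct sketch_extent {
	uint64_t file_off;	/* start of the range within the file */
	uint64_t len;		/* length of the range in bytes */
	uint64_t dev_addr;	/* device address chosen by the userspace server */
};

/* Hypothetical sentinel meaning "no extent covers this offset". */
#define SKETCH_NO_ADDR UINT64_MAX

/*
 * Resolve a file offset to a device address using a table that the
 * userspace server supplied earlier. The result is only as trustworthy
 * as the table: nothing here checks dev_addr against the device.
 */
uint64_t sketch_resolve(const struct sketch_extent *tbl, size_t n,
			uint64_t off)
{
	for (size_t i = 0; i < n; i++) {
		const struct sketch_extent *e = &tbl[i];

		/* off - file_off cannot underflow once off >= file_off */
		if (off >= e->file_off && off - e->file_off < e->len)
			return e->dev_addr + (off - e->file_off);
	}
	return SKETCH_NO_ADDR;
}
```

Nothing in this lookup validates dev_addr, so in the sketch any bounds
checking has to happen elsewhere, whichever way the table was delivered.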

As I understand it, famfs is gated behind CAP_SYS_RAWIO, which is a
highly privileged capability. To use iomap bpf, this would also
require similar high privileges.

Thanks,
Joanne

[1] https://github.com/joannekoong/libfuse/blob/444fa27fa9fd2118a0dc332933197faf9bbf25aa/example/famfs.bpf.c
[2] https://lore.kernel.org/linux-fsdevel/0100019d43e79794-0eadcf5e-b659-43f7-8fdc-dec9f4ccce14-000000@email.amazonses.com/

>
> ~Gregory


* Re: [PATCH V10 00/10] famfs: port into fuse
From: Gregory Price @ 2026-04-15 15:32 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Miklos Szeredi, David Hildenbrand (Arm), Darrick J. Wong,
	John Groves, Joanne Koong, Bernd Schubert, John Groves,
	Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
	Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang, Jan Kara,
	Alexander Viro, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <ad-qSB4oL5D3S-ht@casper.infradead.org>

On Wed, Apr 15, 2026 at 04:10:00PM +0100, Matthew Wilcox wrote:
> On Wed, Apr 15, 2026 at 04:04:50PM +0200, Miklos Szeredi wrote:
> > On Wed, 15 Apr 2026 at 15:35, Gregory Price <gourry@gourry.net> wrote:
> > 
> > > This was my first reaction when I realized the BPF program would be
> > > controlling iomap return value in the fault path.  Big ol' (!)  popped
> > > up over my head.
> > 
> > I'm wondering which part of this triggers the big (!).
> > 
> > BPF program being run in the fault path?
> > 
> > Or that the return value from the BPF function is used as iomap?
> > 
> > Or something else?
> 
> If a BPF program controls what memory address a fault now allows access
> to, who validates that this is a memory address within the purview of
> the BPF program, and not, say, the address of the kernel page tables?
> 
> (I have done no looking to determine if this is already considered)

From an initial look at the existing bpf ops structures, I do not see
any other struct with a similar (obvious) pattern - so it's not clear to
me such a concern has been exposed elsewhere or directly addressed.

There is a verifier step for the BPF program that in theory would
validate the range matches the DAX ranges, but I think that only
validates the types are right and only on load - I think the BPF
program itself would be the address validator, which is a strong no.

BPF folks please correct me if I'm off base here.

My initial take is that it's a real concern that a "bug" in a BPF program
could let userland map arbitrary memory into userland page tables, and
such an extension would not be a quick fix to the FAMFS problem.

~Gregory


* Re: [PATCH V10 00/10] famfs: port into fuse
From: Darrick J. Wong @ 2026-04-15 15:28 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Miklos Szeredi, Gregory Price, David Hildenbrand (Arm),
	John Groves, Joanne Koong, Bernd Schubert, John Groves,
	Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
	Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang, Jan Kara,
	Alexander Viro, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <ad-qSB4oL5D3S-ht@casper.infradead.org>

On Wed, Apr 15, 2026 at 04:10:00PM +0100, Matthew Wilcox wrote:
> On Wed, Apr 15, 2026 at 04:04:50PM +0200, Miklos Szeredi wrote:
> > On Wed, 15 Apr 2026 at 15:35, Gregory Price <gourry@gourry.net> wrote:
> > 
> > > This was my first reaction when I realized the BPF program would be
> > > controlling iomap return value in the fault path.  Big ol' (!)  popped
> > > up over my head.
> > 
> > I'm wondering which part of this triggers the big (!).
> > 
> > BPF program being run in the fault path?
> > 
> > Or that the return value from the BPF function is used as iomap?
> > 
> > Or something else?
> 
> If a BPF program controls what memory address a fault now allows access
> to, who validates that this is a memory address within the purview of
> the BPF program, and not, say, the address of the kernel page tables?
> 
> (I have done no looking to determine if this is already considered)

We're not using bpf to implement ->filemap_fault directly.  fuse would
implement that as a call to dax_iomap_fault, iomap would then call
fuse's ->iomap_begin, and that's where the bpf program would take over.
The mapping provided would be a (struct dax_device, u64 addr, u64
length) tuple, and (presumably) the dax device access function would
range-check that.
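
A minimal sketch of the kind of range check presumed above, assuming the
device exposes a single size in bytes. struct sketch_dax_dev, struct
sketch_mapping and sketch_mapping_in_range() are hypothetical names for
illustration only; this is not the actual dax access code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a dax device; only its size matters here. */
struct sketch_dax_dev {
	uint64_t size;		/* bytes addressable on the device */
};

/* Hypothetical mapping triple: (device, addr, length). */
struct sketch_mapping {
	const struct sketch_dax_dev *dev;
	uint64_t addr;		/* byte offset into the device */
	uint64_t length;	/* bytes mapped starting at addr */
};

/*
 * Return true only if [addr, addr + length) lies entirely inside the
 * device. The sum addr + length is never computed, so a huge length
 * cannot wrap around and slip past the check.
 */
bool sketch_mapping_in_range(const struct sketch_mapping *m)
{
	if (!m->dev || m->length == 0)
		return false;
	if (m->addr >= m->dev->size)
		return false;
	return m->length <= m->dev->size - m->addr;
}
```

With a check of this shape, a mapping that points outside the device is
rejected no matter which component produced the (addr, length) pair.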

Unless someone foolishly puts kernel page tables on the dax device...

--D


* Re: [PATCH v4] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us
From: Aaron Tomlin @ 2026-04-15 15:23 UTC (permalink / raw)
  To: rafael, dakr, pavel, lenb
  Cc: zhongqiu.han, akpm, bp, pmladek, rdunlap, feng.tang,
	pawan.kumar.gupta, kees, elver, arnd, fvdl, lirongqing, bhelgaas,
	neelx, sean, mproche, chjohnst, nick.lange, linux-kernel,
	linux-pm, linux-doc
In-Reply-To: <20260308190421.46657-1-atomlin@atomlin.com>


On Sun, Mar 08, 2026 at 03:04:21PM -0400, Aaron Tomlin wrote:
> Users currently lack a mechanism to define granular, per-CPU PM QoS
> resume latency constraints during the early boot phase.
> 
> While the idle=poll boot parameter exists, it enforces a global
> override, forcing all CPUs in the system to "poll". This global approach
> is not suitable for asymmetric workloads where strict latency guarantees
> are required only on specific critical CPUs, while housekeeping or
> non-critical CPUs should be allowed to enter deeper idle states to save
> energy.
> 

Hi Rafael, Danilo, Pavel, Len, Zhongqiu,

A gentle ping on this v4 series. I was hoping to see if there is any
further feedback on this approach to introducing the
"pm_qos_resume_latency_us=" boot parameter.

Please let me know if any further adjustments are required, or if you need
me to rebase this against the latest power management tree.


Kind regards,
-- 
Aaron Tomlin



* Re: [PATCH V10 00/10] famfs: port into fuse
From: Matthew Wilcox @ 2026-04-15 15:10 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Gregory Price, David Hildenbrand (Arm), Darrick J. Wong,
	John Groves, Joanne Koong, Bernd Schubert, John Groves,
	Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
	Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang, Jan Kara,
	Alexander Viro, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <CAJfpegsUVv0ziMSQiq9pKeXf6G-+LROPTW077hHMSmAirVCLQw@mail.gmail.com>

On Wed, Apr 15, 2026 at 04:04:50PM +0200, Miklos Szeredi wrote:
> On Wed, 15 Apr 2026 at 15:35, Gregory Price <gourry@gourry.net> wrote:
> 
> > This was my first reaction when I realized the BPF program would be
> > controlling iomap return value in the fault path.  Big ol' (!)  popped
> > up over my head.
> 
> I'm wondering which part of this triggers the big (!).
> 
> BPF program being run in the fault path?
> 
> Or that the return value from the BPF function is used as iomap?
> 
> Or something else?

If a BPF program controls what memory address a fault now allows access
to, who validates that this is a memory address within the purview of
the BPF program, and not, say, the address of the kernel page tables?

(I have done no looking to determine if this is already considered)


* Re: [RFC, PATCH 10/12] userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle
From: Usama Arif @ 2026-04-15 15:08 UTC (permalink / raw)
  To: Kiryl Shutsemau (Meta)
  Cc: Usama Arif, Andrew Morton, Peter Xu, David Hildenbrand,
	Lorenzo Stoakes, Mike Rapoport, Suren Baghdasaryan,
	Vlastimil Babka, Liam R . Howlett, Zi Yan, Jonathan Corbet,
	Shuah Khan, Sean Christopherson, Paolo Bonzini, linux-mm,
	linux-kernel, linux-doc, linux-kselftest, kvm
In-Reply-To: <20260414142354.1465950-11-kas@kernel.org>

On Tue, 14 Apr 2026 15:23:44 +0100 "Kiryl Shutsemau (Meta)" <kas@kernel.org> wrote:

> Add UFFDIO_SET_MODE ioctl to toggle UFFD_FEATURE_MINOR_ASYNC at
> runtime. Takes mmap_write_lock for serialization against all in-flight
> faults. On sync-to-async transition, wake threads blocked in
> handle_userfault() so they retry and auto-resolve.
> 
> Since ctx->features can now be modified concurrently, add
> userfaultfd_features() helper that wraps READ_ONCE() and convert
> all ctx->features reads to use it.
> 
> Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> Assisted-by: Claude:claude-opus-4-6
> ---
>  fs/userfaultfd.c                 | 95 ++++++++++++++++++++++++++++----
>  include/uapi/linux/userfaultfd.h | 13 +++++
>  2 files changed, 96 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 43064238fd8d..0edb33599491 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -79,24 +79,33 @@ struct userfaultfd_wake_range {
>  /* internal indication that UFFD_API ioctl was successfully executed */
>  #define UFFD_FEATURE_INITIALIZED		(1u << 31)
>  
> +/*
> + * Read ctx->features with READ_ONCE() since UFFDIO_SET_MODE can
> + * modify it concurrently.
> + */
> +static unsigned int userfaultfd_features(struct userfaultfd_ctx *ctx)
> +{
> +	return READ_ONCE(ctx->features);
> +}
> +
>  static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx)
>  {
> -	return ctx->features & UFFD_FEATURE_INITIALIZED;
> +	return userfaultfd_features(ctx) & UFFD_FEATURE_INITIALIZED;
>  }
>  
>  static bool userfaultfd_wp_async_ctx(struct userfaultfd_ctx *ctx)
>  {
> -	return ctx && (ctx->features & UFFD_FEATURE_WP_ASYNC);
> +	return ctx && (userfaultfd_features(ctx) & UFFD_FEATURE_WP_ASYNC);
>  }
>  
>  static bool userfaultfd_minor_anon_ctx(struct userfaultfd_ctx *ctx)
>  {
> -	return ctx && (ctx->features & UFFD_FEATURE_MINOR_ANON);
> +	return ctx && (userfaultfd_features(ctx) & UFFD_FEATURE_MINOR_ANON);
>  }
>  
>  static bool userfaultfd_minor_async_ctx(struct userfaultfd_ctx *ctx)
>  {
> -	return ctx && (ctx->features & UFFD_FEATURE_MINOR_ASYNC);
> +	return ctx && (userfaultfd_features(ctx) & UFFD_FEATURE_MINOR_ASYNC);
>  }
>  
>  static unsigned int userfaultfd_ctx_flags(struct userfaultfd_ctx *ctx)
> @@ -122,7 +131,7 @@ bool userfaultfd_wp_unpopulated(struct vm_area_struct *vma)
>  	if (!ctx)
>  		return false;
>  
> -	return ctx->features & UFFD_FEATURE_WP_UNPOPULATED;
> +	return userfaultfd_features(ctx) & UFFD_FEATURE_WP_UNPOPULATED;
>  }
>  
>  static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
> @@ -435,7 +444,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>  	/* 0 or > 1 flags set is a bug; we expect exactly 1. */
>  	VM_WARN_ON_ONCE(!reason || (reason & (reason - 1)));
>  
> -	if (ctx->features & UFFD_FEATURE_SIGBUS)
> +	if (userfaultfd_features(ctx) & UFFD_FEATURE_SIGBUS)
>  		goto out;
>  	if (!(vmf->flags & FAULT_FLAG_USER) && (ctx->flags & UFFD_USER_MODE_ONLY))
>  		goto out;
> @@ -506,7 +515,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>  	init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
>  	uwq.wq.private = current;
>  	uwq.msg = userfault_msg(vmf->address, vmf->real_address, vmf->flags,
> -				reason, ctx->features);
> +				reason, userfaultfd_features(ctx));
>  	uwq.ctx = ctx;
>  	uwq.waken = false;
>  
> @@ -668,7 +677,7 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs)
>  	if (!octx)
>  		return 0;
>  
> -	if (!(octx->features & UFFD_FEATURE_EVENT_FORK)) {
> +	if (!(userfaultfd_features(octx) & UFFD_FEATURE_EVENT_FORK)) {
>  		userfaultfd_reset_ctx(vma);
>  		return 0;
>  	}
> @@ -774,7 +783,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma,
>  	if (!ctx)
>  		return;
>  
> -	if (ctx->features & UFFD_FEATURE_EVENT_REMAP) {
> +	if (userfaultfd_features(ctx) & UFFD_FEATURE_EVENT_REMAP) {
>  		vm_ctx->ctx = ctx;
>  		userfaultfd_ctx_get(ctx);
>  		down_write(&ctx->map_changing_lock);
> @@ -824,7 +833,7 @@ bool userfaultfd_remove(struct vm_area_struct *vma,
>  	struct userfaultfd_wait_queue ewq;
>  
>  	ctx = vma->vm_userfaultfd_ctx.ctx;
> -	if (!ctx || !(ctx->features & UFFD_FEATURE_EVENT_REMOVE))
> +	if (!ctx || !(userfaultfd_features(ctx) & UFFD_FEATURE_EVENT_REMOVE))
>  		return true;
>  
>  	userfaultfd_ctx_get(ctx);
> @@ -863,7 +872,7 @@ int userfaultfd_unmap_prep(struct vm_area_struct *vma, unsigned long start,
>  	struct userfaultfd_unmap_ctx *unmap_ctx;
>  	struct userfaultfd_ctx *ctx = vma->vm_userfaultfd_ctx.ctx;
>  
> -	if (!ctx || !(ctx->features & UFFD_FEATURE_EVENT_UNMAP) ||
> +	if (!ctx || !(userfaultfd_features(ctx) & UFFD_FEATURE_EVENT_UNMAP) ||
>  	    has_unmap_ctx(ctx, unmaps, start, end))
>  		return 0;
>  
> @@ -1826,6 +1835,65 @@ static int userfaultfd_deactivate(struct userfaultfd_ctx *ctx,
>  	return ret;
>  }
>  
> +/*
> + * Features that can be toggled at runtime via UFFDIO_SET_MODE.
> + * Only async features that were enabled at UFFDIO_API time may be toggled.
> + */
> +#define UFFD_FEATURE_TOGGLEABLE	(UFFD_FEATURE_MINOR_ASYNC)
> +
> +static int userfaultfd_set_mode(struct userfaultfd_ctx *ctx,
> +				  unsigned long arg)
> +{
> +	struct uffdio_set_mode mode;
> +	struct mm_struct *mm = ctx->mm;
> +
> +	if (copy_from_user(&mode, (void __user *)arg, sizeof(mode)))
> +		return -EFAULT;
> +
> +	/* enable and disable must not overlap */
> +	if (mode.enable & mode.disable)
> +		return -EINVAL;
> +
> +	/* only toggleable features are allowed */
> +	if ((mode.enable | mode.disable) & ~UFFD_FEATURE_TOGGLEABLE)
> +		return -EINVAL;

The comment above UFFD_FEATURE_TOGGLEABLE states "Only async features
that were enabled at UFFDIO_API time may be toggled."  However, the code
only checks that the requested feature is in UFFD_FEATURE_TOGGLEABLE.

Is it intentional that a user who opened a uffd without
UFFD_FEATURE_MINOR_ASYNC can still enable it later via
UFFDIO_SET_MODE? 
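
If the stricter rule quoted above is what is intended, the validation
might look roughly like the sketch below. This is an assumption, not the
patch: api_features is a hypothetical record of what was negotiated at
UFFDIO_API time, sketch_set_mode() is an invented name, and the
SKETCH_FEATURE_* bit values are placeholders rather than the real UAPI
values.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Placeholder bit value for illustration; not the real UAPI value. */
#define SKETCH_FEATURE_MINOR_ASYNC	(1ULL << 0)
#define SKETCH_FEATURE_TOGGLEABLE	(SKETCH_FEATURE_MINOR_ASYNC)

/*
 * Validate a toggle request and apply it to *features.
 * api_features: hypothetical mask of features enabled at UFFDIO_API time.
 * Returns 0 on success or -EINVAL, leaving *features untouched on error.
 */
int sketch_set_mode(uint64_t api_features, uint64_t *features,
		    uint64_t enable, uint64_t disable)
{
	/* enable and disable must not overlap */
	if (enable & disable)
		return -EINVAL;

	/* only toggleable features are allowed */
	if ((enable | disable) & ~SKETCH_FEATURE_TOGGLEABLE)
		return -EINVAL;

	/* extra check: the feature must have been enabled at API time */
	if ((enable | disable) & ~api_features)
		return -EINVAL;

	*features = (*features | enable) & ~disable;
	return 0;
}
```

Keeping api_features separate from the live mask lets a feature that was
negotiated and later disabled be re-enabled, while one that was never
negotiated stays off.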

> +
> +	if (!mmget_not_zero(mm))
> +		return -ESRCH;
> +
> +	/*
> +	 * mmap_write_lock serializes against all page faults.
> +	 * After we release, no in-flight faults from the old mode exist.
> +	 */
> +	{
> +		unsigned int new_features;
> +
> +		mmap_write_lock(mm);
> +		new_features = userfaultfd_features(ctx);
> +		new_features |= mode.enable;
> +		new_features &= ~mode.disable;
> +		WRITE_ONCE(ctx->features, new_features);
> +		mmap_write_unlock(mm);
> +	}
> +
> +	/*
> +	 * If switching to async, wake threads blocked in handle_userfault().
> +	 * They will retry the fault and auto-resolve under the new mode.
> +	 * len=0 means wake all pending faults on this context.
> +	 */
> +	if (mode.enable & UFFD_FEATURE_MINOR_ASYNC) {
> +		struct userfaultfd_wake_range range = { .len = 0 };
> +
> +		spin_lock_irq(&ctx->fault_pending_wqh.lock);
> +		__wake_up_locked_key(&ctx->fault_pending_wqh, TASK_NORMAL,
> +				     &range);
> +		__wake_up(&ctx->fault_wqh, TASK_NORMAL, 1, &range);
> +		spin_unlock_irq(&ctx->fault_pending_wqh.lock);
> +	}
> +
> +	mmput(mm);
> +	return 0;
> +}
>  
>  static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
>  {
> @@ -2150,6 +2218,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
>  	case UFFDIO_DEACTIVATE:
>  		ret = userfaultfd_deactivate(ctx, arg);
>  		break;
> +	case UFFDIO_SET_MODE:
> +		ret = userfaultfd_set_mode(ctx, arg);
> +		break;
>  	}
>  	return ret;
>  }
> @@ -2177,7 +2248,7 @@ static void userfaultfd_show_fdinfo(struct seq_file *m, struct file *f)
>  	 *	protocols: aa:... bb:...
>  	 */
>  	seq_printf(m, "pending:\t%lu\ntotal:\t%lu\nAPI:\t%Lx:%x:%Lx\n",
> -		   pending, total, UFFD_API, ctx->features,
> +		   pending, total, UFFD_API, userfaultfd_features(ctx),
>  		   UFFD_API_IOCTLS|UFFD_API_RANGE_IOCTLS);
>  }
>  #endif
> diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
> index 775825da2596..f0f14f9db06c 100644
> --- a/include/uapi/linux/userfaultfd.h
> +++ b/include/uapi/linux/userfaultfd.h
> @@ -84,6 +84,7 @@
>  #define _UFFDIO_CONTINUE		(0x07)
>  #define _UFFDIO_POISON			(0x08)
>  #define _UFFDIO_DEACTIVATE		(0x09)
> +#define _UFFDIO_SET_MODE		(0x0A)
>  #define _UFFDIO_API			(0x3F)
>  
>  /* userfaultfd ioctl ids */
> @@ -110,6 +111,8 @@
>  				      struct uffdio_poison)
>  #define UFFDIO_DEACTIVATE	_IOR(UFFDIO, _UFFDIO_DEACTIVATE,	\
>  				     struct uffdio_range)
> +#define UFFDIO_SET_MODE		_IOW(UFFDIO, _UFFDIO_SET_MODE,	\
> +				     struct uffdio_set_mode)
>  
>  /* read() structure */
>  struct uffd_msg {
> @@ -395,6 +398,16 @@ struct uffdio_move {
>  	__s64 move;
>  };
>  
> +struct uffdio_set_mode {
> +	/*
> +	 * Toggle async mode for features at runtime.
> +	 * Supported: UFFD_FEATURE_MINOR_ASYNC.
> +	 * Setting a bit in both enable and disable is invalid.
> +	 */
> +	__u64 enable;
> +	__u64 disable;
> +};
> +
>  /*
>   * Flags for the userfaultfd(2) system call itself.
>   */
> -- 
> 2.51.2
> 
> 


* Re: [PATCH v4 04/13] dt-bindings: power: supply: document Samsung S2M series PMIC charger device
From: Krzysztof Kozlowski @ 2026-04-15 14:39 UTC (permalink / raw)
  To: Kaustabh Chakraborty
  Cc: Lee Jones, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, MyungJoo Ham, Chanwoo Choi, Sebastian Reichel,
	André Draszik, Alexandre Belloni, Jonathan Corbet,
	Shuah Khan, Nam Tran, Łukasz Lebiedziński, linux-leds,
	devicetree, linux-kernel, linux-pm, linux-samsung-soc, linux-rtc,
	linux-doc
In-Reply-To: <DHTS9H2EIM2D.2TC17F9WBOOR1@disroot.org>

On 15/04/2026 16:03, Kaustabh Chakraborty wrote:
> On 2026-04-15 09:18 +02:00, Krzysztof Kozlowski wrote:
>> On Tue, Apr 14, 2026 at 12:02:56PM +0530, Kaustabh Chakraborty wrote:
>>> +description: |
>>> +  The Samsung S2M series PMIC battery charger manages power interfacing
>>> +  of the USB port. It may supply power, as done in USB OTG operation
>>> +  mode, or it may accept power and redirect it to the battery fuelgauge
>>> +  for charging.
>>> +
>>> +  This is a part of device tree bindings for S2M and S5M family of Power
>>> +  Management IC (PMIC).
>>> +
>>> +  See also Documentation/devicetree/bindings/mfd/samsung,s2mps11.yaml for
>>> +  additional information and example.
>>> +
>>> +allOf:
>>> +  - $ref: power-supply.yaml#
>>> +
>>> +properties:
>>> +  compatible:
>>> +    enum:
>>> +      - samsung,s2mu005-charger
>>> +
>>> +  port:
>>> +    $ref: /schemas/graph.yaml#/properties/port
>>
>> That port is internal part of the device, thus should be dropped which
>> leaves you with only one property - monitored battery - and therefore
>> fold the node into the parent node.
> 
> And that monitored-battery belongs to power-supply.yaml. Do I then
> include the allOf block in the mfd/samsung,s2mps11.yaml under the
> s2mu005 compatible?

allOf does not go under the compatible. The entire device schema should
have $ref to power-supply.yaml, just like many other devices have that
or different $ref.

Best regards,
Krzysztof


* Re: [PATCH v4 05/13] dt-bindings: mfd: s2mps11: add documentation for S2MU005 PMIC
From: Krzysztof Kozlowski @ 2026-04-15 14:27 UTC (permalink / raw)
  To: Kaustabh Chakraborty
  Cc: Lee Jones, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, MyungJoo Ham, Chanwoo Choi, Sebastian Reichel,
	André Draszik, Alexandre Belloni, Jonathan Corbet,
	Shuah Khan, Nam Tran, Łukasz Lebiedziński, linux-leds,
	devicetree, linux-kernel, linux-pm, linux-samsung-soc, linux-rtc,
	linux-doc
In-Reply-To: <DHTSO9L6YZTQ.WYM9ERXBGNGB@disroot.org>

On 15/04/2026 16:22, Kaustabh Chakraborty wrote:
> On 2026-04-15 09:17 +02:00, Krzysztof Kozlowski wrote:
>> On Tue, Apr 14, 2026 at 12:02:57PM +0530, Kaustabh Chakraborty wrote:
>>>  
>>>    clocks:
>>>      $ref: /schemas/clock/samsung,s2mps11.yaml
>>>      description:
>>>        Child node describing clock provider.
>>>  
>>> +  charger:
>>> +    $ref: /schemas/power/supply/samsung,s2mu005-charger.yaml
>>> +    description:
>>> +      Child node describing battery charger device.
>>> +
>>> +  extcon:
>>
>> You got a comment to drop the extcon naming. If this stays, it's muic for
>> example.
>>
>>> +    $ref: /schemas/extcon/samsung,s2mu005-muic.yaml
>>> +    description:
>>> +      Child node describing extcon device.
>>> +
>>> +  flash:
>>> +    $ref: /schemas/leds/samsung,s2mu005-flash.yaml
>>> +    description:
>>> +      Child node describing flash LEDs.
>>> +
>>
>> Please make it a separate binding file.
> 
> What do you mean by that?

I mean, S2MU005 should go to its own file.

> 
>>
>>>    interrupts:
>>>      maxItems: 1
>>>  
>>> @@ -43,6 +59,11 @@ properties:
>>>      description:
>>>        List of child nodes that specify the regulators.
>>>  
>>> +  rgb:
>>
>> led
> 
> Well, the flash ones are also LEDs. Would you rather have `flash { ... }` and
> `rgb { ... }` under `led { ... }` instead?

There is no approved name "rgb" for LEDs. What is the name for flash LEDs?

Best regards,
Krzysztof


* Re: [PATCH v4 05/13] dt-bindings: mfd: s2mps11: add documentation for S2MU005 PMIC
From: Kaustabh Chakraborty @ 2026-04-15 14:22 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Kaustabh Chakraborty
  Cc: Lee Jones, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, MyungJoo Ham, Chanwoo Choi, Sebastian Reichel,
	André Draszik, Alexandre Belloni, Jonathan Corbet,
	Shuah Khan, Nam Tran, Łukasz Lebiedziński, linux-leds,
	devicetree, linux-kernel, linux-pm, linux-samsung-soc, linux-rtc,
	linux-doc
In-Reply-To: <20260415-notorious-dainty-starfish-58a13c@quoll>

On 2026-04-15 09:17 +02:00, Krzysztof Kozlowski wrote:
> On Tue, Apr 14, 2026 at 12:02:57PM +0530, Kaustabh Chakraborty wrote:
>>  
>>    clocks:
>>      $ref: /schemas/clock/samsung,s2mps11.yaml
>>      description:
>>        Child node describing clock provider.
>>  
>> +  charger:
>> +    $ref: /schemas/power/supply/samsung,s2mu005-charger.yaml
>> +    description:
>> +      Child node describing battery charger device.
>> +
>> +  extcon:
>
> You got a comment to drop the extcon naming. If this stays, it's muic for
> example.
>
>> +    $ref: /schemas/extcon/samsung,s2mu005-muic.yaml
>> +    description:
>> +      Child node describing extcon device.
>> +
>> +  flash:
>> +    $ref: /schemas/leds/samsung,s2mu005-flash.yaml
>> +    description:
>> +      Child node describing flash LEDs.
>> +
>
> Please make it a separate binding file.

What do you mean by that?

>
>>    interrupts:
>>      maxItems: 1
>>  
>> @@ -43,6 +59,11 @@ properties:
>>      description:
>>        List of child nodes that specify the regulators.
>>  
>> +  rgb:
>
> led

Well flash ones are also LEDs. Would you rather have `flash { ... }` and
`rgb { ... }` under `led { ... }` instead?

>
>> +    $ref: /schemas/leds/samsung,s2mu005-rgb.yaml
>> +    description:
>> +      Child node describing RGB LEDs.
>> +

^ permalink raw reply

* Re: [PATCH V10 00/10] famfs: port into fuse
From: Miklos Szeredi @ 2026-04-15 14:04 UTC (permalink / raw)
  To: Gregory Price
  Cc: David Hildenbrand (Arm), Darrick J. Wong, John Groves,
	Joanne Koong, Bernd Schubert, John Groves, Dan Williams,
	Bernd Schubert, Alison Schofield, John Groves, Jonathan Corbet,
	Shuah Khan, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <ad-UAMcALRubBcHk@gourry-fedora-PF4VCD3F>

On Wed, 15 Apr 2026 at 15:35, Gregory Price <gourry@gourry.net> wrote:

> This was my first reaction when I realized the BPF program would be
> controlling iomap return value in the fault path.  Big ol' (!)  popped
> up over my head.

I'm wondering which part of this triggers the big (!).

BPF program being run in the fault path?

Or that the return value from the BPF function is used as iomap?

Or something else?

Thanks,
Miklos

^ permalink raw reply

* Re: [PATCH v4 04/13] dt-bindings: power: supply: document Samsung S2M series PMIC charger device
From: Kaustabh Chakraborty @ 2026-04-15 14:03 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Kaustabh Chakraborty
  Cc: Lee Jones, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, MyungJoo Ham, Chanwoo Choi, Sebastian Reichel,
	André Draszik, Alexandre Belloni, Jonathan Corbet,
	Shuah Khan, Nam Tran, Łukasz Lebiedziński, linux-leds,
	devicetree, linux-kernel, linux-pm, linux-samsung-soc, linux-rtc,
	linux-doc
In-Reply-To: <20260415-swinging-radical-junglefowl-85dcf7@quoll>

On 2026-04-15 09:18 +02:00, Krzysztof Kozlowski wrote:
> On Tue, Apr 14, 2026 at 12:02:56PM +0530, Kaustabh Chakraborty wrote:
>> +description: |
>> +  The Samsung S2M series PMIC battery charger manages power interfacing
>> +  of the USB port. It may supply power, as done in USB OTG operation
>> +  mode, or it may accept power and redirect it to the battery fuelgauge
>> +  for charging.
>> +
>> +  This is a part of device tree bindings for S2M and S5M family of Power
>> +  Management IC (PMIC).
>> +
>> +  See also Documentation/devicetree/bindings/mfd/samsung,s2mps11.yaml for
>> +  additional information and example.
>> +
>> +allOf:
>> +  - $ref: power-supply.yaml#
>> +
>> +properties:
>> +  compatible:
>> +    enum:
>> +      - samsung,s2mu005-charger
>> +
>> +  port:
>> +    $ref: /schemas/graph.yaml#/properties/port
>
> That port is internal part of the device, thus should be dropped which
> leaves you with only one property - monitored battery - and therefore
> fold the node into the parent node.

And that monitored-battery belongs to power-supply.yaml. Do I then
include the allOf block in the mfd/samsung,s2mps11.yaml under the
s2mu005 compatible?

>
> Best regards,
> Krzysztof


^ permalink raw reply

* Re: [PATCH V10 00/10] famfs: port into fuse
From: Gregory Price @ 2026-04-15 13:34 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Darrick J. Wong, John Groves, Miklos Szeredi, Joanne Koong,
	Bernd Schubert, John Groves, Dan Williams, Bernd Schubert,
	Alison Schofield, John Groves, Jonathan Corbet, Shuah Khan,
	Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <f254f6fc-dc06-4612-82d7-35bb10dbd32e@kernel.org>

On Wed, Apr 15, 2026 at 10:16:38AM +0200, David Hildenbrand (Arm) wrote:
> On 4/15/26 00:20, Gregory Price wrote:
> > On Tue, Apr 14, 2026 at 11:57:40AM -0700, Darrick J. Wong wrote:
> >>>
> >>> I very strongly object to making this a prerequisite to merging. This
> >>> is an untested idea that will certainly delay us by at least a couple
> >>> of merge windows when products are shipping now, and the existing approach
> >>> has been in circulation for a long time. It is TOO LATE!!!!!!
> >>
> > ...
> >>
> >> That said, you're clearly pissed at the goalposts changing yet again,
> >> and that's really not fair that we collectively keep moving them.
> >>
> > 
> > This seems a bit more than moving a goalpost.
> > 
> > We're now gating working software, for real working hardware, on a novel,
> > unproven BPF ops structure that controls page table mappings on page table
> > faults which would be used by exactly 1 user : FAMFS.
> 
> Are MM people on board with even letting BPF do that? Honest question,
> if someone has a pointer to how that should work, that would be appreciated.
> 

This was my first reaction when I realized the BPF program would be
controlling iomap return value in the fault path.  Big ol' (!)  popped
up over my head.

~Gregory

^ permalink raw reply

* RE: [PATCH v7 5/6] iio: adc: ad4691: add oversampling support
From: Sabau, Radu bogdan @ 2026-04-15 13:26 UTC (permalink / raw)
  To: Nuno Sá, David Lechner
  Cc: Jonathan Cameron, Lars-Peter Clausen, Hennerich, Michael,
	Sa, Nuno, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Uwe Kleine-König, Liam Girdwood, Mark Brown,
	Linus Walleij, Bartosz Golaszewski, Philipp Zabel,
	Jonathan Corbet, Shuah Khan, linux-iio@vger.kernel.org,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pwm@vger.kernel.org, linux-gpio@vger.kernel.org,
	linux-doc@vger.kernel.org
In-Reply-To: <LV9PR03MB8414CFF38DAD2BEB7AE3E704F7222@LV9PR03MB8414.namprd03.prod.outlook.com>



> -----Original Message-----
> From: Sabau, Radu bogdan
> Sent: Wednesday, April 15, 2026 4:03 PM

...
 
> > > >
> > > > More than this, if the OSR is 32 the maximum effective rate would be
> > 31250, so 25kHz
> > > > would make it the closest available one. If the user would select 1MHz
> from
> > the available
> > > > list it would be weird I would say. So perhaps a solution for this is to
> display
> > the avail list
> > > > depending on the set OSR value.
> > >
> > > Yes, the available list should reflect the current state of any other attributes
> > > that affect it.
> >
> > IMO, the above makes total sense to me.
> >
> > - Nuno Sá
> >
> 
> Hi everyone and thank you so much for your feedback!
> 
> After thinking this through carefully and testing on hardware (ad4692), here is
> the design I have in mind:
> 
> in_voltageN_sampling_frequency = effective rate = `osc_freq / osr[N]`:
> 
> The chip has a single internal oscillator shared by all channels; each channel
> independently accumulates osr[N] oscillator cycles before producing a result.
> 
> Writing in_voltageN_sampling_frequency = freq:
> 
> The driver computes the needed_osc = freq * osr[N] and snaps down to the
> largest
> available oscillator table entry satisfying both `osc <= needed_osc` and an exact
> division to osr. The divisibility constraint ensures the read-back is always an
> exact
> integer.
> The result is stored in a single shared `target_osc_freq_Hz` - writing the
> attribute
> for any channel changes the shared oscillator and therefore the read-back of
> all
> other channels.
> 
> in_voltageN_sampling_frequency_available:
> 
> Computed dynamically from the channel's current OSR. The list naturally
> becomes
> sparser as OSR increases, capping at `max_rate / osr[N]` which is exactly the
> chip's
> behaviour, and therefore more intuitive for the user.
> 
> OSC_FREQ_REG write timing:
> 
> `target_osc_freq_Hz` is written to hardware at two points:
> - Single-shot read: immediately before starting accumulation.
> - CNV burst buffer enable: inside enter_conversion_mode, after the manual
> mode
> early return (manual mode uses SPI CS toggling, not the internal oscillator, so
> the
> write is skipped there).
> 
> This keeps the deferred-write benefit - both sampling_frequency and osr can
> be
> set in any order before enabling the buffer/single-shot reading.
> 
> Buffer Mode:
> 
> After desired rates/osr are set by the user for each channel, reading back the
> sampling
> frequency of each channel gives him the true effective rate for each. Therefore
> he can use that information in order to set the buffer sampling frequency
> accordingly
> and helping him use the chip with correct synchronization more intuitively.
> 
> I have also performed the next test using the hardware and got correct results:
> - test case (ad4692, 1MHz maximum internal oscillator rate):
> 
> 1. Set channel 0 OSR=32. Available list: {31250, 15625, 12500, 6250, 3125}.
>     Write sampling_frequency=10000 (not in the list) -> snaps to 6250
> (osc=200000Hz).
>     Correct readback = 6250.
> 2. Set channel 1 OSR=4. Read channel 1 sampling frequency -> 50000
> (=200000/4).
>     Shared oscillator correctly reflected across channels.
> 3. Change channel 0 OSR from 32 to 8. Driver recomputes as follows : effective
> stays
>     6250 as before and needed_osc becomes 50000, exact table hit. Readback
> channel 0:
>     6250 (rate preserved). Readback channel 1 (OSR=4): 12500. (oscillator
> change visible).
>     The sampling for channel 0 can be of course set to another available value as
> well and
>     Make match with the initial requested 50k of channel 1. (in this case, set
> channel 0 to
>     25k).
> 4. -EINVAL rejection is atomic: with OSR=1 and SF=1250 at start for lets say
> channel 0, writing
>     OSR=32 is rejected since the needed_osc=40000, which is not a table entry
> and also has no
>     table entry <= 40000 that is divisible by 32). Both OSR and SF remain
> unchanged. Raising SF
>     to 500000 first then writing OSR=32 succeeds - osc snaps to 1000000,
> readback SF=31250.
> 
>     In (4) case we could still let the user have its sampling frequency as is
> (1250/32=39.0625),
>     though it won't result in a precise true integer value, but a rounded (39)
> one, and when
>     other channel would have OSR/rate changed it would imply a messy change
> in the previous
>     channel's SF and requiring a non-existent/matching internal osc value (most
> of the times
>     a float one), and true SF would be lost.
> 
> Do you guys think this approach suits the best?
> 
> Thanks,
> Radu

Hmm, perhaps changing the internal osc value when changing OSR is not correct.
If OSR is changed, only the effective SF of the respective channel should be changed
not the whole internal osc value. The effective rate readback value then becomes
target_osc_freq / new_osr automatically - no oscillator recalculation upon osr write,
no -EINVAL.

Then, if after an OSR change the effective rate is not on the available list (as in the earlier
edge case that rounds to 39), writing `sampling_frequency` (choosing a new available value)
fixes it. The rounded 39 would still work correctly; the reported value just wouldn't
be precise to the last decimal, though I guess the user should be aware that 1250/32 is
not exactly 39, right?
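
The arithmetic behind that edge case can be sketched as below. This is only an
illustration: `struct rate` and `effective_rate()` are hypothetical names, and
reporting a fractional part at all is an assumption, not something the driver
is known to do.

```c
/* Hypothetical helper type: integer Hz plus the fractional part in micro-Hz. */
struct rate {
	unsigned int hz;
	unsigned int micro_hz;
};

/*
 * Effective per-channel rate when an OSR write leaves the shared oscillator
 * untouched: rate = osc_hz / osr. The integer part is what a plain integer
 * read-back would show; micro_hz is the remainder that gets dropped.
 */
static struct rate effective_rate(unsigned int osc_hz, unsigned int osr)
{
	struct rate r;

	r.hz = osc_hz / osr;
	r.micro_hz = (unsigned int)((unsigned long long)(osc_hz % osr) *
				    1000000ULL / osr);
	return r;
}
```

With osc_hz=1250 and osr=32 this yields 39 Hz plus 62500 micro-Hz, i.e. the
39.0625 mentioned earlier, while 200000/4 yields exactly 50000.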

^ permalink raw reply

* Re: Volunteering to do more reviews
From: Jonathan Corbet @ 2026-04-15 13:16 UTC (permalink / raw)
  To: Konstantin Ryabitsev, linux-doc
In-Reply-To: <20260414-valiant-sticky-piculet-3b7b3f@lemur>

Konstantin Ryabitsev <mricon@kernel.org> writes:

> Jon and others:
>
> I need more direct hands-on experience doing reviews and using my own
> tooling, so I'd like to offer to do more reviewing of patches sent to
> linux-doc, if that sort of thing is welcome and I won't be stepping on
> anyone's toes.

Of course it's welcome!  I'd love to see it.

Thanks,

jon

^ permalink raw reply

* RE: [PATCH v7 5/6] iio: adc: ad4691: add oversampling support
From: Sabau, Radu bogdan @ 2026-04-15 13:03 UTC (permalink / raw)
  To: Nuno Sá, David Lechner
  Cc: Jonathan Cameron, Lars-Peter Clausen, Hennerich, Michael,
	Sa, Nuno, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Uwe Kleine-König, Liam Girdwood, Mark Brown,
	Linus Walleij, Bartosz Golaszewski, Philipp Zabel,
	Jonathan Corbet, Shuah Khan, linux-iio@vger.kernel.org,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pwm@vger.kernel.org, linux-gpio@vger.kernel.org,
	linux-doc@vger.kernel.org
In-Reply-To: <ad9J9C5K7tyxuztU@nsa>

> -----Original Message-----
> From: Nuno Sá <noname.nuno@gmail.com>
> Sent: Wednesday, April 15, 2026 11:21 AM

...

> > >
> > > More than this, if the OSR is 32 the maximum effective rate would be
> 31250, so 25kHz
> > > would make it the closest available one. If the user would select 1MHz from
> the available
> > > list it would be weird I would say. So perhaps a solution for this is to display
> the avail list
> > > depending on the set OSR value.
> >
> > Yes, the available list should reflect the current state of any other attributes
> > that affect it.
> 
> IMO, the above makes total sense to me.
> 
> - Nuno Sá
> 

Hi everyone and thank you so much for your feedback!

After thinking this through carefully and testing on hardware (ad4692), here is
the design I have in mind:

in_voltageN_sampling_frequency = effective rate = `osc_freq / osr[N]`:

The chip has a single internal oscillator shared by all channels; each channel
independently accumulates osr[N] oscillator cycles before producing a result.

Writing in_voltageN_sampling_frequency = freq:

The driver computes needed_osc = freq * osr[N] and snaps down to the largest
available oscillator table entry that satisfies both `osc <= needed_osc` and exact
divisibility by osr[N]. The divisibility constraint ensures the read-back is always an
exact integer.
The result is stored in a single shared `target_osc_freq_Hz` - writing the attribute
for any channel changes the shared oscillator and therefore the read-back of all
other channels.

in_voltageN_sampling_frequency_available:

Computed dynamically from the channel's current OSR. The list naturally becomes
sparser as OSR increases, capping at `max_rate / osr[N]` which is exactly the chip's
behaviour, and therefore more intuitive for the user.
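
A minimal sketch of the snap-down rule and the OSR-dependent available list
described above. The oscillator table here is HYPOTHETICAL: it is not taken from
the datasheet or the driver, only chosen so that it reproduces the numbers quoted
in the test case below. `snap_osc()` and `avail_rates()` are illustrative names.

```c
#include <stddef.h>

/* HYPOTHETICAL oscillator table in Hz, sorted in descending order. */
static const unsigned int osc_tbl[] = {
	1000000, 500000, 400000, 200000, 100000, 50000,
	25000, 12500, 10000, 5000, 2500, 1250,
};
#define OSC_TBL_LEN (sizeof(osc_tbl) / sizeof(osc_tbl[0]))

/*
 * Return the largest table entry that is <= freq * osr and exactly divisible
 * by osr, or 0 if no entry qualifies (the case described as -EINVAL).
 */
static unsigned int snap_osc(unsigned int freq, unsigned int osr)
{
	unsigned long long needed = (unsigned long long)freq * osr;
	size_t i;

	/* The table is descending, so the first match is the largest one. */
	for (i = 0; i < OSC_TBL_LEN; i++)
		if (osc_tbl[i] <= needed && osc_tbl[i] % osr == 0)
			return osc_tbl[i];
	return 0;
}

/*
 * Fill out[] (at least OSC_TBL_LEN entries) with the effective rates that
 * are reachable at this OSR and return how many were written.
 */
static size_t avail_rates(unsigned int osr, unsigned int *out)
{
	size_t i, n = 0;

	for (i = 0; i < OSC_TBL_LEN; i++)
		if (osc_tbl[i] % osr == 0)
			out[n++] = osc_tbl[i] / osr;
	return n;
}
```

With this table, `avail_rates(32, ...)` gives {31250, 15625, 12500, 6250, 3125},
`snap_osc(10000, 32)` gives 200000 (read-back 6250), and `snap_osc(1250, 32)`
gives 0 because no entry <= 40000 is divisible by 32.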

OSC_FREQ_REG write timing:

`target_osc_freq_Hz` is written to hardware at two points:
- Single-shot read: immediately before starting accumulation.
- CNV burst buffer enable: inside enter_conversion_mode, after the manual mode
early return (manual mode uses SPI CS toggling, not the internal oscillator, so the
write is skipped there).

This keeps the deferred-write benefit - both sampling_frequency and osr can be
set in any order before enabling the buffer/single-shot reading.

Buffer Mode:

After the user sets the desired rates/osr for each channel, reading back the sampling
frequency of each channel gives the true effective rate for each. That information
can then be used to set the buffer sampling frequency accordingly, which makes it
more intuitive to use the chip with correct synchronization.

I have also performed the following test on the hardware and got correct results:
- test case (ad4692, 1MHz maximum internal oscillator rate):

1. Set channel 0 OSR=32. Available list: {31250, 15625, 12500, 6250, 3125}.
    Write sampling_frequency=10000 (not in the list) -> snaps to 6250 (osc=200000Hz).
    Correct readback = 6250.
2. Set channel 1 OSR=4. Read channel 1 sampling frequency -> 50000 (=200000/4).
    Shared oscillator correctly reflected across channels.
3. Change channel 0 OSR from 32 to 8. The driver recomputes as follows: the effective rate
    stays at 6250 and needed_osc becomes 50000, an exact table hit. Readback channel 0:
    6250 (rate preserved). Readback channel 1 (OSR=4): 12500 (oscillator change visible).
    Channel 0 can of course be set to another available value to match the initial 50k
    of channel 1 (in this case, set channel 0 to 25k).
4. -EINVAL rejection is atomic: with OSR=1 and SF=1250 at start for, let's say, channel 0, writing
    OSR=32 is rejected since needed_osc=40000 is not a table entry and there is no
    table entry <= 40000 that is divisible by 32. Both OSR and SF remain unchanged. Raising SF
    to 500000 first and then writing OSR=32 succeeds - osc snaps to 1000000, readback SF=31250.
    
    In case (4) we could still let the user keep the sampling frequency as is (1250/32=39.0625),
    though the readback would be a rounded value (39) rather than an exact integer. Then, when
    another channel has its OSR/rate changed, it would imply a messy change in the previous
    channel's SF, requiring an internal osc value that does not exist in the table (most of the
    time a fractional one), and the true SF would be lost.

Do you think this approach is the best fit?

Thanks,
Radu

    >
> > >
> > > Linking the two together is perhaps wrong to begin with from my end,
> since in this
> > > driver's case, the per-channel sampling frequency is controlled by the
> internal oscillator
> > > which has static available values. So perhaps sampling frequency should be
> separate, and
> > > OSR separate as well, which would make everything cleaner.
> > >
> > > Indeed, the effective rate is changed by OSR, but perhaps that is something
> the user
> > > should be aware of, since the sampling frequency is the rate at which the
> channel samples
> > > (1 sample per period) and OSR is how many times the channel samples
> upon a final sample
> > > is to be read. The user already has to take this into account when setting
> the buffer
> > > sampling frequency, so it would make sense to take this into account here
> too.
> >
> > We can't change the definition of the IIO ABI just to make one driver simpler
> > to implement. The OSR and sample rate can't be completely independent.
> >
> > If you want to leave it the way it is currently implemented though, that is
> fine.
> >
> > >
> > > Please let me know you thoughts on this,
> > > Radu
> >

^ permalink raw reply

* [PATCH v4 3/3] Documentation: document panic_on_unrecoverable_memory_failure sysctl
From: Breno Leitao @ 2026-04-15 12:55 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
  Cc: linux-mm, linux-kernel, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260415-ecc_panic-v4-0-2d0277f8f601@debian.org>

Add documentation for the new vm.panic_on_unrecoverable_memory_failure
sysctl, describing the three categories of failures that trigger a
panic and noting which kernel page types are not yet covered.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 Documentation/admin-guide/sysctl/vm.rst | 37 +++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 97e12359775c9..592ce9ec38c4b 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm:
 - page-cluster
 - page_lock_unfairness
 - panic_on_oom
+- panic_on_unrecoverable_memory_failure
 - percpu_pagelist_high_fraction
 - stat_interval
 - stat_refresh
@@ -925,6 +926,42 @@ panic_on_oom=2+kdump gives you very strong tool to investigate
 why oom happens. You can get snapshot.
 
 
+panic_on_unrecoverable_memory_failure
+======================================
+
+When a hardware memory error (e.g. multi-bit ECC) hits a kernel page
+that cannot be recovered by the memory failure handler, the default
+behaviour is to ignore the error and continue operation.  This is
+dangerous because the corrupted data remains accessible to the kernel,
+risking silent data corruption or a delayed crash when the poisoned
+memory is next accessed.
+
+When enabled, this sysctl triggers a panic on three categories of
+unrecoverable failures: reserved kernel pages, non-buddy kernel pages
+with zero refcount (e.g. tail pages of high-order allocations), and
+pages whose state cannot be classified as recoverable.
+
+Note that some kernel page types — such as slab objects, vmalloc
+allocations, kernel stacks, and page tables — share a failure path
+with transient refcount races and are not currently covered by this
+option.  That is, the kernel does not panic when it cannot be confident of the page status.
+
+For many environments it is preferable to panic immediately with a clean
+crash dump that captures the original error context, rather than to
+continue and face a random crash later whose cause is difficult to
+diagnose.
+
+= =====================================================================
+0 Try to continue operation (default).
+1 Panic immediately.  If the ``panic`` sysctl is also non-zero then the
+  machine will be rebooted.
+= =====================================================================
+
+Example::
+
+     echo 1 > /proc/sys/vm/panic_on_unrecoverable_memory_failure
+
+
 percpu_pagelist_high_fraction
 =============================
 

-- 
2.52.0


^ permalink raw reply related
