Linux Documentation

Linux Documentation
 help / color / mirror / Atom feed

* [PATCH v11 03/20] gpu: nova-core: gsp: Expose total physical VRAM end from FB region info
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add `total_fb_end()` to `GspStaticConfigInfo` that computes the
exclusive end address of the highest valid FB region covering both
usable and GSP-reserved areas.

This allows callers to know the full physical VRAM extent, not just
the allocatable portion.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/gsp/commands.rs    | 6 ++++++
 drivers/gpu/nova-core/gsp/fw/commands.rs | 7 +++++++
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index d18abd8b5f04..e42a865fd4ac 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -196,6 +196,9 @@ pub(crate) struct GetGspStaticInfoReply {
     /// Usable FB (VRAM) region for driver memory allocation.
     #[expect(dead_code)]
     pub(crate) usable_fb_region: Range<u64>,
+    /// End of VRAM.
+    #[expect(dead_code)]
+    pub(crate) total_fb_end: u64,
 }
 
 impl MessageFromGsp for GetGspStaticInfoReply {
@@ -207,9 +210,12 @@ fn read(
         msg: &Self::Message,
         _sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
     ) -> Result<Self, Self::InitError> {
+        let total_fb_end = msg.total_fb_end().ok_or(ENODEV)?;
+
         Ok(GetGspStaticInfoReply {
             gpu_name: msg.gpu_name_str(),
             usable_fb_region: msg.first_usable_fb_region().ok_or(ENODEV)?,
+            total_fb_end,
         })
     }
 }
diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
index 3d5180e6b1e0..f2d59aa3131f 100644
--- a/drivers/gpu/nova-core/gsp/fw/commands.rs
+++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
@@ -164,6 +164,13 @@ pub(crate) fn first_usable_fb_region(&self) -> Option<Range<u64>> {
             }
         })
     }
+
+    /// Compute the end of physical VRAM from all FB regions.
+    pub(crate) fn total_fb_end(&self) -> Option<u64> {
+        self.fb_regions()
+            .filter_map(|reg| reg.limit.checked_add(1))
+            .max()
+    }
 }
 
 // SAFETY: Padding is explicit and will not contain uninitialized data.
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 02/20] gpu: nova-core: gsp: Extract usable FB region from GSP
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes
In-Reply-To: <20260415210548.3776595-1-joelagnelf@nvidia.com>

Add first_usable_fb_region() to GspStaticConfigInfo to extract the first
usable FB region from GSP's fbRegionInfoParams. Usable regions are those
that are not reserved or protected.

The extracted region is stored in GetGspStaticInfoReply and exposed as
usable_fb_region field for use by the memory subsystem.

Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/gsp/commands.rs    | 11 ++++--
 drivers/gpu/nova-core/gsp/fw/commands.rs | 45 +++++++++++++++++++++++-
 2 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index c89c7b57a751..d18abd8b5f04 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -4,6 +4,7 @@
     array,
     convert::Infallible,
     ffi::FromBytesUntilNulError,
+    ops::Range,
     str::Utf8Error, //
 };
 
@@ -189,15 +190,18 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
     }
 }
 
-/// The reply from the GSP to the [`GetGspInfo`] command.
+/// The reply from the GSP to the [`GetGspStaticInfo`] command.
 pub(crate) struct GetGspStaticInfoReply {
     gpu_name: [u8; 64],
+    /// Usable FB (VRAM) region for driver memory allocation.
+    #[expect(dead_code)]
+    pub(crate) usable_fb_region: Range<u64>,
 }
 
 impl MessageFromGsp for GetGspStaticInfoReply {
     const FUNCTION: MsgFunction = MsgFunction::GetGspStaticInfo;
     type Message = GspStaticConfigInfo;
-    type InitError = Infallible;
+    type InitError = Error;
 
     fn read(
         msg: &Self::Message,
@@ -205,6 +209,7 @@ fn read(
     ) -> Result<Self, Self::InitError> {
         Ok(GetGspStaticInfoReply {
             gpu_name: msg.gpu_name_str(),
+            usable_fb_region: msg.first_usable_fb_region().ok_or(ENODEV)?,
         })
     }
 }
@@ -233,7 +238,7 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
     }
 }
 
-/// Send the [`GetGspInfo`] command and awaits for its reply.
+/// Send the [`GetGspStaticInfo`] command and awaits for its reply.
 pub(crate) fn get_gsp_info(cmdq: &Cmdq, bar: &Bar0) -> Result<GetGspStaticInfoReply> {
     cmdq.send_command(bar, GetGspStaticInfo)
 }
diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
index db46276430be..3d5180e6b1e0 100644
--- a/drivers/gpu/nova-core/gsp/fw/commands.rs
+++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
+use core::ops::Range;
+
 use kernel::{
     device,
     pci,
@@ -10,7 +12,10 @@
     }, //
 };
 
-use crate::gsp::GSP_PAGE_SIZE;
+use crate::{
+    gsp::GSP_PAGE_SIZE,
+    num::IntoSafeCast, //
+};
 
 use super::bindings;
 
@@ -121,6 +126,44 @@ impl GspStaticConfigInfo {
     pub(crate) fn gpu_name_str(&self) -> [u8; 64] {
         self.0.gpuNameString
     }
+
+    /// Returns an iterator over valid FB regions from GSP firmware data.
+    fn fb_regions(
+        &self,
+    ) -> impl Iterator<Item = &bindings::NV2080_CTRL_CMD_FB_GET_FB_REGION_FB_REGION_INFO> {
+        let fb_info = &self.0.fbRegionInfoParams;
+        fb_info
+            .fbRegion
+            .iter()
+            .take(fb_info.numFBRegions.into_safe_cast())
+            .filter(|reg| reg.limit >= reg.base)
+    }
+
+    /// Extracts the first usable FB region from GSP firmware data.
+    ///
+    /// Returns the first region suitable for driver memory allocation as a [`Range<u64>`].
+    /// Usable regions are those that satisfy all the following properties:
+    /// - Are not reserved for firmware internal use.
+    /// - Are not protected (hardware-enforced access restrictions).
+    /// - Support compression (can use GPU memory compression for bandwidth).
+    /// - Support ISO (isochronous memory for display requiring guaranteed bandwidth).
+    ///
+    /// TODO: Multiple discontinuous usable regions of RAM are possible in
+    /// special cases. We need to support it (to also match Nouveau's behavior).
+    pub(crate) fn first_usable_fb_region(&self) -> Option<Range<u64>> {
+        self.fb_regions().find_map(|reg| {
+            // Filter: not reserved, not protected, supports compression and ISO.
+            if reg.reserved == 0
+                && reg.bProtected == 0
+                && reg.supportCompressed != 0
+                && reg.supportISO != 0
+            {
+                reg.limit.checked_add(1).map(|end| reg.base..end)
+            } else {
+                None
+            }
+        })
+    }
 }
 
 // SAFETY: Padding is explicit and will not contain uninitialized data.
-- 
2.34.1


^ permalink raw reply related

* [PATCH v11 01/20] gpu: nova-core: gsp: Return GspStaticInfo from boot()
From: Joel Fernandes @ 2026-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
	Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
	rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
	Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
	Lucas De Marchi, Thomas Hellstrom, Helge Deller, Alex Gaynor,
	Boqun Feng, John Hubbard, Alistair Popple, Timur Tabi, Edwin Peer,
	Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
	Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi,
	Eliot Courtney, joel, linux-doc, amd-gfx, intel-gfx, intel-xe,
	linux-fbdev, Joel Fernandes

Refactor the GSP boot function to return GetGspStaticInfoReply.

This enables access required for memory management initialization to:
- bar1_pde_base: BAR1 page directory base.
- bar2_pde_base: BAR2 page directory base.
- usable memory regions in video memory.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Cc: Nikola Djukic <ndjukic@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs      | 9 +++++++--
 drivers/gpu/nova-core/gsp/boot.rs | 9 ++++++---
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 0f6fe9a1b955..b4da4a1ae156 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -21,7 +21,10 @@
     },
     fb::SysmemFlush,
     gfw,
-    gsp::Gsp,
+    gsp::{
+        commands::GetGspStaticInfoReply,
+        Gsp, //
+    },
     regs,
 };
 
@@ -238,6 +241,8 @@ pub(crate) struct Gpu {
     /// GSP runtime data. Temporarily an empty placeholder.
     #[pin]
     gsp: Gsp,
+    /// Static GPU information from GSP.
+    gsp_static_info: GetGspStaticInfoReply,
 }
 
 impl Gpu {
@@ -269,7 +274,7 @@ pub(crate) fn new<'a>(
 
             gsp <- Gsp::new(pdev),
 
-            _: { gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)? },
+            gsp_static_info: { gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)? },
 
             bar: devres_bar,
         })
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 6f707b3d1a54..d42637db06dd 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -33,7 +33,10 @@
     },
     gpu::Chipset,
     gsp::{
-        commands,
+        commands::{
+            self,
+            GetGspStaticInfoReply, //
+        },
         sequencer::{
             GspSequencer,
             GspSequencerParams, //
@@ -145,7 +148,7 @@ pub(crate) fn boot(
         chipset: Chipset,
         gsp_falcon: &Falcon<Gsp>,
         sec2_falcon: &Falcon<Sec2>,
-    ) -> Result {
+    ) -> Result<GetGspStaticInfoReply> {
         let dev = pdev.as_ref();
 
         let bios = Vbios::new(dev, bar)?;
@@ -235,6 +238,6 @@ pub(crate) fn boot(
             Err(e) => dev_warn!(pdev, "GPU name unavailable: {:?}\n", e),
         }
 
-        Ok(())
+        Ok(info)
     }
 }
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH v4 0/3] mm/memory-failure: add panic option for unrecoverable pages
From: Jiaqi Yan @ 2026-04-15 20:56 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	linux-mm, linux-kernel, linux-doc, kernel-team
In-Reply-To: <20260415-ecc_panic-v4-0-2d0277f8f601@debian.org>

Hi Breno,

On Wed, Apr 15, 2026 at 5:55 AM Breno Leitao <leitao@debian.org> wrote:
>
> When the memory failure handler encounters an in-use kernel page that it
> cannot recover (slab, page tables, kernel stacks, vmalloc, etc.), it
> currently logs the error as "Ignored" and continues operation.
>
> This leaves corrupted data accessible to the kernel, which will inevitably
> cause either silent data corruption or a delayed crash when the poisoned memory
> is next accessed.
>
> This is a common problem on large fleets. We frequently observe multi-bit ECC
> errors hitting kernel slab pages, where memory_failure() fails to recover them
> and the system crashes later at an unrelated code path, making root cause
> analysis unnecessarily difficult.
>
> Here is one specific example from production on an arm64 server: a multi-bit
> ECC error hit a dentry cache slab page, memory_failure() failed to recover it
> (slab pages are not supported by the hwpoison recovery mechanism), and 67
> seconds later d_lookup() accessed the poisoned cache line causing
> a synchronous external abort:
>
>     [88690.479680] [Hardware Error]: error_type: 3, multi-bit ECC
>     [88690.498473] Memory failure: 0x40272d: unhandlable page.
>     [88690.498619] Memory failure: 0x40272d: recovery action for
>                    get hwpoison page: Ignored
>     ...
>     [88757.847126] Internal error: synchronous external abort:
>                    0000000096000410 [#1] SMP
>     [88758.061075] pc : d_lookup+0x5c/0x220
>
> This series adds a new sysctl vm.panic_on_unrecoverable_memory_failure
> (default 0) that, when enabled, panics immediately on unrecoverable
> memory failures. This provides a clean crash dump at the time of the

I get the fail-fast part, but wonder will kernel really be able to
provide clean crash dump useful for diagnosis?

In your example at 88757.847126, kernel was handling SEA and because
we are under kernel context, eventually has to die(). Apparently not
only your patch, but also memory-failure has no role to play there.
But at least SEA handling tried its best to show the kernel code that
consumed the memory error.

So your code should apply to the memory failure handling at
88690.498473, which is likely triggered from APEI GHES for poison
detection (I guess the example is from ARM64). Anything except SEA is
considered not synchronous (by APEI is_hest_sync_notify()). If kernel
panics there, I guess it will be in a random process context or a
kworker thread? How useful is it for diagnosis? Just the exact time an
error detected (which is already logged by kernel)?

On X86, for UCNA or SRAO type machine check exceptions, I think with
your patch the panic would also happen in random process context or
kworker thread,

Can you share some clean crash dumps from your testing that show they
are more useful than the crash at SEA? Thanks!

> error, which is far more useful for diagnosis than a random crash later
> at an unrelated code path.
>
> This also categorizes reserved pages as MF_MSG_KERNEL, and panics on
> unknown page types (MF_MSG_UNKNOWN).
>
> Note that dynamically allocated kernel memory (SLAB/SLUB, vmalloc,
> kernel stacks, page tables) shares the MF_MSG_GET_HWPOISON return path
> with transient refcount races, so it is intentionally excluded from the
> panic conditions to avoid false positives.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> Changes in v4:
> - Drop CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option.
> - Split the reserved page classification (MF_MSG_KERNEL) into its own
>   patch, separate from the panic mechanism.
> - Document why the buddy allocator TOCTOU race (between
>   get_hwpoison_page() and is_free_buddy_page()) cannot cause false
>   positives: PG_hwpoison is set beforehand and check_new_page() in the
>   page allocator rejects hwpoisoned pages.
> - Document the narrow LRU isolation race window for MF_MSG_UNKNOWN and
>   its mitigation via identify_page_state()'s two-pass design.
> - Explicitly document why MF_MSG_GET_HWPOISON is excluded from the
>   panic conditions (shared path with transient races and non-reserved
>   kernel memory).
> - Link to v3: https://patch.msgid.link/20260413-ecc_panic-v3-0-1dcbb2f12bc4@debian.org
>
> Changes in v3:
> - Rename is_unrecoverable_memory_failure() to panic_on_unrecoverable_mf()
>   as suggested by maintainer.
> - Add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option,
>   similar to CONFIG_BOOTPARAM_HARDLOCKUP_PANIC.
> - Add documentation for the sysctl and CONFIG option.
> - Add code comments documenting the panic condition design rationale and
>   how the retry mechanism mitigates false positives from buddy allocator
>   races.
> - Link to v2: https://patch.msgid.link/20260331-ecc_panic-v2-0-9e40d0f64f7a@debian.org
>
> Changes in v2:
> - Panic on MF_MSG_KERNEL, MF_MSG_KERNEL_HIGH_ORDER and MF_MSG_UNKNOWN
>   instead of MF_MSG_GET_HWPOISON.
> - Report MF_MSG_KERNEL for reserved pages when get_hwpoison_page() fails
>   instead of MF_MSG_GET_HWPOISON.
> - Link to v1: https://patch.msgid.link/20260323-ecc_panic-v1-0-72a1921726c5@debian.org
>
> ---
> Breno Leitao (3):
>       mm/memory-failure: report MF_MSG_KERNEL for reserved pages
>       mm/memory-failure: add panic option for unrecoverable pages
>       Documentation: document panic_on_unrecoverable_memory_failure sysctl
>
>  Documentation/admin-guide/sysctl/vm.rst | 37 +++++++++++++
>  mm/memory-failure.c                     | 92 ++++++++++++++++++++++++++++++++-
>  2 files changed, 128 insertions(+), 1 deletion(-)
> ---
> base-commit: e6efabc0afca02efa263aba533f35d90117ab283
> change-id: 20260323-ecc_panic-4e473b83087c
>
> Best regards,
> --
> Breno Leitao <leitao@debian.org>
>
>

^ permalink raw reply

* Re: [PATCH bpf] bpf,tcp: avoid infinite recursion in BPF_SOCK_OPS_HDR_OPT_LEN_CB
From: KaFai Wan @ 2026-04-15 20:47 UTC (permalink / raw)
  To: Martin KaFai Lau, Jiayuan Chen
  Cc: bpf, Quan Sun, Yinhao Hu, Kaiyan Mei, Dongliang Mu, Eric Dumazet,
	Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern,
	netdev, linux-doc, linux-kernel
In-Reply-To: <2026415181939.1bue.martin.lau@linux.dev>

On Wed, 2026-04-15 at 11:55 -0700, Martin KaFai Lau wrote:
> On Tue, Apr 14, 2026 at 06:57:00PM +0800, Jiayuan Chen wrote:
> > A BPF_PROG_TYPE_SOCK_OPS program can set BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG
> > to inject custom TCP header options. When the kernel builds a TCP packet,
> > it calls tcp_established_options() to calculate the header size, which
> > invokes bpf_skops_hdr_opt_len() to trigger the BPF_SOCK_OPS_HDR_OPT_LEN_CB
> > callback.
> > 
> > If the BPF program calls bpf_setsockopt(TCP_NODELAY) inside this callback,
> > __tcp_sock_set_nodelay() will call tcp_push_pending_frames(), which calls
> > tcp_current_mss(), which calls tcp_established_options() again,
> > re-triggering the same BPF callback. This creates an infinite recursion
> > that exhausts the kernel stack and causes a panic.
> > 
> > BPF_SOCK_OPS_HDR_OPT_LEN_CB
> >   -> bpf_setsockopt(TCP_NODELAY)
> > 	-> tcp_push_pending_frames()
> > 	  -> tcp_current_mss()
> > 		-> tcp_established_options()
> > 		  -> bpf_skops_hdr_opt_len()
> >                            /* infinite recursion */
> > 			-> BPF_SOCK_OPS_HDR_OPT_LEN_CB
> > 
> > A similar reentrancy issue exists for TCP congestion control, which is
> > guarded by tp->bpf_chg_cc_inprogress. Adopt the same approach: introduce
> > tp->bpf_hdr_opt_len_cb_inprogress, set it before invoking the callback in
> > bpf_skops_hdr_opt_len(), and check it in sol_tcp_sockopt() to reject
> > bpf_setsockopt(TCP_NODELAY) calls that would trigger
> > tcp_push_pending_frames() and cause the recursion.
> > 
> > Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> > Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> > Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> > Reported-by: Dongliang Mu <dzm91@hust.edu.cn>
> > Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
> 
> Thanks for the report and fixes suggested across different threads.
> 
> Using has_current_bpf_ctx() to avoid tcp_push_pending_frames() should
> work but it may change the expectation for bpf_setsockopt(TCP_NODELAY).
> e.g. A bpf_tcp_iter does bpf_setsockopt(TCP_NODELAY).
> 
> Adding another bit in the tcp_sock is not ideal either. I agree with
> Alexei that it is better to reuse the existing bit if we go down this path.
> We also need to audit more closely if there are cases that two different
> type of bpf progs may call bpf_setsockopt(). e.g.
> bpf_tcp_iter does bpf_setsockopt(TCP_CONGESTION) to switch
> to a bpf_tcp_cc and the new bpf_tcp_cc->init() will also do
> bpf_setsockopt(xxx) which then will be rejected.
> 
> Another fix could be, the bpf_setsockopt(TCP_NODELAY) is always broken
> for BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB unless
> the bpf prog is doing some maneuver to avoid the recursion. Thus,
> this use case is basically broken as is and I don't see a use case
> for bpf_setsockopt(TCP_NODELAY) when writing header also.
> How about checking the bpf_sock->op, level, and optname in
> bpf_sock_ops_setsockopt() and return -EOPNOTSUPP?

Hi Martin, thanks for the review.

I'm working whit return -EOPNOTSUPP. I've completed whit the code of fix and test, and will send the
patch later.

The fix is:

diff --git a/net/core/filter.c b/net/core/filter.c
index fcfcb72663ca..911ff04bca5a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	if (!is_locked_tcp_sock_ops(bpf_sock))
 		return -EOPNOTSUPP;
 
+	if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
+	     bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
+	    IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == TCP_NODELAY)
+		return -EOPNOTSUPP;
+
 	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
 }

-- 
Thanks,
KaFai

^ permalink raw reply related

* Re: [PATCH 0/8] Auto-generate maintainer profile entries
From: Randy Dunlap @ 2026-04-15 20:41 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Albert Ou, Jonathan Corbet,
	Mauro Carvalho Chehab, Palmer Dabbelt, Paul Walmsley
  Cc: linux-doc, linux-kernel, linux-riscv, workflows, Alexandre Ghiti,
	Shuah Khan, Dan Williams
In-Reply-To: <cover.1776242739.git.mchehab+huawei@kernel.org>

Hi Mauro,

Thanks for tackling this issue.

On 4/15/26 1:52 AM, Mauro Carvalho Chehab wrote:
> Date: Tue, 14 Apr 2026 16:29:03 +0200
> From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> To: Albert Ou <aou@eecs.berkeley.edu>, Jonathan Corbet <corbet@lwn.net>, Dan Williams <djbw@kernel.org>, Mauro Carvalho Chehab <mchehab@kernel.org>, Palmer Dabbelt <palmer@dabbelt.com>, Paul Walmsley <pjw@kernel.org>
> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>, Randy Dunlap <rdunlap@infradead.org>, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, workflows@vger.kernel.org, Alexandre Ghiti <alex@ghiti.fr>, Shuah Khan <skhan@linuxfoundation.org>
> Message-ID: <cover.1776176108.git.mchehab+huawei@kernel.org>
> 
> Hi Dan/Jon,
> 
> This patch series change the way maintainer entry profile links
> are added to the documentation. Instead of having an entry for
> each of them at an ReST file, get them from MAINTAINERS content.
> 
> That should likely make easier to maintain, as there will be a single
> point to place all such profiles.
> 
> On this version, I added Dan's text to patch 4.
> 
> I also added a couple of other patches to improve its output. While
> I could have them merged at the first patch, I opted to make them
> separate, as, in case of problems or needed changes, it would be
> easier to revert or modify the corresponding logic. Also, it should
> be better to review, in case one wants some changes there.
> 
> The main changes against RFC are:
> 
> - now, the TOC will be presented with 1 depth identation level,
>   meaning that it would look like a list;
> - for files outside Documentation/process, it will use the name of
>   the subsystem with title capitalization for the name of the
>   profile entry;
> - the logic also parses and produces a list of profiles that are
>   maintained elsewhere, picking its http/https link;
> - entries are now better sorted: first by subsystem name, then
>   by its name.
> 
> Suggested-by: Dan Williams <djbw@kernel.org>
> Closes: https://lore.kernel.org/linux-doc/69dd6299440be_147c801005b@djbw-dev.notmuch/
> 
> Mauro Carvalho Chehab (8):
>   docs: maintainers_include: auto-generate maintainer profile TOC
>   MAINTAINERS: add an entry for media maintainers profile
>   MAINTAINERS: add maintainer-tip.rst to X86
>   docs: auto-generate maintainer entry profile links
>   docs: maintainers_include: use a better title for profiles
>   docs: maintainers_include: add external profile URLs
>   docs: maintainers_include: preserve names for files under process/
>   docs: maintainers_include: Only show main entry for profiles
> 
>  .../maintainer/maintainer-entry-profile.rst   |  24 +---
>  .../process/maintainer-handbooks.rst          |  17 ++-
>  Documentation/sphinx/maintainers_include.py   | 131 +++++++++++++++---
>  MAINTAINERS                                   |   2 +
>  4 files changed, 128 insertions(+), 46 deletions(-)

When building htmldocs with O=DOCS, I get a bunch of warnings.
I tested against today's linux-next tree.

The 'make O=DOCS htmldocs' warnings are (subset of all warnings):

linux-next/MAINTAINERS:38: WARNING: toctree contains reference to nonexisting document 'DOCS/Documentation/process/maintainer-kvm-x86' [toc.not_readable]
linux-next/MAINTAINERS:38: WARNING: toctree contains reference to nonexisting document 'DOCS/Documentation/filesystems/xfs/xfs-maintainer-entry-profile' [toc.not_readable]
linux-next/MAINTAINERS:38: WARNING: toctree contains reference to nonexisting document 'DOCS/Documentation/process/maintainer-soc-clean-dts' [toc.not_readable]
linux-next/MAINTAINERS:38: WARNING: toctree contains reference to nonexisting document 'DOCS/Documentation/process/maintainer-netdev' [toc.not_readable]
linux-next/MAINTAINERS:38: WARNING: toctree contains reference to nonexisting document 'DOCS/Documentation/process/maintainer-tip' [toc.not_readable]

linux-next/Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst: WARNING: document isn't included in any toctree [toc.not_included]
linux-next/Documentation/process/maintainer-kvm-x86.rst: WARNING: document isn't included in any toctree [toc.not_included]
linux-next/Documentation/process/maintainer-netdev.rst: WARNING: document isn't included in any toctree [toc.not_included]
linux-next/Documentation/process/maintainer-soc.rst: WARNING: document isn't included in any toctree [toc.not_included]
linux-next/Documentation/process/maintainer-soc-clean-dts.rst: WARNING: document isn't included in any toctree [toc.not_included]
linux-next/Documentation/process/maintainer-tip.rst: WARNING: document isn't included in any toctree [toc.not_included]

linux-next/MAINTAINERS:1: WARNING: unknown document: '../../DOCS/Documentation/process/maintainer-soc' [ref.doc]
linux-next/MAINTAINERS:2: WARNING: unknown document: '../../DOCS/Documentation/process/maintainer-soc-clean-dts' [ref.doc]
linux-next/MAINTAINERS:3: WARNING: unknown document: '../../DOCS/Documentation/process/maintainer-soc-clean-dts' [ref.doc]
linux-next/MAINTAINERS:5: WARNING: unknown document: '../../DOCS/Documentation/process/maintainer-tip' [ref.doc]
linux-next/MAINTAINERS:6: WARNING: unknown document: '../../DOCS/Documentation/process/maintainer-tip' [ref.doc]


-- 
~Randy


^ permalink raw reply

* Re: [PATCH v10 21/21] gpu: nova-core: Use runtime BAR1 size instead of hardcoded 256MB
From: Joel Fernandes @ 2026-04-15 20:23 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Dave Airlie, Daniel Almeida, Koen Koning,
	dri-devel, rust-for-linux, Nikola Djukic, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jonathan Corbet, Alex Deucher, Christian Koenig, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, Huang Rui,
	Matthew Auld, Matthew Brost, Lucas De Marchi, Thomas Hellstrom,
	Helge Deller, Alex Gaynor, Boqun Feng, John Hubbard,
	Alistair Popple, Timur Tabi, Edwin Peer, Alexandre Courbot,
	Andrea Righi, Andy Ritger, Zhi Wang, Balbir Singh,
	Philipp Stanner, Elle Rhumsaa, alexeyi, joel, linux-doc, amd-gfx,
	intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <DHIFPXAGCWU7.300NRYPR06KDG@nvidia.com>

On Thu, Apr 02, 2026 at 02:54:30PM +0900, Eliot Courtney wrote:
> On Wed Apr 1, 2026 at 6:20 AM JST, Joel Fernandes wrote:
> > From: Zhi Wang <zhiw@nvidia.com>
> >
> > Remove the hardcoded BAR1_SIZE = SZ_256M constant. On GPUs like L40 the
> > BAR1 aperture is larger than 256MB; using a hardcoded size prevents large
> > BAR1 from working and mapping it would fail.
> >
> > Signed-off-by: Zhi Wang <zhiw@nvidia.com>
> > Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> > ---
> >  drivers/gpu/nova-core/driver.rs | 8 ++------
> >  drivers/gpu/nova-core/gpu.rs    | 7 +------
> >  2 files changed, 3 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
> > index b1aafaff0cee..6f95f8672158 100644
> > --- a/drivers/gpu/nova-core/driver.rs
> > +++ b/drivers/gpu/nova-core/driver.rs
> > @@ -13,10 +13,7 @@
> >          Vendor, //
> >      },
> >      prelude::*,
> > -    sizes::{
> > -        SZ_16M,
> > -        SZ_256M, //
> > -    },
> > +    sizes::SZ_16M,
> >      sync::{
> >          atomic::{
> >              Atomic,
> > @@ -40,7 +37,6 @@ pub(crate) struct NovaCore {
> >  }
> >  
> >  const BAR0_SIZE: usize = SZ_16M;
> > -pub(crate) const BAR1_SIZE: usize = SZ_256M;
> >  
> >  // For now we only support Ampere which can use up to 47-bit DMA addresses.
> >  //
> > @@ -51,7 +47,7 @@ pub(crate) struct NovaCore {
> >  const GPU_DMA_BITS: u32 = 47;
> >  
> >  pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
> > -pub(crate) type Bar1 = pci::Bar<BAR1_SIZE>;
> > +pub(crate) type Bar1 = pci::Bar;
> >  
> >  kernel::pci_device_table!(
> >      PCI_TABLE,
> > diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> > index 8206ec015b26..ba6f1f6f0485 100644
> > --- a/drivers/gpu/nova-core/gpu.rs
> > +++ b/drivers/gpu/nova-core/gpu.rs
> > @@ -353,16 +353,11 @@ pub(crate) fn run_selftests(
> >  
> >      #[cfg(CONFIG_NOVA_MM_SELFTESTS)]
> >      fn run_mm_selftests(self: Pin<&mut Self>, pdev: &pci::Device<device::Bound>) -> Result {
> > -        use crate::driver::BAR1_SIZE;
> > -
> >          // PRAMIN aperture self-tests.
> >          crate::mm::pramin::run_self_test(pdev.as_ref(), self.mm.pramin(), self.spec.chipset)?;
> >  
> >          // BAR1 self-tests.
> > -        let bar1 = Arc::pin_init(
> > -            pdev.iomap_region_sized::<BAR1_SIZE>(1, c"nova-core/bar1"),
> > -            GFP_KERNEL,
> > -        )?;
> > +        let bar1 = Arc::pin_init(pdev.iomap_region(1, c"nova-core/bar1"), GFP_KERNEL)?;
> >          let bar1_access = bar1.access(pdev.as_ref())?;
> >  
> >          crate::mm::bar_user::run_self_test(
> 
> Can we move this directly after patch 17 which adds the fixed bar1? Or
> alternatively fold it in while preserving Zhi's attribution (I am not
> sure what the conventional method for this is).

Generally, squashing and attribution works. I will do that with attribution.
(Zhi did tell me he would be Ok with that as well in this instance).

thanks,

--
Joel Fernandes



^ permalink raw reply

* Re: [PATCH 0/6] hugetlb: normalize exported interfaces to use base-page indices
From: jane.chu @ 2026-04-15 19:40 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: akpm, david, muchun.song, lorenzo.stoakes, Liam.Howlett, vbabka,
	rppt, surenb, mhocko, corbet, skhan, hughd, baolin.wang, peterx,
	linux-mm, linux-doc, linux-kernel
In-Reply-To: <ad9GS_hRv5hbLfl3@localhost.localdomain>



On 4/15/2026 1:03 AM, Oscar Salvador wrote:
> On Thu, Apr 09, 2026 at 05:41:51PM -0600, Jane Chu wrote:
>> This series stems from a discussion with David. [1]
>> The series makes a small cleanup to a few hugetlb interfaces used
>> outside the subsystem by standardizing them on base-page indices.
>> Hopefully this makes the interface semantics a bit more coherent with
>> the rest of mm, while the internal hugetlb code continue to use hugepage
>> indices where that remains the more natural fit.
>>
>> It is based off mm-stable, 3/30/2026, b2c31180b9d6.
>>
>> [1] https://lore.kernel.org/linux-mm/9ec9edd1-0f4c-4da2-ae78-0e7b251a9e25@kernel.org/
> 
> It seems you got some trailing spaces issues:
> 
> Applying: hugetlb: open-code hugetlb folio lookup index conversion
> .git/rebase-apply/patch:64: trailing whitespace.
> 	pgoff_t index = start >> PAGE_SHIFT;
> 
> Applying: hugetlb: make hugetlb_fault_mutex_hash() take PAGE_SIZE index
> .git/rebase-apply/patch:161: trailing whitespace.
> 	key[1] = index >> huge_page_order(hstate_inode(mapping->host));
> 
> Applying: hugetlb: drop vma_hugecache_offset() in favor of linear_page_index()
> .git/rebase-apply/patch:44: trailing whitespace.
> 	start = linear_page_index(vma, vma->vm_start);
> .git/rebase-apply/patch:46: trailing whitespace.
> 	end = linear_page_index(vma, vma->vm_end);
> 
> Applying: hugetlb: pass hugetlb reservation ranges in base-page indices
> .git/rebase-apply/patch:237: trailing whitespace.
> 		next_index = index + pages_per_huge_page(h)
> 
> 

Sorry, will remove them in v2.

thanks!
-jane
> 


^ permalink raw reply

* Re: [PATCH V10 00/10] famfs: port into fuse
From: Gregory Price @ 2026-04-15 19:40 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Matthew Wilcox, Miklos Szeredi, David Hildenbrand (Arm),
	Darrick J. Wong, John Groves, Bernd Schubert, John Groves,
	Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
	Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang, Jan Kara,
	Alexander Viro, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <CAJnrk1Z+uNjn+BcmpciqPZhxYXEJ5Zgh=uNCxt17WTkdOubbog@mail.gmail.com>

On Wed, Apr 15, 2026 at 10:12:54AM -0700, Joanne Koong wrote:
> On Wed, Apr 15, 2026 at 8:32 AM Gregory Price <gourry@gourry.net> wrote:
> >
> > My initial take is that it's a real concern a "bug" in a BPF program
> > could let userland map arbitrary memory into userland page tables, and
> > such an extension would not be a quick fix to the FAMFS problem.
> 
> If you're concerned about arbitrary addresses in the bpf path, you
> should be equally concerned about the FUSE_GET_FMAP path that's in
> this series, because they're functionally identical. The kernel trusts
> userspace-provided addresses in both cases. If that's acceptable for
> this series then it's acceptable for bpf too. You can't reject bpf on
> security grounds without also rejecting the current approach.
> 

To be clear, i'm not rejecting it.  I'm saying (!) that's something that
needs a careful look.

It's a novel interaction and a new ops structure. I don't think it's in
any way unfair to point out there will (and should) be questions outside
the scope of FAMFS.

> Please take a look at the famfs bpf program [1] and compare that to
> the logic in patch 6 in this series [2]. In both cases, iomap->addr
> gets set to the address that was earlier specified by the userspace
> famfs server. In the non-bpf path, the userspace server passes this
> address through a FUSE_GET_FMAP request. In the bpf path, the
> userspace server passes this address by updating the bpf hashmap from
> userspace. There is no functional difference. Also btw, this is one of
> the cases that I was referring to about the bpf path being more
> helpful - in the bpf path, we avoid having to add a FUSE_FMAP opcode
> to fuse (which will be used by no other server) and famfs gets to skip
> 2 extra context-switches that the FUSE_FMAP path otherwise entails.
> 

The question isn't about the functional differences between the FAMFS
static code or a BPF blob doing the same thing - the question is what
the new ops structure introduces for the general case that wasn't
there before.

We have to reason about the BPF extension separately from the context of
FAMFS - as it's a general interface now (forever :P).

~Gregory

^ permalink raw reply

* Re: [PATCH 6/6] hugetlb: pass hugetlb reservation ranges in base-page indices
From: jane.chu @ 2026-04-15 19:39 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: akpm, david, muchun.song, lorenzo.stoakes, Liam.Howlett, vbabka,
	rppt, surenb, mhocko, corbet, skhan, hughd, baolin.wang, peterx,
	linux-mm, linux-doc, linux-kernel
In-Reply-To: <ad9F5duupm8Rn-Yw@localhost.localdomain>



On 4/15/2026 1:01 AM, Oscar Salvador wrote:
> On Thu, Apr 09, 2026 at 05:41:57PM -0600, Jane Chu wrote:
>> hugetlb_reserve_pages() consume indices in hugepage granularity although
>> some callers naturally compute offsets in PAGE_SIZE units.
>>
>> Teach the reservation helpers to accept base-page index ranges and
>> convert to hugepage indices internally before operating on the
>> reservation map. This keeps the internal representation unchanged while
>> making the API contract more uniform for callers.
>>
>> Update hugetlbfs and memfd call sites to pass base-page indices, and
>> adjust the documentation to describe the new calling convention. Add
>> alignment warnings in hugetlb_reserve_pages() to catch invalid ranges
>> early.
>>
>> No functional changes.
>>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> ---
>>   Documentation/mm/hugetlbfs_reserv.rst | 12 +++++------
>>   fs/hugetlbfs/inode.c                  | 29 ++++++++++++---------------
>>   mm/hugetlb.c                          | 26 ++++++++++++++++--------
>>   mm/memfd.c                            |  9 +++++----
>>   4 files changed, 42 insertions(+), 34 deletions(-)
>>
>> diff --git a/Documentation/mm/hugetlbfs_reserv.rst b/Documentation/mm/hugetlbfs_reserv.rst
>> index a49115db18c7..60a52b28f0b4 100644
>> --- a/Documentation/mm/hugetlbfs_reserv.rst
>> +++ b/Documentation/mm/hugetlbfs_reserv.rst
>> @@ -112,8 +112,8 @@ flag was specified in either the shmget() or mmap() call.  If NORESERVE
>>   was specified, then this routine returns immediately as no reservations
>>   are desired.
>>   
>> -The arguments 'from' and 'to' are huge page indices into the mapping or
>> -underlying file.  For shmget(), 'from' is always 0 and 'to' corresponds to
>> +The arguments 'from' and 'to' are base page indices into the mapping or
>> +underlying file. For shmget(), 'from' is always 0 and 'to' corresponds to
>>   the length of the segment/mapping.  For mmap(), the offset argument could
>>   be used to specify the offset into the underlying file.  In such a case,
>>   the 'from' and 'to' arguments have been adjusted by this offset.
>> @@ -136,10 +136,10 @@ to indicate this VMA owns the reservations.
>>   
>>   The reservation map is consulted to determine how many huge page reservations
>>   are needed for the current mapping/segment.  For private mappings, this is
>> -always the value (to - from).  However, for shared mappings it is possible that
>> -some reservations may already exist within the range (to - from).  See the
>> -section :ref:`Reservation Map Modifications <resv_map_modifications>`
>> -for details on how this is accomplished.
>> +always the number of huge pages covered by the range [from, to).  However,
>> +for shared mappings it is possible that some reservations may already exist
>> +within the range [from, to).  See the section :ref:`Reservation Map Modifications
>> +<resv_map_modifications>` for details on how this is accomplished.
>>   
>>   The mapping may be associated with a subpool.  If so, the subpool is consulted
>>   to ensure there is sufficient space for the mapping.  It is possible that the
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index a72d46ff7980..ec05ed30b70f 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -157,10 +157,8 @@ static int hugetlbfs_file_mmap_prepare(struct vm_area_desc *desc)
>>   	if (inode->i_flags & S_PRIVATE)
>>   		vma_flags_set(&vma_flags, VMA_NORESERVE_BIT);
>>   
>> -	if (hugetlb_reserve_pages(inode,
>> -			desc->pgoff >> huge_page_order(h),
>> -			len >> huge_page_shift(h), desc,
>> -			vma_flags) < 0)
>> +	if (hugetlb_reserve_pages(inode, desc->pgoff, len >> PAGE_SHIFT, desc,
>> +				  vma_flags) < 0)
> 
> Ok, this is something that I have been thinking every time  I looked
> into hugetlb reserve code, but I think we should be really starting to
> put some meaningful names for from and to, and pass that to
> hugetlb_reserve_pages.
> Because "desc->pgoff" and "len >> PAGE_SHIFT", meh, and it is not that
> many places we need to touch, but we might want in clarity.
> The same goes for hugetlb_unreserve_pages() of course.

indeed, will try to work on that in v2.
> 
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 47ef41b6fb2e..eb4ab5bd0c9f 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -6532,10 +6532,11 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
>>   }
> [...]
>> @@ -6558,6 +6560,12 @@ long hugetlb_reserve_pages(struct inode *inode,
>>   		return -EINVAL;
>>   	}
>>   
>> +	VM_WARN_ON(!IS_ALIGNED(from, 1UL << huge_page_order(h)));
>> +	VM_WARN_ON(!IS_ALIGNED(to,   1UL << huge_page_order(h)));
> 
> If we want to scream if someone passes us unaligned indices, we might
> want to do the same in hugetlb_unreserve_pages() ?

Sure.
> 
>> diff --git a/mm/memfd.c b/mm/memfd.c
>> index 56c8833c4195..59c174c7533c 100644
>> --- a/mm/memfd.c
>> +++ b/mm/memfd.c
>> @@ -80,14 +80,15 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t index)
>>   		struct inode *inode = file_inode(memfd);
>>   		struct hstate *h = hstate_file(memfd);
>>   		long nr_resv;
>> -		pgoff_t idx;
>> +		pgoff_t next_index;
>>   		int err = -ENOMEM;
>>   
>>   		gfp_mask = htlb_alloc_mask(h);
>>   		gfp_mask &= ~(__GFP_HIGHMEM | __GFP_MOVABLE);
>> -		idx = index >> huge_page_order(h);
>> +		next_index = index + pages_per_huge_page(h);
> 
> Trailing white space.

My bad, should have checked.

Thanks!
-jane

> 
> 


^ permalink raw reply

* Re: [PATCH net-next v4 04/13] devlink: allow to use devlink index as a command handle
From: Geert Uytterhoeven @ 2026-04-15 19:04 UTC (permalink / raw)
  To: jiri
  Cc: andrew+netdev, chuck.lever, cjubran, corbet, daniel.zahka, davem,
	donald.hunter, edumazet, horms, kuba, leon, linux-doc, linux-rdma,
	linux-trace-kernel, mathieu.desnoyers, matttbe, mbloch, mhiramat,
	mschmidt, netdev, pabeni, przemyslaw.kitszel, rostedt, saeedm,
	skhan, tariqt, linux-kernel
In-Reply-To: <20260312100407.551173-5-jiri@resnulli.us>

On Thu, 12 Mar 2026, Jiri Pirko wrote:
> Currently devlink instances are addressed bus_name/dev_name tuple.
> Allow the newly introduced DEVLINK_ATTR_INDEX to be used as
> an alternative handle for all devlink commands.
> 
> When DEVLINK_ATTR_INDEX is present in the request, use it for a direct
> xarray lookup instead of iterating over all instances comparing
> bus_name/dev_name strings.
> 
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>

Thanks for your patch, which is now commit d85a8af57da87196 ("devlink:
allow to use devlink index as a command handle").

This has a rather large impact on kernel size.
For e.g. m68k/atari_defconfig, bloat-o-meter reports:

    add/remove: 4/1 grow/shrink: 72/1 up/down: 65804/-76 (65728)
    Function                                     old     new   delta
    devlink_trap_policer_get_dump_nl_policy       24    1480   +1456
    devlink_trap_group_get_dump_nl_policy         24    1480   +1456
    devlink_trap_get_dump_nl_policy               24    1480   +1456
    devlink_selftests_get_nl_policy               24    1480   +1456
    devlink_sb_tc_pool_bind_get_dump_nl_policy      24    1480   +1456
    devlink_sb_port_pool_get_dump_nl_policy       24    1480   +1456
    devlink_sb_pool_get_dump_nl_policy            24    1480   +1456
    devlink_sb_get_dump_nl_policy                 24    1480   +1456
    devlink_resource_dump_nl_policy               24    1480   +1456
    devlink_region_get_dump_nl_policy             24    1480   +1456
    devlink_rate_get_dump_nl_policy               24    1480   +1456
    devlink_port_get_dump_nl_policy               24    1480   +1456
    devlink_param_get_dump_nl_policy              24    1480   +1456
    devlink_linecard_get_dump_nl_policy           24    1480   +1456
    devlink_info_get_nl_policy                    24    1480   +1456
    devlink_get_nl_policy                         24    1480   +1456
    devlink_eswitch_get_nl_policy                 24    1480   +1456
    devlink_dpipe_headers_get_nl_policy           24    1480   +1456
    devlink_port_unsplit_nl_policy                32    1480   +1448
    devlink_port_param_set_nl_policy              32    1480   +1448
    devlink_port_param_get_nl_policy              32    1480   +1448
    devlink_port_get_do_nl_policy                 32    1480   +1448
    devlink_port_del_nl_policy                    32    1480   +1448
    devlink_notify_filter_set_nl_policy           32    1480   +1448
    devlink_health_reporter_get_dump_nl_policy      32    1480   +1448
    devlink_port_split_nl_policy                  80    1480   +1400
    devlink_sb_occ_snapshot_nl_policy             96    1480   +1384
    devlink_sb_occ_max_clear_nl_policy            96    1480   +1384
    devlink_sb_get_do_nl_policy                   96    1480   +1384
    devlink_sb_port_pool_get_do_nl_policy        144    1480   +1336
    devlink_sb_pool_get_do_nl_policy             144    1480   +1336
    devlink_sb_pool_set_nl_policy                168    1480   +1312
    devlink_sb_port_pool_set_nl_policy           176    1480   +1304
    devlink_sb_tc_pool_bind_set_nl_policy        184    1480   +1296
    devlink_sb_tc_pool_bind_get_do_nl_policy     184    1480   +1296
    devlink_dpipe_table_get_nl_policy            240    1480   +1240
    devlink_dpipe_entries_get_nl_policy          240    1480   +1240
    devlink_dpipe_table_counters_set_nl_policy     272    1480   +1208
    devlink_eswitch_set_nl_policy                504    1480    +976
    devlink_resource_set_nl_policy               544    1480    +936
    devlink_param_get_do_nl_policy               656    1480    +824
    devlink_region_get_do_nl_policy              712    1480    +768
    devlink_region_new_nl_policy                 744    1480    +736
    devlink_region_del_nl_policy                 744    1480    +736
    devlink_health_reporter_test_nl_policy       928    1480    +552
    devlink_health_reporter_recover_nl_policy     928    1480    +552
    devlink_health_reporter_get_do_nl_policy     928    1480    +552
    devlink_health_reporter_dump_get_nl_policy     928    1480    +552
    devlink_health_reporter_dump_clear_nl_policy     928    1480    +552
    devlink_health_reporter_diagnose_nl_policy     928    1480    +552
    devlink_trap_get_do_nl_policy               1048    1480    +432
    devlink_trap_set_nl_policy                  1056    1480    +424
    devlink_trap_group_get_do_nl_policy         1088    1480    +392
    devlink_trap_policer_get_do_nl_policy       1144    1480    +336
    devlink_trap_group_set_nl_policy            1144    1480    +336
    devlink_trap_policer_set_nl_policy          1160    1480    +320
    devlink_port_set_nl_policy                  1168    1480    +312
    devlink_flash_update_nl_policy              1224    1480    +256
    devlink_reload_nl_policy                    1248    1480    +232
    devlink_port_new_nl_policy                  1320    1480    +160
    devlink_rate_get_do_nl_policy               1352    1480    +128
    devlink_rate_del_nl_policy                  1352    1480    +128
    devlink_linecard_get_do_nl_policy           1376    1480    +104
    __devlinks_xa_find_get                         -      96     +96
    devlink_linecard_set_nl_policy              1392    1480     +88
    devlink_selftests_run_nl_policy             1416    1480     +64
    devlink_get_from_attrs_lock                  262     314     +52
    devlink_region_read_nl_policy               1440    1480     +40
    devlink_rate_set_nl_policy                  1448    1480     +32
    devlink_rate_new_nl_policy                  1448    1480     +32
    devlinks_xa_lookup_get                         -      30     +30
    devlink_health_reporter_set_nl_policy       1456    1480     +24
    devlink_attr_index_range                       -      16     +16
    devlink_param_set_nl_policy                 1472    1480      +8
    devlink_nl_dumpit                            276     282      +6
    __initcall__kmod_core__670_573_devlink_init4       -       4      +4
    __initcall__kmod_core__670_561_devlink_init4       4       -      -4
    devlinks_xa_find_get                          96      24     -72
    Total: Before=5203976, After=5269704, chg +1.26%

> --- a/net/devlink/netlink_gen.c
> +++ b/net/devlink/netlink_gen.c
> @@ -11,6 +11,11 @@
>  
>  #include <uapi/linux/devlink.h>
>  
> +/* Integer value ranges */
> +static const struct netlink_range_validation devlink_attr_index_range = {
> +	.max	= U32_MAX,
> +};
> +
>  /* Sparse enums validation callbacks */
>  static int
>  devlink_attr_param_type_validate(const struct nlattr *attr,
> @@ -56,37 +61,42 @@ const struct nla_policy devlink_dl_selftest_id_nl_policy[DEVLINK_ATTR_SELFTEST_I
>  };
>  
>  /* DEVLINK_CMD_GET - do */
> -static const struct nla_policy devlink_get_nl_policy[DEVLINK_ATTR_DEV_NAME + 1] = {
> +static const struct nla_policy devlink_get_nl_policy[DEVLINK_ATTR_INDEX + 1] = {

Unrelated to this change, but the explicit sizing of these arrays is not
needed, as the compiler will take care of that.

>  	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
>  	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
> +	[DEVLINK_ATTR_INDEX] = NLA_POLICY_FULL_RANGE(NLA_UINT, &devlink_attr_index_range),

This array, and many others below, are sparse, with large gaps (up to
1456 or 2912 bytes on 32-bit resp. 64-bit systems) before the last
entries.

>  };
>  
>  /* DEVLINK_CMD_PORT_GET - do */
> -static const struct nla_policy devlink_port_get_do_nl_policy[DEVLINK_ATTR_PORT_INDEX + 1] = {
> +static const struct nla_policy devlink_port_get_do_nl_policy[DEVLINK_ATTR_INDEX + 1] = {
>  	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
>  	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
> +	[DEVLINK_ATTR_INDEX] = NLA_POLICY_FULL_RANGE(NLA_UINT, &devlink_attr_index_range),

Shouldn't this be inserted at the end, as DEVLINK_ATTR_INDEX >
DEVLINK_ATTR_PORT_INDEX, for readability?

>  	[DEVLINK_ATTR_PORT_INDEX] = { .type = NLA_U32, },
>  };

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH bpf] bpf,tcp: avoid infinite recursion in BPF_SOCK_OPS_HDR_OPT_LEN_CB
From: Martin KaFai Lau @ 2026-04-15 18:55 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: bpf, Quan Sun, Yinhao Hu, Kaiyan Mei, Dongliang Mu, Eric Dumazet,
	Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern,
	netdev, linux-doc, linux-kernel
In-Reply-To: <20260414105702.248310-1-jiayuan.chen@linux.dev>

On Tue, Apr 14, 2026 at 06:57:00PM +0800, Jiayuan Chen wrote:
> A BPF_PROG_TYPE_SOCK_OPS program can set BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG
> to inject custom TCP header options. When the kernel builds a TCP packet,
> it calls tcp_established_options() to calculate the header size, which
> invokes bpf_skops_hdr_opt_len() to trigger the BPF_SOCK_OPS_HDR_OPT_LEN_CB
> callback.
> 
> If the BPF program calls bpf_setsockopt(TCP_NODELAY) inside this callback,
> __tcp_sock_set_nodelay() will call tcp_push_pending_frames(), which calls
> tcp_current_mss(), which calls tcp_established_options() again,
> re-triggering the same BPF callback. This creates an infinite recursion
> that exhausts the kernel stack and causes a panic.
> 
> BPF_SOCK_OPS_HDR_OPT_LEN_CB
>   -> bpf_setsockopt(TCP_NODELAY)
> 	-> tcp_push_pending_frames()
> 	  -> tcp_current_mss()
> 		-> tcp_established_options()
> 		  -> bpf_skops_hdr_opt_len()
>                            /* infinite recursion */
> 			-> BPF_SOCK_OPS_HDR_OPT_LEN_CB
> 
> A similar reentrancy issue exists for TCP congestion control, which is
> guarded by tp->bpf_chg_cc_inprogress. Adopt the same approach: introduce
> tp->bpf_hdr_opt_len_cb_inprogress, set it before invoking the callback in
> bpf_skops_hdr_opt_len(), and check it in sol_tcp_sockopt() to reject
> bpf_setsockopt(TCP_NODELAY) calls that would trigger
> tcp_push_pending_frames() and cause the recursion.
> 
> Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> Reported-by: Dongliang Mu <dzm91@hust.edu.cn>
> Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/

Thanks for the report and fixes suggested across different threads.

Using has_current_bpf_ctx() to avoid tcp_push_pending_frames() should
work but it may change the expectation for bpf_setsockopt(TCP_NODELAY).
e.g. A bpf_tcp_iter does bpf_setsockopt(TCP_NODELAY).

Adding another bit in the tcp_sock is not ideal either. I agree with
Alexei that it is better to reuse the existing bit if we go down this path.
We also need to audit more closely if there are cases that two different
type of bpf progs may call bpf_setsockopt(). e.g.
bpf_tcp_iter does bpf_setsockopt(TCP_CONGESTION) to switch
to a bpf_tcp_cc and the new bpf_tcp_cc->init() will also do
bpf_setsockopt(xxx) which then will be rejected.

Another fix could be, the bpf_setsockopt(TCP_NODELAY) is always broken
for BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB unless
the bpf prog is doing some maneuver to avoid the recursion. Thus,
this use case is basically broken as is and I don't see a use case
for bpf_setsockopt(TCP_NODELAY) when writing header also.
How about checking the bpf_sock->op, level, and optname in
bpf_sock_ops_setsockopt() and return -EOPNOTSUPP?

^ permalink raw reply

* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Michael Roth @ 2026-04-15 18:20 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jroedel, jthoughton, oupton, pankaj.gupta,
	qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
	tabba, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, Paolo Bonzini, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm
In-Reply-To: <eiiecl7jvywvqb4drq7cchmcabcrdka25wxr77uavxqineeedm@rfcnhdz6xoxf>

On Tue, Apr 14, 2026 at 06:37:00PM -0500, Michael Roth wrote:
> On Wed, Apr 01, 2026 at 03:38:12PM -0700, Ackerley Tng wrote:
> > Michael Roth <michael.roth@amd.com> writes:
> > 
> > >
> > > [...snip...]
> > >
> > >>  static unsigned long kvm_get_vm_memory_attributes(struct kvm *kvm, gfn_t gfn)
> > >>  {
> > >> @@ -2635,6 +2625,8 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> > >>  		return -EINVAL;
> > >>  	if (!PAGE_ALIGNED(attrs->address) || !PAGE_ALIGNED(attrs->size))
> > >>  		return -EINVAL;
> > >> +	if (attrs->error_offset)
> > >> +		return -EINVAL;
> > >>  	for (i = 0; i < ARRAY_SIZE(attrs->reserved); i++) {
> > >>  		if (attrs->reserved[i])
> > >>  			return -EINVAL;
> > >> @@ -4983,6 +4975,11 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> > >>  		return 1;
> > >>  	case KVM_CAP_GUEST_MEMFD_FLAGS:
> > >>  		return kvm_gmem_get_supported_flags(kvm);
> > >> +	case KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES:
> > >> +		if (vm_memory_attributes)
> > >> +			return 0;
> > >> +
> > >> +		return kvm_supported_mem_attributes(kvm);
> > >
> > > Based on the discussion from the PUCK call this morning,
> > 
> > Thanks for copying the discussion here, I'll start attending PUCK to
> > catch those discussions too :)
> > 
> > > it sounds like it
> > > would be a good idea to limit kvm_supported_mem_attributes() to only
> > > reporting KVM_MEMORY_ATTRIBUTE_PRIVATE if the underlying CoCo
> > > implementation has all the necessary enablement to support in-place
> > > conversion via guest_memfd. In the case of SNP, there is a
> > > documentation/parameter check in snp_launch_update() that needs to be
> > > relaxed in order for userspace to be able to pass in a NULL 'src'
> > > parameter (since, for in-place conversion, it would be initialized in place
> > > as shared memory prior to the call, since by the time kvm_gmem_poulate()
> > > it will have been set to private and therefore cannot be faulted in via
> > > GUP (and if it could, we'd be unecessarily copying the src back on top
> > > of itself since src/dst are the same).
> > 
> > Could this be a separate thing? If I'm understanding you correctly, it's
> > not strictly a requirement for snp_launch_update() to first support a
> > NULL 'src' parameter before this series lands.
> 
> I think we are already sync'd up on this during PUCK, but for the benefit
> of others: Sean pointed out that if we don't then we'll need to add yet
> another capability so userspace can determine when it can actually do
> in-place conversion for SNP.

(in-place conversion for SNP during pre-launch/populate phase, I meant)

> 
> Right now, this series effectively advertises in place conversion at the
> point where KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES reports
> 'KVM_MEMORY_ATTRIBUTE_PRIVATE', so I slightly reworked the series to
> include the snp_launch_update() change prior to that point in time in
> the series. Thanks to prereqs and changes/requirements you've already
> pulled in, it's just one additional patch now:
> 
>  KVM: SEV: Make 'uaddr' parameter optional for KVM_SEV_SNP_LAUNCH_UPDATE 
> 
> I also did some minor updates (prefixed with a "[squash]" tag) to advertise
> the KVM_SET_MEMORY_ATTRIBUTES2_PRESERVED flag so it can be used by

Though I'm not sure how we deal with it if SNP/TDX at some point become
capable of using the PRESERVED flag *after* populate... but maybe that's
too unlikely to worry about? If we wanted to address it though, we could
have both PRESERVED and PRESERVED_BEFORE_LAUNCH so they can be
enumerated separately from the start.

> userspace for SNP/TDX in the kvm_gmem_populate() path as agreed upon
> during PUCK.
> 
> The branch is here, with the patches moved to where I think they
> should remain (or be squashed in for the [squash] ones):
> 
>   https://github.com/AMDESE/linux/commits/guest_memfd-inplace-conversion-v4-snp2/
> 
> I've also updated the QEMU patches to use the agreed-upon API flow and
> pushed them here:
> 
>   https://github.com/AMDESE/qemu/commits/snp-inplace-for-v4-wip2/
> 
> To start an SNP guest with in-place conversion:
> 
>   qemu-system-x86 \
>   -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
>   -object sev-snp-guest,id=sev0,...,convert-in-place=true \
>   -object memory-backend-memfd,id=ram1,size=16G,share=true,reserve=false

Sorry, that should've been:

  -object memory-backend-guest-memfd,id=ram1,size=16G,share=true,reserve=false

> 
> To start an normal non-CoCo guest backed by guest_memfd with shared memory:
> 
>   qemu-system-x86 \
>   -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
>   -object memory-backend-memfd,id=ram1,size=16G,share=true,reserve=false

and:

  -object memory-backend-guest-memfd,id=ram1,size=16G,share=true,reserve=false

(and both require kvm.vm_memory_attributes=0)

-Mike

> 
> Thanks,
> 
> Mike

^ permalink raw reply

* [PATCH] docs: proc: fix minor grammar and formatting issues
From: Myro @ 2026-04-15 17:52 UTC (permalink / raw)
  To: corbet; +Cc: linux-doc, linux-kernel, linux-fsdevel, Myro

Fix missing "from" in "prevent <pid> --from-- being reused" and
add spacing in vm_area_struct range notation for readability.

No functional changes. :)
---
 Documentation/filesystems/proc.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 873761087f8d..d828006bd91c 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -118,7 +118,7 @@ PTRACE_MODE_ATTACH permissions; CAP_PERFMON capability does not grant access
 to /proc/PID/mem for other processes.
 
 Note that an open file descriptor to /proc/<pid> or to any of its
-contained files or subdirectories does not prevent <pid> being reused
+contained files or subdirectories does not prevent <pid> from being reused
 for some other process in the event that <pid> exits. Operations on
 open /proc/<pid> file descriptors corresponding to dead processes
 never act on any new process that the kernel may, through chance, have
@@ -2199,7 +2199,7 @@ the process is maintaining.  Example output::
      | lr-------- 1 root root 64 Jan 27 11:24 400000-41a000 -> /usr/bin/ls
 
 The name of a link represents the virtual memory bounds of a mapping, i.e.
-vm_area_struct::vm_start-vm_area_struct::vm_end.
+vm_area_struct::vm_start - vm_area_struct::vm_end.
 
 The main purpose of the map_files is to retrieve a set of memory mapped
 files in a fast way instead of parsing /proc/<pid>/maps or
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH v4 02/13] dt-bindings: leds: document Samsung S2M series PMIC RGB LED device
From: Kaustabh Chakraborty @ 2026-04-15 17:30 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Kaustabh Chakraborty
  Cc: Lee Jones, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, MyungJoo Ham, Chanwoo Choi, Sebastian Reichel,
	André Draszik, Alexandre Belloni, Jonathan Corbet,
	Shuah Khan, Nam Tran, Łukasz Lebiedziński, linux-leds,
	devicetree, linux-kernel, linux-pm, linux-samsung-soc, linux-rtc,
	linux-doc
In-Reply-To: <20260415-sensible-kiwi-of-argument-44d6ed@quoll>

On 2026-04-15 09:03 +02:00, Krzysztof Kozlowski wrote:
> On Tue, Apr 14, 2026 at 12:02:54PM +0530, Kaustabh Chakraborty wrote:
>> +description: |
>> +  The Samsung S2M series PMIC RGB LED is a three-channel LED device with
>> +  8-bit brightness control for each channel, typically used as status
>> +  indicators in mobile phones.
>> +
>> +  This is a part of device tree bindings for S2M and S5M family of Power
>> +  Management IC (PMIC).
>> +
>> +  See also Documentation/devicetree/bindings/mfd/samsung,s2mps11.yaml for
>> +  additional information and example.
>> +
>> +allOf:
>> +  - $ref: common.yaml#
>
> Rob's comment is still valid:
> 1. How do you address one of three LEDs in non-RGB case?
> 2. Where is multi-color?

Yes, multi-color should have been added here.

>
> And based on this alone without other properties, I say this should be
> part of top-level schema.  Separate node is fine, but no need for
> separate binding.

BTW, for loading the sub-device driver via platform (as it won't be a
separate binding) the driver *must* be built-in. Although not related to
bindings, this seems counter-intuitive. I see the same problem with the
PMIC charger.

>
> Best regards,
> Krzysztof


^ permalink raw reply

* Re: [PATCH V10 00/10] famfs: port into fuse
From: Joanne Koong @ 2026-04-15 17:12 UTC (permalink / raw)
  To: Gregory Price
  Cc: Matthew Wilcox, Miklos Szeredi, David Hildenbrand (Arm),
	Darrick J. Wong, John Groves, Bernd Schubert, John Groves,
	Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
	Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang, Jan Kara,
	Alexander Viro, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <ad-vnqRrUGs9n0N8@gourry-fedora-PF4VCD3F>

On Wed, Apr 15, 2026 at 8:32 AM Gregory Price <gourry@gourry.net> wrote:
>
> On Wed, Apr 15, 2026 at 04:10:00PM +0100, Matthew Wilcox wrote:
> > On Wed, Apr 15, 2026 at 04:04:50PM +0200, Miklos Szeredi wrote:
> > > On Wed, 15 Apr 2026 at 15:35, Gregory Price <gourry@gourry.net> wrote:
> > >
> > > > This was my first reaction when I realized the BPF program would be
> > > > controlling iomap return value in the fault path.  Big ol' (!)  popped
> > > > up over my head.
> > >
> > > I'm wondering which part of this triggers the big (!).
> > >
> > > BPF program being run in the fault path?
> > >
> > > Or that the return value from the BPF function is used as iomap?
> > >
> > > Or something else?
> >
> > If a BPF program controls what memory address a fault now allows access
> > to, who validates that this is a memory address within the purview of
> > the BPF program, and not, say, the address of the kernel page tables?
> >
> > (I have done no looking to determine if this is already considered)
>
> From an initial look at the existing bpf ops structures, I do not see
> any other struct with a similar (obvious) pattern - so it's not clear to
> me such a concern has been exposed elsewhere or directly addressed.
>
> There is a verifier step for the BPF program that in theory would
> validate the range matches the DAX ranges, but i think that only
> validates the types are right and only on load - I think the BPF
> program itself would be the address validater, which is a strong no.
>
> BPF folks please correct me if i'm off base here.
>
> My initial take is that it's a real concern a "bug" in a BPF program
> could let userland map arbitrary memory into userland page tables, and
> such an extension would not be a quick fix to the FAMFS problem.

If you're concerned about arbitrary addresses in the bpf path, you
should be equally concerned about the FUSE_GET_FMAP path that's in
this series, because they're functionally identical. The kernel trusts
userspace-provided addresses in both cases. If that's acceptable for
this series then it's acceptable for bpf too. You can't reject bpf on
security grounds without also rejecting the current approach.

Please take a look at the famfs bpf program [1] and compare that to
the logic in patch 6 in this series [2]. In both cases, iomap->addr
gets set to the address that was earlier specified by the userspace
famfs server. In the non-bpf path, the userspace server passes this
address through a FUSE_GET_FMAP request. In the bpf path, the
userspace server passes this address by updating the bpf hashmap from
userspace. There is no functional difference. Also btw, this is one of
the cases that I was referring to about the bpf path being more
helpful - in the bpf path, we avoid having to add a FUSE_FMAP opcode
to fuse (which will be used by no other server) and famfs gets to skip
2 extra context-switches that the FUSE_FMAP path otherwise entails.

As I understand it, famfs is gated behind CAP_SYS_RAWIO, which is a
highly privileged capability. To use iomap bpf, this would also
require similar high privileges.

Thanks,
Joanne

[1] https://github.com/joannekoong/libfuse/blob/444fa27fa9fd2118a0dc332933197faf9bbf25aa/example/famfs.bpf.c
[2] https://lore.kernel.org/linux-fsdevel/0100019d43e79794-0eadcf5e-b659-43f7-8fdc-dec9f4ccce14-000000@email.amazonses.com/

>
> ~Gregory

^ permalink raw reply

* Re: [PATCH V10 00/10] famfs: port into fuse
From: Gregory Price @ 2026-04-15 15:32 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Miklos Szeredi, David Hildenbrand (Arm), Darrick J. Wong,
	John Groves, Joanne Koong, Bernd Schubert, John Groves,
	Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
	Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang, Jan Kara,
	Alexander Viro, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <ad-qSB4oL5D3S-ht@casper.infradead.org>

On Wed, Apr 15, 2026 at 04:10:00PM +0100, Matthew Wilcox wrote:
> On Wed, Apr 15, 2026 at 04:04:50PM +0200, Miklos Szeredi wrote:
> > On Wed, 15 Apr 2026 at 15:35, Gregory Price <gourry@gourry.net> wrote:
> > 
> > > This was my first reaction when I realized the BPF program would be
> > > controlling iomap return value in the fault path.  Big ol' (!)  popped
> > > up over my head.
> > 
> > I'm wondering which part of this triggers the big (!).
> > 
> > BPF program being run in the fault path?
> > 
> > Or that the return value from the BPF function is used as iomap?
> > 
> > Or something else?
> 
> If a BPF program controls what memory address a fault now allows access
> to, who validates that this is a memory address within the purview of
> the BPF program, and not, say, the address of the kernel page tables?
> 
> (I have done no looking to determine if this is already considered)

From an initial look at the existing bpf ops structures, I do not see
any other struct with a similar (obvious) pattern - so it's not clear to
me such a concern has been exposed elsewhere or directly addressed.

There is a verifier step for the BPF program that in theory would
validate the range matches the DAX ranges, but i think that only
validates the types are right and only on load - I think the BPF
program itself would be the address validater, which is a strong no.

BPF folks please correct me if i'm off base here.

My initial take is that it's a real concern a "bug" in a BPF program
could let userland map arbitrary memory into userland page tables, and
such an extension would not be a quick fix to the FAMFS problem.

~Gregory

^ permalink raw reply

* Re: [PATCH V10 00/10] famfs: port into fuse
From: Darrick J. Wong @ 2026-04-15 15:28 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Miklos Szeredi, Gregory Price, David Hildenbrand (Arm),
	John Groves, Joanne Koong, Bernd Schubert, John Groves,
	Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
	Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang, Jan Kara,
	Alexander Viro, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <ad-qSB4oL5D3S-ht@casper.infradead.org>

On Wed, Apr 15, 2026 at 04:10:00PM +0100, Matthew Wilcox wrote:
> On Wed, Apr 15, 2026 at 04:04:50PM +0200, Miklos Szeredi wrote:
> > On Wed, 15 Apr 2026 at 15:35, Gregory Price <gourry@gourry.net> wrote:
> > 
> > > This was my first reaction when I realized the BPF program would be
> > > controlling iomap return value in the fault path.  Big ol' (!)  popped
> > > up over my head.
> > 
> > I'm wondering which part of this triggers the big (!).
> > 
> > BPF program being run in the fault path?
> > 
> > Or that the return value from the BPF function is used as iomap?
> > 
> > Or something else?
> 
> If a BPF program controls what memory address a fault now allows access
> to, who validates that this is a memory address within the purview of
> the BPF program, and not, say, the address of the kernel page tables?
> 
> (I have done no looking to determine if this is already considered)

We're not using bpf to implement ->filemap_fault directly.  fuse would
implement that as a call to dax_iomap_fault, iomap would then call
fuse's ->iomap_begin, and that's where the bpf program would take over.
The mapping provided would return a (struct dax_device, u64 addr, u64
length), and (presumably) the dax device access function would
rangecheck that.

Unless someone foolishly puts kernel page tables on the dax device...

--D

^ permalink raw reply

* Re: [PATCH v4] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us
From: Aaron Tomlin @ 2026-04-15 15:23 UTC (permalink / raw)
  To: rafael, dakr, pavel, lenb
  Cc: zhongqiu.han, akpm, bp, pmladek, rdunlap, feng.tang,
	pawan.kumar.gupta, kees, elver, arnd, fvdl, lirongqing, bhelgaas,
	neelx, sean, mproche, chjohnst, nick.lange, linux-kernel,
	linux-pm, linux-doc
In-Reply-To: <20260308190421.46657-1-atomlin@atomlin.com>

[-- Attachment #1: Type: text/plain, Size: 968 bytes --]

On Sun, Mar 08, 2026 at 03:04:21PM -0400, Aaron Tomlin wrote:
> Users currently lack a mechanism to define granular, per-CPU PM QoS
> resume latency constraints during the early boot phase.
> 
> While the idle=poll boot parameter exists, it enforces a global
> override, forcing all CPUs in the system to "poll". This global approach
> is not suitable for asymmetric workloads where strict latency guarantees
> are required only on specific critical CPUs, while housekeeping or
> non-critical CPUs should be allowed to enter deeper idle states to save
> energy.
> 

Hi Rafael, Danilo, Pavel, Len, Zhongqiu,

A gentle ping on this v4 series. I was hoping to see if there is any
further feedback on this approach to introducing the
"pm_qos_resume_latency_us=" boot parameter.

Please let me know if any further adjustments are required, or if you need
me to rebase this against the latest power management tree.


Kind regards,
-- 
Aaron Tomlin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH V10 00/10] famfs: port into fuse
From: Matthew Wilcox @ 2026-04-15 15:10 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Gregory Price, David Hildenbrand (Arm), Darrick J. Wong,
	John Groves, Joanne Koong, Bernd Schubert, John Groves,
	Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
	Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang, Jan Kara,
	Alexander Viro, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <CAJfpegsUVv0ziMSQiq9pKeXf6G-+LROPTW077hHMSmAirVCLQw@mail.gmail.com>

On Wed, Apr 15, 2026 at 04:04:50PM +0200, Miklos Szeredi wrote:
> On Wed, 15 Apr 2026 at 15:35, Gregory Price <gourry@gourry.net> wrote:
> 
> > This was my first reaction when I realized the BPF program would be
> > controlling iomap return value in the fault path.  Big ol' (!)  popped
> > up over my head.
> 
> I'm wondering which part of this triggers the big (!).
> 
> BPF program being run in the fault path?
> 
> Or that the return value from the BPF function is used as iomap?
> 
> Or something else?

If a BPF program controls what memory address a fault now allows access
to, who validates that this is a memory address within the purview of
the BPF program, and not, say, the address of the kernel page tables?

(I have done no looking to determine if this is already considered)

^ permalink raw reply

* Re: [RFC, PATCH 10/12] userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle
From: Usama Arif @ 2026-04-15 15:08 UTC (permalink / raw)
  To: Kiryl Shutsemau (Meta)
  Cc: Usama Arif, Andrew Morton, Peter Xu, David Hildenbrand,
	Lorenzo Stoakes, Mike Rapoport, Suren Baghdasaryan,
	Vlastimil Babka, Liam R . Howlett, Zi Yan, Jonathan Corbet,
	Shuah Khan, Sean Christopherson, Paolo Bonzini, linux-mm,
	linux-kernel, linux-doc, linux-kselftest, kvm
In-Reply-To: <20260414142354.1465950-11-kas@kernel.org>

On Tue, 14 Apr 2026 15:23:44 +0100 "Kiryl Shutsemau (Meta)" <kas@kernel.org> wrote:

> Add UFFDIO_SET_MODE ioctl to toggle UFFD_FEATURE_MINOR_ASYNC at
> runtime. Takes mmap_write_lock for serialization against all in-flight
> faults. On sync-to-async transition, wake threads blocked in
> handle_userfault() so they retry and auto-resolve.
> 
> Since ctx->features can now be modified concurrently, add
> userfaultfd_features() helper that wraps READ_ONCE() and convert
> all ctx->features reads to use it.
> 
> Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> Assisted-by: Claude:claude-opus-4-6
> ---
>  fs/userfaultfd.c                 | 95 ++++++++++++++++++++++++++++----
>  include/uapi/linux/userfaultfd.h | 13 +++++
>  2 files changed, 96 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 43064238fd8d..0edb33599491 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -79,24 +79,33 @@ struct userfaultfd_wake_range {
>  /* internal indication that UFFD_API ioctl was successfully executed */
>  #define UFFD_FEATURE_INITIALIZED		(1u << 31)
>  
> +/*
> + * Read ctx->features with READ_ONCE() since UFFDIO_SET_MODE can
> + * modify it concurrently.
> + */
> +static unsigned int userfaultfd_features(struct userfaultfd_ctx *ctx)
> +{
> +	return READ_ONCE(ctx->features);
> +}
> +
>  static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx)
>  {
> -	return ctx->features & UFFD_FEATURE_INITIALIZED;
> +	return userfaultfd_features(ctx) & UFFD_FEATURE_INITIALIZED;
>  }
>  
>  static bool userfaultfd_wp_async_ctx(struct userfaultfd_ctx *ctx)
>  {
> -	return ctx && (ctx->features & UFFD_FEATURE_WP_ASYNC);
> +	return ctx && (userfaultfd_features(ctx) & UFFD_FEATURE_WP_ASYNC);
>  }
>  
>  static bool userfaultfd_minor_anon_ctx(struct userfaultfd_ctx *ctx)
>  {
> -	return ctx && (ctx->features & UFFD_FEATURE_MINOR_ANON);
> +	return ctx && (userfaultfd_features(ctx) & UFFD_FEATURE_MINOR_ANON);
>  }
>  
>  static bool userfaultfd_minor_async_ctx(struct userfaultfd_ctx *ctx)
>  {
> -	return ctx && (ctx->features & UFFD_FEATURE_MINOR_ASYNC);
> +	return ctx && (userfaultfd_features(ctx) & UFFD_FEATURE_MINOR_ASYNC);
>  }
>  
>  static unsigned int userfaultfd_ctx_flags(struct userfaultfd_ctx *ctx)
> @@ -122,7 +131,7 @@ bool userfaultfd_wp_unpopulated(struct vm_area_struct *vma)
>  	if (!ctx)
>  		return false;
>  
> -	return ctx->features & UFFD_FEATURE_WP_UNPOPULATED;
> +	return userfaultfd_features(ctx) & UFFD_FEATURE_WP_UNPOPULATED;
>  }
>  
>  static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
> @@ -435,7 +444,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>  	/* 0 or > 1 flags set is a bug; we expect exactly 1. */
>  	VM_WARN_ON_ONCE(!reason || (reason & (reason - 1)));
>  
> -	if (ctx->features & UFFD_FEATURE_SIGBUS)
> +	if (userfaultfd_features(ctx) & UFFD_FEATURE_SIGBUS)
>  		goto out;
>  	if (!(vmf->flags & FAULT_FLAG_USER) && (ctx->flags & UFFD_USER_MODE_ONLY))
>  		goto out;
> @@ -506,7 +515,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>  	init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
>  	uwq.wq.private = current;
>  	uwq.msg = userfault_msg(vmf->address, vmf->real_address, vmf->flags,
> -				reason, ctx->features);
> +				reason, userfaultfd_features(ctx));
>  	uwq.ctx = ctx;
>  	uwq.waken = false;
>  
> @@ -668,7 +677,7 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs)
>  	if (!octx)
>  		return 0;
>  
> -	if (!(octx->features & UFFD_FEATURE_EVENT_FORK)) {
> +	if (!(userfaultfd_features(octx) & UFFD_FEATURE_EVENT_FORK)) {
>  		userfaultfd_reset_ctx(vma);
>  		return 0;
>  	}
> @@ -774,7 +783,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma,
>  	if (!ctx)
>  		return;
>  
> -	if (ctx->features & UFFD_FEATURE_EVENT_REMAP) {
> +	if (userfaultfd_features(ctx) & UFFD_FEATURE_EVENT_REMAP) {
>  		vm_ctx->ctx = ctx;
>  		userfaultfd_ctx_get(ctx);
>  		down_write(&ctx->map_changing_lock);
> @@ -824,7 +833,7 @@ bool userfaultfd_remove(struct vm_area_struct *vma,
>  	struct userfaultfd_wait_queue ewq;
>  
>  	ctx = vma->vm_userfaultfd_ctx.ctx;
> -	if (!ctx || !(ctx->features & UFFD_FEATURE_EVENT_REMOVE))
> +	if (!ctx || !(userfaultfd_features(ctx) & UFFD_FEATURE_EVENT_REMOVE))
>  		return true;
>  
>  	userfaultfd_ctx_get(ctx);
> @@ -863,7 +872,7 @@ int userfaultfd_unmap_prep(struct vm_area_struct *vma, unsigned long start,
>  	struct userfaultfd_unmap_ctx *unmap_ctx;
>  	struct userfaultfd_ctx *ctx = vma->vm_userfaultfd_ctx.ctx;
>  
> -	if (!ctx || !(ctx->features & UFFD_FEATURE_EVENT_UNMAP) ||
> +	if (!ctx || !(userfaultfd_features(ctx) & UFFD_FEATURE_EVENT_UNMAP) ||
>  	    has_unmap_ctx(ctx, unmaps, start, end))
>  		return 0;
>  
> @@ -1826,6 +1835,65 @@ static int userfaultfd_deactivate(struct userfaultfd_ctx *ctx,
>  	return ret;
>  }
>  
> +/*
> + * Features that can be toggled at runtime via UFFDIO_SET_MODE.
> + * Only async features that were enabled at UFFDIO_API time may be toggled.
> + */
> +#define UFFD_FEATURE_TOGGLEABLE	(UFFD_FEATURE_MINOR_ASYNC)
> +
> +static int userfaultfd_set_mode(struct userfaultfd_ctx *ctx,
> +				  unsigned long arg)
> +{
> +	struct uffdio_set_mode mode;
> +	struct mm_struct *mm = ctx->mm;
> +
> +	if (copy_from_user(&mode, (void __user *)arg, sizeof(mode)))
> +		return -EFAULT;
> +
> +	/* enable and disable must not overlap */
> +	if (mode.enable & mode.disable)
> +		return -EINVAL;
> +
> +	/* only toggleable features are allowed */
> +	if ((mode.enable | mode.disable) & ~UFFD_FEATURE_TOGGLEABLE)
> +		return -EINVAL;

The commit message states "Only async features that were enabled at
UFFDIO_API time may be toggled."  However, the code only checks that
the requested feature is in UFFD_FEATURE_TOGGLEABLE.

Is it intentional that a user who opened a uffd without
UFFD_FEATURE_MINOR_ASYNC can still enable it later via
UFFDIO_SET_MODE? 

> +
> +	if (!mmget_not_zero(mm))
> +		return -ESRCH;
> +
> +	/*
> +	 * mmap_write_lock serializes against all page faults.
> +	 * After we release, no in-flight faults from the old mode exist.
> +	 */
> +	{
> +		unsigned int new_features;
> +
> +		mmap_write_lock(mm);
> +		new_features = userfaultfd_features(ctx);
> +		new_features |= mode.enable;
> +		new_features &= ~mode.disable;
> +		WRITE_ONCE(ctx->features, new_features);
> +		mmap_write_unlock(mm);
> +	}
> +
> +	/*
> +	 * If switching to async, wake threads blocked in handle_userfault().
> +	 * They will retry the fault and auto-resolve under the new mode.
> +	 * len=0 means wake all pending faults on this context.
> +	 */
> +	if (mode.enable & UFFD_FEATURE_MINOR_ASYNC) {
> +		struct userfaultfd_wake_range range = { .len = 0 };
> +
> +		spin_lock_irq(&ctx->fault_pending_wqh.lock);
> +		__wake_up_locked_key(&ctx->fault_pending_wqh, TASK_NORMAL,
> +				     &range);
> +		__wake_up(&ctx->fault_wqh, TASK_NORMAL, 1, &range);
> +		spin_unlock_irq(&ctx->fault_pending_wqh.lock);
> +	}
> +
> +	mmput(mm);
> +	return 0;
> +}
>  
>  static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
>  {
> @@ -2150,6 +2218,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
>  	case UFFDIO_DEACTIVATE:
>  		ret = userfaultfd_deactivate(ctx, arg);
>  		break;
> +	case UFFDIO_SET_MODE:
> +		ret = userfaultfd_set_mode(ctx, arg);
> +		break;
>  	}
>  	return ret;
>  }
> @@ -2177,7 +2248,7 @@ static void userfaultfd_show_fdinfo(struct seq_file *m, struct file *f)
>  	 *	protocols: aa:... bb:...
>  	 */
>  	seq_printf(m, "pending:\t%lu\ntotal:\t%lu\nAPI:\t%Lx:%x:%Lx\n",
> -		   pending, total, UFFD_API, ctx->features,
> +		   pending, total, UFFD_API, userfaultfd_features(ctx),
>  		   UFFD_API_IOCTLS|UFFD_API_RANGE_IOCTLS);
>  }
>  #endif
> diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
> index 775825da2596..f0f14f9db06c 100644
> --- a/include/uapi/linux/userfaultfd.h
> +++ b/include/uapi/linux/userfaultfd.h
> @@ -84,6 +84,7 @@
>  #define _UFFDIO_CONTINUE		(0x07)
>  #define _UFFDIO_POISON			(0x08)
>  #define _UFFDIO_DEACTIVATE		(0x09)
> +#define _UFFDIO_SET_MODE		(0x0A)
>  #define _UFFDIO_API			(0x3F)
>  
>  /* userfaultfd ioctl ids */
> @@ -110,6 +111,8 @@
>  				      struct uffdio_poison)
>  #define UFFDIO_DEACTIVATE	_IOR(UFFDIO, _UFFDIO_DEACTIVATE,	\
>  				     struct uffdio_range)
> +#define UFFDIO_SET_MODE		_IOW(UFFDIO, _UFFDIO_SET_MODE,	\
> +				     struct uffdio_set_mode)
>  
>  /* read() structure */
>  struct uffd_msg {
> @@ -395,6 +398,16 @@ struct uffdio_move {
>  	__s64 move;
>  };
>  
> +struct uffdio_set_mode {
> +	/*
> +	 * Toggle async mode for features at runtime.
> +	 * Supported: UFFD_FEATURE_MINOR_ASYNC.
> +	 * Setting a bit in both enable and disable is invalid.
> +	 */
> +	__u64 enable;
> +	__u64 disable;
> +};
> +
>  /*
>   * Flags for the userfaultfd(2) system call itself.
>   */
> -- 
> 2.51.2
> 
> 

^ permalink raw reply

* Re: [PATCH v4 04/13] dt-bindings: power: supply: document Samsung S2M series PMIC charger device
From: Krzysztof Kozlowski @ 2026-04-15 14:39 UTC (permalink / raw)
  To: Kaustabh Chakraborty
  Cc: Lee Jones, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, MyungJoo Ham, Chanwoo Choi, Sebastian Reichel,
	André Draszik, Alexandre Belloni, Jonathan Corbet,
	Shuah Khan, Nam Tran, Łukasz Lebiedziński, linux-leds,
	devicetree, linux-kernel, linux-pm, linux-samsung-soc, linux-rtc,
	linux-doc
In-Reply-To: <DHTS9H2EIM2D.2TC17F9WBOOR1@disroot.org>

On 15/04/2026 16:03, Kaustabh Chakraborty wrote:
> On 2026-04-15 09:18 +02:00, Krzysztof Kozlowski wrote:
>> On Tue, Apr 14, 2026 at 12:02:56PM +0530, Kaustabh Chakraborty wrote:
>>> +description: |
>>> +  The Samsung S2M series PMIC battery charger manages power interfacing
>>> +  of the USB port. It may supply power, as done in USB OTG operation
>>> +  mode, or it may accept power and redirect it to the battery fuelgauge
>>> +  for charging.
>>> +
>>> +  This is a part of device tree bindings for S2M and S5M family of Power
>>> +  Management IC (PMIC).
>>> +
>>> +  See also Documentation/devicetree/bindings/mfd/samsung,s2mps11.yaml for
>>> +  additional information and example.
>>> +
>>> +allOf:
>>> +  - $ref: power-supply.yaml#
>>> +
>>> +properties:
>>> +  compatible:
>>> +    enum:
>>> +      - samsung,s2mu005-charger
>>> +
>>> +  port:
>>> +    $ref: /schemas/graph.yaml#/properties/port
>>
>> That port is internal part of the device, thus should be dropped which
>> leaves you with only one property - monitored battery - and therefore
>> fold the node into the parent node.
> 
> And that monitored-battery belongs to power-supply.yaml. Do I then
> include the allOf block in the mfd/samsung,s2mps11.yaml under the
> s2mu005 compatible?

allOf does not go under the compatible. The entire device schema should
have $ref to power-supply.yaml, just like many other devices have that
or different $ref.

Best regards,
Krzysztof

^ permalink raw reply

* Re: [PATCH v4 05/13] dt-bindings: mfd: s2mps11: add documentation for S2MU005 PMIC
From: Krzysztof Kozlowski @ 2026-04-15 14:27 UTC (permalink / raw)
  To: Kaustabh Chakraborty
  Cc: Lee Jones, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, MyungJoo Ham, Chanwoo Choi, Sebastian Reichel,
	André Draszik, Alexandre Belloni, Jonathan Corbet,
	Shuah Khan, Nam Tran, Łukasz Lebiedziński, linux-leds,
	devicetree, linux-kernel, linux-pm, linux-samsung-soc, linux-rtc,
	linux-doc
In-Reply-To: <DHTSO9L6YZTQ.WYM9ERXBGNGB@disroot.org>

On 15/04/2026 16:22, Kaustabh Chakraborty wrote:
> On 2026-04-15 09:17 +02:00, Krzysztof Kozlowski wrote:
>> On Tue, Apr 14, 2026 at 12:02:57PM +0530, Kaustabh Chakraborty wrote:
>>>  
>>>    clocks:
>>>      $ref: /schemas/clock/samsung,s2mps11.yaml
>>>      description:
>>>        Child node describing clock provider.
>>>  
>>> +  charger:
>>> +    $ref: /schemas/power/supply/samsung,s2mu005-charger.yaml
>>> +    description:
>>> +      Child node describing battery charger device.
>>> +
>>> +  extcon:
>>
>> You got comment to drop extcon naming. If this stays, it's muic for
>> example.
>>
>>> +    $ref: /schemas/extcon/samsung,s2mu005-muic.yaml
>>> +    description:
>>> +      Child node describing extcon device.
>>> +
>>> +  flash:
>>> +    $ref: /schemas/leds/samsung,s2mu005-flash.yaml
>>> +    description:
>>> +      Child node describing flash LEDs.
>>> +
>>
>> Please make it a separate binding file.
> 
> What do you mean by that?

I mean, S2MU005 should go to its own file.

> 
>>
>>>    interrupts:
>>>      maxItems: 1
>>>  
>>> @@ -43,6 +59,11 @@ properties:
>>>      description:
>>>        List of child nodes that specify the regulators.
>>>  
>>> +  rgb:
>>
>> led
> 
> Well flash ones are also LEDs. Would you rather have `flash { ... }` and
> `rgb { ... }` under `led { ... }` instead?

There is no approved name "rgb" for LEDs. What is the name for flash LEDs?

Best regards,
Krzysztof

^ permalink raw reply

* Re: [PATCH v4 05/13] dt-bindings: mfd: s2mps11: add documentation for S2MU005 PMIC
From: Kaustabh Chakraborty @ 2026-04-15 14:22 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Kaustabh Chakraborty
  Cc: Lee Jones, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, MyungJoo Ham, Chanwoo Choi, Sebastian Reichel,
	André Draszik, Alexandre Belloni, Jonathan Corbet,
	Shuah Khan, Nam Tran, Łukasz Lebiedziński, linux-leds,
	devicetree, linux-kernel, linux-pm, linux-samsung-soc, linux-rtc,
	linux-doc
In-Reply-To: <20260415-notorious-dainty-starfish-58a13c@quoll>

On 2026-04-15 09:17 +02:00, Krzysztof Kozlowski wrote:
> On Tue, Apr 14, 2026 at 12:02:57PM +0530, Kaustabh Chakraborty wrote:
>>  
>>    clocks:
>>      $ref: /schemas/clock/samsung,s2mps11.yaml
>>      description:
>>        Child node describing clock provider.
>>  
>> +  charger:
>> +    $ref: /schemas/power/supply/samsung,s2mu005-charger.yaml
>> +    description:
>> +      Child node describing battery charger device.
>> +
>> +  extcon:
>
> You got comment to drop extcon naming. If this stays, it's muic for
> example.
>
>> +    $ref: /schemas/extcon/samsung,s2mu005-muic.yaml
>> +    description:
>> +      Child node describing extcon device.
>> +
>> +  flash:
>> +    $ref: /schemas/leds/samsung,s2mu005-flash.yaml
>> +    description:
>> +      Child node describing flash LEDs.
>> +
>
> Please make it a separate binding file.

What do you mean by that?

>
>>    interrupts:
>>      maxItems: 1
>>  
>> @@ -43,6 +59,11 @@ properties:
>>      description:
>>        List of child nodes that specify the regulators.
>>  
>> +  rgb:
>
> led

Well flash ones are also LEDs. Would you rather have `flash { ... }` and
`rgb { ... }` under `led { ... }` instead?

>
>> +    $ref: /schemas/leds/samsung,s2mu005-rgb.yaml
>> +    description:
>> +      Child node describing RGB LEDs.
>> +

^ permalink raw reply

* Re: [PATCH V10 00/10] famfs: port into fuse
From: Miklos Szeredi @ 2026-04-15 14:04 UTC (permalink / raw)
  To: Gregory Price
  Cc: David Hildenbrand (Arm), Darrick J. Wong, John Groves,
	Joanne Koong, Bernd Schubert, John Groves, Dan Williams,
	Bernd Schubert, Alison Schofield, John Groves, Jonathan Corbet,
	Shuah Khan, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
	Alexander Viro, Christian Brauner, Randy Dunlap, Jeff Layton,
	Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
	Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
	Sean Christopherson, Shivank Garg, Ackerley Tng, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <ad-UAMcALRubBcHk@gourry-fedora-PF4VCD3F>

On Wed, 15 Apr 2026 at 15:35, Gregory Price <gourry@gourry.net> wrote:

> This was my first reaction when I realized the BPF program would be
> controlling iomap return value in the fault path.  Big ol' (!)  popped
> up over my head.

I'm wondering which part of this triggers the big (!).

BPF program being run in the fault path?

Or that the return value from the BPF function is used as iomap?

Or something else?

Thanks,
Miklos

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox