Linux Documentation


* Re: [PATCH 03/12] swap,fs: move swapfile operations to struct file_operations
From: Damien Le Moal @ 2026-05-12  7:16 UTC
  To: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song
  Cc: Christian Brauner, Darrick J . Wong, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-4-hch@lst.de>

On 5/12/26 14:35, Christoph Hellwig wrote:
> The swap operations have nothing to do with the address_space, which is
> used for pagecache operations.  Move them to struct file_operations
> instead.  This will allow moving the block device special cases into
> block/fops.c subsequently.
> 
> Pass struct file first to ->swap_activate as file operations typically
> get the file or iocb as first argument and use swap_activate instead of
> swapfile_activate in all names to be consistent.
> 
> Note that while the trivial iomap wrappers are moved to a new file when
> applicable to keep them local to the file operation instances, complex
> implementations are kept in their existing place.  It might be worth
> moving them in follow-on patches if the maintainers so desire.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH 02/12] swap: move boilerplate code into the core swap code
From: Damien Le Moal @ 2026-05-12  7:11 UTC
  To: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song
  Cc: Christian Brauner, Darrick J . Wong, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-3-hch@lst.de>

On 5/12/26 14:35, Christoph Hellwig wrote:
> Make the core swap code calculate sis->pages, nr_extents and the span,
> re-set sis->max based on it and don't require passing the current offset
> into the swap file to swap_add_extent as all that can trivially be
> calculated internally.  Also truncate the spans based on the available
> information.
> 
> All this removes a lot of boilerplate code in the callers.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

For the zonefs bits,

Acked-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH 01/12] swap: remove the maxpages variable in sys_swapon
From: Damien Le Moal @ 2026-05-12  7:08 UTC
  To: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song
  Cc: Christian Brauner, Darrick J . Wong, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-2-hch@lst.de>

On 5/12/26 14:35, Christoph Hellwig wrote:
> Always use si->max which is updated setup_swap_extents instead of copying
> into and out of maxpages.

Checking mm/swapfile.c, I see si->max being set only in swapon(). Is this a typo
or am I misunderstanding this sentence?

Looks good otherwise, but it would be nice to rename ->max to ->maxpages to make
it clear what this is counting.

> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  mm/swapfile.c | 27 +++++++++++----------------
>  1 file changed, 11 insertions(+), 16 deletions(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 9174f1eeffb0..f7ebd97e28a3 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3350,10 +3350,9 @@ static unsigned long read_swap_header(struct swap_info_struct *si,
>  }
>  
>  static int setup_swap_clusters_info(struct swap_info_struct *si,
> -				    union swap_header *swap_header,
> -				    unsigned long maxpages)
> +				    union swap_header *swap_header)
>  {
> -	unsigned long nr_clusters = DIV_ROUND_UP(maxpages, SWAPFILE_CLUSTER);
> +	unsigned long nr_clusters = DIV_ROUND_UP(si->max, SWAPFILE_CLUSTER);
>  	struct swap_cluster_info *cluster_info;
>  	int err = -ENOMEM;
>  	unsigned long i;
> @@ -3395,7 +3394,7 @@ static int setup_swap_clusters_info(struct swap_info_struct *si,
>  		if (err)
>  			goto err;
>  	}
> -	for (i = maxpages; i < round_up(maxpages, SWAPFILE_CLUSTER); i++) {
> +	for (i = si->max; i < round_up(si->max, SWAPFILE_CLUSTER); i++) {
>  		err = swap_cluster_setup_bad_slot(si, cluster_info, i, true);
>  		if (err)
>  			goto err;
> @@ -3425,7 +3424,7 @@ static int setup_swap_clusters_info(struct swap_info_struct *si,
>  	si->cluster_info = cluster_info;
>  	return 0;
>  err:
> -	free_swap_cluster_info(cluster_info, maxpages);
> +	free_swap_cluster_info(cluster_info, si->max);
>  	return err;
>  }
>  
> @@ -3440,7 +3439,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>  	union swap_header *swap_header;
>  	int nr_extents;
>  	sector_t span;
> -	unsigned long maxpages;
>  	struct folio *folio = NULL;
>  	struct inode *inode = NULL;
>  	bool inced_nr_rotate_swap = false;
> @@ -3512,14 +3510,13 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>  	}
>  	swap_header = kmap_local_folio(folio, 0);
>  
> -	maxpages = read_swap_header(si, swap_header, inode);
> -	if (unlikely(!maxpages)) {
> +	si->max = read_swap_header(si, swap_header, inode);
> +	if (unlikely(!si->max)) {
>  		error = -EINVAL;
>  		goto bad_swap_unlock_inode;
>  	}
>  
> -	si->max = maxpages;
> -	si->pages = maxpages - 1;
> +	si->pages = si->max - 1;
>  	nr_extents = setup_swap_extents(si, swap_file, &span);
>  	if (nr_extents < 0) {
>  		error = nr_extents;
> @@ -3531,14 +3528,12 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>  		goto bad_swap_unlock_inode;
>  	}
>  
> -	maxpages = si->max;
> -
>  	/* Set up the swap cluster info */
> -	error = setup_swap_clusters_info(si, swap_header, maxpages);
> +	error = setup_swap_clusters_info(si, swap_header);
>  	if (error)
>  		goto bad_swap_unlock_inode;
>  
> -	error = swap_cgroup_swapon(si->type, maxpages);
> +	error = swap_cgroup_swapon(si->type, si->max);
>  	if (error)
>  		goto bad_swap_unlock_inode;
>  
> @@ -3546,7 +3541,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>  	 * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might
>  	 * be above MAX_PAGE_ORDER incase of a large swap file.
>  	 */
> -	si->zeromap = kvmalloc_array(BITS_TO_LONGS(maxpages), sizeof(long),
> +	si->zeromap = kvmalloc_array(BITS_TO_LONGS(si->max), sizeof(long),
>  				     GFP_KERNEL | __GFP_ZERO);
>  	if (!si->zeromap) {
>  		error = -ENOMEM;
> @@ -3597,7 +3592,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>  		}
>  	}
>  
> -	error = zswap_swapon(si->type, maxpages);
> +	error = zswap_swapon(si->type, si->max);
>  	if (error)
>  		goto bad_swap_unlock_inode;
>  


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH RESEND bpf-next v10 5/8] bpf: Add bpf_list_add to insert node after a given list node
From: bot+bpf-ci @ 2026-05-12  6:41 UTC
  To: kaitao.cheng, ast, corbet, martin.lau, daniel, andrii, eddyz87,
	song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
	shuah, chengkaitao, skhan, memxor
  Cc: bpf, linux-kernel, linux-doc, vmalik, linux-kselftest, ast,
	andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai
In-Reply-To: <20260512055919.95716-6-kaitao.cheng@linux.dev>


> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2570,6 +2570,16 @@ __bpf_kfunc int bpf_list_push_back_impl(struct bpf_list_head *head,
>  	return bpf_list_push_back(head, node, meta__ign, off);
>  }
>
> +__bpf_kfunc int bpf_list_add(struct bpf_list_head *head, struct bpf_list_node *new,
> +			     struct bpf_list_node *prev, struct btf_struct_meta *meta,
> +			     u64 off)

The commit message states that 'prev' must already be in the list. How can a
BPF program obtain such a reference and pass it to this kfunc?

A node already in the list is tracked by the verifier as a non-owning
reference (PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF with ref_obj_id == 0).
The typical way to obtain this is via bpf_list_front() or bpf_list_back(),
which return non-owning references.

However, check_kfunc_args() in kernel/bpf/verifier.c handles
KF_ARG_PTR_TO_LIST_NODE by requiring owning references and rejects non-owning
references.

Without a similar distinction for KF_ARG_PTR_TO_LIST_NODE (as exists for
KF_ARG_PTR_TO_RB_NODE), is there any valid BPF register state that satisfies
both the verifier check and the runtime semantics stated in the commit
message?

The BPF CI bot reported this issue in v10. The author responded that this was
fixed in [PATCH bpf-next v10 7/8], which introduces the __nonown_allowed
annotation and corresponding verifier logic. However, patch 7/8 is not
present in this commit (feed006fb969), so the parameter lacks the
__nonown_allowed suffix and the verifier lacks the is_kfunc_arg_nonown_allowed()
check.

Reference: https://lore.kernel.org/bpf/cb7368c6062232b6ad5f9ab1c6a97bc8530566a8b3db5c4deefdd5ea63d31538@mail.kernel.org/

> +{
> +	struct bpf_list_node_kern *n = (void *)new, *p = (void *)prev;
> +	struct list_head *prev_ptr = &p->list_head;
> +
> +	return __bpf_list_add(n, head, &prev_ptr, meta ? meta->record : NULL, off);
> +}
> +

[ ... ]

> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c

[ ... ]

> @@ -19235,8 +19238,11 @@ struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
>  		int struct_meta_reg = BPF_REG_3;
>  		int node_offset_reg = BPF_REG_4;
>
> -		/* rbtree_add has extra 'less' arg, so args-to-fixup are in diff regs */
> -		if (is_bpf_rbtree_add_kfunc(desc->func_id)) {
> +		/* list_add/rbtree_add have an extra arg (prev/less),
> +		 * so args-to-fixup are in diff regs.
> +		 */

This isn't a bug, but the multi-line comment format doesn't follow the BPF
subsystem style. The preferred form in kernel/bpf/ has the opening /* on its
own line:

    /*
     * list_add/rbtree_add have an extra arg (prev/less),
     * so args-to-fixup are in diff regs.
     */

> +		if (desc->func_id == special_kfunc_list[KF_bpf_list_add] ||
> +		    is_bpf_rbtree_add_kfunc(desc->func_id)) {
>  			struct_meta_reg = BPF_REG_4;
>  			node_offset_reg = BPF_REG_5;
>  		}

---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25716874656


* Re: [PATCH RESEND bpf-next v10 7/8] bpf: allow non-owning list-node args via __nonown_allowed
From: bot+bpf-ci @ 2026-05-12  6:41 UTC
  To: kaitao.cheng, ast, corbet, martin.lau, daniel, andrii, eddyz87,
	song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
	shuah, chengkaitao, skhan, memxor
  Cc: bpf, linux-kernel, linux-doc, vmalik, linux-kselftest, ast,
	andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai
In-Reply-To: <20260512055919.95716-8-kaitao.cheng@linux.dev>


> diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> index 75e6c078e0e7..3a9db1108b95 100644
> --- a/Documentation/bpf/kfuncs.rst
> +++ b/Documentation/bpf/kfuncs.rst

[ ... ]

> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index dfd465badd9d..f2f8705f0e9a 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c

[ ... ]

> @@ -2620,9 +2620,9 @@ __bpf_kfunc struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head)
>  }
>
>  __bpf_kfunc struct bpf_list_node *bpf_list_del(struct bpf_list_head *head,
> -					       struct bpf_list_node *node)
> +				struct bpf_list_node *node__nonown_allowed)
>  {
> -	struct bpf_list_node_kern *kn = (void *)node;
> +	struct bpf_list_node_kern *kn = (void *)node__nonown_allowed;

This isn't a bug, but the second parameter alignment is inconsistent with
the style used in the rest of this file. Most multi-line function
declarations in helpers.c align continuation lines under the first
parameter. The other functions modified in this same patch (bpf_list_add,
bpf_list_is_first, bpf_list_is_last) follow that convention, but this
one does not.

[ ... ]

> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 71d50415f743..4d78d834c609 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c

[ ... ]

> @@ -12017,6 +12022,13 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  			break;
>  		case KF_ARG_PTR_TO_LIST_NODE:
> +			if (is_kfunc_arg_nonown_allowed(btf, &args[i]) &&
> +			    type_is_non_owning_ref(reg->type) && !reg->ref_obj_id) {
> +				/* Allow bpf_list_front/back return value for
> +				 * __nonown_allowed list-node arguments.
> +				 */
> +				goto check_ok;
> +			}

A subsystem pattern flags this as potentially concerning: The BPF
subsystem guide requests that multi-line comments in kernel/bpf/ files
follow the modern kernel style with the opening /* on its own line,
even when the surrounding file uses the old style. The comment you added
begins text on the same line as the opening delimiter. While the comment
itself is clear and correct, the preferred formatting for new comments
in this directory would be:

				/*
				 * Allow bpf_list_front/back return value for
				 * __nonown_allowed list-node arguments.
				 */

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25716874656


* Re: [PATCH RESEND bpf-next v10 2/8] bpf: clear list node owner and unlink before drop
From: bot+bpf-ci @ 2026-05-12  6:41 UTC
  To: kaitao.cheng, ast, corbet, martin.lau, daniel, andrii, eddyz87,
	song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
	shuah, chengkaitao, skhan, memxor
  Cc: bpf, linux-kernel, linux-doc, vmalik, linux-kselftest, ast,
	andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai
In-Reply-To: <20260512055919.95716-3-kaitao.cheng@linux.dev>


> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 9cd7b028592c..1e8754877dd1 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2247,10 +2247,11 @@ EXPORT_SYMBOL_GPL(bpf_base_func_proto);
>  void bpf_list_head_free(const struct btf_field *field, void *list_head,
>  			struct bpf_spin_lock *spin_lock)
>  {
> -	struct list_head *head = list_head, *orig_head = list_head;
> +	struct list_head *head = list_head, drain, *pos, *n;
>
>  	BUILD_BUG_ON(sizeof(struct list_head) > sizeof(struct bpf_list_head));
>  	BUILD_BUG_ON(__alignof__(struct list_head) > __alignof__(struct bpf_list_head));
> +	INIT_LIST_HEAD(&drain);
>
>  	/* Do the actual list draining outside the lock to not hold the lock for
>  	 * too long, and also prevent deadlocks if tracing programs end up
> @@ -2261,20 +2262,23 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
>  	__bpf_spin_lock_irqsave(spin_lock);
>  	if (!head->next || list_empty(head))
>  		goto unlock;
> -	head = head->next;
> +	list_for_each_safe(pos, n, head) {
> +		WRITE_ONCE(container_of(pos,
> +			struct bpf_list_node_kern, list_head)->owner, NULL);
> +		list_move_tail(pos, &drain);
> +	}
>  unlock:
> -	INIT_LIST_HEAD(orig_head);
> +	INIT_LIST_HEAD(head);
>  	__bpf_spin_unlock_irqrestore(spin_lock);
>
> -	while (head != orig_head) {
> -		void *obj = head;
> -
> -		obj -= field->graph_root.node_offset;
> -		head = head->next;
> +	while (!list_empty(&drain)) {
> +		pos = drain.next;
> +		list_del_init(pos);

Does this create a race condition when a BPF program holds a
refcount-acquired reference to a node being freed?

The patch clears each node's owner to NULL under spin_lock, moves the
node to a local 'drain' list, then drops the lock. After unlock, the
drain loop walks the local list and calls list_del_init(pos) before
__bpf_obj_drop_impl().

Between spin_unlock and list_del_init(pos), each node in 'drain' has
owner == NULL AND is still linked in 'drain'.

When a BPF program holds an owning or refcount-acquired reference to
one of these nodes (node X), which is structurally supported because
__bpf_obj_drop_impl() uses refcount_dec_and_test() and only frees at
refcount 0, a concurrent push to a DIFFERENT bpf_list_head becomes a
corruption:

CPU 0 (bpf_list_head_free, lock released)  CPU 1 (BPF prog, refcount X)
-----------------------------------------   ----------------------------
(owner of X == NULL, X linked in drain)
                                            bpf_list_push_back(other, X)
                                              __bpf_list_add: spin_lock()
                                              cmpxchg(X->owner, NULL,
                                                      POISON) -> OK
                                              list_add_tail(&X->list_head,
                                                            other_head)
                                                -> overwrites X->next,
                                                   X->prev, corrupts
                                                   other_head's chain
                                                   because X is still
                                                   stitched into drain
pos = drain.next;      (may be X or neighbor using X's stale next)
list_del_init(pos);    reads X->next/prev now pointing into other_head,
                       corrupts other_head's list and/or drain

Pre-patch, the owner pointer was left pointing at the old head (never
cleared). Any concurrent push attempt cmpxchg(&node->owner, NULL,
POISON) would fail and return -EINVAL, blocking re-insertion. This
matches what bpf_rb_root_free() still does: it does NOT clear owner, so
concurrent bpf_rbtree_add() cmpxchg fails.

The patch clears owner under the lock but still links the node into the
unprotected local drain list, then drops the lock. The owner==NULL
state now acts as a 'push is allowed' signal even though the node is
still on a list, violating the invariant __bpf_list_add() relies on:

kernel/bpf/helpers.c:__bpf_list_add() {
    /* node->owner != NULL implies !list_empty(n), no need to separately
     * check the latter
     */
    if (cmpxchg(&node->owner, NULL, BPF_PTR_POISON)) {
            __bpf_obj_drop_impl((void *)n - off, rec, false);
            return -EINVAL;
    }

    tail ? list_add_tail(n, h) : list_add(n, h);
    WRITE_ONCE(node->owner, head);
}

Compounding this, 'drain' is a local stack variable: corruption can
redirect list pointers into kernel stack memory of the freeing CPU,
giving an unbounded list_add/list_del_init corruption vector.

Could you do list_del_init(pos) under spin_lock in the same critical
section that clears owner? Two options would be swapping the order
(list_del_init then WRITE_ONCE(owner, NULL), like __bpf_list_del() does),
or leaving owner non-NULL like bpf_rb_root_free() does, so that the
cmpxchg in a concurrent push still fails.

The commit message's stated goal ('preventing subsequent users who hold
a reference count to the node from acquiring an invalid next node') is
only achievable if the delinking is done before unlock, not after.
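
The ordering suggested above can be sketched in userspace C. This is a
minimal model under stated assumptions, not the kernel implementation:
struct node, push_back() and drain_unlink_then_clear() are hypothetical
names, the spin lock is only indicated by comments, and plain stores
stand in for WRITE_ONCE() and cmpxchg(). It shows only that unlinking
before clearing owner keeps "owner == NULL implies unlinked" true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct bpf_list_node_kern. */
struct node {
	struct list_head link;
	void *owner;	/* NULL means "not on any list, may be pushed" */
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_entry(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init_entry(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list_head(n);
}

/* Push succeeds only if the node is unowned, mirroring the cmpxchg check. */
static int push_back(struct list_head *head, struct node *n)
{
	if (n->owner != NULL)
		return -1;
	/* Invariant relied upon: owner == NULL implies the node is unlinked. */
	assert(list_is_empty(&n->link));
	list_add_tail_entry(&n->link, head);
	n->owner = head;
	return 0;
}

/* Drain 'head': unlink first, then clear owner. Returns nodes released. */
static int drain_unlink_then_clear(struct list_head *head)
{
	int released = 0;

	/* The real code would hold the spin lock from here ... */
	while (!list_is_empty(head)) {
		struct list_head *pos = head->next;
		struct node *n = (struct node *)((char *)pos -
						 offsetof(struct node, link));

		list_del_init_entry(pos);	/* unlink first ... */
		n->owner = NULL;		/* ... then allow re-insertion */
		released++;
	}
	/* ... to here. */
	return released;
}

/* Self-check: returns 0 if a drained node can be pushed onto another list. */
int demo_drain(void)
{
	struct list_head a, b;
	struct node x = { .owner = NULL }, y = { .owner = NULL };

	init_list_head(&a);
	init_list_head(&b);
	init_list_head(&x.link);
	init_list_head(&y.link);

	if (push_back(&a, &x) || push_back(&a, &y))
		return -1;
	if (push_back(&b, &x) == 0)	/* still owned by 'a': must fail */
		return -2;
	if (drain_unlink_then_clear(&a) != 2)
		return -3;
	if (push_back(&b, &x))		/* now unowned and unlinked */
		return -4;
	return (b.next == &x.link && x.owner == &b) ? 0 : -5;
}
```

With the reverse order (clear owner, unlink later outside the lock), the
assert() in push_back() is the condition a concurrent pusher could
violate.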

>  		/* The contained type can also have resources, including a
>  		 * bpf_list_head which needs to be freed.
>  		 */
> -		__bpf_obj_drop_impl(obj, field->graph_root.value_rec, false);
> +		__bpf_obj_drop_impl((char *)pos - field->graph_root.node_offset,
> +				    field->graph_root.value_rec, false);
>  	}
>  }

bot+bpf-ci@kernel.org raised a concern in v9 about whether this patch
should include a Fixes tag. The CI bot suggested:

  Fixes: f0c5941ff5b2 ("bpf: Support bpf_list_head in map values")

since the concurrent reader issue was introduced when bpf_list_head
support was added. The author responded in v10 that the issue only
appears after the later patch 'bpf: Introduce the bpf_list_del kfunc' is
merged, so no Fixes tag was added.

However, the reasoning suggests the Fixes tag should point to the
bpf_list_del introduction rather than f0c5941ff5b2, and a Fixes tag may
still be appropriate. Should this include a Fixes tag pointing to the
commit that introduced bpf_list_del?

---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25716874656


* Re: [PATCH RESEND bpf-next v10 3/8] bpf: Introduce the bpf_list_del kfunc.
From: bot+bpf-ci @ 2026-05-12  6:41 UTC
  To: kaitao.cheng, ast, corbet, martin.lau, daniel, andrii, eddyz87,
	song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
	shuah, chengkaitao, skhan, memxor
  Cc: bpf, linux-kernel, linux-doc, vmalik, linux-kselftest, ast,
	andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai
In-Reply-To: <20260512055919.95716-4-kaitao.cheng@linux.dev>


> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 1e8754877dd1..51b6ea4bb8cb 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2593,6 +2593,15 @@ __bpf_kfunc struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head)
>  	return __bpf_list_del(head, h->prev);
>  }
>
> +__bpf_kfunc struct bpf_list_node *bpf_list_del(struct bpf_list_head *head,
> +					       struct bpf_list_node *node)
> +{
> +	struct bpf_list_node_kern *kn = (void *)node;
> +
> +	/* verifier guarantees node is a list node rather than list head */
> +	return __bpf_list_del(head, &kn->list_head);
> +}
> +

Looking at bpf_list_pop_front() and bpf_list_pop_back() in the same
file, both functions read from the list_head (h->next or h->prev)
before passing it to __bpf_list_del(). If the list is uninitialized,
they end up passing potentially invalid pointer values to __bpf_list_del(),
which then has to handle this by initializing the list and returning NULL.

While __bpf_list_del() handles this correctly, would it be clearer to
move the initialization check into the top-level kfuncs (like
bpf_list_pop_front/back) so that __bpf_list_del() always receives a
valid list_head pointer rather than potentially garbage values that
get ignored later?

Reference: https://lore.kernel.org/bpf/af50d188f6bc2de81fe3c321b26a645a9425ba9f2644cb6f8525d5bb637385f9@mail.kernel.org/

> @@ -4725,6 +4734,7 @@ BTF_ID_FLAGS(func, bpf_list_push_back, KF_IMPLICIT_ARGS)
>  BTF_ID_FLAGS(func, bpf_list_push_back_impl)
>  BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_list_del, KF_ACQUIRE | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_list_front, KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_list_back, KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_RCU | KF_RET_NULL)

[ ... ]

> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 11054ad89c14..a6bcbe68cd72 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -10744,6 +10744,7 @@ enum special_kfunc_type {
>  	KF_bpf_list_push_back,
>  	KF_bpf_list_pop_front,
>  	KF_bpf_list_pop_back,
> +	KF_bpf_list_del,
>  	KF_bpf_list_front,
>  	KF_bpf_list_back,
>  	KF_bpf_cast_to_kern_ctx,

[ ... ]

> @@ -11456,7 +11459,8 @@ static bool check_kfunc_is_graph_node_api(struct bpf_verifier_env *env,
>
>  	switch (node_field_type) {
>  	case BPF_LIST_NODE:
> -		ret = is_bpf_list_push_kfunc(kfunc_btf_id);
> +		ret = is_bpf_list_push_kfunc(kfunc_btf_id) ||
> +		      kfunc_btf_id == special_kfunc_list[KF_bpf_list_del];
>  		break;
>  	case BPF_RB_NODE:
>  		ret = (is_bpf_rbtree_add_kfunc(kfunc_btf_id) ||

Can bpf_list_del actually be called from BPF programs? Looking at the
verifier's handling of KF_ARG_PTR_TO_LIST_NODE arguments in
check_kfunc_args():

	case KF_ARG_PTR_TO_LIST_NODE:
		...
		if (!reg->ref_obj_id) {
			verbose(env, "allocated object must be referenced\n");
			return -EINVAL;
		}

This code path unconditionally requires an owning reference (ref_obj_id
must be non-zero). However, once a node is added to the list via
bpf_list_push_front/back, the owning reference is converted to a
non-owning reference (ref_obj_id becomes 0). This means bpf_list_del
cannot be called on nodes that are actually in the list.

The sister kfunc bpf_rbtree_remove handles this by having a special
branch in check_kfunc_args() that accepts non-owning references for
that specific kfunc. Does bpf_list_del need similar handling to allow
removing nodes that are actually in the list?

Reference: https://lore.kernel.org/bpf/294242698d576e4fedf243ab055c6f9a984e55033933a9bd09808f62a7018e9d@mail.kernel.org/
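
The distinction being asked about can be modelled in a few lines of
userspace C. This is a rough sketch, not verifier code: struct reg_model
and check_list_node_arg() are hypothetical, and the nonown_allowed flag
stands in for a per-argument opt-in such as the __nonown_allowed suffix
that patch 7/8 of this series proposes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced view of the verifier's register state. */
struct reg_model {
	bool non_owning;		/* models type_is_non_owning_ref(reg->type) */
	unsigned int ref_obj_id;	/* non-zero for an owning reference */
};

/*
 * Returns 0 if the list-node argument would be accepted, -1 otherwise.
 * 'nonown_allowed' models a per-argument opt-in (an assumption here).
 */
int check_list_node_arg(const struct reg_model *reg, bool nonown_allowed)
{
	/* Opt-in path: accept a non-owning reference to an in-list node. */
	if (nonown_allowed && reg->non_owning && !reg->ref_obj_id)
		return 0;

	/* Default path: "allocated object must be referenced". */
	if (!reg->ref_obj_id)
		return -1;

	return 0;
}

/* Self-check covering the three cases discussed in the review. */
int demo_check(void)
{
	struct reg_model in_list = { .non_owning = true, .ref_obj_id = 0 };
	struct reg_model owned = { .non_owning = false, .ref_obj_id = 7 };

	/* Without the opt-in, a node that is in the list is rejected. */
	assert(check_list_node_arg(&in_list, false) == -1);
	/* With the opt-in, the same register state is accepted. */
	assert(check_list_node_arg(&in_list, true) == 0);
	/* An owning reference passes the default check either way. */
	assert(check_list_node_arg(&owned, false) == 0);
	return 0;
}
```

In this model the first assertion corresponds to the state described
above: a kfunc taking an in-list node is uncallable until some opt-in
path exists.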

---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25716874656


* Re: [PATCH RESEND bpf-next v10 1/8] bpf: refactor __bpf_list_del to take list node pointer
From: bot+bpf-ci @ 2026-05-12  6:41 UTC
  To: kaitao.cheng, ast, corbet, martin.lau, daniel, andrii, eddyz87,
	song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
	shuah, chengkaitao, skhan, memxor
  Cc: bpf, linux-kernel, linux-doc, vmalik, linux-kselftest, ast,
	andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai
In-Reply-To: <20260512055919.95716-2-kaitao.cheng@linux.dev>


> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index baa12b24bb64..9cd7b028592c 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2550,22 +2550,24 @@ __bpf_kfunc int bpf_list_push_back_impl(struct bpf_list_head *head,
>  	return bpf_list_push_back(head, node, meta__ign, off);
>  }
>
> -static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, bool tail)
> +static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head,
> +					    struct list_head *n)
>  {
> -	struct list_head *n, *h = (void *)head;
> +	struct list_head *h = (void *)head;
>  	struct bpf_list_node_kern *node;
>
>  	/* If list_head was 0-initialized by map, bpf_obj_init_field wasn't
>  	 * called on its fields, so init here
>  	 */
> -	if (unlikely(!h->next))
> +	if (unlikely(!h->next)) {
>  		INIT_LIST_HEAD(h);
> +		return NULL;
> +	}
>  	if (list_empty(h))
>  		return NULL;
>
> -	n = tail ? h->prev : h->next;
>  	node = container_of(n, struct bpf_list_node_kern, list_head);
> -	if (WARN_ON_ONCE(READ_ONCE(node->owner) != head))
> +	if (unlikely(READ_ONCE(node->owner) != head))
>  		return NULL;
>
>  	list_del_init(n);
> @@ -2575,12 +2577,16 @@ static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, bool tai
>
>  __bpf_kfunc struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head)
>  {
> -	return __bpf_list_del(head, false);
> +	struct list_head *h = (void *)head;
> +
> +	return __bpf_list_del(head, h->next);
>  }

The new code reads h->next before __bpf_list_del checks if the list is
initialized. If the list is uninitialized (h->next is NULL), a NULL pointer
is passed to __bpf_list_del.

While __bpf_list_del eventually initializes the list and returns NULL without
dereferencing the NULL pointer, this is counterintuitive - passing a
potentially garbage value that is later ignored.

Can you move the initialization check into the top-level kfuncs
(bpf_list_pop_front and bpf_list_pop_back) to ensure the list_head passed to
__bpf_list_del is always valid? You could initialize the list and return NULL
in that case - we know it's empty.

This concern was raised by Emil Tsalapatis in v8:
https://lore.kernel.org/bpf/DH6W7K2SLPYB.PCO7FQ6CR1YZ@etsalapatis.com/
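
One possible shape of that suggestion, as a userspace sketch rather than
a proposed patch: the check for a zero-initialized head moves into the
callers, so the delete helper only ever sees a real entry. All names
here (head_empty_or_uninit(), del_entry(), pop_front(), pop_back()) are
hypothetical, and the owner check and locking of the real code are left
out.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail_entry(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Helper after hoisting: 'n' is always a real entry linked on 'h'. */
static struct list_head *del_entry(struct list_head *h, struct list_head *n)
{
	assert(n != NULL && n != h);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list_head(n);
	return n;
}

/* Returns 1 if the list is empty, initializing a zeroed head first. */
static int head_empty_or_uninit(struct list_head *h)
{
	if (h->next == NULL) {
		init_list_head(h);
		return 1;
	}
	return h->next == h;
}

struct list_head *pop_front(struct list_head *h)
{
	if (head_empty_or_uninit(h))
		return NULL;
	return del_entry(h, h->next);	/* h->next is read only once valid */
}

struct list_head *pop_back(struct list_head *h)
{
	if (head_empty_or_uninit(h))
		return NULL;
	return del_entry(h, h->prev);
}

/* Self-check: returns 0 on the expected behaviour. */
int demo_pop(void)
{
	struct list_head head = { NULL, NULL };	/* zeroed, as from a map */
	struct list_head a, b;

	if (pop_front(&head) != NULL || head.next != &head)
		return -1;	/* zeroed head: initialized, reported empty */
	list_add_tail_entry(&a, &head);
	list_add_tail_entry(&b, &head);
	if (pop_back(&head) != &b || pop_front(&head) != &a)
		return -2;
	return pop_front(&head) == NULL ? 0 : -3;
}
```

The trade-off is a small amount of duplication in each caller in
exchange for a helper whose argument is never NULL or stale.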

>
>  __bpf_kfunc struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head)
>  {
> -	return __bpf_list_del(head, true);
> +	struct list_head *h = (void *)head;
> +
> +	return __bpf_list_del(head, h->prev);
>  }


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25716874656


* Re: [PATCH] docs: kernel-doc: python: strip __counted_by_ptr macro
From: Mauro Carvalho Chehab @ 2026-05-12  6:20 UTC
  To: Tudor Ambarus
  Cc: Mauro Carvalho Chehab, Kees Cook, Gustavo A. R. Silva,
	linux-kernel, linux-doc, linux-hardening, peter.griffin,
	andre.draszik, willmcvicker, jyescas, krzk, kernel-team
In-Reply-To: <20260506-kdoc-__counted_by_ptr-v1-1-70763486871f@linaro.org>

On Wed, 06 May 2026 11:04:12 +0000
Tudor Ambarus <tudor.ambarus@linaro.org> wrote:

> The `__counted_by_ptr` macro was recently introduced [1] to extend
> bounds checking semantics to standard dynamically allocated pointers.
> 
> However, the new Python implementation of kernel-doc does not currently
> recognize it as a compiler attribute. When kernel-doc encounters a
> struct member annotated with this macro, it fails to parse the variable
> name correctly, resulting in false-positive warnings like:
> 
>   Warning: ... struct member '__counted_by_ptr(cmdcnt' not described
> 
> Add `__counted_by_ptr` to the `struct_xforms` regex list so it gets
> safely stripped out during the parsing phase, mirroring the existing
> behavior for `__counted_by`. Update the corresponding unit tests.
> 
> Link: https://git.kernel.org/torvalds/c/150a04d817d8 [1]
> Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

> ---
>  tools/lib/python/kdoc/xforms_lists.py | 1 +
>  tools/unittests/test_cmatch.py        | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/tools/lib/python/kdoc/xforms_lists.py b/tools/lib/python/kdoc/xforms_lists.py
> index f6ea9efb11ae..118156ea8cd2 100644
> --- a/tools/lib/python/kdoc/xforms_lists.py
> +++ b/tools/lib/python/kdoc/xforms_lists.py
> @@ -29,6 +29,7 @@ class CTransforms:
>          (CMatch("__aligned"), ""),
>          (CMatch("__counted_by"), ""),
>          (CMatch("__counted_by_(le|be)"), ""),
> +        (CMatch("__counted_by_ptr"), ""),
>          (CMatch("__guarded_by"), ""),
>          (CMatch("__pt_guarded_by"), ""),
>          (CMatch("__packed"), ""),
> diff --git a/tools/unittests/test_cmatch.py b/tools/unittests/test_cmatch.py
> index 7b996f83784d..109141cd2ab8 100755
> --- a/tools/unittests/test_cmatch.py
> +++ b/tools/unittests/test_cmatch.py
> @@ -320,6 +320,7 @@ class TestSubWithLocalXforms(TestCaseDiff):
>          (CMatch('__aligned'), ' '),
>          (CMatch('__counted_by'), ' '),
>          (CMatch('__counted_by_(le|be)'), ' '),
> +        (CMatch('__counted_by_ptr'), ' '),
>          (CMatch('__guarded_by'), ' '),
>          (CMatch('__pt_guarded_by'), ' '),
>  
> 
> ---
> base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
> change-id: 20260506-kdoc-__counted_by_ptr-1e206f3f1dc1
> 
> Best regards,



Thanks,
Mauro


* Re: [RFC PATCH 0/5] mm: support zswap-backed anonymous large folio swapin
From: David Hildenbrand (Arm) @ 2026-05-12  6:14 UTC (permalink / raw)
  To: Yosry Ahmed, fujunjie
  Cc: Andrew Morton, Chris Li, Kairui Song, Johannes Weiner, Nhat Pham,
	linux-mm, linux-kernel, linux-doc, Jonathan Corbet, Ryan Roberts,
	Barry Song, Baolin Wang, Chengming Zhou, Baoquan He,
	Lorenzo Stoakes
In-Reply-To: <agJT6D5zaUD6FpwQ@google.com>

On 5/12/26 00:13, Yosry Ahmed wrote:
> On Fri, May 08, 2026 at 08:18:29PM +0000, fujunjie wrote:
>> Hi,
>>
>> This RFC explores anonymous large folio swapin when a contiguous swap
>> range is backed consistently by zswap.
>>
>> Large folio swapout to zswap is already supported by storing each base
>> page in the folio as a separate zswap entry. The anonymous synchronous
>> swapin path has remained order-0 once zswap has ever been enabled:
>> zswap_load() rejected large folios, and alloc_swap_folio() avoided large
>> folio allocation to protect against mixed backend ranges.
>>
>> This RFC keeps the scope intentionally conservative. It does not try to
>> read one large folio from mixed zswap and disk backends, and it does not
>> change shmem swapin. Shmem still has its existing zswap fallback and is
>> left for later discussion. For anonymous swapin, the backend rule is made
>> explicit:
>>
>> - a range fully absent from zswap can keep using the disk backend
>> - a range fully present in zswap can be decompressed into a large folio
>> - a mixed zswap/non-zswap range falls back to order-0 swapin
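
The three-way backend rule quoted above can be sketched in plain C as
follows. This is only an illustration under stated assumptions: the
per-page in_zswap[] array, the enum and demo_pick_path() are hypothetical
and do not correspond to the helper names used in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome names; the series may express this differently. */
enum demo_swapin_path {
	DEMO_LARGE_FROM_DISK,	/* range fully absent from zswap */
	DEMO_LARGE_FROM_ZSWAP,	/* range fully present in zswap */
	DEMO_ORDER0_FALLBACK,	/* mixed zswap/non-zswap range */
};

/* in_zswap[i] says whether base page i of the range has a zswap entry. */
static enum demo_swapin_path demo_pick_path(const bool *in_zswap,
					    size_t nr_pages)
{
	size_t present = 0;

	assert(nr_pages > 0);
	for (size_t i = 0; i < nr_pages; i++)
		if (in_zswap[i])
			present++;

	if (present == 0)
		return DEMO_LARGE_FROM_DISK;
	if (present == nr_pages)
		return DEMO_LARGE_FROM_ZSWAP;
	return DEMO_ORDER0_FALLBACK;
}
```

Counting present entries keeps the check to one pass over the range and
makes the mixed case the default rather than a special case.
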
>>
>> The series adds a zswap range query helper, teaches zswap_load() to
>> decompress all-zswap large folios one base page at a time, accounts mTHP
>> swpin for zswap-loaded large folios, retries synchronous large-folio
>> insertion races with order-0 swapin, and removes the anonymous
>> zswap-never-enabled restriction once mixed ranges are filtered.
>>
>> I tested the series with a full bzImage build using CONFIG_ZSWAP=y,
>> CONFIG_ZRAM=y, CONFIG_MEMCG=y and CONFIG_THP_SWAP=y.
>>
>> The QEMU/KVM runs covered both the fully-zswap path and the mixed-backend
>> fallback path. In the all-zswap run, a 512MiB anonymous mapping was faulted
>> as 8192 64KiB groups, reclaimed into zswap, and faulted back. Reclaim
>> reported mthp64_zswpout=8192 and zswpout=131072. Refault then reported
>> mthp64_swpin=8192 and zswpin=131072, and pagemap/kpageflags showed 8192
>> order-4 THP groups in the mapping.
>>
>> In the mixed-backend run, the workload used a 64MiB anonymous mapping
>> split into 1024 64KiB groups. After shrinker debugfs wrote back exactly
>> one zswap base-page entry, refault left 1023 order-4 THP groups and one
>> order-0 mixed group. The kernel stats matched that shape:
>> mthp64_swpin=1023, zswpin=16383 and zswpwb=1.
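
The counter values quoted above follow from simple arithmetic, which the
sketch below reproduces. It assumes a 4KiB base page (so a 64KiB group is
16 base pages, i.e. order 4) and that each written-back page lands in a
different group; the names are hypothetical and nothing here is taken
from the test program itself.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE	(4ULL * 1024)	/* assumed 4KiB base page */
#define DEMO_GROUP_SIZE	(64ULL * 1024)	/* 64KiB group, order 4 */

struct demo_counts {
	uint64_t large_groups;	/* groups expected to refault as large folios */
	uint64_t zswpin;	/* base pages expected to load from zswap */
};

/*
 * Expected counters for a mapping of mapping_bytes after wb_pages base
 * pages were written back from zswap, one per distinct group.
 */
static struct demo_counts demo_expected(uint64_t mapping_bytes,
					uint64_t wb_pages)
{
	uint64_t groups = mapping_bytes / DEMO_GROUP_SIZE;
	uint64_t pages_per_group = DEMO_GROUP_SIZE / DEMO_PAGE_SIZE;
	struct demo_counts c;

	assert(wb_pages <= groups);
	c.large_groups = groups - wb_pages;
	c.zswpin = groups * pages_per_group - wb_pages;
	return c;
}
```

For the 512MiB run with no writeback this gives 8192 groups and 131072
base pages; for the 64MiB run with one page written back it gives 1023
large groups and 16383 pages loaded from zswap, matching the reported
mthp64_swpin and zswpin values.
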
>>
>> CONFIG_SHRINKER_DEBUG is only a test aid for making that one zswap
>> writeback deterministic; it is not required by the implementation.
>>
>> Nhat Pham's active Virtual Swap Space series is adjacent work. It moves
>> swap cache and zswap entry state into a virtual swap descriptor, and lists
>> mixed backing THP swapin as a future use case. This RFC is independent and
>> works with the current swap/zswap infrastructure, but may need rebasing if
>> VSS lands first.
>>
>> Feedback would be especially helpful on:
>>
>> 1. whether it makes sense to support all-zswap large folio swapin first,
>>    while keeping mixed zswap/disk ranges on the order-0 fallback path
> 
> I think so, yes, but based on my read of the code this RFC only affects
> synchronous swapin, which is more-or-less zram+zswap. This is an
> uncommon setup outside of testing.

BLK_FEAT_SYNCHRONOUS is also set for pmem and brd devices I think, but that's
also pretty uncommon I assume. Well, maybe if your hypervisor provides you with
an emulated NVDIMM to use as swap backend ... maybe.

I thought there were other ways to get BLK_FEAT_SYNCHRONOUS set, but I don't see
other usage.

So seeing it for zswap is pretty rare I assume.

-- 
Cheers,

David


* Re: [PATCH] docs: reporting-issues: fix advice wording
From: Thorsten Leemhuis @ 2026-05-12  6:09 UTC (permalink / raw)
  To: Chen-Shi-Hong; +Cc: corbet, skhan, linux-doc, linux-kernel
In-Reply-To: <20260512015146.4081-1-eric039eric@gmail.com>

On 5/12/26 03:51, Chen-Shi-Hong wrote:
> Replace "these advices" with "this advice" in
> Documentation/admin-guide/reporting-issues.rst.

Thx for this, fixing this is a good idea. It nevertheless makes me go
"hmmm...", as the wrongly executed and maybe not obvious enough original
intention of the author (disclaimer: me) was to make it a bit clearer
that "this advice" does not only mean the one advice right before it,
but all the pieces of advice in the paragraph. It would be great to
cover that while fixing it. "pieces of advice" maybe? Not sure. Maybe
somebody has a better idea. And maybe just ignore my nitpicking, guess
"this advice" just feels too easy to misinterpret from my point of view
as someone for whom English is a second language.

Ciao, Thorsten


> Signed-off-by: Chen-Shi-Hong <eric039eric@gmail.com>
> ---
>  Documentation/admin-guide/reporting-issues.rst | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
> index 16a66a1f1975..731865b5e8ff 100644
> --- a/Documentation/admin-guide/reporting-issues.rst
> +++ b/Documentation/admin-guide/reporting-issues.rst
> @@ -129,7 +129,7 @@ After these preparations you'll now enter the main part:
>     situations; during the merge window that actually might be even the best
>     approach, but in that development phase it can be an even better idea to
>     suspend your efforts for a few days anyway. Whatever version you choose,
> -   ideally use a 'vanilla' build. Ignoring these advices will dramatically
> +   ideally use a 'vanilla' build. Ignoring this advice will dramatically
>     increase the risk your report will be rejected or ignored.
>  
>   * Ensure the kernel you just installed does not 'taint' itself when
> @@ -795,7 +795,7 @@ Install a fresh kernel for testing
>      situations; during the merge window that actually might be even the best
>      approach, but in that development phase it can be an even better idea to
>      suspend your efforts for a few days anyway. Whatever version you choose,
> -    ideally use a 'vanilla' built. Ignoring these advices will dramatically
> +    ideally use a 'vanilla' built. Ignoring this advice will dramatically
>      increase the risk your report will be rejected or ignored.*
>  
>  As mentioned in the detailed explanation for the first step already: Like most
> 
> base-commit: 5d6919055dec134de3c40167a490f33c74c12581
> prerequisite-patch-id: 1089bde9e188a84c873ff722a776bc107a6e8103



* [PATCH net-next 2/2] net: ti: icssg: Add HSR and LRE PA statistics
From: MD Danish Anwar @ 2026-05-12  6:06 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, MD Danish Anwar,
	Roger Quadros, Andrew Lunn, Jacob Keller, Meghana Malladi,
	David Carlier, Kevin Hao, Vadim Fedorenko
  Cc: netdev, linux-doc, linux-kernel, linux-arm-kernel,
	Vignesh Raghavendra
In-Reply-To: <20260512060627.3781329-1-danishanwar@ti.com>

Add new firmware PA statistics counters for HSR and LRE to the ethtool
statistics exposed by the ICSSG driver.

New statistics added:
 - FW_HSR_FWD_CHECK_FAIL_DROP: Packets dropped on the HSR forwarding path
 - FW_HSR_HE_CHECK_FAIL_DROP: Packets dropped on the HSR host egress path
 - FW_HSR_SKIP_HOST_DUP_DISCARD_FRAMES: Frames with duplicate discard
   skipped
 - FW_LRE_CNT_UNIQUE/DUPLICATE/MULTIPLE_RX: LRE duplicate detection
   counters
 - FW_LRE_CNT_RX/TX: LRE per-port frame counters
 - FW_LRE_CNT_OWN_RX: Own HSR tagged frames received
 - FW_LRE_CNT_ERRWRONGLAN: Frames with wrong LAN identifier (PRP)

Document the new HSR/LRE statistics in icssg_prueth.rst.

Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
---
 .../device_drivers/ethernet/ti/icssg_prueth.rst        | 10 ++++++++++
 drivers/net/ethernet/ti/icssg/icssg_common.c           |  7 +++++--
 drivers/net/ethernet/ti/icssg/icssg_stats.h            | 10 ++++++++++
 drivers/net/ethernet/ti/icssg/icssg_switch_map.h       | 10 ++++++++++
 4 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/device_drivers/ethernet/ti/icssg_prueth.rst b/Documentation/networking/device_drivers/ethernet/ti/icssg_prueth.rst
index da21ddf431bb..b0bda7327b2a 100644
--- a/Documentation/networking/device_drivers/ethernet/ti/icssg_prueth.rst
+++ b/Documentation/networking/device_drivers/ethernet/ti/icssg_prueth.rst
@@ -54,3 +54,13 @@ These statistics are as follows,
  - ``FW_HOST_TX_PKT_CNT``: Number of valid packets copied by RTU0 to Tx queues
  - ``FW_HOST_EGRESS_Q_PRE_OVERFLOW``: Host Egress Q (Pre-emptible) Overflow Counter
  - ``FW_HOST_EGRESS_Q_EXP_OVERFLOW``: Host Egress Q (Pre-emptible) Overflow Counter
+ - ``FW_HSR_FWD_CHECK_FAIL_DROP``: Packets dropped on the HSR forwarding path due to failed checks
+ - ``FW_HSR_HE_CHECK_FAIL_DROP``: Packets dropped on the host egress path due to failed checks
+ - ``FW_HSR_SKIP_HOST_DUP_DISCARD_FRAMES``: Frames for which the host duplicate discard check was skipped
+ - ``FW_LRE_CNT_UNIQUE_RX``: Number of frames received with no duplicate detected
+ - ``FW_LRE_CNT_DUPLICATE_RX``: Number of frames received for which exactly one duplicate was detected
+ - ``FW_LRE_CNT_MULTIPLE_RX``: Number of frames received for which more than one duplicate was detected
+ - ``FW_LRE_CNT_RX``: Number of HSR/PRP tagged frames received
+ - ``FW_LRE_CNT_TX``: Number of HSR/PRP tagged frames sent
+ - ``FW_LRE_CNT_OWN_RX``: Number of HSR/PRP tagged frames received whose source MAC matches the node's own address
+ - ``FW_LRE_CNT_ERRWRONGLAN``: Number of frames received with a wrong LAN identifier, PRP only
diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
index a28a608f9bf4..e7a51a9eee24 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_common.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
@@ -1633,7 +1633,8 @@ void icssg_ndo_get_stats64(struct net_device *ndev,
 			    emac_get_stat_by_name(emac, "FW_RX_EOF_SHORT_FRMERR") +
 			    emac_get_stat_by_name(emac, "FW_RX_B0_DROP_EARLY_EOF") +
 			    emac_get_stat_by_name(emac, "FW_RX_EXP_FRAG_Q_DROP") +
-			    emac_get_stat_by_name(emac, "FW_RX_FIFO_OVERRUN");
+			    emac_get_stat_by_name(emac, "FW_RX_FIFO_OVERRUN") +
+			    emac_get_stat_by_name(emac, "FW_LRE_CNT_ERRWRONGLAN");
 	stats->rx_dropped = ndev->stats.rx_dropped +
 			    emac_get_stat_by_name(emac, "FW_DROPPED_PKT") +
 			    emac_get_stat_by_name(emac, "FW_INF_PORT_DISABLED") +
@@ -1643,7 +1644,9 @@ void icssg_ndo_get_stats64(struct net_device *ndev,
 			    emac_get_stat_by_name(emac, "FW_INF_DROP_TAGGED") +
 			    emac_get_stat_by_name(emac, "FW_INF_DROP_PRIOTAGGED") +
 			    emac_get_stat_by_name(emac, "FW_INF_DROP_NOTAG") +
-			    emac_get_stat_by_name(emac, "FW_INF_DROP_NOTMEMBER");
+			    emac_get_stat_by_name(emac, "FW_INF_DROP_NOTMEMBER") +
+			    emac_get_stat_by_name(emac, "FW_HSR_FWD_CHECK_FAIL_DROP") +
+			    emac_get_stat_by_name(emac, "FW_HSR_HE_CHECK_FAIL_DROP");
 	stats->tx_errors  = ndev->stats.tx_errors;
 	stats->tx_dropped = ndev->stats.tx_dropped +
 			    emac_get_stat_by_name(emac, "FW_RTU_PKT_DROP") +
diff --git a/drivers/net/ethernet/ti/icssg/icssg_stats.h b/drivers/net/ethernet/ti/icssg/icssg_stats.h
index b854eb587c1e..af3fcecac403 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_stats.h
+++ b/drivers/net/ethernet/ti/icssg/icssg_stats.h
@@ -204,6 +204,16 @@ static const struct icssg_pa_stats icssg_all_pa_stats[] = {
 	ICSSG_PA_STATS(FW_HOST_TX_PKT_CNT),
 	ICSSG_PA_STATS(FW_HOST_EGRESS_Q_PRE_OVERFLOW),
 	ICSSG_PA_STATS(FW_HOST_EGRESS_Q_EXP_OVERFLOW),
+	ICSSG_PA_STATS(FW_HSR_FWD_CHECK_FAIL_DROP),
+	ICSSG_PA_STATS(FW_HSR_HE_CHECK_FAIL_DROP),
+	ICSSG_PA_STATS(FW_HSR_SKIP_HOST_DUP_DISCARD_FRAMES),
+	ICSSG_PA_STATS(FW_LRE_CNT_UNIQUE_RX),
+	ICSSG_PA_STATS(FW_LRE_CNT_DUPLICATE_RX),
+	ICSSG_PA_STATS(FW_LRE_CNT_MULTIPLE_RX),
+	ICSSG_PA_STATS(FW_LRE_CNT_RX),
+	ICSSG_PA_STATS(FW_LRE_CNT_TX),
+	ICSSG_PA_STATS(FW_LRE_CNT_OWN_RX),
+	ICSSG_PA_STATS(FW_LRE_CNT_ERRWRONGLAN),
 };
 
 #endif /* __NET_TI_ICSSG_STATS_H */
diff --git a/drivers/net/ethernet/ti/icssg/icssg_switch_map.h b/drivers/net/ethernet/ti/icssg/icssg_switch_map.h
index 7e053b8af3ec..bd2d54dd7f45 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_switch_map.h
+++ b/drivers/net/ethernet/ti/icssg/icssg_switch_map.h
@@ -266,5 +266,15 @@
 #define FW_HOST_TX_PKT_CNT		0x0250
 #define FW_HOST_EGRESS_Q_PRE_OVERFLOW	0x0258
 #define FW_HOST_EGRESS_Q_EXP_OVERFLOW	0x0260
+#define FW_HSR_FWD_CHECK_FAIL_DROP		0x0500
+#define FW_HSR_HE_CHECK_FAIL_DROP		0x0508
+#define FW_HSR_SKIP_HOST_DUP_DISCARD_FRAMES	0x0510
+#define FW_LRE_CNT_UNIQUE_RX			0x0518
+#define FW_LRE_CNT_DUPLICATE_RX			0x0520
+#define FW_LRE_CNT_MULTIPLE_RX			0x0528
+#define FW_LRE_CNT_RX				0x0530
+#define FW_LRE_CNT_TX				0x0538
+#define FW_LRE_CNT_OWN_RX			0x0540
+#define FW_LRE_CNT_ERRWRONGLAN			0x0548
 
 #endif /* __NET_TI_ICSSG_SWITCH_MAP_H  */
-- 
2.34.1



* [PATCH net-next 1/2] net: ti: icssg: Derive stats array lengths from ARRAY_SIZE
From: MD Danish Anwar @ 2026-05-12  6:06 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, MD Danish Anwar,
	Roger Quadros, Andrew Lunn, Jacob Keller, Meghana Malladi,
	David Carlier, Kevin Hao, Vadim Fedorenko
  Cc: netdev, linux-doc, linux-kernel, linux-arm-kernel,
	Vignesh Raghavendra
In-Reply-To: <20260512060627.3781329-1-danishanwar@ti.com>

Replace the manually maintained ICSSG_NUM_MIIG_STATS and
ICSSG_NUM_PA_STATS constants with ARRAY_SIZE() expressions derived
directly from the corresponding stat descriptor arrays, so that adding
new entries to icssg_all_miig_stats[] or icssg_all_pa_stats[] no longer
requires a separate update to a numeric constant.

To make this self-contained, break the circular include dependency
between icssg_stats.h and icssg_prueth.h:

  - icssg_stats.h previously included icssg_prueth.h (transitively
    pulling in icssg_switch_map.h and ETH_GSTRING_LEN).  Replace that
    with direct includes of <linux/ethtool.h>, <linux/kernel.h> and
    "icssg_switch_map.h".

  - icssg_prueth.h now includes icssg_stats.h, giving it access to
    the ARRAY_SIZE-based ICSSG_NUM_MIIG_STATS and ICSSG_NUM_PA_STATS
    before they are used in the prueth_emac struct and ICSSG_NUM_STATS.
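
As a standalone illustration of the pattern (not the driver code), the
sketch below derives an element count from the descriptor array itself.
DEMO_ARRAY_SIZE is a simplified stand-in for the kernel's ARRAY_SIZE()
without its array type check, and demo_stat, demo_pa_stats and
demo_find_stat() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's ARRAY_SIZE(). */
#define DEMO_ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))

/*
 * The count macro may be defined before the array: it is only expanded
 * where it is used, and every use below comes after the array definition.
 */
#define DEMO_NUM_PA_STATS	DEMO_ARRAY_SIZE(demo_pa_stats)

struct demo_stat {
	const char *name;
	unsigned int offset;
};

static const struct demo_stat demo_pa_stats[] = {
	{ "FW_HOST_EGRESS_Q_PRE_OVERFLOW", 0x0258 },
	{ "FW_HOST_EGRESS_Q_EXP_OVERFLOW", 0x0260 },
	{ "FW_HSR_FWD_CHECK_FAIL_DROP",    0x0500 },
};

/* Return the index of the named stat, or -1 if it is not in the table. */
static int demo_find_stat(const char *name)
{
	for (size_t i = 0; i < DEMO_NUM_PA_STATS; i++)
		if (strcmp(demo_pa_stats[i].name, name) == 0)
			return (int)i;
	return -1;
}
```

Appending an entry to demo_pa_stats[] changes DEMO_NUM_PA_STATS
automatically, so loops and buffers sized from it cannot drift out of
sync with the table.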

Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
---
 drivers/net/ethernet/ti/icssg/icssg_prueth.h | 3 +--
 drivers/net/ethernet/ti/icssg/icssg_stats.h  | 7 ++++++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
index df93d15c5b78..e2ccecb0a0dd 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
+++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
@@ -43,6 +43,7 @@
 
 #include "icssg_config.h"
 #include "icss_iep.h"
+#include "icssg_stats.h"
 #include "icssg_switch_map.h"
 
 #define PRUETH_MAX_MTU          (2000 - ETH_HLEN - ETH_FCS_LEN)
@@ -57,8 +58,6 @@
 
 #define ICSSG_MAX_RFLOWS	8	/* per slice */
 
-#define ICSSG_NUM_PA_STATS	32
-#define ICSSG_NUM_MIIG_STATS	60
 /* Number of ICSSG related stats */
 #define ICSSG_NUM_STATS (ICSSG_NUM_MIIG_STATS + ICSSG_NUM_PA_STATS)
 #define ICSSG_NUM_STANDARD_STATS 31
diff --git a/drivers/net/ethernet/ti/icssg/icssg_stats.h b/drivers/net/ethernet/ti/icssg/icssg_stats.h
index 5ec0b38e0c67..b854eb587c1e 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_stats.h
+++ b/drivers/net/ethernet/ti/icssg/icssg_stats.h
@@ -8,10 +8,15 @@
 #ifndef __NET_TI_ICSSG_STATS_H
 #define __NET_TI_ICSSG_STATS_H
 
-#include "icssg_prueth.h"
+#include <linux/ethtool.h>
+#include <linux/kernel.h>
+#include "icssg_switch_map.h"
 
 #define STATS_TIME_LIMIT_1G_MS    25000    /* 25 seconds @ 1G */
 
+#define ICSSG_NUM_MIIG_STATS	ARRAY_SIZE(icssg_all_miig_stats)
+#define ICSSG_NUM_PA_STATS	ARRAY_SIZE(icssg_all_pa_stats)
+
 struct miig_stats_regs {
 	/* Rx */
 	u32 rx_packets;
-- 
2.34.1



* [PATCH net-next 0/2] Add ICSSG firmware stats related to HSR
From: MD Danish Anwar @ 2026-05-12  6:06 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, MD Danish Anwar,
	Roger Quadros, Andrew Lunn, Jacob Keller, Meghana Malladi,
	David Carlier, Kevin Hao, Vadim Fedorenko
  Cc: netdev, linux-doc, linux-kernel, linux-arm-kernel,
	Vignesh Raghavendra

This series has two patches:
Patch 1/2 derives the stats array lengths from ARRAY_SIZE instead of hardcoded constants.
Patch 2/2 adds new HSR / PRP statistics maintained by the ICSSG firmware.

MD Danish Anwar (2):
  net: ti: icssg: Derive stats array lengths from ARRAY_SIZE
  net: ti: icssg: Add HSR and LRE PA statistics

 .../device_drivers/ethernet/ti/icssg_prueth.rst | 10 ++++++++++
 drivers/net/ethernet/ti/icssg/icssg_common.c    |  7 +++++--
 drivers/net/ethernet/ti/icssg/icssg_prueth.h    |  3 +--
 drivers/net/ethernet/ti/icssg/icssg_stats.h     | 17 ++++++++++++++++-
 .../net/ethernet/ti/icssg/icssg_switch_map.h    | 10 ++++++++++
 5 files changed, 42 insertions(+), 5 deletions(-)


base-commit: 63751099502d10f0aa6bb35273e56c5800cc4e3a
-- 
2.34.1



* [PATCH RESEND bpf-next v10 8/8] selftests/bpf: Add test cases for bpf_list_del/add/is_first/is_last/empty
From: Kaitao cheng @ 2026-05-12  5:59 UTC (permalink / raw)
  To: ast, corbet, martin.lau, daniel, andrii, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah,
	chengkaitao, skhan, memxor
  Cc: bpf, linux-kernel, linux-doc, vmalik, linux-kselftest
In-Reply-To: <20260512055919.95716-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

Extend refcounted_kptr with tests for bpf_list_add (including prev from
bpf_list_front and bpf_refcount_acquire), bpf_list_del (including node
from bpf_list_front, bpf_rbtree_remove and bpf_refcount_acquire),
bpf_list_empty, bpf_list_is_first/last, and push_back on uninit head.

To verify the validity of bpf_list_del/add, the test also expects the
verifier to reject calls to bpf_list_del/add made without holding the
spin_lock.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 .../selftests/bpf/progs/refcounted_kptr.c     | 421 ++++++++++++++++++
 1 file changed, 421 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr.c b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
index c847398837cc..21ae06797b18 100644
--- a/tools/testing/selftests/bpf/progs/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
@@ -367,6 +367,427 @@ long insert_rbtree_and_stash__del_tree_##rem_tree(void *ctx)		\
 INSERT_STASH_READ(true, "insert_stash_read: remove from tree");
 INSERT_STASH_READ(false, "insert_stash_read: don't remove from tree");
 
+SEC("tc")
+__description("list_empty_test: list empty before add, non-empty after add")
+__success __retval(0)
+int list_empty_test(void *ctx)
+{
+	struct node_data *node_new;
+
+	bpf_spin_lock(&lock);
+	if (!bpf_list_empty(&head)) {
+		bpf_spin_unlock(&lock);
+		return -1;
+	}
+	bpf_spin_unlock(&lock);
+
+	node_new = bpf_obj_new(typeof(*node_new));
+	if (!node_new)
+		return -2;
+
+	bpf_spin_lock(&lock);
+	bpf_list_push_front(&head, &node_new->l);
+
+	if (bpf_list_empty(&head)) {
+		bpf_spin_unlock(&lock);
+		return -3;
+	}
+	bpf_spin_unlock(&lock);
+	return 0;
+}
+
+static struct node_data *__add_in_list(struct bpf_list_head *head,
+				       struct bpf_spin_lock *lock)
+{
+	struct node_data *node_new, *node_ref;
+
+	node_new = bpf_obj_new(typeof(*node_new));
+	if (!node_new)
+		return NULL;
+
+	node_ref = bpf_refcount_acquire(node_new);
+
+	bpf_spin_lock(lock);
+	bpf_list_push_front(head, &node_new->l);
+	bpf_spin_unlock(lock);
+	return node_ref;
+}
+
+SEC("tc")
+__description("list_is_edge_test1: is_first on first node, is_last on last node")
+__success __retval(0)
+int list_is_edge_test1(void *ctx)
+{
+	struct node_data *node_first, *node_last;
+	int err = 0;
+
+	node_last = __add_in_list(&head, &lock);
+	if (!node_last)
+		return -1;
+
+	node_first = __add_in_list(&head, &lock);
+	if (!node_first) {
+		bpf_obj_drop(node_last);
+		return -2;
+	}
+
+	bpf_spin_lock(&lock);
+	if (!bpf_list_is_first(&head, &node_first->l)) {
+		err = -3;
+		goto fail;
+	}
+	if (!bpf_list_is_last(&head, &node_last->l))
+		err = -4;
+
+fail:
+	bpf_spin_unlock(&lock);
+	bpf_obj_drop(node_first);
+	bpf_obj_drop(node_last);
+	return err;
+}
+
+SEC("tc")
+__description("list_is_edge_test2: accept list_front/list_back return value")
+__success __retval(0)
+int list_is_edge_test2(void *ctx)
+{
+	struct bpf_list_node *front, *back;
+	struct node_data *a, *b;
+	long err = 0;
+
+	a = __add_in_list(&head, &lock);
+	if (!a)
+		return -1;
+
+	b = __add_in_list(&head, &lock);
+	if (!b) {
+		bpf_obj_drop(a);
+		return -2;
+	}
+
+	bpf_spin_lock(&lock);
+	front = bpf_list_front(&head);
+	back = bpf_list_back(&head);
+	if (!front || !back) {
+		err = -3;
+		goto out_unlock;
+	}
+
+	if (!bpf_list_is_first(&head, front) || bpf_list_is_last(&head, front)) {
+		err = -4;
+		goto out_unlock;
+	}
+
+	if (!bpf_list_is_last(&head, back) || bpf_list_is_first(&head, back)) {
+		err = -5;
+		goto out_unlock;
+	}
+
+out_unlock:
+	bpf_spin_unlock(&lock);
+	bpf_obj_drop(a);
+	bpf_obj_drop(b);
+	return err;
+}
+
+SEC("tc")
+__description("list_is_edge_test3: single node is both first and last")
+__success __retval(0)
+int list_is_edge_test3(void *ctx)
+{
+	struct node_data *tmp;
+	struct bpf_list_node *node;
+	long err = 0;
+
+	tmp = __add_in_list(&head, &lock);
+	if (!tmp)
+		return -1;
+
+	bpf_spin_lock(&lock);
+	node = bpf_list_front(&head);
+	if (!node) {
+		bpf_spin_unlock(&lock);
+		bpf_obj_drop(tmp);
+		return -2;
+	}
+
+	if (!bpf_list_is_first(&head, node) || !bpf_list_is_last(&head, node))
+		err = -3;
+	bpf_spin_unlock(&lock);
+
+	bpf_obj_drop(tmp);
+	return err;
+}
+
+SEC("tc")
+__description("list_del_test1: del returns removed nodes")
+__success __retval(0)
+int list_del_test1(void *ctx)
+{
+	struct node_data *node_first, *node_last;
+	struct bpf_list_node *bpf_node_first, *bpf_node_last;
+	int err = 0;
+
+	node_last = __add_in_list(&head, &lock);
+	if (!node_last)
+		return -1;
+
+	node_first = __add_in_list(&head, &lock);
+	if (!node_first) {
+		bpf_obj_drop(node_last);
+		return -2;
+	}
+
+	bpf_spin_lock(&lock);
+	bpf_node_last = bpf_list_del(&head, &node_last->l);
+	bpf_node_first = bpf_list_del(&head, &node_first->l);
+	bpf_spin_unlock(&lock);
+
+	if (bpf_node_first)
+		bpf_obj_drop(container_of(bpf_node_first, struct node_data, l));
+	else
+		err = -3;
+
+	if (bpf_node_last)
+		bpf_obj_drop(container_of(bpf_node_last, struct node_data, l));
+	else
+		err = -4;
+
+	bpf_obj_drop(node_first);
+	bpf_obj_drop(node_last);
+	return err;
+}
+
+SEC("tc")
+__description("list_del_test2: remove an arbitrary node from the list")
+__success __retval(0)
+int list_del_test2(void *ctx)
+{
+	struct bpf_rb_node *rb;
+	struct bpf_list_node *l;
+	struct node_data *n;
+	long err;
+
+	err = __insert_in_tree_and_list(&head, &root, &lock);
+	if (err)
+		return err;
+
+	bpf_spin_lock(&lock);
+	rb = bpf_rbtree_first(&root);
+	if (!rb) {
+		bpf_spin_unlock(&lock);
+		return -4;
+	}
+
+	rb = bpf_rbtree_remove(&root, rb);
+	if (!rb) {
+		bpf_spin_unlock(&lock);
+		return -5;
+	}
+
+	n = container_of(rb, struct node_data, r);
+	l = bpf_list_del(&head, &n->l);
+	bpf_spin_unlock(&lock);
+	bpf_obj_drop(n);
+	if (!l)
+		return -6;
+
+	bpf_obj_drop(container_of(l, struct node_data, l));
+	return 0;
+}
+
+SEC("tc")
+__description("list_del_test3: list_del accepts list_front return value as node")
+__success __retval(0)
+int list_del_test3(void *ctx)
+{
+	struct node_data *tmp;
+	struct bpf_list_node *bpf_node, *l;
+	long err = 0;
+
+	tmp = __add_in_list(&head, &lock);
+	if (!tmp)
+		return -1;
+
+	bpf_spin_lock(&lock);
+	bpf_node = bpf_list_front(&head);
+	if (!bpf_node) {
+		bpf_spin_unlock(&lock);
+		err = -2;
+		goto fail;
+	}
+
+	l = bpf_list_del(&head, bpf_node);
+	bpf_spin_unlock(&lock);
+	if (!l) {
+		err = -3;
+		goto fail;
+	}
+
+	bpf_obj_drop(container_of(l, struct node_data, l));
+	bpf_obj_drop(tmp);
+	return 0;
+
+fail:
+	bpf_obj_drop(tmp);
+	return err;
+}
+
+SEC("tc")
+__description("list_add_test1: insert new node after prev")
+__success __retval(0)
+int list_add_test1(void *ctx)
+{
+	struct node_data *node_first;
+	struct node_data *new_node;
+	long err = 0;
+
+	node_first = __add_in_list(&head, &lock);
+	if (!node_first)
+		return -1;
+
+	new_node = bpf_obj_new(typeof(*new_node));
+	if (!new_node) {
+		err = -2;
+		goto fail;
+	}
+
+	bpf_spin_lock(&lock);
+	err = bpf_list_add(&head, &new_node->l, &node_first->l);
+	bpf_spin_unlock(&lock);
+	if (err) {
+		err = -3;
+		goto fail;
+	}
+
+fail:
+	bpf_obj_drop(node_first);
+	return err;
+}
+
+SEC("tc")
+__description("list_add_test2: list_add accepts list_front return value as prev")
+__success __retval(0)
+int list_add_test2(void *ctx)
+{
+	struct node_data *new_node, *tmp;
+	struct bpf_list_node *bpf_node;
+	long err = 0;
+
+	tmp = __add_in_list(&head, &lock);
+	if (!tmp)
+		return -1;
+
+	new_node = bpf_obj_new(typeof(*new_node));
+	if (!new_node) {
+		err = -2;
+		goto fail;
+	}
+
+	bpf_spin_lock(&lock);
+	bpf_node = bpf_list_front(&head);
+	if (!bpf_node) {
+		bpf_spin_unlock(&lock);
+		bpf_obj_drop(new_node);
+		err = -3;
+		goto fail;
+	}
+
+	err = bpf_list_add(&head, &new_node->l, bpf_node);
+	bpf_spin_unlock(&lock);
+	if (err) {
+		err = -4;
+		goto fail;
+	}
+
+fail:
+	bpf_obj_drop(tmp);
+	return err;
+}
+
+struct uninit_head_val {
+	struct bpf_spin_lock lock;
+	struct bpf_list_head head __contains(node_data, l);
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, int);
+	__type(value, struct uninit_head_val);
+	__uint(max_entries, 1);
+} uninit_head_map SEC(".maps");
+
+SEC("tc")
+__description("list_push_back_uninit_head: push_back on 0-initialized list head")
+__success __retval(0)
+int list_push_back_uninit_head(void *ctx)
+{
+	struct uninit_head_val *st;
+	struct node_data *node;
+	int ret = -1, key = 0;
+
+	st = bpf_map_lookup_elem(&uninit_head_map, &key);
+	if (!st)
+		return -1;
+
+	node = bpf_obj_new(typeof(*node));
+	if (!node)
+		return -1;
+
+	bpf_spin_lock(&st->lock);
+	ret = bpf_list_push_back(&st->head, &node->l);
+	bpf_spin_unlock(&st->lock);
+
+	return ret;
+}
+
+SEC("?tc")
+__failure __msg("bpf_spin_lock at off=32 must be held for bpf_list_head")
+long list_del_without_lock_fail(void *ctx)
+{
+	struct node_data *n;
+	struct bpf_list_node *l;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return -1;
+
+	/* Error case: delete list node without holding lock */
+	l = bpf_list_del(&head, &n->l);
+	bpf_obj_drop(n);
+	if (!l)
+		return -2;
+	bpf_obj_drop(container_of(l, struct node_data, l));
+
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("bpf_spin_lock at off=32 must be held for bpf_list_head")
+long list_add_without_lock_fail(void *ctx)
+{
+	struct node_data *n, *prev;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return -1;
+
+	prev = bpf_obj_new(typeof(*prev));
+	if (!prev) {
+		bpf_obj_drop(n);
+		return -1;
+	}
+
+	/* Error case: add list node without holding lock */
+	if (bpf_list_add(&head, &n->l, &prev->l)) {
+		bpf_obj_drop(prev);
+		bpf_obj_drop(n);
+		return -2;
+	}
+
+	return 0;
+}
+
 SEC("tc")
 __success
 long rbtree_refcounted_node_ref_escapes(void *ctx)
-- 
2.50.1 (Apple Git-155)



* [PATCH RESEND bpf-next v10 7/8] bpf: allow non-owning list-node args via __nonown_allowed
From: Kaitao cheng @ 2026-05-12  5:59 UTC (permalink / raw)
  To: ast, corbet, martin.lau, daniel, andrii, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah,
	chengkaitao, skhan, memxor
  Cc: bpf, linux-kernel, linux-doc, vmalik, linux-kselftest
In-Reply-To: <20260512055919.95716-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

KF_ARG_PTR_TO_LIST_NODE normally requires an owning reference
(PTR_TO_BTF_ID | MEM_ALLOC with ref_obj_id). Introduce and use
the __nonown_allowed annotation on selected list-node arguments
so non-owning references with ref_obj_id==0 are accepted as well.

This enables passing bpf_list_front() / bpf_list_back() results to:

 - bpf_list_add() as insertion point (prev)
 - bpf_list_del() as deletion target (node)
 - bpf_list_is_first/last() as query target (node)

The verifier keeps the existing owning-ref checks by default; only arguments
annotated with __nonown_allowed bypass the MEM_ALLOC/ref_obj_id checks
and then follow the same list-node validation path.
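
The annotation works by naming convention: the verifier looks at the
suffix of the kfunc parameter name. A minimal userspace sketch of that
idea is shown below. demo_param_match_suffix() is a hypothetical
stand-in written for illustration; it is not the kernel's
btf_param_match_suffix() and does not touch BTF.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if param_name ends with suffix and has at least one
 * character before it. Hypothetical helper, illustration only.
 */
static bool demo_param_match_suffix(const char *param_name,
				    const char *suffix)
{
	size_t nlen = strlen(param_name);
	size_t slen = strlen(suffix);

	if (nlen <= slen)
		return false;
	return strcmp(param_name + nlen - slen, suffix) == 0;
}

/* True if a parameter with this name would opt in to non-owning refs. */
static bool demo_arg_nonown_allowed(const char *param_name)
{
	return demo_param_match_suffix(param_name, "__nonown_allowed");
}
```

Under this sketch "prev__nonown_allowed" and "node__nonown_allowed" opt
in, while a plain "new" or "node" parameter keeps the default owning-ref
requirement.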

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 Documentation/bpf/kfuncs.rst | 22 ++++++++++++++++++++--
 kernel/bpf/helpers.c         | 20 +++++++++++---------
 kernel/bpf/verifier.c        | 13 +++++++++++++
 3 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index 75e6c078e0e7..3a9db1108b95 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -207,8 +207,26 @@ Here, the buffer may be NULL. If the buffer is not NULL, it must be at least
 buffer__szk bytes in size. The kfunc is responsible for checking if the buffer
 is NULL before using it.
 
-2.3.5 __str Annotation
-----------------------------
+2.3.5 __nonown_allowed Annotation
+---------------------------------
+
+This annotation is used to indicate that the parameter may be a non-owning reference.
+
+An example is given below::
+
+        __bpf_kfunc int bpf_list_add(..., struct bpf_list_node
+                                     *prev__nonown_allowed, ...)
+        {
+                ...
+        }
+
+For the ``prev__nonown_allowed`` parameter (resolved as ``KF_ARG_PTR_TO_LIST_NODE``),
+suffix ``__nonown_allowed`` retains the usual owning-pointer rules and also
+permits a non-owning reference with no ref_obj_id (e.g. the return value of
+bpf_list_front() / bpf_list_back()).
+
+2.3.6 __str Annotation
+----------------------
 This annotation is used to indicate that the argument is a constant string.
 
 An example is given below::
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index dfd465badd9d..f2f8705f0e9a 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2571,10 +2571,10 @@ __bpf_kfunc int bpf_list_push_back_impl(struct bpf_list_head *head,
 }
 
 __bpf_kfunc int bpf_list_add(struct bpf_list_head *head, struct bpf_list_node *new,
-			     struct bpf_list_node *prev, struct btf_struct_meta *meta,
-			     u64 off)
+			     struct bpf_list_node *prev__nonown_allowed,
+			     struct btf_struct_meta *meta, u64 off)
 {
-	struct bpf_list_node_kern *n = (void *)new, *p = (void *)prev;
+	struct bpf_list_node_kern *n = (void *)new, *p = (void *)prev__nonown_allowed;
 	struct list_head *prev_ptr = &p->list_head;
 
 	return __bpf_list_add(n, head, &prev_ptr, meta ? meta->record : NULL, off);
@@ -2620,9 +2620,9 @@ __bpf_kfunc struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head)
 }
 
 __bpf_kfunc struct bpf_list_node *bpf_list_del(struct bpf_list_head *head,
-					       struct bpf_list_node *node)
+				struct bpf_list_node *node__nonown_allowed)
 {
-	struct bpf_list_node_kern *kn = (void *)node;
+	struct bpf_list_node_kern *kn = (void *)node__nonown_allowed;
 
 	/* verifier guarantees node is a list node rather than list head */
 	return __bpf_list_del(head, &kn->list_head);
@@ -2648,10 +2648,11 @@ __bpf_kfunc struct bpf_list_node *bpf_list_back(struct bpf_list_head *head)
 	return (struct bpf_list_node *)h->prev;
 }
 
-__bpf_kfunc bool bpf_list_is_first(struct bpf_list_head *head, struct bpf_list_node *node)
+__bpf_kfunc bool bpf_list_is_first(struct bpf_list_head *head,
+				   struct bpf_list_node *node__nonown_allowed)
 {
 	struct list_head *h = (struct list_head *)head;
-	struct bpf_list_node_kern *kn = (struct bpf_list_node_kern *)node;
+	struct bpf_list_node_kern *kn = (struct bpf_list_node_kern *)node__nonown_allowed;
 
 	if (READ_ONCE(kn->owner) != head)
 		return false;
@@ -2659,10 +2660,11 @@ __bpf_kfunc bool bpf_list_is_first(struct bpf_list_head *head, struct bpf_list_n
 	return list_is_first(&kn->list_head, h);
 }
 
-__bpf_kfunc bool bpf_list_is_last(struct bpf_list_head *head, struct bpf_list_node *node)
+__bpf_kfunc bool bpf_list_is_last(struct bpf_list_head *head,
+				  struct bpf_list_node *node__nonown_allowed)
 {
 	struct list_head *h = (struct list_head *)head;
-	struct bpf_list_node_kern *kn = (struct bpf_list_node_kern *)node;
+	struct bpf_list_node_kern *kn = (struct bpf_list_node_kern *)node__nonown_allowed;
 
 	if (READ_ONCE(kn->owner) != head)
 		return false;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ca33f35bc3eb..08ab337866bf 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10502,6 +10502,11 @@ static bool is_kfunc_arg_nullable(const struct btf *btf, const struct btf_param
 	return btf_param_match_suffix(btf, arg, "__nullable");
 }
 
+static bool is_kfunc_arg_nonown_allowed(const struct btf *btf, const struct btf_param *arg)
+{
+	return btf_param_match_suffix(btf, arg, "__nonown_allowed");
+}
+
 static bool is_kfunc_arg_const_str(const struct btf *btf, const struct btf_param *arg)
 {
 	return btf_param_match_suffix(btf, arg, "__str");
@@ -12017,6 +12022,13 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				return ret;
 			break;
 		case KF_ARG_PTR_TO_LIST_NODE:
+			if (is_kfunc_arg_nonown_allowed(btf, &args[i]) &&
+			    type_is_non_owning_ref(reg->type) && !reg->ref_obj_id) {
+				/* Allow bpf_list_front/back return value for
+				 * __nonown_allowed list-node arguments.
+				 */
+				goto check_ok;
+			}
 			if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
 				verbose(env, "%s expected pointer to allocated object\n",
 					reg_arg_name(env, argno));
@@ -12026,6 +12038,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				verbose(env, "allocated object must be referenced\n");
 				return -EINVAL;
 			}
+check_ok:
 			ret = process_kf_arg_ptr_to_list_node(env, reg, argno, meta);
 			if (ret < 0)
 				return ret;
-- 
2.50.1 (Apple Git-155)


^ permalink raw reply related

* [PATCH RESEND bpf-next v10 6/8] bpf: add bpf_list_is_first/last/empty kfuncs
From: Kaitao cheng @ 2026-05-12  5:59 UTC (permalink / raw)
  To: ast, corbet, martin.lau, daniel, andrii, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah,
	chengkaitao, skhan, memxor
  Cc: bpf, linux-kernel, linux-doc, vmalik, linux-kselftest,
	Emil Tsalapatis
In-Reply-To: <20260512055919.95716-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

Add three kfuncs for BPF linked list queries:
- bpf_list_is_first(head, node): true if node is the first in the list.
- bpf_list_is_last(head, node): true if node is the last in the list.
- bpf_list_empty(head): true if the list has no entries.

Without these kfuncs, checking whether a node is first or last requires
calling bpf_list_pop_front/back to retrieve the first or last node,
comparing it with the node in question, and then pushing it back with
bpf_list_push_front/back, which is inefficient.

With bpf_list_is_first/last/empty, a program can check directly whether
a node is first or last, or whether the list is empty, without removing
any node.
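
For illustration only, the three queries can be modelled in plain userspace
C. This is a sketch under stated assumptions, not the kernel code: the
demo_* names and struct demo_node are hypothetical stand-ins, and the owner
field plays the role of bpf_list_node_kern's owner pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; all names are illustrative. */
struct list_head { struct list_head *next, *prev; };

struct demo_node {
	struct list_head link;
	struct list_head *owner;	/* head of the containing list, or NULL */
};

static void demo_init(struct list_head *h) { h->next = h->prev = h; }

/* Append n to an already initialised list and record its owner. */
static void demo_push_back(struct list_head *h, struct demo_node *n)
{
	n->link.prev = h->prev;
	n->link.next = h;
	h->prev->next = &n->link;
	h->prev = &n->link;
	n->owner = h;
}

/* A node that is not on this list is never first or last. */
static bool demo_is_first(struct list_head *h, struct demo_node *n)
{
	if (n->owner != h)
		return false;
	return n->link.prev == h;
}

static bool demo_is_last(struct list_head *h, struct demo_node *n)
{
	if (n->owner != h)
		return false;
	return n->link.next == h;
}

/* A zero-filled head counts as empty and is initialised on first use. */
static bool demo_empty(struct list_head *h)
{
	if (!h->next)
		demo_init(h);
	return h->next == h;
}
```

None of the queries modifies the list, apart from the lazy initialisation of
a zero-filled head in demo_empty().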

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
---
 kernel/bpf/helpers.c  | 38 ++++++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c | 15 +++++++++++++--
 2 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 2b8e8d4284a5..dfd465badd9d 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2648,6 +2648,41 @@ __bpf_kfunc struct bpf_list_node *bpf_list_back(struct bpf_list_head *head)
 	return (struct bpf_list_node *)h->prev;
 }
 
+__bpf_kfunc bool bpf_list_is_first(struct bpf_list_head *head, struct bpf_list_node *node)
+{
+	struct list_head *h = (struct list_head *)head;
+	struct bpf_list_node_kern *kn = (struct bpf_list_node_kern *)node;
+
+	if (READ_ONCE(kn->owner) != head)
+		return false;
+
+	return list_is_first(&kn->list_head, h);
+}
+
+__bpf_kfunc bool bpf_list_is_last(struct bpf_list_head *head, struct bpf_list_node *node)
+{
+	struct list_head *h = (struct list_head *)head;
+	struct bpf_list_node_kern *kn = (struct bpf_list_node_kern *)node;
+
+	if (READ_ONCE(kn->owner) != head)
+		return false;
+
+	return list_is_last(&kn->list_head, h);
+}
+
+__bpf_kfunc bool bpf_list_empty(struct bpf_list_head *head)
+{
+	struct list_head *h = (struct list_head *)head;
+
+	/* If list_head was 0-initialized by map, bpf_obj_init_field wasn't
+	 * called on its fields, so init here
+	 */
+	if (unlikely(!h->next))
+		INIT_LIST_HEAD(h);
+
+	return list_empty(h);
+}
+
 __bpf_kfunc struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
 						  struct bpf_rb_node *node)
 {
@@ -4764,6 +4799,9 @@ BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_del, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_front, KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_back, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_list_is_first)
+BTF_ID_FLAGS(func, bpf_list_is_last)
+BTF_ID_FLAGS(func, bpf_list_empty)
 BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE)
 BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE | KF_RET_NULL)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 50f8732aa065..ca33f35bc3eb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10748,6 +10748,9 @@ enum special_kfunc_type {
 	KF_bpf_list_del,
 	KF_bpf_list_front,
 	KF_bpf_list_back,
+	KF_bpf_list_is_first,
+	KF_bpf_list_is_last,
+	KF_bpf_list_empty,
 	KF_bpf_cast_to_kern_ctx,
 	KF_bpf_rdonly_cast,
 	KF_bpf_rcu_read_lock,
@@ -10818,6 +10821,9 @@ BTF_ID(func, bpf_list_pop_back)
 BTF_ID(func, bpf_list_del)
 BTF_ID(func, bpf_list_front)
 BTF_ID(func, bpf_list_back)
+BTF_ID(func, bpf_list_is_first)
+BTF_ID(func, bpf_list_is_last)
+BTF_ID(func, bpf_list_empty)
 BTF_ID(func, bpf_cast_to_kern_ctx)
 BTF_ID(func, bpf_rdonly_cast)
 BTF_ID(func, bpf_rcu_read_lock)
@@ -11341,7 +11347,10 @@ static bool is_bpf_list_api_kfunc(u32 btf_id)
 	       btf_id == special_kfunc_list[KF_bpf_list_pop_back] ||
 	       btf_id == special_kfunc_list[KF_bpf_list_del] ||
 	       btf_id == special_kfunc_list[KF_bpf_list_front] ||
-	       btf_id == special_kfunc_list[KF_bpf_list_back];
+	       btf_id == special_kfunc_list[KF_bpf_list_back] ||
+	       btf_id == special_kfunc_list[KF_bpf_list_is_first] ||
+	       btf_id == special_kfunc_list[KF_bpf_list_is_last] ||
+	       btf_id == special_kfunc_list[KF_bpf_list_empty];
 }
 
 static bool is_bpf_rbtree_api_kfunc(u32 btf_id)
@@ -11463,7 +11472,9 @@ static bool check_kfunc_is_graph_node_api(struct bpf_verifier_env *env,
 	switch (node_field_type) {
 	case BPF_LIST_NODE:
 		ret = is_bpf_list_push_kfunc(kfunc_btf_id) ||
-		      kfunc_btf_id == special_kfunc_list[KF_bpf_list_del];
+		      kfunc_btf_id == special_kfunc_list[KF_bpf_list_del] ||
+		      kfunc_btf_id == special_kfunc_list[KF_bpf_list_is_first] ||
+		      kfunc_btf_id == special_kfunc_list[KF_bpf_list_is_last];
 		break;
 	case BPF_RB_NODE:
 		ret = (is_bpf_rbtree_add_kfunc(kfunc_btf_id) ||
-- 
2.50.1 (Apple Git-155)


^ permalink raw reply related

* [PATCH RESEND bpf-next v10 5/8] bpf: Add bpf_list_add to insert node after a given list node
From: Kaitao cheng @ 2026-05-12  5:59 UTC (permalink / raw)
  To: ast, corbet, martin.lau, daniel, andrii, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah,
	chengkaitao, skhan, memxor
  Cc: bpf, linux-kernel, linux-doc, vmalik, linux-kselftest
In-Reply-To: <20260512055919.95716-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

Add a new kfunc bpf_list_add(head, new, prev, meta, off) that
inserts 'new' after 'prev' in the BPF linked list. 'prev' must already
be in the list headed by 'head'. 'new' must be an owning reference
(e.g. from bpf_obj_new); the kfunc consumes that reference and the node
becomes non-owning once inserted.

The additional bpf_list_head *head parameter is needed because the
verifier uses it to check that the lock is held.

Returns 0 on success, or -EINVAL if 'prev' is not in the list or 'new'
is already in a list (e.g. duplicate insertion). On failure, the
kernel drops the passed-in node.
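
The ownership rule above (the reference to the new node is consumed whether
or not the insertion succeeds) can be sketched in userspace C. This is a
minimal model, assuming a plain reference counter in place of the kernel's
object dropping; every demo_* name is hypothetical and -1 stands in for
-EINVAL.

```c
#include <stddef.h>
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

struct demo_node {
	struct list_head link;
	struct list_head *owner;	/* head of the containing list, or NULL */
	int refs;			/* outstanding references, list included */
};

static void demo_init(struct list_head *h) { h->next = h->prev = h; }

static struct demo_node *demo_new(void)
{
	struct demo_node *n = calloc(1, sizeof(*n));

	if (n)
		n->refs = 1;	/* the caller's owning reference */
	return n;
}

/* Drop one reference; free the node when the last one goes away. */
static void demo_put(struct demo_node *n)
{
	if (--n->refs == 0)
		free(n);
}

/* Link n right after prev; the caller's reference moves to the list. */
static void demo_link_after(struct list_head *head, struct demo_node *n,
			    struct list_head *prev)
{
	n->link.next = prev->next;
	n->link.prev = prev;
	prev->next->prev = &n->link;
	prev->next = &n->link;
	n->owner = head;
}

static void demo_push_front(struct list_head *head, struct demo_node *n)
{
	demo_link_after(head, n, head);
}

/* Insert new_node after prev. The caller's reference to new_node is
 * consumed in every case: handed to the list on success, dropped on failure.
 */
static int demo_insert_after(struct list_head *head,
			     struct demo_node *new_node,
			     struct demo_node *prev)
{
	if (prev->owner != head || new_node->owner) {
		demo_put(new_node);
		return -1;
	}
	demo_link_after(head, new_node, &prev->link);
	return 0;
}
```

After a failed call the caller must not touch the node it passed in, because
in this model it may already have been freed.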

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 kernel/bpf/helpers.c  | 11 +++++++++++
 kernel/bpf/verifier.c | 12 +++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 5388078f3171..2b8e8d4284a5 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2570,6 +2570,16 @@ __bpf_kfunc int bpf_list_push_back_impl(struct bpf_list_head *head,
 	return bpf_list_push_back(head, node, meta__ign, off);
 }
 
+__bpf_kfunc int bpf_list_add(struct bpf_list_head *head, struct bpf_list_node *new,
+			     struct bpf_list_node *prev, struct btf_struct_meta *meta,
+			     u64 off)
+{
+	struct bpf_list_node_kern *n = (void *)new, *p = (void *)prev;
+	struct list_head *prev_ptr = &p->list_head;
+
+	return __bpf_list_add(n, head, &prev_ptr, meta ? meta->record : NULL, off);
+}
+
 static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head,
 					    struct list_head *n)
 {
@@ -4748,6 +4758,7 @@ BTF_ID_FLAGS(func, bpf_list_push_front, KF_IMPLICIT_ARGS)
 BTF_ID_FLAGS(func, bpf_list_push_front_impl)
 BTF_ID_FLAGS(func, bpf_list_push_back, KF_IMPLICIT_ARGS)
 BTF_ID_FLAGS(func, bpf_list_push_back_impl)
+BTF_ID_FLAGS(func, bpf_list_add, KF_IMPLICIT_ARGS)
 BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_del, KF_ACQUIRE | KF_RET_NULL)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3c0e0076bd69..50f8732aa065 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10742,6 +10742,7 @@ enum special_kfunc_type {
 	KF_bpf_list_push_front,
 	KF_bpf_list_push_back_impl,
 	KF_bpf_list_push_back,
+	KF_bpf_list_add,
 	KF_bpf_list_pop_front,
 	KF_bpf_list_pop_back,
 	KF_bpf_list_del,
@@ -10811,6 +10812,7 @@ BTF_ID(func, bpf_list_push_front_impl)
 BTF_ID(func, bpf_list_push_front)
 BTF_ID(func, bpf_list_push_back_impl)
 BTF_ID(func, bpf_list_push_back)
+BTF_ID(func, bpf_list_add)
 BTF_ID(func, bpf_list_pop_front)
 BTF_ID(func, bpf_list_pop_back)
 BTF_ID(func, bpf_list_del)
@@ -10923,7 +10925,8 @@ static bool is_bpf_list_push_kfunc(u32 func_id)
 	return func_id == special_kfunc_list[KF_bpf_list_push_front] ||
 	       func_id == special_kfunc_list[KF_bpf_list_push_front_impl] ||
 	       func_id == special_kfunc_list[KF_bpf_list_push_back] ||
-	       func_id == special_kfunc_list[KF_bpf_list_push_back_impl];
+	       func_id == special_kfunc_list[KF_bpf_list_push_back_impl] ||
+	       func_id == special_kfunc_list[KF_bpf_list_add];
 }
 
 static bool is_bpf_rbtree_add_kfunc(u32 func_id)
@@ -19228,8 +19231,11 @@ int bpf_fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		int struct_meta_reg = BPF_REG_3;
 		int node_offset_reg = BPF_REG_4;
 
-		/* rbtree_add has extra 'less' arg, so args-to-fixup are in diff regs */
-		if (is_bpf_rbtree_add_kfunc(desc->func_id)) {
+		/* list_add/rbtree_add have an extra arg (prev/less),
+		 * so args-to-fixup are in diff regs.
+		 */
+		if (desc->func_id == special_kfunc_list[KF_bpf_list_add] ||
+		    is_bpf_rbtree_add_kfunc(desc->func_id)) {
 			struct_meta_reg = BPF_REG_4;
 			node_offset_reg = BPF_REG_5;
 		}
-- 
2.50.1 (Apple Git-155)


^ permalink raw reply related

* [PATCH RESEND bpf-next v10 4/8] bpf: refactor __bpf_list_add to take insertion point via **prev_ptr
From: Kaitao cheng @ 2026-05-12  5:59 UTC (permalink / raw)
  To: ast, corbet, martin.lau, daniel, andrii, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah,
	chengkaitao, skhan, memxor
  Cc: bpf, linux-kernel, linux-doc, vmalik, linux-kselftest
In-Reply-To: <20260512055919.95716-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

Refactor __bpf_list_add to accept (node, head, struct list_head **prev_ptr,
..) instead of (node, head, bool tail, ..). Load prev from *prev_ptr after
INIT_LIST_HEAD(h), so we never dereference an uninitialized h->prev when
head was 0-initialized (e.g. push_back passes &h->prev).

When prev is not the list head, validate that prev is in the list via
its owner.

Prepares for bpf_list_add(head, new, prev, ..) to insert after a given
list node.
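
The reason for passing a pointer to the insertion point, rather than the
insertion point itself, can be shown with a small userspace model. It is a
sketch only: the demo_* names are hypothetical, -1 stands in for -EINVAL,
and the failure path does not drop the node as the kernel helper does.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

struct demo_node {
	struct list_head link;
	struct list_head *owner;	/* head of the containing list, or NULL */
};

#define demo_node_of(ptr) \
	((struct demo_node *)((char *)(ptr) - offsetof(struct demo_node, link)))

static void demo_init(struct list_head *h) { h->next = h->prev = h; }

static int demo_add(struct demo_node *node, struct list_head *h,
		    struct list_head **prev_ptr)
{
	struct list_head *n = &node->link, *prev;

	/* A zero-filled head has NULL next/prev, so initialise it first. */
	if (!h->next)
		demo_init(h);

	/* Only now is it safe to read the insertion point: for push_back
	 * prev_ptr is &h->prev, which held NULL before the init above.
	 */
	prev = *prev_ptr;

	/* If prev is a node rather than the head, it must be on this list. */
	if (prev != h && demo_node_of(prev)->owner != h)
		return -1;
	/* A node that already has an owner cannot be inserted again. */
	if (node->owner)
		return -1;

	/* Equivalent of list_add(n, prev): link n right after prev. */
	n->next = prev->next;
	n->prev = prev;
	prev->next->prev = n;
	prev->next = n;
	node->owner = h;
	return 0;
}

static int demo_push_front(struct list_head *h, struct demo_node *node)
{
	struct list_head *hp = h;

	return demo_add(node, h, &hp);		/* insert after the head */
}

static int demo_push_back(struct list_head *h, struct demo_node *node)
{
	return demo_add(node, h, &h->prev);	/* insert after the last node */
}
```

Had demo_push_back() passed h->prev by value, a zero-filled head would have
handed a NULL insertion point to demo_add() before the head was initialised.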

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 kernel/bpf/helpers.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 51b6ea4bb8cb..5388078f3171 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2471,9 +2471,11 @@ __bpf_kfunc void *bpf_refcount_acquire_impl(void *p__refcounted_kptr, void *meta
 
 static int __bpf_list_add(struct bpf_list_node_kern *node,
 			  struct bpf_list_head *head,
-			  bool tail, struct btf_record *rec, u64 off)
+			  struct list_head **prev_ptr,
+			  struct btf_record *rec, u64 off)
 {
 	struct list_head *n = &node->list_head, *h = (void *)head;
+	struct list_head *prev;
 
 	/* If list_head was 0-initialized by map, bpf_obj_init_field wasn't
 	 * called on its fields, so init here
@@ -2481,19 +2483,31 @@ static int __bpf_list_add(struct bpf_list_node_kern *node,
 	if (unlikely(!h->next))
 		INIT_LIST_HEAD(h);
 
+	prev = *prev_ptr;
+
+	/* When prev is not the list head, it must be a node in this list. */
+	if (prev != h) {
+		struct bpf_list_node_kern *prev_kn =
+			container_of(prev, struct bpf_list_node_kern, list_head);
+
+		if (unlikely(READ_ONCE(prev_kn->owner) != head))
+			goto fail;
+	}
+
 	/* node->owner != NULL implies !list_empty(n), no need to separately
 	 * check the latter
 	 */
-	if (cmpxchg(&node->owner, NULL, BPF_PTR_POISON)) {
-		/* Only called from BPF prog, no need to migrate_disable */
-		__bpf_obj_drop_impl((void *)n - off, rec, false);
-		return -EINVAL;
-	}
+	if (cmpxchg(&node->owner, NULL, BPF_PTR_POISON))
+		goto fail;
 
-	tail ? list_add_tail(n, h) : list_add(n, h);
+	list_add(n, prev);
 	WRITE_ONCE(node->owner, head);
-
 	return 0;
+
+fail:
+	/* Only called from BPF prog, no need to migrate_disable */
+	__bpf_obj_drop_impl((void *)n - off, rec, false);
+	return -EINVAL;
 }
 
 /**
@@ -2514,8 +2528,9 @@ __bpf_kfunc int bpf_list_push_front(struct bpf_list_head *head,
 				    u64 off)
 {
 	struct bpf_list_node_kern *n = (void *)node;
+	struct list_head *h = (void *)head;
 
-	return __bpf_list_add(n, head, false, meta ? meta->record : NULL, off);
+	return __bpf_list_add(n, head, &h, meta ? meta->record : NULL, off);
 }
 
 __bpf_kfunc int bpf_list_push_front_impl(struct bpf_list_head *head,
@@ -2543,8 +2558,9 @@ __bpf_kfunc int bpf_list_push_back(struct bpf_list_head *head,
 				   u64 off)
 {
 	struct bpf_list_node_kern *n = (void *)node;
+	struct list_head *h = (void *)head;
 
-	return __bpf_list_add(n, head, true, meta ? meta->record : NULL, off);
+	return __bpf_list_add(n, head, &h->prev, meta ? meta->record : NULL, off);
 }
 
 __bpf_kfunc int bpf_list_push_back_impl(struct bpf_list_head *head,
-- 
2.50.1 (Apple Git-155)


^ permalink raw reply related

* [PATCH RESEND bpf-next v10 3/8] bpf: Introduce the bpf_list_del kfunc.
From: Kaitao cheng @ 2026-05-12  5:59 UTC (permalink / raw)
  To: ast, corbet, martin.lau, daniel, andrii, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah,
	chengkaitao, skhan, memxor
  Cc: bpf, linux-kernel, linux-doc, vmalik, linux-kselftest
In-Reply-To: <20260512055919.95716-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

Allow users to remove any node from a linked list.

bpf_list_del takes an additional bpf_list_head *head parameter because
the verifier needs the head to check that the lock is held.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 kernel/bpf/helpers.c  | 10 ++++++++++
 kernel/bpf/verifier.c |  6 +++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 1e8754877dd1..51b6ea4bb8cb 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2593,6 +2593,15 @@ __bpf_kfunc struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head)
 	return __bpf_list_del(head, h->prev);
 }
 
+__bpf_kfunc struct bpf_list_node *bpf_list_del(struct bpf_list_head *head,
+					       struct bpf_list_node *node)
+{
+	struct bpf_list_node_kern *kn = (void *)node;
+
+	/* verifier guarantees node is a list node rather than list head */
+	return __bpf_list_del(head, &kn->list_head);
+}
+
 __bpf_kfunc struct bpf_list_node *bpf_list_front(struct bpf_list_head *head)
 {
 	struct list_head *h = (struct list_head *)head;
@@ -4725,6 +4734,7 @@ BTF_ID_FLAGS(func, bpf_list_push_back, KF_IMPLICIT_ARGS)
 BTF_ID_FLAGS(func, bpf_list_push_back_impl)
 BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_list_del, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_front, KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_back, KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 03f9e16c2abe..3c0e0076bd69 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10744,6 +10744,7 @@ enum special_kfunc_type {
 	KF_bpf_list_push_back,
 	KF_bpf_list_pop_front,
 	KF_bpf_list_pop_back,
+	KF_bpf_list_del,
 	KF_bpf_list_front,
 	KF_bpf_list_back,
 	KF_bpf_cast_to_kern_ctx,
@@ -10812,6 +10813,7 @@ BTF_ID(func, bpf_list_push_back_impl)
 BTF_ID(func, bpf_list_push_back)
 BTF_ID(func, bpf_list_pop_front)
 BTF_ID(func, bpf_list_pop_back)
+BTF_ID(func, bpf_list_del)
 BTF_ID(func, bpf_list_front)
 BTF_ID(func, bpf_list_back)
 BTF_ID(func, bpf_cast_to_kern_ctx)
@@ -11334,6 +11336,7 @@ static bool is_bpf_list_api_kfunc(u32 btf_id)
 	return is_bpf_list_push_kfunc(btf_id) ||
 	       btf_id == special_kfunc_list[KF_bpf_list_pop_front] ||
 	       btf_id == special_kfunc_list[KF_bpf_list_pop_back] ||
+	       btf_id == special_kfunc_list[KF_bpf_list_del] ||
 	       btf_id == special_kfunc_list[KF_bpf_list_front] ||
 	       btf_id == special_kfunc_list[KF_bpf_list_back];
 }
@@ -11456,7 +11459,8 @@ static bool check_kfunc_is_graph_node_api(struct bpf_verifier_env *env,
 
 	switch (node_field_type) {
 	case BPF_LIST_NODE:
-		ret = is_bpf_list_push_kfunc(kfunc_btf_id);
+		ret = is_bpf_list_push_kfunc(kfunc_btf_id) ||
+		      kfunc_btf_id == special_kfunc_list[KF_bpf_list_del];
 		break;
 	case BPF_RB_NODE:
 		ret = (is_bpf_rbtree_add_kfunc(kfunc_btf_id) ||
-- 
2.50.1 (Apple Git-155)


^ permalink raw reply related

* [PATCH RESEND bpf-next v10 2/8] bpf: clear list node owner and unlink before drop
From: Kaitao cheng @ 2026-05-12  5:59 UTC (permalink / raw)
  To: ast, corbet, martin.lau, daniel, andrii, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah,
	chengkaitao, skhan, memxor
  Cc: bpf, linux-kernel, linux-doc, vmalik, linux-kselftest
In-Reply-To: <20260512055919.95716-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

When draining a BPF list_head, clear each node's owner pointer while still
holding the spinlock, so concurrent readers always see a consistent owner.

Unlink each node with list_del_init() before calling __bpf_obj_drop_impl(),
so that anyone still holding a reference to the node cannot follow a
stale next pointer.
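
The two-phase drain described above can be sketched in userspace C. This is
an illustrative model, not the kernel function: the demo_* names are
hypothetical, free() stands in for __bpf_obj_drop_impl(), and comments mark
where the spinlock would be held.

```c
#include <stddef.h>
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

struct demo_node {
	struct list_head link;
	struct list_head *owner;	/* head of the containing list, or NULL */
};

#define demo_node_of(ptr) \
	((struct demo_node *)((char *)(ptr) - offsetof(struct demo_node, link)))

static void demo_init(struct list_head *h) { h->next = h->prev = h; }
static int demo_is_empty(const struct list_head *h) { return h->next == h; }

/* Unlink e and leave it pointing at itself, like list_del_init(). */
static void demo_unlink(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	demo_init(e);
}

static void demo_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void demo_push_back(struct list_head *h, struct demo_node *n)
{
	demo_add_tail(&n->link, h);
	n->owner = h;
}

/* Drain the list and return the number of nodes freed. */
static int demo_drain(struct list_head *head)
{
	struct list_head drain, *pos, *next;
	int freed = 0;

	demo_init(&drain);

	/* Phase 1 (lock would be held): clear owners and move every node
	 * onto a private list, so the shared head is empty on unlock.
	 */
	if (head->next) {
		for (pos = head->next; pos != head; pos = next) {
			next = pos->next;
			demo_node_of(pos)->owner = NULL;
			demo_unlink(pos);
			demo_add_tail(pos, &drain);
		}
	}
	demo_init(head);
	/* Lock would be released here. */

	/* Phase 2 (no lock): unlink each node before freeing it, so no
	 * freed node is ever reachable through a neighbour.
	 */
	while (!demo_is_empty(&drain)) {
		pos = drain.next;
		demo_unlink(pos);
		free(demo_node_of(pos));
		freed++;
	}
	return freed;
}
```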

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 kernel/bpf/helpers.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 9cd7b028592c..1e8754877dd1 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2247,10 +2247,11 @@ EXPORT_SYMBOL_GPL(bpf_base_func_proto);
 void bpf_list_head_free(const struct btf_field *field, void *list_head,
 			struct bpf_spin_lock *spin_lock)
 {
-	struct list_head *head = list_head, *orig_head = list_head;
+	struct list_head *head = list_head, drain, *pos, *n;
 
 	BUILD_BUG_ON(sizeof(struct list_head) > sizeof(struct bpf_list_head));
 	BUILD_BUG_ON(__alignof__(struct list_head) > __alignof__(struct bpf_list_head));
+	INIT_LIST_HEAD(&drain);
 
 	/* Do the actual list draining outside the lock to not hold the lock for
 	 * too long, and also prevent deadlocks if tracing programs end up
@@ -2261,20 +2262,23 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 	__bpf_spin_lock_irqsave(spin_lock);
 	if (!head->next || list_empty(head))
 		goto unlock;
-	head = head->next;
+	list_for_each_safe(pos, n, head) {
+		WRITE_ONCE(container_of(pos,
+			struct bpf_list_node_kern, list_head)->owner, NULL);
+		list_move_tail(pos, &drain);
+	}
 unlock:
-	INIT_LIST_HEAD(orig_head);
+	INIT_LIST_HEAD(head);
 	__bpf_spin_unlock_irqrestore(spin_lock);
 
-	while (head != orig_head) {
-		void *obj = head;
-
-		obj -= field->graph_root.node_offset;
-		head = head->next;
+	while (!list_empty(&drain)) {
+		pos = drain.next;
+		list_del_init(pos);
 		/* The contained type can also have resources, including a
 		 * bpf_list_head which needs to be freed.
 		 */
-		__bpf_obj_drop_impl(obj, field->graph_root.value_rec, false);
+		__bpf_obj_drop_impl((char *)pos - field->graph_root.node_offset,
+				    field->graph_root.value_rec, false);
 	}
 }
 
-- 
2.50.1 (Apple Git-155)


^ permalink raw reply related

* [PATCH RESEND bpf-next v10 1/8] bpf: refactor __bpf_list_del to take list node pointer
From: Kaitao cheng @ 2026-05-12  5:59 UTC (permalink / raw)
  To: ast, corbet, martin.lau, daniel, andrii, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah,
	chengkaitao, skhan, memxor
  Cc: bpf, linux-kernel, linux-doc, vmalik, linux-kselftest
In-Reply-To: <20260512055919.95716-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

Refactor __bpf_list_del to accept (head, struct list_head *n) instead of
(head, bool tail). The caller now passes the specific node to remove:
bpf_list_pop_front passes h->next, bpf_list_pop_back passes h->prev.

Prepares for introducing bpf_list_del(head, node) kfunc to remove an
arbitrary node when the user holds ownership.
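
As a rough userspace model of the refactored helper (a sketch with
hypothetical demo_* names, not the kernel code), the caller picks the node
and a single function performs the checks and the unlink:

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

struct demo_node {
	struct list_head link;
	struct list_head *owner;	/* head of the containing list, or NULL */
};

#define demo_node_of(ptr) \
	((struct demo_node *)((char *)(ptr) - offsetof(struct demo_node, link)))

static void demo_init(struct list_head *h) { h->next = h->prev = h; }

/* Append n to an already initialised list and record its owner. */
static void demo_push_back(struct list_head *h, struct demo_node *n)
{
	n->link.prev = h->prev;
	n->link.next = h;
	h->prev->next = &n->link;
	h->prev = &n->link;
	n->owner = h;
}

/* Remove the node whose link is n, or return NULL if that is impossible. */
static struct demo_node *demo_del(struct list_head *head, struct list_head *n)
{
	struct demo_node *node;

	/* Zero-filled head: n was read from it and is NULL, so bail out. */
	if (!head->next) {
		demo_init(head);
		return NULL;
	}
	if (head->next == head)
		return NULL;		/* empty list */

	node = demo_node_of(n);
	if (node->owner != head)
		return NULL;		/* not on this list */

	n->prev->next = n->next;
	n->next->prev = n->prev;
	demo_init(n);
	node->owner = NULL;
	return node;
}

static struct demo_node *demo_pop_front(struct list_head *head)
{
	return demo_del(head, head->next);
}

static struct demo_node *demo_pop_back(struct list_head *head)
{
	return demo_del(head, head->prev);
}
```

With this shape, removing an arbitrary node is the same call with a
caller-supplied pointer, e.g. demo_del(&head, &node->link).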

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 kernel/bpf/helpers.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index baa12b24bb64..9cd7b028592c 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2550,22 +2550,24 @@ __bpf_kfunc int bpf_list_push_back_impl(struct bpf_list_head *head,
 	return bpf_list_push_back(head, node, meta__ign, off);
 }
 
-static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, bool tail)
+static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head,
+					    struct list_head *n)
 {
-	struct list_head *n, *h = (void *)head;
+	struct list_head *h = (void *)head;
 	struct bpf_list_node_kern *node;
 
 	/* If list_head was 0-initialized by map, bpf_obj_init_field wasn't
 	 * called on its fields, so init here
 	 */
-	if (unlikely(!h->next))
+	if (unlikely(!h->next)) {
 		INIT_LIST_HEAD(h);
+		return NULL;
+	}
 	if (list_empty(h))
 		return NULL;
 
-	n = tail ? h->prev : h->next;
 	node = container_of(n, struct bpf_list_node_kern, list_head);
-	if (WARN_ON_ONCE(READ_ONCE(node->owner) != head))
+	if (unlikely(READ_ONCE(node->owner) != head))
 		return NULL;
 
 	list_del_init(n);
@@ -2575,12 +2577,16 @@ static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, bool tai
 
 __bpf_kfunc struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head)
 {
-	return __bpf_list_del(head, false);
+	struct list_head *h = (void *)head;
+
+	return __bpf_list_del(head, h->next);
 }
 
 __bpf_kfunc struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head)
 {
-	return __bpf_list_del(head, true);
+	struct list_head *h = (void *)head;
+
+	return __bpf_list_del(head, h->prev);
 }
 
 __bpf_kfunc struct bpf_list_node *bpf_list_front(struct bpf_list_head *head)
-- 
2.50.1 (Apple Git-155)


^ permalink raw reply related

* [PATCH RESEND bpf-next v10 0/8] bpf: Extend the bpf_list family of APIs
From: Kaitao cheng @ 2026-05-12  5:59 UTC (permalink / raw)
  To: ast, corbet, martin.lau, daniel, andrii, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah,
	chengkaitao, skhan, memxor
  Cc: bpf, linux-kernel, linux-doc, vmalik, linux-kselftest,
	Kaitao cheng

Because the BPF list API is incomplete, a list can currently only be
used as a stack or queue: only LIFO or FIFO operations are supported.
This series extends the BPF list API to make it more list-like.

Five new kfuncs are added:
bpf_list_del: remove a node from the list
bpf_list_add: insert a node after a given list node
bpf_list_is_first: check if a node is the first in the list
bpf_list_is_last: check if a node is the last in the list
bpf_list_empty: check if the list is empty

Test cases for these kfuncs are added as well.

Changes in v10:
- Remove the table-driven approach (Ihor Solodrai)
- Use the __nonown_allowed suffix for bpf_list_del/front/back
- Add test cases for __nonown_allowed

Changes in v9:
- Expand table-driven approach coverage (Emil Tsalapatis)
- Clear list node owner and unlink before drop (Emil Tsalapatis)
- Remove warnings caused by WARN_ON_ONCE() (Emil Tsalapatis)
- Introduce the __nonown_allowed suffix (Alexei Starovoitov)

Changes in v8:
- Use [patch v7 5/5] as the start of the patch series (Leon Hwang)
- Introduce double pointer prev_ptr in __bpf_list_del
  (Kumar Kartikeya Dwivedi)
- Extract refactored __bpf_list_del/add into separate patches (Leon Hwang)
- Allow bpf_list_front/back result as the prev argument of bpf_list_add
- Split test cases (Leon Hwang)

Changes in v7:
- Replace bpf_list_node_is_edge with bpf_list_is_first/is_last
- Reimplement __bpf_list_del and __bpf_list_add (Kumar Kartikeya Dwivedi)
- Simplify test cases (Mykyta Yatsenko)

Changes in v6:
- Merge [patch v5 (2,4,6)/6] into [patch v6 4/5] (Leon Hwang)
- If list_head was 0-initialized, init it
- Refactor kfunc checks to table-driven approach (Leon Hwang)

Changes in v5:
- Fix bpf_obj leak on bpf_list_add_impl error

Changes in v4:
- [patch v3 1/6] Revert to version v1 (Alexei Starovoitov)
- Change the parameters of bpf_list_add_impl to (head, new, prev, ...)

Changes in v3:
- Add a new lock_rec member to struct bpf_reference_state for lock
  holding detection.
- Add test cases to verify that the verifier correctly restricts calls
  to bpf_list_del when the spin_lock is not held.

Changes in v2:
- Remove the head parameter from bpf_list_del (Alexei Starovoitov)
- Add bpf_list_add/is_first/is_last/empty to API and test cases
  (Alexei Starovoitov)

Link to v9:
https://lore.kernel.org/all/20260329140506.9595-1-pilgrimtao@gmail.com/

Link to v8:
https://lore.kernel.org/all/20260316112843.78657-1-pilgrimtao@gmail.com/

Link to v7:
https://lore.kernel.org/all/20260308134614.29711-1-pilgrimtao@gmail.com/

Link to v6:
https://lore.kernel.org/all/20260304143459.78059-1-pilgrimtao@gmail.com/

Link to v5:
https://lore.kernel.org/all/20260304031606.43884-1-pilgrimtao@gmail.com/

Link to v4:
https://lore.kernel.org/all/20260303135219.33726-1-pilgrimtao@gmail.com/

Link to v3:
https://lore.kernel.org/all/20260302124028.82420-1-pilgrimtao@gmail.com/

Link to v2:
https://lore.kernel.org/all/20260225092651.94689-1-pilgrimtao@gmail.com/

Link to v1:
https://lore.kernel.org/all/20260209025250.55750-1-pilgrimtao@gmail.com/

Kaitao Cheng (8):
  bpf: refactor __bpf_list_del to take list node pointer
  bpf: clear list node owner and unlink before drop
  bpf: Introduce the bpf_list_del kfunc.
  bpf: refactor __bpf_list_add to take insertion point via **prev_ptr
  bpf: Add bpf_list_add to insert node after a given list node
  bpf: add bpf_list_is_first/last/empty kfuncs
  bpf: allow non-owning list-node args via __nonown_allowed
  selftests/bpf: Add test cases for
    bpf_list_del/add/is_first/is_last/empty

 Documentation/bpf/kfuncs.rst                  |  22 +-
 kernel/bpf/helpers.c                          | 139 ++++--
 kernel/bpf/verifier.c                         |  44 +-
 .../selftests/bpf/progs/refcounted_kptr.c     | 421 ++++++++++++++++++
 4 files changed, 593 insertions(+), 33 deletions(-)

-- 
2.50.1 (Apple Git-155)


^ permalink raw reply

* Re: [PATCH v3 2/3] Documentation: security-bugs: explain what is and is not a security bug
From: Willy Tarreau @ 2026-05-12  5:54 UTC (permalink / raw)
  To: Greg KH
  Cc: Jonathan Corbet, Leon Romanovsky, skhan, security, workflows,
	linux-doc, linux-kernel
In-Reply-To: <2026051220-fetal-obituary-bb2b@gregkh>

On Tue, May 12, 2026 at 07:46:34AM +0200, Greg KH wrote:
> On Mon, May 11, 2026 at 02:42:14PM -0600, Jonathan Corbet wrote:
> > Willy Tarreau <w@1wt.eu> writes:
> > 
> > >> I can ship stuff Linusward quickly too... :)  But it's fine if Greg
> > >> takes it, of course.
> > >
> > > Oh that's fine then. I thought you only delivered such updates into next
> > > releases. I'm fine with either way of course! Let's pick the path of
> > > least effort for each.
> > 
> > That's my normal procedure, since there are few docs changes that have
> > greater urgency, but I do have a "fixes" branch.
> > 
> > Greg, what's your preference?  Unless I hear otherwise, I guess I'll
> > apply it shortly.
> 
> Please apply it and take it through your tree, thanks!

Thanks to you both!
Willy

^ permalink raw reply

* Re: [PATCH v3 2/3] Documentation: security-bugs: explain what is and is not a security bug
From: Greg KH @ 2026-05-12  5:46 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Willy Tarreau, Leon Romanovsky, skhan, security, workflows,
	linux-doc, linux-kernel
In-Reply-To: <87a4u5u195.fsf@trenco.lwn.net>

On Mon, May 11, 2026 at 02:42:14PM -0600, Jonathan Corbet wrote:
> Willy Tarreau <w@1wt.eu> writes:
> 
> >> I can ship stuff Linusward quickly too... :)  But it's fine if Greg
> >> takes it, of course.
> >
> > Oh that's fine then. I thought you only delivered such updates into next
> > releases. I'm fine with either way of course! Let's pick the path of
> > least effort for each.
> 
> That's my normal procedure, since there are few docs changes that have
> greater urgency, but I do have a "fixes" branch.
> 
> Greg, what's your preference?  Unless I hear otherwise, I guess I'll
> apply it shortly.

Please apply it and take it through your tree, thanks!

greg k-h

^ permalink raw reply
