Netdev List

* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Jakub Kicinski @ 2026-06-17 20:21 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Peter Zijlstra, Petr Mladek, Sebastian Andrzej Siewior,
	John Ogness, Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner,
	netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
	stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
	Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <ajKi4wtA8U1iZkMD@gmail.com>

On Wed, 17 Jun 2026 07:56:50 -0700 Breno Leitao wrote:
> As far as I can tell, there isn't a network driver today whose transmit
> path is completely lockless, so, even if we make netpoll lockless.
> 
> It's unlikely any NIC will ever achieve this, given that NIC TX
> fundamentally relies on a shared DMA ring and doorbell register, which
> inherently cannot be made lockless.

The lock which protects the queue is maintained by the stack,
and we trylock it. Maybe I lost the thread but if you're saying
that writes to netconsole are impossible from arbitrary context,
that is _not_ true, AFAIU. We can queue a packet and kick off 
the transfer on well-behaved drivers.
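
To illustrate the trylock-and-queue idea, here is a minimal userspace
sketch. It is an analogy only, not netpoll or driver code: demo_txq and
demo_try_xmit() are hypothetical names, and a C11 atomic_flag stands in
for the queue lock that the stack maintains. The caller never waits, and
nothing is reclaimed from the ring on this path.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_RING_SIZE 4

/* Hypothetical stand-in for a TX queue guarded by a stack-owned lock. */
struct demo_txq {
	atomic_flag lock;		/* plays the role of the queue lock */
	const char *ring[DEMO_RING_SIZE];
	size_t used;
};

/*
 * Try to queue one message without ever waiting. Returns true if the
 * message was queued, false if the lock was busy or the ring was full.
 * No completed entries are freed here.
 */
static bool demo_try_xmit(struct demo_txq *q, const char *msg)
{
	bool queued = false;

	/* trylock: a flag that was already set means someone holds the lock */
	if (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		return false;

	if (q->used < DEMO_RING_SIZE) {
		q->ring[q->used++] = msg;
		queued = true;
	}

	atomic_flag_clear_explicit(&q->lock, memory_order_release);
	return queued;
}
```

A caller that gets false back would have to defer or drop the message;
which of the two is appropriate is outside the scope of this sketch.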

Main problem is the opportunistic freeing up of the queue space.
If we could avoid that in atomic context I think we'd be good.

* Re: [PATCH net] net: dst_metadata: fix false-positive memcpy overflow in tun_dst_unclone
From: Gustavo A. R. Silva @ 2026-06-17 20:08 UTC (permalink / raw)
  To: Ilya Maximets, netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kees Cook, Gustavo A. R. Silva, Nathan Chancellor,
	Nick Desaulniers, Bill Wendling, Justin Stitt, linux-kernel,
	linux-hardening, llvm, Johan Thomsen
In-Reply-To: <20260616100332.1308294-1-i.maximets@ovn.org>

Hi,

On 6/16/26 04:03, Ilya Maximets wrote:
> kmalloc_flex() in metadata_dst_alloc() sets __counted_by for the
> structure to the options_len, which is then initialized to zero.
> Later, we're initializing the structure by copying the tunnel info
> together with the options, and this triggers a warning for a potential
> memcpy overflow, since the compiler estimates that the options can't
> fit into the structure, even though the memory for them is actually
> allocated.
> 
>   memcpy: detected buffer overflow: 104 byte write of buffer size 96
>   WARNING: CPU: X PID: Y at lib/string_helpers.c:1036 __fortify_report
>    skb_tunnel_info_unclone+0x179/0x190
>    geneve_xmit+0x7fe/0xe00

This warning has nothing to do with counted_by. See below for more
comments.

> 
> The issue is triggered when built with clang and source fortification.
> 
> Fix that by doing the copy in two stages: first - the main data with
> the options_len, then the options.  This way the correct length should
> be known at the time of the copy.
> 
> It would be better if the options_len never changed after allocation,
> but the allocation code is a little separate from the initialization
> and it would be awkward and potentially dangerous to return a struct
> with options_len set to a non-zero value from the metadata_dst_alloc().
> 
> Another option would be to use ip_tunnel_info_opts_set(), but it is
> doing too many unnecessary operations for the use case here.
> 
> Fixes: 69050f8d6d07 ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types")
> Reported-by: Johan Thomsen <write@ownrisk.dk>
> Closes: https://lore.kernel.org/netdev/CAKv6aAM8_EWgXScnKmKYm_4SwGDVBK++dzfP+Y6msUXbp99QUw@mail.gmail.com/
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
> 
> Johan, if you can test this one in your setup as well, that would
> be great.  Thanks.
> 
>   include/net/dst_metadata.h | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> index 1fc2fb03ce3f..f45d1e3163f0 100644
> --- a/include/net/dst_metadata.h
> +++ b/include/net/dst_metadata.h
> @@ -164,8 +164,11 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>   	if (!new_md)
>   		return ERR_PTR(-ENOMEM);
>   
> -	memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
> -	       sizeof(struct ip_tunnel_info) + md_size);

What's going on here is that, internally, fortified memcpy() retrieves
the destination size via __builtin_dynamic_object_size() in mode 1.

That is:

__builtin_dynamic_object_size(&new_md->u.tun_info, 1)

For the above case, Clang returns sizeof(new_md->u.tun_info) == 96.

So the warning is reporting that 104 bytes don't fit in an object of
size 96 bytes, regardless of any counted_by annotation or allocation.

Of course, in this case, the write of 104 bytes into new_md->u.tun_info
is intentional and controlled, but what if it weren't?

On the other hand, for this same case, GCC currently returns SIZE_MAX,
which translates to -1 inside fortified memcpy(). Thus, bounds-checking
is bypassed, which is why this warning doesn't show up with GCC.

However, this is a bug in GCC. We're already looking into that.

I think we've had just a handful of cases like this across the whole
kernel tree. We can deal with them as you did here (by directly copying
the composite structure first, and then using memcpy() to copy into the
flexible-array member). If these cases ever become more common, we
could create some kind of helper to do both things at once. :)
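
For illustration, the two-stage pattern can be sketched in plain
userspace C. This is a simplified sketch under stated assumptions, not
kernel code: struct demo_info and demo_info_clone() are hypothetical
stand-ins for the tunnel info structure and its unclone path, and
malloc() replaces the kernel allocator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a struct ending in a flexible array. */
struct demo_info {
	unsigned int key;
	unsigned char options_len;
	unsigned char options[];	/* options_len bytes follow the struct */
};

/* Clone src in two stages: the fixed part first, then the options. */
static struct demo_info *demo_info_clone(const struct demo_info *src)
{
	struct demo_info *dst = malloc(sizeof(*dst) + src->options_len);

	if (!dst)
		return NULL;

	/* Stage 1: struct assignment covers only sizeof(*dst) bytes. */
	*dst = *src;

	/* Stage 2: the destination is the array itself, not the struct. */
	memcpy(dst->options, src->options, src->options_len);

	return dst;
}
```

Because the second memcpy() targets the flexible-array member rather
than the enclosing struct, its destination is no longer the fixed-size
composite object.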

> +	/* Copy in two stages to keep the __counted_by happy. */

So based on my comments above, this code comment is not correct.

> +	new_md->u.tun_info = md_dst->u.tun_info;

This is fine.

> +	memcpy(ip_tunnel_info_opts(&new_md->u.tun_info),
> +	       ip_tunnel_info_opts(&md_dst->u.tun_info), md_size);

Is ip_tunnel_info_opts() really needed here?

Probably this works just fine:

memcpy(new_md->u.tun_info.options, md_dst->u.tun_info.options, md_size);

-Gustavo

* Re: [PATCH] e1000: Remove redundant else after return
From: kernel test robot @ 2026-06-17 20:02 UTC (permalink / raw)
  To: Lovekesh Solanki, anthony.l.nguyen
  Cc: oe-kbuild-all, przemyslaw.kitszel, andrew+netdev, davem, edumazet,
	kuba, pabeni, netdev, Lovekesh Solanki
In-Reply-To: <20260616210008.109635-1-lovekeshsolanki00@gmail.com>

Hi Lovekesh,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tnguy-next-queue/dev-queue]
[also build test WARNING on tnguy-net-queue/dev-queue linus/master v7.1 next-20260616]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Lovekesh-Solanki/e1000-Remove-redundant-else-after-return/20260617-051633
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git dev-queue
patch link:    https://lore.kernel.org/r/20260616210008.109635-1-lovekeshsolanki00%40gmail.com
patch subject: [PATCH] e1000: Remove redundant else after return
config: powerpc-randconfig-r073-20260617 (https://download.01.org/0day-ci/archive/20260618/202606180301.nRhk5lMR-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project e19d1f51a2c80b63cd8ca95bcc757b7077112808)
smatch: v0.5.0-9185-gbcc58b9c

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606180301.nRhk5lMR-lkp@intel.com/

New smatch warnings:
drivers/net/ethernet/intel/e1000/e1000_main.c:1551 e1000_setup_tx_resources() warn: inconsistent indenting

Old smatch warnings:
arch/powerpc/include/asm/checksum.h:94 csum_tcpudp_nofold() warn: inconsistent indenting

vim +1551 drivers/net/ethernet/intel/e1000/e1000_main.c

^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1491  
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1492  /**
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1493   * e1000_setup_tx_resources - allocate Tx resources (Descriptors)
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1494   * @adapter: board private structure
581d708eb47ccc drivers/net/e1000/e1000_main.c                Mallikarjuna R Chilakala 2005-10-04  1495   * @txdr:    tx descriptor ring (for a specific queue) to setup
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1496   *
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1497   * Return 0 on success, negative on failure
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1498   **/
6479884509e6cd drivers/net/e1000/e1000_main.c                Joe Perches              2008-07-11  1499  static int e1000_setup_tx_resources(struct e1000_adapter *adapter,
581d708eb47ccc drivers/net/e1000/e1000_main.c                Mallikarjuna R Chilakala 2005-10-04  1500  				    struct e1000_tx_ring *txdr)
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1501  {
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1502  	struct pci_dev *pdev = adapter->pdev;
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1503  	int size;
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1504  
580f321d8498f1 drivers/net/ethernet/intel/e1000/e1000_main.c Florian Westphal         2014-09-03  1505  	size = sizeof(struct e1000_tx_buffer) * txdr->count;
89bf67f1f080c9 drivers/net/e1000/e1000_main.c                Eric Dumazet             2010-11-22  1506  	txdr->buffer_info = vzalloc(size);
14f8dc49532f76 drivers/net/ethernet/intel/e1000/e1000_main.c Joe Perches              2013-02-07  1507  	if (!txdr->buffer_info)
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1508  		return -ENOMEM;
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1509  
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1510  	/* round up to nearest 4K */
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1511  
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1512  	txdr->size = txdr->count * sizeof(struct e1000_tx_desc);
9099cfb9170f35 drivers/net/e1000/e1000_main.c                Milind Arun Choudhary    2007-04-27  1513  	txdr->size = ALIGN(txdr->size, 4096);
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1514  
b16f53bef9be0a drivers/net/e1000/e1000_main.c                Nick Nunley              2010-04-27  1515  	txdr->desc = dma_alloc_coherent(&pdev->dev, txdr->size, &txdr->dma,
b16f53bef9be0a drivers/net/e1000/e1000_main.c                Nick Nunley              2010-04-27  1516  					GFP_KERNEL);
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1517  	if (!txdr->desc) {
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1518  setup_tx_desc_die:
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1519  		vfree(txdr->buffer_info);
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1520  		return -ENOMEM;
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1521  	}
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1522  
2648345fcbadfa drivers/net/e1000/e1000_main.c                Malli Chilakala          2005-04-28  1523  	/* Fix for errata 23, can't cross 64kB boundary */
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1524  	if (!e1000_check_64k_bound(adapter, txdr->desc, txdr->size)) {
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1525  		void *olddesc = txdr->desc;
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1526  		dma_addr_t olddma = txdr->dma;
feb8f47809fcc6 drivers/net/e1000/e1000_main.c                Emil Tantilov            2010-07-26  1527  		e_err(tx_err, "txdr align check failed: %u bytes at %p\n",
675ad47375c76a drivers/net/e1000/e1000_main.c                Emil Tantilov            2010-04-27  1528  		      txdr->size, txdr->desc);
2648345fcbadfa drivers/net/e1000/e1000_main.c                Malli Chilakala          2005-04-28  1529  		/* Try again, without freeing the previous */
b16f53bef9be0a drivers/net/e1000/e1000_main.c                Nick Nunley              2010-04-27  1530  		txdr->desc = dma_alloc_coherent(&pdev->dev, txdr->size,
b16f53bef9be0a drivers/net/e1000/e1000_main.c                Nick Nunley              2010-04-27  1531  						&txdr->dma, GFP_KERNEL);
2648345fcbadfa drivers/net/e1000/e1000_main.c                Malli Chilakala          2005-04-28  1532  		/* Failed allocation, critical failure */
96838a40f02950 drivers/net/e1000/e1000_main.c                Jesse Brandeburg         2006-01-18  1533  		if (!txdr->desc) {
b16f53bef9be0a drivers/net/e1000/e1000_main.c                Nick Nunley              2010-04-27  1534  			dma_free_coherent(&pdev->dev, txdr->size, olddesc,
b16f53bef9be0a drivers/net/e1000/e1000_main.c                Nick Nunley              2010-04-27  1535  					  olddma);
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1536  			goto setup_tx_desc_die;
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1537  		}
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1538  
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1539  		if (!e1000_check_64k_bound(adapter, txdr->desc, txdr->size)) {
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1540  			/* give up */
b16f53bef9be0a drivers/net/e1000/e1000_main.c                Nick Nunley              2010-04-27  1541  			dma_free_coherent(&pdev->dev, txdr->size, txdr->desc,
2648345fcbadfa drivers/net/e1000/e1000_main.c                Malli Chilakala          2005-04-28  1542  					  txdr->dma);
b16f53bef9be0a drivers/net/e1000/e1000_main.c                Nick Nunley              2010-04-27  1543  			dma_free_coherent(&pdev->dev, txdr->size, olddesc,
b16f53bef9be0a drivers/net/e1000/e1000_main.c                Nick Nunley              2010-04-27  1544  					  olddma);
feb8f47809fcc6 drivers/net/e1000/e1000_main.c                Emil Tantilov            2010-07-26  1545  			e_err(probe, "Unable to allocate aligned memory "
2648345fcbadfa drivers/net/e1000/e1000_main.c                Malli Chilakala          2005-04-28  1546  			      "for the transmit descriptor ring\n");
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1547  			vfree(txdr->buffer_info);
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1548  			return -ENOMEM;
d66eba46397f54 drivers/net/ethernet/intel/e1000/e1000_main.c Lovekesh Solanki         2026-06-17  1549  		}
2648345fcbadfa drivers/net/e1000/e1000_main.c                Malli Chilakala          2005-04-28  1550  			/* Free old allocation, new allocation was successful */
b16f53bef9be0a drivers/net/e1000/e1000_main.c                Nick Nunley              2010-04-27 @1551  			dma_free_coherent(&pdev->dev, txdr->size, olddesc,
b16f53bef9be0a drivers/net/e1000/e1000_main.c                Nick Nunley              2010-04-27  1552  					  olddma);
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1553  	}
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1554  	memset(txdr->desc, 0, txdr->size);
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1555  
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1556  	txdr->next_to_use = 0;
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1557  	txdr->next_to_clean = 0;
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1558  
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1559  	return 0;
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1560  }
^1da177e4c3f41 drivers/net/e1000/e1000_main.c                Linus Torvalds           2005-04-16  1561  

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
From: Andrei Vagin @ 2026-06-17 19:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Joanne Koong, Val Packett, Al Viro, Linus Torvalds, Askar Safin,
	linux-kernel, linux-mm, linux-api, netdev, Matthew Wilcox,
	Jens Axboe, Christoph Hellwig, David Howells, Andrew Morton,
	David Hildenbrand, Pedro Falcato, Miklos Szeredi, patches,
	linux-fsdevel, Jan Kara, Steven Rostedt, fuse-devel,
	Bernd Schubert, Aleksandr Mikhalitsyn, criu@lists.linux.dev
In-Reply-To: <20260617-attest-gewechselt-tragik-7ed473860051@brauner>

On Wed, Jun 17, 2026 at 4:08 AM Christian Brauner <brauner@kernel.org> wrote:
>
> > After this patch, step b) is a straight copy which means step d)'s
> > fixup doesn't modify what's in the pipe. This could be fixed up in
> > libfuse to not depend on modify-after-vmsplice, but I don't think this
> > helps for applications using already-released libfuse versions. I
> > think this patch needs to be reverted.
>
> Note, nothing was merged. I deliberately kept in -next though for a long
> time to see how quickly we'd see regressions.

The bait worked. CRIU wins a prize in this lottery.

The CRIU fifo test fails with this change. The problem is that vmsplice
with SPLICE_F_NONBLOCK to a fifo file descriptor fails with -EOPNOTSUPP.

It seems we need a fix like this one:

diff --git a/fs/pipe.c b/fs/pipe.c
index 429b0714ec57..6fc49e933727 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1253,6 +1253,7 @@ static int fifo_open(struct inode *inode, struct file *filp)

        /* We can only do regular read/write on fifos */
        stream_open(inode, filp);
+       filp->f_mode |= FMODE_NOWAIT;

        switch (filp->f_mode & (FMODE_READ | FMODE_WRITE)) {
        case FMODE_READ:

* [PATCH net v2] net: marvell: prestera: initialize err in prestera_port_sfp_bind
From: Ruoyu Wang @ 2026-06-17 19:32 UTC (permalink / raw)
  To: Taras Chornyi, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Russell King, Oleksandr Mazur,
	Yevhen Orlov, netdev, linux-kernel

prestera_port_sfp_bind() returns err after walking the ports node. If no
child node matches the port's front-panel id, err is never assigned.

Initialize err to 0 because absence of a matching optional port device
tree node is not an error. In that case no phylink is created and port
creation should continue with port->phy_link left NULL. Errors from
malformed matched nodes and phylink_create() still propagate.
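
The control flow described above can be sketched in standalone C. This
is a simplified model, not the driver code: struct demo_node and
demo_sfp_bind() are hypothetical, and bind_result stands in for whatever
parsing the matched node and creating the phylink would return.

```c
#include <stddef.h>

/* Hypothetical stand-in for a port child node in the device tree. */
struct demo_node {
	unsigned int port_id;
	int bind_result;	/* what binding this node would return */
};

/*
 * Walk the nodes looking for one that matches port_id. err starts at 0,
 * so the no-match path returns success instead of an indeterminate value.
 */
static int demo_sfp_bind(const struct demo_node *nodes, size_t n,
			 unsigned int port_id)
{
	int err = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (nodes[i].port_id != port_id)
			continue;

		/* Errors from a matched node still propagate. */
		err = nodes[i].bind_result;
		break;
	}

	return err;
}
```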

Fixes: 52323ef75414 ("net: marvell: prestera: add phylink support")
Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
---
v2:
- Add net tree target to the subject.
- Explain why the no-match path returns 0 instead of -ENODEV.

 drivers/net/ethernet/marvell/prestera/prestera_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/prestera/prestera_main.c b/drivers/net/ethernet/marvell/prestera/prestera_main.c
index 41e19e9ad28d4..a82e7a8029851 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_main.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_main.c
@@ -373,7 +373,7 @@ static int prestera_port_sfp_bind(struct prestera_port *port)
 	struct device_node *ports, *node;
 	struct fwnode_handle *fwnode;
 	struct phylink *phy_link;
-	int err;
+	int err = 0;
 
 	if (!sw->np)
 		return 0;
-- 
2.51.0

* Re: [GIT PULL] virtio,vhost,vdpa: features, fixes
From: pr-tracker-bot @ 2026-06-17 19:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Linus Torvalds, kvm, virtualization, netdev, linux-kernel, a0yami,
	ammarfaizi2, arnd, chenhuacai, chenhuacai, christfontanez,
	Damir.Shaikhutdinov, david, den, enelsonmoore, eperezma, ethan,
	evg28bur, filip.hejsek, francesco, graf, harald.mommer, jasowang,
	jiri, johan, johannes.thumshirn, lingshan.zhu, luis.hernandez093,
	lulu, mhi, michael.bommarito, mikhail.golubev-ciuchea, mkl, mst,
	mvaralar, nathan, oleg, pawel.moll, physicalmtea, polina.vishneva,
	q.h.hack.winter, r 
In-Reply-To: <20260617065516-mutt-send-email-mst@kernel.org>

The pull request you sent on Wed, 17 Jun 2026 06:55:16 -0400:

> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/d44ade05aa21468bd30652bc4492891b854a400a

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

* [ANN] netdev development stats for 7.2
From: Jakub Kicinski @ 2026-06-17 18:53 UTC (permalink / raw)
  To: netdev

Intro
-----

As is tradition, here are the development statistics based on mailing
list traffic on netdev@vger.

These stats are somewhat like LWN stats: https://lwn.net/Articles/1004998/
but more focused on mailing list participation. And by participation
we mean reviewing code more than producing patches.

In particular "review score" tries to capture the balance between
reviewing other people's code vs posting patches. It's roughly
number of patches reviewed minus number of patches posted.
Those who post more than they review will have a negative score.
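
As a rough sketch of that arithmetic (the real scoring is only "roughly"
this, so any weighting is deliberately left out): struct
demo_participant and demo_review_score() are hypothetical names used
purely for illustration.

```c
/* Hypothetical per-person counters for one release cycle. */
struct demo_participant {
	const char *name;
	int reviewed;	/* patches reviewed */
	int posted;	/* patches posted */
};

/* Approximate review score: reviewed minus posted. */
static int demo_review_score(const struct demo_participant *p)
{
	return p->reviewed - p->posted;
}
```

Someone with 10 patches reviewed and 3 posted would score 7, while
someone with 2 reviewed and 9 posted would score -7.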

Previous 3 reports:
 - for 6.19: https://lore.kernel.org/20251202175548.6b5eb80e@kernel.org
 - for 7.0:  https://lore.kernel.org/20260212124208.187e53ae@kernel.org
 - for 7.1:  https://lore.kernel.org/20260414182653.40d84ccc@kernel.org

General stats
-------------

The increase in traffic continues. When reading the stats below, keep in
mind that the previous release to which we're comparing was already a
record release in every aspect.

We saw an average of 339 emails a day on netdev, which is +6.5%
compared to the 7.1 cycle. Number of commits merged by netdev
maintainers directly grew even faster to 29 / day (+20.9%) compared
to the previous release. The commit growth is similar to growth of
linux-next as a whole (+22.6%). There were 1011 individuals posting
code or participating in discussions (+15.7%).

Review percentage continues to slip, from 53.36% in the previous release
to 50.84% in the current release.

Average revisions of a patch went down to 1.38 (-4.9%) for single-patch
postings, and down to 2.13 (-2.9%) for multi-patch series.
These are approximate since I'm not sure I trust our series tracking
with folks renaming series across revisions. One more factor to keep
in mind is that we include in the stat only patch sets which were
merged. Last but not least, since we try to prioritize patches from
active reviewers, we tend to commit more patches from experts who rarely
need revisions (*cough* Eric) than from newcomers who need a lot of help.

Tenure histograms
-----------------

Tenure histograms are meant to measure how well we're doing at
converting one-off contributors to long term community members.
"no commit" means that someone posted a patch but no commit was
found under their name in git.

    Time since poster's first commit in 6.18
    no commit |  76 | *******************************
     0- 3mo   |  33 | **************
     3- 6mo   |  18 | *******
    6mo-1yr   |  30 | ************

    Time since poster's first commit in 7.1
    no commit | 107 | ********************************************
     0- 3mo   |  61 | *************************
     3- 6mo   |  15 | ******
    6mo-1yr   |  29 | ************

    Time since poster's first commit in 7.2
    no commit | 130 | *****************************************************
     0- 3mo   |  95 | **************************************
     3- 6mo   |  25 | *********
    6mo-1yr   |  34 | ************

The 0-3mo bucket is pretty much people who had their first commit in 7.2.
The 3-6mo bucket can be viewed as people who stuck around for 2 releases.
It seems like the number of one-off contributors is growing by 50% release
to release, with retention dropping. We shall have a more definite
signal after the next release cycle.

AI reviews
----------

I suspect we are more or less accustomed to the reviews now. Sashiko
has been updated to mark the pre-existing issues more explicitly which
is very helpful.

The overall quality of the reviews still leaves a lot to be desired,
especially when it comes to driver code.

I have been waiting to get access to OpenAI GPT models. In local use,
having the models cross-review each other's work drives down the false
positives noticeably. Unfortunately, the model access is getting restricted.
For those working at large corporations, at least, LLM inference used
to be a matter of adding a service to a cloud account. Since we had a CI
for netdev, adding LLMs was easy. Now, paperwork and policies prevent
us / me from accessing GPT. Anthropic made Fable available with 30 day
prompt retention, which, of course, also got it blocked in most corps.

We used to be ahead of the game with Chris's and Roman's effort. Now it
feels like (some would say subsidized) $20 LLM subscription buys much
more LLM access than we can access in our CIs. Ironically, I think this
is the inverse of the problem some in the community were predicting 
(individual contributors will have a hard time because corps will have
access to very powerful models). Of course, things change very fast,
they may be proven right tomorrow.

Testing
-------

I was complaining that the number of patches to selftests remains at
10%; this release shows that things can be worse. The fraction of
patches adding tests dropped to 8% (-2%), from 152 to 147 patches.

On the other hand it's great to see others take the #1 and #2 slots!
Thanks Allison and Matthieu!

Contributions to selftests:
   1 [ 27] Allison Henderson
   2 [ 13] Matthieu Baerts
   3 [ 13] Jakub Kicinski
   4 [  9] Victor Nogueira
   5 [  8] Bobby Eshleman
   6 [  7] Wei Wang
   7 [  6] Minxi Hou
   8 [  6] Willem de Bruijn
   9 [  4] Daniel Borkmann
  10 [  4] Fernando Fernandez Mancera
  11 [  4] Ido Schimmel
  12 [  4] Tushar Vyavahare
  13 [  4] Qingfang Deng

In the last report I mentioned that we started testing on real HW.
I can think of at least 3 bugs we caught using our HW CI.

Developer rankings
------------------

Top reviewers (cs):                  Top reviewers (msg): 
   1 (   ) [50] Jakub Kicinski          1 (   ) [116] Jakub Kicinski     
   2 (   ) [27] Simon Horman            2 ( +1) [ 53] Andrew Lunn        
   3 (   ) [21] Andrew Lunn             3 ( -1) [ 42] Simon Horman       
   4 (   ) [18] Paolo Abeni             4 (   ) [ 30] Paolo Abeni        
   5 (   ) [11] Eric Dumazet            5 (   ) [ 17] Eric Dumazet       
   6 (+10) [ 7] Ido Schimmel            6 ( +7) [ 12] Ido Schimmel       
   7 (+13) [ 6] Jacob Keller            7 (+31) [ 12] David Laight       
   8 (+20) [ 6] David Laight            8 ( -1) [ 12] Aleksandr Loktionov
   9 ( -3) [ 5] Kuniyuki Iwashima       9 (+27) [ 11] Jacob Keller       
  10 ( -2) [ 5] Aleksandr Loktionov    10 ( -2) [ 10] Kuniyuki Iwashima  
  11 (+32) [ 5] Jiayuan Chen           11 (+22) [ 10] Stanislav Fomichev 
  12 ( -2) [ 4] Krzysztof Kozlowski    12 ( -3) [  9] Willem de Bruijn   
  13 ( +2) [ 4] Maxime Chevallier      13 ( +4) [  8] Maxime Chevallier  
  14 (+44) [ 4] Alexander Lobakin      14 ( -2) [  8] Sabrina Dubroca    
  15 ( -6) [ 3] Willem de Bruijn       15 (+12) [  8] Nikolay Aleksandrov

The number of people stepping up to help with reviews is definitely
a bright spot in the patch avalanche. We have some gaps, of course,
but there's quite a few people I can tell are intentionally helping
out. Thank you all so much!

Individual shout-outs this cycle go to... Ido, who recently became an L3
(IPv4/IPv6 etc.) co-maintainer, but is also helping in other areas.
Intel folks (Jake, Olek, Aleks) stepped up driver reviews after a brief
absence ;) Since Ido works at nVidia, we are now in a position where
the two biggest vendor-contributors are solidly "in the green" when
it comes to review / authorship balance!

Jiayuan Chen has been helping review and triage a lot of security /
bug reports. We're really glad to see this progress, keep it up!

Also big thanks to Maxime, without Maxime we would be in a pretty
bad place in phylink / embedded reviews now that Russell (hopefully
temporarily?) stepped away from this work.

Top authors (cs):                    Top authors (msg):                  
   1 (   ) [11] Eric Dumazet            1 ( +1) [30] Eric Dumazet        
   2 (   ) [ 6] Jakub Kicinski          2 ( +3) [27] Tariq Toukan        
   3 ( +4) [ 4] Tariq Toukan            3 ( +4) [24] Jakub Kicinski      
   4 ( +2) [ 4] Lorenzo Bianconi        4 (+13) [24] Wei Fang            
   5 (***) [ 4] Selvamani Rajagopal     5 (+41) [19] Pablo Neira Ayuso   
   6 ( +7) [ 3] Weiming Shi             6 (+13) [18] Lorenzo Bianconi    
   7 ( +2) [ 3] Kuniyuki Iwashima       7 ( -3) [18] Kuniyuki Iwashima   
   8 (***) [ 3] Michael Bommarito       8 (+13) [17] Ratheesh Kannoth    
   9 ( +9) [ 3] Rosen Penev             9 (***) [15] Breno Leitao        
  10 (***) [ 3] David Laight           10 (***) [13] javen               
  11 (***) [ 2] Wentao Liang           11 (***) [12] Luiz Angelo Daros de Luca
  12 (+40) [ 2] Breno Leitao           12 (+24) [12] Chuck Lever         
  13 (***) [ 2] Samuel Moelius         13 (***) [11] Matthieu Baerts     
  14 ( -3) [ 2] David Carlier          14 (***) [11] Simon Wunderlich    
  15 (***) [ 2] Ren Wei                15 (***) [10] Jason Xing     

With some exceptions, the "top authors by message" list is populated with
folks who needed a lot of revisions of large series.

On the change set side we have a mix of core work (Eric, Jakub, Kuniyuki),
vendor submissions (Tariq, Selvamani), refactoring (Breno), "cleanups"
(David L, Rosen), presumably AI-driven fixes (Weiming, Wentao, Michael B,
Samuel M, Ren Wei, David C).

Top scores (positive):               Top scores (negative):              
   1 (   ) [768] Jakub Kicinski         1 ( +1) [91] Tariq Toukan        
   2 (   ) [376] Simon Horman           2 ( +8) [86] Wei Fang            
   3 (   ) [346] Andrew Lunn            3 ( +4) [67] Ratheesh Kannoth    
   4 (   ) [265] Paolo Abeni            4 (***) [54] javen               
   5 ( +4) [ 91] Ido Schimmel           5 ( +6) [49] Lorenzo Bianconi    
   6 (+14) [ 74] David Laight           6 (***) [48] Luiz Angelo Daros de Luca
   7 (   ) [ 62] Krzysztof Kozlowski    7 (***) [43] Simon Wunderlich    
   8 ( +2) [ 57] Aleksandr Loktionov    8 (***) [38] Chuck Lever         
   9 (+12) [ 50] Nikolay Aleksandrov    9 (+18) [38] Grzegorz Nitka      
  10 ( -4) [ 49] Willem de Bruijn      10 (***) [35] Pablo Neira Ayuso   
  11 ( +3) [ 49] Sabrina Dubroca       11 (***) [35] Markus Stockhausen  
  12 (+41) [ 47] Alexander Lobakin     12 (***) [34] Selvamani Rajagopal 
  13 (+24) [ 47] Maxime Chevallier     13 (***) [34] Jason Xing          
  14 ( -6) [ 46] David Ahern           14 ( -8) [33] Illusion Wang       
  15 (***) [ 43] Jiayuan Chen          15 (***) [30] Minxi Hou       

One process note on the reviewer score. Tariq tops the negative list. 
I've been returning to the question of whether it's fair since 
he has to handle submissions of most of nVidia's patches.
Still, I don't understand why reading thru the list and reviewing
one patchset from another company a day is too much to ask.

Company rankings
----------------

Top reviewers (cs):                  Top reviewers (msg):                
   1 (   ) [54] Meta                    1 (   ) [135] Meta               
   2 (   ) [53] RedHat                  2 (   ) [109] RedHat             
   3 ( +2) [21] Andrew Lunn             3 ( +2) [ 53] Andrew Lunn        
   4 (   ) [19] Intel                   4 (   ) [ 46] Intel              
   5 ( -2) [18] Google                  5 ( -2) [ 42] Google             
   6 (   ) [17] nVidia                  6 (   ) [ 37] nVidia             
   7 (+12) [ 6] David Laight            7 (+15) [ 12] David Laight       
   8 ( +2) [ 5] Bootlin                 8 ( +1) [ 11] SUSE               
   9 ( -1) [ 5] Linaro                  9 ( -1) [ 11] Linaro       

Top authors (cs):                    Top authors (msg):                  
   1 (   ) [19] Google                  1 (   ) [88] Meta                
   2 ( +1) [14] Meta                    2 ( +1) [70] Google              
   3 ( -1) [12] RedHat                  3 ( +1) [69] Intel               
   4 (   ) [12] Intel                   4 ( -2) [56] RedHat              
   5 ( +1) [ 9] nVidia                  5 ( +1) [54] nVidia              
   6 ( +1) [ 4] Microsoft               6 ( +1) [38] NXP                 
   7 (***) [ 4] Onsemi                  7 (+10) [26] Marvell             
   8 (+10) [ 3] Weiming Shi             8 ( +2) [23] Qualcomm            
   9 (***) [ 3] Michael Bommarito       9 ( -1) [21] Microsoft               

Top scores (positive):               Top scores (negative):              
   1 ( +1) [616] RedHat                 1 (   ) [133] NXP                
   2 ( -1) [608] Meta                   2 ( +7) [ 95] Marvell            
   3 (   ) [346] Andrew Lunn            3 ( +3) [ 63] Qualcomm           
   4 ( +4) [ 74] David Laight           4 (***) [ 54] Realsil            
   5 (+28) [ 58] nVidia                 5 (***) [ 48] Luiz Angelo Daros de Luca
   6 ( -1) [ 46] Linux Foundation       6 (***) [ 43] Simon Wunderlich & co.
   7 ( -3) [ 42] Linaro                 7 ( +7) [ 41] AMD                
   8 (***) [ 37] Shopee                 8 ( -6) [ 35] Microsoft          
   9 ( +1) [ 36] ARM                    9 ( +3) [ 35] Oracle                 

As already mentioned nVidia moves to the green zone.
Shopee is Jiayuan Chen. ARM and Linaro are device tree reviewers.

The negative side is primarily HW vendors dumping code. Without
Vladimir's participation NXP takes the smelly cake. Marvell is
not much better (less bad?).

A reminder that we rank patches for maintainer review based
on the "review standing" of the submitter and their company.
This used to matter much less, because historically I'd try to keep
the number of patches in patchwork around 100 at the end of each day.
These days it feels impossible to get it to 200, because we receive
over 150 patches every working day. If you think maintainers take
forever to look at your code - it's probably your review standing.
-- 
Code: https://github.com/kuba-moo/ml-stat
Raw output: https://netdev.bots.linux.dev/ml-stats/stats-7.2

^ permalink raw reply

* Re: [PATCH] net: prestera: initialize err in prestera_port_sfp_bind
From: Andrew Lunn @ 2026-06-17 18:44 UTC (permalink / raw)
  To: Ruoyu Wang
  Cc: Taras Chornyi, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Russell King, netdev, linux-kernel
In-Reply-To: <20260617182214.963694-1-ruoyuw560@gmail.com>

On Thu, Jun 18, 2026 at 02:22:14AM +0800, Ruoyu Wang wrote:
> prestera_port_sfp_bind() returns err after walking the ports node. If no
> child node matches the port's front-panel id, err is never assigned.
> Initialize it to 0 so the no-match path does not return stack data.

Why 0, and not -ENODEV? Please include an explanation of your choice
in the commit message.

Please also take a read of

https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html

You need to indicate the tree this patch is for in the Subject line.

    Andrew

---
pw-bot: cr

^ permalink raw reply

* Re: [PATCH net-next v2] net: rds: check cmsg_len before reading rds_rdma_args in size pass
From: Allison Henderson @ 2026-06-17 18:25 UTC (permalink / raw)
  To: Michael Bommarito, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet
  Cc: Simon Horman, netdev, linux-rdma, rds-devel, linux-kernel
In-Reply-To: <20260617023146.2780077-1-michael.bommarito@gmail.com>

On Tue, 2026-06-16 at 22:31 -0400, Michael Bommarito wrote:
> rds_rm_size() handles RDS_CMSG_RDMA_ARGS after only CMSG_OK() and then
> calls rds_rdma_extra_size(), which reads args->local_vec_addr and
> args->nr_local without first checking that cmsg_len covers struct
> rds_rdma_args. The other two RDS_CMSG_RDMA_ARGS consumers already guard
> this: rds_rdma_bytes() in rds_sendmsg() and rds_cmsg_rdma_args() in
> rds_cmsg_send() both reject cmsg_len < CMSG_LEN(sizeof(struct
> rds_rdma_args)). Add the same check to rds_rm_size() so all three RDMA
> args passes are consistent.
> 
> This is a consistency and hardening change with no behavioral effect for
> well-formed senders and no reachable bug today: rds_rdma_bytes() runs
> before rds_rm_size() in rds_sendmsg() and already rejects a short
> RDS_CMSG_RDMA_ARGS, so the size pass is not reached with an undersized
> cmsg. But rds_rm_size() reads the args independently of that earlier
> pass, and nothing in rds_rm_size() itself records or enforces the
> precondition, so a reader or a future refactor of the size pass cannot
> tell the cmsg has already been length-checked. Applying the same
> cmsg_len guard in all three RDS_CMSG_RDMA_ARGS consumers keeps that
> invariant local to each and robust to reordering.
> 
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
This looks good to me.  Thanks for working on this.

Reviewed-by: Allison Henderson <achender@kernel.org>
Allison

> ---
> v2:
>  - Re-target net-next and drop the Fixes: tag and the stable Cc. This
>    is a consistency/hardening change, not a reachable bug: as Allison
>    Henderson noted, rds_rdma_bytes() runs before rds_rm_size() in
>    rds_sendmsg() and already rejects a short RDS_CMSG_RDMA_ARGS, so a
>    user cannot reach the rds_rm_size() read through sendmsg.
>  - Corrected the changelog: the two sibling guards are rds_rdma_bytes()
>    in rds_sendmsg() and rds_cmsg_rdma_args() in rds_cmsg_send(); the
>    former runs before, not after, rds_rm_size().
>  - Dropped the KASAN/AF_RDS reachability framing. No code change from v1.
>  - v1: https://lore.kernel.org/all/20260614130725.2520842-1-michael.bommarito@gmail.com/
> 
>  net/rds/send.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/rds/send.c b/net/rds/send.c
> index d8b14ff9d366b..6ca3192b1d8af 100644
> --- a/net/rds/send.c
> +++ b/net/rds/send.c
> @@ -967,6 +967,8 @@ static int rds_rm_size(struct msghdr *msg, int num_sgs,
>  
>  		switch (cmsg->cmsg_type) {
>  		case RDS_CMSG_RDMA_ARGS:
> +			if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct rds_rdma_args)))
> +				return -EINVAL;
>  			if (vct->indx >= vct->len) {
>  				vct->len += vct->incr;
>  				tmp_iov =
> 
> base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8


^ permalink raw reply

* [PATCH] net: prestera: initialize err in prestera_port_sfp_bind
From: Ruoyu Wang @ 2026-06-17 18:22 UTC (permalink / raw)
  To: Taras Chornyi, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Russell King, netdev, linux-kernel

prestera_port_sfp_bind() returns err after walking the ports node. If no
child node matches the port's front-panel id, err is never assigned.
Initialize it to 0 so the no-match path does not return stack data.

Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
---
 drivers/net/ethernet/marvell/prestera/prestera_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/prestera/prestera_main.c b/drivers/net/ethernet/marvell/prestera/prestera_main.c
index 41e19e9ad28d4..a82e7a8029851 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_main.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_main.c
@@ -373,7 +373,7 @@ static int prestera_port_sfp_bind(struct prestera_port *port)
 	struct device_node *ports, *node;
 	struct fwnode_handle *fwnode;
 	struct phylink *phy_link;
-	int err;
+	int err = 0;
 
 	if (!sw->np)
 		return 0;
-- 
2.51.0


^ permalink raw reply related

* [RFC PATCH 2/2] selftests/landlock: Add test for TCP fast open
From: Matthieu Buffet @ 2026-06-17 18:05 UTC (permalink / raw)
  To: Bryam Vargas
  Cc: Mickaël Salaün, Günther Noack,
	linux-security-module, Mikhail Ivanov, Paul Moore, Eric Dumazet,
	Neal Cardwell, linux-kernel, netdev, Matthieu Buffet
In-Reply-To: <20260617180526.15627-1-matthieu@buffet.re>

Enforce that TCP Fast Open is controlled by
LANDLOCK_ACCESS_NET_CONNECT_TCP. Semantics of connect() and
sendmsg(MSG_FASTOPEN) should be identical from Landlock's perspective.
Also enforce error code consistency, since UDP sockets ignore
the MSG_FASTOPEN flag while Unix sockets reject it.

Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
---
 tools/testing/selftests/landlock/net_test.c | 155 ++++++++++++++++++++
 1 file changed, 155 insertions(+)

diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
index 0c256e7c8675..177ed28e70f6 100644
--- a/tools/testing/selftests/landlock/net_test.c
+++ b/tools/testing/selftests/landlock/net_test.c
@@ -258,6 +258,64 @@ static int connect_variant(const int sock_fd,
 	return connect_variant_addrlen(sock_fd, srv, get_addrlen(srv, false));
 }
 
+static int sendto_variant_addrlen(const int sock_fd,
+				  const struct service_fixture *const srv,
+				  const socklen_t addrlen, void *buf,
+				  size_t len, size_t flags)
+{
+	const struct sockaddr *dst = NULL;
+	ssize_t ret;
+
+	/*
+	 * We never want our processes to be killed by SIGPIPE: we check return
+	 * codes and errno, so that we have actual error messages.
+	 */
+	flags |= MSG_NOSIGNAL;
+
+	if (srv != NULL) {
+		switch (srv->protocol.domain) {
+		case AF_UNSPEC:
+		case AF_INET:
+			dst = (const struct sockaddr *)&srv->ipv4_addr;
+			break;
+
+		case AF_INET6:
+			dst = (const struct sockaddr *)&srv->ipv6_addr;
+			break;
+
+		case AF_UNIX:
+			dst = (const struct sockaddr *)&srv->unix_addr;
+			break;
+
+		default:
+			errno = EAFNOSUPPORT;
+			return -errno;
+		}
+	}
+
+	ret = sendto(sock_fd, buf, len, flags, dst, addrlen);
+	if (ret < 0)
+		return -errno;
+
+	/* errno is not set in cases of partial writes. */
+	if (ret != len)
+		return -EINTR;
+
+	return 0;
+}
+
+static int sendto_variant(const int sock_fd,
+			  const struct service_fixture *const srv, void *buf,
+			  size_t len, size_t flags)
+{
+	socklen_t addrlen = 0;
+
+	if (srv != NULL)
+		addrlen = get_addrlen(srv, false);
+
+	return sendto_variant_addrlen(sock_fd, srv, addrlen, buf, len, flags);
+}
+
 FIXTURE(protocol)
 {
 	struct service_fixture srv0, srv1, srv2, unspec_any0, unspec_srv0;
@@ -950,6 +1008,103 @@ TEST_F(protocol, connect_unspec)
 	EXPECT_EQ(0, close(bind_fd));
 }
 
+TEST_F(protocol, tcp_fastopen)
+{
+	const bool restricted = variant->sandbox == TCP_SANDBOX &&
+				variant->prot.type == SOCK_STREAM &&
+				(variant->prot.protocol == IPPROTO_TCP ||
+				 variant->prot.protocol == IPPROTO_IP) &&
+				(variant->prot.domain == AF_INET ||
+				 variant->prot.domain == AF_INET6);
+	const struct landlock_ruleset_attr ruleset_attr = {
+		.handled_access_net = LANDLOCK_ACCESS_NET_CONNECT_TCP,
+	};
+	int bind_fd, client_fd, status;
+	char buf;
+	pid_t child;
+
+	bind_fd = socket_variant(&self->srv0);
+	ASSERT_LE(0, bind_fd);
+	EXPECT_EQ(0, bind_variant(bind_fd, &self->srv0));
+	if (self->srv0.protocol.type == SOCK_STREAM)
+		EXPECT_EQ(0, listen(bind_fd, backlog));
+
+	child = fork();
+	ASSERT_LE(0, child);
+	if (child == 0) {
+		int connect_fd, ret;
+
+		/* Closes listening socket for the child. */
+		EXPECT_EQ(0, close(bind_fd));
+
+		connect_fd = socket_variant(&self->srv0);
+		ASSERT_LE(0, connect_fd);
+
+		if (variant->sandbox == TCP_SANDBOX) {
+			const int ruleset_fd = landlock_create_ruleset(
+				&ruleset_attr, sizeof(ruleset_attr), 0);
+			ASSERT_LE(0, ruleset_fd);
+
+			enforce_ruleset(_metadata, ruleset_fd);
+			EXPECT_EQ(0, close(ruleset_fd));
+		}
+
+		/* Fast Open with no address. */
+		ret = sendto_variant(connect_fd, NULL, NULL, 0, MSG_FASTOPEN);
+		if (self->srv0.protocol.domain == AF_UNIX) {
+			ASSERT_EQ(-ENOTCONN, ret);
+		} else if (self->srv0.protocol.type == SOCK_DGRAM) {
+			ASSERT_EQ(-EDESTADDRREQ, ret);
+		} else {
+			ASSERT_EQ(-EINVAL, ret);
+		}
+
+		/* Fast Open to a denied address. */
+		ret = sendto_variant(connect_fd, &self->srv0, "A", 1,
+				     MSG_FASTOPEN);
+		if (restricted) {
+			ASSERT_EQ(-EACCES, ret);
+		} else if (self->srv0.protocol.domain == AF_UNIX &&
+			   self->srv0.protocol.type == SOCK_STREAM) {
+			ASSERT_EQ(-EOPNOTSUPP, ret);
+		} else {
+			ASSERT_EQ(0, ret);
+		}
+
+		EXPECT_EQ(0, close(connect_fd));
+		_exit(_metadata->exit_code);
+		return;
+	}
+
+	client_fd = bind_fd;
+	if (!restricted && self->srv0.protocol.type == SOCK_STREAM &&
+	    self->srv0.protocol.domain != AF_UNIX) {
+		client_fd = accept(bind_fd, NULL, 0);
+		ASSERT_LE(0, client_fd);
+	}
+
+	if (restricted) {
+		EXPECT_EQ(-1, read(client_fd, &buf, 1));
+		EXPECT_EQ(ENOTCONN, errno);
+	} else if (self->srv0.protocol.domain == AF_UNIX &&
+		   self->srv0.protocol.type == SOCK_STREAM) {
+		EXPECT_EQ(-1, read(client_fd, &buf, 1));
+		EXPECT_EQ(EINVAL, errno);
+	} else {
+		EXPECT_EQ(1, read(client_fd, &buf, 1));
+		EXPECT_EQ('A', buf);
+	}
+
+	EXPECT_EQ(child, waitpid(child, &status, 0));
+	EXPECT_EQ(1, WIFEXITED(status));
+	EXPECT_EQ(EXIT_SUCCESS, WEXITSTATUS(status));
+
+	if (client_fd != bind_fd)
+		EXPECT_LE(0, close(client_fd));
+
+	EXPECT_EQ(0, close(bind_fd));
+}
+
 FIXTURE(ipv4)
 {
 	struct service_fixture srv0, srv1;
-- 
2.47.3


^ permalink raw reply related

* Re: Landlock: LANDLOCK_ACCESS_NET_CONNECT_TCP bypass via TCP Fast Open
From: Matthieu Buffet @ 2026-06-17 18:05 UTC (permalink / raw)
  To: Bryam Vargas
  Cc: Mickaël Salaün, Günther Noack,
	linux-security-module, Mikhail Ivanov, Paul Moore, Eric Dumazet,
	Neal Cardwell, linux-kernel, netdev, Matthieu Buffet
In-Reply-To: <20260617.eemahv8ui7Ee@digikod.net>

Hi,

On 6/17/2026 4:22 PM, Mickaël Salaün wrote:
> Thanks for the report.  This was previously identified by Mikhail and
> Matthieu, see the related issue:
> https://github.com/landlock-lsm/linux/issues/41

(I worked on a v0 patch for that issue after I first reported it to
Mickaël, missing the fact that it was already documented as a github
issue. Then tried a more generic approach that failed. Here's the v0,
rebased on the beginning of -next to ease backporting, it might be a
good start. For instance, someone with more performance/benchmarking
background might want to add an unlikely() around the MSG_FASTOPEN
condition in the hot code path?)

Have a nice day!

Matthieu Buffet (2):
  landlock: fix TCP Fast Open connection bypass
  selftests/landlock: Add test for TCP fast open

 security/landlock/net.c                     |  17 +++
 tools/testing/selftests/landlock/net_test.c | 155 ++++++++++++++++++++
 2 files changed, 172 insertions(+)

base-commit: 0ce4243509d1580349dd0d50624036d6b097e958
-- 
2.47.3

^ permalink raw reply

* [RFC PATCH 1/2] landlock: fix TCP Fast Open connection bypass
From: Matthieu Buffet @ 2026-06-17 18:05 UTC (permalink / raw)
  To: Bryam Vargas
  Cc: Mickaël Salaün, Günther Noack,
	linux-security-module, Mikhail Ivanov, Paul Moore, Eric Dumazet,
	Neal Cardwell, linux-kernel, netdev, Matthieu Buffet
In-Reply-To: <20260617180526.15627-1-matthieu@buffet.re>

The documentation of the socket_connect() LSM hook states that it
controls connecting a socket to a remote address. It has not been the
case since the addition of TCP Fast Open (RFC 7413) support, which allows
opening a TCP connection (thus, setting a socket's destination address)
via the MSG_FASTOPEN flag passed to sendto()/sendmsg()/sendmmsg(). The
problem then got duplicated into MPTCP.

Landlock did not take it into account when its TCP support was added,
leaving a bypass of TCP connect policy.

Ideally a call to the LSM hook would be added in the fastopen code path,
in order to fix this generically. But connect() hooks are designed to run
with the socket locked, unlike sendmsg() hooks.
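
For illustration only, and not part of this patch: a minimal userspace
sketch of the call pattern described above, where a TCP socket gets its
destination address from sendto() with MSG_FASTOPEN and connect() is
never called. The helper name fastopen_send() is hypothetical, and the
sketch assumes a Linux system with client-side TCP Fast Open enabled.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Older libc headers may not expose the flag; the value is the Linux one. */
#ifndef MSG_FASTOPEN
#define MSG_FASTOPEN 0x20000000
#endif

/*
 * Hypothetical helper: send len bytes to dst on an unconnected TCP
 * socket. With MSG_FASTOPEN the kernel sets the destination address and
 * opens the connection as part of the send, so no connect() call (and
 * no socket_connect() LSM hook) is involved.
 *
 * Returns 0 on success or a negative errno value.
 */
static int fastopen_send(int fd, const struct sockaddr_in *dst,
			 const void *buf, size_t len)
{
	ssize_t ret;

	ret = sendto(fd, buf, len, MSG_FASTOPEN | MSG_NOSIGNAL,
		     (const struct sockaddr *)dst, sizeof(*dst));
	if (ret < 0)
		return -errno;

	/* A short write is reported as an error to keep the sketch simple. */
	return (size_t)ret == len ? 0 : -EIO;
}
```

A caller would create the socket with socket(AF_INET, SOCK_STREAM, 0),
fill in dst, and call fastopen_send() directly. This is the path that
the sendmsg hook in this patch is meant to cover.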

Closes: https://github.com/landlock-lsm/linux/issues/41
Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
---
 security/landlock/net.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/security/landlock/net.c b/security/landlock/net.c
index 4ee4002a8f56..a2375762c18b 100644
--- a/security/landlock/net.c
+++ b/security/landlock/net.c
@@ -246,9 +246,26 @@ static int hook_socket_connect(struct socket *const sock,
 					   access_request);
 }
 
+static int hook_socket_sendmsg(struct socket *const sock,
+			       struct msghdr *const msg, const int size)
+{
+	struct sockaddr *const address = msg->msg_name;
+	const int addrlen = msg->msg_namelen;
+
+	if (sk_is_tcp(sock->sk) && address != NULL &&
+	    (msg->msg_flags & MSG_FASTOPEN) != 0) {
+		return current_check_access_socket(
+			sock, address, addrlen,
+			LANDLOCK_ACCESS_NET_CONNECT_TCP);
+	}
+
+	return 0;
+}
+
 static struct security_hook_list landlock_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(socket_bind, hook_socket_bind),
 	LSM_HOOK_INIT(socket_connect, hook_socket_connect),
+	LSM_HOOK_INIT(socket_sendmsg, hook_socket_sendmsg),
 };
 
 __init void landlock_add_net_hooks(void)
-- 
2.47.3


^ permalink raw reply related

* [PATCH 6.6.y] rxrpc: Fix the ACK parser to extract the SACK table for parsing
From: Sasha Levin @ 2026-06-17 18:04 UTC (permalink / raw)
  To: stable
  Cc: David Howells, Michael Bommarito, Marc Dionne, Jeffrey Altman,
	Eric Dumazet, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, linux-afs, netdev, stable, Sasha Levin
In-Reply-To: <2026061543-superior-passerby-d597@gregkh>

From: David Howells <dhowells@redhat.com>

[ Upstream commit 333b6d5bb9f87827ac2639c737bf9613dbae7253 ]

Fix modification of the received skbuff in rxrpc_input_soft_acks() and a
potential incorrect access of the buffer in a fragmented UDP packet (the
packet would probably have to be deliberately pre-generated as fragmented)
when AF_RXRPC tries to extract the contents of the SACK table by copying
out the contents of the SACK table into a buffer before attempting to parse
it.

AF_RXRPC assumes that it can just call skb_condense() and then validly
access the SACK table from skb->data and that it will be a flat buffer -
but skb_condense() can silently fail to do anything under some
circumstances.
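
For illustration only, and not part of the upstream commit: a userspace
model of the idea described above, copying a table out of a possibly
fragmented packet into a flat, zeroed buffer before parsing it. All
names here (model_frag, model_copy_bits(), model_extract_sack()) are
hypothetical stand-ins; model_copy_bits() only imitates the general
behaviour of skb_copy_bits() and is not the kernel implementation.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_SACK_MAX 256

/* One contiguous piece of a packet; a packet is an array of these. */
struct model_frag {
	const unsigned char *data;
	size_t len;
};

/*
 * Copy len bytes starting at offset from a fragmented packet into the
 * flat buffer to. Returns 0 on success, -1 if the packet is too short.
 */
static int model_copy_bits(const struct model_frag *frags, size_t nfrags,
			   size_t offset, unsigned char *to, size_t len)
{
	size_t i;

	for (i = 0; i < nfrags && len > 0; i++) {
		size_t avail = frags[i].len;
		size_t n;

		/* Skip fragments that lie entirely before the offset. */
		if (offset >= avail) {
			offset -= avail;
			continue;
		}

		n = avail - offset;
		if (n > len)
			n = len;
		memcpy(to, frags[i].data + offset, n);
		to += n;
		len -= n;
		offset = 0;
	}

	return len == 0 ? 0 : -1;
}

/*
 * Extract nr_acks table entries that follow a header of hdr_len bytes.
 * The copy length is clamped to the buffer size and the buffer is
 * zeroed first, so the parser never reads uninitialised bytes.
 */
static int model_extract_sack(const struct model_frag *frags, size_t nfrags,
			      size_t hdr_len, size_t nr_acks,
			      unsigned char sack[MODEL_SACK_MAX])
{
	size_t n = nr_acks < MODEL_SACK_MAX ? nr_acks : MODEL_SACK_MAX;

	memset(sack, 0, MODEL_SACK_MAX);
	return model_copy_bits(frags, nfrags, hdr_len, sack, n);
}
```

The parser then walks the flat buffer only, regardless of how the
packet data was split across fragments.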

Note that whilst rxrpc_input_soft_acks() should be able to parse extended
ACKs, the rest of AF_RXRPC doesn't currently support that.

Further, there's then no need to call skb_condense() in rxrpc_input_ack(),
so don't.

Fixes: d57a3a151660 ("rxrpc: Save last ACK's SACK table rather than marking txbufs")
Reported-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://lore.kernel.org/r/20260513180907.2061972-1-michael.bommarito@gmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: stable@kernel.org
Link: https://patch.msgid.link/105362.1780573560@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/rxrpc/input.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 9a162035d4c1d0..1157bf75ef9c8c 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -781,7 +781,18 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call,
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	unsigned int i, old_nacks = 0;
 	rxrpc_seq_t lowest_nak = seq + sp->nr_acks;
-	u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket);
+	u8 sack[256] __aligned(sizeof(unsigned long));
+	u8 *acks = sack;
+
+	/* Extract the SACK table into a flat buffer rather than accessing it
+	 * directly through skb->data, which is not guaranteed to be linear for
+	 * a fragmented packet (skb_condense() can silently fail to linearise
+	 * it).
+	 */
+	if (skb_copy_bits(skb,
+			  sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket),
+			  sack, umin(sp->nr_acks, sizeof(sack))) < 0)
+		return;
 
 	for (i = 0; i < sp->nr_acks; i++) {
 		if (acks[i] == RXRPC_ACK_TYPE_ACK) {
-- 
2.53.0


^ permalink raw reply related

* [PATCH 6.12.y] rxrpc: Fix the ACK parser to extract the SACK table for parsing
From: Sasha Levin @ 2026-06-17 17:21 UTC (permalink / raw)
  To: stable
  Cc: David Howells, Michael Bommarito, Marc Dionne, Jeffrey Altman,
	Eric Dumazet, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, linux-afs, netdev, stable, Sasha Levin
In-Reply-To: <2026061541-settle-letdown-ad0c@gregkh>

From: David Howells <dhowells@redhat.com>

[ Upstream commit 333b6d5bb9f87827ac2639c737bf9613dbae7253 ]

Fix modification of the received skbuff in rxrpc_input_soft_acks() and a
potential incorrect access of the buffer in a fragmented UDP packet (the
packet would probably have to be deliberately pre-generated as fragmented)
when AF_RXRPC tries to extract the contents of the SACK table by copying
out the contents of the SACK table into a buffer before attempting to parse
it.

AF_RXRPC assumes that it can just call skb_condense() and then validly
access the SACK table from skb->data and that it will be a flat buffer -
but skb_condense() can silently fail to do anything under some
circumstances.

Note that whilst rxrpc_input_soft_acks() should be able to parse extended
ACKs, the rest of AF_RXRPC doesn't currently support that.

Further, there's then no need to call skb_condense() in rxrpc_input_ack(),
so don't.

Fixes: d57a3a151660 ("rxrpc: Save last ACK's SACK table rather than marking txbufs")
Reported-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://lore.kernel.org/r/20260513180907.2061972-1-michael.bommarito@gmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: stable@kernel.org
Link: https://patch.msgid.link/105362.1780573560@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/rxrpc/input.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 6a075a7c190db3..ca2d40ba7098f5 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -775,9 +775,23 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call,
 				  rxrpc_seq_t since)
 {
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
-	unsigned int i, old_nacks = 0;
+	unsigned int i, old_nacks = 0, nsack;
 	rxrpc_seq_t lowest_nak = seq + sp->ack.nr_acks;
-	u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket);
+	u8 sack[256] __aligned(sizeof(unsigned long));
+	u8 *acks = sack;
+
+	/* AF_RXRPC assumes that it can access the SACK table directly from
+	 * skb->data as a flat buffer, but the skb may be non-linear (e.g. a
+	 * fragmented UDP packet) and skb_condense() can silently fail to
+	 * linearise it.  Copy the SACK table out into a local buffer before
+	 * parsing it.
+	 */
+	memset(sack, 0, sizeof(sack));
+	nsack = umin(sp->ack.nr_acks, 256);
+	if (skb_copy_bits(skb,
+			  sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket),
+			  sack, nsack) < 0)
+		return;
 
 	for (i = 0; i < sp->ack.nr_acks; i++) {
 		if (acks[i] == RXRPC_ACK_TYPE_ACK) {
@@ -934,9 +948,6 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 	    skb_copy_bits(skb, ioffset, &trailer, sizeof(trailer)) < 0)
 		return rxrpc_proto_abort(call, 0, rxrpc_badmsg_short_ack_trailer);
 
-	if (nr_acks > 0)
-		skb_condense(skb);
-
 	if (call->cong_last_nack) {
 		since = rxrpc_input_check_prev_ack(call, &summary, first_soft_ack);
 		rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack);
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH net] octeontx2-af: fix memory leak in rvu_setup_hw_resources()
From: patchwork-bot+netdevbpf @ 2026-06-17 17:16 UTC (permalink / raw)
  To: Dawei Feng
  Cc: sgoutham, lcherian, gakula, hkelam, sbhatta, andrew+netdev, davem,
	edumazet, kuba, pabeni, netdev, linux-kernel, jianhao.xu, zilin,
	stable
In-Reply-To: <20260617013416.113860-1-dawei.feng@seu.edu.cn>

Hello:

This patch was applied to bpf/bpf-next.git (master)
by Paolo Abeni <pabeni@redhat.com>:

On Wed, 17 Jun 2026 09:34:16 +0800 you wrote:
> If rvu_npc_exact_init() fails in rvu_setup_hw_resources(), the function
> returns directly instead of jumping to the error handling path. This
> causes a resource leak for the previously initialized CGX, NPC, fwdata,
> and MSI-X states.
> 
> Fix this by replacing the direct return with goto cgx_err to ensure
> proper cleanup.
> 
> [...]

Here is the summary with links:
  - [net] octeontx2-af: fix memory leak in rvu_setup_hw_resources()
    https://git.kernel.org/bpf/bpf-next/c/09a5bf856aa7

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [GIT PULL] Networking for 7.2
From: patchwork-bot+netdevbpf @ 2026-06-17 17:16 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: torvalds, davem, netdev, linux-kernel, pabeni
In-Reply-To: <20260617000705.931602-1-kuba@kernel.org>

Hello:

This pull request was applied to bpf/bpf-next.git (master)
by Linus Torvalds <torvalds@linux-foundation.org>:

On Tue, 16 Jun 2026 17:07:05 -0700 you wrote:
> Hi Linus!
> 
> The following changes since commit 22e2036479cb77df6281ebbd376ae6c330774790:
> 
>   Merge tag 'net-7.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2026-06-11 10:17:49 -0700)
> 
> are available in the Git repository at:
> 
> [...]

Here is the summary with links:
  - [GIT,PULL] Networking for 7.2
    https://git.kernel.org/bpf/bpf-next/c/b85966adbf5d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: John Ogness @ 2026-06-17 17:07 UTC (permalink / raw)
  To: Breno Leitao, Peter Zijlstra
  Cc: Petr Mladek, Jakub Kicinski, Sebastian Andrzej Siewior,
	Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev,
	David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
	stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
	Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <ajKi4wtA8U1iZkMD@gmail.com>

On 2026-06-17, Breno Leitao <leitao@debian.org> wrote:
> On Wed, Jun 17, 2026 at 01:19:58PM +0200, Peter Zijlstra wrote:
>> But anything using locking is not ->write_atomic() and should be driven
>> from a kthread, no?
>
> Good point. If that's the case, netconsole might not ever be able to drop
> CON_NBCON_ATOMIC_UNSAFE for any network-based console driver at all. 

It depends on what it needs to synchronize against. For example, the
UART consoles cannot write if the port lock is taken by another
context. And the port lock is the sole lock for writing to the UART. To
deal with this, we added wrappers [0] for acquiring/releasing the port
lock. The wrappers acquire the nbcon hardware after taking the port
lock.

The write_atomic() implementations for UART consoles do not take the
port lock. Only the nbcon hardware is acquired (which can be done from
any context). This automatically provides the synchronization based on
the port lock.
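
As a rough illustration of the scheme described above, here is a
simplified userspace model, under stated assumptions: both the port lock
and nbcon hardware ownership are reduced to plain atomic flags, and the
atomic writer simply gives up when it cannot take ownership. The real
nbcon ownership rules are richer than this, and every name below
(model_port, model_port_lock(), model_write_atomic(), ...) is
hypothetical rather than a kernel API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_port {
	atomic_bool port_lock;	/* stands in for the UART port lock */
	atomic_bool hw_owner;	/* stands in for nbcon hardware ownership */
	size_t bytes_written;	/* only touched while hw_owner is held */
};

static void model_port_init(struct model_port *p)
{
	atomic_init(&p->port_lock, false);
	atomic_init(&p->hw_owner, false);
	p->bytes_written = 0;
}

static bool model_try_acquire(atomic_bool *flag)
{
	/* The previous value tells us whether someone else held it. */
	return !atomic_exchange_explicit(flag, true, memory_order_acquire);
}

static void model_release(atomic_bool *flag)
{
	atomic_store_explicit(flag, false, memory_order_release);
}

/* Wrapper used by normal driver code: port lock first, then ownership. */
static void model_port_lock(struct model_port *p)
{
	while (!model_try_acquire(&p->port_lock))
		;
	while (!model_try_acquire(&p->hw_owner))
		;
}

static void model_port_unlock(struct model_port *p)
{
	model_release(&p->hw_owner);
	model_release(&p->port_lock);
}

/*
 * Atomic write path: never takes the port lock. It only tries to take
 * hardware ownership, which fails while a port lock holder owns it.
 */
static bool model_write_atomic(struct model_port *p, size_t len)
{
	if (!model_try_acquire(&p->hw_owner))
		return false;

	p->bytes_written += len;
	model_release(&p->hw_owner);
	return true;
}
```

Because every port lock holder also holds ownership, the atomic path is
excluded from a locked port without ever touching the port lock itself.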

> As far as I can tell, there isn't a network driver today whose transmit
> path is completely lockless, so, even if we make netpoll lockless.
>
> It's unlikely any NIC will ever achieve this, given that NIC TX
> fundamentally relies on a shared DMA ring and doorbell register, which
> inherently cannot be made lockless.
>
> So, is it correct to state that CON_NBCON_ATOMIC_UNSAFE will be part of
> netconsole forever-ish?

Is there some lock that can be taken to synchronize all writing of
packets to the network? If yes, the netconsole can use a similar
solution.

That is an example of a general solution, but individual drivers may be
able to provide unique solutions, such as dedicated tx-channels for
netconsole. (Sorry, I am not a network guy.)

John Ogness

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/serial_core.h?h=v7.1#n715

^ permalink raw reply

* Re: [PATCH bpf] bpf, sockmap: fix lock inversion between stab->lock and sk_callback_lock
From: John Fastabend @ 2026-06-17 16:59 UTC (permalink / raw)
  To: Sechang Lim
  Cc: Jiayuan Chen, Jakub Sitnicki, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	David S . Miller, Jakub Kicinski, Simon Horman, netdev, bpf,
	linux-kernel
In-Reply-To: <jxrrwhfd5igwnlo6v5y3l3grqhqqpiedfnsqzg62cmwxwaa7xd@qzthhuahjm5f>

On Tue, Jun 16, 2026 at 06:40:09PM +0000, Sechang Lim wrote:
>On Tue, Jun 16, 2026 at 06:17:48PM +0800, Jiayuan Chen wrote:
>>
>>On 6/16/26 5:11 PM, Sechang Lim wrote:
>>>sock_map_update_common() and __sock_map_delete() hold stab->lock and call
>>>sock_map_unref() -> sock_map_del_link() under it. sock_map_del_link() takes
>>>sk_callback_lock for write to stop the strparser and verdict, giving the
>>>lock order stab->lock -> sk_callback_lock.
>>>
>>>The opposite order comes from an SK_SKB stream parser. On RX,
>>>sk_psock_strp_data_ready() holds sk_callback_lock for read while running
>>>the parser. The verdict redirects the skb to egress, where a sched_cls
>>
>>
>>The commit message is wrong. A verdict does not redirect to egress
>>synchronously — sk_psock_skb_redirect() only queues the skb and
>>schedule_delayed_work()s sk_psock_backlog, so egress runs in workqueue
>>context, not under sk_callback_lock.
>>
>
>Thanks, you're right. it's the inline ACK, not the redirect. Sorry for
>the misleading changelog, I'll fix it in v2.
>
>>
>>>program calls bpf_map_delete_elem() on a sockmap, which takes stab->lock:
>>>
>>>  WARNING: possible circular locking dependency detected
>>>  7.1.0-rc6 Not tainted
>>>  ------------------------------------------------------
>>>  syz.9.8824 is trying to acquire lock:
>>>  (&stab->lock){+.-.}-{3:3}, at: __sock_map_delete net/core/sock_map.c:421
>>>  but task is already holding lock:
>>>  (clock-AF_INET){++.-}-{3:3}, at: sk_psock_strp_data_ready net/core/skmsg.c:1173
>>>
>>>  -> #1 (clock-AF_INET){++.-}-{3:3}:
>>>         _raw_write_lock_bh
>>>         sock_map_del_link net/core/sock_map.c:167
>>>         sock_map_unref net/core/sock_map.c:184
>>>         sock_map_update_common net/core/sock_map.c:509
>>>         sock_map_update_elem_sys net/core/sock_map.c:588
>>>         map_update_elem kernel/bpf/syscall.c:1805
>>>
>>>  -> #0 (&stab->lock){+.-.}-{3:3}:
>>>         _raw_spin_lock_bh
>>>         __sock_map_delete net/core/sock_map.c:421
>>>         sock_map_delete_elem net/core/sock_map.c:452
>>>         bpf_prog_06044d24140080b6
>>>         tcx_run net/core/dev.c:4451
>>>         sch_handle_egress net/core/dev.c:4541
>>>         __dev_queue_xmit net/core/dev.c:4808
>>>         ...
>>>         tcp_bpf_strp_read_sock net/ipv4/tcp_bpf.c:701
>>
>>
>>I guess it is an ACK. What is the actual purpose of a sched_cls program
>>calling sockmap delete on the TX path of an ACK? If there is no real use
>>case for it, this is just broken BPF usage, not a kernel bug worth this
>>change.
>>
>>
>
>I don't have a real use case for that exact program. But the verifier
>allows sockmap delete from tc, and it deadlocks when the strparser's
>socket is concurrently removed from the same map. The fix only moves
>sock_map_unref() out from under stab->lock.
>
>Best,
>Sechang

The bot also thinks it found another locking issue. I'm not sure
supporting 'tc' is really needed here. sockmap is much easier
to reason about from the socket layer. What about just blocking sockmap
manipulations from these prog types?

My current thinking on sockmap is that it has sprawled across so many
layers that the locking is overly tricky to reason about.

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d9bdc3b32c05..5e08d3e03453 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8567,11 +8567,7 @@ static bool may_update_sockmap(struct bpf_verifier_env *env, int func_id)
                         return true;
                 break;
         case BPF_PROG_TYPE_SOCKET_FILTER:
-       case BPF_PROG_TYPE_SCHED_CLS:
-       case BPF_PROG_TYPE_SCHED_ACT:
-       case BPF_PROG_TYPE_XDP:
         case BPF_PROG_TYPE_SK_REUSEPORT:
-       case BPF_PROG_TYPE_FLOW_DISSECTOR:
         case BPF_PROG_TYPE_SK_LOOKUP:
                 return true;
	default:
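
For comparison, the alternative from the quoted text (dropping the
reference outside stab->lock rather than restricting prog types) has
roughly the shape below. This is only a single-threaded userspace
sketch: toy_lock, toy_map, toy_unref() and toy_map_delete() are
made-up names standing in for stab->lock, the sockmap, sock_map_unref()
and __sock_map_delete(); it is not the actual v2 patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAP_SLOTS 4

/* Toy stand-in for a spinlock; it only tracks whether it is held. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* catches recursive acquisition */
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_sock {
	int refs;
	bool unref_saw_map_lock;	/* records the lock state seen by unref */
};

struct toy_map {
	struct toy_lock lock;		/* plays the role of stab->lock */
	struct toy_sock *slot[TOY_MAP_SLOTS];
};

/*
 * Plays the role of sock_map_unref(): in the kernel this may take
 * sk_callback_lock, so it should not run with the map lock held.
 */
static void toy_unref(struct toy_map *map, struct toy_sock *sk)
{
	sk->unref_saw_map_lock = map->lock.held;
	sk->refs--;
}

/* Detach the entry under the lock, drop the reference after unlocking. */
static int toy_map_delete(struct toy_map *map, size_t idx)
{
	struct toy_sock *old;

	if (idx >= TOY_MAP_SLOTS)
		return -1;

	toy_lock_acquire(&map->lock);
	old = map->slot[idx];
	map->slot[idx] = NULL;
	toy_lock_release(&map->lock);

	if (!old)
		return -1;

	toy_unref(map, old);
	return 0;
}
```

The point is only the ordering: the slot is cleared while the map lock
is held, and anything that may take a second lock runs after release,
so the map lock never nests outside it.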

^ permalink raw reply related

* Re: [PATCH net-next] r8169: migrate Rx path to page_pool
From: Heiner Kallweit @ 2026-06-17 16:52 UTC (permalink / raw)
  To: Atharva Potdar, Francois Romieu
  Cc: nic_swsd, andrew+netdev, davem, edumazet, kuba, pabeni, netdev
In-Reply-To: <CAF9AHva0TSFz5tedMEgJTkhThzDGqmW7MJshAtf3ULbLY4wd=w@mail.gmail.com>

On 17.06.2026 05:28, Atharva Potdar wrote:
> Hi Heiner, Francois,
> Thank you for reviewing this patch.
> 
> Francois:
>> You may consider fdd7b4c3302c93f6833e338903ea77245eb510b4 and some related
>> changes around that time.
> 
> I am sorry but I don't fully understand the context of this commit or
> the behaviour it addresses. Could you please help me regarding what I
> need to watch out for this change?
> 
> Heiner:
>> Assuming your link speed is 1Gbps, 470Mbps is quite low.
> 
> I apologize, that was my benchmark figure when I passed my NIC via
> VFIO to a VM for testing. When I tested it bare metal again with
> iperf3, I hit line rates of 941 Mbps.
> 
>> If I read this correctly, max_mtu may be lower with this patch.
>> This may cause a regression for existing users.
> 
> My main intention for restricting to order-0 pages is to prepare the
> driver for XDP support in the subsequent patches. I understand this
> causes a regression but I am not sure of another way to tackle it. How
> do you prefer I handle this to avoid breaking current setups while
> still having the driver be ready for XDP?
> 
>> Did you test also on non-x86 architectures? We had DMA-related regressions
>> in the past which showed up on certain non-x86 architectures only.
> 
> Unfortunately, I currently only have access to x86 hardware. I cannot
> test this on a bare-metal ARM machine, only an ARM VM - which may not
> show those hardware issues. How is the testing typically handled for
> other architectures in a situation like this?
> 
It's not only about ARM; I'm aware of at least loongarch systems with
such Realtek NICs. If you can't test it, then you should at least
ensure that, in theory, the DMA-related flags are OK for basically
any architecture.

> Thanks,
> Atharva.


^ permalink raw reply

* [PATCH RFC v2 9/9] platform/x86: ideapad-laptop: Fully support auto keyboard backlight
From: Rong Zhang @ 2026-06-17 16:48 UTC (permalink / raw)
  To: Lee Jones, Pavel Machek, Jonathan Corbet, Shuah Khan,
	Thomas Weißschuh, Benson Leung, Guenter Roeck,
	Marek Behún, Mark Pearson, Derek J. Clark, Hans de Goede,
	Ilpo Järvinen, Ike Panhc
  Cc: Andrew Lunn, Jakub Kicinski, Vishnu Sankar, Vishnu Sankar,
	linux-leds, netdev, linux-doc, linux-kernel, chrome-platform,
	platform-driver-x86, Rong Zhang
In-Reply-To: <20260618-leds-trigger-hw-changed-v2-0-c28c44053cf3@rong.moe>

Currently, the auto brightness mode of keyboard backlight maps to
brightness=0 in LED classdev. The only method to switch to such a mode
is by pressing the manufacturer-defined shortcut (Fn+Space). However, 0
is a multiplexed brightness value; writing 0 simply results in the
backlight being turned off.

With brightness processing code decoupled from LED classdev, we can now
fully support the auto brightness mode. In this mode, the keyboard
backlight is controlled by the EC according to the ambient light sensor
(ALS).

To utilize this, a private hardware control trigger "ideapad-auto" is
added, with the event handling procedure calling the
led_trigger_notify_hw_control_changed() interface to activate/deactivate
the private trigger according to the current LED trigger state.

Meanwhile, block brightness changes on exit. Otherwise, unregistering
the LED device while the private trigger is active would reset the
brightness to zero as a side effect. Gating the change lets the auto
mode state persist across boots.

Signed-off-by: Rong Zhang <i@rong.moe>
---
 drivers/platform/x86/lenovo/ideapad-laptop.c | 63 ++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/drivers/platform/x86/lenovo/ideapad-laptop.c b/drivers/platform/x86/lenovo/ideapad-laptop.c
index 97949094ead4..a83af9bf843c 100644
--- a/drivers/platform/x86/lenovo/ideapad-laptop.c
+++ b/drivers/platform/x86/lenovo/ideapad-laptop.c
@@ -1714,9 +1714,56 @@ static int ideapad_kbd_bl_led_cdev_brightness_set(struct led_classdev *led_cdev,
 {
 	struct ideapad_private *priv = container_of(led_cdev, struct ideapad_private, kbd_bl.led);
 
+	/*
+	 * When deinitializing: It must be the side effect of led_cdev
+	 * unregistration when our private trigger is active. We've set
+	 * LED_RETAIN_AT_SHUTDOWN to retain led_cdev brightness level.
+	 * To do the same for auto mode, gate changes and return early.
+	 */
+	if (unlikely(!priv->kbd_bl.initialized))
+		return 0;
+
 	return ideapad_kbd_bl_brightness_set(priv, brightness);
 }
 
+static bool ideapad_kbd_bl_auto_trigger_offloaded(struct led_classdev *led_cdev)
+{
+	struct ideapad_private *priv = container_of(led_cdev, struct ideapad_private, kbd_bl.led);
+
+	return atomic_read(&priv->kbd_bl.last_hw_brightness) == KBD_BL_AUTO_MODE_HW_BRIGHTNESS;
+}
+
+static int ideapad_kbd_bl_auto_trigger_activate(struct led_classdev *led_cdev)
+{
+	struct ideapad_private *priv = container_of(led_cdev, struct ideapad_private, kbd_bl.led);
+
+	return ideapad_kbd_bl_hw_brightness_set(priv, KBD_BL_AUTO_MODE_HW_BRIGHTNESS);
+}
+
+static struct led_hw_trigger_type ideapad_kbd_bl_auto_trigger_type;
+
+static struct led_trigger ideapad_kbd_bl_auto_trigger = {
+	.name = "ideapad-auto",
+	.trigger_type = &ideapad_kbd_bl_auto_trigger_type,
+	.activate = ideapad_kbd_bl_auto_trigger_activate,
+	.offloaded = ideapad_kbd_bl_auto_trigger_offloaded,
+};
+
+static void ideapad_kbd_bl_notify_hw_control(struct ideapad_private *priv,
+					     int hw_brightness, int last_hw_brightness)
+{
+	bool hw_control, last_hw_control;
+
+	if (priv->kbd_bl.type != KBD_BL_TRISTATE_AUTO)
+		return;
+
+	hw_control = hw_brightness == KBD_BL_AUTO_MODE_HW_BRIGHTNESS;
+	last_hw_control = last_hw_brightness == KBD_BL_AUTO_MODE_HW_BRIGHTNESS;
+
+	if (hw_control != last_hw_control)
+		led_trigger_notify_hw_control_changed(&priv->kbd_bl.led, hw_control);
+}
+
 static void ideapad_kbd_bl_notify(struct ideapad_private *priv)
 {
 	int hw_brightness, brightness, last_brightness, last_hw_brightness;
@@ -1738,6 +1785,8 @@ static void ideapad_kbd_bl_notify(struct ideapad_private *priv)
 	if (hw_brightness == last_hw_brightness)
 		return;
 
+	ideapad_kbd_bl_notify_hw_control(priv, hw_brightness, last_hw_brightness);
+
 	last_brightness = ideapad_kbd_bl_brightness_parse(priv, last_hw_brightness);
 	if (last_brightness < 0 || brightness != last_brightness)
 		led_classdev_notify_brightness_hw_changed(&priv->kbd_bl.led, brightness);
@@ -1770,6 +1819,20 @@ static int ideapad_kbd_bl_init(struct ideapad_private *priv)
 
 	switch (priv->kbd_bl.type) {
 	case KBD_BL_TRISTATE_AUTO:
+		err = devm_led_trigger_register(&priv->platform_device->dev,
+						&ideapad_kbd_bl_auto_trigger);
+		if (err)
+			return err;
+
+		priv->kbd_bl.led.flags             |= LED_TRIG_HW_CHANGED;
+		priv->kbd_bl.led.hw_control_trigger = ideapad_kbd_bl_auto_trigger.name;
+		priv->kbd_bl.led.trigger_type       = &ideapad_kbd_bl_auto_trigger_type;
+
+		/* Hardware remembers the last brightness level, including auto mode. */
+		if (hw_brightness == KBD_BL_AUTO_MODE_HW_BRIGHTNESS)
+			priv->kbd_bl.led.default_trigger = ideapad_kbd_bl_auto_trigger.name;
+
+		fallthrough;
 	case KBD_BL_TRISTATE:
 		priv->kbd_bl.led.max_brightness = 2;
 		break;

-- 
2.53.0
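
For readers skimming the series, the edge detection done by
ideapad_kbd_bl_notify_hw_control() can be sketched in isolation as
below. TOY_AUTO_HW_BRIGHTNESS mirrors KBD_BL_AUTO_MODE_HW_BRIGHTNESS
from the patch; the enum and function names are hypothetical and exist
only for this illustration.

```c
#include <stdbool.h>

/* Mirrors KBD_BL_AUTO_MODE_HW_BRIGHTNESS (3) from the patch. */
#define TOY_AUTO_HW_BRIGHTNESS 3

/* Hypothetical result type; the driver calls the LED core instead. */
enum toy_hw_control_event {
	TOY_HW_CONTROL_UNCHANGED,
	TOY_HW_CONTROL_ENTERED,	/* hardware switched into auto mode */
	TOY_HW_CONTROL_LEFT,	/* hardware switched out of auto mode */
};

/*
 * Compare "is auto mode" before and after the hardware event and
 * report an event only when that boolean actually flips.
 */
static enum toy_hw_control_event
toy_classify_hw_control(int hw_brightness, int last_hw_brightness)
{
	bool now = hw_brightness == TOY_AUTO_HW_BRIGHTNESS;
	bool before = last_hw_brightness == TOY_AUTO_HW_BRIGHTNESS;

	if (now == before)
		return TOY_HW_CONTROL_UNCHANGED;

	return now ? TOY_HW_CONTROL_ENTERED : TOY_HW_CONTROL_LEFT;
}
```

A change between two manual levels (e.g. low to high) yields
TOY_HW_CONTROL_UNCHANGED, which corresponds to the driver not calling
led_trigger_notify_hw_control_changed() at all.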


^ permalink raw reply related

* Re: [PATCH net-next] r8169: migrate Rx path to page_pool
From: Heiner Kallweit @ 2026-06-17 16:50 UTC (permalink / raw)
  To: Atharva Potdar, Francois Romieu
  Cc: nic_swsd, andrew+netdev, davem, edumazet, kuba, pabeni, netdev
In-Reply-To: <CAF9AHva0TSFz5tedMEgJTkhThzDGqmW7MJshAtf3ULbLY4wd=w@mail.gmail.com>

On 17.06.2026 05:28, Atharva Potdar wrote:
> Hi Heiner, Francois,
> Thank you for reviewing this patch.
> 
> Francois:
>> You may consider fdd7b4c3302c93f6833e338903ea77245eb510b4 and some related
>> changes around that time.
> 
> I am sorry but I don't fully understand the context of this commit or
> the behaviour it addresses. Could you please help me regarding what I
> need to watch out for this change?
> 
> Heiner:
>> Assuming your link speed is 1Gbps, 470Mbps is quite low.
> 
> I apologize, that was my benchmark figure when I passed my NIC via
> VFIO to a VM for testing. When I tested it bare metal again with
> iperf3, I hit line rates of 941 Mbps.
> 

OK, I see. 1Gbps isn't really a challenge; the same test at 10Gbps with
an RTL8127 may be more telling.

>> If I read this correctly, max_mtu may be lower with this patch.
>> This may cause a regression for existing users.
> 
> My main intention for restricting to order-0 pages is to prepare the
> driver for XDP support in the subsequent patches. I understand this
> causes a regression but I am not sure of another way to tackle it. How
> do you prefer I handle this to avoid breaking current setups while
> still having the driver be ready for XDP?
> 
Is XDP in general not supported with bigger jumbo packets?
You should find a way to avoid the regression. I don't think
intentionally introducing a regression is acceptable.

>> Did you test also on non-x86 architectures? We had DMA-related regressions
>> in the past which showed up on certain non-x86 architectures only.
> 
> Unfortunately, I currently only have access to x86 hardware. I cannot
> test this on a bare-metal ARM machine, only an ARM VM - which may not
> show those hardware issues. How is the testing typically handled for
> other architectures in a situation like this?
> 
> Thanks,
> Atharva.


^ permalink raw reply

* [PATCH RFC v2 8/9] platform/x86: ideapad-laptop: Serialize keyboard backlight notifications
From: Rong Zhang @ 2026-06-17 16:48 UTC (permalink / raw)
  To: Lee Jones, Pavel Machek, Jonathan Corbet, Shuah Khan,
	Thomas Weißschuh, Benson Leung, Guenter Roeck,
	Marek Behún, Mark Pearson, Derek J. Clark, Hans de Goede,
	Ilpo Järvinen, Ike Panhc
  Cc: Andrew Lunn, Jakub Kicinski, Vishnu Sankar, Vishnu Sankar,
	linux-leds, netdev, linux-doc, linux-kernel, chrome-platform,
	platform-driver-x86, Rong Zhang
In-Reply-To: <20260618-leds-trigger-hw-changed-v2-0-c28c44053cf3@rong.moe>

ACPI notifications are delivered in dedicated work contexts and may
arrive simultaneously. The next change does considerably more work
while handling a notification, which opens the door to race conditions.

Introduce a new mutex to serialize keyboard backlight notifications and
close that window.

Signed-off-by: Rong Zhang <i@rong.moe>
---
 drivers/platform/x86/lenovo/ideapad-laptop.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/platform/x86/lenovo/ideapad-laptop.c b/drivers/platform/x86/lenovo/ideapad-laptop.c
index 40153dc9a5f2..97949094ead4 100644
--- a/drivers/platform/x86/lenovo/ideapad-laptop.c
+++ b/drivers/platform/x86/lenovo/ideapad-laptop.c
@@ -26,7 +26,9 @@
 #include <linux/jiffies.h>
 #include <linux/kernel.h>
 #include <linux/leds.h>
+#include <linux/lockdep.h>
 #include <linux/module.h>
+#include <linux/mutex.h>
 #include <linux/platform_device.h>
 #include <linux/platform_profile.h>
 #include <linux/power_supply.h>
@@ -228,6 +230,8 @@ struct ideapad_private {
 		int type;
 		struct led_classdev led;
 		atomic_t last_hw_brightness;
+
+		struct mutex notif_mutex; /* protects notifications */
 	} kbd_bl;
 	struct {
 		bool initialized;
@@ -1720,6 +1724,8 @@ static void ideapad_kbd_bl_notify(struct ideapad_private *priv)
 	if (!priv->kbd_bl.initialized)
 		return;
 
+	guard(mutex)(&priv->kbd_bl.notif_mutex);
+
 	hw_brightness = ideapad_kbd_bl_hw_brightness_get(priv);
 	if (hw_brightness < 0)
 		return;
@@ -1747,6 +1753,10 @@ static int ideapad_kbd_bl_init(struct ideapad_private *priv)
 	if (WARN_ON(priv->kbd_bl.initialized))
 		return -EEXIST;
 
+	err = devm_mutex_init(&priv->platform_device->dev, &priv->kbd_bl.notif_mutex);
+	if (err)
+		return err;
+
 	hw_brightness = ideapad_kbd_bl_hw_brightness_get(priv);
 	if (hw_brightness < 0)
 		return hw_brightness;

-- 
2.53.0
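
As background for the guard(mutex)() line above: the kernel's scoped
guards are built on the compiler's cleanup attribute, so the lock is
dropped on every return path. A minimal userspace sketch of that idea
follows, assuming GCC or Clang. toy_mutex, TOY_GUARD() and toy_notify()
are hypothetical names, and the "mutex" is a plain flag rather than a
real lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-threaded "mutex"; it only records its state. */
struct toy_mutex {
	bool held;
	int acquisitions;
};

static void toy_mutex_lock(struct toy_mutex *m)
{
	assert(!m->held);
	m->held = true;
	m->acquisitions++;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
	m->held = false;
}

static struct toy_mutex *toy_guard_acquire(struct toy_mutex *m)
{
	toy_mutex_lock(m);
	return m;
}

/* Cleanup handlers receive a pointer to the annotated variable. */
static void toy_guard_release(struct toy_mutex **m)
{
	if (*m)
		toy_mutex_unlock(*m);
}

/* Lock now, unlock automatically when 'name' goes out of scope. */
#define TOY_GUARD(name, m) \
	struct toy_mutex *name __attribute__((cleanup(toy_guard_release))) = \
		toy_guard_acquire(m)

/* Shaped like the notify handler: several early returns, one guard. */
static int toy_notify(struct toy_mutex *m, int hw_brightness, int *last)
{
	TOY_GUARD(guard, m);

	if (hw_brightness < 0)
		return -1;	/* error path: guard still unlocks */

	if (hw_brightness == *last)
		return 0;	/* nothing changed */

	*last = hw_brightness;
	return 1;
}
```

Each call takes the lock exactly once and leaves it released regardless
of which return statement is hit, which is why the patch needs no
explicit unlock calls.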


^ permalink raw reply related

* [PATCH RFC v2 7/9] platform/x86: ideapad-laptop: Decouple hardware & classdev brightness for keyboard backlight
From: Rong Zhang @ 2026-06-17 16:48 UTC (permalink / raw)
  To: Lee Jones, Pavel Machek, Jonathan Corbet, Shuah Khan,
	Thomas Weißschuh, Benson Leung, Guenter Roeck,
	Marek Behún, Mark Pearson, Derek J. Clark, Hans de Goede,
	Ilpo Järvinen, Ike Panhc
  Cc: Andrew Lunn, Jakub Kicinski, Vishnu Sankar, Vishnu Sankar,
	linux-leds, netdev, linux-doc, linux-kernel, chrome-platform,
	platform-driver-x86, Rong Zhang
In-Reply-To: <20260618-leds-trigger-hw-changed-v2-0-c28c44053cf3@rong.moe>

Some recent models come with an ambient light sensor (ALS). On these
models, their EC will automatically set the keyboard backlight to an
appropriate brightness when the effective "hardware brightness" is 3.
"Hardware brightness" can't be perfectly mapped to an LED classdev
brightness, but the EC does use this predefined brightness value to
represent auto mode.

Currently, the code processing keyboard backlight is coupled with LED
classdev, making it hard to expose the auto brightness (ALS) mode to the
userspace.

As the first step toward the goal, decouple hardware brightness from LED
classdev brightness, and update comments about corresponding backlight
modes.

Since upcoming changes will heavily rely on kbd_bl.last_hw_brightness,
also convert it into an atomic_t to prevent potential race conditions.

To minimize the diff in upcoming changes, also refactor the
initialization path into an equivalent form.

Signed-off-by: Rong Zhang <i@rong.moe>
---
 drivers/platform/x86/lenovo/Kconfig          |   1 +
 drivers/platform/x86/lenovo/ideapad-laptop.c | 148 ++++++++++++++++++---------
 2 files changed, 103 insertions(+), 46 deletions(-)

diff --git a/drivers/platform/x86/lenovo/Kconfig b/drivers/platform/x86/lenovo/Kconfig
index 09b1b055d2e0..76ed1593e2aa 100644
--- a/drivers/platform/x86/lenovo/Kconfig
+++ b/drivers/platform/x86/lenovo/Kconfig
@@ -16,6 +16,7 @@ config IDEAPAD_LAPTOP
 	select INPUT_SPARSEKMAP
 	select NEW_LEDS
 	select LEDS_CLASS
+	select LEDS_TRIGGERS
 	help
 	  This is a driver for Lenovo IdeaPad netbooks contains drivers for
 	  rfkill switch, hotkey, fan control and backlight control.
diff --git a/drivers/platform/x86/lenovo/ideapad-laptop.c b/drivers/platform/x86/lenovo/ideapad-laptop.c
index 4fbc904f1fc3..40153dc9a5f2 100644
--- a/drivers/platform/x86/lenovo/ideapad-laptop.c
+++ b/drivers/platform/x86/lenovo/ideapad-laptop.c
@@ -9,6 +9,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/acpi.h>
+#include <linux/atomic.h>
 #include <linux/backlight.h>
 #include <linux/bitfield.h>
 #include <linux/bitops.h>
@@ -134,10 +135,31 @@ enum {
 };
 
 /*
- * These correspond to the number of supported states - 1
- * Future keyboard types may need a new system, if there's a collision
- * KBD_BL_TRISTATE_AUTO has no way to report or set the auto state
- * so it effectively has 3 states, but needs to handle 4
+ * The enumeration has two purposes:
+ *   - as an internal identifier for all known types of keyboard backlight
+ *   - as a mandatory parameter of the KBLC command
+ *
+ * For each type, the hardware brightness values are defined as follows:
+ * +--------------------------+----------+-----+------+------+
+ * |      Hardware brightness |        0 |   1 |    2 |    3 |
+ * | Type                     |          |     |      |      |
+ * +--------------------------+----------+-----+------+------+
+ * | KBD_BL_STANDARD          |      off |  on |  N/A |  N/A |
+ * +--------------------------+----------+-----+------+------+
+ * | KBD_BL_TRISTATE          |      off | low | high |  N/A |
+ * +--------------------------+----------+-----+------+------+
+ * | KBD_BL_TRISTATE_AUTO     |      off | low | high | auto |
+ * +--------------------------+----------+-----+------+------+
+ *
+ * We map LED classdev brightness for KBD_BL_TRISTATE_AUTO as follows:
+ * +--------------------------+----------+-----+------+
+ * |  LED classdev brightness |        0 |   1 |    2 |
+ * | Operation                |          |     |      |
+ * +--------------------------+----------+-----+------+
+ * | Read                     | off/auto | low | high |
+ * +--------------------------+----------+-----+------+
+ * | Write                    |      off | low | high |
+ * +--------------------------+----------+-----+------+
  */
 enum {
 	KBD_BL_STANDARD      = 1,
@@ -145,6 +167,8 @@ enum {
 	KBD_BL_TRISTATE_AUTO = 3,
 };
 
+#define KBD_BL_AUTO_MODE_HW_BRIGHTNESS	3
+
 #define KBD_BL_QUERY_TYPE		0x1
 #define KBD_BL_TRISTATE_TYPE		0x5
 #define KBD_BL_TRISTATE_AUTO_TYPE	0x7
@@ -203,7 +227,7 @@ struct ideapad_private {
 		bool initialized;
 		int type;
 		struct led_classdev led;
-		unsigned int last_brightness;
+		atomic_t last_hw_brightness;
 	} kbd_bl;
 	struct {
 		bool initialized;
@@ -1592,7 +1616,24 @@ static int ideapad_kbd_bl_check_tristate(int type)
 	return (type == KBD_BL_TRISTATE) || (type == KBD_BL_TRISTATE_AUTO);
 }
 
-static int ideapad_kbd_bl_brightness_get(struct ideapad_private *priv)
+static int ideapad_kbd_bl_brightness_parse(struct ideapad_private *priv, int hw_brightness)
+{
+	/* Off, low or high */
+	if (hw_brightness <= priv->kbd_bl.led.max_brightness)
+		return hw_brightness;
+
+	/* Auto (controlled by EC according to ALS), report as off */
+	if (priv->kbd_bl.type == KBD_BL_TRISTATE_AUTO &&
+	    hw_brightness == KBD_BL_AUTO_MODE_HW_BRIGHTNESS)
+		return 0;
+
+	/* Unknown value */
+	dev_warn(&priv->platform_device->dev,
+		 "Unknown keyboard backlight value: %u", hw_brightness);
+	return -EINVAL;
+}
+
+static int ideapad_kbd_bl_hw_brightness_get(struct ideapad_private *priv)
 {
 	unsigned long value;
 	int err;
@@ -1606,21 +1647,7 @@ static int ideapad_kbd_bl_brightness_get(struct ideapad_private *priv)
 		if (err)
 			return err;
 
-		/* Convert returned value to brightness level */
-		value = FIELD_GET(KBD_BL_GET_BRIGHTNESS, value);
-
-		/* Off, low or high */
-		if (value <= priv->kbd_bl.led.max_brightness)
-			return value;
-
-		/* Auto, report as off */
-		if (value == priv->kbd_bl.led.max_brightness + 1)
-			return 0;
-
-		/* Unknown value */
-		dev_warn(&priv->platform_device->dev,
-			 "Unknown keyboard backlight value: %lu", value);
-		return -EINVAL;
+		return FIELD_GET(KBD_BL_GET_BRIGHTNESS, value);
 	}
 
 	err = eval_hals(priv->adev->handle, &value);
@@ -1630,6 +1657,16 @@ static int ideapad_kbd_bl_brightness_get(struct ideapad_private *priv)
 	return !!test_bit(HALS_KBD_BL_STATE_BIT, &value);
 }
 
+static int ideapad_kbd_bl_brightness_get(struct ideapad_private *priv)
+{
+	int hw_brightness = ideapad_kbd_bl_hw_brightness_get(priv);
+
+	if (hw_brightness < 0)
+		return hw_brightness;
+
+	return ideapad_kbd_bl_brightness_parse(priv, hw_brightness);
+}
+
 static enum led_brightness ideapad_kbd_bl_led_cdev_brightness_get(struct led_classdev *led_cdev)
 {
 	struct ideapad_private *priv = container_of(led_cdev, struct ideapad_private, kbd_bl.led);
@@ -1637,32 +1674,37 @@ static enum led_brightness ideapad_kbd_bl_led_cdev_brightness_get(struct led_cla
 	return ideapad_kbd_bl_brightness_get(priv);
 }
 
-static int ideapad_kbd_bl_brightness_set(struct ideapad_private *priv, unsigned int brightness)
+static int ideapad_kbd_bl_hw_brightness_set(struct ideapad_private *priv, int hw_brightness)
 {
-	int err;
 	unsigned long value;
 	int type = priv->kbd_bl.type;
+	int err;
 
 	if (ideapad_kbd_bl_check_tristate(type)) {
-		if (brightness > priv->kbd_bl.led.max_brightness)
-			return -EINVAL;
-
-		value = FIELD_PREP(KBD_BL_SET_BRIGHTNESS, brightness) |
+		value = FIELD_PREP(KBD_BL_SET_BRIGHTNESS, hw_brightness) |
 			FIELD_PREP(KBD_BL_COMMAND_TYPE, type) |
 			KBD_BL_COMMAND_SET;
 		err = exec_kblc(priv->adev->handle, value);
 	} else {
-		err = exec_sals(priv->adev->handle, brightness ? SALS_KBD_BL_ON : SALS_KBD_BL_OFF);
+		value = hw_brightness ? SALS_KBD_BL_ON : SALS_KBD_BL_OFF;
+		err = exec_sals(priv->adev->handle, value);
 	}
-
 	if (err)
 		return err;
 
-	priv->kbd_bl.last_brightness = brightness;
+	atomic_set(&priv->kbd_bl.last_hw_brightness, hw_brightness);
 
 	return 0;
 }
 
+static int ideapad_kbd_bl_brightness_set(struct ideapad_private *priv, int brightness)
+{
+	if (brightness > priv->kbd_bl.led.max_brightness)
+		return -EINVAL;
+
+	return ideapad_kbd_bl_hw_brightness_set(priv, brightness);
+}
+
 static int ideapad_kbd_bl_led_cdev_brightness_set(struct led_classdev *led_cdev,
 						  enum led_brightness brightness)
 {
@@ -1673,26 +1715,31 @@ static int ideapad_kbd_bl_led_cdev_brightness_set(struct led_classdev *led_cdev,
 
 static void ideapad_kbd_bl_notify(struct ideapad_private *priv)
 {
-	int brightness;
+	int hw_brightness, brightness, last_brightness, last_hw_brightness;
 
 	if (!priv->kbd_bl.initialized)
 		return;
 
-	brightness = ideapad_kbd_bl_brightness_get(priv);
-	if (brightness < 0)
+	hw_brightness = ideapad_kbd_bl_hw_brightness_get(priv);
+	if (hw_brightness < 0)
 		return;
 
-	if (brightness == priv->kbd_bl.last_brightness)
-		return;
+	brightness = ideapad_kbd_bl_brightness_parse(priv, hw_brightness);
+	if (brightness < 0)
+		return; /* Reject insane values early. */
 
-	priv->kbd_bl.last_brightness = brightness;
+	last_hw_brightness = atomic_xchg(&priv->kbd_bl.last_hw_brightness, hw_brightness);
+	if (hw_brightness == last_hw_brightness)
+		return;
 
-	led_classdev_notify_brightness_hw_changed(&priv->kbd_bl.led, brightness);
+	last_brightness = ideapad_kbd_bl_brightness_parse(priv, last_hw_brightness);
+	if (last_brightness < 0 || brightness != last_brightness)
+		led_classdev_notify_brightness_hw_changed(&priv->kbd_bl.led, brightness);
 }
 
 static int ideapad_kbd_bl_init(struct ideapad_private *priv)
 {
-	int brightness, err;
+	int hw_brightness, err;
 
 	if (!priv->features.kbd_bl)
 		return -ENODEV;
@@ -1700,21 +1747,30 @@ static int ideapad_kbd_bl_init(struct ideapad_private *priv)
 	if (WARN_ON(priv->kbd_bl.initialized))
 		return -EEXIST;
 
-	if (ideapad_kbd_bl_check_tristate(priv->kbd_bl.type))
-		priv->kbd_bl.led.max_brightness = 2;
-	else
-		priv->kbd_bl.led.max_brightness = 1;
+	hw_brightness = ideapad_kbd_bl_hw_brightness_get(priv);
+	if (hw_brightness < 0)
+		return hw_brightness;
 
-	brightness = ideapad_kbd_bl_brightness_get(priv);
-	if (brightness < 0)
-		return brightness;
+	atomic_set(&priv->kbd_bl.last_hw_brightness, hw_brightness);
 
-	priv->kbd_bl.last_brightness = brightness;
 	priv->kbd_bl.led.name                    = "platform::" LED_FUNCTION_KBD_BACKLIGHT;
 	priv->kbd_bl.led.brightness_get          = ideapad_kbd_bl_led_cdev_brightness_get;
 	priv->kbd_bl.led.brightness_set_blocking = ideapad_kbd_bl_led_cdev_brightness_set;
 	priv->kbd_bl.led.flags                   = LED_BRIGHT_HW_CHANGED | LED_RETAIN_AT_SHUTDOWN;
 
+	switch (priv->kbd_bl.type) {
+	case KBD_BL_TRISTATE_AUTO:
+	case KBD_BL_TRISTATE:
+		priv->kbd_bl.led.max_brightness = 2;
+		break;
+	case KBD_BL_STANDARD:
+		priv->kbd_bl.led.max_brightness = 1;
+		break;
+	default:
+		/* This has already been validated by ideapad_check_features(). */
+		unreachable();
+	}
+
 	err = led_classdev_register(&priv->platform_device->dev, &priv->kbd_bl.led);
 	if (err)
 		return err;

-- 
2.53.0
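
The mapping tables in the new comment can be exercised with a small
standalone sketch of the parse step. The toy_* names and enum values
are hypothetical and chosen only for this example. The negative-value
check is an addition for the sketch; in the patch, callers filter
negative values before calling ideapad_kbd_bl_brightness_parse().

```c
#include <errno.h>

/* Hypothetical stand-ins for the three keyboard backlight types. */
enum toy_kbd_bl_type {
	TOY_STANDARD,		/* off / on */
	TOY_TRISTATE,		/* off / low / high */
	TOY_TRISTATE_AUTO,	/* off / low / high / auto */
};

/* Mirrors KBD_BL_AUTO_MODE_HW_BRIGHTNESS (3) from the patch. */
#define TOY_AUTO_HW_BRIGHTNESS 3

static int toy_max_brightness(enum toy_kbd_bl_type type)
{
	return type == TOY_STANDARD ? 1 : 2;
}

/*
 * Map a hardware brightness to an LED classdev brightness:
 * values up to max_brightness pass through, auto mode reads as 0
 * (off), and anything else is rejected.
 */
static int toy_brightness_parse(enum toy_kbd_bl_type type, int hw_brightness)
{
	if (hw_brightness < 0)
		return -EINVAL;

	if (hw_brightness <= toy_max_brightness(type))
		return hw_brightness;

	if (type == TOY_TRISTATE_AUTO &&
	    hw_brightness == TOY_AUTO_HW_BRIGHTNESS)
		return 0;

	return -EINVAL;
}
```

Note that hardware value 3 is only meaningful for the auto-capable
type; for a plain tristate keyboard it is treated as an unknown value.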


^ permalink raw reply related

* [PATCH RFC v2 6/9] leds: trigger: Add led_trigger_notify_hw_control_changed() interface
From: Rong Zhang @ 2026-06-17 16:48 UTC (permalink / raw)
  To: Lee Jones, Pavel Machek, Jonathan Corbet, Shuah Khan,
	Thomas Weißschuh, Benson Leung, Guenter Roeck,
	Marek Behún, Mark Pearson, Derek J. Clark, Hans de Goede,
	Ilpo Järvinen, Ike Panhc
  Cc: Andrew Lunn, Jakub Kicinski, Vishnu Sankar, Vishnu Sankar,
	linux-leds, netdev, linux-doc, linux-kernel, chrome-platform,
	platform-driver-x86, Rong Zhang
In-Reply-To: <20260618-leds-trigger-hw-changed-v2-0-c28c44053cf3@rong.moe>

Some hardware can autonomously activate/deactivate hardware control.
After that, the LED hardware notifies the LED driver. Currently, there
is no mechanism for LED drivers to notify the LED core about such events
and initiate a trigger transition to reflect the hardware state.

Add a new interface called led_trigger_notify_hw_control_changed(), so
that LED drivers can call it to notify the LED core about the
transition.

The interface only allows two transitions:

1. "none" => private trigger
2. private trigger => "none"

If the current trigger is neither the private trigger nor "none", no
transition will be made. This protects the currently selected software
trigger.

Note that LED_OFF won't be emitted during the #2 transition, as some
hardware may have selected a new brightness level during its hardware
state transition (e.g., laptop keyboards with a shortcut cycling through
different backlight brightnesses and auto mode).
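
The transition rules above, together with the LED_OFF exception, can be
sketched as a tiny state machine. This is an illustration only: the
toy_* names are hypothetical, and the real code operates on struct
led_classdev and struct led_trigger under trigger_lock.

```c
#include <stdbool.h>

/* Hypothetical summary of what the classdev's current trigger is. */
enum toy_trigger {
	TOY_TRIG_NONE,		/* "none" */
	TOY_TRIG_PRIVATE,	/* the LED's private hw control trigger */
	TOY_TRIG_OTHER,		/* any software trigger chosen by the user */
};

/* Return the trigger that is active after a hardware notification. */
static enum toy_trigger toy_hw_control_changed(enum toy_trigger cur,
					       bool activate)
{
	if (cur == TOY_TRIG_NONE && activate)
		return TOY_TRIG_PRIVATE;	/* transition #1 */

	if (cur == TOY_TRIG_PRIVATE && !activate)
		return TOY_TRIG_NONE;		/* transition #2 */

	return cur;	/* no-op, or a software trigger being protected */
}

/*
 * When the old trigger is removed, brightness is reset to LED_OFF
 * unless this is a hardware-initiated switch to "none".
 */
static bool toy_resets_brightness(bool has_new_trigger, bool hw_triggered)
{
	return has_new_trigger || !hw_triggered;
}
```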

The interface is designed as a void function as any failure should be
non-fatal and the result of transition should not have any impact on the
LED drivers' event handling procedures.

To use the interface, LEDS_TRIGGERS_HW_CHANGED must be enabled in
Kconfig, and the LED driver must set the LED_TRIG_HW_CHANGED flag for
the classdev.

Signed-off-by: Rong Zhang <i@rong.moe>
---
 Documentation/leds/leds-class.rst | 61 +++++++++++++++++++++++++++
 drivers/leds/led-triggers.c       | 86 +++++++++++++++++++++++++++++++++++++--
 drivers/leds/trigger/Kconfig      |  9 ++++
 include/linux/leds.h              |  8 ++++
 4 files changed, 161 insertions(+), 3 deletions(-)

diff --git a/Documentation/leds/leds-class.rst b/Documentation/leds/leds-class.rst
index 41342ecb5f6b..f250dc938e1f 100644
--- a/Documentation/leds/leds-class.rst
+++ b/Documentation/leds/leds-class.rst
@@ -261,9 +261,70 @@ the end use hw_control_set to activate hw control.
 A trigger can use hw_control_get to check if a LED is already in hw control
 and init their flags.
 
+Alternatively, a private trigger can be implemented along with the LED driver if
+the LED's hw control doesn't fit any generic trigger. To associate the private
+trigger with the LED classdev, their `trigger_type` must be the same. The name
+of the private trigger must be the same as `hw_control_trigger`. Since both the
+LED classdev and the private trigger are in the same LED driver, it's not
+necessary for them to coordinate via `hw_control_*` callbacks.
+
 When the LED is in hw control, no software blink is possible and doing so
 will effectively disable hw control.
 
+Hardware-initiated trigger transition
+=====================================
+
+Some hardware can autonomously activate/deactivate hardware control. After that,
+the LED hardware notifies the LED driver.
+
+If the driver can detect such transitions and thus wants to notify the LED core
+to update the current trigger then the `LED_TRIG_HW_CHANGED` flag must be set in
+flags before registering. To update the current trigger accordingly, call
+`led_trigger_notify_hw_control_changed` on the LED classdev. Calling the method
+on a classdev not registered with the `LED_TRIG_HW_CHANGED` flag or an
+appropriate `hw_control_trigger` string is a bug and will trigger a WARN_ON.
+
+This capability is restricted to the LED device's private trigger. The private
+trigger must have been properly registered (see above) and named after
+`hw_control_trigger`, or else a dev_err() will be triggered.
+
+Only two transitions are defined:
+
+- "none" => private trigger:
+        This happens when the hardware autonomously activates hardware control
+        and when "none" (i.e., no trigger) is currently active. If the private
+        trigger is already active when the method is called, this is essentially
+        a no-op.
+
+        The activation sequence for the private trigger will be executed as
+        normal.
+
+        The LED driver and its private trigger must be able to handle the
+        activation sequence even if the hardware is currently in hardware
+        control.
+
+        If an error occurs in the activation sequence, the LED Trigger core
+        reverts the effective trigger to "none".
+
+- private trigger => "none"
+        This happens when the hardware autonomously deactivates hardware control
+        and when the private trigger is currently active. If "none" (i.e., no
+        trigger) is active when the method is called, this is essentially a
+        no-op.
+
+        The deactivation sequence for the private trigger will be executed as
+        normal, except that the current LED brightness is retained. The reason
+        for keeping the brightness unchanged is that some hardware may choose a
+        specific brightness instead of simply turning off the LED after
+        autonomously deactivating hardware control.
+
+        The LED driver and its private trigger must be able to handle the
+        deactivation sequence even if the hardware is not currently in hardware
+        control.
+
+If the current trigger is neither the private trigger nor "none", no transition
+will be made.
+
 Known Issues
 ============
 
diff --git a/drivers/leds/led-triggers.c b/drivers/leds/led-triggers.c
index c43229d9c4c1..73e9ce376d02 100644
--- a/drivers/leds/led-triggers.c
+++ b/drivers/leds/led-triggers.c
@@ -7,6 +7,7 @@
  * Author: Richard Purdie <rpurdie@openedhand.com>
  */
 
+#include <linux/bug.h>
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/list.h>
@@ -162,8 +163,8 @@ ssize_t led_trigger_read(struct file *filp, struct kobject *kobj,
 }
 EXPORT_SYMBOL_GPL(led_trigger_read);
 
-/* Caller must ensure led_cdev->trigger_lock held */
-int led_trigger_set(struct led_classdev *led_cdev, struct led_trigger *trig)
+static int __led_trigger_set(struct led_classdev *led_cdev, struct led_trigger *trig,
+			     bool hw_triggered)
 {
 	char *event = NULL;
 	char *envp[2];
@@ -194,7 +195,21 @@ int led_trigger_set(struct led_classdev *led_cdev, struct led_trigger *trig)
 		led_cdev->trigger_data = NULL;
 		led_cdev->activated = false;
 		led_cdev->flags &= ~LED_INIT_DEFAULT_TRIGGER;
-		led_set_brightness(led_cdev, LED_OFF);
+
+		/*
+		 * Hardware may have selected a new brightness level during its
+		 * hardware control transition, so only reset brightness if we
+		 * are switching to another trigger or if the switching is not
+		 * hardware triggered.
+		 *
+		 * Note that this does not apply to the error path: reaching it
+		 * implies a failed none => private trigger transition, which
+		 * suggests a fundamental bug in the LED driver or its private
+		 * trigger. Reset the brightness in that case rather than leave
+		 * the LED in an undefined state.
+		 */
+		if (trig || !hw_triggered)
+			led_set_brightness(led_cdev, LED_OFF);
 	}
 	if (trig) {
 		spin_lock(&trig->leddev_list_lock);
@@ -258,6 +273,12 @@ int led_trigger_set(struct led_classdev *led_cdev, struct led_trigger *trig)
 
 	return ret;
 }
+
+/* Caller must ensure led_cdev->trigger_lock held */
+int led_trigger_set(struct led_classdev *led_cdev, struct led_trigger *trig)
+{
+	return __led_trigger_set(led_cdev, trig, false);
+}
 EXPORT_SYMBOL_GPL(led_trigger_set);
 
 void led_trigger_remove(struct led_classdev *led_cdev)
@@ -448,6 +469,65 @@ int devm_led_trigger_register(struct device *dev,
 }
 EXPORT_SYMBOL_GPL(devm_led_trigger_register);
 
+#ifdef CONFIG_LEDS_TRIGGERS_HW_CHANGED
+static void led_trigger_do_hw_control_transition(struct led_classdev *led_cdev, bool activate,
+						 struct led_trigger *hc_trig)
+{
+	int err = 0;
+
+	if (!led_cdev->trigger) {
+		/* "none" => private trigger. */
+		if (activate)
+			err = __led_trigger_set(led_cdev, hc_trig, true);
+	} else if (led_cdev->trigger == hc_trig) {
+		/* private trigger => "none". */
+		if (!activate)
+			err = __led_trigger_set(led_cdev, NULL, true);
+	} else {
+		/* Other trigger is active. */
+		dev_dbg(led_cdev->dev,
+			"Ignoring hw control transition (%s %s) while %s is active\n",
+			activate ? "activate" : "deactivate", hc_trig->name,
+			led_cdev->trigger->name);
+
+		return;
+	}
+
+	if (err)
+		dev_warn(led_cdev->dev, "Failed to %s %s in hw control transition: %d\n",
+			 activate ? "activate" : "deactivate", hc_trig->name, err);
+}
+
+void led_trigger_notify_hw_control_changed(struct led_classdev *led_cdev, bool activate)
+{
+	struct led_trigger *trig;
+
+	/* Restricted to private triggers. */
+	if (WARN_ON(!(led_cdev->flags & LED_TRIG_HW_CHANGED) ||
+		    !led_cdev->hw_control_trigger || !led_cdev->trigger_type))
+		return;
+
+	down_read(&triggers_list_lock);
+	list_for_each_entry(trig, &trigger_list, next_trig) {
+		if (trig->trigger_type == led_cdev->trigger_type &&
+		    !strcmp(trig->name, led_cdev->hw_control_trigger)) {
+			down_write(&led_cdev->trigger_lock);
+			led_trigger_do_hw_control_transition(led_cdev, activate, trig);
+			up_write(&led_cdev->trigger_lock);
+
+			up_read(&triggers_list_lock);
+			return;
+		}
+	}
+	up_read(&triggers_list_lock);
+
+	dev_err(led_cdev->dev,
+		"%s() was called, but the private trigger (%s) is not registered\n",
+		__func__, led_cdev->hw_control_trigger);
+}
+EXPORT_SYMBOL_GPL(led_trigger_notify_hw_control_changed);
+#endif /* CONFIG_LEDS_TRIGGERS_HW_CHANGED */
+
 /* Simple LED Trigger Interface */
 
 void led_trigger_event(struct led_trigger *trig,
diff --git a/drivers/leds/trigger/Kconfig b/drivers/leds/trigger/Kconfig
index c11282a74b5a..798122154049 100644
--- a/drivers/leds/trigger/Kconfig
+++ b/drivers/leds/trigger/Kconfig
@@ -9,6 +9,15 @@ menuconfig LEDS_TRIGGERS
 
 if LEDS_TRIGGERS
 
+config LEDS_TRIGGERS_HW_CHANGED
+	bool "LED hardware-initiated trigger transition support"
+	help
+	  This option enables support for hardware-initiated hardware control
+	  transitions, where the LED hardware autonomously switches between
+	  "none" (i.e., no trigger) and its private trigger.
+
+	  See Documentation/leds/leds-class.rst for details.
+
 config LEDS_TRIGGER_TIMER
 	tristate "LED Timer Trigger"
 	help
diff --git a/include/linux/leds.h b/include/linux/leds.h
index 7332034a43c8..479391ddf5e5 100644
--- a/include/linux/leds.h
+++ b/include/linux/leds.h
@@ -109,6 +109,7 @@ struct led_classdev {
 #define LED_INIT_DEFAULT_TRIGGER BIT(23)
 #define LED_REJECT_NAME_CONFLICT BIT(24)
 #define LED_MULTI_COLOR		BIT(25)
+#define LED_TRIG_HW_CHANGED	BIT(26)
 
 	/* set_brightness_work / blink_timer flags, atomic, private. */
 	unsigned long		work_flags;
@@ -599,6 +600,13 @@ led_trigger_get_brightness(const struct led_trigger *trigger)
 
 #endif /* CONFIG_LEDS_TRIGGERS */
 
+#ifdef CONFIG_LEDS_TRIGGERS_HW_CHANGED
+void led_trigger_notify_hw_control_changed(struct led_classdev *led_cdev, bool activate);
+#else
+static inline void led_trigger_notify_hw_control_changed(struct led_classdev *led_cdev,
+							 bool activate) {}
+#endif
+
 /* Trigger specific enum */
 enum led_trigger_netdev_modes {
 	TRIGGER_NETDEV_LINK = 0,

-- 
2.53.0


^ permalink raw reply related
