Netdev List
* Re: [PATCH v2 5/5] net: qrtr: ns: Fix use-after-free in driver remove()
From: Manivannan Sadhasivam @ 2026-04-08 16:23 UTC
  To: Paolo Abeni
  Cc: manivannan.sadhasivam, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Simon Horman, linux-arm-msm, netdev, linux-kernel,
	stable
In-Reply-To: <0ab00cb4-8335-472d-b43e-3bbd99b41480@redhat.com>

On Tue, Apr 07, 2026 at 05:33:55PM +0200, Paolo Abeni wrote:
> On 4/3/26 6:06 PM, Manivannan Sadhasivam via B4 Relay wrote:
> > From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> > 
> > In the remove callback, if a packet arrives after destroy_workqueue() is
> > called, but before sock_release(), the qrtr_ns_data_ready() callback will
> > try to queue the work, causing use-after-free issue.
> > 
> > Fix this issue by saving the default 'sk_data_ready' callback during
> > qrtr_ns_init() and use it to replace the qrtr_ns_data_ready() callback at
> > the start of remove(). This ensures that even if a packet arrives after
> > destroy_workqueue(), the work struct will not be dereferenced.
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: 0c2204a4ad71 ("net: qrtr: Migrate nameservice to kernel from userspace")
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> > ---
> >  net/qrtr/ns.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c
> > index dfb5dad9473c..c62d79e03d64 100644
> > --- a/net/qrtr/ns.c
> > +++ b/net/qrtr/ns.c
> > @@ -25,6 +25,7 @@ static struct {
> >  	u32 lookup_count;
> >  	struct workqueue_struct *workqueue;
> >  	struct work_struct work;
> > +	void (*saved_data_ready)(struct sock *sk);
> >  	int local_node;
> >  } qrtr_ns;
> >  
> > @@ -754,6 +755,7 @@ int qrtr_ns_init(void)
> >  		goto err_sock;
> >  	}
> >  
> > +	qrtr_ns.saved_data_ready = qrtr_ns.sock->sk->sk_data_ready;
> >  	qrtr_ns.sock->sk->sk_data_ready = qrtr_ns_data_ready;
> >  
> >  	sq.sq_port = QRTR_PORT_CTRL;
> > @@ -803,6 +805,10 @@ EXPORT_SYMBOL_GPL(qrtr_ns_init);
> >  
> >  void qrtr_ns_remove(void)
> >  {
> > +	write_lock_bh(&qrtr_ns.sock->sk->sk_callback_lock);
> > +	qrtr_ns.sock->sk->sk_data_ready = qrtr_ns.saved_data_ready;
> > +	write_unlock_bh(&qrtr_ns.sock->sk->sk_callback_lock);
> 
> Sashiko says:
> 
> ---
> Does this lock adequately protect against concurrent callback execution?
> In the network receive path, __sock_queue_rcv_skb() typically evaluates
> !sock_flag(sk, SOCK_DEAD) and invokes sk->sk_data_ready() locklessly,
> without acquiring sk_callback_lock or being in an RCU read-side
> critical section.
> If a thread processing a packet fetches the qrtr_ns_data_ready pointer
> and is preempted, could it resume and execute the callback after
> qrtr_ns_remove() has already finished destroying the workqueue?
> ---
> 

This is a legitimate concern. I believe adding synchronize_net() before
destroy_workqueue() will ensure that all the RX packets are flushed before
destroying the workqueue.
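
For readers following along, the callback save/restore step from the patch
can be modelled in plain userspace C. This is only a single-threaded sketch:
every model_* name and the 'ns' struct are hypothetical stand-ins, not the
qrtr code, and it deliberately does not model the preemption race raised
above (that window is what the proposed synchronize_net() is meant to close).

```c
#include <stdbool.h>
#include <stddef.h>

struct model_sock;
typedef void (*data_ready_fn)(struct model_sock *sk);

/* Hypothetical stand-in for struct sock: only the callback pointer. */
struct model_sock {
	data_ready_fn sk_data_ready;
};

/* Hypothetical stand-in for the qrtr_ns state touched by the patch. */
static struct {
	bool workqueue_alive;		/* false once the workqueue is "destroyed" */
	int queued;			/* work queued while the workqueue was alive */
	int use_after_destroy;		/* queue attempts after destruction */
	data_ready_fn saved_data_ready;
	struct model_sock sock;
} ns;

/* Models the socket's default callback: touches no nameservice state. */
static void model_default_data_ready(struct model_sock *sk)
{
	(void)sk;
}

/* Models qrtr_ns_data_ready(): tries to queue work on the workqueue. */
static void model_ns_data_ready(struct model_sock *sk)
{
	(void)sk;
	if (ns.workqueue_alive)
		ns.queued++;
	else
		ns.use_after_destroy++;
}

/* Models init: save the default callback, then install our own. */
static void model_init(void)
{
	ns.workqueue_alive = true;
	ns.queued = 0;
	ns.use_after_destroy = 0;
	ns.sock.sk_data_ready = model_default_data_ready;
	ns.saved_data_ready = ns.sock.sk_data_ready;
	ns.sock.sk_data_ready = model_ns_data_ready;
}

/* Models the start of remove(), with or without the callback restore. */
static void model_remove_begin(bool restore_first)
{
	if (restore_first)
		ns.sock.sk_data_ready = ns.saved_data_ready;
	/* In the kernel, the proposed synchronize_net() would sit here. */
	ns.workqueue_alive = false;
}

/* Models a packet arriving: the receive path calls through the pointer. */
static void model_packet_arrives(void)
{
	ns.sock.sk_data_ready(&ns.sock);
}
```

Calling model_remove_begin(false) followed by model_packet_arrives() bumps
use_after_destroy, while model_remove_begin(true) leaves it at zero because
the late packet only reaches the default callback.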

> There are more remarks from sashiko:
> 
> https://sashiko.dev/#/patchset/20260403-qrtr-fix-v2-0-f88a14859c63%40oss.qualcomm.com
> 
> AFAICS they are pre-existing issues or false positives, but please have a
> look.
> 

There are a couple of issues in the error path mentioned there that are worth
fixing, and I'll incorporate those in the next revision. The issues not related
to this series I will defer to a follow-up series.

- Mani

-- 
மணிவண்ணன் சதாசிவம்


* Re: BUG: net-next (7.0-rc6 based and later) fails to boot on Jetson Xavier NX
From: Linus Torvalds @ 2026-04-08 16:22 UTC
  To: Russell King (Oracle)
  Cc: netdev, linux-arm-kernel, linux-kernel, iommu, linux-ext4,
	dmaengine, Marek Szyprowski, Robin Murphy, Theodore Ts'o,
	Andreas Dilger, Vinod Koul, Frank Li
In-Reply-To: <adZ9grUg71f518Fg@shell.armlinux.org.uk>

On Wed, 8 Apr 2026 at 09:08, Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> The rebase is still progressing, but it's landed on:
>
> c7d812e33f3e dmaengine: xilinx: xilinx_dma: Fix unmasked residue subtraction

Well, that commit looks completely bogus.

The explanation is just garbage: when subtracting two values that may
have random crud in the top bits, it's actually likely *better* to do
the masking *after* the subtraction.

The subtract of bogus upper bits will only affect upper bits. The
carry-chain only works upwards, not downwards.
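
As a userspace illustration of the carry-chain point (a sketch, not driver
code): the helper names and the 26-bit EXAMPLE_LEN_MASK below are made up,
and the only assumption is that max_buffer_len is a low-bit mask of the
form 2^n - 1.

```c
#include <stdint.h>

/* Hypothetical mask (low 26 bits set), standing in for max_buffer_len. */
#define EXAMPLE_LEN_MASK ((uint32_t)0x03ffffff)

/* Subtract first, mask afterwards: the shape of the old expression. */
static uint32_t sub_then_mask(uint32_t control, uint32_t status)
{
	return (control - status) & EXAMPLE_LEN_MASK;
}

/*
 * Overlay arbitrary "crud" on the bits above the mask and report whether
 * the masked difference is unchanged (1) or not (0).
 */
static int upper_crud_is_harmless(uint32_t control, uint32_t status,
				  uint32_t crud_c, uint32_t crud_s)
{
	uint32_t lo_c = control & EXAMPLE_LEN_MASK;
	uint32_t lo_s = status & EXAMPLE_LEN_MASK;
	uint32_t dirty_c = lo_c | (crud_c & ~EXAMPLE_LEN_MASK);
	uint32_t dirty_s = lo_s | (crud_s & ~EXAMPLE_LEN_MASK);

	return sub_then_mask(lo_c, lo_s) == sub_then_mask(dirty_c, dirty_s);
}
```

Whatever crud values are passed in, upper_crud_is_harmless() returns 1,
because borrows in a subtraction only propagate towards the higher bits.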

So the old code that did

                       residue += (cdma_hw->control - cdma_hw->status) &
                                  chan->xdev->max_buffer_len;

would correctly mask out the upper bits, and the result of the
subtraction would be done "modulo max_buffer_len". Which is rather
reasonable.

The code was changed to

                       residue += (cdma_hw->control & chan->xdev->max_buffer_len) -
                                  (cdma_hw->status & chan->xdev->max_buffer_len);

and now it does obviously still mask out the upper bits (on each of the
values), but then the subtraction is done "modulo the arithmetic C
type" (which is 'u32').

In particular, if the status bits are bigger than the control bits,
that residue addition will now add a *huge* 32-bit number. It used to
add a number that was limited by the max_buffer_len mask.
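
A small userspace sketch of the two expressions side by side may help here.
It is not the driver code: residue_old(), residue_new() and the 26-bit
EXAMPLE_LEN_MASK are hypothetical names and values chosen for illustration,
with uint32_t standing in for 'u32'.

```c
#include <stdint.h>

/* Hypothetical mask (low 26 bits set), standing in for max_buffer_len. */
#define EXAMPLE_LEN_MASK ((uint32_t)0x03ffffff)

/* Old form: subtract, then mask. The result never exceeds the mask. */
static uint32_t residue_old(uint32_t control, uint32_t status)
{
	return (control - status) & EXAMPLE_LEN_MASK;
}

/* New form: mask each operand, then subtract. The result wraps modulo 2^32. */
static uint32_t residue_new(uint32_t control, uint32_t status)
{
	return (control & EXAMPLE_LEN_MASK) - (status & EXAMPLE_LEN_MASK);
}
```

With control = 0x100 and status = 0x40 both forms give 0xc0. With
control = 0x10 and status = 0x20 the old form gives 0x03fffff0, still within
the mask, while the new form gives 0xfffffff0.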

So the "interference from those top bits" stated in the commit message
is simply NOT TRUE. It's just complete rambling garbage.

Instead, the commit purely changes the final modulus of the
subtraction - which has nothing to do with any upper bits, and
everything to do with what kind of answer you want.

I think that commit is just very very wrong. At least the commit
message is wrong. And see above why I think the changed arithmetic is
likely wrong too.

It's very possible that the 'residue' is now a random 32-bit number
with the high bits set, and you get DMA corruption.

That would explain why this happens on Jetson but I haven't seen other reports.

                    Linus


* Re: [PATCH net v3 4/7] net/sched: netem: restructure dequeue to avoid re-entrancy with child qdisc
From: Simon Horman @ 2026-04-08 16:21 UTC
  To: Stephen Hemminger
  Cc: Jakub Kicinski, netdev, Jamal Hadi Salim, Jiri Pirko,
	David S. Miller, Eric Dumazet, Paolo Abeni, open list
In-Reply-To: <20260406101238.2d106bfd@phoenix.local>

On Mon, Apr 06, 2026 at 10:12:38AM -0700, Stephen Hemminger wrote:
> On Mon, 6 Apr 2026 08:41:33 -0700
> Jakub Kicinski <kuba@kernel.org> wrote:
> 
> > On Sat, 4 Apr 2026 10:49:46 +0100 Simon Horman wrote:
> > > On Thu, Apr 02, 2026 at 01:19:32PM -0700, Stephen Hemminger wrote:  
> > > > netem_dequeue() enqueues packets into its child qdisc while being
> > > > called from the parent's dequeue path. This causes two problems:
> > > > 
> > > > - HFSC tracks class active/inactive state on qlen transitions.
> > > >   A child enqueue during dequeue causes double-insertion into
> > > >   the eltree (CVE-2025-37890, CVE-2025-38001).
> > > > 
> > > > - Non-work-conserving children like TBF may refuse to dequeue
> > > >   packets just enqueued, causing netem to return NULL despite
> > > >   having backlog. Parents like DRR then incorrectly deactivate
> > > >   the class.
> > > > 
> > > > Split the dequeue into helpers:
> > > > 
> > > >   netem_pull_tfifo()    - remove head packet from tfifo
> > > >   netem_slot_account()  - update slot pacing counters
> > > >   netem_dequeue_child() - batch-transfer ready packets to the
> > > >                           child, then dequeue from the child
> > > >   netem_dequeue_direct()- dequeue from tfifo when no child
> > > > 
> > > > When a child qdisc is present, all time-ready packets are moved
> > > > into the child before calling its dequeue. This separates the
> > > > enqueue and dequeue phases so the parent sees consistent qlen
> > > > transitions.
> > > > 
> > > > Fixes: 50612537e9ab ("netem: fix classful handling")
> > > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > > ---
> > > >  net/sched/sch_netem.c | 201 +++++++++++++++++++++++++++---------------
> > > >  1 file changed, 128 insertions(+), 73 deletions(-)    
> > > 
> > > Hi Stephen,
> > > 
> > > As a fix this is a large and complex patch.
> > > Could it be split up somehow to aid review?  
> > 
> > +1, FWIW it's perfectly fine to have refactoring patch in a net series
> > (without a Fixes tag) if it makes the fix a lot easier to review.
> 
> I split it into refactoring followed by fix for next version
> 
> The fix alone just gets really confusing to look at;
> I got more confused by the pre-existing spaghetti code here.

Thanks Stephen,

Just to clarify, in case others come back to this thread for some reason,
that the aim here is to aid review. So whatever works in that direction is
appreciated.


* Re: BUG: net-next (7.0-rc6 based and later) fails to boot on Jetson Xavier NX
From: Russell King (Oracle) @ 2026-04-08 16:16 UTC
  To: netdev, linux-arm-kernel, linux-kernel, iommu, linux-ext4,
	Linus Torvalds, dmaengine
  Cc: Marek Szyprowski, Robin Murphy, Theodore Ts'o, Andreas Dilger,
	Vinod Koul, Frank Li
In-Reply-To: <adZ9grUg71f518Fg@shell.armlinux.org.uk>

On Wed, Apr 08, 2026 at 05:08:34PM +0100, Russell King (Oracle) wrote:
> The rebase is still progressing, but it's landed on:
> 
> c7d812e33f3e dmaengine: xilinx: xilinx_dma: Fix unmasked residue subtraction
> 
> and while this boots to a login prompt, it spat out a BUG():
> 
> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:591
> in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 56, name: kworker/u24:3
> preempt_count: 0, expected: 0
> RCU nest depth: 0, expected: 0
> 3 locks held by kworker/u24:3/56:
>  #0: ffff000080042148 ((wq_completion)events_unbound#2){+.+.}-{0:0}, at: process_one_work+0x184/0x780
>  #1: ffff80008299bdf8 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1ac/0x780
>  #2: ffff0000808b48f8 (&dev->mutex){....}-{4:4}, at: __device_attach+0x2c/0x188
> irq event stamp: 10872
> hardirqs last  enabled at (10871): [<ffff80008013a410>] ktime_get+0x130/0x180
> hardirqs last disabled at (10872): [<ffff800080d61ac8>] _raw_spin_lock_irqsave+0x84/0x88
> softirqs last  enabled at (9216): [<ffff80008002807c>] fpsimd_save_and_flush_current_state+0x3c/0x80
> softirqs last disabled at (9214): [<ffff800080028098>] fpsimd_save_and_flush_current_state+0x58/0x80
> CPU: 5 UID: 0 PID: 56 Comm: kworker/u24:3 Not tainted 7.0.0-rc1-bisect+ #654 PREEMPT
> Hardware name: NVIDIA NVIDIA Jetson Xavier NX Developer Kit/Jetson, BIOS 6.0-37391689 08/28/2024
> Workqueue: events_unbound deferred_probe_work_func
> Call trace:
>  show_stack+0x18/0x30 (C)
>  dump_stack_lvl+0x6c/0x94
>  dump_stack+0x18/0x24
>  __might_resched+0x154/0x220
>  __might_sleep+0x48/0x80
>  __mutex_lock+0x48/0x800
>  mutex_lock_nested+0x24/0x30
>  pinmux_disable_setting+0x9c/0x180
>  pinctrl_commit_state+0x5c/0x260
>  pinctrl_pm_select_idle_state+0x4c/0xa0
>  tegra_i2c_runtime_suspend+0x2c/0x3c
>  pm_generic_runtime_suspend+0x2c/0x44
>  __rpm_callback+0x48/0x1ec
>  rpm_callback+0x74/0x80
>  rpm_suspend+0xec/0x630
>  rpm_idle+0x2c0/0x420
>  __pm_runtime_idle+0x44/0x160
>  tegra_i2c_probe+0x2e4/0x640
>  platform_probe+0x5c/0xa4
>  really_probe+0xbc/0x2c0
>  __driver_probe_device+0x78/0x120
>  driver_probe_device+0x3c/0x160
>  __device_attach_driver+0xbc/0x160
>  bus_for_each_drv+0x70/0xb8
>  __device_attach+0xa4/0x188
>  device_initial_probe+0x50/0x54
>  bus_probe_device+0x38/0xa4
>  deferred_probe_work_func+0x90/0xcc
>  process_one_work+0x204/0x780
>  worker_thread+0x1c8/0x36c
>  kthread+0x138/0x144
>  ret_from_fork+0x10/0x20
> 
> This is reproducible.

I've just realised that it's the Tegra I2C bug that is already known
about, but took ages to be fixed in mainline - it's unrelated to the
memory corruption, so can be ignored. Sorry for the noise.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


* Re: BUG: net-next (7.0-rc6 based and later) fails to boot on Jetson Xavier NX
From: Russell King (Oracle) @ 2026-04-08 16:08 UTC
  To: netdev, linux-arm-kernel, linux-kernel, iommu, linux-ext4,
	Linus Torvalds, dmaengine
  Cc: Marek Szyprowski, Robin Murphy, Theodore Ts'o, Andreas Dilger,
	Vinod Koul, Frank Li
In-Reply-To: <adZfTi3R6jtsjXx-@shell.armlinux.org.uk>

On Wed, Apr 08, 2026 at 02:59:42PM +0100, Russell King (Oracle) wrote:
> On Wed, Apr 08, 2026 at 02:07:36PM +0100, Russell King (Oracle) wrote:
> > Hi,
> > 
> > Just a heads-up that current net-next (v7.0-rc6 based) fails to boot on
> > my nVidia Jetson Xavier platform. v7.0-rc5 and v6.14 based net-next both
> > boot fine. This is an arm64 platform.
> > 
> > The problem appears to be completely random in terms of its symptoms,
> > and looks like severe memory corruption - every boot seems to produce
> > a different problem. The common theme is, although the kernel gets to
> > userspace, it never gets anywhere close to a login prompt before
> > failing in some way.
> > 
> > The last net-next+ boot (which is currently v7.0-rc6 based) resulted
> > in:
> > 
> > tegra-mc 2c00000.memory-controller: xusb_hostw: secure write @0x00000003ffffff00: VPR violation ((null))
> > ...
> > irq 91: nobody cared (try booting with the "irqpoll" option)
> > ...
> > depmod: ERROR: could not open directory /lib/modules/7.0.0-rc6-net-next+: No such file or directory
> > ...
> > Unable to handle kernel paging request at virtual address 0003201fd50320cf
> > 
> > 
> > A previous boot of the exact same kernel didn't oops, but was unable
> > to find the block device to mount for /mnt via block UUID.
> > 
> > A previous boot to that resulted in an oops.
> > 
> > 
> > The interesting thing is - the depmod error above is incorrect:
> > 
> > root@tegra-ubuntu:~# ls -ld /lib/modules/7.0.0-rc6-net-next+
> > drwxrwxr-x 3 root root 4096 Apr  8 10:23 /lib/modules/7.0.0-rc6-net-next+
> > 
> > The directory is definitely there, and is readable - checked after
> > booting back into net-next based on 7.0-rc5. In some of these boots,
> > stmmac hasn't probed yet, which rules out my changes.
> > 
> > Rootfs is ext4, and it seems there were a lot of ext4 commits merged
> > between rc5 and rc6, but nothing for rc7.
> > 
> > My current net-next head is dfecb0c5af3b. Merging rc7 on top also
> > fails, I suspect also randomly, with that I just got:
> > 
> > EXT4-fs (mmcblk0p1): VFS: Can't find ext4 filesystem
> > mount: /mnt: wrong fs type, bad option, bad superblock on /dev/mmcblk0p1, missing codepage or helper program, or other error.
> > mount: /mnt/: can't find PARTUUID=741c0777-391a-4bce-a222-455e180ece2a.
> > Unable to handle kernel paging request at virtual address f9bf0011ac0fb893
> > Mem abort info:
> >   ESR = 0x0000000096000004
> >   EC = 0x25: DABT (current EL), IL = 32 bits
> >   SET = 0, FnV = 0
> >   EA = 0, S1PTW = 0
> >   FSC = 0x04: level 0 translation fault
> > Data abort info:
> >   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> >   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> >   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > [f9bf0011ac0fb893] address between user and kernel address ranges
> > Internal error: Oops: 0000000096000004 [#1]  SMP
> > Modules linked in:
> > CPU: 1 UID: 0 PID: 936 Comm: mount Not tainted 7.0.0-rc7-net-next+ #649 PREEMPT
> > Hardware name: NVIDIA NVIDIA Jetson Xavier NX Developer Kit/Jetson, BIOS 6.0-37391689 08/28/2024
> > pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > pc : refill_objects+0x298/0x5ec
> > lr : refill_objects+0x1f0/0x5ec
> > 
> > ...
> > 
> > Call trace:
> >  refill_objects+0x298/0x5ec (P)
> >  __pcs_replace_empty_main+0x13c/0x3a8
> >  kmem_cache_alloc_noprof+0x324/0x3a0
> >  alloc_iova+0x3c/0x290
> >  alloc_iova_fast+0x168/0x2d4
> >  iommu_dma_alloc_iova+0x84/0x154
> >  iommu_dma_map_sg+0x2c4/0x538
> >  __dma_map_sg_attrs+0x124/0x2c0
> >  dma_map_sg_attrs+0x10/0x20
> >  sdhci_pre_dma_transfer+0xb8/0x164
> >  sdhci_pre_req+0x38/0x44
> >  mmc_blk_mq_issue_rq+0x3dc/0x920
> >  mmc_mq_queue_rq+0x104/0x2b0
> >  __blk_mq_issue_directly+0x38/0xb0
> >  blk_mq_request_issue_directly+0x54/0xb4
> >  blk_mq_issue_direct+0x84/0x180
> >  blk_mq_dispatch_queue_requests+0x1a8/0x2e0
> >  blk_mq_flush_plug_list+0x60/0x140
> >  __blk_flush_plug+0xe0/0x11c
> >  blk_finish_plug+0x38/0x4c
> >  read_pages+0x158/0x260
> >  page_cache_ra_unbounded+0x158/0x3e0
> >  force_page_cache_ra+0xb0/0xe4
> >  page_cache_sync_ra+0x88/0x480
> >  filemap_get_pages+0xd8/0x850
> >  filemap_read+0xdc/0x3d8
> >  blkdev_read_iter+0x84/0x198
> >  vfs_read+0x208/0x2d8
> >  ksys_read+0x58/0xf4
> >  __arm64_sys_read+0x1c/0x28
> >  invoke_syscall.constprop.0+0x50/0xe0
> >  do_el0_svc+0x40/0xc0
> >  el0_svc+0x48/0x2a0
> >  el0t_64_sync_handler+0xa0/0xe4
> >  el0t_64_sync+0x19c/0x1a0
> > Code: 54000189 f9000022 aa0203e4 b9402ae3 (f8634840)
> > ---[ end trace 0000000000000000 ]---
> > Kernel panic - not syncing: Oops: Fatal exception
> > 
> > Looking at the changes between rc5 and rc6, there's one drivers/block
> > change for zram (which is used on this platform), one change in
> > drivers/base for regmap, nothing for drivers/mmc, but plenty for
> > fs/ext4. There are five DMA API changes.
> > 
> > Now building straight -rc7. If that also fails, my plan is to start
> > bisecting rc5..rc6, which will likely take most of the rest of the
> > day. So, in the mean time I'm sending this as a heads-up that rc6
> > and onwards has a problem.
> 
> Plain -rc7 fails (another random oops):
> 
> Root device found: PARTUUID=741c0777-391a-4bce-a222-455e180ece2a
> depmod: ERROR: could not open directory /lib/modules/7.0.0-rc7-net-next+: No such file or directory
> depmod: FATAL: could not search modules: No such file or directory
> usb 2-3: new SuperSpeed Plus Gen 2x1 USB device number 2 using tegra-xusb
> hub 2-3:1.0: USB hub found
> hub 2-3:1.0: 4 ports detected
> usb 1-3: new full-speed USB device number 3 using tegra-xusb
> Unable to handle kernel paging request at virtual address 0003201fd50320cf
> Mem abort info:
>   ESR = 0x0000000096000004
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
>   FSC = 0x04: level 0 translation fault
> Data abort info:
>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [0003201fd50320cf] address between user and kernel address ranges
> Internal error: Oops: 0000000096000004 [#1]  SMP
> Modules linked in:
> CPU: 1 UID: 0 PID: 917 Comm: mount Not tainted 7.0.0-rc7-net-next+ #649 PREEMPT
> Hardware name: NVIDIA NVIDIA Jetson Xavier NX Developer Kit/Jetson, BIOS 6.0-37391689 08/28/2024
> pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : refill_objects+0x298/0x5ec
> lr : refill_objects+0x1f0/0x5ec
> sp : ffff80008606b500
> x29: ffff80008606b500 x28: 0000000000000001 x27: fffffdffc20e6200
> x26: 0000000000000006 x25: 0000000000000000 x24: 000000000000003c
> x23: ffff0000809e4840 x22: ffff0000809dba00 x21: ffff80008606b5a0
> x20: ffff000081133820 x19: fffffdffc20e6220 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000100 x15: 0000000000000000
> x14: 0000000000000000 x13: 0000000000000000 x12: ffff800081e5faa8
> x11: ffff800082192c70 x10: ffff8000814074dc x9 : 0000000000000050
> x8 : ffff80008606b490 x7 : ffff000083988b40 x6 : ffff80008606b4a0
> x5 : 000000080015000f x4 : d503201fd503201f x3 : 00000000000000b0
> x2 : d503201fd503201f x1 : ffff000081133828 x0 : d503201fd503201f
> Call trace:
>  refill_objects+0x298/0x5ec (P)
>  __pcs_replace_empty_main+0x13c/0x3a8
>  kmem_cache_alloc_noprof+0x324/0x3a0
>  mempool_alloc_slab+0x1c/0x28
>  mempool_alloc_noprof+0x98/0xe0
>  bio_alloc_bioset+0x160/0x3e0
>  do_mpage_readpage+0x3d0/0x618
>  mpage_readahead+0xb8/0x144
>  blkdev_readahead+0x18/0x24
>  read_pages+0x58/0x260
>  page_cache_ra_unbounded+0x158/0x3e0
>  force_page_cache_ra+0xb0/0xe4
>  page_cache_sync_ra+0x88/0x480
>  filemap_get_pages+0xd8/0x850
>  filemap_read+0xdc/0x3d8
>  blkdev_read_iter+0x84/0x198
>  vfs_read+0x208/0x2d8
>  ksys_read+0x58/0xf4
>  __arm64_sys_read+0x1c/0x28
>  invoke_syscall.constprop.0+0x50/0xe0
>  do_el0_svc+0x40/0xc0
>  el0_svc+0x48/0x2a0
>  el0t_64_sync_handler+0xa0/0xe4
>  el0t_64_sync+0x19c/0x1a0
> Code: 54000189 f9000022 aa0203e4 b9402ae3 (f8634840)
> ---[ end trace 0000000000000000 ]---
> 
> Now starting the bisect between 7.0-rc5 and 7.0-rc6.

The rebase is still progressing, but it's landed on:

c7d812e33f3e dmaengine: xilinx: xilinx_dma: Fix unmasked residue subtraction

and while this boots to a login prompt, it spat out a BUG():

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:591
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 56, name: kworker/u24:3
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by kworker/u24:3/56:
 #0: ffff000080042148 ((wq_completion)events_unbound#2){+.+.}-{0:0}, at: process_one_work+0x184/0x780
 #1: ffff80008299bdf8 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1ac/0x780
 #2: ffff0000808b48f8 (&dev->mutex){....}-{4:4}, at: __device_attach+0x2c/0x188
irq event stamp: 10872
hardirqs last  enabled at (10871): [<ffff80008013a410>] ktime_get+0x130/0x180
hardirqs last disabled at (10872): [<ffff800080d61ac8>] _raw_spin_lock_irqsave+0x84/0x88
softirqs last  enabled at (9216): [<ffff80008002807c>] fpsimd_save_and_flush_current_state+0x3c/0x80
softirqs last disabled at (9214): [<ffff800080028098>] fpsimd_save_and_flush_current_state+0x58/0x80
CPU: 5 UID: 0 PID: 56 Comm: kworker/u24:3 Not tainted 7.0.0-rc1-bisect+ #654 PREEMPT
Hardware name: NVIDIA NVIDIA Jetson Xavier NX Developer Kit/Jetson, BIOS 6.0-37391689 08/28/2024
Workqueue: events_unbound deferred_probe_work_func
Call trace:
 show_stack+0x18/0x30 (C)
 dump_stack_lvl+0x6c/0x94
 dump_stack+0x18/0x24
 __might_resched+0x154/0x220
 __might_sleep+0x48/0x80
 __mutex_lock+0x48/0x800
 mutex_lock_nested+0x24/0x30
 pinmux_disable_setting+0x9c/0x180
 pinctrl_commit_state+0x5c/0x260
 pinctrl_pm_select_idle_state+0x4c/0xa0
 tegra_i2c_runtime_suspend+0x2c/0x3c
 pm_generic_runtime_suspend+0x2c/0x44
 __rpm_callback+0x48/0x1ec
 rpm_callback+0x74/0x80
 rpm_suspend+0xec/0x630
 rpm_idle+0x2c0/0x420
 __pm_runtime_idle+0x44/0x160
 tegra_i2c_probe+0x2e4/0x640
 platform_probe+0x5c/0xa4
 really_probe+0xbc/0x2c0
 __driver_probe_device+0x78/0x120
 driver_probe_device+0x3c/0x160
 __device_attach_driver+0xbc/0x160
 bus_for_each_drv+0x70/0xb8
 __device_attach+0xa4/0x188
 device_initial_probe+0x50/0x54
 bus_probe_device+0x38/0xa4
 deferred_probe_work_func+0x90/0xcc
 process_one_work+0x204/0x780
 worker_thread+0x1c8/0x36c
 kthread+0x138/0x144
 ret_from_fork+0x10/0x20

This is reproducible.

Adding Vinod and Frank, and dmaengine mailing list.

Bisect continuing, assuming this is a "good" commit as it isn't
producing the boot failure with random memory corruption.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


* Re: [PATCH net-next v2 01/10] enic: verify firmware supports V2 SR-IOV at probe time
From: Breno Leitao @ 2026-04-08 16:04 UTC
  To: satishkh
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel,
	20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9
In-Reply-To: <20260408-enic-sriov-v2-admin-channel-v2-v2-1-d05dd3623fd3@cisco.com>

On Wed, Apr 08, 2026 at 08:08:11AM -0700, Satish Kharat via B4 Relay wrote:
> From: Satish Kharat <satishkh@cisco.com>
> 
> During PF probe, query the firmware get-supported-feature interface
> to verify that the running firmware supports V2 SR-IOV. Firmware
> version 5.3(4.72) and later report VIC_FEATURE_SRIOV via
> CMD_GET_SUPP_FEATURE_VER. If the firmware does not support the
> feature, set vf_type to ENIC_VF_TYPE_NONE and log a warning so the
> admin knows a firmware upgrade is needed.
> 
> The VIC_FEATURE_SRIOV enum value (4) matches the firmware ABI. A
> placeholder entry (VIC_FEATURE_PTP at position 3) is added to keep
> the enum in sync with firmware's feature numbering.
> 
> Signed-off-by: Satish Kharat <satishkh@cisco.com>
> ---
>  drivers/net/ethernet/cisco/enic/enic_main.c   | 18 ++++++++++++++++++
>  drivers/net/ethernet/cisco/enic/vnic_devcmd.h |  2 ++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
> index e7125b818087..3a4afd6da41f 100644
> --- a/drivers/net/ethernet/cisco/enic/enic_main.c
> +++ b/drivers/net/ethernet/cisco/enic/enic_main.c
> @@ -2641,8 +2641,10 @@ static void enic_iounmap(struct enic *enic)
>  static void enic_sriov_detect_vf_type(struct enic *enic)
>  {
>  	struct pci_dev *pdev = enic->pdev;
> +	u64 supported_versions, a1 = 0;
>  	int pos;
>  	u16 vf_dev_id;
> +	int err;
>  
>  	if (enic_is_sriov_vf(enic) || enic_is_dynamic(enic))
>  		return;
> @@ -2669,6 +2671,22 @@ static void enic_sriov_detect_vf_type(struct enic *enic)
>  		enic->vf_type = ENIC_VF_TYPE_NONE;
>  		break;
>  	}
> +
> +	if (enic->vf_type == ENIC_VF_TYPE_V2) {

Maybe invert the if case here?

	if (enic->vf_type != ENIC_VF_TYPE_V2)
		return;

And then shift the rest to the left.

This might be easier to read, and the code looks better in general.
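
To spell out the two shapes being compared, here is a standalone sketch.
It is not the enic code: the example_* types and the fw_supports_v2 field
are hypothetical stand-ins for the driver state and for the result of the
firmware feature query described in the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's VF type and device state. */
enum example_vf_type {
	EXAMPLE_VF_TYPE_NONE,
	EXAMPLE_VF_TYPE_V1,
	EXAMPLE_VF_TYPE_V2,
};

struct example_dev {
	enum example_vf_type vf_type;
	bool fw_supports_v2;	/* stand-in for the firmware query result */
};

/* Nested form: the whole firmware check lives inside the if block. */
static void detect_nested(struct example_dev *dev)
{
	if (dev->vf_type == EXAMPLE_VF_TYPE_V2) {
		if (!dev->fw_supports_v2)
			dev->vf_type = EXAMPLE_VF_TYPE_NONE;
	}
}

/* Inverted form: bail out early, then continue one indent level less. */
static void detect_early_return(struct example_dev *dev)
{
	if (dev->vf_type != EXAMPLE_VF_TYPE_V2)
		return;

	if (!dev->fw_supports_v2)
		dev->vf_type = EXAMPLE_VF_TYPE_NONE;
}
```

Both functions leave vf_type in the same state for every input; the second
only differs in how deeply the remaining logic is indented.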


* Re: [Intel-wired-lan] [PATCH iwl-next] igb: use ktime_get_real helpers in igb_ptp_reset()
From: Jacob Keller @ 2026-04-08 15:51 UTC
  To: Paul Menzel, Aleksandr Loktionov
  Cc: intel-wired-lan, anthony.l.nguyen, netdev, Simon Horman
In-Reply-To: <5496af54-1bfa-4ecd-9565-b87a9df81277@molgen.mpg.de>

On 4/8/2026 4:57 AM, Paul Menzel wrote:
> Dear Aleksandr,
> 
> 
> Thank you for your patch.
> 
> Am 08.04.26 um 10:35 schrieb Aleksandr Loktionov:
>> Replace ktime_to_ns(ktime_get_real()) with the direct equivalent
>> ktime_get_real_ns() and ktime_to_timespec64(ktime_get_real()) with
>> ktime_get_real_ts64() in igb_ptp_reset().  Using the combined helpers
>> avoids the unnecessary intermediate ktime_t variable and makes the
>> intent clearer.
> 
> No intermediate variable is removed in the diff below. What am I missing?
> 

The commit message seems clear to me:

ktime_get_real() returns the current time as a ktime_t, and this is then
converted into a timespec64 with ktime_to_timespec64().

ktime_get_real_ts64() generates the current time as a timespec64
directly, avoiding the ktime_t passed between ktime_get_real() and
ktime_to_timespec64().

Thanks,
Jake

>> Suggested-by: Jacob Keller <jacob.e.keller@intel.com>
>> Suggested-by: Simon Horman <horms@kernel.org>
>> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>> ---
>>   drivers/net/ethernet/intel/igb/igb_ptp.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c
>> index bd85d02..638d824 100644
>> --- a/drivers/net/ethernet/intel/igb/igb_ptp.c
>> +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
>> @@ -1500,12 +1500,13 @@ void igb_ptp_reset(struct igb_adapter *adapter)
>>         /* Re-initialize the timer. */
>>       if ((hw->mac.type == e1000_i210) || (hw->mac.type == e1000_i211)) {
>> -        struct timespec64 ts = ktime_to_timespec64(ktime_get_real());
>> +        struct timespec64 ts;
>>  
>> +        ktime_get_real_ts64(&ts);
>>           igb_ptp_write_i210(adapter, &ts);
>>       } else {
>>           timecounter_init(&adapter->tc, &adapter->cc,
>> -                 ktime_to_ns(ktime_get_real()));
>> +                 ktime_get_real_ns());
>>       }
>>   out:
>>       spin_unlock_irqrestore(&adapter->tmreg_lock, flags);
> 
> With the commit message clarified, feel free to add:
> 
> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
> 
> 
> Kind regards,
> 
> Paul



* [syzbot] [net?] memory leak in xfrm_policy_construct
From: syzbot @ 2026-04-08 15:48 UTC
  To: davem, edumazet, herbert, horms, kuba, linux-kernel, netdev,
	pabeni, steffen.klassert, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    a0c83177734a Merge tag 'drm-fixes-2026-03-21' of https://g..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=117d66da580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=e2bba615ee79faa5
dashboard link: https://syzkaller.appspot.com/bug?extid=901d48e0b95aed4a2548
compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12a481d6580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1669ccba580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/76025f732b88/disk-a0c83177.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/51c8f0f97de7/vmlinux-a0c83177.xz
kernel image: https://storage.googleapis.com/syzbot-assets/46b7135c73d1/bzImage-a0c83177.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+901d48e0b95aed4a2548@syzkaller.appspotmail.com

2026/03/21 23:24:08 executed programs: 5
BUG: memory leak
unreferenced object 0xffff888125a86c00 (size 1024):
  comm "syz.0.17", pid 6082, jiffies 4294946151
  hex dump (first 32 bytes):
    00 e5 5f 1c 81 88 ff ff 00 00 00 00 00 00 00 00  .._.............
    22 01 00 00 00 00 ad de 00 01 00 00 00 00 ad de  "...............
  backtrace (crc f62518df):
    kmemleak_alloc_recursive include/linux/kmemleak.h:44 [inline]
    slab_post_alloc_hook mm/slub.c:4543 [inline]
    slab_alloc_node mm/slub.c:4866 [inline]
    __kmalloc_cache_noprof+0x377/0x480 mm/slub.c:5375
    kmalloc_noprof include/linux/slab.h:950 [inline]
    kzalloc_noprof include/linux/slab.h:1188 [inline]
    xfrm_policy_alloc+0x63/0x180 net/xfrm/xfrm_policy.c:432
    xfrm_policy_construct+0x30/0x260 net/xfrm/xfrm_user.c:2187
    xfrm_add_policy+0x12e/0x390 net/xfrm/xfrm_user.c:2246
    xfrm_user_rcv_msg+0x248/0x570 net/xfrm/xfrm_user.c:3507
    netlink_rcv_skb+0x89/0x1c0 net/netlink/af_netlink.c:2550
    xfrm_netlink_rcv+0x34/0x40 net/xfrm/xfrm_user.c:3529
    netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
    netlink_unicast+0x3a1/0x4f0 net/netlink/af_netlink.c:1344
    netlink_sendmsg+0x335/0x690 net/netlink/af_netlink.c:1894
    sock_sendmsg_nosec net/socket.c:727 [inline]
    __sock_sendmsg net/socket.c:742 [inline]
    ____sys_sendmsg+0x54a/0x580 net/socket.c:2592
    ___sys_sendmsg+0x101/0x140 net/socket.c:2646
    __sys_sendmsg+0xcd/0x140 net/socket.c:2678
    do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
    do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94
    entry_SYSCALL_64_after_hwframe+0x77/0x7f

connection error: failed to recv *flatrpc.ExecutorMessageRawT: EOF


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* Re: [net,PATCH] net: ks8851: Reinstate disabling of BHs around IRQ handler
From: Marek Vasut @ 2026-04-08 15:41 UTC (permalink / raw)
  To: Nicolai Buchwitz
  Cc: netdev, stable, David S. Miller, Andrew Lunn, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Ronald Wahl, Yicong Hui,
	linux-kernel
In-Reply-To: <f4010cedaa49afc1648a73775a987ee5@tipi-net.de>

On 4/8/26 12:54 PM, Nicolai Buchwitz wrote:

Hello Nicolai,

thank you for testing on the SPI variant, that helped a lot.

> In order to make this work I would propose something like this (which 
> works in my SPI setup):
> 
> --- a/drivers/net/ethernet/micrel/ks8851_par.c
> +++ b/drivers/net/ethernet/micrel/ks8851_par.c
> @@ -60,12 +60,14 @@ static void ks8851_lock_par(struct ks8851_net *ks, 
> unsigned long *flags)
>   {
>       struct ks8851_net_par *ksp = to_ks8851_par(ks);
> 
> +    local_bh_disable();
>       spin_lock_irqsave(&ksp->lock, *flags);
>   }
> 
>   static void ks8851_unlock_par(struct ks8851_net *ks, unsigned long 
> *flags)
>   {
>       struct ks8851_net_par *ksp = to_ks8851_par(ks);
> 
>       spin_unlock_irqrestore(&ksp->lock, *flags);
> +    local_bh_enable();
>   }
> 
> Tested-by: Nicolai Buchwitz <nb@tipi-net.de>  # KS8851 SPI, non-RT 
> (regression + proposed fix)

Are you also able to test the KS8851 driver with PREEMPT_RT enabled and
heavy iperf3 traffic on the SPI variant? Does that trigger any issues?
I ran 'iperf3 -s' on the KS8851 end and 'iperf3 -c 192.168.1.300 -t 0
--bidir' on the host PC side.

Let me prepare a slightly updated fix and send a V2.


* Re: [net-next v1 v1 4/5] net: stmmac: starfive: Add JHB100 SGMII interface
From: Andrew Lunn @ 2026-04-08 15:36 UTC (permalink / raw)
  To: Minda Chen
  Cc: Alexandre Torgue, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, netdev, linux-kernel, linux-stm32, devicetree
In-Reply-To: <20260408084416.29753-5-minda.chen@starfivetech.com>

> +	dwmac->sgmii_rx = devm_clk_get_optional(&pdev->dev, "rx");
> +	if (IS_ERR(dwmac->sgmii_rx))
> +		return dev_err_probe(&pdev->dev, PTR_ERR(dwmac->sgmii_rx),
> +				     "error getting sgmii rx clock\n");
> +

The SGMII clock is optional...

>  	/* Generally, the rgmii_tx clock is provided by the internal clock,
>  	 * which needs to match the corresponding clock frequency according
>  	 * to different speeds. If the rgmii_tx clock is provided by the
>  	 * external rgmii_rxin, there is no need to configure the clock
>  	 * internally, because rgmii_rxin will be adaptively adjusted.
>  	 */
> -	if (!device_property_read_bool(&pdev->dev, "starfive,tx-use-rgmii-clk"))
> -		plat_dat->set_clk_tx_rate = stmmac_set_clk_tx_rate;
> +	if (!device_property_read_bool(&pdev->dev, "starfive,tx-use-rgmii-clk")) {
> +		if (plat_dat->phy_interface == PHY_INTERFACE_MODE_SGMII)
> +			plat_dat->set_clk_tx_rate = stmmac_starfive_sgmii_set_clk_rate;

So you probably want to return an error here if it is missing.

Or you might want to look at the compatible, and make the clock
mandatory for this device.
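
A minimal userspace sketch of the first option, assuming hypothetical
stand-in types (struct fake_dwmac, enum phy_mode and check_sgmii_clock()
are not part of the driver). It relies on devm_clk_get_optional()
returning NULL when the clock is absent:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the stmmac/starfive types. */
enum phy_mode { MODE_RGMII, MODE_SGMII };

struct fake_dwmac {
	const void *sgmii_rx;	/* NULL models an absent optional clock */
};

/*
 * Returns 0 when the configuration is usable, -EINVAL when SGMII is
 * selected but the optional rx clock was not provided.
 */
static int check_sgmii_clock(const struct fake_dwmac *dwmac,
			     enum phy_mode mode)
{
	if (mode == MODE_SGMII && dwmac->sgmii_rx == NULL)
		return -EINVAL;

	return 0;
}
```

In the real probe path the equivalent check would presumably sit next to
the set_clk_tx_rate assignment and use dev_err_probe().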

   Andrew


* Re: [net-next v1 v1 3/5] dt-bindings: net: starfive,jh7110-dwmac: Add JHB100 sgmii rx clk
From: Andrew Lunn @ 2026-04-08 15:33 UTC (permalink / raw)
  To: Minda Chen
  Cc: Alexandre Torgue, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, netdev, linux-kernel, linux-stm32, devicetree
In-Reply-To: <20260408084416.29753-4-minda.chen@starfivetech.com>

> +      - description: SGMII RX clock
>  
>    clock-names:
> -    items:
> -      - const: stmmaceth
> -      - const: pclk
> -      - const: ptp_ref
> -      - const: tx
> -      - const: gtx
> +    minItems: 5
> +    maxItems: 6
> +    contains:
> +      enum:
> +       - stmmaceth
> +       - pclk
> +       - ptp_ref
> +       - tx
> +       - gtx
> +       - rx

If this is only used for sgmii, maybe it should have sgmii in the
name?

	Andrew


* Re: [PATCH net-next v2] vsock/virtio: remove unnecessary call to `virtio_transport_get_ops`
From: Stefano Garzarella @ 2026-04-08 15:31 UTC (permalink / raw)
  To: Luigi Leonardi
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Stefan Hajnoczi, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Arseniy Krasnov, kvm, virtualization,
	netdev, linux-kernel
In-Reply-To: <20260408-remove_parameter-v2-1-e00f31cf7a17@redhat.com>

On Wed, Apr 08, 2026 at 05:21:02PM +0200, Luigi Leonardi wrote:
>`virtio_transport_send_pkt_info` gets all the transport information
>from the parameter `t_ops`. There is no need to call
>`virtio_transport_get_ops()`.
>
>Remove it.
>
>Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
>Acked-by: Michael S. Tsirkin <mst@redhat.com>
>Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
>---
>Changes in v2:
>- Removed Fixes tag.
>- Picked up RoBs
>- Rebased to latest net-next
>- Link to v1: https://lore.kernel.org/r/20260407-remove_parameter-v1-1-e9729360a2be@redhat.com
>---
> net/vmw_vsock/virtio_transport_common.c | 2 --
> 1 file changed, 2 deletions(-)

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>



* Re: [net-next v1 v1 1/5] dt-bindings: net: starfive,jh7110-dwmac: Remove JH8100
From: Andrew Lunn @ 2026-04-08 15:27 UTC (permalink / raw)
  To: Minda Chen
  Cc: Alexandre Torgue, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, netdev, linux-kernel, linux-stm32, devicetree
In-Reply-To: <20260408084416.29753-2-minda.chen@starfivetech.com>

On Wed, Apr 08, 2026 at 04:44:12PM +0800, Minda Chen wrote:
> Remove JH8100 dt-bindings because do not support it now.

Could you expand on that? If there are devices out in the field, we
don't just drop support for them because the vendor has something newer.

If the device never made it outside of the vendor's lab, then we might
consider dropping it.

Please explain in detail why this is being dropped.

	Andrew


* [PATCH net v3] net/sched: cls_fw: fix NULL dereference of "old" filters before change()
From: Davide Caratti @ 2026-04-08 15:24 UTC (permalink / raw)
  To: Jamal Hadi Salim, Jiri Pirko, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Xiang Mei, netdev
  Cc: Victor Nogueira

As pointed out by Sashiko [1], since commit ed76f5edccc9 ("net: sched:
protect filter_chain list with filter_chain_lock mutex") TC filters are
added to a shared block and published to the datapath before their
->change() function is called. This is a problem for cls_fw: an invalid
filter created with the "old" method can still classify some packets
before it is destroyed by the validation logic added by Xiang.
Therefore, repeated runs of the following script:

 # ip link add dev crash0 type dummy
 # ip link set dev crash0 up
 # mausezahn  crash0 -c 100000 -P 10 \
 > -A 4.3.2.1 -B 1.2.3.4 -t udp "dp=1234" -q &
 # sleep 1
 # tc qdisc add dev crash0 egress_block 1 clsact
 # tc filter add block 1 protocol ip prio 1 matchall \
 > action skbedit mark 65536 continue
 # tc filter add block 1 protocol ip prio 2 fw
 # ip link del dev crash0

can still make fw_classify() hit the WARN_ON() in [2]:

 WARNING: ./include/net/pkt_cls.h:88 at fw_classify+0x244/0x250 [cls_fw], CPU#18: mausezahn/1399
 Modules linked in: cls_fw(E) act_skbedit(E)
 CPU: 18 UID: 0 PID: 1399 Comm: mausezahn Tainted: G            E       7.0.0-rc6-virtme #17 PREEMPT(full)
 Tainted: [E]=UNSIGNED_MODULE
 Hardware name: Red Hat KVM, BIOS 1.16.3-2.el9 04/01/2014
 RIP: 0010:fw_classify+0x244/0x250 [cls_fw]
 Code: 5c 49 c7 45 00 00 00 00 00 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 5b b8 ff ff ff ff 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 90 <0f> 0b 90 eb a0 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90
 RSP: 0018:ffffd1b7026bf8a8 EFLAGS: 00010202
 RAX: ffff8c5ac9c60800 RBX: ffff8c5ac99322c0 RCX: 0000000000000004
 RDX: 0000000000000001 RSI: ffff8c5b74d7a000 RDI: ffff8c5ac8284f40
 RBP: ffffd1b7026bf8d0 R08: 0000000000000000 R09: ffffd1b7026bf9b0
 R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000010000
 R13: ffffd1b7026bf930 R14: ffff8c5ac8284f40 R15: 0000000000000000
 FS:  00007fca40c37740(0000) GS:ffff8c5b74d7a000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fca40e822a0 CR3: 0000000005ca0001 CR4: 0000000000172ef0
 Call Trace:
  <TASK>
  tcf_classify+0x17d/0x5c0
  tc_run+0x9d/0x150
  __dev_queue_xmit+0x2ab/0x14d0
  ip_finish_output2+0x340/0x8f0
  ip_output+0xa4/0x250
  raw_sendmsg+0x147d/0x14b0
  __sys_sendto+0x1cc/0x1f0
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x126/0xf80
  entry_SYSCALL_64_after_hwframe+0x77/0x7f
 RIP: 0033:0x7fca40e822ba
 Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
 RSP: 002b:00007ffc248a42c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 000055ef233289d0 RCX: 00007fca40e822ba
 RDX: 000000000000001e RSI: 000055ef23328c30 RDI: 0000000000000003
 RBP: 000055ef233289d0 R08: 00007ffc248a42d0 R09: 0000000000000010
 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000001e
 R13: 00000000000186a0 R14: 0000000000000000 R15: 00007fca41043000
  </TASK>
 irq event stamp: 1045778
 hardirqs last  enabled at (1045784): [<ffffffff864ec042>] __up_console_sem+0x52/0x60
 hardirqs last disabled at (1045789): [<ffffffff864ec027>] __up_console_sem+0x37/0x60
 softirqs last  enabled at (1045426): [<ffffffff874d48c7>] __alloc_skb+0x207/0x260
 softirqs last disabled at (1045434): [<ffffffff874fe8f8>] __dev_queue_xmit+0x78/0x14d0

Then, because of the value in the packet's mark, 'q->handle' is
dereferenced with a NULL 'q':

 BUG: kernel NULL pointer dereference, address: 0000000000000038
 [...]
 RIP: 0010:fw_classify+0x1fe/0x250 [cls_fw]
 [...]

Skip "old-style" classification on shared blocks, so that the NULL
dereference is fixed and WARN_ON() is not hit anymore in the short
lifetime of invalid cls_fw "old-style" filters.
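
The role of the mark value can be illustrated with a small userspace
sketch. The TC_H_MAJ macros carry the same values as the uapi ones;
struct fake_qdisc, the result codes and old_method_classify() are
hypothetical and only model the "old method" branch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as the macros in include/uapi/linux/pkt_sched.h. */
#define TC_H_MAJ_MASK 0xFFFF0000U
#define TC_H_MAJ(h) ((h) & TC_H_MAJ_MASK)

/* Hypothetical stand-in for struct Qdisc; only the handle matters. */
struct fake_qdisc {
	uint32_t handle;
};

#define OLD_NO_MATCH		(-1)
#define OLD_MATCH		0
#define OLD_WOULD_DEREF_NULL	1	/* models the unpatched crash */

/* q is NULL when the block is shared, as on the real shared block. */
static int old_method_classify(uint32_t id, bool shared,
			       const struct fake_qdisc *q, bool patched)
{
	if (patched && shared)
		return OLD_NO_MATCH;	/* behaviour added by this patch */

	if (id == 0)
		return OLD_NO_MATCH;
	if (TC_H_MAJ(id) == 0)
		return OLD_MATCH;	/* short-circuit: q is never read */
	if (q == NULL)
		return OLD_WOULD_DEREF_NULL;

	return TC_H_MAJ(id ^ q->handle) == 0 ? OLD_MATCH : OLD_NO_MATCH;
}
```

With mark 65536 (0x10000) the major part is non-zero, so the unpatched
condition goes on to read 'q->handle'; a mark with a zero major part
never reaches that read.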

V2: avoid NULL dereference without hitting WARN_ON() anymore (Sashiko)
V3: correct 'Fixes' tag (Jamal)

[1] https://sashiko.dev/#/patchset/20260331050217.504278-1-xmei5%40asu.edu
[2] https://elixir.bootlin.com/linux/v7.0-rc6/source/include/net/pkt_cls.h#L86

Fixes: faeea8bbf6e9 ("net/sched: cls_fw: fix NULL pointer dereference on shared blocks")
Fixes: ed76f5edccc9 ("net: sched: protect filter_chain list with filter_chain_lock mutex")

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/sched/cls_fw.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index 23884ef8b80c..646a730dca93 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -74,9 +74,13 @@ TC_INDIRECT_SCOPE int fw_classify(struct sk_buff *skb,
 			}
 		}
 	} else {
-		struct Qdisc *q = tcf_block_q(tp->chain->block);
+		struct Qdisc *q;
 
 		/* Old method: classify the packet using its skb mark. */
+		if (tcf_block_shared(tp->chain->block))
+			return -1;
+
+		q = tcf_block_q(tp->chain->block);
 		if (id && (TC_H_MAJ(id) == 0 ||
 			   !(TC_H_MAJ(id ^ q->handle)))) {
 			res->classid = id;
-- 
2.52.0



* Re: [PATCH v2] net: dsa: microchip: implement KSZ87xx Module 3 low-loss cable errata
From: Fidelio LAWSON @ 2026-04-08 15:25 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Woojung Huh, UNGLinuxDriver, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Marek Vasut, Maxime Chevallier,
	netdev, devicetree, linux-kernel, Fidelio Lawson
In-Reply-To: <a350c4b7-d816-455b-83c0-f4d98299c637@lunn.ch>

On 4/8/26 14:43, Andrew Lunn wrote:
>> The control register defines the following modes:
>>    bits [1:0]:
>>      00 = workaround disabled
>>      01 = workaround 1 (DSP EQ training adjustment, LinkMD reg 0x3c)
>>      10 = workaround 2 (receiver LPF bandwidth, LinkMD reg 0x4c)
> 
> There was a comment, which i only read after making the suggestion to
> use two bits, of exposing the different low pass filter bandwidths,
> rather than just picking one value. How useful is that?
> 
>         Andrew

Initially I limited the LPF setting to the single bandwidth explicitly 
recommended by the errata (62MHz).
But I’ll extend the implementation to expose all documented LPF 
bandwidth options so the interface is more flexible for users.
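
For reference, a minimal sketch of decoding the two mode bits quoted
above. All names are hypothetical; only the bit values come from the
quoted description, and encoding 11 is treated as invalid because it is
not listed there:

```c
#include <stdint.h>

/* Hypothetical names for the modes in bits [1:0]. */
enum low_loss_mode {
	LOW_LOSS_DISABLED = 0x0,	/* 00: workaround disabled */
	LOW_LOSS_WORKAROUND_1 = 0x1,	/* 01: DSP EQ training adjustment */
	LOW_LOSS_WORKAROUND_2 = 0x2,	/* 10: receiver LPF bandwidth */
};

#define LOW_LOSS_MODE_MASK 0x3U

/* Returns the mode, or -1 for the encoding 11 that is not listed. */
static int low_loss_decode(uint8_t ctrl)
{
	unsigned int mode = ctrl & LOW_LOSS_MODE_MASK;

	return mode == 0x3U ? -1 : (int)mode;
}
```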

Best regards,
Fidelio



* [RFC PATCH iproute2] ip/bond: add lacp_fallback support
From: Louis Scalbert @ 2026-04-08 15:24 UTC (permalink / raw)
  To: netdev
  Cc: stephen, andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy,
	shemminger, maheshb, Louis Scalbert

lacp_fallback defines the behavior of a LACP bonding interface
when no slaves are in collecting(/distributing) state while at least
'min_links' slaves have carrier.

In the default (legacy) mode, the bonding master remains up and a
single slave is selected for TX/RX, while traffic received on other
slaves is dropped. This preserves the existing behavior.

In strict mode, the bonding master reports carrier down in this
situation.
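
The mapping between the mode names and the u8 attribute value can be
sketched as a lookup in a NULL-terminated table. lookup_index() is a
simplified, hypothetical stand-in that matches names only; it is not
the helper used in ip/iplink_bond.c:

```c
#include <stddef.h>
#include <string.h>

/* Same ordering as the table added by this patch: legacy=0, strict=1. */
static const char *const lacp_fallback_names[] = {
	"legacy",
	"strict",
	NULL,
};

/* Returns the index of name in a NULL-terminated table, or -1. */
static int lookup_index(const char *const *tbl, const char *name)
{
	for (int i = 0; tbl[i] != NULL; i++) {
		if (strcmp(tbl[i], name) == 0)
			return i;
	}

	return -1;
}
```

A negative result is what makes the option parser reject the argument.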

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
 include/uapi/linux/if_link.h |  1 +
 ip/iplink_bond.c             | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 2037afbc..1588d520 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1537,6 +1537,7 @@ enum {
 	IFLA_BOND_NS_IP6_TARGET,
 	IFLA_BOND_COUPLED_CONTROL,
 	IFLA_BOND_BROADCAST_NEIGH,
+	IFLA_BOND_LACP_FALLBACK,
 	__IFLA_BOND_MAX,
 };
 
diff --git a/ip/iplink_bond.c b/ip/iplink_bond.c
index 714fe7bd..f33252e8 100644
--- a/ip/iplink_bond.c
+++ b/ip/iplink_bond.c
@@ -87,6 +87,12 @@ static const char *lacp_rate_tbl[] = {
 	NULL,
 };
 
+static const char *lacp_fallback_tbl[] = {
+	"legacy",
+	"strict",
+	NULL,
+};
+
 static const char *ad_select_tbl[] = {
 	"stable",
 	"bandwidth",
@@ -155,6 +161,7 @@ static void print_explain(FILE *f)
 		"                [ ad_user_port_key PORTKEY ]\n"
 		"                [ ad_actor_sys_prio SYSPRIO ]\n"
 		"                [ ad_actor_system LLADDR ]\n"
+		"                [ lacp_fallback LACP_FALLBACK ]\n"
 		"                [ arp_missed_max MISSED_MAX ]\n"
 		"\n"
 		"BONDMODE := balance-rr|active-backup|balance-xor|broadcast|802.3ad|balance-tlb|balance-alb\n"
@@ -168,6 +175,7 @@ static void print_explain(FILE *f)
 		"AD_SELECT := stable|bandwidth|count\n"
 		"COUPLED_CONTROL := off|on\n"
 		"BROADCAST_NEIGHBOR := off|on\n"
+		"LACP_FALLBACK := legacy|strict\n"
 	);
 }
 
@@ -188,6 +196,7 @@ static int bond_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u32 packets_per_slave;
 	__u8 missed_max;
 	__u8 broadcast_neighbor;
+	__u8 lacp_fallback;
 	unsigned int ifindex;
 	int ret;
 
@@ -417,6 +426,13 @@ static int bond_parse_opt(struct link_util *lu, int argc, char **argv,
 				return -1;
 			addattr_l(n, 1024, IFLA_BOND_AD_ACTOR_SYSTEM,
 				  abuf, len);
+		} else if (matches(*argv, "lacp_fallback") == 0) {
+			NEXT_ARG();
+			if (get_index(lacp_fallback_tbl, *argv) < 0)
+				invarg("invalid lacp_fallback", *argv);
+
+			lacp_fallback = get_index(lacp_fallback_tbl, *argv);
+			addattr8(n, 1024, IFLA_BOND_LACP_FALLBACK, lacp_fallback);
 		} else if (matches(*argv, "tlb_dynamic_lb") == 0) {
 			NEXT_ARG();
 			if (get_u8(&tlb_dynamic_lb, *argv, 0)) {
@@ -642,6 +658,15 @@ static void bond_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 			   "all_slaves_active %u ",
 			   rta_getattr_u8(tb[IFLA_BOND_ALL_SLAVES_ACTIVE]));
 
+	if (tb[IFLA_BOND_LACP_FALLBACK]) {
+		const char *lacp_fallback = get_name(lacp_fallback_tbl,
+						 rta_getattr_u8(tb[IFLA_BOND_LACP_FALLBACK]));
+		print_string(PRINT_ANY,
+			     "lacp_fallback",
+			     "lacp_fallback %s ",
+			     lacp_fallback);
+	}
+
 	if (tb[IFLA_BOND_MIN_LINKS])
 		print_uint(PRINT_ANY,
 			   "min_links",
-- 
2.39.2



* [PATCH net v3 1/5] bonding: 3ad: add lacp_fallback configuration knob
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, Louis Scalbert
In-Reply-To: <20260408152353.276204-1-louis.scalbert@6wind.com>

When an 802.3ad (LACP) bonding interface has no slaves in the
collecting/distributing state, the bonding master still reports
carrier as up as long as at least 'min_links' slaves have carrier.

In this situation, only one slave is effectively used for TX/RX,
while traffic received on other slaves is dropped. Upper-layer
daemons therefore consider the interface operational, even though
traffic may be blackholed if the lack of LACP negotiation means
the partner is not ready to deal with traffic.

Introduce a configuration knob to control this behavior. It allows
the bonding master to assert carrier only when at least 'min_links'
slaves are in collecting/distributing state (or collecting only
when coupled_control is disabled).

The default mode preserves the current (legacy) behavior. This
patch only introduces the knob; its behavior is implemented in
the subsequent commit.
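
As an illustration only, the intended carrier decision can be modelled
in userspace as below. struct bond_snapshot and bond_carrier_up() are
hypothetical, and the model covers just the situation described in this
message, not the aggregator selection done by the driver:

```c
#include <stdbool.h>

enum lacp_fallback {
	LACP_FALLBACK_LEGACY = 0,
	LACP_FALLBACK_STRICT = 1,
};

/* Hypothetical summary of the slaves of one 802.3ad bond. */
struct bond_snapshot {
	unsigned int min_links;
	unsigned int with_carrier;	/* slaves reporting carrier */
	unsigned int negotiated;	/* slaves collecting(/distributing) */
};

static bool bond_carrier_up(const struct bond_snapshot *s,
			    enum lacp_fallback mode)
{
	/* bonding.rst: min_links 0 and 1 have the exact same effect. */
	unsigned int needed = s->min_links ? s->min_links : 1;

	if (mode == LACP_FALLBACK_STRICT)
		return s->negotiated >= needed;

	return s->with_carrier >= needed;
}
```

With two slaves that have carrier but no LACP partner, the legacy mode
reports carrier up and the strict mode reports carrier down.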

Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
 Documentation/networking/bonding.rst | 33 ++++++++++++++++++++++++++++
 drivers/net/bonding/bond_main.c      |  1 +
 drivers/net/bonding/bond_netlink.c   | 16 ++++++++++++++
 drivers/net/bonding/bond_options.c   | 26 ++++++++++++++++++++++
 include/net/bond_options.h           |  1 +
 include/net/bonding.h                |  1 +
 include/uapi/linux/if_link.h         |  1 +
 7 files changed, 79 insertions(+)

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index e700bf1d095c..465d06aead27 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -619,6 +619,39 @@ min_links
 	aggregator cannot be active without at least one available link,
 	setting this option to 0 or to 1 has the exact same effect.
 
+lacp_fallback
+
+	Specifies the fallback behavior of a bond when LACP negotiation fails on
+	all slave links, i.e. when no slave is in the Collecting/Distributing state
+	(or only in Collecting state when coupled_control is disabled), while at
+	least `min_links` links still report carrier up.
+
+	This option is only applicable to 802.3ad mode (mode 4).
+
+	Valid values are:
+
+	legacy or 0
+		In this situation, the bonding master remains carrier up and
+		randomly selects a single slave to transmit and receive traffic.
+		Traffic received on other slaves is dropped.
+
+		This mode is deprecated, as it may lead to traffic blackholing
+		when the absence of LACP negotiation means the partner is not
+		ready to collect and distribute traffic.
+
+		This is the legacy default behavior.
+
+	strict or 1
+		In this situation, the bonding master reports carrier down, allowing
+		upper-layer processes to detect that the interface is not usable for
+		collecting and distributing traffic.
+
+		The master transitions to carrier up only when at least
+		`min_links` slaves reach the Collecting(/Distributing) state,
+		allowing traffic to flow.
+
+	The default value is 0 (legacy).
+
 mode
 
 	Specifies one of the bonding policies. The default is
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a5484d11553d..02cba0560a39 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -6440,6 +6440,7 @@ static int __init bond_check_params(struct bond_params *params)
 	params->ad_user_port_key = ad_user_port_key;
 	params->coupled_control = 1;
 	params->broadcast_neighbor = 0;
+	params->lacp_fallback = 0;
 	if (packets_per_slave > 0) {
 		params->reciprocal_packets_per_slave =
 			reciprocal_value(packets_per_slave);
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 286f11c517f7..1f92ad786b51 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -130,6 +130,7 @@ static const struct nla_policy bond_policy[IFLA_BOND_MAX + 1] = {
 	[IFLA_BOND_NS_IP6_TARGET]	= { .type = NLA_NESTED },
 	[IFLA_BOND_COUPLED_CONTROL]	= { .type = NLA_U8 },
 	[IFLA_BOND_BROADCAST_NEIGH]	= { .type = NLA_U8 },
+	[IFLA_BOND_LACP_FALLBACK]	= { .type = NLA_U8 },
 };
 
 static const struct nla_policy bond_slave_policy[IFLA_BOND_SLAVE_MAX + 1] = {
@@ -586,6 +587,16 @@ static int bond_changelink(struct net_device *bond_dev, struct nlattr *tb[],
 			return err;
 	}
 
+	if (data[IFLA_BOND_LACP_FALLBACK]) {
+		int fallback_mode = nla_get_u8(data[IFLA_BOND_LACP_FALLBACK]);
+
+		bond_opt_initval(&newval, fallback_mode);
+		err = __bond_opt_set(bond, BOND_OPT_LACP_FALLBACK, &newval,
+				     data[IFLA_BOND_LACP_FALLBACK], extack);
+		if (err)
+			return err;
+	}
+
 	return 0;
 }
 
@@ -658,6 +669,7 @@ static size_t bond_get_size(const struct net_device *bond_dev)
 		nla_total_size(sizeof(struct in6_addr)) * BOND_MAX_NS_TARGETS +
 		nla_total_size(sizeof(u8)) +	/* IFLA_BOND_COUPLED_CONTROL */
 		nla_total_size(sizeof(u8)) +	/* IFLA_BOND_BROADCAST_NEIGH */
+		nla_total_size(sizeof(u8)) +	/* IFLA_BOND_LACP_FALLBACK */
 		0;
 }
 
@@ -825,6 +837,10 @@ static int bond_fill_info(struct sk_buff *skb,
 		       bond->params.broadcast_neighbor))
 		goto nla_put_failure;
 
+	if (nla_put_u8(skb, IFLA_BOND_LACP_FALLBACK,
+		       bond->params.lacp_fallback))
+		goto nla_put_failure;
+
 	if (BOND_MODE(bond) == BOND_MODE_8023AD) {
 		struct ad_info info;
 
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 7380cc4ee75a..b672b8a881bb 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -68,6 +68,8 @@ static int bond_option_lacp_active_set(struct bonding *bond,
 				       const struct bond_opt_value *newval);
 static int bond_option_lacp_rate_set(struct bonding *bond,
 				     const struct bond_opt_value *newval);
+static int bond_option_lacp_fallback_set(struct bonding *bond,
+					 const struct bond_opt_value *newval);
 static int bond_option_ad_select_set(struct bonding *bond,
 				     const struct bond_opt_value *newval);
 static int bond_option_queue_id_set(struct bonding *bond,
@@ -162,6 +164,12 @@ static const struct bond_opt_value bond_lacp_rate_tbl[] = {
 	{ NULL,   -1,           0},
 };
 
+static const struct bond_opt_value bond_lacp_fallback_tbl[] = {
+	{ "legacy", 0, BOND_VALFLAG_DEFAULT},
+	{ "strict",  1, 0},
+	{ NULL, -1, 0 }
+};
+
 static const struct bond_opt_value bond_ad_select_tbl[] = {
 	{ "stable",          BOND_AD_STABLE,    BOND_VALFLAG_DEFAULT},
 	{ "bandwidth",       BOND_AD_BANDWIDTH, 0},
@@ -363,6 +371,14 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
 		.values = bond_lacp_rate_tbl,
 		.set = bond_option_lacp_rate_set
 	},
+	[BOND_OPT_LACP_FALLBACK] = {
+		.id = BOND_OPT_LACP_FALLBACK,
+		.name = "lacp_fallback",
+		.desc = "Define the LACP fallback mode when no slaves have negotiated",
+		.unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_8023AD)),
+		.values = bond_lacp_fallback_tbl,
+		.set = bond_option_lacp_fallback_set
+	},
 	[BOND_OPT_MINLINKS] = {
 		.id = BOND_OPT_MINLINKS,
 		.name = "min_links",
@@ -1684,6 +1700,16 @@ static int bond_option_lacp_rate_set(struct bonding *bond,
 	return 0;
 }
 
+static int bond_option_lacp_fallback_set(struct bonding *bond,
+					 const struct bond_opt_value *newval)
+{
+	netdev_dbg(bond->dev, "Setting LACP fallback to %s (%llu)\n",
+		   newval->string, newval->value);
+	bond->params.lacp_fallback = newval->value;
+
+	return 0;
+}
+
 static int bond_option_ad_select_set(struct bonding *bond,
 				     const struct bond_opt_value *newval)
 {
diff --git a/include/net/bond_options.h b/include/net/bond_options.h
index e6eedf23aea1..5eb64c831f54 100644
--- a/include/net/bond_options.h
+++ b/include/net/bond_options.h
@@ -79,6 +79,7 @@ enum {
 	BOND_OPT_COUPLED_CONTROL,
 	BOND_OPT_BROADCAST_NEIGH,
 	BOND_OPT_ACTOR_PORT_PRIO,
+	BOND_OPT_LACP_FALLBACK,
 	BOND_OPT_LAST
 };
 
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 395c6e281c5f..d8cb02643f8b 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -132,6 +132,7 @@ struct bond_params {
 	int peer_notif_delay;
 	int lacp_active;
 	int lacp_fast;
+	int lacp_fallback;
 	unsigned int min_links;
 	int ad_select;
 	char primary[IFNAMSIZ];
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index e9b5f79e1ee1..7ad3fc600c71 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1539,6 +1539,7 @@ enum {
 	IFLA_BOND_NS_IP6_TARGET,
 	IFLA_BOND_COUPLED_CONTROL,
 	IFLA_BOND_BROADCAST_NEIGH,
+	IFLA_BOND_LACP_FALLBACK,
 	__IFLA_BOND_MAX,
 };
 
-- 
2.39.2



* [PATCH net v3 5/5] selftests: bonding: add test for fallback mode
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, Louis Scalbert
In-Reply-To: <20260408152353.276204-1-louis.scalbert@6wind.com>

Add a test for the bonding legacy and strict LACP fallback modes.
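
The port state check used by the test boils down to masking two bits of
the LACP actor state. A minimal sketch, assuming the standard bit
positions (Collecting is bit 4, Distributing is bit 5); state_matches()
is a hypothetical helper mirroring the comparison done in the script:

```c
#include <stdbool.h>
#include <stdint.h>

#define LACP_STATE_COLLECTING	0x10
#define LACP_STATE_DISTRIBUTING	0x20

/* 0x30 == 48, the mask value used by the script. */
#define COLLECTING_DISTRIBUTING_MASK \
	(LACP_STATE_COLLECTING | LACP_STATE_DISTRIBUTING)

/* True when the two masked bits of the actor state equal expected. */
static bool state_matches(uint8_t actor_state, uint8_t expected)
{
	return (actor_state & COLLECTING_DISTRIBUTING_MASK) == expected;
}
```

An expected value of 48 requires both bits to be set, while 0 requires
both to be clear; a port that is only Collecting matches neither.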

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
 .../selftests/drivers/net/bonding/Makefile    |   1 +
 .../drivers/net/bonding/bond_lacp_fallback.sh | 299 ++++++++++++++++++
 2 files changed, 300 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_lacp_fallback.sh

diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile
index 6c5c60adb5e8..a117bf2e483b 100644
--- a/tools/testing/selftests/drivers/net/bonding/Makefile
+++ b/tools/testing/selftests/drivers/net/bonding/Makefile
@@ -7,6 +7,7 @@ TEST_PROGS := \
 	bond-eth-type-change.sh \
 	bond-lladdr-target.sh \
 	bond_ipsec_offload.sh \
+	bond_lacp_fallback.sh \
 	bond_lacp_prio.sh \
 	bond_macvlan_ipvlan.sh \
 	bond_options.sh \
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_lacp_fallback.sh b/tools/testing/selftests/drivers/net/bonding/bond_lacp_fallback.sh
new file mode 100755
index 000000000000..a983a2c2ea17
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/bond_lacp_fallback.sh
@@ -0,0 +1,299 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Testing if bond lacp_fallback works
+#
+#          Partner (p_ns)
+#  +-------------------------+
+#  |          bond0          |
+#  |            +            |
+#  |      eth0  |  eth1      |
+#  |        +---+---+        |
+#  |        |       |        |
+#  +-------------------------+
+#           |       |
+#  +--------------------------+
+#  |        |       |         |
+#  |        +---+---+         |
+#  |      eth0  |  eth1       |
+#  |            +             |
+#  |          bond0           |
+#  +--------------------------+
+#         Dut (d_ns)
+
+lib_dir=$(dirname "$0")
+# shellcheck disable=SC1090
+source "$lib_dir"/../../../net/lib.sh
+
+COLLECTING_DISTRIBUTING_MASK=48
+COLLECTING_DISTRIBUTING=48
+FAILED=0
+
+setup_links()
+{
+	# shellcheck disable=SC2154
+	ip -n "${d_ns}" link add eth0 type veth peer name eth0 netns "${p_ns}"
+	ip -n "${d_ns}" link add eth1 type veth peer name eth1 netns "${p_ns}"
+
+	ip -n "${d_ns}" link add bond0 type bond mode 802.3ad miimon 100 \
+		lacp_rate fast min_links 1
+	ip -n "${p_ns}" link add bond0 type bond mode 802.3ad miimon 100 \
+		lacp_rate fast min_links 1
+
+	ip -n "${d_ns}" link set eth0 master bond0
+	ip -n "${d_ns}" link set eth1 master bond0
+	ip -n "${p_ns}" link set eth0 master bond0
+	ip -n "${p_ns}" link set eth1 master bond0
+
+	ip -n "${d_ns}" link set bond0 up
+	ip -n "${p_ns}" link set bond0 up
+}
+
+test_master_carrier() {
+	local expected=$1
+	local mode_name=$2
+	local carrier
+
+	carrier=$(ip netns exec "${d_ns}" cat /sys/class/net/bond0/carrier)
+	[ "$carrier" == "1" ] && carrier="up" || carrier="down"
+
+	[ "$carrier" == "$expected" ] && return
+
+	echo "FAIL: Expected carrier $expected in $mode_name mode, got $carrier"
+
+	RET=1
+
+}
+
+compare_state() {
+	local actual_state=$1
+	local expected_state=$2
+	local iface=$3
+	local last_attempt=$4
+
+	[ $((actual_state & COLLECTING_DISTRIBUTING_MASK)) -eq "$expected_state" ] \
+		&& return 0
+
+	[ "$last_attempt" -ne 1 ] && return 1
+
+	printf "FAIL: Expected LACP %s actor state to " "$iface"
+	if [ "$expected_state" -eq $COLLECTING_DISTRIBUTING ]; then
+		echo "be in Collecting/Distributing state"
+	else
+		echo "have neither Collecting nor Distributing set."
+	fi
+
+	return 1
+}
+
+_test_lacp_port_state() {
+	local interface=$1
+	local expected=$2
+	local last_attempt=$3
+	local eth0_actor_state eth1_actor_state
+	local ret=0
+
+	# shellcheck disable=SC2016
+	while IFS='=' read -r k v; do
+		printf -v "$k" '%s' "$v"
+	done < <(
+		ip netns exec "${d_ns}" awk '
+		/^Slave Interface: / { iface=$3 }
+		/details actor lacp pdu:/ { ctx="actor" }
+		/details partner lacp pdu:/ { ctx="partner" }
+		/^[[:space:]]+port state: / {
+			if (ctx == "actor") {
+				gsub(":", "", iface)
+				printf "%s_%s_state=%s\n", iface, ctx, $3
+			}
+		}
+		' /proc/net/bonding/bond0
+	)
+
+	if [ "$interface" == "eth0" ] || [ "$interface" == "both" ]; then
+		compare_state "$eth0_actor_state" "$expected" eth0 "$last_attempt" || ret=1
+	fi
+
+	if [ "$interface" == "eth1" ] || [ "$interface" == "both" ]; then
+		compare_state "$eth1_actor_state" "$expected" eth1 "$last_attempt" || ret=1
+	fi
+
+	return $ret
+}
+
+test_lacp_port_state() {
+	local interface=$1
+	local expected=$2
+	local retry=$3
+	local last_attempt=0
+	local attempt=1
+	local ret=1
+
+	while [ $attempt -le $((retry + 1)) ]; do
+		[ $attempt -eq $((retry + 1)) ] && last_attempt=1
+		_test_lacp_port_state "$interface" "$expected" "$last_attempt" && return
+		((attempt++))
+		sleep 1
+	done
+
+	RET=1
+}
+
+
+trap cleanup_all_ns EXIT
+setup_ns d_ns p_ns
+setup_links
+
+# Initial state
+RET=0
+mode=legacy
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 3
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 up"
+
+# partner eth0 down, eth1 up
+RET=0
+ip -n "${p_ns}" link set eth0 down
+test_lacp_port_state eth0 $FAILED 5
+test_lacp_port_state eth1 $COLLECTING_DISTRIBUTING 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 down"
+
+# partner eth0 and eth1 down
+RET=0
+ip -n "${p_ns}" link set eth1 down
+test_lacp_port_state both $FAILED 5
+test_master_carrier down $mode # down because of min_links
+log_test "bond LACP" "$mode fallback - eth0 and eth1 down"
+
+# partner eth0 up, eth1 down
+RET=0
+ip -n "${p_ns}" link set eth0 up
+test_lacp_port_state eth0 $COLLECTING_DISTRIBUTING 60
+test_lacp_port_state eth1 $FAILED 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 up, eth1 down"
+
+# partner eth0 and eth1 up
+RET=0
+ip -n "${p_ns}" link set eth1 up
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 60
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 up"
+
+# partner eth0 stops LACP and eth1 up
+RET=0
+ip netns exec "${p_ns}" tc qdisc add dev eth0 root netem loss 100%
+test_lacp_port_state eth0 $FAILED 5
+test_lacp_port_state eth1 $COLLECTING_DISTRIBUTING 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 stopped sending LACP"
+
+# partner eth0 and eth1 stop LACP
+RET=0
+ip netns exec "${p_ns}" tc qdisc add dev eth1 root netem loss 100%
+test_lacp_port_state both $FAILED 5
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 stopped sending LACP"
+
+# switch to lacp_fallback strict
+RET=0
+mode=strict
+ip -n "${d_ns}" link set dev bond0 type bond lacp_fallback $mode
+test_lacp_port_state both $FAILED 1
+test_master_carrier down $mode 5
+log_test "bond LACP" "$mode fallback - eth0 and eth1 stopped sending LACP"
+
+# switch back to lacp_fallback legacy mode
+RET=0
+mode=legacy
+ip -n "${d_ns}" link set dev bond0 type bond lacp_fallback $mode
+test_lacp_port_state both $FAILED 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 stopped sending LACP"
+
+# eth0 recovers LACP
+RET=0
+ip netns exec "${p_ns}" tc qdisc del dev eth0 root
+test_lacp_port_state eth0 $COLLECTING_DISTRIBUTING 60
+test_lacp_port_state eth1 $FAILED 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 recovered and eth1 stopped sending LACP"
+
+# eth1 recovers LACP
+RET=0
+ip netns exec "${p_ns}" tc qdisc del dev eth1 root
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 60
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 recovered LACP"
+
+# switch to lacp_fallback strict
+RET=0
+mode=strict
+ip -n "${d_ns}" link set dev bond0 type bond lacp_fallback $mode
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 up"
+
+# partner eth0 down, eth1 up
+RET=0
+ip -n "${p_ns}" link set eth0 down
+test_lacp_port_state eth0 $FAILED 5
+test_lacp_port_state eth1 $COLLECTING_DISTRIBUTING 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 down"
+
+# partner eth0 and eth1 down
+RET=0
+ip -n "${p_ns}" link set eth1 down
+test_lacp_port_state both $FAILED 5
+test_master_carrier down $mode # down because of min_links
+log_test "bond LACP" "$mode fallback - eth0 and eth1 down"
+
+# partner eth0 up, eth1 down
+RET=0
+ip -n "${p_ns}" link set eth0 up
+test_lacp_port_state eth0 $COLLECTING_DISTRIBUTING 60
+test_lacp_port_state eth1 $FAILED 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 up, eth1 down"
+
+# partner eth0 and eth1 up
+RET=0
+ip -n "${p_ns}" link set eth1 up
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 60
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 up"
+
+# partner eth0 stops LACP and eth1 up
+RET=0
+ip netns exec "${p_ns}" tc qdisc add dev eth0 root netem loss 100%
+test_lacp_port_state eth0 $FAILED 5
+test_lacp_port_state eth1 $COLLECTING_DISTRIBUTING 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 stopped sending LACP"
+
+# partner eth0 and eth1 stop LACP
+RET=0
+ip netns exec "${p_ns}" tc qdisc add dev eth1 root netem loss 100%
+test_lacp_port_state both $FAILED 5
+test_master_carrier down $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 stopped sending LACP"
+
+# eth0 recovers LACP
+RET=0
+ip netns exec "${p_ns}" tc qdisc del dev eth0 root
+test_lacp_port_state eth0 $COLLECTING_DISTRIBUTING 60
+test_lacp_port_state eth1 $FAILED 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 recovered and eth1 stopped sending LACP"
+
+# eth1 recovers LACP
+# shellcheck disable=SC2034
+RET=0
+ip netns exec "${p_ns}" tc qdisc del dev eth1 root
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 60
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 recovered LACP"
+
+exit "${EXIT_STATUS}"
-- 
2.39.2



* [PATCH net v3 4/5] bonding: 3ad: fix stuck negotiation on recovery
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, Louis Scalbert
In-Reply-To: <20260408152353.276204-1-louis.scalbert@6wind.com>

The previous commit introduced a side effect caused by clearing the
SELECTED flag on disabled ports. After all ports in an aggregator go
down, if only a subset of ports comes back up, those ports can no
longer renegotiate LACP unless all aggregator ports come back up.

1. All aggregator ports go down
  - The SELECTED flag is cleared on all of them.
2. One port comes back up
  - Its SELECTED flag is set again.
  - It enters the WAITING state and gets its READY_N flag.
  - The remaining ports stay UNSELECTED. Because of that, they cannot
  enter the WAITING state and therefore never get READY_N.
  - __agg_ports_are_ready() returns 0 because it finds a port without
  READY_N.
  - As a result, __set_agg_ports_ready() keeps the READY flag cleared on
  all ports.
  - The port that came back up is therefore not marked READY and cannot
  transition to ATTACHED.
  - LACP negotiation becomes stuck, and the port cannot be used.
3. All aggregator ports come back up
  - They all regain SELECTED and READY_N.
  - __agg_ports_are_ready() now returns 1.
  - __set_agg_ports_ready() sets READY on all ports.
  - They can then transition to ATTACHED.
  - Negotiation resumes and the aggregator becomes operational again.

Consider only ports currently in the WAITING mux state for READY_N, so
that a disabled port no longer causes __agg_ports_are_ready() to return
0. That matches 802.3ad, which states: "The Selection Logic asserts
Ready TRUE when the values of Ready_N for all ports that are waiting to
attach to a given Aggregator are TRUE."

Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
 drivers/net/bonding/bond_3ad.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 3a94fbcbf721..3f56d892b101 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -700,7 +700,8 @@ static void __update_ntt(struct lacpdu *lacpdu, struct port *port)
 }
 
 /**
- * __agg_ports_are_ready - check if all ports in an aggregator are ready
+ * __agg_ports_are_ready - check if all ports in an aggregator that are in
+ * the WAITING state are ready
  * @aggregator: the aggregator we're looking at
  *
  */
@@ -716,6 +717,8 @@ static int __agg_ports_are_ready(struct aggregator *aggregator)
 		for (port = aggregator->lag_ports;
 		     port;
 		     port = port->next_port_in_aggregator) {
+			if (port->sm_mux_state != AD_MUX_WAITING)
+				continue;
 			if (!(port->sm_vars & AD_PORT_READY_N)) {
 				retval = 0;
 				break;
-- 
2.39.2



* [PATCH net v3 3/5] bonding: 3ad: fix mux port state on oper down
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, Louis Scalbert
In-Reply-To: <20260408152353.276204-1-louis.scalbert@6wind.com>

When the bonding interface has carrier down due to the absence of
valid slaves and a slave transitions from down to up, the bonding
interface briefly goes carrier up, then down again, and finally up
once LACP negotiates collecting and distributing on the port.

The interface should not transition to carrier up until LACP
negotiation is complete.

This happens because the actor and partner port states remain in
collecting (and distributing) when the port goes down. When the port
comes back up, it temporarily remains in this state until LACP
renegotiation occurs.

Previously this was mostly cosmetic, but since the bonding carrier
state now depends on the LACP negotiation state, it causes the
interface to flap.

Fix this by unsetting the SELECTED flag when a port goes down so that
the mux state machine transitions through ATTACHED and DETACHED,
which clears the actor collecting and distributing flags. Do not
attempt to set the SELECTED flag if the port is still disabled.

Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
 drivers/net/bonding/bond_3ad.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index b79a76296966..3a94fbcbf721 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1570,6 +1570,12 @@ static void ad_port_selection_logic(struct port *port, bool *update_slave_arr)
 	struct slave *slave;
 	int found = 0;
 
+	/* Disabled ports cannot be SELECTED.
+	 * Do not attempt to set the SELECTED flag if the port is still disabled.
+	 */
+	if (!port->is_enabled)
+		return;
+
 	/* if the port is already Selected, do nothing */
 	if (port->sm_vars & AD_PORT_SELECTED)
 		return;
@@ -2794,6 +2800,7 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
 		/* link has failed */
 		port->is_enabled = false;
 		ad_update_actor_keys(port, true);
+		port->sm_vars &= ~AD_PORT_SELECTED;
 	}
 	agg = __get_first_agg(port);
 	ad_agg_selection_logic(agg, &dummy);
-- 
2.39.2



* [PATCH net v3 2/5] bonding: 3ad: fix carrier when no valid slaves
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, Louis Scalbert
In-Reply-To: <20260408152353.276204-1-louis.scalbert@6wind.com>

Apply the "lacp_fallback" configuration from the previous commit.

"lacp_fallback" mode "strict" asserts the bonding master carrier only
when at least 'min_links' slaves are in the collecting/distributing
state (or collecting only if the coupled_control default behavior is
disabled).

Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
 drivers/net/bonding/bond_3ad.c     | 26 ++++++++++++++++++++++++--
 drivers/net/bonding/bond_options.c |  1 +
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index af7f74cfdc08..b79a76296966 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -745,6 +745,22 @@ static void __set_agg_ports_ready(struct aggregator *aggregator, int val)
 	}
 }
 
+static int __agg_valid_ports(struct aggregator *agg)
+{
+	struct port *port;
+	int valid = 0;
+
+	for (port = agg->lag_ports; port;
+	     port = port->next_port_in_aggregator) {
+		if (port->actor_oper_port_state & LACP_STATE_COLLECTING &&
+		    (!port->slave->bond->params.coupled_control ||
+		     port->actor_oper_port_state & LACP_STATE_DISTRIBUTING))
+			valid++;
+	}
+
+	return valid;
+}
+
 static int __agg_active_ports(struct aggregator *agg)
 {
 	struct port *port;
@@ -2120,6 +2136,7 @@ static void ad_enable_collecting_distributing(struct port *port,
 			  port->actor_port_number,
 			  port->aggregator->aggregator_identifier);
 		__enable_port(port);
+		bond_3ad_set_carrier(port->slave->bond);
 		/* Slave array needs update */
 		*update_slave_arr = true;
 		/* Should notify peers if possible */
@@ -2141,6 +2158,7 @@ static void ad_disable_collecting_distributing(struct port *port,
 			  port->actor_port_number,
 			  port->aggregator->aggregator_identifier);
 		__disable_port(port);
+		bond_3ad_set_carrier(port->slave->bond);
 		/* Slave array needs an update */
 		*update_slave_arr = true;
 	}
@@ -2819,8 +2837,12 @@ int bond_3ad_set_carrier(struct bonding *bond)
 	}
 	active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
 	if (active) {
-		/* are enough slaves available to consider link up? */
-		if (__agg_active_ports(active) < bond->params.min_links) {
+		/* are enough slaves in collecting (and distributing) state to consider
+		 * link up?
+		 */
+		if ((bond->params.lacp_fallback ? __agg_valid_ports(active)
+					: __agg_active_ports(active)) <
+		    bond->params.min_links) {
 			if (netif_carrier_ok(bond->dev)) {
 				netif_carrier_off(bond->dev);
 				goto out;
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index b672b8a881bb..d64a5d2f80b6 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -1706,6 +1706,7 @@ static int bond_option_lacp_fallback_set(struct bonding *bond,
 	netdev_dbg(bond->dev, "Setting LACP fallback to %s (%llu)\n",
 		   newval->string, newval->value);
 	bond->params.lacp_fallback = newval->value;
+	bond_3ad_set_carrier(bond);
 
 	return 0;
 }
-- 
2.39.2



* [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, Louis Scalbert

Hi everyone,

This series addresses a blackholing issue and a subsequent link-flapping
issue in the 802.3ad bonding driver when dealing with inactive slaves
and the `min_links` parameter.

When an 802.3ad (LACP) bonding interface has no slaves in the
collecting/distributing state, the bonding master still reports
carrier as up as long as at least 'min_links' slaves have carrier.

In this situation, only one slave is effectively used for TX/RX,
while traffic received on other slaves is dropped. Upper-layer
daemons therefore consider the interface operational, even though
traffic may be blackholed if the lack of LACP negotiation means
the partner is not ready to deal with traffic.

The current behavior is not compliant with the LACP standard. This
patchset introduces a working behavior that is not strictly
standard-compliant either, but is widely adopted across the industry.
It consists of bringing the bonding master interface down to signal to
upper-layer processes that it is not usable.

This patchset depends on the following iproute2 change:
ip/bond: add lacp_fallback support

Patch 1 introduces the lacp_fallback configuration knob, which is
applied in the subsequent patch. The default (legacy) mode preserves
the existing behavior, while the strict mode is intended to force the
bonding master carrier down in this situation.

Patch 2 addresses the core issue when lacp_fallback is set to strict.
It ensures that carrier is asserted only when at least 'min_links'
slaves are in a valid state (collecting/distributing, or collecting
only when coupled_control is disabled).
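
A minimal sketch of the carrier decision described for patch 2, written
as standalone C rather than kernel code. The struct, the SKETCH_* bits
and all function names are hypothetical stand-ins for illustration; the
legacy branch assumes "slaves have carrier" is the only criterion, as
described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the actor port state bits. */
#define SKETCH_COLLECTING   0x10u
#define SKETCH_DISTRIBUTING 0x20u

struct sketch_port {
	bool has_carrier;         /* physical link is up */
	unsigned int actor_state; /* SKETCH_* bits set by LACP negotiation */
};

/* Legacy view: a port counts as soon as it has link. */
static int sketch_link_up_ports(const struct sketch_port *ports, size_t n)
{
	int up = 0;

	assert(ports || n == 0);
	for (size_t i = 0; i < n; i++)
		if (ports[i].has_carrier)
			up++;
	return up;
}

/* Strict view: a port counts only once LACP negotiated it usable. */
static int sketch_valid_ports(const struct sketch_port *ports, size_t n,
			      bool coupled_control)
{
	int valid = 0;

	assert(ports || n == 0);
	for (size_t i = 0; i < n; i++) {
		unsigned int s = ports[i].actor_state;

		if (!(s & SKETCH_COLLECTING))
			continue;
		/* With coupled control, distributing is required as well. */
		if (coupled_control && !(s & SKETCH_DISTRIBUTING))
			continue;
		valid++;
	}
	return valid;
}

/* Carrier is up when enough ports qualify under the selected mode. */
static bool sketch_carrier_up(const struct sketch_port *ports, size_t n,
			      bool strict, bool coupled_control,
			      int min_links)
{
	int usable = strict ? sketch_valid_ports(ports, n, coupled_control)
			    : sketch_link_up_ports(ports, n);

	return usable >= min_links;
}
```

With two link-up ports that have not negotiated LACP and min_links set
to 1, this model reports carrier up in legacy mode and carrier down in
strict mode, which is the difference the series is after.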

Patch 3 fixes a side effect of the second patch. Tightening the carrier
logic exposes a state persistence bug: when a physical link goes down,
the LACP collecting/distributing flags remain set. When the link
returns, the interface briefly appears ready, bounces the carrier up,
and then drops it again once LACP renegotiation starts. Unsetting the
SELECTED flag when the link goes down forces the state machine through
DETACHED, clearing the stale flags and preventing the flap.
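
The flap can be pictured with a deliberately simplified model, given
here as standalone C. Everything in it (struct flap_port, the flap_*
helpers, the single-step "mux") is a hypothetical reduction for
illustration and not the bond_3ad state machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct flap_port {
	bool enabled;      /* physical link state */
	bool selected;     /* SELECTED flag */
	bool collecting;   /* actor collecting flag */
	bool distributing; /* actor distributing flag */
};

/* Link failure; clear_selected models the behavior added by the fix. */
static void flap_link_down(struct flap_port *p, bool clear_selected)
{
	assert(p != NULL);
	p->enabled = false;
	if (clear_selected)
		p->selected = false;
}

/* One pass of a much reduced mux: an unselected port is detached,
 * which drops its collecting and distributing flags.
 */
static void flap_mux_step(struct flap_port *p)
{
	assert(p != NULL);
	if (!p->selected) {
		p->collecting = false;
		p->distributing = false;
	}
}

static void flap_link_up(struct flap_port *p)
{
	assert(p != NULL);
	p->enabled = true;
}

/* What a strict carrier check would see right after link up. */
static bool flap_looks_ready(const struct flap_port *p)
{
	assert(p != NULL);
	return p->enabled && p->collecting && p->distributing;
}
```

In this model, a port that keeps SELECTED across the outage still looks
ready immediately after link up, before any renegotiation has happened.
Clearing SELECTED on link down removes that stale window.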

Patch 4 fixes a side effect of the third patch caused by clearing the
SELECTED flag on disabled ports. After all ports in an aggregator go
down, if only a subset of ports comes back up, those ports can no
longer renegotiate LACP unless all aggregator ports come back up.
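
A small sketch of the readiness check, again as standalone C with
hypothetical names (enum rdy_mux, struct rdy_port, rdy_agg_ready). The
only_waiting parameter exists purely to compare the check before and
after the change; it is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum rdy_mux { RDY_MUX_DETACHED, RDY_MUX_WAITING, RDY_MUX_ATTACHED };

struct rdy_port {
	enum rdy_mux mux; /* current mux state */
	bool ready_n;     /* per-port READY_N flag */
};

/* Return true when every port that is considered has READY_N set.
 * only_waiting == false: every port in the aggregator is considered.
 * only_waiting == true:  ports outside WAITING are skipped.
 */
static bool rdy_agg_ready(const struct rdy_port *ports, size_t n,
			  bool only_waiting)
{
	assert(ports || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (only_waiting && ports[i].mux != RDY_MUX_WAITING)
			continue;
		if (!ports[i].ready_n)
			return false;
	}
	return true;
}
```

With one port in WAITING that has READY_N and one port that is still
down (DETACHED, no READY_N), the unfiltered check fails and the
recovered port stays stuck, while the filtered check lets it proceed.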

Patch 5 adds a test for the bonding legacy and strict LACP fallback modes.

Louis Scalbert (5):
  bonding: 3ad: add lacp_fallback configuration knob
  bonding: 3ad: fix carrier when no valid slaves
  bonding: 3ad: fix mux port state on oper down
  bonding: 3ad: fix stuck negotiation on recovery
  selftests: bonding: add test for fallback mode

 Documentation/networking/bonding.rst          |  33 ++
 drivers/net/bonding/bond_3ad.c                |  38 ++-
 drivers/net/bonding/bond_main.c               |   1 +
 drivers/net/bonding/bond_netlink.c            |  16 +
 drivers/net/bonding/bond_options.c            |  27 ++
 include/net/bond_options.h                    |   1 +
 include/net/bonding.h                         |   1 +
 include/uapi/linux/if_link.h                  |   1 +
 .../selftests/drivers/net/bonding/Makefile    |   1 +
 .../drivers/net/bonding/bond_lacp_fallback.sh | 299 ++++++++++++++++++
 10 files changed, 415 insertions(+), 3 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_lacp_fallback.sh

-- 
2.39.2



* Re: BUG: net-next (7.0-rc6 based and later) fails to boot on Jetson Xavier NX
From: Linus Torvalds @ 2026-04-08 15:22 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: netdev, linux-arm-kernel, linux-kernel, iommu, linux-ext4,
	Marek Szyprowski, Robin Murphy, Theodore Ts'o, Andreas Dilger
In-Reply-To: <adZfTi3R6jtsjXx-@shell.armlinux.org.uk>

On Wed, 8 Apr 2026 at 06:59, Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> > Now building straight -rc7. If that also fails, my plan is to start
> > bisecting rc5..rc6, which will likely take most of the rest of the
> > day. So, in the mean time I'm sending this as a heads-up that rc6
> > and onwards has a problem.
>
> Plain -rc7 fails (another random oops):
>
> Now starting the bisect between 7.0-rc5 and 7.0-rc6.

Thanks. Not what I wanted to hear at this point, but a bisect should
get the culprit if this is at least sufficiently repeatable.

The exact symptoms and oops details may be random, but hopefully the
"something bad happens" is reliable enough to bisect.

              Linus


* [PATCH net-next v2] vsock/virtio: remove unnecessary call to `virtio_transport_get_ops`
From: Luigi Leonardi @ 2026-04-08 15:21 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Stefan Hajnoczi, Stefano Garzarella, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov
  Cc: kvm, virtualization, netdev, linux-kernel, Luigi Leonardi

`virtio_transport_send_pkt_info` gets all the transport information
from the parameter `t_ops`. There is no need to call
`virtio_transport_get_ops()`.

Remove it.

Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
---
Changes in v2:
- Removed Fixes tag.
- Picked up RoBs
- Rebased to latest net-next
- Link to v1: https://lore.kernel.org/r/20260407-remove_parameter-v1-1-e9729360a2be@redhat.com
---
 net/vmw_vsock/virtio_transport_common.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 8a9fb23c6e85..a152a9e208d0 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -60,8 +60,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
 		return false;
 
 	/* Check that transport can send data in zerocopy mode. */
-	t_ops = virtio_transport_get_ops(info->vsk);
-
 	if (t_ops->can_msgzerocopy) {
 		int pages_to_send = iov_iter_npages(iov_iter, MAX_SKB_FRAGS);
 

---
base-commit: 4c2ffb3ea8a601bb3730754cfaa433e673037cda
change-id: 20260407-remove_parameter-f61a3e40cf90

Best regards,
-- 
Luigi Leonardi <leonardi@redhat.com>



* Re: [PATCH net-next v2 3/4] bpf-timestamp: keep track of the skb when wait_for_space occurs
From: Willem de Bruijn @ 2026-04-08 15:15 UTC (permalink / raw)
  To: Jason Xing, Willem de Bruijn
  Cc: davem, edumazet, kuba, pabeni, horms, willemb, martin.lau, netdev,
	bpf, Jason Xing, Yushan Zhou
In-Reply-To: <CAL+tcoBt1A5GYFvimkcRFUtmy298y13AiF-XFF8b8E8Y6fg8Xw@mail.gmail.com>

> > > > Since we're modifying the kernel, how about adding a new member to
> > > > record sendmsg time which bpf script is able to read. The whole
> > > > scenario looks like this:
> > > > 1) in tcp_sendmsg_locked(), record the sendmsg time for each skb
> > > > 2) in either tso_fragment() or tcp_gso_tstamp(), each new skb will get
> > > > a copy of its original skb
> > > > 3) in each stage, bpf script reads the skb's sendmsg time and the
> > > > current time, and then effortlessly do the math.
> > > >
> > > > At this point, what I had in mind is we have two options:
> > > > 1) only handle the skb from the view of the send syscall layer, which
> > > > is, for sure, very simple but not thorough.
> > > > 2) stick to a pure authentic packet basis, then adding a new member
> > > > seems inevitable. so the question would be where to add? The space of
> > > > the skb structure is very precious :(
> > >
> > > Finding a suitable place to put this timestamp is really hard. IIRC,
> > > we can't expand the size of struct skb_shared_info so easily since
> > > it's a global effect.
> > >
> > > I'm wondering if we can turn the per-packet mode into a non-compatible
> > > feature by reusing 'u32 tskey' to store a microsecond timestamp of
> > > sendmsg.
> >
> > Agreed that an extra field is hard. We should avoid that.
> 
> Avoiding adding a new one makes the whole work extremely hard. I'm
> wondering since we have hwtstamp in shared info, why not add a
> software one for timestamping use? Then, we would support more
> different protocols in more different stages in a finer grain, which
> is a big coarse picture in my mind.

I don't understand the need to store more data in the skb for BPF.

With BPF hooks, the bpf program can record the relevant data directly
in a BPF map.
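
As a rough illustration of that idea, here is a plain userspace C
stand-in for such a map. It is not the BPF map API: struct ts_map,
ts_map_record() and ts_map_latency() are hypothetical names, and keying
by a 32-bit value such as tskey is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TS_SLOTS 64u /* hypothetical capacity, must be a power of two */

struct ts_slot {
	bool used;
	uint32_t key;        /* e.g. a per-skb key such as tskey */
	uint64_t sendmsg_ns; /* time recorded at the sendmsg stage */
};

struct ts_map {
	struct ts_slot slots[TS_SLOTS];
};

/* Record (or overwrite) the sendmsg time for a key; false if full. */
static bool ts_map_record(struct ts_map *m, uint32_t key, uint64_t now_ns)
{
	assert(m != NULL);
	for (uint32_t i = 0; i < TS_SLOTS; i++) {
		struct ts_slot *s = &m->slots[(key + i) & (TS_SLOTS - 1)];

		if (!s->used || s->key == key) {
			s->used = true;
			s->key = key;
			s->sendmsg_ns = now_ns;
			return true;
		}
	}
	return false;
}

/* At a later stage, compute the time elapsed since sendmsg for a key. */
static bool ts_map_latency(const struct ts_map *m, uint32_t key,
			   uint64_t now_ns, uint64_t *delta_ns)
{
	assert(m != NULL && delta_ns != NULL);
	for (uint32_t i = 0; i < TS_SLOTS; i++) {
		const struct ts_slot *s = &m->slots[(key + i) & (TS_SLOTS - 1)];

		if (!s->used)
			return false; /* probe chain ended, key unknown */
		if (s->key == key) {
			*delta_ns = now_ns - s->sendmsg_ns;
			return true;
		}
	}
	return false;
}
```

The point is only that the per-packet state lives next to the program
that consumes it, so nothing has to be added to the skb.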

> Adding a software bit will completely reduce the whole complexity and
> be very easy to use. Would you expect to see a draft by adding such a
> bit first?
> 
> Or just like I mentioned, repurposing tskey seems an alternative,
> which, however, makes the new feature incompatible.
> 
> >
> > If the purpose is to group skbs by sendmsg call (e.g., to filter out
> > all but the last one), it is probably also unnecessary.
> >
> > From a process PoV, since the process knows the sendmsg len and each
> > skb has a tskey in byte offset, it can correlate the skb with a given
> > sendmsg buffer.
> >
> > The BPF program is under control of a third-party admin. So that does
> > not follow directly. But it can be passed additional metadata.
> >
> > I thought about passing the offset of the skb from the start of the
> > sendmsg buffer to identify all consecutive skbs for a sendmsg call,
> > as each new buffer will start with an skb with offset 0 ..
> >
> > .. but that won't work as there is no guarantee that a sendmsg call
> > will not append to an existing outstanding skb.
> 
> Right. TCP is way too complex and we indeed see some tough issues when
> trying to deploy the feature. So my humble take is to make the design
> as simple as possible.
> 
> >
> > Anyway, the general idea is to pass to the BPF program through
> > bpf_skops_tx_timestamping some relevant signal, without having to
> > expand either skb or sk itself.
> >
> > I hear you on that measuring every skb is too frequent. But is calling
> > the BPF program and letting it decide whether to measure too? BPF
> > program invocation itself should be cheap.
> 
> Oh, I wasn't clear enough. Sorry. I meant tracing per skb is definitely
> an awesome way to go. My ultimate goal is to do so. Instead of letting
> people implement various fine grained bpf progs, we can provide a very
> easy/understandable/efficient approach with more samples. It should be
> very beneficial.
> 
> >
> > If per-push is preferable, with a filter ability like the above, it
> > seems more useful to me already.
> 
> Push-level is a compromise plan. Packet-level is what I always pursue :)

Then why not implement per-packet directly, given that the BPF call is
cheap and the BPF program can choose to selectively track packets?

Reminder that you do not want to break (BPF) users by changing
behavior. Let alone more than once. If per-push is going to be
obsoleted, skip it entirely.

> The current series has this ability: the bpf prog noticed it's a
> SENDMSG sock option and will selectively call
> bpf_sock_ops_enable_tx_tstamp() to do so. Only by calling
> bpf_sock_ops_enable_tx_tstamp() could the skb be tracked.
> 
> Thanks,
> Jason




