Re: [syzbot] [net?] possible deadlock in rlb_choose_channel (2)

From: Jay Vosburgh <jv@jvosburgh.net>
To: syzbot <syzbot+1db58dbbccbf93c65c83@syzkaller.appspotmail.com>
Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, pabeni@redhat.com,
	syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] [net?] possible deadlock in rlb_choose_channel (2)
Date: Wed, 13 May 2026 16:41:28 +0200
Message-ID: <197247.1778683288@vermin>
In-Reply-To: <6a043a69.170a0220.1fd042.0004.GAE@google.com>

syzbot <syzbot+1db58dbbccbf93c65c83@syzkaller.appspotmail.com> wrote:

>Hello,
>
>syzbot found the following issue on:
>
>HEAD commit:    c21b90f77687 x86/CPU/AMD: Prevent improper isolation of sh..
>git tree:       upstream
>console output: https://syzkaller.appspot.com/x/log.txt?x=10ec7dba580000
>kernel config:  https://syzkaller.appspot.com/x/.config?x=4caf64b1ee83dac0
>dashboard link: https://syzkaller.appspot.com/bug?extid=1db58dbbccbf93c65c83
>compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
>
>Unfortunately, I don't have any reproducer for this issue yet.
>
>Downloadable assets:
>disk image: https://storage.googleapis.com/syzbot-assets/2f3edabe3b67/disk-c21b90f7.raw.xz
>vmlinux: https://storage.googleapis.com/syzbot-assets/539b63753e79/vmlinux-c21b90f7.xz
>kernel image: https://storage.googleapis.com/syzbot-assets/48e6e7cbc4ca/bzImage-c21b90f7.xz
>
>IMPORTANT: if you fix the issue, please add the following tag to the commit:
>Reported-by: syzbot+1db58dbbccbf93c65c83@syzkaller.appspotmail.com
>
>ip6_tunnel: ip6tnl1 xmit: Local address not yet configured!
>ip6_tunnel: ip6tnl1 xmit: Local address not yet configured!
>============================================
>WARNING: possible recursive locking detected
>syzkaller #0 Tainted: G             L     
>--------------------------------------------
>kworker/u8:3/47 is trying to acquire lock:
>ffff88807a618e98 (&bond->mode_lock){+.-.}-{3:3}, at: spin_lock include/linux/spinlock.h:342 [inline]
>ffff88807a618e98 (&bond->mode_lock){+.-.}-{3:3}, at: rlb_choose_channel+0x37/0x19a0 drivers/net/bonding/bond_alb.c:562
>
>but task is already holding lock:
>ffff88807ffa0e98 (&bond->mode_lock){+.-.}-{3:3}, at: spin_lock_bh include/linux/spinlock.h:348 [inline]
>ffff88807ffa0e98 (&bond->mode_lock){+.-.}-{3:3}, at: rlb_update_rx_clients drivers/net/bonding/bond_alb.c:466 [inline]
>ffff88807ffa0e98 (&bond->mode_lock){+.-.}-{3:3}, at: bond_alb_monitor+0xe8a/0x17e0 drivers/net/bonding/bond_alb.c:1618
>
>other info that might help us debug this:
> Possible unsafe locking scenario:
>
>       CPU0
>       ----
>  lock(&bond->mode_lock);
>  lock(&bond->mode_lock);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
>7 locks held by kworker/u8:3/47:
> #0: ffff8880516b7140 ((wq_completion)bond5#2){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3277 [inline]
> #0: ffff8880516b7140 ((wq_completion)bond5#2){+.+.}-{0:0}, at: process_scheduled_works+0xa35/0x1860 kernel/workqueue.c:3385
> #1: ffffc90000b77c40 ((work_completion)(&(&bond->alb_work)->work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3278 [inline]
> #1: ffffc90000b77c40 ((work_completion)(&(&bond->alb_work)->work)){+.+.}-{0:0}, at: process_scheduled_works+0xa70/0x1860 kernel/workqueue.c:3385
> #2: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
> #2: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
> #2: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: bond_alb_monitor+0xf8/0x17e0 drivers/net/bonding/bond_alb.c:1546
> #3: ffff88807ffa0e98 (&bond->mode_lock){+.-.}-{3:3}, at: spin_lock_bh include/linux/spinlock.h:348 [inline]
> #3: ffff88807ffa0e98 (&bond->mode_lock){+.-.}-{3:3}, at: rlb_update_rx_clients drivers/net/bonding/bond_alb.c:466 [inline]
> #3: ffff88807ffa0e98 (&bond->mode_lock){+.-.}-{3:3}, at: bond_alb_monitor+0xe8a/0x17e0 drivers/net/bonding/bond_alb.c:1618
> #4: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
> #4: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
> #4: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: arp_xmit+0x23/0x270 net/ipv4/arp.c:663
> #5: ffffffff8e95cdc0 (rcu_read_lock_bh){....}-{1:3}, at: local_bh_disable include/linux/bottom_half.h:20 [inline]
> #5: ffffffff8e95cdc0 (rcu_read_lock_bh){....}-{1:3}, at: rcu_read_lock_bh include/linux/rcupdate.h:891 [inline]
> #5: ffffffff8e95cdc0 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x2b6/0x3950 net/core/dev.c:4791
> #6: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
> #6: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
> #6: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: bond_start_xmit+0xb4/0x1900 drivers/net/bonding/bond_main.c:5591
>
>stack backtrace:
>CPU: 0 UID: 0 PID: 47 Comm: kworker/u8:3 Tainted: G             L      syzkaller #0 PREEMPT(full) 
>Tainted: [L]=SOFTLOCKUP
>Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
>Workqueue: bond5 bond_alb_monitor
>Call Trace:
> <TASK>
> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
> print_deadlock_bug+0x279/0x290 kernel/locking/lockdep.c:3041
> check_deadlock kernel/locking/lockdep.c:3093 [inline]
> validate_chain kernel/locking/lockdep.c:3895 [inline]
> __lock_acquire+0x253f/0x2cf0 kernel/locking/lockdep.c:5237
> lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
> __raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
> _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:158
> spin_lock include/linux/spinlock.h:342 [inline]
> rlb_choose_channel+0x37/0x19a0 drivers/net/bonding/bond_alb.c:562
> rlb_arp_xmit drivers/net/bonding/bond_alb.c:680 [inline]
> bond_xmit_alb_slave_get+0x1071/0x20a0 drivers/net/bonding/bond_alb.c:1493
> bond_alb_xmit+0x24/0x40 drivers/net/bonding/bond_alb.c:1528
> __bond_start_xmit drivers/net/bonding/bond_main.c:5569 [inline]
> bond_start_xmit+0x6a2/0x1900 drivers/net/bonding/bond_main.c:5593
> __netdev_start_xmit include/linux/netdevice.h:5368 [inline]
> netdev_start_xmit include/linux/netdevice.h:5377 [inline]
> xmit_one net/core/dev.c:3888 [inline]
> dev_hard_start_xmit+0x2cd/0x830 net/core/dev.c:3904
> __dev_queue_xmit+0x14d9/0x3950 net/core/dev.c:4870
> NF_HOOK+0x33a/0x3c0 include/linux/netfilter.h:-1
> arp_xmit+0x16c/0x270 net/ipv4/arp.c:665
> rlb_update_client+0x2a8/0x6b0 drivers/net/bonding/bond_alb.c:455
> rlb_update_rx_clients drivers/net/bonding/bond_alb.c:473 [inline]
> bond_alb_monitor+0xf6a/0x17e0 drivers/net/bonding/bond_alb.c:1618

	Judging only from the stack trace, I suspect that this is either
a false positive, or the NF_HOOK action (a netfilter rule) is reinjecting
the ARP packet into the same bond that created it.

	If the packet is being reinjected to the same interface that
generated it in rlb_update_client, then I believe the recursive
acquisition of bond->mode_lock reported above would be the expected
behavior.

	On the other hand, if the network configuration is nested bonds,
then the rlb_arp_xmit -> rlb_choose_channel call path above would be
operating on a different instance of the bond->mode_lock, and would not
actually deadlock.
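
	To make the two cases concrete, here is a toy user-space model in
C. It is a sketch, not the bonding code: every toy_* name is made up for
illustration, and the only things taken from the trace are that
rlb_update_rx_clients transmits the ARP while holding mode_lock and that
rlb_choose_channel takes mode_lock on whichever bond ends up transmitting.
As far as I understand, lockdep checks by lock class rather than by
instance, so it cannot tell the two cases apart without a nesting
annotation. The two acquisitions in the report do show different addresses
(ffff88807a618e98 and ffff88807ffa0e98), which would fit the nested case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a non-recursive spinlock. Re-acquiring a held lock
 * fails here; a real spinlock would spin forever instead. */
struct toy_lock {
	bool held;
};

/* Hypothetical model of a bond: one mode_lock per instance, plus the
 * device that an ARP sent under that lock ends up being transmitted on. */
struct toy_bond {
	struct toy_lock mode_lock;
	struct toy_bond *xmit_target;
};

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Models rlb_arp_xmit -> rlb_choose_channel: takes the transmitting
 * bond's own mode_lock. Returns false where the real code would hang. */
static bool toy_choose_channel(struct toy_bond *b)
{
	if (!toy_trylock(&b->mode_lock))
		return false;
	toy_unlock(&b->mode_lock);
	return true;
}

/* Models rlb_update_rx_clients: hold mode_lock while the ARP goes out. */
static bool toy_update_rx_clients(struct toy_bond *b)
{
	bool ok = true;

	if (!toy_trylock(&b->mode_lock))
		return false;
	if (b->xmit_target)
		ok = toy_choose_channel(b->xmit_target);
	toy_unlock(&b->mode_lock);
	return ok;
}

/* Nested bonds: the second acquisition hits a different lock instance. */
static bool toy_nested_case(void)
{
	struct toy_bond lower = { .mode_lock = { false }, .xmit_target = NULL };
	struct toy_bond upper = { .mode_lock = { false }, .xmit_target = &lower };

	return toy_update_rx_clients(&upper);
}

/* Reinjection: the packet comes back to the bond that already holds
 * its own mode_lock, so the second acquisition cannot succeed. */
static bool toy_reinjected_case(void)
{
	struct toy_bond bond = { .mode_lock = { false }, .xmit_target = NULL };

	bond.xmit_target = &bond;
	return toy_update_rx_clients(&bond);
}
```

	In this model toy_nested_case() returns true and
toy_reinjected_case() returns false, i.e., only the reinjection case
corresponds to a real self-deadlock.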

	-J

> process_one_work kernel/workqueue.c:3302 [inline]
> process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3385
> worker_thread+0xa53/0xfc0 kernel/workqueue.c:3466
> kthread+0x388/0x470 kernel/kthread.c:436
> ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> </TASK>
>
>
>---
>This report is generated by a bot. It may contain errors.
>See https://goo.gl/tpsmEJ for more information about syzbot.
>syzbot engineers can be reached at syzkaller@googlegroups.com.
>
>syzbot will keep track of this issue. See:
>https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
>If the report is already addressed, let syzbot know by replying with:
>#syz fix: exact-commit-title
>
>If you want to overwrite report's subsystems, reply with:
>#syz set subsystems: new-subsystem
>(See the list of subsystem names on the web dashboard)
>
>If the report is a duplicate of another one, reply with:
>#syz dup: exact-subject-of-another-report
>
>If you want to undo deduplication, reply with:
>#syz undup

---
	-Jay Vosburgh, jv@jvosburgh.net
