* [PATCH net v1] bonding: fix type confusion in bond_setup_by_slave()
@ 2026-02-28 9:58 Jiayuan Chen
2026-03-04 5:23 ` Jay Vosburgh
2026-03-04 7:09 ` Eric Dumazet
0 siblings, 2 replies; 4+ messages in thread
From: Jiayuan Chen @ 2026-02-28 9:58 UTC (permalink / raw)
To: netdev
Cc: jiayuan.chen, Jiayuan Chen, Jay Vosburgh, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pravin B Shelar, linux-kernel
From: Jiayuan Chen <jiayuan.chen@shopee.com>
kernel BUG at net/core/skbuff.c:2306!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
RIP: 0010:pskb_expand_head+0xa08/0xfe0 net/core/skbuff.c:2306
RSP: 0018:ffffc90004aff760 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88807e3c8780 RCX: ffffffff89593e0e
RDX: ffff88807b7c4900 RSI: ffffffff89594747 RDI: ffff88807b7c4900
RBP: 0000000000000820 R08: 0000000000000005 R09: 0000000000000000
R10: 00000000961a63e0 R11: 0000000000000000 R12: ffff88807e3c8780
R13: 00000000961a6560 R14: dffffc0000000000 R15: 00000000961a63e0
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe1a0ed8df0 CR3: 000000002d816000 CR4: 00000000003526f0
Call Trace:
<TASK>
ipgre_header+0xdd/0x540 net/ipv4/ip_gre.c:900
dev_hard_header include/linux/netdevice.h:3439 [inline]
packet_snd net/packet/af_packet.c:3028 [inline]
packet_sendmsg+0x3ae5/0x53c0 net/packet/af_packet.c:3108
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
____sys_sendmsg+0xa54/0xc30 net/socket.c:2592
___sys_sendmsg+0x190/0x1e0 net/socket.c:2646
__sys_sendmsg+0x170/0x220 net/socket.c:2678
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe1a0e6c1a9
When a device (e.g. GRE tunnel) is enslaved to a bond,
bond_setup_by_slave() directly copies the slave's header_ops to the
bond device:
bond_dev->header_ops = slave_dev->header_ops;
This causes a type confusion when dev_hard_header() is later called
on the bond device. Functions like ipgre_header(), ip6gre_header(),
vlan_dev_hard_header(), and macvlan_hard_header() all use
netdev_priv(dev) to access their device-specific private data. When
called with the bond device, netdev_priv() returns the bond's private
data (struct bonding) instead of the expected type (e.g. struct
ip_tunnel), leading to garbage values being read and kernel crashes.
Fix this by introducing bond_header_ops with wrapper functions that
delegate to the active slave's header_ops using the slave's own
device. This ensures netdev_priv() in the slave's header functions
always receives the correct device.
The fix is placed in the bonding driver rather than individual device
drivers, as the root cause is bond blindly inheriting header_ops from
the slave without considering that these callbacks expect a specific
netdev_priv() layout.
The type confusion can be observed by adding a printk in
ipgre_header() and running the following commands:
ip link add dummy0 type dummy
ip addr add 10.0.0.1/24 dev dummy0
ip link set dummy0 up
ip link add gre1 type gre local 10.0.0.1
ip link add bond1 type bond mode active-backup
ip link set gre1 master bond1
ip link set gre1 up
ip link set bond1 up
ip addr add fe80::1/64 dev bond1
Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
---
drivers/net/bonding/bond_main.c | 41 ++++++++++++++++++++++++++++++++-
1 file changed, 40 insertions(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 0ccf928eecde..e1d1e3b49cb9 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1509,6 +1509,44 @@ static netdev_features_t bond_fix_features(struct net_device *dev,
return features;
}
+static int bond_header_create(struct sk_buff *skb, struct net_device *bond_dev,
+ unsigned short type, const void *daddr,
+ const void *saddr, unsigned int len)
+{
+ struct bonding *bond = netdev_priv(bond_dev);
+ struct slave *slave;
+ int ret = 0;
+
+ rcu_read_lock();
+ slave = rcu_dereference(bond->curr_active_slave);
+ if (slave && slave->dev->header_ops &&
+ slave->dev->header_ops->create)
+ ret = slave->dev->header_ops->create(skb, slave->dev,
+ type, daddr, saddr, len);
+ rcu_read_unlock();
+ return ret;
+}
+
+static int bond_header_parse(const struct sk_buff *skb, unsigned char *haddr)
+{
+ struct bonding *bond = netdev_priv(skb->dev);
+ struct slave *slave;
+ int ret = 0;
+
+ rcu_read_lock();
+ slave = rcu_dereference(bond->curr_active_slave);
+ if (slave && slave->dev->header_ops &&
+ slave->dev->header_ops->parse)
+ ret = slave->dev->header_ops->parse(skb, haddr);
+ rcu_read_unlock();
+ return ret;
+}
+
+static const struct header_ops bond_header_ops = {
+ .create = bond_header_create,
+ .parse = bond_header_parse,
+};
+
static void bond_setup_by_slave(struct net_device *bond_dev,
struct net_device *slave_dev)
{
@@ -1516,7 +1554,8 @@ static void bond_setup_by_slave(struct net_device *bond_dev,
dev_close(bond_dev);
- bond_dev->header_ops = slave_dev->header_ops;
+ bond_dev->header_ops = slave_dev->header_ops ?
+ &bond_header_ops : NULL;
bond_dev->type = slave_dev->type;
bond_dev->hard_header_len = slave_dev->hard_header_len;
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH net v1] bonding: fix type confusion in bond_setup_by_slave()
2026-02-28 9:58 [PATCH net v1] bonding: fix type confusion in bond_setup_by_slave() Jiayuan Chen
@ 2026-03-04 5:23 ` Jay Vosburgh
2026-03-04 10:14 ` Jiayuan Chen
2026-03-04 7:09 ` Eric Dumazet
1 sibling, 1 reply; 4+ messages in thread
From: Jay Vosburgh @ 2026-03-04 5:23 UTC (permalink / raw)
To: Jiayuan Chen
Cc: netdev, Jiayuan Chen, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pravin B Shelar, linux-kernel
Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>From: Jiayuan Chen <jiayuan.chen@shopee.com>
>
>kernel BUG at net/core/skbuff.c:2306!
>Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
>RIP: 0010:pskb_expand_head+0xa08/0xfe0 net/core/skbuff.c:2306
>RSP: 0018:ffffc90004aff760 EFLAGS: 00010293
>RAX: 0000000000000000 RBX: ffff88807e3c8780 RCX: ffffffff89593e0e
>RDX: ffff88807b7c4900 RSI: ffffffff89594747 RDI: ffff88807b7c4900
>RBP: 0000000000000820 R08: 0000000000000005 R09: 0000000000000000
>R10: 00000000961a63e0 R11: 0000000000000000 R12: ffff88807e3c8780
>R13: 00000000961a6560 R14: dffffc0000000000 R15: 00000000961a63e0
>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>CR2: 00007fe1a0ed8df0 CR3: 000000002d816000 CR4: 00000000003526f0
>Call Trace:
> <TASK>
> ipgre_header+0xdd/0x540 net/ipv4/ip_gre.c:900
> dev_hard_header include/linux/netdevice.h:3439 [inline]
> packet_snd net/packet/af_packet.c:3028 [inline]
> packet_sendmsg+0x3ae5/0x53c0 net/packet/af_packet.c:3108
> sock_sendmsg_nosec net/socket.c:727 [inline]
> __sock_sendmsg net/socket.c:742 [inline]
> ____sys_sendmsg+0xa54/0xc30 net/socket.c:2592
> ___sys_sendmsg+0x190/0x1e0 net/socket.c:2646
> __sys_sendmsg+0x170/0x220 net/socket.c:2678
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>RIP: 0033:0x7fe1a0e6c1a9
>
>When a device (e.g. GRE tunnel) is enslaved to a bond,
>bond_setup_by_slave() directly copies the slave's header_ops to the
>bond device:
>
> bond_dev->header_ops = slave_dev->header_ops;
This is true only for devices that are not ARPHRD_ETHER.
>This causes a type confusion when dev_hard_header() is later called
>on the bond device. Functions like ipgre_header(), ip6gre_header(),
>vlan_dev_hard_header(), and macvlan_hard_header() all use
>netdev_priv(dev) to access their device-specific private data. When
>called with the bond device, netdev_priv() returns the bond's private
>data (struct bonding) instead of the expected type (e.g. struct
>ip_tunnel), leading to garbage values being read and kernel crashes.
I believe that vlan and macvlan are both ARPHRD_ETHER, and thus
won't take the bond_setup_by_slave path (which was originally
implemented for Infiniband).
That said, your description for the GRE paths seems correct.
>Fix this by introducing bond_header_ops with wrapper functions that
>delegate to the active slave's header_ops using the slave's own
>device. This ensures netdev_priv() in the slave's header functions
>always receives the correct device.
>
>The fix is placed in the bonding driver rather than individual device
>drivers, as the root cause is bond blindly inheriting header_ops from
>the slave without considering that these callbacks expect a specific
>netdev_priv() layout.
>
>The type confusion can be observed by adding a printk in
>ipgre_header() and running the following commands:
>
> ip link add dummy0 type dummy
> ip addr add 10.0.0.1/24 dev dummy0
> ip link set dummy0 up
> ip link add gre1 type gre local 10.0.0.1
> ip link add bond1 type bond mode active-backup
> ip link set gre1 master bond1
> ip link set gre1 up
> ip link set bond1 up
> ip addr add fe80::1/64 dev bond1
>
>Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Has something relevant changed more recently than this? I seem
to recall hearing of folks using bonding with GRE more recently than the
2013 date on c54419321455. I didn't do an exhaustive search, but I feel
like someone would have run into this in the prior 13-ish years if it's
been broken the whole time.
>Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
>---
> drivers/net/bonding/bond_main.c | 41 ++++++++++++++++++++++++++++++++-
> 1 file changed, 40 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 0ccf928eecde..e1d1e3b49cb9 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1509,6 +1509,44 @@ static netdev_features_t bond_fix_features(struct net_device *dev,
> return features;
> }
>
>+static int bond_header_create(struct sk_buff *skb, struct net_device *bond_dev,
>+ unsigned short type, const void *daddr,
>+ const void *saddr, unsigned int len)
>+{
>+ struct bonding *bond = netdev_priv(bond_dev);
>+ struct slave *slave;
>+ int ret = 0;
>+
>+ rcu_read_lock();
>+ slave = rcu_dereference(bond->curr_active_slave);
>+ if (slave && slave->dev->header_ops &&
>+ slave->dev->header_ops->create)
>+ ret = slave->dev->header_ops->create(skb, slave->dev,
>+ type, daddr, saddr, len);
>+ rcu_read_unlock();
>+ return ret;
>+}
>+
>+static int bond_header_parse(const struct sk_buff *skb, unsigned char *haddr)
>+{
>+ struct bonding *bond = netdev_priv(skb->dev);
>+ struct slave *slave;
>+ int ret = 0;
>+
>+ rcu_read_lock();
>+ slave = rcu_dereference(bond->curr_active_slave);
>+ if (slave && slave->dev->header_ops &&
>+ slave->dev->header_ops->parse)
>+ ret = slave->dev->header_ops->parse(skb, haddr);
>+ rcu_read_unlock();
>+ return ret;
>+}
>+
>+static const struct header_ops bond_header_ops = {
>+ .create = bond_header_create,
>+ .parse = bond_header_parse,
>+};
>+
> static void bond_setup_by_slave(struct net_device *bond_dev,
> struct net_device *slave_dev)
> {
>@@ -1516,7 +1554,8 @@ static void bond_setup_by_slave(struct net_device *bond_dev,
>
> dev_close(bond_dev);
>
>- bond_dev->header_ops = slave_dev->header_ops;
>+ bond_dev->header_ops = slave_dev->header_ops ?
>+ &bond_header_ops : NULL;
Can slave_dev->header_ops actually be NULL?
-J
> bond_dev->type = slave_dev->type;
> bond_dev->hard_header_len = slave_dev->hard_header_len;
>--
>2.43.0
>
---
-Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH net v1] bonding: fix type confusion in bond_setup_by_slave()
2026-03-04 5:23 ` Jay Vosburgh
@ 2026-03-04 10:14 ` Jiayuan Chen
0 siblings, 0 replies; 4+ messages in thread
From: Jiayuan Chen @ 2026-03-04 10:14 UTC (permalink / raw)
To: Jay Vosburgh
Cc: netdev, Jiayuan Chen, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pravin B Shelar, linux-kernel
March 4, 2026 at 13:23, "Jay Vosburgh" <jv@jvosburgh.net mailto:jv@jvosburgh.net?to=%22Jay%20Vosburgh%22%20%3Cjv%40jvosburgh.net%3E > wrote:
>
> Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> >
> > From: Jiayuan Chen <jiayuan.chen@shopee.com>
> >
[...]
> > bond_dev->header_ops = slave_dev->header_ops;
Hi Jay,
Thank you for reviewing this patch.
> This is true only for devices that are not ARPHRD_ETHER
> >
> > This causes a type confusion when dev_hard_header() is later called
> > on the bond device. Functions like ipgre_header(), ip6gre_header(),
> > vlan_dev_hard_header(), and macvlan_hard_header() all use
> > netdev_priv(dev) to access their device-specific private data. When
> > called with the bond device, netdev_priv() returns the bond's private
> > data (struct bonding) instead of the expected type (e.g. struct
> > ip_tunnel), leading to garbage values being read and kernel crashes.
> >
> I believe that vlan and macvlan are both ARPHRD_ETHER, and thus
> won't take the bond_setup_by_slave path (which was originally
> implemented for Infiniband).
> That said, your description for the GRE paths seems correct.
Thanks. You are right. vlan and macvlan are both ARPHRD_ETHER and go through
bond_ether_setup(), not bond_setup_by_slave(). I'll remove them from
the description in v2.
[...]
> > ip link add dummy0 type dummy
> > ip addr add 10.0.0.1/24 dev dummy0
> > ip link set dummy0 up
> > ip link add gre1 type gre local 10.0.0.1
> > ip link add bond1 type bond mode active-backup
> > ip link set gre1 master bond1
> > ip link set gre1 up
> > ip link set bond1 up
> > ip addr add fe80::1/64 dev bond1
> >
> > Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
> >
> Has something relevant changed more recently than this? I seem
> to recall hearing of folks using bonding with GRE more recently than the
> 2013 date on c54419321455. I didn't do an exhaustive search, but I feel
> like someone would have run into this in the prior 13-ish years if it's
> been broken the whole time.
Good question. Here is the full timeline:
1. 1da177e4c3f4 ("Linux-2.6.12-rc2", 2005) - ipgre_header() already
uses dev->priv (later netdev_priv(dev)), but bonding cannot
enslave non-Ethernet devices at all, so no issue.
2. 872254dd6b1f ("net/bonding: Enable bonding to enslave non
ARPHRD_ETHER", 2007-10-09) - bond_setup_by_slave() is introduced
for IPoIB support, but it does NOT copy hard_header or header_ops.
Still no issue.
3. 3b04ddde02cf ("[NET]: Move hardware header operations out of
netdevice.", 2007-10-09) - dev->hard_header is refactored into
dev->header_ops->create. GRE gets ipgre_header_ops. Still no issue
because bond doesn't copy header_ops yet.
4. 1284cd3a2b74 ("bonding: two small fixes for IPoIB support",
2007-10-15) - bond_setup_by_slave() starts copying header_ops from
slave to bond. This is the commit that introduces the flawed pattern.
The problem is not limited to the panic shown in the crash
log. The fundamental issue is that netdev_priv(bond_dev) returns a
pointer to struct bonding, which ipgre_header() interprets as struct
ip_tunnel. Fields like t->hlen and t->parms.iph contain whatever
garbage happens to be at those offsets in the bonding structure.
When t->hlen is abnormally large or negative, it triggers the BUG_ON in
pskb_expand_head(). But even without a panic, the corrupted values
would produce malformed packets with wrong IP headers and GRE flags.
So now I think 1284cd3a2b74 is more appropriate.
I also tested on kernel 5.10 and added printk to show dev->name, it was
still "bond1".
~# dmesg -c
[ 0.000000] Linux version 5.10.248+ (root@xxx) (gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #4 SMP PREEMPT Wed Mar 46
[ 0.000000] Command line: console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0 rw tmpfs=20G kmsan.panic=0
[ 0.000000] KERNEL supported cpus:
......
[ 44.942850] bond1: (slave gre1): The slave device specified does not support setting the MAC address
[ 44.942859] bond1: (slave gre1): Setting fail_over_mac to active for active-backup mode
[ 44.944602] bond1: (slave gre1): making interface the new active one
[ 44.944697] bond1: (slave gre1): Enslaving as an active interface with an up link
[ 44.951456] 8021q: adding VLAN 0 to HW filter on device bond1
[ 44.969646] dev name bond1
[ 45.333625] dev name bond1
[ 45.941654] IPv6: ADDRCONF(NETDEV_CHANGE): bond1: link becomes ready
[ 45.953612] dev name bond1
[ 46.357629] dev name bond1
[ 46.549627] dev name bond1
[ 46.581614] dev name bond1
[ 47.381737] dev name bond1
[ 47.381749] dev name bond1
[ 47.489620] dev name bond1
[...]
> > +
> > +static const struct header_ops bond_header_ops = {
> > + .create = bond_header_create,
> > + .parse = bond_header_parse,
> > +};
> > +
> > static void bond_setup_by_slave(struct net_device *bond_dev,
> > struct net_device *slave_dev)
> > {
> > @@ -1516,7 +1554,8 @@ static void bond_setup_by_slave(struct net_device *bond_dev,
> >
> > dev_close(bond_dev);
> >
> > - bond_dev->header_ops = slave_dev->header_ops;
> > + bond_dev->header_ops = slave_dev->header_ops ?
> > + &bond_header_ops : NULL;
> >
> Can slave_dev->header_ops actually be NULL?
Yes. For example, a GRE tunnel created with a unicast remote
ip link add gre1 type gre local 10.0.0.1 remote 10.0.0.2
does not set header_ops (see ipgre_tunnel_init(), the iph->daddr && !tunnel->collect_md branch).
> -J
>
> >
> > bond_dev->type = slave_dev->type;
> > bond_dev->hard_header_len = slave_dev->hard_header_len;
> > --
> > 2.43.0
> >
> ---
> -Jay Vosburgh, jv@jvosburgh.net
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH net v1] bonding: fix type confusion in bond_setup_by_slave()
2026-02-28 9:58 [PATCH net v1] bonding: fix type confusion in bond_setup_by_slave() Jiayuan Chen
2026-03-04 5:23 ` Jay Vosburgh
@ 2026-03-04 7:09 ` Eric Dumazet
1 sibling, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2026-03-04 7:09 UTC (permalink / raw)
To: Jiayuan Chen
Cc: netdev, Jiayuan Chen, Jay Vosburgh, Andrew Lunn, David S. Miller,
Jakub Kicinski, Paolo Abeni, Pravin B Shelar, linux-kernel
On Sat, Feb 28, 2026 at 10:59 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> From: Jiayuan Chen <jiayuan.chen@shopee.com>
>
>
> When a device (e.g. GRE tunnel) is enslaved to a bond,
> bond_setup_by_slave() directly copies the slave's header_ops to the
> bond device:
>
> bond_dev->header_ops = slave_dev->header_ops;
>
> This causes a type confusion when dev_hard_header() is later called
> on the bond device. Functions like ipgre_header(), ip6gre_header(),
> vlan_dev_hard_header(), and macvlan_hard_header() all use
> netdev_priv(dev) to access their device-specific private data. When
> called with the bond device, netdev_priv() returns the bond's private
> data (struct bonding) instead of the expected type (e.g. struct
> ip_tunnel), leading to garbage values being read and kernel crashes.
>
> Fix this by introducing bond_header_ops with wrapper functions that
> delegate to the active slave's header_ops using the slave's own
> device. This ensures netdev_priv() in the slave's header functions
> always receives the correct device.
>
> The fix is placed in the bonding driver rather than individual device
> drivers, as the root cause is bond blindly inheriting header_ops from
> the slave without considering that these callbacks expect a specific
> netdev_priv() layout.
>
> The type confusion can be observed by adding a printk in
> ipgre_header() and running the following commands:
>
> ip link add dummy0 type dummy
> ip addr add 10.0.0.1/24 dev dummy0
> ip link set dummy0 up
> ip link add gre1 type gre local 10.0.0.1
> ip link add bond1 type bond mode active-backup
> ip link set gre1 master bond1
> ip link set gre1 up
> ip link set bond1 up
> ip addr add fe80::1/64 dev bond1
>
> Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
> ---
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-03-04 10:15 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-28 9:58 [PATCH net v1] bonding: fix type confusion in bond_setup_by_slave() Jiayuan Chen
2026-03-04 5:23 ` Jay Vosburgh
2026-03-04 10:14 ` Jiayuan Chen
2026-03-04 7:09 ` Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox