All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Jiayuan Chen <jiayuan.chen@linux.dev>
Cc: netdev@vger.kernel.org, edumazet@google.com,
	Jiayuan Chen <jiayuan.chen@shopee.com>,
	Jay Vosburgh <jv@jvosburgh.net>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Paolo Abeni <pabeni@redhat.com>, Shuah Khan <shuah@kernel.org>,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH net v1] selftests: bonding: add test for stacked bond header_parse recursion
Date: Sat, 14 Mar 2026 10:38:33 -0700	[thread overview]
Message-ID: <20260314103833.36698c29@kernel.org> (raw)
In-Reply-To: <20260314134211.33405-1-jiayuan.chen@linux.dev>

On Sat, 14 Mar 2026 21:42:05 +0800 Jiayuan Chen wrote:
> From: Jiayuan Chen <jiayuan.chen@shopee.com>
> 
> Add a selftest to reproduce the infinite recursion in bond_header_parse()
> when bonds are stacked (bond1 -> bond0 -> gre). When a packet is received
> via AF_PACKET SOCK_DGRAM on the topmost bond, dev_parse_header() calls
> bond_header_parse() which used skb->dev (always the topmost bond) to get
> the bonding struct. This caused it to recurse back into itself
> indefinitely, leading to stack overflow.
> 
> Before Eric's fix [2], the test triggers:
> 
>   ./bond-stacked-header-parse.sh
> 
>   [  71.999481] BUG: MAX_LOCK_DEPTH too low!
>   [  72.000170] turning off the locking correctness validator.
>   [  72.001029] Please attach the output of /proc/lock_stat to the bug report
>   [  72.002079] depth: 48  max: 48!
>   ...
> 
> After Eric's fix [2], everything works fine:
> 
>   ./bond-stacked-header-parse.sh
>   TEST: Stacked bond header_parse does not recurse                  [ OK ]
> 
> Also verified via make run_tests -C drivers/net/bonding:
> 
>   ...
>   ok 3 selftests: drivers/net/bonding: bond-eth-type-change.sh
>   # timeout set to 1200
>   # selftests: drivers/net/bonding: bond-stacked-header-parse.sh
>   # TEST: Stacked bond header_parse does not recurse                [ OK ]
>   ok 4 selftests: drivers/net/bonding: bond-stacked-header-parse.sh
>   # timeout set to 1200
>   # selftests: drivers/net/bonding: bond-lladdr-target.sh
>   # PASS
>   ...
> 
> [1] https://lore.kernel.org/netdev/CANn89iK2EURqsjtd=OVP4awYTJHGcR-UU-V9WovpWR1Z3f03oQ@mail.gmail.com/
> [2] https://lore.kernel.org/netdev/20260314115650.3646361-1-edumazet@google.com/
> 
> Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
> ---
>  .../selftests/drivers/net/bonding/Makefile    |   1 +
>  .../net/bonding/bond-stacked-header-parse.sh  | 142 ++++++++++++++++++
>  2 files changed, 143 insertions(+)
>  create mode 100755 tools/testing/selftests/drivers/net/bonding/bond-stacked-header-parse.sh
> 
> diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile
> index 6c5c60adb5e8..055f6af03b5d 100644
> --- a/tools/testing/selftests/drivers/net/bonding/Makefile
> +++ b/tools/testing/selftests/drivers/net/bonding/Makefile
> @@ -5,6 +5,7 @@ TEST_PROGS := \
>  	bond-arp-interval-causes-panic.sh \
>  	bond-break-lacpdu-tx.sh \
>  	bond-eth-type-change.sh \
> +	bond-stacked-header-parse.sh \

this list is alphabetically sorted

>  	bond-lladdr-target.sh \
>  	bond_ipsec_offload.sh \
>  	bond_lacp_prio.sh \
> diff --git a/tools/testing/selftests/drivers/net/bonding/bond-stacked-header-parse.sh b/tools/testing/selftests/drivers/net/bonding/bond-stacked-header-parse.sh
> new file mode 100755
> index 000000000000..d377bedaef63
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/net/bonding/bond-stacked-header-parse.sh

please run shellcheck on this file. Most of the reports are probably 
false positive but some are reasonable I think?

> @@ -0,0 +1,142 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Test that bond_header_parse() does not infinitely recurse with stacked bonds.
> +#
> +# When a non-Ethernet device (e.g. GRE) is enslaved to a bond that is itself
> +# enslaved to another bond (bond1 -> bond0 -> gre), receiving a packet via
> +# AF_PACKET SOCK_DGRAM triggers dev_parse_header() -> bond_header_parse().
> +# Since parse() used skb->dev (always the topmost bond) instead of a passed-in
> +# dev pointer, it would recurse back into itself indefinitely.
> +
> +ALL_TESTS="
> +	bond_test_stacked_header_parse
> +"
> +REQUIRE_MZ=no
> +NUM_NETIFS=0
> +lib_dir=$(dirname "$0")
> +source "$lib_dir"/../../../net/forwarding/lib.sh
> +
> +require_command()
> +{
> +	if ! command -v "$1" &>/dev/null; then
> +		echo "SKIP: $1 is not installed"
> +		exit "$ksft_skip"
> +	fi
> +}
> +
> +bond_test_stacked_header_parse()
> +{
> +	local devdummy="test-dummy0"
> +	local devgre="test-gre0"
> +	local devbond0="test-bond0"
> +	local devbond1="test-bond1"
> +
> +	RET=0
> +
> +	# Setup: dummy -> gre -> bond0 -> bond1
> +	modprobe dummy 2>/dev/null
> +	modprobe ip_gre 2>/dev/null
> +	modprobe bonding 2>/dev/null

Doesn't rtnetlink auto-load the link modules?

> +	ip link add name "$devdummy" type dummy
> +	if [ $? -ne 0 ]; then
> +		log_test_skip "could not create dummy device (CONFIG_DUMMY)"
> +		return
> +	fi
> +	ip addr add 10.0.0.1/24 dev "$devdummy"
> +	ip link set "$devdummy" up
> +
> +	ip link add name "$devgre" type gre local 10.0.0.1
> +	if [ $? -ne 0 ]; then
> +		log_test_skip "could not create GRE device (CONFIG_NET_IPGRE)"
> +		ip link del "$devdummy" 2>/dev/null
> +		return
> +	fi

You have to add the dependencies to
tools/testing/selftests/drivers/net/bonding/config

You can keep these checks if you really want to but we don't really
encourage them

> +	ip link add name "$devbond0" type bond mode active-backup
> +	check_err $? "could not create bond0"
> +	ip link add name "$devbond1" type bond mode active-backup
> +	check_err $? "could not create bond1"
> +
> +	ip link set "$devgre" master "$devbond0"
> +	check_err $? "could not enslave $devgre to $devbond0"
> +	ip link set "$devbond0" master "$devbond1"
> +	check_err $? "could not enslave $devbond0 to $devbond1"
> +
> +	ip link set "$devgre" up
> +	ip link set "$devbond0" up
> +	ip link set "$devbond1" up
> +
> +	# Send a GRE-encapsulated packet to 10.0.0.1 while an AF_PACKET
> +	# SOCK_DGRAM socket is listening on bond1. The receive path calls
> +	# dev_parse_header() which invokes bond_header_parse(). With the
> +	# bug, this recurses infinitely and causes a stack overflow.
> +	#
> +	# Use Python to:
> +	# 1. Open AF_PACKET SOCK_DGRAM on bond1
> +	# 2. Send a GRE packet to 10.0.0.1 via raw socket
> +	# 3. Try to receive (triggers parse path)
> +	python3 -c "
> +import socket, struct, time

is this AI-generated?

You can add an extra script in TEST_FILES and just call it.
No need for inline scripts..

> +# AF_PACKET SOCK_DGRAM on bond1
> +ETH_P_ALL = 0x0003
> +pkt_fd = socket.socket(socket.AF_PACKET, socket.SOCK_DGRAM,
> +                       socket.htons(ETH_P_ALL))
> +pkt_fd.settimeout(2)
> +pkt_fd.bind(('$devbond1', ETH_P_ALL))
> +
> +# Build GRE-encapsulated IP packet
> +def build_ip_hdr(proto, saddr, daddr, payload_len):
> +    ihl_ver = 0x45
> +    total_len = 20 + payload_len
> +    hdr = struct.pack('!BBHHHBBH4s4s',
> +        ihl_ver, 0, total_len, 0, 0, 64, proto, 0,
> +        socket.inet_aton(saddr), socket.inet_aton(daddr))
> +    # compute checksum
> +    words = struct.unpack('!10H', hdr)
> +    s = sum(words)
> +    while s >> 16:
> +        s = (s & 0xffff) + (s >> 16)
> +    chksum = ~s & 0xffff
> +    hdr = hdr[:10] + struct.pack('!H', chksum) + hdr[12:]
> +    return hdr
> +
> +inner = build_ip_hdr(17, '192.168.1.1', '192.168.1.2', 8) + b'\x00' * 8
> +gre_hdr = struct.pack('!HH', 0, 0x0800)  # flags=0, proto=IP
> +outer = build_ip_hdr(47, '10.0.0.2', '10.0.0.1', len(gre_hdr) + len(inner))
> +pkt = outer + gre_hdr + inner
> +
> +raw_fd = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_RAW)
> +raw_fd.setsockopt(socket.IPPROTO_IP, socket.IP_HDRINCL, 1)
> +raw_fd.sendto(pkt, ('10.0.0.1', 0))
> +raw_fd.close()
> +
> +try:
> +    pkt_fd.recv(2048)
> +except socket.timeout:
> +    pass
> +pkt_fd.close()
> +" 2>/dev/null
> +
> +	# If we get here without a kernel crash/hang, the test passed.
> +	# Also check dmesg for signs of the recursion bug.
> +	if dmesg | tail -20 | grep -q "BUG: MAX_LOCK_DEPTH\|stack-overflow\|stack overflow"; then
> +		check_err 1 "kernel detected recursion in bond_header_parse"
> +	fi
> +
> +	# Cleanup
> +	ip link del "$devbond1" 2>/dev/null
> +	ip link del "$devbond0" 2>/dev/null
> +	ip link del "$devgre" 2>/dev/null
> +	ip link del "$devdummy" 2>/dev/null
> +
> +	log_test "Stacked bond header_parse does not recurse"
> +}
> +
> +require_command python3

No need, we have pure python tests 

> +tests_run
> +
> +exit "$EXIT_STATUS"


  reply	other threads:[~2026-03-14 17:38 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-14 13:42 [PATCH net v1] selftests: bonding: add test for stacked bond header_parse recursion Jiayuan Chen
2026-03-14 17:38 ` Jakub Kicinski [this message]
2026-03-15 10:40   ` Jiayuan Chen
2026-03-16 23:59     ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260314103833.36698c29@kernel.org \
    --to=kuba@kernel.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=jiayuan.chen@linux.dev \
    --cc=jiayuan.chen@shopee.com \
    --cc=jv@jvosburgh.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=shuah@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.