From: Ido Schimmel <idosch@idosch.org>
To: Stephen Hemminger <stephen@networkplumber.org>,
	lucien.xin@gmail.com, edumazet@google.com
Cc: netdev@vger.kernel.org, fmei@sfs.com
Subject: Re: Fw: [Bug 219766] New: Garbage Ethernet Frames
Date: Sat, 15 Feb 2025 22:47:45 +0200
Message-ID: <Z7D9cR22BDPN7WSJ@shredder>
In-Reply-To: <20250210084931.23a5c2e4@hermes.local>

On Mon, Feb 10, 2025 at 08:49:31AM -0800, Stephen Hemminger wrote:
> Not really enough information to do any deep analysis but forwarding to netdev
> anyway as it is not junk.
> 
> Begin forwarded message:
> 
> Date: Sun, 09 Feb 2025 12:24:32 +0000
> From: bugzilla-daemon@kernel.org
> To: stephen@networkplumber.org
> Subject: [Bug 219766] New: Garbage Ethernet Frames
> 
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=219766
> 
>             Bug ID: 219766
>            Summary: Garbage Ethernet Frames
>            Product: Networking
>            Version: 2.5
>           Hardware: All
>                 OS: Linux
>             Status: NEW
>           Severity: normal
>           Priority: P3
>          Component: Other
>           Assignee: stephen@networkplumber.org
>           Reporter: fmei@sfs.com
>         Regression: No
> 
> I am currently troubleshooting a very strange problem which appears when
> upgrading Kernel 6.6.58 to 6.6.60. The kernel version change is part of a
> change of talos linux (www.talos.dev) from 1.8.2 to 1.8.3.
> 
> We are running these machines at Hetzner, a server hosting company. They
> complain that we are using MAC addresses which are not allowed (they are not
> the MAC addresses of the physical NIC).
> 
> While investigating the problem I ran tcpdump on the physical adapters and
> captured these suspicious Ethernet frames. The frames have neither a known
> ethertype nor a MAC address from a known vendor or a known virtual MAC
> address range. They seem like garbage to me. Below is an example; more can be
> found in the GitHub issue. These frames are not emitted very often and the
> systems are operating normally. Had I not been informed by the hosting
> provider, I would not have noticed it at all.
> 
> I also tried to track it down to a specific hardware (r8169), but we have the
> same problem with e1000e.
> 
> I checked the changelogs of the two kernel versions (6.6.59 & 6.6.60) and
> noticed there were some changes which could be the problem, but I simply do not
> have the experience for it.
> 
> Can anybody check the changelog of the 2 versions and see if there is a change
> which might cause the problem? Can anybody give me a hint how to track it down
> further?
> 
> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> listening on enp9s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 22:07:02.329668 20:00:40:11:18:fb > 45:00:00:44:f4:94, ethertype Unknown (0x58c6), length 68:
>         0x0000:  8dda 74ca f1ae ca6c ca6c 0098 969c 0400  ..t....l.l......
>         0x0010:  0000 4730 3f18 6800 0000 0000 0000 9971  ..G0?.h........q
>         0x0020:  c4c9 9055 a157 0a70 9ead bf83 38ca ab38  ...U.W.p....8..8
>         0x0030:  8add ab96 e052                           .....R
> 
> 
> Issue with more information: https://github.com/siderolabs/talos/issues/9837

Adding Xin and Eric as I believe this is caused by commit 22600596b675
("ipv4: give an IPv4 dev to blackhole_netdev"). It's present in v6.6.59,
but not in v6.6.58 which fits the report.

The Ethernet headers of all the captured packets start with 0x45 (IPv4
version 4, header length 5 words), so it's quite apparent that these are
IPv4 packets that are transmitted without an Ethernet header.
Specifically, these seem to be UDP packets: the 10th byte, which is the
IPv4 protocol field, is always 0x11.
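
To illustrate the reasoning above, here is a minimal sketch that
reinterprets the start of the frame from the report (destination MAC,
source MAC, ethertype and the first payload bytes, in wire order) as an
IPv4 header. The function name decode_as_ipv4 is made up for this
example and it only prints a few fields.

```shell
#!/bin/bash
# Hypothetical helper: treat the first 20 bytes of a captured frame as an
# IPv4 header. Input is hex, optionally separated by colons or spaces.
decode_as_ipv4() {
	local hex=${1//[: ]/}
	local -a b=()
	local i

	# Split the hex string into an array of byte values.
	for ((i = 0; i + 1 < ${#hex}; i += 2)); do
		b+=("$((16#${hex:i:2}))")
	done

	if ((${#b[@]} < 20)); then
		echo "need at least 20 bytes" >&2
		return 1
	fi

	# Byte 0: version (high nibble) and IHL (low nibble).
	# Bytes 2-3: total length. Byte 8: TTL. Byte 9: protocol.
	printf 'version=%d ihl=%d total_len=%d ttl=%d proto=%d\n' \
		"$((b[0] >> 4))" "$((b[0] & 0xf))" \
		"$(( (b[2] << 8) | b[3] ))" "${b[8]}" "${b[9]}"
}

# Frame from the bug report: dst MAC, src MAC, ethertype, start of payload.
decode_as_ipv4 "45:00:00:44:f4:94 20:00:40:11:18:fb 58c6 8dda 74ca f1ae"
```

For the frame in the report this prints version=4 ihl=5 total_len=68
ttl=64 proto=17. Protocol 17 is UDP, and the total length of 68 equals
the frame length reported by tcpdump, which is consistent with a bare
IP packet on the wire.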

After 22600596b675, neighbours can be constructed on the blackhole
device. They are constructed with an output function
(neigh_direct_output()) that simply calls dev_queue_xmit(), which
transmits the packet via skb->dev. skb->dev might not be the blackhole
device if dst_dev_put() switched dst->dev to the blackhole device while
another CPU was using this dst entry in ip_output(), in which case the
packet goes out of the real device without an Ethernet header.

I verified this using these hackish scripts [1][2]. They create a
nexthop exception where a dst entry is cached. Two UDP applications use
this dst entry, but a background process continuously bumps the IPv4
generation ID so as to force these applications to perform a route
lookup instead of using the dst entry that they previously resolved.

After creating a new dst entry, one application will try to cache it in
the nexthop exception by calling rt_bind_exception() which will call
dst_dev_put() on the previously cached dst entry that might still be
used by the other application on a different CPU.

There's a counter at the end of the test that counts the number of
packets that were transmitted without an Ethernet header. The problem
does not reproduce with 22600596b675 reverted.
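
The counter is the flower filter on veth0 egress in [1], which matches
destination MACs against 45:00:00:00:00:00 with mask ff:ff:00:00:00:00.
As a rough sketch of that masked comparison (mac_matches is a
hypothetical helper written for this example, not part of tc):

```shell
#!/bin/bash
# Hypothetical helper: succeed if MAC $1 matches VALUE/MASK given in $2,
# comparing only the bits that are set in the mask.
mac_matches() {
	local mac=${1//:/}
	local value=${2%%/*}
	local mask=${2##*/}
	local i m

	value=${value//:/}
	mask=${mask//:/}

	# Walk the six bytes of the address, two hex digits at a time.
	for ((i = 0; i < 12; i += 2)); do
		m=$((16#${mask:i:2}))
		(( (16#${mac:i:2} & m) == (16#${value:i:2} & m) )) || return 1
	done
	return 0
}

filter="45:00:00:00:00:00/ff:ff:00:00:00:00"

# "Destination MAC" of the frame in the bug report.
mac_matches 45:00:00:44:f4:94 "$filter" && echo "counted"

# "Source MAC" of the same frame, shown only as a non-matching value.
mac_matches 20:00:40:11:18:fb "$filter" || echo "not counted"
```

Only the first two bytes are compared, so any packet whose first bytes
on the wire are 0x45 0x00 (IPv4, IHL 5, TOS 0) is counted. A first byte
of 0x45 has the group bit set, so no unicast destination MAC can match
this filter.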

One possible solution is to add the dst entry to the uncached list by
converting dst_dev_put() to rt_add_uncached_list() [3]. The dst entry
will eventually be removed from this list when it is destroyed or when
dst->dev is unregistered. However, I am not sure this will fix the
reported problem, as rt_bind_exception() is not the only caller of
dst_dev_put().

Another possible solution is to have the blackhole device consume the
packets instead of letting them go out without an Ethernet header [4].
Doesn't seem great as the packets disappear without telling anyone
(before 22600596b675 an error was returned).

Let me know what you think. Especially if you have a better fix.

Thanks

[1]
#!/bin/bash
# blackhole.sh

for ns in client router server; do
	ip netns add $ns
	ip -n $ns link set dev lo up
done

ip -n client link add name veth0 type veth peer name veth1 netns router
ip -n router link add name veth2 type veth peer name veth3 netns server

# Client
ip -n client link set dev veth0 up
ip -n client address add 192.0.2.1/28 dev veth0
ip -n client route add default via 192.0.2.2
tc -n client qdisc add dev veth0 clsact
tc -n client filter add dev veth0 egress pref 1 proto all \
	flower dst_mac 45:00:00:00:00:00/ff:ff:00:00:00:00 action pass

# Router
ip netns exec router sysctl -wq net.ipv4.conf.all.forwarding=1
ip -n router link set dev veth1 up
ip -n router address add 192.0.2.2/28 dev veth1
ip -n router link set dev veth2 up mtu 1300
ip -n router address add 192.0.2.17/28 dev veth2

# Server
ip -n server link set dev veth3 up mtu 1300
ip -n server address add 192.0.2.18/28 dev veth3
ip -n server route add default via 192.0.2.17

sleep 1

# Create a nexthop exception where the dst entry will be cached.
ip netns exec client ping 192.0.2.18 -c 1 -M want -s 1400 -q &> /dev/null

# Continuously invalidate cached dst entries by bumping the IPv4 generation ID.
# When replacing the cached dst entry in the nexthop exception,
# rt_bind_exception() will call dst_dev_put(), thereby setting the dst entry's
# device to the blackhole device.
./bump_genid.sh &

ip netns exec server iperf3 -s -p 6000 -D
ip netns exec server iperf3 -s -p 7000 -D
ip netns exec client iperf3 -c 192.0.2.18 -p 6000 -u -b 0 -t 60 &> /dev/null &
ip netns exec client iperf3 -c 192.0.2.18 -p 7000 -u -b 0 -t 60 &> /dev/null

tc -n client -s filter show dev veth0 egress

ip netns pids server | xargs kill
pkill bump_genid.sh
ip -all netns delete

[2]
#!/bin/bash
# bump_genid.sh

while true; do
	ip -n client route add 198.51.100.1/32 dev lo
	ip -n client route del 198.51.100.1/32 dev lo
done

[3]
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 753704f75b2c..f40f860c0bec 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1461,7 +1461,7 @@ static bool rt_bind_exception(struct rtable *rt, struct fib_nh_exception *fnhe,
 			dst_hold(&rt->dst);
 			rcu_assign_pointer(*porig, rt);
 			if (orig) {
-				dst_dev_put(&orig->dst);
+				rt_add_uncached_list(orig);
 				dst_release(&orig->dst);
 			}
 			ret = true;

[4]
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index c8840c3b9a1b..448f352c3c92 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -244,8 +244,22 @@ static netdev_tx_t blackhole_netdev_xmit(struct sk_buff *skb,
 	return NETDEV_TX_OK;
 }
 
+static int blackhole_neigh_output(struct neighbour *n, struct sk_buff *skb)
+{
+	kfree_skb(skb);
+	return 0;
+}
+
+static int blackhole_neigh_construct(struct net_device *dev,
+				     struct neighbour *n)
+{
+	n->output = blackhole_neigh_output;
+	return 0;
+}
+
 static const struct net_device_ops blackhole_netdev_ops = {
 	.ndo_start_xmit = blackhole_netdev_xmit,
+	.ndo_neigh_construct = blackhole_neigh_construct,
 };
 
 /* This is a dst-dummy device used specifically for invalidated
