From: Jamal Hadi Salim <jhs@mojatatu.com>
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, horms@kernel.org, jiri@resnulli.us,
stephen@networkplumber.org, victor@mojatatu.com,
will@willsroot.io, xmei5@asu.edu, pctammela@mojatatu.com,
savy@syst3mfailure.io, kuniyu@google.com, toke@toke.dk,
willemdebruijnkernel@gmail.com,
Jamal Hadi Salim <jhs@mojatatu.com>
Subject: [PATCH net v2 1/6] net: Introduce skb ttl field to track packet loops
Date: Mon, 16 Mar 2026 17:10:47 -0400
Message-ID: <20260316211052.332383-2-jhs@mojatatu.com>
In-Reply-To: <20260316211052.332383-1-jhs@mojatatu.com>
In order to keep track of loops across the stack we need to _remember the global
loop state in the skb_.
We introduce a 2-bit per-skb ttl field to keep track of this state.
The pahole diff, before (-) and after (+), looks like:
    __u8   slow_gro:1;        /* 132: 3  1 */
    __u8   csum_not_inet:1;   /* 132: 4  1 */
    __u8   unreadable:1;      /* 132: 5  1 */
+   __u8   ttl:2;             /* 132: 6  1 */
-   /* XXX 2 bits hole, try to pack */
    /* XXX 1 byte hole, try to pack */
    __u16  tc_index;          /* 134     2 */
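To make the packing argument concrete, here is a minimal user-space sketch. It
assumes GCC/Clang bit-field layout; the struct names and the "earlier" member
are hypothetical stand-ins for the three bits that precede slow_gro in the real
struct sk_buff. It is only meant to show that a 2-bit field placed in the
existing hole changes neither the size nor the offset of tc_index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the flags byte at offset 132, before the patch. */
struct flags_before {
	uint8_t earlier:3;        /* stand-in for bits 0..2 (not shown above) */
	uint8_t slow_gro:1;
	uint8_t csum_not_inet:1;
	uint8_t unreadable:1;
	/* 2 bits hole */
	/* 1 byte hole */
	uint16_t tc_index;
};

/* Same layout with the 2-bit ttl placed in the former bit hole. */
struct flags_after {
	uint8_t earlier:3;
	uint8_t slow_gro:1;
	uint8_t csum_not_inet:1;
	uint8_t unreadable:1;
	uint8_t ttl:2;            /* fills bits 6..7 of the same byte */
	/* 1 byte hole */
	uint16_t tc_index;
};
```

Comparing sizeof() and offsetof(..., tc_index) for the two structs should give
identical values on common ABIs. Note also that a 2-bit unsigned field can only
hold 0..3 and wraps to 0 when incremented past 3, so users of the field need to
check before incrementing.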
There used to be a ttl field, which was removed as part of tc_verd in commit
aec745e2c520 ("net-tc: remove unused tc_verd fields"). By that time it was
already unused, its users having been removed earlier in commit c19ae86a510c
("tc: remove unused redirect ttl").
A per-cpu loop count, MIRRED_NEST_LIMIT, already exists; however, this count
assumes a single call stack and suffers from two challenges:
1) If we queue the packet somewhere and restart processing later, the per-cpu
state is lost (for example, it gets wiped out the moment we go
egress->ingress, queue the packet in the backlog, and later pull packets
from the backlog).
2) With X/RPS, a packet that came in on one CPU may end up on a different
CPU.
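The first challenge can be sketched in user space as follows. This is a toy
model under stated assumptions, not the act_mirred code: NEST_LIMIT,
percpu_nest_level and both functions are hypothetical names, and the "backlog"
is modelled simply by returning to the caller between hops.

```c
#include <stdbool.h>

#define NEST_LIMIT 4              /* hypothetical stand-in for MIRRED_NEST_LIMIT */

static int percpu_nest_level;     /* models the counter of a single CPU */

/* Nested redirects on one call stack: the counter sees every level.
 * Returns true if the limit tripped somewhere down the stack. */
static bool redirect_direct(int hops)
{
	bool tripped;

	if (percpu_nest_level >= NEST_LIMIT)
		return true;
	percpu_nest_level++;
	tripped = hops > 0 ? redirect_direct(hops - 1) : false;
	percpu_nest_level--;
	return tripped;
}

/* Each hop queues the packet and returns, so the stack unwinds and the
 * counter is back at zero before the next hop is processed. */
static bool redirect_via_backlog(int hops)
{
	while (hops-- > 0) {
		if (percpu_nest_level >= NEST_LIMIT)
			return true;
		percpu_nest_level++;
		/* "enqueue to backlog" would happen here, then we return */
		percpu_nest_level--;
	}
	return false;
}
```

In this model redirect_direct(100) trips the limit, while
redirect_via_backlog(100) never does, because the counter never rises above one.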
Our first attempt (v1) was to "liberate" the skb->from_ingress bit into the
skb->cb field; after a lot of deeper review we found that the bit stored there
gets trampled in the case of hardware offload via the mlnx driver.
Our second attempt (which we didn't post) was to "liberate" the
skb->tc_skip_classify bit into skb->cb, but that led us down a path of making
sensitive changes, such as modifying dev queue xmit.
This is our third attempt.
Use cases:
1) Mirred increments the ttl whenever it sees an skb. This, in combination with
MIRRED_NEST_LIMIT, helps us resolve both challenges mentioned above.
This is illustrated in patch #2.
2) netem increments the ttl when using the "duplicate" feature and catches it
when it sees the packet the second time.
This is illustrated in patch #5.
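As a rough illustration of the two use cases, the sketch below keeps the
counter in the packet itself so that it survives queueing and CPU migration.
Everything here is an assumption made for illustration: struct pkt, PKT_TTL_MAX
and both helpers are hypothetical, and the actual limits and drop policy are
defined in patches #2 and #5, which are not shown in this message.

```c
#include <stdbool.h>
#include <stdint.h>

#define PKT_TTL_MAX 3             /* largest value a 2-bit field can hold */

/* Hypothetical stand-in for the part of the skb that carries the ttl. */
struct pkt {
	uint8_t ttl:2;
};

/* Mirred-like use: count every visit and report a suspected loop once the
 * counter is saturated. Checking before incrementing avoids wrapping to 0. */
static bool pkt_loop_check(struct pkt *p)
{
	if (p->ttl >= PKT_TTL_MAX)
		return true;
	p->ttl++;
	return false;
}

/* netem-like use: mark the packet when it is duplicated and refuse to
 * duplicate it again when it is seen a second time. */
static bool pkt_may_duplicate(struct pkt *p)
{
	if (p->ttl)
		return false;
	p->ttl++;
	return true;
}
```

With these definitions a fresh packet passes pkt_loop_check() three times and is
flagged on the fourth visit, and pkt_may_duplicate() returns true only on the
first call for a given packet.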
Fixes: fe946a751d9b ("net/sched: act_mirred: add loop detection")
Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
include/linux/skbuff.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index daa4e4944ce3..f1326c4b4bcc 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -848,6 +848,7 @@ enum skb_tstamp_type {
* CHECKSUM_UNNECESSARY (max 3)
* @unreadable: indicates that at least 1 of the fragments in this skb is
* unreadable.
+ * @ttl: time to live counter for packet loops.
* @dst_pending_confirm: need to confirm neighbour
* @decrypted: Decrypted SKB
* @slow_gro: state present at GRO time, slower prepare step required
@@ -1030,6 +1031,7 @@ struct sk_buff {
__u8 csum_not_inet:1;
#endif
__u8 unreadable:1;
+ __u8 ttl:2;
#if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
__u16 tc_index; /* traffic control index */
#endif
--
2.34.1