* [PATCH net v2 0/3] several fixes for ioam6, rpl and seg6 lwtunnels
@ 2025-02-11 22:16 Justin Iurman
2025-02-11 22:16 ` [PATCH net v2 1/3] net: ipv6: fix dst ref loops on input in " Justin Iurman
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Justin Iurman @ 2025-02-11 22:16 UTC
To: netdev; +Cc: davem, dsahern, edumazet, kuba, pabeni, horms, justin.iurman
v2:
- address warnings/errors reported by checkpatch
v1:
- https://lore.kernel.org/netdev/20250209193840.20509-1-justin.iurman@uliege.be/
This series provides fixes to prevent loops in ioam6_iptunnel,
rpl_iptunnel and seg6_iptunnel.
Justin Iurman (3):
net: ipv6: fix dst ref loops on input in rpl and seg6 lwtunnels
net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6
net: ipv6: fix consecutive input and output transformation in
lwtunnels
net/ipv6/ioam6_iptunnel.c | 6 ++---
net/ipv6/rpl_iptunnel.c | 34 +++++++++++++++++++++--
net/ipv6/seg6_iptunnel.c | 57 +++++++++++++++++++++++++++++++++------
3 files changed, 83 insertions(+), 14 deletions(-)
--
2.34.1
* [PATCH net v2 1/3] net: ipv6: fix dst ref loops on input in rpl and seg6 lwtunnels
2025-02-11 22:16 [PATCH net v2 0/3] several fixes for ioam6, rpl and seg6 lwtunnels Justin Iurman
@ 2025-02-11 22:16 ` Justin Iurman
2025-02-13 12:27 ` Ido Schimmel
2025-02-11 22:16 ` [PATCH net v2 2/3] net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6 Justin Iurman
2025-02-11 22:16 ` [PATCH net v2 3/3] net: ipv6: fix consecutive input and output transformation in lwtunnels Justin Iurman
2 siblings, 1 reply; 16+ messages in thread
From: Justin Iurman @ 2025-02-11 22:16 UTC
To: netdev
Cc: davem, dsahern, edumazet, kuba, pabeni, horms, justin.iurman,
Alexander Aring, David Lebrun
As a follow up to commit 92191dd10730 ("net: ipv6: fix dst ref loops in
rpl, seg6 and ioam6 lwtunnels"), we also need a conditional dst cache on
input for seg6_iptunnel and rpl_iptunnel to prevent dst ref loops (i.e.,
if the packet destination did not change, we may end up recording a
reference to the lwtunnel in its own cache, and the lwtunnel state will
never be freed).
Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: David Lebrun <dlebrun@google.com>
---
net/ipv6/rpl_iptunnel.c | 14 ++++++++++++--
net/ipv6/seg6_iptunnel.c | 14 ++++++++++++--
2 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c
index 0ac4283acdf2..c26bf284459f 100644
--- a/net/ipv6/rpl_iptunnel.c
+++ b/net/ipv6/rpl_iptunnel.c
@@ -262,10 +262,18 @@ static int rpl_input(struct sk_buff *skb)
{
struct dst_entry *orig_dst = skb_dst(skb);
struct dst_entry *dst = NULL;
+ struct lwtunnel_state *lwtst;
struct rpl_lwt *rlwt;
int err;
- rlwt = rpl_lwt_lwtunnel(orig_dst->lwtstate);
+ /* Get the address of lwtstate now, because "orig_dst" may not be there
+ * anymore after a call to skb_dst_drop(). Note that ip6_route_input()
+ * also calls skb_dst_drop(). Below, we compare the address of lwtstate
+ * to detect loops.
+ */
+ lwtst = orig_dst->lwtstate;
+
+ rlwt = rpl_lwt_lwtunnel(lwtst);
local_bh_disable();
dst = dst_cache_get(&rlwt->cache);
@@ -280,7 +288,9 @@ static int rpl_input(struct sk_buff *skb)
if (!dst) {
ip6_route_input(skb);
dst = skb_dst(skb);
- if (!dst->error) {
+
+ /* cache only if we don't create a dst reference loop */
+ if (!dst->error && lwtst != dst->lwtstate) {
local_bh_disable();
dst_cache_set_ip6(&rlwt->cache, dst,
&ipv6_hdr(skb)->saddr);
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index 33833b2064c0..6045e850b4bf 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -472,10 +472,18 @@ static int seg6_input_core(struct net *net, struct sock *sk,
{
struct dst_entry *orig_dst = skb_dst(skb);
struct dst_entry *dst = NULL;
+ struct lwtunnel_state *lwtst;
struct seg6_lwt *slwt;
int err;
- slwt = seg6_lwt_lwtunnel(orig_dst->lwtstate);
+ /* Get the address of lwtstate now, because "orig_dst" may not be there
+ * anymore after a call to skb_dst_drop(). Note that ip6_route_input()
+ * also calls skb_dst_drop(). Below, we compare the address of lwtstate
+ * to detect loops.
+ */
+ lwtst = orig_dst->lwtstate;
+
+ slwt = seg6_lwt_lwtunnel(lwtst);
local_bh_disable();
dst = dst_cache_get(&slwt->cache);
@@ -490,7 +498,9 @@ static int seg6_input_core(struct net *net, struct sock *sk,
if (!dst) {
ip6_route_input(skb);
dst = skb_dst(skb);
- if (!dst->error) {
+
+ /* cache only if we don't create a dst reference loop */
+ if (!dst->error && lwtst != dst->lwtstate) {
local_bh_disable();
dst_cache_set_ip6(&slwt->cache, dst,
&ipv6_hdr(skb)->saddr);
--
2.34.1
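For readers following along, the conditional caching in this patch can be pictured with a small userspace model. This is only a sketch under stated assumptions: struct toy_dst, struct toy_lwtstate and toy_cache_if_no_loop() are hypothetical names that do not exist in the kernel, and the plain integer refcount stands in for the real dst reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_dst;

struct toy_lwtstate {
	struct toy_dst *cached_dst;	/* models the per-tunnel dst_cache */
};

struct toy_dst {
	int refcnt;			/* models the dst reference count */
	int error;			/* non-zero: route lookup failed */
	struct toy_lwtstate *lwtstate;	/* tunnel state attached to the route */
};

/*
 * Cache "dst" in "lwtst" only when doing so cannot create a reference
 * loop, i.e. when the dst does not point back to this very tunnel state.
 * Returns true when the dst was cached (and a reference was taken).
 */
static bool toy_cache_if_no_loop(struct toy_lwtstate *lwtst,
				 struct toy_dst *dst)
{
	assert(lwtst && dst);

	if (dst->error || dst->lwtstate == lwtst)
		return false;

	dst->refcnt++;
	lwtst->cached_dst = dst;
	return true;
}
```

With a dst whose lwtstate is the tunnel's own state, toy_cache_if_no_loop() declines to cache and takes no reference, so in this model the state is never kept alive by its own cache. Any other error-free dst is cached as before.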
* [PATCH net v2 2/3] net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6
2025-02-11 22:16 [PATCH net v2 0/3] several fixes for ioam6, rpl and seg6 lwtunnels Justin Iurman
2025-02-11 22:16 ` [PATCH net v2 1/3] net: ipv6: fix dst ref loops on input in " Justin Iurman
@ 2025-02-11 22:16 ` Justin Iurman
2025-02-12 20:42 ` Justin Iurman
2025-02-13 13:28 ` Ido Schimmel
2025-02-11 22:16 ` [PATCH net v2 3/3] net: ipv6: fix consecutive input and output transformation in lwtunnels Justin Iurman
2 siblings, 2 replies; 16+ messages in thread
From: Justin Iurman @ 2025-02-11 22:16 UTC
To: netdev
Cc: davem, dsahern, edumazet, kuba, pabeni, horms, justin.iurman,
Alexander Aring, David Lebrun
When the destination is the same post-transformation, we enter a
lwtunnel loop. This is true for ioam6_iptunnel, rpl_iptunnel, and
seg6_iptunnel, in both input() and output() handlers respectively, where
either dst_input() or dst_output() is called at the end. It happens for
instance with the ioam6 inline mode, but can also happen for any of them
as long as the post-transformation destination still matches the fib
entry. Note that ioam6_iptunnel was already comparing the old and new
destination address to prevent the loop, but it is not enough (e.g.,
other addresses can still match the same subnet).
Here is an example for rpl_input():
dump_stack_lvl+0x60/0x80
rpl_input+0x9d/0x320
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
[...]
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
ip6_sublist_rcv_finish+0x85/0x90
ip6_sublist_rcv+0x236/0x2f0
... until rpl_do_srh() fails, which means skb_cow_head() failed.
This patch prevents that kind of loop by redirecting to the origin
input() or output() when the destination is the same
post-transformation.
Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation")
Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: David Lebrun <dlebrun@google.com>
---
net/ipv6/ioam6_iptunnel.c | 6 ++----
net/ipv6/rpl_iptunnel.c | 10 ++++++++++
net/ipv6/seg6_iptunnel.c | 33 +++++++++++++++++++++++++++------
3 files changed, 39 insertions(+), 10 deletions(-)
diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c
index 2c383c12a431..6c61b306f2e9 100644
--- a/net/ipv6/ioam6_iptunnel.c
+++ b/net/ipv6/ioam6_iptunnel.c
@@ -337,7 +337,6 @@ static int ioam6_do_encap(struct net *net, struct sk_buff *skb,
static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
{
struct dst_entry *dst = skb_dst(skb), *cache_dst = NULL;
- struct in6_addr orig_daddr;
struct ioam6_lwt *ilwt;
int err = -EINVAL;
u32 pkt_cnt;
@@ -352,8 +351,6 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
if (pkt_cnt % ilwt->freq.n >= ilwt->freq.k)
goto out;
- orig_daddr = ipv6_hdr(skb)->daddr;
-
local_bh_disable();
cache_dst = dst_cache_get(&ilwt->cache);
local_bh_enable();
@@ -422,7 +419,8 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
goto drop;
}
- if (!ipv6_addr_equal(&orig_daddr, &ipv6_hdr(skb)->daddr)) {
+ /* avoid a lwtunnel_input() loop when dst_entry is the same */
+ if (dst->lwtstate != cache_dst->lwtstate) {
skb_dst_drop(skb);
skb_dst_set(skb, cache_dst);
return dst_output(net, sk, skb);
diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c
index c26bf284459f..dc004e9aa649 100644
--- a/net/ipv6/rpl_iptunnel.c
+++ b/net/ipv6/rpl_iptunnel.c
@@ -247,6 +247,12 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
goto drop;
}
+ /* avoid a lwtunnel_output() loop when dst_entry is the same */
+ if (orig_dst->lwtstate == dst->lwtstate) {
+ dst_release(dst);
+ return orig_dst->lwtstate->orig_output(net, sk, skb);
+ }
+
skb_dst_drop(skb);
skb_dst_set(skb, dst);
@@ -305,6 +311,10 @@ static int rpl_input(struct sk_buff *skb)
skb_dst_set(skb, dst);
}
+ /* avoid a lwtunnel_input() loop when dst_entry is the same */
+ if (lwtst == dst->lwtstate)
+ return dst->lwtstate->orig_input(skb);
+
return dst_input(skb);
drop:
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index 6045e850b4bf..5ce662d8f334 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -467,9 +467,16 @@ static int seg6_input_finish(struct net *net, struct sock *sk,
return dst_input(skb);
}
+static int seg6_input_redirect_finish(struct net *net, struct sock *sk,
+ struct sk_buff *skb)
+{
+ return skb_dst(skb)->lwtstate->orig_input(skb);
+}
+
static int seg6_input_core(struct net *net, struct sock *sk,
struct sk_buff *skb)
{
+ int (*in_func)(struct net *net, struct sock *sk, struct sk_buff *skb);
struct dst_entry *orig_dst = skb_dst(skb);
struct dst_entry *dst = NULL;
struct lwtunnel_state *lwtst;
@@ -515,12 +522,18 @@ static int seg6_input_core(struct net *net, struct sock *sk,
skb_dst_set(skb, dst);
}
+ /* avoid a lwtunnel_input() loop when dst_entry is the same */
+ if (lwtst == dst->lwtstate)
+ in_func = seg6_input_redirect_finish;
+ else
+ in_func = seg6_input_finish;
+
if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled))
return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT,
dev_net(skb->dev), NULL, skb, NULL,
- skb_dst(skb)->dev, seg6_input_finish);
+ skb_dst(skb)->dev, in_func);
- return seg6_input_finish(dev_net(skb->dev), NULL, skb);
+ return in_func(dev_net(skb->dev), NULL, skb);
drop:
kfree_skb(skb);
return err;
@@ -554,6 +567,7 @@ static int seg6_input(struct sk_buff *skb)
static int seg6_output_core(struct net *net, struct sock *sk,
struct sk_buff *skb)
{
+ int (*out_func)(struct net *net, struct sock *sk, struct sk_buff *skb);
struct dst_entry *orig_dst = skb_dst(skb);
struct dst_entry *dst = NULL;
struct seg6_lwt *slwt;
@@ -598,14 +612,21 @@ static int seg6_output_core(struct net *net, struct sock *sk,
goto drop;
}
- skb_dst_drop(skb);
- skb_dst_set(skb, dst);
+ /* avoid a lwtunnel_output() loop when dst_entry is the same */
+ if (orig_dst->lwtstate == dst->lwtstate) {
+ dst_release(dst);
+ out_func = orig_dst->lwtstate->orig_output;
+ } else {
+ skb_dst_drop(skb);
+ skb_dst_set(skb, dst);
+ out_func = dst_output;
+ }
if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled))
return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net, sk, skb,
- NULL, skb_dst(skb)->dev, dst_output);
+ NULL, skb_dst(skb)->dev, out_func);
- return dst_output(net, sk, skb);
+ return out_func(net, sk, skb);
drop:
dst_release(dst);
kfree_skb(skb);
--
2.34.1
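The loop described in this patch, and the guard that breaks it, can be illustrated with a toy userspace model. This is a sketch, not the kernel code: struct toy_pkt, struct toy_state, struct toy_route, toy_deliver(), toy_tunnel_input() and TOY_MAX_TRANSFORMS are all hypothetical, and the "next" pointer simply stands in for the result of the route lookup done after the transformation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TRANSFORMS 8	/* models headroom eventually running out */

struct toy_pkt {
	int transforms;		/* how many times the header was added */
	int delivered;
};

struct toy_state {
	int (*orig_input)(struct toy_pkt *pkt);	/* pre-lwtunnel input */
};

struct toy_route {
	struct toy_state *lwtstate;	/* NULL for a plain route */
	struct toy_route *next;		/* route found after transformation */
};

static int toy_deliver(struct toy_pkt *pkt)
{
	pkt->delivered = 1;
	return 0;
}

/*
 * Toy input handler. With "guard" false it behaves like the unpatched
 * code and re-enters itself for as long as the new route carries a
 * tunnel state. With "guard" true it compares state addresses and falls
 * back to orig_input() when the route is unchanged.
 */
static int toy_tunnel_input(struct toy_route *rt, struct toy_pkt *pkt,
			    bool guard)
{
	struct toy_state *lwtst;
	struct toy_route *next;

	assert(rt && rt->lwtstate && rt->next);
	lwtst = rt->lwtstate;
	next = rt->next;

	if (pkt->transforms >= TOY_MAX_TRANSFORMS)
		return -1;	/* the transformation finally fails */
	pkt->transforms++;

	if (guard && next->lwtstate == lwtst)
		return lwtst->orig_input(pkt);

	if (next->lwtstate)	/* models dst_input() hitting a tunnel again */
		return toy_tunnel_input(next, pkt, guard);

	return toy_deliver(pkt);
}
```

When a route's "next" points back to itself, the unguarded variant keeps transforming the packet until the limit is hit and returns an error, while the guarded variant transforms it once and hands it to orig_input().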
* [PATCH net v2 3/3] net: ipv6: fix consecutive input and output transformation in lwtunnels
2025-02-11 22:16 [PATCH net v2 0/3] several fixes for ioam6, rpl and seg6 lwtunnels Justin Iurman
2025-02-11 22:16 ` [PATCH net v2 1/3] net: ipv6: fix dst ref loops on input in " Justin Iurman
2025-02-11 22:16 ` [PATCH net v2 2/3] net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6 Justin Iurman
@ 2025-02-11 22:16 ` Justin Iurman
2025-02-13 14:33 ` Paolo Abeni
2 siblings, 1 reply; 16+ messages in thread
From: Justin Iurman @ 2025-02-11 22:16 UTC
To: netdev; +Cc: davem, dsahern, edumazet, kuba, pabeni, horms, justin.iurman
Some lwtunnel users implement both lwt input and output handlers. If the
post-transformation destination on input is the same, the output handler
is also called and the same transformation is applied (again). Here are
the users: ila, bpf, rpl, seg6. The first one (ila) does not need this
fix, since it already implements a check to avoid such a duplicate. The
second (bpf) may need this fix, but I'm not familiar with that code path
and will keep it out of this patch. The two others (rpl and seg6) do
need this patch.
Due to the ila implementation (as an example), we cannot fix the issue
in lwtunnel_input() and lwtunnel_output() directly. Instead, we need to
do it on a case-by-case basis. This patch fixes both rpl_iptunnel and
seg6_iptunnel users. The fix re-uses skb->redirected in input handlers
to notify corresponding output handlers that the transformation was
already applied and to skip it. The "redirected" field seems safe to be
used here.
Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
net/ipv6/rpl_iptunnel.c | 14 ++++++++++++--
net/ipv6/seg6_iptunnel.c | 16 +++++++++++++---
2 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c
index dc004e9aa649..2dc1f2297e39 100644
--- a/net/ipv6/rpl_iptunnel.c
+++ b/net/ipv6/rpl_iptunnel.c
@@ -208,6 +208,12 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
struct rpl_lwt *rlwt;
int err;
+ /* Don't re-apply the transformation when rpl_input() already did it */
+ if (skb_is_redirected(skb)) {
+ skb_reset_redirect(skb);
+ return orig_dst->lwtstate->orig_output(net, sk, skb);
+ }
+
rlwt = rpl_lwt_lwtunnel(orig_dst->lwtstate);
local_bh_disable();
@@ -311,9 +317,13 @@ static int rpl_input(struct sk_buff *skb)
skb_dst_set(skb, dst);
}
- /* avoid a lwtunnel_input() loop when dst_entry is the same */
- if (lwtst == dst->lwtstate)
+ /* avoid a lwtunnel_input() loop when dst_entry is the same, and make
+ * sure rpl_output() does not apply the transformation one more time
+ */
+ if (lwtst == dst->lwtstate) {
+ skb_set_redirected_noclear(skb, true);
return dst->lwtstate->orig_input(skb);
+ }
return dst_input(skb);
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index 5ce662d8f334..69233c2ed658 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -522,11 +522,15 @@ static int seg6_input_core(struct net *net, struct sock *sk,
skb_dst_set(skb, dst);
}
- /* avoid a lwtunnel_input() loop when dst_entry is the same */
- if (lwtst == dst->lwtstate)
+ /* avoid a lwtunnel_input() loop when dst_entry is the same, and make
+ * sure seg6_output() does not apply the transformation one more time
+ */
+ if (lwtst == dst->lwtstate) {
+ skb_set_redirected_noclear(skb, true);
in_func = seg6_input_redirect_finish;
- else
+ } else {
in_func = seg6_input_finish;
+ }
if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled))
return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT,
@@ -573,6 +577,12 @@ static int seg6_output_core(struct net *net, struct sock *sk,
struct seg6_lwt *slwt;
int err;
+ /* Don't re-apply the transformation when seg6_input() already did it */
+ if (skb_is_redirected(skb)) {
+ skb_reset_redirect(skb);
+ return orig_dst->lwtstate->orig_output(net, sk, skb);
+ }
+
slwt = seg6_lwt_lwtunnel(orig_dst->lwtstate);
local_bh_disable();
--
2.34.1
* Re: [PATCH net v2 2/3] net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6
2025-02-11 22:16 ` [PATCH net v2 2/3] net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6 Justin Iurman
@ 2025-02-12 20:42 ` Justin Iurman
2025-02-13 13:28 ` Ido Schimmel
1 sibling, 0 replies; 16+ messages in thread
From: Justin Iurman @ 2025-02-12 20:42 UTC
To: netdev
Cc: davem, dsahern, edumazet, kuba, pabeni, horms, Alexander Aring,
David Lebrun
On 2/11/25 23:16, Justin Iurman wrote:
> When the destination is the same post-transformation, we enter a
> lwtunnel loop. This is true for ioam6_iptunnel, rpl_iptunnel, and
> seg6_iptunnel, in both input() and output() handlers respectively, where
> either dst_input() or dst_output() is called at the end. It happens for
> instance with the ioam6 inline mode, but can also happen for any of them
> as long as the post-transformation destination still matches the fib
> entry. Note that ioam6_iptunnel was already comparing the old and new
> destination address to prevent the loop, but it is not enough (e.g.,
> other addresses can still match the same subnet).
>
> Here is an example for rpl_input():
>
> dump_stack_lvl+0x60/0x80
> rpl_input+0x9d/0x320
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> [...]
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> ip6_sublist_rcv_finish+0x85/0x90
> ip6_sublist_rcv+0x236/0x2f0
>
> ... until rpl_do_srh() fails, which means skb_cow_head() failed.
>
> This patch prevents that kind of loop by redirecting to the origin
> input() or output() when the destination is the same
> post-transformation.
>
> Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation")
> Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
> Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> Cc: Alexander Aring <alex.aring@gmail.com>
> Cc: David Lebrun <dlebrun@google.com>
> ---
> net/ipv6/ioam6_iptunnel.c | 6 ++----
> net/ipv6/rpl_iptunnel.c | 10 ++++++++++
> net/ipv6/seg6_iptunnel.c | 33 +++++++++++++++++++++++++++------
> 3 files changed, 39 insertions(+), 10 deletions(-)
>
> diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c
> index 2c383c12a431..6c61b306f2e9 100644
> --- a/net/ipv6/ioam6_iptunnel.c
> +++ b/net/ipv6/ioam6_iptunnel.c
> @@ -337,7 +337,6 @@ static int ioam6_do_encap(struct net *net, struct sk_buff *skb,
> static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> {
> struct dst_entry *dst = skb_dst(skb), *cache_dst = NULL;
> - struct in6_addr orig_daddr;
> struct ioam6_lwt *ilwt;
> int err = -EINVAL;
> u32 pkt_cnt;
> @@ -352,8 +351,6 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> if (pkt_cnt % ilwt->freq.n >= ilwt->freq.k)
> goto out;
>
> - orig_daddr = ipv6_hdr(skb)->daddr;
> -
> local_bh_disable();
> cache_dst = dst_cache_get(&ilwt->cache);
> local_bh_enable();
> @@ -422,7 +419,8 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> goto drop;
> }
>
> - if (!ipv6_addr_equal(&orig_daddr, &ipv6_hdr(skb)->daddr)) {
> + /* avoid a lwtunnel_input() loop when dst_entry is the same */
sigh... Should be lwtunnel_output() in the comment, let me know if I
need to re-spin.
* Re: [PATCH net v2 1/3] net: ipv6: fix dst ref loops on input in rpl and seg6 lwtunnels
2025-02-11 22:16 ` [PATCH net v2 1/3] net: ipv6: fix dst ref loops on input in " Justin Iurman
@ 2025-02-13 12:27 ` Ido Schimmel
2025-02-13 22:37 ` Justin Iurman
0 siblings, 1 reply; 16+ messages in thread
From: Ido Schimmel @ 2025-02-13 12:27 UTC
To: Justin Iurman
Cc: netdev, davem, dsahern, edumazet, kuba, pabeni, horms,
Alexander Aring, David Lebrun
On Tue, Feb 11, 2025 at 11:16:22PM +0100, Justin Iurman wrote:
> As a follow up to commit 92191dd10730 ("net: ipv6: fix dst ref loops in
> rpl, seg6 and ioam6 lwtunnels"), we also need a conditional dst cache on
> input for seg6_iptunnel and rpl_iptunnel to prevent dst ref loops (i.e.,
> if the packet destination did not change, we may end up recording a
> reference to the lwtunnel in its own cache, and the lwtunnel state will
> never be freed).
>
> Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
> Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> Cc: Alexander Aring <alex.aring@gmail.com>
> Cc: David Lebrun <dlebrun@google.com>
Not an expert but was asked to take a look. Seems consistent with the
output path and comparing the state address seems safe as it is only
compared and never dereferenced after dropping the dst in the input
path.
I would have probably split it into two patches to make it a bit easier
to backport. 5.4.y needs the seg6 fix, but not the rpl one.
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
* Re: [PATCH net v2 2/3] net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6
2025-02-11 22:16 ` [PATCH net v2 2/3] net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6 Justin Iurman
2025-02-12 20:42 ` Justin Iurman
@ 2025-02-13 13:28 ` Ido Schimmel
2025-02-13 22:51 ` Justin Iurman
1 sibling, 1 reply; 16+ messages in thread
From: Ido Schimmel @ 2025-02-13 13:28 UTC
To: Justin Iurman
Cc: netdev, davem, dsahern, edumazet, kuba, pabeni, horms,
Alexander Aring, David Lebrun
On Tue, Feb 11, 2025 at 11:16:23PM +0100, Justin Iurman wrote:
> When the destination is the same post-transformation, we enter a
> lwtunnel loop. This is true for ioam6_iptunnel, rpl_iptunnel, and
> seg6_iptunnel, in both input() and output() handlers respectively, where
> either dst_input() or dst_output() is called at the end. It happens for
> instance with the ioam6 inline mode, but can also happen for any of them
> as long as the post-transformation destination still matches the fib
> entry. Note that ioam6_iptunnel was already comparing the old and new
> destination address to prevent the loop, but it is not enough (e.g.,
> other addresses can still match the same subnet).
>
> Here is an example for rpl_input():
>
> dump_stack_lvl+0x60/0x80
> rpl_input+0x9d/0x320
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> [...]
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> lwtunnel_input+0x64/0xa0
> ip6_sublist_rcv_finish+0x85/0x90
> ip6_sublist_rcv+0x236/0x2f0
>
> ... until rpl_do_srh() fails, which means skb_cow_head() failed.
>
> This patch prevents that kind of loop by redirecting to the origin
> input() or output() when the destination is the same
> post-transformation.
A loop was reported a few months ago with a similar stack trace:
https://lore.kernel.org/netdev/2bc9e2079e864a9290561894d2a602d6@akamai.com/
But even with this series applied my VM gets stuck. Can you please check
if the fix is incomplete?
* Re: [PATCH net v2 3/3] net: ipv6: fix consecutive input and output transformation in lwtunnels
2025-02-11 22:16 ` [PATCH net v2 3/3] net: ipv6: fix consecutive input and output transformation in lwtunnels Justin Iurman
@ 2025-02-13 14:33 ` Paolo Abeni
2025-02-13 22:57 ` Justin Iurman
0 siblings, 1 reply; 16+ messages in thread
From: Paolo Abeni @ 2025-02-13 14:33 UTC
To: Justin Iurman, netdev; +Cc: davem, dsahern, edumazet, kuba, horms
On 2/11/25 11:16 PM, Justin Iurman wrote:
> Some lwtunnel users implement both lwt input and output handlers. If the
> post-transformation destination on input is the same, the output handler
> is also called and the same transformation is applied (again). Here are
> the users: ila, bpf, rpl, seg6. The first one (ila) does not need this
> fix, since it already implements a check to avoid such a duplicate. The
> second (bpf) may need this fix, but I'm not familiar with that code path
> and will keep it out of this patch. The two others (rpl and seg6) do
> need this patch.
>
> Due to the ila implementation (as an example), we cannot fix the issue
> in lwtunnel_input() and lwtunnel_output() directly. Instead, we need to
> do it on a case-by-case basis. This patch fixes both rpl_iptunnel and
> seg6_iptunnel users. The fix re-uses skb->redirected in input handlers
> to notify corresponding output handlers that the transformation was
> already applied and to skip it. The "redirected" field seems safe to be
> used here.
>
> Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
> Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> ---
> net/ipv6/rpl_iptunnel.c | 14 ++++++++++++--
> net/ipv6/seg6_iptunnel.c | 16 +++++++++++++---
> 2 files changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c
> index dc004e9aa649..2dc1f2297e39 100644
> --- a/net/ipv6/rpl_iptunnel.c
> +++ b/net/ipv6/rpl_iptunnel.c
> @@ -208,6 +208,12 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> struct rpl_lwt *rlwt;
> int err;
>
> + /* Don't re-apply the transformation when rpl_input() already did it */
> + if (skb_is_redirected(skb)) {
This check looks false-positive prone, e.g. if a packet lands on an LWT
tunnel due to a tc redirect from another non-LWT device.
On the flip side I don't see any good method to propagate the relevant
information. A skb ext would work, but I would not call that a good method.
/P
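The false positive raised here can be made concrete with a toy model of the flag handoff. This is a hedged sketch only: struct toy_skb, toy_input_same_dst() and toy_output() are hypothetical userspace stand-ins, and the boolean merely models the skb "redirected" bit that the patch re-uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_skb {
	bool redirected;	/* models the skb "redirected" bit */
	int transforms;		/* how many times the transformation ran */
};

/* Input handler, case where the destination is unchanged afterwards. */
static void toy_input_same_dst(struct toy_skb *skb)
{
	assert(skb);
	skb->transforms++;
	skb->redirected = true;	/* tell the output handler to skip */
}

/* Output handler: skip the transformation when the flag is set. */
static void toy_output(struct toy_skb *skb)
{
	assert(skb);
	if (skb->redirected) {
		skb->redirected = false;
		return;
	}
	skb->transforms++;
}
```

In this model, a packet that goes through toy_input_same_dst() and then toy_output() is transformed once, as intended. A packet that arrives at toy_output() with the bit already set for an unrelated reason (for instance by a redirect elsewhere in the stack) is never transformed at all, which is the false positive described above.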
* Re: [PATCH net v2 1/3] net: ipv6: fix dst ref loops on input in rpl and seg6 lwtunnels
2025-02-13 12:27 ` Ido Schimmel
@ 2025-02-13 22:37 ` Justin Iurman
0 siblings, 0 replies; 16+ messages in thread
From: Justin Iurman @ 2025-02-13 22:37 UTC
To: Ido Schimmel
Cc: netdev, davem, dsahern, edumazet, kuba, pabeni, horms,
Alexander Aring, David Lebrun
On 2/13/25 13:27, Ido Schimmel wrote:
> On Tue, Feb 11, 2025 at 11:16:22PM +0100, Justin Iurman wrote:
>> As a follow up to commit 92191dd10730 ("net: ipv6: fix dst ref loops in
>> rpl, seg6 and ioam6 lwtunnels"), we also need a conditional dst cache on
>> input for seg6_iptunnel and rpl_iptunnel to prevent dst ref loops (i.e.,
>> if the packet destination did not change, we may end up recording a
>> reference to the lwtunnel in its own cache, and the lwtunnel state will
>> never be freed).
>>
>> Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
>> Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
>> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> Cc: Alexander Aring <alex.aring@gmail.com>
>> Cc: David Lebrun <dlebrun@google.com>
>
> Not an expert but was asked to take a look. Seems consistent with the
> output path and comparing the state address seems safe as it is only
> compared and never dereferenced after dropping the dst in the input
> path.
>
> I would have probably split it into two patches to make it a bit easier
> to backport. 5.4.y needs the seg6 fix, but not the rpl one.
>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Thanks Ido. I'll split it in two for v3.
* Re: [PATCH net v2 2/3] net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6
2025-02-13 13:28 ` Ido Schimmel
@ 2025-02-13 22:51 ` Justin Iurman
2025-02-16 16:31 ` Ido Schimmel
0 siblings, 1 reply; 16+ messages in thread
From: Justin Iurman @ 2025-02-13 22:51 UTC
To: Ido Schimmel
Cc: netdev, davem, dsahern, edumazet, kuba, pabeni, horms,
Alexander Aring, David Lebrun
On 2/13/25 14:28, Ido Schimmel wrote:
> On Tue, Feb 11, 2025 at 11:16:23PM +0100, Justin Iurman wrote:
>> When the destination is the same post-transformation, we enter a
>> lwtunnel loop. This is true for ioam6_iptunnel, rpl_iptunnel, and
>> seg6_iptunnel, in both input() and output() handlers respectively, where
>> either dst_input() or dst_output() is called at the end. It happens for
>> instance with the ioam6 inline mode, but can also happen for any of them
>> as long as the post-transformation destination still matches the fib
>> entry. Note that ioam6_iptunnel was already comparing the old and new
>> destination address to prevent the loop, but it is not enough (e.g.,
>> other addresses can still match the same subnet).
>>
>> Here is an example for rpl_input():
>>
>> dump_stack_lvl+0x60/0x80
>> rpl_input+0x9d/0x320
>> lwtunnel_input+0x64/0xa0
>> lwtunnel_input+0x64/0xa0
>> lwtunnel_input+0x64/0xa0
>> lwtunnel_input+0x64/0xa0
>> lwtunnel_input+0x64/0xa0
>> [...]
>> lwtunnel_input+0x64/0xa0
>> lwtunnel_input+0x64/0xa0
>> lwtunnel_input+0x64/0xa0
>> lwtunnel_input+0x64/0xa0
>> lwtunnel_input+0x64/0xa0
>> ip6_sublist_rcv_finish+0x85/0x90
>> ip6_sublist_rcv+0x236/0x2f0
>>
>> ... until rpl_do_srh() fails, which means skb_cow_head() failed.
>>
>> This patch prevents that kind of loop by redirecting to the origin
>> input() or output() when the destination is the same
>> post-transformation.
>
> A loop was reported a few months ago with a similar stack trace:
> https://lore.kernel.org/netdev/2bc9e2079e864a9290561894d2a602d6@akamai.com/
>
> But even with this series applied my VM gets stuck. Can you please check
> if the fix is incomplete?
Good catch! Indeed, seg6_local also needs to be fixed the same way.
Back to my first idea: maybe we could directly fix it in
lwtunnel_input() and lwtunnel_output() to make our lives easier, but
we'd have to be careful to modify all users accordingly. The users I'm
100% sure are affected: ioam6 (output), rpl (input/output), seg6
(input/output), seg6_local (input). Other users I'm not totally sure
about (to be checked): ila (output), bpf (input).
Otherwise, we'll need to apply the fix to each user concerned (probably
the safest (best?) option right now). Any opinions?
* Re: [PATCH net v2 3/3] net: ipv6: fix consecutive input and output transformation in lwtunnels
2025-02-13 14:33 ` Paolo Abeni
@ 2025-02-13 22:57 ` Justin Iurman
0 siblings, 0 replies; 16+ messages in thread
From: Justin Iurman @ 2025-02-13 22:57 UTC
To: Paolo Abeni, netdev; +Cc: davem, dsahern, edumazet, kuba, horms
On 2/13/25 15:33, Paolo Abeni wrote:
> On 2/11/25 11:16 PM, Justin Iurman wrote:
>> Some lwtunnel users implement both lwt input and output handlers. If the
>> post-transformation destination on input is the same, the output handler
>> is also called and the same transformation is applied (again). Here are
>> the users: ila, bpf, rpl, seg6. The first one (ila) does not need this
>> fix, since it already implements a check to avoid such a duplicate. The
>> second (bpf) may need this fix, but I'm not familiar with that code path
>> and will keep it out of this patch. The two others (rpl and seg6) do
>> need this patch.
>>
>> Due to the ila implementation (as an example), we cannot fix the issue
>> in lwtunnel_input() and lwtunnel_output() directly. Instead, we need to
>> do it on a case-by-case basis. This patch fixes both rpl_iptunnel and
>> seg6_iptunnel users. The fix re-uses skb->redirected in input handlers
>> to notify corresponding output handlers that the transformation was
>> already applied and to skip it. The "redirected" field seems safe to be
>> used here.
>>
>> Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
>> Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
>> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> ---
>> net/ipv6/rpl_iptunnel.c | 14 ++++++++++++--
>> net/ipv6/seg6_iptunnel.c | 16 +++++++++++++---
>> 2 files changed, 25 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c
>> index dc004e9aa649..2dc1f2297e39 100644
>> --- a/net/ipv6/rpl_iptunnel.c
>> +++ b/net/ipv6/rpl_iptunnel.c
>> @@ -208,6 +208,12 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>> struct rpl_lwt *rlwt;
>> int err;
>>
>> + /* Don't re-apply the transformation when rpl_input() already did it */
>> + if (skb_is_redirected(skb)) {
>
> This check looks false-positive prone, i.e. if a packet lands on an LWT
> tunnel due to a tc redirect from another non-lwt device.
True, it was indeed a trade-off solution :-/
> On the flip side I don't see any good method to propagate the relevant
> information. A skb ext would work, but I would not call that a good method.
Agreed :-( I did not check, but maybe we could also look at
skb->tc_at_ingress in that case? Not sure it'd help, though. Or... any
chance we could find a hole in sk_buff for a new :1 field?
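
The false-positive concern and the ":1 field" idea can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct toy_skb, the lwt_done bit and all toy_* functions are hypothetical names invented for illustration and are not kernel code or a proposed layout for sk_buff.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct toy_skb {
	unsigned int redirected:1; /* shared bit: also set by a tc redirect */
	unsigned int lwt_done:1;   /* hypothetical dedicated ":1" bit */
	int transforms;            /* how many times the encap was applied */
};

/* Input handler: applies the transformation and marks the packet. */
void toy_input(struct toy_skb *skb, bool use_dedicated_bit)
{
	skb->transforms++;
	if (use_dedicated_bit)
		skb->lwt_done = 1;
	else
		skb->redirected = 1;
}

/* Output handler: skips the transformation if the mark is already set. */
void toy_output(struct toy_skb *skb, bool use_dedicated_bit)
{
	bool done = use_dedicated_bit ? skb->lwt_done : skb->redirected;

	if (done)
		return;
	skb->transforms++;
}

/* Packet goes through the input handler and then the output handler. */
int toy_input_then_output(bool use_dedicated_bit)
{
	struct toy_skb skb = { 0 };

	toy_input(&skb, use_dedicated_bit);
	toy_output(&skb, use_dedicated_bit);
	return skb.transforms;
}

/* Packet was tc-redirected elsewhere and only hits the output handler. */
int toy_tc_redirect_then_output(bool use_dedicated_bit)
{
	struct toy_skb skb = { .redirected = 1 };

	toy_output(&skb, use_dedicated_bit);
	return skb.transforms;
}
```

In this model both variants avoid the duplicate transformation on the input-then-output path, but only the dedicated bit still applies the transformation to a packet that arrived through a tc redirect. Reusing the shared bit silently skips it, which is the false positive raised above.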
* Re: [PATCH net v2 2/3] net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6
2025-02-13 22:51 ` Justin Iurman
@ 2025-02-16 16:31 ` Ido Schimmel
2025-02-17 14:40 ` Ido Schimmel
2025-02-25 18:36 ` Justin Iurman
0 siblings, 2 replies; 16+ messages in thread
From: Ido Schimmel @ 2025-02-16 16:31 UTC (permalink / raw)
To: Justin Iurman
Cc: netdev, davem, dsahern, edumazet, kuba, pabeni, horms,
Alexander Aring, David Lebrun
On Thu, Feb 13, 2025 at 11:51:49PM +0100, Justin Iurman wrote:
> On 2/13/25 14:28, Ido Schimmel wrote:
> > On Tue, Feb 11, 2025 at 11:16:23PM +0100, Justin Iurman wrote:
> > > When the destination is the same post-transformation, we enter a
> > > lwtunnel loop. This is true for ioam6_iptunnel, rpl_iptunnel, and
> > > seg6_iptunnel, in both input() and output() handlers respectively, where
> > > either dst_input() or dst_output() is called at the end. It happens for
> > > instance with the ioam6 inline mode, but can also happen for any of them
> > > as long as the post-transformation destination still matches the fib
> > > entry. Note that ioam6_iptunnel was already comparing the old and new
> > > destination address to prevent the loop, but it is not enough (e.g.,
> > > other addresses can still match the same subnet).
> > >
> > > Here is an example for rpl_input():
> > >
> > > dump_stack_lvl+0x60/0x80
> > > rpl_input+0x9d/0x320
> > > lwtunnel_input+0x64/0xa0
> > > lwtunnel_input+0x64/0xa0
> > > lwtunnel_input+0x64/0xa0
> > > lwtunnel_input+0x64/0xa0
> > > lwtunnel_input+0x64/0xa0
> > > [...]
> > > lwtunnel_input+0x64/0xa0
> > > lwtunnel_input+0x64/0xa0
> > > lwtunnel_input+0x64/0xa0
> > > lwtunnel_input+0x64/0xa0
> > > lwtunnel_input+0x64/0xa0
> > > ip6_sublist_rcv_finish+0x85/0x90
> > > ip6_sublist_rcv+0x236/0x2f0
> > >
> > > ... until rpl_do_srh() fails, which means skb_cow_head() failed.
> > >
> > > This patch prevents that kind of loop by redirecting to the origin
> > > input() or output() when the destination is the same
> > > post-transformation.
> >
> > A loop was reported a few months ago with a similar stack trace:
> > https://lore.kernel.org/netdev/2bc9e2079e864a9290561894d2a602d6@akamai.com/
> >
> > But even with this series applied my VM gets stuck. Can you please check
> > if the fix is incomplete?
>
> Good catch! Indeed, seg6_local also needs to be fixed the same way.
>
> Back to my first idea: maybe we could directly fix it in lwtunnel_input()
> and lwtunnel_output() to make our lives easier, but we'd have to be careful
> to modify all users accordingly. The users I'm 100% sure that are concerned:
> ioam6 (output), rpl (input/output), seg6 (input/output), seg6_local (input).
> Other users I'm not totally sure (to be checked): ila (output), bpf (input).
>
> Otherwise, we'll need to apply the fix to each user concerned (probably the
> safest (best?) option right now). Any opinions?
I audited the various lwt users and I agree with your analysis about
which users seem to be affected by this issue.
I'm not entirely sure how you want to fix this in
lwtunnel_{input,output}() given that only the input()/output() handlers
of the individual lwt users are aware of both the old and new dst
entries.
BTW, I noticed that bpf implements the xmit() hook in addition to
input()/output(). I wonder if a loop is possible in the following case:
ip_finish_output2() <----+
lwtunnel_xmit() |
bpf_xmit() |
// bpf program does not change |
// the packet and returns |
// BPF_LWT_REROUTE |
bpf_lwt_xmit_reroute() |
// unmodified packet resolves |
// the same dst entry |
dst_output() |
ip_output() -------------+
* Re: [PATCH net v2 2/3] net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6
2025-02-16 16:31 ` Ido Schimmel
@ 2025-02-17 14:40 ` Ido Schimmel
2025-02-25 18:47 ` Justin Iurman
2025-03-06 18:14 ` Justin Iurman
2025-02-25 18:36 ` Justin Iurman
1 sibling, 2 replies; 16+ messages in thread
From: Ido Schimmel @ 2025-02-17 14:40 UTC (permalink / raw)
To: Justin Iurman
Cc: netdev, davem, dsahern, edumazet, kuba, pabeni, horms,
Alexander Aring, David Lebrun
On Sun, Feb 16, 2025 at 06:31:06PM +0200, Ido Schimmel wrote:
> On Thu, Feb 13, 2025 at 11:51:49PM +0100, Justin Iurman wrote:
> > On 2/13/25 14:28, Ido Schimmel wrote:
> > > On Tue, Feb 11, 2025 at 11:16:23PM +0100, Justin Iurman wrote:
> > > > When the destination is the same post-transformation, we enter a
> > > > lwtunnel loop. This is true for ioam6_iptunnel, rpl_iptunnel, and
> > > > seg6_iptunnel, in both input() and output() handlers respectively, where
> > > > either dst_input() or dst_output() is called at the end. It happens for
> > > > instance with the ioam6 inline mode, but can also happen for any of them
> > > > as long as the post-transformation destination still matches the fib
> > > > entry. Note that ioam6_iptunnel was already comparing the old and new
> > > > destination address to prevent the loop, but it is not enough (e.g.,
> > > > other addresses can still match the same subnet).
> > > >
> > > > Here is an example for rpl_input():
> > > >
> > > > dump_stack_lvl+0x60/0x80
> > > > rpl_input+0x9d/0x320
> > > > lwtunnel_input+0x64/0xa0
> > > > lwtunnel_input+0x64/0xa0
> > > > lwtunnel_input+0x64/0xa0
> > > > lwtunnel_input+0x64/0xa0
> > > > lwtunnel_input+0x64/0xa0
> > > > [...]
> > > > lwtunnel_input+0x64/0xa0
> > > > lwtunnel_input+0x64/0xa0
> > > > lwtunnel_input+0x64/0xa0
> > > > lwtunnel_input+0x64/0xa0
> > > > lwtunnel_input+0x64/0xa0
> > > > ip6_sublist_rcv_finish+0x85/0x90
> > > > ip6_sublist_rcv+0x236/0x2f0
> > > >
> > > > ... until rpl_do_srh() fails, which means skb_cow_head() failed.
> > > >
> > > > This patch prevents that kind of loop by redirecting to the origin
> > > > input() or output() when the destination is the same
> > > > post-transformation.
> > >
> > > A loop was reported a few months ago with a similar stack trace:
> > > https://lore.kernel.org/netdev/2bc9e2079e864a9290561894d2a602d6@akamai.com/
> > >
> > > But even with this series applied my VM gets stuck. Can you please check
> > > if the fix is incomplete?
> >
> > Good catch! Indeed, seg6_local also needs to be fixed the same way.
> >
> > Back to my first idea: maybe we could directly fix it in lwtunnel_input()
> > and lwtunnel_output() to make our lives easier, but we'd have to be careful
> > to modify all users accordingly. The users I'm 100% sure that are concerned:
> > ioam6 (output), rpl (input/output), seg6 (input/output), seg6_local (input).
> > Other users I'm not totally sure (to be checked): ila (output), bpf (input).
> >
> > Otherwise, we'll need to apply the fix to each user concerned (probably the
> > safest (best?) option right now). Any opinions?
>
> I audited the various lwt users and I agree with your analysis about
> which users seem to be affected by this issue.
>
> I'm not entirely sure how you want to fix this in
> lwtunnel_{input,output}() given that only the input()/output() handlers
> of the individual lwt users are aware of both the old and new dst
> entries.
>
> BTW, I noticed that bpf implements the xmit() hook in addition to
> input()/output(). I wonder if a loop is possible in the following case:
>
> ip_finish_output2() <----+
> lwtunnel_xmit() |
> bpf_xmit() |
> // bpf program does not change |
> // the packet and returns |
> // BPF_LWT_REROUTE |
> bpf_lwt_xmit_reroute() |
> // unmodified packet resolves |
> // the same dst entry |
> dst_output() |
> ip_output() -------------+
FWIW, verified that this is indeed the case. Reproducer:
$ cat lwt_xmit_repo.bpf.c
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
SEC("lwt_xmit")
int repo(struct __sk_buff *skb)
{
return BPF_LWT_REROUTE;
}
$ clang -O2 -target bpf -c lwt_xmit_repo.bpf.c -o lwt_xmit_repo.o
# ip link add name dummy1 up type dummy
# ip route add 192.0.2.0/24 nexthop encap bpf xmit obj ./lwt_xmit_repo.o sec lwt_xmit dev dummy1
# ping 192.0.2.1
* Re: [PATCH net v2 2/3] net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6
2025-02-16 16:31 ` Ido Schimmel
2025-02-17 14:40 ` Ido Schimmel
@ 2025-02-25 18:36 ` Justin Iurman
1 sibling, 0 replies; 16+ messages in thread
From: Justin Iurman @ 2025-02-25 18:36 UTC (permalink / raw)
To: Ido Schimmel
Cc: netdev, davem, dsahern, edumazet, kuba, pabeni, horms,
Alexander Aring, David Lebrun
On 2/16/25 17:31, Ido Schimmel wrote:
> On Thu, Feb 13, 2025 at 11:51:49PM +0100, Justin Iurman wrote:
>> On 2/13/25 14:28, Ido Schimmel wrote:
>>> On Tue, Feb 11, 2025 at 11:16:23PM +0100, Justin Iurman wrote:
>>>> When the destination is the same post-transformation, we enter a
>>>> lwtunnel loop. This is true for ioam6_iptunnel, rpl_iptunnel, and
>>>> seg6_iptunnel, in both input() and output() handlers respectively, where
>>>> either dst_input() or dst_output() is called at the end. It happens for
>>>> instance with the ioam6 inline mode, but can also happen for any of them
>>>> as long as the post-transformation destination still matches the fib
>>>> entry. Note that ioam6_iptunnel was already comparing the old and new
>>>> destination address to prevent the loop, but it is not enough (e.g.,
>>>> other addresses can still match the same subnet).
>>>>
>>>> Here is an example for rpl_input():
>>>>
>>>> dump_stack_lvl+0x60/0x80
>>>> rpl_input+0x9d/0x320
>>>> lwtunnel_input+0x64/0xa0
>>>> lwtunnel_input+0x64/0xa0
>>>> lwtunnel_input+0x64/0xa0
>>>> lwtunnel_input+0x64/0xa0
>>>> lwtunnel_input+0x64/0xa0
>>>> [...]
>>>> lwtunnel_input+0x64/0xa0
>>>> lwtunnel_input+0x64/0xa0
>>>> lwtunnel_input+0x64/0xa0
>>>> lwtunnel_input+0x64/0xa0
>>>> lwtunnel_input+0x64/0xa0
>>>> ip6_sublist_rcv_finish+0x85/0x90
>>>> ip6_sublist_rcv+0x236/0x2f0
>>>>
>>>> ... until rpl_do_srh() fails, which means skb_cow_head() failed.
>>>>
>>>> This patch prevents that kind of loop by redirecting to the origin
>>>> input() or output() when the destination is the same
>>>> post-transformation.
>>>
>>> A loop was reported a few months ago with a similar stack trace:
>>> https://lore.kernel.org/netdev/2bc9e2079e864a9290561894d2a602d6@akamai.com/
>>>
>>> But even with this series applied my VM gets stuck. Can you please check
>>> if the fix is incomplete?
>>
>> Good catch! Indeed, seg6_local also needs to be fixed the same way.
>>
>> Back to my first idea: maybe we could directly fix it in lwtunnel_input()
>> and lwtunnel_output() to make our lives easier, but we'd have to be careful
>> to modify all users accordingly. The users I'm 100% sure that are concerned:
>> ioam6 (output), rpl (input/output), seg6 (input/output), seg6_local (input).
>> Other users I'm not totally sure (to be checked): ila (output), bpf (input).
>>
>> Otherwise, we'll need to apply the fix to each user concerned (probably the
>> safest (best?) option right now). Any opinions?
>
> I audited the various lwt users and I agree with your analysis about
> which users seem to be affected by this issue.
>
> I'm not entirely sure how you want to fix this in
> lwtunnel_{input,output}() given that only the input()/output() handlers
> of the individual lwt users are aware of both the old and new dst
> entries.
Right. The idea was to compare "orig_dst" with the new dst before and after
the call to input()/output() in lwtunnel_input()/lwtunnel_output(). That
would, of course, require modifying each of those input/output handlers so
that they no longer call dst_input()/dst_output() or
orig_input()/orig_output() themselves. It would be easier to apply the fix
at that level than to each user one by one.
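
A minimal userspace sketch of that centralised idea, assuming the handlers were changed to return the post-transformation dst instead of calling dst_input() themselves. All names (toy_dst, toy_pkt, toy_lwtunnel_input, and so on) are hypothetical and only model the control flow; this is not the kernel API.

```c
#include <stddef.h>

struct toy_dst;

struct toy_pkt {
	struct toy_dst *dst; /* currently attached dst */
	int transforms;      /* number of lwt transformations applied */
	int delivered;       /* number of times orig_input() ran */
};

struct toy_dst {
	/* Transforms the packet and returns the dst resolved afterwards.
	 * NULL means a plain route without lwtunnel state.
	 */
	struct toy_dst *(*lwt_input)(struct toy_pkt *pkt);
};

/* Stand-in for the original input path of the route. */
void toy_orig_input(struct toy_pkt *pkt)
{
	pkt->delivered++;
}

/* Central dispatcher: the only place that compares old and new dst. */
void toy_lwtunnel_input(struct toy_pkt *pkt)
{
	struct toy_dst *orig = pkt->dst;
	struct toy_dst *new_dst;

	if (!orig->lwt_input) {
		toy_orig_input(pkt);
		return;
	}

	new_dst = orig->lwt_input(pkt);
	if (new_dst == orig) {
		/* Same dst after transformation: do not re-enter. */
		toy_orig_input(pkt);
		return;
	}

	pkt->dst = new_dst;
	toy_lwtunnel_input(pkt); /* stand-in for dst_input() on the new dst */
}

struct toy_dst toy_plain = { .lwt_input = NULL };

/* Handler whose transformation still resolves to the same dst. */
struct toy_dst *toy_encap_same(struct toy_pkt *pkt)
{
	pkt->transforms++;
	return pkt->dst;
}

/* Handler whose transformation resolves to a plain route. */
struct toy_dst *toy_encap_other(struct toy_pkt *pkt)
{
	pkt->transforms++;
	return &toy_plain;
}
```

With the comparison done once in the dispatcher, a handler that resolves to its own dst is applied exactly once and the packet falls through to the original input path. Note that this check alone only catches reentry on the same dst; two different dsts pointing at each other would still recurse in this model.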
* Re: [PATCH net v2 2/3] net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6
2025-02-17 14:40 ` Ido Schimmel
@ 2025-02-25 18:47 ` Justin Iurman
2025-03-06 18:14 ` Justin Iurman
1 sibling, 0 replies; 16+ messages in thread
From: Justin Iurman @ 2025-02-25 18:47 UTC (permalink / raw)
To: Ido Schimmel
Cc: netdev, davem, dsahern, edumazet, kuba, pabeni, horms,
Alexander Aring, David Lebrun
On 2/17/25 15:40, Ido Schimmel wrote:
> On Sun, Feb 16, 2025 at 06:31:06PM +0200, Ido Schimmel wrote:
>> On Thu, Feb 13, 2025 at 11:51:49PM +0100, Justin Iurman wrote:
>>> On 2/13/25 14:28, Ido Schimmel wrote:
>>>> On Tue, Feb 11, 2025 at 11:16:23PM +0100, Justin Iurman wrote:
>>>>> When the destination is the same post-transformation, we enter a
>>>>> lwtunnel loop. This is true for ioam6_iptunnel, rpl_iptunnel, and
>>>>> seg6_iptunnel, in both input() and output() handlers respectively, where
>>>>> either dst_input() or dst_output() is called at the end. It happens for
>>>>> instance with the ioam6 inline mode, but can also happen for any of them
>>>>> as long as the post-transformation destination still matches the fib
>>>>> entry. Note that ioam6_iptunnel was already comparing the old and new
>>>>> destination address to prevent the loop, but it is not enough (e.g.,
>>>>> other addresses can still match the same subnet).
>>>>>
>>>>> Here is an example for rpl_input():
>>>>>
>>>>> dump_stack_lvl+0x60/0x80
>>>>> rpl_input+0x9d/0x320
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> [...]
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> ip6_sublist_rcv_finish+0x85/0x90
>>>>> ip6_sublist_rcv+0x236/0x2f0
>>>>>
>>>>> ... until rpl_do_srh() fails, which means skb_cow_head() failed.
>>>>>
>>>>> This patch prevents that kind of loop by redirecting to the origin
>>>>> input() or output() when the destination is the same
>>>>> post-transformation.
>>>>
>>>> A loop was reported a few months ago with a similar stack trace:
>>>> https://lore.kernel.org/netdev/2bc9e2079e864a9290561894d2a602d6@akamai.com/
>>>>
>>>> But even with this series applied my VM gets stuck. Can you please check
>>>> if the fix is incomplete?
>>>
>>> Good catch! Indeed, seg6_local also needs to be fixed the same way.
>>>
>>> Back to my first idea: maybe we could directly fix it in lwtunnel_input()
>>> and lwtunnel_output() to make our lives easier, but we'd have to be careful
>>> to modify all users accordingly. The users I'm 100% sure that are concerned:
>>> ioam6 (output), rpl (input/output), seg6 (input/output), seg6_local (input).
>>> Other users I'm not totally sure (to be checked): ila (output), bpf (input).
>>>
>>> Otherwise, we'll need to apply the fix to each user concerned (probably the
>>> safest (best?) option right now). Any opinions?
>>
>> I audited the various lwt users and I agree with your analysis about
>> which users seem to be affected by this issue.
>>
>> I'm not entirely sure how you want to fix this in
>> lwtunnel_{input,output}() given that only the input()/output() handlers
>> of the individual lwt users are aware of both the old and new dst
>> entries.
>>
>> BTW, I noticed that bpf implements the xmit() hook in addition to
>> input()/output(). I wonder if a loop is possible in the following case:
>>
>> ip_finish_output2() <----+
>> lwtunnel_xmit() |
>> bpf_xmit() |
>> // bpf program does not change |
>> // the packet and returns |
>> // BPF_LWT_REROUTE |
>> bpf_lwt_xmit_reroute() |
>> // unmodified packet resolves |
>> // the same dst entry |
>> dst_output() |
>> ip_output() -------------+
>
> FWIW, verified that this is indeed the case. Reproducer:
>
> $ cat lwt_xmit_repo.bpf.c
> // SPDX-License-Identifier: GPL-2.0
> #include <linux/bpf.h>
> #include <bpf/bpf_helpers.h>
>
> SEC("lwt_xmit")
> int repo(struct __sk_buff *skb)
> {
> return BPF_LWT_REROUTE;
> }
> $ clang -O2 -target bpf -c lwt_xmit_repo.bpf.c -o lwt_xmit_repo.o
> # ip link add name dummy1 up type dummy
> # ip route add 192.0.2.0/24 nexthop encap bpf xmit obj ./lwt_xmit_repo.o sec lwt_xmit dev dummy1
> # ping 192.0.2.1
Thanks, Ido, much appreciated. I'll post a new series based on this patch
(#2) to take all users into account. Note that the logic I described
previously to solve the issue could be applied to lwtunnel_xmit() too. I
wonder what others think about it.
* Re: [PATCH net v2 2/3] net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6
2025-02-17 14:40 ` Ido Schimmel
2025-02-25 18:47 ` Justin Iurman
@ 2025-03-06 18:14 ` Justin Iurman
1 sibling, 0 replies; 16+ messages in thread
From: Justin Iurman @ 2025-03-06 18:14 UTC (permalink / raw)
To: Ido Schimmel
Cc: netdev, davem, dsahern, edumazet, kuba, pabeni, horms,
Alexander Aring, David Lebrun
On 2/17/25 15:40, Ido Schimmel wrote:
> On Sun, Feb 16, 2025 at 06:31:06PM +0200, Ido Schimmel wrote:
>> On Thu, Feb 13, 2025 at 11:51:49PM +0100, Justin Iurman wrote:
>>> On 2/13/25 14:28, Ido Schimmel wrote:
>>>> On Tue, Feb 11, 2025 at 11:16:23PM +0100, Justin Iurman wrote:
>>>>> When the destination is the same post-transformation, we enter a
>>>>> lwtunnel loop. This is true for ioam6_iptunnel, rpl_iptunnel, and
>>>>> seg6_iptunnel, in both input() and output() handlers respectively, where
>>>>> either dst_input() or dst_output() is called at the end. It happens for
>>>>> instance with the ioam6 inline mode, but can also happen for any of them
>>>>> as long as the post-transformation destination still matches the fib
>>>>> entry. Note that ioam6_iptunnel was already comparing the old and new
>>>>> destination address to prevent the loop, but it is not enough (e.g.,
>>>>> other addresses can still match the same subnet).
>>>>>
>>>>> Here is an example for rpl_input():
>>>>>
>>>>> dump_stack_lvl+0x60/0x80
>>>>> rpl_input+0x9d/0x320
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> [...]
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> ip6_sublist_rcv_finish+0x85/0x90
>>>>> ip6_sublist_rcv+0x236/0x2f0
>>>>>
>>>>> ... until rpl_do_srh() fails, which means skb_cow_head() failed.
>>>>>
>>>>> This patch prevents that kind of loop by redirecting to the origin
>>>>> input() or output() when the destination is the same
>>>>> post-transformation.
>>>>
>>>> A loop was reported a few months ago with a similar stack trace:
>>>> https://lore.kernel.org/netdev/2bc9e2079e864a9290561894d2a602d6@akamai.com/
Ido,
That loop is a different beast and is out of scope for the series I'm about
to send. What I'm doing right now is preventing reentry loops within
lwtunnel_{input|output}(), a fix that, by the way, is also applied to
seg6_local. The loop reported above is an infinite ping-pong game between
two fib rules (as opposed to an infinite loop within the same fib rule,
which is what I'm fixing). If we want to fix that issue as well, we may
reuse something like dev_xmit_recursion() in
lwtunnel_{input|output|xmit}() to catch these buggy cases. Thoughts?
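
To make the recursion-guard idea concrete, here is a small userspace model of a depth counter that stops a ping-pong between two routes. It is a sketch only: TOY_RECURSION_LIMIT, toy_route and toy_guarded_input() are hypothetical names and values, and a plain global stands in for what would be per-CPU state in the kernel.

```c
#include <stddef.h>

#define TOY_RECURSION_LIMIT 8 /* hypothetical limit for this model */

/* A route whose transformation makes the packet resolve to 'next'.
 * NULL means the packet leaves the lwtunnel path and is delivered.
 */
struct toy_route {
	struct toy_route *next;
};

int toy_depth;          /* current nesting level (per-CPU in a real kernel) */
int toy_max_depth_seen; /* bookkeeping for the demonstration only */

/* Returns 0 when the packet is delivered, -1 when it is dropped. */
int toy_guarded_input(struct toy_route *rt)
{
	int ret;

	if (!rt)
		return 0;

	if (toy_depth >= TOY_RECURSION_LIMIT)
		return -1; /* too deep: drop instead of recursing forever */

	toy_depth++;
	if (toy_depth > toy_max_depth_seen)
		toy_max_depth_seen = toy_depth;

	ret = toy_guarded_input(rt->next);

	toy_depth--;
	return ret;
}
```

Two routes pointing at each other are cut off once the limit is reached and the counter unwinds back to zero, while a finite chain of routes is still delivered normally.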
>> [...]
>>
>> BTW, I noticed that bpf implements the xmit() hook in addition to
>> input()/output(). I wonder if a loop is possible in the following case:
>>
>> ip_finish_output2() <----+
>> lwtunnel_xmit() |
>> bpf_xmit() |
>> // bpf program does not change |
>> // the packet and returns |
>> // BPF_LWT_REROUTE |
>> bpf_lwt_xmit_reroute() |
>> // unmodified packet resolves |
>> // the same dst entry |
>> dst_output() |
>> ip_output() -------------+
>
> FWIW, verified that this is indeed the case. Reproducer:
>
> $ cat lwt_xmit_repo.bpf.c
> // SPDX-License-Identifier: GPL-2.0
> #include <linux/bpf.h>
> #include <bpf/bpf_helpers.h>
>
> SEC("lwt_xmit")
> int repo(struct __sk_buff *skb)
> {
> return BPF_LWT_REROUTE;
> }
> $ clang -O2 -target bpf -c lwt_xmit_repo.bpf.c -o lwt_xmit_repo.o
> # ip link add name dummy1 up type dummy
> # ip route add 192.0.2.0/24 nexthop encap bpf xmit obj ./lwt_xmit_repo.o sec lwt_xmit dev dummy1
> # ping 192.0.2.1
This one's also something special because it's neither input nor output,
it's xmit. In that case, we cannot apply the same fix as for the others
(ioam6, rpl, seg6, ila). Here, what I suggest is simply to disallow
BPF_LWT_REROUTE when the dst_entry remains unchanged (which is, IMO, a
buggy case), as follows:
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index ae74634310a3..ee3546d78903 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -180,6 +180,7 @@ static int bpf_lwt_xmit_reroute(struct sk_buff *skb)
struct net_device *l3mdev =
l3mdev_master_dev_rcu(skb_dst(skb)->dev);
int oif = l3mdev ? l3mdev->ifindex : 0;
struct dst_entry *dst = NULL;
+ struct dst_entry *orig_dst;
int err = -EAFNOSUPPORT;
struct sock *sk;
struct net *net;
@@ -201,6 +202,8 @@ static int bpf_lwt_xmit_reroute(struct sk_buff *skb)
net = dev_net(skb_dst(skb)->dev);
}
+ orig_dst = skb_dst(skb);
+
if (ipv4) {
struct iphdr *iph = ip_hdr(skb);
struct flowi4 fl4 = {};
@@ -254,6 +257,16 @@ static int bpf_lwt_xmit_reroute(struct sk_buff *skb)
if (unlikely(err))
goto err;
+ /* avoid lwtunnel_xmit() reentry loop when destination is the same
+ * after transformation (i.e., disallow BPF_LWT_REROUTE when
+ * dst_entry remains the same).
+ */
+ if (orig_dst->lwtstate == dst->lwtstate) {
+ dst_release(dst);
+ err = -EINVAL;
+ goto err;
+ }
+
skb_dst_drop(skb);
skb_dst_set(skb, dst);
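
The effect of the check can be modelled in userspace as follows. This is a sketch under stated assumptions: the toy_* types and toy_xmit_reroute() are hypothetical, and the looked_up argument stands in for the result of the route lookup that bpf_lwt_xmit_reroute() performs.

```c
#include <errno.h>
#include <stddef.h>

struct toy_lwtstate {
	int id;
};

struct toy_dst {
	struct toy_lwtstate *lwtstate;
};

struct toy_skb {
	struct toy_dst *dst;
};

/* Reroute the packet to 'looked_up' unless that would re-enter the same
 * lwtunnel state, in which case the reroute is refused.
 */
int toy_xmit_reroute(struct toy_skb *skb, struct toy_dst *looked_up)
{
	struct toy_dst *orig_dst = skb->dst;

	if (orig_dst->lwtstate == looked_up->lwtstate)
		return -EINVAL; /* same lwt state: would loop */

	skb->dst = looked_up;
	return 0;
}
```

In this model a lookup that resolves to the same lwt state is rejected and the packet keeps its original dst, while a lookup that resolves to a different state replaces the dst as before.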
Thread overview: 16+ messages
2025-02-11 22:16 [PATCH net v2 0/3] several fixes for ioam6, rpl and seg6 lwtunnels Justin Iurman
2025-02-11 22:16 ` [PATCH net v2 1/3] net: ipv6: fix dst ref loops on input in " Justin Iurman
2025-02-13 12:27 ` Ido Schimmel
2025-02-13 22:37 ` Justin Iurman
2025-02-11 22:16 ` [PATCH net v2 2/3] net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6 Justin Iurman
2025-02-12 20:42 ` Justin Iurman
2025-02-13 13:28 ` Ido Schimmel
2025-02-13 22:51 ` Justin Iurman
2025-02-16 16:31 ` Ido Schimmel
2025-02-17 14:40 ` Ido Schimmel
2025-02-25 18:47 ` Justin Iurman
2025-03-06 18:14 ` Justin Iurman
2025-02-25 18:36 ` Justin Iurman
2025-02-11 22:16 ` [PATCH net v2 3/3] net: ipv6: fix consecutive input and output transformation in lwtunnels Justin Iurman
2025-02-13 14:33 ` Paolo Abeni
2025-02-13 22:57 ` Justin Iurman