* IPCB isn't initialized when MPLS route is pointing to a VRF
@ 2021-11-22 16:05 Stephen Suryaputra
2021-11-22 16:07 ` Stephen Suryaputra
2021-11-28 22:48 ` David Ahern
0 siblings, 2 replies; 3+ messages in thread
From: Stephen Suryaputra @ 2021-11-22 16:05 UTC (permalink / raw)
To: netdev, dsahern, ebiederm, roopa; +Cc: fw
Hi,
We ran into a problem because IPCB isn't being initialized when MPLS is
egressing into a VRF. Reproducer script and its teardown are attached,
but they are only to illustrate our setup rather than seeing the problem
as it depends on what is in the skb->cb[].
We found this when h0 is sending an ip packet with DF=1 to 10.1.4.2 and
on ler1 the code path is as follows: mpls_forward() calls mpls_egress()
and then calls neigh_xmit(), which ends up calling __dev_queue_xmit()
and vrf_xmit() through dev_hard_start_xmit(). vrf_xmit() eventually will
call ip_output() and __ip_finish_output() that has the check for
IPCB(skb)->frag_max_size. The conditional happens to be true for us due
to IPCB isn't being initialized even though the packet size is small.
The packet then is dropped in ip_fragment().
The change in this diff is a way to fix:
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index ffeb2df8be7a..1d0a0151e175 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -310,6 +310,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt,
htons(hdr4->ttl << 8),
htons(new_ttl << 8));
hdr4->ttl = new_ttl;
+ memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
success = true;
break;
}
@@ -327,6 +328,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt,
hdr6->hop_limit = dec.ttl;
else if (hdr6->hop_limit)
hdr6->hop_limit = hdr6->hop_limit - 1;
+ memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
success = true;
break;
}
Here are my questions. Is this the best place to initialize the IPCB?
Would it be better done in vrf.c? I can work on the formal patch once we
agree on where the final fix should be
Cc Florian Westphal since we found the problem after upgrading to kernel
version that has his commit bb4cc1a18856 ("net: ip: always refragment ip
defragmented packets"). But I think the bug is there without his commit
as well if (IPCB(skb)->flags & IPCB_FRAG_PMTU) happens to evaluate to
true.
Thanks,
Stephen.
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: IPCB isn't initialized when MPLS route is pointing to a VRF
2021-11-22 16:05 IPCB isn't initialized when MPLS route is pointing to a VRF Stephen Suryaputra
@ 2021-11-22 16:07 ` Stephen Suryaputra
2021-11-28 22:48 ` David Ahern
1 sibling, 0 replies; 3+ messages in thread
From: Stephen Suryaputra @ 2021-11-22 16:07 UTC (permalink / raw)
To: netdev, dsahern, ebiederm, roopa
[-- Attachment #1: Type: text/plain, Size: 2211 bytes --]
With the attachments this time.
On Mon, Nov 22, 2021 at 11:05:46AM -0500, Stephen Suryaputra wrote:
> Hi,
>
> We ran into a problem because IPCB isn't being initialized when MPLS is
> egressing into a VRF. Reproducer script and its teardown are attached,
> but they are only to illustrate our setup rather than seeing the problem
> as it depends on what is in the skb->cb[].
>
> We found this when h0 is sending an ip packet with DF=1 to 10.1.4.2 and
> on ler1 the code path is as follows: mpls_forward() calls mpls_egress()
> and then calls neigh_xmit(), which ends up calling __dev_queue_xmit()
> and vrf_xmit() through dev_hard_start_xmit(). vrf_xmit() eventually will
> call ip_output() and __ip_finish_output() that has the check for
> IPCB(skb)->frag_max_size. The conditional happens to be true for us due
> to IPCB isn't being initialized even though the packet size is small.
> The packet then is dropped in ip_fragment().
>
> The change in this diff is a way to fix:
>
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index ffeb2df8be7a..1d0a0151e175 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -310,6 +310,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt,
> htons(hdr4->ttl << 8),
> htons(new_ttl << 8));
> hdr4->ttl = new_ttl;
> + memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
> success = true;
> break;
> }
> @@ -327,6 +328,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt,
> hdr6->hop_limit = dec.ttl;
> else if (hdr6->hop_limit)
> hdr6->hop_limit = hdr6->hop_limit - 1;
> + memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
> success = true;
> break;
> }
>
> Here are my questions. Is this the best place to initialize the IPCB?
> Would it be better done in vrf.c? I can work on the formal patch once we
> agree on where the final fix should be
>
> Cc Florian Westphal since we found the problem after upgrading to kernel
> version that has his commit bb4cc1a18856 ("net: ip: always refragment ip
> defragmented packets"). But I think the bug is there without his commit
> as well if (IPCB(skb)->flags & IPCB_FRAG_PMTU) happens to evaluate to
> true.
>
> Thanks,
>
> Stephen.
[-- Attachment #2: mpls_test.sh --]
[-- Type: application/x-sh, Size: 3262 bytes --]
[-- Attachment #3: mpls_teardown.sh --]
[-- Type: application/x-sh, Size: 375 bytes --]
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: IPCB isn't initialized when MPLS route is pointing to a VRF
2021-11-22 16:05 IPCB isn't initialized when MPLS route is pointing to a VRF Stephen Suryaputra
2021-11-22 16:07 ` Stephen Suryaputra
@ 2021-11-28 22:48 ` David Ahern
1 sibling, 0 replies; 3+ messages in thread
From: David Ahern @ 2021-11-28 22:48 UTC (permalink / raw)
To: Stephen Suryaputra, netdev, ebiederm, roopa; +Cc: fw
On 11/22/21 9:05 AM, Stephen Suryaputra wrote:
> Hi,
>
> We ran into a problem because IPCB isn't being initialized when MPLS is
> egressing into a VRF. Reproducer script and its teardown are attached,
> but they are only to illustrate our setup rather than seeing the problem
> as it depends on what is in the skb->cb[].
>
> We found this when h0 is sending an ip packet with DF=1 to 10.1.4.2 and
> on ler1 the code path is as follows: mpls_forward() calls mpls_egress()
> and then calls neigh_xmit(), which ends up calling __dev_queue_xmit()
> and vrf_xmit() through dev_hard_start_xmit(). vrf_xmit() eventually will
> call ip_output() and __ip_finish_output() that has the check for
> IPCB(skb)->frag_max_size. The conditional happens to be true for us due
> to IPCB isn't being initialized even though the packet size is small.
> The packet then is dropped in ip_fragment().
>
> The change in this diff is a way to fix:
>
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index ffeb2df8be7a..1d0a0151e175 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -310,6 +310,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt,
> htons(hdr4->ttl << 8),
> htons(new_ttl << 8));
> hdr4->ttl = new_ttl;
> + memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
> success = true;
> break;
> }
> @@ -327,6 +328,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt,
> hdr6->hop_limit = dec.ttl;
> else if (hdr6->hop_limit)
> hdr6->hop_limit = hdr6->hop_limit - 1;
> + memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
> success = true;
> break;
> }
>
> Here are my questions. Is this the best place to initialize the IPCB?
> Would it be better done in vrf.c? I can work on the formal patch once we
> agree on where the final fix should be
vrf.c seems like the right place since it does the direct recirculation
(vs loopback which puts in back in the Rx queue).
>
> Cc Florian Westphal since we found the problem after upgrading to kernel
> version that has his commit bb4cc1a18856 ("net: ip: always refragment ip
> defragmented packets"). But I think the bug is there without his commit
> as well if (IPCB(skb)->flags & IPCB_FRAG_PMTU) happens to evaluate to
> true.
>
> Thanks,
>
> Stephen.
>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-11-28 22:51 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-11-22 16:05 IPCB isn't initialized when MPLS route is pointing to a VRF Stephen Suryaputra
2021-11-22 16:07 ` Stephen Suryaputra
2021-11-28 22:48 ` David Ahern
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).