* [IPV4] Check mtu instead of frag_list in ip_push_pending_frames
@ 2005-03-21 10:56 Herbert Xu
2005-03-21 11:16 ` [IPV4] Clear DF bit in ip_fragment fast path Herbert Xu
2005-03-23 20:20 ` [IPV4] Check mtu instead of frag_list in ip_push_pending_frames David S. Miller
0 siblings, 2 replies; 4+ messages in thread
From: Herbert Xu @ 2005-03-21 10:56 UTC (permalink / raw)
To: David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 1955 bytes --]
Hi Dave:
I still didn't like the fact that ip_append_data was the only user
of dst_pmtu :) So I went looking for bugs in the surrounding code.
I managed to find something in ip_push_pending_frames.
When dst_mtu < dst_pmtu - IPsec overhead (which can be caused by PMTU
discovery within an IPsec tunnel), and we transmit a packet that's
longer than dst_mtu but shorter than dst_pmtu - IPsec overhead, then
the DF bit will be incorrectly set in the inner IP header.
This will cause the packet to be dropped when it hits the router that
generated the original PMTU event. Unfortunately the ICMP packet coming
back doesn't tell us anything new so the next time we send a packet we
will do exactly the same thing.
The fix is similar to what we did in ip_output. Instead of checking
whether frag_list is empty, we check the condition skb->len <= dst_mtu
directly and set the DF bit based on that.
We can enumerate all the possibilities to see that this is correct.
If skb->len <= dst_mtu and frag_list is empty then this does the
samething as before and is obviously correct.
If skb->len <= dst_mtu and frag_list is non-empty then it implies
that dst_pmtu has increased since the fragments were constructed
as dst_pmtu = dst_mtu + IPsec overhead. So the skb will now fit
within a single fragment which means that setting DF is correct.
The fragments will be merged by skb_linearise in dev_queue_xmit.
If skb->len > dst_mtu and frag_list is non-empty then again this
maintains the status quo.
If skb->len > dst_mtu and frag_list is empty then we will leave the
DF bit clear as the packet will need to be fragmented between the
remote IPsec gateway and the final destination.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p --]
[-- Type: text/plain, Size: 530 bytes --]
===== net/ipv4/ip_output.c 1.80 vs edited =====
--- 1.80/net/ipv4/ip_output.c 2005-03-19 05:43:26 +11:00
+++ edited/net/ipv4/ip_output.c 2005-03-21 21:36:51 +11:00
@@ -1152,7 +1152,8 @@
* If local_df is set too, we still allow to fragment this frame
* locally. */
if (inet->pmtudisc == IP_PMTUDISC_DO ||
- (!skb_shinfo(skb)->frag_list && ip_dont_fragment(sk, &rt->u.dst)))
+ (skb->len <= dst_mtu(&rt->u.dst) &&
+ ip_dont_fragment(sk, &rt->u.dst)))
df = htons(IP_DF);
if (inet->cork.flags & IPCORK_OPT)
^ permalink raw reply [flat|nested] 4+ messages in thread* [IPV4] Clear DF bit in ip_fragment fast path
2005-03-21 10:56 [IPV4] Check mtu instead of frag_list in ip_push_pending_frames Herbert Xu
@ 2005-03-21 11:16 ` Herbert Xu
2005-03-23 20:20 ` David S. Miller
2005-03-23 20:20 ` [IPV4] Check mtu instead of frag_list in ip_push_pending_frames David S. Miller
1 sibling, 1 reply; 4+ messages in thread
From: Herbert Xu @ 2005-03-21 11:16 UTC (permalink / raw)
To: David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 652 bytes --]
Hi Dave:
It is possible for ip_fragment() to send out head fragments with
both DF and MF set for packets with local_df set to true. This is
because the fast path tries to only modify the MF bit of the head
fragment.
Since the offset is always zero for the head fragment, and we know
that DF should be cleared in case of local_df, we can change |= to
a straight assignment.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p --]
[-- Type: text/plain, Size: 400 bytes --]
===== net/ipv4/ip_output.c 1.80 vs edited =====
--- 1.80/net/ipv4/ip_output.c 2005-03-19 05:43:26 +11:00
+++ edited/net/ipv4/ip_output.c 2005-03-21 22:09:45 +11:00
@@ -498,7 +498,7 @@
skb->data_len = first_len - skb_headlen(skb);
skb->len = first_len;
iph->tot_len = htons(first_len);
- iph->frag_off |= htons(IP_MF);
+ iph->frag_off = htons(IP_MF);
ip_send_check(iph);
for (;;) {
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [IPV4] Clear DF bit in ip_fragment fast path
2005-03-21 11:16 ` [IPV4] Clear DF bit in ip_fragment fast path Herbert Xu
@ 2005-03-23 20:20 ` David S. Miller
0 siblings, 0 replies; 4+ messages in thread
From: David S. Miller @ 2005-03-23 20:20 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev
On Mon, 21 Mar 2005 22:16:37 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:
> It is possible for ip_fragment() to send out head fragments with
> both DF and MF set for packets with local_df set to true. This is
> because the fast path tries to only modify the MF bit of the head
> fragment.
>
> Since the offset is always zero for the head fragment, and we know
> that DF should be cleared in case of local_df, we can change |= to
> a straight assignment.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Also applied, thanks Herbert.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [IPV4] Check mtu instead of frag_list in ip_push_pending_frames
2005-03-21 10:56 [IPV4] Check mtu instead of frag_list in ip_push_pending_frames Herbert Xu
2005-03-21 11:16 ` [IPV4] Clear DF bit in ip_fragment fast path Herbert Xu
@ 2005-03-23 20:20 ` David S. Miller
1 sibling, 0 replies; 4+ messages in thread
From: David S. Miller @ 2005-03-23 20:20 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev
On Mon, 21 Mar 2005 21:56:22 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:
> The fix is similar to what we did in ip_output. Instead of checking
> whether frag_list is empty, we check the condition skb->len <= dst_mtu
> directly and set the DF bit based on that.
Looks good, applied.
Thanks Herbert.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2005-03-23 20:20 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-03-21 10:56 [IPV4] Check mtu instead of frag_list in ip_push_pending_frames Herbert Xu
2005-03-21 11:16 ` [IPV4] Clear DF bit in ip_fragment fast path Herbert Xu
2005-03-23 20:20 ` David S. Miller
2005-03-23 20:20 ` [IPV4] Check mtu instead of frag_list in ip_push_pending_frames David S. Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).