From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
To: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David Miller <davem@davemloft.net>,
wenxu@ucloud.cn, kuznet@ms2.inr.ac.ru, jmorris@namei.org,
kaber@trash.net, yoshfuji@linux-ipv6.org, netdev@vger.kernel.org,
wenx05124561@163.com
Subject: Re: [PATCH] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for gre tunneled skbs
Date: Fri, 19 Aug 2016 16:40:07 +0300
Message-ID: <20160819164007.527fe984@halley>
In-Reply-To: <d9fcc388-4722-02c6-e3dc-b68893c6cc15@stressinduktion.org>
Hi,
On Fri, 19 Aug 2016 11:20:40 +0200 Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> >> Maybe we can change our criteria in the following manner:
> >>
> >> - if (skb_iif && proto == IPPROTO_UDP) {
> >> + if (skb_iif && !(df & htons(IP_DF))) {
> >> IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
> >>
> >> This way, any tunnel explicitly providing DF is NOT allowed to
> >> further fragment the resulting segments (leading to tx segments being
> >> dropped).
> >> Others tunnels, that do not care (e.g. vxlan and geneve, and probably
> >> ovs vport-gre, or other ovs encap vports, in df_default=false mode),
> >> will behave same for gso and non-gso.
> >>
>
> I am really not sure...
>
> Probably we have no other choice.
Diving further into this, it seems the !IP_DF approach is more correct
than the IPPROTO_UDP approach (WRT packets/segments arriving from another
interface that exceed the egress mtu):
vxlan/geneve:
  Both set df to zero.
  The !IP_DF approach acts the same as the IPPROTO_UDP approach.
vxlan/geneve in collect_md (e.g. OvS):
  They set df according to tun_flags & TUNNEL_DONT_FRAGMENT.
  IPPROTO_UDP approach:
    IPSKB_FRAG_SEGS gets set unconditionally.
    In case TUNNEL_DONT_FRAGMENT is requested, non-gso get dropped
    (counted as IPSTATS_MIB_FRAGFAILS), whereas gso gets
    segmented+fragmented (!)
  !IP_DF approach:
    Aligned, both non-gso and gso get dropped for TUNNEL_DONT_FRAGMENT.
ip_gre in collect_md (e.g. OvS):
  Sets df according to tun_flags & TUNNEL_DONT_FRAGMENT.
  IPPROTO_UDP approach:
    IPSKB_FRAG_SEGS is never set.
    Therefore in the case where df is NOT set, non-gso are fragged and
    passed, whereas gso gets dropped (!)
  !IP_DF approach:
    Non-gso vs gso aligned.
ip_gre in nopmtudisc:
  Will pass tnl_update_pmtu checks; then, df is inherited from inner_iph
  (or stays unset if IFLA_GRE_IGNORE_DF is specified).
  IPPROTO_UDP approach:
    IPSKB_FRAG_SEGS is never set.
    Therefore in the case where df is NOT set, non-gso are fragged and
    passed, whereas gso gets dropped (!)
  !IP_DF approach:
    Aligned.
ip_gre in fou/gue mode in nopmtudisc:
  Assuming they pass tnl_update_pmtu checks; then, df is inherited from
  inner_iph (or stays unset if IFLA_GRE_IGNORE_DF is specified).
  IPPROTO_UDP approach:
    IPSKB_FRAG_SEGS always gets set (since proto==IPPROTO_UDP).
    In the case df is set, non-gso are dropped (counted as
    IPSTATS_MIB_FRAGFAILS), whereas gso gets segmented+fragmented (!)
  !IP_DF approach:
    Aligned.
ip_gre in pmtudisc:
  Sets df to IP_DF.
  Non-gso will fail tnl_update_pmtu checks (gso should pass).
  IPPROTO_UDP approach:
    IPSKB_FRAG_SEGS is never set. This leads to the gso skbs eventually
    being dropped. Okay.
  !IP_DF approach:
    IPSKB_FRAG_SEGS is not set, since IP_DF is set.
    This leads to the gso skbs eventually being dropped. Okay.
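To make "aligned" concrete, here is a minimal userspace sketch of the
outcomes described in the cases above, for a packet (or its gso segments)
exceeding the egress mtu. It is a simplified model of the behaviour as
analysed here, not kernel code; the names outcome, non_gso_outcome,
gso_outcome and aligned are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Simplified model: what happens to an oversized forwarded packet. */
enum outcome { OUTCOME_DROPPED, OUTCOME_FRAGMENTED };

/*
 * Non-gso skb larger than the egress mtu: fragmented only if df is
 * clear; with df set it is dropped (counted as IPSTATS_MIB_FRAGFAILS).
 */
static inline enum outcome non_gso_outcome(bool df_set)
{
	return df_set ? OUTCOME_DROPPED : OUTCOME_FRAGMENTED;
}

/*
 * Gso skb whose segments exceed the egress mtu: segmented and then
 * fragmented only if IPSKB_FRAG_SEGS was set; otherwise dropped.
 */
static inline enum outcome gso_outcome(bool frag_segs)
{
	return frag_segs ? OUTCOME_FRAGMENTED : OUTCOME_DROPPED;
}

/* "Aligned" means gso and non-gso traffic meet the same fate. */
static inline bool aligned(bool df_set, bool frag_segs)
{
	return non_gso_outcome(df_set) == gso_outcome(frag_segs);
}
```

In this model the two paths agree only when IPSKB_FRAG_SEGS is set
exactly when df is clear, which is the condition the cases marked (!)
above violate.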
(I'd truly appreciate it if you could review my analysis above)
Therefore using !(df & htons(IP_DF)) actually fixes some oversights of
our former proto==IPPROTO_UDP approach.
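For reference, the two criteria can be written as standalone predicates.
This is only a userspace sketch of the conditions quoted at the top of
this mail, not the actual patch: the function names frag_segs_by_proto
and frag_segs_by_df are made up for illustration, IP_DF is defined
locally as 0x4000 if the headers do not provide it, and df is assumed to
be the 16-bit frag_off value in network byte order.

```c
#include <arpa/inet.h>  /* htons */
#include <netinet/in.h> /* IPPROTO_UDP */
#include <stdbool.h>
#include <stdint.h>

#ifndef IP_DF
#define IP_DF 0x4000 /* "don't fragment" bit, host byte order */
#endif

/* Former criterion: forwarded skb, and the outer protocol is UDP. */
static inline bool frag_segs_by_proto(int skb_iif, uint8_t proto)
{
	return skb_iif && proto == IPPROTO_UDP;
}

/*
 * Proposed criterion: forwarded skb, and the tunnel did not request DF.
 * df is in network byte order, hence the htons() on the constant.
 */
static inline bool frag_segs_by_df(int skb_iif, uint16_t df)
{
	return skb_iif && !(df & htons(IP_DF));
}
```

With the second predicate, a GRE tunnel that leaves df clear would
qualify for IPSKB_FRAG_SEGS, and a UDP tunnel that sets df would not,
which are the two directions in which the former check misbehaved.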
I'll send a patch.
Thanks
Shmulik