From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
To: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David Miller <davem@davemloft.net>,
	wenxu@ucloud.cn, kuznet@ms2.inr.ac.ru, jmorris@namei.org,
	kaber@trash.net, yoshfuji@linux-ipv6.org, netdev@vger.kernel.org,
	wenx05124561@163.com
Subject: Re: [PATCH] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for gre tunneled skbs
Date: Fri, 19 Aug 2016 16:40:07 +0300
Message-ID: <20160819164007.527fe984@halley>
In-Reply-To: <d9fcc388-4722-02c6-e3dc-b68893c6cc15@stressinduktion.org>

Hi,

On Fri, 19 Aug 2016 11:20:40 +0200 Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> >> Maybe we can change our criteria in the following manner:
> >>  
> >> -	if (skb_iif && proto == IPPROTO_UDP) {
> >> +	if (skb_iif && !(df & htons(IP_DF))) {
> >> 		IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
> >>
> >> This way, any tunnel explicitly providing DF is NOT allowed to
> >> further fragment the resulting segments (leading to tx segments being
> >> dropped).
> >> Others tunnels, that do not care (e.g. vxlan and geneve, and probably
> >> ovs vport-gre, or other ovs encap vports, in df_default=false mode),
> >> will behave same for gso and non-gso.
> >>
> 
> I am really not sure...
> 
> Probably we have no other choice.

Diving further into this, it seems the !IP_DF approach is more correct
than the IPPROTO_UDP approach (WRT packets/segments arriving from another
interface that exceed the egress mtu):

vxlan/geneve:
  Both set df to zero.
  !IP_DF approach acts same as IPPROTO_UDP approach

vxlan/geneve in collect_md (e.g. OvS):
  They set df according to tun_flags & TUNNEL_DONT_FRAGMENT.
  IPPROTO_UDP approach:
    IPSKB_FRAG_SEGS gets set unconditionally.
    In case TUNNEL_DONT_FRAGMENT is requested, non-gso gets dropped
    (counted as IPSTATS_MIB_FRAGFAILS), whereas gso gets
    segmented+fragmented (!)
  !IP_DF approach:
    Aligned, both non-gso and gso get dropped for TUNNEL_DONT_FRAGMENT.

ip_gre in collect_md (e.g. OvS):
  Sets df according to tun_flags & TUNNEL_DONT_FRAGMENT.
  IPPROTO_UDP approach:
    IPSKB_FRAG_SEGS is never set.
    Therefore in the case where df is NOT set, non-gso are fragged and
    passed, whereas gso gets dropped (!)
  !IP_DF approach:
    Non-gso vs gso aligned.

ip_gre in nopmtudisc:
  Will pass tnl_update_pmtu checks; then, df is inherited from inner_iph
  (or stays unset if IFLA_GRE_IGNORE_DF is specified).
  IPPROTO_UDP approach:
    IPSKB_FRAG_SEGS never set.
    Therefore in the case where df is NOT set, non-gso are fragged and
    passed, whereas gso gets dropped (!)
  !IP_DF approach:
    Aligned.
    
ip_gre in fou/gue mode in nopmtudisc:
  Assuming they pass tnl_update_pmtu checks; then, df is inherited from
  inner_iph (or stays unset if IFLA_GRE_IGNORE_DF is specified).
  IPPROTO_UDP approach:
    IPSKB_FRAG_SEGS always gets set (since proto==IPPROTO_UDP).
    In the case df is set, non-gso gets dropped (counted as
    IPSTATS_MIB_FRAGFAILS), whereas gso gets segmented+fragmented (!)
  !IP_DF approach: 
    Aligned.

ip_gre in pmtudisc:
  Sets df to IP_DF.
  Non-gso will fail tnl_update_pmtu checks (gso should pass).
  IPPROTO_UDP approach:
    IPSKB_FRAG_SEGS never set. This leads the gso skbs to be eventually
    dropped. okay.
  !IP_DF approach:
    IPSKB_FRAG_SEGS not set, since IP_DF is true.
    This leads the gso skbs to be eventually dropped. okay.

(I would truly appreciate it if you could review my analysis above)

Therefore using !(df & htons(IP_DF)) actually fixes some oversights of
our former proto==IPPROTO_UDP approach.
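
To make the comparison above easier to check, here is a rough userspace
sketch (not kernel code, and not the patch itself) that models the two
criteria for an oversized packet. All names in it (enum outer_proto,
frag_segs_by_proto, frag_segs_by_df, nongso_passes, gso_passes,
aligned_by_*, check_cases) are made up for illustration. "forwarded"
stands for a non-zero skb_iif. It assumes the simplified behaviour
described in the cases above: non-gso passes only when df is clear, and
gso passes only when IPSKB_FRAG_SEGS gets set.

```c
#include <assert.h>
#include <stdbool.h>

/* Outer protocol of the encapsulated skb (hypothetical enum for this sketch). */
enum outer_proto { OUTER_UDP, OUTER_GRE };

/* Former criterion: IPSKB_FRAG_SEGS is set only for UDP encapsulation. */
static bool frag_segs_by_proto(bool forwarded, enum outer_proto proto)
{
	return forwarded && proto == OUTER_UDP;
}

/* Proposed criterion: IPSKB_FRAG_SEGS is set whenever the tunnel left DF clear. */
static bool frag_segs_by_df(bool forwarded, bool df_set)
{
	return forwarded && !df_set;
}

/* Oversized non-gso packet: fragmented and passed only if DF is clear. */
static bool nongso_passes(bool df_set)
{
	return !df_set;
}

/* Oversized gso packet: segments are fragmented and passed only if the flag is set. */
static bool gso_passes(bool frag_segs)
{
	return frag_segs;
}

/* "Aligned" means gso and non-gso meet the same fate (both pass or both drop). */
static bool aligned_by_proto(enum outer_proto proto, bool df_set)
{
	return gso_passes(frag_segs_by_proto(true, proto)) == nongso_passes(df_set);
}

static bool aligned_by_df(bool df_set)
{
	return gso_passes(frag_segs_by_df(true, df_set)) == nongso_passes(df_set);
}

/* Replays the cases from the analysis under the assumptions stated above. */
static void check_cases(void)
{
	assert(aligned_by_proto(OUTER_UDP, false));  /* vxlan/geneve, df clear */
	assert(!aligned_by_proto(OUTER_UDP, true));  /* collect_md, DONT_FRAGMENT set */
	assert(!aligned_by_proto(OUTER_GRE, false)); /* ip_gre, df clear */
	assert(aligned_by_proto(OUTER_GRE, true));   /* ip_gre in pmtudisc */
	assert(aligned_by_df(false) && aligned_by_df(true));
}
```

In this model the !IP_DF criterion is aligned for both df values, while
the proto criterion diverges exactly in the two cases marked (!) above.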

I'll send a patch.

Thanks
Shmulik

