From: "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Cc: "dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org"
	<dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org>,
	"netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Jason Wang <jasowang-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	"Du, Fan" <fan.du-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	"fw-HFFVJYpyMKqzQB+pC5nmwQ@public.gmane.org"
	<fw-HFFVJYpyMKqzQB+pC5nmwQ@public.gmane.org>,
	"davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org"
	<davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
Subject: Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
Date: Wed, 3 Dec 2014 20:38:59 +0200
Message-ID: <20141203183859.GB16447@redhat.com>
In-Reply-To: <CAEP_g=9C+D3gbjJ4n1t6xuyjqEAMYi4ZfqPoe92UAoQJH-UsKg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, Dec 03, 2014 at 10:07:42AM -0800, Jesse Gross wrote:
> On Wed, Dec 3, 2014 at 1:03 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Dec 02, 2014 at 10:12:04AM -0800, Jesse Gross wrote:
> >> On Tue, Dec 2, 2014 at 9:41 AM, Thomas Graf <tgraf@suug.ch> wrote:
> >> > On 12/02/14 at 07:34pm, Michael S. Tsirkin wrote:
> >> >> On Tue, Dec 02, 2014 at 05:09:27PM +0000, Thomas Graf wrote:
> >> >> > On 12/02/14 at 01:48pm, Flavio Leitner wrote:
> >> >> > > What about containers or any other virtualization environment that
> >> >> > > doesn't use Virtio?
> >> >> >
> >> >> > The host can dictate the MTU in that case for both veth or OVS
> >> >> > internal which would be primary container plumbing techniques.
> >> >>
> >> >> It typically can't do this easily for VMs with emulated devices:
> >> >> real ethernet uses a fixed MTU.
> >> >>
> >> >> IMHO it's confusing to suggest MTU as a fix for this bug, it's
> >> >> an unrelated optimization.
> >> >> ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED is the right fix here.
> >> >
> >> > PMTU discovery only resolves the issue if an actual IP stack is
> >> > running inside the VM. This may not be the case at all.
> >>
> >> It's also only really a correct thing to do if the ICMP packet is
> >> coming from an L3 node. If you are doing straight bridging then you
> >> have to resort to hacks like OVS had before, which I agree are not
> >> particularly desirable.
> >
> > The issue seems to be that fundamentally, this is
> > bridging interfaces with variable MTUs (even if MTU values
> > on devices don't let us figure this out)-
> > that is already not straight bridging, and
> > I would argue sending ICMPs back is the right thing to do.
> 
> How do you deal with the fact that there is no host IP stack inside
> the tunnel? And isn't this exactly the same as the former OVS
> implementation that you said you didn't like?

I was talking about the high-level requirement here, not the
implementation. I agree it's not at all trivial: we need to propagate
this across tunnels.

But let's agree on what we are trying to accomplish first.


> >> > I agree that exposing an MTU towards the guest is not applicable
> >> > in all situations, in particular because it is difficult to decide
> >> > what MTU to expose. It is a relatively elegant solution in a lot
> >> > of virtualization host cases hooked up to an orchestration system
> >> > though.
> >>
> >> I also think this is the right thing to do as a common case
> >> optimization and I know other platforms (such as Hyper-V) do it. It's
> >> not a complete solution so we still need the original patch in this
> >> thread to handle things transparently.
> >
> > Well, as I believe David (and independently Jason) is saying, it looks like
> > the ICMPs we are sending back after applying the original patch have the
> > wrong MTU.
> 
> The problem is actually that the ICMP messages won't even go to the
> sending VM because the host IP stack and the VM are isolated from each
> other and there is no route.

Exactly.
But all this is talking about implementation.

Let's agree on what we want to do first.

And in my mind, we typically want the originator to adjust its PMTU,
and only for a given destination.
Sending an ICMP to the originating VM will typically accomplish this.
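
To illustrate what "adjust its PMTU, just for a given destination" means,
here is a minimal sketch of a per-destination PMTU cache. It is not the
kernel's implementation: the class name, method names and addresses are
hypothetical, and the 68-byte floor is the IPv4 lower bound from RFC 1191.

```python
# Hypothetical sketch of per-destination PMTU bookkeeping, not kernel code.
MIN_IPV4_PMTU = 68  # RFC 1191: never reduce the PMTU estimate below 68 octets


class PmtuCache:
    def __init__(self, link_mtu):
        self.link_mtu = link_mtu   # MTU of the local interface
        self._per_dest = {}        # destination -> learned path MTU

    def mtu_for(self, dest):
        # Destinations we have no report for keep the full link MTU.
        return self._per_dest.get(dest, self.link_mtu)

    def on_frag_needed(self, dest, next_hop_mtu):
        # Called when an ICMP "fragmentation needed" arrives for dest.
        # Only ever lower the estimate, and never below the IPv4 floor.
        new_mtu = max(next_hop_mtu, MIN_IPV4_PMTU)
        if new_mtu < self.mtu_for(dest):
            self._per_dest[dest] = new_mtu


cache = PmtuCache(link_mtu=1500)
cache.on_frag_needed("10.0.0.2", 1450)   # path to 10.0.0.2 crosses a tunnel
print(cache.mtu_for("10.0.0.2"))         # lowered for this destination only
print(cache.mtu_for("10.0.0.3"))         # other destinations are unaffected
```

The point of the sketch is that only traffic towards the reported
destination shrinks; everything else keeps using the interface MTU.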




> > And if I understand what David is saying here, IP is also the wrong place to
> > do it.
> 
> ICMP can't be the complete solution in any case because it only works
> for IP traffic.

Let's be specific please.  What protocols do you most care about? IPX?

> I think there are only two full solutions: find a way
> to adjust the guest MTU to the minimum MTU that its traffic could hit
> in an L2 domain or fragmentation. ICMP could be a possible
> optimization in the fragmentation case.

Both approaches seem strange. Suppose you are sending 1 packet an hour
to some destination behind 100 tunnels. Why would you want to
cut down your MTU for all packets? On the other hand,
doubling the number of packets because your MTU is off
by a couple of bytes will hurt performance significantly.
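
The "doubling" cost can be checked with a little arithmetic. The sketch
below is only an illustration: the function name is hypothetical, it
assumes plain IPv4 with a 20-byte header and no options, and the example
sizes (1500-byte packets, a 1498-byte or 1450-byte path) are made up.

```python
import math

IP_HEADER = 20  # bytes; assumes IPv4 without options


def fragment_count(packet_len, mtu, ip_header=IP_HEADER):
    """Number of IPv4 packets needed to carry one packet over a given MTU."""
    if packet_len <= mtu:
        return 1
    # Every fragment except the last carries a payload that is a
    # multiple of 8 bytes, because fragment offsets are in 8-byte units.
    per_fragment = (mtu - ip_header) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")
    payload = packet_len - ip_header
    return math.ceil(payload / per_fragment)


print(fragment_count(1500, 1500))  # fits: one packet
print(fragment_count(1500, 1498))  # off by two bytes: two packets
print(fragment_count(1500, 1450))  # e.g. some tunnel overhead: two packets
```

A full-sized packet that misses the path MTU by two bytes costs as many
packets on the wire as one that misses it by fifty.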

Still, if you want to cut down the MTU within the guest,
that's only an ifconfig away.
Most people would not want to bother, so I think it's a good
idea to make PMTU work properly for them.

-- 
MST
