netdev.vger.kernel.org archive mirror
From: Florian Westphal <fw@strlen.de>
To: David Miller <davem@davemloft.net>
Cc: fw@strlen.de, netdev@vger.kernel.org, hannes@stressinduktion.org,
	edumazet@google.com, herbert@gondor.apana.org.au
Subject: Re: [PATCH -next] net: preserve geometry of fragment sizes when forwarding
Date: Mon, 18 May 2015 23:33:29 +0200
Message-ID: <20150518213329.GA2335@breakpoint.cc>
In-Reply-To: <20150518.165550.359134808190719687.davem@davemloft.net>

David Miller <davem@davemloft.net> wrote:
> From: Florian Westphal <fw@strlen.de>
> Date: Mon, 18 May 2015 22:40:49 +0200
> 
> > But, to the best of my understanding, what you ask will push a lot of
> > non-trivial code into the kernel for no functional gain over
> > what has been proposed.
> 
> The functional gain is that we stop linearizing the packet, which
> involves memory allocation and copying the entire packet.

AFAICS, IPv4 and IPv6 defragmentation does not perform any linearization
or reallocation?

> I am very confident that the performance gains would be non-trivial
> and quite measurable.

Are fragmented packets that common?
I don't have any real data on this; the box sending this email has

55965898 incoming packets delivered
62 reassemblies required

... but it is just an end host.

TCP shouldn't be a problem thanks to PMTUD, and for high-volume
fragmented IPv4 flows I'd expect poor performance due to the 16-bit ID space
limitations long before processing becomes a bottleneck.
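
To put a rough number on the 16-bit ID space limit, here is a minimal
sketch.  It assumes IDs are consumed sequentially per
source/destination/protocol triple and that stale fragments can linger in
the reassembly queue for a fixed timeout (30 seconds is the Linux default
for net.ipv4.ipfrag_time).  The function names are made up for
illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct values in the 16-bit IPv4 Identification field. */
#define IPV4_ID_SPACE 65536u

/*
 * Highest rate of fragmented datagrams (per second, for one
 * source/destination/protocol triple) at which no ID is reused while
 * fragments of an older datagram may still sit in the reassembly
 * queue for up to timeout_sec seconds.
 */
static uint32_t max_safe_datagram_rate(uint32_t timeout_sec)
{
	assert(timeout_sec > 0);
	return IPV4_ID_SPACE / timeout_sec;
}

/* Seconds until the ID counter wraps at the given datagram rate. */
static double id_wrap_seconds(double datagrams_per_sec)
{
	assert(datagrams_per_sec > 0.0);
	return IPV4_ID_SPACE / datagrams_per_sec;
}
```

Under these assumptions, max_safe_datagram_rate(30) comes out at 2184
datagrams per second; above that rate, a lost fragment can lead to
fragments of different datagrams being spliced together.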

> You'd also be able to trivially respect the geometry of the original
> incoming packet stream.

True.  OTOH, the patch proposed in this thread would have done the same
with a lot less code.  (I admit that removing Eric's optimization
once nf_defrag is loaded is not desirable, but I did not find a solution
to this problem aside from doing a route lookup or a tentative 'forward is
off' check, which I did not like.)

Another alternative might be to delay Eric's 'coalescing' step and move
it into the IP stack, after the 'local delivery' decision has been taken.

I can investigate this if you think it's worth it.

> Every objection has been of the form "this special case" (this time
> SIP) is not easy.

Yes, but these objections are not some random hand-waving gesture.
They present us with certain dilemmas, e.g. a single UDP packet that
arrives as fragments of these sizes:

1280 1280 1280 542

The SIP NAT helper has to do NAT/PAT and replace 10.2.3.4 with 192.168.2.3
(let's assume we had helpers that can deal with addresses split over two
 fragment skbs, so we can handle 10.2 appearing in fragment #2
 and .3.4 in fragment #3).

We can then end up with something like:
1283 1281 1282 542

... and what should we do then?

Shuffle the payload via memcpy()/memmove() so that only the last fragment grows?
This will not be hot path or common by any means.
But nevertheless, it can happen, and we need to deal with it.
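
As a rough sketch of what "only grow the last fragment" means for the
sizes above: every fragment except the last is trimmed back to a cap
(here the original 1280), and the excess bytes cascade towards the tail.
This only does the length bookkeeping.  The function name is
hypothetical, lengths are treated as opaque byte counts (no header or
8-byte offset alignment handling), and no actual skb data is touched.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rebalance fragment lengths after a helper grew some payloads in place.
 * len[i] is the current length of fragment i, cap is the largest length
 * any non-final fragment may keep.  Excess bytes are pushed towards the
 * tail so that only the last fragment grows.
 * Returns the total number of bytes that cross a fragment boundary,
 * i.e. how much data would have to be copied between fragments.
 */
static size_t push_overflow_to_tail(size_t *len, size_t n, size_t cap)
{
	size_t carry = 0, moved = 0;

	assert(n > 0);
	for (size_t i = 0; i + 1 < n; i++) {
		len[i] += carry;	/* bytes spilled from the previous fragment */
		carry = len[i] > cap ? len[i] - cap : 0;
		len[i] -= carry;	/* trim this fragment back to the cap */
		moved += carry;
	}
	len[n - 1] += carry;		/* the last fragment absorbs the rest */
	return moved;
}
```

For the sizes 1283 1281 1282 542 with a cap of 1280 this yields
1280 1280 1280 548, with 3 + 4 + 6 = 13 bytes crossing fragment
boundaries.  The sketch does not cover the case where the last fragment
itself ends up larger than the cap, which would need an extra fragment.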

> If I were doing this, I would implement something that handles the
> normal cases properly.  And then take it from there.

What is a 'normal case'?
And how do you propose we deal with the 'non-normal' cases?

I assume you mean to e.g. linearize for edge cases and then refragment?

If that's true, then we'd still need one of the proposed solutions to
handle this, so that we get packets we can send out without breaking the
geometry or growing fragments to a larger MTU.

> If you try to imagine the totality of it and all the edge cases
> and details from the beginning, yes it will look impossible.

Hmm...  correct, but I still believe we're talking about immense pain
for very little gain.

Thanks for spending time on this.
