netdev.vger.kernel.org archive mirror
* [RFC] IP_MAX_MTU value
@ 2012-12-21  6:47 Eric Dumazet
  2012-12-21  7:08 ` Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Eric Dumazet @ 2012-12-21  6:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Rick Jones

Hi David

We have the following definition in net/ipv4/route.c

#define IP_MAX_MTU   0xFFF0

This means that with "netperf -t UDP_STREAM", UDP messages of 65507
bytes are fragmented on the loopback interface (while its MTU is now 65536
and should allow those UDP messages to be sent without fragments)

I guess Rick chose 65507 bytes in netperf because it was related to the
max IPv4 datagram length :

65507 + 28 = 65535

Changing IP_MAX_MTU from 0xFFF0 to 0x10000 seems safe [1], but I might
be missing something really obvious?

It might be because in the old days we reserved 16 bytes for the ethernet
header, and we wanted to avoid the kmalloc() round-up to the kmalloc-131072
slab?

If so, we certainly can limit skb->head to 32 or 64 KB and fill the
remaining space with page fragments.

Thanks

[1] performance increase is ~50% 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] IP_MAX_MTU value
  2012-12-21  6:47 [RFC] IP_MAX_MTU value Eric Dumazet
@ 2012-12-21  7:08 ` Eric Dumazet
  2012-12-21 18:19 ` Rick Jones
  2012-12-21 19:59 ` David Miller
  2 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2012-12-21  7:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Rick Jones

On Thu, 2012-12-20 at 22:47 -0800, Eric Dumazet wrote:

> It might be because in the old days we reserved 16 bytes for the ethernet
> header, and we wanted to avoid the kmalloc() round-up to the kmalloc-131072
> slab?

Well, we add the struct skb_shared_info anyway, so we hit kmalloc-131072
with the current IP_MAX_MTU value...


* Re: [RFC] IP_MAX_MTU value
  2012-12-21  6:47 [RFC] IP_MAX_MTU value Eric Dumazet
  2012-12-21  7:08 ` Eric Dumazet
@ 2012-12-21 18:19 ` Rick Jones
  2012-12-21 18:34   ` Eric Dumazet
  2012-12-21 19:59 ` David Miller
  2 siblings, 1 reply; 8+ messages in thread
From: Rick Jones @ 2012-12-21 18:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On 12/20/2012 10:47 PM, Eric Dumazet wrote:
> Hi David
>
> We have the following definition in net/ipv4/route.c
>
> #define IP_MAX_MTU   0xFFF0
>
> This means that with "netperf -t UDP_STREAM", UDP messages of 65507
> bytes are fragmented on the loopback interface (while its MTU is now 65536
> and should allow those UDP messages to be sent without fragments)
>
> I guess Rick chose 65507 bytes in netperf because it was related to the
> max IPv4 datagram length :
>
> 65507 + 28 = 65535

That is correct. From src/nettest_omni.c:

/* choosing the default send size is a trifle more complicated than it
    used to be as we have to account for different protocol limits */

#define UDP_LENGTH_MAX (0xFFFF - 28)

static int
choose_send_size(int lss, int protocol) {

   int send_size;

   if (lss > 0) {
     send_size = lss;

     /* we will assume that everyone has IPPROTO_UDP and thus avoid an
        issue with Windows using an enum */
     if ((protocol == IPPROTO_UDP) && (send_size > UDP_LENGTH_MAX))
       send_size = UDP_LENGTH_MAX;

   }
   else {
     send_size = 4096;
   }
   return send_size;
}

And I figured that while IPv6 allows even larger sizes, the likelihood 
of it mattering in the then near/medium term was minimal.

> Changing IP_MAX_MTU from 0xFFF0 to 0x10000 seems safe [1], but I might
> be missing something really obvious?

If you go beyond the protocol limit of an IPv4 datagram, won't it be
necessary to start being a bit more conditional on IPv4 vs IPv6?

> It might be because in the old days we reserved 16 bytes for the ethernet
> header, and we wanted to avoid the kmalloc() round-up to the kmalloc-131072
> slab?
>
> If so, we certainly can limit skb->head to 32 or 64 KB and fill the
> remaining space with page fragments.
>
> Thanks
>
> [1] performance increase is ~50%

99 times out of 10 I will assert that faster is better, but do we need 
another 50% for UDP over loopback with that large a message size?

happy benchmarking,

rick jones


* Re: [RFC] IP_MAX_MTU value
  2012-12-21 18:19 ` Rick Jones
@ 2012-12-21 18:34   ` Eric Dumazet
  2012-12-21 18:50     ` Rick Jones
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2012-12-21 18:34 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, netdev

On Fri, 2012-12-21 at 10:19 -0800, Rick Jones wrote:

> If you go beyond the protocol limit of an IPv4 datagram, won't it be 
> necessary to  start being a bit more conditional on IPv4 vs IPv6?
> 

This IP_MAX_MTU is really an IPv4 thing (static to net/ipv4/route.c)


> 
> 99 times out of 10 I will assert that faster is better, but do we need 
> another 50% for UDP over loopback with that large a message size?

Well, I just didn't understand why sending 65507-byte UDP messages had
to use fragments. I didn't care about performance at this point, I only
tried to find a reasonable explanation.

It turns out it's a strange limitation.


* Re: [RFC] IP_MAX_MTU value
  2012-12-21 18:34   ` Eric Dumazet
@ 2012-12-21 18:50     ` Rick Jones
  2012-12-21 18:54       ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Rick Jones @ 2012-12-21 18:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On 12/21/2012 10:34 AM, Eric Dumazet wrote:
> On Fri, 2012-12-21 at 10:19 -0800, Rick Jones wrote:
>
>> If you go beyond the protocol limit of an IPv4 datagram, won't it be
>> necessary to  start being a bit more conditional on IPv4 vs IPv6?
>>
>
> This IP_MAX_MTU is really an IPv4 thing (static to net/ipv4/route.c)

OK. Doesn't this:

         if (mtu > IP_MAX_MTU)
                 mtu = IP_MAX_MTU;

mean it should be OK to go to 0xFFFF but not 0x10000? Since 65535 is
the limit of an IPv4 datagram, I would think it would be the maximum
MTU for an IPv4 interface.

rick


* Re: [RFC] IP_MAX_MTU value
  2012-12-21 18:50     ` Rick Jones
@ 2012-12-21 18:54       ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2012-12-21 18:54 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, netdev

On Fri, 2012-12-21 at 10:50 -0800, Rick Jones wrote:
> On 12/21/2012 10:34 AM, Eric Dumazet wrote:
> > On Fri, 2012-12-21 at 10:19 -0800, Rick Jones wrote:
> >
> >> If you go beyond the protocol limit of an IPv4 datagram, won't it be
> >> necessary to  start being a bit more conditional on IPv4 vs IPv6?
> >>
> >
> > This IP_MAX_MTU is really an IPv4 thing (static to net/ipv4/route.c)
> 
> OK. Doesn't this:
> 
>          if (mtu > IP_MAX_MTU)
>                  mtu = IP_MAX_MTU;
> 
> mean it should be OK to go to 0xFFFF but not 0x10000? Since 65535 is
> the limit of an IPv4 datagram, I would think it would be the maximum
> MTU for an IPv4 interface.

Sure, I meant 0xFFFF and it is the value I used in my tests.

65536 is the current MTU of loopback device, and this can be used by
other protocols.


* Re: [RFC] IP_MAX_MTU value
  2012-12-21  6:47 [RFC] IP_MAX_MTU value Eric Dumazet
  2012-12-21  7:08 ` Eric Dumazet
  2012-12-21 18:19 ` Rick Jones
@ 2012-12-21 19:59 ` David Miller
  2012-12-21 23:45   ` Alexey Kuznetsov
  2 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2012-12-21 19:59 UTC (permalink / raw)
  To: erdnetdev; +Cc: netdev, rick.jones2, kuznet

From: Eric Dumazet <erdnetdev@gmail.com>
Date: Thu, 20 Dec 2012 22:47:48 -0800

> We have the following definition in net/ipv4/route.c
> 
> #define IP_MAX_MTU   0xFFF0
> 
> This means that with "netperf -t UDP_STREAM", UDP messages of 65507
> bytes are fragmented on the loopback interface (while its MTU is now 65536
> and should allow those UDP messages to be sent without fragments)
> 
> I guess Rick chose 65507 bytes in netperf because it was related to the
> max IPv4 datagram length :
> 
> 65507 + 28 = 65535
> 
> Changing IP_MAX_MTU from 0xFFF0 to 0x10000 seems safe [1], but I might
> be missing something really obvious?
> 
> It might be because in the old days we reserved 16 bytes for the ethernet
> header, and we wanted to avoid the kmalloc() round-up to the kmalloc-131072
> slab?
> 
> If so, we certainly can limit skb->head to 32 or 64 KB and fill the
> remaining space with page fragments.

I don't think it has to do with kmalloc() at all.

Maybe it's something strange to do with the fact that each frag has to
be an 8-byte multiple, or something like that?

Alexey chose this value back in 1998; maybe he remembers the
reason for the strange value.

Alexey?


* Re: [RFC] IP_MAX_MTU value
  2012-12-21 19:59 ` David Miller
@ 2012-12-21 23:45   ` Alexey Kuznetsov
  0 siblings, 0 replies; 8+ messages in thread
From: Alexey Kuznetsov @ 2012-12-21 23:45 UTC (permalink / raw)
  To: David Miller; +Cc: erdnetdev, netdev, rick.jones2

On Fri, Dec 21, 2012 at 11:59:10AM -0800, David Miller wrote:
> From: Eric Dumazet <erdnetdev@gmail.com>
> > Changing IP_MAX_MTU from 0xFFF0 to 0x10000 seems safe [1], but I might
> > be missing something really obvious?

Maximal IP MTU is 65535 (due to iph->tot_len).


> > It might be because in the old days we reserved 16 bytes for the ethernet
> > header, and we wanted to avoid the kmalloc() round-up to the kmalloc-131072
> > slab?
> > 
> > If so, we certainly can limit skb->head to 32 or 64 KB and fill the
> > remaining space with page fragments.
> 
> I don't think it has to do with kmalloc() at all.
> 
> Maybe something strange to do with the fact that each frag has to
> be an 8-byte multiple, or something like that?
> 
> Alexey chose this value back in 1998; maybe he remembers the
> reason for the strange value.

Vaguely. I do not think it was anything clever.

Apparently, this was done while implementing IPv6 jumbo frames.
dev->mtu was allowed to be arbitrarily high, and for use with IPv4 it had
to be clamped at some reasonable value. That value should be 65535, of course.

Here was the problem: apparently, when experimenting with jumbo frames I used
a virtual device (loopback?) and set some crazy values for mtu there. And it looked
dubious that a nicely aligned device mtu such as 256K converted to the perverted odd
value 65535 for IP. That value would then be translated to an odd value for mss, and so on.
Nothing fatal, but forcing tcp segmentation to an odd value looked too weird.

So, I just decided to apply some bound which looked sane, safe and aligned. :-)
Note that values of mtu > 65280 (or something like that, which was used by some
funny hippi device) were of purely theoretical interest; they were used only to
experiment with jumbo frames or for stress testing, so the actual value did not
have any meaning.

