From: Rick Jones <rick.jones2@hp.com>
To: Eric Dumazet <erdnetdev@gmail.com>
Cc: David Miller <davem@davemloft.net>, netdev <netdev@vger.kernel.org>
Subject: Re: [RFC] IP_MAX_MTU value
Date: Fri, 21 Dec 2012 10:19:57 -0800
Message-ID: <50D4A84D.1010402@hp.com>
In-Reply-To: <1356072468.21834.4805.camel@edumazet-glaptop>

On 12/20/2012 10:47 PM, Eric Dumazet wrote:
> Hi David
>
> We have the following definition in net/ipv4/route.c
>
> #define IP_MAX_MTU   0xFFF0
>
> This means that the 65507-byte UDP messages sent by "netperf -t UDP_STREAM"
> are fragmented on the loopback interface (even though its MTU is now 65536,
> which should allow those UDP messages to be sent without fragmentation).
>
> I guess Rick chose 65507 bytes in netperf because it is related to the
> maximum IPv4 datagram length:
>
> 65507 + 28 = 65535
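The 28 bytes are the IPv4 header (20 bytes, assuming no IP options) plus the 8-byte UDP header. The worked arithmetic, compared with the current IP_MAX_MTU value, is:

```latex
\begin{aligned}
65507 + \underbrace{8}_{\text{UDP header}} + \underbrace{20}_{\text{IPv4 header}} &= 65535 = \texttt{0xFFFF} \\
\texttt{IP\_MAX\_MTU} = \texttt{0xFFF0} &= 65520 < 65535
\end{aligned}
```

The largest legal IPv4 datagram is therefore 15 bytes larger than the 0xFFF0 cap, so it has to be fragmented even on a device whose MTU is 65536.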

That is correct.  From src/nettest_omni.c:

#include <netinet/in.h>  /* IPPROTO_UDP */

/* choosing the default send size is a trifle more complicated than it
    used to be as we have to account for different protocol limits */

#define UDP_LENGTH_MAX (0xFFFF - 28)

static int
choose_send_size(int lss, int protocol) {

   int send_size;

   if (lss > 0) {
     send_size = lss;

     /* we will assume that everyone has IPPROTO_UDP and thus avoid an
        issue with Windows using an enum */
     if ((protocol == IPPROTO_UDP) && (send_size > UDP_LENGTH_MAX))
       send_size = UDP_LENGTH_MAX;

   }
   else {
     send_size = 4096;
   }
   return send_size;
}

And I figured that while IPv6 allows even larger sizes, the likelihood 
of it mattering in the then near/medium term was minimal.

> Changing IP_MAX_MTU from 0xFFF0 to 0x10000 seems safe [1], but am I
> missing something really obvious?
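The sketch below shows the proposed change as a fragmentation decision. It is a minimal model, not kernel code. It assumes that the MTU used for an IPv4 route is the device MTU capped at IP_MAX_MTU. effective_mtu() and needs_fragmentation() are hypothetical helpers, and the IPV4_MAX_MTU_* and LOOPBACK_MTU names are illustrative constants that hold the values quoted in the thread.

```c
#include <stdbool.h>

/* Values from the thread: the current IP_MAX_MTU, the proposed one,
 * and the loopback device MTU. Names are illustrative only. */
#define IPV4_MAX_MTU_OLD 0xFFF0u   /* 65520 */
#define IPV4_MAX_MTU_NEW 0x10000u  /* 65536 */
#define LOOPBACK_MTU     65536u

/* Hypothetical helper (assumption): the route MTU is the device MTU,
 * capped at the IP_MAX_MTU constant. */
static unsigned int effective_mtu(unsigned int dev_mtu, unsigned int ip_max_mtu)
{
    return dev_mtu < ip_max_mtu ? dev_mtu : ip_max_mtu;
}

/* A full IPv4 datagram (headers included) that is larger than the
 * effective MTU must be split into fragments. */
static bool needs_fragmentation(unsigned int datagram_len,
                                unsigned int dev_mtu, unsigned int ip_max_mtu)
{
    return datagram_len > effective_mtu(dev_mtu, ip_max_mtu);
}
```

For example, `needs_fragmentation(65535, LOOPBACK_MTU, IPV4_MAX_MTU_OLD)` is true because 65535 > 65520. With IPV4_MAX_MTU_NEW the same call is false, because the cap no longer falls below the loopback MTU.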

If you go beyond the protocol limit of an IPv4 datagram, won't it be
necessary to start being a bit more conditional on IPv4 vs IPv6?
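The two families need separate limits because their 16-bit length fields count different things. The IPv4 Total Length field includes the IP header. The IPv6 Payload Length field excludes the 40-byte fixed header. The sketch below is a minimal per-family version of netperf's UDP_LENGTH_MAX. It assumes no IPv4 options, no IPv6 extension headers and no jumbograms. max_udp_payload() is a hypothetical helper, not a netperf function.

```c
#include <sys/socket.h>  /* AF_INET, AF_INET6 */

#define IPV4_HDR_LEN      20      /* fixed IPv4 header, no options */
#define UDP_HDR_LEN        8
#define IP_LEN_FIELD_MAX  0xFFFF  /* largest value of a 16-bit length field */

/* Hypothetical helper: largest UDP payload that fits in one datagram.
 * IPv4 Total Length counts the IP header; IPv6 Payload Length does not
 * count the 40-byte fixed header. Jumbograms are ignored.
 * Returns -1 for an unknown address family. */
static int max_udp_payload(int family)
{
    switch (family) {
    case AF_INET:
        return IP_LEN_FIELD_MAX - IPV4_HDR_LEN - UDP_HDR_LEN;  /* 65507 */
    case AF_INET6:
        return IP_LEN_FIELD_MAX - UDP_HDR_LEN;                 /* 65527 */
    default:
        return -1;
    }
}
```

A send-size chooser could then clamp against `max_udp_payload(family)` instead of a single IPv4-derived constant.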

> It might be because in the old days we reserved 16 bytes for the Ethernet
> header, and we wanted to avoid a kmalloc() round-up to the kmalloc-131072
> slab?
>
> If so, we can certainly limit skb->head to 32 or 64 KB and fill the
> remaining space with page fragments.
>
> Thanks
>
> [1] The performance increase is ~50%.
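The sketch below illustrates the round-up concern and the proposed head/fragment split. It uses a simplified model in which large kmalloc() requests round up to the next power of two. That model is an assumption: it ignores the small 96- and 192-byte caches and the skb_shared_info overhead added to skb->head. kmalloc_bucket(), split_head_frags() and PAGE_SZ are hypothetical names, and a 4096-byte page is assumed.

```c
#include <stddef.h>

#define PAGE_SZ 4096u  /* assumed page size */

/* Simplified kmalloc size-class model: round up to the next power of two
 * (minimum 8). Sizes are assumed to be far below SIZE_MAX. */
static size_t kmalloc_bucket(size_t size)
{
    size_t bucket = 8;

    while (bucket < size)
        bucket <<= 1;
    return bucket;
}

/* Hypothetical layout: keep at most head_max bytes in the linear head
 * and put the rest into page-sized fragments. Stores the head length in
 * *head_len and returns the number of page fragments needed. */
static size_t split_head_frags(size_t len, size_t head_max, size_t *head_len)
{
    size_t rest;

    *head_len = len < head_max ? len : head_max;
    rest = len - *head_len;
    return (rest + PAGE_SZ - 1) / PAGE_SZ;  /* round up to whole pages */
}
```

Under this model, 0xFFF0 + 16 bytes is exactly 65536 and stays in the 64 KB class, while 0x10000 + 16 bytes rounds up to 131072. Capping the head at 32 KB keeps the linear allocation small: a 65536-byte packet becomes a 32768-byte head plus 8 page fragments.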

99 times out of 10 I will assert that faster is better, but do we need 
another 50% for UDP over loopback with that large a message size?

happy benchmarking,

rick jones

Thread overview: 8+ messages
2012-12-21  6:47 [RFC] IP_MAX_MTU value Eric Dumazet
2012-12-21  7:08 ` Eric Dumazet
2012-12-21 18:19 ` Rick Jones [this message]
2012-12-21 18:34   ` Eric Dumazet
2012-12-21 18:50     ` Rick Jones
2012-12-21 18:54       ` Eric Dumazet
2012-12-21 19:59 ` David Miller
2012-12-21 23:45   ` Alexey Kuznetsov
