netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
To: Tom Herbert <tom@herbertland.com>
Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Vladislav Yasevich <vyasevich@gmail.com>,
	Benjamin Coddington <bcodding@redhat.com>
Subject: Re: [PATCH net-next v3 1/4] ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
Date: Wed, 28 Oct 2015 00:53:51 +0100	[thread overview]
Message-ID: <1445990031.562267.422041489.195AFF12@webmail.messagingengine.com> (raw)
In-Reply-To: <CALx6S35udd410V5bEzH2OxE=Qug0fydo10-X6E__gwzeGYCtzg@mail.gmail.com>

On Tue, Oct 27, 2015, at 23:22, Tom Herbert wrote:
> On Tue, Oct 27, 2015 at 2:40 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > We cannot reliable calculate packet size on MSG_MORE corked sockets
> > and thus cannot decide if they are going to be fragmented later on,
> > so better not use CHECKSUM_PARTIAL in the first place.
> >
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Vlad Yasevich <vyasevich@gmail.com>
> > Cc: Benjamin Coddington <bcodding@redhat.com>
> > Cc: Tom Herbert <tom@herbertland.com>
> > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > ---
> >  net/ipv4/ip_output.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index 50e2973..0b02417 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -911,6 +911,7 @@ static int __ip_append_data(struct sock *sk,
> >         if (transhdrlen &&
> >             length + fragheaderlen <= mtu &&
> >             rt->dst.dev->features & NETIF_F_V4_CSUM &&
> > +           !(flags & MSG_MORE) &&
> 
> I still don't understand this. It seems like the effect is to disable
> checksum offload for all UDP messages sent with MSG_MORE flag set.

Exactly.

MSG_MORE/UDP_CORK is a method to append data on the *same* UDP packet.
The probability this packet exceeds the MTU size is rather large, as it
is mostly used to prepare a header and later on send data via
sendpage/sendfile-syscall (IPv6 UDP as no sendpage so it falls back to
normal udpv6_sendmsg path). sendpage is mostly used to send rather large
amount of data because for small amounts regular copying might be faster
(IMHO). So the probability we exceed the MTU is quiet high. This is the
case for NFSv4 which uses this flag over UDP, sending xdr header and
later on the filesystem data directly from the page cache. You will
still have CHECKSUM_PARTIAL capability with sendmsg and multiple iovecs!

Because we cannot simply switch back to CHECKSUM_NONE in the second
write, the first write would not yet have been checksumed, I decided to
exclude MSG_MORE to set up a CHECKSUM_PARTIAL skb.

Because there would be at least some more syscalls between the first
write and the second write (not so in the  NFS example directly from the
kernel but normal user space usage) the data would already be cold in
the caches. So it makes sense to me to checksum the data during copy-in
to trash the CPU caches only once.

The ip6_fragment logic will now catch this case and fragment anyway, but
as I wrote, this is only a last resort.

Hope that makes it more clear.

Bye,
Hannes

  reply	other threads:[~2015-10-27 23:53 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-27 21:40 [PATCH net-next v3 0/4] net: clean up interactions of CHECKSUM_PARTIAL and fragmentation Hannes Frederic Sowa
2015-10-27 21:40 ` [PATCH net-next v3 1/4] ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets Hannes Frederic Sowa
2015-10-27 22:22   ` Tom Herbert
2015-10-27 23:53     ` Hannes Frederic Sowa [this message]
2015-10-27 21:40 ` [PATCH net-next v3 2/4] ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment Hannes Frederic Sowa
2015-10-27 21:40 ` [PATCH net-next v3 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets Hannes Frederic Sowa
2015-10-27 21:40 ` [PATCH net-next v3 4/4] ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment Hannes Frederic Sowa
2015-10-28  9:33 ` [PATCH net-next v3 0/4] net: clean up interactions of CHECKSUM_PARTIAL and fragmentation Hannes Frederic Sowa
2015-10-30 10:37   ` David Miller
2015-10-30 12:21     ` Hannes Frederic Sowa
2015-11-01 17:02 ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1445990031.562267.422041489.195AFF12@webmail.messagingengine.com \
    --to=hannes@stressinduktion.org \
    --cc=bcodding@redhat.com \
    --cc=edumazet@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=tom@herbertland.com \
    --cc=vyasevich@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).