From: Oliver Graute <oliver.graute@gmail.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Network Development <netdev@vger.kernel.org>,
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
Jakub Sitnicki <jakub@cloudflare.com>,
Paolo Abeni <pabeni@redhat.com>,
sagi@lightbitslabs.com, Christoph Hellwig <hch@lst.de>,
sagi@grimberg.me, Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: UDP implementation and the MSG_MORE flag
Date: Thu, 28 Jan 2021 16:23:53 +0100 [thread overview]
Message-ID: <20210128152353.GB27281@optiplex> (raw)
In-Reply-To: <CAF=yD-JuHy8yf88RR_=K+r_3SwhwzqRtHrK08-WF4BkwMNk-LQ@mail.gmail.com>
On 27/01/21, Willem de Bruijn wrote:
> On Wed, Jan 27, 2021 at 9:53 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Tue, Jan 26, 2021 at 10:25 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > On Tue, Jan 26, 2021 at 5:00 PM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > > > On Tue, Jan 26, 2021 at 4:54 PM Willem de Bruijn
> > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jan 26, 2021 at 9:58 AM Oliver Graute <oliver.graute@gmail.com> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > we observe some unexpected behavior in the UDP implementation of the
> > > > > > linux kernel.
> > > > > >
> > > > > > Some UDP packets send via the loopback interface are dropped in the
> > > > > > kernel on the receive side when using sendto with the MSG_MORE flag.
> > > > > > Every drop increases the InCsumErrors in /proc/self/net/snmp. Some
> > > > > > example code to reproduce it is appended below.
> > > > > >
> > > > > > In the code we tracked it down to this code section. ( Even a little
> > > > > > further but its unclear to me wy the csum() is wrong in the bad case)
> > > > > >
> > > > > > udpv6_recvmsg()
> > > > > > ...
> > > > > > if (checksum_valid || udp_skb_csum_unnecessary(skb)) {
> > > > > > if (udp_skb_is_linear(skb))
> > > > > > err = copy_linear_skb(skb, copied, off, &msg->msg_iter);
> > > > > > else
> > > > > > err = skb_copy_datagram_msg(skb, off, msg, copied);
> > > > > > } else {
> > > > > > err = skb_copy_and_csum_datagram_msg(skb, off, msg);
> > > > > > if (err == -EINVAL) {
> > > > > > goto csum_copy_err;
> > > > > > }
> > > > > > }
> > > > > > ...
> > > > > >
> > > > >
> > > > > Thanks for the report with a full reproducer.
> > > > >
> > > > > I don't have a full answer yet, but can reproduce this easily.
> > > > >
> > > > > The third program, without MSG_MORE, builds an skb with
> > > > > CHECKSUM_PARTIAL in __ip_append_data. When looped to the receive path
> > > > > that ip_summed means no additional validation is needed. As encoded in
> > > > > skb_csum_unnecessary.
> > > > >
> > > > > The first and second programs are essentially the same, bar for a
> > > > > slight difference in length. In both cases packet length is very short
> > > > > compared to the loopback device MTU. Because of MSG_MORE, these
> > > > > packets have CHECKSUM_NONE.
> > > > >
> > > > > On receive in
> > > > >
> > > > > __udp4_lib_rcv()
> > > > > udp4_csum_init()
> > > > > err = skb_checksum_init_zero_check()
> > > > >
> > > > > The second program validates and sets ip_summed = CHECKSUM_COMPLETE
> > > > > and csum_valid = 1.
> > > > > The first does not, though err == 0.
> > > > >
> > > > > This appears to succeed consistently for packets <= 68B of payload,
> > > > > fail consistently otherwise. It is not clear to me yet what causes
> > > > > this distinction.
> > > >
> > > > This is from
> > > >
> > > > "
> > > > /* For small packets <= CHECKSUM_BREAK perform checksum complete directly
> > > > * in checksum_init.
> > > > */
> > > > #define CHECKSUM_BREAK 76
> > > > "
> > > >
> > > > So the small packet gets checksummed immediately in
> > > > __skb_checksum_validate_complete, but the larger one does not.
> > > >
> > > > Question is why the copy_and_checksum you pointed to seems to fail checksum.
> > >
> > > Manually calling __skb_checksum_complete(skb) in
> > > skb_copy_and_csum_datagram_msg succeeds, so it is the
> > > skb_copy_and_csum_datagram that returns an incorrect csum.
> > >
> > > Bisection shows that this is a regression in 5.0, between
> > >
> > > 65d69e2505bb datagram: introduce skb_copy_and_hash_datagram_iter helper (fail)
> > > d05f443554b3 iov_iter: introduce hash_and_copy_to_iter helper
> > > 950fcaecd5cc datagram: consolidate datagram copy to iter helpers
> > > cb002d074dab iov_iter: pass void csum pointer to csum_and_copy_to_iter (pass)
> > >
> > > That's a significant amount of code change. I'll take a closer look,
> > > but checkpointing state for now..
> >
> > Key difference is the csum_block_add when handling frags, and the
> > removal of temporary csum2.
> >
> > In the reproducer, there is one 13B csum_and_copy_to_iter from
> > skb->data + offset, followed by a 73B csum_and_copy_to_iter from the
> > first frag. So the second one passes pos 13 to csum_block_add.
> >
> > The original implementation of skb_copy_and_csum_datagram similarly
> > fails the test, if we fail to account for the position
> >
> > - *csump = csum_block_add(*csump, csum2, pos);
> > + *csump = csum_block_add(*csump, csum2, 0);
>
> One possible approach:
very thx for your analysis and your patch proposal. After a first quick
test this patch proposal solves the problem.
Best regards,
Oliver
next prev parent reply other threads:[~2021-01-28 15:26 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-26 14:12 UDP implementation and the MSG_MORE flag Oliver Graute
2021-01-26 14:12 ` Oliver Graute
2021-01-26 21:54 ` Willem de Bruijn
2021-01-26 22:00 ` Willem de Bruijn
2021-01-27 3:25 ` Willem de Bruijn
2021-01-28 2:53 ` Willem de Bruijn
2021-01-28 3:10 ` Willem de Bruijn
2021-01-28 15:23 ` Oliver Graute [this message]
2021-02-03 22:21 ` michi1
2021-02-03 22:21 ` michi1
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210128152353.GB27281@optiplex \
--to=oliver.graute@gmail.com \
--cc=hch@lst.de \
--cc=jakub@cloudflare.com \
--cc=kuznet@ms2.inr.ac.ru \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sagi@grimberg.me \
--cc=sagi@lightbitslabs.com \
--cc=viro@zeniv.linux.org.uk \
--cc=willemdebruijn.kernel@gmail.com \
--cc=yoshfuji@linux-ipv6.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.