netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: David Howells <dhowells@redhat.com>,  netdev@vger.kernel.org
Cc: dhowells@redhat.com,
	 syzbot+62cbf263225ae13ff153@syzkaller.appspotmail.com,
	 Eric Dumazet <edumazet@google.com>,
	 Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	 "David S. Miller" <davem@davemloft.net>,
	 David Ahern <dsahern@kernel.org>,
	 Paolo Abeni <pabeni@redhat.com>,
	 Jakub Kicinski <kuba@kernel.org>,
	 bpf@vger.kernel.org,  syzkaller-bugs@googlegroups.com,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH net v2] ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()
Date: Wed, 20 Sep 2023 09:54:42 -0400	[thread overview]
Message-ID: <650af9a2aa74_37bf362941f@willemb.c.googlers.com.notmuch> (raw)
In-Reply-To: <108791.1695199151@warthog.procyon.org.uk>

David Howells wrote:
> Including the transhdrlen in length is a problem when the packet is
> partially filled (e.g. something like send(MSG_MORE) happened previously)
> when appending to an IPv4 or IPv6 packet as we don't want to repeat the
> transport header or account for it twice.  This can happen under some
> circumstances, such as splicing into an L2TP socket.
> 
> The symptom observed is a warning in __ip6_append_data():
> 
>     WARNING: CPU: 1 PID: 5042 at net/ipv6/ip6_output.c:1800 __ip6_append_data.isra.0+0x1be8/0x47f0 net/ipv6/ip6_output.c:1800
> 
> that occurs when MSG_SPLICE_PAGES is used to append more data to an already
> partially occupied skbuff.  The warning occurs when 'copy' is larger than
> the amount of data in the message iterator.  This is because the requested
> length includes the transport header length when it shouldn't.  This can be
> triggered by, for example:
> 
>         sfd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_L2TP);
>         bind(sfd, ...); // ::1
>         connect(sfd, ...); // ::1 port 7
>         send(sfd, buffer, 4100, MSG_MORE);
>         sendfile(sfd, dfd, NULL, 1024);
> 
> Fix this by deducting transhdrlen from length in ip{,6}_append_data() right
> before we clear transhdrlen if there is already a packet that we're going
> to try appending to.
> 
> Reported-by: syzbot+62cbf263225ae13ff153@syzkaller.appspotmail.com
> Link: https://lore.kernel.org/r/0000000000001c12b30605378ce8@google.com/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Eric Dumazet <edumazet@google.com>
> cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> cc: "David S. Miller" <davem@davemloft.net>
> cc: David Ahern <dsahern@kernel.org>
> cc: Paolo Abeni <pabeni@redhat.com>
> cc: Jakub Kicinski <kuba@kernel.org>
> cc: netdev@vger.kernel.org
> cc: bpf@vger.kernel.org
> cc: syzkaller-bugs@googlegroups.com
> Link: https://lore.kernel.org/r/75315.1695139973@warthog.procyon.org.uk/ # v1
> ---
>  net/ipv4/ip_output.c  |    1 +
>  net/ipv6/ip6_output.c |    1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 4ab877cf6d35..9646f2d9afcf 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1354,6 +1354,7 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4,
>  		if (err)
>  			return err;
>  	} else {
> +		length -= transhdrlen;
>  		transhdrlen = 0;
>  	}
>  
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 54fc4c711f2c..6a4ce7f622e9 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1888,6 +1888,7 @@ int ip6_append_data(struct sock *sk,
>  		length += exthdrlen;
>  		transhdrlen += exthdrlen;
>  	} else {
> +		length -= transhdrlen;
>  		transhdrlen = 0;
>  	}
>  

Definitely a much simpler patch, thanks.

So the current model is that callers with non-zero transhdrlen always
pass to __ip_append_data payload length + transhdrlen.

I do see that udp does this: ulen += sizeof(struct udphdr); This calls
ip_make_skb if not corked, but directly ip_append_data if corked.

Then __ip_append_data will use transhdrlen in its packet calculations,
and reset that to zero after allocating the first new skb.

So if corked *and* fragmentation, which would cause a new skb to be
allocated, the next skb would incorrectly reserve udp header space,
because the second __ip_append_data call will again pass transhdrlen.
If so, then this patch fixes that. But that has never been reported,
so I'm most likely misreading some part..

So on the surface this makes sense to me. But I need to read it more
closely still. The most risk-averse version would limit this change
explicitly to MSG_SPLICE_PAGES calls.

FWIW I think MSG_ZEROCOPY is somewhat immune compared to
MSG_SPLCE_PAGES solely because it is limited to TCP, UDP and RDS
sockets.

  reply	other threads:[~2023-09-20 13:54 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-20  8:39 [PATCH net v2] ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data() David Howells
2023-09-20 13:54 ` Willem de Bruijn [this message]
2023-09-21  1:41   ` Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=650af9a2aa74_37bf362941f@willemb.c.googlers.com.notmuch \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=davem@davemloft.net \
    --cc=dhowells@redhat.com \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=syzbot+62cbf263225ae13ff153@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).