From: John Fastabend <john.fastabend@gmail.com>
To: Cong Wang <xiyou.wangcong@gmail.com>,  Simon Horman <horms@kernel.org>
Cc: netdev@vger.kernel.org,  bpf@vger.kernel.org,
	 Cong Wang <cong.wang@bytedance.com>,
	 syzbot+58c03971700330ce14d8@syzkaller.appspotmail.com,
	 John Fastabend <john.fastabend@gmail.com>,
	 Jakub Sitnicki <jakub@cloudflare.com>
Subject: Re: [Patch bpf] tcp_bpf: fix return value of tcp_bpf_sendmsg()
Date: Thu, 22 Aug 2024 13:45:51 -0700
Message-ID: <66c7a37fd0270_1b1420837@john.notmuch>
In-Reply-To: <ZsaLFVB0HyQfXBXy@pop-os.localdomain>

Cong Wang wrote:
> On Wed, Aug 21, 2024 at 03:55:33PM +0100, Simon Horman wrote:
> > On Tue, Aug 20, 2024 at 08:07:44PM -0700, Cong Wang wrote:
> > > From: Cong Wang <cong.wang@bytedance.com>
> > > 
> > > When we cork messages in psock->cork, the last message, which triggers
> > > the flush, results in sending a sk_msg larger than the current
> > > message size. In this case, in tcp_bpf_send_verdict(), 'copied' becomes
> > > negative, at least in the following case:
> > > 
> > > 	case __SK_DROP:
> > > 	default:
> > > 		sk_msg_free_partial(sk, msg, tosend);
> > > 		sk_msg_apply_bytes(psock, tosend);
> > > 		*copied -= (tosend + delta); // <==== HERE
> > > 		return -EACCES;
> > > 
> > > Therefore, with a suitable negative value of 'copied', it can lead to
> > > the following BUG (thanks to syzbot). We should not use a negative
> > > 'copied' as a return value here.
> > > 
> > >   ------------[ cut here ]------------
> > >   kernel BUG at net/socket.c:733!
> > >   Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> > >   Modules linked in:
> > >   CPU: 0 UID: 0 PID: 3265 Comm: syz-executor510 Not tainted 6.11.0-rc3-syzkaller-00060-gd07b43284ab3 #0
> > >   Hardware name: linux,dummy-virt (DT)
> > >   pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > >   pc : sock_sendmsg_nosec net/socket.c:733 [inline]
> > >   pc : sock_sendmsg_nosec net/socket.c:728 [inline]
> > >   pc : __sock_sendmsg+0x5c/0x60 net/socket.c:745
> > >   lr : sock_sendmsg_nosec net/socket.c:730 [inline]
> > >   lr : __sock_sendmsg+0x54/0x60 net/socket.c:745
> > >   sp : ffff800088ea3b30
> > >   x29: ffff800088ea3b30 x28: fbf00000062bc900 x27: 0000000000000000
> > >   x26: ffff800088ea3bc0 x25: ffff800088ea3bc0 x24: 0000000000000000
> > >   x23: f9f00000048dc000 x22: 0000000000000000 x21: ffff800088ea3d90
> > >   x20: f9f00000048dc000 x19: ffff800088ea3d90 x18: 0000000000000001
> > >   x17: 0000000000000000 x16: 0000000000000000 x15: 000000002002ffaf
> > >   x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> > >   x11: 0000000000000000 x10: ffff8000815849c0 x9 : ffff8000815b49c0
> > >   x8 : 0000000000000000 x7 : 000000000000003f x6 : 0000000000000000
> > >   x5 : 00000000000007e0 x4 : fff07ffffd239000 x3 : fbf00000062bc900
> > >   x2 : 0000000000000000 x1 : 0000000000000000 x0 : 00000000fffffdef
> > >   Call trace:
> > >    sock_sendmsg_nosec net/socket.c:733 [inline]
> > >    __sock_sendmsg+0x5c/0x60 net/socket.c:745
> > >    ____sys_sendmsg+0x274/0x2ac net/socket.c:2597
> > >    ___sys_sendmsg+0xac/0x100 net/socket.c:2651
> > >    __sys_sendmsg+0x84/0xe0 net/socket.c:2680
> > >    __do_sys_sendmsg net/socket.c:2689 [inline]
> > >    __se_sys_sendmsg net/socket.c:2687 [inline]
> > >    __arm64_sys_sendmsg+0x24/0x30 net/socket.c:2687
> > >    __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
> > >    invoke_syscall+0x48/0x110 arch/arm64/kernel/syscall.c:49
> > >    el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:132
> > >    do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:151
> > >    el0_svc+0x34/0xec arch/arm64/kernel/entry-common.c:712
> > >    el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
> > >    el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:598
> > >   Code: f9404463 d63f0060 3108441f 54fffe81 (d4210000)
> > >   ---[ end trace 0000000000000000 ]---
> > > 
> > > Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
> > > Reported-by: syzbot+58c03971700330ce14d8@syzkaller.appspotmail.com
> > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > > ---
> > >  net/ipv4/tcp_bpf.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> > > index 53b0d62fd2c2..fe6178715ba0 100644
> > > --- a/net/ipv4/tcp_bpf.c
> > > +++ b/net/ipv4/tcp_bpf.c
> > > @@ -577,7 +577,7 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> > >  		err = sk_stream_error(sk, msg->msg_flags, err);
> > >  	release_sock(sk);
> > >  	sk_psock_put(sk, psock);
> > > -	return copied ? copied : err;
> > > +	return copied > 0 ? copied : err;
> > 
> > Would it make more sense to make the condition err, i.e.
> > is err 0 iff everything is ok? (completely untested!)
> 
> Would you mind elaborating?
> 
> From my point of view, 'copied' is to handle partial transmission, for
> example:
> 
> 0. User wants to send 2 * 1K bytes with sendmsg()
> 1. Kernel already sent the first 1K successfully
> 2. Kernel got some error when sending the 2nd 1K
> 
> In this scenario, we should return 1K instead of the error to the caller to
> indicate this partial transmission; otherwise the caller could not
> distinguish it from a complete failure (that is, 0 bytes sent).
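
To make the two cases concrete, here is a minimal userspace sketch of the
return expression before and after the patch. ret_old() and ret_new() are
hypothetical stand-ins for the last line of tcp_bpf_sendmsg(), not kernel
code. The value -529 is an assumption taken from the low 32 bits of x0 in
the quoted trace (0xfffffdef); the -EPIPE in the partial-send case is
picked only for illustration.

```c
#include <errno.h>

/* Hypothetical model of the old return expression: any non-zero
 * 'copied' wins, including a negative one. */
static int ret_old(int copied, int err)
{
	return copied ? copied : err;
}

/* Hypothetical model of the patched expression: only a positive
 * 'copied' (bytes actually accepted) overrides the error. */
static int ret_new(int copied, int err)
{
	return copied > 0 ? copied : err;
}

/* Partial send, 1024 bytes accepted before an error (assumed -EPIPE). */
static int partial_old(void) { return ret_old(1024, -EPIPE); }
static int partial_new(void) { return ret_new(1024, -EPIPE); }

/* Corked flush dropped by the verdict: 'copied' went negative.
 * -529 is assumed from x0 in the trace above. */
static int dropped_old(void) { return ret_old(-529, -EACCES); }
static int dropped_new(void) { return ret_new(-529, -EACCES); }
```

Both versions return 1024 for the partial send, so that behaviour is
preserved. Only the dropped case differs: the old expression hands the
bogus -529 back to the socket layer, the new one returns -EACCES.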

Yep, if we don't return the positive value on a partial send we will confuse
apps and they will probably resend data.
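
For reference, this is roughly the pattern on the application side that
depends on the positive return. It is only a sketch: send_all() is a
hypothetical helper, not something from this thread or from any library,
and it assumes a blocking stream socket on Linux (MSG_NOSIGNAL).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: keep calling send() until 'len' bytes are
 * accepted. Returns the number of bytes accepted so far, or -1 if
 * nothing was accepted before an error. */
static ssize_t send_all(int fd, const char *buf, size_t len)
{
	size_t off = 0;

	while (off < len) {
		ssize_t n = send(fd, buf + off, len - off, MSG_NOSIGNAL);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			/* Report the partial count if there is one. */
			return off ? (ssize_t)off : -1;
		}
		if (n == 0)
			break;	/* avoid spinning if nothing is accepted */
		/* Resume after the bytes the kernel already took. */
		off += (size_t)n;
	}
	return (ssize_t)off;
}
```

If the kernel reported an error for a call that had already queued some
bytes, a loop like this would have no way to know where to resume and
would retry from the old offset, sending those bytes twice.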

From my side this looks good.

Reviewed-by: John Fastabend <john.fastabend@gmail.com>

> 
> Am I missing anything?
> 
> Thanks.



Thread overview: 7+ messages
2024-08-21  3:07 [Patch bpf] tcp_bpf: fix return value of tcp_bpf_sendmsg() Cong Wang
2024-08-21 14:55 ` Simon Horman
2024-08-22  0:49   ` Cong Wang
2024-08-22 20:45     ` John Fastabend [this message]
2024-08-23  8:25       ` Simon Horman
2024-08-29 18:36       ` Martin KaFai Lau
2024-08-29 19:51         ` Jakub Kicinski