From: Jakub Sitnicki <jakub@cloudflare.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>,
netdev@vger.kernel.org, bpf@vger.kernel.org,
Cong Wang <cong.wang@bytedance.com>,
Jiang Wang <jiang.wang@bytedance.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Lorenz Bauer <lmb@cloudflare.com>
Subject: Re: [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to ingress_skb
Date: Sat, 03 Jul 2021 19:52:35 +0200 [thread overview]
Message-ID: <875yxrs2sc.fsf@cloudflare.com> (raw)
In-Reply-To: <60ddec01c651b_3fe24208dc@john-XPS-13-9370.notmuch>
On Thu, Jul 01, 2021 at 06:23 PM CEST, John Fastabend wrote:
[...]
>> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
>> index 9b6160a191f8..a5185c781332 100644
>> --- a/net/core/skmsg.c
>> +++ b/net/core/skmsg.c
>> @@ -854,7 +854,8 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb)
>> return -EIO;
>> }
>> spin_lock_bh(&psock_other->ingress_lock);
>> - if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
>> + if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED) ||
>> + atomic_read(&sk_other->sk_rmem_alloc) > READ_ONCE(sk_other->sk_rcvbuf)) {
>> spin_unlock_bh(&psock_other->ingress_lock);
>> skb_bpf_redirect_clear(skb);
>> sock_drop(from->sk, skb);
>> @@ -930,7 +931,8 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
>> }
>> if (err < 0) {
>> spin_lock_bh(&psock->ingress_lock);
>> - if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
>> + if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED) &&
>> + atomic_read(&sk_other->sk_rmem_alloc) <= READ_ONCE(sk_other->sk_rcvbuf)) {
>> skb_queue_tail(&psock->ingress_skb, skb);
>
> We can't just drop the packet in the memory overrun case here. This will
> break TCP because the data will be gone and no one will retransmit.
I don't think it's always the case that data will be gone. But you're
right that it breaks TCP. I was too quick to Ack this patch.
When running with just the verdict prog attached, the -EIO error from
sk_psock_verdict_apply is propagated up to tcp_read_sock. That is, it
maps to 0 bytes used by recv_actor. sk_psock_verdict_recv in this case.
tcp_read_sock, if 0 bytes were used = copied, won't sk_eat_skb. It stays
on sk_receive_queue.
sk->sk_data_ready
sk_psock_verdict_data_ready
->read_sock(..., sk_psock_verdict_recv)
tcp_read_sock (used = copied = 0)
sk_psock_verdict_recv -> ret = 0
sk_psock_verdict_apply -> -EIO
sk_psock_skb_redirect -> -EIO
However, I think this gets us stuck. What if no more data gets queued,
and sk_data_ready doesn't get called again?
Then there is the case when a parser prog is attached. In this case the
skb is really gone if we drop it on redirect.
In sk_psock_strp_read, we ignore the -EIO error from
sk_psock_verdict_apply, and return to tcp_read_sock how many bytes have
been parsed.
sk->sk_data_ready
sk_psock_verdict_data_ready
->read_sock(..., sk_psock_verdict_recv)
tcp_read_sock (used = copied = eaten)
strp_recv -> ret = eaten
__strp_recv -> ret = eaten
strp->cb.rcv_msg -> -EIO
sk_psock_verdict_apply -> -EIO
sk_psock_redirect -> -EIO
Maybe we could put the skb back on strp->skb_head list on error, though?
But again some notification would need to trigger a re-read, or we are
stuck.
next prev parent reply other threads:[~2021-07-03 17:52 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-01 6:16 [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to ingress_skb Cong Wang
2021-07-01 15:56 ` Jakub Sitnicki
2021-07-01 16:26 ` John Fastabend
2021-07-01 16:23 ` John Fastabend
2021-07-01 18:00 ` Cong Wang
2021-07-02 19:33 ` Cong Wang
2021-07-03 17:52 ` Jakub Sitnicki [this message]
2021-07-04 13:10 ` Jakub Sitnicki
2021-07-04 19:53 ` Cong Wang
2021-07-05 8:24 ` Jakub Sitnicki
2021-07-05 16:24 ` John Fastabend
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=875yxrs2sc.fsf@cloudflare.com \
--to=jakub@cloudflare.com \
--cc=bpf@vger.kernel.org \
--cc=cong.wang@bytedance.com \
--cc=daniel@iogearbox.net \
--cc=jiang.wang@bytedance.com \
--cc=john.fastabend@gmail.com \
--cc=lmb@cloudflare.com \
--cc=netdev@vger.kernel.org \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).