Re: [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to ingress_skb

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jakub Sitnicki <jakub@cloudflare.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, Cong Wang <cong.wang@bytedance.com>,
	Jiang Wang <jiang.wang@bytedance.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Lorenz Bauer <lmb@cloudflare.com>
Subject: Re: [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to ingress_skb
Date: Mon, 05 Jul 2021 10:24:15 +0200	[thread overview]
Message-ID: <8735strwwg.fsf@cloudflare.com> (raw)
In-Reply-To: <CAM_iQpW69PGfp_Y8mZoqznwCk2axask5qJLB7ntZjFgGO+Eizg@mail.gmail.com>

On Sun, Jul 04, 2021 at 09:53 PM CEST, Cong Wang wrote:
> On Sat, Jul 3, 2021 at 10:52 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> When running with just the verdict prog attached, the -EIO error from
>> sk_psock_verdict_apply is propagated up to tcp_read_sock. That is, it
>> maps to 0 bytes used by recv_actor. sk_psock_verdict_recv in this case.
>>
>> tcp_read_sock, if 0 bytes were used = copied, won't sk_eat_skb. It stays
>> on sk_receive_queue.
>
> Are you sure?
>
> When recv_actor() returns 0, the while loop breaks:
>
> 1661                         used = recv_actor(desc, skb, offset, len);
> 1662                         if (used <= 0) {
> 1663                                 if (!copied)
> 1664                                         copied = used;
> 1665                                 break;
>
> then it calls sk_eat_skb() a few lines after the loop:
> ...
> 1690                 sk_eat_skb(sk, skb);

This sk_eat_skb is still within the loop:

1636:int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
1637-		  sk_read_actor_t recv_actor)
1638-{
	…
1643-	int copied = 0;
        …
1647-	while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) {
1648-		if (offset < skb->len) {
			…
1661-			used = recv_actor(desc, skb, offset, len);
1662-			if (used <= 0) {
1663-				if (!copied)
1664-					copied = used;
1665-				break;
1666-			} else if (used <= len) {
1667-				seq += used;
1668-				copied += used;
1669-				offset += used;
1670-			}
			…
1684-		}
		…
1690-		sk_eat_skb(sk, skb);
		…
1694-	}
	…
1699-	/* Clean up data we have read: This will do ACK frames. */
1700-	if (copied > 0) {
1701-		tcp_recv_skb(sk, seq, &offset);
1702-		tcp_cleanup_rbuf(sk, copied);
1703-	}
1704-	return copied;
1705-}

sk_eat_skb could get called by tcp_recv_skb → sk_eat_skb if recv_actor
returned > 0 (the case when we have parser attached).

>
>>
>>   sk->sk_data_ready
>>     sk_psock_verdict_data_ready
>>       ->read_sock(..., sk_psock_verdict_recv)
>>         tcp_read_sock (used = copied = 0)
>>           sk_psock_verdict_recv -> ret = 0
>>             sk_psock_verdict_apply -> -EIO
>>               sk_psock_skb_redirect -> -EIO
>>
>> However, I think this gets us stuck. What if no more data gets queued,
>> and sk_data_ready doesn't get called again?
>
> I think it is dropped by sk_eat_skb() in TCP case and of course the
> sender will retransmit it after detecting this loss. So from this point of
> view, there is no difference between drops due to overlimit and drops
> due to eBPF program policy.

I'm not sure the retransmit will happen.

We update tp->rcv_nxt (tcp_rcv_nxt_update) when skb gets pushed onto
sk_receive_queue in either:

 - tcp_rcv_established -> tcp_queue_rcv, or
 - tcp_rcv_established -> tcp_data_queue -> tcp_queue_rcv

... and schedule ACK (tcp_event_data_recv) to be sent.

Say we are in quickack mode, then
tcp_ack_snd_check()/__tcp_ack_snd_check() would cause ACK to be sent
out.

next prev parent reply	other threads:[~2021-07-05  8:24 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-01  6:16 [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to ingress_skb Cong Wang
2021-07-01 15:56 ` Jakub Sitnicki
2021-07-01 16:26   ` John Fastabend
2021-07-01 16:23 ` John Fastabend
2021-07-01 18:00   ` Cong Wang
2021-07-02 19:33     ` Cong Wang
2021-07-03 17:52   ` Jakub Sitnicki
2021-07-04 13:10     ` Jakub Sitnicki
2021-07-04 19:53     ` Cong Wang
2021-07-05  8:24       ` Jakub Sitnicki [this message]
2021-07-05 16:24         ` John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8735strwwg.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=jiang.wang@bytedance.com \
    --cc=john.fastabend@gmail.com \
    --cc=lmb@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.