All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: cong.wang@bytedance.com, daniel@iogearbox.net, lmb@isovalent.com,
	edumazet@google.com, bpf@vger.kernel.org, netdev@vger.kernel.org,
	ast@kernel.org, andrii@kernel.org, will@isovalent.com
Subject: Re: [PATCH bpf v2 02/12] bpf: sockmap, convert schedule_work into delayed_work
Date: Wed, 29 Mar 2023 13:09:37 +0200	[thread overview]
Message-ID: <874jq3dcw1.fsf@cloudflare.com> (raw)
In-Reply-To: <642362a1403ee_286af20850@john.notmuch>

On Tue, Mar 28, 2023 at 02:56 PM -07, John Fastabend wrote:
> Jakub Sitnicki wrote:
>> On Mon, Mar 27, 2023 at 10:54 AM -07, John Fastabend wrote:
>> > Sk_buffs are fed into sockmap verdict programs either from a strparser
>> > (when the user might want to decide how framing of skb is done by attaching
>> > another parser program) or directly through tcp_read_sock. The
>> > tcp_read_sock is the preferred method for performance when the BPF logic is
>> > a stream parser.
>> >
>> > The flow for Cilium's common use case with a stream parser is,
>> >
>> >  tcp_read_sock()
>> >   sk_psock_verdict_recv
>> >     ret = bpf_prog_run_pin_on_cpu()
>> >     sk_psock_verdict_apply(sock, skb, ret)
>> >      // if system is under memory pressure or app is slow we may
>> >      // need to queue skb. Do this queuing through ingress_skb and
>> >      // then kick timer to wake up handler
>> >      skb_queue_tail(ingress_skb, skb)
>> >      schedule_work(work);
>> >
>> >
>> > The work queue is wired up to sk_psock_backlog(). This will then walk the
>> > ingress_skb skb list that holds our sk_buffs that could not be handled,
>> > but should be OK to run at some later point. However, its possible that
>> > the workqueue doing this work still hits an error when sending the skb.
>> > When this happens the skbuff is requeued on a temporary 'state' struct
>> > kept with the workqueue. This is necessary because its possible to
>> > partially send an skbuff before hitting an error and we need to know how
>> > and where to restart when the workqueue runs next.
>> >
>> > Now for the trouble, we don't rekick the workqueue. This can cause a
>> > stall where the skbuff we just cached on the state variable might never
>> > be sent. This happens when its the last packet in a flow and no further
>> > packets come along that would cause the system to kick the workqueue from
>> > that side.
>> >
>> > To fix we could do simple schedule_work(), but while under memory pressure
>> > it makes sense to back off some instead of continue to retry repeatedly. So
>> > instead to fix convert schedule_work to schedule_delayed_work and add
>> > backoff logic to reschedule from backlog queue on errors. Its not obvious
>> > though what a good backoff is so use '1'.
>> >
>> > To test we observed some flakes whil running NGINX compliance test with
>> > sockmap we attributed these failed test to this bug and subsequent issue.
>> >
>> > Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
>> > Tested-by: William Findlay <will@isovalent.com>
>> > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>> > ---
>
> [...]
>
>> > --- a/net/core/skmsg.c
>> > +++ b/net/core/skmsg.c
>> > @@ -481,7 +481,7 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
>> >  	}
>> >  out:
>> >  	if (psock->work_state.skb && copied > 0)
>> > -		schedule_work(&psock->work);
>> > +		schedule_delayed_work(&psock->work, 0);
>> >  	return copied;
>> >  }
>> >  EXPORT_SYMBOL_GPL(sk_msg_recvmsg);
>> > @@ -639,7 +639,8 @@ static void sk_psock_skb_state(struct sk_psock *psock,
>> >  
>> >  static void sk_psock_backlog(struct work_struct *work)
>> >  {
>> > -	struct sk_psock *psock = container_of(work, struct sk_psock, work);
>> > +	struct delayed_work *dwork = to_delayed_work(work);
>> > +	struct sk_psock *psock = container_of(dwork, struct sk_psock, work);
>> >  	struct sk_psock_work_state *state = &psock->work_state;
>> >  	struct sk_buff *skb = NULL;
>> >  	bool ingress;
>> > @@ -679,6 +680,10 @@ static void sk_psock_backlog(struct work_struct *work)
>> >  				if (ret == -EAGAIN) {
>> >  					sk_psock_skb_state(psock, state, skb,
>> >  							   len, off);
>> > +
>> > +					// Delay slightly to prioritize any
>> > +					// other work that might be here.
>> > +					schedule_delayed_work(&psock->work, 1);
>> 
>> Do IIUC that this means we can back out changes from commit bec217197b41
>> ("skmsg: Schedule psock work if the cached skb exists on the psock")?
>
> Yeah I think so this is a more direct way to get the same result. I'm also
> thinking this check,
>
>        if (psock->work_state.skb && copied > 0)
>                schedule_work(&psock->work)
>
> is not correct copied=0 which could happen on empty queue could be the
> result of a skb stuck from this eagain error in backlog.

I suspect the 'copied > 0' check is there to handle the 0-length read
scenario. But I think you're right, that the empty queue scenario is not
being handled properly.

> I think its OK to revert that patch in a separate patch. And ideally we
> could get some way to load up the stack to hit these corner cases without
> running long stress tests.
>
> WDYT?

Yeah, the revert can wait. I was just curious if my thinking was
right. There is plenty of material in this series as is :-)

  reply	other threads:[~2023-03-29 11:18 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-27 17:54 [PATCH bpf v2 00/11] bpf sockmap fixes John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 01/12] bpf: sockmap, pass skb ownership through read_skb John Fastabend
2023-03-28 10:42   ` Jakub Sitnicki
2023-03-27 17:54 ` [PATCH bpf v2 02/12] bpf: sockmap, convert schedule_work into delayed_work John Fastabend
2023-03-28 12:09   ` Jakub Sitnicki
2023-03-28 21:56     ` John Fastabend
2023-03-29 11:09       ` Jakub Sitnicki [this message]
2023-03-27 17:54 ` [PATCH bpf v2 03/12] bpf: sockmap, improved check for empty queue John Fastabend
2023-03-29 12:24   ` Jakub Sitnicki
2023-04-01  0:59     ` John Fastabend
2023-04-03  8:42       ` Jakub Sitnicki
2023-03-27 17:54 ` [PATCH bpf v2 04/12] bpf: sockmap, handle fin correctly John Fastabend
2023-04-03 11:11   ` Jakub Sitnicki
2023-04-03 21:05     ` John Fastabend
2023-04-04 10:11       ` Jakub Sitnicki
2023-03-27 17:54 ` [PATCH bpf v2 05/12] bpf: sockmap, TCP data stall on recv before accept John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 06/12] bpf: sockmap, wake up polling after data copy John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 07/12] bpf: sockmap incorrectly handling copied_seq John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 08/12] bpf: sockmap, pull socket helpers out of listen test for general use John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 09/12] bpf: sockmap, build helper to create connected socket pair John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 10/12] bpf: sockmap, test shutdown() correctly exits epoll and recv()=0 John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 11/12] bpf: sockmap, test FIONREAD returns correct bytes in rx buffer John Fastabend
2023-03-27 17:54 ` [PATCH bpf v2 12/12] bpf: sockmap, test FIONREAD returns correct bytes in rx buffer with drops John Fastabend
2023-03-27 18:17 ` [PATCH bpf v2 00/11] bpf sockmap fixes John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874jq3dcw1.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=edumazet@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=lmb@isovalent.com \
    --cc=netdev@vger.kernel.org \
    --cc=will@isovalent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.