BPF List
 help / color / mirror / Atom feed
From: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
To: <kafai@fb.com>
Cc: <ast@kernel.org>, <benh@amazon.com>, <bpf@vger.kernel.org>,
	<daniel@iogearbox.net>, <davem@davemloft.net>,
	<edumazet@google.com>, <kuba@kernel.org>, <kuni1840@gmail.com>,
	<kuniyu@amazon.co.jp>, <linux-kernel@vger.kernel.org>,
	<netdev@vger.kernel.org>
Subject: Re: [RFC PATCH bpf-next 7/8] bpf: Call bpf_run_sk_reuseport() for socket migration.
Date: Fri, 20 Nov 2020 07:13:31 +0900	[thread overview]
Message-ID: <20201119221331.77586-1-kuniyu@amazon.co.jp> (raw)
In-Reply-To: <20201119010045.a6mqkzuv4tjruny6@kafai-mbp.dhcp.thefacebook.com>

From:   Martin KaFai Lau <kafai@fb.com>
Date:   Wed, 18 Nov 2020 17:00:45 -0800
> On Tue, Nov 17, 2020 at 06:40:22PM +0900, Kuniyuki Iwashima wrote:
> > This patch makes it possible to select a new listener for socket migration
> > by eBPF.
> > 
> > The noteworthy point is that we select a listening socket in
> > reuseport_detach_sock() and reuseport_select_sock(), but we do not have
> > struct skb in the unhash path.
> > 
> > Since we cannot pass skb to the eBPF program, we run only the
> > BPF_PROG_TYPE_SK_REUSEPORT program by calling bpf_run_sk_reuseport() with
> > skb NULL. So, some fields derived from skb are also NULL in the eBPF
> > program.
> More things need to be considered here when skb is NULL.
> 
> Some helpers are probably assuming skb is not NULL.
> 
> Also, the sk_lookup in filter.c is actually passing a NULL skb to avoid
> doing the reuseport select.

Honestly, I have missed this point...
I wanted users to reuse the same eBPF program seamlessly, but it seems unsafe.


> > Moreover, we can cancel migration by returning SK_DROP. This feature is
> > useful when listeners have different settings at the socket API level or
> > when we want to free resources as soon as possible.
> > 
> > Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> > ---
> >  net/core/filter.c          | 26 +++++++++++++++++++++-----
> >  net/core/sock_reuseport.c  | 23 ++++++++++++++++++++---
> >  net/ipv4/inet_hashtables.c |  2 +-
> >  3 files changed, 42 insertions(+), 9 deletions(-)
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 01e28f283962..ffc4591878b8 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -8914,6 +8914,22 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
> >  	SOCK_ADDR_LOAD_NESTED_FIELD_SIZE_OFF(S, NS, F, NF,		       \
> >  					     BPF_FIELD_SIZEOF(NS, NF), 0)
> >  
> > +#define SOCK_ADDR_LOAD_NESTED_FIELD_SIZE_OFF_OR_NULL(S, NS, F, NF, SIZE, OFF)	\
> > +	do {									\
> > +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(S, F), si->dst_reg,	\
> > +				      si->src_reg, offsetof(S, F));		\
> > +		*insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1);		\
> Although it may not matter much, always doing this check seems not very ideal
> considering the fast path will always have skb and only the slow
> path (accept-queue migrate) has skb is NULL.  I think the req_sk usually
> has the skb also except the timer one.

Yes, but the migration happens only when/after the listener is closed, so
I think it does not occur so frequently and will not be a problem.


> First thought is to create a temp skb but it has its own issues.
> or it may actually belong to a new prog type.  However, lets keep
> exploring possible options (including NULL skb).

I also thought up the two ideas, but the former will be a bit complicated.
And the latter makes users implement the new eBPF program. I did not want
users to struggle anymore, so I have selected the NULL skb. However, it is
not safe, so adding a new prog type seems to be the better way.

  reply	other threads:[~2020-11-19 22:13 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-17  9:40 [RFC PATCH bpf-next 0/8] Socket migration for SO_REUSEPORT Kuniyuki Iwashima
2020-11-17  9:40 ` [RFC PATCH bpf-next 1/8] net: Introduce net.ipv4.tcp_migrate_req Kuniyuki Iwashima
2020-11-17  9:40 ` [RFC PATCH bpf-next 2/8] tcp: Keep TCP_CLOSE sockets in the reuseport group Kuniyuki Iwashima
2020-11-17  9:40 ` [RFC PATCH bpf-next 3/8] tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues Kuniyuki Iwashima
2020-11-18 23:50   ` Martin KaFai Lau
2020-11-19 22:09     ` Kuniyuki Iwashima
2020-11-20  1:53       ` Martin KaFai Lau
2020-11-21 10:13         ` Kuniyuki Iwashima
2020-11-23  0:40           ` Martin KaFai Lau
2020-11-24  9:24             ` Kuniyuki Iwashima
2020-11-17  9:40 ` [RFC PATCH bpf-next 4/8] tcp: Migrate TFO requests causing RST during TCP_SYN_RECV Kuniyuki Iwashima
2020-11-17  9:40 ` [RFC PATCH bpf-next 5/8] tcp: Migrate TCP_NEW_SYN_RECV requests Kuniyuki Iwashima
2020-11-17  9:40 ` [RFC PATCH bpf-next 6/8] bpf: Add cookie in sk_reuseport_md Kuniyuki Iwashima
2020-11-19  0:11   ` Martin KaFai Lau
2020-11-19 22:10     ` Kuniyuki Iwashima
2020-11-17  9:40 ` [RFC PATCH bpf-next 7/8] bpf: Call bpf_run_sk_reuseport() for socket migration Kuniyuki Iwashima
2020-11-19  1:00   ` Martin KaFai Lau
2020-11-19 22:13     ` Kuniyuki Iwashima [this message]
2020-11-17  9:40 ` [RFC PATCH bpf-next 8/8] bpf: Test BPF_PROG_TYPE_SK_REUSEPORT " Kuniyuki Iwashima
2020-11-18  9:18 ` [RFC PATCH bpf-next 0/8] Socket migration for SO_REUSEPORT David Laight
2020-11-19 22:01   ` Kuniyuki Iwashima
2020-11-18 16:25 ` Eric Dumazet
2020-11-19 22:05   ` Kuniyuki Iwashima
2020-11-19  1:49 ` Martin KaFai Lau
2020-11-19 22:17   ` Kuniyuki Iwashima
2020-11-20  2:31     ` Martin KaFai Lau
2020-11-21 10:16       ` Kuniyuki Iwashima

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201119221331.77586-1-kuniyu@amazon.co.jp \
    --to=kuniyu@amazon.co.jp \
    --cc=ast@kernel.org \
    --cc=benh@amazon.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kafai@fb.com \
    --cc=kuba@kernel.org \
    --cc=kuni1840@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox