All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: Martin Lau <kafai@fb.com>
Cc: "bpf\@vger.kernel.org" <bpf@vger.kernel.org>,
	"netdev\@vger.kernel.org" <netdev@vger.kernel.org>,
	"kernel-team\@cloudflare.com" <kernel-team@cloudflare.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"John Fastabend" <john.fastabend@gmail.com>,
	Lorenz Bauer <lmb@cloudflare.com>
Subject: Re: [PATCH bpf-next v3 10/12] net: Generate reuseport group ID on group creation
Date: Thu, 23 Jan 2020 11:59:46 +0100	[thread overview]
Message-ID: <875zh230bh.fsf@cloudflare.com> (raw)
In-Reply-To: <20200122225351.hajnt4u7au24mj5g@kafai-mbp.dhcp.thefacebook.com>

On Wed, Jan 22, 2020 at 11:53 PM CET, Martin Lau wrote:
> On Wed, Jan 22, 2020 at 02:05:47PM +0100, Jakub Sitnicki wrote:
>> Commit 736b46027eb4 ("net: Add ID (if needed) to sock_reuseport and expose
>> reuseport_lock") has introduced lazy generation of reuseport group IDs that
>> survive group resize.
>> 
>> By comparing the identifier we check if BPF reuseport program is not trying
>> to select a socket from a BPF map that belongs to a different reuseport
>> group than the one the packet is for.
>> 
>> Because SOCKARRAY used to be the only BPF map type that can be used with
>> reuseport BPF, it was possible to delay the generation of reuseport group
>> ID until a socket from the group was inserted into BPF map for the first
>> time.
>> 
>> Now that SOCKMAP can be used with reuseport BPF we have two options, either
>> generate the reuseport ID on map update, like SOCKARRAY does, or allocate
>> an ID from the start when reuseport group gets created.
>> 
>> This patch goes the latter approach to keep SOCKMAP free of calls into
>> reuseport code. This streamlines the reuseport_id access as its lifetime
>> now matches the longevity of reuseport object.
>> 
>> The cost of this simplification, however, is that we allocate reuseport IDs
>> for all SO_REUSEPORT users. Even those that don't use SOCKARRAY in their
>> setups. With the way identifiers are currently generated, we can have at
>> most S32_MAX reuseport groups, which hopefully is sufficient.
> Not sure if it would be a concern.  I think it is good as is.
> For TCP, that would mean billion different ip:port listening socks
> in inet_hashinfo.
>
> If it came to that, another idea is to use a 64bit reuseport_id which
> practically won't wrap around.  It could use the very first sk->sk_cookie
> as the reuseport_id.  All the ida logic will go away also in the expense
> of +4 bytes.

Thanks for the idea. I'll add it to the patch description.

>
>> 
>> Another change is that we now always call into SOCKARRAY logic to unlink
>> the socket from the map when unhashing or closing the socket. Previously we
>> did it only when at least one socket from the group was in a BPF map.
>> 
>> It is worth noting that this doesn't conflict with SOCKMAP tear-down in
>> case a socket is in a SOCKMAP and belongs to a reuseport group. SOCKMAP
>> tear-down happens first:
>> 
>>   prot->unhash
>>   `- tcp_bpf_unhash
>>      |- tcp_bpf_remove
>>      |  `- while (sk_psock_link_pop(psock))
>>      |     `- sk_psock_unlink
>>      |        `- sock_map_delete_from_link
>>      |           `- __sock_map_delete
>>      |              `- sock_map_unref
>>      |                 `- sk_psock_put
>>      |                    `- sk_psock_drop
>>      |                       `- rcu_assign_sk_user_data(sk, NULL)
>>      `- inet_unhash
>>         `- reuseport_detach_sock
>>            `- bpf_sk_reuseport_detach
>>               `- WRITE_ONCE(sk->sk_user_data, NULL)
> Thanks for the details.
>
> [ ... ]
>
>> @@ -200,12 +189,10 @@ void reuseport_detach_sock(struct sock *sk)
>>  	reuse = rcu_dereference_protected(sk->sk_reuseport_cb,
>>  					  lockdep_is_held(&reuseport_lock));
>>  
>> -	/* At least one of the sk in this reuseport group is added to
>> -	 * a bpf map.  Notify the bpf side.  The bpf map logic will
>> -	 * remove the sk if it is indeed added to a bpf map.
>> +	/* Notify the bpf side. The sk may be added to bpf map. The
>> +	 * bpf map logic will remove the sk from the map if indeed.
> s/indeed/needed/ ?
>
> I think it will be good to have a few words here like, that is needed
> by sockarray but not necessary for sockmap which has its own ->unhash
> to remove itself from the map.


Yeah, I didn't do it justice. Will expand the comment.

[...]

  reply	other threads:[~2020-01-23 10:59 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-22 13:05 [PATCH bpf-next v3 00/12] Extend SOCKMAP to store listening sockets Jakub Sitnicki
2020-01-22 13:05 ` [PATCH bpf-next v3 01/12] bpf, sk_msg: Don't clear saved sock proto on restore Jakub Sitnicki
2020-01-22 13:05 ` [PATCH bpf-next v3 02/12] net, sk_msg: Annotate lockless access to sk_prot on clone Jakub Sitnicki
2020-01-22 22:57   ` Martin Lau
2020-01-22 13:05 ` [PATCH bpf-next v3 03/12] net, sk_msg: Clear sk_user_data pointer on clone if tagged Jakub Sitnicki
2020-01-22 13:05 ` [PATCH bpf-next v3 04/12] tcp_bpf: Don't let child socket inherit parent protocol ops on copy Jakub Sitnicki
2020-01-22 20:35   ` Martin Lau
2020-01-23 10:34     ` Jakub Sitnicki
2020-01-22 13:05 ` [PATCH bpf-next v3 05/12] bpf, sockmap: Allow inserting listening TCP sockets into sockmap Jakub Sitnicki
2020-01-22 20:52   ` Martin Lau
2020-01-23 10:41     ` Jakub Sitnicki
2020-01-22 13:05 ` [PATCH bpf-next v3 06/12] bpf, sockmap: Don't set up sockmap progs for listening sockets Jakub Sitnicki
2020-01-22 16:24   ` Lorenz Bauer
2020-01-22 18:07     ` Jakub Sitnicki
2020-01-22 23:11   ` Martin Lau
2020-01-22 13:05 ` [PATCH bpf-next v3 07/12] bpf, sockmap: Return socket cookie on lookup from syscall Jakub Sitnicki
2020-01-22 13:05 ` [PATCH bpf-next v3 08/12] bpf, sockmap: Let all kernel-land lookup values in SOCKMAP Jakub Sitnicki
2020-01-22 23:02   ` Martin Lau
2020-01-22 13:05 ` [PATCH bpf-next v3 09/12] bpf: Allow selecting reuseport socket from a SOCKMAP Jakub Sitnicki
2020-01-22 23:08   ` Martin Lau
2020-01-22 13:05 ` [PATCH bpf-next v3 10/12] net: Generate reuseport group ID on group creation Jakub Sitnicki
2020-01-22 22:53   ` Martin Lau
2020-01-23 10:59     ` Jakub Sitnicki [this message]
2020-01-22 13:05 ` [PATCH bpf-next v3 11/12] selftests/bpf: Extend SK_REUSEPORT tests to cover SOCKMAP Jakub Sitnicki
2020-01-22 13:05 ` [PATCH bpf-next v3 12/12] selftests/bpf: Tests for SOCKMAP holding listening sockets Jakub Sitnicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875zh230bh.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kernel-team@cloudflare.com \
    --cc=lmb@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.