From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7DC36C3DA7D for ; Wed, 4 Jan 2023 02:37:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229717AbjADChT (ORCPT ); Tue, 3 Jan 2023 21:37:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54008 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229534AbjADChS (ORCPT ); Tue, 3 Jan 2023 21:37:18 -0500 Received: from out-87.mta0.migadu.com (out-87.mta0.migadu.com [91.218.175.87]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C5A1EE6C for ; Tue, 3 Jan 2023 18:37:16 -0800 (PST) Message-ID: <14f42c5c-c6a7-885a-ab31-4d13ba6af440@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1672799834; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vu6Xx7rbeOjsqMyImTE6U9sA5lQAyt/VR3NwD1GgxlQ=; b=lVt5fabFPWsU5TSZvmH13AwI9cLiHJf2BvaM1jG1+BotNu21jX+kdQ2ETUe3DAG3rRtDfi MvNIq2wYEd2llpHaIZL1yHBtTNS7XP9IVNUN1Rj1N2aut1DJFQDZz1sQ9UAx6jS6OR/j9h HnIwjl/aVz1GMwZl1+kezvFQKkcgXNo= Date: Tue, 3 Jan 2023 18:37:12 -0800 MIME-Version: 1.0 Subject: Re: [PATCH 1/2] bpf: Add socket destroy capability Content-Language: en-US To: Aditi Ghag Cc: sdf@google.com, edumazet@google.com, bpf@vger.kernel.org, Martin KaFai Lau References: <3d8066d4-b293-ec13-2437-1ee9b1ed4cc4@linux.dev> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Martin KaFai Lau In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org On 1/2/23 11:08 AM, Aditi Ghag wrote: > OT: While we can use the has_current_bpf_ctx macro to skip taking sock > locks in the BPF context, how is the assumption around locking enforced in > the code (e.g., setsockopt helper assumes that BPF iter prog acquires the > relevant locks)? The lock condition is enforced by the verifier when loading bpf prog. The bpf_setsockopt helper is only available to the bpf hooks that have already held/owned the sock lock. For example, if a bpf-tc prog tries to call bpf_setsockopt(), the verifier will reject loading the prog because the bpf-tc hook does not hold/own the sock lock. > We could potentially use lockdep_sock_is_held for the sock lock. > It's a debug configuration, but I suppose it's enabled in selftests. Not sure I totally get it. Meaning adding a sock_owned_by_me() check somewhere in the new bpf_sk_destroy() kfunc for debug purpose? hmm...yeah, why not but not sure how that may look like. I think it is some minor implementation details and will be easier to comment with code. [ btw, please reply inline instead of top posting. ] > > On Wed, Dec 21, 2022 at 9:08 PM Martin KaFai Lau > wrote: > >> On 12/16/22 5:57 PM, Aditi Ghag wrote: >>> The socket destroy helper is used to >>> forcefully terminate sockets from certain >>> BPF contexts. We plan to use the capability >>> in Cilium to force client sockets to reconnect >>> when their remote load-balancing backends are >>> deleted. The other use case is on-the-fly >>> policy enforcement where existing socket >>> connections prevented by policies need to >>> be terminated. >>> >>> The helper is currently exposed to iterator >>> type BPF programs where users can filter, >>> and terminate a set of sockets. >>> >>> Sockets are destroyed asynchronously using >>> the work queue infrastructure. This allows >>> for current the locking semantics within >>> socket destroy handlers, as BPF iterators >>> invoking the helper acquire *sock* locks. >>> This also allows the helper to be invoked >>> from non-sleepable contexts. >>> The other approach to skip acquiring locks >>> by passing an argument to the `diag_destroy` >>> handler didn't work out well for UDP, as >>> the UDP abort function internally invokes >>> another function that ends up acquiring >>> *sock* lock. >>> While there are sleepable BPF iterators, >>> these are limited to only certain map types. >> >> bpf-iter program can be sleepable and non sleepable. Both sleepable and >> non >> sleepable tcp/unix bpf-iter programs have been able to call >> bpf_setsockopt() >> synchronously. bpf_setsockopt() also requires the sock lock to be held >> first. >> The situation on calling '.diag_destroy' from bpf-iter should not be much >> different from calling bpf_setsockopt(). From a quick look at tcp_abort >> and >> udp_abort, I don't see they might sleep also and you may want to double >> check. >> Even '.diag_destroy' was only usable in sleepable 'bpf-iter' because it >> might >> sleep, the common bpf map types are already available to the sleepable >> programs. >> >> At the kernel side, the tcp and unix iter acquire the lock_sock() first >> (eg. in >> bpf_iter_tcp_seq_show()) before calling the bpf-iter prog . At the kernel >> setsockopt code (eg. do_ip_setsockopt()), it uses sockopt_lock_sock() and >> avoids >> doing the lock if it has already been guaranteed by the bpf running >> context. >> >> For udp, I don't see how the udp_abort acquires the sock lock differently >> from >> tcp_abort. I assume the actual problem seen in udp_abort is related to >> the >> '->unhash()' part which acquires the udp_table's bucket lock. This is a >> problem >> for udp bpf-iter only because the udp bpf-iter did not release the >> udp_table's >> bucket lock before calling the bpf prog. The tcp (and unix) bpf-iter >> releases >> the bucket lock first before calling the bpf prog. This was done >> explicitly to >> allow acquiring the sock lock before calling the bpf prog because >> otherwise it >> will have a lock ordering issue. Hence, for this reason, bpf_setsockopt() >> is >> only available to tcp and unix bpf-iter but not udp bpf-iter. The udp-iter >> needs >> to do similar change like the tcp and unix iter >> (https://lore.kernel.org/bpf/20210701200535.1033513-1-kafai@fb.com/): >> batch, >> release the bucket's lock, lock the sock, and then call bpf prog. This >> will >> allow udp-iter to call bpf_setsockopt() like its tcp and unix >> counterpart. That >> will also allow udp bpf-iter prog to directly do '.diag_destroy'. >> >