Message-ID: <532e7984-91a8-4faf-8367-bb309884c8e8@kernel.org>
Date: Tue, 18 Jun 2024 10:14:31 +0200
X-Mailing-List: bpf@vger.kernel.org
Subject: Re: [PATCH v7 net-next 14/15] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
To: Sebastian Andrzej Siewior, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: "David S. Miller", Daniel Bristot de Oliveira, Boqun Feng,
 Daniel Borkmann, Eric Dumazet, Frederic Weisbecker, Ingo Molnar,
 Jakub Kicinski, Paolo Abeni, Peter Zijlstra, Thomas Gleixner,
 Waiman Long, Will Deacon, Alexei Starovoitov, Andrii Nakryiko,
 Eduard Zingerman, Hao Luo, Jiri Olsa, John Fastabend, KP Singh,
 Martin KaFai Lau, Song Liu, Stanislav Fomichev,
 Toke Høiland-Jørgensen, Yonghong Song, bpf@vger.kernel.org
References: <20240618072526.379909-1-bigeasy@linutronix.de>
 <20240618072526.379909-15-bigeasy@linutronix.de>
From: Jesper Dangaard Brouer
In-Reply-To: <20240618072526.379909-15-bigeasy@linutronix.de>

On 18/06/2024 09.13, Sebastian Andrzej Siewior wrote:
> The XDP redirect process has two stages:
> - bpf_prog_run_xdp() is invoked to run an eBPF program which inspects the
>   packet and makes decisions. While doing that, the per-CPU variable
>   bpf_redirect_info is used.
>
> - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
>   and it may also access other per-CPU variables like xskmap_flush_list.
>
> At the very end of the NAPI callback, xdp_do_flush() is invoked which
> does not access bpf_redirect_info but will touch the individual per-CPU
> lists.
>
> The per-CPU variables are only used in the NAPI callback, hence disabling
> bottom halves is the only protection mechanism. Users from preemptible
> context (like cpu_map_kthread_run()) explicitly disable bottom halves
> for protection reasons.
> Without locking in local_bh_disable() on PREEMPT_RT this data structure
> requires explicit locking.
>
> PREEMPT_RT has forced-threaded interrupts enabled and every
> NAPI-callback runs in a thread. If each thread has its own data
> structure then locking can be avoided.
>
> Create a struct bpf_net_context which contains struct bpf_redirect_info.
> Define the variable on stack, use bpf_net_ctx_set() to save a pointer to
> it, bpf_net_ctx_clear() removes it again.
> The bpf_net_ctx_set() may nest. For instance a function can be used from
> within NET_RX_SOFTIRQ/net_rx_action which uses bpf_net_ctx_set() and
> NET_TX_SOFTIRQ which does not. Therefore only the first invocation
> updates the pointer.
> Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
> bpf_redirect_info. The returned data structure is zero initialized to
> ensure nothing is leaked from stack. This is done on first usage of the
> struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to
> note that initialisation is required. The first invocation of
> bpf_net_ctx_get_ri() will memset() the data structure and update
> bpf_redirect_info::kern_flags.
> bpf_redirect_info::nh is excluded from memset because it is only used
> once BPF_F_NEIGH is set which also sets the nh member. The kern_flags is
> moved past nh to exclude it from memset.
>
> The pointer to bpf_net_context is saved in the task's task_struct. Always
> using the bpf_net_context approach has the advantage that there is
> almost zero difference between PREEMPT_RT and non-PREEMPT_RT builds.
>
> Cc: Alexei Starovoitov
> Cc: Andrii Nakryiko
> Cc: Eduard Zingerman
> Cc: Hao Luo
> Cc: Jesper Dangaard Brouer
> Cc: Jiri Olsa
> Cc: John Fastabend
> Cc: KP Singh
> Cc: Martin KaFai Lau
> Cc: Song Liu
> Cc: Stanislav Fomichev
> Cc: Toke Høiland-Jørgensen
> Cc: Yonghong Song
> Cc: bpf@vger.kernel.org
> Acked-by: Alexei Starovoitov
> Reviewed-by: Toke Høiland-Jørgensen
> Signed-off-by: Sebastian Andrzej Siewior

Acked-by: Jesper Dangaard Brouer

> ---
>  include/linux/filter.h | 56 ++++++++++++++++++++++++++++++++++--------
>  include/linux/sched.h  |  3 +++
>  kernel/bpf/cpumap.c    |  3 +++
>  kernel/bpf/devmap.c    |  9 ++++++-
>  kernel/fork.c          |  1 +
>  net/bpf/test_run.c     | 11 ++++++++-
>  net/core/dev.c         | 26 +++++++++++++++++++-
>  net/core/filter.c      | 44 +++++++++------------------------
>  net/core/lwt_bpf.c     |  3 +++
>  9 files changed, 111 insertions(+), 45 deletions(-)
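
For readers following along, the set/clear nesting and the lazy zeroing described in the quoted commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the code from the patch: a thread-local pointer (current_ctx) stands in for the pointer kept in task_struct, the fields tgt_index and flags and the flag name RI_F_INIT_DONE are hypothetical, and nh is reduced to a single integer. Only the function and struct names come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag name: set once the redirect info has been zeroed. */
#define RI_F_INIT_DONE 0x1u

/* Simplified stand-in; the field set is illustrative, not the kernel layout. */
struct bpf_redirect_info {
	unsigned int tgt_index;   /* illustrative payload, zeroed on first use */
	unsigned int flags;       /* illustrative payload, zeroed on first use */
	unsigned long nh;         /* excluded from the memset, as the message says */
	unsigned int kern_flags;  /* placed after nh so the memset skips it too */
};

struct bpf_net_context {
	struct bpf_redirect_info ri;
};

/* Assumption: models the per-task pointer with one slot per thread. */
static _Thread_local struct bpf_net_context *current_ctx;

/*
 * Install ctx unless a context is already active. Returns ctx when this
 * call installed it, or NULL for a nested call that changed nothing.
 */
static struct bpf_net_context *bpf_net_ctx_set(struct bpf_net_context *ctx)
{
	if (current_ctx)
		return NULL;
	ctx->ri.kern_flags = 0;   /* note that initialisation is still required */
	current_ctx = ctx;
	return ctx;
}

/* Remove the context again; a NULL argument (nested caller) is a no-op. */
static void bpf_net_ctx_clear(struct bpf_net_context *ctx)
{
	if (ctx)
		current_ctx = NULL;
}

/* Return the current redirect info, zeroing it on first use only. */
static struct bpf_redirect_info *bpf_net_ctx_get_ri(void)
{
	struct bpf_redirect_info *ri;

	assert(current_ctx != NULL);   /* caller must have set a context */
	ri = &current_ctx->ri;
	if (!(ri->kern_flags & RI_F_INIT_DONE)) {
		/* Clear everything that precedes nh; nh and kern_flags survive. */
		memset(ri, 0, offsetof(struct bpf_redirect_info, nh));
		ri->kern_flags |= RI_F_INIT_DONE;
	}
	return ri;
}
```

A caller would declare a struct bpf_net_context on its stack, keep the return value of bpf_net_ctx_set() and pass it to bpf_net_ctx_clear() on the way out. A nested caller gets NULL back, so its clear does nothing and the outer context stays in place until the outermost user is done.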