Date: Fri, 14 Jul 2023 11:47:41 +0200
From: Florian Westphal
To: Daniel Xu
Cc: Alexei Starovoitov, Andrii Nakryiko, Alexei Starovoitov, Florian Westphal,
 "David S. Miller", Pablo Neira Ayuso, Paolo Abeni, Daniel Borkmann,
 Eric Dumazet, Jakub Kicinski, Jozsef Kadlecsik, Martin KaFai Lau, Song Liu,
 Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
 Jiri Olsa, bpf, LKML, netfilter-devel, coreteam@netfilter.org,
 Network Development, David Ahern
Subject: Re: [PATCH bpf-next v4 2/6] netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
Message-ID: <20230714094741.GA7912@breakpoint.cc>
X-Mailing-List: netfilter-devel@vger.kernel.org

Daniel Xu wrote:
> On Thu, Jul 13, 2023 at 04:10:03PM -0700, Alexei Starovoitov wrote:
> > Why is rcu_assign_pointer() used?
> > If it's not RCU protected, what is the point of rcu_*() accessors
> > and rcu_read_lock() ?
> >
> > In general, the pattern:
> > rcu_read_lock();
> > ptr = rcu_dereference(...);
> > rcu_read_unlock();
> > ptr->..
> > is a bug. 100%.

FWIW, I agree with Alexei, it does look... dodgy.

> The reason I left it like this is b/c otherwise I think there is a race
> with module unload and taking a refcnt. For example:
>
> ptr = READ_ONCE(global_var)
>
> // ptr invalid
> try_module_get(ptr->owner)
>

Yes, I agree.

> I think the synchronize_rcu() call in
> kernel/module/main.c:free_module() protects against that race based on
> my reading.
>
> Maybe the ->enable() path can store a copy of the hook ptr in
> struct bpf_nf_link to get rid of the odd rcu_dereference()?
>
> Open to other ideas too -- would appreciate any hints.

I would suggest the following:

- Switch the ordering of patches 2 and 3.
  What is currently patch 3 would add the .owner fields only.
  Then, what is currently patch #2 would document the RCU/module-refcount
  interaction like this (omitting most error checking for brevity):

	rcu_read_lock();
	v6_hook = rcu_dereference(nf_defrag_v6_hook);
	if (!v6_hook) {
		rcu_read_unlock();
		err = request_module("nf_defrag_ipv6");
		if (err)
			return err < 0 ? err : -EINVAL;

		rcu_read_lock();
		v6_hook = rcu_dereference(nf_defrag_v6_hook);
	}

	/* Once we hold a module reference, RCU protection is no longer needed. */
	if (v6_hook && try_module_get(v6_hook->owner))
		v6_hook = rcu_pointer_handoff(v6_hook);
	else
		v6_hook = NULL;

	rcu_read_unlock();

	if (!v6_hook)
		return -ENOENT;	/* error code is illustrative */

	v6_hook->enable();

I'd store the v4/v6 hook pointer in struct bpf_nf_link. That is probably more
self-explanatory on the disable side: it makes clear that we picked up a
module reference that we still own at delete time, so no RCU is involved there.

Because the above handoff is repeated for ipv4 and ipv6, I suggest adding a
protocol-agnostic helper for it. I know you added distinct structures for
ipv4 and ipv6, but if they used the same one you could add

	static const struct nf_defrag_hook *
	get_proto_frag_hook(const struct nf_defrag_hook __rcu **ptr_global_hook,
			    const char *modulename);

and then use it like this:

	v4_hook = get_proto_frag_hook(&nf_defrag_v4_hook, "nf_defrag_ipv4");

Passing the address of the global hook (rather than its value) matters: the
second rcu_dereference() after request_module() must re-read the global to
see the hook the freshly loaded module registered. This avoids copying the
modprobe and handoff part for each protocol.

What do you think?
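A userspace model of the same control flow may help when reviewing the helper and the "keep the hook in the link" idea. This is only a sketch under stated assumptions, not kernel code: RCU and module refcounting are replaced by C11 atomics, and every identifier (fake_module, defrag_hook, fake_request_module(), fake_try_module_get(), nf_link_model, link_enable_defrag(), and so on) is hypothetical. It does show why the helper should take the address of the global hook slot: a copy passed by value would still read NULL after the module load publishes the hook. What it cannot show is the window between the lookup and try_module_get(); in the kernel, the RCU read-side critical section is what keeps the hook valid there.

```c
/* Userspace model; every name is a hypothetical stand-in. An acquire load plays
 * rcu_dereference(); fake_try_module_get()/fake_module_put() play module refs. */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

struct fake_module {
	atomic_int refcnt;
	bool live;		/* false once "unload" has started */
};

struct defrag_hook {
	struct fake_module *owner;
	int users;		/* enable() calls minus disable() calls */
};

static struct fake_module mod_v4 = { 0, true };
static struct fake_module mod_v6 = { 0, true };
static struct defrag_hook hook_v4 = { &mod_v4, 0 };
static struct defrag_hook hook_v6 = { &mod_v6, 0 };

/* Global hook slots: NULL until the owning "module" is loaded. */
static _Atomic(struct defrag_hook *) defrag_v4_hook, defrag_v6_hook;

/* Stand-in for request_module(): "loading" publishes the hook. */
static int fake_request_module(const char *name)
{
	if (!strcmp(name, "nf_defrag_ipv4"))
		atomic_store_explicit(&defrag_v4_hook, &hook_v4, memory_order_release);
	else if (!strcmp(name, "nf_defrag_ipv6"))
		atomic_store_explicit(&defrag_v6_hook, &hook_v6, memory_order_release);
	else
		return -ENOENT;
	return 0;
}

static bool fake_try_module_get(struct fake_module *m)
{
	if (!m->live)
		return false;
	atomic_fetch_add(&m->refcnt, 1);
	return true;
}

static void fake_module_put(struct fake_module *m)
{
	assert(atomic_load(&m->refcnt) > 0);
	atomic_fetch_sub(&m->refcnt, 1);
}

/* Protocol-agnostic helper. The slot is passed by address so the reload after
 * "loading" the module sees the new hook. Returns a referenced hook or NULL. */
static struct defrag_hook *
get_proto_frag_hook(_Atomic(struct defrag_hook *) *slot, const char *modulename)
{
	struct defrag_hook *hook = atomic_load_explicit(slot, memory_order_acquire);

	if (!hook) {
		if (fake_request_module(modulename))
			return NULL;
		hook = atomic_load_explicit(slot, memory_order_acquire);
	}
	/* Kernel: try_module_get() under RCU, then rcu_pointer_handoff(). */
	if (hook && fake_try_module_get(hook->owner))
		return hook;
	return NULL;
}

/* The link keeps the hook, so the disable path needs no lookup or RCU. */
struct nf_link_model {
	struct defrag_hook *defrag_hook;
};

static int link_enable_defrag(struct nf_link_model *link, bool ipv6)
{
	struct defrag_hook *hook = ipv6 ?
		get_proto_frag_hook(&defrag_v6_hook, "nf_defrag_ipv6") :
		get_proto_frag_hook(&defrag_v4_hook, "nf_defrag_ipv4");

	if (!hook)
		return -ENOENT;
	hook->users++;			/* stands in for hook->enable() */
	link->defrag_hook = hook;
	return 0;
}

static void link_disable_defrag(struct nf_link_model *link)
{
	struct defrag_hook *hook = link->defrag_hook;

	if (!hook)
		return;
	hook->users--;			/* stands in for hook->disable() */
	fake_module_put(hook->owner);	/* drop the ref taken at enable time */
	link->defrag_hook = NULL;
}
```

With both slots starting out NULL, enabling an IPv4 link "loads" nf_defrag_ipv4 and takes one reference; disabling it drops that reference straight from the pointer saved in the link, with no second lookup. If the owning module is already going away, the helper returns NULL instead of a pointer it holds no reference on.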