From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from ganesha.gnumonks.org (ganesha.gnumonks.org [213.95.27.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B648518732B for ; Wed, 11 Dec 2024 16:16:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=213.95.27.120 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733933780; cv=none; b=VWtSPjbQkyrDG4sob/YK7IzdaKG9Z0yb/HkpVKakYUPYgxcSRQOv5INmHYizloFHDMLy6kZ0yh6K4N9UVYcdV5m84cO/1970CV0SD60XV/seNHn30lrPbIfGTwbMt6haLXiPoVfkU4W8hIq9jpvhZqqSftwGZ2Qn4bmdK1TLmYY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733933780; c=relaxed/simple; bh=Qp8KKsLc0BpWvgZ0ofanG8CabL9otiGQurPseJDnJ2I=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=ELCM1HKt8yjbO7/fmOX68YunKZIvb3qFXq6qxLTMMWXAoBuLZfJpcgYepzhe40JnVDNkzXpLTxG9loRh5ecc5TKTuJ4C5djo9W/sCUj81VjWlz8M0u3CQo73RYv+j00dMa5t2vlune2UVocB/RnQgBlhMI32pgesa69z3WJ7wnM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=netfilter.org; spf=pass smtp.mailfrom=gnumonks.org; arc=none smtp.client-ip=213.95.27.120 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=netfilter.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gnumonks.org Received: from [78.30.39.247] (port=51988 helo=gnumonks.org) by ganesha.gnumonks.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tLPNk-00Dk60-05; Wed, 11 Dec 2024 17:16:14 +0100 Date: Wed, 11 Dec 2024 17:16:11 +0100 From: Pablo Neira Ayuso To: Florian Westphal Cc: netfilter-devel , syzkaller-bugs@googlegroups.com, syzbot+b26935466701e56cfdc2@syzkaller.appspotmail.com Subject: Re: [PATCH nf] netfilter: nf_tables: do not defer rule destruction via call_rcu Message-ID: References: <67478d92.050a0220.253251.0062.GAE@google.com> <20241207111459.7191-1-fw@strlen.de> <20241209220401.GA4709@breakpoint.cc> Precedence: bulk X-Mailing-List: netfilter-devel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20241209220401.GA4709@breakpoint.cc> X-Spam-Score: -1.9 (-) Hi Florian On Mon, Dec 09, 2024 at 11:04:01PM +0100, Florian Westphal wrote: > Pablo Neira Ayuso wrote: > > > One way would be to allocate nft_trans_rule objects + one nft_trans_chain > > > object, deactivate the rules + the chain and then defer the freeing to the > > > nft destroy workqueue. We'd still need to keep the synchronize_rcu path as > > > a fallback to handle -ENOMEM corner cases though. > > > > I think it can be done _without_ nft_trans objects. > > > > Since the commit mutex is held in this netdev event path: Remove this > > basechain, deactivate rules and add basechain to global list protected > > with spinlock, it invokes worker. Then, worker zaps this list > > basechains, it calls synchronize_rcu() and it destroys rules and then > > basechain. No memory allocations needed in this case? > > I don't think its possible due to netlink dumps. > > Its safe for normal commit path because list_del_rcu() gets called on > these objects (making them unreachable for new dumps), then, > synchronize_rcu is called (so all concurrent dumpers are done) and then > we free. > > It would work if you add a new list_head to the basechain, then you > could just link the basechain object to some other list and then > call nf_tables_rule_release() AFTER a synchronize_rcu because rules > can't be picked up if the owning chain is no longer reachable. > > However, I do wonder how complex it would be. I started a patch, spent several hours in a row, but I need more time to get this right. There is also Phil making the line to remove this in nf-next. > This is tricky to get right and I'm not sure this patch adds a > noticeable issue, see nft_rcv_nl_event() which has similar pattern, > i.e. synchronize_rcu() is called. I can see veth hits synchronize_rcu(), this can be called from default_device_exit_batch() which calls unregister_netdevice_many(). I need time, but it is difficult. We have to deliver a pull request very late on wednesday so it can be taken on thursday. Now the week has three days to fix and test stuff, so upstream maintainers have a day to tidy up and submit upstream. > I'm not saying we should not consider async route, but maybe better > to followup in -next? Yes, I can follow up in -next, but then Phil just remove this again in -next. I can just take this fix to address the existing situation which is worse than before. Before this patch UAF was possible, although I never manage to trigger it. Now crash is easy to trigger, I spent too much time looking at the interaction with layers and I forgot basic assumption regarding nf_tables in this patch.