From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43C73C77B70 for ; Fri, 14 Apr 2023 08:12:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229543AbjDNIMY (ORCPT ); Fri, 14 Apr 2023 04:12:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51616 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229879AbjDNIMW (ORCPT ); Fri, 14 Apr 2023 04:12:22 -0400 Received: from mail.netfilter.org (mail.netfilter.org [217.70.188.207]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 3FFAF2100 for ; Fri, 14 Apr 2023 01:12:18 -0700 (PDT) Date: Fri, 14 Apr 2023 10:12:14 +0200 From: Pablo Neira Ayuso To: Tzung-Bi Shih Cc: kadlec@netfilter.org, fw@strlen.de, netfilter-devel@vger.kernel.org, coreteam@netfilter.org, jiejiang@chromium.org, jasongustaman@chromium.org, garrick@chromium.org Subject: Re: [PATCH] netfilter: conntrack: fix wrong ct->timeout value Message-ID: References: <20230410060935.253503-1-tzungbi@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: netfilter-devel@vger.kernel.org On Fri, Apr 14, 2023 at 11:52:02AM +0800, Tzung-Bi Shih wrote: > On Thu, Apr 13, 2023 at 11:04:24AM +0200, Pablo Neira Ayuso wrote: > > On Thu, Apr 13, 2023 at 11:23:11AM +0800, Tzung-Bi Shih wrote: > > > nf_ct_expires() got called by userspace program. And the return > > > value (which means the remaining timeout) will be the parameter for > > > the next ctnetlink_change_timeout(). > > > > Unconfirmed conntrack is owned by the packet that refers to it, it is > > not yet in the hashes. I don't see how concurrent access to the > > timeout might occur. > > > > Or are you referring to a different scenario that triggers the partial > > state? > > I think it isn't a concurrent access. We observed that userspace program > reads the remaining timeout and sets it back for unconfirmed conntrack. > > Conceptually, it looks like: > timeout = nf_ct_expires(...); (1) > ctnetlink_change_timeout(...timeout); How could this possibly happen? nf_ct_expires() is called from: - ctnetlink_dump_timeout(), this is netlink dump path, since: commit 1397af5bfd7d32b0cf2adb70a78c9a9e8f11d912 Author: Florian Westphal Date: Mon Apr 11 13:01:18 2022 +0200 netfilter: conntrack: remove the percpu dying list it is not possible to do any listing of unconfirmed, and that is fine. As I said, nfnetlink_queue operates with an unconfirmed conntrack with owns exclusively, which is not in the hashes. - nf_ct_expires() calls from xt and nft matches are irrelevant: net/netfilter/nft_ct.c: *dest = jiffies_to_msecs(nf_ct_expires(ct)); net/netfilter/xt_conntrack.c: unsigned long expires = nf_ct_expires(ct) / HZ; - there is the garbage collector, but that only works with conntrack entries in the hashes: net/netfilter/nf_conntrack_core.c: expires = clamp(nf_ct_expires(tmp), GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_CLAMP); - there is also /proc interface, but that only works with conntrack entries in the hashes: net/netfilter/nf_conntrack_standalone.c: seq_printf(s, "%ld ", nf_ct_expires(ct) / HZ); > At (1), `nfct_time_stamp` is wrongly involved in the calculation[4] because > the conntrack is unconfirmed. The `ct->timeout` is an internal but not a > timestamp. I can see there are two possible states for ct->timeout, thanks for explaining this again. > As a result, ctnetlink_change_timeout() sets the wrong value to `ct->timeout`. > > [4]: https://elixir.bootlin.com/linux/v6.3-rc6/source/include/net/netfilter/nf_conntrack.h#L296 > > > > As you can see in [4], if this happens on an unconfirmed conntrack, the > > > `nfct_time_stamp` would be wrongly invoved in the calculation again. > > > That's why we take care of all `ct->timeout` accesses in v1. > > Again, that's why v1 separates all `ct->timeout` accesses. > > If the conntrack is confirmed: > - `ct->timeout` is a timestamp. > - `nfct_time_stamp` should be involved when getting/setting `ct->timeout`. > > If the conntrack is unconfirmed: > - `ct->timeout` is an interval. > - `nfct_time_stamp` shouldn't be involved. > > Only separate `ct->timeout` access in __nf_ct_set_timeout() is obviously > insufficient. I would suggest either take care of all `ct->timeout` > accesses as v1 or at least change both __nf_ct_set_timeout() and > nf_ct_expires(). it does not even make sense to use WRITE_ONCE from __nf_conntrack_confirm(), see: https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_core.c#L1260 because the unconfirmed conntrack object is owned exclusively by the packet. > In the latter case, it looks fine in our environment. However, I'm not > sure if any hidden cases we haven't seen. Maybe you have an old kernel?