Date: Mon, 28 Jan 2019 07:21:21 +0100
From: Florian Westphal
To: Niklas Hambüchen
Cc: netdev@vger.kernel.org
Subject: Re: Packets being dropped somewhere in the kernel, between iptables and packet capture layers
Message-ID: <20190128062121.fbksn2vdzpthwfkh@breakpoint.cc>
In-Reply-To: <19e1b7a2-00b2-3656-309c-0586e990007b@nh2.me>

Niklas Hambüchen wrote:
> I'm sending this to netdev@vger.kernel.org even though http://vger.kernel.org/lkml/ still suggests
> linux-net@vger.kernel.org, because the latter seems to be inactive
> since 2011 and full of spam, and I got "unresolvable address" for it.
> Perhaps somebody should update the page that recommends it.
> Nevertheless, please let me know if here is the wrong place.

This problem is known; I asked for test feedback on this patch but
never got a response:

netfilter: nf_nat: return the same reply tuple for matching CTs

It is possible that two concurrent packets originating from the same
socket of a connection-less protocol (e.g. UDP) can end up having
different IP_CT_DIR_REPLY tuples, which results in one of the packets
being dropped.

To illustrate this, consider the following simplified scenario:

1. No DNAT/SNAT/MASQUERADE rules are installed, but the nf_nat module
   is loaded.
2. Packets A and B are sent at the same time from two different threads
   via the same UDP socket, which hasn't been used before (i.e. no CT
   has been created yet). Both packets have the same
   IP_CT_DIR_ORIGINAL tuple.
3. The CT of A has been created and confirmed; afterwards
   get_unique_tuple is called for B. Because the IP_CT_DIR_REPLY tuple
   (the inverse of the IP_CT_DIR_ORIGINAL tuple) is already taken by
   A's confirmed CT (nf_nat_used_tuple finds it), get_unique_tuple
   calls UDP's unique_tuple, which returns a different IP_CT_DIR_REPLY
   tuple (usually with src port = 1024).
4. B's CT cannot get confirmed in __nf_conntrack_confirm because A's
   IP_CT_DIR_ORIGINAL tuple is found while the IP_CT_DIR_REPLY tuples
   differ, thus packet B gets dropped.

This patch modifies nf_conntrack_tuple_taken so it doesn't consider
colliding reply tuples if the IP_CT_DIR_ORIGINAL tuples are equal.

Then, at insert time, either clash resolution is possible (the new
packet has the existing/older conntrack assigned to it), or it has to
be dropped.

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 741b533148ba..07847a612adf 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1007,6 +1007,22 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
 		}
 
 		if (nf_ct_key_equal(h, tuple, zone, net)) {
+			/* If the origin tuples are identical, we can ignore
+			 * this clashing entry: they refer to the same flow.
+			 * Do not apply nat clash resolution in this case and
+			 * let nf_ct_resolve_clash() deal with this.
+			 *
+			 * This can happen with UDP in particular, e.g. when
+			 * more than one packet is sent from same socket in
+			 * different threads.
+			 *
+			 * We would now mangle our entry and would then have to
+			 * discard it at conntrack confirm time.
+			 */
+			if (nf_ct_tuple_equal(&ignored_conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
+					      &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple))
+				continue;
+
 			NF_CT_STAT_INC_ATOMIC(net, found);
 			rcu_read_unlock();
 			return 1;
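
For anyone who wants to reason about the check outside the kernel, here
is a rough userspace sketch of the scenario above. It is not kernel
code: every toy_* name is made up for illustration, the "table" is a
plain array rather than the conntrack hash, and zones, namespaces and
RCU are left out. It only models the decision "is this reply tuple
already taken?", with and without skipping entries whose original
tuple matches ours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; none of these are kernel types. */
enum toy_dir { TOY_DIR_ORIGINAL = 0, TOY_DIR_REPLY = 1, TOY_DIR_MAX = 2 };

struct toy_tuple {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

struct toy_ct {
	struct toy_tuple tuple[TOY_DIR_MAX];
};

static bool toy_tuple_equal(const struct toy_tuple *a,
			    const struct toy_tuple *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* The reply tuple is the original tuple with source and destination swapped. */
static struct toy_tuple toy_invert(const struct toy_tuple *t)
{
	struct toy_tuple r = {
		.src_ip = t->dst_ip, .dst_ip = t->src_ip,
		.src_port = t->dst_port, .dst_port = t->src_port,
	};
	return r;
}

/* Returns true if some confirmed entry other than `ignored` already uses
 * `tuple` in either direction. With skip_same_origin set, an entry whose
 * original tuple equals ours is treated as the same flow and skipped,
 * which is the behaviour the patch adds.
 */
static bool toy_tuple_taken(const struct toy_ct *table, size_t n,
			    const struct toy_tuple *tuple,
			    const struct toy_ct *ignored,
			    bool skip_same_origin)
{
	for (size_t i = 0; i < n; i++) {
		const struct toy_ct *ct = &table[i];

		if (ct == ignored)
			continue;
		if (!toy_tuple_equal(&ct->tuple[TOY_DIR_ORIGINAL], tuple) &&
		    !toy_tuple_equal(&ct->tuple[TOY_DIR_REPLY], tuple))
			continue;
		if (skip_same_origin &&
		    toy_tuple_equal(&ignored->tuple[TOY_DIR_ORIGINAL],
				    &ct->tuple[TOY_DIR_ORIGINAL]))
			continue;
		return true;
	}
	return false;
}

/* Steps 2 and 3 of the scenario: A is confirmed, B has the same original
 * tuple and asks whether its (identical) reply tuple is taken.
 */
bool toy_demo_taken(bool skip_same_origin)
{
	struct toy_tuple orig = {
		.src_ip = 0x0a000001, .dst_ip = 0x0a000002,
		.src_port = 5000, .dst_port = 53,
	};
	struct toy_ct a, b;

	a.tuple[TOY_DIR_ORIGINAL] = orig;
	a.tuple[TOY_DIR_REPLY] = toy_invert(&orig);
	b = a; /* B comes from the same socket, so the tuples are identical */

	assert(toy_tuple_equal(&a.tuple[TOY_DIR_ORIGINAL],
			       &b.tuple[TOY_DIR_ORIGINAL]));

	/* Only A is in the "table"; B is the unconfirmed entry being checked. */
	return toy_tuple_taken(&a, 1, &b.tuple[TOY_DIR_REPLY], &b,
			       skip_same_origin);
}
```

In this model, toy_demo_taken(false) reports the reply tuple as taken,
which is the point where B would get a different reply tuple and later
fail to confirm. toy_demo_taken(true) reports it as free, so B keeps
the same reply tuple and the collision is left to be handled at insert
time.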