From: Daniel Borkmann
Subject: Re: [PATCH nf-next] netfilter: conntrack: add support for flextuples
Date: Mon, 04 May 2015 15:51:37 +0200
Message-ID: <55477969.4000407@iogearbox.net>
References: <776b8819c85c83088478b933a35691133055347a.1430733932.git.daniel@iogearbox.net>
 <20150504103451.GA12200@salvia>
 <55475F13.1000304@iogearbox.net>
 <20150504130828.GA3607@salvia>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Cc: netfilter-devel@vger.kernel.org, Thomas Graf, Madhu Challa
To: Pablo Neira Ayuso
In-Reply-To: <20150504130828.GA3607@salvia>
Sender: netfilter-devel-owner@vger.kernel.org

On 05/04/2015 03:08 PM, Pablo Neira Ayuso wrote:
> On Mon, May 04, 2015 at 01:59:15PM +0200, Daniel Borkmann wrote:
>> Hi Pablo,
>>
>> On 05/04/2015 12:34 PM, Pablo Neira Ayuso wrote:
>>> On Mon, May 04, 2015 at 12:23:41PM +0200, Daniel Borkmann wrote:
>>>> This patch adds support for the possibility of doing NAT with
>>>> conflicting IP address/port tuples from multiple, isolated
>>>> tenants, represented as network namespaces and netfilter zones.
>>>> For such internal VRFs, traffic is directed to a single or shared
>>>> pool of public IP addresses/port ranges for the external/public VRF.
>>>>
>>>> Or in other words, this allows for doing NAT *between* VRFs
>>>> instead of *inside* VRFs without requiring each tenant to NAT
>>>> twice or to use its own dedicated IP address to SNAT to, also
>>>> with the side effect of not requiring to expose a unique marker
>>>> per tenant in the data center to the public.
>>>>
>>>> Simplified example scheme:
>>>>
>>>>   +--- VRF A ---+  +--- CT Zone 1 --------+
>>>>   | 10.1.1.1/8  +--+ 10.1.1.1 ESTABLISHED |
>>>>   +-------------+  +--+-------------------+
>>>>                       |
>>>>                    +--+--+
>>>>                    | L3  +-SNAT-[20.1.1.1:20000-40000]--eth0
>>>>                    +--+--+
>>>>                       |
>>>>   +-- VRF B ----+  +--- CT Zone 2 --------+
>>>>   | 10.1.1.1/8  +--+ 10.1.1.1 ESTABLISHED |
>>>>   +-------------+  +----------------------+
>>>
>>> So, it's the skb->mark that survives between the containers. I'm not
>>> sure it makes sense to keep a zone 0 from the container that performs
>>> SNAT. Instead, we can probably restore the zone based on the
>>> skb->mark. The problem is that the existing zone is u16. In nftables,
>>> Patrick already mentioned supporting casting, so we can do
>>> something like:
>>>
>>>   ct zone set (u16)meta mark
>>>
>>> So you can reserve a part of the skb->mark to map it to the zone. I'm
>>> not very convinced about this.
>>
>> Thanks for the feedback! I'm not sure yet, though, that I've fully
>> understood how the above suggestion addresses the described problem,
>> i.e. how would replies on the SNAT find the correct zone again?
>
> From the original direction, you can set the zone based on the mark:
>
>   -m mark --mark 1 -j CT --zone 1
>
> Then, from the reply direction, you can restore it:
>
>   -m conntrack --ctzone 1 -j MARK --set-mark 1
>   ...
>
> --ctzone is not supported though, it would need a new revision for the
> conntrack match.

Ok, thanks a lot, now I see what you mean. If I'm not missing
something, I see two problems with that: the first is that the zone
match would be linear, f.e. if we support 100 or more zones, we would
need to walk through the rules linearly until we find --mark 100,
right? The other issue is that from the reply direction (when the
packet comes in with the translated address), we couldn't match in
the connection tracking table on the correct zone.
The above restore rule would assume that the match itself has already
taken place and was successful, no? (That is actually why we are
direction based: --flextuple ORIGINAL|REPLY.)

>> Our issue, simplified, basically boils down to this: given are two
>> zones, both use IP address X, and both zones want to talk to IP
>> address Y in a third zone. To let those two with X talk to Y,
>> connections are being routed + SNATed from a non-unique X to a
>> unique address/port tuple [which the proposed approach solves], so
>> they can talk to Y.
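FWIW, the collision the diagram above illustrates can be made concrete
with a small userspace sketch (Python, purely illustrative, not kernel
code): two tenants originating the identical 5-tuple clobber each other
in a table keyed by the tuple alone, but coexist once the zone is part
of the lookup key, which is what per-tenant conntrack zones give us.

```python
# Illustrative sketch only -- models the lookup-key idea, not the
# actual nf_conntrack hash implementation.

from typing import NamedTuple

class Tuple5(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    proto: str

# Both tenants (VRF A in zone 1, VRF B in zone 2) originate the exact
# same 5-tuple, as in the diagram above.
flow = Tuple5("10.1.1.1", "20.2.2.2", 40000, 80, "tcp")

# Keyed by the tuple only, the second tenant's entry clobbers the first:
table_no_zone = {}
table_no_zone[flow] = "conn-A"
table_no_zone[flow] = "conn-B"
assert len(table_no_zone) == 1  # collision: only one entry survives

# Keyed by (zone, tuple), both connections coexist:
table_zoned = {}
table_zoned[(1, flow)] = "conn-A"
table_zoned[(2, flow)] = "conn-B"
assert len(table_zoned) == 2    # both tenants tracked independently
```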
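Similarly, the reply-direction point can be sketched: if the SNAT
mapping itself records which zone owns each translated source port from
the shared 20.1.1.1:20000-40000 pool, then finding the zone for a reply
is a single hash lookup rather than a linear walk over per-zone mark
rules. All names below (snat_alloc, reply_lookup, nat_map) are made up
for illustration.

```python
# Illustrative sketch only -- the NAT mapping doubles as the
# reply-direction zone lookup.

PUBLIC_IP = "20.1.1.1"
PORT_POOL = iter(range(20000, 40001))  # shared pool from the example

nat_map = {}   # (public_ip, public_port) -> (zone, original (src, sport))

def snat_alloc(zone: int, src: str, sport: int) -> int:
    """Pick a free public port and remember which zone owns it."""
    pport = next(PORT_POOL)
    nat_map[(PUBLIC_IP, pport)] = (zone, (src, sport))
    return pport

def reply_lookup(dst_ip: str, dst_port: int):
    """Reply direction: one lookup yields zone + original tuple."""
    return nat_map[(dst_ip, dst_port)]

# Both zones SNAT the same non-unique private source 10.1.1.1:40000:
pa = snat_alloc(1, "10.1.1.1", 40000)
pb = snat_alloc(2, "10.1.1.1", 40000)
assert pa != pb                             # unique public tuples
assert reply_lookup(PUBLIC_IP, pa)[0] == 1  # reply finds zone 1
assert reply_lookup(PUBLIC_IP, pb)[0] == 2  # reply finds zone 2
```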