netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eugene Crosser <crosser@average.org>
To: netdev@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org,
	Lahav Schlesinger <lschlesinger@drivenets.com>,
	David Ahern <dsahern@kernel.org>
Subject: Commit 09e856d54bda5f288ef8437a90ab2b9b3eab83d1r "vrf: Reset skb conntrack connection on VRF rcv" breaks expected netfilter behaviour
Date: Tue, 12 Oct 2021 15:28:39 +0200	[thread overview]
Message-ID: <bca5dcab-ef6b-8711-7f99-8d86e79d76eb@average.org> (raw)


[-- Attachment #1.1: Type: text/plain, Size: 4593 bytes --]

Hello all,


Commit mentioned in the subject was intended to avoid creation of stray
conntrack entries when input interface is enslaved in a VRF, and thus
prerouting conntrack hook is called twice: once in the context of the
original input interface, and once in the context of the VRF interface.
Solution was to nuke conntrack related data associated with the skb when
it enters VRF context.



However this breaks netfilter operation. Imagine a use case when
conntrack zone must be assigned based on the (original, "real") input
interface, rather than VRF interface (that can enslave multiple "real"
interfaces, that would become indistinguishable). One could create
netfilter rules similar to these:



        chain rawprerouting {

                type filter hook prerouting priority raw;

                iif realiface1 ct zone set 1 return

                iif realiface2 ct zone set 2 return

        }



This works before the mentioned commit, but not after: zone assignment
is "forgotten", and any subsequent NAT or filtering that is dependent on
the conntrack zone does not work.



There is a reproducer script at the bottom of this message that
demonstrates the difference in behaviour.



Maybe a better solution for stray conntrack entries would be to
introduce finer control in netfilter? One possible idea would be to
implement both "track" and "notrack" targets; then a working
configuration would look like this:



        chain rawprerouting {

                type filter hook prerouting priority raw;

                iif realiface1 ct zone set 1 notrack

                iif realiface2 ct zone set 2 notrack

                iif vrfmaster track

        }



so in the original input interface context, zone is assigned, but
conntrack processing itself is shortcircuited. When the packet enters
VRF context, conntracking is reenabled, so one entry is created, in the
zone assigned at an earlier stage.



This is just an idea, I don't have enough knowledge to judge how
workable is it.



For reference, this is a thread about the issue in netfilter-devel:
https://marc.info/?t=163310182600001&r=1&w=2



Thank you,



Eugene



==========

#!/bin/sh



# This script demonstrates unexpected change of nftables behaviour

# caused by commit 09e856d54bda5f28 ""vrf: Reset skb conntrack

# connection on VRF rcv"

#
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=09e856d54bda5f288ef8437a90ab2b9b3eab83d1

#

# Before the commit, it was possible to assign conntrack zone to a

# packet (or mark it for `notracking`) in the prerouting chanin, raw

# priority, based on the `iif` (interface from which the packet

# arrived).

# After the change, # if the interface is enslaved in a VRF, such

# assignment is lost. Instead, assignment based on the `iif` matching

# the VRF master interface is honored. Thus it is impossible to

# distinguish packets based on the original interface.

#

# This script demonstrates this change of behaviour: conntrack zone 1

# or 2 is assigned depending on the match with the original interface

# or the vrf master interface. It can be observed that conntrack entry

# appears in different zone in the kernel versions before and after

# the commit. Additionaly, the script produces netfilter trace files

# that can be used for debugging the issue.



IPIN=172.30.30.1

IPOUT=172.30.30.2

PFXL=30



ip li sh vein >/dev/null 2>&1 && ip li del vein

ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf

nft list table testct >/dev/null 2>&1 && nft delete table testct



ip li add vein type veth peer veout

ip li add tvrf type vrf table 9876

ip li set veout master tvrf

ip li set vein up

ip li set veout up

ip li set tvrf up

/sbin/sysctl -w net.ipv4.conf.veout.accept_local=1

/sbin/sysctl -w net.ipv4.conf.veout.rp_filter=0

ip addr add $IPIN/$PFXL dev vein

ip addr add $IPOUT/$PFXL dev veout



nft -f - <<__END__

table testct {

	chain rawpre {

		type filter hook prerouting priority raw;

		iif { veout, tvrf } meta nftrace set 1

		iif veout ct zone set 1 return

		iif tvrf ct zone set 2 return

		notrack

	}

	chain rawout {

		type filter hook output priority raw;

		notrack

	}

}

__END__



uname -rv

conntrack -F

stdbuf -o0 nft monitor trace >nftrace.`uname -r`.txt &

monpid=$!

ping -W 1 -c 1 -I vein $IPOUT

conntrack -L

sleep 1

kill -15 $monpid

wait


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

             reply	other threads:[~2021-10-12 13:28 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-12 13:28 Eugene Crosser [this message]
2021-10-13  9:22 ` Commit 09e856d54bda5f288ef8437a90ab2b9b3eab83d1r "vrf: Reset skb conntrack connection on VRF rcv" breaks expected netfilter behaviour Florian Westphal
2021-10-15 21:04   ` Florian Westphal
2021-10-16 18:51     ` David Ahern
2021-10-18 14:34       ` Florian Westphal
2021-10-18 18:14         ` David Ahern
2021-10-19 11:49           ` Florian Westphal
2021-10-19 13:21             ` Eugene Crosser
2021-10-19 14:34             ` David Ahern
2021-10-19 14:46               ` Florian Westphal
2021-10-19 21:41     ` Jakub Kicinski
2021-10-13 12:28 ` Lahav Schlesinger
2021-10-13 12:58   ` Florian Westphal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bca5dcab-ef6b-8711-7f99-8d86e79d76eb@average.org \
    --to=crosser@average.org \
    --cc=dsahern@kernel.org \
    --cc=lschlesinger@drivenets.com \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).