netdev.vger.kernel.org archive mirror
From: Eugene Crosser <crosser@average.org>
To: nicolas.dichtel@6wind.com, netdev@vger.kernel.org
Cc: "netfilter-devel@vger.kernel.org"
	<netfilter-devel@vger.kernel.org>,
	David Ahern <dsahern@kernel.org>, Florian Westphal <fw@strlen.de>,
	Pablo Neira Ayuso <pablo@netfilter.org>
Subject: Re: When routed to VRF, NF _output_ hook is run unexpectedly
Date: Tue, 24 Jun 2025 17:27:23 +0200
Message-ID: <c5909e04-35c7-4775-bd17-e17115037792@average.org>
In-Reply-To: <ed8f88e7-103a-403b-83ed-c40153e9bef0@6wind.com>


On 20/06/2025 18:20, Nicolas Dichtel wrote:

>>>> It is possible, and very useful, to implement "two-stage routing" by
>>>> installing a route that points to a VRF device:
>>>>
>>>>     ip link add vrfNNN type vrf table NNN
>>>>     ...
>>>>     ip route add xxxxx/yy dev vrfNNN
>>>>
>>>> however this causes surprising behaviour with relation to netfilter
>>>> hooks. Namely, packets taking such path traverse _output_ nftables
>>>> chain, with conntracking information reset. So, for example, even
>>>> when "notrack" has been set in the prerouting chain, conntrack entries
>>>> will still be created. Script attached below demonstrates this behaviour.
>>> You can have a look to this commit to better understand this:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c9c296adfae9
>>
>> I've seen this commit.
>> My point is that the packets are _not locally generated_ in this case,
>> so it seems wrong to pass them to the _output_ hook, doesn't it?
> They are, from the POV of the vrf. The first route sends packets to the vrf
> device, which acts like a loopback.

I see, this explains the behaviour that I observe.
I believe that there are two problems here though:

1. This behaviour is _surprising_. The packets are not really "locally
generated": they come from "outside", but are treated as if they were
locally generated. In my view, this deserves a section in
Documentation/networking/vrf.rst (see suggestion below).

2. Using the "output" hook makes it impossible(?) to define different
nftables rules depending on which VRF was used for routing (because iif
is not accessible in the "output" chain). For example, traffic from
different tenants that is routed via different VRFs but egresses over the
same uplink interface cannot be assigned different zones, so conntrack
entries of different tenants will be mixed. As another example, one
cannot disable conntracking of tenants' traffic while continuing to
track "true output" traffic from the processes running on the host.
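
To make the second point concrete, here is a minimal nftables sketch of
what one would like to express. The interface names (tenantA, tenantB),
the table name and the zone numbers are made up for illustration, and
the comments restate the behaviour described in this thread, not a
verified analysis of the kernel code:

    table inet tenant_zones {
        chain prerouting {
            type filter hook prerouting priority raw; policy accept;
            # The ingress interface is known here, so a per-tenant
            # zone can be selected (hypothetical names and zones).
            iifname "tenantA" ct zone set 1
            iifname "tenantB" ct zone set 2
        }
        chain output {
            type filter hook output priority raw; policy accept;
            # Packets re-routed through a VRF device reportedly show
            # up here with their conntrack state reset and without an
            # input interface to match on, so there is nothing left
            # to tell tenantA from tenantB.
        }
    }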

Thanks for consideration,

Eugene

========================
Suggested update to the documentation:

diff --git a/Documentation/networking/vrf.rst b/Documentation/networking/vrf.rst
index 0a9a6f968cb9..74c6a69355df 100644
--- a/Documentation/networking/vrf.rst
+++ b/Documentation/networking/vrf.rst
@@ -61,6 +61,11 @@ domain as a whole.
        the VRF device. For egress POSTROUTING and OUTPUT rules can be written
        using either the VRF device or real egress device.

+.. [3] When a packet is forwarded to a VRF interface, it gets further
+       routed according to the route table associated with the VRF, but
+       processed by the "output" netfilter hook instead of the
+       "forwarding" hook.
+
 Setup
 -----
 1. VRF device is created with an association to a FIB table.

Thread overview: 6+ messages
2025-06-20 13:38 When routed to VRF, NF _output_ hook is run unexpectedly Eugene Crosser
2025-06-20 14:56 ` Nicolas Dichtel
2025-06-20 16:04   ` Eugene Crosser
2025-06-20 16:20     ` Nicolas Dichtel
2025-06-24 15:27       ` Eugene Crosser [this message]
2025-08-06  9:00         ` Nicolas Dichtel
