nft manpage/wiki issues and improvement ideas

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Christoph Anton Mitterer <calestyo@scientia.org>
To: netfilter-devel@vger.kernel.org
Subject: nft manpage/wiki issues and improvement ideas
Date: Thu, 25 Sep 2025 02:07:56 +0200	[thread overview]
Message-ID: <6bb455009ebd3a2fe17581dfa74addc9186f33ea.camel@scientia.org> (raw)

Hey.


I recently started migrating all my iptables config to nftables (better
late than never :-P)... and along reading through all the nftables.org
wiki pages and most of the manpage I've noticed all kinds of
documentation issues or things that might be improved...

But since I'm all but an expert (I merely have to do my netfilter
config at some university science cluster), I'm not really sure whether
I could give definite answers, so... a far too long O:-) list of things
that could be visited by some expert.


1) Non-documentation issue, could however be a downstream bug:
      # nft describe icmpv6 code
      payload expression, datatype icmpv6_code (icmpv6 code) (basetype integer), 8 bits
      # nft describe icmp code
      payload expression, datatype icmp_code (icmp code) (basetype integer), 8 bits
   
   produce no (code) output as of at least v1.1.5.
   That still worked in older versions.


In the manpage:


2) Section CHAINS
   
>    The priority parameter accepts a signed integer value or a standard
>    priority name which specifies the order in which chains with the same
>    hook value are traversed.
   
   IMO it would be helpful if something like "the same hook REGARDLESS OF
   THEIR TABLE" would be added.
   Maybe even elaborating a bit more that tables, AFAIU, aren't really
   seen by netfilter at all and have no impact on any processing.

>    The ordering is ascending, i.e. lower priority values have precedence
>    over higher ones.
      
   A bit ambiguous, IMO it would be better to say "chains with lower
   priority values are processed first".
   "Precedence" could be easily interpreted as "the verdict of such chains
   winning", but AFAIU that's only the case if the verdict is drop, not if
   accept.


3) Section VERDICT STATEMENT
>    accept and drop are absolute verdicts — they terminate ruleset
>    evaluation immediately.
   
   and
   
> accept
> Terminate ruleset evaluation and accept the packet. The packet can
> still be dropped later by another hook, for instance accept in the
> forward hook still allows one to drop the packet later in the
> postrouting hook, or another forward base chain that has a higher
> priority number and is evaluated afterwards in the processing
> pipeline.
   
   Seem contradicting and misleading.
   
   "ruleset" is previously used as the whole set of all rules in all
   chains + all set definitions.
   The first paragraph says they'd end all evaluation of that immediately.
   The 2nd says... no no.. other hooks can still change.
   
   What I think the first paragraph wants to say is:
   
   accept and deny terminate *even* the evaluation of the rule like in:
       ip daddr 1.1.1.1 drop counter
   counter wouldn't be executed (though many examples seem to use
   comment after the verdict... not sure about that).
   
   It also doesn't explain whether reject is also behaving like drop wrt
   evaluation (so one must assume at that point: no), like in:
       ip daddr 1.1.1.1 reject counter
   
   
   And with respect to how chain processing is affected by the verdicts
   (AFAIU):
   - drop, regardless in which chain, as soon as it is encountered will
     truly drop the packet.
     No later chain (be it at the same hook with a higher priority, or
     at another hook) can change that.
     There is also no returning from regular chains back to their callers.
   - accept, merely accepts the packet with respect to the current
     call stack of chains.
     Another base chain (or regular called from that) at the same hook
     but of higher priority OR at another hook could still
     drop(/reject?) (but not accept) it.
   
   The 2nd paragraph rather confusingly (why mentioning the forward
   hook?!) explains the one case... but that even a chain of the SAME hook
   but with higher priority could still turn the accept to drop... is only
   with much phantasy in that text.
   
   - Again, no word about whether reject works here like drop.
     I think it does, i.e. the reject of a chain would override another
     chain's allow

   The description of drop does a better job.

> jump CHAIN
> Continue evaluation at the first rule in CHAIN. The current position
> in the ruleset is pushed to a call stack and evaluation will continue
> there when the new chain is entirely evaluated or a return verdict is
> issued. In case an absolute verdict is issued by a rule in the chain,
> ruleset evaluation terminates immediately and the specific action is
> taken.
   
   I don't think it makes sense for documentation to tell about pushing to
   call stack.
   A mere: at the end of the chain, or if a return verdict is found,
   processing resumes right after the rule which caused the jump.
   ?
   
   Again the wording that an absolute verdict terminates the (whole)
   ruleset evaluation is IMO misleading. Only a a drop(/reject?) would do
   so. An accept however would only end the evaluation of the call stack
   of chains from the current base chain.
   Not that of other base chains at the same hook with higher prio, or
   that of other hooks.
   
> goto CHAIN
> Similar to jump, but the current position is not pushed to the call
> stack, meaning that after the new chain evaluation will continue at
> the last chain instead of the one containing the goto statement.
         
   Maybe I misunderstood something, but that seems wrong.
   
   AFAIU
   (https://wiki.nftables.org/wiki-nftables/index.php/Jumping_to_chain see
   jump vs goto), goto does *not* return, but simply uses the policy of
   the base chain (not of the regular chain, which has no policy).


4) Neither the wiki nor the manpage seems to have a section which
   briefly describes how tables/chains/rules are actually processed.
   It's all rather widely dispersed over many pages/sections and
   difficult to grasp, especially since some documentation seems plain
   wrong and misleading.

AFAIU it works as follows:

- technically (in the sense how the actual evaluation is done) tables
  don't matter at all
- packets traverse the network stack and at various hooks they're
  evaluated by the chains attached to that hook
  and even after netfilter they might still get reject (e.g. by things
  like rp_filter, or when icmp.c simply discards certain ICMP types
- a drop/reject verdict (including a drop that results from chain
  policy) actually drops the package and stops any further evaluation
  of:
  - the current chains
    if a regular, also the ones up to the base chain that called it
  - of other chains (in particular of higher priority) at the same hook
    (regardless of their table)
  - of other chains (of any priority) at other (in particular: later)
    hooks
    (regardless of their table)
  => Thus if any base-chain uses drop as policy, this chain must either
     accept the package, or it will be (overall) dropped (as other base
    chains cannot override the drop from the policy of that chain).
- an accept verdict (including an accept that results from chain
  policy) *only* accepts the package from the current chain's point of
  view (that is: the current regular chain up to the base chain from
  which it was called or, if no regular chain, the current base chain).
  - chains (regardless from which table) of higher priority at the same
    hook as well as
  - chains (of any priority and regardless from which table) of later
    hooks
  all may still deny/reject the package, in which case it would be
  dropped/rejected as described above at drop/reject verdict
  => Thus a package is only actually accepted (from netfilter's PoV),
     if  none of the chains (regardless of their table) from all of the
     relevant hooks does anything other than accept (be it via verdict,
     policy or implicit policy default).
  => Thus the ordering of different call stacks of base chains via
     priorities, doesn't change whether a packet gets
     dropped/rejected/accepted, *unless* the package is modified or
     things like marks are set, which would change the matching of
     rules in other chains
- any terminating verdict (drop/reject/accept...TODO: also goto/jump?)
  also end evaluation of the current rule, that is:
     ip daddr 1.1.1.1 accept counter
  causes counter to be ignored other than in:
     ip daddr 1.1.1.1 counter accept
  TODO: also the case with comment?
- jump C
  - continues evaluation at the first rule of C
  - a accept/drop/reject verdict in C via rule causes evaluation of
the    call stack of chains to end and thus there will be no implicit
    return to the calling chain
  - a return verdict in C, causes to continue the evaluation in the
    calling chain after the rule that caused the jump
  - reaching the end of rules in C, causes an implicit return
- goto C
  - continues evaluation at the first rule of C
  - a accept/drop/reject verdict in C via rule causes evaluation of the
    call stack of chains to end and thus there will be no implicit
    return to the calling chain
  - reaching the end of rules in C, causes the policy of the original
    base chain to be used.
  TODO: What I haven't checked now, but also seems not documented:
      - Can one use return in a chain to which one got via goto and if
        so, what happens?
      - Can on jump/goto to other base chains?
        And if so, which the policy of which base-chain would be used
        when reaching the end of a regular chain one entered via goto?


5) Quite some syntax seems completely undocumented... e.g. what
   operators one can use with tcp_flag and what "," means with
   bitfields like in ct state.
   Also the syntax introduced in
   https://git.netfilter.org/nftables/commit/?id=c3d57114f119b89ec0caa0b4dfa8527826a38792


6) It doesn't seem to be documented how exactly the sorting is done
   when including files (which may be quite important).
   As far as I could see in the code, the wildcards are done via glob()
   an since setlocale() doesn't seem to be handled throughout the code,
   it seems to be the collation order of the C locale (which would of
   course break, should localisation ever be added).


In the Wiki:


7) https://wiki.nftables.org/wiki-nftables/index.php/Configuring_chains
> NOTE: If a packet is accepted and there is another chain, bearing the
> same hook type and with a later priority, then the packet will
> subsequently traverse this other chain. Hence, an accept verdict - be
> it by way of a rule or the default chain policy - isn't necessarily
> final. However, the same is not true of packets that are subjected to
> a drop verdict. Instead, drops take immediate effect, with no further
> rules or chains being evaluated. 
   
   Looks mostly right to me, but misses the point that chains of other
   later hooks can also still drop/reject the package.
   Also, misses whether or not rejects are like drops here.
   
>    In summary, packets will traverse all of the chains within the scope
>    of a given hook until they are either dropped or no more base chains
>    exist. An accept verdict is only guaranteed to be final in the case
>    that there is no later chain bearing the same type of hook as the
>    chain that the packet originally entered.
   
   In principle right, but misses the point that a later hook (and its
   chains) may still drop/reject the package.


8) https://wiki.nftables.org/wiki-nftables/index.php/Sets
   Claims that the max length of set names is 16... but I created way
   longer ones (which seemed to work... and it's really good to be able to
   :-) ).

   Also in there is chapter 2.1, which is part of chapter 2 named sets.
   Not sure why 2.1 is in there, because its main new information is $VAR,
   which is however, AFAIU, not a set.
   In particular, what the example uses with:
>    tcp dport { http, https } ip saddr $CDN accept

   is an anonymous set, not a named one (and we're still in the chapter of
   named ones).


   The really interesting thing, namely "sets referencing other sets" like
   in:
>    define CDN = {
>        $CDN_EDGE,
>        $CDN_MONITORS
>    }

   I may be wrong, but these seem to be rather mere string operations
   ultimately causing an anonymous set, right? One can e.g. also do:
>    elements={{{1.1.1.1, 1.1.2.2}, 2.2.2.2}, 3.3.3.3 }
   and it will simply remove any inner { }.
   I think this should somehow be mentioned, so that people don't think
   they could do dynamic things like { @setA, @setB}
   
   
9) Perhaps more a question to be sure:
>    A hash sign (#) begins a comment. All following characters on the
>    same line are ignored.
   Is that really meant to imply that end-of-line comments work, i.e.
   > ip dport 1.1.1.1 accept #foo bar baz
   is supported?
   
   I merely ask cause I've seen config parsers (I think it was either
   ssh_config or sshd_confg) which did work with end of line comments but
   were never intended to and it was ultimately removed.
   

10) https://wiki.nftables.org/wiki-nftables/index.php/Atomic_rule_replacement
   - Missing from the manpage.
   - What should IMO also be mentioned is, that if the new ruleset
     contains errors, than despite the ruleset flush, the old rules
     stay in place unmodified... which is quite important.

> What happens when you include 2 files which each have a statement for
> the filter table? If you have two included files both with statements
> for the filter table, but one adds a rule allowing traffic from
> 192.168.1.1 and the other allows traffic from 192.168.1.2 then both
> rules will be included in the chain, even if one or both files
> contains a flush statement.
      
      and
      
> What about flush statements in either, or neither file? If there are
> any flush commands in any included file then those will be run at the
> moment the config swap is executed, not at the moment the file is
> loaded. If you do not include a flush statement in any included file,
> you will get duplicate rules. If you do include a flush statement,
> you will not get duplicate rules and the config from *both* files
> will be included.
      
   Maybe I got something wrong, but this reads as if flush statements in
   the two different files were effectively handled like one.
   
   I tried a bit, and that doesn't seem to be the case. It rather seems as
   if flush statements would be as if they were processed when encountered
   during parsing.
   E.g.
   main.nft:
      #!/usr/sbin/nft -f
      flush ruleset
      table inet filter {
              chain input {
                      type filter hook input priority filter
                      iifname lo accept
              }
      }
      include "included.nft"
   
   
   included.nft:
      #flush ruleset
      table inet filter { 
              chain bla {
                      type filter hook input priority filter
                      ip daddr 1.1.1.1 drop
              }
      }
   
   If I load it like this, I get both chains.
   If however I uncomment the flash in included.nft, I only get the bla
   chain, i.e. input must have been flushed away.


Generally missing (to my best knowledge):

11) ct state {a,b} vs. ct state a,b
    or better said: what "," does in bitfields
    based on Florian Westphal's answer[0] I'd assume that "," in
    bitfields cause the statement to match, if any (or all) of the
    named bits are set.

    Also, from his explanation ct state {a,b} matches only if either a
    (but not b) or b (but not a) is set.
    Not sure about this, but I read the whole wiki and all generic
    parts of the manpage, and I don't think it was ever mentioned that
    matching sets work like this.
    I mean it probably doesn't make a difference for things like
    addresses, port ranges or ICMP types, where one can anyway only
    have always one value,... but for things like bitfiedls it might.


12) is <predicate> <value> generally the same as <predicate> eq <value>
    Like in:
    dport 22
    dport eq 22


13) "Teaching"
    Well, obviously one can't explain everything, but I think for some
    very common uses cases, it would be nice to give advise to users,
    e.g.:
    - If matching the loopback iface, iif, oif should always fine and
      be faster (assuming the ID of lo is guaranteed to be always 1).
      I tried to create further loopbacks or remove it, but that
      generally seems to no longer work.
      Would it be somehow possibly... and would iiftype oiftype be as
      fast as checking the number... then maybe one should suggest
      that?

      At the same time, telling people this isn't safe for their
      eth0/wlan0.
      Yes, there is some note about this in the manpage:
>       This is because internally the interface index is used. In case of
>       dynamically created interfaces, such as tun/tap or dialup interfaces
>       (ppp for example), it might be better to use iifname or oifnam
>       instead.
      But I wouldn't be surprised if may people are not experienced with
      these types of ifaces, and might simply assume their eth/wlan is fine.
      
      At least I found many wikis, blogs, which do use iif/oif for eth/wlan.
      
   - Telling that:
        ct state established,related accept
     (who doesn't have such a rule ;-) )
     is probably a bit faster than:
        ct state {established,related} accept

   Giving some performance guidelines:

   - E.g. I blindly assume that the conntrack state of the packet is
     already available and thus a check like ct state new is super
     fast, and in particular faster than doing
        tcp flags & (syn|ack|fin|rst) == syn
     which in turn may or may not (I don't know) be slower than
        tcp flags syn / syn,ack,fin,rst
     of which there are countless examples to match "new" TCP
     connections.

   - What I quite often see is that people have some base rules and
     then simple port based matches or TCP, UDP... often with a check
     whether the connection is new.
     So one get's lists of:
        ct state new tcp dport ...
        ct state new tcp dport ...
        ct state new tcp dport ...
     Even assuming ct state is fast... (when) would it be better to do e.g.:
        ct state new jump new_queue
     and only in new_queue do the tcp dport, udp dport rules?

   - Does it performance wise make any difference to do e.g.
       ct state new tcp dport 22
     or
       tcp dport 22 ct state new
     respectively some guidelines *which* matching expressions are
     super fast and which are rather slow?

   - Assume e.g. the above case, where one has *many*:
        tcp dport ... <do this>
        tcp dport ... <do that>
        udp dport ... <do this>
        udp dport ... <do that>
     What one could obviously do is.
        meta l4proto tcp jump tcp_conns
        meta l4proto udp jump udp_conns
     and handle the port matching in these regular chains.

     But this gives one basically:
     - one extra expression that checks the type (which tcp/udp
       statements would anyway already do
     - the costs of the jump (and return)

     The question is again: When is it worth it?
     

Thanks and best wishes,
Chris.


[0] https://lore.kernel.org/netfilter/aNPhP63SyX2ofE92@strlen.de/T/#m15841db7bf5bb588483fdd3576d70af7a71f5555

next             reply	other threads:[~2025-09-25  0:16 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-25  0:07 Christoph Anton Mitterer [this message]
2025-09-25  7:35 ` nft manpage/wiki issues and improvement ideas Pablo Neira Ayuso
2025-09-25 20:37   ` Christoph Anton Mitterer
2025-09-26  1:52   ` [PATCH 0/7] doc: miscellaneois improvements Christoph Anton Mitterer
2025-09-26  1:52     ` [PATCH 1/7] doc: clarify evaluation of chains Christoph Anton Mitterer
2025-09-26  1:52     ` [PATCH 2/7] doc: fix/improve documentation of verdicts Christoph Anton Mitterer
2025-09-30 10:50       ` Florian Westphal
2025-10-02 14:50         ` Christoph Anton Mitterer
2025-10-02 15:21           ` Florian Westphal
2025-10-10 23:06             ` Christoph Anton Mitterer
2025-09-26  1:52     ` [PATCH 3/7] doc: minor improvements with respect to the term “ruleset” Christoph Anton Mitterer
2025-09-26  1:52     ` [PATCH 4/7] doc: add overall description of the ruleset evaluation Christoph Anton Mitterer
2025-09-30 11:50       ` Florian Westphal
2025-10-10 23:07         ` Christoph Anton Mitterer
2025-09-26  1:52     ` [PATCH 5/7] doc: add some more documentation on bitmasks Christoph Anton Mitterer
2025-09-30 11:51       ` Florian Westphal
2025-09-30 11:53         ` Florian Westphal
2025-09-26  1:52     ` [PATCH 6/7] doc: describe include’s collation order to be that of the C locale Christoph Anton Mitterer
2025-09-26  1:52     ` [PATCH 7/7] doc: describe how values match sets Christoph Anton Mitterer
2025-09-26  2:32   ` nft manpage/wiki issues and improvement ideas Christoph Anton Mitterer
2025-10-11  0:23 ` [PATCH v2 0/7] doc: miscellaneous improvements Christoph Anton Mitterer
2025-10-11  0:23   ` [PATCH v2 1/7] doc: clarify evaluation of chains Christoph Anton Mitterer
2025-10-15 11:46     ` Florian Westphal
2025-10-11  0:23   ` [PATCH v2 2/7] doc: fix/improve documentation of verdicts Christoph Anton Mitterer
2025-10-15 11:42     ` Florian Westphal
2025-10-17  2:30       ` Christoph Anton Mitterer
2025-10-18 13:25         ` Florian Westphal
2025-10-19  0:11           ` Christoph Anton Mitterer
2025-10-11  0:23   ` [PATCH v2 3/7] doc: minor improvements with respect to the term “ruleset” Christoph Anton Mitterer
2025-10-15 11:51     ` Florian Westphal
2025-10-11  0:24   ` [PATCH v2 4/7] doc: add overall description of the ruleset evaluation Christoph Anton Mitterer
2025-10-20  9:39     ` Florian Westphal
2025-10-20 23:48       ` Christoph Anton Mitterer
2025-10-11  0:24   ` [PATCH v2 5/7] doc: add some more documentation on bitmasks Christoph Anton Mitterer
2025-10-18 13:32     ` Florian Westphal
2025-10-19  1:31       ` Christoph Anton Mitterer
2025-10-11  0:24   ` [PATCH v2 6/7] doc: describe include’s collation order to be that of the C locale Christoph Anton Mitterer
2025-10-18 13:35     ` Florian Westphal
2025-10-18 22:13       ` Christoph Anton Mitterer
2025-10-11  0:24   ` [PATCH v2 7/7] doc: describe how values match sets Christoph Anton Mitterer
2025-10-18 13:51     ` Florian Westphal
2025-10-19  1:50       ` Christoph Anton Mitterer
2025-10-19  1:38 ` [PATCH v3 0/6] doc: miscellaneous improvements Christoph Anton Mitterer
2025-10-19  1:38   ` [PATCH v3 1/6] doc: fix/improve documentation of verdicts Christoph Anton Mitterer
2025-10-20  9:28     ` Florian Westphal
2025-10-20 22:13       ` Christoph Anton Mitterer
2025-10-19  1:38   ` [PATCH v3 2/6] doc: minor improvements with respect to the term “ruleset” Christoph Anton Mitterer
2025-10-20  9:04     ` Florian Westphal
2025-10-19  1:38   ` [PATCH v3 3/6] doc: add overall description of the ruleset evaluation Christoph Anton Mitterer
2025-10-19  1:38   ` [PATCH v3 4/6] doc: add more documentation on bitmasks and sets Christoph Anton Mitterer
2025-10-20  9:06     ` Florian Westphal
2025-10-20 21:57       ` Christoph Anton Mitterer
2025-10-20 22:18         ` Florian Westphal
2025-10-20 23:51           ` Christoph Anton Mitterer
2025-10-19  1:38   ` [PATCH v3 5/6] doc: describe include’s collation order to be that of the C locale Christoph Anton Mitterer
2025-10-19  1:38   ` [PATCH v3 6/6] doc: minor improvements the `reject` statement Christoph Anton Mitterer
2025-10-20 23:49 ` [PATCH v4 0/5] doc: miscellaneous improvements Christoph Anton Mitterer
2025-10-20 23:49   ` [PATCH v4 1/5] doc: fix/improve documentation of verdicts Christoph Anton Mitterer
2025-10-20 23:49   ` [PATCH v4 2/5] doc: add overall description of the ruleset evaluation Christoph Anton Mitterer
2025-10-20 23:49   ` [PATCH v4 3/5] doc: add more documentation on bitmasks and sets Christoph Anton Mitterer
2025-10-20 23:49   ` [PATCH v4 4/5] doc: describe include’s collation order to be that of the C locale Christoph Anton Mitterer
2025-10-20 23:49   ` [PATCH v4 5/5] doc: minor improvements the `reject` statement Christoph Anton Mitterer
2025-10-22 14:34     ` Florian Westphal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6bb455009ebd3a2fe17581dfa74addc9186f33ea.camel@scientia.org \
    --to=calestyo@scientia.org \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).