Re: [libnftnl RFC] data_reg: Improve data reg value printing

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Phil Sutter <phil@nwl.cc>
To: Florian Westphal <fw@strlen.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>, netfilter-devel@vger.kernel.org
Subject: Re: [libnftnl RFC] data_reg: Improve data reg value printing
Date: Thu, 11 Sep 2025 17:57:27 +0200	[thread overview]
Message-ID: <aMLxZ6J-LHVm4KxX@orbyte.nwl.cc> (raw)
In-Reply-To: <aMLdxyxGNYsSP5c2@strlen.de>

On Thu, Sep 11, 2025 at 04:33:58PM +0200, Florian Westphal wrote:
> Phil Sutter <phil@nwl.cc> wrote:
> > The old code printing each field with data as u32 value is problematic
> > in two ways:
> > 
> > A) Field values are printed in host byte order which may not be correct
> >    and output for identical data will divert between machines of
> >    different Endianness.
> > 
> > B) The actual data length is not clearly readable from given output.
> > 
> > This patch won't entirely fix for (A) given that data may be in host
> > byte order but it solves for the common case of matching against packet
> > data.
> 
> Can you provide an example diff and a diffstat for the expected fallout in
> nftables?

Getting a diffstat is tedious: tests/py will create .got files, but
copying them to their name with that suffix dropped will remove
unproblematic payloads. OTOH dropping all stored payload files upfront
will cause a mess due to needed per-family payload records.

Here's some stats though after running tests/py in standard syntax only:

|% find . -name \*.got | wc -l
|123
|% find . -name \*.got | \
|	while read f; do diff "$f" "${f%.got}"; done | \
|	grep '^>' | \
|	wc -l
|7821

Here are some sample diffs:

| # meta length 1000
| ip test-ip4 input
|   [ meta load len => reg 1 ]
|-  [ cmp eq reg 1 0x000003e8 ]
|+  [ cmp eq reg 1 0xe8030000 ]
| 
| # meta length 22
| ip test-ip4 input
|   [ meta load len => reg 1 ]
|-  [ cmp eq reg 1 0x00000016 ]
|+  [ cmp eq reg 1 0x16000000 ]

The above is noise, data is in host byte order. Ranges are not, though:

| # meta length 33-45
| ip test-ip4 input
|   [ meta load len => reg 1 ]
|   [ byteorder reg 1 = hton(reg 1, 4, 4) ]
|-  [ range eq reg 1 0x21000000 0x2d000000 ]
|+  [ range eq reg 1 0x00000021 0x0000002d ]

Sets are "special", also due to another patch which cleans up output for
them:

| # meta length { 33, 55, 67, 88}
|-__set%d test-ip4 3
|-__set%d test-ip4 0
|-       element 00000021  : 0 [end]     element 00000037  : 0 [end]     element 00000043  : 0 [end]     element 00000058  : 0 [end]
|+       element 21000000        element 37000000        element 43000000        element 58000000 
| ip test-ip4 input
|   [ meta load len => reg 1 ]
|   [ lookup reg 1 set __set%d ]
 
Said patch drops that pointless [end] marker and drops the colon for
non-maps as well as flags if zero (otherwise prints "flags NN" to
separate them from element data). Set definitions are lost in the test
suite runner filter since d477eada4f271 ("tests/py: clean up set backend
support fallout").

But now for some better examples:

| # meta protocol ip
| ip test-ip4 input
|   [ meta load protocol => reg 1 ]
|-  [ cmp eq reg 1 0x00000008 ]
|+  [ cmp eq reg 1 0x0800 ]
|
| # meta l4proto 22
| ip test-ip4 input
|   [ meta load l4proto => reg 1 ]
|-  [ cmp eq reg 1 0x00000016 ]
|+  [ cmp eq reg 1 0x16 ]


Obviously, 'meta protocol' is two bytes and 'meta l4proto' just one.

Strings are much nicer:

| # meta iifname "dummy0"
| ip test-ip4 input
|   [ meta load iifname => reg 1 ]
|-  [ cmp eq reg 1 0x6d6d7564 0x00003079 0x00000000 0x00000000 ]
|+  [ cmp eq reg 1 0x64756d6d 0x79300000 0x00000000 0x00000000 ]

Packet payload is the actual sales pitch for this patch:

| # ip6 saddr 1234:1234:1234::
| inet test-inet input
|   [ meta load nfproto => reg 1 ]
|-  [ cmp eq reg 1 0x0000000a ]
|+  [ cmp eq reg 1 0x0a ]
|   [ payload load 16b @ network header + 8 => reg 1 ]
|-  [ cmp eq reg 1 0x34123412 0x00003412 0x00000000 0x00000000 ]
|+  [ cmp eq reg 1 0x12341234 0x12340000 0x00000000 0x00000000 ]
|
| # ip6 nexthdr 33-44
| inet test-inet input
|   [ meta load nfproto => reg 1 ]
|-  [ cmp eq reg 1 0x0000000a ]
|+  [ cmp eq reg 1 0x0a ]
|   [ payload load 1b @ network header + 6 => reg 1 ]
|-  [ range eq reg 1 0x00000021 0x0000002c ]
|+  [ range eq reg 1 0x21 0x2c ]
|
| # ip6 length != 233
| inet test-inet input
|   [ meta load nfproto => reg 1 ]
|-  [ cmp eq reg 1 0x0000000a ]
|+  [ cmp eq reg 1 0x0a ]
|   [ payload load 2b @ network header + 4 => reg 1 ]
|-  [ cmp neq reg 1 0x0000e900 ]
|+  [ cmp neq reg 1 0x00e9 ]

The combination of network byte order and correct lengths fits perfectly
and even works on Big Endian hosts. Ethernet addresses are 6B:

| # ether saddr 00:0f:54:0c:11:04 ip daddr 1.2.3.4 accept
| ip test-ip input
|   [ meta load iiftype => reg 1 ]
|-  [ cmp eq reg 1 0x00000001 ]
|+  [ cmp eq reg 1 0x0100 ]
|   [ payload load 6b @ link header + 6 => reg 1 ]
|-  [ cmp eq reg 1 0x0c540f00 0x00000411 ]
|+  [ cmp eq reg 1 0x000f540c 0x1104 ]
|   [ payload load 4b @ network header + 16 => reg 1 ]
|-  [ cmp eq reg 1 0x04030201 ]
|+  [ cmp eq reg 1 0x01020304 ]
|   [ immediate reg 0 accept ]

The value separating into 4B chunks and the 0x prefix is rather
disturbing here, maybe we should drop at least the prefix entirely (set
elements don't have it, either).


> > Fixing for (B) is crucial to see what's happening beneath the bonnet.
> > The new output will show exactly what is used e.g. by a cmp expression.
> > 
> > Signed-off-by: Phil Sutter <phil@nwl.cc>
> > ---
> > This change will affect practically all stored payload dumps in nftables
> > test suite. I have an alternative version which prints "full" reg fields
> > as before and uses the byte-by-byte printing only for the remainder (if
> > any). This would largely reduce the churn in stored payload dumps, but
> > also make this less useful.
> 
> I think that if we want it then one big code-churn commit would be
> better than multiple smaller ones.

ACK. Hence why I also decided to do some cleanup of set element debug
output - the imminent bulk change is a good occasion.

> The inability to see the width of the compare operation is bad
> for debugging so I would prefer to change it.

As mentioned, we could decide to leave full data fields or at least data
regs of size N*4 alone to reduce churn. But interpreting values becomes
even harder due to the mix of host byte order and "Big Endian".

Cheers, Phil

next prev parent reply	other threads:[~2025-09-11 15:57 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-11 14:11 [libnftnl RFC] data_reg: Improve data reg value printing Phil Sutter
2025-09-11 14:33 ` Florian Westphal
2025-09-11 15:57   ` Phil Sutter [this message]
2025-09-15 21:19 ` Pablo Neira Ayuso
2025-09-15 22:16   ` Phil Sutter
2025-09-16 22:32     ` Pablo Neira Ayuso
2025-09-16 12:08   ` Florian Westphal
2025-09-30 17:14     ` Phil Sutter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aMLxZ6J-LHVm4KxX@orbyte.nwl.cc \
    --to=phil@nwl.cc \
    --cc=fw@strlen.de \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).