netfilter-devel.vger.kernel.org archive mirror
* Netfilter queue is unable to mangle fragmented UDP6: bug?
@ 2023-10-22  4:24 Duncan Roe
  2023-10-25 12:32 ` Florian Westphal
  0 siblings, 1 reply; 4+ messages in thread
From: Duncan Roe @ 2023-10-22  4:24 UTC
  To: Netfilter Development

[-- Attachment #1: Type: text/plain, Size: 12572 bytes --]

My libnetfilter_queue application is unable to mangle UDP6 messages that have
been fragmented. The kernel only delivers the first fragment of such a message
to the application.

Is this a permanent restriction or a bug?

If it is a bug, should I be submitting this report elsewhere?

From the testing below, I have to conclude that GSO is *never* applied to UDP
messages. "Something else" in the kernel re-combines UDP4 fragments before they
are queued to my application, so they mangle OK.

In summary:
 - GSO re-combines TCP fragments before tcpdump can see them.
 - Some other kernel code re-combines UDP4 fragments before netfilter queues
   them
 - Some other different kernel code re-combines UDP6 fragments for the user
   application but after netfilter queues them
 - It's been this way for a number of years

================ Testing with GSO

 nfq6 cmd: nfq6 -t6 -t7 -t8 -t17 -t18 24
 tcpdump cmd: tcpdump -i eth1 'ether host 18:60:24:bb:02:d6 && (tcp || udp) &&
                      ! port x11'

> netcat cmds: nc -6 -q0 -u fe80::1ac0:4dff:fe04:75ba%eth0 1042 <zxc2k : nc -6 -k -l -n -p 1042 -q0 -u -v
>               nfq6 output                                   # tcpdump o/p (early fields omitted)
> packet received (id=169 hw=0x86dd hook=1, payload len 1496) # frag (0|1448) 33020 > 1042: UDP, length 2048
> Packet too short to get UDP payload                         #
>                                                             # frag (1448|608)
> -----------------------------------------------------------------------------
> netcat cmds: nc -4 -q0 -u dimstar 1042 <zxc2k : nc -4 -k -l -n -p 1042 -q0 -u -v
>               nfq6 output                                   # tcpdump o/p (early fields omitted)
>                                                             # UDP, length 2048
> packet received (id=172 hw=0x0800 hook=1, payload len 2076) # udp
> -----------------------------------------------------------------------------
> netcat cmds: nc -6 -q0 fe80::1ac0:4dff:fe04:75ba%eth0 1042 <zxc2k : nc -6 -k -l -n -p 1042 -q0 -v
>               nfq6 output                                                           # tcpdump o/p (early fields omitted, direction re-inserted)
> packet received (id=153 hw=0x86dd hook=1, payload len 80)                           # > Flags [S], seq 352061829, win 64800, options [mss 1440,sackOK,TS val 4262995036 ecr 0,nop,wscale 7], length 0
> packet received (id=154 hw=0x86dd hook=3, payload len 80, checksum not ready)       # < Flags [S.], seq 3686343792, ack 352061830, win 64260, options [mss 1440,sackOK,TS val 1966029309 ecr 4262995036,nop,wscale 7], length 0
> packet received (id=155 hw=0x86dd hook=1, payload len 72)                           # > Flags [.], ack 1, win 507, options [nop,nop,TS val 4262995036 ecr 1966029309], length 0
> GSO packet received (id=156 hw=0x86dd hook=1, payload len 2120, checksum not ready) # > Flags [P.], seq 1:2049, ack 1, win 507, options [nop,nop,TS val 4262995036 ecr 1966029309], length 2048
> packet received (id=157 hw=0x86dd hook=3, payload len 72, checksum not ready)       # > Flags [F.], seq 2049, ack 1, win 507, options [nop,nop,TS val 4262995036 ecr 1966029309], length 0
> packet received (id=158 hw=0x86dd hook=1, payload len 72)                           # < Flags [.], ack 2049, win 487, options [nop,nop,TS val 1966029309 ecr 4262995036], length 0
> packet received (id=159 hw=0x86dd hook=3, payload len 72, checksum not ready)       # < Flags [F.], seq 1, ack 2050, win 501, options [nop,nop,TS val 1966029309 ecr 4262995036], length 0
> packet received (id=160 hw=0x86dd hook=1, payload len 72)                           # > Flags [.], ack 2, win 507, options [nop,nop,TS val 4262995036 ecr 1966029309], length 0
> -----------------------------------------------------------------------------
> netcat cmds: nc -4 -q0 dimstar 1042 <zxc2k : nc -4 -k -l -n -p 1042 -q0 -v
>               nfq6 output                                                           # tcpdump o/p (early fields omitted, direction re-inserted)
> packet received (id=176 hw=0x0800 hook=1, payload len 60)                           # > Flags [S], seq 821055799, win 64240, options [mss 1460,sackOK,TS val 3739788506 ecr 0,nop,wscale 7], length 0
> packet received (id=177 hw=0x0800 hook=3, payload len 60, checksum not ready)       # < Flags [S.], seq 1085807033, ack 821055800, win 65160, options [mss 1460,sackOK,TS val 4164299250 ecr 3739788506,nop,wscale 7], length 0
> packet received (id=178 hw=0x0800 hook=1, payload len 52)                           # > Flags [.], ack 1, win 502, options [nop,nop,TS val 3739788506 ecr 4164299250], length 0
> GSO packet received (id=179 hw=0x0800 hook=1, payload len 2100, checksum not ready) # > Flags [P.], seq 1:2049, ack 1, win 502, options [nop,nop,TS val 3739788506 ecr 4164299250], length 2048
> packet received (id=180 hw=0x0800 hook=1, payload len 52)                           # > Flags [F.], seq 2049, ack 1, win 502, options [nop,nop,TS val 3739788506 ecr 4164299250], length 0
> packet received (id=181 hw=0x0800 hook=3, payload len 52, checksum not ready)       # < Flags [.], ack 2049, win 494, options [nop,nop,TS val 4164299251 ecr 3739788506], length 0
> packet received (id=182 hw=0x0800 hook=3, payload len 52, checksum not ready)       # < Flags [F.], seq 1, ack 2050, win 501, options [nop,nop,TS val 4164299251 ecr 3739788506], length 0
> packet received (id=183 hw=0x0800 hook=1, payload len 52)                           # > Flags [.], ack 2, win 502, options [nop,nop,TS val 3739788507 ecr 4164299251], length 0

================ Testing without GSO (needs v2 nfq6)

 nfq6 cmd: nfq6 -t6 -t7 -t8 -t17 -t18 -t20 24
 tcpdump cmd: (as above)

> netcat cmds: nc -6 -q0 -u fe80::1ac0:4dff:fe04:75ba%eth0 1042 <zxc2k : nc -6 -k -l -n -p 1042 -q0 -u -v
>               nfq6 output                                   # tcpdump o/p (early fields and source port omitted)
> packet received (id=1 hw=0x86dd hook=1, payload len 1496)   # frag (0|1448) > 1042: UDP, length 2048
> Packet too short to get UDP payload                         #
>                                                             # frag (1448|608)
> -----------------------------------------------------------------------------
> netcat cmds: nc -4 -q0 -u dimstar 1042 <zxc2k : nc -4 -k -l -n -p 1042 -q0 -u -v
>               nfq6 output                                   # tcpdump o/p (early fields omitted)
>                                                             # UDP, length 2048
> packet received (id=3 hw=0x0800 hook=1, payload len 2076)   # udp
> -----------------------------------------------------------------------------
> netcat cmds: nc -6 -q0 fe80::1ac0:4dff:fe04:75ba%eth0 1042 <zxc2k : nc -6 -k -l -n -p 1042 -q0 -v
>               nfq6 output                                   # tcpdump o/p (early fields omitted, direction re-inserted)
> packet received (id=47 hw=0x86dd hook=1, payload len 80)    # > Flags [S], seq 3918008965, win 64800, options [mss 1440,sackOK,TS val 925571377 ecr 0,nop,wscale 7], length 0
> packet received (id=48 hw=0x86dd hook=3, payload len 80)    # < Flags [S.], seq 2930457023, ack 3918008966, win 64260, options [mss 1440,sackOK,TS val 2923572945 ecr 925571377,nop,wscale 7], length 0
> packet received (id=49 hw=0x86dd hook=1, payload len 72)    # > Flags [.], ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 0
> packet received (id=50 hw=0x86dd hook=1, payload len 1500)  # > Flags [.], seq 1:1429, ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 1428
> packet received (id=51 hw=0x86dd hook=3, payload len 72)    # < Flags [.], ack 1429, win 501, options [nop,nop,TS val 2923572945 ecr 925571377], length 0
> packet received (id=52 hw=0x86dd hook=1, payload len 692)   # > Flags [P.], seq 1429:2049, ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 620
> packet received (id=53 hw=0x86dd hook=1, payload len 72)    # > Flags [F.], seq 2049, ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 0
> packet received (id=54 hw=0x86dd hook=3, payload len 72)    # < Flags [.], ack 2049, win 497, options [nop,nop,TS val 2923572945 ecr 925571377], length 0
> packet received (id=55 hw=0x86dd hook=3, payload len 72)    # < Flags [F.], seq 1, ack 2050, win 501, options [nop,nop,TS val 2923572945 ecr 925571377], length 0
> packet received (id=56 hw=0x86dd hook=1, payload len 72)    # > Flags [.], ack 2, win 507, options [nop,nop,TS val 925571378 ecr 2923572945], length 0
> -----------------------------------------------------------------------------
> netcat cmds: nc -4 -q0 dimstar 1042 <zxc2k : nc -4 -k -l -n -p 1042 -q0 -v
>               nfq6 output                                   # tcpdump o/p (early fields omitted, direction re-inserted)
> packet received (id=64 hw=0x0800 hook=1, payload len 60)    # > Flags [S], seq 2388825860, win 64240, options [mss 1460,sackOK,TS val 398191667 ecr 0,nop,wscale 7], length 0
> packet received (id=65 hw=0x0800 hook=3, payload len 60)    # < Flags [S.], seq 3593988110, ack 2388825861, win 65160, options [mss 1460,sackOK,TS val 822702413 ecr 398191667,nop,wscale 7], length 0
> packet received (id=66 hw=0x0800 hook=1, payload len 52)    # > Flags [.], ack 1, win 502, options [nop,nop,TS val 398191668 ecr 822702413], length 0
> packet received (id=67 hw=0x0800 hook=1, payload len 1500)  # > Flags [.], seq 1:1449, ack 1, win 502, options [nop,nop,TS val 398191668 ecr 822702413], length 1448
> packet received (id=68 hw=0x0800 hook=3, payload len 52)    # < Flags [.], ack 1449, win 501, options [nop,nop,TS val 822702414 ecr 398191668], length 0
> packet received (id=69 hw=0x0800 hook=1, payload len 652)   # > Flags [P.], seq 1449:2049, ack 1, win 502, options [nop,nop,TS val 398191668 ecr 822702413], length 600
> packet received (id=70 hw=0x0800 hook=1, payload len 52)    # < Flags [.], ack 2049, win 501, options [nop,nop,TS val 822702414 ecr 398191668], length 0
> packet received (id=71 hw=0x0800 hook=3, payload len 52)    # > Flags [F.], seq 2049, ack 1, win 502, options [nop,nop,TS val 398191668 ecr 822702413], length 0
> packet received (id=72 hw=0x0800 hook=3, payload len 52)    # < Flags [F.], seq 1, ack 2050, win 501, options [nop,nop,TS val 822702414 ecr 398191668], length 0
> packet received (id=73 hw=0x0800 hook=1, payload len 52)    # > Flags [.], ack 2, win 502, options [nop,nop,TS val 398191668 ecr 822702414], length 0

================ Software revisions

 - Linux 6.4.7
 - netcat-openbsd-7.3_1-x86_64-1_SBo (based on Debian netcat-openbsd, which
   should also work; other netcats may not accept all options).
   Slackbuilds link:
   https://slackbuilds.org/repository/15.0/network/netcat-openbsd/
   Direct link: https://github.com/duncan-roe/netcat-openbsd
 - libnetfilter_queue: commit 1512964 (latest)
 - nfq6: v2 (from patchwork)

================ nft table (log prefix entries irrelevant for these tests)

table inet INET {
        chain FILTER_INPUT {
                type filter hook input priority filter - 1; policy accept;
                iif "lo" udp dport 1042 counter packets 0 bytes 0 log prefix "local UDP" group 0 queue flags bypass to 24
                iif "eth1" udp dport 1042 counter packets 142 bytes 1965130 log prefix "incoming UDP to" group 0 queue flags bypass to 24
                iif "eth1" udp sport 1042 counter packets 0 bytes 0 log prefix "incoming UDP fm" group 0 queue flags bypass to 24
                iif "eth1" tcp dport 1042 counter packets 330 bytes 767057 log prefix "incoming TCP to" group 0 queue flags bypass to 24
                iif "eth1" tcp sport 1042 counter packets 0 bytes 0 log prefix "incoming TCP fm" group 0 queue flags bypass to 24
                iif "lo" tcp dport 1042 counter packets 0 bytes 0 log prefix "local TCP" group 0 queue flags bypass to 24
        }

        chain FILTER_OUTPUT {
                type filter hook output priority filter - 1; policy accept;
                oif "eth1" udp dport 1042 counter packets 0 bytes 0 log prefix "outgoing UDP to" group 0 queue flags bypass to 24
                oif "eth1" tcp dport 1042 counter packets 0 bytes 0 log prefix "outgoing TCP to" group 0 queue flags bypass to 24
                oif "eth1" udp sport 1042 counter packets 7 bytes 275 log prefix "outgoing UDP fm" group 0 queue flags bypass to 24
                oif "eth1" tcp sport 1042 counter packets 263 bytes 17684 log prefix "outgoing TCP fm" group 0 queue flags bypass to 24
        }
}

================ Attachment

zxc2k.xz

[-- Attachment #2: zxc2k.xz --]
[-- Type: application/octet-stream, Size: 116 bytes --]

* Re: Netfilter queue is unable to mangle fragmented UDP6: bug?
  2023-10-22  4:24 Netfilter queue is unable to mangle fragmented UDP6: bug? Duncan Roe
@ 2023-10-25 12:32 ` Florian Westphal
  2023-10-27  2:10   ` Duncan Roe
From: Florian Westphal @ 2023-10-25 12:32 UTC
  To: Netfilter Development

Duncan Roe <duncan_roe@optusnet.com.au> wrote:
> My libnetfilter_queue application is unable to mangle UDP6 messages that have
> been fragmented. The kernel only delivers the first fragment of such a message
> to the application.
> 
> Is this a permanent restriction or a bug?

There is not enough information here to answer this question,
see below.

> messages. "Something else" in the kernel re-combines UDP4 fragments before they
> are queued to my application, so they mangle OK.

I'm not sure what you mean or what you expect to happen.

> In summary:
>  - GSO re-combines TCP fragments before tcpdump can see them.

Do you mean "segments"?  It's the other way around: with GSO/TSO, the stack
builds large superpackets, one TCP header with lots of data.

Such superpackets are split at the last possible moment,
ideally by the NIC hardware.

>  - Some other kernel code re-combines UDP4 fragments before netfilter queues
>    them
>  - Some other different kernel code re-combines UDP6 fragments for the user
>    application but after netfilter queues them
>  - It's been this way for a number of years

GSO is just the software fallback for TSO, i.e. the local stack passes a
large skb down to the driver, which will do pseudo-segmentation.  This
needs hardware that can handle scatter-gather lists, which is true for
almost all NICs.

There is some segmentation support for UDP to handle encapsulation
(tunneling) use cases, where the stack can pass a large skb and have the
hardware or the software fallback do the segmentation for us, i.e.
split according to the inner protocol and add the outer UDP encapsulation
to every packet.

> ================ Testing with GSO
> 
>  nfq6 cmd: nfq6 -t6 -t7 -t8 -t17 -t18 24
>  tcpdump cmd: tcpdump -i eth1 'ether host 18:60:24:bb:02:d6 && (tcp || udp) &&
>                       ! port x11'
> 
> > netcat cmds: nc -6 -q0 -u fe80::1ac0:4dff:fe04:75ba%eth0 1042 <zxc2k : nc -6 -k -l -n -p 1042 -q0 -u -v
> >               nfq6 output                                   # tcpdump o/p (early fields omitted)
> > packet received (id=169 hw=0x86dd hook=1, payload len 1496) # frag (0|1448) 33020 > 1042: UDP, length 2048
> > Packet too short to get UDP payload                         #
> >                                                             # frag (1448|608)

You are sending a large UDP packet via IPv6; it doesn't fit the device MTU,
so fragmentation is needed.  This has nothing to do with GSO.

> > packet received (id=176 hw=0x0800 hook=1, payload len 60)                           # > Flags [S], seq 821055799, win 64240, options [mss 1460,sackOK,TS val 3739788506 ecr 0,nop,wscale 7], length 0
> > packet received (id=177 hw=0x0800 hook=3, payload len 60, checksum not ready)       # < Flags [S.], seq 1085807033, ack 821055800, win 65160, options [mss 1460,sackOK,TS val 4164299250 ecr 3739788506,nop,wscale 7], length 0
> > packet received (id=178 hw=0x0800 hook=1, payload len 52)                           # > Flags [.], ack 1, win 502, options [nop,nop,TS val 3739788506 ecr 4164299250], length 0
> > GSO packet received (id=179 hw=0x0800 hook=1, payload len 2100, checksum not ready) # > Flags [P.], seq 1:2049, ack 1, win 502, options [nop,nop,TS val 3739788506 ecr 4164299250], length 2048

The stack built a larger packet; the device or the software fallback will
segment it as needed.

> ================ Testing without GSO (needs v2 nfq6)
> 
>  nfq6 cmd: nfq6 -t6 -t7 -t8 -t17 -t18 -t20 24
>  tcpdump cmd: (as above)
> 
> > netcat cmds: nc -6 -q0 -u fe80::1ac0:4dff:fe04:75ba%eth0 1042 <zxc2k : nc -6 -k -l -n -p 1042 -q0 -u -v
> >               nfq6 output                                   # tcpdump o/p (early fields and source port omitted)
> > packet received (id=1 hw=0x86dd hook=1, payload len 1496)   # frag (0|1448) > 1042: UDP, length 2048
> > Packet too short to get UDP payload                         #
> >                                                             # frag (1448|608)
> > -----------------------------------------------------------------------------
> > netcat cmds: nc -4 -q0 -u dimstar 1042 <zxc2k : nc -4 -k -l -n -p 1042 -q0 -u -v
> >               nfq6 output                                   # tcpdump o/p (early fields omitted)
> >                                                             # UDP, length 2048
> > packet received (id=3 hw=0x0800 hook=1, payload len 2076)   # udp
> > -----------------------------------------------------------------------------

It would help if you could explain what is wrong here.

You also removed the tcpdump info; I suspect it showed "flags [+]"
with two fragments for UDP/IPv4 too?

Fragment handling depends on a lot of factors, such as whether IP defrag
is enabled, where queueing happens (hook and priority), and whether
userspace does MTU probing (like 'ping6 -M do').

It depends on the NIC driver too.

For incoming data it also depends on sysctl settings and on whether
GRO/LRO is enabled.

> > packet received (id=49 hw=0x86dd hook=1, payload len 72)    # > Flags [.], ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 0
> > packet received (id=50 hw=0x86dd hook=1, payload len 1500)  # > Flags [.], seq 1:1429, ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 1428
> > packet received (id=51 hw=0x86dd hook=3, payload len 72)    # < Flags [.], ack 1429, win 501, options [nop,nop,TS val 2923572945 ecr 925571377], length 0
> > packet received (id=52 hw=0x86dd hook=1, payload len 692)   # > Flags [P.], seq 1429:2049, ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 620

The kernel does software segmentation here, which is slow.

* Re: Netfilter queue is unable to mangle fragmented UDP6: bug?
  2023-10-25 12:32 ` Florian Westphal
@ 2023-10-27  2:10   ` Duncan Roe
  2023-10-27 10:41     ` Florian Westphal
From: Duncan Roe @ 2023-10-27  2:10 UTC
  To: Florian Westphal; +Cc: Netfilter Development

Hi Florian,

Thank you for your detailed reply. Responses below:

On Wed, Oct 25, 2023 at 02:32:32PM +0200, Florian Westphal wrote:
> Duncan Roe <duncan_roe@optusnet.com.au> wrote:
> > My libnetfilter_queue application is unable to mangle UDP6 messages that have
> > been fragmented. The kernel only delivers the first fragment of such a message
> > to the application.
> >
> > Is this a permanent restriction or a bug?
>
> There is not enough information here to answer this question,
> see below.
>
> > messages. "Something else" in the kernel re-combines UDP4 fragments before they
> > are queued to my application, so they mangle OK.
>
> I'm not sure what you mean or what you expect to happen.

I expected the netfilter program to see the full UDP datagram as sent. With
UDP/IPv4 it does see the full datagram, but not with UDP/IPv6.
>
> > In summary:
> >  - GSO re-combines TCP fragments before tcpdump can see them.
>
> Do you mean "segments"?  It's the other way around, with GSO/TSO, stack
> builds large superpackets, one tcp header with lots of data.

Sorry for the confusion here. I only meant to say that there is no problem with
TCP.

In other words, the kernel delivers to the filter program exactly what was in
the buffer when the remote application did a write(2) (for buffer sizes up to
just under 64KB).

I don't know what GSO is, only that it's strongly recommended to use it.
>
> Such superpackets are split at the last possible moment;
> ideally by NIC/hardware.
>
> >  - Some other kernel code re-combines UDP4 fragments before netfilter queues
> >    them
> >  - Some other different kernel code re-combines UDP6 fragments for the user
> >    application but after netfilter queues them
> >  - It's been this way for a number of years
>
> GSO is just the software fallback of TSO, i.e. local stack passes
> large skb down to the driver which will do pseudo segmentation,
> this needs hardware that can handle scatterlist, which is true for
> almost all nics.
>
> There is some segmentation support for UDP to handle encapsulation
> (tunneling) use cases, where stack can pass large skb and then can
> have hardware or software fallback do the segmentation for us, i.e.
> split according to inner protocol and add the outer udp encapsulation
> to all packets.
>
> > ================ Testing with GSO
> >
> >  nfq6 cmd: nfq6 -t6 -t7 -t8 -t17 -t18 24
> >  tcpdump cmd: tcpdump -i eth1 'ether host 18:60:24:bb:02:d6 && (tcp || udp) &&
> >                       ! port x11'
> >
> > > netcat cmds: nc -6 -q0 -u fe80::1ac0:4dff:fe04:75ba%eth0 1042 <zxc2k : nc -6 -k -l -n -p 1042 -q0 -u -v
> > >               nfq6 output                                   # tcpdump o/p (early fields omitted)
> > > packet received (id=169 hw=0x86dd hook=1, payload len 1496) # frag (0|1448) 33020 > 1042: UDP, length 2048
> > > Packet too short to get UDP payload                         #
> > >                                                             # frag (1448|608)
>
> You are sending a large udp packet via ipv6, it doesn't fit the device mtu,
> fragmentation is needed.  This has nothing to do with GSO.

OK, I was under the misapprehension that GSO was a layer 3 thing (working at
IP level).
>
> > > packet received (id=176 hw=0x0800 hook=1, payload len 60)                           # > Flags [S], seq 821055799, win 64240, options [mss 1460,sackOK,TS val 3739788506 ecr 0,nop,wscale 7], length 0
> > > packet received (id=177 hw=0x0800 hook=3, payload len 60, checksum not ready)       # < Flags [S.], seq 1085807033, ack 821055800, win 65160, options [mss 1460,sackOK,TS val 4164299250 ecr 3739788506,nop,wscale 7], length 0
> > > packet received (id=178 hw=0x0800 hook=1, payload len 52)                           # > Flags [.], ack 1, win 502, options [nop,nop,TS val 3739788506 ecr 4164299250], length 0
> > > GSO packet received (id=179 hw=0x0800 hook=1, payload len 2100, checksum not ready) # > Flags [P.], seq 1:2049, ack 1, win 502, options [nop,nop,TS val 3739788506 ecr 4164299250], length 2048
>
> Stack built a larger packet, device or software fallback will segment
> them as needed.
>
> > ================ Testing without GSO (needs v2 nfq6)
> >
> >  nfq6 cmd: nfq6 -t6 -t7 -t8 -t17 -t18 -t20 24
> >  tcpdump cmd: (as above)
> >
> > > netcat cmds: nc -6 -q0 -u fe80::1ac0:4dff:fe04:75ba%eth0 1042 <zxc2k : nc -6 -k -l -n -p 1042 -q0 -u -v
> > >               nfq6 output                                   # tcpdump o/p (early fields and source port omitted)
> > > packet received (id=1 hw=0x86dd hook=1, payload len 1496)   # frag (0|1448) > 1042: UDP, length 2048
> > > Packet too short to get UDP payload                         #
> > >                                                             # frag (1448|608)
> > > -----------------------------------------------------------------------------
> > > netcat cmds: nc -4 -q0 -u dimstar 1042 <zxc2k : nc -4 -k -l -n -p 1042 -q0 -u -v
> > >               nfq6 output                                   # tcpdump o/p (early fields omitted)
> > >                                                             # UDP, length 2048
> > > packet received (id=3 hw=0x0800 hook=1, payload len 2076)   # udp
> > > -----------------------------------------------------------------------------
>
> It would help if you could explain what is wrong here.

This example shows a UDP4 2KB datagram being successfully mangled and a UDP6 2KB
datagram failing to be mangled.
>
> You also removed tcpdump info, I suspect it was "flags [+]"
> with two fragments for udp:ipv4 too?

There are 2 fragments for both IPv4 and IPv6.

tcpdump does not report any flags:

> 08:16:09.713395 IP6 fe80::1a60:24ff:febb:2d6 > fe80::1ac0:4dff:fe04:75ba: frag (0|1448) 47843 > 1042: UDP, length 2048
> 08:16:09.713395 IP6 fe80::1a60:24ff:febb:2d6 > fe80::1ac0:4dff:fe04:75ba: frag (1448|608)
> 08:17:22.924883 IP smallstar.local.net.55288 > dimstar.local.net.1042: UDP, length 2048
> 08:17:22.924883 IP smallstar.local.net > dimstar.local.net: udp

> Frag handling depends on a lot of factors, such as ip defrag being
> enabled or not, where queueing happens (hook and prio), if userspace
> does mtu probing (like 'ping6 -M do') or not.
>
> And the NIC driver too.
>
> For incoming data it also depends on sysctl settings and if
> GRO/LRO is enabled.
>
> > > packet received (id=49 hw=0x86dd hook=1, payload len 72)    # > Flags [.], ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 0
> > > packet received (id=50 hw=0x86dd hook=1, payload len 1500)  # > Flags [.], seq 1:1429, ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 1428
> > > packet received (id=51 hw=0x86dd hook=3, payload len 72)    # < Flags [.], ack 1429, win 501, options [nop,nop,TS val 2923572945 ecr 925571377], length 0
> > > packet received (id=52 hw=0x86dd hook=1, payload len 692)   # > Flags [P.], seq 1429:2049, ack 1, win 507, options [nop,nop,TS val 925571377 ecr 2923572945], length 620
>
> Kernel does software segmentation here, this is slow.

Sure, that was just a test.

---

Florian, please say if you would like more explanation. Thank you again for
looking at this.

Cheers ... Duncan.

* Re: Netfilter queue is unable to mangle fragmented UDP6: bug?
  2023-10-27  2:10   ` Duncan Roe
@ 2023-10-27 10:41     ` Florian Westphal
From: Florian Westphal @ 2023-10-27 10:41 UTC
  To: Florian Westphal, Netfilter Development

Duncan Roe <duncan_roe@optusnet.com.au> wrote:
> I expected the netfilter program to see the full UDP datagram as sent. With
> UDP/IPv4 it does see the full datagram, but not with UDP/IPv6.

For INPUT, the IPv4 stack defragments before the INPUT hooks are called.

From the IPv6 point of view, the next header protocol value isn't
relevant at that stage, so it doesn't matter whether that's IPPROTO_TCP,
IPPROTO_UDP or, in this case, IPPROTO_FRAGMENT.

The INPUT hook runs on the packets as they arrive; then the packets are
delivered to the next handler, i.e. the fragment collection done by the
IPPROTO_FRAGMENT handlers happens AFTER the INPUT hook.

To get the behaviour you want, you need to enable netfilter IPv6 defrag.

There is currently no way to do this standalone; you will need to add
a dummy tproxy or conntrack rule (the latter will enable conntrack too,
which might not be what you want).

Alternatively, modify your ruleset to also queue fragments to userspace and
do IPv6 defrag yourself in the nfqueue application.

> > Do you mean "segments"?  It's the other way around, with GSO/TSO, stack
> > builds large superpackets, one tcp header with lots of data.
> 
> Sorry for the confusion here. I only meant to say that there is no problem with
> TCP.

Yes, because no ipv6 fragmentation takes place.

> IOW kernel delivers to filter program exactly what was in the buffer when the
> remote application did a write(2) (for buffer size up to just under 64KB).

Not really; it depends on the protocols involved and the network.  Think,
e.g., of a traffic policer that enforces some rate limit.

> I don't know what GSO is, only that it's strongly recommended to use it.

https://en.wikipedia.org/wiki/TCP_offload_engine

But if you are talking about the F_GSO flag for nfqueue: it does NOT
enable GSO, on the contrary.  It tells the kernel "this program
can handle large packets with 'bogus' (to-be-filled-in-by-hardware)
checksums".

Without the flag, TCP packets need to be split in software and their
checksums need to be computed too (i.e. all the data needs to be read).

> This example shows a UDP4 2KB datagram being successfully mangled and a UDP6 2KB
> datagram failing to be mangled.
> >
> > You also removed tcpdump info, I suspect it was "flags [+]"
> > with two fragments for udp:ipv4 too?
> 
> There are 2 fragments for both IPv4 and IPv6.
> 
> tcpdump does not report any flags:
> 
> > 08:16:09.713395 IP6 fe80::1a60:24ff:febb:2d6 > fe80::1ac0:4dff:fe04:75ba: frag (0|1448) 47843 > 1042: UDP, length 2048
> > 08:16:09.713395 IP6 fe80::1a60:24ff:febb:2d6 > fe80::1ac0:4dff:fe04:75ba: frag (1448|608)
> > 08:17:22.924883 IP smallstar.local.net.55288 > dimstar.local.net.1042: UDP, length 2048
> > 08:17:22.924883 IP smallstar.local.net > dimstar.local.net: udp

Forgot to mention: in future, when debugging problems, please use
-vvvv (as many v's as needed); otherwise tcpdump elides a lot of
information.

Thread overview: 4+ messages
2023-10-22  4:24 Netfilter queue is unable to mangle fragmented UDP6: bug? Duncan Roe
2023-10-25 12:32 ` Florian Westphal
2023-10-27  2:10   ` Duncan Roe
2023-10-27 10:41     ` Florian Westphal
