public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 00/14] Netfilter fixes for net
@ 2022-08-24 22:03 Pablo Neira Ayuso
  0 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2022-08-24 22:03 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet

Hi,

The following patchset contains Netfilter fixes for net. All fixes
included in this batch address problems appearing in several releases:

1) Fix crash with malformed ebtables blob which do not provide all
   entry points, from Florian Westphal.

2) Fix possible TCP connection clogging up with default 5-days
   timeout in conntrack, from Florian.

3) Fix crash in nf_tables tproxy with unsupported chains, also from Florian.

4) Do not allow to update implicit chains.

5) Make table handle allocation per-netns to fix data race.

6) Do not truncated payload length and offset, and checksum offset.
   Instead report EINVAl.

7) Enable chain stats update via static key iff no error occurs.

8) Restrict osf expression to ip, ip6 and inet families.

9) Restrict tunnel expression to netdev family.

10) Fix crash when trying to bind again an already bound chain.

11) Flowtable garbage collector might leave behind pending work to
    delete entries. This patch comes with a previous preparation patch
    as dependency.

12) Allow net.netfilter.nf_conntrack_frag6_high_thresh to be lowered,
    from Eric Dumazet.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git

Thanks.

----------------------------------------------------------------

The following changes since commit 855a28f9c96c80e6cbd2d986a857235e34868064:

  net: dsa: don't dereference NULL extack in dsa_slave_changeupper() (2022-08-23 07:54:16 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git HEAD

for you to fetch changes up to 00cd7bf9f9e06769ef84d5102774c8becd6a498a:

  netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases (2022-08-24 08:06:44 +0200)

----------------------------------------------------------------
Eric Dumazet (1):
      netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases

Florian Westphal (3):
      netfilter: ebtables: reject blobs that don't provide all entry points
      netfilter: conntrack: work around exceeded receive window
      netfilter: nft_tproxy: restrict to prerouting hook

Pablo Neira Ayuso (10):
      netfilter: nf_tables: disallow updates of implicit chain
      netfilter: nf_tables: make table handle allocation per-netns friendly
      netfilter: nft_payload: report ERANGE for too long offset and length
      netfilter: nft_payload: do not truncate csum_offset and csum_type
      netfilter: nf_tables: do not leave chain stats enabled on error
      netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families
      netfilter: nft_tunnel: restrict it to netdev family
      netfilter: nf_tables: disallow binding to already bound chain
      netfilter: flowtable: add function to invoke garbage collection immediately
      netfilter: flowtable: fix stuck flows on cleanup due to pending work

 include/linux/netfilter_bridge/ebtables.h |  4 ----
 include/net/netfilter/nf_flow_table.h     |  3 +++
 include/net/netfilter/nf_tables.h         |  1 +
 net/bridge/netfilter/ebtable_broute.c     |  8 --------
 net/bridge/netfilter/ebtable_filter.c     |  8 --------
 net/bridge/netfilter/ebtable_nat.c        |  8 --------
 net/bridge/netfilter/ebtables.c           |  8 +-------
 net/ipv6/netfilter/nf_conntrack_reasm.c   |  1 -
 net/netfilter/nf_conntrack_proto_tcp.c    | 31 +++++++++++++++++++++++++++++++
 net/netfilter/nf_flow_table_core.c        | 15 ++++++++++-----
 net/netfilter/nf_flow_table_offload.c     |  8 ++++++++
 net/netfilter/nf_tables_api.c             | 14 ++++++++++----
 net/netfilter/nft_osf.c                   | 18 +++++++++++++++---
 net/netfilter/nft_payload.c               | 29 +++++++++++++++++++++--------
 net/netfilter/nft_tproxy.c                |  8 ++++++++
 net/netfilter/nft_tunnel.c                |  1 +
 16 files changed, 109 insertions(+), 56 deletions(-)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH net 00/14] Netfilter fixes for net
@ 2024-01-17 16:00 Pablo Neira Ayuso
  0 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2024-01-17 16:00 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw

Hi,

The following batch contains Netfilter fixes for net. Slightly larger
than usual because this batch includes several patches to tighten the
nf_tables control plane to reject inconsistent configuration:

1) Restrict NFTA_SET_POLICY to NFT_SET_POL_PERFORMANCE and
   NFT_SET_POL_MEMORY.

2) Bail out if a nf_tables expression registers more than 16 netlink
   attributes which is what struct nft_expr_info allows.

3) Bail out if NFT_EXPR_STATEFUL provides no .clone interface, remove
   existing fallback to memcpy() when cloning which might accidentally
   duplicate memory reference to the same object.

4) Fix br_netfilter interaction with neighbour layer. This requires
   three preparation patches:

   - Use nf_bridge_get_physinif() in nfnetlink_log
   - Use nf_bridge_info_exists() to check in br_netfilter context
     is available in nf_queue.
   - Pass net to nf_bridge_get_physindev()

   And finally, the fix which replaces physindev with physinif
   in nf_bridge_info.

   Patches from Pavel Tikhomirov.

5) Catch-all deactivation happens in the transaction, hence this
   oneliner to check for the next generation. This bug uncovered after
   the removal of the _BUSY bit, which happened in set elements back in
   summer 2023.

6) Ensure set (total) key length size and concat field length description
   is consistent, otherwise bail out.

7) Skip set element with the _DEAD flag on from the netlink dump path.
   A tests occasionally shows that dump is mismatching because GC might
   lose race to get rid of this element while a netlink dump is in
   progress.

8) Reject NFT_SET_CONCAT for field_count < 1, from Pavel Tikhomirov.

9) Use IP6_INC_STATS in ipvs to fix preemption BUG splat, patch
   from Fedor Pchelkin.

10) Fix a slow down due to synchronize_rcu() in ipset netlink interface
    with swap/destroy and kernel side add/del/test, from Jozsef Kadlecsik.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-24-01-17

Thanks.

----------------------------------------------------------------

The following changes since commit ea937f77208323d35ffe2f8d8fc81b00118bfcda:

  net: netdevsim: don't try to destroy PHC on VFs (2024-01-17 10:56:44 +0000)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-24-01-17

for you to fetch changes up to 080898f8e782734987f127c73a69ebeab7b5f5e8:

  netfilter: ipset: fix performance regression in swap operation (2024-01-17 12:02:52 +0100)

----------------------------------------------------------------
netfilter pull request 24-01-17

----------------------------------------------------------------
Fedor Pchelkin (1):
      ipvs: avoid stat macros calls from preemptible context

Jozsef Kadlecsik (1):
      netfilter: ipset: fix performance regression in swap operation

Pablo Neira Ayuso (8):
      netfilter: nf_tables: reject invalid set policy
      netfilter: nf_tables: validate .maxattr at expression registration
      netfilter: nf_tables: bail out if stateful expression provides no .clone
      netfilter: nft_limit: do not ignore unsupported flags
      netfilter: nf_tables: check if catch-all set element is active in next generation
      netfilter: nf_tables: do not allow mismatch field size and set key length
      netfilter: nf_tables: skip dead set elements in netlink dump
      netfilter: nf_tables: reject NFT_SET_CONCAT with not field length description

Pavel Tikhomirov (4):
      netfilter: nfnetlink_log: use proper helper for fetching physinif
      netfilter: nf_queue: remove excess nf_bridge variable
      netfilter: propagate net to nf_bridge_get_physindev
      netfilter: bridge: replace physindev with physinif in nf_bridge_info

 include/linux/netfilter/ipset/ip_set.h     |  2 ++
 include/linux/netfilter_bridge.h           |  6 ++--
 include/linux/skbuff.h                     |  2 +-
 net/bridge/br_netfilter_hooks.c            | 42 ++++++++++++++++++++++------
 net/bridge/br_netfilter_ipv6.c             | 14 +++++++---
 net/ipv4/netfilter/nf_reject_ipv4.c        |  9 ++++--
 net/ipv6/netfilter/nf_reject_ipv6.c        | 11 ++++++--
 net/netfilter/ipset/ip_set_core.c          | 31 +++++++++++++++------
 net/netfilter/ipset/ip_set_hash_netiface.c |  8 +++---
 net/netfilter/ipvs/ip_vs_xmit.c            |  4 +--
 net/netfilter/nf_log_syslog.c              | 13 +++++----
 net/netfilter/nf_queue.c                   |  6 ++--
 net/netfilter/nf_tables_api.c              | 44 +++++++++++++++++++++---------
 net/netfilter/nfnetlink_log.c              |  8 +++---
 net/netfilter/nft_limit.c                  | 19 ++++++++-----
 net/netfilter/xt_physdev.c                 |  2 +-
 16 files changed, 150 insertions(+), 71 deletions(-)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH net 00/14] Netfilter fixes for net
@ 2024-09-24 20:13 Pablo Neira Ayuso
  2024-09-26  9:41 ` Paolo Abeni
  0 siblings, 1 reply; 25+ messages in thread
From: Pablo Neira Ayuso @ 2024-09-24 20:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw

Hi,

The following patchset contains Netfilter fixes for net:

Patch #1 and #2 handle an esoteric scenario: Given two tasks sending UDP
packets to one another, two packets of the same flow in each direction
handled by different CPUs that result in two conntrack objects in NEW
state, where reply packet loses race. Then, patch #3 adds a testcase for
this scenario. Series from Florian Westphal.

1) NAT engine can falsely detect a port collision if it happens to pick
   up a reply packet as NEW rather than ESTABLISHED. Add extra code to
   detect this and suppress port reallocation in this case.

2) To complete the clash resolution in the reply direction, extend conntrack
   logic to detect clashing conntrack in the reply direction to existing entry.

3) Adds a test case.

Then, an assorted list of fixes follow:

4) Add a selftest for tproxy, from Antonio Ojea.

5) Guard ctnetlink_*_size() functions under
   #if defined(CONFIG_NETFILTER_NETLINK_GLUE_CT) || defined(CONFIG_NF_CONNTRACK_EVENTS)
   From Andy Shevchenko.

6) Use -m socket --transparent in iptables tproxy documentation.
   From XIE Zhibang.

7) Call kfree_rcu() when releasing flowtable hooks to address race with
   netlink dump path, from Phil Sutter.

8) Fix compilation warning in nf_reject with CONFIG_BRIDGE_NETFILTER=n.
   From Simon Horman.

9) Guard ctnetlink_label_size() under CONFIG_NF_CONNTRACK_EVENTS which
   is its only user, to address a compilation warning. From Simon Horman.

10) Use rcu-protected list iteration over basechain hooks from netlink
    dump path.

11) Fix memcg for nf_tables, use GFP_KERNEL_ACCOUNT is not complete.

12) Remove old nfqueue conntrack clash resolution. Instead trying to
    use same destination address consistently which requires double DNAT,
    use the existing clash resolution which allows clashing packets
    go through with different destination. Antonio Ojea originally
    reported an issue from the postrouting chain, I proposed a fix:
    https://lore.kernel.org/netfilter-devel/ZuwSwAqKgCB2a51-@calendula/T/
    which he reported it did not work for him.

13) Adds a selftest for patch 12.

14) Fixes ipvs.sh selftest.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-24-09-24

Thanks.

----------------------------------------------------------------

The following changes since commit 9410645520e9b820069761f3450ef6661418e279:

  Merge tag 'net-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next (2024-09-16 06:02:27 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-24-09-24

for you to fetch changes up to 69021d3bc01c72c3315ea541062351a623b72c8f:

  selftests: netfilter: Avoid hanging ipvs.sh (2024-09-19 14:54:10 +0200)

----------------------------------------------------------------
netfilter pull request 24-09-24

----------------------------------------------------------------
Andy Shevchenko (1):
      netfilter: ctnetlink: Guard possible unused functions

Antonio Ojea (1):
      selftests: netfilter: nft_tproxy.sh: add tcp tests

Florian Westphal (5):
      netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash
      netfilter: conntrack: add clash resolution for reverse collisions
      selftests: netfilter: add reverse-clash resolution test case
      netfilter: nfnetlink_queue: remove old clash resolution logic
      kselftest: add test for nfqueue induced conntrack race

Pablo Neira Ayuso (2):
      netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path
      netfilter: nf_tables: missing objects with no memcg accounting

Phil Sutter (2):
      netfilter: nf_tables: Keep deleted flowtable hooks until after RCU
      selftests: netfilter: Avoid hanging ipvs.sh

Simon Horman (2):
      netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n
      netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS

谢致邦 (XIE Zhibang) (1):
      docs: tproxy: ignore non-transparent sockets in iptables

 Documentation/networking/tproxy.rst                |   2 +-
 include/linux/netfilter.h                          |   4 -
 net/ipv4/netfilter/nf_reject_ipv4.c                |  10 +-
 net/ipv6/netfilter/nf_reject_ipv6.c                |   5 +-
 net/netfilter/nf_conntrack_core.c                  | 141 +++-----
 net/netfilter/nf_conntrack_netlink.c               |   9 +-
 net/netfilter/nf_nat_core.c                        | 121 ++++++-
 net/netfilter/nf_tables_api.c                      |   6 +-
 net/netfilter/nft_compat.c                         |   6 +-
 net/netfilter/nft_log.c                            |   2 +-
 net/netfilter/nft_meta.c                           |   2 +-
 net/netfilter/nft_numgen.c                         |   2 +-
 net/netfilter/nft_set_pipapo.c                     |  13 +-
 net/netfilter/nft_tunnel.c                         |   5 +-
 tools/testing/selftests/net/netfilter/Makefile     |   4 +
 tools/testing/selftests/net/netfilter/config       |   1 +
 .../net/netfilter/conntrack_reverse_clash.c        | 125 +++++++
 .../net/netfilter/conntrack_reverse_clash.sh       |  51 +++
 tools/testing/selftests/net/netfilter/ipvs.sh      |   2 +-
 tools/testing/selftests/net/netfilter/nft_queue.sh |  92 +++++-
 .../selftests/net/netfilter/nft_tproxy_tcp.sh      | 358 +++++++++++++++++++++
 .../selftests/net/netfilter/nft_tproxy_udp.sh      | 262 +++++++++++++++
 22 files changed, 1091 insertions(+), 132 deletions(-)
 create mode 100644 tools/testing/selftests/net/netfilter/conntrack_reverse_clash.c
 create mode 100755 tools/testing/selftests/net/netfilter/conntrack_reverse_clash.sh
 create mode 100755 tools/testing/selftests/net/netfilter/nft_tproxy_tcp.sh
 create mode 100755 tools/testing/selftests/net/netfilter/nft_tproxy_udp.sh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 00/14] Netfilter fixes for net
  2024-09-24 20:13 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
@ 2024-09-26  9:41 ` Paolo Abeni
  2024-09-26 10:37   ` Florian Westphal
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Abeni @ 2024-09-26  9:41 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel, fw; +Cc: davem, netdev, kuba, edumazet, fw

On 9/24/24 22:13, Pablo Neira Ayuso wrote:
> The following patchset contains Netfilter fixes for net:
> 
> Patch #1 and #2 handle an esoteric scenario: Given two tasks sending UDP
> packets to one another, two packets of the same flow in each direction
> handled by different CPUs that result in two conntrack objects in NEW
> state, where reply packet loses race. Then, patch #3 adds a testcase for
> this scenario. Series from Florian Westphal.

Kdoc complains against the lack of documentation for the return value in 
the first 2 patches: 'Returns' should be '@Return'.

If you could repost a new revision soon, I could possibly still include 
it in today PR (delaying the latter a bit).

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 00/14] Netfilter fixes for net
  2024-09-26  9:41 ` Paolo Abeni
@ 2024-09-26 10:37   ` Florian Westphal
  2024-09-26 10:38     ` Pablo Neira Ayuso
  2024-09-26 10:43     ` Paolo Abeni
  0 siblings, 2 replies; 25+ messages in thread
From: Florian Westphal @ 2024-09-26 10:37 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Pablo Neira Ayuso, netfilter-devel, fw, davem, netdev, kuba,
	edumazet

Paolo Abeni <pabeni@redhat.com> wrote:
> On 9/24/24 22:13, Pablo Neira Ayuso wrote:
> > The following patchset contains Netfilter fixes for net:
> > 
> > Patch #1 and #2 handle an esoteric scenario: Given two tasks sending UDP
> > packets to one another, two packets of the same flow in each direction
> > handled by different CPUs that result in two conntrack objects in NEW
> > state, where reply packet loses race. Then, patch #3 adds a testcase for
> > this scenario. Series from Florian Westphal.
> 
> Kdoc complains against the lack of documentation for the return value in the
> first 2 patches: 'Returns' should be '@Return'.

:-(

Apparently this is found via

scripts/kernel-doc -Wall -none <file>

I'll run this in the future, but, I have to say, its encouraging me
to just not write such kdocs entries in first place, no risk of making
a mistake.

Paolo, Pablo, what should I do now?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 00/14] Netfilter fixes for net
  2024-09-26 10:37   ` Florian Westphal
@ 2024-09-26 10:38     ` Pablo Neira Ayuso
  2024-09-26 10:41       ` Florian Westphal
  2024-09-26 10:43     ` Paolo Abeni
  1 sibling, 1 reply; 25+ messages in thread
From: Pablo Neira Ayuso @ 2024-09-26 10:38 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Paolo Abeni, netfilter-devel, davem, netdev, kuba, edumazet

On Thu, Sep 26, 2024 at 12:37:37PM +0200, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
> > On 9/24/24 22:13, Pablo Neira Ayuso wrote:
> > > The following patchset contains Netfilter fixes for net:
> > > 
> > > Patch #1 and #2 handle an esoteric scenario: Given two tasks sending UDP
> > > packets to one another, two packets of the same flow in each direction
> > > handled by different CPUs that result in two conntrack objects in NEW
> > > state, where reply packet loses race. Then, patch #3 adds a testcase for
> > > this scenario. Series from Florian Westphal.
> > 
> > Kdoc complains against the lack of documentation for the return value in the
> > first 2 patches: 'Returns' should be '@Return'.
> 
> :-(
> 
> Apparently this is found via
> 
> scripts/kernel-doc -Wall -none <file>
> 
> I'll run this in the future, but, I have to say, its encouraging me
> to just not write such kdocs entries in first place, no risk of making
> a mistake.
> 
> Paolo, Pablo, what should I do now?

I am going to fix it and resubmit PR.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 00/14] Netfilter fixes for net
  2024-09-26 10:38     ` Pablo Neira Ayuso
@ 2024-09-26 10:41       ` Florian Westphal
  0 siblings, 0 replies; 25+ messages in thread
From: Florian Westphal @ 2024-09-26 10:41 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Florian Westphal, Paolo Abeni, netfilter-devel, davem, netdev,
	kuba, edumazet

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Paolo, Pablo, what should I do now?
> 
> I am going to fix it and resubmit PR.

Thanks, and sorry about this.  I was not aware of how to format this
"properly".

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 00/14] Netfilter fixes for net
  2024-09-26 10:37   ` Florian Westphal
  2024-09-26 10:38     ` Pablo Neira Ayuso
@ 2024-09-26 10:43     ` Paolo Abeni
  2024-09-26 10:56       ` Pablo Neira Ayuso
  1 sibling, 1 reply; 25+ messages in thread
From: Paolo Abeni @ 2024-09-26 10:43 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, netfilter-devel, davem, netdev, kuba, edumazet

On 9/26/24 12:37, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
>> On 9/24/24 22:13, Pablo Neira Ayuso wrote:
>>> The following patchset contains Netfilter fixes for net:
>>>
>>> Patch #1 and #2 handle an esoteric scenario: Given two tasks sending UDP
>>> packets to one another, two packets of the same flow in each direction
>>> handled by different CPUs that result in two conntrack objects in NEW
>>> state, where reply packet loses race. Then, patch #3 adds a testcase for
>>> this scenario. Series from Florian Westphal.
>>
>> Kdoc complains against the lack of documentation for the return value in the
>> first 2 patches: 'Returns' should be '@Return'.
> 
> :-(
> 
> Apparently this is found via
> 
> scripts/kernel-doc -Wall -none <file>
> 
> I'll run this in the future, but, I have to say, its encouraging me
> to just not write such kdocs entries in first place, no risk of making
> a mistake.
> 
> Paolo, Pablo, what should I do now?

If an updated PR could be resent soon, say within ~1h, I can wait for 
the CI to run on it, merge and delay the net PR after that.

Otherwise, if the fixes in here are urgent, I can pull the series as-is, 
and you could follow-up on nf-next/net-next.

The last resort is just drop this from today's PR.

Please LMK your preference,

Paolo




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 00/14] Netfilter fixes for net
  2024-09-26 10:43     ` Paolo Abeni
@ 2024-09-26 10:56       ` Pablo Neira Ayuso
  0 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2024-09-26 10:56 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Florian Westphal, netfilter-devel, davem, netdev, kuba, edumazet

On Thu, Sep 26, 2024 at 12:43:23PM +0200, Paolo Abeni wrote:
> On 9/26/24 12:37, Florian Westphal wrote:
> > Paolo Abeni <pabeni@redhat.com> wrote:
> > > On 9/24/24 22:13, Pablo Neira Ayuso wrote:
> > > > The following patchset contains Netfilter fixes for net:
> > > > 
> > > > Patch #1 and #2 handle an esoteric scenario: Given two tasks sending UDP
> > > > packets to one another, two packets of the same flow in each direction
> > > > handled by different CPUs that result in two conntrack objects in NEW
> > > > state, where reply packet loses race. Then, patch #3 adds a testcase for
> > > > this scenario. Series from Florian Westphal.
> > > 
> > > Kdoc complains against the lack of documentation for the return value in the
> > > first 2 patches: 'Returns' should be '@Return'.
> > 
> > :-(
> > 
> > Apparently this is found via
> > 
> > scripts/kernel-doc -Wall -none <file>
> > 
> > I'll run this in the future, but, I have to say, its encouraging me
> > to just not write such kdocs entries in first place, no risk of making
> > a mistake.
> > 
> > Paolo, Pablo, what should I do now?
> 
> If an updated PR could be resent soon, say within ~1h, I can wait for the CI
> to run on it, merge and delay the net PR after that.
> 
> Otherwise, if the fixes in here are urgent, I can pull the series as-is, and
> you could follow-up on nf-next/net-next.
> 
> The last resort is just drop this from today's PR.
> 
> Please LMK your preference,

I am working on this now.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH net 00/14] Netfilter fixes for net
@ 2026-05-01 12:22 Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 01/14] netfilter: replace skb_try_make_writable() by skb_ensure_writable() Pablo Neira Ayuso
                   ` (13 more replies)
  0 siblings, 14 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-01 12:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

Hi,

The following batch contains Netfilter fixes for net:

1) Replace skb_try_make_writable() by skb_ensure_writable() in
   nft_fwd_netdev and the flowtable to deal with uncloned packets
   having their network header in paged fragments.

2) Drop packet if output device does not exist and ensure sufficient
   headroom in nft_fwd_netdev before transmitting the skb.

3) Use the existing dup recursion counter in nft_fwd_netdev for the
   neigh_xmit variant, from Weiming Shi.

4) Add .check_hooks interface to x_tables to detach the control plane
   hook check based on the match/target configuration. Then, update
   nft_compat to use .check_hooks from .validate path, this fixes a
   lack of hook validation for several match/targets.

5) Fix incorrect .usersize in xt_CT, from Florian Westphal.

6) Fix a memleak with netdev tables in dormant state,
   from Florian Westphal.

7) Several patches to check if the packet is a fragment, then skip
   layer 4 inspection, for x_tables and nf_tables; as well as common
   nf_socket infrastructure. The xt_hashlimit match drops fragments
   to stay consistent with the existing approach when failing to parse
   the layer 4 protocol header.

8) Ensure sufficient headroom in the flowtable before transmitting
   the skb.

9) Fix the flowtable inline vlan approach for double-tagged vlan:
   Reverse the iteration over .encap[] since it represents the
   encapsulation as seen from the ingress path. Postpone pushing
   layer 2 header so output device is available to calculate needed
   headroom. Finally, add and use nf_flow_vlan_push() to fix it.

10) Fix flowtable inline pppoe with GSO packets. Moreover, use
    FLOW_OFFLOAD_XMIT_DIRECT to fill up destination hardware
    address since neighbour cache does not exist in pppoe.

11) Use skb_pull_rcsum() to decapsulate vlan and pppoe headers, for
    double-tagged vlan in particular this should provide some benefits
    in certain scenarios.

More notes regarding 9-11):

- sashiko is also signalling to use it for IPIP headers, but that needs
  more adjustments such setting skb->protocol after removing the IPIP
  header, will follow up in a separated patch.
- I plan to submit selftests to cover double-tagged-vlan. As for pppoe,
  it should be possible but that would mandate a few userspace dependencies.
  This has been semi-automatically  tested by me and reporters describing
  broken double-vlan-tagged and pppoe currently in the flowtable.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-26-05-01

Thanks.

----------------------------------------------------------------

The following changes since commit 0c7a5ba011d336df4fcd1f667fcc16ea5549be12:

  Merge branch 'mptcp-misc-fixes-for-v7-1-rc2' (2026-04-28 18:36:29 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-26-05-01

for you to fetch changes up to baa3c65435fb3f450b262672bc06db887a92d397:

  netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header (2026-05-01 12:39:23 +0200)

----------------------------------------------------------------
netfilter pull request 26-05-01

----------------------------------------------------------------
Fernando Fernandez Mancera (3):
      netfilter: nf_socket: skip socket lookup for non-first fragments
      netfilter: nf_tables: skip L4 header parsing for non-first fragments
      netfilter: xtables: fix L4 header parsing for non-first fragments

Florian Westphal (2):
      netfilter: xt_CT: fix usersize for v1 and v2 revision
      netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables

Pablo Neira Ayuso (8):
      netfilter: replace skb_try_make_writable() by skb_ensure_writable()
      netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding
      netfilter: x_tables: add .check_hooks to matches and targets
      netfilter: nft_compat: run xt_check_hooks_{match,target}() from .validate
      netfilter: flowtable: ensure sufficient headroom in xmit path
      netfilter: flowtable: fix inline vlan encapsulation in xmit path
      netfilter: flowtable: fix inline pppoe encapsulation in xmit path
      netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header

Weiming Shi (1):
      netfilter: nft_fwd_netdev: use recursion counter in neigh egress path

 include/linux/netfilter/x_tables.h    |   8 ++
 include/net/netfilter/nf_dup_netdev.h |  13 +++
 include/net/netfilter/nf_flow_table.h |   4 +-
 net/ipv4/netfilter/nf_socket_ipv4.c   |   3 +
 net/ipv6/netfilter/nf_socket_ipv6.c   |   5 +-
 net/netfilter/nf_dup_netdev.c         |  16 ----
 net/netfilter/nf_flow_table_core.c    |   1 +
 net/netfilter/nf_flow_table_ip.c      | 151 ++++++++++++++++++++++++++--------
 net/netfilter/nf_flow_table_path.c    |   7 +-
 net/netfilter/nf_tables_api.c         |  35 ++++----
 net/netfilter/nf_tables_core.c        |   2 +-
 net/netfilter/nft_compat.c            |  45 +++++++---
 net/netfilter/nft_exthdr.c            |   2 +-
 net/netfilter/nft_fwd_netdev.c        |  29 ++++++-
 net/netfilter/nft_osf.c               |   2 +-
 net/netfilter/nft_tproxy.c            |   8 +-
 net/netfilter/x_tables.c              |  79 ++++++++++++++++--
 net/netfilter/xt_CT.c                 |   8 +-
 net/netfilter/xt_TCPMSS.c             |  33 ++++----
 net/netfilter/xt_TPROXY.c             |  11 ++-
 net/netfilter/xt_addrtype.c           |  25 ++++--
 net/netfilter/xt_devgroup.c           |  18 ++--
 net/netfilter/xt_ecn.c                |   4 +
 net/netfilter/xt_hashlimit.c          |   4 +-
 net/netfilter/xt_osf.c                |   3 +
 net/netfilter/xt_physdev.c            |  20 +++--
 net/netfilter/xt_policy.c             |  24 ++++--
 net/netfilter/xt_set.c                |  39 +++++----
 net/netfilter/xt_tcpmss.c             |   4 +
 29 files changed, 447 insertions(+), 156 deletions(-)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH net 01/14] netfilter: replace skb_try_make_writable() by skb_ensure_writable()
  2026-05-01 12:22 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
@ 2026-05-01 12:22 ` Pablo Neira Ayuso
  2026-05-01 23:50   ` patchwork-bot+netdevbpf
  2026-05-01 12:22 ` [PATCH net 02/14] netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding Pablo Neira Ayuso
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-01 12:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

skb_try_make_writable() only works on clones and uncloned packets might
have their network header in paged fragments.

nft_fwd needs to work for the ingress and egress hooks, but the egress
hook where skb->data points to the mac header, use skb_network_offset()
to include the mac header. The flowtable is fine since it already uses
the transport offset.

Fixes: d32de98ea70f ("netfilter: nft_fwd_netdev: allow to forward packets via neighbour layer")
Fixes: 7d2086871762 ("netfilter: nf_flow_table: move ipv4 offload hook code to nf_flow_table")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_flow_table_ip.c | 4 ++--
 net/netfilter/nft_fwd_netdev.c   | 5 +++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index fd56d663cb5b..dbd7644fdbeb 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -524,7 +524,7 @@ static int nf_flow_offload_forward(struct nf_flowtable_ctx *ctx,
 		return 0;
 	}
 
-	if (skb_try_make_writable(skb, thoff + ctx->hdrsize))
+	if (skb_ensure_writable(skb, thoff + ctx->hdrsize))
 		return -1;
 
 	flow_offload_refresh(flow_table, flow, false);
@@ -1037,7 +1037,7 @@ static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
 		return 0;
 	}
 
-	if (skb_try_make_writable(skb, thoff + ctx->hdrsize))
+	if (skb_ensure_writable(skb, thoff + ctx->hdrsize))
 		return -1;
 
 	flow_offload_refresh(flow_table, flow, false);
diff --git a/net/netfilter/nft_fwd_netdev.c b/net/netfilter/nft_fwd_netdev.c
index 4bce36c3a6a0..2cc809303ce8 100644
--- a/net/netfilter/nft_fwd_netdev.c
+++ b/net/netfilter/nft_fwd_netdev.c
@@ -100,6 +100,7 @@ static void nft_fwd_neigh_eval(const struct nft_expr *expr,
 	int oif = regs->data[priv->sreg_dev];
 	unsigned int verdict = NF_STOLEN;
 	struct sk_buff *skb = pkt->skb;
+	int nhoff = skb_network_offset(skb);
 	struct net_device *dev;
 	int neigh_table;
 
@@ -111,7 +112,7 @@ static void nft_fwd_neigh_eval(const struct nft_expr *expr,
 			verdict = NFT_BREAK;
 			goto out;
 		}
-		if (skb_try_make_writable(skb, sizeof(*iph))) {
+		if (skb_ensure_writable(skb, nhoff + sizeof(*iph))) {
 			verdict = NF_DROP;
 			goto out;
 		}
@@ -132,7 +133,7 @@ static void nft_fwd_neigh_eval(const struct nft_expr *expr,
 			verdict = NFT_BREAK;
 			goto out;
 		}
-		if (skb_try_make_writable(skb, sizeof(*ip6h))) {
+		if (skb_ensure_writable(skb, nhoff + sizeof(*ip6h))) {
 			verdict = NF_DROP;
 			goto out;
 		}
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 02/14] netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding
  2026-05-01 12:22 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 01/14] netfilter: replace skb_try_make_writable() by skb_ensure_writable() Pablo Neira Ayuso
@ 2026-05-01 12:22 ` Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 03/14] netfilter: nft_fwd_netdev: use recursion counter in neigh egress path Pablo Neira Ayuso
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-01 12:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

The ttl field has been decremented already and evaluation of this rule
would proceed, just drop this packet instead if there is no destination
device to forwards this packet. This is exactly what nf_dup already does
in this case.

Moreover, check for headroom and call skb_expand_head() like in the IP
output path to ensure there is sufficient headroom when forwarding this
via neigh_xmit().

Fixes: d32de98ea70f ("netfilter: nft_fwd_netdev: allow to forward packets via neighbour layer")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_fwd_netdev.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_fwd_netdev.c b/net/netfilter/nft_fwd_netdev.c
index 2cc809303ce8..605b1d42abce 100644
--- a/net/netfilter/nft_fwd_netdev.c
+++ b/net/netfilter/nft_fwd_netdev.c
@@ -102,6 +102,7 @@ static void nft_fwd_neigh_eval(const struct nft_expr *expr,
 	struct sk_buff *skb = pkt->skb;
 	int nhoff = skb_network_offset(skb);
 	struct net_device *dev;
+	unsigned int hh_len;
 	int neigh_table;
 
 	switch (priv->nfproto) {
@@ -153,8 +154,19 @@ static void nft_fwd_neigh_eval(const struct nft_expr *expr,
 	}
 
 	dev = dev_get_by_index_rcu(nft_net(pkt), oif);
-	if (dev == NULL)
-		return;
+	if (dev == NULL) {
+		verdict = NF_DROP;
+		goto out;
+	}
+
+	hh_len = LL_RESERVED_SPACE(dev);
+	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
+		skb = skb_expand_head(skb, hh_len);
+		if (!skb) {
+			verdict = NF_STOLEN;
+			goto out;
+		}
+	}
 
 	skb->dev = dev;
 	skb_clear_tstamp(skb);
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 03/14] netfilter: nft_fwd_netdev: use recursion counter in neigh egress path
  2026-05-01 12:22 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 01/14] netfilter: replace skb_try_make_writable() by skb_ensure_writable() Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 02/14] netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding Pablo Neira Ayuso
@ 2026-05-01 12:22 ` Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 04/14] netfilter: x_tables: add .check_hooks to matches and targets Pablo Neira Ayuso
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-01 12:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

From: Weiming Shi <bestswngs@gmail.com>

nft_fwd_neigh can be used in egress chains (NF_NETDEV_EGRESS). When the
forwarding rule targets the same device or two devices forward to each
other, neigh_xmit() triggers dev_queue_xmit() which re-enters
nf_hook_egress(), causing infinite recursion and stack overflow.

Move the nf_get_nf_dup_skb_recursion() accessor and NF_RECURSION_LIMIT
to the shared header nf_dup_netdev.h as a static inline, so that
nft_fwd_netdev can use the recursion counter directly without exported
function call overhead. Guard neigh_xmit() with the same recursion
limit already used in nf_do_netdev_egress().

[ Updated to cache the nf_get_nf_dup_skb_recursion pointer. --pablo ]

Fixes: f87b9464d152 ("netfilter: nft_fwd_netdev: Support egress hook")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_dup_netdev.h | 13 +++++++++++++
 net/netfilter/nf_dup_netdev.c         | 16 ----------------
 net/netfilter/nft_fwd_netdev.c        |  8 ++++++++
 3 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/include/net/netfilter/nf_dup_netdev.h b/include/net/netfilter/nf_dup_netdev.h
index b175d271aec9..609bcf422a9b 100644
--- a/include/net/netfilter/nf_dup_netdev.h
+++ b/include/net/netfilter/nf_dup_netdev.h
@@ -3,10 +3,23 @@
 #define _NF_DUP_NETDEV_H_
 
 #include <net/netfilter/nf_tables.h>
+#include <linux/netdevice.h>
+#include <linux/sched.h>
 
 void nf_dup_netdev_egress(const struct nft_pktinfo *pkt, int oif);
 void nf_fwd_netdev_egress(const struct nft_pktinfo *pkt, int oif);
 
+#define NF_RECURSION_LIMIT	2
+
+static inline u8 *nf_get_nf_dup_skb_recursion(void)
+{
+#ifndef CONFIG_PREEMPT_RT
+	return this_cpu_ptr(&softnet_data.xmit.nf_dup_skb_recursion);
+#else
+	return &current->net_xmit.nf_dup_skb_recursion;
+#endif
+}
+
 struct nft_offload_ctx;
 struct nft_flow_rule;
 
diff --git a/net/netfilter/nf_dup_netdev.c b/net/netfilter/nf_dup_netdev.c
index e348fb90b8dc..3b0a70e154cd 100644
--- a/net/netfilter/nf_dup_netdev.c
+++ b/net/netfilter/nf_dup_netdev.c
@@ -13,22 +13,6 @@
 #include <net/netfilter/nf_tables_offload.h>
 #include <net/netfilter/nf_dup_netdev.h>
 
-#define NF_RECURSION_LIMIT	2
-
-#ifndef CONFIG_PREEMPT_RT
-static u8 *nf_get_nf_dup_skb_recursion(void)
-{
-	return this_cpu_ptr(&softnet_data.xmit.nf_dup_skb_recursion);
-}
-#else
-
-static u8 *nf_get_nf_dup_skb_recursion(void)
-{
-	return &current->net_xmit.nf_dup_skb_recursion;
-}
-
-#endif
-
 static void nf_do_netdev_egress(struct sk_buff *skb, struct net_device *dev,
 				enum nf_dev_hooks hook)
 {
diff --git a/net/netfilter/nft_fwd_netdev.c b/net/netfilter/nft_fwd_netdev.c
index 605b1d42abce..b9e88d7cf308 100644
--- a/net/netfilter/nft_fwd_netdev.c
+++ b/net/netfilter/nft_fwd_netdev.c
@@ -95,6 +95,7 @@ static void nft_fwd_neigh_eval(const struct nft_expr *expr,
 			      struct nft_regs *regs,
 			      const struct nft_pktinfo *pkt)
 {
+	u8 *nf_dup_skb_recursion = nf_get_nf_dup_skb_recursion();
 	struct nft_fwd_neigh *priv = nft_expr_priv(expr);
 	void *addr = &regs->data[priv->sreg_addr];
 	int oif = regs->data[priv->sreg_dev];
@@ -153,6 +154,11 @@ static void nft_fwd_neigh_eval(const struct nft_expr *expr,
 		goto out;
 	}
 
+	if (*nf_dup_skb_recursion > NF_RECURSION_LIMIT) {
+		verdict = NF_DROP;
+		goto out;
+	}
+
 	dev = dev_get_by_index_rcu(nft_net(pkt), oif);
 	if (dev == NULL) {
 		verdict = NF_DROP;
@@ -170,7 +176,9 @@ static void nft_fwd_neigh_eval(const struct nft_expr *expr,
 
 	skb->dev = dev;
 	skb_clear_tstamp(skb);
+	(*nf_dup_skb_recursion)++;
 	neigh_xmit(neigh_table, dev, addr, skb);
+	(*nf_dup_skb_recursion)--;
 out:
 	regs->verdict.code = verdict;
 }
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 04/14] netfilter: x_tables: add .check_hooks to matches and targets
  2026-05-01 12:22 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (2 preceding siblings ...)
  2026-05-01 12:22 ` [PATCH net 03/14] netfilter: nft_fwd_netdev: use recursion counter in neigh egress path Pablo Neira Ayuso
@ 2026-05-01 12:22 ` Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 05/14] netfilter: nft_compat: run xt_check_hooks_{match,target}() from .validate Pablo Neira Ayuso
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-01 12:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

Add a new .check_hooks interface for checking if the match/target is
used from the validate hook according to its configuration.

Move existing conditional hook check based on the match/target
configuration from .checkentry to .check_hooks for the following
matches/targets:

- addrtype
- devgroup
- physdev
- policy
- set
- TCPMSS
- SET

This is a preparation patch to fix nft_compat, not functional changes
are intended.

Based on patch from Florian Westphal.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netfilter/x_tables.h |  8 +++
 net/netfilter/x_tables.c           | 79 +++++++++++++++++++++++++++---
 net/netfilter/xt_TCPMSS.c          | 33 +++++++------
 net/netfilter/xt_addrtype.c        | 25 +++++++---
 net/netfilter/xt_devgroup.c        | 18 +++++--
 net/netfilter/xt_physdev.c         | 20 ++++++--
 net/netfilter/xt_policy.c          | 24 +++++++--
 net/netfilter/xt_set.c             | 39 +++++++++------
 8 files changed, 187 insertions(+), 59 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 77c778d84d4c..a81b46af5118 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -146,6 +146,9 @@ struct xt_match {
 	/* Called when user tries to insert an entry of this type. */
 	int (*checkentry)(const struct xt_mtchk_param *);
 
+	/* Called to validate hooks based on the match configuration. */
+	int (*check_hooks)(const struct xt_mtchk_param *);
+
 	/* Called when entry of this type deleted. */
 	void (*destroy)(const struct xt_mtdtor_param *);
 #ifdef CONFIG_NETFILTER_XTABLES_COMPAT
@@ -187,6 +190,9 @@ struct xt_target {
 	/* Should return 0 on success or an error code otherwise (-Exxxx). */
 	int (*checkentry)(const struct xt_tgchk_param *);
 
+	/* Called to validate hooks based on the target configuration. */
+	int (*check_hooks)(const struct xt_tgchk_param *);
+
 	/* Called when entry of this type deleted. */
 	void (*destroy)(const struct xt_tgdtor_param *);
 #ifdef CONFIG_NETFILTER_XTABLES_COMPAT
@@ -279,8 +285,10 @@ bool xt_find_jump_offset(const unsigned int *offsets,
 
 int xt_check_proc_name(const char *name, unsigned int size);
 
+int xt_check_hooks_match(struct xt_mtchk_param *par);
 int xt_check_match(struct xt_mtchk_param *, unsigned int size, u16 proto,
 		   bool inv_proto);
+int xt_check_hooks_target(struct xt_tgchk_param *par);
 int xt_check_target(struct xt_tgchk_param *, unsigned int size, u16 proto,
 		    bool inv_proto);
 
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 9f837fb5ceb4..2c67c2e6b132 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -477,11 +477,9 @@ int xt_check_proc_name(const char *name, unsigned int size)
 }
 EXPORT_SYMBOL(xt_check_proc_name);
 
-int xt_check_match(struct xt_mtchk_param *par,
-		   unsigned int size, u16 proto, bool inv_proto)
+static int xt_check_match_common(struct xt_mtchk_param *par,
+				 unsigned int size, u16 proto, bool inv_proto)
 {
-	int ret;
-
 	if (XT_ALIGN(par->match->matchsize) != size &&
 	    par->match->matchsize != -1) {
 		/*
@@ -530,6 +528,14 @@ int xt_check_match(struct xt_mtchk_param *par,
 				    par->match->proto);
 		return -EINVAL;
 	}
+
+	return 0;
+}
+
+static int xt_checkentry_match(struct xt_mtchk_param *par)
+{
+	int ret;
+
 	if (par->match->checkentry != NULL) {
 		ret = par->match->checkentry(par);
 		if (ret < 0)
@@ -538,8 +544,34 @@ int xt_check_match(struct xt_mtchk_param *par,
 			/* Flag up potential errors. */
 			return -EIO;
 	}
+
+	return 0;
+}
+
+int xt_check_hooks_match(struct xt_mtchk_param *par)
+{
+	if (par->match->check_hooks != NULL)
+		return par->match->check_hooks(par);
+
 	return 0;
 }
+EXPORT_SYMBOL_GPL(xt_check_hooks_match);
+
+int xt_check_match(struct xt_mtchk_param *par,
+		   unsigned int size, u16 proto, bool inv_proto)
+{
+	int ret;
+
+	ret = xt_check_match_common(par, size, proto, inv_proto);
+	if (ret < 0)
+		return ret;
+
+	ret = xt_check_hooks_match(par);
+	if (ret < 0)
+		return ret;
+
+	return xt_checkentry_match(par);
+}
 EXPORT_SYMBOL_GPL(xt_check_match);
 
 /** xt_check_entry_match - check that matches end before start of target
@@ -1012,11 +1044,9 @@ bool xt_find_jump_offset(const unsigned int *offsets,
 }
 EXPORT_SYMBOL(xt_find_jump_offset);
 
-int xt_check_target(struct xt_tgchk_param *par,
-		    unsigned int size, u16 proto, bool inv_proto)
+static int xt_check_target_common(struct xt_tgchk_param *par,
+				  unsigned int size, u16 proto, bool inv_proto)
 {
-	int ret;
-
 	if (XT_ALIGN(par->target->targetsize) != size) {
 		pr_err_ratelimited("%s_tables: %s.%u target: invalid size %u (kernel) != (user) %u\n",
 				   xt_prefix[par->family], par->target->name,
@@ -1061,6 +1091,23 @@ int xt_check_target(struct xt_tgchk_param *par,
 				    par->target->proto);
 		return -EINVAL;
 	}
+
+	return 0;
+}
+
+int xt_check_hooks_target(struct xt_tgchk_param *par)
+{
+	if (par->target->check_hooks != NULL)
+		return par->target->check_hooks(par);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xt_check_hooks_target);
+
+static int xt_checkentry_target(struct xt_tgchk_param *par)
+{
+	int ret;
+
 	if (par->target->checkentry != NULL) {
 		ret = par->target->checkentry(par);
 		if (ret < 0)
@@ -1071,6 +1118,22 @@ int xt_check_target(struct xt_tgchk_param *par,
 	}
 	return 0;
 }
+
+int xt_check_target(struct xt_tgchk_param *par,
+		    unsigned int size, u16 proto, bool inv_proto)
+{
+	int ret;
+
+	ret = xt_check_target_common(par, size, proto, inv_proto);
+	if (ret < 0)
+		return ret;
+
+	ret = xt_check_hooks_target(par);
+	if (ret < 0)
+		return ret;
+
+	return xt_checkentry_target(par);
+}
 EXPORT_SYMBOL_GPL(xt_check_target);
 
 /**
diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
index 116a885adb3c..80e1634bc51f 100644
--- a/net/netfilter/xt_TCPMSS.c
+++ b/net/netfilter/xt_TCPMSS.c
@@ -247,6 +247,21 @@ tcpmss_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 }
 #endif
 
+static int tcpmss_tg4_check_hooks(const struct xt_tgchk_param *par)
+{
+	const struct xt_tcpmss_info *info = par->targinfo;
+
+	if (info->mss == XT_TCPMSS_CLAMP_PMTU &&
+	    (par->hook_mask & ~((1 << NF_INET_FORWARD) |
+			   (1 << NF_INET_LOCAL_OUT) |
+			   (1 << NF_INET_POST_ROUTING))) != 0) {
+		pr_info_ratelimited("path-MTU clamping only supported in FORWARD, OUTPUT and POSTROUTING hooks\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /* Must specify -p tcp --syn */
 static inline bool find_syn_match(const struct xt_entry_match *m)
 {
@@ -262,17 +277,9 @@ static inline bool find_syn_match(const struct xt_entry_match *m)
 
 static int tcpmss_tg4_check(const struct xt_tgchk_param *par)
 {
-	const struct xt_tcpmss_info *info = par->targinfo;
 	const struct ipt_entry *e = par->entryinfo;
 	const struct xt_entry_match *ematch;
 
-	if (info->mss == XT_TCPMSS_CLAMP_PMTU &&
-	    (par->hook_mask & ~((1 << NF_INET_FORWARD) |
-			   (1 << NF_INET_LOCAL_OUT) |
-			   (1 << NF_INET_POST_ROUTING))) != 0) {
-		pr_info_ratelimited("path-MTU clamping only supported in FORWARD, OUTPUT and POSTROUTING hooks\n");
-		return -EINVAL;
-	}
 	if (par->nft_compat)
 		return 0;
 
@@ -286,17 +293,9 @@ static int tcpmss_tg4_check(const struct xt_tgchk_param *par)
 #if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
 static int tcpmss_tg6_check(const struct xt_tgchk_param *par)
 {
-	const struct xt_tcpmss_info *info = par->targinfo;
 	const struct ip6t_entry *e = par->entryinfo;
 	const struct xt_entry_match *ematch;
 
-	if (info->mss == XT_TCPMSS_CLAMP_PMTU &&
-	    (par->hook_mask & ~((1 << NF_INET_FORWARD) |
-			   (1 << NF_INET_LOCAL_OUT) |
-			   (1 << NF_INET_POST_ROUTING))) != 0) {
-		pr_info_ratelimited("path-MTU clamping only supported in FORWARD, OUTPUT and POSTROUTING hooks\n");
-		return -EINVAL;
-	}
 	if (par->nft_compat)
 		return 0;
 
@@ -312,6 +311,7 @@ static struct xt_target tcpmss_tg_reg[] __read_mostly = {
 	{
 		.family		= NFPROTO_IPV4,
 		.name		= "TCPMSS",
+		.check_hooks	= tcpmss_tg4_check_hooks,
 		.checkentry	= tcpmss_tg4_check,
 		.target		= tcpmss_tg4,
 		.targetsize	= sizeof(struct xt_tcpmss_info),
@@ -322,6 +322,7 @@ static struct xt_target tcpmss_tg_reg[] __read_mostly = {
 	{
 		.family		= NFPROTO_IPV6,
 		.name		= "TCPMSS",
+		.check_hooks	= tcpmss_tg4_check_hooks,
 		.checkentry	= tcpmss_tg6_check,
 		.target		= tcpmss_tg6,
 		.targetsize	= sizeof(struct xt_tcpmss_info),
diff --git a/net/netfilter/xt_addrtype.c b/net/netfilter/xt_addrtype.c
index a77088943107..913dbe3aa5e2 100644
--- a/net/netfilter/xt_addrtype.c
+++ b/net/netfilter/xt_addrtype.c
@@ -153,14 +153,10 @@ addrtype_mt_v1(const struct sk_buff *skb, struct xt_action_param *par)
 	return ret;
 }
 
-static int addrtype_mt_checkentry_v1(const struct xt_mtchk_param *par)
+static int addrtype_mt_check_hooks(const struct xt_mtchk_param *par)
 {
-	const char *errmsg = "both incoming and outgoing interface limitation cannot be selected";
 	struct xt_addrtype_info_v1 *info = par->matchinfo;
-
-	if (info->flags & XT_ADDRTYPE_LIMIT_IFACE_IN &&
-	    info->flags & XT_ADDRTYPE_LIMIT_IFACE_OUT)
-		goto err;
+	const char *errmsg;
 
 	if (par->hook_mask & ((1 << NF_INET_PRE_ROUTING) |
 	    (1 << NF_INET_LOCAL_IN)) &&
@@ -176,6 +172,21 @@ static int addrtype_mt_checkentry_v1(const struct xt_mtchk_param *par)
 		goto err;
 	}
 
+	return 0;
+err:
+	pr_info_ratelimited("%s\n", errmsg);
+	return -EINVAL;
+}
+
+static int addrtype_mt_checkentry_v1(const struct xt_mtchk_param *par)
+{
+	const char *errmsg = "both incoming and outgoing interface limitation cannot be selected";
+	struct xt_addrtype_info_v1 *info = par->matchinfo;
+
+	if (info->flags & XT_ADDRTYPE_LIMIT_IFACE_IN &&
+	    info->flags & XT_ADDRTYPE_LIMIT_IFACE_OUT)
+		goto err;
+
 #if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
 	if (par->family == NFPROTO_IPV6) {
 		if ((info->source | info->dest) & XT_ADDRTYPE_BLACKHOLE) {
@@ -211,6 +222,7 @@ static struct xt_match addrtype_mt_reg[] __read_mostly = {
 		.family		= NFPROTO_IPV4,
 		.revision	= 1,
 		.match		= addrtype_mt_v1,
+		.check_hooks	= addrtype_mt_check_hooks,
 		.checkentry	= addrtype_mt_checkentry_v1,
 		.matchsize	= sizeof(struct xt_addrtype_info_v1),
 		.me		= THIS_MODULE
@@ -221,6 +233,7 @@ static struct xt_match addrtype_mt_reg[] __read_mostly = {
 		.family		= NFPROTO_IPV6,
 		.revision	= 1,
 		.match		= addrtype_mt_v1,
+		.check_hooks	= addrtype_mt_check_hooks,
 		.checkentry	= addrtype_mt_checkentry_v1,
 		.matchsize	= sizeof(struct xt_addrtype_info_v1),
 		.me		= THIS_MODULE
diff --git a/net/netfilter/xt_devgroup.c b/net/netfilter/xt_devgroup.c
index 9520dd00070b..6d1a44ab5eee 100644
--- a/net/netfilter/xt_devgroup.c
+++ b/net/netfilter/xt_devgroup.c
@@ -33,14 +33,10 @@ static bool devgroup_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	return true;
 }
 
-static int devgroup_mt_checkentry(const struct xt_mtchk_param *par)
+static int devgroup_mt_check_hooks(const struct xt_mtchk_param *par)
 {
 	const struct xt_devgroup_info *info = par->matchinfo;
 
-	if (info->flags & ~(XT_DEVGROUP_MATCH_SRC | XT_DEVGROUP_INVERT_SRC |
-			    XT_DEVGROUP_MATCH_DST | XT_DEVGROUP_INVERT_DST))
-		return -EINVAL;
-
 	if (info->flags & XT_DEVGROUP_MATCH_SRC &&
 	    par->hook_mask & ~((1 << NF_INET_PRE_ROUTING) |
 			       (1 << NF_INET_LOCAL_IN) |
@@ -56,9 +52,21 @@ static int devgroup_mt_checkentry(const struct xt_mtchk_param *par)
 	return 0;
 }
 
+static int devgroup_mt_checkentry(const struct xt_mtchk_param *par)
+{
+	const struct xt_devgroup_info *info = par->matchinfo;
+
+	if (info->flags & ~(XT_DEVGROUP_MATCH_SRC | XT_DEVGROUP_INVERT_SRC |
+			    XT_DEVGROUP_MATCH_DST | XT_DEVGROUP_INVERT_DST))
+		return -EINVAL;
+
+	return 0;
+}
+
 static struct xt_match devgroup_mt_reg __read_mostly = {
 	.name		= "devgroup",
 	.match		= devgroup_mt,
+	.check_hooks	= devgroup_mt_check_hooks,
 	.checkentry	= devgroup_mt_checkentry,
 	.matchsize	= sizeof(struct xt_devgroup_info),
 	.family		= NFPROTO_UNSPEC,
diff --git a/net/netfilter/xt_physdev.c b/net/netfilter/xt_physdev.c
index d2b0b52434fa..dd98f758176c 100644
--- a/net/netfilter/xt_physdev.c
+++ b/net/netfilter/xt_physdev.c
@@ -91,14 +91,10 @@ physdev_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	return (!!ret ^ !(info->invert & XT_PHYSDEV_OP_OUT));
 }
 
-static int physdev_mt_check(const struct xt_mtchk_param *par)
+static int physdev_mt_check_hooks(const struct xt_mtchk_param *par)
 {
 	const struct xt_physdev_info *info = par->matchinfo;
-	static bool brnf_probed __read_mostly;
 
-	if (!(info->bitmask & XT_PHYSDEV_OP_MASK) ||
-	    info->bitmask & ~XT_PHYSDEV_OP_MASK)
-		return -EINVAL;
 	if (info->bitmask & (XT_PHYSDEV_OP_OUT | XT_PHYSDEV_OP_ISOUT) &&
 	    (!(info->bitmask & XT_PHYSDEV_OP_BRIDGED) ||
 	     info->invert & XT_PHYSDEV_OP_BRIDGED) &&
@@ -107,6 +103,18 @@ static int physdev_mt_check(const struct xt_mtchk_param *par)
 		return -EINVAL;
 	}
 
+	return 0;
+}
+
+static int physdev_mt_check(const struct xt_mtchk_param *par)
+{
+	const struct xt_physdev_info *info = par->matchinfo;
+	static bool brnf_probed __read_mostly;
+
+	if (!(info->bitmask & XT_PHYSDEV_OP_MASK) ||
+	    info->bitmask & ~XT_PHYSDEV_OP_MASK)
+		return -EINVAL;
+
 #define X(memb) strnlen(info->memb, sizeof(info->memb)) >= sizeof(info->memb)
 	if (info->bitmask & XT_PHYSDEV_OP_IN) {
 		if (info->physindev[0] == '\0')
@@ -141,6 +149,7 @@ static struct xt_match physdev_mt_reg[] __read_mostly = {
 	{
 		.name		= "physdev",
 		.family		= NFPROTO_IPV4,
+		.check_hooks	= physdev_mt_check_hooks,
 		.checkentry	= physdev_mt_check,
 		.match		= physdev_mt,
 		.matchsize	= sizeof(struct xt_physdev_info),
@@ -149,6 +158,7 @@ static struct xt_match physdev_mt_reg[] __read_mostly = {
 	{
 		.name		= "physdev",
 		.family		= NFPROTO_IPV6,
+		.check_hooks	= physdev_mt_check_hooks,
 		.checkentry	= physdev_mt_check,
 		.match		= physdev_mt,
 		.matchsize	= sizeof(struct xt_physdev_info),
diff --git a/net/netfilter/xt_policy.c b/net/netfilter/xt_policy.c
index b5fa65558318..ff54e3a8581e 100644
--- a/net/netfilter/xt_policy.c
+++ b/net/netfilter/xt_policy.c
@@ -126,13 +126,10 @@ policy_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	return ret;
 }
 
-static int policy_mt_check(const struct xt_mtchk_param *par)
+static int policy_mt_check_hooks(const struct xt_mtchk_param *par)
 {
 	const struct xt_policy_info *info = par->matchinfo;
-	const char *errmsg = "neither incoming nor outgoing policy selected";
-
-	if (!(info->flags & (XT_POLICY_MATCH_IN|XT_POLICY_MATCH_OUT)))
-		goto err;
+	const char *errmsg;
 
 	if (par->hook_mask & ((1 << NF_INET_PRE_ROUTING) |
 	    (1 << NF_INET_LOCAL_IN)) && info->flags & XT_POLICY_MATCH_OUT) {
@@ -144,6 +141,21 @@ static int policy_mt_check(const struct xt_mtchk_param *par)
 		errmsg = "input policy not valid in POSTROUTING and OUTPUT";
 		goto err;
 	}
+
+	return 0;
+err:
+	pr_info_ratelimited("%s\n", errmsg);
+	return -EINVAL;
+}
+
+static int policy_mt_check(const struct xt_mtchk_param *par)
+{
+	const struct xt_policy_info *info = par->matchinfo;
+	const char *errmsg = "neither incoming nor outgoing policy selected";
+
+	if (!(info->flags & (XT_POLICY_MATCH_IN|XT_POLICY_MATCH_OUT)))
+		goto err;
+
 	if (info->len > XT_POLICY_MAX_ELEM) {
 		errmsg = "too many policy elements";
 		goto err;
@@ -158,6 +170,7 @@ static struct xt_match policy_mt_reg[] __read_mostly = {
 	{
 		.name		= "policy",
 		.family		= NFPROTO_IPV4,
+		.check_hooks	= policy_mt_check_hooks,
 		.checkentry 	= policy_mt_check,
 		.match		= policy_mt,
 		.matchsize	= sizeof(struct xt_policy_info),
@@ -166,6 +179,7 @@ static struct xt_match policy_mt_reg[] __read_mostly = {
 	{
 		.name		= "policy",
 		.family		= NFPROTO_IPV6,
+		.check_hooks	= policy_mt_check_hooks,
 		.checkentry	= policy_mt_check,
 		.match		= policy_mt,
 		.matchsize	= sizeof(struct xt_policy_info),
diff --git a/net/netfilter/xt_set.c b/net/netfilter/xt_set.c
index 731bc2cafae4..4ae04bba9358 100644
--- a/net/netfilter/xt_set.c
+++ b/net/netfilter/xt_set.c
@@ -430,6 +430,29 @@ set_target_v3(struct sk_buff *skb, const struct xt_action_param *par)
 	return XT_CONTINUE;
 }
 
+static int
+set_target_v3_check_hooks(const struct xt_tgchk_param *par)
+{
+	const struct xt_set_info_target_v3 *info = par->targinfo;
+
+	if (info->map_set.index != IPSET_INVALID_ID) {
+		if (strncmp(par->table, "mangle", 7)) {
+			pr_info_ratelimited("--map-set only usable from mangle table\n");
+			return -EINVAL;
+		}
+		if (((info->flags & IPSET_FLAG_MAP_SKBPRIO) |
+		     (info->flags & IPSET_FLAG_MAP_SKBQUEUE)) &&
+		     (par->hook_mask & ~(1 << NF_INET_FORWARD |
+					 1 << NF_INET_LOCAL_OUT |
+					 1 << NF_INET_POST_ROUTING))) {
+			pr_info_ratelimited("mapping of prio or/and queue is allowed only from OUTPUT/FORWARD/POSTROUTING chains\n");
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 static int
 set_target_v3_checkentry(const struct xt_tgchk_param *par)
 {
@@ -459,20 +482,6 @@ set_target_v3_checkentry(const struct xt_tgchk_param *par)
 	}
 
 	if (info->map_set.index != IPSET_INVALID_ID) {
-		if (strncmp(par->table, "mangle", 7)) {
-			pr_info_ratelimited("--map-set only usable from mangle table\n");
-			ret = -EINVAL;
-			goto cleanup_del;
-		}
-		if (((info->flags & IPSET_FLAG_MAP_SKBPRIO) |
-		     (info->flags & IPSET_FLAG_MAP_SKBQUEUE)) &&
-		     (par->hook_mask & ~(1 << NF_INET_FORWARD |
-					 1 << NF_INET_LOCAL_OUT |
-					 1 << NF_INET_POST_ROUTING))) {
-			pr_info_ratelimited("mapping of prio or/and queue is allowed only from OUTPUT/FORWARD/POSTROUTING chains\n");
-			ret = -EINVAL;
-			goto cleanup_del;
-		}
 		index = ip_set_nfnl_get_byindex(par->net,
 						info->map_set.index);
 		if (index == IPSET_INVALID_ID) {
@@ -672,6 +681,7 @@ static struct xt_target set_targets[] __read_mostly = {
 		.family		= NFPROTO_IPV4,
 		.target		= set_target_v3,
 		.targetsize	= sizeof(struct xt_set_info_target_v3),
+		.check_hooks	= set_target_v3_check_hooks,
 		.checkentry	= set_target_v3_checkentry,
 		.destroy	= set_target_v3_destroy,
 		.me		= THIS_MODULE
@@ -682,6 +692,7 @@ static struct xt_target set_targets[] __read_mostly = {
 		.family		= NFPROTO_IPV6,
 		.target		= set_target_v3,
 		.targetsize	= sizeof(struct xt_set_info_target_v3),
+		.check_hooks	= set_target_v3_check_hooks,
 		.checkentry	= set_target_v3_checkentry,
 		.destroy	= set_target_v3_destroy,
 		.me		= THIS_MODULE
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 05/14] netfilter: nft_compat: run xt_check_hooks_{match,target}() from .validate
  2026-05-01 12:22 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (3 preceding siblings ...)
  2026-05-01 12:22 ` [PATCH net 04/14] netfilter: x_tables: add .check_hooks to matches and targets Pablo Neira Ayuso
@ 2026-05-01 12:22 ` Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 06/14] netfilter: xt_CT: fix usersize for v1 and v2 revision Pablo Neira Ayuso
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-01 12:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

Several matches and one target check that the hook is correct from
checkentry(), however, the basechain is only available from
nft_table_validate().

This patch uses xt_check_hooks_{match,target}() from the nft_compat
expression .validate path.

This patch sets the table in the nft_ctx struct in nft_table_validate()
which is required by this patch.

Based on patch from Florian Westphal.

Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c |  1 +
 net/netfilter/nft_compat.c    | 45 +++++++++++++++++++++++++++--------
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index d20ce5c36d31..38e33c66c618 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4205,6 +4205,7 @@ static int nft_table_validate(struct net *net, const struct nft_table *table)
 	struct nft_chain *chain;
 	struct nft_ctx ctx = {
 		.net	= net,
+		.table	= (struct nft_table *)table,
 		.family	= table->family,
 	};
 	int err = 0;
diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index decc725a33c2..0caa9304d2d0 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -261,10 +261,10 @@ nft_target_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
 			return ret;
 	}
 
-	nft_target_set_tgchk_param(&par, ctx, target, info, &e, proto, inv);
-
 	nft_compat_wait_for_destructors(ctx->net);
 
+	nft_target_set_tgchk_param(&par, ctx, target, info, &e, proto, inv);
+
 	ret = xt_check_target(&par, size, proto, inv);
 	if (ret < 0) {
 		if (ret == -ENOENT) {
@@ -353,8 +353,6 @@ static int nft_target_dump(struct sk_buff *skb,
 static int nft_target_validate(const struct nft_ctx *ctx,
 			       const struct nft_expr *expr)
 {
-	struct xt_target *target = expr->ops->data;
-	unsigned int hook_mask = 0;
 	int ret;
 
 	if (ctx->family != NFPROTO_IPV4 &&
@@ -377,11 +375,21 @@ static int nft_target_validate(const struct nft_ctx *ctx,
 		const struct nft_base_chain *basechain =
 						nft_base_chain(ctx->chain);
 		const struct nf_hook_ops *ops = &basechain->ops;
+		unsigned int hook_mask = 1 << ops->hooknum;
+		struct xt_target *target = expr->ops->data;
+		void *info = nft_expr_priv(expr);
+		struct xt_tgchk_param par;
+		union nft_entry e = {};
 
-		hook_mask = 1 << ops->hooknum;
 		if (target->hooks && !(hook_mask & target->hooks))
 			return -EINVAL;
 
+		nft_target_set_tgchk_param(&par, ctx, target, info, &e, 0, false);
+
+		ret = xt_check_hooks_target(&par);
+		if (ret < 0)
+			return ret;
+
 		ret = nft_compat_chain_validate_dependency(ctx, target->table);
 		if (ret < 0)
 			return ret;
@@ -515,10 +523,10 @@ __nft_match_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
 			return ret;
 	}
 
-	nft_match_set_mtchk_param(&par, ctx, match, info, &e, proto, inv);
-
 	nft_compat_wait_for_destructors(ctx->net);
 
+	nft_match_set_mtchk_param(&par, ctx, match, info, &e, proto, inv);
+
 	return xt_check_match(&par, size, proto, inv);
 }
 
@@ -614,8 +622,6 @@ static int nft_match_large_dump(struct sk_buff *skb,
 static int nft_match_validate(const struct nft_ctx *ctx,
 			      const struct nft_expr *expr)
 {
-	struct xt_match *match = expr->ops->data;
-	unsigned int hook_mask = 0;
 	int ret;
 
 	if (ctx->family != NFPROTO_IPV4 &&
@@ -638,11 +644,30 @@ static int nft_match_validate(const struct nft_ctx *ctx,
 		const struct nft_base_chain *basechain =
 						nft_base_chain(ctx->chain);
 		const struct nf_hook_ops *ops = &basechain->ops;
+		unsigned int hook_mask = 1 << ops->hooknum;
+		struct xt_match *match = expr->ops->data;
+		size_t size = XT_ALIGN(match->matchsize);
+		struct xt_mtchk_param par;
+		union nft_entry e = {};
+		void *info;
 
-		hook_mask = 1 << ops->hooknum;
 		if (match->hooks && !(hook_mask & match->hooks))
 			return -EINVAL;
 
+		if (NFT_EXPR_SIZE(size) > NFT_MATCH_LARGE_THRESH) {
+			struct nft_xt_match_priv *priv = nft_expr_priv(expr);
+
+			info = priv->info;
+		} else {
+			info = nft_expr_priv(expr);
+		}
+
+		nft_match_set_mtchk_param(&par, ctx, match, info, &e, 0, false);
+
+		ret = xt_check_hooks_match(&par);
+		if (ret < 0)
+			return ret;
+
 		ret = nft_compat_chain_validate_dependency(ctx, match->table);
 		if (ret < 0)
 			return ret;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 06/14] netfilter: xt_CT: fix usersize for v1 and v2 revision
  2026-05-01 12:22 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (4 preceding siblings ...)
  2026-05-01 12:22 ` [PATCH net 05/14] netfilter: nft_compat: run xt_check_hooks_{match,target}() from .validate Pablo Neira Ayuso
@ 2026-05-01 12:22 ` Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 07/14] netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables Pablo Neira Ayuso
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-01 12:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

From: Florian Westphal <fw@strlen.de>

While resurrecting the conntrack-tool test cases I found following bug:
In:
iptables -I OUTPUT -t raw -p 13 -j CT --timeout test-generic
Out:
[0:0] -A OUTPUT -p 13 -j CT --timeout test

Data after first four bytes of the timeout policy name is never
copied to userspace because its treated as kernel-only.

Fixes: ec2318904965 ("xtables: extend matches and targets with .usersize")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/xt_CT.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
index 498f5871c84a..d2aeacf94230 100644
--- a/net/netfilter/xt_CT.c
+++ b/net/netfilter/xt_CT.c
@@ -354,7 +354,7 @@ static struct xt_target xt_ct_tg_reg[] __read_mostly = {
 		.family		= NFPROTO_IPV4,
 		.revision	= 1,
 		.targetsize	= sizeof(struct xt_ct_target_info_v1),
-		.usersize	= offsetof(struct xt_ct_target_info, ct),
+		.usersize	= offsetof(struct xt_ct_target_info_v1, ct),
 		.checkentry	= xt_ct_tg_check_v1,
 		.destroy	= xt_ct_tg_destroy_v1,
 		.target		= xt_ct_target_v1,
@@ -366,7 +366,7 @@ static struct xt_target xt_ct_tg_reg[] __read_mostly = {
 		.family		= NFPROTO_IPV4,
 		.revision	= 2,
 		.targetsize	= sizeof(struct xt_ct_target_info_v1),
-		.usersize	= offsetof(struct xt_ct_target_info, ct),
+		.usersize	= offsetof(struct xt_ct_target_info_v1, ct),
 		.checkentry	= xt_ct_tg_check_v2,
 		.destroy	= xt_ct_tg_destroy_v1,
 		.target		= xt_ct_target_v1,
@@ -398,7 +398,7 @@ static struct xt_target xt_ct_tg_reg[] __read_mostly = {
 		.family		= NFPROTO_IPV6,
 		.revision	= 1,
 		.targetsize	= sizeof(struct xt_ct_target_info_v1),
-		.usersize	= offsetof(struct xt_ct_target_info, ct),
+		.usersize	= offsetof(struct xt_ct_target_info_v1, ct),
 		.checkentry	= xt_ct_tg_check_v1,
 		.destroy	= xt_ct_tg_destroy_v1,
 		.target		= xt_ct_target_v1,
@@ -410,7 +410,7 @@ static struct xt_target xt_ct_tg_reg[] __read_mostly = {
 		.family		= NFPROTO_IPV6,
 		.revision	= 2,
 		.targetsize	= sizeof(struct xt_ct_target_info_v1),
-		.usersize	= offsetof(struct xt_ct_target_info, ct),
+		.usersize	= offsetof(struct xt_ct_target_info_v1, ct),
 		.checkentry	= xt_ct_tg_check_v2,
 		.destroy	= xt_ct_tg_destroy_v1,
 		.target		= xt_ct_target_v1,
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 07/14] netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables
  2026-05-01 12:22 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (5 preceding siblings ...)
  2026-05-01 12:22 ` [PATCH net 06/14] netfilter: xt_CT: fix usersize for v1 and v2 revision Pablo Neira Ayuso
@ 2026-05-01 12:22 ` Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 08/14] netfilter: nf_socket: skip socket lookup for non-first fragments Pablo Neira Ayuso
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-01 12:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

From: Florian Westphal <fw@strlen.de>

sashiko says:
 could the related code in __nf_tables_abort() leak the struct nft_hook objects when the table is dormant?

 In __nf_tables_abort(), when rolling back a NEWCHAIN transaction that
 updates hooks, the code conditionally unregisters and frees the hooks only
 if the table is not dormant [..]
            if (!(table->flags & NFT_TABLE_F_DORMANT)) {
                nft_netdev_unregister_hooks(net,
                                            &nft_trans_chain_hooks(trans),
                                            true);
            }
            ...
            nft_trans_destroy(trans);

Unfortunately netdev family mixes hook registration and allocation.
Push table struct down and only check for the flag to unregister.

Fixes: 216e7bf7402c ("netfilter: nf_tables: skip netdev hook unregistration if table is dormant")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 38e33c66c618..87387adbca65 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -407,6 +407,7 @@ static void nft_netdev_unregister_trans_hook(struct net *net,
 }
 
 static void nft_netdev_unregister_hooks(struct net *net,
+					const struct nft_table *table,
 					struct list_head *hook_list,
 					bool release_netdev)
 {
@@ -414,8 +415,10 @@ static void nft_netdev_unregister_hooks(struct net *net,
 	struct nf_hook_ops *ops;
 
 	list_for_each_entry_safe(hook, next, hook_list, list) {
-		list_for_each_entry(ops, &hook->ops_list, list)
-			nf_unregister_net_hook(net, ops);
+		if (!(table->flags & NFT_TABLE_F_DORMANT)) {
+			list_for_each_entry(ops, &hook->ops_list, list)
+				nf_unregister_net_hook(net, ops);
+		}
 		if (release_netdev)
 			nft_netdev_hook_unlink_free_rcu(hook);
 	}
@@ -452,20 +455,25 @@ static void __nf_tables_unregister_hook(struct net *net,
 	struct nft_base_chain *basechain;
 	const struct nf_hook_ops *ops;
 
-	if (table->flags & NFT_TABLE_F_DORMANT ||
-	    !nft_is_base_chain(chain))
+	if (!nft_is_base_chain(chain))
 		return;
 	basechain = nft_base_chain(chain);
 	ops = &basechain->ops;
 
+	/* must also be called for dormant tables */
+	if (nft_base_chain_netdev(table->family, basechain->ops.hooknum)) {
+		nft_netdev_unregister_hooks(net, table, &basechain->hook_list,
+					    release_netdev);
+		return;
+	}
+
+	if (table->flags & NFT_TABLE_F_DORMANT)
+		return;
+
 	if (basechain->type->ops_unregister)
 		return basechain->type->ops_unregister(net, ops);
 
-	if (nft_base_chain_netdev(table->family, basechain->ops.hooknum))
-		nft_netdev_unregister_hooks(net, &basechain->hook_list,
-					    release_netdev);
-	else
-		nf_unregister_net_hook(net, &basechain->ops);
+	nf_unregister_net_hook(net, &basechain->ops);
 }
 
 static void nf_tables_unregister_hook(struct net *net,
@@ -11282,11 +11290,9 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
 			break;
 		case NFT_MSG_NEWCHAIN:
 			if (nft_trans_chain_update(trans)) {
-				if (!(table->flags & NFT_TABLE_F_DORMANT)) {
-					nft_netdev_unregister_hooks(net,
-								    &nft_trans_chain_hooks(trans),
-								    true);
-				}
+				nft_netdev_unregister_hooks(net, table,
+							    &nft_trans_chain_hooks(trans),
+							    true);
 				free_percpu(nft_trans_chain_stats(trans));
 				kfree(nft_trans_chain_name(trans));
 				nft_trans_destroy(trans);
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 08/14] netfilter: nf_socket: skip socket lookup for non-first fragments
  2026-05-01 12:22 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (6 preceding siblings ...)
  2026-05-01 12:22 ` [PATCH net 07/14] netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables Pablo Neira Ayuso
@ 2026-05-01 12:22 ` Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 09/14] netfilter: nf_tables: skip L4 header parsing " Pablo Neira Ayuso
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-01 12:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

From: Fernando Fernandez Mancera <fmancera@suse.de>

Both nft_socket and xt_socket relies on L4 headers to perform socket
lookup in the slow path. For fragmented packets, while the IP protocol
remains constant across all fragments, only the first fragment contains
the actual L4 header.

As the expression/match could be attached to a chain with a priority
lower than -400, it could bypass defragmentation.

Add a check for fragmentation in the lookup functions directly so the
problem is handled for both nft_socket and xt_socket at the same time.
In addition, future users of the functions would not need to care about
this.

Fixes: 902d6a4c2a4f ("netfilter: nf_defrag: Skip defrag if NOTRACK is set")
Fixes: 554ced0a6e29 ("netfilter: nf_tables: add support for native socket matching")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/nf_socket_ipv4.c | 3 +++
 net/ipv6/netfilter/nf_socket_ipv6.c | 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/netfilter/nf_socket_ipv4.c b/net/ipv4/netfilter/nf_socket_ipv4.c
index 5080fa5fbf6a..f9c6755f5ec5 100644
--- a/net/ipv4/netfilter/nf_socket_ipv4.c
+++ b/net/ipv4/netfilter/nf_socket_ipv4.c
@@ -94,6 +94,9 @@ struct sock *nf_sk_lookup_slow_v4(struct net *net, const struct sk_buff *skb,
 #endif
 	int doff = 0;
 
+	if (ntohs(iph->frag_off) & IP_OFFSET)
+		return NULL;
+
 	if (iph->protocol == IPPROTO_UDP || iph->protocol == IPPROTO_TCP) {
 		struct tcphdr _hdr;
 		struct udphdr *hp;
diff --git a/net/ipv6/netfilter/nf_socket_ipv6.c b/net/ipv6/netfilter/nf_socket_ipv6.c
index ced8bd44828e..893f2aeb4711 100644
--- a/net/ipv6/netfilter/nf_socket_ipv6.c
+++ b/net/ipv6/netfilter/nf_socket_ipv6.c
@@ -100,6 +100,7 @@ struct sock *nf_sk_lookup_slow_v6(struct net *net, const struct sk_buff *skb,
 	const struct in6_addr *daddr = NULL, *saddr = NULL;
 	struct ipv6hdr *iph = ipv6_hdr(skb), ipv6_var;
 	struct sk_buff *data_skb = NULL;
+	unsigned short fragoff = 0;
 	int doff = 0;
 	int thoff = 0, tproto;
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
@@ -107,8 +108,8 @@ struct sock *nf_sk_lookup_slow_v6(struct net *net, const struct sk_buff *skb,
 	struct nf_conn const *ct;
 #endif
 
-	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
-	if (tproto < 0) {
+	tproto = ipv6_find_hdr(skb, &thoff, -1, &fragoff, NULL);
+	if (tproto < 0 || fragoff) {
 		pr_debug("unable to find transport header in IPv6 packet, dropping\n");
 		return NULL;
 	}
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 09/14] netfilter: nf_tables: skip L4 header parsing for non-first fragments
  2026-05-01 12:22 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (7 preceding siblings ...)
  2026-05-01 12:22 ` [PATCH net 08/14] netfilter: nf_socket: skip socket lookup for non-first fragments Pablo Neira Ayuso
@ 2026-05-01 12:22 ` Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 10/14] netfilter: xtables: fix " Pablo Neira Ayuso
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-01 12:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

From: Fernando Fernandez Mancera <fmancera@suse.de>

The tproxy, osf and exthdr (SCTP) expressions rely on the presence of
transport layer headers to perform socket lookups, fingerprint matching,
or chunk extraction. For fragmented packets, while the IP protocol
remains constant across all fragments, only the first fragment contains
the actual L4 header.

The expressions could be attached to a chain with a priority lower than
-400, bypassing defragmentation. Or could be used in stateless
environments where defragmentation is not happening at all.  This could
result in garbage data being used for the matching.

Add a check for pkt->fragoff so only unfragmented packets or the first
fragment is processed.

Fixes: 133dc203d77d ("netfilter: nft_exthdr: Support SCTP chunks")
Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support")
Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_core.c | 2 +-
 net/netfilter/nft_exthdr.c     | 2 +-
 net/netfilter/nft_osf.c        | 2 +-
 net/netfilter/nft_tproxy.c     | 8 ++++----
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index 5ddd5b6e135f..8ab186f86dd4 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -153,7 +153,7 @@ static bool nft_payload_fast_eval(const struct nft_expr *expr,
 	if (priv->base == NFT_PAYLOAD_NETWORK_HEADER)
 		ptr = skb_network_header(skb) + pkt->nhoff;
 	else {
-		if (!(pkt->flags & NFT_PKTINFO_L4PROTO))
+		if (!(pkt->flags & NFT_PKTINFO_L4PROTO) || pkt->fragoff)
 			return false;
 		ptr = skb->data + nft_thoff(pkt);
 	}
diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c
index 0407d6f708ae..e6a07c0df207 100644
--- a/net/netfilter/nft_exthdr.c
+++ b/net/netfilter/nft_exthdr.c
@@ -376,7 +376,7 @@ static void nft_exthdr_sctp_eval(const struct nft_expr *expr,
 	const struct sctp_chunkhdr *sch;
 	struct sctp_chunkhdr _sch;
 
-	if (pkt->tprot != IPPROTO_SCTP)
+	if (pkt->tprot != IPPROTO_SCTP || pkt->fragoff)
 		goto err;
 
 	do {
diff --git a/net/netfilter/nft_osf.c b/net/netfilter/nft_osf.c
index c02d5cb52143..45fe56da5044 100644
--- a/net/netfilter/nft_osf.c
+++ b/net/netfilter/nft_osf.c
@@ -33,7 +33,7 @@ static void nft_osf_eval(const struct nft_expr *expr, struct nft_regs *regs,
 		return;
 	}
 
-	if (pkt->tprot != IPPROTO_TCP) {
+	if (pkt->tprot != IPPROTO_TCP || pkt->fragoff) {
 		regs->verdict.code = NFT_BREAK;
 		return;
 	}
diff --git a/net/netfilter/nft_tproxy.c b/net/netfilter/nft_tproxy.c
index f2101af8c867..89be443734f6 100644
--- a/net/netfilter/nft_tproxy.c
+++ b/net/netfilter/nft_tproxy.c
@@ -30,8 +30,8 @@ static void nft_tproxy_eval_v4(const struct nft_expr *expr,
 	__be16 tport = 0;
 	struct sock *sk;
 
-	if (pkt->tprot != IPPROTO_TCP &&
-	    pkt->tprot != IPPROTO_UDP) {
+	if ((pkt->tprot != IPPROTO_TCP &&
+	     pkt->tprot != IPPROTO_UDP) || pkt->fragoff) {
 		regs->verdict.code = NFT_BREAK;
 		return;
 	}
@@ -97,8 +97,8 @@ static void nft_tproxy_eval_v6(const struct nft_expr *expr,
 
 	memset(&taddr, 0, sizeof(taddr));
 
-	if (pkt->tprot != IPPROTO_TCP &&
-	    pkt->tprot != IPPROTO_UDP) {
+	if ((pkt->tprot != IPPROTO_TCP &&
+	     pkt->tprot != IPPROTO_UDP) || pkt->fragoff) {
 		regs->verdict.code = NFT_BREAK;
 		return;
 	}
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 10/14] netfilter: xtables: fix L4 header parsing for non-first fragments
  2026-05-01 12:22 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (8 preceding siblings ...)
  2026-05-01 12:22 ` [PATCH net 09/14] netfilter: nf_tables: skip L4 header parsing " Pablo Neira Ayuso
@ 2026-05-01 12:22 ` Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 11/14] netfilter: flowtable: ensure sufficient headroom in xmit path Pablo Neira Ayuso
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-01 12:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

From: Fernando Fernandez Mancera <fmancera@suse.de>

Multiple targets and matches relies on L4 header to operate. For
fragmented packets, every fragment carries the transport protocol
identifier, but only the first fragment contains the L4 header.

As the 'raw' table can be configured to run at priority -450 (before
defragmentation at -400), the target/match can be reached before
reassembly. In this case, non-first fragments have their payload
incorrectly parsed as a TCP/UDP header. This would be of course a
misconfiguration scenario. In most of the cases this just lead to a
unreliable behavior for fragmented traffic.

Add a fragment check to ensure target/match only evaluates unfragmented
packets or the first fragment in the stream.

Fixes: 902d6a4c2a4f ("netfilter: nf_defrag: Skip defrag if NOTRACK is set")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/xt_TPROXY.c    | 11 +++++++++--
 net/netfilter/xt_ecn.c       |  4 ++++
 net/netfilter/xt_hashlimit.c |  4 +++-
 net/netfilter/xt_osf.c       |  3 +++
 net/netfilter/xt_tcpmss.c    |  4 ++++
 5 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/xt_TPROXY.c b/net/netfilter/xt_TPROXY.c
index e4bea1d346cf..5f60e7298a1e 100644
--- a/net/netfilter/xt_TPROXY.c
+++ b/net/netfilter/xt_TPROXY.c
@@ -86,6 +86,9 @@ tproxy_tg4_v0(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_tproxy_target_info *tgi = par->targinfo;
 
+	if (par->fragoff)
+		return NF_DROP;
+
 	return tproxy_tg4(xt_net(par), skb, tgi->laddr, tgi->lport,
 			  tgi->mark_mask, tgi->mark_value);
 }
@@ -95,6 +98,9 @@ tproxy_tg4_v1(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_tproxy_target_info_v1 *tgi = par->targinfo;
 
+	if (par->fragoff)
+		return NF_DROP;
+
 	return tproxy_tg4(xt_net(par), skb, tgi->laddr.ip, tgi->lport,
 			  tgi->mark_mask, tgi->mark_value);
 }
@@ -106,6 +112,7 @@ tproxy_tg6_v1(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct ipv6hdr *iph = ipv6_hdr(skb);
 	const struct xt_tproxy_target_info_v1 *tgi = par->targinfo;
+	unsigned short fragoff = 0;
 	struct udphdr _hdr, *hp;
 	struct sock *sk;
 	const struct in6_addr *laddr;
@@ -113,8 +120,8 @@ tproxy_tg6_v1(struct sk_buff *skb, const struct xt_action_param *par)
 	int thoff = 0;
 	int tproto;
 
-	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
-	if (tproto < 0)
+	tproto = ipv6_find_hdr(skb, &thoff, -1, &fragoff, NULL);
+	if (tproto < 0 || fragoff)
 		return NF_DROP;
 
 	hp = skb_header_pointer(skb, thoff, sizeof(_hdr), &_hdr);
diff --git a/net/netfilter/xt_ecn.c b/net/netfilter/xt_ecn.c
index b96e8203ac54..a8503f5d26bf 100644
--- a/net/netfilter/xt_ecn.c
+++ b/net/netfilter/xt_ecn.c
@@ -30,6 +30,10 @@ static bool match_tcp(const struct sk_buff *skb, struct xt_action_param *par)
 	struct tcphdr _tcph;
 	const struct tcphdr *th;
 
+	/* this is fine for IPv6 as ecn_mt_check6() enforces -p tcp */
+	if (par->fragoff)
+		return false;
+
 	/* In practice, TCP match does this, so can't fail.  But let's
 	 * be good citizens.
 	 */
diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 3bd127bfc114..2704b4b60d1e 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -658,6 +658,8 @@ hashlimit_init_dst(const struct xt_hashlimit_htable *hinfo,
 		if (!(hinfo->cfg.mode &
 		      (XT_HASHLIMIT_HASH_DPT | XT_HASHLIMIT_HASH_SPT)))
 			return 0;
+		if (ntohs(ip_hdr(skb)->frag_off) & IP_OFFSET)
+			return -1;
 		nexthdr = ip_hdr(skb)->protocol;
 		break;
 #if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
@@ -681,7 +683,7 @@ hashlimit_init_dst(const struct xt_hashlimit_htable *hinfo,
 			return 0;
 		nexthdr = ipv6_hdr(skb)->nexthdr;
 		protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr, &frag_off);
-		if ((int)protoff < 0)
+		if ((int)protoff < 0 || ntohs(frag_off) & IP6_OFFSET)
 			return -1;
 		break;
 	}
diff --git a/net/netfilter/xt_osf.c b/net/netfilter/xt_osf.c
index dc9485854002..e8807caede68 100644
--- a/net/netfilter/xt_osf.c
+++ b/net/netfilter/xt_osf.c
@@ -27,6 +27,9 @@
 static bool
 xt_osf_match_packet(const struct sk_buff *skb, struct xt_action_param *p)
 {
+	if (p->fragoff)
+		return false;
+
 	return nf_osf_match(skb, xt_family(p), xt_hooknum(p), xt_in(p),
 			    xt_out(p), p->matchinfo, xt_net(p), nf_osf_fingers);
 }
diff --git a/net/netfilter/xt_tcpmss.c b/net/netfilter/xt_tcpmss.c
index 0d32d4841cb3..b9da8269161d 100644
--- a/net/netfilter/xt_tcpmss.c
+++ b/net/netfilter/xt_tcpmss.c
@@ -32,6 +32,10 @@ tcpmss_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	u8 _opt[15 * 4 - sizeof(_tcph)];
 	unsigned int i, optlen;
 
+	/* this is fine for IPv6 as xt_tcpmss enforces -p tcp */
+	if (par->fragoff)
+		return false;
+
 	/* If we don't have the whole header, drop packet. */
 	th = skb_header_pointer(skb, par->thoff, sizeof(_tcph), &_tcph);
 	if (th == NULL)
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 11/14] netfilter: flowtable: ensure sufficient headroom in xmit path
  2026-05-01 12:22 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (9 preceding siblings ...)
  2026-05-01 12:22 ` [PATCH net 10/14] netfilter: xtables: fix " Pablo Neira Ayuso
@ 2026-05-01 12:22 ` Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 12/14] netfilter: flowtable: fix inline vlan encapsulation " Pablo Neira Ayuso
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-01 12:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

Check for headroom and call skb_expand_head() like in the IP output
path to ensure there is sufficient headroom for the mac header when
forwarding this packet as suggested by sashiko.

Fixes: b5964aac51e0 ("netfilter: flowtable: consolidate xmit path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_flow_table_ip.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index dbd7644fdbeb..8d5fb7e940a1 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -471,8 +471,17 @@ struct nf_flow_xmit {
 static unsigned int nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
 				       struct nf_flow_xmit *xmit)
 {
-	skb->dev = xmit->outdev;
-	dev_hard_header(skb, skb->dev, ntohs(skb->protocol),
+	struct net_device *dev = xmit->outdev;
+	unsigned int hh_len = LL_RESERVED_SPACE(dev);
+
+	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
+		skb = skb_expand_head(skb, hh_len);
+		if (!skb)
+			return NF_STOLEN;
+	}
+
+	skb->dev = dev;
+	dev_hard_header(skb, dev, ntohs(skb->protocol),
 			xmit->dest, xmit->source, skb->len);
 	dev_queue_xmit(skb);
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 12/14] netfilter: flowtable: fix inline vlan encapsulation in xmit path
  2026-05-01 12:22 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (10 preceding siblings ...)
  2026-05-01 12:22 ` [PATCH net 11/14] netfilter: flowtable: ensure sufficient headroom in xmit path Pablo Neira Ayuso
@ 2026-05-01 12:22 ` Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 13/14] netfilter: flowtable: fix inline pppoe " Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 14/14] netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header Pablo Neira Ayuso
  13 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-01 12:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

Several issues in the inline vlan support:

- The layer 2 encapsulation representation in the tuple takes encap[0] as
  the outer header and encap[1] as the inner header as seen from the ingress
  path. Reverse the encap loop to push first the inner then the outer vlan
  header.

- Postpone pushing the layer 2 header once destination device is known.
  This allows to calculate the needed hearoom via LL_RESERVED_SPACE to
  accommodate the layer 2 headers.

- Add and use nf_flow_vlan_push() as suggested by Eric Woudstra, this
  is a simplified version of skb_vlan_push() for egress path only.

Fixes: c653d5a78f34 ("netfilter: flowtable: inline vlan encapsulation in xmit path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_flow_table_ip.c | 110 ++++++++++++++++++++-----------
 1 file changed, 73 insertions(+), 37 deletions(-)

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 8d5fb7e940a1..0ce3c209050c 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -462,32 +462,6 @@ static void nf_flow_encap_pop(struct nf_flowtable_ctx *ctx,
 		nf_flow_ip_tunnel_pop(ctx, skb);
 }
 
-struct nf_flow_xmit {
-	const void		*dest;
-	const void		*source;
-	struct net_device	*outdev;
-};
-
-static unsigned int nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
-				       struct nf_flow_xmit *xmit)
-{
-	struct net_device *dev = xmit->outdev;
-	unsigned int hh_len = LL_RESERVED_SPACE(dev);
-
-	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
-		skb = skb_expand_head(skb, hh_len);
-		if (!skb)
-			return NF_STOLEN;
-	}
-
-	skb->dev = dev;
-	dev_hard_header(skb, dev, ntohs(skb->protocol),
-			xmit->dest, xmit->source, skb->len);
-	dev_queue_xmit(skb);
-
-	return NF_STOLEN;
-}
-
 static struct flow_offload_tuple_rhash *
 nf_flow_offload_lookup(struct nf_flowtable_ctx *ctx,
 		       struct nf_flowtable *flow_table, struct sk_buff *skb)
@@ -553,6 +527,32 @@ static int nf_flow_offload_forward(struct nf_flowtable_ctx *ctx,
 	return 1;
 }
 
+/* Similar to skb_vlan_push. */
+static int nf_flow_vlan_push(struct sk_buff *skb, __be16 proto, u16 id,
+			     u32 needed_headroom)
+{
+	if (skb_vlan_tag_present(skb)) {
+		struct vlan_hdr *vhdr;
+
+		if (skb_cow_head(skb, needed_headroom + VLAN_HLEN))
+			return -1;
+
+		__skb_push(skb, VLAN_HLEN);
+		if (skb_mac_header_was_set(skb))
+			skb->mac_header -= VLAN_HLEN;
+
+		vhdr = (struct vlan_hdr *)skb->data;
+		skb->network_header -= VLAN_HLEN;
+		vhdr->h_vlan_TCI = htons(skb_vlan_tag_get(skb));
+		vhdr->h_vlan_encapsulated_proto = skb->protocol;
+		skb->protocol = skb->vlan_proto;
+		skb_postpush_rcsum(skb, skb->data, VLAN_HLEN);
+	}
+	__vlan_hwaccel_put_tag(skb, proto, id);
+
+	return 0;
+}
+
 static int nf_flow_pppoe_push(struct sk_buff *skb, u16 id)
 {
 	int data_len = skb->len + sizeof(__be16);
@@ -739,17 +739,19 @@ static int nf_flow_tunnel_v6_push(struct net *net, struct sk_buff *skb,
 }
 
 static int nf_flow_encap_push(struct sk_buff *skb,
-			      struct flow_offload_tuple *tuple)
+			      struct flow_offload_tuple *tuple,
+			      struct net_device *outdev)
 {
+	u32 needed_headroom = LL_RESERVED_SPACE(outdev);
 	int i;
 
-	for (i = 0; i < tuple->encap_num; i++) {
+	for (i = tuple->encap_num - 1; i >= 0; i--) {
 		switch (tuple->encap[i].proto) {
 		case htons(ETH_P_8021Q):
 		case htons(ETH_P_8021AD):
-			skb_reset_mac_header(skb);
-			if (skb_vlan_push(skb, tuple->encap[i].proto,
-					  tuple->encap[i].id) < 0)
+			if (nf_flow_vlan_push(skb, tuple->encap[i].proto,
+					      tuple->encap[i].id,
+					      needed_headroom) < 0)
 				return -1;
 			break;
 		case htons(ETH_P_PPP_SES):
@@ -762,6 +764,44 @@ static int nf_flow_encap_push(struct sk_buff *skb,
 	return 0;
 }
 
+struct nf_flow_xmit {
+	const void		*dest;
+	const void		*source;
+	struct net_device	*outdev;
+	struct flow_offload_tuple *tuple;
+};
+
+static void __nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
+				 struct nf_flow_xmit *xmit)
+{
+	struct net_device *dev = xmit->outdev;
+	unsigned int hh_len = LL_RESERVED_SPACE(dev);
+
+	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
+		skb = skb_expand_head(skb, hh_len);
+		if (!skb)
+			return;
+	}
+
+	skb->dev = dev;
+	dev_hard_header(skb, dev, ntohs(skb->protocol),
+			xmit->dest, xmit->source, skb->len);
+	dev_queue_xmit(skb);
+}
+
+static unsigned int nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
+				       struct nf_flow_xmit *xmit)
+{
+	if (xmit->tuple->encap_num) {
+		if (nf_flow_encap_push(skb, xmit->tuple, xmit->outdev) < 0)
+			return NF_DROP;
+	}
+
+	__nf_flow_queue_xmit(net, skb, xmit);
+
+	return NF_STOLEN;
+}
+
 unsigned int
 nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 			const struct nf_hook_state *state)
@@ -806,9 +846,6 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 	if (nf_flow_tunnel_v4_push(state->net, skb, other_tuple, &ip_daddr) < 0)
 		return NF_DROP;
 
-	if (nf_flow_encap_push(skb, other_tuple) < 0)
-		return NF_DROP;
-
 	switch (tuplehash->tuple.xmit_type) {
 	case FLOW_OFFLOAD_XMIT_NEIGH:
 		rt = dst_rtable(tuplehash->tuple.dst_cache);
@@ -838,6 +875,7 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 		WARN_ON_ONCE(1);
 		return NF_DROP;
 	}
+	xmit.tuple = other_tuple;
 
 	return nf_flow_queue_xmit(state->net, skb, &xmit);
 }
@@ -1128,9 +1166,6 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 				   &ip6_daddr, encap_limit) < 0)
 		return NF_DROP;
 
-	if (nf_flow_encap_push(skb, other_tuple) < 0)
-		return NF_DROP;
-
 	switch (tuplehash->tuple.xmit_type) {
 	case FLOW_OFFLOAD_XMIT_NEIGH:
 		rt = dst_rt6_info(tuplehash->tuple.dst_cache);
@@ -1160,6 +1195,7 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 		WARN_ON_ONCE(1);
 		return NF_DROP;
 	}
+	xmit.tuple = other_tuple;
 
 	return nf_flow_queue_xmit(state->net, skb, &xmit);
 }
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 13/14] netfilter: flowtable: fix inline pppoe encapsulation in xmit path
  2026-05-01 12:22 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (11 preceding siblings ...)
  2026-05-01 12:22 ` [PATCH net 12/14] netfilter: flowtable: fix inline vlan encapsulation " Pablo Neira Ayuso
@ 2026-05-01 12:22 ` Pablo Neira Ayuso
  2026-05-01 12:22 ` [PATCH net 14/14] netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header Pablo Neira Ayuso
  13 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-01 12:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

Address two issues in the inline pppoe encapsulation:

- Add needs_gso_segment flag to segment PPPoE packets in software
  given that there is no GSO support for this.

- Use FLOW_OFFLOAD_XMIT_DIRECT since neighbour cache is not available
  in point-to-point device, use the hardware address that is obtained
  via flowtable path discovery (ie. fill_forward_path).

Fixes: 18d27bed0880 ("netfilter: flowtable: inline pppoe encapsulation in xmit path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_flow_table.h |  4 ++-
 net/netfilter/nf_flow_table_core.c    |  1 +
 net/netfilter/nf_flow_table_ip.c      | 42 +++++++++++++++++++++++++--
 net/netfilter/nf_flow_table_path.c    |  7 ++++-
 4 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index b09c11c048d5..7b23b245a5a8 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -148,9 +148,10 @@ struct flow_offload_tuple {
 	/* All members above are keys for lookups, see flow_offload_hash(). */
 	struct { }			__hash;
 
-	u8				dir:2,
+	u16				dir:2,
 					xmit_type:3,
 					encap_num:2,
+					needs_gso_segment:1,
 					tun_num:2,
 					in_vlan_ingress:2;
 	u16				mtu;
@@ -232,6 +233,7 @@ struct nf_flow_route {
 			u32			hw_ifindex;
 			u8			h_source[ETH_ALEN];
 			u8			h_dest[ETH_ALEN];
+			u8			needs_gso_segment:1;
 		} out;
 		enum flow_offload_xmit_type	xmit_type;
 	} tuple[FLOW_OFFLOAD_DIR_MAX];
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 2c4140e6f53c..785d8c244a77 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -122,6 +122,7 @@ static int flow_offload_fill_route(struct flow_offload *flow,
 
 	flow_tuple->tun = route->tuple[dir].in.tun;
 	flow_tuple->encap_num = route->tuple[dir].in.num_encaps;
+	flow_tuple->needs_gso_segment = route->tuple[dir].out.needs_gso_segment;
 	flow_tuple->tun_num = route->tuple[dir].in.num_tuns;
 
 	switch (route->tuple[dir].xmit_type) {
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 0ce3c209050c..2eba64eb393a 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -553,7 +553,8 @@ static int nf_flow_vlan_push(struct sk_buff *skb, __be16 proto, u16 id,
 	return 0;
 }
 
-static int nf_flow_pppoe_push(struct sk_buff *skb, u16 id)
+static int nf_flow_pppoe_push(struct sk_buff *skb, u16 id,
+			      u32 needed_headroom)
 {
 	int data_len = skb->len + sizeof(__be16);
 	struct ppp_hdr {
@@ -562,7 +563,7 @@ static int nf_flow_pppoe_push(struct sk_buff *skb, u16 id)
 	} *ph;
 	__be16 proto;
 
-	if (skb_cow_head(skb, PPPOE_SES_HLEN))
+	if (skb_cow_head(skb, needed_headroom + PPPOE_SES_HLEN))
 		return -1;
 
 	switch (skb->protocol) {
@@ -755,7 +756,8 @@ static int nf_flow_encap_push(struct sk_buff *skb,
 				return -1;
 			break;
 		case htons(ETH_P_PPP_SES):
-			if (nf_flow_pppoe_push(skb, tuple->encap[i].id) < 0)
+			if (nf_flow_pppoe_push(skb, tuple->encap[i].id,
+					       needed_headroom) < 0)
 				return -1;
 			break;
 		}
@@ -769,6 +771,7 @@ struct nf_flow_xmit {
 	const void		*source;
 	struct net_device	*outdev;
 	struct flow_offload_tuple *tuple;
+	bool			needs_gso_segment;
 };
 
 static void __nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
@@ -789,10 +792,41 @@ static void __nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
 	dev_queue_xmit(skb);
 }
 
+static unsigned int nf_flow_encap_gso_xmit(struct net *net, struct sk_buff *skb,
+					   struct nf_flow_xmit *xmit)
+{
+	struct sk_buff *segs, *nskb;
+
+	segs = skb_gso_segment(skb, 0);
+	if (IS_ERR(segs))
+		return NF_DROP;
+
+	if (segs)
+		consume_skb(skb);
+	else
+		segs = skb;
+
+	skb_list_walk_safe(segs, segs, nskb) {
+		skb_mark_not_on_list(segs);
+
+		if (nf_flow_encap_push(segs, xmit->tuple, xmit->outdev) < 0) {
+			kfree_skb(segs);
+			kfree_skb_list(nskb);
+			return NF_STOLEN;
+		}
+		__nf_flow_queue_xmit(net, segs, xmit);
+	}
+
+	return NF_STOLEN;
+}
+
 static unsigned int nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
 				       struct nf_flow_xmit *xmit)
 {
 	if (xmit->tuple->encap_num) {
+		if (skb_is_gso(skb) && xmit->needs_gso_segment)
+			return nf_flow_encap_gso_xmit(net, skb, xmit);
+
 		if (nf_flow_encap_push(skb, xmit->tuple, xmit->outdev) < 0)
 			return NF_DROP;
 	}
@@ -876,6 +910,7 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 		return NF_DROP;
 	}
 	xmit.tuple = other_tuple;
+	xmit.needs_gso_segment = tuplehash->tuple.needs_gso_segment;
 
 	return nf_flow_queue_xmit(state->net, skb, &xmit);
 }
@@ -1196,6 +1231,7 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 		return NF_DROP;
 	}
 	xmit.tuple = other_tuple;
+	xmit.needs_gso_segment = tuplehash->tuple.needs_gso_segment;
 
 	return nf_flow_queue_xmit(state->net, skb, &xmit);
 }
diff --git a/net/netfilter/nf_flow_table_path.c b/net/netfilter/nf_flow_table_path.c
index 6bb9579dcc2a..9e88ea6a2eef 100644
--- a/net/netfilter/nf_flow_table_path.c
+++ b/net/netfilter/nf_flow_table_path.c
@@ -86,6 +86,7 @@ struct nft_forward_info {
 	u8 ingress_vlans;
 	u8 h_source[ETH_ALEN];
 	u8 h_dest[ETH_ALEN];
+	bool needs_gso_segment;
 	enum flow_offload_xmit_type xmit_type;
 };
 
@@ -138,8 +139,11 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack,
 					path->encap.proto;
 				info->num_encaps++;
 			}
-			if (path->type == DEV_PATH_PPPOE)
+			if (path->type == DEV_PATH_PPPOE) {
 				memcpy(info->h_dest, path->encap.h_dest, ETH_ALEN);
+				info->xmit_type = FLOW_OFFLOAD_XMIT_DIRECT;
+				info->needs_gso_segment = 1;
+			}
 			break;
 		case DEV_PATH_BRIDGE:
 			if (is_zero_ether_addr(info->h_source))
@@ -279,6 +283,7 @@ static void nft_dev_forward_path(const struct nft_pktinfo *pkt,
 		memcpy(route->tuple[dir].out.h_dest, info.h_dest, ETH_ALEN);
 		route->tuple[dir].xmit_type = info.xmit_type;
 	}
+	route->tuple[dir].out.needs_gso_segment = info.needs_gso_segment;
 }
 
 int nft_flow_route(const struct nft_pktinfo *pkt, const struct nf_conn *ct,
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 14/14] netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header
  2026-05-01 12:22 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (12 preceding siblings ...)
  2026-05-01 12:22 ` [PATCH net 13/14] netfilter: flowtable: fix inline pppoe " Pablo Neira Ayuso
@ 2026-05-01 12:22 ` Pablo Neira Ayuso
  13 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-01 12:22 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

This adjusts the checksum, if required, after pulling the layer 2
header, either the pppoe header or the inner vlan header in the
double-tagged vlan packets.

Fixes: 4cd91f7c290f ("netfilter: flowtable: add vlan support")
Fixes: 72efd585f714 ("netfilter: flowtable: add pppoe support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_flow_table_ip.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 2eba64eb393a..9c05a50d6013 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -445,13 +445,13 @@ static void nf_flow_encap_pop(struct nf_flowtable_ctx *ctx,
 		switch (skb->protocol) {
 		case htons(ETH_P_8021Q):
 			vlan_hdr = (struct vlan_hdr *)skb->data;
-			__skb_pull(skb, VLAN_HLEN);
+			skb_pull_rcsum(skb, VLAN_HLEN);
 			vlan_set_encap_proto(skb, vlan_hdr);
 			skb_reset_network_header(skb);
 			break;
 		case htons(ETH_P_PPP_SES):
 			skb->protocol = __nf_flow_pppoe_proto(skb);
-			skb_pull(skb, PPPOE_SES_HLEN);
+			skb_pull_rcsum(skb, PPPOE_SES_HLEN);
 			skb_reset_network_header(skb);
 			break;
 		}
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH net 01/14] netfilter: replace skb_try_make_writable() by skb_ensure_writable()
  2026-05-01 12:22 ` [PATCH net 01/14] netfilter: replace skb_try_make_writable() by skb_ensure_writable() Pablo Neira Ayuso
@ 2026-05-01 23:50   ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 25+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-05-01 23:50 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, kuba, pabeni, edumazet, fw, horms

Hello:

This series was applied to netdev/net.git (main)
by Pablo Neira Ayuso <pablo@netfilter.org>:

On Fri,  1 May 2026 14:22:24 +0200 you wrote:
> skb_try_make_writable() only works on clones and uncloned packets might
> have their network header in paged fragments.
> 
> nft_fwd needs to work for the ingress and egress hooks, but the egress
> hook where skb->data points to the mac header, use skb_network_offset()
> to include the mac header. The flowtable is fine since it already uses
> the transport offset.
> 
> [...]

Here is the summary with links:
  - [net,01/14] netfilter: replace skb_try_make_writable() by skb_ensure_writable()
    https://git.kernel.org/netdev/net/c/1049970d7583
  - [net,02/14] netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding
    https://git.kernel.org/netdev/net/c/0a0b35f0bf10
  - [net,03/14] netfilter: nft_fwd_netdev: use recursion counter in neigh egress path
    https://git.kernel.org/netdev/net/c/1d47b55b36d2
  - [net,04/14] netfilter: x_tables: add .check_hooks to matches and targets
    https://git.kernel.org/netdev/net/c/6813985ca456
  - [net,05/14] netfilter: nft_compat: run xt_check_hooks_{match,target}() from .validate
    https://git.kernel.org/netdev/net/c/2f768d638d97
  - [net,06/14] netfilter: xt_CT: fix usersize for v1 and v2 revision
    https://git.kernel.org/netdev/net/c/8bedb6c46945
  - [net,07/14] netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables
    https://git.kernel.org/netdev/net/c/63bac0278603
  - [net,08/14] netfilter: nf_socket: skip socket lookup for non-first fragments
    https://git.kernel.org/netdev/net/c/0bf00859d7a5
  - [net,09/14] netfilter: nf_tables: skip L4 header parsing for non-first fragments
    https://git.kernel.org/netdev/net/c/009d203e56db
  - [net,10/14] netfilter: xtables: fix L4 header parsing for non-first fragments
    https://git.kernel.org/netdev/net/c/952e121c9613
  - [net,11/14] netfilter: flowtable: ensure sufficient headroom in xmit path
    https://git.kernel.org/netdev/net/c/ef4f741e8627
  - [net,12/14] netfilter: flowtable: fix inline vlan encapsulation in xmit path
    https://git.kernel.org/netdev/net/c/a177ae30f786
  - [net,13/14] netfilter: flowtable: fix inline pppoe encapsulation in xmit path
    https://git.kernel.org/netdev/net/c/69c54f80f4a7
  - [net,14/14] netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header
    https://git.kernel.org/netdev/net/c/baa3c65435fb

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2026-05-01 23:51 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-01 12:22 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
2026-05-01 12:22 ` [PATCH net 01/14] netfilter: replace skb_try_make_writable() by skb_ensure_writable() Pablo Neira Ayuso
2026-05-01 23:50   ` patchwork-bot+netdevbpf
2026-05-01 12:22 ` [PATCH net 02/14] netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding Pablo Neira Ayuso
2026-05-01 12:22 ` [PATCH net 03/14] netfilter: nft_fwd_netdev: use recursion counter in neigh egress path Pablo Neira Ayuso
2026-05-01 12:22 ` [PATCH net 04/14] netfilter: x_tables: add .check_hooks to matches and targets Pablo Neira Ayuso
2026-05-01 12:22 ` [PATCH net 05/14] netfilter: nft_compat: run xt_check_hooks_{match,target}() from .validate Pablo Neira Ayuso
2026-05-01 12:22 ` [PATCH net 06/14] netfilter: xt_CT: fix usersize for v1 and v2 revision Pablo Neira Ayuso
2026-05-01 12:22 ` [PATCH net 07/14] netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables Pablo Neira Ayuso
2026-05-01 12:22 ` [PATCH net 08/14] netfilter: nf_socket: skip socket lookup for non-first fragments Pablo Neira Ayuso
2026-05-01 12:22 ` [PATCH net 09/14] netfilter: nf_tables: skip L4 header parsing " Pablo Neira Ayuso
2026-05-01 12:22 ` [PATCH net 10/14] netfilter: xtables: fix " Pablo Neira Ayuso
2026-05-01 12:22 ` [PATCH net 11/14] netfilter: flowtable: ensure sufficient headroom in xmit path Pablo Neira Ayuso
2026-05-01 12:22 ` [PATCH net 12/14] netfilter: flowtable: fix inline vlan encapsulation " Pablo Neira Ayuso
2026-05-01 12:22 ` [PATCH net 13/14] netfilter: flowtable: fix inline pppoe " Pablo Neira Ayuso
2026-05-01 12:22 ` [PATCH net 14/14] netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header Pablo Neira Ayuso
  -- strict thread matches above, loose matches on Subject: below --
2024-09-24 20:13 [PATCH net 00/14] Netfilter fixes for net Pablo Neira Ayuso
2024-09-26  9:41 ` Paolo Abeni
2024-09-26 10:37   ` Florian Westphal
2024-09-26 10:38     ` Pablo Neira Ayuso
2024-09-26 10:41       ` Florian Westphal
2024-09-26 10:43     ` Paolo Abeni
2024-09-26 10:56       ` Pablo Neira Ayuso
2024-01-17 16:00 Pablo Neira Ayuso
2022-08-24 22:03 Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox