All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: netfilter-devel@vger.kernel.org
Cc: davem@davemloft.net, netdev@vger.kernel.org, paulb@mellanox.com,
	ozsh@mellanox.com, vladbu@mellanox.com, jiri@resnulli.us,
	kuba@kernel.org, saeedm@mellanox.com, michael.chan@broadcom.com,
	sriharsha.basavapatna@broadcom.com
Subject: [PATCH net-next 0/8] the indirect flow_block infrastructure, revisited
Date: Fri, 29 May 2020 02:25:33 +0200	[thread overview]
Message-ID: <20200529002541.19743-1-pablo@netfilter.org> (raw)

Hi,

This series fixes b5140a36da78 ("netfilter: flowtable: add indr block
setup support") that adds support for the indirect block for the
flowtable. This patch crashes the kernel with the TC CT action.

[  630.908086] BUG: kernel NULL pointer dereference, address: 00000000000000f0
[  630.908233] #PF: error_code(0x0000) - not-present page
[  630.908304] PGD 800000104addd067 P4D 800000104addd067 PUD 104311d067 PMD 0
[  630.908380] Oops: 0000 [#1] SMP PTI [  630.908615] RIP: 0010:nf_flow_table_indr_block_cb+0xc0/0x190 [nf_flow_table]
[  630.908690] Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 4c 89 75 a0 4c 89 65 a8 4d 89 ee 49 89 dd 4c 89 fe 48 c7 c7 b7 64 36 a0 31 c0 e8 ce ed d8 e0 <49> 8b b7 f0 00 00 00 48 c7 c7 c8 64      36 a0 31 c0 e8 b9 ed d8 e0 49[  630.908790] RSP: 0018:ffffc9000895f8c0 EFLAGS: 00010246
[...]
[  630.910774] Call Trace:
[  630.911192]  ? mlx5e_rep_indr_setup_block+0x270/0x270 [mlx5_core]
[  630.911621]  ? mlx5e_rep_indr_setup_block+0x270/0x270 [mlx5_core]
[  630.912040]  ? mlx5e_rep_indr_setup_block+0x270/0x270 [mlx5_core]
[  630.912443]  flow_block_cmd+0x51/0x80
[  630.912844]  __flow_indr_block_cb_register+0x26c/0x510
[  630.913265]  mlx5e_nic_rep_netdevice_event+0x9e/0x110 [mlx5_core]
[  630.913665]  notifier_call_chain+0x53/0xa0
[  630.914063]  raw_notifier_call_chain+0x16/0x20
[  630.914466]  call_netdevice_notifiers_info+0x39/0x90
[  630.914859]  register_netdevice+0x484/0x550
[  630.915256]  __ip_tunnel_create+0x12b/0x1f0 [ip_tunnel]
[  630.915661]  ip_tunnel_init_net+0x116/0x180 [ip_tunnel]
[  630.916062]  ipgre_tap_init_net+0x22/0x30 [ip_gre]
[  630.916458]  ops_init+0x44/0x110
[  630.916851]  register_pernet_operations+0x112/0x200

A workaround patch to cure this crash has been proposed. However, there
is another problem: The indirect flow_block still does not work for the
new TC CT action. The problem is that the existing flow_indr_block_entry
callback assumes you can look up for the flowtable from the netdevice to
get the flow_block. This flow_block allows you to offload the flows via
TC_SETUP_CLSFLOWER. Unfortunately, it is not possible to get the
flow_block from the TC CT flowtables because they are _not_ bound to any
specific netdevice.

= What is the indirect flow_block infrastructure?

The indirect flow_block infrastructure allows drivers to offload
tc/netfilter rules that belong to software tunnel netdevices, e.g.
vxlan.

This indirect flow_block infrastructure relates tunnel netdevices with
drivers because there is no obvious way to relate these two things
from the control plane.

= How does the indirect flow_block work before this patchset?

Front-ends register the indirect block callback through
flow_indr_add_block_cb() if they support for offloading tunnel
netdevices.

== Setting up an indirect block

1) Drivers track tunnel netdevices via NETDEV_{REGISTER,UNREGISTER} events.
   If there is a new tunnel netdevice that the driver can offload, then the
   driver invokes __flow_indr_block_cb_register() with the new tunnel
   netdevice and the driver callback. The __flow_indr_block_cb_register()
   call iterates over the list of the front-end callbacks.

2) The front-end callback sets up the flow_block_offload structure and it
   invokes the driver callback to set up the flow_block.

3) The driver callback now registers the flow_block structure and it
   returns the flow_block back to the front-end.

4) The front-end gets the flow_block object and it is now ready to
   offload rules for this tunnel netdevice.

A simplified callgraph is represented below.

        Front-end                      Driver

                                   NETDEV_REGISTER
                                         |
                     __flow_indr_block_cb_register(netdev, cb_priv, driver_cb)
                                         | [1]
            .--------------frontend_indr_block_cb(cb_priv, driver_cb)
            |
            .
   setup_flow_block_offload(bo)
            | [2]
       driver_cb(bo, cb_priv) -----------.
                                         |
                                         \/
                                  set up flow_blocks [3]
                                         |
      add rules to flow_block <----------
      TC_SETUP_CLSFLOWER [4]

== Releasing the indirect flow_block

There are two possibilities, either tunnel netdevice is removed or
a netdevice (port representor) is removed.

=== Tunnel netdevice is removed

Driver waits for the NETDEV_UNREGISTER event that announces the tunnel
netdevice removal. Then, it calls __flow_indr_block_cb_unregister() to
remove the flow_block and rules.  Callgraph is very similar to the one
described above.

=== Netdevice is removed (port representor)

Driver calls __flow_indr_block_cb_unregister() to remove the existing
netfilter/tc rule that belong to the tunnel netdevice.

= How does the indirect flow_block work after this patchset?

Drivers register the indirect flow_block setup callback through
flow_indr_dev_register() if they support for offloading tunnel
netdevices.

== Setting up an indirect flow_block

1) Frontends check if dev->netdev_ops->ndo_setup_tc is unset. If so,
   frontends call flow_indr_dev_setup_offload(). This call invokes
   the drivers' indirect flow_block setup callback.

2) The indirect flow_block setup callback sets up a flow_block structure
   which relates the tunnel netdevice and the driver.

3) The front-end uses flow_block and offload the rules.

Note that the operational to set up (non-indirect) flow_block is very
similar.

== Releasing the indirect flow_block

=== Tunnel netdevice is removed

This calls flow_indr_dev_setup_offload() to set down the flow_block and
remove the offloaded rules. This alternate path is exercised if
dev->netdev_ops->ndo_setup_tc is unset.

=== Netdevice is removed (port representor)

If a netdevice is removed, then it might need to to clean up the
offloaded tc/netfilter rules that belongs to the tunnel netdevice:

1) The driver invokes flow_indr_dev_unregister() when a netdevice is
   removed.

2) This call iterates over the existing indirect flow_blocks
   and it invokes the cleanup callback to let the front-end remove the
   tc/netfilter rules. The cleanup callback already provides the
   flow_block that the front-end needs to clean up.

        Front-end                      Driver

                                         |
                            flow_indr_dev_unregister(...)
                                         |
                         iterate over list of indirect flow_block
                               and invoke cleanup callback
                                         |
            .-----------------------------
            |
            .
   frontend_flow_block_cleanup(flow_block)
            .
            |
           \/
   remove rules to flow_block
      TC_SETUP_CLSFLOWER

= About this patchset

This patchset aims to address the existing TC CT problem while
simplifying the indirect flow_block infrastructure. Saving 300 LoC in
the flow_offload core and the drivers. The operational gets aligned with
the (non-indirect) flow_blocks logic. Patchset is composed of:

Patch #1 add nf_flow_table_gc_cleanup() which is required by the
         netfilter's flowtable new indirect flow_block approach.

Patch #2 adds the flow_block_indr object which is actually part of
         of the flow_block object. This stores the indirect flow_block
         metadata such as the tunnel netdevice owner and the cleanup
         callback (in case the tunnel netdevice goes away).

         This patch adds flow_indr_dev_{un}register() to allow drivers
         to offer netdevice tunnel hardware offload to the front-ends.
         Then, front-ends call flow_indr_dev_setup_offload() to invoke
         the drivers to set up the (indirect) flow_block.

Patch #3 add the tcf_block_offload_init() helper function, this is
         a preparation patch to adapt the tc front-end to use this
         new indirect flow_block infrastructure.

Patch #4 updates the tc and netfilter front-ends to use the new
         indirect flow_block infrastructure.

Patch #5 updates the mlx5 driver to use the new indirect flow_block
         infrastructure.

Patch #6 updates the nfp driver to use the new indirect flow_block
         infrastructure.

Patch #7 updates the bnxt driver to use the new indirect flow_block
         infrastructure.

Patch #8 removes the indirect flow_block infrastructure version 1,
         now that frontends and drivers have been translated to
         version 2 (coming in this patchset).

Please, apply.

Pablo Neira Ayuso (8):
  netfilter: nf_flowtable: expose nf_flow_table_gc_cleanup()
  net: flow_offload: consolidate indirect flow_block infrastructure
  net: cls_api: add tcf_block_offload_init()
  net: use flow_indr_dev_setup_offload()
  mlx5: update indirect block support
  nfp: update indirect block support
  bnxt_tc: update indirect block support
  net: remove indirect block netdev event registration

 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   1 -
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c  |  51 +--
 .../ethernet/mellanox/mlx5/core/en/rep/tc.c   |  81 +----
 .../ethernet/mellanox/mlx5/core/en/rep/tc.h   |   4 -
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |   1 -
 .../net/ethernet/mellanox/mlx5/core/en_rep.h  |   5 -
 .../net/ethernet/netronome/nfp/flower/main.c  |  11 +-
 .../net/ethernet/netronome/nfp/flower/main.h  |   7 +-
 .../ethernet/netronome/nfp/flower/offload.c   |  35 +-
 include/net/flow_offload.h                    |  28 +-
 include/net/netfilter/nf_flow_table.h         |   2 +
 net/core/flow_offload.c                       | 301 +++++++-----------
 net/netfilter/nf_flow_table_core.c            |   6 +-
 net/netfilter/nf_flow_table_offload.c         |  85 +----
 net/netfilter/nf_tables_offload.c             |  69 ++--
 net/sched/cls_api.c                           | 157 +++------
 16 files changed, 249 insertions(+), 595 deletions(-)

--
2.20.1


             reply	other threads:[~2020-05-29  0:26 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-29  0:25 Pablo Neira Ayuso [this message]
2020-05-29  0:25 ` [PATCH net-next 1/8] netfilter: nf_flowtable: expose nf_flow_table_gc_cleanup() Pablo Neira Ayuso
2020-05-29  0:25 ` [PATCH net-next 2/8] net: flow_offload: consolidate indirect flow_block infrastructure Pablo Neira Ayuso
2020-05-29  0:25 ` [PATCH net-next 3/8] net: cls_api: add tcf_block_offload_init() Pablo Neira Ayuso
2020-05-29  0:25 ` [PATCH net-next 4/8] net: use flow_indr_dev_setup_offload() Pablo Neira Ayuso
2020-05-29  0:25 ` [PATCH net-next 5/8] mlx5: update indirect block support Pablo Neira Ayuso
2020-05-29  0:25 ` [PATCH net-next 6/8] nfp: " Pablo Neira Ayuso
2020-05-29  0:25 ` [PATCH net-next 7/8] bnxt_tc: " Pablo Neira Ayuso
2020-05-29  0:25 ` [PATCH net-next 8/8] net: remove indirect block netdev event registration Pablo Neira Ayuso
2020-06-01 18:42 ` [PATCH net-next 0/8] the indirect flow_block infrastructure, revisited David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200529002541.19743-1-pablo@netfilter.org \
    --to=pablo@netfilter.org \
    --cc=davem@davemloft.net \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=michael.chan@broadcom.com \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=ozsh@mellanox.com \
    --cc=paulb@mellanox.com \
    --cc=saeedm@mellanox.com \
    --cc=sriharsha.basavapatna@broadcom.com \
    --cc=vladbu@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.