From: Jakub Kicinski <kubakici@wp.pl>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
f.fainelli@gmail.com, ronye@mellanox.com,
simon.horman@netronome.com, jiri@mellanox.com, nbd@nbd.name,
john@phrozen.org, fw@strlen.de
Subject: Re: [PATCH nf-next,RFC v4] netfilter: nf_flow_table: add hardware offload support
Date: Thu, 25 Jan 2018 14:38:46 -0800 [thread overview]
Message-ID: <20180125143846.6bb5290f@cakuba.netronome.com> (raw)
In-Reply-To: <20180125112858.s27mtfv3cikqfiny@salvia>
On Thu, 25 Jan 2018 12:28:58 +0100, Pablo Neira Ayuso wrote:
> On Wed, Jan 24, 2018 at 05:31:36PM -0800, Jakub Kicinski wrote:
> > On Thu, 25 Jan 2018 01:09:41 +0100, Pablo Neira Ayuso wrote:
> > > This patch adds the infrastructure to offload flows to hardware, in case
> > > the nic/switch comes with built-in flow tables capabilities.
> > >
> > > If the hardware comes with no hardware flow tables or they have
> > > limitations in terms of features, the existing infrastructure falls back
> > > to the software flow table implementation.
> > >
> > > The software flow table garbage collector skips entries that resides in
> > > the hardware, so the hardware will be responsible for releasing this
> > > flow table entry too via flow_offload_dead().
> > >
> > > Hardware configuration, either to add or to delete entries, is done from
> > > the hardware offload workqueue, to ensure this is done from user context
> > > given that we may sleep when grabbing the mdio mutex.
> > >
> > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> >
> > I wonder how do you deal with device/table removal? I know regrettably
> > little about internals of nftables. I assume the table cannot be
> > removed/module unloaded as long as there are flow entries? And on
> > device removal all flows pertaining to the removed ifindex will be
> > automatically flushed?
>
> Yes, this code is part of the generic software infrastructure, it's
> not specific to the hardware offload, it's already upstream, see
> net/netfilter/nft_flow_offload.c, see flow_offload_netdev_notifier.
Hm. At a glance flow_offload_iterate_cleanup() will just mark the
flows as dead, not request their removal from the HW. Doesn't that
mean that reloading the HW driver with flows installed will likely lead
to HW/FW resources being leaked (unless every driver duplicates manual
flush on remove).
> > Still there could be outstanding work items targeting the device, so
> > this WARN_ON:
> >
> > + indev = dev_get_by_index(net, ifindex);
> > + if (WARN_ON(!indev))
> > + return 0;
> >
> > looks possible to trigger.
>
> It should not, that's why there's a WARN_ON there ;-).
>
> See nf_flow_table_hw_module_exit(), there's a call to
> cancel_work_sync() to stop the hw offload workqueue, then flushes it.
> After this, there's a flow table cleanup. So noone should be calling
> that function by then.
Ah, I must be misunderstanding. I meant when device is removed, not
the flow_table_hw module. Does the nf_flow_table_hw_module_exit() run
when device is removed? I was expecting that, for example something
like nft_flow_offload_iterate_cleanup() would queue up all the flow
remove calls and then call flush_work() (not cancel_work).
> > On the general architecture - I think it's worth documenting somewhere
> > clearly that unlike TC offloads and most NDOs add/del of NFT flows are
> > not protected by rtnl_lock.
>
> Someone could probably look at getting rid of the rtnl_lock() all over
> the place for hardware offloads, holding on the entire rtnetlink
> subsystem just because some piece of hardware is taking time to
> configure things is not good. Not explicitly related to this, but have
> a look at Florian Westphal's talk on rtnl_lock during NetDev.
Oh, 100% agreed. I was just pointing out that it could be useful to
mention the locking in kdoc or at least the commit message.
> > > v4: More work in progress
> > > - Decouple nf_flow_table_hw from nft_flow_offload via rcu hooks
> > > - Consolidate ->ndo invocations, now they happen from the hw worker.
> > > - Fix bug in list handling, use list_replace_init()
> > > - cleanup entries on nf_flow_table_hw module removal
> > > - add NFT_FLOWTABLE_F_HW flag to flowtables to explicit signal that user wants
> > > to offload entries to hardware.
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index ed0799a12bf2..be0c12acc3f0 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -859,6 +859,13 @@ struct dev_ifalias {
> > > char ifalias[];
> > > };
> > >
> > > +struct flow_offload;
> > > +
> > > +enum flow_offload_type {
> > > + FLOW_OFFLOAD_ADD = 0,
> > > + FLOW_OFFLOAD_DEL,
> > > +};
> > > +
> > > /*
> > > * This structure defines the management hooks for network devices.
> > > * The following hooks can be defined; unless noted otherwise, they are
> > > @@ -1316,6 +1323,8 @@ struct net_device_ops {
> > > int (*ndo_bridge_dellink)(struct net_device *dev,
> > > struct nlmsghdr *nlh,
> > > u16 flags);
> > > + int (*ndo_flow_offload)(enum flow_offload_type type,
> > > + struct flow_offload *flow);
> >
> > nit: should there be kdoc for the new NDO? ndo kdoc comment doesn't
> > look like it would be recognized by tools anyway though..
>
> Yes, I can add this in the next iteration, no problem.
>
> > nit: using "flow" as the name rings slightly grandiose to me :)
> > I would appreciate a nf_ prefix for clarity. Drivers will have
> > to juggle a number of "flow" things, it would make the code easier
> > to follow if names were prefixed clearly, I feel.
>
> This infrastructure could be used from tc too. My take on this is that
> we should look at generalizing ndo's so they can be used from every
> subsystem, so you just pick your own poison when doing packet
> classification.
>
> With some intermediate representation that suits well for everyone, we
> would save quite a bit of redundant code in the drivers, so all
> frontend interfaces that are basically part of the "same world" could
> call the same ndo. We just need some glue code/abstraction in between
> drivers and frontends [1].
>
> The other direction, that IMO I would prefer to skip, is to have one
> ndo for each frontend packet classification subsystem.
Unification or making an agreement on a unified API which would cover
all cases would be great.
My vote doesn't carry much wight but I thought I would express my
preference as far as the "interim" goes :)
next prev parent reply other threads:[~2018-01-25 22:38 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-25 0:09 [PATCH nf-next,RFC v4] netfilter: nf_flow_table: add hardware offload support Pablo Neira Ayuso
2018-01-25 1:31 ` Jakub Kicinski
2018-01-25 11:28 ` Pablo Neira Ayuso
2018-01-25 22:38 ` Jakub Kicinski [this message]
2018-01-29 10:37 ` Pablo Neira Ayuso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180125143846.6bb5290f@cakuba.netronome.com \
--to=kubakici@wp.pl \
--cc=f.fainelli@gmail.com \
--cc=fw@strlen.de \
--cc=jiri@mellanox.com \
--cc=john@phrozen.org \
--cc=nbd@nbd.name \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=pablo@netfilter.org \
--cc=ronye@mellanox.com \
--cc=simon.horman@netronome.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).