Re: [PATCH nf-next,RFC v4] netfilter: nf_flow_table: add hardware offload support

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jakub Kicinski <kubakici@wp.pl>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	f.fainelli@gmail.com, ronye@mellanox.com,
	simon.horman@netronome.com, jiri@mellanox.com, nbd@nbd.name,
	john@phrozen.org, fw@strlen.de
Subject: Re: [PATCH nf-next,RFC v4] netfilter: nf_flow_table: add hardware offload support
Date: Thu, 25 Jan 2018 14:38:46 -0800	[thread overview]
Message-ID: <20180125143846.6bb5290f@cakuba.netronome.com> (raw)
In-Reply-To: <20180125112858.s27mtfv3cikqfiny@salvia>

On Thu, 25 Jan 2018 12:28:58 +0100, Pablo Neira Ayuso wrote:
> On Wed, Jan 24, 2018 at 05:31:36PM -0800, Jakub Kicinski wrote:
> > On Thu, 25 Jan 2018 01:09:41 +0100, Pablo Neira Ayuso wrote:  
> > > This patch adds the infrastructure to offload flows to hardware, in case
> > > the nic/switch comes with built-in flow tables capabilities.
> > > 
> > > If the hardware comes with no hardware flow tables or they have
> > > limitations in terms of features, the existing infrastructure falls back
> > > to the software flow table implementation.
> > > 
> > > The software flow table garbage collector skips entries that resides in
> > > the hardware, so the hardware will be responsible for releasing this
> > > flow table entry too via flow_offload_dead().
> > > 
> > > Hardware configuration, either to add or to delete entries, is done from
> > > the hardware offload workqueue, to ensure this is done from user context
> > > given that we may sleep when grabbing the mdio mutex.
> > > 
> > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>  
> > 
> > I wonder how do you deal with device/table removal?  I know regrettably
> > little about internals of nftables.  I assume the table cannot be
> > removed/module unloaded as long as there are flow entries? And on
> > device removal all flows pertaining to the removed ifindex will be
> > automatically flushed?  
> 
> Yes, this code is part of the generic software infrastructure, it's
> not specific to the hardware offload, it's already upstream, see
> net/netfilter/nft_flow_offload.c, see flow_offload_netdev_notifier.

Hm.  At a glance flow_offload_iterate_cleanup() will just mark the
flows as dead, not request their removal from the HW.  Doesn't that
mean that reloading the HW driver with flows installed will likely lead
to HW/FW resources being leaked (unless every driver duplicates manual
flush on remove).

> > Still there could be outstanding work items targeting the device, so
> > this WARN_ON:
> > 
> > +	indev = dev_get_by_index(net, ifindex);
> > +	if (WARN_ON(!indev))
> > +		return 0;
> > 
> > looks possible to trigger.  
> 
> It should not, that's why there's a WARN_ON there ;-).
> 
> See nf_flow_table_hw_module_exit(), there's a call to
> cancel_work_sync() to stop the hw offload workqueue, then flushes it.
> After this, there's a flow table cleanup. So noone should be calling
> that function by then.

Ah, I must be misunderstanding.  I meant when device is removed, not
the flow_table_hw module.  Does the nf_flow_table_hw_module_exit() run
when device is removed?  I was expecting that, for example something
like nft_flow_offload_iterate_cleanup() would queue up all the flow
remove calls and then call flush_work() (not cancel_work). 

> > On the general architecture - I think it's worth documenting somewhere
> > clearly that unlike TC offloads and most NDOs add/del of NFT flows are
> > not protected by rtnl_lock.  
> 
> Someone could probably look at getting rid of the rtnl_lock() all over
> the place for hardware offloads, holding on the entire rtnetlink
> subsystem just because some piece of hardware is taking time to
> configure things is not good. Not explicitly related to this, but have
> a look at Florian Westphal's talk on rtnl_lock during NetDev.

Oh, 100% agreed.  I was just pointing out that it could be useful to
mention the locking in kdoc or at least the commit message.

> > > v4: More work in progress
> > > - Decouple nf_flow_table_hw from nft_flow_offload via rcu hooks
> > > - Consolidate ->ndo invocations, now they happen from the hw worker.
> > > - Fix bug in list handling, use list_replace_init()
> > > - cleanup entries on nf_flow_table_hw module removal
> > > - add NFT_FLOWTABLE_F_HW flag to flowtables to explicit signal that user wants
> > >   to offload entries to hardware.
> > > 
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index ed0799a12bf2..be0c12acc3f0 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -859,6 +859,13 @@ struct dev_ifalias {
> > >  	char ifalias[];
> > >  };
> > >  
> > > +struct flow_offload;
> > > +
> > > +enum flow_offload_type {
> > > +	FLOW_OFFLOAD_ADD	= 0,
> > > +	FLOW_OFFLOAD_DEL,
> > > +};
> > > +
> > >  /*
> > >   * This structure defines the management hooks for network devices.
> > >   * The following hooks can be defined; unless noted otherwise, they are
> > > @@ -1316,6 +1323,8 @@ struct net_device_ops {
> > >  	int			(*ndo_bridge_dellink)(struct net_device *dev,
> > >  						      struct nlmsghdr *nlh,
> > >  						      u16 flags);
> > > +	int			(*ndo_flow_offload)(enum flow_offload_type type,
> > > +						    struct flow_offload *flow);  
> > 
> > nit: should there be kdoc for the new NDO?  ndo kdoc comment doesn't
> >      look like it would be recognized by tools anyway though..  
> 
> Yes, I can add this in the next iteration, no problem.
> 
> > nit: using "flow" as the name rings slightly grandiose to me :)  
> >      I would appreciate a nf_ prefix for clarity.  Drivers will have 
> >      to juggle a number of "flow" things, it would make the code easier
> >      to follow if names were prefixed clearly, I feel.  
> 
> This infrastructure could be used from tc too. My take on this is that
> we should look at generalizing ndo's so they can be used from every
> subsystem, so you just pick your own poison when doing packet
> classification.
> 
> With some intermediate representation that suits well for everyone, we
> would save quite a bit of redundant code in the drivers, so all
> frontend interfaces that are basically part of the "same world" could
> call the same ndo. We just need some glue code/abstraction in between
> drivers and frontends [1].
>
> The other direction, that IMO I would prefer to skip, is to have one
> ndo for each frontend packet classification subsystem.

Unification or making an agreement on a unified API which would cover
all cases would be great.  

My vote doesn't carry much wight but I thought I would express my
preference as far as the "interim" goes :)

next prev parent reply	other threads:[~2018-01-25 22:38 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-25  0:09 [PATCH nf-next,RFC v4] netfilter: nf_flow_table: add hardware offload support Pablo Neira Ayuso
2018-01-25  1:31 ` Jakub Kicinski
2018-01-25 11:28   ` Pablo Neira Ayuso
2018-01-25 22:38     ` Jakub Kicinski [this message]
2018-01-29 10:37       ` Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180125143846.6bb5290f@cakuba.netronome.com \
    --to=kubakici@wp.pl \
    --cc=f.fainelli@gmail.com \
    --cc=fw@strlen.de \
    --cc=jiri@mellanox.com \
    --cc=john@phrozen.org \
    --cc=nbd@nbd.name \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=ronye@mellanox.com \
    --cc=simon.horman@netronome.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).