All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Felix Fietkau <nbd@nbd.name>
Cc: netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	Jo-Philipp Wich <jo@mein.io>
Subject: Re: [RFC] netfilter: nf_tables: ignore errors on flowtable device hw offload setup
Date: Mon, 16 May 2022 02:57:50 +0200	[thread overview]
Message-ID: <YoGhjjhsE1PcVeFC@salvia> (raw)
In-Reply-To: <88da25b7-0cd0-49df-c09e-8271618ba50f@nbd.name>

On Fri, May 13, 2022 at 11:09:51AM +0200, Felix Fietkau wrote:
> 
> On 13.05.22 10:15, Pablo Neira Ayuso wrote:
> > On Fri, May 13, 2022 at 10:03:13AM +0200, Felix Fietkau wrote:
> > > 
> > > On 13.05.22 09:49, Pablo Neira Ayuso wrote:
> > > > Hi,
> > > > > On Tue, May 10, 2022 at 10:27:39PM +0200, Felix Fietkau wrote:
> > > > > In many cases, it's not easily possible for user space to know, which
> > > > > devices properly support hardware offload.
> > > > > Then, it is a matter of extending the netlink interface to
> > > expose this
> > > > feature? Probably add a FLOW_BLOCK_PROBE or similar which allow to
> > > > consult if this feature is available?
> > > > > > Even if a device supports hardware flow offload, it is not
> > > > > guaranteed that it will actually be able to handle the flows for
> > > > > which hardware offload is requested.
> > > > > When might this happen?
> > > 
> > > I think there are many possible reasons: The flow might be using features
> > > not supported by the offload driver. Maybe it doesn't have any space left in
> > > the offload table. I'm sure there are many other possible reasons it could
> > > fail.
> > 
> > This fallback to software flowtable path for partial scenarios already
> > exists.
> I know. All I meant was to point out that hardware offload is not guaranteed
> in one place, so I don't think bailing out with an error because flow block
> bind didn't work for one of the flowtable devices is justified.
> 
> > > > > Ignoring errors on the FLOW_BLOCK_BIND makes it a lot easier to set up
> > > > > configurations that use hardware offload where possible and gracefully
> > > > > fall back to software offload for everything else.
> > > > > I understand this might be useful from userspace perspective,
> > > because
> > > > forcing the user to re-try is silly.
> > > > > However, on the other hand, the user should have some way to
> > > know from
> > > > the control plane that the feature (hardware offload) that they
> > > > request is not available for their setup.
> > > 
> > > In my opinion, most users of this API probably don't care and just want to
> > > have offload on a best effort basis.
> > 
> > OK, but if the setup does not support hardware offload at all, why
> > should the control plane accept this? I think user should know in
> > first place that no one single flow is going to be offloaded to
> > hardware.
>
> It makes for a much cleaner configuration if you can just create a single
> hw-offload enabled flowtable containing multiple devices, some of which
> support hardware offload and some of which don't.

This scenario mixing devices that support hw offload and which does
not support hw offload make sense.

> > > Assuming that is the case, wouldn't it be better if we simply have
> > > an API that indicates, which flowtable members hardware offload was
> > > actually enabled for?
> > 
> > What are you proposing?
> > 
> > I think it would be good to expose through netlink interface what the
> > device can actually do according to the existing supported flowtable
> > software datapath features.
> In addition to the NFTA_FLOWTABLE_HOOK_DEVS array, the netlink API could
> also return another array, e.g. NFTA_FLOWTABLE_HOOK_OFFLOAD_DEVS which
> indicates devices for which hw offload is enabled.
> 
> What I really don't like about the current state of the flowtable offload
> API is the (in my opinion completely unnecessary) complexity that is
> required for the simple use case of enabling hw/sw flow offloading on a best
> effort basis for all devices.
> What I like even less is the number of implementation details that it has to
> consider.
> 
> For example: Let's assume we have a machine with several devices, some of
> which support hw offload, some of which don't. We have a mix of VLANs and
> bridges in there as well, maybe even PPPoE.
> Now the admin of that machine wants to enable best-effort hardware +
> software flow offloading for that configuration.
> Now he (or a piece of user space software dealing with the config) has to do
> these things:
> - figure out which devices could support hw offload, create a separate flow
> table for them
> - be aware of which of these devices are actually used by looking at the
> stack of bridges, vlans, dsa devices, etc.
> - if an error occurs, test them individually just to see which one actually
> failed and leave it out of the flowtable
> - for sw offload be aware that there is limited support for offloading decap
> of vlans/pppoe, count the number of decaps and figure out the right input
> device to add based on the behavior of nft_dev_path_info, so that the
> 'indev' it selects matches the device you put in the flow table.
> 
> So I'm asking you: Am I getting any of this completely wrong? Do you
> consider it to be a reasonable trade-off to force the admin (or intermediate
> user space layer) to jump through these hoops for such a simple use case,
> just because somebody might want more fine grained control?
> 
> I consider this patch to be a first step towards making simple use cases
> easier to configure. I'd also be fine with adding a flag to make the
> fallback behavior opt-in, even though I think it would make a much better
> default.
> 
> Eventually I'd also like to add a flag that makes it unnecessary to even
> specify the devices in the flow table by making the code auto-create hooks
> for devices with active flows, just like I did in my xtables target.
> You correctly pointed out to me in the past that this comes at the cost of a
> few packets delay before offloading kicks in, but I'm still wondering: who
> actually cares about that?
> 
> If I'm completely off-base with this, please let me know. I'm simply trying
> to make sense of all of this...

Maybe only fail if _none_ of the selected devices support for hardware
offload, ie. instead of silently accepting all devices, count of the
number of devices for which a block has been set up, if it is == 0
then bail out with EOPNOTSUPP.

  reply	other threads:[~2022-05-16  0:57 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-10 20:27 [RFC] netfilter: nf_tables: ignore errors on flowtable device hw offload setup Felix Fietkau
2022-05-13  7:49 ` Pablo Neira Ayuso
2022-05-13  8:03   ` Felix Fietkau
2022-05-13  8:15     ` Pablo Neira Ayuso
2022-05-13  9:09       ` Felix Fietkau
2022-05-16  0:57         ` Pablo Neira Ayuso [this message]
2022-05-19 15:37           ` Felix Fietkau
2022-05-20  7:50             ` Pablo Neira Ayuso
2022-05-20 18:07               ` Felix Fietkau
2022-05-20 21:47                 ` Pablo Neira Ayuso
2022-05-30 16:55                   ` Pablo Neira Ayuso
2022-05-30 18:52                     ` Felix Fietkau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YoGhjjhsE1PcVeFC@salvia \
    --to=pablo@netfilter.org \
    --cc=jo@mein.io \
    --cc=nbd@nbd.name \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.