From: Florian Westphal <fw@strlen.de>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>,
netfilter-devel <netfilter-devel@vger.kernel.org>
Subject: Re: nft transaction semantics and flowtable hw offload
Date: Mon, 8 May 2023 22:25:47 +0200 [thread overview]
Message-ID: <20230508202547.GA25016@breakpoint.cc> (raw)
In-Reply-To: <ZFlRWJ/kC+F1YNsB@calendula>
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Florian,
>
> On Fri, May 05, 2023 at 02:32:08PM +0200, Florian Westphal wrote:
> > Following dummy ruleset only works on first load:
> >
> > $ cat bug
> > flush ruleset
> > table inet filter {
> > flowtable f1 {
> > hook ingress priority 10
> > flags offload
> > devices = { dummy0, dummy1 }
> > }
> > }
> > $ nft -f bug
>
> This follows flowtable addition path.
>
> > $ nft -f bug
>
> This follows nft_flow_update() path.
I don't think so, the "flush ruleset" should make the
existing flowtable invisible, no?
> > bug:3:13-14: Error: Could not process rule: Device or resource busy
>
> I don't see who reports EBUSY at quick glance.
Its the NIC driver (ndo_setup_tc), but I do not see how its fixable,
hence this email.
> > This works when 'offload' flag is removed.
> >
> > Transaction will *first* try to register the flowtable hook,
> > then it unregisters the existing flowtable hook.
> >
> > When 'offload' flag is enabled, hook registration fails because
> > the device offload capability is already busy.
> >
> > Any suggestions on how to fix this?
> > Or would you say this is as expected/designed?
>
> It should not report EBUSY, we have fixed similar issues like this one
> in the past.
But driver can't know that the (registered) flow block is going
to be unregistered later in the same transaction, so it can't
"suppress" the error.
For SW path (no offload) this works because netfilter core
will be happy to (temporarily) register the software flow bypass
hook a second time.
You can replicate a unrelated but similar error if you create 1k base chains,
then try to "nft -f" the same ruleset; if you prepend a "flush ruleset"
this will fail because we have "reg, unreg" and not "unreg, reg"
ordering (and we have to do it this way to not leave a "no hooks at all"
window) and the temporary 2nd registering brings the core over the
hardcoded limit.
> > We could swap register/unregister, but this has two major issues:
> >
> > 1. it gives a window where no hook is registered on hw side
> > 2. after unreg, we cannot assume that (re)registering works, so
> > 'nft -f' may cause loss of functionality.
> >
> > Adding a 'refcount' scheme doesn't really work either, we'd need
> > an extra data structure to record the known offload settings, so that
> > on a 'hook flowtable f1 to dummy0' request we can figure out that this
> > is expected to be busy and then we could skip the register request.
> >
> > But that has to problem that the batch might not have an unregister
> > request, i.e. we would accept a bogus ruleset that *should* have failed
> > with -EBUSY.
> >
> > Any ideas?
>
> If you help me narrow down the issue, because I currently do not see
> where this EBUSY error originates from.
To be fair, I dont have offload capable HW so I hacked up the dummy driver
to add a faux ndo_setup_tc for testing.
As far as I can piece this together this is coming from
flow_block_cb_setup_simple() in mlx5.
Interesting however is that mtk_wed_setup_tc_block() seems to instead
bump an internal refcount.
Is that the expected driver handling in this situation?
If so, I'd ping mlx driver maintainers.
Thanks,
Florian
next prev parent reply other threads:[~2023-05-08 20:26 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-05 12:32 nft transaction semantics and flowtable hw offload Florian Westphal
2023-05-08 19:45 ` Pablo Neira Ayuso
2023-05-08 20:25 ` Florian Westphal [this message]
2023-06-02 23:14 ` Pablo Neira Ayuso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230508202547.GA25016@breakpoint.cc \
--to=fw@strlen.de \
--cc=netfilter-devel@vger.kernel.org \
--cc=pablo@netfilter.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.