All of lore.kernel.org
 help / color / mirror / Atom feed
* nft transaction semantics and flowtable hw offload
@ 2023-05-05 12:32 Florian Westphal
  2023-05-08 19:45 ` Pablo Neira Ayuso
  2023-06-02 23:14 ` Pablo Neira Ayuso
  0 siblings, 2 replies; 4+ messages in thread
From: Florian Westphal @ 2023-05-05 12:32 UTC (permalink / raw)
  To: netfilter-devel

Following dummy ruleset only works on first load:

$ cat bug
flush ruleset
table inet filter {
  flowtable f1 {
  hook ingress priority 10
  flags offload
  devices = { dummy0, dummy1 }
 }
}
$ nft -f bug
$ nft -f bug
bug:3:13-14: Error: Could not process rule: Device or resource busy

This works when 'offload' flag is removed.

Transaction will *first* try to register the flowtable hook,
then it unregisters the existing flowtable hook.

When 'offload' flag is enabled, hook registration fails because
the device offload capability is already busy.

Any suggestions on how to fix this?
Or would you say this is as expected/designed?

I don't see a way to resolve this.

We could swap register/unregister, but this has two major issues:

1. it gives a window where no hook is registered on hw side
2. after unreg, we cannot assume that (re)registering works, so
   'nft -f' may cause loss of functionality.

Adding a 'refcount' scheme doesn't really work either, we'd need
an extra data structure to record the known offload settings, so that
on a 'hook flowtable f1 to dummy0' request we can figure out that this
is expected to be busy and then we could skip the register request.

But that has to problem that the batch might not have an unregister
request, i.e. we would accept a bogus ruleset that *should* have failed
with -EBUSY.

Any ideas?

If not, i'd add a paragraph to the man page wrt. offload caveats.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: nft transaction semantics and flowtable hw offload
  2023-05-05 12:32 nft transaction semantics and flowtable hw offload Florian Westphal
@ 2023-05-08 19:45 ` Pablo Neira Ayuso
  2023-05-08 20:25   ` Florian Westphal
  2023-06-02 23:14 ` Pablo Neira Ayuso
  1 sibling, 1 reply; 4+ messages in thread
From: Pablo Neira Ayuso @ 2023-05-08 19:45 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

Hi Florian,

On Fri, May 05, 2023 at 02:32:08PM +0200, Florian Westphal wrote:
> Following dummy ruleset only works on first load:
> 
> $ cat bug
> flush ruleset
> table inet filter {
>   flowtable f1 {
>   hook ingress priority 10
>   flags offload
>   devices = { dummy0, dummy1 }
>  }
> }
> $ nft -f bug

This follows flowtable addition path.

> $ nft -f bug

This follows nft_flow_update() path.

> bug:3:13-14: Error: Could not process rule: Device or resource busy

I don't see who reports EBUSY at quick glance.

nft_flowtable_update() removes redundant hooks (already registered).

> This works when 'offload' flag is removed.
> 
> Transaction will *first* try to register the flowtable hook,
> then it unregisters the existing flowtable hook.
> 
> When 'offload' flag is enabled, hook registration fails because
> the device offload capability is already busy.
>
> Any suggestions on how to fix this?
> Or would you say this is as expected/designed?

It should not report EBUSY, we have fixed similar issues like this one
in the past.

> I don't see a way to resolve this.
> 
> We could swap register/unregister, but this has two major issues:
> 
> 1. it gives a window where no hook is registered on hw side
> 2. after unreg, we cannot assume that (re)registering works, so
>    'nft -f' may cause loss of functionality.
> 
> Adding a 'refcount' scheme doesn't really work either, we'd need
> an extra data structure to record the known offload settings, so that
> on a 'hook flowtable f1 to dummy0' request we can figure out that this
> is expected to be busy and then we could skip the register request.
> 
> But that has to problem that the batch might not have an unregister
> request, i.e. we would accept a bogus ruleset that *should* have failed
> with -EBUSY.
> 
> Any ideas?

If you help me narrow down the issue, because I currently do not see
where this EBUSY error originates from.

> If not, i'd add a paragraph to the man page wrt. offload caveats.
> 
> Thanks,
> Florian

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: nft transaction semantics and flowtable hw offload
  2023-05-08 19:45 ` Pablo Neira Ayuso
@ 2023-05-08 20:25   ` Florian Westphal
  0 siblings, 0 replies; 4+ messages in thread
From: Florian Westphal @ 2023-05-08 20:25 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Florian,
> 
> On Fri, May 05, 2023 at 02:32:08PM +0200, Florian Westphal wrote:
> > Following dummy ruleset only works on first load:
> > 
> > $ cat bug
> > flush ruleset
> > table inet filter {
> >   flowtable f1 {
> >   hook ingress priority 10
> >   flags offload
> >   devices = { dummy0, dummy1 }
> >  }
> > }
> > $ nft -f bug
> 
> This follows flowtable addition path.
> 
> > $ nft -f bug
> 
> This follows nft_flow_update() path.

I don't think so, the "flush ruleset" should make the
existing flowtable invisible, no?

> > bug:3:13-14: Error: Could not process rule: Device or resource busy
> 
> I don't see who reports EBUSY at quick glance.

Its the NIC driver (ndo_setup_tc), but I do not see how its fixable,
hence this email.

> > This works when 'offload' flag is removed.
> > 
> > Transaction will *first* try to register the flowtable hook,
> > then it unregisters the existing flowtable hook.
> > 
> > When 'offload' flag is enabled, hook registration fails because
> > the device offload capability is already busy.
> >
> > Any suggestions on how to fix this?
> > Or would you say this is as expected/designed?
> 
> It should not report EBUSY, we have fixed similar issues like this one
> in the past.

But driver can't know that the (registered) flow block is going
to be unregistered later in the same transaction, so it can't
"suppress" the error.

For SW path (no offload) this works because netfilter core
will be happy to (temporarily) register the software flow bypass
hook a second time.

You can replicate a unrelated but similar error if you create 1k base chains,
then try to "nft -f" the same ruleset; if you prepend a "flush ruleset"
this will fail because we have "reg, unreg" and not "unreg, reg"
ordering (and we have to do it this way to not leave a "no hooks at all"
window) and the temporary 2nd registering brings the core over the
hardcoded limit.

> > We could swap register/unregister, but this has two major issues:
> > 
> > 1. it gives a window where no hook is registered on hw side
> > 2. after unreg, we cannot assume that (re)registering works, so
> >    'nft -f' may cause loss of functionality.
> > 
> > Adding a 'refcount' scheme doesn't really work either, we'd need
> > an extra data structure to record the known offload settings, so that
> > on a 'hook flowtable f1 to dummy0' request we can figure out that this
> > is expected to be busy and then we could skip the register request.
> > 
> > But that has to problem that the batch might not have an unregister
> > request, i.e. we would accept a bogus ruleset that *should* have failed
> > with -EBUSY.
> > 
> > Any ideas?
> 
> If you help me narrow down the issue, because I currently do not see
> where this EBUSY error originates from.

To be fair, I dont have offload capable HW so I hacked up the dummy driver
to add a faux ndo_setup_tc for testing.

As far as I can piece this together this is coming from
flow_block_cb_setup_simple() in mlx5.

Interesting however is that mtk_wed_setup_tc_block() seems to instead
bump an internal refcount.

Is that the expected driver handling in this situation?
If so, I'd ping mlx driver maintainers.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: nft transaction semantics and flowtable hw offload
  2023-05-05 12:32 nft transaction semantics and flowtable hw offload Florian Westphal
  2023-05-08 19:45 ` Pablo Neira Ayuso
@ 2023-06-02 23:14 ` Pablo Neira Ayuso
  1 sibling, 0 replies; 4+ messages in thread
From: Pablo Neira Ayuso @ 2023-06-02 23:14 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

Hi Florian,

On Fri, May 05, 2023 at 02:32:08PM +0200, Florian Westphal wrote:
> Following dummy ruleset only works on first load:
> 
> $ cat bug
> flush ruleset
> table inet filter {
>   flowtable f1 {
>   hook ingress priority 10
>   flags offload
>   devices = { dummy0, dummy1 }
>  }
> }
> $ nft -f bug
> $ nft -f bug
> bug:3:13-14: Error: Could not process rule: Device or resource busy
> 
> This works when 'offload' flag is removed.
> 
> Transaction will *first* try to register the flowtable hook,
> then it unregisters the existing flowtable hook.
> 
> When 'offload' flag is enabled, hook registration fails because
> the device offload capability is already busy.
> 
> Any suggestions on how to fix this?
> Or would you say this is as expected/designed?
> 
> I don't see a way to resolve this.
>
> We could swap register/unregister, but this has two major issues:
> 
> 1. it gives a window where no hook is registered on hw side
> 2. after unreg, we cannot assume that (re)registering works, so
>    'nft -f' may cause loss of functionality.
> 
> Adding a 'refcount' scheme doesn't really work either, we'd need
> an extra data structure to record the known offload settings, so that
> on a 'hook flowtable f1 to dummy0' request we can figure out that this
> is expected to be busy and then we could skip the register request.
> 
> But that has to problem that the batch might not have an unregister
> request, i.e. we would accept a bogus ruleset that *should* have failed
> with -EBUSY.
> 
> Any ideas?

I think the hardware offload API needs to support transaction
semantics (generation mask), this requires a flag to enable such
behaviour, so 'tc' can toggle this off.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-06-02 23:14 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-05 12:32 nft transaction semantics and flowtable hw offload Florian Westphal
2023-05-08 19:45 ` Pablo Neira Ayuso
2023-05-08 20:25   ` Florian Westphal
2023-06-02 23:14 ` Pablo Neira Ayuso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.