[RFC] conntrack event framework speedup

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC] conntrack event framework speedup
@ 2022-03-15 12:05 Florian Westphal
  2022-03-15 21:30 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2022-03-15 12:05 UTC (permalink / raw)
  To: netfilter-devel

Hello,

Due to net.netfilter.nf_conntrack_events=1 we eat some uncessesary
overhead:

1. allocation of new conntrack entries needs to alloc ct->ext
2. inverse for deletion/free.
3. Because the ctnetlink module is typically active, each packet will
   end up calling __nf_conntrack_eventmask_report via nf_confirm() and
   then in ctnetlink only to find that we have no listeners
   (and we can't call nfnetlink_has_listeners() from conntrack because
    that would yield a dependency of conntrack to nfnetlink).

I would propose to add minimal conntrack specific code
to nfnetlink, namely, to add bind()(/unbind() calls that inc/dec a
counter for each ctnetlink event listener/socket.
If counter becomes nonzero, flip a bit in struct net.

This would allow us to do the following:

add new net.netfilter.nf_conntrack_events default mode: 2, autodetect.
in nfnetlink bind, inc pernet counter when event group is bound.
in nfnetlink unbind, dec pernet counter when event group is unbound.
in init_conntrack() allocate the event cache extension only if
 a) nf_conntrack_events == 1, or
 b) nf_conntrack_events == 2 and pernet counter is nonzero.

Extend nf_confirm() to check of the pernet counter before
call to __nf_conntrack_eventmask_report().

If nobody spots a problem with this idea I'd start to work on
a prototype.

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] conntrack event framework speedup
  2022-03-15 12:05 [RFC] conntrack event framework speedup Florian Westphal
@ 2022-03-15 21:30 ` Pablo Neira Ayuso
  2022-03-15 21:41   ` Florian Westphal
  0 siblings, 1 reply; 8+ messages in thread
From: Pablo Neira Ayuso @ 2022-03-15 21:30 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Tue, Mar 15, 2022 at 01:05:38PM +0100, Florian Westphal wrote:
> Hello,
> 
> Due to net.netfilter.nf_conntrack_events=1 we eat some uncessesary
> overhead:
> 
> 1. allocation of new conntrack entries needs to alloc ct->ext
> 2. inverse for deletion/free.
> 3. Because the ctnetlink module is typically active, each packet will
>    end up calling __nf_conntrack_eventmask_report via nf_confirm() and
>    then in ctnetlink only to find that we have no listeners
>    (and we can't call nfnetlink_has_listeners() from conntrack because
>     that would yield a dependency of conntrack to nfnetlink).
> 
> I would propose to add minimal conntrack specific code
> to nfnetlink, namely, to add bind()(/unbind() calls that inc/dec a
> counter for each ctnetlink event listener/socket.
> If counter becomes nonzero, flip a bit in struct net.
> 
> This would allow us to do the following:
> 
> add new net.netfilter.nf_conntrack_events default mode: 2, autodetect.

Probably the sysctl entry does not make any sense anymore if you can
autodetect when there is a listener?

> in nfnetlink bind, inc pernet counter when event group is bound.
> in nfnetlink unbind, dec pernet counter when event group is unbound.

So you keep one counter per netlink group in netns area?

> in init_conntrack() allocate the event cache extension only if
>  a) nf_conntrack_events == 1, or
>  b) nf_conntrack_events == 2 and pernet counter is nonzero.
> 
> Extend nf_confirm() to check of the pernet counter before
> call to __nf_conntrack_eventmask_report().
> 
> If nobody spots a problem with this idea I'd start to work on
> a prototype.

There is also setsockopt() to subscribe to netlink groups, you might
need to extend netlink_kernel_cfg to deal with this case too?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] conntrack event framework speedup
  2022-03-15 21:30 ` Pablo Neira Ayuso
@ 2022-03-15 21:41   ` Florian Westphal
  2022-03-15 21:53     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2022-03-15 21:41 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > add new net.netfilter.nf_conntrack_events default mode: 2, autodetect.
> 
> Probably the sysctl entry does not make any sense anymore if you can
> autodetect when there is a listener?

Hmmm, did not consider that.  I *think* we still want to allow to
disable the feature because of xt_CT/nft_ct.

Someone might have nf_conntrack_events=0 and tehy could be using
explicit configuration via templates (and then expect that only
those flows that matched a '-j CT' rule generate events.

> > in nfnetlink bind, inc pernet counter when event group is bound.
> > in nfnetlink unbind, dec pernet counter when event group is unbound.
> 
> So you keep one counter per netlink group in netns area?

Rough sketch (doesn't compile/apply):

+++ b/net/netfilter/nfnetlink.c
@@ -45,6 +45,7 @@ MODULE_DESCRIPTION("Netfilter messages via netlink socket");
 static unsigned int nfnetlink_pernet_id __read_mostly;
  
  struct nfnl_net {
  +  unsigned int ctnetlink_listeners;
     struct sock *nfnl;

 static int nfnetlink_bind(struct net *net, int group)
 {
        const struct nfnetlink_subsystem *ss;
@@ -691,11 +691,47 @@ static int nfnetlink_bind(struct net *net, int group)
        if (!ss)
                request_module_nowait("nfnetlink-subsys-%d", type);
+
+       if (type == NFNL_SUBSYS_CTNETLINK) {
+               struct nfnl_net *nfnlnet = nfnl_pernet(net);
+
+               nfnl_lock(NFNL_SUBSYS_CTNETLINK);
+               nfnlnet->ctnetlink_listeners++;
+               if (nfnlnet->ctnetlink_listeners == 1)
+                       net->ct.ctnetlink_has_listener = true;
+               nfnl_unlock(NFNL_SUBSYS_CTNETLINK);

and then check 'net->ct.ctnetlink_has_listener' when allocating
a new conntrack.

> > a prototype.
> 
> There is also setsockopt() to subscribe to netlink groups, you might
> need to extend netlink_kernel_cfg to deal with this case too?

afaics the netlink_bind callback is invoked for subscriptions via setsockopt too,
so that angle shouw be covered.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] conntrack event framework speedup
  2022-03-15 21:41   ` Florian Westphal
@ 2022-03-15 21:53     ` Pablo Neira Ayuso
  2022-03-15 22:07       ` Florian Westphal
  0 siblings, 1 reply; 8+ messages in thread
From: Pablo Neira Ayuso @ 2022-03-15 21:53 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Tue, Mar 15, 2022 at 10:41:21PM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > add new net.netfilter.nf_conntrack_events default mode: 2, autodetect.
> > 
> > Probably the sysctl entry does not make any sense anymore if you can
> > autodetect when there is a listener?
> 
> Hmmm, did not consider that.  I *think* we still want to allow to
> disable the feature because of xt_CT/nft_ct.
> 
> Someone might have nf_conntrack_events=0 and tehy could be using
> explicit configuration via templates (and then expect that only
> those flows that matched a '-j CT' rule generate events.

Maybe could you bump the ctnetlink_listeners counter when -j CT is
used with event filtering?

> > > in nfnetlink bind, inc pernet counter when event group is bound.
> > > in nfnetlink unbind, dec pernet counter when event group is unbound.
> > 
> > So you keep one counter per netlink group in netns area?
> 
> Rough sketch (doesn't compile/apply):
> 
> +++ b/net/netfilter/nfnetlink.c
> @@ -45,6 +45,7 @@ MODULE_DESCRIPTION("Netfilter messages via netlink socket");
>  static unsigned int nfnetlink_pernet_id __read_mostly;
>   
>   struct nfnl_net {
>   +  unsigned int ctnetlink_listeners;
>      struct sock *nfnl;
> 
>  static int nfnetlink_bind(struct net *net, int group)
>  {
>         const struct nfnetlink_subsystem *ss;
> @@ -691,11 +691,47 @@ static int nfnetlink_bind(struct net *net, int group)
>         if (!ss)
>                 request_module_nowait("nfnetlink-subsys-%d", type);
> +
> +       if (type == NFNL_SUBSYS_CTNETLINK) {
> +               struct nfnl_net *nfnlnet = nfnl_pernet(net);
> +
> +               nfnl_lock(NFNL_SUBSYS_CTNETLINK);
> +               nfnlnet->ctnetlink_listeners++;
> +               if (nfnlnet->ctnetlink_listeners == 1)
> +                       net->ct.ctnetlink_has_listener = true;
> +               nfnl_unlock(NFNL_SUBSYS_CTNETLINK);
> 
> and then check 'net->ct.ctnetlink_has_listener' when allocating
> a new conntrack.

LGTM.

> > > a prototype.
> > 
> > There is also setsockopt() to subscribe to netlink groups, you might
> > need to extend netlink_kernel_cfg to deal with this case too?
> 
> afaics the netlink_bind callback is invoked for subscriptions via setsockopt too,
> so that angle shouw be covered.

good.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] conntrack event framework speedup
  2022-03-15 21:53     ` Pablo Neira Ayuso
@ 2022-03-15 22:07       ` Florian Westphal
  2022-03-16  9:12         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2022-03-15 22:07 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Tue, Mar 15, 2022 at 10:41:21PM +0100, Florian Westphal wrote:
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > > add new net.netfilter.nf_conntrack_events default mode: 2, autodetect.
> > > 
> > > Probably the sysctl entry does not make any sense anymore if you can
> > > autodetect when there is a listener?
> > 
> > Hmmm, did not consider that.  I *think* we still want to allow to
> > disable the feature because of xt_CT/nft_ct.
> > 
> > Someone might have nf_conntrack_events=0 and tehy could be using
> > explicit configuration via templates (and then expect that only
> > those flows that matched a '-j CT' rule generate events.
> 
> Maybe could you bump the ctnetlink_listeners counter when -j CT is
> used with event filtering?

Hmmm, I don't think that will work.  The -j CT thing can be used to
enable event reporting (including the event type) for particular flows
only.  E.g. users might do:

nf_conntrack_events=0
and then only enable destroy events for tcp traffic on port 22, 80, 443
(arbitrary example).

If I bump the listen-count, then they will see event reports for
for udp timeouts and everything else.

IDEALLY we could ditch the sysctl, the autotuning and tell users they
now need to configure events with nft/iptables but given the 'ct
helpers' thing I'm sure we'll get lots of complaits about broken event
reporting ;-)

> > @@ -691,11 +691,47 @@ static int nfnetlink_bind(struct net *net, int group)
> >         if (!ss)
> >                 request_module_nowait("nfnetlink-subsys-%d", type);
> > +
> > +       if (type == NFNL_SUBSYS_CTNETLINK) {
> > +               struct nfnl_net *nfnlnet = nfnl_pernet(net);
> > +
> > +               nfnl_lock(NFNL_SUBSYS_CTNETLINK);
> > +               nfnlnet->ctnetlink_listeners++;
> > +               if (nfnlnet->ctnetlink_listeners == 1)
> > +                       net->ct.ctnetlink_has_listener = true;
> > +               nfnl_unlock(NFNL_SUBSYS_CTNETLINK);
> > 
> > and then check 'net->ct.ctnetlink_has_listener' when allocating
> > a new conntrack.
> 
> LGTM.

Thanks.  I will work on a parototype along these lines and see where
that leads.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] conntrack event framework speedup
  2022-03-15 22:07       ` Florian Westphal
@ 2022-03-16  9:12         ` Pablo Neira Ayuso
  2022-03-16 12:18           ` Florian Westphal
  0 siblings, 1 reply; 8+ messages in thread
From: Pablo Neira Ayuso @ 2022-03-16  9:12 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Tue, Mar 15, 2022 at 11:07:48PM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Tue, Mar 15, 2022 at 10:41:21PM +0100, Florian Westphal wrote:
> > > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > > > add new net.netfilter.nf_conntrack_events default mode: 2, autodetect.
> > > > 
> > > > Probably the sysctl entry does not make any sense anymore if you can
> > > > autodetect when there is a listener?
> > > 
> > > Hmmm, did not consider that.  I *think* we still want to allow to
> > > disable the feature because of xt_CT/nft_ct.
> > > 
> > > Someone might have nf_conntrack_events=0 and tehy could be using
> > > explicit configuration via templates (and then expect that only
> > > those flows that matched a '-j CT' rule generate events.
> > 
> > Maybe could you bump the ctnetlink_listeners counter when -j CT is
> > used with event filtering?
> 
> Hmmm, I don't think that will work.  The -j CT thing can be used to
> enable event reporting (including the event type) for particular flows
> only.

IIRC, it allows to filter what events are of your interest in a global
fashion.

> E.g. users might do:
> 
> nf_conntrack_events=0
> and then only enable destroy events for tcp traffic on port 22, 80, 443
> (arbitrary example).
> 
> If I bump the listen-count, then they will see event reports for
> for udp timeouts and everything else.

Are you sure? -j CT sets on the event mask. The explicit -j CT rules
means userspace want to listen to events, but only those that you
specified. So it is the same as having a userspace process to listen,
but the global filtering applies.

My understanding is that the listen-count tells that packets should
follow ct netlink event path.

What am I missing?

> IDEALLY we could ditch the sysctl, the autotuning and tell users they
> now need to configure events with nft/iptables but given the 'ct
> helpers' thing I'm sure we'll get lots of complaits about broken event
> reporting ;-)

It won't be flexible enough for all usecases. The ct event filter from
rule is global.

> > > @@ -691,11 +691,47 @@ static int nfnetlink_bind(struct net *net, int group)
> > >         if (!ss)
> > >                 request_module_nowait("nfnetlink-subsys-%d", type);
> > > +
> > > +       if (type == NFNL_SUBSYS_CTNETLINK) {
> > > +               struct nfnl_net *nfnlnet = nfnl_pernet(net);
> > > +
> > > +               nfnl_lock(NFNL_SUBSYS_CTNETLINK);
> > > +               nfnlnet->ctnetlink_listeners++;
> > > +               if (nfnlnet->ctnetlink_listeners == 1)
> > > +                       net->ct.ctnetlink_has_listener = true;
> > > +               nfnl_unlock(NFNL_SUBSYS_CTNETLINK);
> > > 
> > > and then check 'net->ct.ctnetlink_has_listener' when allocating
> > > a new conntrack.
> > 
> > LGTM.
> 
> Thanks.  I will work on a parototype along these lines and see where
> that leads.

Let me know,

Thanks

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] conntrack event framework speedup
  2022-03-16  9:12         ` Pablo Neira Ayuso
@ 2022-03-16 12:18           ` Florian Westphal
  2022-03-17  9:13             ` Pablo Neira Ayuso
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2022-03-16 12:18 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Hmmm, I don't think that will work.  The -j CT thing can be used to
> > enable event reporting (including the event type) for particular flows
> > only.
> 
> IIRC, it allows to filter what events are of your interest in a global
> fashion.
> 
> > E.g. users might do:
> > 
> > nf_conntrack_events=0
> > and then only enable destroy events for tcp traffic on port 22, 80, 443
> > (arbitrary example).
> > 
> > If I bump the listen-count, then they will see event reports for
> > for udp timeouts and everything else.
> 
> Are you sure? -j CT sets on the event mask. The explicit -j CT rules
> means userspace want to listen to events, but only those that you
> specified. So it is the same as having a userspace process to listen,
> but the global filtering applies.

The filtering isn't global, its per flow.  Provided
nf_conntrack_events=0, then only flows where the first packet matched
a -j CT rule will generate events, AND only those events that were
specified in its event mask.

So, flows that did not match any CT rule never generate an event, and,
therefore, changes to the kernel should not auto-add the extension for
them.

I don't see how that mechanism can be preserved without the ability to
set nf_conntrack_events=0.

When a new conntrack is generated, the test is (in current kernels):

'add the event cache extension if the template has an event cache
 extension OR if the sysctl is enabled'.

So, changing it to
'add the event cache extension if the template has an event cache
 extension OR if we have a listener' is not the same, unfortunately.

> My understanding is that the listen-count tells that packets should
> follow ct netlink event path.

Yes, thats correct, it tells kernel there is an active subscriber for
events.

> What am I missing?

I can't tell the following two cases apart:

1. templates are active and user wants events ONLY for the chosen flow,
   e.g. tcp.
2. templates are active and user wants only particular events for the
   chosen flows, but all events for the rest.

1) is done by templates + setting the sysctl to 0.
2) is done by templates + setting the sysctl to 1.

With 'assume 1 if listener active', we can only provide functionality of 2).

I finished testing of a prototype and it appears that functionality is ok,
I've pushed this here:
https://git.breakpoint.cc/cgit/fw/nf-next.git/log/?h=nf_ct_events_02

(only the top-most 4 changes are relevant).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] conntrack event framework speedup
  2022-03-16 12:18           ` Florian Westphal
@ 2022-03-17  9:13             ` Pablo Neira Ayuso
  0 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2022-03-17  9:13 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Wed, Mar 16, 2022 at 01:18:46PM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > Hmmm, I don't think that will work.  The -j CT thing can be used to
> > > enable event reporting (including the event type) for particular flows
> > > only.
> > 
> > IIRC, it allows to filter what events are of your interest in a global
> > fashion.
> > 
> > > E.g. users might do:
> > > 
> > > nf_conntrack_events=0
> > > and then only enable destroy events for tcp traffic on port 22, 80, 443
> > > (arbitrary example).
> > > 
> > > If I bump the listen-count, then they will see event reports for
> > > for udp timeouts and everything else.
> > 
> > Are you sure? -j CT sets on the event mask. The explicit -j CT rules
> > means userspace want to listen to events, but only those that you
> > specified. So it is the same as having a userspace process to listen,
> > but the global filtering applies.
> 
> The filtering isn't global, its per flow.  Provided
> nf_conntrack_events=0, then only flows where the first packet matched
> a -j CT rule will generate events, AND only those events that were
> specified in its event mask.
> 
> So, flows that did not match any CT rule never generate an event, and,
> therefore, changes to the kernel should not auto-add the extension for
> them.
> 
> I don't see how that mechanism can be preserved without the ability to
> set nf_conntrack_events=0.
> 
> When a new conntrack is generated, the test is (in current kernels):
> 
> 'add the event cache extension if the template has an event cache
>  extension OR if the sysctl is enabled'.
> 
> So, changing it to
> 'add the event cache extension if the template has an event cache
>  extension OR if we have a listener' is not the same, unfortunately.
> 
> > My understanding is that the listen-count tells that packets should
> > follow ct netlink event path.
> 
> Yes, thats correct, it tells kernel there is an active subscriber for
> events.
> 
> > What am I missing?
> 
> I can't tell the following two cases apart:
> 
> 1. templates are active and user wants events ONLY for the chosen flow,
>    e.g. tcp.
> 2. templates are active and user wants only particular events for the
>    chosen flows, but all events for the rest.
> 
> 
> 1) is done by templates + setting the sysctl to 0.
> 2) is done by templates + setting the sysctl to 1.
> 
> With 'assume 1 if listener active', we can only provide functionality of 2).
> 
> I finished testing of a prototype and it appears that functionality is ok,
> I've pushed this here:
> https://git.breakpoint.cc/cgit/fw/nf-next.git/log/?h=nf_ct_events_02
> 
> (only the top-most 4 changes are relevant).

Thanks for explaining.

One more question: Why do we need the nf_conntrack_events=2? What is
the reason to not use this as default (ie. 1)?

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-03-17  9:13 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-03-15 12:05 [RFC] conntrack event framework speedup Florian Westphal
2022-03-15 21:30 ` Pablo Neira Ayuso
2022-03-15 21:41   ` Florian Westphal
2022-03-15 21:53     ` Pablo Neira Ayuso
2022-03-15 22:07       ` Florian Westphal
2022-03-16  9:12         ` Pablo Neira Ayuso
2022-03-16 12:18           ` Florian Westphal
2022-03-17  9:13             ` Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).