[PATCH net 0/2] conntrack update

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net 0/2] conntrack update
@ 2016-10-10 10:18 Nicolas Dichtel
  2016-10-10 10:18 ` [PATCH net 1/2] conntrack: remove obsolete sysctl (nf_conntrack_events_retry_timeout) Nicolas Dichtel
  2016-10-10 10:18 ` [PATCH net 2/2] conntrack: enable to tune gc parameters Nicolas Dichtel
  0 siblings, 2 replies; 19+ messages in thread
From: Nicolas Dichtel @ 2016-10-10 10:18 UTC (permalink / raw)
  To: davem, pablo; +Cc: netdev, netfilter-devel, fw

The first patch is a small documentation fix.
The second patch adds more flexibility to configure the conntrack gc. I target
it specifically to net and not net-next because the removal of the conntrack
timer has just land into net.

 Documentation/networking/nf_conntrack-sysctl.txt | 35 +++++++++++------------
 include/net/netfilter/nf_conntrack_core.h        |  5 ++++
 net/netfilter/nf_conntrack_core.c                | 17 +++++------
 net/netfilter/nf_conntrack_standalone.c          | 36 ++++++++++++++++++++++++
 4 files changed, 67 insertions(+), 26 deletions(-)

Comments are welcomed,
Regards,
Nicolas

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net 1/2] conntrack: remove obsolete sysctl (nf_conntrack_events_retry_timeout)
  2016-10-10 10:18 [PATCH net 0/2] conntrack update Nicolas Dichtel
@ 2016-10-10 10:18 ` Nicolas Dichtel
  2016-10-10 13:57   ` Florian Westphal
  2016-10-10 10:18 ` [PATCH net 2/2] conntrack: enable to tune gc parameters Nicolas Dichtel
  1 sibling, 1 reply; 19+ messages in thread
From: Nicolas Dichtel @ 2016-10-10 10:18 UTC (permalink / raw)
  To: davem, pablo; +Cc: netdev, netfilter-devel, fw, Nicolas Dichtel

This entry has been removed in commit 9500507c6138.

Fixes: 9500507c6138 ("netfilter: conntrack: remove timer from ecache extension")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 Documentation/networking/nf_conntrack-sysctl.txt | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/Documentation/networking/nf_conntrack-sysctl.txt b/Documentation/networking/nf_conntrack-sysctl.txt
index 4fb51d32fccc..399e4e866a9c 100644
--- a/Documentation/networking/nf_conntrack-sysctl.txt
+++ b/Documentation/networking/nf_conntrack-sysctl.txt
@@ -33,24 +33,6 @@ nf_conntrack_events - BOOLEAN
 	If this option is enabled, the connection tracking code will
 	provide userspace with connection tracking events via ctnetlink.
 
-nf_conntrack_events_retry_timeout - INTEGER (seconds)
-	default 15
-
-	This option is only relevant when "reliable connection tracking
-	events" are used.  Normally, ctnetlink is "lossy", that is,
-	events are normally dropped when userspace listeners can't keep up.
-
-	Userspace can request "reliable event mode".  When this mode is
-	active, the conntrack will only be destroyed after the event was
-	delivered.  If event delivery fails, the kernel periodically
-	re-tries to send the event to userspace.
-
-	This is the maximum interval the kernel should use when re-trying
-	to deliver the destroy event.
-
-	A higher number means there will be fewer delivery retries and it
-	will take longer for a backlog to be processed.
-
 nf_conntrack_expect_max - INTEGER
 	Maximum size of expectation table.  Default value is
 	nf_conntrack_buckets / 256. Minimum is 1.
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net 2/2] conntrack: enable to tune gc parameters
  2016-10-10 10:18 [PATCH net 0/2] conntrack update Nicolas Dichtel
  2016-10-10 10:18 ` [PATCH net 1/2] conntrack: remove obsolete sysctl (nf_conntrack_events_retry_timeout) Nicolas Dichtel
@ 2016-10-10 10:18 ` Nicolas Dichtel
  2016-10-10 14:04   ` Florian Westphal
  1 sibling, 1 reply; 19+ messages in thread
From: Nicolas Dichtel @ 2016-10-10 10:18 UTC (permalink / raw)
  To: davem, pablo; +Cc: netdev, netfilter-devel, fw, Nicolas Dichtel

After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
timed-out entries"), netlink conntrack deletion events may be sent with a
huge delay. It could be interesting to let the user tweak gc parameters
depending on its use case.

CC: Florian Westphal <fw@strlen.de>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 Documentation/networking/nf_conntrack-sysctl.txt | 17 +++++++++++
 include/net/netfilter/nf_conntrack_core.h        |  5 ++++
 net/netfilter/nf_conntrack_core.c                | 17 +++++------
 net/netfilter/nf_conntrack_standalone.c          | 36 ++++++++++++++++++++++++
 4 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/Documentation/networking/nf_conntrack-sysctl.txt b/Documentation/networking/nf_conntrack-sysctl.txt
index 399e4e866a9c..5b6ace93521d 100644
--- a/Documentation/networking/nf_conntrack-sysctl.txt
+++ b/Documentation/networking/nf_conntrack-sysctl.txt
@@ -37,6 +37,23 @@ nf_conntrack_expect_max - INTEGER
 	Maximum size of expectation table.  Default value is
 	nf_conntrack_buckets / 256. Minimum is 1.
 
+nf_conntrack_gc_interval - INTEGER
+	Maximum interval in second between two run of the conntrack gc. This
+	gc is in charge of removing stale entries. It also impacts the delay
+	before notifying the userland a conntrack deletion.
+	This sysctl is only writeable in the initial net namespace.
+
+nf_conntrack_gc_max_buckets - INTEGER
+nf_conntrack_gc_max_buckets_div - INTEGER
+	During a run, the conntrack gc processes at maximum
+	nf_conntrack_buckets/nf_conntrack_gc_max_buckets_div (and never more
+	than nf_conntrack_gc_max_buckets) entries.
+	These sysctl are only writeable in the initial net namespace.
+
+nf_conntrack_gc_max_evicts - INTEGER
+	The maximum number of entries to be evicted during a run of gc.
+	This sysctl is only writeable in the initial net namespace.
+
 nf_conntrack_frag6_high_thresh - INTEGER
 	default 262144
 
diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index 62e17d1319ff..2a5ed368fb71 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -86,4 +86,9 @@ void nf_conntrack_lock(spinlock_t *lock);
 
 extern spinlock_t nf_conntrack_expect_lock;
 
+extern unsigned int nf_ct_gc_interval;
+extern unsigned int nf_ct_gc_max_buckets_div;
+extern unsigned int nf_ct_gc_max_buckets;
+extern unsigned int nf_ct_gc_max_evicts;
+
 #endif /* _NF_CONNTRACK_CORE_H */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index ba6a1d421222..435b431e3449 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -83,10 +83,10 @@ static __read_mostly spinlock_t nf_conntrack_locks_all_lock;
 static __read_mostly DEFINE_SPINLOCK(nf_conntrack_locks_all_lock);
 static __read_mostly bool nf_conntrack_locks_all;
 
-#define GC_MAX_BUCKETS_DIV	64u
-#define GC_MAX_BUCKETS		8192u
-#define GC_INTERVAL		(5 * HZ)
-#define GC_MAX_EVICTS		256u
+unsigned int nf_ct_gc_interval = 5 * HZ;
+unsigned int nf_ct_gc_max_buckets = 8192;
+unsigned int nf_ct_gc_max_buckets_div = 64;
+unsigned int nf_ct_gc_max_evicts = 256;
 
 static struct conntrack_gc_work conntrack_gc_work;
 
@@ -936,13 +936,14 @@ static noinline int early_drop(struct net *net, unsigned int _hash)
 static void gc_worker(struct work_struct *work)
 {
 	unsigned int i, goal, buckets = 0, expired_count = 0;
-	unsigned long next_run = GC_INTERVAL;
+	unsigned long next_run = nf_ct_gc_interval;
 	unsigned int ratio, scanned = 0;
 	struct conntrack_gc_work *gc_work;
 
 	gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
 
-	goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);
+	goal = min(nf_conntrack_htable_size / nf_ct_gc_max_buckets_div,
+		   nf_ct_gc_max_buckets);
 	i = gc_work->last_bucket;
 
 	do {
@@ -977,7 +978,7 @@ static void gc_worker(struct work_struct *work)
 		rcu_read_unlock();
 		cond_resched_rcu_qs();
 	} while (++buckets < goal &&
-		 expired_count < GC_MAX_EVICTS);
+		 expired_count < nf_ct_gc_max_evicts);
 
 	if (gc_work->exiting)
 		return;
@@ -1885,7 +1886,7 @@ int nf_conntrack_init_start(void)
 	nf_ct_untracked_status_or(IPS_CONFIRMED | IPS_UNTRACKED);
 
 	conntrack_gc_work_init(&conntrack_gc_work);
-	schedule_delayed_work(&conntrack_gc_work.dwork, GC_INTERVAL);
+	schedule_delayed_work(&conntrack_gc_work.dwork, nf_ct_gc_interval);
 
 	return 0;
 
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 5f446cd9f3fd..c5310fb35eca 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -445,6 +445,8 @@ static void nf_conntrack_standalone_fini_proc(struct net *net)
 /* Sysctl support */
 
 #ifdef CONFIG_SYSCTL
+static int one = 1;
+static int int_max = INT_MAX;
 /* Log invalid packets of a given protocol */
 static int log_invalid_proto_min __read_mostly;
 static int log_invalid_proto_max __read_mostly = 255;
@@ -517,6 +519,40 @@ static struct ctl_table nf_ct_sysctl_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname	= "nf_conntrack_gc_interval",
+		.data		= &nf_ct_gc_interval,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_jiffies,
+	},
+	{
+		.procname	= "nf_conntrack_gc_max_buckets",
+		.data		= &nf_ct_gc_max_buckets,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &one,
+		.extra2		= &int_max,
+	},
+	{
+		.procname	= "nf_conntrack_gc_max_buckets_div",
+		.data		= &nf_ct_gc_max_buckets_div,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &one,
+		.extra2		= &int_max,
+	},
+	{
+		.procname	= "nf_conntrack_gc_max_evicts",
+		.data		= &nf_ct_gc_max_evicts,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &one,
+		.extra2		= &int_max,
+	},
 	{ }
 };
 
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net 1/2] conntrack: remove obsolete sysctl (nf_conntrack_events_retry_timeout)
  2016-10-10 10:18 ` [PATCH net 1/2] conntrack: remove obsolete sysctl (nf_conntrack_events_retry_timeout) Nicolas Dichtel
@ 2016-10-10 13:57   ` Florian Westphal
  2016-10-17 15:39     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2016-10-10 13:57 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: davem, pablo, netdev, netfilter-devel, fw

Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> This entry has been removed in commit 9500507c6138.
> 
> Fixes: 9500507c6138 ("netfilter: conntrack: remove timer from ecache extension")
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Acked-by: Florian Westphal <fw@strlen.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] conntrack: enable to tune gc parameters
  2016-10-10 10:18 ` [PATCH net 2/2] conntrack: enable to tune gc parameters Nicolas Dichtel
@ 2016-10-10 14:04   ` Florian Westphal
  2016-10-10 15:24     ` Nicolas Dichtel
  0 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2016-10-10 14:04 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: davem, pablo, netdev, netfilter-devel, fw

Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
> timed-out entries"), netlink conntrack deletion events may be sent with a
> huge delay. It could be interesting to let the user tweak gc parameters
> depending on its use case.

Hmm, care to elaborate?

I am not against doing this but I'd like to hear/read your use case.

The expectation is that in almot all cases eviction will happen from
packet path.  The gc worker is jusdt there for case where a busy system
goes idle.

> +nf_conntrack_gc_max_evicts - INTEGER
> +	The maximum number of entries to be evicted during a run of gc.
> +	This sysctl is only writeable in the initial net namespace.

Hmmm, do you have any advice on sizing this one?

I think a better change might be (instead of adding htis knob) to
resched the gc worker for immediate re-executaion in case the entire
"budget" was used.  What do you think?


diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -983,7 +983,7 @@ static void gc_worker(struct work_struct *work)
                return;
 
        ratio = scanned ? expired_count * 100 / scanned : 0;
-       if (ratio >= 90)
+       if (ratio >= 90 || expired_count == GC_MAX_EVICTS)
                next_run = 0;

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] conntrack: enable to tune gc parameters
  2016-10-10 14:04   ` Florian Westphal
@ 2016-10-10 15:24     ` Nicolas Dichtel
  2016-10-13 20:43       ` Florian Westphal
  0 siblings, 1 reply; 19+ messages in thread
From: Nicolas Dichtel @ 2016-10-10 15:24 UTC (permalink / raw)
  To: Florian Westphal; +Cc: davem, pablo, netdev, netfilter-devel

Le 10/10/2016 à 16:04, Florian Westphal a écrit :
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
>> timed-out entries"), netlink conntrack deletion events may be sent with a
>> huge delay. It could be interesting to let the user tweak gc parameters
>> depending on its use case.
> 
> Hmm, care to elaborate?
> 
> I am not against doing this but I'd like to hear/read your use case.
> 
> The expectation is that in almot all cases eviction will happen from
> packet path.  The gc worker is jusdt there for case where a busy system
> goes idle.
It was precisely that case. After a period of activity, the event is sent a long
time after the timeout. If the router does not manage a lot of flows, why not
trying to parse more entries instead of the default 1/64 of the table?
In fact, I don't understand why using GC_MAX_BUCKETS_DIV instead of using always
GC_MAX_BUCKETS whatever the size of the table is.

> 
>> +nf_conntrack_gc_max_evicts - INTEGER
>> +	The maximum number of entries to be evicted during a run of gc.
>> +	This sysctl is only writeable in the initial net namespace.
> 
> Hmmm, do you have any advice on sizing this one?
In fact, no ;-)
I really hesitate to expose the four values or just a subset. My goal was also
to get feedback. I can remove this one.

> 
> I think a better change might be (instead of adding htis knob) to
> resched the gc worker for immediate re-executaion in case the entire
> "budget" was used.  What do you think?
Even if it's not directly related to my problem, I think it's a good idea.

> 
> 
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -983,7 +983,7 @@ static void gc_worker(struct work_struct *work)
>                 return;
>  
>         ratio = scanned ? expired_count * 100 / scanned : 0;
> -       if (ratio >= 90)
> +       if (ratio >= 90 || expired_count == GC_MAX_EVICTS)
>                 next_run = 0;
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] conntrack: enable to tune gc parameters
  2016-10-10 15:24     ` Nicolas Dichtel
@ 2016-10-13 20:43       ` Florian Westphal
  2016-10-14 10:12         ` Nicolas Dichtel
  0 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2016-10-13 20:43 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: Florian Westphal, davem, pablo, netdev, netfilter-devel

Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> Le 10/10/2016 à 16:04, Florian Westphal a écrit :
> > Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> >> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
> >> timed-out entries"), netlink conntrack deletion events may be sent with a
> >> huge delay. It could be interesting to let the user tweak gc parameters
> >> depending on its use case.
> > 
> > Hmm, care to elaborate?
> > 
> > I am not against doing this but I'd like to hear/read your use case.
> > 
> > The expectation is that in almot all cases eviction will happen from
> > packet path.  The gc worker is jusdt there for case where a busy system
> > goes idle.
> It was precisely that case. After a period of activity, the event is sent a long
> time after the timeout. If the router does not manage a lot of flows, why not
> trying to parse more entries instead of the default 1/64 of the table?
> In fact, I don't understand why using GC_MAX_BUCKETS_DIV instead of using always
> GC_MAX_BUCKETS whatever the size of the table is.

I wanted to make sure that we have a known upper bound on the number of
buckets we process so that we do not block other pending kworker items
for too long.

(Or cause too many useless scans)

Another idea worth trying might be to get rid of the max cap and
instead break early in case too many jiffies expired.

I don't want to add sysctl knobs for this unless absolutely needed; its already
possible to 'force' eviction cycle by running 'conntrack -L'.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] conntrack: enable to tune gc parameters
  2016-10-13 20:43       ` Florian Westphal
@ 2016-10-14 10:12         ` Nicolas Dichtel
  2016-10-14 10:37           ` Florian Westphal
  0 siblings, 1 reply; 19+ messages in thread
From: Nicolas Dichtel @ 2016-10-14 10:12 UTC (permalink / raw)
  To: Florian Westphal; +Cc: davem, pablo, netdev, netfilter-devel

Le 13/10/2016 à 22:43, Florian Westphal a écrit :
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>> Le 10/10/2016 à 16:04, Florian Westphal a écrit :
>>> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>>>> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
>>>> timed-out entries"), netlink conntrack deletion events may be sent with a
>>>> huge delay. It could be interesting to let the user tweak gc parameters
>>>> depending on its use case.
>>>
>>> Hmm, care to elaborate?
>>>
>>> I am not against doing this but I'd like to hear/read your use case.
>>>
>>> The expectation is that in almot all cases eviction will happen from
>>> packet path.  The gc worker is jusdt there for case where a busy system
>>> goes idle.
>> It was precisely that case. After a period of activity, the event is sent a long
>> time after the timeout. If the router does not manage a lot of flows, why not
>> trying to parse more entries instead of the default 1/64 of the table?
>> In fact, I don't understand why using GC_MAX_BUCKETS_DIV instead of using always
>> GC_MAX_BUCKETS whatever the size of the table is.
> 
> I wanted to make sure that we have a known upper bound on the number of
> buckets we process so that we do not block other pending kworker items
> for too long.
I don't understand. GC_MAX_BUCKETS is the upper bound and I agree that it is
needed. But why GC_MAX_BUCKETS_DIV (ie 1/64)?
In other words, why this line:
goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);
instead of:
goal = GC_MAX_BUCKETS;
?

> 
> (Or cause too many useless scans)
> 
> Another idea worth trying might be to get rid of the max cap and
> instead break early in case too many jiffies expired.
> 
> I don't want to add sysctl knobs for this unless absolutely needed; its already
> possible to 'force' eviction cycle by running 'conntrack -L'.
> 
Sure, but this is not a "real" solution, just a workaround.
We need to find a way to deliver conntrack deletion events in a reasonable
delay, whatever the traffic on the machine is.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] conntrack: enable to tune gc parameters
  2016-10-14 10:12         ` Nicolas Dichtel
@ 2016-10-14 10:37           ` Florian Westphal
  2016-10-14 10:53             ` Pablo Neira Ayuso
  0 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2016-10-14 10:37 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: Florian Westphal, davem, pablo, netdev, netfilter-devel

Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> Le 13/10/2016 à 22:43, Florian Westphal a écrit :
> > Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> >> Le 10/10/2016 à 16:04, Florian Westphal a écrit :
> >>> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> >>>> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
> >>>> timed-out entries"), netlink conntrack deletion events may be sent with a
> >>>> huge delay. It could be interesting to let the user tweak gc parameters
> >>>> depending on its use case.
> >>>
> >>> Hmm, care to elaborate?
> >>>
> >>> I am not against doing this but I'd like to hear/read your use case.
> >>>
> >>> The expectation is that in almot all cases eviction will happen from
> >>> packet path.  The gc worker is jusdt there for case where a busy system
> >>> goes idle.
> >> It was precisely that case. After a period of activity, the event is sent a long
> >> time after the timeout. If the router does not manage a lot of flows, why not
> >> trying to parse more entries instead of the default 1/64 of the table?
> >> In fact, I don't understand why using GC_MAX_BUCKETS_DIV instead of using always
> >> GC_MAX_BUCKETS whatever the size of the table is.
> > 
> > I wanted to make sure that we have a known upper bound on the number of
> > buckets we process so that we do not block other pending kworker items
> > for too long.
> I don't understand. GC_MAX_BUCKETS is the upper bound and I agree that it is
> needed. But why GC_MAX_BUCKETS_DIV (ie 1/64)?
> In other words, why this line:
> goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);
> instead of:
> goal = GC_MAX_BUCKETS;

Sure, we can do that.  But why is a fixed size better than a fraction?

E.g. with 8k buckets and simple goal = GC_MAX_BUCKETS we scan entire
table on every run, currently we only scan 128.

I wanted to keep too many destroy notifications from firing at once
but maybe i was too paranoid...

> > (Or cause too many useless scans)
> > 
> > Another idea worth trying might be to get rid of the max cap and
> > instead break early in case too many jiffies expired.
> > 
> > I don't want to add sysctl knobs for this unless absolutely needed; its already
> > possible to 'force' eviction cycle by running 'conntrack -L'.
> > 
> Sure, but this is not a "real" solution, just a workaround.
> We need to find a way to deliver conntrack deletion events in a reasonable
> delay, whatever the traffic on the machine is.

Agree, but that depends on what 'reasonable' means and what kind of
uneeded cpu churn we're willing to add.

We can add a sysctl for this but we should use a low default to not do
too much unneeded work.

So what about your original patch, but only add

nf_conntrack_gc_interval

(and also add instant-resched in case entire budget was consumed)?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] conntrack: enable to tune gc parameters
  2016-10-14 10:37           ` Florian Westphal
@ 2016-10-14 10:53             ` Pablo Neira Ayuso
  2016-10-14 11:16               ` Florian Westphal
  2016-10-18  8:30               ` [PATCH net] conntrack: perform a full scan in gc Nicolas Dichtel
  0 siblings, 2 replies; 19+ messages in thread
From: Pablo Neira Ayuso @ 2016-10-14 10:53 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Nicolas Dichtel, davem, netdev, netfilter-devel

On Fri, Oct 14, 2016 at 12:37:26PM +0200, Florian Westphal wrote:
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> > Le 13/10/2016 à 22:43, Florian Westphal a écrit :
[...]
> > > (Or cause too many useless scans)
> > > 
> > > Another idea worth trying might be to get rid of the max cap and
> > > instead break early in case too many jiffies expired.
> > > 
> > > I don't want to add sysctl knobs for this unless absolutely needed; its already
> > > possible to 'force' eviction cycle by running 'conntrack -L'.
> > > 
> > Sure, but this is not a "real" solution, just a workaround.
> > We need to find a way to deliver conntrack deletion events in a reasonable
> > delay, whatever the traffic on the machine is.
> 
> Agree, but that depends on what 'reasonable' means and what kind of
> uneeded cpu churn we're willing to add.
> 
> We can add a sysctl for this but we should use a low default to not do
> too much unneeded work.
> 
> So what about your original patch, but only add
> 
> nf_conntrack_gc_interval
> 
> (and also add instant-resched in case entire budget was consumed)?

I would prefer not to expose sysctl knobs, if we don't really know
what good default values are good, then we cannot expect our users to
know this for us.

I would go tune this in a way that this resembles to the previous
behaviour.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/2] conntrack: enable to tune gc parameters
  2016-10-14 10:53             ` Pablo Neira Ayuso
@ 2016-10-14 11:16               ` Florian Westphal
  2016-10-18  8:30               ` [PATCH net] conntrack: perform a full scan in gc Nicolas Dichtel
  1 sibling, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2016-10-14 11:16 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Florian Westphal, Nicolas Dichtel, davem, netdev, netfilter-devel

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> I would prefer not to expose sysctl knobs, if we don't really know
> what good default values are good, then we cannot expect our users to
> know this for us.
> 
> I would go tune this in a way that this resembles to the previous
> behaviour.

I do not see how this is possible without reverting to old per-conntrack
timer scheme.

With per-ct timer userspace gets notified the moment the timer
fires, without it notification comes 'when kernel detects the timeout'
which in worst case, as Nicholas describes, is when gc worker comes
along.

You can run the gc worker every jiffie of course, but thats just
wasting cpu cycles (and you still get a small delay).

I don't see a way to do run-time tuning except faster restarts when
old entries start accumulating.  This is what the code tries to do,
perhaps you have a better idea for the 'next gc run' computation.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 1/2] conntrack: remove obsolete sysctl (nf_conntrack_events_retry_timeout)
  2016-10-10 13:57   ` Florian Westphal
@ 2016-10-17 15:39     ` Pablo Neira Ayuso
  0 siblings, 0 replies; 19+ messages in thread
From: Pablo Neira Ayuso @ 2016-10-17 15:39 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Nicolas Dichtel, davem, netdev, netfilter-devel

On Mon, Oct 10, 2016 at 03:57:37PM +0200, Florian Westphal wrote:
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> > This entry has been removed in commit 9500507c6138.
> > 
> > Fixes: 9500507c6138 ("netfilter: conntrack: remove timer from ecache extension")
> > Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> 
> Acked-by: Florian Westphal <fw@strlen.de>

Applied, thanks Nicolas.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net] conntrack: perform a full scan in gc
  2016-10-14 10:53             ` Pablo Neira Ayuso
  2016-10-14 11:16               ` Florian Westphal
@ 2016-10-18  8:30               ` Nicolas Dichtel
  2016-10-18  8:47                 ` Florian Westphal
  1 sibling, 1 reply; 19+ messages in thread
From: Nicolas Dichtel @ 2016-10-18  8:30 UTC (permalink / raw)
  To: pablo, fw; +Cc: davem, netdev, netfilter-devel, Nicolas Dichtel

After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
timed-out entries"), netlink conntrack deletion events may be sent with a
huge delay (5 minutes).

There is two ways to evict conntrack:
 - during a conntrack lookup;
 - during a conntrack dump.
Let's do a full scan of conntrack entries after a period of inactivity
(no conntrack lookup).

CC: Florian Westphal <fw@strlen.de>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---

Here is another proposal to try to fix the problem.
Comments are welcomed,
Nicolas

 net/netfilter/nf_conntrack_core.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index ba6a1d421222..3dbb27bd9582 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -87,6 +87,7 @@ static __read_mostly bool nf_conntrack_locks_all;
 #define GC_MAX_BUCKETS		8192u
 #define GC_INTERVAL		(5 * HZ)
 #define GC_MAX_EVICTS		256u
+static bool gc_full_scan = true;
 
 static struct conntrack_gc_work conntrack_gc_work;
 
@@ -511,6 +512,7 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
 	unsigned int bucket, hsize;
 
 begin:
+	gc_full_scan = false;
 	nf_conntrack_get_ht(&ct_hash, &hsize);
 	bucket = reciprocal_scale(hash, hsize);
 
@@ -942,7 +944,11 @@ static void gc_worker(struct work_struct *work)
 
 	gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
 
-	goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);
+	if (gc_full_scan)
+		goal = nf_conntrack_htable_size;
+	else
+		goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV,
+			   GC_MAX_BUCKETS);
 	i = gc_work->last_bucket;
 
 	do {
@@ -977,7 +983,8 @@ static void gc_worker(struct work_struct *work)
 		rcu_read_unlock();
 		cond_resched_rcu_qs();
 	} while (++buckets < goal &&
-		 expired_count < GC_MAX_EVICTS);
+		 (gc_full_scan || expired_count < GC_MAX_EVICTS));
+	gc_full_scan = true;
 
 	if (gc_work->exiting)
 		return;
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net] conntrack: perform a full scan in gc
  2016-10-18  8:30               ` [PATCH net] conntrack: perform a full scan in gc Nicolas Dichtel
@ 2016-10-18  8:47                 ` Florian Westphal
  2016-10-18 10:06                   ` Nicolas Dichtel
  0 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2016-10-18  8:47 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: pablo, fw, davem, netdev, netfilter-devel

Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
> timed-out entries"), netlink conntrack deletion events may be sent with a
> huge delay (5 minutes).
> 
> There is two ways to evict conntrack:
>  - during a conntrack lookup;
>  - during a conntrack dump.
> Let's do a full scan of conntrack entries after a period of inactivity
> (no conntrack lookup).
> 
> CC: Florian Westphal <fw@strlen.de>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
> 
> Here is another proposal to try to fix the problem.
> Comments are welcomed,
> Nicolas

Hmm, I don't think its good idea in practice.
If goal is to avoid starving arbitrary 'dead' ct for too long,
then simple ping will defeat the logic here, because...

>  net/netfilter/nf_conntrack_core.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index ba6a1d421222..3dbb27bd9582 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -87,6 +87,7 @@ static __read_mostly bool nf_conntrack_locks_all;
>  #define GC_MAX_BUCKETS		8192u
>  #define GC_INTERVAL		(5 * HZ)
>  #define GC_MAX_EVICTS		256u
> +static bool gc_full_scan = true;
>  
>  static struct conntrack_gc_work conntrack_gc_work;
>  
> @@ -511,6 +512,7 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
>  	unsigned int bucket, hsize;
>  
>  begin:
> +	gc_full_scan = false;

... we do periodic lookup (but always in same slot), so no full scan is
triggered.

If you think its useful, consider sending patch that rescheds worker
instantly in case budget expired, otherwise I will do this later this
week.

[ I am aware doing instant restart might be too late, but at least we
  would then reap more entries once we stumble upon large number of
  expired ones ].


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net] conntrack: perform a full scan in gc
  2016-10-18  8:47                 ` Florian Westphal
@ 2016-10-18 10:06                   ` Nicolas Dichtel
  2016-10-18 12:37                     ` [PATCH net] conntrack: restart gc immediately if GC_MAX_EVICTS is reached Nicolas Dichtel
  2016-10-20  8:50                     ` [PATCH net] conntrack: perform a full scan in gc Nicolas Dichtel
  0 siblings, 2 replies; 19+ messages in thread
From: Nicolas Dichtel @ 2016-10-18 10:06 UTC (permalink / raw)
  To: Florian Westphal; +Cc: pablo, davem, netdev, netfilter-devel

Le 18/10/2016 à 10:47, Florian Westphal a écrit :
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
>> timed-out entries"), netlink conntrack deletion events may be sent with a
>> huge delay (5 minutes).
>>
>> There is two ways to evict conntrack:
>>  - during a conntrack lookup;
>>  - during a conntrack dump.
>> Let's do a full scan of conntrack entries after a period of inactivity
>> (no conntrack lookup).
>>
>> CC: Florian Westphal <fw@strlen.de>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>> ---
>>
>> Here is another proposal to try to fix the problem.
>> Comments are welcomed,
>> Nicolas
> 
> Hmm, I don't think its good idea in practice.
> If goal is to avoid starving arbitrary 'dead' ct for too long,
> then simple ping will defeat the logic here, because...
> 
>>  net/netfilter/nf_conntrack_core.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
>> index ba6a1d421222..3dbb27bd9582 100644
>> --- a/net/netfilter/nf_conntrack_core.c
>> +++ b/net/netfilter/nf_conntrack_core.c
>> @@ -87,6 +87,7 @@ static __read_mostly bool nf_conntrack_locks_all;
>>  #define GC_MAX_BUCKETS		8192u
>>  #define GC_INTERVAL		(5 * HZ)
>>  #define GC_MAX_EVICTS		256u
>> +static bool gc_full_scan = true;
>>  
>>  static struct conntrack_gc_work conntrack_gc_work;
>>  
>> @@ -511,6 +512,7 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
>>  	unsigned int bucket, hsize;
>>  
>>  begin:
>> +	gc_full_scan = false;
> 
> ... we do periodic lookup (but always in same slot), so no full scan is
> triggered.
Yes, I was wondering about that. My first idea was to have that bool per bucket
and force a scan of the bucket instead of the whole table.

> 
> If you think its useful, consider sending patch that rescheds worker
> instantly in case budget expired, otherwise I will do this later this
> week.
Ok, I will send it, but it does not address the "inactivity" problem.

> 
> [ I am aware doing instant restart might be too late, but at least we
>   would then reap more entries once we stumble upon large number of
>   expired ones ].
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net] conntrack: restart gc immediately if GC_MAX_EVICTS is reached
  2016-10-18 10:06                   ` Nicolas Dichtel
@ 2016-10-18 12:37                     ` Nicolas Dichtel
  2016-10-19 16:02                       ` Florian Westphal
  2016-10-19 16:14                       ` Pablo Neira Ayuso
  2016-10-20  8:50                     ` [PATCH net] conntrack: perform a full scan in gc Nicolas Dichtel
  1 sibling, 2 replies; 19+ messages in thread
From: Nicolas Dichtel @ 2016-10-18 12:37 UTC (permalink / raw)
  To: pablo, fw; +Cc: davem, netdev, netfilter-devel, Nicolas Dichtel

When the maximum evictions number is reached, do not wait 5 seconds before
the next run.

CC: Florian Westphal <fw@strlen.de>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/netfilter/nf_conntrack_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index ba6a1d421222..df2f5a3901df 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -983,7 +983,7 @@ static void gc_worker(struct work_struct *work)
 		return;
 
 	ratio = scanned ? expired_count * 100 / scanned : 0;
-	if (ratio >= 90)
+	if (ratio >= 90 || expired_count == GC_MAX_EVICTS)
 		next_run = 0;
 
 	gc_work->last_bucket = i;
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net] conntrack: restart gc immediately if GC_MAX_EVICTS is reached
  2016-10-18 12:37                     ` [PATCH net] conntrack: restart gc immediately if GC_MAX_EVICTS is reached Nicolas Dichtel
@ 2016-10-19 16:02                       ` Florian Westphal
  2016-10-19 16:14                       ` Pablo Neira Ayuso
  1 sibling, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2016-10-19 16:02 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: pablo, fw, davem, netdev, netfilter-devel

Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> When the maximum evictions number is reached, do not wait 5 seconds before
> the next run.
> 
> CC: Florian Westphal <fw@strlen.de>

Acked-by: Florian Westphal <fw@strlen.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net] conntrack: restart gc immediately if GC_MAX_EVICTS is reached
  2016-10-18 12:37                     ` [PATCH net] conntrack: restart gc immediately if GC_MAX_EVICTS is reached Nicolas Dichtel
  2016-10-19 16:02                       ` Florian Westphal
@ 2016-10-19 16:14                       ` Pablo Neira Ayuso
  1 sibling, 0 replies; 19+ messages in thread
From: Pablo Neira Ayuso @ 2016-10-19 16:14 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: fw, davem, netdev, netfilter-devel

On Tue, Oct 18, 2016 at 02:37:32PM +0200, Nicolas Dichtel wrote:
> When the maximum evictions number is reached, do not wait 5 seconds before
> the next run.

Applied, thanks Nicolas.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net] conntrack: perform a full scan in gc
  2016-10-18 10:06                   ` Nicolas Dichtel
  2016-10-18 12:37                     ` [PATCH net] conntrack: restart gc immediately if GC_MAX_EVICTS is reached Nicolas Dichtel
@ 2016-10-20  8:50                     ` Nicolas Dichtel
  1 sibling, 0 replies; 19+ messages in thread
From: Nicolas Dichtel @ 2016-10-20  8:50 UTC (permalink / raw)
  To: Florian Westphal; +Cc: pablo, davem, netdev, netfilter-devel

Le 18/10/2016 à 12:06, Nicolas Dichtel a écrit :
> Le 18/10/2016 à 10:47, Florian Westphal a écrit :
>> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>>> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
>>> timed-out entries"), netlink conntrack deletion events may be sent with a
>>> huge delay (5 minutes).
>>>
>>> There is two ways to evict conntrack:
>>>  - during a conntrack lookup;
>>>  - during a conntrack dump.
>>> Let's do a full scan of conntrack entries after a period of inactivity
>>> (no conntrack lookup).
>>>
>>> CC: Florian Westphal <fw@strlen.de>
>>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>>> ---
>>>
>>> Here is another proposal to try to fix the problem.
>>> Comments are welcomed,
>>> Nicolas
>>
>> Hmm, I don't think its good idea in practice.
>> If goal is to avoid starving arbitrary 'dead' ct for too long,
>> then simple ping will defeat the logic here, because...
>>
>>>  net/netfilter/nf_conntrack_core.c | 11 +++++++++--
>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
>>> index ba6a1d421222..3dbb27bd9582 100644
>>> --- a/net/netfilter/nf_conntrack_core.c
>>> +++ b/net/netfilter/nf_conntrack_core.c
>>> @@ -87,6 +87,7 @@ static __read_mostly bool nf_conntrack_locks_all;
>>>  #define GC_MAX_BUCKETS		8192u
>>>  #define GC_INTERVAL		(5 * HZ)
>>>  #define GC_MAX_EVICTS		256u
>>> +static bool gc_full_scan = true;
>>>  
>>>  static struct conntrack_gc_work conntrack_gc_work;
>>>  
>>> @@ -511,6 +512,7 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
>>>  	unsigned int bucket, hsize;
>>>  
>>>  begin:
>>> +	gc_full_scan = false;
>>
>> ... we do periodic lookup (but always in same slot), so no full scan is
>> triggered.
> Yes, I was wondering about that. My first idea was to have that bool per bucket
> and force a scan of the bucket instead of the whole table.
FYI, I'am off for about two weeks, but we really need to find a way to fix that
before the release goes out.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-10-20  8:50 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-10-10 10:18 [PATCH net 0/2] conntrack update Nicolas Dichtel
2016-10-10 10:18 ` [PATCH net 1/2] conntrack: remove obsolete sysctl (nf_conntrack_events_retry_timeout) Nicolas Dichtel
2016-10-10 13:57   ` Florian Westphal
2016-10-17 15:39     ` Pablo Neira Ayuso
2016-10-10 10:18 ` [PATCH net 2/2] conntrack: enable to tune gc parameters Nicolas Dichtel
2016-10-10 14:04   ` Florian Westphal
2016-10-10 15:24     ` Nicolas Dichtel
2016-10-13 20:43       ` Florian Westphal
2016-10-14 10:12         ` Nicolas Dichtel
2016-10-14 10:37           ` Florian Westphal
2016-10-14 10:53             ` Pablo Neira Ayuso
2016-10-14 11:16               ` Florian Westphal
2016-10-18  8:30               ` [PATCH net] conntrack: perform a full scan in gc Nicolas Dichtel
2016-10-18  8:47                 ` Florian Westphal
2016-10-18 10:06                   ` Nicolas Dichtel
2016-10-18 12:37                     ` [PATCH net] conntrack: restart gc immediately if GC_MAX_EVICTS is reached Nicolas Dichtel
2016-10-19 16:02                       ` Florian Westphal
2016-10-19 16:14                       ` Pablo Neira Ayuso
2016-10-20  8:50                     ` [PATCH net] conntrack: perform a full scan in gc Nicolas Dichtel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).