netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] xfrm: export xfrm garbage collector thresholds via sysctl
@ 2009-07-27 18:22 Neil Horman
  2009-07-27 18:37 ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Neil Horman @ 2009-07-27 18:22 UTC (permalink / raw)
  To: netdev
  Cc: joe, nhorman, davem, herbert, kuznet, pekkas, jmorris, yoshfuji,
	kaber

Export garbage collector thresholds for xfrm[4|6]_dst_ops

Had a problem reported to me recently in which a high volume of ipsec
connections on a system began reporting ENOBUFS for new connections eventually.
It seemed that after about 2000 connections we started being unable to create
more.  A quick look revealed that the xfrm code used a dst_ops structure that
limited the gc_thresh value to 1024, and alaways dropped route cache entries
after 2x the gc_thresh.  It seems the most direct solution is to export the
gc_thresh values in the xfrm[4|6] dst_ops as sysctls, like the main routing
table does, so that higher volumes of connections can be supported.  This patch
has been tested and allows the reporter to increase their ipsec connection
volume successfully.

Reported-by: Joe Nall <joe@nall.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


ipv4/xfrm4_policy.c |   18 ++++++++++++++++++
ipv6/xfrm6_policy.c |   18 ++++++++++++++++++
2 files changed, 36 insertions(+)

diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 0071ee6..018ac8b 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -264,6 +264,20 @@ static struct xfrm_policy_afinfo xfrm4_policy_afinfo = {
 	.fill_dst =		xfrm4_fill_dst,
 };
 
+static struct ctl_table xfrm4_policy_table[] = {
+	{
+		.ctl_name       = CTL_UNNUMBERED,
+		.procname       = "xfrm4_gc_thresh",
+		.data           = &xfrm4_dst_ops.gc_thresh,
+		.maxlen         = sizeof(int),
+		.mode           = 0644,
+		.proc_handler   = proc_dointvec,
+	},
+	{ }
+};
+
+static struct ctl_table_header *sysctl_hdr;
+
 static void __init xfrm4_policy_init(void)
 {
 	xfrm_policy_register_afinfo(&xfrm4_policy_afinfo);
@@ -271,6 +285,8 @@ static void __init xfrm4_policy_init(void)
 
 static void __exit xfrm4_policy_fini(void)
 {
+	if (sysctl_hdr)
+		unregister_net_sysctl_table(sysctl_hdr);
 	xfrm_policy_unregister_afinfo(&xfrm4_policy_afinfo);
 }
 
@@ -278,5 +294,7 @@ void __init xfrm4_init(void)
 {
 	xfrm4_state_init();
 	xfrm4_policy_init();
+	sysctl_hdr = register_net_sysctl_table(&init_net, net_ipv4_ctl_path,
+						xfrm4_policy_table);
 }
 
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 3a3c677..4acc308 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -306,6 +306,20 @@ static void xfrm6_policy_fini(void)
 	xfrm_policy_unregister_afinfo(&xfrm6_policy_afinfo);
 }
 
+static struct ctl_table xfrm6_policy_table[] = {
+	{
+		.ctl_name       = CTL_UNNUMBERED,
+		.procname       = "xfrm6_gc_thresh",
+		.data	   	= &xfrm6_dst_ops.gc_thresh,
+		.maxlen	 	= sizeof(int),
+		.mode	   	= 0644,
+		.proc_handler   = proc_dointvec,
+	},
+	{ }
+};
+
+static struct ctl_table_header *sysctl_hdr;
+
 int __init xfrm6_init(void)
 {
 	int ret;
@@ -317,6 +331,8 @@ int __init xfrm6_init(void)
 	ret = xfrm6_state_init();
 	if (ret)
 		goto out_policy;
+	sysctl_hdr = register_net_sysctl_table(&init_net, net_ipv6_ctl_path,
+						xfrm6_policy_table);
 out:
 	return ret;
 out_policy:
@@ -326,6 +342,8 @@ out_policy:
 
 void xfrm6_fini(void)
 {
+	if (sysctl_hdr)
+		unregister_net_sysctl_table(sysctl_hdr);
 	//xfrm6_input_fini();
 	xfrm6_policy_fini();
 	xfrm6_state_fini();

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfrm: export xfrm garbage collector thresholds via sysctl
  2009-07-27 18:22 [PATCH] xfrm: export xfrm garbage collector thresholds via sysctl Neil Horman
@ 2009-07-27 18:37 ` David Miller
  2009-07-27 19:36   ` Neil Horman
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2009-07-27 18:37 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, joe, herbert, kuznet, pekkas, jmorris, yoshfuji, kaber

From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 27 Jul 2009 14:22:46 -0400

> Export garbage collector thresholds for xfrm[4|6]_dst_ops
> 
> Had a problem reported to me recently in which a high volume of ipsec
> connections on a system began reporting ENOBUFS for new connections eventually.
> It seemed that after about 2000 connections we started being unable to create
> more.  A quick look revealed that the xfrm code used a dst_ops structure that
> limited the gc_thresh value to 1024, and alaways dropped route cache entries
> after 2x the gc_thresh.  It seems the most direct solution is to export the
> gc_thresh values in the xfrm[4|6] dst_ops as sysctls, like the main routing
> table does, so that higher volumes of connections can be supported.  This patch
> has been tested and allows the reporter to increase their ipsec connection
> volume successfully.
> 
> Reported-by: Joe Nall <joe@nall.com>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

Applied, but this suggests that either:

1) we pick a horrible default

2) our IPSEC machinery holds onto dst entries too tightly and that's
   the true cause of this problem

I'd like to ask that you investigate this, because with defaults
we should be able to handle IPSEC loads as high as the routing
loads we could handle.

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfrm: export xfrm garbage collector thresholds via sysctl
  2009-07-27 18:37 ` David Miller
@ 2009-07-27 19:36   ` Neil Horman
  2009-07-27 19:40     ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Neil Horman @ 2009-07-27 19:36 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, joe, herbert, kuznet, pekkas, jmorris, yoshfuji, kaber

On Mon, Jul 27, 2009 at 11:37:55AM -0700, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Mon, 27 Jul 2009 14:22:46 -0400
> 
> > Export garbage collector thresholds for xfrm[4|6]_dst_ops
> > 
> > Had a problem reported to me recently in which a high volume of ipsec
> > connections on a system began reporting ENOBUFS for new connections eventually.
> > It seemed that after about 2000 connections we started being unable to create
> > more.  A quick look revealed that the xfrm code used a dst_ops structure that
> > limited the gc_thresh value to 1024, and alaways dropped route cache entries
> > after 2x the gc_thresh.  It seems the most direct solution is to export the
> > gc_thresh values in the xfrm[4|6] dst_ops as sysctls, like the main routing
> > table does, so that higher volumes of connections can be supported.  This patch
> > has been tested and allows the reporter to increase their ipsec connection
> > volume successfully.
> > 
> > Reported-by: Joe Nall <joe@nall.com>
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> 
> Applied, but this suggests that either:
> 
Thanks!

> 1) we pick a horrible default
> 
> 2) our IPSEC machinery holds onto dst entries too tightly and that's
>    the true cause of this problem
> 
> I'd like to ask that you investigate this, because with defaults
> we should be able to handle IPSEC loads as high as the routing
> loads we could handle.
> 
I'll gladly look into this further.  Compared to the main routing table, the
ipsec default selection is pretty bad.  Its statcially set despite the size of
the larger routing table.  Looking at the garbage collection algorithm, we do
keep a pretty tight leash on freeing entries, but I think its warrented.  We
create 1 dst_entry for each open socket on an ipsec tunnel, and don't release it
until its __refcnt drops to zero.  I think that makes sense, since it means
we only keep cache entries for active connections, and clean them up as soon as
they close (e.g. I don't really see the advantage to unhashing a xfrm cache
entry only to recreate it on the next packet sent).  I think the most sensible
first step is to dynamically choose a gc threshold based on the size of memory
or the main routing table.  I'll write this up and post later this week after I
do some testing.  Thanks!
Neil

> Thanks.
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfrm: export xfrm garbage collector thresholds via sysctl
  2009-07-27 19:36   ` Neil Horman
@ 2009-07-27 19:40     ` David Miller
  2009-07-27 20:02       ` Joe Nall
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2009-07-27 19:40 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, joe, herbert, kuznet, pekkas, jmorris, yoshfuji, kaber

From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 27 Jul 2009 15:36:25 -0400

> I think that makes sense, since it means we only keep cache entries
> for active connections, and clean them up as soon as they close
> (e.g. I don't really see the advantage to unhashing a xfrm cache
> entry only to recreate it on the next packet sent).

How is this related to the user's problem?

My impression was that they were forwarding IPSEC traffic when
running up against these limits, and that has no socket based
assosciation at all.

Making the XFRM GC limits get computed similarly to how the
ipv4/ipv6 one's do probably makes sense.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfrm: export xfrm garbage collector thresholds via sysctl
  2009-07-27 19:40     ` David Miller
@ 2009-07-27 20:02       ` Joe Nall
  2009-07-27 20:10         ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Joe Nall @ 2009-07-27 20:02 UTC (permalink / raw)
  To: David Miller
  Cc: nhorman, netdev, herbert, kuznet, pekkas, jmorris, yoshfuji,
	kaber


On Jul 27, 2009, at 2:40 PM, David Miller wrote:

> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Mon, 27 Jul 2009 15:36:25 -0400
>
>> I think that makes sense, since it means we only keep cache entries
>> for active connections, and clean them up as soon as they close
>> (e.g. I don't really see the advantage to unhashing a xfrm cache
>> entry only to recreate it on the next packet sent).
>
> How is this related to the user's problem?
>
> My impression was that they were forwarding IPSEC traffic when
> running up against these limits, and that has no socket based
> assosciation at all.
>
> Making the XFRM GC limits get computed similarly to how the
> ipv4/ipv6 one's do probably makes sense.

The problem was seen serving TCP connections over IPSec when the  
server (a 24 core machine w 32GB RAM) and the IPSec host were the same  
(transport not tunnel).

The problem was originally identified in an MLS ipsec thin client  
stress test with 25 clients and 200+ windows per client.

We then duplicated the issue with single level xclocks on the same  
hardware.

I then duplicated the problem with two bone stock (not MLS) F10 (and  
later F11) boxes running ab (apache benchmark tool) with 10k requests  
over 2k concurrent connections. See https://bugzilla.redhat.com/show_bug.cgi?id=503124

With the gc_thresh raised to 8k, my ab test passes with 5k+  
connections and 100k requests. Our test guys ran 11 workstations with  
about 2200 concurrent X connections over the weekend and I'm hoping  
for some MLS results before we lose the test suite on Wednesday.

joe

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfrm: export xfrm garbage collector thresholds via sysctl
  2009-07-27 20:02       ` Joe Nall
@ 2009-07-27 20:10         ` David Miller
  0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2009-07-27 20:10 UTC (permalink / raw)
  To: joe; +Cc: nhorman, netdev, herbert, kuznet, pekkas, jmorris, yoshfuji,
	kaber

From: Joe Nall <joe@nall.com>
Date: Mon, 27 Jul 2009 15:02:20 -0500

> The problem was seen serving TCP connections over IPSec when the
> server (a 24 core machine w 32GB RAM) and the IPSec host were the same
> (transport not tunnel).

Aha, thanks for the clarification, that makes a lot more sense.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2009-07-27 20:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-27 18:22 [PATCH] xfrm: export xfrm garbage collector thresholds via sysctl Neil Horman
2009-07-27 18:37 ` David Miller
2009-07-27 19:36   ` Neil Horman
2009-07-27 19:40     ` David Miller
2009-07-27 20:02       ` Joe Nall
2009-07-27 20:10         ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).