netdev.vger.kernel.org archive mirror
* [RFC] make per interface sysctl entries configurable
@ 2009-10-25 17:54 Octavian Purdila
  2009-10-25 18:07 ` Denys Fedoryschenko
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Octavian Purdila @ 2009-10-25 17:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Benjamin LaHaise, Stephen Hemminger, Cosmin Ratiu, netdev

[-- Attachment #1: Type: text/plain, Size: 436 bytes --]


RFC patches are attached.

Another possible approach: add an interface flag and use it to decide whether 
we want per interface sysctl entries or not.

Benchmarks for creating 1000 interfaces (with the ndst module previously posted 
on the list, on a ppc750 @ 800 MHz machine):

- without the patches:

real    4m 38.27s
user    0m 0.00s
sys     2m 18.90s

- with the patches:

real    0m 0.10s
user    0m 0.00s
sys     0m 0.05s

Thanks,
tavi

[-- Attachment #2: 818108.diff --]
[-- Type: text/x-patch, Size: 2756 bytes --]

net: CONFIG_NET_SYSCTL_DEV: make per interface dev_snmp6 proc entries optional
        
Use same CONFIG_NET_SYSCTL_DEV config option (we should probably
rename it to a better name) to enable/disable per interface dev_snmp6
proc entries.

--- //packages/linux_2.6.31/rc7/src/arch/powerpc/configs/ixia_ppc750_defconfig
+++ //packages/linux_2.6.31/rc7/src/arch/powerpc/configs/ixia_ppc750_defconfig
@@ -287,6 +287,7 @@
 #
 # Networking options
 #
+# CONFIG_NET_SYSCTL_DEV is not set
 CONFIG_PACKET=y
 CONFIG_PACKET_MMAP=y
 CONFIG_UNIX=y
--- //packages/linux_2.6.31/rc7/src/net/Kconfig
+++ //packages/linux_2.6.31/rc7/src/net/Kconfig
@@ -25,6 +25,15 @@
 
 menu "Networking options"
 
+config NET_SYSCTL_DEV
+	bool "Per device sysctl entries"
+	default y
+	depends on SYSCTL
+	help
+	  Enables per device sysctl entries. You want this enabled unless
+	  your system has a large number of interfaces and you want to reduce
+	  memory usage.
+
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/xfrm/Kconfig"
--- //packages/linux_2.6.31/rc7/src/net/ipv4/devinet.c
+++ //packages/linux_2.6.31/rc7/src/net/ipv4/devinet.c
@@ -97,7 +97,7 @@
 static BLOCKING_NOTIFIER_HEAD(inetaddr_chain);
 static void inet_del_ifa(struct in_device *in_dev, struct in_ifaddr **ifap,
 			 int destroy);
-#ifdef CONFIG_SYSCTL
+#ifdef CONFIG_NET_SYSCTL_DEV
 static void devinet_sysctl_register(struct in_device *idev);
 static void devinet_sysctl_unregister(struct in_device *idev);
 #else
@@ -1397,7 +1397,6 @@
 	return ret;
 }
 
-
 #define DEVINET_SYSCTL_ENTRY(attr, name, mval, proc, sysctl) \
 	{ \
 		.ctl_name	= NET_IPV4_CONF_ ## attr, \
@@ -1531,6 +1530,7 @@
 	kfree(t);
 }
 
+#ifdef CONFIG_NET_SYSCTL_DEV
 static void devinet_sysctl_register(struct in_device *idev)
 {
 	neigh_sysctl_register(idev->dev, idev->arp_parms, NET_IPV4,
@@ -1544,6 +1544,7 @@
 	__devinet_sysctl_unregister(&idev->cnf);
 	neigh_sysctl_unregister(idev->arp_parms);
 }
+#endif
 
 static struct ctl_table ctl_forward_entry[] = {
 	{
--- //packages/linux_2.6.31/rc7/src/net/ipv6/addrconf.c
+++ //packages/linux_2.6.31/rc7/src/net/ipv6/addrconf.c
@@ -100,7 +100,7 @@
 #define	INFINITY_LIFE_TIME	0xFFFFFFFF
 #define TIME_DELTA(a,b) ((unsigned long)((long)(a) - (long)(b)))
 
-#ifdef CONFIG_SYSCTL
+#ifdef CONFIG_NET_SYSCTL_DEV
 static void addrconf_sysctl_register(struct inet6_dev *idev);
 static void addrconf_sysctl_unregister(struct inet6_dev *idev);
 #else
@@ -4412,6 +4412,7 @@
 	kfree(t);
 }
 
+#ifdef CONFIG_NET_SYSCTL_DEV
 static void addrconf_sysctl_register(struct inet6_dev *idev)
 {
 	neigh_sysctl_register(idev->dev, idev->nd_parms, NET_IPV6,
@@ -4427,7 +4428,7 @@
 	__addrconf_sysctl_unregister(&idev->cnf);
 	neigh_sysctl_unregister(idev->nd_parms);
 }
-
+#endif
 
 #endif
 

[-- Attachment #3: 818110.diff --]
[-- Type: text/x-patch, Size: 1277 bytes --]

net: CONFIG_NET_SYSCTL_DEV: make per interface dev_snmp6 proc entries optional
        
Use same CONFIG_NET_SYSCTL_DEV config option (we should probably
rename it to a better name) to enable/disable per interface dev_snmp6
proc entries.

--- //packages/linux_2.6.31/rc7/src/include/net/ipv6.h
+++ //packages/linux_2.6.31/rc7/src/include/net/ipv6.h
@@ -604,8 +604,14 @@
 extern void udplite6_proc_exit(void);
 extern int  ipv6_misc_proc_init(void);
 extern void ipv6_misc_proc_exit(void);
+
+#ifdef CONFIG_NET_SYSCTL_DEV
 extern int snmp6_register_dev(struct inet6_dev *idev);
 extern int snmp6_unregister_dev(struct inet6_dev *idev);
+#else
+static inline int snmp6_register_dev(struct inet6_dev *idev) { return 0; }
+static inline int snmp6_unregister_dev(struct inet6_dev *idev) { return 0; }
+#endif
 
 #else
 static inline int ac6_proc_init(struct net *net) { return 0; }
--- //packages/linux_2.6.31/rc7/src/net/ipv6/proc.c
+++ //packages/linux_2.6.31/rc7/src/net/ipv6/proc.c
@@ -232,6 +232,7 @@
 	.release = single_release,
 };
 
+#ifdef CONFIG_NET_SYSCTL_DEV
 int snmp6_register_dev(struct inet6_dev *idev)
 {
 	struct proc_dir_entry *p;
@@ -266,6 +267,7 @@
 	idev->stats.proc_dir_entry = NULL;
 	return 0;
 }
+#endif
 
 static int ipv6_proc_init_net(struct net *net)
 {


* Re: [RFC] make per interface sysctl entries configurable
  2009-10-25 17:54 [RFC] make per interface sysctl entries configurable Octavian Purdila
@ 2009-10-25 18:07 ` Denys Fedoryschenko
  2009-10-25 21:37 ` Eric Dumazet
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Denys Fedoryschenko @ 2009-10-25 18:07 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: Eric Dumazet, Benjamin LaHaise, Stephen Hemminger, Cosmin Ratiu,
	netdev

Very interesting patch. I run PPPoE, and sysctl locking is my number one issue 
(according to perf and oprofile), both during massive PPPoE logins and during 
normal operation.
I will probably try applying it on one of the loaded (but redundant, in case it 
crashes) PPPoE servers.

On Sunday 25 October 2009 19:54:49 Octavian Purdila wrote:
> RFC patches are attached.
>
> Another possible approach: add an interface flag and use it to decide
> whether we want per interface sysctl entries or not.
>
> Benchmarks for creating 1000 interface (with the ndst module previously
> posted on the list, ppc750 @800Mhz machine):
>
> - without the patches:
>
> real    4m 38.27s
> user    0m 0.00s
> sys     2m 18.90s
>
> - with the patches:
>
> real    0m 0.10s
> user    0m 0.00s
> sys     0m 0.05s
>
> Thanks,
> tavi




* Re: [RFC] make per interface sysctl entries configurable
  2009-10-25 17:54 [RFC] make per interface sysctl entries configurable Octavian Purdila
  2009-10-25 18:07 ` Denys Fedoryschenko
@ 2009-10-25 21:37 ` Eric Dumazet
  2009-10-25 22:21   ` Octavian Purdila
  2009-10-26  4:31 ` Stephen Hemminger
  2009-10-26 15:24 ` Denys Fedoryschenko
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2009-10-25 21:37 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: Benjamin LaHaise, Stephen Hemminger, Cosmin Ratiu, netdev

Octavian Purdila wrote:
> RFC patches are attached.
> 
> Another possible approach: add an interface flag and use it to decide whether 
> we want per interface sysctl entries or not.
> 

Hmm, could we speed up sysctl instead, by adding an rbtree or something?




* Re: [RFC] make per interface sysctl entries configurable
  2009-10-25 21:37 ` Eric Dumazet
@ 2009-10-25 22:21   ` Octavian Purdila
  2009-10-25 22:32     ` Denys Fedoryschenko
  2009-10-26  9:01     ` Cosmin Ratiu
  0 siblings, 2 replies; 8+ messages in thread
From: Octavian Purdila @ 2009-10-25 22:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Benjamin LaHaise, Stephen Hemminger, Cosmin Ratiu, netdev

On Sunday 25 October 2009 23:37:19 you wrote:
> Octavian Purdila wrote:
> > RFC patches are attached.
> >
> > Another possible approach: add an interface flag and use it to decide
> > whether we want per interface sysctl entries or not.
> 
> Hmm, could we speedup sysctl instead, adding rbtree or something ?
> 

Very good point, I think this is the best solution for people using a 
moderately high number of interfaces (a few thousand).

But for really large setups there is another issue: memory consumption. In 
fact, in order to scale to 128K interfaces and still leave a significant 
amount of memory available to applications, we also had to disable sysfs and 
wrap struct device in net_device with #ifdef CONFIG_SYSFS.

I would also argue that when you have such a large number of interfaces you 
don't need to change settings on a per interface basis. At least this is our 
case :) and I suspect that the case with a large number of PPP interfaces is 
similar.


* Re: [RFC] make per interface sysctl entries configurable
  2009-10-25 22:21   ` Octavian Purdila
@ 2009-10-25 22:32     ` Denys Fedoryschenko
  2009-10-26  9:01     ` Cosmin Ratiu
  1 sibling, 0 replies; 8+ messages in thread
From: Denys Fedoryschenko @ 2009-10-25 22:32 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: Eric Dumazet, Benjamin LaHaise, Stephen Hemminger, Cosmin Ratiu,
	netdev

On Monday 26 October 2009 00:21:48 Octavian Purdila wrote:

> Very good point, I think this is the best solution for people using a
> moderately high number of interfaces (a few thousand).
>
> But for really large setups there is another issue: memory consumption. In
> fact, in order to be able to scale to 128K interfaces and still have a
> significant amount of memory available to applications we also had to
> disable sysfs and #ifdef CONFIG_SYSFS struct device from net_device.
>
> I would also argue that when you have such a large number of interfaces you
> don't need to change setting on a per interface basis. Or at least this is
> our case :)  and I suspect that the case with a large number of PPP
> interfaces is similar.
I will also add that sysctl -a (via busybox) on a PPPoE server with 2k 
interfaces takes ages to complete.


* Re: [RFC] make per interface sysctl entries configurable
  2009-10-25 17:54 [RFC] make per interface sysctl entries configurable Octavian Purdila
  2009-10-25 18:07 ` Denys Fedoryschenko
  2009-10-25 21:37 ` Eric Dumazet
@ 2009-10-26  4:31 ` Stephen Hemminger
  2009-10-26 15:24 ` Denys Fedoryschenko
  3 siblings, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2009-10-26  4:31 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: Eric Dumazet, Benjamin LaHaise, Cosmin Ratiu, netdev

On Sun, 25 Oct 2009 19:54:49 +0200
Octavian Purdila <opurdila@ixiacom.com> wrote:

> 
> RFC patches are attached.
> 
> Another possible approach: add an interface flag and use it to decide whether 
> we want per interface sysctl entries or not.
> 
> Benchmarks for creating 1000 interface (with the ndst module previously posted 
> on the list, ppc750 @800Mhz machine):
> 
> - without the patches:
> 
> real    4m 38.27s
> user    0m 0.00s
> sys     2m 18.90s
> 
> - with the patches:
> 
> real    0m 0.10s
> user    0m 0.00s
> sys     0m 0.05s
> 
> Thanks,
> tavi

I would rather optimize the algorithm than give up and make it
not available. It should be possible to do better by just using some
better programming.

-- 


* Re: [RFC] make per interface sysctl entries configurable
  2009-10-25 22:21   ` Octavian Purdila
  2009-10-25 22:32     ` Denys Fedoryschenko
@ 2009-10-26  9:01     ` Cosmin Ratiu
  1 sibling, 0 replies; 8+ messages in thread
From: Cosmin Ratiu @ 2009-10-26  9:01 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: Eric Dumazet, Benjamin LaHaise, Stephen Hemminger, netdev

On Monday 26 October 2009 00:21:48 Octavian Purdila wrote:
> On Sunday 25 October 2009 23:37:19 you wrote:
> > Octavian Purdila wrote:
> > > RFC patches are attached.
> > >
> > > Another possible approach: add an interface flag and use it to decide
> > > whether we want per interface sysctl entries or not.
> >
> > Hmm, could we speedup sysctl instead, adding rbtree or something ?
> 
> Very good point, I think this is the best solution for people using a
> moderately high number of interfaces (a few thousand).
> 
> But for really large setups there is another issue: memory consumption. In
> fact, in order to be able to scale to 128K interfaces and still have a
> significant amount of memory available to applications we also had to
>  disable sysfs and #ifdef CONFIG_SYSFS struct device from net_device.
> 
> I would also argue that when you have such a large number of interfaces you
> don't need to change setting on a per interface basis. Or at least this is
>  our case :)  and I suspect that the case with a large number of PPP
>  interfaces is similar.
> 

Another possible approach: shared settings for an interface group. If you have 
a large number of interfaces of the same type, it would be nice to be able to 
change a setting for the whole group instead of globally or individually.

Is this approach feasible at all? Or am I talking rubbish?

Cosmin.


* Re: [RFC] make per interface sysctl entries configurable
  2009-10-25 17:54 [RFC] make per interface sysctl entries configurable Octavian Purdila
                   ` (2 preceding siblings ...)
  2009-10-26  4:31 ` Stephen Hemminger
@ 2009-10-26 15:24 ` Denys Fedoryschenko
  3 siblings, 0 replies; 8+ messages in thread
From: Denys Fedoryschenko @ 2009-10-26 15:24 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: Eric Dumazet, Benjamin LaHaise, Stephen Hemminger, Cosmin Ratiu,
	netdev

I tested it on a PPPoE server with 1k customers. It works flawlessly.

When there is a problem on the network and many users disconnect and then log 
in again, the bottleneck is a lock somewhere in sysctl creation (according to 
perf). After 200-300 interfaces PPPoE starts dying: the connection rate drops 
to 20-50 customers per minute and the load average jumps to 70-100 (I guess 
pppd processes waiting for their turn). With this patch I am able to sustain a 
login rate of 200-300 customers per minute, and perf top is "clear" now.

This option is definitely optional and doesn't cut any functionality by 
default; it just gives more choice. For a PPP (pppoe/pptp) NAS it is very 
useful.

On Sunday 25 October 2009 19:54:49 Octavian Purdila wrote:
> RFC patches are attached.
>
> Another possible approach: add an interface flag and use it to decide
> whether we want per interface sysctl entries or not.
>
> Benchmarks for creating 1000 interface (with the ndst module previously
> posted on the list, ppc750 @800Mhz machine):
>
> - without the patches:
>
> real    4m 38.27s
> user    0m 0.00s
> sys     2m 18.90s
>
> - with the patches:
>
> real    0m 0.10s
> user    0m 0.00s
> sys     0m 0.05s
>
> Thanks,
> tavi


