public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces
@ 2008-01-06 22:03 Benjamin LaHaise
  2008-01-07  6:57 ` Eric W. Biederman
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin LaHaise @ 2008-01-06 22:03 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, Linus Torvalds

Hello folks,

2.6.24-rc6 regresses on the 10000 network interface creation test relative to 
2.6.23.  The cause appears to be the new code in sysctl_check_lookup(), which 
shows up as the #1 item while profiling.  Is a revert of this new code 
possible until its scaling issues are fixed?  2.6.23 can do more than 100 new 
network interfaces per second for the first few thousand devices, but with 
2.6.24-rc6 the results drop off rather dramatically to less than 10 interfaces 
per second.  The 10000 interface test is unbearable with the new sysctl_check 
code.

		time to creat 100 more interfaces
interfaces	v2.6.24-rc6	sysctl_check disabled
0		 0.729s		0.222s
100		 1.791s		0.223s
200		 3.966s		0.230s
300		 7.460s		0.236s
400		10.747s		0.241s
500		13.633s		0.252s

samples  %        app name                 symbol name
524598   33.4231  vmlinux                  sysctl_check_lookup
297996   18.9859  vmlinux                  cpu_idle
130263    8.2993  vmlinux                  __rcu_pending
123953    7.8973  vmlinux                  sysctl_head_next
121691    7.7532  vmlinux                  quicklist_trim
89624     5.7101  vmlinux                  strcmp
87257     5.5593  vmlinux                  rcu_pending
78475     4.9998  vmlinux                  poll_idle
43721     2.7855  vmlinux                  check_pgt_cache
34454     2.1951  vmlinux                  sysctl_parent
7494      0.4775  vmlinux                  rt_run_flush

		-ben


diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index a68425a..a4468e2 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -1459,6 +1459,8 @@ static void sysctl_check_bin_path(struct ctl_table *table, const char **fail)
 int sysctl_check_table(struct ctl_table *table)
 {
 	int error = 0;
+	return 0;
+
 	for (; table->ctl_name || table->procname; table++) {
 		const char *fail = NULL;
 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces
  2008-01-06 22:03 regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces Benjamin LaHaise
@ 2008-01-07  6:57 ` Eric W. Biederman
  2008-01-07  7:10   ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2008-01-07  6:57 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: linux-kernel, Linus Torvalds

Benjamin LaHaise <bcrl@kvack.org> writes:

> Hello folks,
>
> 2.6.24-rc6 regresses on the 10000 network interface creation test relative to 
> 2.6.23.  The cause appears to be the new code in sysctl_check_lookup(), which 
> shows up as the #1 item while profiling.  Is a revert of this new code 
> possible until its scaling issues are fixed?  2.6.23 can do more than 100 new 
> network interfaces per second for the first few thousand devices, but with 
> 2.6.24-rc6 the results drop off rather dramatically to less than 10 interfaces
> per second.  The 10000 interface test is unbearable with the new sysctl_check 
> code.

Why do we need 10000 interfaces?  Why isn't network device creation a
slow path?

The problem seems to be in the data structures used by sysctl.
You are increasing the length of the linked list each time you
add a network interface.  So sysctl lookups slow down.
At 10000 entries that is a long linked list walk.

At 100000 things get even longer.  Now the numbers you report still
seem like a lot of time to me.  My guess would be that we are getting
badly into cache miss territory.

If what you describe is a real scenario where users care we need
to fix the sysctl data structures so that they scale.

Because of this bug report and another one I got earlier today
about a real bug in the parallel port code detected by the very
lookup that is slowing you down.  I am quite reluctant to contemplate
pulling this code.  It seems to be doing it's job, if in some cases
uncomfortably so.

So is this a bug report telling me that there are users with
10k or 100k interfaces that care.  So we need to fix sysctl.

Is there a specific kernel test case that is run often that having
slow sysctl performance matters for?  CONFIG_SYSCTL=n should solve
that if it is specialized.

Eric

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces
  2008-01-07  6:57 ` Eric W. Biederman
@ 2008-01-07  7:10   ` David Miller
  2008-01-07 10:25     ` Eric W. Biederman
  2008-01-07 20:45     ` Andi Kleen
  0 siblings, 2 replies; 8+ messages in thread
From: David Miller @ 2008-01-07  7:10 UTC (permalink / raw)
  To: ebiederm; +Cc: bcrl, linux-kernel, torvalds

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Sun, 06 Jan 2008 23:57:57 -0700

> Why do we need 10000 interfaces?  Why isn't network device creation a
> slow path?

Because people create virtual devices like mad.

> So is this a bug report telling me that there are users with
> 10k or 100k interfaces that care.  So we need to fix sysctl.

Unquestionably, we do, it's a major regression.

People create thousands of VLAN devices, as one of many examples, all
the time.

That's why we even bother hashing network devices in the network code.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces
  2008-01-07  7:10   ` David Miller
@ 2008-01-07 10:25     ` Eric W. Biederman
  2008-01-08  1:47       ` Benjamin LaHaise
  2008-01-07 20:45     ` Andi Kleen
  1 sibling, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2008-01-07 10:25 UTC (permalink / raw)
  To: bcrl; +Cc: linux-kernel, torvalds, David Miller

David Miller <davem@davemloft.net> writes:

> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Sun, 06 Jan 2008 23:57:57 -0700
>
>> Why do we need 10000 interfaces?  Why isn't network device creation a
>> slow path?
>
> Because people create virtual devices like mad.
>
>> So is this a bug report telling me that there are users with
>> 10k or 100k interfaces that care.  So we need to fix sysctl.
>
> Unquestionably, we do, it's a major regression.
>
> People create thousands of VLAN devices, as one of many examples, all
> the time.
>
> That's why we even bother hashing network devices in the network code.

Cool thanks.  Although I think that was only a 256 way hash.  So it
is a bit stretched at 10,000 chain length of 39 and approaching ugly
at 100,000.  Still it should perform much better the sysctl.

I think someone failed to notice that using /proc/sys slowed to a crawl
in that event, and now that I am doing a lookup on register it seems to
showing up in the benchmarks.

At 256 or fewer that we that the network device hash is optimized for
the sysctl data structures didn't look to be falling over to badly.
2s vs .2s if I read Benjamin numbers right.

Given that it is late in the release cycle (so we can't do the
surgery that it appears the internal sysctl data structures need)
I propose doing something like the patch below.

Benjamin can you test the patch below and tell me if it also
keeps the network device performance at acceptable levels.

I really don't want to remove the check for invalid binary sysctl names
or the other ones if I can help it.  But that should be a small constant
cost and not make things progressively worse.

Eric

diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index a68425a..d69ef6d 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -1343,6 +1343,7 @@ static void sysctl_repair_table(struct ctl_table *table)
 	}
 }
 
+#if 0
 static struct ctl_table *sysctl_check_lookup(struct ctl_table *table)
 {
 	struct ctl_table_header *head;
@@ -1385,6 +1386,7 @@ out:
 	sysctl_head_finish(head);
 	return ref;
 }
+#endif
 
 static void set_fail(const char **fail, struct ctl_table *table, const char *str)
 {
@@ -1397,6 +1399,10 @@ static void set_fail(const char **fail, struct ctl_table *table, const char *str
 	*fail = str;
 }
 
+#if 0 
+/* Temporarily disabled to improve network device creation speed
+ * Reenable after we have fixed the sysctl data structures.
+ */
 static int sysctl_check_dir(struct ctl_table *table)
 {
 	struct ctl_table *ref;
@@ -1436,6 +1442,15 @@ static void sysctl_check_leaf(struct ctl_table *table, const char **fail)
 	if (ref && (ref != table))
 		set_fail(fail, table, "Sysctl already exists");
 }
+#else
+static int sysctl_check_dir(struct ctl_table *table)
+{
+	return 0;
+}
+static void sysctl_check_leaf(struct ctl_table *table, const char **fail)
+{
+}
+#endif
 
 static void sysctl_check_bin_path(struct ctl_table *table, const char **fail)
 {

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces
  2008-01-07  7:10   ` David Miller
  2008-01-07 10:25     ` Eric W. Biederman
@ 2008-01-07 20:45     ` Andi Kleen
  2008-01-07 21:30       ` Alan Cox
  1 sibling, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2008-01-07 20:45 UTC (permalink / raw)
  To: David Miller; +Cc: ebiederm, bcrl, linux-kernel, torvalds

David Miller <davem@davemloft.net> writes:
>
>> So is this a bug report telling me that there are users with
>> 10k or 100k interfaces that care.  So we need to fix sysctl.
>
> Unquestionably, we do, it's a major regression.
>
> People create thousands of VLAN devices, as one of many examples, all
> the time.

It might be an reasonable option to just stop creating sysctl entries
for interfaces after some threshold. I presume people who have that
many interfaces will mostly work through {default,all}/* anyways.

I think that would be a better option than to complicate sysctl.c
for this uncommon case.

-Andi

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces
  2008-01-07 20:45     ` Andi Kleen
@ 2008-01-07 21:30       ` Alan Cox
  2008-01-07 22:57         ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: Alan Cox @ 2008-01-07 21:30 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David Miller, ebiederm, bcrl, linux-kernel, torvalds

> I think that would be a better option than to complicate sysctl.c
> for this uncommon case.

What is so complicated about hashing the entries if you are checking for
duplicates when debugging. You can set the hash function to "0" and the
array size to 1 when the debug is off and it'll all go away nicely

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces
  2008-01-07 21:30       ` Alan Cox
@ 2008-01-07 22:57         ` Andi Kleen
  0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2008-01-07 22:57 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andi Kleen, David Miller, ebiederm, bcrl, linux-kernel, torvalds

On Mon, Jan 07, 2008 at 09:30:54PM +0000, Alan Cox wrote:
> > I think that would be a better option than to complicate sysctl.c
> > for this uncommon case.
> 
> What is so complicated about hashing the entries if you are checking for

One thing I'm worrying about is memory bloat (yes I know that's not
popular but someone has to do it ;-)

You would need a hash table for each table. To handle 100k entries
you would need a larger hash tables with at least a few hundred entries.
And that for each subdirectory.

% find /proc/sys -type d | wc -l
64

Assuming e.g. a 128 byte entry hash table (which is probably too small
for 100k entries) that would require 64 * 128 * 8 = 64k of memory.
Not gigantic, but lots of small fry bloat also adds up. Now if you
chose an actually realistic hash table size it gets even bigger.

Most likely you would need to implement a tree or a resizeable hash table
to do this sanely and then you quickly go into complicated territory.

> duplicates when debugging. You can set the hash function to "0" and the

My understanding was that the code was always on; not only for debugging.

-Andi

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces
  2008-01-07 10:25     ` Eric W. Biederman
@ 2008-01-08  1:47       ` Benjamin LaHaise
  0 siblings, 0 replies; 8+ messages in thread
From: Benjamin LaHaise @ 2008-01-08  1:47 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, torvalds, David Miller

On Mon, Jan 07, 2008 at 03:25:00AM -0700, Eric W. Biederman wrote:
> I think someone failed to notice that using /proc/sys slowed to a crawl
> in that event, and now that I am doing a lookup on register it seems to
> showing up in the benchmarks.

The directory that is problematic rarely gets accessed.  Fixing sysfs 
properly at some point would be a good idea, it's just one of those things 
that isn't quite high priority yet.

> Benjamin can you test the patch below and tell me if it also
> keeps the network device performance at acceptable levels.

I can confirm that this patch does indeed put the regression at bay for 
now.

		-ben

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-01-08  1:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-06 22:03 regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces Benjamin LaHaise
2008-01-07  6:57 ` Eric W. Biederman
2008-01-07  7:10   ` David Miller
2008-01-07 10:25     ` Eric W. Biederman
2008-01-08  1:47       ` Benjamin LaHaise
2008-01-07 20:45     ` Andi Kleen
2008-01-07 21:30       ` Alan Cox
2008-01-07 22:57         ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox