public inbox for linux-kernel@vger.kernel.org
* Really slow netstat and /proc/net/tcp in 2.4
@ 2001-10-11 18:47 Simon Kirby
  2001-10-11 19:30 ` kuznet
  0 siblings, 1 reply; 11+ messages in thread
From: Simon Kirby @ 2001-10-11 18:47 UTC
  To: linux-kernel; +Cc: David S. Miller

Is there something that changed from 2.2 -> 2.4 with regards to the
speed of netstat and /proc/net/tcp?  We have some webservers we just
upgraded from 2.2.19 to 2.4.12, and some in-house monitoring tools that
check /proc/net/tcp have begun to suck up a lot of CPU cycles trying to
read that file.

A simple cat or wc -l on the file is roughly two orders of magnitude
slower ("time" reports around a second when the file has 450 entries).
Some servers seem to be worse than others, and the slowdown does not
appear to be proportional to the number of entries across servers.

netstat -tn just crawls along on these servers.  Should I enable
profile=1 or something to see what's happening here?

Examples:

2.2.19:

[sroot@marble:/root]# time wc -l /proc/net/tcp
    858 /proc/net/tcp
0.000u 0.010s 0:00.01 100.0%    0+0k 0+0io 112pf+0w

2.4.12:

[sroot@pro:/root]# time wc -l /proc/net/tcp
    463 /proc/net/tcp
0.000u 0.640s 0:00.64 100.0%    0+0k 0+0io 69pf+0w

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc. ]
[       sim@stormix.com       ][       sim@netnation.com        ]
[ Opinions expressed are not necessarily those of my employers. ]
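
The measurement above can be repeated with different read sizes using a
small helper like the one below. It is a minimal sketch, not a tool from
this thread: the function name count_lines_chunked and its parameters are
made up for illustration. It reads a file with read(2) calls of a chosen
size, counts newlines the way "wc -l" does, and reports how many read()
calls returned data.

```c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read `path` using read(2) calls of `bufsize`
 * bytes and count newline characters.  The number of read() calls that
 * returned data is stored in *nreads (if nreads is not NULL).
 * Returns the line count, or -1 on error.
 */
long count_lines_chunked(const char *path, size_t bufsize, long *nreads)
{
    char *buf;
    long lines = 0, calls = 0;
    ssize_t n;
    int fd;

    if (bufsize == 0)
        return -1;
    buf = malloc(bufsize);
    if (buf == NULL)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    while ((n = read(fd, buf, bufsize)) > 0) {
        ssize_t i;

        calls++;
        for (i = 0; i < n; i++)
            if (buf[i] == '\n')
                lines++;
    }
    close(fd);
    free(buf);
    if (n < 0)
        return -1;
    if (nreads != NULL)
        *nreads = calls;
    return lines;
}
```

Calling count_lines_chunked("/proc/net/tcp", 1024, &nreads) under "time"
and then again with a larger buffer shows whether the cost follows the
number of read() calls or the number of entries.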

* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-11 18:47 Really slow netstat and /proc/net/tcp in 2.4 Simon Kirby
@ 2001-10-11 19:30 ` kuznet
  2001-10-11 19:55   ` Simon Kirby
  0 siblings, 1 reply; 11+ messages in thread
From: kuznet @ 2001-10-11 19:30 UTC
  To: Simon Kirby; +Cc: linux-kernel

Hello!

> Is there something that changed from 2.2 -> 2.4 with regards to the
> speed of netstat and /proc/net/tcp?

Incredibly high size of hash table, I think.
At least here size is ~1MB. And all this is read each 1K of data read
via /proc/ :-)

Alexey
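
The arithmetic behind this explanation can be sketched as a rough cost
model. This is not kernel code: the function names are invented, and the
figures used in the usage note (128 bytes per output line, a table of
100000 buckets) are assumptions for illustration only. The model simply
says that if the whole table is walked for every chunk handed back to
userspace, the total work is the table size times the number of chunks.

```c
/*
 * Rough cost model (assumption, not kernel code): number of read-sized
 * chunks needed to return `entries` lines of `line_bytes` bytes each.
 */
unsigned long reads_needed(unsigned long entries, unsigned long line_bytes,
                           unsigned long read_bytes)
{
    unsigned long total = entries * line_bytes;

    if (read_bytes == 0)
        return 0;
    return (total + read_bytes - 1) / read_bytes;   /* round up */
}

/*
 * If every chunk triggers a walk over all `buckets`, the total number
 * of bucket visits grows with the table size, not with the number of
 * sockets actually present.
 */
unsigned long buckets_walked(unsigned long buckets, unsigned long entries,
                             unsigned long line_bytes,
                             unsigned long read_bytes)
{
    return buckets * reads_needed(entries, line_bytes, read_bytes);
}
```

Under those assumed figures, 463 entries read 1K at a time need 58 reads,
and buckets_walked(100000, 463, 128, 1024) comes to 5800000 bucket visits
to print a few hundred sockets.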

* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-11 19:30 ` kuznet
@ 2001-10-11 19:55   ` Simon Kirby
  2001-10-12 16:44     ` kuznet
  2001-10-12 19:56     ` Andi Kleen
  0 siblings, 2 replies; 11+ messages in thread
From: Simon Kirby @ 2001-10-11 19:55 UTC
  To: kuznet; +Cc: linux-kernel

On Thu, Oct 11, 2001 at 11:30:25PM +0400, kuznet@ms2.inr.ac.ru wrote:

> Hello!
> 
> > Is there something that changed from 2.2 -> 2.4 with regards to the
> > speed of netstat and /proc/net/tcp?
> 
> Incredibly high size of hash table, I think.
> At least here size is ~1MB. And all this is read each 1K of data read
> via /proc/ :-)

So it's walking the hash table per block read, and the hash table is very
large?  Hmm.  I notice it's a bit faster if I use dd if=/proc/net/tcp
of=/dev/null bs=1024k, but not much.

Is it possible to fix this?  Was the 2.2 hash table just that much
smaller?

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc. ]
[       sim@stormix.com       ][       sim@netnation.com        ]
[ Opinions expressed are not necessarily those of my employers. ]

* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-11 19:55   ` Simon Kirby
@ 2001-10-12 16:44     ` kuznet
  2001-10-12 19:36       ` Simon Kirby
  2001-10-12 19:56     ` Andi Kleen
  1 sibling, 1 reply; 11+ messages in thread
From: kuznet @ 2001-10-12 16:44 UTC
  To: Simon Kirby; +Cc: linux-kernel

Hello!

> Is it possible to fix this?  Was the 2.2 hash table just that much
> smaller?

2.2 did not use the hash tables for this; it kept a special single list
for /proc.

If I understand correctly, it was removed because it added more data/work
and a new point of synchronization to the main path while being useful
only for /proc. The approach would be justified if you had 100000 sockets.
In that case both approaches are equally slow. :-) But for 1000 sockets a
hash table of 100000 entries is sort of overscaled.


> Is it possible to fix this?

To fix --- no. To make differently --- yes.

Well, actually, if you are interested, drop me a note and I can pack up
some of my old work on this for you. It is fully functional, but the API
is still dirty. Unfortunately, it requires some kernel patching.

Alexey

* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-12 16:44     ` kuznet
@ 2001-10-12 19:36       ` Simon Kirby
  2001-10-12 19:43         ` kuznet
  0 siblings, 1 reply; 11+ messages in thread
From: Simon Kirby @ 2001-10-12 19:36 UTC
  To: kuznet; +Cc: linux-kernel

On Fri, Oct 12, 2001 at 08:44:58PM +0400, kuznet@ms2.inr.ac.ru wrote:

> > Is it possible to fix this?  Was the 2.2 hash table just that much
> > smaller?
> 
> 2.2 did not use the hash tables for this; it kept a special single list
> for /proc.
> 
> If I understand correctly, it was removed because it added more data/work
> and a new point of synchronization to the main path while being useful
> only for /proc. The approach would be justified if you had 100000 sockets.
> In that case both approaches are equally slow. :-) But for 1000 sockets a
> hash table of 100000 entries is sort of overscaled.
> 
> > Is it possible to fix this?
> 
> To fix --- no. To make differently --- yes.
> 
> Well, actually, if you are interested, drop me a note and I can pack up
> some of my old work on this for you. It is fully functional, but the API
> is still dirty. Unfortunately, it requires some kernel patching.

If it involves changing the TCP stack locking stuff and putting the old
list back, it's probably not worth the bother.  The only thing we're
using the file for (other than the occasional admin-run "netstat") is to
check to see what ports are listening on the machine without actually
attempting to connect to them.  We check our services this way more often
than actually connecting and requesting a response to reduce log clutter
and testing load on the server.  Is there an easier way to accomplish
this than parsing /proc/net/tcp?  We could attempt to bind to the ports
we want to check, but that would race with daemons trying to start up.

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc. ]
[       sim@stormix.com       ][       sim@netnation.com        ]
[ Opinions expressed are not necessarily those of my employers. ]
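
The kind of check described above (finding listening ports by parsing
/proc/net/tcp) might look like the sketch below. It is illustrative only
and is not the in-house tool mentioned in the message; the function name
listening_port is made up. It relies on the documented line layout, where
each socket line starts with a slot number, the local and remote
address:port pairs in hex, and a hex state field in which 0A means
TCP_LISTEN.

```c
#include <stdio.h>

#define TCP_STATE_LISTEN 0x0A   /* value of the "st" column for LISTEN */

/*
 * Hypothetical helper: parse one line of /proc/net/tcp.
 * Returns the local port if the line describes a listening socket,
 * otherwise -1 (header line, parse failure, or any other state).
 */
int listening_port(const char *line)
{
    unsigned int slot, laddr, lport, raddr, rport, state;

    if (sscanf(line, " %u: %8x:%4x %8x:%4x %2x",
               &slot, &laddr, &lport, &raddr, &rport, &state) != 6)
        return -1;

    return state == TCP_STATE_LISTEN ? (int)lport : -1;
}
```

A monitoring loop would feed each line from fgets() to listening_port()
and compare the results against the list of ports it expects. The cost of
reading the file itself, which is what this thread is about, is unchanged
by how the lines are parsed.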

* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-12 19:36       ` Simon Kirby
@ 2001-10-12 19:43         ` kuznet
  0 siblings, 0 replies; 11+ messages in thread
From: kuznet @ 2001-10-12 19:43 UTC
  To: Simon Kirby; +Cc: linux-kernel

Hello!

> If it involves changing the TCP stack locking stuff

No. It does not even touch the kernel, except for exporting 4 symbols
that are not exported yet.

> and testing load on the server.  Is there an easier way to accomplish
> this than parsing /proc/net/tcp?  We could attempt to bind to the ports
> we want to check, but that would race with daemons trying to start up.

Send a single SYN through a packet socket: a syn-flood of one packet.

Alexey


* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-11 19:55   ` Simon Kirby
  2001-10-12 16:44     ` kuznet
@ 2001-10-12 19:56     ` Andi Kleen
  2001-10-12 22:10       ` Simon Kirby
  1 sibling, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2001-10-12 19:56 UTC
  To: Simon Kirby; +Cc: linux-kernel, kuznet

In article <20011011125538.C10868@netnation.com>,
Simon Kirby <sim@netnation.com> writes:
> On Thu, Oct 11, 2001 at 11:30:25PM +0400, kuznet@ms2.inr.ac.ru wrote:
>> Hello!
>> 
>> > Is there something that changed from 2.2 -> 2.4 with regards to the
>> > speed of netstat and /proc/net/tcp?
>> 
>> Incredibly high size of hash table, I think.
>> At least here size is ~1MB. And all this is read each 1K of data read
>> via /proc/ :-)

> So it's walking the hash table per block read, and the hash table is very
> large?  Hmm.  I notice it's a bit faster if I use dd if=/proc/net/tcp
> of=/dev/null bs=1024k, but not much.

> Is it possible to fix this?  Was the 2.2 hash table just that much
> smaller?

The hash table is likely too big anyway; it eats cache without helping
that much. If you're interested in some testing, I can send you patches
to change its size by hand and collect statistics on the average hash
queue length. Then, with some work, you can figure out a good size for
your workload. Longer term, I think the table sizing heuristics are far
too aggressive and need to be throttled back, but that needs more data
from real servers.

-Andi


* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-12 19:56     ` Andi Kleen
@ 2001-10-12 22:10       ` Simon Kirby
  2001-10-12 23:57         ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Simon Kirby @ 2001-10-12 22:10 UTC
  To: Andi Kleen; +Cc: linux-kernel, kuznet

On Fri, Oct 12, 2001 at 09:56:01PM +0200, Andi Kleen wrote:

> The hash table is likely too big anyway; it eats cache without helping
> that much. If you're interested in some testing, I can send you patches
> to change its size by hand and collect statistics on the average hash
> queue length. Then, with some work, you can figure out a good size for
> your workload. Longer term, I think the table sizing heuristics are far
> too aggressive and need to be throttled back, but that needs more data
> from real servers.

Wouldn't just counting the lines in /proc/net/tcp be sufficient to see
how many buckets should be used in an ideal hash table distribution
scenario?  (In which case the size of the hash table depends largely on a
machine's work load...)

Most of our web servers seem to have 500-1000 entries in /proc/net/tcp.

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc. ]
[       sim@stormix.com       ][       sim@netnation.com        ]
[ Opinions expressed are not necessarily those of my employers. ]

* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-12 22:10       ` Simon Kirby
@ 2001-10-12 23:57         ` Andi Kleen
  2001-10-13 15:07           ` Hugh Dickins
  0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2001-10-12 23:57 UTC
  To: Simon Kirby, Andi Kleen; +Cc: linux-kernel, kuznet

On Fri, Oct 12, 2001 at 03:10:33PM -0700, Simon Kirby wrote:
> On Fri, Oct 12, 2001 at 09:56:01PM +0200, Andi Kleen wrote:
> 
> > The hash table is likely too big anyway; it eats cache without helping
> > that much. If you're interested in some testing, I can send you patches
> > to change its size by hand and collect statistics on the average hash
> > queue length. Then, with some work, you can figure out a good size for
> > your workload. Longer term, I think the table sizing heuristics are far
> > too aggressive and need to be throttled back, but that needs more data
> > from real servers.
> 
> Wouldn't just counting the lines in /proc/net/tcp be sufficient to see
> how many buckets should be used in an ideal hash table distribution
> scenario?  (In which case the size of the hash table depends largely on a
> machine's work load...)

That won't tell you the list length of individual hash buckets. Keeping
that number low on average is the goal of the big hash tables, but I suspect
the 2.4 ones are far too big, losing any possible benefit to poor cache
locality.

I attached a patch. It allows you to get some simple statistics from
/proc/net/sockstat (unfortunately costly too). It also adds a new kernel
boot argument, tcpehashorder=order, where order is the log2 of how many
pages you want to use for the hash table (so it needs 2^order * 4096 bytes
on i386). You can experiment with various sizes and check which one still
gives a reasonable hash distribution under load.

The smallest one you can find is best.

BTW, it seems like the tables are 1/4 too big on SMP systems. The second
half, reserved for time-wait, has per-bucket rwlocks too, but they're not
used. If established and time-wait were split, this wastage could be avoided.
This way some memory (but not walk time) could be saved. It would also
halve the requirement for contiguous memory, which would be useful e.g. if
tcp/ip were ever turned into a module.

-Andi
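
The "order" arithmetic described above can be sketched in userspace as
follows. The helper names are invented, and the 8-byte bucket size in the
usage note is an assumption for illustration (the real sizeof(struct
tcp_ehash_bucket) depends on the kernel configuration). The division
mirrors the tcp_ehash_size calculation visible in the patch context.

```c
/*
 * Bytes occupied by a table of 2^order pages, as in the message:
 * 2^order * 4096 on i386.  page_size is passed in rather than assumed.
 */
unsigned long ehash_table_bytes(unsigned int order, unsigned long page_size)
{
    return (1UL << order) * page_size;
}

/*
 * Number of bucket slots that fit in that many bytes, following
 *   (1UL << order) * PAGE_SIZE / sizeof(struct tcp_ehash_bucket)
 * bucket_bytes is a stand-in for the real structure size.
 */
unsigned long ehash_slots(unsigned int order, unsigned long page_size,
                          unsigned long bucket_bytes)
{
    if (bucket_bytes == 0)
        return 0;
    return ehash_table_bytes(order, page_size) / bucket_bytes;
}
```

For example, ehash_table_bytes(8, 4096) is 1048576 bytes, the same
magnitude as the ~1MB table reported earlier in the thread, and with an
assumed 8-byte bucket ehash_slots(8, 4096, 8) gives 131072 slots.
Lowering the order by one halves both numbers.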



--- net/ipv4/proc.c-o	Wed May 16 19:21:45 2001
+++ net/ipv4/proc.c	Sat Oct 13 03:37:55 2001
@@ -68,6 +68,7 @@
 {
 	/* From  net/socket.c  */
 	extern int socket_get_info(char *, char **, off_t, int);
+	extern int tcp_v4_hash_statistics(char *) ;
 
 	int len  = socket_get_info(buffer,start,offset,length);
 
@@ -82,6 +83,8 @@
 		       fold_prot_inuse(&raw_prot));
 	len += sprintf(buffer+len, "FRAG: inuse %d memory %d\n",
 		       ip_frag_nqueues, atomic_read(&ip_frag_mem));
+	len += tcp_v4_hash_statistics(buffer+len); 
+			   
 	if (offset >= len)
 	{
 		*start = buffer;
--- net/ipv4/tcp.c-o	Thu Oct 11 08:42:47 2001
+++ net/ipv4/tcp.c	Sat Oct 13 03:56:58 2001
@@ -2442,6 +2442,15 @@
   	return 0;
 }
 
+static unsigned tcp_ehash_order; 
+static int __init tcp_hash_setup(char *str)
+{
+	tcp_ehash_order = simple_strtol(str,NULL,0); 
+	return 1;
+} 
+
+__setup("tcpehashorder=", tcp_hash_setup); 
+
 
 extern void __skb_cb_too_small_for_tcp(int, int);
 
@@ -2486,8 +2495,12 @@
 	else
 		goal = num_physpages >> (23 - PAGE_SHIFT);
 
-	for(order = 0; (1UL << order) < goal; order++)
-		;
+	if (tcp_ehash_order) 
+		order = tcp_ehash_order;
+	else {	
+		for(order = 0; (1UL << order) < goal; order++)
+			;
+	} 		
 	do {
 		tcp_ehash_size = (1UL << order) * PAGE_SIZE /
 			sizeof(struct tcp_ehash_bucket);
--- net/ipv4/tcp_ipv4.c-o	Mon Oct  1 18:19:56 2001
+++ net/ipv4/tcp_ipv4.c	Sat Oct 13 03:41:57 2001
@@ -2162,6 +2162,62 @@
 	return len;
 }
 
+int tcp_v4_hash_statistics(char *buffer)
+{
+	int i;
+	int max_hlen = 0, hrun = 0, hcnt = 0 ;
+	char *bufs = buffer;
+
+	buffer += sprintf(buffer, "hash_buckets %d\n", tcp_ehash_size*2); 
+
+	local_bh_disable();
+	for (i = 0; i < tcp_ehash_size; i++) {
+		struct tcp_ehash_bucket *head = &tcp_ehash[i];
+		struct sock *sk;
+		struct tcp_tw_bucket *tw;
+		int len = 0; 
+
+		read_lock(&head->lock);
+		for(sk = head->chain; sk; sk = sk->next) {
+			if (!TCP_INET_FAMILY(sk->family))
+				continue;
+			++len; 
+		}
+
+		if (len > 0) { 
+			if (len > max_hlen) max_hlen = len;
+			++hcnt; 
+			hrun += len; 
+		} 
+
+		len = 0; 
+
+		for (tw = (struct tcp_tw_bucket *)tcp_ehash[i+tcp_ehash_size].chain;
+		     tw != NULL;
+		     tw = (struct tcp_tw_bucket *)tw->next) {
+			if (!TCP_INET_FAMILY(tw->family))
+				continue;
+			++len; 
+		}
+		read_unlock(&head->lock);
+
+		if (len > 0) { 
+			if (len > max_hlen) max_hlen = len;
+			++hcnt; 
+			hrun += len; 
+		} 
+	}
+
+	local_bh_enable();
+
+	buffer += sprintf(buffer, "used hash buckets: %d\n", hcnt); 
+	if (hcnt > 0) 
+		buffer += sprintf(buffer, "average length: %d\n", hrun / hcnt); 
+
+	return buffer - bufs; 
+}
+
+
 struct proto tcp_prot = {
 	name:		"TCP",
 	close:		tcp_close,
@@ -2210,3 +2266,4 @@
 	 */
 	tcp_socket->sk->prot->unhash(tcp_socket->sk);
 }
+
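
The statistics gathered by tcp_v4_hash_statistics() in the patch above
can be tried out in userspace on an array of chain lengths. The sketch
below is not part of the patch; struct chain_stats and summarize_chains
are hypothetical names. It follows the same rules as the patch: only
non-empty chains count as used, and the average is an integer division
over used chains. Note that the patch tracks max_hlen but its sprintf
calls emit only the bucket count, the used buckets and the average.

```c
#include <stddef.h>

/* Summary of a set of hash chains (hypothetical structure). */
struct chain_stats {
    int used;      /* chains with at least one entry */
    int max_len;   /* longest chain seen */
    int avg_len;   /* total entries / used chains, integer division */
};

/*
 * Summarize `n` chain lengths the way the patch does for the kernel
 * table: skip empty chains, track the maximum, and average over the
 * non-empty chains only.
 */
struct chain_stats summarize_chains(const int *lens, size_t n)
{
    struct chain_stats s = { 0, 0, 0 };
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (lens[i] <= 0)
            continue;
        s.used++;
        total += lens[i];
        if (lens[i] > s.max_len)
            s.max_len = lens[i];
    }
    if (s.used > 0)
        s.avg_len = (int)(total / s.used);
    return s;
}
```

Because the average is truncated, a table whose used chains have lengths
1, 1 and 2 reports an average of 1, so the maximum is the more telling
figure when judging whether a smaller table still distributes well.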

* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-12 23:57         ` Andi Kleen
@ 2001-10-13 15:07           ` Hugh Dickins
  2001-10-13 16:07             ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2001-10-13 15:07 UTC
  To: Andi Kleen; +Cc: Simon Kirby, linux-kernel, kuznet

On Sat, 13 Oct 2001, Andi Kleen wrote:
> 
> I attached a patch. It allows you to get some simple statistics from
> /proc/net/sockstat (unfortunately costly too). It also adds a new kernel
> boot argument, tcpehashorder=order, where order is the log2 of how many
> pages you want to use for the hash table (so it needs 2^order * 4096 bytes
> on i386). You can experiment with various sizes and check which one still
> gives a reasonable hash distribution under load.

Wouldn't something like "tcpehashbuckets" make a better boot tunable
than "tcpehashorder"?  It would be rounded up to the next power of two
before use.

I come at this from the PAGE_SIZE angle, rather than the TCP angle:
"order" tunables seem confusing to me (being interested in configurable
PAGE_SIZE).  And they're confusing to code too: note that the existing
calculation of goal from num_physpages gives you more hash buckets for
larger PAGE_SIZE (comment says "methodology is similar to that of the
buffer cache", but buffer cache gets it right - though for small memory,
would do better to multiply mempages by sizeof _before_ shifting right).

Hugh
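
Both points in this message can be illustrated with a few lines of
userspace C. This is a sketch under assumptions, not proposed kernel
code: the function names are invented and the numbers in the usage note
are only examples. The first helper rounds a requested bucket count up to
the next power of two; the other two show why multiplying before shifting
right keeps precision on small-memory machines.

```c
/*
 * Round n up to the next power of two (1 for n == 0).
 * Returns 0 if the result would not fit in an unsigned long.
 */
unsigned long roundup_pow2(unsigned long n)
{
    unsigned long p = 1;

    if (n > (~0UL >> 1) + 1)    /* larger than the top power of two */
        return 0;
    while (p < n)
        p <<= 1;
    return p;
}

/* Shift first, then scale: small inputs are truncated to zero early. */
unsigned long scale_shift_first(unsigned long pages, unsigned int shift,
                                unsigned long size)
{
    return (pages >> shift) * size;
}

/* Scale first, then shift: keeps the low bits until the end. */
unsigned long scale_multiply_first(unsigned long pages, unsigned int shift,
                                   unsigned long size)
{
    return (pages * size) >> shift;
}
```

A "tcpehashbuckets=1000" request would become roundup_pow2(1000) == 1024
buckets regardless of PAGE_SIZE. For the ordering point, with 4096 pages,
a shift of 14 and a size of 4, scale_shift_first() yields 0 while
scale_multiply_first() yields 1.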


* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-13 15:07           ` Hugh Dickins
@ 2001-10-13 16:07             ` Andi Kleen
  0 siblings, 0 replies; 11+ messages in thread
From: Andi Kleen @ 2001-10-13 16:07 UTC
  To: Hugh Dickins; +Cc: linux-kernel

In article <Pine.LNX.4.21.0110131537470.931-100000@localhost.localdomain>,
Hugh Dickins <hugh@veritas.com> writes:
> On Sat, 13 Oct 2001, Andi Kleen wrote:
>> 
>> I attached a patch. It allows you to get some simple statistics from
>> /proc/net/sockstat (unfortunately costly too). It also adds a new kernel
>> boot argument, tcpehashorder=order, where order is the log2 of how many
>> pages you want to use for the hash table (so it needs 2^order * 4096 bytes
>> on i386). You can experiment with various sizes and check which one still
>> gives a reasonable hash distribution under load.

> Wouldn't something like "tcpehashbuckets" make a better boot tunable
> than "tcpehashorder"?  It would be rounded up to the next power of two
> before use.
[...]

I just hacked something together quickly so that people can test what
impact different hash table sizes have on their workload, and for
that using the order was easiest. The goal is of course to do
automatic hash table tuning. I don't expect it to be a permanent
tunable. My hope is that it'll turn out that smaller hash tables will
be good enough.

-Andi
