* Really slow netstat and /proc/net/tcp in 2.4
@ 2001-10-11 18:47 Simon Kirby
2001-10-11 19:30 ` kuznet
0 siblings, 1 reply; 11+ messages in thread
From: Simon Kirby @ 2001-10-11 18:47 UTC (permalink / raw)
To: linux-kernel; +Cc: David S. Miller
Is there something that changed from 2.2 -> 2.4 with regards to the
speed of netstat and /proc/net/tcp? We have some webservers we just
upgraded from 2.2.19 to 2.4.12, and some in-house monitoring tools that
check /proc/net/tcp have begun to suck up a lot of CPU cycles trying to
read that file.
A simple cat or wc -l on the file feels about two orders of magnitude
slower ("time" reports around a second when the file has 450
entries). Some servers seem to be worse than others, and it does not
appear to be proportional to the number of entries across servers.
netstat -tn just crawls along on these servers. Should I enable
profile=1 or something to see what's happening here?
Examples:
2.2.19:
[sroot@marble:/root]# time wc -l /proc/net/tcp
858 /proc/net/tcp
0.000u 0.010s 0:00.01 100.0% 0+0k 0+0io 112pf+0w
2.4.12:
[sroot@pro:/root]# time wc -l /proc/net/tcp
463 /proc/net/tcp
0.000u 0.640s 0:00.64 100.0% 0+0k 0+0io 69pf+0w
Simon-
[ Stormix Technologies Inc. ][ NetNation Communications Inc. ]
[ sim@stormix.com ][ sim@netnation.com ]
[ Opinions expressed are not necessarily those of my employers. ]
^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-11 18:47 Really slow netstat and /proc/net/tcp in 2.4 Simon Kirby
@ 2001-10-11 19:30 ` kuznet
  2001-10-11 19:55   ` Simon Kirby
  0 siblings, 1 reply; 11+ messages in thread
From: kuznet @ 2001-10-11 19:30 UTC (permalink / raw)
To: Simon Kirby; +Cc: linux-kernel

Hello!

> Is there something that changed from 2.2 -> 2.4 with regards to the
> speed of netstat and /proc/net/tcp?

An incredibly large hash table, I think. At least here its size is
~1MB, and all of it is walked again for each 1K of data read via
/proc/ :-)

Alexey
* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-11 19:30 ` kuznet
@ 2001-10-11 19:55   ` Simon Kirby
  2001-10-12 16:44     ` kuznet
  2001-10-12 19:56     ` Andi Kleen
  0 siblings, 2 replies; 11+ messages in thread
From: Simon Kirby @ 2001-10-11 19:55 UTC (permalink / raw)
To: kuznet; +Cc: linux-kernel

On Thu, Oct 11, 2001 at 11:30:25PM +0400, kuznet@ms2.inr.ac.ru wrote:
> Hello!
>
> > Is there something that changed from 2.2 -> 2.4 with regards to the
> > speed of netstat and /proc/net/tcp?
>
> Incredibly high size of hash table, I think.
> At least here size is ~1MB. And all this is read each 1K of data read
> via /proc/ :-)

So it's walking the whole hash table per block read, and the hash table
is very large? Hmm. I notice it's a bit faster if I use
"dd if=/proc/net/tcp of=/dev/null bs=1024k", but not much.

Is it possible to fix this? Was the 2.2 hash table just that much
smaller?

Simon-
* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-11 19:55 ` Simon Kirby
@ 2001-10-12 16:44   ` kuznet
  2001-10-12 19:36     ` Simon Kirby
  2001-10-12 19:56     ` Andi Kleen
  1 sibling, 1 reply; 11+ messages in thread
From: kuznet @ 2001-10-12 16:44 UTC (permalink / raw)
To: Simon Kirby; +Cc: linux-kernel

Hello!

> Is it possible to fix this? Was the 2.2 hash table just that much
> smaller?

2.2 did not use the hash tables for this, but kept a special single
list just for /proc. If I understand correctly, it was removed because
it added more data, more work, and a new point of synchronization to
the main path while being useful only for /proc. The approach would be
justified if you had 100000 sockets; in that case both approaches are
equally slow. :-) But for 1000 sockets a hash table of 100000 entries
is sort of overscaled.

> Is it possible to fix this?

To fix --- no. To do it differently --- yes.

Well, actually, if you are interested, drop me a note and I can pack up
some of my old work on this for you. It is fully functional, but the
API is still dirty. It requires some patching of the kernel,
unfortunately.

Alexey
* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-12 16:44 ` kuznet
@ 2001-10-12 19:36   ` Simon Kirby
  2001-10-12 19:43     ` kuznet
  0 siblings, 1 reply; 11+ messages in thread
From: Simon Kirby @ 2001-10-12 19:36 UTC (permalink / raw)
To: kuznet; +Cc: linux-kernel

On Fri, Oct 12, 2001 at 08:44:58PM +0400, kuznet@ms2.inr.ac.ru wrote:
> > Is it possible to fix this? Was the 2.2 hash table just that much
> > smaller?
>
> 2.2 did not use hash tables, holding special single list for /proc.
>
> If I understand correctly it was removed because added more data/work
> and new point of synchronization for main path being useful only for
> /proc. The approach would be justified, if you had 100000 sockets. In
> this case both approaches are equally slow. :-) But for 1000 sockets
> hash table of 100000 entries is sort of overscaled.
>
> > Is it possible to fix this?
>
> To fix --- no. To make differently --- yes.
>
> Well, actually, if you are interested drop me a not I can pack for
> you some my old work on this. It is fully functional, but api is
> still dirty. It requires some patching kernel, unfortunately.

If it involves changing the TCP stack locking and putting the old list
back, it's probably not worth the bother. The only thing we're using
the file for (other than the occasional admin-run "netstat") is to
check which ports are listening on the machine without actually
attempting to connect to them. We check our services this way more
often than actually connecting and requesting a response, to reduce
log clutter and testing load on the server. Is there an easier way to
accomplish this than parsing /proc/net/tcp? We could attempt to bind
to the ports we want to check, but that would race with daemons trying
to start up.

Simon-
* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-12 19:36 ` Simon Kirby
@ 2001-10-12 19:43   ` kuznet
  0 siblings, 0 replies; 11+ messages in thread
From: kuznet @ 2001-10-12 19:43 UTC (permalink / raw)
To: Simon Kirby; +Cc: linux-kernel

Hello!

> If it involves changing the TCP stack locking stuff

No. It does not even touch the kernel, except for exporting 4 symbols
that are not currently exported.

> and testing load on the server. Is there an easier way to accomplish
> this than parsing /proc/net/tcp? We could attempt to bind to the
> ports we want to check, but that would race with daemons trying to
> start up.

"Syn-flood" the port with a single SYN using a packet socket.

Alexey
* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-11 19:55 ` Simon Kirby
  2001-10-12 16:44 ` kuznet
@ 2001-10-12 19:56 ` Andi Kleen
  2001-10-12 22:10   ` Simon Kirby
  1 sibling, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2001-10-12 19:56 UTC (permalink / raw)
To: Simon Kirby; +Cc: linux-kernel, kuznet

In article <20011011125538.C10868@netnation.com>,
Simon Kirby <sim@netnation.com> writes:

> On Thu, Oct 11, 2001 at 11:30:25PM +0400, kuznet@ms2.inr.ac.ru wrote:
>> Hello!
>>
>> > Is there something that changed from 2.2 -> 2.4 with regards to
>> > the speed of netstat and /proc/net/tcp?
>>
>> Incredibly high size of hash table, I think.
>> At least here size is ~1MB. And all this is read each 1K of data
>> read via /proc/ :-)

> So it's walking the hash table per block read, and the hash table is
> very large? Hmm. I notice it's a bit faster if I use
> dd if=/proc/net/tcp of=/dev/null bs=1024k, but not much.

> Is it possible to fix this? Was the 2.2 hash table just that much
> smaller?

The hash table is likely too big anyway, eating cache and not helping
that much. If you're interested in some testing, I can send you
patches to change it by hand and collect statistics for the average
hash queue length. Then you can figure out a good size for your
workload with some work. Longer term I think the table sizing
heuristics are far too aggressive and need to be throttled back, but
that needs more data from real servers.

-Andi
* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-12 19:56 ` Andi Kleen
@ 2001-10-12 22:10   ` Simon Kirby
  2001-10-12 23:57     ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Simon Kirby @ 2001-10-12 22:10 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, kuznet

On Fri, Oct 12, 2001 at 09:56:01PM +0200, Andi Kleen wrote:
> The hash table is likely to big anyways; eating cache and not helping
> that much. If you're interested in some testing I can send you
> patches to change it by hand and collect statistics for average hash
> queue length. Then you can figure out a good size for your workload
> with some work. Longer time I think the table sizing heuristics are
> far too aggressive and need to be throttled back; but that needs more
> data from real servers.

Wouldn't just counting the lines in /proc/net/tcp be sufficient to see
how many buckets should be used in an ideal hash table distribution
scenario? (In which case the size of the hash table depends largely on
a machine's work load...) Most of our web servers seem to have
500-1000 entries in /proc/net/tcp.

Simon-
* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-12 22:10 ` Simon Kirby
@ 2001-10-12 23:57   ` Andi Kleen
  2001-10-13 15:07     ` Hugh Dickins
  0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2001-10-12 23:57 UTC (permalink / raw)
To: Simon Kirby, Andi Kleen; +Cc: linux-kernel, kuznet

On Fri, Oct 12, 2001 at 03:10:33PM -0700, Simon Kirby wrote:
> On Fri, Oct 12, 2001 at 09:56:01PM +0200, Andi Kleen wrote:
>
> > The hash table is likely to big anyways; eating cache and not
> > helping that much. If you're interested in some testing I can send
> > you patches to change it by hand and collect statistics for average
> > hash queue length. Then you can figure out a good size for your
> > workload with some work. Longer time I think the table sizing
> > heuristics are far too aggressive and need to be throttled back;
> > but that needs more data from real servers.
>
> Wouldn't just counting the lines in /proc/net/tcp be sufficient to
> see how many buckets should be used in an ideal hash table
> distribution scenario? (In which case the size of the hash table
> depends largely on a machine's work load...)

That won't tell you the list length of individual hash buckets.
Keeping that number low on average is the goal of the big hash tables,
but I suspect the 2.4 ones are far too big, and any possible benefit
is lost to cache non-locality.

I attached a patch. It allows you to get some simple statistics from
/proc/net/sockstat (unfortunately costly too). It also adds a new
kernel boot argument, tcpehashorder=order. Order is the log2 of how
many pages you want to use for the hash table (so it needs
2^order * 4096 bytes on i386). You can experiment with various sizes
and check which one still gives a reasonable hash distribution under
load. The smallest one you can find is best.

BTW, it seems like the tables are 1/4 too big on SMP systems: the
second half, reserved for time-wait sockets, has per-bucket rwlocks
too, but they're not used. If established and time-wait were split,
this wastage could be avoided, saving some memory (but not walk time).
It would also halve the requirement for contiguous memory; useful,
e.g., if TCP/IP were ever turned into a module.

-Andi

--- net/ipv4/proc.c-o	Wed May 16 19:21:45 2001
+++ net/ipv4/proc.c	Sat Oct 13 03:37:55 2001
@@ -68,6 +68,7 @@
 {
 	/* From net/socket.c */
 	extern int socket_get_info(char *, char **, off_t, int);
+	extern int tcp_v4_hash_statistics(char *);
 
 	int len = socket_get_info(buffer,start,offset,length);
 
@@ -82,6 +83,8 @@
 		fold_prot_inuse(&raw_prot));
 	len += sprintf(buffer+len, "FRAG: inuse %d memory %d\n",
 		       ip_frag_nqueues, atomic_read(&ip_frag_mem));
+	len += tcp_v4_hash_statistics(buffer+len);
+
 	if (offset >= len) {
 		*start = buffer;
--- net/ipv4/tcp.c-o	Thu Oct 11 08:42:47 2001
+++ net/ipv4/tcp.c	Sat Oct 13 03:56:58 2001
@@ -2442,6 +2442,15 @@
 	return 0;
 }
 
+static unsigned tcp_ehash_order;
+static int __init tcp_hash_setup(char *str)
+{
+	tcp_ehash_order = simple_strtol(str, NULL, 0);
+	return 0;
+}
+
+__setup("tcpehashorder=", tcp_hash_setup);
+
 extern void __skb_cb_too_small_for_tcp(int, int);
 
@@ -2486,8 +2495,12 @@
 	else
 		goal = num_physpages >> (23 - PAGE_SHIFT);
 
-	for (order = 0; (1UL << order) < goal; order++)
-		;
+	if (tcp_ehash_order)
+		order = tcp_ehash_order;
+	else {
+		for (order = 0; (1UL << order) < goal; order++)
+			;
+	}
 
 	do {
 		tcp_ehash_size = (1UL << order) * PAGE_SIZE /
 			sizeof(struct tcp_ehash_bucket);
--- net/ipv4/tcp_ipv4.c-o	Mon Oct  1 18:19:56 2001
+++ net/ipv4/tcp_ipv4.c	Sat Oct 13 03:41:57 2001
@@ -2162,6 +2162,62 @@
 	return len;
 }
 
+int tcp_v4_hash_statistics(char *buffer)
+{
+	int i;
+	int max_hlen = 0, hrun = 0, hcnt = 0;
+	char *bufs = buffer;
+
+	buffer += sprintf(buffer, "hash_buckets %d\n", tcp_ehash_size*2);
+
+	local_bh_disable();
+	for (i = 0; i < tcp_ehash_size; i++) {
+		struct tcp_ehash_bucket *head = &tcp_ehash[i];
+		struct sock *sk;
+		struct tcp_tw_bucket *tw;
+		int len = 0;
+
+		read_lock(&head->lock);
+		for (sk = head->chain; sk; sk = sk->next) {
+			if (!TCP_INET_FAMILY(sk->family))
+				continue;
+			++len;
+		}
+
+		if (len > 0) {
+			if (len > max_hlen)
+				max_hlen = len;
+			++hcnt;
+			hrun += len;
+		}
+
+		len = 0;
+
+		for (tw = (struct tcp_tw_bucket *)tcp_ehash[i+tcp_ehash_size].chain;
+		     tw != NULL;
+		     tw = (struct tcp_tw_bucket *)tw->next) {
+			if (!TCP_INET_FAMILY(tw->family))
+				continue;
+			++len;
+		}
+		read_unlock(&head->lock);
+
+		if (len > 0) {
+			if (len > max_hlen)
+				max_hlen = len;
+			++hcnt;
+			hrun += len;
+		}
+	}
+
+	local_bh_enable();
+
+	buffer += sprintf(buffer, "used hash buckets: %d\n", hcnt);
+	if (hcnt > 0)
+		buffer += sprintf(buffer, "average length: %d\n",
+				  hrun / hcnt);
+
+	return buffer - bufs;
+}
+
 struct proto tcp_prot = {
 	name:		"TCP",
 	close:		tcp_close,
@@ -2210,3 +2266,4 @@
 	 */
 	tcp_socket->sk->prot->unhash(tcp_socket->sk);
 }
+
* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-12 23:57 ` Andi Kleen
@ 2001-10-13 15:07   ` Hugh Dickins
  2001-10-13 16:07     ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2001-10-13 15:07 UTC (permalink / raw)
To: Andi Kleen; +Cc: Simon Kirby, linux-kernel, kuznet

On Sat, 13 Oct 2001, Andi Kleen wrote:
>
> I attached a patch. It allows you to get some simple statistics from
> /proc/net/sockstat (unfortunately costly too). It also adds a new
> kernel boot argument tcpehashorder=order. Order is the log2 of how
> many pages you want to use for the hash table (so it needs
> 2^order * 4096 bytes on i386). You can experiment with various sizes
> and check which one gives still reasonable hash distribution under
> load.

Wouldn't something like "tcpehashbuckets" make a better boot tunable
than "tcpehashorder"? It could be rounded up to the next power of two
before being used.

I come at this from the PAGE_SIZE angle rather than the TCP angle:
"order" tunables seem confusing to me (I'm interested in configurable
PAGE_SIZE). And they're confusing to code, too: note that the existing
calculation of goal from num_physpages gives you more hash buckets for
larger PAGE_SIZE (the comment says "methodology is similar to that of
the buffer cache", but the buffer cache gets it right - though for
small memory it would do better to multiply mempages by sizeof
_before_ shifting right).

Hugh
* Re: Really slow netstat and /proc/net/tcp in 2.4
  2001-10-13 15:07 ` Hugh Dickins
@ 2001-10-13 16:07   ` Andi Kleen
  0 siblings, 0 replies; 11+ messages in thread
From: Andi Kleen @ 2001-10-13 16:07 UTC (permalink / raw)
To: Hugh Dickins; +Cc: linux-kernel

In article <Pine.LNX.4.21.0110131537470.931-100000@localhost.localdomain>,
Hugh Dickins <hugh@veritas.com> writes:

> Wouldn't something like "tcpehashbuckets" make a better boot tunable
> than "tcpehashorder"? Rounded up to next power of two before used.
> [...]

I just hacked something together quickly so that people can test what
impact different hash table sizes have on their workload, and for that
using the order was easiest. The goal is of course to do automatic
hash table tuning; I don't expect it to be a permanent tunable. My
hope is that it'll turn out that smaller hash tables are good enough.

-Andi