* [patch] do not readlock all buckets in /proc/net/tcp
@ 2004-07-05 11:09 Marcus Meissner
2004-07-05 11:27 ` Herbert Xu
0 siblings, 1 reply; 7+ messages in thread
From: Marcus Meissner @ 2004-07-05 11:09 UTC (permalink / raw)
To: netdev
Hi,
This patch stops /proc/net/tcp and /proc/net/tcp6 from acquiring the
readlock for every bucket.
On ppc64 and ia64 the readlocks are so expensive that reading /proc/net/tcp
takes 0.25 seconds on a typical p670 LPAR.
The walk locks 65536 buckets even though only 20 chains are in use in a
normal non-netserver setup.
Ciao, Marcus
Changelog:
Readlock only non-empty hash chains to avoid 65536 readlocks.
Signed-off-by: Marcus Meissner <meissner@suse.de>
[-- Attachment #2: tcp-proc-walk --]
[-- Type: text/plain, Size: 1255 bytes --]
--- linux-2.6.5/net/ipv4/tcp_ipv4.c.xx 2004-07-04 13:39:51.000000000 +0200
+++ linux-2.6.5/net/ipv4/tcp_ipv4.c 2004-07-04 13:51:57.000000000 +0200
@@ -2255,6 +2255,12 @@
struct hlist_node *node;
struct tcp_tw_bucket *tw;
+ /* Avoid taking the readlock cost if we know the chain is empty,
+ * we have a lot of buckets.
+ */
+ if (hlist_empty(&tcp_ehash[st->bucket].chain) &&
+ hlist_empty(&tcp_ehash[st->bucket+tcp_ehash_size].chain))
+ continue;
read_lock(&tcp_ehash[st->bucket].lock);
sk_for_each(sk, node, &tcp_ehash[st->bucket].chain) {
if (sk->sk_family != st->family) {
@@ -2301,13 +2307,17 @@
}
read_unlock(&tcp_ehash[st->bucket].lock);
st->state = TCP_SEQ_STATE_ESTABLISHED;
- if (++st->bucket < tcp_ehash_size) {
- read_lock(&tcp_ehash[st->bucket].lock);
- sk = sk_head(&tcp_ehash[st->bucket].chain);
- } else {
+
+ while ((++st->bucket < tcp_ehash_size) &&
+ hlist_empty(&tcp_ehash[st->bucket].chain) &&
+ hlist_empty(&tcp_ehash[st->bucket+tcp_ehash_size].chain))
+ /*empty*/;
+ if (st->bucket >= tcp_ehash_size) {
cur = NULL;
goto out;
}
+ read_lock(&tcp_ehash[st->bucket].lock);
+ sk = sk_head(&tcp_ehash[st->bucket].chain);
} else
sk = sk_next(sk);
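
As a rough illustration of what the patch does, here is a minimal userspace sketch; it is not kernel code. `struct bucket`, `walk()` and the lock counter are hypothetical stand-ins for tcp_ehash, the /proc walk, and read_lock(). The only detail taken from the patch is the layout assumption that the TIME_WAIT chain of bucket i sits at index i + size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one hash bucket; the real entries hold
 * an rwlock and an hlist head rather than a plain counter. */
struct bucket {
    int nsocks; /* number of sockets on this chain */
};

/*
 * Walk the first `size` buckets (the established half). The TIME_WAIT
 * chain that shares bucket i's lock is assumed to live at tab[i + size].
 * Returns how many times the per-bucket "lock" was taken and stores the
 * number of sockets visited in *seen.
 */
static size_t walk(const struct bucket *tab, size_t size,
                   bool skip_empty, size_t *seen)
{
    size_t locks = 0;

    assert(tab != NULL && seen != NULL);
    *seen = 0;

    for (size_t i = 0; i < size; i++) {
        /* Unlocked emptiness test on both chains, as in the patch. */
        if (skip_empty && tab[i].nsocks == 0 && tab[i + size].nsocks == 0)
            continue;

        locks++; /* stands in for read_lock(&tcp_ehash[i].lock) */
        *seen += (size_t)(tab[i].nsocks + tab[i + size].nsocks);
        /* read_unlock() would go here */
    }
    return locks;
}
```

With 65536 buckets and 20 occupied chains, the skipping walk would take at most 20 locks instead of 65536. The emptiness test runs without the lock, so a socket hashed just after the test can be missed by that pass; that seems acceptable here, since the dump is not an atomic snapshot across buckets anyway.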
* Re: [patch] do not readlock all buckets in /proc/net/tcp
2004-07-05 11:09 [patch] do not readlock all buckets in /proc/net/tcp Marcus Meissner
@ 2004-07-05 11:27 ` Herbert Xu
2004-07-05 11:35 ` Marcus Meissner
0 siblings, 1 reply; 7+ messages in thread
From: Herbert Xu @ 2004-07-05 11:27 UTC (permalink / raw)
To: Marcus Meissner; +Cc: netdev
Marcus Meissner <meissner@suse.de> wrote:
>
> This patch stops /proc/net/tcp and /proc/net/tcp6 from acquiring the
> readlock for every bucket.
>
> On ppc64 and ia64 the readlocks are so expensive that reading /proc/net/tcp
> takes 0.25 seconds on a typical p670 LPAR.
>
> The walk locks 65536 buckets even though only 20 chains are in use in a
> normal non-netserver setup.
Why not use NETLINK+TCP_DIAG instead? It's much faster.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [patch] do not readlock all buckets in /proc/net/tcp
2004-07-05 11:27 ` Herbert Xu
@ 2004-07-05 11:35 ` Marcus Meissner
2004-07-05 12:06 ` Herbert Xu
0 siblings, 1 reply; 7+ messages in thread
From: Marcus Meissner @ 2004-07-05 11:35 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev
On Mon, Jul 05, 2004 at 09:27:54PM +1000, Herbert Xu wrote:
> Marcus Meissner <meissner@suse.de> wrote:
> >
> > This patch stops /proc/net/tcp and /proc/net/tcp6 from acquiring the
> > readlock for every bucket.
> >
> > On ppc64 and ia64 the readlocks are so expensive that reading /proc/net/tcp
> > takes 0.25 seconds on a typical p670 LPAR.
> >
> > The walk locks 65536 buckets even though only 20 chains are in use in a
> > normal non-netserver setup.
>
> Why not use NETLINK+TCP_DIAG instead? It's much faster.
I am not sure you want to, or even can, fix all the proprietary software.
Oh, and NETLINK+TCP_DIAG seems to have the same readlock contention problem,
see tcpdiag_dump().
Ciao, Marcus
* Re: [patch] do not readlock all buckets in /proc/net/tcp
2004-07-05 11:35 ` Marcus Meissner
@ 2004-07-05 12:06 ` Herbert Xu
2004-07-05 12:25 ` YOSHIFUJI Hideaki / 吉藤英明
0 siblings, 1 reply; 7+ messages in thread
From: Herbert Xu @ 2004-07-05 12:06 UTC (permalink / raw)
To: Marcus Meissner; +Cc: netdev
On Mon, Jul 05, 2004 at 01:35:55PM +0200, Marcus Meissner wrote:
>
> Oh, and NETLINK+TCP_DIAG seems to have the same readlock contention problem,
> see tcpdiag_dump().
Then you wouldn't mind adding this optimisation for tcp_diag as well,
right? :)
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [patch] do not readlock all buckets in /proc/net/tcp
2004-07-05 12:06 ` Herbert Xu
@ 2004-07-05 12:25 ` YOSHIFUJI Hideaki / 吉藤英明
2004-07-05 12:45 ` Marcus Meissner
0 siblings, 1 reply; 7+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2004-07-05 12:25 UTC (permalink / raw)
To: herbert, meissner; +Cc: netdev, yoshfuji
In article <20040705120610.GA5728@gondor.apana.org.au> (at Mon, 5 Jul 2004 22:06:10 +1000), Herbert Xu <herbert@gondor.apana.org.au> says:
> On Mon, Jul 05, 2004 at 01:35:55PM +0200, Marcus Meissner wrote:
> >
> > Oh, and NETLINK+TCP_DIAG seems to have the same readlock contention problem,
> > see tcpdiag_dump().
>
> Then you wouldn't mind adding this optimisation for tcp_diag as well,
> right? :)
here it is. :-)
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
===== net/ipv4/tcp_diag.c 1.15 vs edited =====
--- 1.15/net/ipv4/tcp_diag.c 2004-06-08 07:27:58 +09:00
+++ edited/net/ipv4/tcp_diag.c 2004-07-05 21:18:17 +09:00
@@ -522,9 +522,13 @@
if (i > s_i)
s_num = 0;
- read_lock_bh(&head->lock);
-
num = 0;
+
+ if (hlist_empty(&head->chain) &&
+ (!(r->tcpdiag_states&TCPF_TIME_WAIT) || hlist_empty(&head->chain)))
+ continue;
+
+ read_lock_bh(&head->lock);
sk_for_each(sk, node, &head->chain) {
struct inet_opt *inet = inet_sk(sk);
--
Hideaki YOSHIFUJI @ USAGI Project <yoshfuji@linux-ipv6.org>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
* Re: [patch] do not readlock all buckets in /proc/net/tcp
2004-07-05 12:25 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2004-07-05 12:45 ` Marcus Meissner
2004-07-05 13:25 ` YOSHIFUJI Hideaki / 吉藤英明
0 siblings, 1 reply; 7+ messages in thread
From: Marcus Meissner @ 2004-07-05 12:45 UTC (permalink / raw)
To: YOSHIFUJI Hideaki / 吉藤英明; +Cc: netdev
On Mon, Jul 05, 2004 at 09:25:22PM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
> In article <20040705120610.GA5728@gondor.apana.org.au> (at Mon, 5 Jul 2004 22:06:10 +1000), Herbert Xu <herbert@gondor.apana.org.au> says:
>
> > On Mon, Jul 05, 2004 at 01:35:55PM +0200, Marcus Meissner wrote:
> > >
> > > Oh, and NETLINK+TCP_DIAG seems to have the same readlock contention problem,
> > > see tcpdiag_dump().
> >
> > Then you wouldn't mind adding this optimisation for tcp_diag as well,
> > right? :)
>
> here it is. :-)
>
> Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
>
> ===== net/ipv4/tcp_diag.c 1.15 vs edited =====
> --- 1.15/net/ipv4/tcp_diag.c 2004-06-08 07:27:58 +09:00
> +++ edited/net/ipv4/tcp_diag.c 2004-07-05 21:18:17 +09:00
> @@ -522,9 +522,13 @@
> if (i > s_i)
> s_num = 0;
>
> - read_lock_bh(&head->lock);
> -
> num = 0;
> +
> + if (hlist_empty(&head->chain) &&
> + (!(r->tcpdiag_states&TCPF_TIME_WAIT) || hlist_empty(&head->chain)))
> + continue;
The second hlist_empty() check is wrong: it tests head->chain a second time.
You should be checking &tcp_ehash[i + tcp_ehash_size].chain, that is
(head + tcp_ehash_size)->chain, I think.
Ciao, Marcus
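
To make the problem concrete, here is a small userspace sketch of the two skip conditions as plain boolean functions. `skip_v1()`, `skip_v2()`, `mismatches()` and the flag `WANT_TIME_WAIT` are hypothetical names for illustration only; `est_empty` and `tw_empty` stand for hlist_empty() on the established chain and on the TIME_WAIT chain at head + tcp_ehash_size.

```c
#include <stdbool.h>

/* Hypothetical flag standing in for TCPF_TIME_WAIT in the request mask. */
#define WANT_TIME_WAIT 0x1u

/* Condition as first posted: the established chain is tested twice,
 * so the TIME_WAIT chain never influences the result. */
static bool skip_v1(bool est_empty, bool tw_empty, unsigned states)
{
    (void)tw_empty; /* never consulted, which is the bug */
    return est_empty && (!(states & WANT_TIME_WAIT) || est_empty);
}

/* Condition as suggested in the review: the second test looks at the
 * TIME_WAIT chain instead. */
static bool skip_v2(bool est_empty, bool tw_empty, unsigned states)
{
    return est_empty && (!(states & WANT_TIME_WAIT) || tw_empty);
}

/* Count the input combinations on which the two conditions disagree. */
static int mismatches(void)
{
    int n = 0;

    for (int est = 0; est <= 1; est++)
        for (int tw = 0; tw <= 1; tw++)
            for (unsigned st = 0; st <= WANT_TIME_WAIT; st++)
                if (skip_v1(est, tw, st) != skip_v2(est, tw, st))
                    n++;
    return n;
}
```

The two versions disagree in exactly one case: the established chain is empty, TIME_WAIT sockets were requested, and the TIME_WAIT chain is not empty. There the first version skips the bucket and the TIME_WAIT sockets would be left out of the dump.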
* Re: [patch] do not readlock all buckets in /proc/net/tcp
2004-07-05 12:45 ` Marcus Meissner
@ 2004-07-05 13:25 ` YOSHIFUJI Hideaki / 吉藤英明
0 siblings, 0 replies; 7+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2004-07-05 13:25 UTC (permalink / raw)
To: meissner; +Cc: netdev, yoshfuji
In article <20040705124511.GA17193@suse.de> (at Mon, 5 Jul 2004 14:45:11 +0200), Marcus Meissner <meissner@suse.de> says:
> The second hlist_empty() check is wrong: it tests head->chain a second time.
> You should be checking &tcp_ehash[i + tcp_ehash_size].chain, that is
> (head + tcp_ehash_size)->chain, I think.
Oops...
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
===== net/ipv4/tcp_diag.c 1.15 vs edited =====
--- 1.15/net/ipv4/tcp_diag.c 2004-06-08 07:27:58 +09:00
+++ edited/net/ipv4/tcp_diag.c 2004-07-05 22:21:06 +09:00
@@ -522,9 +522,14 @@
if (i > s_i)
s_num = 0;
- read_lock_bh(&head->lock);
-
num = 0;
+
+ if (hlist_empty(&head->chain) &&
+ (!(r->tcpdiag_states&TCPF_TIME_WAIT) ||
+ hlist_empty(&(head + tcp_ehash_size)->chain)))
+ continue;
+
+ read_lock_bh(&head->lock);
sk_for_each(sk, node, &head->chain) {
struct inet_opt *inet = inet_sk(sk);
--
Hideaki YOSHIFUJI @ USAGI Project <yoshfuji@linux-ipv6.org>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA