From: Andi Kleen <andi@firstfloor.org>
To: Dave Jones <davej@redhat.com>
Cc: netdev@vger.kernel.org, j.w.r.degoede@hhs.nl
Subject: Re: cat /proc/net/tcp takes 0.5 seconds on x86_64
Date: Wed, 27 Aug 2008 14:41:52 +0200
Message-ID: <87zlmyr5nz.fsf@basil.nowhere.org>
In-Reply-To: <20080826163719.GA25066@redhat.com> (Dave Jones's message of "Tue, 26 Aug 2008 12:37:19 -0400")
Dave Jones <davej@redhat.com> writes:
> Just had this bug reported against our development tree..
SUSE had an old patch for this which unfortunately got rejected
some time ago for some bogus reason.
It's so slow because the hash table walk takes a read lock for each
bucket, which is just not fast. On some architectures like POWER it is
even slower than on x86. The patch simply skips the lock for empty buckets.
I'm appending the old patch (I haven't checked whether it still applies
to a recent kernel).
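For reference, the report is easy to reproduce with a minimal standalone
reader like the one below (hypothetical code, but essentially the work a
periodic scanner does); it times one full pass over /proc/net/tcp:

/* tcpscan.c: hypothetical reproducer - time one full pass over
 * /proc/net/tcp, the same work a periodic scanning daemon does.
 * Build: cc -O2 -o tcpscan tcpscan.c   (add -lrt on older glibc)
 */
#include <stdio.h>
#include <time.h>

int main(void)
{
	char buf[4096];
	struct timespec t0, t1;
	FILE *f = fopen("/proc/net/tcp", "r");

	if (!f) {
		perror("/proc/net/tcp");
		return 1;
	}
	clock_gettime(CLOCK_MONOTONIC, &t0);
	while (fgets(buf, sizeof(buf), f))
		;	/* discard the data; the cost is the kernel's hash walk */
	clock_gettime(CLOCK_MONOTONIC, &t1);
	fclose(f);
	printf("one pass: %.3f ms\n",
	       (t1.tv_sec - t0.tv_sec) * 1e3 +
	       (t1.tv_nsec - t0.tv_nsec) / 1e6);
	return 0;
}

Running it before and after the patch should show nearly all of the
difference as system time spent in the bucket walk.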
-Andi
Skip empty hash buckets faster in /proc/net/tcp
On most systems most of the TCP established/time-wait hash buckets are empty.
When walking the hash table for /proc/net/tcp, their read locks would
always be acquired just to find out they're empty. This patch changes the code
to check first whether a bucket has any entries before taking the lock;
the check is much cheaper than the lock. Since the hash tables are large,
this makes a measurable difference when processing /proc/net/tcp,
especially on architectures with a slow read_lock (e.g. PPC).
On a 2GB Core2 system here I see a "time cat /proc/net/tcp > /dev/null"
consistently drop from 0.44s to 0.4-0.8ms of system time with this change.
This is with mostly empty hash tables.
On systems with slower atomics (like P4 or POWER4) or larger hash tables
(more RAM) the difference is much larger. This can matter because there
are some daemons around that regularly scan /proc/net/tcp.
Original idea for this patch from Marcus Meissner, but redone by me.
Cc: meissner@suse.de
Signed-off-by: Andi Kleen <ak@suse.de>
---
net/ipv4/tcp_ipv4.c | 30 ++++++++++++++++++------------
1 file changed, 18 insertions(+), 12 deletions(-)
Index: linux/net/ipv4/tcp_ipv4.c
===================================================================
--- linux.orig/net/ipv4/tcp_ipv4.c
+++ linux/net/ipv4/tcp_ipv4.c
@@ -2039,6 +2039,12 @@ static void *listening_get_idx(struct se
 	return rc;
 }
 
+static inline int empty_bucket(struct tcp_iter_state *st)
+{
+	return hlist_empty(&tcp_hashinfo.ehash[st->bucket].chain) &&
+		hlist_empty(&tcp_hashinfo.ehash[st->bucket].twchain);
+}
+
 static void *established_get_first(struct seq_file *seq)
 {
 	struct tcp_iter_state* st = seq->private;
@@ -2050,6 +2056,10 @@ static void *established_get_first(struc
 		struct inet_timewait_sock *tw;
 		rwlock_t *lock = inet_ehash_lockp(&tcp_hashinfo, st->bucket);
 
+		/* Lockless fast path for the common case of empty buckets */
+		if (empty_bucket(st))
+			continue;
+
 		read_lock_bh(lock);
 		sk_for_each(sk, node, &tcp_hashinfo.ehash[st->bucket].chain) {
 			if (sk->sk_family != st->family) {
@@ -2097,13 +2107,15 @@ get_tw:
 		read_unlock_bh(inet_ehash_lockp(&tcp_hashinfo, st->bucket));
 		st->state = TCP_SEQ_STATE_ESTABLISHED;
 
-		if (++st->bucket < tcp_hashinfo.ehash_size) {
-			read_lock_bh(inet_ehash_lockp(&tcp_hashinfo, st->bucket));
-			sk = sk_head(&tcp_hashinfo.ehash[st->bucket].chain);
-		} else {
-			cur = NULL;
-			goto out;
-		}
+		/* Look for next non empty bucket */
+		while (++st->bucket < tcp_hashinfo.ehash_size &&
+				empty_bucket(st))
+			;
+		if (st->bucket >= tcp_hashinfo.ehash_size)
+			return NULL;
+
+		read_lock_bh(inet_ehash_lockp(&tcp_hashinfo, st->bucket));
+		sk = sk_head(&tcp_hashinfo.ehash[st->bucket].chain);
 	} else
 		sk = sk_next(sk);
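The trick generalizes: when most buckets are empty, test for emptiness
with a plain lockless read and take the bucket lock only when something
appears to be there. A racy check is fine here because /proc/net/tcp was
never an atomic snapshot to begin with. A minimal userspace sketch of
the same check-then-lock pattern (hypothetical code, with pthread
rwlocks standing in for the kernel's per-bucket locks):

/* checklock.c: hypothetical sketch of the check-then-lock pattern -
 * skip buckets that look empty without touching their lock.
 */
#include <pthread.h>
#include <stddef.h>

struct node { struct node *next; int val; };

struct bucket {
	pthread_rwlock_t lock;
	struct node *head;	/* NULL when the bucket is empty */
};

/* Lockless fast path: a single pointer-sized read, no lock taken. */
int bucket_empty(const struct bucket *b)
{
	return b->head == NULL;
}

/* Visit every entry, taking each bucket's read lock only when the
 * bucket appears non-empty.
 */
void walk(struct bucket *tbl, size_t n, void (*visit)(struct node *))
{
	for (size_t i = 0; i < n; i++) {
		if (bucket_empty(&tbl[i]))
			continue;	/* common case: no lock traffic */
		pthread_rwlock_rdlock(&tbl[i].lock);
		for (struct node *p = tbl[i].head; p; p = p->next)
			visit(p);
		pthread_rwlock_unlock(&tbl[i].lock);
	}
}

If the lockless check races with an insert, the walker either takes the
lock and sees the new entry, or misses one that is being added
concurrently - no worse than the old code, which could also run just
before or just after the insert.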