From: Evgeniy Polyakov <zbr@ioremap.net>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>,
Herbert Xu <herbert@gondor.apana.org.au>,
berrange@redhat.com, et-mgmt-tools@redhat.com,
davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: virt-manager broken by bind(0) in net-next.
Date: Sat, 31 Jan 2009 01:51:14 +0300
Message-ID: <20090130225113.GA13977@ioremap.net>
In-Reply-To: <49837F7E.90306@cosmosbay.com>
On Fri, Jan 30, 2009 at 11:30:22PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> > It should contain rough number of sockets, there is no need to be very
> > precise because of this heuristic.
>
> Denying there is a bug is... well... I don't know what to say.
>
> I wonder why we still use atomic_t all over the kernel.
It is not a bug. It is not supposed to be precise. At all.
I implemented a simple heuristic for when a different bind port selection
algorithm should start: roughly when the number of opened sockets equals
some predefined value (a sysctl at the moment, but it could be 64k or
anything else), so if that number is loosely maintained and does not
precisely correspond to the number of sockets, it is not a problem.
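As a rough illustration of the heuristic described above (not the kernel
code), the counter only decides which search strategy to use, so a count that
is off by a few merely shifts the switch-over point. The names port_table,
note_bind, note_unbind and use_alternate_search are hypothetical, and the
threshold is passed in as a plain parameter rather than read from a sysctl.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the shared table; only the counter matters here. */
struct port_table {
    int bsockets; /* approximate number of bound sockets, updated without atomics */
};

/* Plain, non-atomic updates: under concurrency an update may occasionally
 * be lost, which this heuristic is meant to tolerate. */
static void note_bind(struct port_table *t)   { t->bsockets++; }
static void note_unbind(struct port_table *t) { t->bsockets--; }

/* Switch to the alternate port selection once the rough count passes the
 * threshold. A slightly stale count changes when the switch happens, not
 * whether the port that gets chosen is valid. */
static bool use_alternate_search(const struct port_table *t, int threshold)
{
    return t->bsockets > threshold;
}
```

The result of use_alternate_search() is only a hint; correctness of the bind
itself would still have to come from the conflict check done under the bucket
lock.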
You also saw the 'again' label, which has the magic number 5 - it is another
heuristic - since the lock is dropped after the bind bucket check, and we
selected it, it is possible that a non-reuse socket will be added into the
bucket, so we will have to rerun the process again. I limited this to
5 attempts only, since it is better than what we have right now (I
never saw more than 2 attempts needed in the tests), when the number of
bound sockets does not exceed 64k.
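The bounded retry described above can be sketched as follows. This is a
simplified model, not the code from inet_connection_sock.c: the recheck
callback, select_with_retries and the flaky helper are hypothetical, and the
locking is only indicated by a comment. The "--attempts >= 0" test mirrors the
one in the patch.

```c
#include <stdbool.h>

#define MAX_ATTEMPTS 5 /* the "magic 5" mentioned in the mail */

/* Hypothetical callback: true if the chosen bucket is still usable on re-check. */
typedef bool (*recheck_fn)(void *ctx);

/* Returns the number of retries that were needed, or -1 after giving up. */
static int select_with_retries(recheck_fn still_usable, void *ctx)
{
    int attempts = MAX_ATTEMPTS;
    int retries = 0;

again:
    /* ... pick a candidate bucket, drop the lock, take it again ... */
    if (still_usable(ctx))
        return retries;
    if (--attempts >= 0) { /* same bound as in the patch */
        retries++;
        goto again;
    }
    return -1; /* someone kept winning the race; report failure */
}

/* Demo helper: simulates a bucket that is taken the first N times. */
struct flaky { int failures_left; };

static bool flaky_check(void *ctx)
{
    struct flaky *f = ctx;

    if (f->failures_left > 0) {
        f->failures_left--;
        return false;
    }
    return true;
}
```

With this bound the selection is retried at most 5 times after the first
failed check, so a caller sees failure only if the candidate was invalidated 6
times in a row.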
> > I used free alignment slot so that socket structure would not be
> > increased.
>
> Are you kidding ?
>
> bsockets is not part of socket structure, but part of "struct inet_hashinfo",
Yes, I mistyped.
> shared by all cpus and accessed several thousand times per second on many
> machines.
>
> Please read the comment three lines after 'the free alignment slot'
> you chose.... You just introduced one write on a cache line
> that is supposed to *not* be written.
I have no objection to moving this somewhere toward the end of the
structure, for example after bind_bucket_cachep.
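To make the cache-line concern concrete, here is a small userspace sketch. It
is not the layout of struct inet_hashinfo and not what the patch below does;
struct demo_hashinfo, shares_line and the 64-byte line size are assumptions
for illustration only. It shows one way to keep a frequently written counter
off the cache line that holds read-mostly fields.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumed line size; the real value is CPU specific */

/* Hypothetical structure, loosely modelled on the fields named in the thread. */
struct demo_hashinfo {
    /* Read-mostly: set up once, then read on every lookup by all CPUs. */
    void *ehash;
    unsigned int ehash_size;
    void *bhash;
    unsigned int bhash_size;
    void *bind_bucket_cachep;

    /* Written on every bind: start a new cache line so that updates do
     * not invalidate the read-mostly line above on other CPUs. */
    alignas(CACHE_LINE) int bsockets;
};

/* True if two byte offsets within the structure fall on the same cache line
 * (assuming the structure itself starts on a line boundary). */
static int shares_line(size_t a, size_t b)
{
    return a / CACHE_LINE == b / CACHE_LINE;
}
```

In the kernel the corresponding annotation is ____cacheline_aligned_in_smp, as
used on listening_hash in the patch below; the C11 alignas specifier is only
the userspace equivalent used here.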
--- ./include/net/inet_hashtables.h~ 2009-01-19 22:19:11.000000000 +0300
+++ ./include/net/inet_hashtables.h 2009-01-31 01:48:21.000000000 +0300
@@ -134,7 +134,6 @@
struct inet_bind_hashbucket *bhash;
unsigned int bhash_size;
- int bsockets;
struct kmem_cache *bind_bucket_cachep;
@@ -148,6 +147,7 @@
* table where wildcard'd TCP sockets can exist. Hash function here
* is just local port number.
*/
+ int bsockets;
struct inet_listen_hashbucket listening_hash[INET_LHTABLE_SIZE]
____cacheline_aligned_in_smp;
--- ./net/ipv4/inet_connection_sock.c~ 2009-01-19 22:21:08.000000000 +0300
+++ ./net/ipv4/inet_connection_sock.c 2009-01-31 01:50:20.000000000 +0300
@@ -172,7 +172,8 @@
} else {
ret = 1;
if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb)) {
- if (sk->sk_reuse && sk->sk_state != TCP_LISTEN && --attempts >= 0) {
+ if (sk->sk_reuse && sk->sk_state != TCP_LISTEN &&
+ smallest_size != -1 && --attempts >= 0) {
spin_unlock(&head->lock);
goto again;
}
--
Evgeniy Polyakov