* [PATCH] net/core/sock.c remove extra wakeup
@ 2008-05-29 16:08 Patrick Mullaney
2008-05-30 9:52 ` David Miller
0 siblings, 1 reply; 7+ messages in thread
From: Patrick Mullaney @ 2008-05-29 16:08 UTC (permalink / raw)
To: davem; +Cc: Gregory Haskins, netdev
From: Patrick Mullaney <pmullaney@novell.com>
We have noticed an extra wakeup of a task waiting on a udp receive
while testing with CONFIG_PREEMPT_RT. The following patch
eliminates the source of the extra wakeup by only waking a task
if it is waiting on the writable condition(the SOCK_NOSPACE
bit has been set). The patch is against 2.6.25.
Signed-off-by: Patrick Mullaney <pmullaney@novell.com>
---
net/core/sock.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/net/core/sock.c b/net/core/sock.c
index 7a0567b..df08fb2 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1628,12 +1628,17 @@ static void sock_def_readable(struct sock *sk, int len)
static void sock_def_write_space(struct sock *sk)
{
+ /* don't wake unless writer is waiting */
+ if(!test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
+ return;
+
read_lock(&sk->sk_callback_lock);
/* Do not wake up a writer until he can make "significant"
* progress. --DaveM
*/
if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf) {
+ clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
wake_up_interruptible_sync(sk->sk_sleep);
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] net/core/sock.c remove extra wakeup
2008-05-29 16:08 [PATCH] net/core/sock.c remove extra wakeup Patrick Mullaney
@ 2008-05-30 9:52 ` David Miller
0 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2008-05-30 9:52 UTC (permalink / raw)
To: pmullaney; +Cc: GHaskins, netdev
From: "Patrick Mullaney" <pmullaney@novell.com>
Date: Thu, 29 May 2008 10:08:43 -0600
> We have noticed an extra wakeup of a task waiting on a udp receive
> while testing with CONFIG_PREEMPT_RT. The following patch
> eliminates the source of the extra wakeup by only waking a task
> if it is waiting on the writable condition(the SOCK_NOSPACE
> bit has been set). The patch is against 2.6.25.
>
> Signed-off-by: Patrick Mullaney <pmullaney@novell.com>
This looks Ok on the surface, but I'm really worried about
races.
If you clear that bit, another thread which set that bit
and already checked the socket space, might sleep and never
wake up.
I think that is fundamentally why we don't try to optimize
this in that way.
And why is this so expensive? Once the first wakeup occurs,
the subsequent attemps will merely see that the wait queue
is empty and do nothing.
The only cost is the read lock on the callback lock, and that
is about as expensive as this new atomic operation you are
adding to clear the SOCK_NOSPACE bit.
So the benefit is not clear, and neither is the correctness of
this change.
Sorry.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net/core/sock.c remove extra wakeup
[not found] <483F99090200005A00037FFE@sinclair.provo.novell.com>
@ 2008-05-30 10:05 ` Gregory Haskins
2008-06-11 6:46 ` Herbert Xu
0 siblings, 1 reply; 7+ messages in thread
From: Gregory Haskins @ 2008-05-30 10:05 UTC (permalink / raw)
To: davem, Patrick Mullaney; +Cc: netdev
Sorry for the top post (on a phone)
Point taken that the fix needs to be analysed for races, but the problem is very real. We noticed that our udp netperf test would get two wake-ups per packet. I don't remember the exact path where the udp waits occur off the top of my head, but the fundamental problem is the overloaded use of the single wait-queue.
So basically it was waiting for a packet to arrive, and the net-rx softirq would first trigger a NOSPACE wakeup (of which was a noop by the udp-rx code) followed by an rx-wakeup. So the netperf thread woke twice for one packet. Fixing this boosted performance by a large margin (don't recall exact figures off the top of my head....somewhere around 20%)
Long stort short, its potentially worth fixing this, even if this patch isn't "it" :)
HTH
Regards
-Greg
-----Original Message-----
From: David Miller <davem@davemloft.net>
To: Patrick Mullaney <PMullaney@novell.com>
Cc: Gregory Haskins <GHaskins@novell.com>
Cc: <netdev@vger.kernel.org>
Sent: 5/30/2008 3:52:54 AM
Subject: Re: [PATCH] net/core/sock.c remove extra wakeup
From: "Patrick Mullaney" <pmullaney@novell.com>
Date: Thu, 29 May 2008 10:08:43 -0600
> We have noticed an extra wakeup of a task waiting on a udp receive
> while testing with CONFIG_PREEMPT_RT. The following patch
> eliminates the source of the extra wakeup by only waking a task
> if it is waiting on the writable condition(the SOCK_NOSPACE
> bit has been set). The patch is against 2.6.25.
>
> Signed-off-by: Patrick Mullaney <pmullaney@novell.com>
This looks Ok on the surface, but I'm really worried about
races.
If you clear that bit, another thread which set that bit
and already checked the socket space, might sleep and never
wake up.
I think that is fundamentally why we don't try to optimize
this in that way.
And why is this so expensive? Once the first wakeup occurs,
the subsequent attemps will merely see that the wait queue
is empty and do nothing.
The only cost is the read lock on the callback lock, and that
is about as expensive as this new atomic operation you are
adding to clear the SOCK_NOSPACE bit.
So the benefit is not clear, and neither is the correctness of
this change.
Sorry.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net/core/sock.c remove extra wakeup
[not found] ` <483F990C0200005A00038001@lucius.provo.novell.com>
@ 2008-05-30 17:00 ` Patrick Mullaney
2008-06-01 21:03 ` Andi Kleen
0 siblings, 1 reply; 7+ messages in thread
From: Patrick Mullaney @ 2008-05-30 17:00 UTC (permalink / raw)
To: Gregory Haskins; +Cc: davem, netdev
Not sure if Greg's post made it to the list but I would like to
build on what he said(for those who didn't see it - see below).
Point taken on the potential for a race, it was considered - I wanted to
get feedback from the list to see if it was an issue. There is a
somewhat similar looking piece of code in stream.c that clears this
bit as well in a similar spot(sk_stream_write_space). Perhaps this is
because stream apps must have a single thread per socket and this
is not necessarily the case with datagram apps?
The wakeup is expensive because it is completely unnecessary - the
thread being woke has nothing to do. If this is a high priority thread,
which is part of the reason I mentioned that we are testing on a
PREEMPT_RT enabled kernel, this will cause undue jitter and degrading
of performance(as Greg mentioned we are seeing a 20% degradation in
loaded situations). Even in the case where it is sched_other, we see
significant degradation.
Would a better approach be?
1) clear the bit when the waiter returns(bottom of sock_wait_for_wmem)
2) add a new wait-queue to the socket to separate the writers
3) another idea?
I'm having a hard time understanding the use of the NOSPACE flag in
sock.c. It seems like it currently gets set but never gets cleared. I
must be missing something.
Thanks.
Pat
On Fri, 2008-05-30 at 10:05 +0000, Gregory Haskins wrote:
> Sorry for the top post (on a phone)
>
> Point taken that the fix needs to be analysed for races, but the problem is very real.
> We noticed that our udp netperf test would get two wake-ups per packet.
> I don't remember the exact path where the udp waits occur off the top of my head,
> but the fundamental problem is the overloaded use of the single wait-queue.
>
> So basically it was waiting for a packet to arrive, and the net-rx softirq would first trigger
> a NOSPACE wakeup (of which was a noop by the udp-rx code) followed by an rx-wakeup.
> So the netperf thread woke twice for one packet. Fixing this boosted performance by a large
> margin (don't recall exact figures off the top of my head....somewhere around 20%)
>
> Long stort short, its potentially worth fixing this, even if this patch isn't "it" :)
>
> HTH
>
> Regards
> -Greg
>
> -----Original Message-----
> From: David Miller <davem@davemloft.net>
> To: Patrick Mullaney <PMullaney@novell.com>
> Cc: Gregory Haskins <GHaskins@novell.com>
> Cc: <netdev@vger.kernel.org>
>
> Sent: 5/30/2008 3:52:54 AM
> Subject: Re: [PATCH] net/core/sock.c remove extra wakeup
>
> From: "Patrick Mullaney" <pmullaney@novell.com>
> Date: Thu, 29 May 2008 10:08:43 -0600
>
> > We have noticed an extra wakeup of a task waiting on a udp receive
> > while testing with CONFIG_PREEMPT_RT. The following patch
> > eliminates the source of the extra wakeup by only waking a task
> > if it is waiting on the writable condition(the SOCK_NOSPACE
> > bit has been set). The patch is against 2.6.25.
> >
> > Signed-off-by: Patrick Mullaney <pmullaney@novell.com>
>
> This looks Ok on the surface, but I'm really worried about
> races.
>
> If you clear that bit, another thread which set that bit
> and already checked the socket space, might sleep and never
> wake up.
>
> I think that is fundamentally why we don't try to optimize
> this in that way.
>
> And why is this so expensive? Once the first wakeup occurs,
> the subsequent attemps will merely see that the wait queue
> is empty and do nothing.
>
> The only cost is the read lock on the callback lock, and that
> is about as expensive as this new atomic operation you are
> adding to clear the SOCK_NOSPACE bit.
>
> So the benefit is not clear, and neither is the correctness of
> this change.
>
> Sorry.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net/core/sock.c remove extra wakeup
2008-05-30 17:00 ` Patrick Mullaney
@ 2008-06-01 21:03 ` Andi Kleen
0 siblings, 0 replies; 7+ messages in thread
From: Andi Kleen @ 2008-06-01 21:03 UTC (permalink / raw)
To: Patrick Mullaney; +Cc: Gregory Haskins, davem, netdev
"Patrick Mullaney" <pmullaney@novell.com> writes:
>
> Would a better approach be?
>
> 1) clear the bit when the waiter returns(bottom of sock_wait_for_wmem)
> 2) add a new wait-queue to the socket to separate the writers
> 3) another idea?
A large reason this is tricky is because it is essentially lockless. So
alternative would be:
4) you add a spinlock which is always taken around the various condition
checks and bit manipulation. Not 100% sure if it would be bad to
general SMP scalability (you would need to ensure that the bouncing
lock cache line isn't a performance problem), but at least it would
be much safer and makes it much easier to verify your change is correct.
Actually the wait queue already has such a lock, so it might not be
that expensive if you switch the wait queue to a lockless version.
-Andi
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net/core/sock.c remove extra wakeup
2008-05-30 10:05 ` Gregory Haskins
@ 2008-06-11 6:46 ` Herbert Xu
2008-06-16 3:48 ` Gregory Haskins
0 siblings, 1 reply; 7+ messages in thread
From: Herbert Xu @ 2008-06-11 6:46 UTC (permalink / raw)
To: Gregory Haskins; +Cc: davem, PMullaney, netdev
Gregory Haskins <GHaskins@novell.com> wrote:
>
> So basically it was waiting for a packet to arrive, and the net-rx softirq would first trigger a NOSPACE wakeup (of which was a noop by the udp-rx code) followed by an rx-wakeup. So the netperf thread woke twice for one packet. Fixing this boosted performance by a large margin (don't recall exact figures off the top of my head....somewhere around 20%)
Please wrap your lines.
Anyway, you've lost me with this scenario. The patch is stopping
wake-ups on the write path yet you're talking about the read path.
So my question is why are you getting the extra wake-up on that
write path when you receive a packet?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net/core/sock.c remove extra wakeup
2008-06-11 6:46 ` Herbert Xu
@ 2008-06-16 3:48 ` Gregory Haskins
0 siblings, 0 replies; 7+ messages in thread
From: Gregory Haskins @ 2008-06-16 3:48 UTC (permalink / raw)
To: Herbert Xu; +Cc: davem, Patrick Mullaney, netdev
>>> On Wed, Jun 11, 2008 at 2:46 AM, in message
<E1K6K60-00061s-00@gondolin.me.apana.org.au>, Herbert Xu
<herbert@gondor.apana.org.au> wrote:
> Gregory Haskins <GHaskins@novell.com> wrote:
>>
>> So basically it was waiting for a packet to arrive, and the net-rx softirq
> would first trigger a NOSPACE wakeup (of which was a noop by the udp-rx code)
> followed by an rx-wakeup. So the netperf thread woke twice for one packet.
> Fixing this boosted performance by a large margin (don't recall exact figures
> off the top of my head....somewhere around 20%)
>
> Please wrap your lines.
Apologies. I don't think I have control over it in my corporate mailer environment :( but if I can figure out a way to do it I will make this change. You are not the first to complain ;)
>
> Anyway, you've lost me with this scenario. The patch is stopping
> wake-ups on the write path yet you're talking about the read path.
> So my question is why are you getting the extra wake-up on that
> write path when you receive a packet?
Heh..you tell me :)
I don't recall the specifics off the top of my head, but IIRC it had to do with the fact that we were running a tx/rx netperf test and the way tx-completions are handled in the stack w.r.t. the softirqs.
The tx-completions code is executing inside the softirq-net-rx context and is freeing up the transmitted skbs. The problem is that it is blindly signaling the wait-queue that there was now SPACE (even though no-one cared about that event at the moment...the only waiter was waiting on an rx event).
So one way to look at it is the problem is that the single wait-queue serves both rx-wakeup, and tx-nospace events. Rather than open the can of worms of splitting the wait-queues into finer granularity, Pat chose to at least make the overloaded use of the single wait-queue more intelligent to know if anyone cared about the NOSPACE event or not.
>From my perspective, I don't care how the issue is specifically solved...I would ultimately just like to see this wasted wake-up "go away" one way or the other. ;) It significantly degrades performance (at least in this synthetic test) and as far as I can tell there is no reason to structure the code this way. If we are mistaken in that assessment, please let me know.
If we can offer any more details (such as the specific codepath that triggers the NOSPACE wakeup, for instance) we would be happy to oblige. Long story short this is easily repeatable.
Regards,
-Greg
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-06-16 3:48 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-05-29 16:08 [PATCH] net/core/sock.c remove extra wakeup Patrick Mullaney
2008-05-30 9:52 ` David Miller
[not found] <483F99090200005A00037FFE@sinclair.provo.novell.com>
2008-05-30 10:05 ` Gregory Haskins
2008-06-11 6:46 ` Herbert Xu
2008-06-16 3:48 ` Gregory Haskins
[not found] <483F99090200005A00037FFE@lucius.provo.novell.com>
[not found] ` <483F990C0200005A00038001@lucius.provo.novell.com>
2008-05-30 17:00 ` Patrick Mullaney
2008-06-01 21:03 ` Andi Kleen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).