From: Eric Dumazet <eric.dumazet@gmail.com>
To: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiri Olsa <jolsa@redhat.com>, Ingo Molnar <mingo@elte.hu>,
Eric Dumazet <eric.dumazet@gmail.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
fbl@redhat.com, nhorman@redhat.com, davem@redhat.com,
htejun@gmail.com, jarkao2@gmail.com, oleg@redhat.com,
davidel@xmailserver.org
Subject: Re: [PATCHv5 2/2] memory barrier: adding smp_mb__after_lock
Date: Tue, 07 Jul 2009 16:42:01 +0200 [thread overview]
Message-ID: <4A535EB9.2020406@gmail.com> (raw)
In-Reply-To: <20090707140135.GA5506@Krystal>
Mathieu Desnoyers a écrit :
> * Jiri Olsa (jolsa@redhat.com) wrote:
>> On Tue, Jul 07, 2009 at 12:18:16PM +0200, Jiri Olsa wrote:
>>> On Fri, Jul 03, 2009 at 01:18:48PM +0200, Jiri Olsa wrote:
>>>> On Fri, Jul 03, 2009 at 12:25:30PM +0200, Ingo Molnar wrote:
>>>>> * Jiri Olsa <jolsa@redhat.com> wrote:
>>>>>
>>>>>> On Fri, Jul 03, 2009 at 11:24:38AM +0200, Ingo Molnar wrote:
>>>>>>> * Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>>>>
>>>>>>>> Ingo Molnar a écrit :
>>>>>>>>> * Jiri Olsa <jolsa@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> +++ b/arch/x86/include/asm/spinlock.h
>>>>>>>>>> @@ -302,4 +302,7 @@ static inline void __raw_write_unlock(raw_rwlock_t *rw)
>>>>>>>>>> #define _raw_read_relax(lock) cpu_relax()
>>>>>>>>>> #define _raw_write_relax(lock) cpu_relax()
>>>>>>>>>>
>>>>>>>>>> +/* The {read|write|spin}_lock() on x86 are full memory barriers. */
>>>>>>>>>> +#define smp_mb__after_lock() do { } while (0)
>>>>>>>>> Two small stylistic comments, please make this an inline function:
>>>>>>>>>
>>>>>>>>> static inline void smp_mb__after_lock(void) { }
>>>>>>>>> #define smp_mb__after_lock
>>>>>>>>>
>>>>>>>>> (untested)
>>>>>>>>>
>>>>>>>>>> +/* The lock does not imply full memory barrier. */
>>>>>>>>>> +#ifndef smp_mb__after_lock
>>>>>>>>>> +#define smp_mb__after_lock() smp_mb()
>>>>>>>>>> +#endif
>>>>>>>>> ditto.
>>>>>>>>>
>>>>>>>>> Ingo
>>>>>>>> This was following existing implementations of various smp_mb__??? helpers :
>>>>>>>>
>>>>>>>> # grep -4 smp_mb__before_clear_bit include/asm-generic/bitops.h
>>>>>>>>
>>>>>>>> /*
>>>>>>>> * clear_bit may not imply a memory barrier
>>>>>>>> */
>>>>>>>> #ifndef smp_mb__before_clear_bit
>>>>>>>> #define smp_mb__before_clear_bit() smp_mb()
>>>>>>>> #define smp_mb__after_clear_bit() smp_mb()
>>>>>>>> #endif
>>>>>>> Did i mention that those should be fixed too? :-)
>>>>>>>
>>>>>>> Ingo
>>>>>> ok, could I include it in the 2/2 or you prefer separate patch?
>>>>> depends on whether it will regress ;-)
>>>>>
>>>>> If it regresses, it's better to have it separate. If it wont, it can
>>>>> be included. If unsure, default to the more conservative option.
>>>>>
>>>>> Ingo
>>>>
>>>> how about this..
>>>> and similar change for smp_mb__before_clear_bit in a separate patch
>>>>
>>>>
>>>> diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
>>>> index b7e5db8..4e77853 100644
>>>> --- a/arch/x86/include/asm/spinlock.h
>>>> +++ b/arch/x86/include/asm/spinlock.h
>>>> @@ -302,4 +302,8 @@ static inline void __raw_write_unlock(raw_rwlock_t *rw)
>>>> #define _raw_read_relax(lock) cpu_relax()
>>>> #define _raw_write_relax(lock) cpu_relax()
>>>>
>>>> +/* The {read|write|spin}_lock() on x86 are full memory barriers. */
>>>> +static inline void smp_mb__after_lock(void) { }
>>>> +#define ARCH_HAS_SMP_MB_AFTER_LOCK
>>>> +
>>>> #endif /* _ASM_X86_SPINLOCK_H */
>>>> diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
>>>> index 252b245..4be57ab 100644
>>>> --- a/include/linux/spinlock.h
>>>> +++ b/include/linux/spinlock.h
>>>> @@ -132,6 +132,11 @@ do { \
>>>> #endif /*__raw_spin_is_contended*/
>>>> #endif
>>>>
>>>> +/* The lock does not imply full memory barrier. */
>>>> +#ifndef ARCH_HAS_SMP_MB_AFTER_LOCK
>>>> +static inline void smp_mb__after_lock(void) { smp_mb(); }
>>>> +#endif
>>>> +
>>>> /**
>>>> * spin_unlock_wait - wait until the spinlock gets unlocked
>>>> * @lock: the spinlock in question.
>>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>>> index 4eb8409..98afcd9 100644
>>>> --- a/include/net/sock.h
>>>> +++ b/include/net/sock.h
>>>> @@ -1271,6 +1271,9 @@ static inline int sk_has_allocations(const struct sock *sk)
>>>> * in its cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1
>>>> * could then endup calling schedule and sleep forever if there are no more
>>>> * data on the socket.
>>>> + *
>>>> + * The sk_has_helper is always called right after a call to read_lock, so we
>>>> + * can use smp_mb__after_lock barrier.
>>>> */
>>>> static inline int sk_has_sleeper(struct sock *sk)
>>>> {
>>>> @@ -1280,7 +1283,7 @@ static inline int sk_has_sleeper(struct sock *sk)
>>>> *
>>>> * This memory barrier is paired in the sock_poll_wait.
>>>> */
>>>> - smp_mb();
>>>> + smp_mb__after_lock();
>>>> return sk->sk_sleep && waitqueue_active(sk->sk_sleep);
>>>> }
>>>>
>>> any feedback on this?
>>> I'd send v6 if this way is acceptable..
>>>
>>> thanks,
>>> jirka
>> also I checked the smp_mb__before_clear_bit/smp_mb__after_clear_bit and
>> it is used quite extensivelly.
>>
>> I'd prefer to send it in a separate patch, so we can move on with the
>> changes I've sent so far..
>>
>
> As with any optimization (and this is one that adds a semantic that will
> just grow the memory barrier/locking rule complexity), it should come
> with performance benchmarks showing real-life improvements.
>
> Otherwise I'd recommend sticking to smp_mb() if this execution path is
> not that critical, or to move to RCU if it's _that_ critical.
>
> A valid argument would be if the data structures protected are so
> complex that RCU is out of question but still the few cycles saved by
> removing a memory barrier are really significant. And even then, the
> proper solution would be more something like a
> __read_lock()+smp_mb+smp_mb+__read_unlock(), so we get the performance
> improvements on architectures other than x86 as well.
>
> So in all cases, I don't think the smp_mb__after_lock() is the
> appropriate solution.
RCU on this part is out of the question, as David already mentioned it.
It would be a regression for short lived tcp/udp sessions, and some workloads
use them a lot...
We gained about 20% performance between 2.6.26 and 2.6.31, carefuly removing
some atomic ops in network stack, adding RCU where it was sensible, but this
is a painful process, not something Jiri can use to fix bugs on legacy RedHat
kernels :) (We still are sorting out regressions)
To solve problem pointed by Jiri, we have to insert an smp_mb() at this point,
(not mentioning the other change in select() logic of course)
static void sock_def_readable(struct sock *sk, int len)
{
read_lock(&sk->sk_callback_lock);
+ smp_mb(); /* paired with opposite smp_mb() in sk poll logic */
if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
wake_up_interruptible_sync_poll(sk->sk_sleep, POLLIN |
POLLRDNORM | POLLRDBAND);
sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
read_unlock(&sk->sk_callback_lock);
}
As about every incoming packet calls this path, we should be very careful not
slowing down stack if not necessary.
On x86 this extra smp_mb() is not needed, since previous call to read_lock()
already gives the full barrier for free.
next prev parent reply other threads:[~2009-07-07 14:42 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-07-03 8:12 [PATCHv5 0/2] net: fix race in the receive/select Jiri Olsa
2009-07-03 8:13 ` [PATCHv5 1/2] net: adding memory barrier to the poll and receive callbacks Jiri Olsa
2009-07-07 15:56 ` Eric Dumazet
2009-07-03 8:14 ` [PATCHv5 2/2] memory barrier: adding smp_mb__after_lock Jiri Olsa
2009-07-03 9:06 ` Ingo Molnar
2009-07-03 9:20 ` Eric Dumazet
2009-07-03 9:24 ` Ingo Molnar
2009-07-03 9:56 ` Jiri Olsa
2009-07-03 10:25 ` Ingo Molnar
2009-07-03 11:18 ` Jiri Olsa
2009-07-03 11:30 ` Jarek Poplawski
2009-07-03 11:43 ` Jiri Olsa
2009-07-07 10:18 ` Jiri Olsa
2009-07-07 13:46 ` Jiri Olsa
2009-07-07 14:01 ` Mathieu Desnoyers
2009-07-07 14:34 ` Oleg Nesterov
2009-07-07 15:04 ` Mathieu Desnoyers
2009-07-07 15:44 ` Oleg Nesterov
2009-07-07 15:50 ` Peter Zijlstra
2009-07-07 19:45 ` Mathieu Desnoyers
2009-07-07 22:44 ` Eric Dumazet
2009-07-07 23:28 ` Mathieu Desnoyers
2009-07-07 23:51 ` Oleg Nesterov
2009-07-08 4:34 ` Mathieu Desnoyers
2009-07-08 7:18 ` Jarek Poplawski
2009-07-07 14:34 ` Jiri Olsa
2009-07-07 14:42 ` Eric Dumazet [this message]
2009-07-07 14:57 ` Mathieu Desnoyers
2009-07-07 15:23 ` Eric Dumazet
2009-07-08 17:47 ` Jiri Olsa
2009-07-08 18:07 ` David Miller
2009-07-08 18:16 ` Jiri Olsa
2009-07-03 14:04 ` Mathieu Desnoyers
2009-07-03 15:29 ` Herbert Xu
2009-07-03 15:37 ` Eric Dumazet
2009-07-03 15:47 ` Mathieu Desnoyers
2009-07-03 17:06 ` Paul E. McKenney
2009-07-03 17:31 ` Mathieu Desnoyers
2009-07-03 15:40 ` Mathieu Desnoyers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4A535EB9.2020406@gmail.com \
--to=eric.dumazet@gmail.com \
--cc=a.p.zijlstra@chello.nl \
--cc=davem@redhat.com \
--cc=davidel@xmailserver.org \
--cc=fbl@redhat.com \
--cc=htejun@gmail.com \
--cc=jarkao2@gmail.com \
--cc=jolsa@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@polymtl.ca \
--cc=mingo@elte.hu \
--cc=netdev@vger.kernel.org \
--cc=nhorman@redhat.com \
--cc=oleg@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).