netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Oleg Nesterov <oleg@redhat.com>, Jiri Olsa <jolsa@redhat.com>,
	Ingo Molnar <mingo@elte.hu>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	fbl@redhat.com, nhorman@redhat.com, davem@redhat.com,
	htejun@gmail.com, jarkao2@gmail.com, davidel@xmailserver.org
Subject: Re: [PATCHv5 2/2] memory barrier: adding smp_mb__after_lock
Date: Tue, 7 Jul 2009 19:28:11 -0400	[thread overview]
Message-ID: <20090707232811.GC19217@Krystal> (raw)
In-Reply-To: <4A53CFDC.6080005@gmail.com>

* Eric Dumazet (eric.dumazet@gmail.com) wrote:
> Mathieu Desnoyers a écrit :
> > * Peter Zijlstra (a.p.zijlstra@chello.nl) wrote:
> >> On Tue, 2009-07-07 at 17:44 +0200, Oleg Nesterov wrote:
> >>> On 07/07, Mathieu Desnoyers wrote:
> >>>> Actually, thinking about it more, to appropriately support x86, as well
> >>>> as powerpc, arm and mips, we would need something like:
> >>>>
> >>>> read_lock_smp_mb()
> >>>>
> >>>> Which would be a read_lock with an included memory barrier.
> >>> Then we need read_lock_irq_smp_mb, read_lock_irqsave__smp_mb, write_lock_xxx,
> >>> otherwise it is not clear why only read_lock() has _smp_mb() version.
> >>>
> >>> The same for spin_lock_xxx...
> >> At which time the smp_mb__{before,after}_{un,}lock become attractive
> >> again.
> >>
> > 
> > Then having a new __read_lock() (without acquire semantic) which would
> > be required to be followed by a smp__mb_after_lock() would make sense. I
> > think this would fit all of x86, powerpc, arm, mips without having to
> > create tons of new primitives. Only "simpler" ones that clearly separate
> > locking from memory barriers.
> > 
> 
> Hmm... On x86, read_lock() is :
> 
> 	lock subl $0x1,(%eax)
> 	jns   .Lok
> 	call	__read_lock_failed
> .Lok:	ret
> 
> 
> What would be __read_lock() ? I cant see how it could *not* use lock prefix
> actually and or being cheaper...
> 

(I'll use read_lock_noacquire() instead of __read_lock() because
__read_lock() is already used for low-level primitives and will produce
name clashes. But I recognise that noacquire is just an ugly name.)

Here, a __read_lock_noacquire _must_ be followed by a
smp__mb_after_lock(), and a __read_unlock_norelease() _must_ be
preceded by a smp__mb_before_unlock().

x86 :

#define __read_lock_noacquire	read_lock
/* Assumes all __*_lock_noacquire primitives act as a full smp_mb() */
#define smp__mb_after_lock()

/* Assumes all __*_unlock_norelease primitives act as a full smp_mb() */
#define smp__mb_before_unlock()
#define __read_unlock_norelease	read_unlock

it's that easy :-)


however, on powerpc, we have to stop and think about it a bit more:

quoting http://www.linuxjournal.com/article/8212

"lwsync, or lightweight sync, orders loads with respect to subsequent
loads and stores, and it also orders stores. However, it does not order
stores with respect to subsequent loads. Interestingly enough, the
lwsync instruction enforces the same ordering as does the zSeries and,
coincidentally, the SPARC TSO."

static inline long __read_trylock_noacquire(raw_rwlock_t *rw)
{
        long tmp;

        __asm__ __volatile__(
"1:     lwarx           %0,0,%1\n"
        __DO_SIGN_EXTEND
"       addic.          %0,%0,1\n\
        ble-            2f\n"
        PPC405_ERR77(0,%1)
"       stwcx.          %0,0,%1\n\
        bne-            1b\n\
        /* isync\n\ Removed the isync because following smp_mb (sync
         * instruction) includes a core synchronizing barrier. */
2:"     : "=&r" (tmp)
        : "r" (&rw->lock)
        : "cr0", "xer", "memory");

        return tmp;
}

#define smp__mb_after_lock()	smp_mb()


#define smp__mb_before_unlock()	smp_mb()

static inline void __raw_read_unlock_norelease(raw_rwlock_t *rw)
{
        long tmp;

        __asm__ __volatile__(
        "# read_unlock\n\t"
        /* LWSYNC_ON_SMP -------- can be removed, replace by prior
         * smp_mb() */
"1:     lwarx           %0,0,%1\n\
        addic           %0,%0,-1\n"
        PPC405_ERR77(0,%1)
"       stwcx.          %0,0,%1\n\
        bne-            1b"
        : "=&r"(tmp)
        : "r"(&rw->lock)
        : "cr0", "xer", "memory");
}

I assume here that lwarx/stwcx pairs for different addresses cannot be
reordered with other pairs. If they can, then we already have a problem
with the current powerpc read lock implementation.

I just wrote this as an example to show how this could become a
performance improvement on architectures different than x86. The code
proposed above comes without warranty and should be tested with care. :)

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2009-07-07 23:28 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-03  8:12 [PATCHv5 0/2] net: fix race in the receive/select Jiri Olsa
2009-07-03  8:13 ` [PATCHv5 1/2] net: adding memory barrier to the poll and receive callbacks Jiri Olsa
2009-07-07 15:56   ` Eric Dumazet
2009-07-03  8:14 ` [PATCHv5 2/2] memory barrier: adding smp_mb__after_lock Jiri Olsa
2009-07-03  9:06   ` Ingo Molnar
2009-07-03  9:20     ` Eric Dumazet
2009-07-03  9:24       ` Ingo Molnar
2009-07-03  9:56         ` Jiri Olsa
2009-07-03 10:25           ` Ingo Molnar
2009-07-03 11:18             ` Jiri Olsa
2009-07-03 11:30               ` Jarek Poplawski
2009-07-03 11:43                 ` Jiri Olsa
2009-07-07 10:18               ` Jiri Olsa
2009-07-07 13:46                 ` Jiri Olsa
2009-07-07 14:01                   ` Mathieu Desnoyers
2009-07-07 14:34                     ` Oleg Nesterov
2009-07-07 15:04                       ` Mathieu Desnoyers
2009-07-07 15:44                         ` Oleg Nesterov
2009-07-07 15:50                           ` Peter Zijlstra
2009-07-07 19:45                             ` Mathieu Desnoyers
2009-07-07 22:44                               ` Eric Dumazet
2009-07-07 23:28                                 ` Mathieu Desnoyers [this message]
2009-07-07 23:51                                   ` Oleg Nesterov
2009-07-08  4:34                                     ` Mathieu Desnoyers
2009-07-08  7:18                                       ` Jarek Poplawski
2009-07-07 14:34                     ` Jiri Olsa
2009-07-07 14:42                     ` Eric Dumazet
2009-07-07 14:57                       ` Mathieu Desnoyers
2009-07-07 15:23                         ` Eric Dumazet
2009-07-08 17:47                           ` Jiri Olsa
2009-07-08 18:07                             ` David Miller
2009-07-08 18:16                               ` Jiri Olsa
2009-07-03 14:04     ` Mathieu Desnoyers
2009-07-03 15:29       ` Herbert Xu
2009-07-03 15:37         ` Eric Dumazet
2009-07-03 15:47           ` Mathieu Desnoyers
2009-07-03 17:06             ` Paul E. McKenney
2009-07-03 17:31               ` Mathieu Desnoyers
2009-07-03 15:40         ` Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090707232811.GC19217@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=a.p.zijlstra@chello.nl \
    --cc=davem@redhat.com \
    --cc=davidel@xmailserver.org \
    --cc=eric.dumazet@gmail.com \
    --cc=fbl@redhat.com \
    --cc=htejun@gmail.com \
    --cc=jarkao2@gmail.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=netdev@vger.kernel.org \
    --cc=nhorman@redhat.com \
    --cc=oleg@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).