Re: [PATCH 4/4] perf: Optimize perf_output_begin() -- weaker memory barrier

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: peterz@infradead.org, linux-arch@vger.kernel.org,
	geert@linux-m68k.org, torvalds@linux-foundation.org,
	VICTORK@il.ibm.com, oleg@redhat.com, anton@samba.org,
	benh@kernel.crashing.org, fweisbec@gmail.com,
	michael@ellerman.id.au, mikey@neuling.org,
	linux@arm.linux.org.uk, schwidefsky@de.ibm.com,
	heiko carstens <heiko.carstens@de.ibm.com>,
	tony luck <tony.luck@intel.com>
Subject: Re: [PATCH 4/4] perf: Optimize perf_output_begin() -- weaker memory barrier
Date: Thu, 7 Nov 2013 19:25:43 -0800	[thread overview]
Message-ID: <20131108032543.GB18245@linux.vnet.ibm.com> (raw)
In-Reply-To: <1435428554.62880.1383879251319.JavaMail.zimbra@efficios.com>

On Fri, Nov 08, 2013 at 02:54:11AM +0000, Mathieu Desnoyers wrote:
> ----- Original Message -----
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> > Cc: peterz@infradead.org, linux-arch@vger.kernel.org, geert@linux-m68k.org, torvalds@linux-foundation.org,
> > VICTORK@il.ibm.com, oleg@redhat.com, anton@samba.org, benh@kernel.crashing.org, fweisbec@gmail.com,
> > michael@ellerman.id.au, mikey@neuling.org, linux@arm.linux.org.uk, schwidefsky@de.ibm.com, "heiko carstens"
> > <heiko.carstens@de.ibm.com>, "tony luck" <tony.luck@intel.com>
> > Sent: Thursday, November 7, 2013 9:19:28 PM
> > Subject: Re: [PATCH 4/4] perf: Optimize perf_output_begin() -- weaker memory barrier
> > 
> > On Thu, Nov 07, 2013 at 04:16:17PM -0500, Mathieu Desnoyers wrote:
> > > * peterz@infradead.org (peterz@infradead.org) wrote:
> > > > Apply the fancy new smp_load_acquire() and smp_store_release() to
> > > > potentially avoid the full memory barrier in perf_output_begin().
> > > > 
> > > > On x86 (and other TSO like architectures) this removes all explicit
> > > > memory fences, on weakly ordered systems this often allows the use of
> > > > weaker barriers; in particular on powerpc we demote from a full sync
> > > > to a cheaper lwsync.
> > > > 
> > > > Cc: Tony Luck <tony.luck@intel.com>
> > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > > > Cc: Frederic Weisbecker <fweisbec@gmail.com>
> > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > > > Cc: Michael Ellerman <michael@ellerman.id.au>
> > > > Cc: Michael Neuling <mikey@neuling.org>
> > > > Cc: Russell King <linux@arm.linux.org.uk>
> > > > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > > > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > > > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > > > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > > > Cc: Victor Kaplansky <VICTORK@il.ibm.com>
> > > > Suggested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > > Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> > > > ---
> > > >  kernel/events/ring_buffer.c |   62
> > > >  +++++++++++++++++++++++++-------------------
> > > >  1 file changed, 36 insertions(+), 26 deletions(-)
> > > > 
> > > > --- a/kernel/events/ring_buffer.c
> > > > +++ b/kernel/events/ring_buffer.c
> > > > @@ -41,6 +41,32 @@ static void perf_output_get_handle(struc
> > > >  	handle->wakeup = local_read(&rb->wakeup);
> > > >  }
> > > >  
> > > > +/*
> > > > + * Our user visible data structure (struct perf_event_mmap_page) uses
> > > > + * u64 values for ->data_head and ->data_tail to avoid size variance
> > > > + * across 32/64 bit.
> > > > + *
> > > > + * Since you cannot mmap() a buffer larger than your memory address
> > > > space
> > > > + * we're naturally limited to unsigned long and can avoid writing the
> > > > + * high word on 32bit systems (its always 0)
> > > > + *
> > > > + * This allows us to always use a single load/store.
> > > > + */
> > > > +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> > > > +static inline unsigned long *low_word(u64 *ptr)
> > > > +{
> > > > +	return (unsigned long *)ptr;
> > > > +}
> > > > +#else /* __ORDER_BIG_ENDIAN__ */
> > > > +static inline unsigned long *low_word(u64 *ptr)
> > > > +{
> > > > +	void *_ptr = ptr;
> > > > +	_ptr += sizeof(u64);
> > > > +	_ptr -= sizeof(unsigned long);
> > > > +	return (unsigned long *)_ptr;
> > > > +}
> > > > +#endif
> > > > +
> > > >  static void perf_output_put_handle(struct perf_output_handle *handle)
> > > >  {
> > > >  	struct ring_buffer *rb = handle->rb;
> > > > @@ -61,28 +87,15 @@ static void perf_output_put_handle(struc
> > > >  	 *
> > > >  	 *   kernel				user
> > > >  	 *
> > > > -	 *   READ ->data_tail			READ ->data_head
> > > > -	 *   smp_mb()	(A)			smp_rmb()	(C)
> > > > +	 *   READ.acq ->data_tail  (A)		READ.acq ->data_head  (C)
> > > 
> > > I don't get how the barrier() in the TSO implementation of
> > > smp_load_acquire (A) orders the following writes to $data after the
> > > READ.acq of data_tail. I'm probably missing something.
> > > 
> > > Also, I don't get how the smp_load_acquire (C) with the barrier() (x86
> > > TSO) orders READ $data after the READ.acq of data_head.
> > > 
> > > I don't have the TSO model fresh in memory however.
> > 
> > TSO guarantees that earlier reads will not be reordered with later
> > writes, so only a barrier() is required.
> 
> OK, so this takes care of (A), but how about (C) ? It's about the order of two READ operations.

Same there.  The only thing that TSO does -not- order is a prior write
against a later read.

							Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> > 
> > 							Thanx, Paul
> > 
> > > >  	 *   WRITE $data			READ $data
> > > > -	 *   smp_wmb()	(B)			smp_mb()	(D)
> > > > -	 *   STORE ->data_head			WRITE ->data_tail
> > > > +	 *   STORE.rel ->data_head (B)		WRITE.rel ->data_tail (D)
> > > 
> > > You might want to choose either STORE or WRITE for consistency.
> > > 
> > > Thanks,
> > > 
> > > Mathieu
> > > 
> > > >  	 *
> > > >  	 * Where A pairs with D, and B pairs with C.
> > > >  	 *
> > > > -	 * I don't think A needs to be a full barrier because we won't in fact
> > > > -	 * write data until we see the store from userspace. So we simply don't
> > > > -	 * issue the data WRITE until we observe it. Be conservative for now.
> > > > -	 *
> > > > -	 * OTOH, D needs to be a full barrier since it separates the data READ
> > > > -	 * from the tail WRITE.
> > > > -	 *
> > > > -	 * For B a WMB is sufficient since it separates two WRITEs, and for C
> > > > -	 * an RMB is sufficient since it separates two READs.
> > > > -	 *
> > > >  	 * See perf_output_begin().
> > > >  	 */
> > > > -	smp_wmb();
> > > > -	rb->user_page->data_head = head;
> > > > +	smp_store_release(low_word(&rb->user_page->data_head), head);
> > > >  
> > > >  	/*
> > > >  	 * Now check if we missed an update -- rely on previous implied
> > > > @@ -139,7 +152,13 @@ int perf_output_begin(struct perf_output
> > > >  	perf_output_get_handle(handle);
> > > >  
> > > >  	do {
> > > > -		tail = ACCESS_ONCE(rb->user_page->data_tail);
> > > > +		tail = smp_load_acquire(low_word(&rb->user_page->data_tail));
> > > > +		/*
> > > > +		 * STORES of the data below cannot pass the ACQUIRE barrier.
> > > > +		 *
> > > > +		 * Matches with an smp_mb() or smp_store_release() in userspace
> > > > +		 * as described in perf_output_put_handle().
> > > > +		 */
> > > >  		offset = head = local_read(&rb->head);
> > > >  		if (!rb->overwrite &&
> > > >  		    unlikely(CIRC_SPACE(head, tail, perf_data_size(rb)) < size))
> > > > @@ -147,15 +166,6 @@ int perf_output_begin(struct perf_output
> > > >  		head += size;
> > > >  	} while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > >  
> > > > -	/*
> > > > -	 * Separate the userpage->tail read from the data stores below.
> > > > -	 * Matches the MB userspace SHOULD issue after reading the data
> > > > -	 * and before storing the new tail position.
> > > > -	 *
> > > > -	 * See perf_output_put_handle().
> > > > -	 */
> > > > -	smp_mb();
> > > > -
> > > >  	if (unlikely(head - local_read(&rb->wakeup) > rb->watermark))
> > > >  		local_add(rb->watermark, &rb->wakeup);
> > > >  
> > > > 
> > > > 
> > > 
> > > --
> > > Mathieu Desnoyers
> > > EfficiOS Inc.
> > > http://www.efficios.com
> > > 
> > 
> > 
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>

next prev parent reply	other threads:[~2013-11-08  3:25 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-07 22:03 [PATCH 0/4] arch: Introduce smp_load_acquire() and smp_store_release() peterz
2013-11-07 22:03 ` [PATCH 1/4] doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE peterz
2013-11-07 20:05   ` Mathieu Desnoyers
2013-11-07 22:03 ` [PATCH 2/4] arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h peterz
2013-11-07 20:22   ` Mathieu Desnoyers
2013-11-07 22:03 ` [PATCH 3/4] arch: Introduce smp_load_acquire(), smp_store_release() peterz
2013-11-07 21:03   ` Mathieu Desnoyers
2013-11-08  4:58     ` Paul Mackerras
2013-11-07 22:03 ` [PATCH 4/4] perf: Optimize perf_output_begin() -- weaker memory barrier peterz
2013-11-07 21:16   ` Mathieu Desnoyers
2013-11-08  2:19     ` Paul E. McKenney
2013-11-08  2:54       ` Mathieu Desnoyers
2013-11-08  3:13         ` Mathieu Desnoyers
2013-11-08  3:25         ` Paul E. McKenney [this message]
2013-11-08  7:21       ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131108032543.GB18245@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=VICTORK@il.ibm.com \
    --cc=anton@samba.org \
    --cc=benh@kernel.crashing.org \
    --cc=fweisbec@gmail.com \
    --cc=geert@linux-m68k.org \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux@arm.linux.org.uk \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=michael@ellerman.id.au \
    --cc=mikey@neuling.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=schwidefsky@de.ibm.com \
    --cc=tony.luck@intel.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.