Re: [ltt-dev] cli/sti vs local_cmpxchg and local_add_return

From: Mathieu Desnoyers <compudj@krystal.dyndns.org>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: ltt-dev@lists.casi.polymtl.ca, Ingo Molnar <mingo@elte.hu>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Josh Boyer <jwboyer@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [ltt-dev] cli/sti vs local_cmpxchg and local_add_return
Date: Wed, 18 Mar 2009 11:10:24 -0400
Message-ID: <20090318151023.GA13272@Krystal>
In-Reply-To: <200903182243.34090.nickpiggin@yahoo.com.au>

* Nick Piggin (nickpiggin@yahoo.com.au) wrote:
> On Wednesday 18 March 2009 02:14:37 Mathieu Desnoyers wrote:
> > * Nick Piggin (nickpiggin@yahoo.com.au) wrote:
> > > On Tuesday 17 March 2009 12:32:20 Mathieu Desnoyers wrote:
> > > > Hi,
> > > >
> > > > I am trying to get access to some non-x86 hardware to run some atomic
> > > > primitive benchmarks for a paper on LTTng I am preparing. That should
> > > > be useful to argue about performance benefit of per-cpu atomic
> > > > operations vs interrupt disabling. I would like to run the following
> > > > benchmark module on CONFIG_SMP :
> > > >
> > > > - PowerPC
> > > > - MIPS
> > > > - ia64
> > > > - alpha
> > > >
> > > > usage :
> > > > make
> > > > insmod test-cmpxchg-nolock.ko
> > > > insmod: error inserting 'test-cmpxchg-nolock.ko': -1 Resource temporarily unavailable
> > > > dmesg (see dmesg output)
> > > >
> > > > If some of you would be kind enough to run my test module provided
> > > > below and send me the results of these tests on a recent kernel
> > > > (2.6.26~2.6.29 should be good) along with their cpuinfo, I would
> > > > greatly appreciate it.
> > > >
> > > > Here are the CAS results for various Intel-based architectures :
> > > >
> > > > Architecture         | Speedup                     |     CAS      |         Interrupts
> > > >                      | (cli + sti) / local cmpxchg | local | sync | Enable (sti) | Disable (cli)
> > > > ---------------------+-----------------------------+-------+------+--------------+--------------
> > > > Intel Pentium 4      | 5.24                        | 25    | 81   | 70           | 61
> > > > AMD Athlon(tm)64 X2  | 4.57                        |  7    | 17   | 17           | 15
> > > > Intel Core2          | 6.33                        |  6    | 30   | 20           | 18
> > > > Intel Xeon E5405     | 5.25                        |  8    | 24   | 20           | 22
> > > >
> > > > The benefit expected on PowerPC, ia64 and alpha should principally come
> > > > from removed memory barriers in the local primitives.
> > >
> > > Benefit versus what? I think all of those architectures can do SMP
> > > atomic compare exchange sequences without barriers, can't they?
> >
> > Hi Nick,
> >
> > I want to find out whether it is faster to use SMP cas without barriers
> > to synchronize the tracing hot path wrt interrupts, or whether it is
> > faster to disable interrupts. That decision will depend on the
> > benchmark I propose, because it compares the time it takes to
> > perform both.
> >
> > Overall, the benchmarks will allow us to choose between these two
> > simplified hot path pseudo-code variants (offset is global to the
> > buffer, commit_count is per-subbuffer).
> >
> >
> > * lockless :
> >
> > do {
> >   old_offset = local_read(&offset);
> >   get_cycles();
> >   compute needed size;
> >   new_offset = old_offset + size;
> > } while (local_cmpxchg(&offset, old_offset, new_offset) != old_offset);
> >
> > /*
> >  * note : writing to buffer is done out-of-order wrt buffer slot
> >  * physical order. The reserved slot starts at old_offset.
> >  */
> > write_to_buffer(old_offset);
> >
> > /*
> >  * Make sure the data is written in the buffer before commit count is
> >  * incremented.
> >  */
> > smp_wmb();
> >
> > /* note : incrementing the commit count is also done out-of-order */
> > count = local_add_return(size, &commit_count[subbuf_index]);
> > if (count is filling a subbuffer)
> >   allow to wake up readers
> 
> Ah OK, so you just mean the benefit of using local atomics is avoiding
> the barriers that you get with atomic_t.
> 
> I'd thought you were referring to some benefit over irq disable pattern.
> 

On powerpc and mips, for instance, yes, the gain is just the removed
barriers. On x86 it becomes more interesting because we can also drop the
lock; prefix, which gives a good speedup. All I want to do here is to
figure out which of barrier-less local_t ops and interrupt disabling is
faster (and by how much) on various architectures.

For instance, on an architecture like powerpc64 (tests provided by Paul
McKenney), the difference is at most 4 cycles between irq off/irq on
(14-16 cycles, and this is without doing the data access) and
doing both local_cmpxchg and local_add_return (18 cycles). So given we
might have tracepoints called from NMI context, the tiny performance
impact we have with local_t ops is outweighed by the benefit of
having a lockless NMI-safe trace buffer management algorithm.

Thanks,

Mathieu

> 
> > * irq off :
> >
> > (note : offset and commit count would each be written to atomically
> > (type unsigned long))
> >
> > local_irq_save(flags);
> >
> > get_cycles();
> > compute needed size;
> > old_offset = offset;
> > offset += size;
> >
> > write_to_buffer(old_offset);
> >
> > /*
> >  * Make sure the data is written in the buffer before commit count is
> >  * incremented.
> >  */
> > smp_wmb();
> >
> > count = (commit_count[subbuf_index] += size);
> > if (count is filling a subbuffer)
> >   allow to wake up readers
> >
> > local_irq_restore(flags);
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
