From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Josh Boyer <jwboyer@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, ltt-dev@lists.casi.polymtl.ca
Subject: Re: cli/sti vs local_cmpxchg and local_add_return
Date: Tue, 17 Mar 2009 12:06:35 -0400
Message-ID: <20090317160635.GE10092@Krystal>
In-Reply-To: <20090317050135.GB6859@linux.vnet.ibm.com>

* Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> On Mon, Mar 16, 2009 at 09:32:20PM -0400, Mathieu Desnoyers wrote:
> > Hi,
> > 
> > I am trying to get access to some non-x86 hardware to run some atomic
> > primitive benchmarks for a paper on LTTng I am preparing. That should be
> > useful to argue about performance benefit of per-cpu atomic operations
> > vs interrupt disabling. I would like to run the following benchmark
> > module on CONFIG_SMP :
> > 
> > - PowerPC
> > - MIPS
> > - ia64
> > - alpha
> > 
> > usage :
> > make
> > insmod test-cmpxchg-nolock.ko
> > insmod: error inserting 'test-cmpxchg-nolock.ko': -1 Resource temporarily unavailable
> > dmesg (see dmesg output)
> 
> Here you are on a 4.2GHz Power box:
> 
> test init
> test results: time for baseline
> number of loops: 20000
> total time: 12490
> -> baseline takes 0 cycles
> test end
> test results: time for locked cmpxchg
> number of loops: 20000
> total time: 345748
> -> locked cmpxchg takes 17 cycles
> test end
> test results: time for non locked cmpxchg
> number of loops: 20000
> total time: 198304
> -> non locked cmpxchg takes 9 cycles
> test end
> test results: time for locked add return
> number of loops: 20000
> total time: 253977
> -> locked add return takes 12 cycles
> test end
> test results: time for non locked add return
> number of loops: 20000
> total time: 189837
> -> non locked add return takes 9 cycles
> test end
> test results: time for enabling interrupts (STI)
> number of loops: 20000
> total time: 298390
> -> enabling interrupts (STI) takes 14 cycles
> test end
> test results: time for disabling interrupts (CLI)
> number of loops: 20000
> total time: 43977
> -> disabling interrupts (CLI) takes 2 cycles
> test end
> test results: time for disabling/enabling interrupts (STI/CLI)
> number of loops: 20000
> total time: 298773
> -> enabling/disabling interrupts (STI/CLI) takes 14 cycles
> test end

Thanks !

So on powerpc64, we have :

local_cmpxchg + local_add_return : 9 + 9 = 18 cycles
irq off/on : ~14-16 cycles (this does not include the write and
increment instructions that would do the same work as the cmpxchg and
add_return; the imprecision of the measurement is probably due to
pipeline effects).

But powerpc has non-maskable interrupts, so for a difference of less
than 4 cycles, I think it's better to stay with the local_t variant to
be NMI-safe.

Mathieu


> 
> 						Thanx, Paul
> 
> 
> > If some of you would be kind enough to run my test module provided below
> > and provide the results of these tests on a recent kernel (2.6.26~2.6.29
> > should be good) along with their cpuinfo, I would greatly appreciate it.
> > 
> > Here are the CAS results for various Intel-based architectures :
> > 
> > Architecture         | Speedup                      | CAS (cycles) | Interrupts (cycles)          |
> >                      | (cli + sti) / local cmpxchg  | local | sync | Enable (sti) | Disable (cli) |
> > ---------------------------------------------------------------------------------------------------
> > Intel Pentium 4      | 5.24                         | 25    | 81   | 70           | 61            |
> > AMD Athlon(tm)64 X2  | 4.57                         | 7     | 17   | 17           | 15            |
> > Intel Core2          | 6.33                         | 6     | 30   | 20           | 18            |
> > Intel Xeon E5405     | 5.25                         | 8     | 24   | 20           | 22            |
> > 
> > The benefit expected on PowerPC, ia64 and alpha should principally come
> > from removed memory barriers in the local primitives.
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> > P.S. please forgive the coding style and hackish interface. :)
> > 
> > 
> > /* test-cmpxchg-nolock.c
> >  *
> >  * Compare local cmpxchg with irq disable / enable.
> >  */
> > 
> > 
> > #include <linux/jiffies.h>
> > #include <linux/compiler.h>
> > #include <linux/init.h>
> > #include <linux/module.h>
> > #include <linux/math64.h>
> > #include <asm/timex.h>
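> > #include <asm/system.h>
> > #include <asm/atomic.h>	/* atomic_t, atomic_add_return() */
> > #include <asm/local.h>	/* local_t, local_add_return() */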
> > 
> > #define NR_LOOPS 20000
> > 
> > int test_val;
> > 
> > static void do_testbaseline(void)
> > {
> > 	unsigned long flags;
> > 	unsigned int i;
> > 	cycles_t time1, time2, time;
> > 	u32 rem;
> > 
> > 	local_irq_save(flags);
> > 	preempt_disable();
> > 	time1 = get_cycles();
> > 	for (i = 0; i < NR_LOOPS; i++) {
> > 		asm volatile ("");
> > 	}
> > 	time2 = get_cycles();
> > 	local_irq_restore(flags);
> > 	preempt_enable();
> > 	time = time2 - time1;
> > 
> > 	printk(KERN_ALERT "test results: time for baseline\n");
> > 	printk(KERN_ALERT "number of loops: %d\n", NR_LOOPS);
> > 	printk(KERN_ALERT "total time: %llu\n", time);
> > 	time = div_u64_rem(time, NR_LOOPS, &rem);
> > 	printk(KERN_ALERT "-> baseline takes %llu cycles\n", time);
> > 	printk(KERN_ALERT "test end\n");
> > }
> > 
> > static void do_test_sync_cmpxchg(void)
> > {
> > 	int ret;
> > 	unsigned long flags;
> > 	unsigned int i;
> > 	cycles_t time1, time2, time;
> > 	u32 rem;
> > 
> > 	local_irq_save(flags);
> > 	preempt_disable();
> > 	time1 = get_cycles();
> > 	for (i = 0; i < NR_LOOPS; i++) {
> > #ifdef CONFIG_X86_32
> > 		ret = sync_cmpxchg(&test_val, 0, 0);
> > #else
> > 		ret = cmpxchg(&test_val, 0, 0);
> > #endif
> > 	}
> > 	time2 = get_cycles();
> > 	local_irq_restore(flags);
> > 	preempt_enable();
> > 	time = time2 - time1;
> > 
> > 	printk(KERN_ALERT "test results: time for locked cmpxchg\n");
> > 	printk(KERN_ALERT "number of loops: %d\n", NR_LOOPS);
> > 	printk(KERN_ALERT "total time: %llu\n", time);
> > 	time = div_u64_rem(time, NR_LOOPS, &rem);
> > 	printk(KERN_ALERT "-> locked cmpxchg takes %llu cycles\n", time);
> > 	printk(KERN_ALERT "test end\n");
> > }
> > 
> > static void do_test_cmpxchg(void)
> > {
> > 	int ret;
> > 	unsigned long flags;
> > 	unsigned int i;
> > 	cycles_t time1, time2, time;
> > 	u32 rem;
> > 
> > 	local_irq_save(flags);
> > 	preempt_disable();
> > 	time1 = get_cycles();
> > 	for (i = 0; i < NR_LOOPS; i++) {
> > 		ret = cmpxchg_local(&test_val, 0, 0);
> > 	}
> > 	time2 = get_cycles();
> > 	local_irq_restore(flags);
> > 	preempt_enable();
> > 	time = time2 - time1;
> > 
> > 	printk(KERN_ALERT "test results: time for non locked cmpxchg\n");
> > 	printk(KERN_ALERT "number of loops: %d\n", NR_LOOPS);
> > 	printk(KERN_ALERT "total time: %llu\n", time);
> > 	time = div_u64_rem(time, NR_LOOPS, &rem);
> > 	printk(KERN_ALERT "-> non locked cmpxchg takes %llu cycles\n", time);
> > 	printk(KERN_ALERT "test end\n");
> > }
> > 
> > static void do_test_sync_inc(void)
> > {
> > 	int ret;
> > 	unsigned long flags;
> > 	unsigned int i;
> > 	cycles_t time1, time2, time;
> > 	u32 rem;
> > 	atomic_t val = ATOMIC_INIT(0);	/* was left uninitialized */
> > 
> > 	local_irq_save(flags);
> > 	preempt_disable();
> > 	time1 = get_cycles();
> > 	for (i = 0; i < NR_LOOPS; i++) {
> > 		ret = atomic_add_return(10, &val);
> > 	}
> > 	time2 = get_cycles();
> > 	local_irq_restore(flags);
> > 	preempt_enable();
> > 	time = time2 - time1;
> > 
> > 	printk(KERN_ALERT "test results: time for locked add return\n");
> > 	printk(KERN_ALERT "number of loops: %d\n", NR_LOOPS);
> > 	printk(KERN_ALERT "total time: %llu\n", time);
> > 	time = div_u64_rem(time, NR_LOOPS, &rem);
> > 	printk(KERN_ALERT "-> locked add return takes %llu cycles\n", time);
> > 	printk(KERN_ALERT "test end\n");
> > }
> > 
> > 
> > static void do_test_inc(void)
> > {
> > 	int ret;
> > 	unsigned long flags;
> > 	unsigned int i;
> > 	cycles_t time1, time2, time;
> > 	u32 rem;
> > 	local_t loc_val = LOCAL_INIT(0);	/* was left uninitialized */
> > 
> > 	local_irq_save(flags);
> > 	preempt_disable();
> > 	time1 = get_cycles();
> > 	for (i = 0; i < NR_LOOPS; i++) {
> > 		ret = local_add_return(10, &loc_val);
> > 	}
> > 	time2 = get_cycles();
> > 	local_irq_restore(flags);
> > 	preempt_enable();
> > 	time = time2 - time1;
> > 
> > 	printk(KERN_ALERT "test results: time for non locked add return\n");
> > 	printk(KERN_ALERT "number of loops: %d\n", NR_LOOPS);
> > 	printk(KERN_ALERT "total time: %llu\n", time);
> > 	time = div_u64_rem(time, NR_LOOPS, &rem);
> > 	printk(KERN_ALERT "-> non locked add return takes %llu cycles\n", time);
> > 	printk(KERN_ALERT "test end\n");
> > }
> > 
> > 
> > 
> > /*
> >  * This test will have a higher standard deviation due to incoming interrupts.
> >  */
> > static void do_test_enable_int(void)
> > {
> > 	unsigned long flags;
> > 	unsigned int i;
> > 	cycles_t time1, time2, time;
> > 	u32 rem;
> > 
> > 	local_irq_save(flags);
> > 	preempt_disable();
> > 	time1 = get_cycles();
> > 	for (i = 0; i < NR_LOOPS; i++) {
> > 		local_irq_restore(flags);
> > 	}
> > 	time2 = get_cycles();
> > 	local_irq_restore(flags);
> > 	preempt_enable();
> > 	time = time2 - time1;
> > 
> > 	printk(KERN_ALERT "test results: time for enabling interrupts (STI)\n");
> > 	printk(KERN_ALERT "number of loops: %d\n", NR_LOOPS);
> > 	printk(KERN_ALERT "total time: %llu\n", time);
> > 	time = div_u64_rem(time, NR_LOOPS, &rem);
> > 	printk(KERN_ALERT "-> enabling interrupts (STI) takes %llu cycles\n",
> > 					time);
> > 	printk(KERN_ALERT "test end\n");
> > }
> > 
> > static void do_test_disable_int(void)
> > {
> > 	unsigned long flags, flags2;
> > 	unsigned int i;
> > 	cycles_t time1, time2, time;
> > 	u32 rem;
> > 
> > 	local_irq_save(flags);
> > 	preempt_disable();
> > 	time1 = get_cycles();
> > 	for (i = 0; i < NR_LOOPS; i++) {
> > 		local_irq_save(flags2);
> > 	}
> > 	time2 = get_cycles();
> > 	local_irq_restore(flags);
> > 	preempt_enable();
> > 	time = time2 - time1;
> > 
> > 	printk(KERN_ALERT "test results: time for disabling interrupts (CLI)\n");
> > 	printk(KERN_ALERT "number of loops: %d\n", NR_LOOPS);
> > 	printk(KERN_ALERT "total time: %llu\n", time);
> > 	time = div_u64_rem(time, NR_LOOPS, &rem);
> > 	printk(KERN_ALERT "-> disabling interrupts (CLI) takes %llu cycles\n",
> > 				time);
> > 	printk(KERN_ALERT "test end\n");
> > }
> > 
> > static void do_test_int(void)
> > {
> > 	unsigned long flags;
> > 	unsigned int i;
> > 	cycles_t time1, time2, time;
> > 	u32 rem;
> > 
> > 	local_irq_save(flags);
> > 	preempt_disable();
> > 	time1 = get_cycles();
> > 	for (i = 0; i < NR_LOOPS; i++) {
> > 		local_irq_restore(flags);
> > 		local_irq_save(flags);
> > 	}
> > 	time2 = get_cycles();
> > 	local_irq_restore(flags);
> > 	preempt_enable();
> > 	time = time2 - time1;
> > 
> > 	printk(KERN_ALERT "test results: time for disabling/enabling interrupts (STI/CLI)\n");
> > 	printk(KERN_ALERT "number of loops: %d\n", NR_LOOPS);
> > 	printk(KERN_ALERT "total time: %llu\n", time);
> > 	time = div_u64_rem(time, NR_LOOPS, &rem);
> > 	printk(KERN_ALERT "-> enabling/disabling interrupts (STI/CLI) takes %llu cycles\n",
> > 					time);
> > 	printk(KERN_ALERT "test end\n");
> > }
> > 
> > 
> > 
> > static int ltt_test_init(void)
> > {
> > 	printk(KERN_ALERT "test init\n");
> > 	
> > 	do_testbaseline();
> > 	do_test_sync_cmpxchg();
> > 	do_test_cmpxchg();
> > 	do_test_sync_inc();
> > 	do_test_inc();
> > 	do_test_enable_int();
> > 	do_test_disable_int();
> > 	do_test_int();
> > 	return -EAGAIN; /* Failing init makes insmod unload the module right away */
> > }
> > 
> > static void ltt_test_exit(void)
> > {
> > 	printk(KERN_ALERT "test exit\n");
> > }
> > 
> > module_init(ltt_test_init)
> > module_exit(ltt_test_exit)
> > 
> > MODULE_LICENSE("GPL");
> > MODULE_AUTHOR("Mathieu Desnoyers");
> > MODULE_DESCRIPTION("Cmpxchg vs int Test");
> > 
> > 
> > 
> > * Makefile
> > 
> > ifneq ($(KERNELRELEASE),)
> > 	obj-m += test-cmpxchg-nolock.o
> > else
> > 	KERNELDIR ?= /lib/modules/$(shell uname -r)/build
> > 	PWD := $(shell pwd)
> > 	KERNELRELEASE = $(shell cat $(KERNELDIR)/$(KBUILD_OUTPUT)/include/linux/utsrelease.h | sed -n 's/.*UTS_RELEASE.*\"\(.*\)\".*/\1/p')
> > ifneq ($(INSTALL_MOD_PATH),)
> > 	DEPMOD_OPT := -b $(INSTALL_MOD_PATH)
> > endif
> > 
> > default:
> > 	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules
> > 
> > modules_install:
> > 	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules_install
> > 	if [ -f $(KERNELDIR)/$(KBUILD_OUTPUT)/System.map ] ; then /sbin/depmod -ae -F $(KERNELDIR)/$(KBUILD_OUTPUT)/System.map $(DEPMOD_OPT) $(KERNELRELEASE) ; fi
> > 
> > 
> > clean:
> > 	$(MAKE) -C $(KERNELDIR) M=$(PWD) clean
> > endif
> > 
> > 
> > -- 
> > Mathieu Desnoyers
> > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
