* lock's trace events can improve mutex's performance in userspace?
@ 2010-03-11 9:43 Xiao Guangrong
2010-03-13 4:27 ` Hitoshi Mitake
0 siblings, 1 reply; 4+ messages in thread
From: Xiao Guangrong @ 2010-03-11 9:43 UTC (permalink / raw)
To: Ingo Molnar
Cc: Steven Rostedt, Frederic Weisbecker, Peter Zijlstra,
Hitoshi Mitake, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Ming Lei,
LKML
[-- Attachment #1: Type: text/plain, Size: 940 bytes --]
We found that if enable lock's trace events, the 'sysbench mutex'
benchmark program can run quicker.
The simple program that is attached can reproduce it, the system info,
kernel config, and the script are also attached.
The test step is below:
# tar -zxvf test-mutex.tar.bz
# cd test-mutex
# make
# ./tscript.sh >& log
# cat log | grep "real"
real 0m46.765s < all trace events are disabled >
real 0m47.073s
real 0m47.402s
real 0m46.458s
real 0m47.433s
real 0m47.395s
real 0m47.010s
real 0m47.454s
real 0m47.044s
real 0m47.464s
real 0m39.245s < enable lock's trace events >
real 0m40.822s
real 0m40.779s
real 0m40.549s
real 0m40.605s
real 0m40.923s
real 0m40.560s
real 0m41.050s
real 0m40.757s
real 0m40.715s
[ "< ... >" is my comments ]
>From the result, we can see the program's runtime is less if enable lock's
trace events.
The conclusion is weird but i don't know why.
[-- Attachment #2: test-mutex.tar.bz --]
[-- Type: application/octet-stream, Size: 23996 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: lock's trace events can improve mutex's performance in userspace? 2010-03-11 9:43 lock's trace events can improve mutex's performance in userspace? Xiao Guangrong @ 2010-03-13 4:27 ` Hitoshi Mitake 2010-04-05 9:14 ` Michel Lespinasse 0 siblings, 1 reply; 4+ messages in thread From: Hitoshi Mitake @ 2010-03-13 4:27 UTC (permalink / raw) To: Xiao Guangrong Cc: Ingo Molnar, Steven Rostedt, Frederic Weisbecker, Peter Zijlstra, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Ming Lei, LKML [-- Attachment #1: Type: text/plain, Size: 2321 bytes --] On 03/11/10 18:43, Xiao Guangrong wrote: > We found that if enable lock's trace events, the 'sysbench mutex' > benchmark program can run quicker. > > The simple program that is attached can reproduce it, the system info, > kernel config, and the script are also attached. > > The test step is below: > > # tar -zxvf test-mutex.tar.bz > # cd test-mutex > # make > # ./tscript.sh>& log > # cat log | grep "real" > real 0m46.765s< all trace events are disabled> > real 0m47.073s > real 0m47.402s > real 0m46.458s > real 0m47.433s > real 0m47.395s > real 0m47.010s > real 0m47.454s > real 0m47.044s > real 0m47.464s > real 0m39.245s< enable lock's trace events> > real 0m40.822s > real 0m40.779s > real 0m40.549s > real 0m40.605s > real 0m40.923s > real 0m40.560s > real 0m41.050s > real 0m40.757s > real 0m40.715s > > [ "< ...>" is my comments ] > > From the result, we can see the program's runtime is less if enable lock's > trace events. > > The conclusion is weird but i don't know why. Hi Xiao, It's hard to believe, but... % sudo ./tscript.sh &> log % grep real log real 0m24.132s real 0m23.535s real 0m20.064s real 0m16.636s <- enabled from here real 0m16.435s real 0m17.339s I could reproduce your surprising result. (I only execed your benchmark 3 times.) I rewrote your mainc.c and checked contended count of each test like this way, if (pthread_mutex_trylock(&mutex) == EBUSY) { pthread_mutex_lock(&mutex); atomic_inc(&contended); } # I'll attach my new mainc.c % cat log Run mutex with trace events disabled... contended:25191221 real 0m24.132s user 0m17.149s sys 1m18.933s contended:25360563 real 0m23.535s user 0m17.233s sys 1m16.213s contended:23813911 real 0m20.064s user 0m15.561s sys 1m4.332s Run mutex with lockdep events enabled... contended:11458318 real 0m16.636s user 0m10.173s sys 0m55.595s contended:11881095 real 0m16.435s user 0m10.273s sys 0m54.911s contended:11261650 real 0m17.339s user 0m10.225s sys 0m58.556s It seems that num of contention decreased to about half. I don't know why this happened and effect to performance of it, but this result is worth to consider. Thanks, Hitoshi [-- Attachment #2: mainc.c --] [-- Type: text/x-csrc, Size: 1594 bytes --] #include <string.h> #include <stdio.h> #include <stdlib.h> #include <pthread.h> #include <errno.h> #define THREADS_NUM 1000 #define MUTEX_LOCKS 300000 static pthread_mutex_t mutex; static pthread_mutex_t thread_start_mutex; static int dummy_count; typedef struct { volatile int val; } atomic_t; atomic_t contended = { 0 }; static inline int atomic_inc(atomic_t *addr) { return __sync_add_and_fetch(&addr->val, 1); } static void *run_thread(void *arg) { int i; pthread_mutex_lock(&thread_start_mutex); pthread_mutex_unlock(&thread_start_mutex); for (i = 0; i < MUTEX_LOCKS; i++) { if (pthread_mutex_trylock(&mutex) == EBUSY) { pthread_mutex_lock(&mutex); atomic_inc(&contended); } dummy_count++; pthread_mutex_unlock(&mutex); } return NULL; } int main(int argc, char *argv[]) { pthread_t thread[THREADS_NUM]; pthread_attr_t thread_attr; int i; pthread_mutex_init(&mutex, NULL); pthread_mutex_init(&thread_start_mutex, NULL); pthread_attr_init(&thread_attr); pthread_attr_setstacksize(&thread_attr, 32*1024); pthread_mutex_lock(&thread_start_mutex); for (i = 0; i < THREADS_NUM; i++) { if (pthread_create(&thread[i], &thread_attr, run_thread, NULL) != 0) { printf("Thread[%d] create failed:", i); perror(NULL); exit(-1); } } pthread_mutex_unlock(&thread_start_mutex); for (i = 0; i < THREADS_NUM; i++) { if (pthread_join(thread[i], NULL) != 0) printf("Thread #%d join failed.\n", i); } pthread_mutex_destroy(&mutex); pthread_mutex_destroy(&thread_start_mutex); printf("contended:%d\n", contended.val); return 0; } ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: lock's trace events can improve mutex's performance in userspace? 2010-03-13 4:27 ` Hitoshi Mitake @ 2010-04-05 9:14 ` Michel Lespinasse 2010-04-08 9:05 ` Xiao Guangrong 0 siblings, 1 reply; 4+ messages in thread From: Michel Lespinasse @ 2010-04-05 9:14 UTC (permalink / raw) To: Hitoshi Mitake Cc: Xiao Guangrong, Ingo Molnar, Steven Rostedt, Frederic Weisbecker, Peter Zijlstra, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Ming Lei, LKML Sorry for the late reply... One thing to consider in locking micro-benchmarks is that often, code changes that slow down parts of the contended code path where the lock is not held, will result in an increase of the reported micro-benchmark metric. This effect is particularly marked for micro-benchmarks that consist of multiple threads doing empty acquire/release loops. As a thought experiment, imagine what would happen if you added a one-millisecond sleep in the contended code path for mutex acquisition. Soon all but one of your benchmark threads would be sleeping, and the only non-sleeping thread would be able to spin on that lock/unlock loop with no contention, resulting in very nice results for the micro-benchmark. Remove the sleep and the lock/unlock threads will have to contend, resulting in lower reported performance metrics. I think what you're seeing with lockdep events is a smaller case of this - enabling lockdep slows down the contended code path, but also reduces the number of times that it is taken... On Fri, Mar 12, 2010 at 8:27 PM, Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> wrote: > On 03/11/10 18:43, Xiao Guangrong wrote: >> We found that if enable lock's trace events, the 'sysbench mutex' >> benchmark program can run quicker. >> >> The simple program that is attached can reproduce it, the system info, >> kernel config, and the script are also attached. >> >> The test step is below: >> >> # tar -zxvf test-mutex.tar.bz >> # cd test-mutex >> # make >> # ./tscript.sh>& log >> # cat log | grep "real" >> real 0m46.765s< all trace events are disabled> >> real 0m47.073s >> real 0m47.402s >> real 0m46.458s >> real 0m47.433s >> real 0m47.395s >> real 0m47.010s >> real 0m47.454s >> real 0m47.044s >> real 0m47.464s >> real 0m39.245s< enable lock's trace events> >> real 0m40.822s >> real 0m40.779s >> real 0m40.549s >> real 0m40.605s >> real 0m40.923s >> real 0m40.560s >> real 0m41.050s >> real 0m40.757s >> real 0m40.715s >> >> [ "< ...>" is my comments ] >> >> From the result, we can see the program's runtime is less if enable > lock's >> trace events. >> >> The conclusion is weird but i don't know why. > > Hi Xiao, > > It's hard to believe, but... > > % sudo ./tscript.sh &> log > % grep real log > real 0m24.132s > real 0m23.535s > real 0m20.064s > real 0m16.636s <- enabled from here > real 0m16.435s > real 0m17.339s > > I could reproduce your surprising result. > (I only execed your benchmark 3 times.) > > I rewrote your mainc.c and checked contended count of each test like > this way, > if (pthread_mutex_trylock(&mutex) == EBUSY) { > pthread_mutex_lock(&mutex); > atomic_inc(&contended); > } > # I'll attach my new mainc.c > > % cat log > Run mutex with trace events disabled... > contended:25191221 > > real 0m24.132s > user 0m17.149s > sys 1m18.933s > contended:25360563 > > real 0m23.535s > user 0m17.233s > sys 1m16.213s > contended:23813911 > > real 0m20.064s > user 0m15.561s > sys 1m4.332s > Run mutex with lockdep events enabled... > contended:11458318 > > real 0m16.636s > user 0m10.173s > sys 0m55.595s > contended:11881095 > > real 0m16.435s > user 0m10.273s > sys 0m54.911s > contended:11261650 > > real 0m17.339s > user 0m10.225s > sys 0m58.556s > > It seems that num of contention decreased to about half. > I don't know why this happened and effect to performance of it, > but this result is worth to consider. > > Thanks, > Hitoshi > -- Michel "Walken" Lespinasse A program is never fully debugged until the last user dies. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: lock's trace events can improve mutex's performance in userspace? 2010-04-05 9:14 ` Michel Lespinasse @ 2010-04-08 9:05 ` Xiao Guangrong 0 siblings, 0 replies; 4+ messages in thread From: Xiao Guangrong @ 2010-04-08 9:05 UTC (permalink / raw) To: Michel Lespinasse Cc: Hitoshi Mitake, Ingo Molnar, Steven Rostedt, Frederic Weisbecker, Peter Zijlstra, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Ming Lei, LKML Hi Michel, Michel Lespinasse wrote: > Sorry for the late reply... > > One thing to consider in locking micro-benchmarks is that often, code > changes that slow down parts of the contended code path where the lock > is not held, will result in an increase of the reported > micro-benchmark metric. This effect is particularly marked for > micro-benchmarks that consist of multiple threads doing empty > acquire/release loops. > > As a thought experiment, imagine what would happen if you added a > one-millisecond sleep in the contended code path for mutex > acquisition. Soon all but one of your benchmark threads would be > sleeping, and the only non-sleeping thread would be able to spin on > that lock/unlock loop with no contention, resulting in very nice > results for the micro-benchmark. Remove the sleep and the lock/unlock > threads will have to contend, resulting in lower reported performance > metrics. Great thanks for your valuable reply that makes we see the issue more clearly. I've do the test address your conjecture that add usleep(1) in mutex acquisition path, the test result shows contention is reduced. And i also do the test that does more work in mutex holding path, the result shows optimization ratio is decreased. Thanks, Xiao ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2010-04-08 9:08 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-03-11 9:43 lock's trace events can improve mutex's performance in userspace? Xiao Guangrong 2010-03-13 4:27 ` Hitoshi Mitake 2010-04-05 9:14 ` Michel Lespinasse 2010-04-08 9:05 ` Xiao Guangrong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox