* Scalability requirements for sysv ipc (was: ipc: store ipcs into IDRs)
@ 2008-03-21 9:41 Manfred Spraul
2008-03-21 12:45 ` Nadia Derbey
0 siblings, 1 reply; 27+ messages in thread
From: Manfred Spraul @ 2008-03-21 9:41 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Nadia Derbey, Andrew Morton, Paul E. McKenney
Hi all,
I noticed that sysv ipc now uses a rather unusual locking scheme: first a global
rw-semaphore, and then RCU inside that semaphore:
> linux-2.6.25-rc3:/ipc/util.c:
> struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
> {
> struct kern_ipc_perm *out;
> int lid = ipcid_to_idx(id);
>
> down_read(&ids->rw_mutex);
>
> rcu_read_lock();
> out = idr_find(&ids->ipcs_idr, lid);
ids->rw_mutex is a per-namespace (i.e. usually global) semaphore. Thus every
ipc_lock writes into a global cacheline. Everything else is based on
per-object locking; sysv sem in particular doesn't contain a single global
lock or statistics counter.
That can't be the Right Thing (tm): either there are cases where we need
the scalability (then using IDRs is impossible), or the scalability is
never needed (then the remaining RCU parts should be removed).
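To make the question concrete, here is a minimal sketch (not code from any
kernel tree) of what ipc_lock() could look like if the global rw_mutex were
dropped and the lookup relied only on RCU plus the per-object spinlock. It
assumes the IDR lookup itself would be safe under rcu_read_lock(), which is
exactly what the current idr code does not guarantee and why the down_read()
is there:

struct kern_ipc_perm *ipc_lock_rcu_only(struct ipc_ids *ids, int id)
{
	struct kern_ipc_perm *out;
	int lid = ipcid_to_idx(id);

	rcu_read_lock();
	out = idr_find(&ids->ipcs_idr, lid);
	if (out == NULL) {
		rcu_read_unlock();
		return ERR_PTR(-EINVAL);
	}

	spin_lock(&out->lock);

	/* The entry may have been removed (and queued for freeing after a
	 * grace period) between idr_find() and taking the per-object lock,
	 * so the deleted flag must be rechecked under the lock. */
	if (out->deleted) {
		spin_unlock(&out->lock);
		rcu_read_unlock();
		return ERR_PTR(-EINVAL);
	}

	/* returned with out->lock held, inside the RCU read section */
	return out;
}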
I don't have a suitable test setup, has anyone performed benchmarks
recently?
Are sysv semaphores still important, or have all apps moved to posix
semaphores/futexes?
Nadia: Do you have access to a suitable benchmark?
A microbenchmark on a single-cpu system doesn't help much (except that
2.6.25 is around a factor of 2 slower for sysv msg ping-pong between two
tasks compared to the numbers I remember from older kernels...).
--
Manfred
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: Scalability requirements for sysv ipc (was: ipc: store ipcs into IDRs) 2008-03-21 9:41 Scalability requirements for sysv ipc (was: ipc: store ipcs into IDRs) Manfred Spraul @ 2008-03-21 12:45 ` Nadia Derbey 2008-03-21 13:33 ` Scalability requirements for sysv ipc Manfred Spraul 0 siblings, 1 reply; 27+ messages in thread From: Nadia Derbey @ 2008-03-21 12:45 UTC (permalink / raw) To: Manfred Spraul; +Cc: Linux Kernel Mailing List, Andrew Morton, Paul E. McKenney Manfred Spraul wrote: > Hi all, > > I noticed that sysv ipc now uses very special locking: first a global > rw-semaphore, then within that semaphore rcu: > > linux-2.6.25-rc3:/ipc/util.c: > >> struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id) >> { >> struct kern_ipc_perm *out; >> int lid = ipcid_to_idx(id); >> >> down_read(&ids->rw_mutex); >> >> rcu_read_lock(); >> out = idr_find(&ids->ipcs_idr, lid); > > ids->rw_mutex is a per-namespace (i.e.: usually global) semaphore. Thus > ipc_lock writes into a global cacheline. Everything else is based on > per-object locking, especially sysv sem doesn't contain a single global > lock/statistic counter/... > That can't be the Right Thing (tm): Either there are cases where we need > the scalability (then using IDRs is impossible), or the scalability is > never needed (then the remaining parts from RCU should be removed). > I don't have a suitable test setup, has anyone performed benchmarks > recently? > Is sysv semaphore still important, or have all apps moved to posix > semaphores/futexes? > Nadia: Do you have access to a suitable benchmark? > > A microbenchmark on a single-cpu system doesn't help much (except that > 2.6.25 is around factor 2 slower for sysv msg ping-pong between two > tasks compared to the numbers I remember from older kernels....) > If I remember well, at that time I had used ctxbench and I wrote some other small scripts. And the results I had were around 2 or 3% slowdown, but I have to confirm that by checking in my archives. I'll also have a look at the remaining RCU critical sections in the code. Regards, Nadia ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc 2008-03-21 12:45 ` Nadia Derbey @ 2008-03-21 13:33 ` Manfred Spraul 2008-03-21 14:13 ` Paul E. McKenney 2008-03-25 16:00 ` Nadia Derbey 0 siblings, 2 replies; 27+ messages in thread From: Manfred Spraul @ 2008-03-21 13:33 UTC (permalink / raw) To: Nadia Derbey; +Cc: Linux Kernel Mailing List, Andrew Morton, Paul E. McKenney Nadia Derbey wrote: > Manfred Spraul wrote: >> >> A microbenchmark on a single-cpu system doesn't help much (except >> that 2.6.25 is around factor 2 slower for sysv msg ping-pong between >> two tasks compared to the numbers I remember from older kernels....) >> > > If I remember well, at that time I had used ctxbench and I wrote some > other small scripts. > And the results I had were around 2 or 3% slowdown, but I have to > confirm that by checking in my archives. > Do you have access to multi-core systems? The "best case" for the rcu code would be - 8 or 16 cores - one instance of ctxbench running on each core, bound to that core. I'd expect a significant slowdown. The big question is if it matters. -- Manfred ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc 2008-03-21 13:33 ` Scalability requirements for sysv ipc Manfred Spraul @ 2008-03-21 14:13 ` Paul E. McKenney 2008-03-21 16:08 ` Manfred Spraul 2008-03-25 16:00 ` Nadia Derbey 1 sibling, 1 reply; 27+ messages in thread From: Paul E. McKenney @ 2008-03-21 14:13 UTC (permalink / raw) To: Manfred Spraul; +Cc: Nadia Derbey, Linux Kernel Mailing List, Andrew Morton On Fri, Mar 21, 2008 at 02:33:24PM +0100, Manfred Spraul wrote: > Nadia Derbey wrote: > >Manfred Spraul wrote: > >> > >>A microbenchmark on a single-cpu system doesn't help much (except > >>that 2.6.25 is around factor 2 slower for sysv msg ping-pong between > >>two tasks compared to the numbers I remember from older kernels....) > > > >If I remember well, at that time I had used ctxbench and I wrote some > >other small scripts. > >And the results I had were around 2 or 3% slowdown, but I have to > >confirm that by checking in my archives. > > > Do you have access to multi-core systems? The "best case" for the rcu > code would be > - 8 or 16 cores > - one instance of ctxbench running on each core, bound to that core. > > I'd expect a significant slowdown. The big question is if it matters. I could give it a spin -- though I would need to be pointed to the patch and the test. Thanx, Paul ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc 2008-03-21 14:13 ` Paul E. McKenney @ 2008-03-21 16:08 ` Manfred Spraul 2008-03-22 5:43 ` Mike Galbraith 0 siblings, 1 reply; 27+ messages in thread From: Manfred Spraul @ 2008-03-21 16:08 UTC (permalink / raw) To: paulmck; +Cc: Nadia Derbey, Linux Kernel Mailing List, Andrew Morton Paul E. McKenney wrote: > I could give it a spin -- though I would need to be pointed to the > patch and the test. > > I'd just compare a recent kernel with something older, pre Fri Oct 19 11:53:44 2007. Then download ctxbench and run one instance on each core, bound with taskset. http://www.tmr.com/%7Epublic/source/ (I don't use ctxbench myself; if it doesn't work, I could post my own app. It would be i386 only, with RDTSCs inside.) I'll try to run it on my Pentium III/850; right now I'm still setting everything up. -- Manfred ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc 2008-03-21 16:08 ` Manfred Spraul @ 2008-03-22 5:43 ` Mike Galbraith 2008-03-22 10:10 ` Manfred Spraul 2008-03-27 22:29 ` Bill Davidsen 0 siblings, 2 replies; 27+ messages in thread From: Mike Galbraith @ 2008-03-22 5:43 UTC (permalink / raw) To: Manfred Spraul Cc: paulmck, Nadia Derbey, Linux Kernel Mailing List, Andrew Morton On Fri, 2008-03-21 at 17:08 +0100, Manfred Spraul wrote: > Paul E. McKenney wrote: > > I could give it a spin -- though I would need to be pointed to the > > patch and the test. > > > > > I'd just compare a recent kernel with something older, pre Fri Oct 19 > 11:53:44 2007 > > Then download ctxbench, run one instance on each core, bound with taskset. > http://www.tmr.com/%7Epublic/source/ > (I don't juse ctxbench myself, if it doesn't work then I could post my > own app. It would be i386 only with RDTSCs inside) (test gizmos are always welcome) Results for Q6600 box don't look particularly wonderful. taskset -c 3 ./ctx -s 2.6.24.3 3766962 itterations in 9.999845 seconds = 376734/sec 2.6.22.18-cfs-v24.1 4375920 itterations in 10.006199 seconds = 437330/sec for i in 0 1 2 3; do taskset -c $i ./ctx -s& done 2.6.22.18-cfs-v24.1 4355784 itterations in 10.005670 seconds = 435361/sec 4396033 itterations in 10.005686 seconds = 439384/sec 4390027 itterations in 10.006511 seconds = 438739/sec 4383906 itterations in 10.006834 seconds = 438128/sec 2.6.24.3 1269937 itterations in 9.999757 seconds = 127006/sec 1266723 itterations in 9.999663 seconds = 126685/sec 1267293 itterations in 9.999348 seconds = 126742/sec 1265793 itterations in 9.999766 seconds = 126592/sec -Mike ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc 2008-03-22 5:43 ` Mike Galbraith @ 2008-03-22 10:10 ` Manfred Spraul 2008-03-22 11:53 ` Mike Galbraith 2008-03-27 22:29 ` Bill Davidsen 1 sibling, 1 reply; 27+ messages in thread From: Manfred Spraul @ 2008-03-22 10:10 UTC (permalink / raw) To: Mike Galbraith Cc: paulmck, Nadia Derbey, Linux Kernel Mailing List, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 1060 bytes --] Mike Galbraith wrote: > taskset -c 3 ./ctx -s > > 2.6.24.3 > 3766962 itterations in 9.999845 seconds = 376734/sec > > 2.6.22.18-cfs-v24.1 > 4375920 itterations in 10.006199 seconds = 437330/sec > > for i in 0 1 2 3; do taskset -c $i ./ctx -s& done > > 2.6.22.18-cfs-v24.1 > 4355784 itterations in 10.005670 seconds = 435361/sec > 4396033 itterations in 10.005686 seconds = 439384/sec > 4390027 itterations in 10.006511 seconds = 438739/sec > 4383906 itterations in 10.006834 seconds = 438128/sec > > 2.6.24.3 > 1269937 itterations in 9.999757 seconds = 127006/sec > 1266723 itterations in 9.999663 seconds = 126685/sec > 1267293 itterations in 9.999348 seconds = 126742/sec > 1265793 itterations in 9.999766 seconds = 126592/sec > > Ouch - 71% slowdown with just 4 cores. Wow. Attached are my own testapps: one for sysv msg, one for sysv sem. Could you run them? Taskset is done internally, just execute $ for i in 1 2 3 4;do ./psem $i 5;./pmsg $i 5;done Only tested on uniprocessor, I hope the pthread_setaffinity works as expected.... -- Manfred [-- Attachment #2: pmsg.cpp --] [-- Type: text/plain, Size: 4665 bytes --] /* * pmsg.cpp, parallel sysv msg pingpong * * Copyright (C) 1999, 2001, 2005, 2008 by Manfred Spraul. * All rights reserved except the rights granted by the GPL. * * Redistribution of this file is permitted under the terms of the GNU * General Public License (GPL) version 2 or later. 
* $Header$ */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <getopt.h> #include <errno.h> #include <sys/types.h> #include <sys/ipc.h> #include <sys/msg.h> #include <pthread.h> ////////////////////////////////////////////////////////////////////////////// static enum { WAITING, RUNNING, STOPPED, } volatile g_state = WAITING; unsigned long long *g_results; int *g_svmsg_ids; pthread_t *g_threads; struct taskinfo { int svmsg_id; int threadid; int sender; }; #define DATASIZE 8 void* worker_thread(void *arg) { struct taskinfo *ti = (struct taskinfo*)arg; unsigned long long rounds; int ret; struct { long mtype; char buffer[DATASIZE]; } mbuf; { cpu_set_t cpus; CPU_ZERO(&cpus); CPU_SET(ti->threadid/2, &cpus); printf("ti: %d %lxh\n", ti->threadid/2, cpus.__bits[0]); ret = pthread_setaffinity_np(g_threads[ti->threadid], sizeof(cpus), &cpus); if (ret < 0) { printf("pthread_setaffinity_np failed for thread %d with errno %d.\n", ti->threadid, errno); } ret = pthread_getaffinity_np(g_threads[ti->threadid], sizeof(cpus), &cpus); if (ret < 0) { printf("pthread_getaffinity_np() failed for thread %d with errno %d.\n", ti->threadid, errno); fflush(stdout); } else { printf("thread %d: sysvmsg %8d type %d bound to %lxh\n",ti->threadid, ti->svmsg_id, ti->sender, cpus.__bits[0]); } fflush(stdout); } rounds = 0; while(g_state == WAITING) { #ifdef __i386__ __asm__ __volatile__("pause": : :"memory"); #endif } if (ti->sender) { mbuf.mtype = ti->sender+1; ret = msgsnd(ti->svmsg_id, &mbuf, DATASIZE, 0); if (ret != 0) { printf("Initial send failed, errno %d.\n", errno); exit(1); } } while(g_state == RUNNING) { int target = 1+!ti->sender; ret = msgrcv(ti->svmsg_id, &mbuf, DATASIZE, target, 0); if (ret != DATASIZE) { if (errno == EIDRM) break; printf("Error on msgrcv, got %d, errno %d.\n", ret, errno); exit(1); } mbuf.mtype = ti->sender+1; ret = msgsnd(ti->svmsg_id, &mbuf, DATASIZE, 0); if (ret != 0) { if (errno == EIDRM) break; printf("send failed, errno %d.\n", errno); exit(1); } rounds++; } /* store result */ g_results[ti->threadid] = rounds; pthread_exit(0); return NULL; } void init_thread(int thread1, int thread2) { int ret; struct taskinfo *ti1, *ti2; ti1 = new (struct taskinfo); ti2 = new (struct taskinfo); if (!ti1 || !ti2) { printf("Could not allocate task info\n"); exit(1); } g_svmsg_ids[thread1] = msgget(IPC_PRIVATE,0777|IPC_CREAT); if(g_svmsg_ids[thread1] == -1) { printf(" message queue create failed.\n"); exit(1); } ti1->svmsg_id = g_svmsg_ids[thread1]; ti2->svmsg_id = ti1->svmsg_id; ti1->threadid = thread1; ti2->threadid = thread2; ti1->sender = 1; ti2->sender = 0; ret = pthread_create(&g_threads[thread1], NULL, worker_thread, ti1); if (ret) { printf(" pthread_create failed with error code %d\n", ret); exit(1); } ret = pthread_create(&g_threads[thread2], NULL, worker_thread, ti2); if (ret) { printf(" pthread_create failed with error code %d\n", ret); exit(1); } } ////////////////////////////////////////////////////////////////////////////// int main(int argc, char **argv) { int queues, timeout; unsigned long long totals; int i; printf("pmsg [nr queues] [timeout]\n"); if (argc != 3) { printf(" Invalid parameters.\n"); return 0; } queues = atoi(argv[1]); timeout = atoi(argv[2]); printf("Using %d queues (%d threads) for %d seconds.\n", queues, 2*queues, timeout); g_results = new unsigned long long[2*queues]; g_svmsg_ids = new int[queues]; g_threads = new pthread_t[2*queues]; for (i=0;i<queues;i++) { g_results[i] = 0; g_results[i+queues] = 0; init_thread(i, i+queues); } 
sleep(1); g_state = RUNNING; sleep(timeout); g_state = STOPPED; sleep(1); for (i=0;i<queues;i++) { int res; res = msgctl(g_svmsg_ids[i],IPC_RMID,NULL); if (res < 0) { printf("msgctl(IPC_RMID) failed for %d, errno%d.\n", g_svmsg_ids[i], errno); } } for (i=0;i<2*queues;i++) pthread_join(g_threads[i], NULL); printf("Result matrix:\n"); totals = 0; for (i=0;i<queues;i++) { printf(" Thread %3d: %8lld %3d: %8lld\n", i, g_results[i], i+queues, g_results[i+queues]); totals += g_results[i] + g_results[i+queues]; } printf("Total: %lld\n", totals); } [-- Attachment #3: psem.cpp --] [-- Type: text/plain, Size: 4840 bytes --] /* * psem.cpp, parallel sysv sem pingpong * * Copyright (C) 1999, 2001, 2005, 2008 by Manfred Spraul. * All rights reserved except the rights granted by the GPL. * * Redistribution of this file is permitted under the terms of the GNU * General Public License (GPL) version 2 or later. * $Header$ */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <getopt.h> #include <errno.h> #include <sys/types.h> #include <sys/ipc.h> #include <sys/sem.h> #include <pthread.h> ////////////////////////////////////////////////////////////////////////////// static enum { WAITING, RUNNING, STOPPED, } volatile g_state = WAITING; unsigned long long *g_results; int *g_svsem_ids; pthread_t *g_threads; struct taskinfo { int svsem_id; int threadid; int sender; }; #define DATASIZE 8 void* worker_thread(void *arg) { struct taskinfo *ti = (struct taskinfo*)arg; unsigned long long rounds; int ret; { cpu_set_t cpus; CPU_ZERO(&cpus); CPU_SET(ti->threadid/2, &cpus); printf("ti: %d %lxh\n", ti->threadid/2, cpus.__bits[0]); ret = pthread_setaffinity_np(g_threads[ti->threadid], sizeof(cpus), &cpus); if (ret < 0) { printf("pthread_setaffinity_np failed for thread %d with errno %d.\n", ti->threadid, errno); } ret = pthread_getaffinity_np(g_threads[ti->threadid], sizeof(cpus), &cpus); if (ret < 0) { printf("pthread_getaffinity_np() failed for thread %d with errno %d.\n", ti->threadid, errno); fflush(stdout); } else { printf("thread %d: sysvsem %8d type %d bound to %lxh\n",ti->threadid, ti->svsem_id, ti->sender, cpus.__bits[0]); } fflush(stdout); } rounds = 0; while(g_state == WAITING) { #ifdef __i386__ __asm__ __volatile__("pause": : :"memory"); #endif } if (ti->sender) { struct sembuf sop[1]; int res; /* 1) insert token */ sop[0].sem_num=0; sop[0].sem_op=1; sop[0].sem_flg=0; res = semop(ti->svsem_id,sop,1); if (ret != 0) { printf("Initial semop failed, errno %d.\n", errno); exit(1); } } while(g_state == RUNNING) { struct sembuf sop[1]; int res; /* 1) retrieve token */ sop[0].sem_num=ti->sender; sop[0].sem_op=-1; sop[0].sem_flg=0; res = semop(ti->svsem_id,sop,1); if (ret != 0) { /* EIDRM can happen */ if (errno == EIDRM) break; printf("main semop failed, errno %d.\n", errno); exit(1); } /* 2) reinsert token */ sop[0].sem_num=1-ti->sender; sop[0].sem_op=1; sop[0].sem_flg=0; res = semop(ti->svsem_id,sop,1); if (ret != 0) { /* EIDRM can happen */ if (errno == EIDRM) break; printf("main semop failed, errno %d.\n", errno); exit(1); } rounds++; } g_results[ti->threadid] = rounds; pthread_exit(0); return NULL; } void init_thread(int thread1, int thread2) { int ret; struct taskinfo *ti1, *ti2; ti1 = new (struct taskinfo); ti2 = new (struct taskinfo); if (!ti1 || !ti2) { printf("Could not allocate task info\n"); exit(1); } g_svsem_ids[thread1] = semget(IPC_PRIVATE,2,0777|IPC_CREAT); if(g_svsem_ids[thread1] == -1) { printf(" message queue create failed.\n"); exit(1); } ti1->svsem_id = 
g_svsem_ids[thread1]; ti2->svsem_id = ti1->svsem_id; ti1->threadid = thread1; ti2->threadid = thread2; ti1->sender = 1; ti2->sender = 0; ret = pthread_create(&g_threads[thread1], NULL, worker_thread, ti1); if (ret) { printf(" pthread_create failed with error code %d\n", ret); exit(1); } ret = pthread_create(&g_threads[thread2], NULL, worker_thread, ti2); if (ret) { printf(" pthread_create failed with error code %d\n", ret); exit(1); } } ////////////////////////////////////////////////////////////////////////////// int main(int argc, char **argv) { int queues, timeout; unsigned long long totals; int i; printf("psem [nr queues] [timeout]\n"); if (argc != 3) { printf(" Invalid parameters.\n"); return 0; } queues = atoi(argv[1]); timeout = atoi(argv[2]); printf("Using %d queues (%d threads) for %d seconds.\n", queues, 2*queues, timeout); g_results = new unsigned long long[2*queues]; g_svsem_ids = new int[queues]; g_threads = new pthread_t[2*queues]; for (i=0;i<queues;i++) { g_results[i] = 0; g_results[i+queues] = 0; init_thread(i, i+queues); } sleep(1); g_state = RUNNING; sleep(timeout); g_state = STOPPED; sleep(1); for (i=0;i<queues;i++) { int res; res = semctl(g_svsem_ids[i],1,IPC_RMID,NULL); if (res < 0) { printf("semctl(IPC_RMID) failed for %d, errno%d.\n", g_svsem_ids[i], errno); } } for (i=0;i<2*queues;i++) pthread_join(g_threads[i], NULL); printf("Result matrix:\n"); totals = 0; for (i=0;i<queues;i++) { printf(" Thread %3d: %8lld %3d: %8lld\n", i, g_results[i], i+queues, g_results[i+queues]); totals += g_results[i] + g_results[i+queues]; } printf("Total: %lld\n", totals); } ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc 2008-03-22 10:10 ` Manfred Spraul @ 2008-03-22 11:53 ` Mike Galbraith 2008-03-22 14:22 ` Manfred Spraul 0 siblings, 1 reply; 27+ messages in thread From: Mike Galbraith @ 2008-03-22 11:53 UTC (permalink / raw) To: Manfred Spraul Cc: paulmck, Nadia Derbey, Linux Kernel Mailing List, Andrew Morton On Sat, 2008-03-22 at 11:10 +0100, Manfred Spraul wrote: > Attached are my own testapps: one for sysv msg, one for sysv sem. > Could you run them? Taskset is done internally, just execute > > $ for i in 1 2 3 4;do ./psem $i 5;./pmsg $i 5;done 2.6.22.18-cfs-v24-smp 2.6.24.3-smp Result matrix: (psem) Thread 0: 2394885 1: 2394885 Thread 0: 2004534 1: 2004535 Total: 4789770 Total: 4009069 Result matrix: (pmsg) Thread 0: 2345913 1: 2345914 Thread 0: 1971000 1: 1971000 Total: 4691827 Total: 3942000 Result matrix: Thread 0: 1613610 2: 1613611 Thread 0: 477112 2: 477111 Thread 1: 1613590 3: 1613590 Thread 1: 485607 3: 485607 Total: 6454401 Total: 1925437 Result matrix: Thread 0: 1409956 2: 1409956 Thread 0: 519398 2: 519398 Thread 1: 1409776 3: 1409776 Thread 1: 519169 3: 519170 Total: 5639464 Total: 2077135 Result matrix: Thread 0: 516309 3: 516309 Thread 0: 401157 3: 401157 Thread 1: 318546 4: 318546 Thread 1: 408252 4: 408252 Thread 2: 352940 5: 352940 Thread 2: 703600 5: 703600 Total: 2375590 Total: 3026018 Result matrix: Thread 0: 478356 3: 478356 Thread 0: 344738 3: 344739 Thread 1: 241655 4: 241655 Thread 1: 343614 4: 343615 Thread 2: 252444 5: 252445 Thread 2: 589298 5: 589299 Total: 1944911 Total: 2555303 Result matrix: Thread 0: 443392 4: 443392 Thread 0: 398491 4: 398491 Thread 1: 443338 5: 443339 Thread 1: 398473 5: 398473 Thread 2: 444069 6: 444070 Thread 2: 394647 6: 394648 Thread 3: 444078 7: 444078 Thread 3: 394784 7: 394785 Total: 3549756 Total: 3172792 Result matrix: Thread 0: 354973 4: 354973 Thread 0: 331307 4: 331307 Thread 1: 354966 5: 354966 Thread 1: 331220 5: 331221 Thread 2: 358035 6: 358035 Thread 2: 322852 6: 322852 Thread 3: 357877 7: 357877 Thread 3: 322899 7: 322899 Total: 2851702 Total: 2616557 ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc 2008-03-22 11:53 ` Mike Galbraith @ 2008-03-22 14:22 ` Manfred Spraul 2008-03-22 19:08 ` Manfred Spraul 2008-03-22 19:35 ` Scalability requirements for sysv ipc Mike Galbraith 0 siblings, 2 replies; 27+ messages in thread From: Manfred Spraul @ 2008-03-22 14:22 UTC (permalink / raw) To: Mike Galbraith Cc: paulmck, Nadia Derbey, Linux Kernel Mailing List, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 260 bytes --] Mike Galbraith wrote: > Total: 4691827 Total: 3942000 > Thanks. Unfortunately the test was buggy, it bound the tasks to the wrong cpu :-( Could you run it again? Actually 1 cpu and 4 cpus are probably enough. -- Manfred [-- Attachment #2: pmsg.cpp --] [-- Type: text/plain, Size: 4653 bytes --] /* * pmsg.cpp, parallel sysv msg pingpong * * Copyright (C) 1999, 2001, 2005, 2008 by Manfred Spraul. * All rights reserved except the rights granted by the GPL. * * Redistribution of this file is permitted under the terms of the GNU * General Public License (GPL) version 2 or later. * $Header$ */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <getopt.h> #include <errno.h> #include <sys/types.h> #include <sys/ipc.h> #include <sys/msg.h> #include <pthread.h> ////////////////////////////////////////////////////////////////////////////// static enum { WAITING, RUNNING, STOPPED, } volatile g_state = WAITING; unsigned long long *g_results; int *g_svmsg_ids; pthread_t *g_threads; struct taskinfo { int svmsg_id; int threadid; int cpuid; int sender; }; #define DATASIZE 8 void* worker_thread(void *arg) { struct taskinfo *ti = (struct taskinfo*)arg; unsigned long long rounds; int ret; struct { long mtype; char buffer[DATASIZE]; } mbuf; { cpu_set_t cpus; CPU_ZERO(&cpus); CPU_SET(ti->cpuid, &cpus); ret = pthread_setaffinity_np(g_threads[ti->threadid], sizeof(cpus), &cpus); if (ret < 0) { printf("pthread_setaffinity_np failed for thread %d with errno %d.\n", ti->threadid, errno); } ret = pthread_getaffinity_np(g_threads[ti->threadid], sizeof(cpus), &cpus); if (ret < 0) { printf("pthread_getaffinity_np() failed for thread %d with errno %d.\n", ti->threadid, errno); fflush(stdout); } else { printf("thread %d: sysvmsg %8d type %d bound to %04lxh\n",ti->threadid, ti->svmsg_id, ti->sender, cpus.__bits[0]); } fflush(stdout); } rounds = 0; while(g_state == WAITING) { #ifdef __i386__ __asm__ __volatile__("pause": : :"memory"); #endif } if (ti->sender) { mbuf.mtype = ti->sender+1; ret = msgsnd(ti->svmsg_id, &mbuf, DATASIZE, 0); if (ret != 0) { printf("Initial send failed, errno %d.\n", errno); exit(1); } } while(g_state == RUNNING) { int target = 1+!ti->sender; ret = msgrcv(ti->svmsg_id, &mbuf, DATASIZE, target, 0); if (ret != DATASIZE) { if (errno == EIDRM) break; printf("Error on msgrcv, got %d, errno %d.\n", ret, errno); exit(1); } mbuf.mtype = ti->sender+1; ret = msgsnd(ti->svmsg_id, &mbuf, DATASIZE, 0); if (ret != 0) { if (errno == EIDRM) break; printf("send failed, errno %d.\n", errno); exit(1); } rounds++; } /* store result */ g_results[ti->threadid] = rounds; pthread_exit(0); return NULL; } void init_threads(int cpu, int cpus) { int ret; struct taskinfo *ti1, *ti2; ti1 = new (struct taskinfo); ti2 = new (struct taskinfo); if (!ti1 || !ti2) { printf("Could not allocate task info\n"); exit(1); } g_svmsg_ids[cpu] = msgget(IPC_PRIVATE,0777|IPC_CREAT); if(g_svmsg_ids[cpu] == -1) { printf(" message queue create failed.\n"); exit(1); } g_results[cpu] = 0; g_results[cpu+cpus] = 0; ti1->svmsg_id = g_svmsg_ids[cpu]; 
ti1->threadid = cpu; ti1->cpuid = cpu; ti1->sender = 1; ti2->svmsg_id = g_svmsg_ids[cpu]; ti2->threadid = cpu+cpus; ti2->cpuid = cpu; ti2->sender = 0; ret = pthread_create(&g_threads[ti1->threadid], NULL, worker_thread, ti1); if (ret) { printf(" pthread_create failed with error code %d\n", ret); exit(1); } ret = pthread_create(&g_threads[ti2->threadid], NULL, worker_thread, ti2); if (ret) { printf(" pthread_create failed with error code %d\n", ret); exit(1); } } ////////////////////////////////////////////////////////////////////////////// int main(int argc, char **argv) { int queues, timeout; unsigned long long totals; int i; printf("pmsg [nr queues] [timeout]\n"); if (argc != 3) { printf(" Invalid parameters.\n"); return 0; } queues = atoi(argv[1]); timeout = atoi(argv[2]); printf("Using %d queues/cpus (%d threads) for %d seconds.\n", queues, 2*queues, timeout); g_results = new unsigned long long[2*queues]; g_svmsg_ids = new int[queues]; g_threads = new pthread_t[2*queues]; for (i=0;i<queues;i++) { init_threads(i, queues); } sleep(1); g_state = RUNNING; sleep(timeout); g_state = STOPPED; sleep(1); for (i=0;i<queues;i++) { int res; res = msgctl(g_svmsg_ids[i],IPC_RMID,NULL); if (res < 0) { printf("msgctl(IPC_RMID) failed for %d, errno%d.\n", g_svmsg_ids[i], errno); } } for (i=0;i<2*queues;i++) pthread_join(g_threads[i], NULL); printf("Result matrix:\n"); totals = 0; for (i=0;i<queues;i++) { printf(" Thread %3d: %8lld %3d: %8lld\n", i, g_results[i], i+queues, g_results[i+queues]); totals += g_results[i] + g_results[i+queues]; } printf("Total: %lld\n", totals); } [-- Attachment #3: psem.cpp --] [-- Type: text/plain, Size: 4823 bytes --] /* * psem.cpp, parallel sysv sem pingpong * * Copyright (C) 1999, 2001, 2005, 2008 by Manfred Spraul. * All rights reserved except the rights granted by the GPL. * * Redistribution of this file is permitted under the terms of the GNU * General Public License (GPL) version 2 or later. 
* $Header$ */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <getopt.h> #include <errno.h> #include <sys/types.h> #include <sys/ipc.h> #include <sys/sem.h> #include <pthread.h> ////////////////////////////////////////////////////////////////////////////// static enum { WAITING, RUNNING, STOPPED, } volatile g_state = WAITING; unsigned long long *g_results; int *g_svsem_ids; pthread_t *g_threads; struct taskinfo { int svsem_id; int threadid; int cpuid; int sender; }; #define DATASIZE 8 void* worker_thread(void *arg) { struct taskinfo *ti = (struct taskinfo*)arg; unsigned long long rounds; int ret; { cpu_set_t cpus; CPU_ZERO(&cpus); CPU_SET(ti->cpuid, &cpus); ret = pthread_setaffinity_np(g_threads[ti->threadid], sizeof(cpus), &cpus); if (ret < 0) { printf("pthread_setaffinity_np failed for thread %d with errno %d.\n", ti->threadid, errno); } ret = pthread_getaffinity_np(g_threads[ti->threadid], sizeof(cpus), &cpus); if (ret < 0) { printf("pthread_getaffinity_np() failed for thread %d with errno %d.\n", ti->threadid, errno); fflush(stdout); } else { printf("thread %d: sysvsem %8d type %d bound to %04lxh\n",ti->threadid, ti->svsem_id, ti->sender, cpus.__bits[0]); } fflush(stdout); } rounds = 0; while(g_state == WAITING) { #ifdef __i386__ __asm__ __volatile__("pause": : :"memory"); #endif } if (ti->sender) { struct sembuf sop[1]; int res; /* 1) insert token */ sop[0].sem_num=0; sop[0].sem_op=1; sop[0].sem_flg=0; res = semop(ti->svsem_id,sop,1); if (ret != 0) { printf("Initial semop failed, errno %d.\n", errno); exit(1); } } while(g_state == RUNNING) { struct sembuf sop[1]; int res; /* 1) retrieve token */ sop[0].sem_num=ti->sender; sop[0].sem_op=-1; sop[0].sem_flg=0; res = semop(ti->svsem_id,sop,1); if (ret != 0) { /* EIDRM can happen */ if (errno == EIDRM) break; printf("main semop failed, errno %d.\n", errno); exit(1); } /* 2) reinsert token */ sop[0].sem_num=1-ti->sender; sop[0].sem_op=1; sop[0].sem_flg=0; res = semop(ti->svsem_id,sop,1); if (ret != 0) { /* EIDRM can happen */ if (errno == EIDRM) break; printf("main semop failed, errno %d.\n", errno); exit(1); } rounds++; } g_results[ti->threadid] = rounds; pthread_exit(0); return NULL; } void init_threads(int cpu, int cpus) { int ret; struct taskinfo *ti1, *ti2; ti1 = new (struct taskinfo); ti2 = new (struct taskinfo); if (!ti1 || !ti2) { printf("Could not allocate task info\n"); exit(1); } g_svsem_ids[cpu] = semget(IPC_PRIVATE,2,0777|IPC_CREAT); if(g_svsem_ids[cpu] == -1) { printf("sem array create failed.\n"); exit(1); } g_results[cpu] = 0; g_results[cpu+cpus] = 0; ti1->svsem_id = g_svsem_ids[cpu]; ti1->threadid = cpu; ti1->cpuid = cpu; ti1->sender = 1; ti2->svsem_id = g_svsem_ids[cpu]; ti2->threadid = cpu+cpus; ti2->cpuid = cpu; ti2->sender = 0; ret = pthread_create(&g_threads[ti1->threadid], NULL, worker_thread, ti1); if (ret) { printf(" pthread_create failed with error code %d\n", ret); exit(1); } ret = pthread_create(&g_threads[ti2->threadid], NULL, worker_thread, ti2); if (ret) { printf(" pthread_create failed with error code %d\n", ret); exit(1); } } ////////////////////////////////////////////////////////////////////////////// int main(int argc, char **argv) { int queues, timeout; unsigned long long totals; int i; printf("psem [nr queues] [timeout]\n"); if (argc != 3) { printf(" Invalid parameters.\n"); return 0; } queues = atoi(argv[1]); timeout = atoi(argv[2]); printf("Using %d queues/cpus (%d threads) for %d seconds.\n", queues, 2*queues, timeout); g_results = new unsigned long 
long[2*queues]; g_svsem_ids = new int[queues]; g_threads = new pthread_t[2*queues]; for (i=0;i<queues;i++) { init_threads(i, queues); } sleep(1); g_state = RUNNING; sleep(timeout); g_state = STOPPED; sleep(1); for (i=0;i<queues;i++) { int res; res = semctl(g_svsem_ids[i],1,IPC_RMID,NULL); if (res < 0) { printf("semctl(IPC_RMID) failed for %d, errno%d.\n", g_svsem_ids[i], errno); } } for (i=0;i<2*queues;i++) pthread_join(g_threads[i], NULL); printf("Result matrix:\n"); totals = 0; for (i=0;i<queues;i++) { printf(" Thread %3d: %8lld %3d: %8lld\n", i, g_results[i], i+queues, g_results[i+queues]); totals += g_results[i] + g_results[i+queues]; } printf("Total: %lld\n", totals); } ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc 2008-03-22 14:22 ` Manfred Spraul @ 2008-03-22 19:08 ` Manfred Spraul 2008-03-25 15:50 ` Mike Galbraith 2008-03-22 19:35 ` Scalability requirements for sysv ipc Mike Galbraith 1 sibling, 1 reply; 27+ messages in thread From: Manfred Spraul @ 2008-03-22 19:08 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Mike Galbraith, paulmck, Nadia Derbey, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 636 bytes --] Hi all, I've revived my dual-CPU Pentium III/850: I couldn't see a scalability problem (two cpus reach around 190% of one cpu), but the plain performance of 2.6.25-rc3 is abysmal, 55 to 60% slower than 2.6.18.8:

psem              2.6.18      2.6.25      Diff [%]
1 cpu             948.005     398.435     -57,97
2 cpus            1.768.273   734.816     -58,44
Scalability [%]   193,26      192,21

pmsg              2.6.18      2.6.25      Diff [%]
1 cpu             821.582     356.904     -56,56
2 cpus            1.488.058   661.754     -55,53
Scalability [%]   190,56      192,71

Attached are the .config files and the individual results. Did I accidentally enable a scheduler debug option? -- Manfred [-- Attachment #2: bench.tar.gz --] [-- Type: application/x-gzip, Size: 38101 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc 2008-03-22 19:08 ` Manfred Spraul @ 2008-03-25 15:50 ` Mike Galbraith 2008-03-25 16:13 ` Peter Zijlstra 2008-03-30 14:12 ` Scalability requirements for sysv ipc (+namespaces broken with SEM_UNDO) Manfred Spraul 0 siblings, 2 replies; 27+ messages in thread From: Mike Galbraith @ 2008-03-25 15:50 UTC (permalink / raw) To: Manfred Spraul Cc: Linux Kernel Mailing List, paulmck, Nadia Derbey, Andrew Morton, Peter Zijlstra [-- Attachment #1: Type: text/plain, Size: 1879 bytes --] On Sat, 2008-03-22 at 20:08 +0100, Manfred Spraul wrote: > just the normal performance of 2.6.25-rc3 is abyssimal, 55 to 60% slower > than 2.6.18.8: After manually reverting 3e148c79938aa39035669c1cfa3ff60722134535, 2.6.25.git scaled linearly, but as you noted, markedly down from earlier kernels with this benchmark. 2.6.24.4 with same revert, but all 2.6.25.git ipc changes piled on top still performed close to 2.6.22, so I went looking. Bisection led me to.. 8f4d37ec073c17e2d4aa8851df5837d798606d6f is first bad commit commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f Author: Peter Zijlstra <a.p.zijlstra@chello.nl> Date: Fri Jan 25 21:08:29 2008 +0100 sched: high-res preemption tick Use HR-timers (when available) to deliver an accurate preemption tick. The regular scheduler tick that runs at 1/HZ can be too coarse when nice level are used. The fairness system will still keep the cpu utilisation 'fair' by then delaying the task that got an excessive amount of CPU time but try to minimize this by delivering preemption points spot-on. The average frequency of this extra interrupt is sched_latency / nr_latency. Which need not be higher than 1/HZ, its just that the distribution within the sched_latency period is important. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> :040000 040000 ab225228500f7a19d5ad20ca12ca3fc8ff5f5ad1 f1742e1d225a72aecea9d6961ed989b5943d31d8 M arch :040000 040000 25d85e4ef7a71b0cc76801a2526ebeb4dce180fe ae61510186b4fad708ef0211ac169decba16d4e5 M include :040000 040000 9247cec7dd506c648ac027c17e5a07145aa41b26 950832cc1dc4d30923f593ecec883a06b45d62e9 M kernel ..and I verified it via :-/ echo 7 > sched_features in latest. That only bought me roughly half though, so there's a part three in there somewhere. -Mike [-- Attachment #2: xxxx.pdf --] [-- Type: application/pdf, Size: 17909 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc 2008-03-25 15:50 ` Mike Galbraith @ 2008-03-25 16:13 ` Peter Zijlstra 2008-03-25 18:31 ` Mike Galbraith 2008-03-26 6:18 ` Mike Galbraith 2008-03-30 14:12 ` Scalability requirements for sysv ipc (+namespaces broken with SEM_UNDO) Manfred Spraul 1 sibling, 2 replies; 27+ messages in thread From: Peter Zijlstra @ 2008-03-25 16:13 UTC (permalink / raw) To: Mike Galbraith Cc: Manfred Spraul, Linux Kernel Mailing List, paulmck, Nadia Derbey, Andrew Morton, Ingo Molnar On Tue, 2008-03-25 at 16:50 +0100, Mike Galbraith wrote: > On Sat, 2008-03-22 at 20:08 +0100, Manfred Spraul wrote: > > > just the normal performance of 2.6.25-rc3 is abyssimal, 55 to 60% slower > > than 2.6.18.8: > > After manually reverting 3e148c79938aa39035669c1cfa3ff60722134535, > 2.6.25.git scaled linearly, but as you noted, markedly down from earlier > kernels with this benchmark. 2.6.24.4 with same revert, but all > 2.6.25.git ipc changes piled on top still performed close to 2.6.22, so > I went looking. Bisection led me to.. > > 8f4d37ec073c17e2d4aa8851df5837d798606d6f is first bad commit > commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f > Author: Peter Zijlstra <a.p.zijlstra@chello.nl> > Date: Fri Jan 25 21:08:29 2008 +0100 > > sched: high-res preemption tick > > Use HR-timers (when available) to deliver an accurate preemption tick. > > The regular scheduler tick that runs at 1/HZ can be too coarse when nice > level are used. The fairness system will still keep the cpu utilisation 'fair' > by then delaying the task that got an excessive amount of CPU time but try to > minimize this by delivering preemption points spot-on. > > The average frequency of this extra interrupt is sched_latency / nr_latency. > Which need not be higher than 1/HZ, its just that the distribution within the > sched_latency period is important. > > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > Signed-off-by: Ingo Molnar <mingo@elte.hu> > > :040000 040000 ab225228500f7a19d5ad20ca12ca3fc8ff5f5ad1 f1742e1d225a72aecea9d6961ed989b5943d31d8 M arch > :040000 040000 25d85e4ef7a71b0cc76801a2526ebeb4dce180fe ae61510186b4fad708ef0211ac169decba16d4e5 M include > :040000 040000 9247cec7dd506c648ac027c17e5a07145aa41b26 950832cc1dc4d30923f593ecec883a06b45d62e9 M kernel > > ...and I verified it via :-/ echo 7 > sched_features in latest. That > only bought me roughly half though, so there's a part three in there > somewhere. Ouch, I guess hrtimers are just way expensive on some hardware... ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc 2008-03-25 16:13 ` Peter Zijlstra @ 2008-03-25 18:31 ` Mike Galbraith 2008-03-26 6:18 ` Mike Galbraith 1 sibling, 0 replies; 27+ messages in thread From: Mike Galbraith @ 2008-03-25 18:31 UTC (permalink / raw) To: Peter Zijlstra Cc: Manfred Spraul, Linux Kernel Mailing List, paulmck, Nadia Derbey, Andrew Morton, Ingo Molnar On Tue, 2008-03-25 at 17:13 +0100, Peter Zijlstra wrote: > On Tue, 2008-03-25 at 16:50 +0100, Mike Galbraith wrote: > > ...and I verified it via :-/ echo 7 > sched_features in latest. That > > only bought me roughly half though, so there's a part three in there > > somewhere. > > Ouch, I guess hrtimers are just way expensive on some hardware... That would be about on par with my luck. I'll try to muster up the gumption to go looking for part three, though my motivation for searching long ago proved to be a dead end wrt sysv ipc. -Mike ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc 2008-03-25 16:13 ` Peter Zijlstra 2008-03-25 18:31 ` Mike Galbraith @ 2008-03-26 6:18 ` Mike Galbraith 1 sibling, 0 replies; 27+ messages in thread From: Mike Galbraith @ 2008-03-26 6:18 UTC (permalink / raw) To: Peter Zijlstra Cc: Manfred Spraul, Linux Kernel Mailing List, paulmck, Nadia Derbey, Andrew Morton, Ingo Molnar On Tue, 2008-03-25 at 17:13 +0100, Peter Zijlstra wrote: > > ...and I verified it via :-/ echo 7 > sched_features in latest. That > > only bought me roughly half though, so there's a part three in there > > somewhere. > > Ouch, I guess hrtimers are just way expensive on some hardware... It takes a large bite out of my P4 as well. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc (+namespaces broken with SEM_UNDO) 2008-03-25 15:50 ` Mike Galbraith 2008-03-25 16:13 ` Peter Zijlstra @ 2008-03-30 14:12 ` Manfred Spraul 2008-03-30 15:21 ` David Newall 2008-03-30 17:18 ` Mike Galbraith 1 sibling, 2 replies; 27+ messages in thread From: Manfred Spraul @ 2008-03-30 14:12 UTC (permalink / raw) To: Mike Galbraith Cc: Linux Kernel Mailing List, paulmck, Nadia Derbey, Andrew Morton, Peter Zijlstra, Pavel Emelianov Mike Galbraith wrote: > On Sat, 2008-03-22 at 20:08 +0100, Manfred Spraul wrote: > > >> just the normal performance of 2.6.25-rc3 is abyssimal, 55 to 60% slower >> than 2.6.18.8: >> > > After manually reverting 3e148c79938aa39035669c1cfa3ff60722134535, > 2.6.25.git scaled linearly We can't just revert that patch: with IDR, a global lock is mandatory :-( We must either revert the whole idea of using IDR or live with the reduced scalability. Actually, there are further bugs: the undo structures are not namespace-aware, thus semop with SEM_UNDO, unshare, create new array with same id, but more semaphores, another semop with SEM_UNDO will corrupt kernel memory :-( I'll try to clean up the bugs first, then I'll look at the scalability again. -- Manfred ^ permalink raw reply [flat|nested] 27+ messages in thread
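For illustration, a hypothetical userspace sketch of the SEM_UNDO sequence
described above (error handling trimmed; id reuse after unshare() is not
guaranteed and unshare(CLONE_NEWIPC) needs CAP_SYS_ADMIN, so this only shows
the idea, it is not a reliable reproducer):

#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <stdio.h>

int main(void)
{
	struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = SEM_UNDO };
	int id;

	/* 1) small array in the original ipc namespace, touched with SEM_UNDO:
	 *    this allocates an undo structure sized for one semaphore */
	id = semget(IPC_PRIVATE, 1, 0600 | IPC_CREAT);
	semop(id, &op, 1);

	/* 2) switch to a fresh ipc namespace */
	if (unshare(CLONE_NEWIPC) != 0) {
		perror("unshare");
		return 1;
	}

	/* 3) new, larger array - it may receive the same ipc id as the old one */
	id = semget(IPC_PRIVATE, 64, 0600 | IPC_CREAT);

	/* 4) SEM_UNDO again: if the stale undo structure from step 1 is matched
	 *    by id, its too-small adjustment array is reused, which is the
	 *    kernel memory corruption described above */
	op.sem_num = 63;
	semop(id, &op, 1);
	return 0;
}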
* Re: Scalability requirements for sysv ipc (+namespaces broken with SEM_UNDO) 2008-03-30 14:12 ` Scalability requirements for sysv ipc (+namespaces broken with SEM_UNDO) Manfred Spraul @ 2008-03-30 15:21 ` David Newall 2008-03-30 17:18 ` Mike Galbraith 1 sibling, 0 replies; 27+ messages in thread From: David Newall @ 2008-03-30 15:21 UTC (permalink / raw) To: Manfred Spraul Cc: Mike Galbraith, Linux Kernel Mailing List, paulmck, Nadia Derbey, Andrew Morton, Peter Zijlstra, Pavel Emelianov Manfred Spraul wrote: > Mike Galbraith wrote: >> On Sat, 2008-03-22 at 20:08 +0100, Manfred Spraul wrote: >>> just the normal performance of 2.6.25-rc3 is abyssimal, 55 to 60% >>> slower than 2.6.18.8: >>> >> >> After manually reverting 3e148c79938aa39035669c1cfa3ff60722134535, >> 2.6.25.git scaled linearly > We can't just revert that patch: with IDR, a global lock is mandatory :-( > We must either revert the whole idea of using IDR or live with the > reduced scalability. > > Actually, there are further bugs: the undo structures are not > namespace-aware, thus semop with SEM_UNDO, unshare, create new array > with same id, but more semaphores, another semop with SEM_UNDO will > corrupt kernel memory :-( You should revert it all. The scalability problem isn't good, but from what you're saying, the idea isn't ready yet. Revert it all, fix the problems at your leisure, and submit new patches then. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc (+namespaces broken with SEM_UNDO) 2008-03-30 14:12 ` Scalability requirements for sysv ipc (+namespaces broken with SEM_UNDO) Manfred Spraul 2008-03-30 15:21 ` David Newall @ 2008-03-30 17:18 ` Mike Galbraith 2008-04-04 14:59 ` Nadia Derbey 1 sibling, 1 reply; 27+ messages in thread From: Mike Galbraith @ 2008-03-30 17:18 UTC (permalink / raw) To: Manfred Spraul Cc: Linux Kernel Mailing List, paulmck, Nadia Derbey, Andrew Morton, Peter Zijlstra, Pavel Emelianov On Sun, 2008-03-30 at 16:12 +0200, Manfred Spraul wrote: > Mike Galbraith wrote: > > On Sat, 2008-03-22 at 20:08 +0100, Manfred Spraul wrote: > > > > > >> just the normal performance of 2.6.25-rc3 is abyssimal, 55 to 60% slower > >> than 2.6.18.8: > >> > > > > After manually reverting 3e148c79938aa39035669c1cfa3ff60722134535, > > 2.6.25.git scaled linearly > We can't just revert that patch: with IDR, a global lock is mandatory :-( > We must either revert the whole idea of using IDR or live with the > reduced scalability. Yeah, I looked at the problem, but didn't know what the heck to do about it, so just grabbed my axe to verify/quantify. > Actually, there are further bugs: the undo structures are not > namespace-aware, thus semop with SEM_UNDO, unshare, create new array > with same id, but more semaphores, another semop with SEM_UNDO will > corrupt kernel memory :-( > I'll try to clean up the bugs first, then I'll look at the scalability > again. Great! -Mike ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc (+namespaces broken with SEM_UNDO) 2008-03-30 17:18 ` Mike Galbraith @ 2008-04-04 14:59 ` Nadia Derbey 2008-04-04 15:03 ` Nadia Derbey 0 siblings, 1 reply; 27+ messages in thread From: Nadia Derbey @ 2008-04-04 14:59 UTC (permalink / raw) To: Mike Galbraith Cc: Manfred Spraul, Linux Kernel Mailing List, paulmck, Andrew Morton, Peter Zijlstra, Pavel Emelianov [-- Attachment #1: Type: text/plain, Size: 1940 bytes --] Mike Galbraith wrote: > On Sun, 2008-03-30 at 16:12 +0200, Manfred Spraul wrote: > >>Mike Galbraith wrote: >> >>>On Sat, 2008-03-22 at 20:08 +0100, Manfred Spraul wrote: >>> >>> >>> >>>>just the normal performance of 2.6.25-rc3 is abyssimal, 55 to 60% slower >>>>than 2.6.18.8: >>>> >>> >>>After manually reverting 3e148c79938aa39035669c1cfa3ff60722134535, >>>2.6.25.git scaled linearly >> >>We can't just revert that patch: with IDR, a global lock is mandatory :-( >>We must either revert the whole idea of using IDR or live with the >>reduced scalability. > > > Yeah, I looked at the problem, but didn't know what the heck to do about > it, so just grabbed my axe to verify/quantify. > > >>Actually, there are further bugs: the undo structures are not >>namespace-aware, thus semop with SEM_UNDO, unshare, create new array >>with same id, but more semaphores, another semop with SEM_UNDO will >>corrupt kernel memory :-( >>I'll try to clean up the bugs first, then I'll look at the scalability >>again. > > > Great! > > -Mike > > > > I could get better results with the following solution: wrote an RCU-based idr api (layers allocation is managed similarly to the radix-tree one) Using it in the ipc code makes me get rid of the read lock taken in ipc_lock() (the one introduced in 3e148c79938aa39035669c1cfa3ff60722134535). You'll find the results in attachment (kernel is 2.6.25-rc3-mm1). output.25_rc3_mm1.ref.8 --> pmsg output for the 2.6.25-rc3-mm1 plot.25_rc3_mm1.ref.8 --> previous file results for use by gnuplot output.25_rc3_mm1.ridr.8 --> pmsg output for the 2.6.25-rc3-mm1 + rcu-based idrs plot.25_rc3_mm1.ridr.8 --> previous file results for use by gnuplot I think I should be able to send a patch next week. It is presently an uggly code: I copied idr.c and idr.h into ridr.c and ridr.h to go fast, so didn't do any code factorization. Regards Nadia [-- Attachment #2: output.25_rc3_mm1.ref.8 --] [-- Type: text/x-troff-man, Size: 5846 bytes --] pmsg [nr queues] [timeout] Using 1 queues/cpus (2 threads) for 5 seconds. thread 0: sysvmsg 0 type 1 bound to 0001h thread 1: sysvmsg 0 type 0 bound to 0001h Result matrix: Thread 0: 488650 1: 488650 Total: 977300 pmsg [nr queues] [timeout] Using 2 queues/cpus (4 threads) for 5 seconds. thread 0: sysvmsg 32768 type 1 bound to 0001h thread 1: sysvmsg 65537 type 1 bound to 0002h thread 3: sysvmsg 65537 type 0 bound to 0002h thread 2: sysvmsg 32768 type 0 bound to 0001h Result matrix: Thread 0: 223991 2: 223991 Thread 1: 225588 3: 225588 Total: 899158 pmsg [nr queues] [timeout] Using 3 queues/cpus (6 threads) for 5 seconds. thread 0: sysvmsg 98304 type 1 bound to 0001h thread 1: sysvmsg 131073 type 1 bound to 0002h thread 2: sysvmsg 163842 type 1 bound to 0004h thread 5: sysvmsg 163842 type 0 bound to 0004h thread 4: sysvmsg 131073 type 0 bound to 0002h thread 3: sysvmsg 98304 type 0 bound to 0001h Result matrix: Thread 0: 183407 3: 183407 Thread 1: 184030 4: 184030 Thread 2: 357875 5: 357876 Total: 1450625 pmsg [nr queues] [timeout] Using 4 queues/cpus (8 threads) for 5 seconds. 
thread 0: sysvmsg 196608 type 1 bound to 0001h thread 1: sysvmsg 229377 type 1 bound to 0002h thread 2: sysvmsg 262146 type 1 bound to 0004h thread 3: sysvmsg 294915 type 1 bound to 0008h thread 5: sysvmsg 229377 type 0 bound to 0002h thread 6: sysvmsg 262146 type 0 bound to 0004h thread 7: sysvmsg 294915 type 0 bound to 0008h thread 4: sysvmsg 196608 type 0 bound to 0001h Result matrix: Thread 0: 166911 4: 166912 Thread 1: 159281 5: 159281 Thread 2: 166024 6: 166024 Thread 3: 167440 7: 167440 Total: 1319313 pmsg [nr queues] [timeout] Using 5 queues/cpus (10 threads) for 5 seconds. thread 0: sysvmsg 327680 type 1 bound to 0001h thread 2: sysvmsg 393218 type 1 bound to 0004h thread 3: sysvmsg 425987 type 1 bound to 0008h thread 4: sysvmsg 458756 type 1 bound to 0010h thread 9: sysvmsg 458756 type 0 bound to 0010h thread 6: sysvmsg 360449 type 0 bound to 0002h thread 8: sysvmsg 425987 type 0 bound to 0008h thread 7: sysvmsg 393218 type 0 bound to 0004h thread 1: sysvmsg 360449 type 1 bound to 0002h thread 5: sysvmsg 327680 type 0 bound to 0001h Result matrix: Thread 0: 39740 5: 39740 Thread 1: 40399 6: 40399 Thread 2: 40326 7: 40327 Thread 3: 39290 8: 39290 Thread 4: 68684 9: 68685 Total: 456880 pmsg [nr queues] [timeout] Using 6 queues/cpus (12 threads) for 5 seconds. thread 0: sysvmsg 491520 type 1 bound to 0001h thread 1: sysvmsg 524289 type 1 bound to 0002h thread 2: sysvmsg 557058 type 1 bound to 0004h thread 3: sysvmsg 589827 type 1 bound to 0008h thread 4: sysvmsg 622596 type 1 bound to 0010h thread 5: sysvmsg 655365 type 1 bound to 0020h thread 6: sysvmsg 491520 type 0 bound to 0001h thread 11: sysvmsg 655365 type 0 bound to 0020h thread 10: sysvmsg 622596 type 0 bound to 0010h thread 8: sysvmsg 557058 type 0 bound to 0004h thread 9: sysvmsg 589827 type 0 bound to 0008h thread 7: sysvmsg 524289 type 0 bound to 0002h Result matrix: Thread 0: 27901 6: 27901 Thread 1: 28554 7: 28555 Thread 2: 28471 8: 28472 Thread 3: 28015 9: 28016 Thread 4: 28213 10: 28213 Thread 5: 28396 11: 28396 Total: 339103 pmsg [nr queues] [timeout] Using 7 queues/cpus (14 threads) for 5 seconds. thread 0: sysvmsg 688128 type 1 bound to 0001h thread 1: sysvmsg 720897 type 1 bound to 0002h thread 2: sysvmsg 753666 type 1 bound to 0004h thread 3: sysvmsg 786435 type 1 bound to 0008h thread 4: sysvmsg 819204 type 1 bound to 0010h thread 5: sysvmsg 851973 type 1 bound to 0020h thread 6: sysvmsg 884742 type 1 bound to 0040h thread 13: sysvmsg 884742 type 0 bound to 0040h thread 7: sysvmsg 688128 type 0 bound to 0001h thread 11: sysvmsg 819204 type 0 bound to 0010h thread 12: sysvmsg 851973 type 0 bound to 0020h thread 8: sysvmsg 720897 type 0 bound to 0002h thread 10: sysvmsg 786435 type 0 bound to 0008h thread 9: sysvmsg 753666 type 0 bound to 0004h Result matrix: Thread 0: 12201 7: 12201 Thread 1: 12451 8: 12452 Thread 2: 12345 9: 12345 Thread 3: 12277 10: 12278 Thread 4: 12259 11: 12259 Thread 5: 12364 12: 12365 Thread 6: 24666 13: 24666 Total: 197129 pmsg [nr queues] [timeout] Using 8 queues/cpus (16 threads) for 5 seconds. 
thread 0: sysvmsg 917504 type 1 bound to 0001h thread 1: sysvmsg 950273 type 1 bound to 0002h thread 2: sysvmsg 983042 type 1 bound to 0004h thread 3: sysvmsg 1015811 type 1 bound to 0008h thread 4: sysvmsg 1048580 type 1 bound to 0010h thread 5: sysvmsg 1081349 type 1 bound to 0020h thread 6: sysvmsg 1114118 type 1 bound to 0040h thread 7: sysvmsg 1146887 type 1 bound to 0080h thread 15: sysvmsg 1146887 type 0 bound to 0080h thread 8: sysvmsg 917504 type 0 bound to 0001h thread 14: sysvmsg 1114118 type 0 bound to 0040h thread 13: sysvmsg 1081349 type 0 bound to 0020h thread 12: sysvmsg 1048580 type 0 bound to 0010h thread 11: sysvmsg 1015811 type 0 bound to 0008h thread 10: sysvmsg 983042 type 0 bound to 0004h thread 9: sysvmsg 950273 type 0 bound to 0002h Result matrix: Thread 0: 11082 8: 11083 Thread 1: 11461 9: 11461 Thread 2: 11430 10: 11431 Thread 3: 11184 11: 11185 Thread 4: 11373 12: 11374 Thread 5: 11290 13: 11291 Thread 6: 11265 14: 11266 Thread 7: 11324 15: 11325 Total: 180825 [-- Attachment #3: plot.25_rc3_mm1.ref.8 --] [-- Type: text/x-troff-man, Size: 74 bytes --] 1 977300 2 899158 3 1450625 4 1319313 5 456880 6 339103 7 197129 8 180825 [-- Attachment #4: output.25_rc3_mm1.ridr.8 --] [-- Type: text/x-troff-man, Size: 5851 bytes --] pmsg [nr queues] [timeout] Using 1 queues/cpus (2 threads) for 5 seconds. thread 0: sysvmsg 0 type 1 bound to 0001h thread 1: sysvmsg 0 type 0 bound to 0001h Result matrix: Thread 0: 549365 1: 549365 Total: 1098730 pmsg [nr queues] [timeout] Using 2 queues/cpus (4 threads) for 5 seconds. thread 0: sysvmsg 32768 type 1 bound to 0001h thread 1: sysvmsg 65537 type 1 bound to 0002h thread 3: sysvmsg 65537 type 0 bound to 0002h thread 2: sysvmsg 32768 type 0 bound to 0001h Result matrix: Thread 0: 245002 2: 245003 Thread 1: 246618 3: 246619 Total: 983242 pmsg [nr queues] [timeout] Using 3 queues/cpus (6 threads) for 5 seconds. thread 0: sysvmsg 98304 type 1 bound to 0001h thread 1: sysvmsg 131073 type 1 bound to 0002h thread 2: sysvmsg 163842 type 1 bound to 0004h thread 5: sysvmsg 163842 type 0 bound to 0004h thread 4: sysvmsg 131073 type 0 bound to 0002h thread 3: sysvmsg 98304 type 0 bound to 0001h Result matrix: Thread 0: 231585 3: 231586 Thread 1: 233256 4: 233256 Thread 2: 509630 5: 509631 Total: 1948944 pmsg [nr queues] [timeout] Using 4 queues/cpus (8 threads) for 5 seconds. thread 0: sysvmsg 196608 type 1 bound to 0001h thread 1: sysvmsg 229377 type 1 bound to 0002h thread 2: sysvmsg 262146 type 1 bound to 0004h thread 3: sysvmsg 294915 type 1 bound to 0008h thread 5: sysvmsg 229377 type 0 bound to 0002h thread 6: sysvmsg 262146 type 0 bound to 0004h thread 7: sysvmsg 294915 type 0 bound to 0008h thread 4: sysvmsg 196608 type 0 bound to 0001h Result matrix: Thread 0: 233392 4: 233392 Thread 1: 234485 5: 234486 Thread 2: 235604 6: 235604 Thread 3: 235683 7: 235683 Total: 1878329 pmsg [nr queues] [timeout] Using 5 queues/cpus (10 threads) for 5 seconds. 
thread 0: sysvmsg 327680 type 1 bound to 0001h thread 2: sysvmsg 393218 type 1 bound to 0004h thread 3: sysvmsg 425987 type 1 bound to 0008h thread 4: sysvmsg 458756 type 1 bound to 0010h thread 1: sysvmsg 360449 type 1 bound to 0002h thread 9: sysvmsg 458756 type 0 bound to 0010h thread 6: sysvmsg 360449 type 0 bound to 0002h thread 7: sysvmsg 393218 type 0 bound to 0004h thread 8: sysvmsg 425987 type 0 bound to 0008h thread 5: sysvmsg 327680 type 0 bound to 0001h Result matrix: Thread 0: 216094 5: 216095 Thread 1: 227109 6: 227110 Thread 2: 222042 7: 222042 Thread 3: 222708 8: 222708 Thread 4: 467186 9: 467187 Total: 2710281 pmsg [nr queues] [timeout] Using 6 queues/cpus (12 threads) for 5 seconds. thread 0: sysvmsg 491520 type 1 bound to 0001h thread 1: sysvmsg 524289 type 1 bound to 0002h thread 2: sysvmsg 557058 type 1 bound to 0004h thread 3: sysvmsg 589827 type 1 bound to 0008h thread 4: sysvmsg 622596 type 1 bound to 0010h thread 5: sysvmsg 655365 type 1 bound to 0020h thread 6: sysvmsg 491520 type 0 bound to 0001h thread 11: sysvmsg 655365 type 0 bound to 0020h thread 8: sysvmsg 557058 type 0 bound to 0004h thread 10: sysvmsg 622596 type 0 bound to 0010h thread 9: sysvmsg 589827 type 0 bound to 0008h thread 7: sysvmsg 524289 type 0 bound to 0002h Result matrix: Thread 0: 224027 6: 224028 Thread 1: 225394 7: 225394 Thread 2: 223545 8: 223545 Thread 3: 223599 9: 223599 Thread 4: 224632 10: 224633 Thread 5: 224511 11: 224512 Total: 2691419 pmsg [nr queues] [timeout] Using 7 queues/cpus (14 threads) for 5 seconds. thread 0: sysvmsg 688128 type 1 bound to 0001h thread 1: sysvmsg 720897 type 1 bound to 0002h thread 2: sysvmsg 753666 type 1 bound to 0004h thread 3: sysvmsg 786435 type 1 bound to 0008h thread 4: sysvmsg 819204 type 1 bound to 0010h thread 5: sysvmsg 851973 type 1 bound to 0020h thread 6: sysvmsg 884742 type 1 bound to 0040h thread 13: sysvmsg 884742 type 0 bound to 0040h thread 8: sysvmsg 720897 type 0 bound to 0002h thread 9: sysvmsg 753666 type 0 bound to 0004h thread 10: sysvmsg 786435 type 0 bound to 0008h thread 11: sysvmsg 819204 type 0 bound to 0010h thread 7: sysvmsg 688128 type 0 bound to 0001h thread 12: sysvmsg 851973 type 0 bound to 0020h Result matrix: Thread 0: 188264 7: 188264 Thread 1: 190677 8: 190677 Thread 2: 188850 9: 188851 Thread 3: 188925 10: 188926 Thread 4: 190333 11: 190334 Thread 5: 189235 12: 189235 Thread 6: 386862 13: 386863 Total: 3046296 pmsg [nr queues] [timeout] Using 8 queues/cpus (16 threads) for 5 seconds. 
thread 0: sysvmsg 917504 type 1 bound to 0001h thread 1: sysvmsg 950273 type 1 bound to 0002h thread 2: sysvmsg 983042 type 1 bound to 0004h thread 3: sysvmsg 1015811 type 1 bound to 0008h thread 4: sysvmsg 1048580 type 1 bound to 0010h thread 5: sysvmsg 1081349 type 1 bound to 0020h thread 6: sysvmsg 1114118 type 1 bound to 0040h thread 7: sysvmsg 1146887 type 1 bound to 0080h thread 8: sysvmsg 917504 type 0 bound to 0001h thread 10: sysvmsg 983042 type 0 bound to 0004h thread 11: sysvmsg 1015811 type 0 bound to 0008h thread 12: sysvmsg 1048580 type 0 bound to 0010h thread 13: sysvmsg 1081349 type 0 bound to 0020h thread 9: sysvmsg 950273 type 0 bound to 0002h thread 15: sysvmsg 1146887 type 0 bound to 0080h thread 14: sysvmsg 1114118 type 0 bound to 0040h Result matrix: Thread 0: 187613 8: 187614 Thread 1: 190488 9: 190489 Thread 2: 190112 10: 190113 Thread 3: 190374 11: 190375 Thread 4: 190658 12: 190658 Thread 5: 190508 13: 190508 Thread 6: 189222 14: 189223 Thread 7: 190272 15: 190272 Total: 3038499 [-- Attachment #5: plot.25_rc3_mm1.ridr.8 --] [-- Type: text/x-troff-man, Size: 79 bytes --] 1 1098730 2 983242 3 1948944 4 1878329 5 2710281 6 2691419 7 3046296 8 3038499 ^ permalink raw reply [flat|nested] 27+ messages in thread
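For illustration, a rough sketch of the removal side that such an RCU-based
idr implies; the ridr_* names below are placeholders, not the actual patch
(which is not posted in this thread). The point is that once lookups no
longer take ids->rw_mutex, neither an ipc object nor the internal layer
nodes may be freed before an RCU grace period has elapsed:

/* sketch only; struct ridr, ridr_unpublish_slot() and
 * ridr_layer_free_rcu() are placeholder names */
void ridr_remove(struct ridr *idp, int id)
{
	struct ridr_layer *victim;

	spin_lock(&idp->lock);                  /* writers still serialize */
	victim = ridr_unpublish_slot(idp, id);  /* clear the pointer readers follow */
	spin_unlock(&idp->lock);

	/* free the detached layer only after all rcu_read_lock() lookups
	 * that might still see it are guaranteed to have finished */
	if (victim)
		call_rcu(&victim->rcu_head, ridr_layer_free_rcu);
}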
* Re: Scalability requirements for sysv ipc (+namespaces broken with SEM_UNDO) 2008-04-04 14:59 ` Nadia Derbey @ 2008-04-04 15:03 ` Nadia Derbey 0 siblings, 0 replies; 27+ messages in thread From: Nadia Derbey @ 2008-04-04 15:03 UTC (permalink / raw) To: Nadia Derbey Cc: Mike Galbraith, Manfred Spraul, Linux Kernel Mailing List, paulmck, Andrew Morton, Peter Zijlstra, Pavel Emelianov, NADIA DERBEY Nadia Derbey wrote: > Mike Galbraith wrote: > >> On Sun, 2008-03-30 at 16:12 +0200, Manfred Spraul wrote: >> >>> Mike Galbraith wrote: >>> >>>> On Sat, 2008-03-22 at 20:08 +0100, Manfred Spraul wrote: >>>> >>>> >>>> >>>>> just the normal performance of 2.6.25-rc3 is abyssimal, 55 to 60% >>>>> slower than 2.6.18.8: >>>>> >>>> >>>> >>>> After manually reverting 3e148c79938aa39035669c1cfa3ff60722134535, >>>> 2.6.25.git scaled linearly >>> >>> >>> We can't just revert that patch: with IDR, a global lock is mandatory >>> :-( >>> We must either revert the whole idea of using IDR or live with the >>> reduced scalability. >> >> >> >> Yeah, I looked at the problem, but didn't know what the heck to do about >> it, so just grabbed my axe to verify/quantify. >> >> >>> Actually, there are further bugs: the undo structures are not >>> namespace-aware, thus semop with SEM_UNDO, unshare, create new array >>> with same id, but more semaphores, another semop with SEM_UNDO will >>> corrupt kernel memory :-( >>> I'll try to clean up the bugs first, then I'll look at the >>> scalability again. >> >> >> >> Great! >> >> -Mike >> >> >> >> > > I could get better results with the following solution: > wrote an RCU-based idr api (layers allocation is managed similarly to > the radix-tree one) > > Using it in the ipc code makes me get rid of the read lock taken in > ipc_lock() (the one introduced in > 3e148c79938aa39035669c1cfa3ff60722134535). > > You'll find the results in attachment (kernel is 2.6.25-rc3-mm1). > output.25_rc3_mm1.ref.8 --> pmsg output for the 2.6.25-rc3-mm1 > plot.25_rc3_mm1.ref.8 --> previous file results for use by gnuplot > output.25_rc3_mm1.ridr.8 --> pmsg output for the 2.6.25-rc3-mm1 > + rcu-based idrs > plot.25_rc3_mm1.ridr.8 --> previous file results for use by gnuplot > > > I think I should be able to send a patch next week. It is presently an > uggly code: I copied idr.c and idr.h into ridr.c and ridr.h to go fast, > so didn't do any code factorization. > > Regards > Nadia > > > Sorry forgot the command: for i in 1 2 3 4 5 6 7 8;do ./pmsg $i 5;done > output.25_rc3_mm1.ref.8 Regards, Nadia ^ permalink raw reply [flat|nested] 27+ messages in thread
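A minimal sketch of the direction described above: once idr_find() itself is
safe under a plain rcu_read_lock() - which is exactly the guarantee an
RCU-based idr has to provide - the per-namespace rw_mutex no longer needs to
be taken on the lookup fast path. This is an illustration only, not the patch
that was posted; it reuses the per-object spinlock and deleted flag that
already exist in struct kern_ipc_perm to cope with a concurrent IPC_RMID.

	struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
	{
		struct kern_ipc_perm *out;
		int lid = ipcid_to_idx(id);

		rcu_read_lock();
		/* lookup without ids->rw_mutex: no write to a shared cacheline */
		out = idr_find(&ids->ipcs_idr, lid);
		if (out == NULL) {
			rcu_read_unlock();
			return ERR_PTR(-EINVAL);
		}

		spin_lock(&out->lock);
		/* the object may have been removed between lookup and lock */
		if (out->deleted) {
			spin_unlock(&out->lock);
			rcu_read_unlock();
			return ERR_PTR(-EINVAL);
		}
		/* returns with rcu_read_lock() and the per-object lock held */
		return out;
	}

The point of the sketch is that the lookup no longer touches any shared
cacheline; only the per-object lock is written.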
* Re: Scalability requirements for sysv ipc
  2008-03-22 14:22 ` Manfred Spraul
  2008-03-22 19:08 ` Manfred Spraul
@ 2008-03-22 19:35 ` Mike Galbraith
  2008-03-23 6:38 ` Manfred Spraul
  2008-03-23 7:08 ` Mike Galbraith
  1 sibling, 2 replies; 27+ messages in thread
From: Mike Galbraith @ 2008-03-22 19:35 UTC (permalink / raw)
To: Manfred Spraul
Cc: paulmck, Nadia Derbey, Linux Kernel Mailing List, Andrew Morton

On Sat, 2008-03-22 at 15:22 +0100, Manfred Spraul wrote:
> Mike Galbraith wrote:
> > Total: 4691827                             Total: 3942000
> >
> Thanks. Unfortunately the test was buggy, it bound the tasks to the
> wrong cpu :-(
> Could you run it again? Actually 1 cpu and 4 cpus are probably enough.

Sure.  (ran as before, hopefully no transcription errors)

               2.6.22.18-cfs-v24-smp                    2.6.24.3-smp

Result matrix: (psem)
  Thread 0: 2395778   1: 2395779            Thread 0: 2054990   1: 2054992
  Total:    4791557                          Total:    4009069

Result matrix: (pmsg)
  Thread 0: 2317014   1: 2317015            Thread 0: 1959099   1: 1959099
  Total:    4634029                          Total:    3918198

Result matrix:
  Thread 0: 2340716   2: 2340716            Thread 0: 1890292   2: 1890293
  Thread 1: 2361052   3: 2361052            Thread 1: 1899031   3: 1899032
  Total:    9403536                          Total:    7578648

Result matrix:
  Thread 0: 1429567   2: 1429567            Thread 0: 1295071   2: 1295071
  Thread 1: 1429267   3: 1429268            Thread 1: 1289253   3: 1289254
  Total:    5717669                          Total:    5168649

Result matrix:
  Thread 0: 2263039   3: 2263039            Thread 0: 1351208   3: 1351209
  Thread 1: 2265120   4: 2265121            Thread 1: 1351300   4: 1351300
  Thread 2: 2263642   5: 2263642            Thread 2: 1319512   5: 1319512
  Total:   13583603                          Total:    8044041

Result matrix:
  Thread 0:  483934   3:  483934            Thread 0:  514766   3:  514767
  Thread 1:  239714   4:  239715            Thread 1:  252764   4:  252765
  Thread 2:  270216   5:  270216            Thread 2:  253216   5:  253217
  Total:    1987729                          Total:    2041495

Result matrix:
  Thread 0: 2260038   4: 2260039            Thread 0:  642235   4:  642236
  Thread 1: 2262748   5: 2262749            Thread 1:  642742   5:  642743
  Thread 2: 2271236   6: 2271237            Thread 2:  640281   6:  640282
  Thread 3: 2257651   7: 2257652            Thread 3:  641931   7:  641931
  Total:   18103350                          Total:    5134381

Result matrix:
  Thread 0:  382811   4:  382811            Thread 0:  342297   4:  342297
  Thread 1:  382801   5:  382802            Thread 1:  342309   5:  342310
  Thread 2:  376620   6:  376621            Thread 2:  343857   6:  343857
  Thread 3:  376559   7:  376559            Thread 3:  343836   7:  343836
  Total:    3037584                          Total:    2744599

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc
  2008-03-22 19:35 ` Scalability requirements for sysv ipc Mike Galbraith
@ 2008-03-23 6:38 ` Manfred Spraul
  2008-03-23 7:15 ` Mike Galbraith
  2008-03-23 7:08 ` Mike Galbraith
  1 sibling, 1 reply; 27+ messages in thread
From: Manfred Spraul @ 2008-03-23 6:38 UTC (permalink / raw)
To: Mike Galbraith
Cc: paulmck, Nadia Derbey, Linux Kernel Mailing List, Andrew Morton

Mike Galbraith wrote:
> On Sat, 2008-03-22 at 15:22 +0100, Manfred Spraul wrote:
>
>> Mike Galbraith wrote:
>>
>>> Total: 4691827                             Total: 3942000
>>>
>>>
>> Thanks. Unfortunately the test was buggy, it bound the tasks to the
>> wrong cpu :-(
>> Could you run it again? Actually 1 cpu and 4 cpus are probably enough.
>>
>
> Sure.  (ran as before, hopefully no transcription errors)
>
>
Thanks:

sysv sem:
- 2.6.22 had almost linear scaling (up to 4 cores).
- 2.6.24.3 scales to 2 cpus, then it collapses. With 4 cores, it's 75%
  slower than 2.6.22.

sysv msg:
- neither 2.6.22 nor 2.6.24 scale very well. That's more or less
  expected, the message queue code contains a few global statistic
  counters (msg_hdrs, msg_bytes).

The cleanup of sysv is nice, but IMHO sysv sem should remain scalable -
and a global semaphore with IDR can't be as scalable as the RCU
protected array that was used before.

--
    Manfred

^ permalink raw reply	[flat|nested] 27+ messages in thread
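The global msg_hdrs/msg_bytes statistics are a separate scalability problem
from the idr lock. One conventional way to keep such counters from bouncing a
cacheline between cores is a per-cpu counter that is only folded together when
the value is actually read (e.g. for msgctl(IPC_INFO)). The fragment below is
a rough sketch of that technique using the kernel's percpu_counter API, purely
for illustration - it is not something that was proposed in this thread, and
the init signature has varied across kernel versions (a gfp argument was added
later).

	#include <linux/percpu_counter.h>

	static struct percpu_counter msg_bytes_cnt;

	/* once, at initialization time; returns -ENOMEM on failure */
	int msg_counters_init(void)
	{
		return percpu_counter_init(&msg_bytes_cnt, 0);
	}

	/* hot path (msgsnd/msgrcv): usually touches only a per-cpu counter */
	static inline void msg_bytes_add(long delta)
	{
		percpu_counter_add(&msg_bytes_cnt, delta);
	}

	/* slow path (e.g. msgctl(IPC_INFO)): fold the per-cpu contributions */
	s64 msg_bytes_read(void)
	{
		return percpu_counter_sum(&msg_bytes_cnt);
	}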
* Re: Scalability requirements for sysv ipc 2008-03-23 6:38 ` Manfred Spraul @ 2008-03-23 7:15 ` Mike Galbraith 0 siblings, 0 replies; 27+ messages in thread From: Mike Galbraith @ 2008-03-23 7:15 UTC (permalink / raw) To: Manfred Spraul Cc: paulmck, Nadia Derbey, Linux Kernel Mailing List, Andrew Morton On Sun, 2008-03-23 at 07:38 +0100, Manfred Spraul wrote: > Mike Galbraith wrote: > > On Sat, 2008-03-22 at 15:22 +0100, Manfred Spraul wrote: > > > >> Mike Galbraith wrote: > >> > >>> Total: 4691827 Total: 3942000 > >>> > >>> > >> Thanks. Unfortunately the test was buggy, it bound the tasks to the > >> wrong cpu :-( > >> Could you run it again? Actually 1 cpu and 4 cpus are probably enough. > >> > > > > Sure. (ran as before, hopefully no transcription errors) > > > > > Thanks: > sysv sem: > - 2.6.22 had almost linear scaling (up to 4 cores). > - 2.6.24.3 scales to 2 cpus, then it collapses. with 4 cores, it's 75% > slower than 2.6.22. > > sysv msg: > - neither 2.6.22 nor 2.6.24 scale very good. That's more or less > expected, the message queue code contains a few global statistic > counters (msg_hdrs, msg_bytes). Actually, 2.6.22 is fine, and 2.6.24.3 is not, just as sysv sem. I just noticed that pmsg didn't get recompiled last night (fat finger) , and sent a correction. -Mike ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc
  2008-03-22 19:35 ` Scalability requirements for sysv ipc Mike Galbraith
  2008-03-23 6:38 ` Manfred Spraul
@ 2008-03-23 7:08 ` Mike Galbraith
  2008-03-23 7:20 ` Mike Galbraith
  1 sibling, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2008-03-23 7:08 UTC (permalink / raw)
To: Manfred Spraul
Cc: paulmck, Nadia Derbey, Linux Kernel Mailing List, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1108 bytes --]

On Sat, 2008-03-22 at 20:35 +0100, Mike Galbraith wrote:
> On Sat, 2008-03-22 at 15:22 +0100, Manfred Spraul wrote:
> > Mike Galbraith wrote:
> > > Total: 4691827                             Total: 3942000
> > >
> > Thanks. Unfortunately the test was buggy, it bound the tasks to the
> > wrong cpu :-(
> > Could you run it again? Actually 1 cpu and 4 cpus are probably enough.
>
> Sure.  (ran as before, hopefully no transcription errors)

Looking at the output over morning java, I noticed that pmsg didn't get
recompiled due to a fat finger, so those numbers are bogus.  Corrected
condensed version of output is below, charted data attached.

(hope evolution doesn't turn this into something other than plain text)

                                1         2         3         4
2.6.22.18-cfs-v24.1 psem   4791557   9403536  13583603  18103350
2.6.22.18-cfs-v24.1 pmsg   4906249   9171440  13264752  17774106
2.6.24.3            psem   4009069   7578648   8044041   5134381
2.6.24.3            pmsg   3917588   7290206   7644794   4824967

[-- Attachment #2: xxxx.pdf --]
[-- Type: application/pdf, Size: 16243 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc 2008-03-23 7:08 ` Mike Galbraith @ 2008-03-23 7:20 ` Mike Galbraith 0 siblings, 0 replies; 27+ messages in thread From: Mike Galbraith @ 2008-03-23 7:20 UTC (permalink / raw) To: Manfred Spraul Cc: paulmck, Nadia Derbey, Linux Kernel Mailing List, Andrew Morton On Sun, 2008-03-23 at 08:08 +0100, Mike Galbraith wrote: > On Sat, 2008-03-22 at 20:35 +0100, Mike Galbraith wrote: > > On Sat, 2008-03-22 at 15:22 +0100, Manfred Spraul wrote: > > > Mike Galbraith wrote: > > > > Total: 4691827 Total: 3942000 > > > > > > > Thanks. Unfortunately the test was buggy, it bound the tasks to the > > > wrong cpu :-( > > > Could you run it again? Actually 1 cpu and 4 cpus are probably enough. > > > > Sure. (ran as before, hopefully no transcription errors) > > Looking at the output over morning java, I noticed that pmsg didn't get > recompiled due to a fat finger, so those numbers are bogus. Corrected > condensed version of output is below, charted data attached. > > (hope evolution doesn't turn this into something other than plain text) Pff, I'd rather have had the bounce. Good thing I attached the damn chart, evolution can't screw that up. > > > > 1 > 2 > 3 > 4 > 2.6.22.18-cfs-v24.1 psem > 4791557 > 9403536 > 13583603 > 18103350 > 2.6.22.18-cfs-v24.1 pmsg > 4906249 > 9171440 > 13264752 > 17774106 > 2.6.24.3 psem > 4009069 > 7578648 > 8044041 > 5134381 > 2.6.24.3 pmsg > 3917588 > 7290206 > 7644794 > 4824967 > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc 2008-03-22 5:43 ` Mike Galbraith 2008-03-22 10:10 ` Manfred Spraul @ 2008-03-27 22:29 ` Bill Davidsen 2008-03-28 9:49 ` Manfred Spraul 1 sibling, 1 reply; 27+ messages in thread From: Bill Davidsen @ 2008-03-27 22:29 UTC (permalink / raw) To: Mike Galbraith Cc: Manfred Spraul, paulmck, Nadia Derbey, Linux Kernel Mailing List, Andrew Morton Mike Galbraith wrote: > On Fri, 2008-03-21 at 17:08 +0100, Manfred Spraul wrote: >> Paul E. McKenney wrote: >>> I could give it a spin -- though I would need to be pointed to the >>> patch and the test. >>> >>> >> I'd just compare a recent kernel with something older, pre Fri Oct 19 >> 11:53:44 2007 >> >> Then download ctxbench, run one instance on each core, bound with taskset. >> http://www.tmr.com/%7Epublic/source/ >> (I don't juse ctxbench myself, if it doesn't work then I could post my >> own app. It would be i386 only with RDTSCs inside) > > (test gizmos are always welcome) > > Results for Q6600 box don't look particularly wonderful. > > taskset -c 3 ./ctx -s > > 2.6.24.3 > 3766962 itterations in 9.999845 seconds = 376734/sec > > 2.6.22.18-cfs-v24.1 > 4375920 itterations in 10.006199 seconds = 437330/sec > > for i in 0 1 2 3; do taskset -c $i ./ctx -s& done > > 2.6.22.18-cfs-v24.1 > 4355784 itterations in 10.005670 seconds = 435361/sec > 4396033 itterations in 10.005686 seconds = 439384/sec > 4390027 itterations in 10.006511 seconds = 438739/sec > 4383906 itterations in 10.006834 seconds = 438128/sec > > 2.6.24.3 > 1269937 itterations in 9.999757 seconds = 127006/sec > 1266723 itterations in 9.999663 seconds = 126685/sec > 1267293 itterations in 9.999348 seconds = 126742/sec > 1265793 itterations in 9.999766 seconds = 126592/sec > Glad to see that ctxbench is still useful, I think there's a more recent version I haven't put up, which uses threads rather than processes, but there were similar values generated, so I somewhat lost interest. There was a "round robin" feature to pass the token through more processes, again I didn't find more use for the data. I never tried binding the process to a CPU, in general the affinity code puts one process per CPU under light load, and limits the context switch overhead. It looks as if you are testing only the single CPU (or core) case. -- Bill Davidsen <davidsen@tmr.com> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Scalability requirements for sysv ipc 2008-03-27 22:29 ` Bill Davidsen @ 2008-03-28 9:49 ` Manfred Spraul 0 siblings, 0 replies; 27+ messages in thread From: Manfred Spraul @ 2008-03-28 9:49 UTC (permalink / raw) To: Bill Davidsen Cc: Mike Galbraith, paulmck, Nadia Derbey, Linux Kernel Mailing List, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 1179 bytes --] Bill Davidsen wrote: > > I never tried binding the process to a CPU, in general the affinity > code puts one process per CPU under light load, and limits the context > switch overhead. It looks as if you are testing only the single CPU > (or core) case. > Attached is a patch that I wrote that adds cpu binding. Feel free to add it to your sources. It's not that usefull, recent linux distros include a "taskset" command that can bind a task to a given cpu. I needed it for an older distro. With regards to the multi-core case: I've always ignored them, I couldn't find a good/realistic test case. Thundering herds (i.e.: one task wakes up lots of waiting tasks) is at least for sysv msg and sysv sem lockless: the woken up tasks do not take any locks, they return immediately to user space. Additionally, I don't know if the test case is realistic: at least postgres uses one semaphore for each process/thread, thus waking up multiple tasks never happens. Another case would be to bind both tasks to different cpus. I'm not sure if this happens in real life. Anyone around who knows how other databases implement locking? Is sysv sem still used? -- Manfred [-- Attachment #2: patch-cpubind --] [-- Type: text/plain, Size: 4313 bytes --] diff -ur ctxbench-1.9.orig/ctxbench.c ctxbench-1.9/ctxbench.c --- ctxbench-1.9.orig/ctxbench.c 2002-12-09 22:41:59.000000000 +0100 +++ ctxbench-1.9/ctxbench.c 2008-03-28 10:30:55.000000000 +0100 @@ -1,19 +1,28 @@ +#include <sched.h> #include <time.h> #include <errno.h> #include <stdio.h> #include <signal.h> #include <unistd.h> -#include <sched.h> #include <sys/types.h> #include <sys/time.h> #include <sys/shm.h> #include <sys/sem.h> #include <sys/msg.h> #include <sys/stat.h> +#include <stdlib.h> +#include <sys/types.h> +#include <sys/wait.h> /* this should be in unistd.h!! 
*/ /* #include <getopt.h> */ +/**************** Prototypes */ + +void shmchild(int shm, int semid); +void shmparent(int shm, int semid, pid_t child); +void do_cpubind(int cpu); + /**************** General internal procs and flags here */ /* help/usage */ static void usage(void); @@ -25,7 +34,6 @@ int Niter = 0; /* Use signals rather than semiphores */ static void sig_NOP(); -static void wait_sig(); int OkayToRun = 0; int ParentPID, ChildPID; /* pipe vectors for -p option */ @@ -79,19 +87,20 @@ int msgqid; int do_yield = 0; -\f -main(int argc, char *argv[]) + +int main(int argc, char *argv[]) { int shm; struct shmid_ds buf; int semid = -1; - int child, stat; + int cpubind = -1; + int child; int RunTime = 10; union semun pvt_semun; pvt_semun.val = 0; - while ((shm = getopt(argc, argv, "sSLYmpn:t:")) != EOF) { + while ((shm = getopt(argc, argv, "sSLYmpn:t:c:")) != EOF) { switch (shm) { /* these are IPC types */ case 's': /* use semiphore */ @@ -124,11 +133,14 @@ case 't': /* give time to run */ RunTime = atoi(optarg); break; + case 'c': /* bind to a specific cpu */ + cpubind = atoi(optarg); + break; default: /* typo */ usage(); } } -\f + signal(SIGALRM, timeout); if (RunTime) alarm(RunTime); @@ -164,7 +176,7 @@ } /* identify version and method */ - printf("\n\nContext switching benchmark v1.17\n"); + printf("\n\nContext switching benchmark v1.17-cpubind\n"); printf(" Using %s for IPC control\n", IPCname[IPCtype]); printf(" Max iterations: %8d (zero = no limit)\n", Iterations); @@ -174,13 +186,14 @@ ParentPID = getpid(); if ((child = fork()) == 0) { + do_cpubind(cpubind); ChildPID = getpid(); shmchild(shm, semid); } else { + do_cpubind(cpubind); ChildPID = child; shmparent(shm, semid, child); } - wait(NULL); if (shmctl(shm, IPC_RMID, &buf) != 0) { perror("Error removing shared memory"); @@ -215,14 +228,13 @@ break; } - exit(0); + return 0; } -\f /*******************************/ /* child using IPC method */ -int shmchild(int shm, int semid) +void shmchild(int shm, int semid) { volatile char *mem; int num = 0; @@ -313,7 +325,7 @@ /********************************/ /* parent using shared memory */ -int shmparent(int shm, int semid, pid_t child) +void shmparent(int shm, int semid, pid_t child) { volatile char *mem; int num = 0; @@ -328,7 +340,7 @@ if (!(mem = shmat(shm, 0, 0))) { - perror("shmchild: Error attaching shared memory"); + perror("shmparent: Error attaching shared memory"); exit(2); } @@ -439,7 +451,7 @@ exit(3); } } -\f + /***************************************************************** | usage - give the user a clue ****************************************************************/ @@ -458,6 +470,7 @@ " -p use pipes for IPC\n" " -L spinLock in shared memory\n" " -Y spinlock with sched_yield (for UP)\n" + " -cN bind to cpu N\n" "\nRun limit options:\n" " -nN limit loops to N (default via timeout)\n" " -tN run for N sec, default 10\n\n" @@ -490,3 +503,22 @@ signal(SIGUSR1, sig_NOP); return; } + +/***************************************************************** + | cpu_bind - bind all tasks to a given cpu + ****************************************************************/ + +void do_cpubind(int cpubind) +{ + if (cpubind >= 0) { + cpu_set_t d; + int ret; + + CPU_ZERO(&d); + CPU_SET(cpubind, &d); + ret = sched_setaffinity(0, sizeof(d), &d); + printf("%d: sched_setaffinity %d: %lxh\n",getpid(), ret, *((int*)&d)); + ret = sched_getaffinity(0, sizeof(d), &d); + printf("%d: sched_getaffinity %d: %lxh\n",getpid(), ret, *((int*)&d)); + } +} ^ permalink raw reply [flat|nested] 27+ messages in 
thread
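For reference, the core of what these micro-benchmarks do can be reproduced in
a few lines of userspace code: bind each task to a cpu with sched_setaffinity()
(the same call the patch above adds to ctxbench, and what the taskset command
does externally), then ping-pong a message on a sysv queue. The program below
is only an illustration of the method, not the pmsg/psem source that produced
the numbers in this thread, and error handling is kept minimal.

	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/ipc.h>
	#include <sys/msg.h>
	#include <sys/wait.h>

	struct pp_msg { long mtype; char mtext[1]; };

	static void bind_cpu(int cpu)
	{
		cpu_set_t set;

		CPU_ZERO(&set);
		CPU_SET(cpu, &set);
		if (sched_setaffinity(0, sizeof(set), &set))
			perror("sched_setaffinity");
	}

	int main(int argc, char **argv)
	{
		int cpu = argc > 1 ? atoi(argv[1]) : 0;
		int iters = argc > 2 ? atoi(argv[2]) : 100000;
		int q = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
		struct pp_msg m;
		int i;

		if (q < 0) {
			perror("msgget");
			return 1;
		}

		if (fork() == 0) {
			/* child: receive type 1, answer with type 2 */
			bind_cpu(cpu);
			for (i = 0; i < iters; i++) {
				msgrcv(q, &m, sizeof(m.mtext), 1, 0);
				m.mtype = 2;
				msgsnd(q, &m, sizeof(m.mtext), 0);
			}
			return 0;
		}

		/* parent: send type 1, wait for type 2; both tasks on one cpu */
		bind_cpu(cpu);
		for (i = 0; i < iters; i++) {
			m.mtype = 1;
			msgsnd(q, &m, sizeof(m.mtext), 0);
			msgrcv(q, &m, sizeof(m.mtext), 2, 0);
		}

		wait(NULL);
		msgctl(q, IPC_RMID, NULL);
		printf("done: %d round trips on cpu %d\n", iters, cpu);
		return 0;
	}

Running one copy per core (for example "for i in 0 1 2 3; do ./a.out $i & done",
where ./a.out is whatever name the compiled sketch is given) mimics the
one-bound-pair-per-cpu scaling test discussed above.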
* Re: Scalability requirements for sysv ipc
  2008-03-21 13:33 ` Scalability requirements for sysv ipc Manfred Spraul
  2008-03-21 14:13 ` Paul E. McKenney
@ 2008-03-25 16:00 ` Nadia Derbey
  1 sibling, 0 replies; 27+ messages in thread
From: Nadia Derbey @ 2008-03-25 16:00 UTC (permalink / raw)
To: Manfred Spraul; +Cc: Linux Kernel Mailing List, Andrew Morton, Paul E. McKenney

Manfred Spraul wrote:
> Nadia Derbey wrote:
>
>> Manfred Spraul wrote:
>>
>>>
>>> A microbenchmark on a single-cpu system doesn't help much (except
>>> that 2.6.25 is around factor 2 slower for sysv msg ping-pong between
>>> two tasks compared to the numbers I remember from older kernels....)
>>>
>>
>> If I remember well, at that time I had used ctxbench and I wrote some
>> other small scripts.
>> And the results I had were around 2 or 3% slowdown, but I have to
>> confirm that by checking in my archives.
>>
> Do you have access to multi-core systems? The "best case" for the rcu
> code would be
> - 8 or 16 cores
> - one instance of ctxbench running on each core, bound to that core.
>
> I'd expect a significant slowdown. The big question is if it matters.
>
> --
>    Manfred
>
>

Hi,

Here is what I could find on my side:

=============================================================
lkernel@akt$ cat tst3/res_new/output
[root@akt tests]# echo 32768 > /proc/sys/kernel/msgmni
[root@akt tests]# ./msgbench_std_dev_plot -n
32768000 msgget iterations in 21.469724 seconds = 1526294/sec
32768000 msgsnd iterations in 18.891328 seconds = 1734583/sec
32768000 msgctl(ipc_stat) iterations in 15.359802 seconds = 2133472/sec
32768000 msgctl(msg_stat) iterations in 15.296114 seconds = 2142260/sec
32768000 msgctl(ipc_rmid) iterations in 32.981277 seconds = 993542/sec

            AVERAGE          STD_DEV       MIN      MAX
GET:        21469.724000     566.024657    19880    23607
SEND:       18891.328000     515.542311    18433    21962
IPC_STAT:   15359.802000     274.918673    15147    17166
MSG_STAT:   15296.114000     155.775508    15138    16790
RM:         32981.277000     675.621060    32141    35433

lkernel@akt$ cat tst3/res_ref/output
[root@akt tests]# echo 32768 > /proc/sys/kernel/msgmni
[root@akt tests]# ./msgbench_std_dev_plot -r
32768000 msgget iterations in 665.842852 seconds = 49213/sec
32768000 msgsnd iterations in 18.363853 seconds = 1784458/sec
32768000 msgctl(ipc_stat) iterations in 14.609669 seconds = 2243001/sec
32768000 msgctl(msg_stat) iterations in 14.774829 seconds = 2217950/sec
32768000 msgctl(ipc_rmid) iterations in 31.134984 seconds = 1052483/sec

            AVERAGE          STD_DEV       MIN      MAX
GET:        665842.852000    946.697555    654049   672208
SEND:       18363.853000     107.514954    18295    19563
IPC_STAT:   14609.669000     43.100272     14529    14881
MSG_STAT:   14774.829000     97.174924     14516    15436
RM:         31134.984000     444.612055    30521    33523
==================================================================

Unfortunately, I haven't kept the exact kernel release numbers, but the
testing method was:
res_ref = unpatched kernel
res_new = same kernel release with my patches applied.

What I'll try to do is to re-run your tests (pmsg and psem) with this
method (from what I saw, the patches applied on a 2.6.23-rc4-mm1), but
I can't do it before Thursday.

Regards,
Nadia

^ permalink raw reply	[flat|nested] 27+ messages in thread
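The msgget/msgctl rates above come from looping over a large number of queues
after raising /proc/sys/kernel/msgmni. The program below is a minimal
illustration of that kind of loop - it is not Nadia's msgbench_std_dev_plot -
which creates, stats and removes 32768 private queues and prints the
per-phase rates.

	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>
	#include <sys/ipc.h>
	#include <sys/msg.h>

	#define NQUEUES 32768

	static double elapsed(struct timespec *t0, struct timespec *t1)
	{
		return (t1->tv_sec - t0->tv_sec) + (t1->tv_nsec - t0->tv_nsec) / 1e9;
	}

	int main(void)
	{
		static int id[NQUEUES];
		struct msqid_ds ds;
		struct timespec t0, t1;
		double sec;
		int i;

		/* create NQUEUES queues; requires kernel.msgmni >= NQUEUES */
		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (i = 0; i < NQUEUES; i++) {
			id[i] = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
			if (id[i] < 0) {
				perror("msgget");
				return 1;
			}
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);
		sec = elapsed(&t0, &t1);
		printf("%d msgget calls in %.3f s = %.0f/sec\n", NQUEUES, sec, NQUEUES / sec);

		/* stat every queue */
		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (i = 0; i < NQUEUES; i++)
			msgctl(id[i], IPC_STAT, &ds);
		clock_gettime(CLOCK_MONOTONIC, &t1);
		sec = elapsed(&t0, &t1);
		printf("%d IPC_STAT calls in %.3f s = %.0f/sec\n", NQUEUES, sec, NQUEUES / sec);

		/* remove them again */
		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (i = 0; i < NQUEUES; i++)
			msgctl(id[i], IPC_RMID, NULL);
		clock_gettime(CLOCK_MONOTONIC, &t1);
		sec = elapsed(&t0, &t1);
		printf("%d IPC_RMID calls in %.3f s = %.0f/sec\n", NQUEUES, sec, NQUEUES / sec);

		return 0;
	}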
end of thread, other threads: [~2008-04-04 15:04 UTC | newest]

Thread overview: 27+ messages
2008-03-21  9:41 Scalability requirements for sysv ipc (was: ipc: store ipcs into IDRs) Manfred Spraul
2008-03-21 12:45 ` Nadia Derbey
2008-03-21 13:33 ` Scalability requirements for sysv ipc Manfred Spraul
2008-03-21 14:13 ` Paul E. McKenney
2008-03-21 16:08 ` Manfred Spraul
2008-03-22  5:43 ` Mike Galbraith
2008-03-22 10:10 ` Manfred Spraul
2008-03-22 11:53 ` Mike Galbraith
2008-03-22 14:22 ` Manfred Spraul
2008-03-22 19:08 ` Manfred Spraul
2008-03-25 15:50 ` Mike Galbraith
2008-03-25 16:13 ` Peter Zijlstra
2008-03-25 18:31 ` Mike Galbraith
2008-03-26  6:18 ` Mike Galbraith
2008-03-30 14:12 ` Scalability requirements for sysv ipc (+namespaces broken with SEM_UNDO) Manfred Spraul
2008-03-30 15:21 ` David Newall
2008-03-30 17:18 ` Mike Galbraith
2008-04-04 14:59 ` Nadia Derbey
2008-04-04 15:03 ` Nadia Derbey
2008-03-22 19:35 ` Scalability requirements for sysv ipc Mike Galbraith
2008-03-23  6:38 ` Manfred Spraul
2008-03-23  7:15 ` Mike Galbraith
2008-03-23  7:08 ` Mike Galbraith
2008-03-23  7:20 ` Mike Galbraith
2008-03-27 22:29 ` Bill Davidsen
2008-03-28  9:49 ` Manfred Spraul
2008-03-25 16:00 ` Nadia Derbey