* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Andrew Morton @ 2009-01-28 20:56 UTC
To: Peter Zijlstra, Mike Galbraith, Gregory Haskins, thomas.pi
Cc: bugme-daemon, linux-kernel

(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 28 Jan 2009 06:35:20 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=12562
>
>            Summary: High overhead while switching or synchronizing threads
>                     on different cores

Thanks for the report, and the testcase.

>            Product: Process Management
>            Version: 2.5
>      KernelVersion: 2.6.28
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Scheduler
>         AssignedTo: mingo@elte.hu
>         ReportedBy: thomas.pi@arcor.de

(There's testcase code in the bugzilla report)

(Seems to be a regression)

> Hardware Environment: Core2Duo 2.4GHz / 4GB RAM
> Software Environment: Ubuntu 8.10 + Vanilla 2.6.28
>
> Hardware Environment: AMD64 X2 2.1GHz / 6GB RAM
> Software Environment: Ubuntu 8.10 + Vanilla 2.6.28.2
>
> Problem Description:
> The overhead on a dual core while switching between tasks is extremely high
> (>60% of cputime). It is produced by synchronization with pthread and
> mutex/cond.
>
> Executing the attached program as "schedulingissue 1 1024 8 20" creates a
> producer and a consumer thread with eight 8KB buffers. The producer fills a
> buffer with 1024 randomly generated double values; the consumer does the
> same after receiving the buffer.
>
> While executing one instance of the program the throughput is ~1.6 msg/s.
> While executing two instances, the throughput is much higher (2 * 8.7 msg/s
> = 17.4 msg/s).
>
> Small improvement while using jiffies as clocksource instead of acpi_pm or
> hpet (1.8 msg/s instead of 1.6). Disabling NO_HZ and HIGH_RESOLUTION_TIME
> gives no improvement. Much higher performance with kernel <= 2.6.24, but
> still four times slower.

Unclear. What is four times slower than what? You're saying that the
app progresses four times faster when there are two instances of it
running, rather than one instance?

> ---------------------------------------
> Linux bugs-laptop 2.6.28-hz-hrt #4 SMP Wed Jan 28 13:33:18 CET 2009 x86_64 GNU/Linux
> acpi_pm (same with hpet)
> schedulerissue 1 1024 8 20
> All threads finished: 20 messages in 12.295 seconds / 1.627 msg/s
> schedulerissue 1 1024 8 200 & schedulerissue 1 1024 8 200
> All threads finished: 200 messages in 22.882 seconds / 8.741 msg/s
> All threads finished: 200 messages in 22.934 seconds / 8.721 msg/s
> ---------------------------------------
> Linux bugs-laptop 2.6.28-hz-hrt #4 SMP Wed Jan 28 13:33:18 CET 2009 x86_64 GNU/Linux
> jiffies
> schedulerissue 1 1024 8 20
> All threads finished: 20 messages in 10.704 seconds / 1.868 msg/s
> schedulerissue 1 1024 8 200 & schedulerissue 1 1024 8 200
> All threads finished: 200 messages in 23.372 seconds / 8.557 msg/s
> All threads finished: 200 messages in 23.460 seconds / 8.525 msg/s
> --------------------------------------
> Linux bugs-laptop 2.6.24.7 #1 SMP Wed Jan 14 10:21:04 CET 2009 x86_64 GNU/Linux
> hpet
> schedulerissue 1 1024 8 20
> All threads finished: 20 messages in 5.290 seconds / 3.781 msg/s
> schedulerissue 1 1024 8 200 & schedulerissue 1 1024 8 200
> All threads finished: 200 messages in 23.000 seconds / 8.695 msg/s
> All threads finished: 200 messages in 23.078 seconds / 8.666 msg/s

Seems that 2.6.24 is faster than 2.6.28 with 20 messages, but 2.6.24
and 2.6.28 run at the same speed when 200 messages are sent?

If so, that seems rather odd, doesn't it? Is it possible that cpufreq
does something bad once the CPU gets hot?

> AMD64 X2 @ 2.1GHz
> Linux bugs-desktop 2.6.28.2 #4 SMP Mon Jan 26 20:26:12 CET 2009 x86_64 GNU/Linux
> acpi_pm
> schedulerissue 1 1024 8 20
> All threads finished: 20 messages in 9.288 seconds / 2.153 msg/s
> schedulerissue 1 1024 8 200
> All threads finished: 200 messages in 17.049 seconds / 11.731 msg/s
> All threads finished: 200 messages in 18.539 seconds / 10.788 msg/s
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Peter Zijlstra @ 2009-01-28 22:15 UTC
To: Andrew Morton
Cc: Mike Galbraith, Gregory Haskins, thomas.pi, bugme-daemon, linux-kernel

On Wed, 2009-01-28 at 12:56 -0800, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Wed, 28 Jan 2009 06:35:20 -0800 (PST)
> bugme-daemon@bugzilla.kernel.org wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=12562
> >
> >            Summary: High overhead while switching or synchronizing threads
> >                     on different cores
>
> Thanks for the report, and the testcase.
>
> >            Product: Process Management
> >            Version: 2.5
> >      KernelVersion: 2.6.28
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Scheduler
> >         AssignedTo: mingo@elte.hu
> >         ReportedBy: thomas.pi@arcor.de
>
> (There's testcase code in the bugzilla report)
>
> (Seems to be a regression)

Is there a known good kernel?

> > Hardware Environment: Core2Duo 2.4GHz / 4GB RAM
> > Software Environment: Ubuntu 8.10 + Vanilla 2.6.28
> >
> > Hardware Environment: AMD64 X2 2.1GHz / 6GB RAM
> > Software Environment: Ubuntu 8.10 + Vanilla 2.6.28.2
> >
> > Problem Description:
> > The overhead on a dual core while switching between tasks is extremely high
> > (>60% of cputime). It is produced by synchronization with pthread and
> > mutex/cond.
> >
> > Executing the attached program as "schedulingissue 1 1024 8 20" creates a
> > producer and a consumer thread with eight 8KB buffers. The producer fills a
> > buffer with 1024 randomly generated double values; the consumer does the
> > same after receiving the buffer.
> >
> > While executing one instance of the program the throughput is ~1.6 msg/s.
> > While executing two instances, the throughput is much higher (2 * 8.7 msg/s
> > = 17.4 msg/s).
> >
> > Small improvement while using jiffies as clocksource instead of acpi_pm or
> > hpet (1.8 msg/s instead of 1.6). Disabling NO_HZ and HIGH_RESOLUTION_TIME
> > gives no improvement. Much higher performance with kernel <= 2.6.24, but
> > still four times slower.
>
> Unclear. What is four times slower than what? You're saying that the
> app progresses four times faster when there are two instances of it
> running, rather than one instance?

It seems that way indeed, a bit more clarity would be good though.

> > ---------------------------------------
> > Linux bugs-laptop 2.6.28-hz-hrt #4 SMP Wed Jan 28 13:33:18 CET 2009 x86_64 GNU/Linux
> > acpi_pm (same with hpet)
> > schedulerissue 1 1024 8 20
> > All threads finished: 20 messages in 12.295 seconds / 1.627 msg/s
> > schedulerissue 1 1024 8 200 & schedulerissue 1 1024 8 200
> > All threads finished: 200 messages in 22.882 seconds / 8.741 msg/s
> > All threads finished: 200 messages in 22.934 seconds / 8.721 msg/s
> > ---------------------------------------
> > Linux bugs-laptop 2.6.28-hz-hrt #4 SMP Wed Jan 28 13:33:18 CET 2009 x86_64 GNU/Linux
> > jiffies
> > schedulerissue 1 1024 8 20
> > All threads finished: 20 messages in 10.704 seconds / 1.868 msg/s
> > schedulerissue 1 1024 8 200 & schedulerissue 1 1024 8 200
> > All threads finished: 200 messages in 23.372 seconds / 8.557 msg/s
> > All threads finished: 200 messages in 23.460 seconds / 8.525 msg/s
> > --------------------------------------
> > Linux bugs-laptop 2.6.24.7 #1 SMP Wed Jan 14 10:21:04 CET 2009 x86_64 GNU/Linux
> > hpet
> > schedulerissue 1 1024 8 20
> > All threads finished: 20 messages in 5.290 seconds / 3.781 msg/s
> > schedulerissue 1 1024 8 200 & schedulerissue 1 1024 8 200
> > All threads finished: 200 messages in 23.000 seconds / 8.695 msg/s
> > All threads finished: 200 messages in 23.078 seconds / 8.666 msg/s
>
> Seems that 2.6.24 is faster than 2.6.28 with 20 messages, but 2.6.24
> and 2.6.28 run at the same speed when 200 messages are sent?
>
> If so, that seems rather odd, doesn't it? Is it possible that cpufreq
> does something bad once the CPU gets hot?

Nah, I'll bet it's a cache affinity issue. Some applications like strong
wakeup affinity, others not so much. This one looks to be a lover.

With a single instance, the producer and consumer get scheduled on two
different cores for some reason (maybe wake-idle is too strong). With two
instances, they get to stay on the same cpu, since the other cpu is
already busy.

I'll start up the browser in the morning to download this proglet and
poke at it some, but sleep comes first.
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Thomas Pilarski @ 2009-01-28 22:25 UTC
To: Andrew Morton
Cc: Peter Zijlstra, Mike Galbraith, Gregory Haskins, bugme-daemon, linux-kernel

On Wednesday, 2009-01-28 at 12:56 -0800, Andrew Morton wrote:

> (There's testcase code in the bugzilla report)
>
> (Seems to be a regression)

There is a regression because of the improved cpu switching, but the
problem exists in every kernel. It takes a lot of time to switch between
the threads when they are executed on different cores. Perhaps because of
the big buffer size of 512KB?

> > Small improvement while using jiffies as clocksource instead of acpi_pm or
> > hpet (1.8 msg/s instead of 1.6). Disabling NO_HZ and HIGH_RESOLUTION_TIME
> > gives no improvement. Much higher performance with kernel <= 2.6.24, but
> > still four times slower.
>
> Unclear. What is four times slower than what? You're saying that the
> app progresses four times faster when there are two instances of it
> running, rather than one instance?

About 4 messages per second while executing only one instance, and about
8 messages per second per instance while executing two instances of the
test. It makes 16 messages per second when the two threads of an instance
are executed on only one core.

> Seems that 2.6.24 is faster than 2.6.28 with 20 messages, but 2.6.24
> and 2.6.28 run at the same speed when 200 messages are sent?

I have executed the test twenty times. It stays constant on 2.6.28. On
2.6.24 one of ten tests is executed slower.

******* kernel 2.6.28:
All threads finished: 20 messages in 12.853 seconds / 1.556 msg/s

real    0m12.857s
user    0m8.589s
sys     0m16.629s

******* kernel 2.6.24:
All threads finished: 20 messages in 4.939 seconds / 4.050 msg/s

real    0m4.942s
user    0m5.248s
sys     0m4.352s

One of ten executions goes down to 1.806 msg/s.

All threads finished: 20 messages in 11.074 seconds / 1.806 msg/s

real    0m11.077s
user    0m8.817s
sys     0m12.925s

> If so, that seems rather odd, doesn't it? Is it possible that cpufreq
> does something bad once the CPU gets hot?

I have disabled acpid, clocked the cpu to 2.4GHz and watched the
temperature of the cores and the frequency. The clock always stays at
2.4GHz and the temperature is always below 67°C. My cpu clocks down at
95°C.
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Peter Zijlstra @ 2009-01-29 9:07 UTC
To: Thomas Pilarski
Cc: Andrew Morton, Mike Galbraith, Gregory Haskins, bugme-daemon, linux-kernel

On Wed, 2009-01-28 at 23:25 +0100, Thomas Pilarski wrote:
> On Wednesday, 2009-01-28 at 12:56 -0800, Andrew Morton wrote:
>
> > (There's testcase code in the bugzilla report)
> >
> > (Seems to be a regression)
>
> There is a regression because of the improved cpu switching, but the
> problem exists in every kernel.

This is a contradiction in terms - twice.

If it is a regression, then clearly things haven't improved.

If it is a regression, state clearly when it worked last. If it never
worked, it cannot be a regression.

> It takes a lot of time to switch between the threads when they are
> executed on different cores.
> Perhaps because of the big buffer size of 512KB?

Of course, pushing 512KB to another cpu means lots and lots of cache
misses.
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Thomas Pilarski @ 2009-01-29 10:12 UTC
To: Peter Zijlstra
Cc: Andrew Morton, Mike Galbraith, Gregory Haskins, bugme-daemon, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1893 bytes --]

> > There is a regression because of the improved cpu switching, but the
> > problem exists in every kernel.
>
> This is a contradiction in terms - twice.
>
> If it is a regression, then clearly things haven't improved.
>
> If it is a regression, state clearly when it worked last. If it never
> worked, it cannot be a regression.

There is an improvement in load balancing for single-threaded
applications. It's a regression for my problem. But the problem exists in
every kernel I have tested.

> > It takes a lot of time to switch between the threads when they are
> > executed on different cores.
> > Perhaps because of the big buffer size of 512KB?
>
> Of course, pushing 512KB to another cpu means lots and lots of cache
> misses.

I have tried 2.6.15, 2.6.18 and 2.6.20 too, but same behaviour as in
2.6.24. With Windows I can get 64 messages per second with a buffer size
of 512KB; it is reduced to 16 messages with a buffer size of 1MB. But I
think it is not really comparable, because there is nearly no cpu
consumption with 512KB. Perhaps random() works differently. By increasing
the cpu usage eight times in the producer, I can get 16 msg/s and both
cores are used at ~50%. Doing the same with Linux I get a throughput of
~2 msg/s. If it is a caching issue, shouldn't it exist in Windows too?

Using a smaller buffer of 4KB, the test is executed on one core only.

./schedulerissue 1 4096 8 2000
All threads finished: 2000 messages in 1.631 seconds / 1226.076 msg/s

real    0m1.635s
user    0m1.352s
sys     0m0.052s

But I want to use both cores to increase the performance. Adding a second
producer and a second consumer reduces the performance to 33%. Both cores
are used.

./schedulerissue 2 4096 8 2000
All threads finished: 1999 messages in 4.744 seconds / 421.379 msg/s

real    0m4.748s
user    0m3.280s
sys     0m5.852s

I have added a new version, as there was a possible deadlock during
shut-down.

[-- Attachment #2: ThreadSchedulingIssue.c --]
[-- Type: text/x-csrc, Size: 9410 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <math.h>
#include <signal.h>
#include <unistd.h>
#include <time.h>

int CTHREADPAIRCOUNT;
int CBUFFER_SIZE;
int CBUFFER_COUNT;
int CMESSAGES_COUNT;
int todo_messages;

pthread_mutex_t producer_mutex;
pthread_cond_t producer_cond;
pthread_mutex_t consumer_mutex;
pthread_cond_t consumer_cond;
pthread_mutex_t result_mutex;
pthread_mutex_t todo_mutex;

int message_count = 0;
double start;
double end;
int terminate;
int terminate_producer;

double* buffers;         //[CBUFFER_COUNT][CBUFFER_SIZE];
int freebuffer_count;
int freebuffer_pos;
double** free_buffers;   //[CBUFFER_COUNT];
int filledbuffer_count;
int filledbuffer_pos;
double** filled_buffers; //[CBUFFER_COUNT];

/**
 * Return system uptime in µs
 */
double getSystemTime() {
    struct timespec tv;
    clock_gettime(CLOCK_MONOTONIC, &tv);
    return (double) (tv.tv_sec) * 1000000.0 + (double) (tv.tv_nsec) / 1000.0;
}

/**
 * Get a free buffer; block the thread if no buffer is available.
 */
double* getFreeBuffer() {
    pthread_mutex_lock(&producer_mutex);
    // If there is no free buffer to be filled, wait
    while (freebuffer_count == 0) {
        //printf("wait for free buffer\n");
        /** exit if all messages are finished **/
        if (terminate || terminate_producer) {
            pthread_mutex_unlock(&producer_mutex);
            return NULL;
        }
        pthread_cond_wait(&producer_cond, &producer_mutex);
    }
    usleep(1);
    double* result = free_buffers[freebuffer_pos];
    freebuffer_pos = (freebuffer_pos + 1) % CBUFFER_COUNT;
    freebuffer_count--;
    pthread_mutex_unlock(&producer_mutex);
    return result;
}

/**
 * Return a free buffer and notify the producer
 */
void returnFreeBuffer(double* buff) {
    pthread_mutex_lock(&producer_mutex);
    free_buffers[(freebuffer_pos + freebuffer_count) % CBUFFER_COUNT] = buff;
    freebuffer_count++;
    // Notify waiting producer
    //printf("added free buffer\n");
    pthread_cond_signal(&producer_cond);
    pthread_mutex_unlock(&producer_mutex);
}

/**
 * Add a filled buffer and notify a consumer
 */
void putFilledBuffer(double* buff) {
    pthread_mutex_lock(&consumer_mutex);
    filled_buffers[(filledbuffer_pos + filledbuffer_count) % CBUFFER_COUNT] = buff;
    filledbuffer_count++;
    // Notify waiting consumers
    //printf("added filled buffer\n");
    pthread_cond_signal(&consumer_cond);
    pthread_mutex_unlock(&consumer_mutex);
}

/**
 * Get a filled buffer, or wait until one exists
 */
double* getFilledBuffer() {
    pthread_mutex_lock(&consumer_mutex);
    // If there is no filled buffer, wait until a producer fills a new one
    while (filledbuffer_count == 0) {
        //printf("wait for filled buffer\n");
        /**
         * exit if all messages are finished
         * This can cause the loss of some
         * already produced data.
         **/
        if (terminate || terminate_producer) {
            terminate = 1;
            pthread_mutex_unlock(&consumer_mutex);
            return NULL;
        }
        pthread_cond_wait(&consumer_cond, &consumer_mutex);
    }
    double* result = filled_buffers[filledbuffer_pos];
    filledbuffer_pos = (filledbuffer_pos + 1) % CBUFFER_COUNT;
    filledbuffer_count--;
    pthread_mutex_unlock(&consumer_mutex);
    return result;
}

/**
 * Producer thread. Fills a buffer with random numbers and adds it
 * to the consumer list.
 */
void *thread_producer(void *arg) {
    while (!terminate && !terminate_producer) {
        int i;
        pthread_mutex_lock(&todo_mutex);
        if (todo_messages <= 0) {
            terminate_producer = 1;
            pthread_mutex_unlock(&todo_mutex);
            break;
        }
        todo_messages--;
        pthread_mutex_unlock(&todo_mutex);
        double* cbuff = getFreeBuffer();
        if (cbuff) {
            cbuff[0] = getSystemTime();
            // Slots 0-3 hold timestamps, so payload starts at index 2
            for (i = 2; i < CBUFFER_SIZE; i++) {
                // Fill the buffer with random values in [0,1]
                cbuff[i] =
//                  (double) random() / (double) RAND_MAX *
//                  (double) random() / (double) RAND_MAX *
//                  (double) random() / (double) RAND_MAX *
//                  (double) random() / (double) RAND_MAX *
//                  (double) random() / (double) RAND_MAX *
//                  (double) random() / (double) RAND_MAX *
//                  (double) random() / (double) RAND_MAX *
//                  (double) random() / (double) RAND_MAX *
                    (double) random() / (double) RAND_MAX;
            }
            cbuff[1] = getSystemTime();
            putFilledBuffer(cbuff);
        }
    }
    pthread_exit(NULL);
}

/**
 * Consumer thread. Gets a filled buffer, does some work on it and
 * returns it to the producer list.
 */
void *thread_consumer(void *arg) {
    while (!terminate) {
        int i;
        double* cbuff = getFilledBuffer();
        if (cbuff) {
            cbuff[2] = getSystemTime();
            for (i = 4; i < CBUFFER_SIZE - 1; i++) {
                // Scale each value by a random factor in [0,1]
                cbuff[i] *= (double) random() / (double) RAND_MAX;
            }
            cbuff[3] = getSystemTime();
            pthread_mutex_lock(&result_mutex);
            if ((message_count == 0) || (start > cbuff[0])) {
                start = cbuff[0];
            }
            if ((message_count == 0) || (end < cbuff[3])) {
                end = cbuff[3];
            }
            message_count++;
            pthread_mutex_unlock(&result_mutex);
//          printf("Message runtime Calc:%1.3fms / Sendmessage: %1.3fms / Calc:%1.3fms\n",
//              (cbuff[1] - cbuff[0])/1000.0,
//              (cbuff[2] - cbuff[1])/1000.0,
//              (cbuff[3] - cbuff[2])/1000.0);
            returnFreeBuffer(cbuff);
        }
    }
    pthread_exit(NULL);
}

/**
 * Set terminate flag on SIGINT.
 */
void sig_quit(int a) {
    terminate = 1;
    printf("Terminate calculation\n");
    /*
     * Notify producers, as they may be waiting for
     * free buffers.
     */
    pthread_cond_broadcast(&producer_cond);
}

/**
 * For testing purposes only.
 */
int main(int argc, char *argv[]) {
    terminate = 0;
    terminate_producer = 0;
    if (signal(SIGINT, sig_quit) == SIG_ERR) {
        printf("Could not init quit signal\n");
        return -1;
    }
    if (argc < 5) {
        printf("Need four parameters. Number of thread pairs - message size in doubles - buffer count - overall messages - (show intermediate data interval)\n");
        exit(-1);
    }
    int show_intermediate = 0;
    CTHREADPAIRCOUNT = atoi(argv[1]);
    CBUFFER_SIZE = atoi(argv[2]);
    CBUFFER_COUNT = atoi(argv[3]);
    CMESSAGES_COUNT = atoi(argv[4]);
    if (argc > 5) {
        show_intermediate = atoi(argv[5]);
    }
    if ((CTHREADPAIRCOUNT < 1) || (CTHREADPAIRCOUNT > 256)) {
        printf("Number of thread pairs is limited to 1-256\n");
        exit(-1);
    }
    if ((CBUFFER_SIZE < 8) || (CBUFFER_SIZE > 1048576)) {
        printf("Buffer size is limited to 8-1,048,576\n");
        exit(-1);
    }
    if ((CBUFFER_COUNT < 1) || (CBUFFER_COUNT > CTHREADPAIRCOUNT * 8)) {
        printf("Number of buffers is limited to 1 .. thread pairs * 8\n");
        exit(-1);
    }
    if ((CMESSAGES_COUNT < CTHREADPAIRCOUNT * 2)
            || (CBUFFER_COUNT > CTHREADPAIRCOUNT * 100)) {
        printf("Number of messages is limited to thread pairs * 2 .. thread pairs * 100\n");
        exit(-1);
    }
    if ((show_intermediate < 0) || (show_intermediate > 10)) {
        printf("Intermediate data interval must be in [0-10]\n");
        exit(-1);
    }
    buffers = malloc(CBUFFER_COUNT * CBUFFER_SIZE * sizeof(double));
    free_buffers = malloc(CBUFFER_COUNT * sizeof(double*));
    filled_buffers = malloc(CBUFFER_COUNT * sizeof(double*));
    todo_messages = CMESSAGES_COUNT;
    pthread_mutex_init(&consumer_mutex, NULL);
    pthread_cond_init(&consumer_cond, NULL);
    pthread_mutex_init(&producer_mutex, NULL);
    pthread_cond_init(&producer_cond, NULL);
    pthread_mutex_init(&result_mutex, NULL);
    pthread_mutex_init(&todo_mutex, NULL);
    int i;
    for (i = 0; i < CBUFFER_COUNT; i++) {
        free_buffers[i] = &(buffers[i * CBUFFER_SIZE]);
    }
    freebuffer_count = CBUFFER_COUNT;
    freebuffer_pos = 0;
    filledbuffer_count = 0;
    filledbuffer_pos = 0;
    pthread_t threads[CTHREADPAIRCOUNT * 2];
    for (i = 0; i < CTHREADPAIRCOUNT; i++) {
        if (pthread_create(&threads[i], NULL, thread_producer, NULL)) {
            printf("Could not create producer %d\n", i);
        }
        if (pthread_create(&threads[i + CTHREADPAIRCOUNT], NULL, thread_consumer, NULL)) {
            printf("Could not create consumer %d\n", i);
        }
    }
    double start_overall = -1;
    double end_overall = -1;
    int all_messages = 0;
    if (show_intermediate) {
        while (!terminate) {
            sleep(show_intermediate);
            pthread_mutex_lock(&result_mutex);
            printf("Messages %d - msg/s: %1.3f\n", message_count,
                ((double) message_count) / ((end - start) / 1000000.0));
            if ((start_overall < 0) || (start_overall > start)) {
                start_overall = start;
            }
            if ((end_overall < 0) || (end_overall < end)) {
                end_overall = end;
            }
            //start = getSystemTime();
            all_messages += message_count;
            message_count = 0;
            pthread_mutex_unlock(&result_mutex);
        }
    }
    for (i = 0; i < CTHREADPAIRCOUNT; i++) {
        //printf("Wait for thread %d\n", i);
        pthread_join(threads[i], NULL);
    }
    terminate = 1;
    /**
     * Notify consumers, as they may be waiting for data.
     */
    pthread_cond_broadcast(&consumer_cond);
    for (i = CTHREADPAIRCOUNT; i < CTHREADPAIRCOUNT * 2; i++) {
        //printf("Wait for thread %d\n", i);
        pthread_join(threads[i], NULL);
    }
    if (!show_intermediate) {
        start_overall = start;
        end_overall = end;
        all_messages = message_count;
    }
    printf("All threads finished: %d messages in %1.3f seconds / %1.3f msg/s\n",
        all_messages, (end_overall - start_overall) / 1000000.0,
        (double) all_messages / ((end_overall - start_overall) / 1000000.0));
    pthread_mutex_destroy(&producer_mutex);
    pthread_cond_destroy(&producer_cond);
    pthread_mutex_destroy(&consumer_mutex);
    pthread_cond_destroy(&consumer_cond);
    pthread_mutex_destroy(&result_mutex);
    pthread_mutex_destroy(&todo_mutex);
    free(buffers);
    free(free_buffers);
    free(filled_buffers);
    return EXIT_SUCCESS;
}
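[The thread does not show a build command; the following is the obvious invocation, assuming only that glibc of this era still keeps clock_gettime() in librt.]

gcc -O2 -pthread -o ThreadSchedulingIssue ThreadSchedulingIssue.c -lrt

# 1 producer/consumer pair, 4096-double (32 KB) buffers, 8 buffers,
# 2000 messages -- the invocation used above:
./ThreadSchedulingIssue 1 4096 8 2000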
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Thomas Pilarski @ 2009-01-29 10:24 UTC
To: Peter Zijlstra
Cc: Andrew Morton, Mike Galbraith, Gregory Haskins, bugme-daemon, linux-kernel

Some explanation of the test program.

./schedulerissue 1 4096 8 2000
    1 producer and 1 consumer
    buffer size of 4096 doubles * 8 bytes
    8 buffers (256KB total buffer)
    2000 messages

./schedulerissue 2 4096 8 2000
    2 producers and 2 consumers
    buffer size of 4096 doubles * 8 bytes
    8 buffers (256KB total buffer)
    2000 messages

It was not 512KB in the test before, but 4MB. But there is the same
problem with a total buffer size of 48KB and 4 threads
(./schedulerissue 2 2048 3 20000).
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Peter Zijlstra @ 2009-01-29 10:31 UTC
To: Thomas Pilarski
Cc: Andrew Morton, Mike Galbraith, Gregory Haskins, bugme-daemon, linux-kernel

On Thu, 2009-01-29 at 11:24 +0100, Thomas Pilarski wrote:
> Some explanation of the test program.
>
> ./schedulerissue 1 4096 8 2000
>     1 producer and 1 consumer
>     buffer size of 4096 doubles * 8 bytes
>     8 buffers (256KB total buffer)
>     2000 messages
>
> ./schedulerissue 2 4096 8 2000
>     2 producers and 2 consumers
>     buffer size of 4096 doubles * 8 bytes
>     8 buffers (256KB total buffer)
>     2000 messages
>
> It was not 512KB in the test before, but 4MB. But there is the same
> problem with a total buffer size of 48KB and 4 threads
> (./schedulerissue 2 2048 3 20000).

Right, read the proglet (and removed that usleep(1)) and am poking at it.
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Peter Zijlstra @ 2009-01-29 11:37 UTC
To: Thomas Pilarski
Cc: Andrew Morton, Mike Galbraith, Gregory Haskins, bugme-daemon, linux-kernel

On Thu, 2009-01-29 at 11:24 +0100, Thomas Pilarski wrote:
> Some explanation of the test program.
>
> ./schedulerissue 1 4096 8 2000
>     1 producer and 1 consumer
>     buffer size of 4096 doubles * 8 bytes
>     8 buffers (256KB total buffer)
>     2000 messages
>
> ./schedulerissue 2 4096 8 2000
>     2 producers and 2 consumers
>     buffer size of 4096 doubles * 8 bytes
>     8 buffers (256KB total buffer)
>     2000 messages
>
> It was not 512KB in the test before, but 4MB. But there is the same
> problem with a total buffer size of 48KB and 4 threads
> (./schedulerissue 2 2048 3 20000).

Linux opteron 2.6.29-rc3-tip #61 SMP PREEMPT Thu Jan 29 11:59:15 CET 2009 x86_64 x86_64 x86_64 GNU/Linux

[root@opteron bench]# schedtool -a 1 -e ./ThreadSchedulingIssue 1 4096 8 20000
All threads finished: 19992 messages in 6.485 seconds / 3082.877 msg/s
[root@opteron bench]# ./ThreadSchedulingIssue 1 4096 8 20000
All threads finished: 19992 messages in 6.496 seconds / 3077.604 msg/s
[root@opteron bench]# ./ThreadSchedulingIssue 1 4096 8 20000 & ./ThreadSchedulingIssue 1 4096 8 20000 &
[1] 10314
[2] 10315
[root@opteron bench]# All threads finished: 19992 messages in 6.720 seconds / 2975.009 msg/s
All threads finished: 19992 messages in 6.792 seconds / 2943.574 msg/s
[1]-  Done    ./ThreadSchedulingIssue 1 4096 8 20000
[2]+  Done    ./ThreadSchedulingIssue 1 4096 8 20000
[root@opteron bench]# ./ThreadSchedulingIssue 2 4096 8 20000
All threads finished: 19992 messages in 17.299 seconds / 1155.667 msg/s

[root@opteron bench]# for i in 4 8 16 32 64 128 256 ; do
>   echo -n $((i*1024)) $((80000/i)) " " ;
>   schedtool -a 1 -e ./ThreadSchedulingIssue 1 $((i*1024)) 8 $((80000/i)) ;
> done
4096 20000    All threads finished: 19992 messages in 6.368 seconds / 3139.251 msg/s
8192 10000    All threads finished: 9992 messages in 5.363 seconds / 1863.083 msg/s
16384 5000    All threads finished: 4992 messages in 5.471 seconds / 912.479 msg/s
32768 2500    All threads finished: 2493 messages in 5.730 seconds / 435.059 msg/s
65536 1250    All threads finished: 1242 messages in 5.544 seconds / 224.021 msg/s
131072 625    All threads finished: 617 messages in 5.755 seconds / 107.217 msg/s
262144 312    All threads finished: 305 messages in 6.014 seconds / 50.713 msg/s

[root@opteron bench]# for i in 4 8 16 32 64 128 256 ; do
>   echo -n $((i*1024)) $((80000/i)) " " ;
>   ./ThreadSchedulingIssue 1 $((i*1024)) 8 $((80000/i)) ;
> done
4096 20000    All threads finished: 19992 messages in 6.462 seconds / 3093.717 msg/s
8192 10000    All threads finished: 9992 messages in 8.767 seconds / 1139.738 msg/s
16384 5000    All threads finished: 5000 messages in 5.366 seconds / 931.798 msg/s
32768 2500    All threads finished: 2494 messages in 20.720 seconds / 120.369 msg/s
65536 1250    All threads finished: 1242 messages in 11.521 seconds / 107.805 msg/s
131072 625    All threads finished: 618 messages in 14.035 seconds / 44.032 msg/s
262144 312    All threads finished: 305 messages in 17.342 seconds / 17.587 msg/s

The point above between 16 and 32 is exactly where the total working set
no longer fits into cache -- I suspect that pushes the producer's latency
to go to sleep over the edge and everything collapses.

We use wakeup patterns to determine if two tasks are working together and
should thus be kept together. Task A should wake up B, and B should wake
up A. Furthermore, any task should quickly go to sleep after waking up
the other.

This program does neither. With a single pair, the producer continues
production after waking the consumer (until the queue is filled -- which,
if the consumer is fast enough, might never happen). With multiple pairs
there is no strict pair relation at all, since they all work on the same
global buffer queue, so P1 can wake Cn etc.

Furthermore the program uses shared memory (not a bad design), and thus
misses out on the explicit affinity hints of pipes, sockets, etc.

In short this program is carefully crafted to defeat all our affinity
tests - and I'm not sure what to do.
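[For contrast, a minimal sketch -- ours, not from the thread -- of the wakeup pattern the heuristics described above do reward: two sides ping-ponging over a pair of pipes, where each side wakes the other and immediately blocks, and where the pipe also carries the explicit affinity hint mentioned above.]

/* Ping-pong over two pipes: A wakes B, B wakes A, and each side
 * blocks right after waking the other -- the 1:1 wakeup pattern
 * the scheduler's affinity heuristics are built to detect. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

static int a2b[2], b2a[2];

static void *side_b(void *arg)
{
    char c;
    while (read(a2b[0], &c, 1) == 1) {   /* sleep until A wakes us */
        if (c == 'q')
            break;                        /* A asked us to quit */
        write(b2a[1], &c, 1);             /* wake A, then sleep again */
    }
    return NULL;
}

int main(void)
{
    if (pipe(a2b) || pipe(b2a)) {
        perror("pipe");
        return 1;
    }

    pthread_t b;
    pthread_create(&b, NULL, side_b, NULL);

    char c = 'x';
    int rounds;
    for (rounds = 0; rounds < 10000; rounds++) {
        write(a2b[1], &c, 1);             /* wake B ...              */
        read(b2a[0], &c, 1);              /* ... and immediately sleep */
    }
    c = 'q';
    write(a2b[1], &c, 1);                 /* tell B to exit */
    pthread_join(b, NULL);

    printf("%d round trips\n", rounds);
    return 0;
}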
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Thomas Pilarski @ 2009-01-29 14:05 UTC
To: Peter Zijlstra
Cc: Andrew Morton, Mike Galbraith, Gregory Haskins, bugme-daemon, linux-kernel

> In short this program is carefully crafted to defeat all our affinity
> tests - and I'm not sure what to do.

I am sorry, although it is not carefully crafted. The function random()
is causing my problem. I currently have no real data, so I tried to make
some random utilization and data.

Without the random() function it works even with 80MB of data and I get
great results.

./ThreadSchedulingIssue 1 10485760 8 312
All threads finished: 309 messages in 29.369 seconds / 10.521 msg/s

schedtool -a 1 -e ./ThreadSchedulingIssue 1 10485760 8 312
All threads finished: 312 messages in 44.284 seconds / 7.045 msg/s

It does not even regress with more than two threads.

./ThreadSchedulingIssue 2 10485760 8 312
All threads finished: 311 messages in 28.040 seconds / 11.091 msg/s

./ThreadSchedulingIssue 4 10485760 8 312
All threads finished: 309 messages in 28.021 seconds / 11.027 msg/s

With small amounts of data the speed on two cores is even doubled.

schedtool -a 1 -e ./ThreadSchedulingIssue 1 1048 8 312000
All threads finished: 311992 messages in 19.437 seconds / 16051.247 msg/s

./ThreadSchedulingIssue 3 1048 8 312000
All threads finished: 311998 messages in 9.652 seconds / 32324.411 msg/s

./ThreadSchedulingIssue 8 1048 8 312000
All threads finished: 311997 messages in 9.339 seconds / 33406.370 msg/s

--------------

Perhaps it is as it should be, but when I run the test (without random())
with 2*8 threads, it uses ~186% of the cpu, while an instance of
"bzip2 -9 -c /dev/urandom >/dev/null" gets only 12%.
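[One plausible reading of the random() observation above -- an assumption on our part, not stated in the thread -- is that glibc's random() serializes all threads on a process-wide internal lock. The usual workaround is per-thread PRNG state via rand_r(); a sketch:]

/* Each worker owns its PRNG state, so no shared lock is taken and
 * the threads never contend on a futex inside the C library. */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define N 1000000

static void *worker(void *arg)
{
    unsigned int seed = (unsigned int)(unsigned long)arg; /* per-thread state */
    double sum = 0.0;
    int i;
    for (i = 0; i < N; i++)
        sum += (double)rand_r(&seed) / (double)RAND_MAX;  /* lock-free draw */
    double mean = sum / N;
    printf("mean %.3f\n", mean);
    /* Uniform [0,1] draws should average very close to 0.5. */
    return (void *)(unsigned long)(mean > 0.45 && mean < 0.55);
}

int main(void)
{
    pthread_t t[2];
    void *r0, *r1;
    long i;
    for (i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, (void *)(i + 1));
    pthread_join(t[0], &r0);
    pthread_join(t[1], &r1);
    printf("%s\n", (r0 && r1) ? "ok" : "FAIL");
    return !(r0 && r1);
}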
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-01-29 14:05 ` Thomas Pilarski
@ 2009-01-30  7:57 ` Mike Galbraith
  2009-02-02  7:43 ` Thomas Pilarski
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2009-01-30 7:57 UTC (permalink / raw)
To: Thomas Pilarski
Cc: Peter Zijlstra, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

On Thu, 2009-01-29 at 15:05 +0100, Thomas Pilarski wrote:
> > In short this program is carefully crafted to defeat all our affinity
> > tests - and I'm not sure what to do.
>
> I am sorry, but it is not carefully crafted. The function random() is
> causing my problem. I currently have no real data, so I tried to
> generate some random utilization and data.

Yeah, rather big difference, mega-contention vs zero-contention.

2.6.28.2, profile of ThreadSchedulingIssue 4 524288 8 200

vma               samples  %        app name               symbol name
ffffffff80251efa  2574819  31.6774  vmlinux                futex_wake
ffffffff80251a39  1367613  16.8255  vmlinux                futex_wait
0000000000411790   815426  10.0320  ThreadSchedulingIssue  random
ffffffff8022b3b5   343692   4.2284  vmlinux                task_rq_lock
0000000000404e30   299316   3.6824  ThreadSchedulingIssue  __lll_lock_wait_private
ffffffff8030d430   262906   3.2345  vmlinux                copy_user_generic_string
ffffffff80462af2   235176   2.8933  vmlinux                schedule
0000000000411b90   210984   2.5957  ThreadSchedulingIssue  random_r
ffffffff80251730   129376   1.5917  vmlinux                hash_futex
ffffffff8020be10   123548   1.5200  vmlinux                system_call
ffffffff8020a679   119398   1.4689  vmlinux                __switch_to
ffffffff8022f49b   110068   1.3541  vmlinux                try_to_wake_up
ffffffff8024c4d1   106352   1.3084  vmlinux                sched_clock_cpu
ffffffff8020be20   102709   1.2636  vmlinux                system_call_after_swapgs
ffffffff80229a2d   100614   1.2378  vmlinux                update_curr
ffffffff80248309    86475   1.0639  vmlinux                add_wait_queue
ffffffff80253149    85969   1.0577  vmlinux                do_futex

Versus using the myrand() free sample cruft generator from the rand(3)
manpage. Poof.
vma       samples  %        app name               symbol name
004002f4   979506  90.7113  ThreadSchedulingIssue  myrand
00400b00    53348   4.9405  ThreadSchedulingIssue  thread_consumer
00400c25    42710   3.9553  ThreadSchedulingIssue  thread_producer

One of those "don't _ever_ do that" things?

	-Mike

^ permalink raw reply	[flat|nested] 20+ messages in thread
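The myrand() Mike swapped in is the portable example generator from the
rand(3) man page (shown there as an implementation of the POSIX.1-2001
example). A sketch of that example, which keeps its state in a plain
process-local variable and therefore never touches a lock or the kernel:

```c
#include <assert.h>

/* Example PRNG from the rand(3) man page: a simple LCG with plain
 * static state -- no locking, no futexes, no system calls. */
static unsigned long next = 1;

/* Assumes RAND_MAX == 32767, as in the man-page example. */
static int myrand(void)
{
    next = next * 1103515245 + 12345;
    return (unsigned int)(next / 65536) % 32768;
}

static void mysrand(unsigned int seed)
{
    next = seed;
}
```

Because the state is unsynchronized, concurrent callers race on `next`
-- harmless for generating noise in a benchmark, but the sequence is no
longer reproducible across threads.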
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-01-30  7:57 ` Mike Galbraith
@ 2009-02-02  7:43 ` Thomas Pilarski
  2009-02-02  8:19 ` Peter Zijlstra
  2009-02-03  3:56 ` Valdis.Kletnieks
  0 siblings, 2 replies; 20+ messages in thread
From: Thomas Pilarski @ 2009-02-02 7:43 UTC (permalink / raw)
To: Mike Galbraith
Cc: Peter Zijlstra, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

On Friday, 2009-01-30 at 08:57 +0100, Mike Galbraith wrote:
> One of those "don't _ever_ do that" things?

I did not know random() uses a system call. It's rather unrealistic to
have five million system calls in a second. By adding a small loop with
some calculations near the random() call, the problem disappears too.
It is an unluckily chosen data generator.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02  7:43 ` Thomas Pilarski
@ 2009-02-02  8:19 ` Peter Zijlstra
  2009-02-02  8:33 ` Thomas Pilarski
  2009-02-03  3:56 ` Valdis.Kletnieks
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2009-02-02 8:19 UTC (permalink / raw)
To: Thomas Pilarski
Cc: Mike Galbraith, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

On Mon, 2009-02-02 at 08:43 +0100, Thomas Pilarski wrote:
> On Friday, 2009-01-30 at 08:57 +0100, Mike Galbraith wrote:
> > One of those "don't _ever_ do that" things?
>
> I did not know random() uses a system call. It's rather unrealistic to
> have five million system calls in a second. By adding a small loop with
> some calculations near the random() call, the problem disappears too.
> It is an unluckily chosen data generator.

I suppose you'll have to go bug the glibc people about their random()
implementation.

If you really need random() to perform for your application (Monte Carlo
stuff?), you might be better off writing a PRNG with TLS state or
something.

^ permalink raw reply	[flat|nested] 20+ messages in thread
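Peter's "PRNG with TLS state" suggestion can be sketched roughly as
below. The xorshift generator and the function names are illustrative
choices, not anything from the thread; the point is only that `__thread`
(GCC's thread-local storage keyword) gives every thread its own state,
so no lock is shared and no futex contention can arise:

```c
#include <assert.h>
#include <stdint.h>

/* Per-thread generator state: __thread gives each thread a private
 * copy, so calls never contend on a shared lock. */
static __thread uint64_t rng_state = 88172645463325252ULL;

static void tls_srand(uint64_t seed)
{
    rng_state = seed ? seed : 1;   /* xorshift state must be nonzero */
}

/* Marsaglia's xorshift64: three shift/xor steps, all in registers. */
static uint64_t tls_rand(void)
{
    uint64_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    rng_state = x;
    return x;
}
```

Each thread seeds once (e.g. from its TID) and then draws numbers with
no synchronization at all, which is exactly what a Monte Carlo or
noise-generation loop wants.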
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02  8:19 ` Peter Zijlstra
@ 2009-02-02  8:33 ` Thomas Pilarski
  2009-02-02  8:52 ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Pilarski @ 2009-02-02 8:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Mike Galbraith, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

On Monday, 2009-02-02 at 09:19 +0100, Peter Zijlstra wrote:
> I suppose you'll have to go bug the glibc people about their random()
> implementation.

Yes, I will.

> If you really need random() to perform for your application (Monte Carlo
> stuff?), you might be better off writing a PRNG with TLS state or
> something.

I just need some noise in my images.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02  8:33 ` Thomas Pilarski
@ 2009-02-02  8:52 ` Mike Galbraith
  2009-02-02  8:55 ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2009-02-02 8:52 UTC (permalink / raw)
To: Thomas Pilarski
Cc: Peter Zijlstra, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

On Mon, 2009-02-02 at 09:33 +0100, Thomas Pilarski wrote:
> On Monday, 2009-02-02 at 09:19 +0100, Peter Zijlstra wrote:
> > I suppose you'll have to go bug the glibc people about their random()
> > implementation.
>
> Yes, I will.

Finding the below was easy enough...

/* POSIX.1c requires that there is mutual exclusion for the `rand' and
   `srand' functions to prevent concurrent calls from modifying common
   data.  */
__libc_lock_define_initialized (static, lock)

...

long int
__random ()
{
  int32_t retval;
  __libc_lock_lock (lock);
  (void) __random_r (&unsafe_state, &retval);
  __libc_lock_unlock (lock);
  return retval;
}

...but finding the plumbing leading to __lll_lock_wait_private()
over-taxed my attention span.

	-Mike

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02  8:52 ` Mike Galbraith
@ 2009-02-02  8:55 ` Peter Zijlstra
  2009-02-02 12:15 ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2009-02-02 8:55 UTC (permalink / raw)
To: Mike Galbraith
Cc: Thomas Pilarski, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

On Mon, 2009-02-02 at 09:52 +0100, Mike Galbraith wrote:
> On Mon, 2009-02-02 at 09:33 +0100, Thomas Pilarski wrote:
> > On Monday, 2009-02-02 at 09:19 +0100, Peter Zijlstra wrote:
> > > I suppose you'll have to go bug the glibc people about their random()
> > > implementation.
> >
> > Yes, I will.
>
> Finding the below was easy enough...

Ah, that was a good clue; apparently all you need to do is use
random_r() and provide your own state and all should be well.

^ permalink raw reply	[flat|nested] 20+ messages in thread
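A minimal sketch of the random_r() approach Peter describes. random_r()
and initstate_r() are glibc-specific reentrant variants that operate on
caller-owned state instead of the locked global state; the wrapper names
here are invented for illustration. Per the random_r(3) man page, the
`struct random_data` must be zeroed before initstate_r() and the state
buffer must be at least 8 bytes (larger buffers select a better
generator):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Caller-owned state for glibc's random_r(3): give each thread its own
 * struct and no global lock is ever taken. */
struct thread_rng {
    struct random_data data;
    char statebuf[64];          /* 64 bytes selects a non-trivial generator */
};

static void thread_rng_init(struct thread_rng *rng, unsigned int seed)
{
    /* random_r(3) requires random_data to be zeroed before initstate_r. */
    memset(rng, 0, sizeof(*rng));
    initstate_r(seed, rng->statebuf, sizeof(rng->statebuf), &rng->data);
}

static int32_t thread_rng_next(struct thread_rng *rng)
{
    int32_t v;
    random_r(&rng->data, &v);   /* result in [0, 2^31 - 1] */
    return v;
}
```

Each producer/consumer thread would then carry its own `struct
thread_rng`, making the hot loop lock-free while keeping the familiar
random() output range.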
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02  8:55 ` Peter Zijlstra
@ 2009-02-02 12:15 ` Peter Zijlstra
  2009-02-02 18:29 ` Michael Kerrisk
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2009-02-02 12:15 UTC (permalink / raw)
To: Mike Galbraith
Cc: Thomas Pilarski, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel, Michael Kerrisk

On Mon, 2009-02-02 at 09:55 +0100, Peter Zijlstra wrote:
> Ah, that was a good clue; apparently all you need to do is use
> random_r() and provide your own state and all should be well.

Michael, would it make sense to add the random_r() family to the "SEE
ALSO" section of the random() man page?

(Admittedly, my random() manpage is ancient: 2008-03-07, so it might be
this is already the case, in which case, ignore me :)

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02 12:15 ` Peter Zijlstra
@ 2009-02-02 18:29 ` Michael Kerrisk
  2009-02-02 18:35 ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Kerrisk @ 2009-02-02 18:29 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Mike Galbraith, Thomas Pilarski, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

Hi Peter,

On Tue, Feb 3, 2009 at 1:15 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2009-02-02 at 09:55 +0100, Peter Zijlstra wrote:
>
>> Ah, that was a good clue; apparently all you need to do is use
>> random_r() and provide your own state and all should be well.
>
> Michael, would it make sense to add the random_r() family to the "SEE
> ALSO" section of the random() man page?
>
> (Admittedly, my random() manpage is ancient: 2008-03-07, so it might be
> this is already the case, in which case, ignore me :)

(Up-to-date versions of the pages can always be found online at the
location in the .sig.)

Well, the man page already had this text under NOTES:

    This function should not be used in cases where multiple
    threads use random() and the behavior should be reproducible.
    Use random_r(3) for that purpose.

But it certainly doesn't hurt to have random_r(3) also listed under
SEE ALSO, and I've added it for man-pages-3.18.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02 18:29 ` Michael Kerrisk
@ 2009-02-02 18:35 ` Peter Zijlstra
  2009-02-03  4:55 ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2009-02-02 18:35 UTC (permalink / raw)
To: mtk.manpages
Cc: Mike Galbraith, Thomas Pilarski, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

On Tue, 2009-02-03 at 07:29 +1300, Michael Kerrisk wrote:
> Hi Peter,
>
> On Tue, Feb 3, 2009 at 1:15 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, 2009-02-02 at 09:55 +0100, Peter Zijlstra wrote:
> >
> >> Ah, that was a good clue; apparently all you need to do is use
> >> random_r() and provide your own state and all should be well.
> >
> > Michael, would it make sense to add the random_r() family to the "SEE
> > ALSO" section of the random() man page?
> >
> > (Admittedly, my random() manpage is ancient: 2008-03-07, so it might be
> > this is already the case, in which case, ignore me :)
>
> (Up-to-date versions of the pages can always be found online at the
> location in the .sig.)

Ah, I'll try to remember that.

> Well, the man page already had this text under NOTES:
>
>     This function should not be used in cases where multiple
>     threads use random() and the behavior should be reproducible.
>     Use random_r(3) for that purpose.

Yeah, I found it eventually, but I generally don't read a full manpage
when I'm looking for related functions, only the SEE ALSO section.

> But it certainly doesn't hurt to have random_r(3) also listed under
> SEE ALSO, and I've added it for man-pages-3.18.

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02 18:35 ` Peter Zijlstra
@ 2009-02-03  4:55 ` Mike Galbraith
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Galbraith @ 2009-02-03 4:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mtk.manpages, Thomas Pilarski, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

This bug is now dead... so who closes it?

	-Mike

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02  7:43 ` Thomas Pilarski
  2009-02-02  8:19 ` Peter Zijlstra
@ 2009-02-03  3:56 ` Valdis.Kletnieks
  1 sibling, 0 replies; 20+ messages in thread
From: Valdis.Kletnieks @ 2009-02-03 3:56 UTC (permalink / raw)
To: Thomas Pilarski
Cc: Mike Galbraith, Peter Zijlstra, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 634 bytes --]

On Mon, 02 Feb 2009 08:43:55 +0100, Thomas Pilarski said:
> On Friday, 2009-01-30 at 08:57 +0100, Mike Galbraith wrote:
> > One of those "don't _ever_ do that" things?
>
> I did not know random() uses a system call. It's rather unrealistic to
> have five million system calls in a second. By adding a small loop with
> some calculations near the random() call, the problem disappears too.
> It is an unluckily chosen data generator.

Am I the only one that's scared by the concept of anything that beats on
random numbers enough to need 5 million of them a second, but is still
using the relatively sucky one that's in most glibc's? :)

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread
end of thread, other threads:[~2009-02-03 4:55 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <bug-12562-10286@http.bugzilla.kernel.org/>
2009-01-28 20:56 ` [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores Andrew Morton
2009-01-28 22:15 ` Peter Zijlstra
2009-01-28 22:25 ` Thomas Pilarski
2009-01-29 9:07 ` Peter Zijlstra
2009-01-29 10:12 ` Thomas Pilarski
2009-01-29 10:24 ` Thomas Pilarski
2009-01-29 10:31 ` Peter Zijlstra
2009-01-29 11:37 ` Peter Zijlstra
2009-01-29 14:05 ` Thomas Pilarski
2009-01-30 7:57 ` Mike Galbraith
2009-02-02 7:43 ` Thomas Pilarski
2009-02-02 8:19 ` Peter Zijlstra
2009-02-02 8:33 ` Thomas Pilarski
2009-02-02 8:52 ` Mike Galbraith
2009-02-02 8:55 ` Peter Zijlstra
2009-02-02 12:15 ` Peter Zijlstra
2009-02-02 18:29 ` Michael Kerrisk
2009-02-02 18:35 ` Peter Zijlstra
2009-02-03 4:55 ` Mike Galbraith
2009-02-03 3:56 ` Valdis.Kletnieks
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox