* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Andrew Morton @ 2009-01-28 20:56 UTC
To: Peter Zijlstra, Mike Galbraith, Gregory Haskins, thomas.pi
Cc: bugme-daemon, linux-kernel

(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 28 Jan 2009 06:35:20 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=12562
>
>            Summary: High overhead while switching or synchronizing threads
>                     on different cores

Thanks for the report, and the testcase.

>            Product: Process Management
>            Version: 2.5
>      KernelVersion: 2.6.28
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Scheduler
>         AssignedTo: mingo@elte.hu
>         ReportedBy: thomas.pi@arcor.de

(There's testcase code in the bugzilla report)

(Seems to be a regression)

> Hardware Environment: Core2Duo 2.4GHz / 4GB RAM
> Software Environment: Ubuntu 8.10 + Vanilla 2.6.28
>
> Hardware Environment: AMD64 X2 2.1GHz / 6GB RAM
> Software Environment: Ubuntu 8.10 + Vanilla 2.6.28.2
>
> Problem Description:
> The overhead on a dual core while switching between tasks is extremely high
> (>60% of cputime). It is produced by synchronization with pthread and
> mutex/cond.
>
> Executing the attached program as "schedulingissue 1 1024 8 20" creates a
> producer and a consumer thread with eight 8KB buffers. The producer fills a
> buffer with 1024 randomly generated double values; the consumer does the
> same after receiving the buffer.
>
> While executing one instance of the program the throughput is ~1.6 msg/s.
> While executing two instances, the throughput is much higher (2 * 8.7 msg/s
> = 17.4 msg/s).
>
> Small improvement while using jiffies as clocksource instead of acpi_pm or
> hpet (1.8 msg/s instead of 1.6). Disabling NO_HZ and HIGH_RESOLUTION_TIME
> gives no improvement. Much higher performance with kernel <= 2.6.24, but
> still four times slower.

Unclear. What is four times slower than what? You're saying that the
app progresses four times faster when there are two instances of it
running, rather than one instance?

> ---------------------------------------
> Linux bugs-laptop 2.6.28-hz-hrt #4 SMP Wed Jan 28 13:33:18 CET 2009 x86_64 GNU/Linux
> acpi_pm (same with hpet)
> schedulerissue 1 1024 8 20
> All threads finished: 20 messages in 12.295 seconds / 1.627 msg/s
> schedulerissue 1 1024 8 200 & schedulerissue 1 1024 8 200
> All threads finished: 200 messages in 22.882 seconds / 8.741 msg/s
> All threads finished: 200 messages in 22.934 seconds / 8.721 msg/s
> ---------------------------------------
> Linux bugs-laptop 2.6.28-hz-hrt #4 SMP Wed Jan 28 13:33:18 CET 2009 x86_64 GNU/Linux
> jiffies
> schedulerissue 1 1024 8 20
> All threads finished: 20 messages in 10.704 seconds / 1.868 msg/s
> schedulerissue 1 1024 8 200 & schedulerissue 1 1024 8 200
> All threads finished: 200 messages in 23.372 seconds / 8.557 msg/s
> All threads finished: 200 messages in 23.460 seconds / 8.525 msg/s
> --------------------------------------
> Linux bugs-laptop 2.6.24.7 #1 SMP Wed Jan 14 10:21:04 CET 2009 x86_64 GNU/Linux
> hpet
> schedulerissue 1 1024 8 20
> All threads finished: 20 messages in 5.290 seconds / 3.781 msg/s
> schedulerissue 1 1024 8 200 & schedulerissue 1 1024 8 200
> All threads finished: 200 messages in 23.000 seconds / 8.695 msg/s
> All threads finished: 200 messages in 23.078 seconds / 8.666 msg/s

Seems that 2.6.24 is faster than 2.6.28 with 20 messages, but 2.6.24
and 2.6.28 run at the same speed when 200 messages are sent?

If so, that seems rather odd, doesn't it? Is it possible that cpufreq
does something bad once the CPU gets hot?

> AMD64 X2 @ 2.1GHz
> Linux bugs-desktop 2.6.28.2 #4 SMP Mon Jan 26 20:26:12 CET 2009 x86_64 GNU/Linux
> acpi_pm
> schedulerissue 1 1024 8 20
> All threads finished: 20 messages in 9.288 seconds / 2.153 msg/s
> schedulerissue 1 1024 8 200
> All threads finished: 200 messages in 17.049 seconds / 11.731 msg/s
> All threads finished: 200 messages in 18.539 seconds / 10.788 msg/s
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Peter Zijlstra @ 2009-01-28 22:15 UTC
To: Andrew Morton
Cc: Mike Galbraith, Gregory Haskins, thomas.pi, bugme-daemon, linux-kernel

On Wed, 2009-01-28 at 12:56 -0800, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Wed, 28 Jan 2009 06:35:20 -0800 (PST)
> bugme-daemon@bugzilla.kernel.org wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=12562
> >
> >            Summary: High overhead while switching or synchronizing threads
> >                     on different cores
>
> Thanks for the report, and the testcase.
>
> >            Product: Process Management
> >            Version: 2.5
> >      KernelVersion: 2.6.28
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Scheduler
> >         AssignedTo: mingo@elte.hu
> >         ReportedBy: thomas.pi@arcor.de
>
> (There's testcase code in the bugzilla report)
>
> (Seems to be a regression)

Is there a known good kernel?

> > Hardware Environment: Core2Duo 2.4GHz / 4GB RAM
> > Software Environment: Ubuntu 8.10 + Vanilla 2.6.28
> >
> > Hardware Environment: AMD64 X2 2.1GHz / 6GB RAM
> > Software Environment: Ubuntu 8.10 + Vanilla 2.6.28.2
> >
> > Problem Description:
> > The overhead on a dual core while switching between tasks is extremely high
> > (>60% of cputime). It is produced by synchronization with pthread and
> > mutex/cond.
> >
> > Executing the attached program as "schedulingissue 1 1024 8 20" creates a
> > producer and a consumer thread with eight 8KB buffers. The producer fills a
> > buffer with 1024 randomly generated double values; the consumer does the
> > same after receiving the buffer.
> >
> > While executing one instance of the program the throughput is ~1.6 msg/s.
> > While executing two instances, the throughput is much higher (2 * 8.7 msg/s
> > = 17.4 msg/s).
> >
> > Small improvement while using jiffies as clocksource instead of acpi_pm or
> > hpet (1.8 msg/s instead of 1.6). Disabling NO_HZ and HIGH_RESOLUTION_TIME
> > gives no improvement. Much higher performance with kernel <= 2.6.24, but
> > still four times slower.
>
> Unclear. What is four times slower than what? You're saying that the
> app progresses four times faster when there are two instances of it
> running, rather than one instance?

It seems that way indeed, a bit more clarity would be good though.

> > ---------------------------------------
> > Linux bugs-laptop 2.6.28-hz-hrt #4 SMP Wed Jan 28 13:33:18 CET 2009 x86_64 GNU/Linux
> > acpi_pm (same with hpet)
> > schedulerissue 1 1024 8 20
> > All threads finished: 20 messages in 12.295 seconds / 1.627 msg/s
> > schedulerissue 1 1024 8 200 & schedulerissue 1 1024 8 200
> > All threads finished: 200 messages in 22.882 seconds / 8.741 msg/s
> > All threads finished: 200 messages in 22.934 seconds / 8.721 msg/s
> > ---------------------------------------
> > Linux bugs-laptop 2.6.28-hz-hrt #4 SMP Wed Jan 28 13:33:18 CET 2009 x86_64 GNU/Linux
> > jiffies
> > schedulerissue 1 1024 8 20
> > All threads finished: 20 messages in 10.704 seconds / 1.868 msg/s
> > schedulerissue 1 1024 8 200 & schedulerissue 1 1024 8 200
> > All threads finished: 200 messages in 23.372 seconds / 8.557 msg/s
> > All threads finished: 200 messages in 23.460 seconds / 8.525 msg/s
> > --------------------------------------
> > Linux bugs-laptop 2.6.24.7 #1 SMP Wed Jan 14 10:21:04 CET 2009 x86_64 GNU/Linux
> > hpet
> > schedulerissue 1 1024 8 20
> > All threads finished: 20 messages in 5.290 seconds / 3.781 msg/s
> > schedulerissue 1 1024 8 200 & schedulerissue 1 1024 8 200
> > All threads finished: 200 messages in 23.000 seconds / 8.695 msg/s
> > All threads finished: 200 messages in 23.078 seconds / 8.666 msg/s
>
> Seems that 2.6.24 is faster than 2.6.28 with 20 messages, but 2.6.24
> and 2.6.28 run at the same speed when 200 messages are sent?
>
> If so, that seems rather odd, doesn't it? Is it possible that cpufreq
> does something bad once the CPU gets hot?

Nah, I'll bet it's a cache affinity issue. Some applications like strong
wakeup affinity, others not so much. This one looks to be a lover.

With a single instance, the producer and consumer get scheduled on two
different cores for some reason (maybe wake-idle is too strong). With two
instances, they get to stay on the same cpu, since the other cpu is
already busy.

I'll start up the browser in the morning to download this proglet and
poke at it some, but sleep comes first.
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Thomas Pilarski @ 2009-01-28 22:25 UTC
To: Andrew Morton
Cc: Peter Zijlstra, Mike Galbraith, Gregory Haskins, bugme-daemon, linux-kernel

On Wednesday, 2009-01-28 at 12:56 -0800, Andrew Morton wrote:

> (There's testcase code in the bugzilla report)
>
> (Seems to be a regression)

There is a regression because of the improved cpu switching, but the
problem exists in every kernel. It takes a lot of time to switch between
the threads when they are executed on different cores. Perhaps because of
the big buffer size of 512KB?

> > Small improvement while using jiffies as clocksource instead of acpi_pm or
> > hpet (1.8 msg/s instead of 1.6). Disabling NO_HZ and HIGH_RESOLUTION_TIME
> > gives no improvement. Much higher performance with kernel <= 2.6.24, but
> > still four times slower.
>
> Unclear. What is four times slower than what? You're saying that the
> app progresses four times faster when there are two instances of it
> running, rather than one instance?

About 4 messages per second while executing only one instance, and about
8 messages per second per instance while executing two instances of the
test. It makes 16 messages per second when the two threads of an instance
are executed on only one core.

> Seems that 2.6.24 is faster than 2.6.28 with 20 messages, but 2.6.24
> and 2.6.28 run at the same speed when 200 messages are sent?

I have executed the test twenty times. It stays constant on 2.6.28. On
2.6.24 one of ten tests is executed slower.

******* kernel 2.6.28:
All threads finished: 20 messages in 12.853 seconds / 1.556 msg/s

real    0m12.857s
user    0m8.589s
sys     0m16.629s

******* kernel 2.6.24:
All threads finished: 20 messages in 4.939 seconds / 4.050 msg/s

real    0m4.942s
user    0m5.248s
sys     0m4.352s

One of ten executions goes down to 1.806 msg/s.

All threads finished: 20 messages in 11.074 seconds / 1.806 msg/s

real    0m11.077s
user    0m8.817s
sys     0m12.925s

> If so, that seems rather odd, doesn't it? Is it possible that cpufreq
> does something bad once the CPU gets hot?

I have disabled acpid, clocked the cpu to 2.4GHz and watched the
temperature of the cores and the frequency. The clock always stays at
2.4GHz and the temperature is always below 67°C. My cpu clocks down at
95°C.
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Peter Zijlstra @ 2009-01-29 9:07 UTC
To: Thomas Pilarski
Cc: Andrew Morton, Mike Galbraith, Gregory Haskins, bugme-daemon, linux-kernel

On Wed, 2009-01-28 at 23:25 +0100, Thomas Pilarski wrote:
> On Wednesday, 2009-01-28 at 12:56 -0800, Andrew Morton wrote:
>
> > (There's testcase code in the bugzilla report)
> >
> > (Seems to be a regression)
>
> There is a regression because of the improved cpu switching, but the
> problem exists in every kernel.

This is a contradiction in terms - twice.

If it is a regression, then clearly things haven't improved.

If it is a regression, state clearly when it worked last. If it never
worked, it cannot be a regression.

> It takes a lot of time to switch between the threads when they are
> executed on different cores.
> Perhaps because of the big buffer size of 512KB?

Of course, pushing 512KB to another cpu means lots and lots of cache
misses.
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Thomas Pilarski @ 2009-01-29 10:12 UTC
To: Peter Zijlstra
Cc: Andrew Morton, Mike Galbraith, Gregory Haskins, bugme-daemon, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1893 bytes --]

> > There is a regression because of the improved cpu switching, but the
> > problem exists in every kernel.
>
> This is a contradiction in terms - twice.
>
> If it is a regression, then clearly things haven't improved.
>
> If it is a regression, state clearly when it worked last. If it never
> worked, it cannot be a regression.

There is an improvement in load balancing for single-threaded
applications. It's a regression for my problem. But the problem exists in
every kernel I have tested.

> > It takes a lot of time to switch between the threads when they are
> > executed on different cores.
> > Perhaps because of the big buffer size of 512KB?
>
> Of course, pushing 512KB to another cpu means lots and lots of cache
> misses.

I have tried 2.6.15, 2.6.18 and 2.6.20 too, but same behaviour as in
2.6.24. With Windows I can get 64 messages per second with a buffer size
of 512KB; it is reduced to 16 messages with a buffer size of 1MB. But I
think it is not really comparable, because there is nearly no cpu
consumption with 512KB. Perhaps random() works differently. By increasing
the cpu usage eight times in the producer, I can get 16 msg/s and both
cores are used at ~50%. Doing the same with Linux I get a throughput of
~2 msg/s. If it is a caching issue, shouldn't it exist in Windows too?

Using a smaller buffer of 4KB, the test is executed on one core only.

./schedulerissue 1 4096 8 2000
All threads finished: 2000 messages in 1.631 seconds / 1226.076 msg/s

real    0m1.635s
user    0m1.352s
sys     0m0.052s

But I want to use both cores to increase the performance. Adding a second
producer and a second consumer reduces the performance to 33%. Both cores
are used.

./schedulerissue 2 4096 8 2000
All threads finished: 1999 messages in 4.744 seconds / 421.379 msg/s

real    0m4.748s
user    0m3.280s
sys     0m5.852s

I have added a new version, as there was a possible deadlock during
shut-down.

[-- Attachment #2: ThreadSchedulingIssue.c --]
[-- Type: text/x-csrc, Size: 9410 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <math.h>
#include <signal.h>
#include <unistd.h>
#include <time.h>

int CTHREADPAIRCOUNT;
int CBUFFER_SIZE;
int CBUFFER_COUNT;
int CMESSAGES_COUNT;
int todo_messages;

pthread_mutex_t producer_mutex;
pthread_cond_t producer_cond;
pthread_mutex_t consumer_mutex;
pthread_cond_t consumer_cond;
pthread_mutex_t result_mutex;
pthread_mutex_t todo_mutex;

int message_count = 0;
double start;
double end;
int terminate;
int terminate_producer;

double* buffers;         //[CBUFFER_COUNT][CBUFFER_SIZE];
int freebuffer_count;
int freebuffer_pos;
double** free_buffers;   //[CBUFFER_COUNT];
int filledbuffer_count;
int filledbuffer_pos;
double** filled_buffers; //[CBUFFER_COUNT];

/**
 * Return system uptime in µs
 */
double getSystemTime() {
    struct timespec tv;
    clock_gettime(CLOCK_MONOTONIC, &tv);
    return (double) (tv.tv_sec) * 1000000.0 + (double) (tv.tv_nsec) / 1000.0;
}

/**
 * Get a free buffer; block the thread if no buffer is available.
 */
double* getFreeBuffer() {
    pthread_mutex_lock(&producer_mutex);
    // If there is no free buffer to be filled, wait
    while (freebuffer_count == 0) {
        //printf("wait for free buffer\n");
        /** exit if all messages are finished **/
        if (terminate || terminate_producer) {
            pthread_mutex_unlock(&producer_mutex);
            return NULL;
        }
        pthread_cond_wait(&producer_cond, &producer_mutex);
    }
    usleep(1);
    double* result = free_buffers[freebuffer_pos];
    freebuffer_pos = (freebuffer_pos + 1) % CBUFFER_COUNT;
    freebuffer_count--;
    pthread_mutex_unlock(&producer_mutex);
    return result;
}

/**
 * Return a free buffer and notify the producer
 */
void returnFreeBuffer(double* buff) {
    pthread_mutex_lock(&producer_mutex);
    free_buffers[(freebuffer_pos + freebuffer_count) % CBUFFER_COUNT] = buff;
    freebuffer_count++;
    // Notify waiting producer
    //printf("added free buffer\n");
    pthread_cond_signal(&producer_cond);
    pthread_mutex_unlock(&producer_mutex);
}

/**
 * Add a filled buffer and notify a consumer
 */
void putFilledBuffer(double* buff) {
    pthread_mutex_lock(&consumer_mutex);
    filled_buffers[(filledbuffer_pos + filledbuffer_count) % CBUFFER_COUNT] = buff;
    filledbuffer_count++;
    // Notify waiting consumers
    //printf("added filled buffer\n");
    pthread_cond_signal(&consumer_cond);
    pthread_mutex_unlock(&consumer_mutex);
}

/**
 * Get a filled buffer, or wait until one exists
 */
double* getFilledBuffer() {
    pthread_mutex_lock(&consumer_mutex);
    // If there is no filled buffer, wait until a producer fills a new one
    while (filledbuffer_count == 0) {
        //printf("wait for filled buffer\n");
        /**
         * exit if all messages are finished
         * This can cause the loss of some
         * already produced data.
         **/
        if (terminate || terminate_producer) {
            terminate = 1;
            pthread_mutex_unlock(&consumer_mutex);
            return NULL;
        }
        pthread_cond_wait(&consumer_cond, &consumer_mutex);
    }
    double* result = filled_buffers[filledbuffer_pos];
    filledbuffer_pos = (filledbuffer_pos + 1) % CBUFFER_COUNT;
    filledbuffer_count--;
    pthread_mutex_unlock(&consumer_mutex);
    return result;
}

/**
 * Producer thread. Fills a buffer with random numbers and adds it
 * to the consumer list.
 */
void *thread_producer(void *arg) {
    while (!terminate && !terminate_producer) {
        int i;
        pthread_mutex_lock(&todo_mutex);
        if (todo_messages <= 0) {
            terminate_producer = 1;
            pthread_mutex_unlock(&todo_mutex);
            break;
        }
        todo_messages--;
        pthread_mutex_unlock(&todo_mutex);
        double* cbuff = getFreeBuffer();
        if (cbuff) {
            cbuff[0] = getSystemTime();
            // Slots 0-3 hold timestamps, so payload starts at index 2
            for (i = 2; i < CBUFFER_SIZE; i++) {
                // Fill the buffer with random values in [0,1]
                cbuff[i] =
//                  (double) random() / (double) RAND_MAX *
//                  (double) random() / (double) RAND_MAX *
//                  (double) random() / (double) RAND_MAX *
//                  (double) random() / (double) RAND_MAX *
//                  (double) random() / (double) RAND_MAX *
//                  (double) random() / (double) RAND_MAX *
//                  (double) random() / (double) RAND_MAX *
//                  (double) random() / (double) RAND_MAX *
                    (double) random() / (double) RAND_MAX;
            }
            cbuff[1] = getSystemTime();
            putFilledBuffer(cbuff);
        }
    }
    pthread_exit(NULL);
}

/**
 * Consumer thread. Gets a filled buffer, does some work on it and
 * returns it to the producer list.
 */
void *thread_consumer(void *arg) {
    while (!terminate) {
        int i;
        double* cbuff = getFilledBuffer();
        if (cbuff) {
            cbuff[2] = getSystemTime();
            for (i = 4; i < CBUFFER_SIZE - 1; i++) {
                // Scale each value by a random factor in [0,1]
                cbuff[i] *= (double) random() / (double) RAND_MAX;
            }
            cbuff[3] = getSystemTime();
            pthread_mutex_lock(&result_mutex);
            if ((message_count == 0) || (start > cbuff[0])) {
                start = cbuff[0];
            }
            if ((message_count == 0) || (end < cbuff[3])) {
                end = cbuff[3];
            }
            message_count++;
            pthread_mutex_unlock(&result_mutex);
//          printf("Message runtime Calc:%1.3fms / Sendmessage: %1.3fms / Calc:%1.3fms\n",
//              (cbuff[1] - cbuff[0])/1000.0,
//              (cbuff[2] - cbuff[1])/1000.0,
//              (cbuff[3] - cbuff[2])/1000.0);
            returnFreeBuffer(cbuff);
        }
    }
    pthread_exit(NULL);
}

/**
 * Set terminate flag on SIGINT.
 */
void sig_quit(int a) {
    terminate = 1;
    printf("Terminate calculation\n");
    /*
     * Notify producers, as they may be waiting for
     * free buffers.
     */
    pthread_cond_broadcast(&producer_cond);
}

/**
 * For testing purposes only.
 */
int main(int argc, char *argv[]) {
    terminate = 0;
    terminate_producer = 0;
    if (signal(SIGINT, sig_quit) == SIG_ERR) {
        printf("Could not init quit signal\n");
        return -1;
    }
    if (argc < 5) {
        printf("Need four parameters. Number of thread pairs - message size in doubles - buffer count - overall messages - (show intermediate data interval)\n");
        exit(-1);
    }
    int show_intermediate = 0;
    CTHREADPAIRCOUNT = atoi(argv[1]);
    CBUFFER_SIZE = atoi(argv[2]);
    CBUFFER_COUNT = atoi(argv[3]);
    CMESSAGES_COUNT = atoi(argv[4]);
    if (argc > 5) {
        show_intermediate = atoi(argv[5]);
    }
    if ((CTHREADPAIRCOUNT < 1) || (CTHREADPAIRCOUNT > 256)) {
        printf("Number of thread pairs is limited to 1-256\n");
        exit(-1);
    }
    if ((CBUFFER_SIZE < 8) || (CBUFFER_SIZE > 1048576)) {
        printf("Buffer size is limited to 8-1,048,576\n");
        exit(-1);
    }
    if ((CBUFFER_COUNT < 1) || (CBUFFER_COUNT > CTHREADPAIRCOUNT * 8)) {
        printf("Number of buffers is limited to 1 .. thread pairs * 8\n");
        exit(-1);
    }
    if ((CMESSAGES_COUNT < CTHREADPAIRCOUNT * 2)
            || (CBUFFER_COUNT > CTHREADPAIRCOUNT * 100)) {
        printf("Number of messages is limited to thread pairs * 2 .. thread pairs * 100\n");
        exit(-1);
    }
    if ((show_intermediate < 0) || (show_intermediate > 10)) {
        printf("Intermediate data interval must be in [0-10]\n");
        exit(-1);
    }
    buffers = malloc(CBUFFER_COUNT * CBUFFER_SIZE * sizeof(double));
    free_buffers = malloc(CBUFFER_COUNT * sizeof(double*));
    filled_buffers = malloc(CBUFFER_COUNT * sizeof(double*));
    todo_messages = CMESSAGES_COUNT;
    pthread_mutex_init(&consumer_mutex, NULL);
    pthread_cond_init(&consumer_cond, NULL);
    pthread_mutex_init(&producer_mutex, NULL);
    pthread_cond_init(&producer_cond, NULL);
    pthread_mutex_init(&result_mutex, NULL);
    pthread_mutex_init(&todo_mutex, NULL);
    int i;
    for (i = 0; i < CBUFFER_COUNT; i++) {
        free_buffers[i] = &(buffers[i * CBUFFER_SIZE]);
    }
    freebuffer_count = CBUFFER_COUNT;
    freebuffer_pos = 0;
    filledbuffer_count = 0;
    filledbuffer_pos = 0;
    pthread_t threads[CTHREADPAIRCOUNT * 2];
    for (i = 0; i < CTHREADPAIRCOUNT; i++) {
        if (pthread_create(&threads[i], NULL, thread_producer, NULL)) {
            printf("Could not create producer %d\n", i);
        }
        if (pthread_create(&threads[i + CTHREADPAIRCOUNT], NULL, thread_consumer, NULL)) {
            printf("Could not create consumer %d\n", i);
        }
    }
    double start_overall = -1;
    double end_overall = -1;
    int all_messages = 0;
    if (show_intermediate) {
        while (!terminate) {
            sleep(show_intermediate);
            pthread_mutex_lock(&result_mutex);
            printf("Messages %d - msg/s: %1.3f\n", message_count,
                ((double) message_count) / ((end - start) / 1000000.0));
            if ((start_overall < 0) || (start_overall > start)) {
                start_overall = start;
            }
            if ((end_overall < 0) || (end_overall < end)) {
                end_overall = end;
            }
            //start = getSystemTime();
            all_messages += message_count;
            message_count = 0;
            pthread_mutex_unlock(&result_mutex);
        }
    }
    for (i = 0; i < CTHREADPAIRCOUNT; i++) {
        //printf("Wait for thread %d\n", i);
        pthread_join(threads[i], NULL);
    }
    terminate = 1;
    /**
     * Notify consumers, as they may be waiting for data.
     */
    pthread_cond_broadcast(&consumer_cond);
    for (i = CTHREADPAIRCOUNT; i < CTHREADPAIRCOUNT * 2; i++) {
        //printf("Wait for thread %d\n", i);
        pthread_join(threads[i], NULL);
    }
    if (!show_intermediate) {
        start_overall = start;
        end_overall = end;
        all_messages = message_count;
    }
    printf("All threads finished: %d messages in %1.3f seconds / %1.3f msg/s\n",
        all_messages, (end_overall - start_overall) / 1000000.0,
        (double) all_messages / ((end_overall - start_overall) / 1000000.0));
    pthread_mutex_destroy(&producer_mutex);
    pthread_cond_destroy(&producer_cond);
    pthread_mutex_destroy(&consumer_mutex);
    pthread_cond_destroy(&consumer_cond);
    pthread_mutex_destroy(&result_mutex);
    pthread_mutex_destroy(&todo_mutex);
    free(buffers);
    free(free_buffers);
    free(filled_buffers);
    return EXIT_SUCCESS;
}
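[The thread does not show a build command; the following is the obvious invocation, assuming only that glibc of this era still keeps clock_gettime() in librt.]

gcc -O2 -pthread -o ThreadSchedulingIssue ThreadSchedulingIssue.c -lrt

# 1 producer/consumer pair, 4096-double (32 KB) buffers, 8 buffers,
# 2000 messages -- the invocation used above:
./ThreadSchedulingIssue 1 4096 8 2000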
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Thomas Pilarski @ 2009-01-29 10:24 UTC
To: Peter Zijlstra
Cc: Andrew Morton, Mike Galbraith, Gregory Haskins, bugme-daemon, linux-kernel

Some explanation of the test program.

./schedulerissue 1 4096 8 2000
    1 producer and 1 consumer
    buffer size of 4096 doubles * 8 bytes
    8 buffers (256KB total buffer)
    2000 messages

./schedulerissue 2 4096 8 2000
    2 producers and 2 consumers
    buffer size of 4096 doubles * 8 bytes
    8 buffers (256KB total buffer)
    2000 messages

It was not 512KB in the test before, but 4MB. But there is the same
problem with a total buffer size of 48KB and 4 threads
(./schedulerissue 2 2048 3 20000).
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Peter Zijlstra @ 2009-01-29 10:31 UTC
To: Thomas Pilarski
Cc: Andrew Morton, Mike Galbraith, Gregory Haskins, bugme-daemon, linux-kernel

On Thu, 2009-01-29 at 11:24 +0100, Thomas Pilarski wrote:
> Some explanation of the test program.
>
> ./schedulerissue 1 4096 8 2000
>     1 producer and 1 consumer
>     buffer size of 4096 doubles * 8 bytes
>     8 buffers (256KB total buffer)
>     2000 messages
>
> ./schedulerissue 2 4096 8 2000
>     2 producers and 2 consumers
>     buffer size of 4096 doubles * 8 bytes
>     8 buffers (256KB total buffer)
>     2000 messages
>
> It was not 512KB in the test before, but 4MB. But there is the same
> problem with a total buffer size of 48KB and 4 threads
> (./schedulerissue 2 2048 3 20000).

Right, read the proglet (and removed that usleep(1)) and am poking at it.
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Peter Zijlstra @ 2009-01-29 11:37 UTC
To: Thomas Pilarski
Cc: Andrew Morton, Mike Galbraith, Gregory Haskins, bugme-daemon, linux-kernel

On Thu, 2009-01-29 at 11:24 +0100, Thomas Pilarski wrote:
> Some explanation of the test program.
>
> ./schedulerissue 1 4096 8 2000
>     1 producer and 1 consumer
>     buffer size of 4096 doubles * 8 bytes
>     8 buffers (256KB total buffer)
>     2000 messages
>
> ./schedulerissue 2 4096 8 2000
>     2 producers and 2 consumers
>     buffer size of 4096 doubles * 8 bytes
>     8 buffers (256KB total buffer)
>     2000 messages
>
> It was not 512KB in the test before, but 4MB. But there is the same
> problem with a total buffer size of 48KB and 4 threads
> (./schedulerissue 2 2048 3 20000).

Linux opteron 2.6.29-rc3-tip #61 SMP PREEMPT Thu Jan 29 11:59:15 CET 2009 x86_64 x86_64 x86_64 GNU/Linux

[root@opteron bench]# schedtool -a 1 -e ./ThreadSchedulingIssue 1 4096 8 20000
All threads finished: 19992 messages in 6.485 seconds / 3082.877 msg/s
[root@opteron bench]# ./ThreadSchedulingIssue 1 4096 8 20000
All threads finished: 19992 messages in 6.496 seconds / 3077.604 msg/s
[root@opteron bench]# ./ThreadSchedulingIssue 1 4096 8 20000 & ./ThreadSchedulingIssue 1 4096 8 20000 &
[1] 10314
[2] 10315
[root@opteron bench]# All threads finished: 19992 messages in 6.720 seconds / 2975.009 msg/s
All threads finished: 19992 messages in 6.792 seconds / 2943.574 msg/s
[1]-  Done    ./ThreadSchedulingIssue 1 4096 8 20000
[2]+  Done    ./ThreadSchedulingIssue 1 4096 8 20000
[root@opteron bench]# ./ThreadSchedulingIssue 2 4096 8 20000
All threads finished: 19992 messages in 17.299 seconds / 1155.667 msg/s

[root@opteron bench]# for i in 4 8 16 32 64 128 256 ; do
>   echo -n $((i*1024)) $((80000/i)) " " ;
>   schedtool -a 1 -e ./ThreadSchedulingIssue 1 $((i*1024)) 8 $((80000/i)) ;
> done
4096 20000    All threads finished: 19992 messages in 6.368 seconds / 3139.251 msg/s
8192 10000    All threads finished: 9992 messages in 5.363 seconds / 1863.083 msg/s
16384 5000    All threads finished: 4992 messages in 5.471 seconds / 912.479 msg/s
32768 2500    All threads finished: 2493 messages in 5.730 seconds / 435.059 msg/s
65536 1250    All threads finished: 1242 messages in 5.544 seconds / 224.021 msg/s
131072 625    All threads finished: 617 messages in 5.755 seconds / 107.217 msg/s
262144 312    All threads finished: 305 messages in 6.014 seconds / 50.713 msg/s

[root@opteron bench]# for i in 4 8 16 32 64 128 256 ; do
>   echo -n $((i*1024)) $((80000/i)) " " ;
>   ./ThreadSchedulingIssue 1 $((i*1024)) 8 $((80000/i)) ;
> done
4096 20000    All threads finished: 19992 messages in 6.462 seconds / 3093.717 msg/s
8192 10000    All threads finished: 9992 messages in 8.767 seconds / 1139.738 msg/s
16384 5000    All threads finished: 5000 messages in 5.366 seconds / 931.798 msg/s
32768 2500    All threads finished: 2494 messages in 20.720 seconds / 120.369 msg/s
65536 1250    All threads finished: 1242 messages in 11.521 seconds / 107.805 msg/s
131072 625    All threads finished: 618 messages in 14.035 seconds / 44.032 msg/s
262144 312    All threads finished: 305 messages in 17.342 seconds / 17.587 msg/s

The point above between 16 and 32 is exactly where the total working set
no longer fits into cache -- I suspect that pushes the producer's latency
to go to sleep over the edge and everything collapses.

We use wakeup patterns to determine if two tasks are working together and
should thus be kept together. Task A should wake up B, and B should wake
up A. Furthermore, any task should quickly go to sleep after waking up
the other.

This program does neither. With a single pair, the producer continues
production after waking the consumer (until the queue is filled -- which,
if the consumer is fast enough, might never happen). With multiple pairs
there is no strict pair relation at all, since they all work on the same
global buffer queue, so P1 can wake Cn etc.

Furthermore the program uses shared memory (not a bad design), and thus
misses out on the explicit affinity hints of pipes, sockets, etc.

In short this program is carefully crafted to defeat all our affinity
tests - and I'm not sure what to do.
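[For contrast, a minimal sketch -- ours, not from the thread -- of the wakeup pattern the heuristics described above do reward: two sides ping-ponging over a pair of pipes, where each side wakes the other and immediately blocks, and where the pipe also carries the explicit affinity hint mentioned above.]

/* Ping-pong over two pipes: A wakes B, B wakes A, and each side
 * blocks right after waking the other -- the 1:1 wakeup pattern
 * the scheduler's affinity heuristics are built to detect. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

static int a2b[2], b2a[2];

static void *side_b(void *arg)
{
    char c;
    while (read(a2b[0], &c, 1) == 1) {   /* sleep until A wakes us */
        if (c == 'q')
            break;                        /* A asked us to quit */
        write(b2a[1], &c, 1);             /* wake A, then sleep again */
    }
    return NULL;
}

int main(void)
{
    if (pipe(a2b) || pipe(b2a)) {
        perror("pipe");
        return 1;
    }

    pthread_t b;
    pthread_create(&b, NULL, side_b, NULL);

    char c = 'x';
    int rounds;
    for (rounds = 0; rounds < 10000; rounds++) {
        write(a2b[1], &c, 1);             /* wake B ...              */
        read(b2a[0], &c, 1);              /* ... and immediately sleep */
    }
    c = 'q';
    write(a2b[1], &c, 1);                 /* tell B to exit */
    pthread_join(b, NULL);

    printf("%d round trips\n", rounds);
    return 0;
}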
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
From: Thomas Pilarski @ 2009-01-29 14:05 UTC
To: Peter Zijlstra
Cc: Andrew Morton, Mike Galbraith, Gregory Haskins, bugme-daemon, linux-kernel

> In short this program is carefully crafted to defeat all our affinity
> tests - and I'm not sure what to do.

I am sorry, although it is not carefully crafted. The function random()
is causing my problem. I currently have no real data, so I tried to make
some random utilization and data.

Without the random() function it works even with 80MB of data and I get
great results.

./ThreadSchedulingIssue 1 10485760 8 312
All threads finished: 309 messages in 29.369 seconds / 10.521 msg/s

schedtool -a 1 -e ./ThreadSchedulingIssue 1 10485760 8 312
All threads finished: 312 messages in 44.284 seconds / 7.045 msg/s

It does not even regress with more than two threads.

./ThreadSchedulingIssue 2 10485760 8 312
All threads finished: 311 messages in 28.040 seconds / 11.091 msg/s

./ThreadSchedulingIssue 4 10485760 8 312
All threads finished: 309 messages in 28.021 seconds / 11.027 msg/s

With small amounts of data the speed on two cores is even doubled.

schedtool -a 1 -e ./ThreadSchedulingIssue 1 1048 8 312000
All threads finished: 311992 messages in 19.437 seconds / 16051.247 msg/s

./ThreadSchedulingIssue 3 1048 8 312000
All threads finished: 311998 messages in 9.652 seconds / 32324.411 msg/s

./ThreadSchedulingIssue 8 1048 8 312000
All threads finished: 311997 messages in 9.339 seconds / 33406.370 msg/s

--------------

Perhaps it is as it should be, but when I run the test (without random())
with 2*8 threads, it uses ~186% of the cpu, while an instance of
"bzip2 -9 -c /dev/urandom >/dev/null" gets only 12%.
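[One plausible reading of the random() observation above -- an assumption on our part, not stated in the thread -- is that glibc's random() serializes all threads on a process-wide internal lock. The usual workaround is per-thread PRNG state via rand_r(); a sketch:]

/* Each worker owns its PRNG state, so no shared lock is taken and
 * the threads never contend on a futex inside the C library. */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define N 1000000

static void *worker(void *arg)
{
    unsigned int seed = (unsigned int)(unsigned long)arg; /* per-thread state */
    double sum = 0.0;
    int i;
    for (i = 0; i < N; i++)
        sum += (double)rand_r(&seed) / (double)RAND_MAX;  /* lock-free draw */
    double mean = sum / N;
    printf("mean %.3f\n", mean);
    /* Uniform [0,1] draws should average very close to 0.5. */
    return (void *)(unsigned long)(mean > 0.45 && mean < 0.55);
}

int main(void)
{
    pthread_t t[2];
    void *r0, *r1;
    long i;
    for (i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, (void *)(i + 1));
    pthread_join(t[0], &r0);
    pthread_join(t[1], &r1);
    printf("%s\n", (r0 && r1) ? "ok" : "FAIL");
    return !(r0 && r1);
}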
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-01-29 14:05 ` Thomas Pilarski
@ 2009-01-30  7:57 ` Mike Galbraith
  2009-02-02  7:43 ` Thomas Pilarski
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2009-01-30 7:57 UTC (permalink / raw)
To: Thomas Pilarski
Cc: Peter Zijlstra, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

On Thu, 2009-01-29 at 15:05 +0100, Thomas Pilarski wrote:
> > In short this program is carefully crafted to defeat all our affinity
> > tests - and I'm not sure what to do.
>
> I am sorry, but it is not carefully crafted. The function random() is
> causing my problem. I currently have no real data, so I tried to
> generate some random utilization and data.

Yeah, rather big difference, mega-contention vs zero-contention.

2.6.28.2, profile of ThreadSchedulingIssue 4 524288 8 200

vma               samples  %        app name               symbol name
ffffffff80251efa  2574819  31.6774  vmlinux                futex_wake
ffffffff80251a39  1367613  16.8255  vmlinux                futex_wait
0000000000411790   815426  10.0320  ThreadSchedulingIssue  random
ffffffff8022b3b5   343692   4.2284  vmlinux                task_rq_lock
0000000000404e30   299316   3.6824  ThreadSchedulingIssue  __lll_lock_wait_private
ffffffff8030d430   262906   3.2345  vmlinux                copy_user_generic_string
ffffffff80462af2   235176   2.8933  vmlinux                schedule
0000000000411b90   210984   2.5957  ThreadSchedulingIssue  random_r
ffffffff80251730   129376   1.5917  vmlinux                hash_futex
ffffffff8020be10   123548   1.5200  vmlinux                system_call
ffffffff8020a679   119398   1.4689  vmlinux                __switch_to
ffffffff8022f49b   110068   1.3541  vmlinux                try_to_wake_up
ffffffff8024c4d1   106352   1.3084  vmlinux                sched_clock_cpu
ffffffff8020be20   102709   1.2636  vmlinux                system_call_after_swapgs
ffffffff80229a2d   100614   1.2378  vmlinux                update_curr
ffffffff80248309    86475   1.0639  vmlinux                add_wait_queue
ffffffff80253149    85969   1.0577  vmlinux                do_futex

Versus using the myrand() free sample cruft generator from the rand(3)
manpage. Poof.
vma       samples  %        app name               symbol name
004002f4   979506  90.7113  ThreadSchedulingIssue  myrand
00400b00    53348   4.9405  ThreadSchedulingIssue  thread_consumer
00400c25    42710   3.9553  ThreadSchedulingIssue  thread_producer

One of those "don't _ever_ do that" things?

	-Mike

^ permalink raw reply	[flat|nested] 20+ messages in thread
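The myrand() Mike swapped in is the portable example generator from the
rand(3) man page (shown there as an implementation of the POSIX.1-2001
example). A sketch of that example, which keeps its state in a plain
process-local variable and therefore never touches a lock or the kernel:

```c
#include <assert.h>

/* Example PRNG from the rand(3) man page: a simple LCG with plain
 * static state -- no locking, no futexes, no system calls. */
static unsigned long next = 1;

/* Assumes RAND_MAX == 32767, as in the man-page example. */
static int myrand(void)
{
    next = next * 1103515245 + 12345;
    return (unsigned int)(next / 65536) % 32768;
}

static void mysrand(unsigned int seed)
{
    next = seed;
}
```

Because the state is unsynchronized, concurrent callers race on `next`
-- harmless for generating noise in a benchmark, but the sequence is no
longer reproducible across threads.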
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-01-30  7:57 ` Mike Galbraith
@ 2009-02-02  7:43 ` Thomas Pilarski
  2009-02-02  8:19 ` Peter Zijlstra
  2009-02-03  3:56 ` Valdis.Kletnieks
  0 siblings, 2 replies; 20+ messages in thread
From: Thomas Pilarski @ 2009-02-02 7:43 UTC (permalink / raw)
To: Mike Galbraith
Cc: Peter Zijlstra, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

On Friday, 2009-01-30 at 08:57 +0100, Mike Galbraith wrote:
> One of those "don't _ever_ do that" things?

I did not know random() uses a system call. It's rather unrealistic to
have five million system calls in a second. By adding a small loop with
some calculations near the random() call, the problem disappears too.
It is an unluckily chosen data generator.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02  7:43 ` Thomas Pilarski
@ 2009-02-02  8:19 ` Peter Zijlstra
  2009-02-02  8:33 ` Thomas Pilarski
  2009-02-03  3:56 ` Valdis.Kletnieks
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2009-02-02 8:19 UTC (permalink / raw)
To: Thomas Pilarski
Cc: Mike Galbraith, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

On Mon, 2009-02-02 at 08:43 +0100, Thomas Pilarski wrote:
> On Friday, 2009-01-30 at 08:57 +0100, Mike Galbraith wrote:
> > One of those "don't _ever_ do that" things?
>
> I did not know random() uses a system call. It's rather unrealistic to
> have five million system calls in a second. By adding a small loop with
> some calculations near the random() call, the problem disappears too.
> It is an unluckily chosen data generator.

I suppose you'll have to go bug the glibc people about their random()
implementation.

If you really need random() to perform for your application (Monte Carlo
stuff?), you might be better off writing a PRNG with TLS state or
something.

^ permalink raw reply	[flat|nested] 20+ messages in thread
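Peter's "PRNG with TLS state" suggestion can be sketched roughly as
below. The xorshift generator and the function names are illustrative
choices, not anything from the thread; the point is only that `__thread`
(GCC's thread-local storage keyword) gives every thread its own state,
so no lock is shared and no futex contention can arise:

```c
#include <assert.h>
#include <stdint.h>

/* Per-thread generator state: __thread gives each thread a private
 * copy, so calls never contend on a shared lock. */
static __thread uint64_t rng_state = 88172645463325252ULL;

static void tls_srand(uint64_t seed)
{
    rng_state = seed ? seed : 1;   /* xorshift state must be nonzero */
}

/* Marsaglia's xorshift64: three shift/xor steps, all in registers. */
static uint64_t tls_rand(void)
{
    uint64_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    rng_state = x;
    return x;
}
```

Each thread seeds once (e.g. from its TID) and then draws numbers with
no synchronization at all, which is exactly what a Monte Carlo or
noise-generation loop wants.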
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02  8:19 ` Peter Zijlstra
@ 2009-02-02  8:33 ` Thomas Pilarski
  2009-02-02  8:52 ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Pilarski @ 2009-02-02 8:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Mike Galbraith, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

On Monday, 2009-02-02 at 09:19 +0100, Peter Zijlstra wrote:
> I suppose you'll have to go bug the glibc people about their random()
> implementation.

Yes, I will.

> If you really need random() to perform for your application (Monte Carlo
> stuff?), you might be better off writing a PRNG with TLS state or
> something.

I just need some noise in my images.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02  8:33 ` Thomas Pilarski
@ 2009-02-02  8:52 ` Mike Galbraith
  2009-02-02  8:55 ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2009-02-02 8:52 UTC (permalink / raw)
To: Thomas Pilarski
Cc: Peter Zijlstra, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

On Mon, 2009-02-02 at 09:33 +0100, Thomas Pilarski wrote:
> On Monday, 2009-02-02 at 09:19 +0100, Peter Zijlstra wrote:
> > I suppose you'll have to go bug the glibc people about their random()
> > implementation.
>
> Yes, I will.

Finding the below was easy enough...

/* POSIX.1c requires that there is mutual exclusion for the `rand' and
   `srand' functions to prevent concurrent calls from modifying common
   data.  */
__libc_lock_define_initialized (static, lock)

...

long int
__random ()
{
  int32_t retval;
  __libc_lock_lock (lock);
  (void) __random_r (&unsafe_state, &retval);
  __libc_lock_unlock (lock);
  return retval;
}

...but finding the plumbing leading to __lll_lock_wait_private()
over-taxed my attention span.

	-Mike

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02  8:52 ` Mike Galbraith
@ 2009-02-02  8:55 ` Peter Zijlstra
  2009-02-02 12:15 ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2009-02-02 8:55 UTC (permalink / raw)
To: Mike Galbraith
Cc: Thomas Pilarski, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

On Mon, 2009-02-02 at 09:52 +0100, Mike Galbraith wrote:
> On Mon, 2009-02-02 at 09:33 +0100, Thomas Pilarski wrote:
> > On Monday, 2009-02-02 at 09:19 +0100, Peter Zijlstra wrote:
> > > I suppose you'll have to go bug the glibc people about their random()
> > > implementation.
> >
> > Yes, I will.
>
> Finding the below was easy enough...

Ah, that was a good clue; apparently all you need to do is use
random_r() and provide your own state and all should be well.

^ permalink raw reply	[flat|nested] 20+ messages in thread
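A minimal sketch of the random_r() approach Peter describes. random_r()
and initstate_r() are glibc-specific reentrant variants that operate on
caller-owned state instead of the locked global state; the wrapper names
here are invented for illustration. Per the random_r(3) man page, the
`struct random_data` must be zeroed before initstate_r() and the state
buffer must be at least 8 bytes (larger buffers select a better
generator):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Caller-owned state for glibc's random_r(3): give each thread its own
 * struct and no global lock is ever taken. */
struct thread_rng {
    struct random_data data;
    char statebuf[64];          /* 64 bytes selects a non-trivial generator */
};

static void thread_rng_init(struct thread_rng *rng, unsigned int seed)
{
    /* random_r(3) requires random_data to be zeroed before initstate_r. */
    memset(rng, 0, sizeof(*rng));
    initstate_r(seed, rng->statebuf, sizeof(rng->statebuf), &rng->data);
}

static int32_t thread_rng_next(struct thread_rng *rng)
{
    int32_t v;
    random_r(&rng->data, &v);   /* result in [0, 2^31 - 1] */
    return v;
}
```

Each producer/consumer thread would then carry its own `struct
thread_rng`, making the hot loop lock-free while keeping the familiar
random() output range.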
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02  8:55 ` Peter Zijlstra
@ 2009-02-02 12:15 ` Peter Zijlstra
  2009-02-02 18:29 ` Michael Kerrisk
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2009-02-02 12:15 UTC (permalink / raw)
To: Mike Galbraith
Cc: Thomas Pilarski, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel, Michael Kerrisk

On Mon, 2009-02-02 at 09:55 +0100, Peter Zijlstra wrote:
> Ah, that was a good clue; apparently all you need to do is use
> random_r() and provide your own state and all should be well.

Michael, would it make sense to add the random_r() family to the "SEE
ALSO" section of the random() man page?

(Admittedly, my random() manpage is ancient: 2008-03-07, so it might be
this is already the case, in which case, ignore me :)

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02 12:15 ` Peter Zijlstra
@ 2009-02-02 18:29 ` Michael Kerrisk
  2009-02-02 18:35 ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Kerrisk @ 2009-02-02 18:29 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Mike Galbraith, Thomas Pilarski, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

Hi Peter,

On Tue, Feb 3, 2009 at 1:15 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2009-02-02 at 09:55 +0100, Peter Zijlstra wrote:
>
>> Ah, that was a good clue; apparently all you need to do is use
>> random_r() and provide your own state and all should be well.
>
> Michael, would it make sense to add the random_r() family to the "SEE
> ALSO" section of the random() man page?
>
> (Admittedly, my random() manpage is ancient: 2008-03-07, so it might be
> this is already the case, in which case, ignore me :)

(Up-to-date versions of the pages can always be found online at the
location in the .sig.)

Well, the man page already had this text under NOTES:

    This function should not be used in cases where multiple
    threads use random() and the behavior should be reproducible.
    Use random_r(3) for that purpose.

But it certainly doesn't hurt to have random_r(3) also listed under
SEE ALSO, and I've added it for man-pages-3.18.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02 18:29 ` Michael Kerrisk
@ 2009-02-02 18:35 ` Peter Zijlstra
  2009-02-03  4:55 ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2009-02-02 18:35 UTC (permalink / raw)
To: mtk.manpages
Cc: Mike Galbraith, Thomas Pilarski, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

On Tue, 2009-02-03 at 07:29 +1300, Michael Kerrisk wrote:
> Hi Peter,
>
> On Tue, Feb 3, 2009 at 1:15 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, 2009-02-02 at 09:55 +0100, Peter Zijlstra wrote:
> >
> >> Ah, that was a good clue; apparently all you need to do is use
> >> random_r() and provide your own state and all should be well.
> >
> > Michael, would it make sense to add the random_r() family to the "SEE
> > ALSO" section of the random() man page?
> >
> > (Admittedly, my random() manpage is ancient: 2008-03-07, so it might be
> > this is already the case, in which case, ignore me :)
>
> (Up-to-date versions of the pages can always be found online at the
> location in the .sig.)

Ah, I'll try to remember that.

> Well, the man page already had this text under NOTES:
>
>     This function should not be used in cases where multiple
>     threads use random() and the behavior should be reproducible.
>     Use random_r(3) for that purpose.

Yeah, I found it eventually, but I generally don't read a full manpage
when I'm looking for related functions, only the SEE ALSO section.

> But it certainly doesn't hurt to have random_r(3) also listed under
> SEE ALSO, and I've added it for man-pages-3.18.

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02 18:35 ` Peter Zijlstra
@ 2009-02-03  4:55 ` Mike Galbraith
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Galbraith @ 2009-02-03 4:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mtk.manpages, Thomas Pilarski, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

This bug is now dead... so who closes it?

	-Mike

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores
  2009-02-02  7:43 ` Thomas Pilarski
  2009-02-02  8:19 ` Peter Zijlstra
@ 2009-02-03  3:56 ` Valdis.Kletnieks
  1 sibling, 0 replies; 20+ messages in thread
From: Valdis.Kletnieks @ 2009-02-03 3:56 UTC (permalink / raw)
To: Thomas Pilarski
Cc: Mike Galbraith, Peter Zijlstra, Andrew Morton, Gregory Haskins, bugme-daemon, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 634 bytes --]

On Mon, 02 Feb 2009 08:43:55 +0100, Thomas Pilarski said:
> On Friday, 2009-01-30 at 08:57 +0100, Mike Galbraith wrote:
> > One of those "don't _ever_ do that" things?
>
> I did not know random() uses a system call. It's rather unrealistic to
> have five million system calls in a second. By adding a small loop with
> some calculations near the random() call, the problem disappears too.
> It is an unluckily chosen data generator.

Am I the only one that's scared by the concept of anything that beats on
random numbers enough to need 5 million of them a second, but is still
using the relatively sucky one that's in most glibc's? :)

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread
end of thread, other threads:[~2009-02-03 4:55 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <bug-12562-10286@http.bugzilla.kernel.org/>
2009-01-28 20:56 ` [Bugme-new] [Bug 12562] New: High overhead while switching or synchronizing threads on different cores Andrew Morton
2009-01-28 22:15 ` Peter Zijlstra
2009-01-28 22:25 ` Thomas Pilarski
2009-01-29 9:07 ` Peter Zijlstra
2009-01-29 10:12 ` Thomas Pilarski
2009-01-29 10:24 ` Thomas Pilarski
2009-01-29 10:31 ` Peter Zijlstra
2009-01-29 11:37 ` Peter Zijlstra
2009-01-29 14:05 ` Thomas Pilarski
2009-01-30 7:57 ` Mike Galbraith
2009-02-02 7:43 ` Thomas Pilarski
2009-02-02 8:19 ` Peter Zijlstra
2009-02-02 8:33 ` Thomas Pilarski
2009-02-02 8:52 ` Mike Galbraith
2009-02-02 8:55 ` Peter Zijlstra
2009-02-02 12:15 ` Peter Zijlstra
2009-02-02 18:29 ` Michael Kerrisk
2009-02-02 18:35 ` Peter Zijlstra
2009-02-03 4:55 ` Mike Galbraith
2009-02-03 3:56 ` Valdis.Kletnieks
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox