Re: ~5x greater CPU load for a networked application when using 2.6.15-rt15-smp vs. 2.6.12-1.1390_FC4

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Ingo Molnar <mingo@elte.hu>
To: Gautam H Thaker <gthaker@atl.lmco.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: ~5x greater CPU load for a networked application when using 2.6.15-rt15-smp vs. 2.6.12-1.1390_FC4
Date: Thu, 23 Feb 2006 21:58:51 +0100	[thread overview]
Message-ID: <20060223205851.GA24321@elte.hu> (raw)
In-Reply-To: <43FE134C.6070600@atl.lmco.com>

[-- Attachment #1: Type: text/plain, Size: 1257 bytes --]


* Gautam H Thaker <gthaker@atl.lmco.com> wrote:

> ::::::::::::::
> top:  2.6.15-rt15-smp.out   # REAL_TIME KERNEL
> ::::::::::::::

>  2906 root     -66   0 18624 2244 1480 S 41.4  0.1  27:11.21 nalive.p
>     6 root     -91   0     0    0    0 S 32.3  0.0  21:04.53 softirq-net-rx/
>  1379 root     -40  -5     0    0    0 S 14.5  0.0   9:54.76 IRQ 23

One effect of the -rt kernel is that it shows IRQ load explicitly - 
while the stock kernel can 'hide' it because there interrupts run 
'atomically', making it hard to measure the true system overhead. The 
-rt kernel will likely show more overhead, but i'd not expect this 
amount of overhead.

To figure out the true overhead of both kernels, could you try the 
attached loop_print_thread.c code, and run it on: an idle non-rt kernel, 
and idle -rt kernel, a busy non-rt kernel and a busy -rt kernel, and 
send me the typical/average loops/sec value you are getting?

Furthermore, there have been some tasklet related fixes in 2.6.15-rt17, 
which maybe could improve this workload. Maybe ...

Also, would there be some easy way for me to reproduce that workload?  
Possibly some .c code you could send that is easy to run on the server 
and the client to reproduce the guts of this workload?

	Ingo

[-- Attachment #2: loop_print_thread.c --]
[-- Type: text/plain, Size: 2095 bytes --]


#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>

#define rdtscll(val) \
     __asm__ __volatile__ ("rdtsc;" : "=A" (val))

#define SECS 3ULL

volatile unsigned int count_array[1000] __attribute__((aligned(256)));
int atomic = 0;
unsigned long long delta = 0;

void *loop(void *arg)
{
	unsigned long long start, now, mhz = 525000000, limit = mhz * SECS,
		min = -1ULL, tmp;
	volatile unsigned int *count, offset = (int)arg;
	int j;

	printf("offset: %u (atomic: %d).\n", offset, atomic);
	count = (void *)count_array + offset;

	if (!arg) {
		for (j = 0; j < 10; j++) {
			limit = mhz/10;
			*count = 0;
			rdtscll(start);
			for (;;) {
				(*count)++;
				rdtscll(now);
				if (now - start > limit)
					break;
			}
			rdtscll(now);
			tmp = (now-start)/(*count);
			if (tmp < min)
				min = tmp;
		}
		printf("delta: %Ld\n", min);
		delta = min;
	} else
		while (!delta)
			usleep(100000);
	limit = mhz*SECS;

repeat:
	*count = 0;
	rdtscll(start);
	if (atomic)
		for (;;) {
			asm ("lock; incl %0" : "=m" (*count) : "m" (*count));
			rdtscll(now);
			if (now - start > limit)
				break;
		}
	else
		for (;;) {
			(*count)++;
			rdtscll(now);
			if (now - start > limit)
				break;
		}
	printf("speed: %Ld loops (%Ld cycles per iteration).\n", (*count)/SECS, (limit/(*count)-delta)); fflush(stdout);
	goto repeat;
}

int main (int argc, char **argv)
{
	unsigned int nr_threads, i, ret, offset = 0;
	pthread_t *t;

	if (argc != 2 && argc != 3 && argc != 4) {
usage:
		printf("usage: loop_print2 <nr threads> [<counter offset>] [<atomic>]\n");
		exit(-1);
	}
	nr_threads = atol(argv[1]);
	if (!nr_threads)
		goto usage;
	t = calloc(nr_threads, sizeof(*t));
	if (argc >= 3)
		offset = atol(argv[2]);
	if (offset < sizeof(unsigned int))
		offset = sizeof(unsigned int);
	atomic = 0;
	if (argc >= 4) {
		atomic = atol(argv[3]);
		printf("a: %d\n", atomic);
	}

	for (i = 1; i < nr_threads; i++) {
		ret = pthread_create (t+i, NULL, loop,
			       	(void *)(i*offset));
		if (ret)
			exit(-1);
	}
	loop((void *)0);

	return 0;
}

next prev parent reply	other threads:[~2006-02-23 21:00 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-23 19:55 ~5x greater CPU load for a networked application when using 2.6.15-rt15-smp vs. 2.6.12-1.1390_FC4 Gautam H Thaker
2006-02-23 20:15 ` Benjamin LaHaise
2006-02-23 20:58 ` Ingo Molnar [this message]
2006-02-23 21:06   ` Nish Aravamudan
2006-02-23 21:08     ` Ingo Molnar
2006-02-23 21:14       ` Nish Aravamudan
2006-02-23 22:07         ` Esben Nielsen
2006-02-24  8:03       ` Jan Engelhardt
2006-02-24 12:11   ` Andrew Morton
2006-02-24 20:06     ` Gautam H Thaker
2006-02-24 20:31       ` Andrew Morton
2006-02-24 20:44         ` Gautam H Thaker
2006-02-24 16:52 ` Theodore Ts'o
2006-02-24 19:25   ` Gautam H Thaker
2006-02-28 19:27 ` Matt Mackall
2006-02-28 22:19   ` Gautam H Thaker
  -- strict thread matches above, loose matches on Subject: below --
2006-07-11 18:08 Jonathan Walsh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060223205851.GA24321@elte.hu \
    --to=mingo@elte.hu \
    --cc=gthaker@atl.lmco.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox