public inbox for linux-kernel@vger.kernel.org
* [BUG] scheduler: first timeslice of the exiting thread
@ 2007-04-07  7:31 Satoru Takeuchi
From: Satoru Takeuchi @ 2007-04-07  7:31 UTC
  To: Ingo Molnar, Linux Kernel; +Cc: Satoru Takeuchi

Hi Ingo and all,

While examining a program with the following behavior ...

  1. There are a large number of small jobs, each taking several msecs,
     and the number of jobs increases constantly.
  2. The process creates a thread or a process per job (I examined both
     the thread model and the process model).
  3. Each child process/thread does the assigned job and exits immediately.

... I found that the thread model's latency is longer than the process
model's, contrary to my expectation. This is because of the current
sched_fork()/sched_exit() implementation, which works as follows:

  a) On sched_fork(), the creator shares its timeslice with the new process.
  b) On sched_exit(), if the exiting process hasn't exhausted its first
     timeslice yet, it gives its timeslice to the parent.
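
To make a) and b) concrete, here is a minimal userspace sketch, not the
actual kernel code. The struct task, its max_slice field (standing in for
task_timeslice()) and the model_* function names are made up for
illustration, and the rounding in the split is an assumption chosen to be
consistent with the systemtap log later in this mail (a creator with 11 ticks
yields a child with 6).

```c
/* Simplified userspace model of the timeslice handoff; not kernel code. */

struct task {
	unsigned int time_slice;	/* remaining ticks */
	unsigned int max_slice;		/* stand-in for task_timeslice() */
	int first_time_slice;		/* still on the slice inherited at fork? */
};

/* a) The creator splits its remaining timeslice with the new task. */
void model_sched_fork(struct task *creator, struct task *child)
{
	child->max_slice = creator->max_slice;
	child->time_slice = (creator->time_slice + 1) >> 1;	/* larger half */
	child->first_time_slice = 1;
	creator->time_slice >>= 1;				/* smaller half */
}

/* b) A task exiting on its first timeslice hands the rest to "parent". */
void model_sched_exit(struct task *child, struct task *parent)
{
	if (!child->first_time_slice)
		return;
	parent->time_slice += child->time_slice;
	/* Model assumption: never hand back more than one full slice. */
	if (parent->time_slice > parent->max_slice)
		parent->time_slice = parent->max_slice;
}
```

With a creator holding 11 ticks, model_sched_fork() leaves 5 with the creator
and 6 with the child. If the child exits at once and the creator is passed as
its parent, model_sched_exit() restores the creator to 11. If some other task
is passed as the parent, the creator stays at 5.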

This causes no problem in the process model, since the creator is the parent.
However, in the thread model, the new thread's parent is not the creator; it
is the same as the creator's parent. Hence, in this kind of program, the
creator can't get the shared timeslice back and exhausts its timeslice very
quickly. In addition, the parent (typically the shell?) somehow gets extra
timeslice.

I believe this is a bug and the exiting process should give its timeslice
to the creator. I have two possible plans to fix this problem:

 a) Add a field for the creator to task_struct. This needs extra memory.
 b) Don't add an extra field, and make the creator the thread's parent, which
    is the same as process creation. However, this has many side effects; for
    example, we would also need to change the sys_getppid() implementation.

What do you think? Any comments are welcome.

BTW, we can easily confirm the problem with systemtap, a convenient diagnostic
tool.

Test programs (attached to this mail):

  - satprocess.c:  Process model. It creates a child process and waits for it
                   several times. Each child process exits immediately.
  - satthread.c:   Thread model. It creates a child thread and joins it several
                   times. Each child thread exits immediately.
  - fork_exit.stp: systemtap script to monitor satprocess/satthread

My kernel:

2.6.21-rc6(i386).

How to confirm:

1) Execute the systemtap script.

# stap fork_exit.stp -v
Pass 1: parsed user script and 54 library script(s) in 680usr/40sys/728real ms.
Pass 2: analyzed script: 2 probe(s), 8 function(s), 0 embed(s), 0 global(s) in 670usr/40sys/699real ms.
Pass 3: using cached /root/.systemtap/cache/02/stap_02d3975cd5dedf7b6697a2d5f92f966a_3811.c
Pass 4: using cached /root/.systemtap/cache/02/stap_02d3975cd5dedf7b6697a2d5f92f966a_3811.ko
Pass 5: starting run.

2) Execute the process model program on another terminal.

$ ./satprocess

Then systemtap observes satprocess's fork/exit and prints the following:

fork: pid = 11635, tgid = 11635, ppid = 5969, time_slice = 11
exit: pid = 11635, tgid = 11635, ppid = 11634, time_slice = 6
fork: pid = 11636, tgid = 11636, ppid = 5969, time_slice = 11
exit: pid = 11636, tgid = 11636, ppid = 11634, time_slice = 6
fork: pid = 11637, tgid = 11637, ppid = 5969, time_slice = 11
exit: pid = 11637, tgid = 11637, ppid = 11634, time_slice = 6
fork: pid = 11638, tgid = 11638, ppid = 5969, time_slice = 11
exit: pid = 11638, tgid = 11638, ppid = 11634, time_slice = 6
fork: pid = 11639, tgid = 11639, ppid = 5969, time_slice = 11
exit: pid = 11639, tgid = 11639, ppid = 11634, time_slice = 6
fork: pid = 11640, tgid = 11640, ppid = 5969, time_slice = 11
exit: pid = 11640, tgid = 11640, ppid = 11634, time_slice = 6
fork: pid = 11641, tgid = 11641, ppid = 5969, time_slice = 11
exit: pid = 11641, tgid = 11641, ppid = 11634, time_slice = 6
fork: pid = 11642, tgid = 11642, ppid = 5969, time_slice = 11
exit: pid = 11642, tgid = 11642, ppid = 11634, time_slice = 6
fork: pid = 11643, tgid = 11643, ppid = 5969, time_slice = 11
exit: pid = 11643, tgid = 11643, ppid = 11634, time_slice = 6
fork: pid = 11644, tgid = 11644, ppid = 5969, time_slice = 11
exit: pid = 11644, tgid = 11644, ppid = 11634, time_slice = 6
exit: pid = 11634, tgid = 11634, ppid = 5969, time_slice = 10

It looks good.

3) Execute the thread model program on another terminal.

$ ./satthread

Then systemtap observes satthread's fork/exit and prints the following:

fork: pid = 11646, tgid = 11645, ppid = 5969, time_slice = 10
exit: pid = 11646, tgid = 11645, ppid = 5969, time_slice = 5
fork: pid = 11647, tgid = 11645, ppid = 5969, time_slice = 5
fork: pid = 11648, tgid = 11645, ppid = 5969, time_slice = 2
exit: pid = 11647, tgid = 11645, ppid = 5969, time_slice = 3
fork: pid = 11649, tgid = 11645, ppid = 5969, time_slice = 1
exit: pid = 11648, tgid = 11645, ppid = 5969, time_slice = 1
fork: pid = 11650, tgid = 11645, ppid = 5969, time_slice = 25
exit: pid = 11649, tgid = 11645, ppid = 5969, time_slice = 1
fork: pid = 11651, tgid = 11645, ppid = 5969, time_slice = 12
exit: pid = 11650, tgid = 11645, ppid = 5969, time_slice = 13
fork: pid = 11652, tgid = 11645, ppid = 5969, time_slice = 6
exit: pid = 11651, tgid = 11645, ppid = 5969, time_slice = 6
fork: pid = 11653, tgid = 11645, ppid = 5969, time_slice = 3
exit: pid = 11652, tgid = 11645, ppid = 5969, time_slice = 3
fork: pid = 11654, tgid = 11645, ppid = 5969, time_slice = 1
exit: pid = 11653, tgid = 11645, ppid = 5969, time_slice = 2
fork: pid = 11655, tgid = 11645, ppid = 5969, time_slice = 25
exit: pid = 11654, tgid = 11645, ppid = 5969, time_slice = 1
exit: pid = 11655, tgid = 11645, ppid = 5969, time_slice = 13
exit: pid = 11645, tgid = 11645, ppid = 5969, time_slice = 12

The creator can't get the child thread's first timeslice back and exhausts
its timeslice by about the lg(25)-th thread creation at best, hmm...
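
The halving can be checked with a small sketch. It assumes that the creator
keeps the smaller half on every creation, that nothing is ever handed back,
and that no ticks are consumed in between; the function name is hypothetical.

```c
/*
 * Count how many thread creations a creator survives before its
 * timeslice reaches zero, when no child ever returns its share.
 */
unsigned int creations_until_empty(unsigned int time_slice)
{
	unsigned int n = 0;

	while (time_slice > 0) {
		time_slice >>= 1;	/* creator keeps the smaller half */
		n++;
	}
	return n;
}
```

For a full slice of 25 this returns 5 (25, 12, 6, 3, 1, 0), i.e.
floor(lg(25)) + 1, which is consistent with the five forks logged between the
two time_slice = 25 lines above.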

Thanks,

Satoru

/// satprocess.c //////////////////////////////////////////////////////////////
/*
 * satprocess - Create child process and wait for it several times. Each child
 *              process does nothing and exits immediately.
 *
 * Copyright (C) 2007 Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
 *
 * This software may be used and distributed according to the terms
 * of the GNU General Public License, incorporated herein by reference.
 */

#include <sys/types.h>	/* pid_t */
#include <sys/wait.h>	/* wait() */
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>	/* fork() */

#define NPROC 10

int
main(int argc, char **argv)
{
	int i;
	pid_t pid;

	for (i = 0; i < NPROC; i++) {
		pid = fork();
		if (pid < 0)
			err(EXIT_FAILURE, "fork(%d) failed", i);
		if (pid == 0)
			exit(EXIT_SUCCESS);
		wait(NULL);
	}
	exit(EXIT_SUCCESS);
}
///////////////////////////////////////////////////////////////////////////////

/// satthread.c ///////////////////////////////////////////////////////////////
/*
 * satthread - Create child thread and join it several times. Each child thread
 *             does nothing and exits immediately.
 *
 * Copyright (C) 2007 Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
 *
 * This software may be used and distributed according to the terms
 * of the GNU General Public License, incorporated herein by reference.
 */

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREAD 10

static void *thread_fn(void *arg)
{
	return NULL;
}

int
main(int argc, char **argv)
{
	int i;
	pthread_t t;

	for (i = 0; i < NTHREAD; i++) {
		if (pthread_create(&t, NULL, thread_fn, NULL)) {
			fprintf(stderr, "pthread_create(%d) failed\n", i);
			exit(EXIT_FAILURE);
		}
		pthread_join(t, NULL);
	}
	
	exit(EXIT_SUCCESS);
}
///////////////////////////////////////////////////////////////////////////////

/// fork_exit.stp /////////////////////////////////////////////////////////////
/*
 * fork_exit.stp - Monitors sched_fork()/sched_exit() for satprocess/satthread
 *                 and prints some information
 *
 * Copyright (C) 2007 Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
 *
 * This software may be used and distributed according to the terms
 * of the GNU General Public License, incorporated herein by reference.
 */

function is_my_testpro(comm)
{
	if (comm == "satthread" || comm == "satprocess")
		return 1
	else
		return 0
}

function print_log(name, pid, tgid, ppid, time_slice)
{
	printf("%s: pid = %d, tgid = %d, ppid = %d, time_slice = %u\n",
	       name, pid, tgid, ppid, time_slice);
}

probe kernel.function("sched_fork")
{
	if (is_my_testpro(kernel_string($p->comm)))
		print_log(kernel_string($p->comm),
		          $p->pid, $p->tgid, $p->parent->pid, $p->time_slice);
}

probe kernel.function("sched_exit")
{
	if (is_my_testpro(kernel_string($p->comm)))
		print_log(kernel_string($p->comm),
		          $p->pid, $p->tgid, $p->parent->pid, $p->time_slice);
}

* Re: [BUG] scheduler: first timeslice of the exiting thread
@ 2007-04-09 10:29 Oleg Nesterov
From: Oleg Nesterov @ 2007-04-09 10:29 UTC
  To: Satoru Takeuchi; +Cc: Ingo Molnar, Andrew Morton, Mike Galbraith, linux-kernel

Satoru Takeuchi wrote:
>
>   a) On sched_fork(), the creator shares its timeslice with the new process.
>   b) On sched_exit(), if the exiting process hasn't exhausted its first
>      timeslice yet, it gives its timeslice to the parent.
>
> This causes no problem in the process model, since the creator is the parent.
> However, in the thread model, the new thread's parent is not the creator; it
> is the same as the creator's parent. Hence, in this kind of program, the
> creator can't get the shared timeslice back and exhausts its timeslice very
> quickly. In addition, the parent (typically the shell?) somehow gets extra
> timeslice.

Yes, this is an old oddity.

> I believe this is a bug and the exiting process should give its timeslice
> to the creator. I have two possible plans to fix this problem:
>
>  a) Add a field for the creator to task_struct. This needs extra memory.

... and locking/complications. The creator may have already exited by this time.

>  b) Don't add an extra field, and make the creator the thread's parent, which
>     is the same as process creation. However, this has many side effects; for
>     example, we would also need to change the sys_getppid() implementation.

can't understand this, sorry.

Ingo, I think sched_exit() has another buglet,

	if (unlikely(p->parent->time_slice > task_timeslice(p)))
		p->parent->time_slice = task_timeslice(p);

should be

	if (unlikely(parent->time_slice > task_timeslice(parent)))
		parent->time_slice = task_timeslice(parent);
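
The difference between the two checks only shows up when the child and the
parent have different full timeslices. Below is a rough userspace sketch of
that case, not kernel code; the struct, the max_slice field (standing in for
task_timeslice()) and both function names are invented for illustration.

```c
/* Userspace model only; max_slice stands in for task_timeslice(task). */
struct task {
	unsigned int time_slice;	/* remaining ticks */
	unsigned int max_slice;		/* full timeslice for this task */
};

/* Current check: the parent is capped at the exiting child's full slice. */
void give_back_current(struct task *p, struct task *parent)
{
	parent->time_slice += p->time_slice;
	if (parent->time_slice > p->max_slice)
		parent->time_slice = p->max_slice;
}

/* Suggested check: the parent is capped at its own full slice. */
void give_back_suggested(struct task *p, struct task *parent)
{
	parent->time_slice += p->time_slice;
	if (parent->time_slice > parent->max_slice)
		parent->time_slice = parent->max_slice;
}
```

With hypothetical numbers, a child holding 4 of at most 10 ticks and a parent
holding 50 of at most 100: the current check cuts the parent down to 10, while
the suggested one leaves it at 54.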

The

	[PATCH 1/2] sched_exit: fix parent->time_slice calculation
	http://marc.info/?l=linux-kernel&m=115101842505365

was dropped because "[PATCH 2/2]" was very wrong.

Could you look at it?

Oleg.

