* Scheduler: SIGSTOP on multi threaded processes
From: Olivier Croquette @ 2005-05-04 17:37 UTC
To: LKML

Hello

On a 2.6.11 x86 system, I am SIGSTOP'ing processes which have started
several threads before.

As expected, all threads are suspended.

But surprisingly, it can happen that some threads are still scheduled
after the SIGSTOP has been issued.

Typically, they get scheduled 2 times within the next 5ms, before being
really stopped.

Sadly, I could not reproduce that in a smaller example yet.

As this behaviour is IMO against the SIGSTOP concept, I tried to analyze
the kernel code responsible for that. I could not really find the exact
lines.

So here are my questions:

1. do you know any reason for which the SIGSTOP would not stop
immediately all threads of a process?

2. where do the threads get suspended exactly in the kernel? I think it
is in signal.c but I am not sure exactly where.

3. can you confirm that the bug MUST be in my code? :)

Thanks!

Best regards

Olivier
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Richard B. Johnson @ 2005-05-04 18:16 UTC
To: Olivier Croquette; +Cc: LKML

On Wed, 4 May 2005, Olivier Croquette wrote:

> Hello
>
> On a 2.6.11 x86 system, I am SIGSTOP'ing processes which have started
> several threads before.
>
> As expected, all threads are suspended.
>
> But surprisingly, it can happen that some threads are still scheduled
> after the SIGSTOP has been issued.
>
> Typically, they get scheduled 2 times within the next 5ms, before being
> really stopped.
>
> Sadly, I could not reproduce that in a smaller example yet.
>
> As this behaviour is IMO against the SIGSTOP concept, I tried to analyze
> the kernel code responsible for that. I could not really find the exact
> lines.
>
> So here are my questions:
>
> 1. do you know any reason for which the SIGSTOP would not stop
> immediately all threads of a process?
>
> 2. where do the threads get suspended exactly in the kernel? I think it
> is in signal.c but I am not sure exactly where.
>
> 3. can you confirm that the bug MUST be in my code? :)
>
> Thanks!
>
> Best regards
>
> Olivier

The kernel doesn't do SIGSTOP or SIGCONT. Within init, there is
a SIGSTOP and SIGCONT handler. These can be inherited by others
unless changed, perhaps by a 'C' runtime library. Basically,
the SIGSTOP handler executes pause() until the SIGCONT signal
is received.

Any delay in stopping is the time necessary for the signal to
be delivered. It is possible that the section of code that
contains the STOP/CONT handler was paged out and needs to be
paged in before the signal can be delivered.

You might quicken this up by installing your own handler for
SIGSTOP and SIGCONT....

static int stp;

static void contsig(int sig)    // SIGCONT handler
{
    stp = 0;
}

static void stopsig(int sig)    // SIGSTOP handler
{
    stp = 1;
    while(stp)
        pause();
}

Put this near the code that will be executing most of the time.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by Dictator Bush.
                  98.36% of all statistics are fiction.
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Daniel Jacobowitz @ 2005-05-04 19:16 UTC
To: Richard B. Johnson; +Cc: Olivier Croquette, LKML

On Wed, May 04, 2005 at 02:16:24PM -0400, Richard B. Johnson wrote:
> The kernel doesn't do SIGSTOP or SIGCONT. Within init, there is
> a SIGSTOP and SIGCONT handler. These can be inherited by others
> unless changed, perhaps by a 'C' runtime library. Basically,
> the SIGSTOP handler executes pause() until the SIGCONT signal
> is received.
>
> Any delay in stopping is the time necessary for the signal to
> be delivered. It is possible that the section of code that
> contains the STOP/CONT handler was paged out and needs to be
> paged in before the signal can be delivered.
>
> You might quicken this up by installing your own handler for
> SIGSTOP and SIGCONT....

I don't know what RTOSes you've been working with recently, but none of
the above is true for Linux. I don't think it ever has been.

--
Daniel Jacobowitz
CodeSourcery, LLC
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Alex Riesen @ 2005-05-04 21:06 UTC
To: Richard B. Johnson, Olivier Croquette, LKML

On 5/4/05, Daniel Jacobowitz <dan@debian.org> wrote:
> On Wed, May 04, 2005 at 02:16:24PM -0400, Richard B. Johnson wrote:
> > The kernel doesn't do SIGSTOP or SIGCONT. Within init, there is
> > a SIGSTOP and SIGCONT handler. These can be inherited by others
> > unless changed, perhaps by a 'C' runtime library. Basically,
> > the SIGSTOP handler executes pause() until the SIGCONT signal
> > is received.
> >
> > Any delay in stopping is the time necessary for the signal to
> > be delivered. It is possible that the section of code that
> > contains the STOP/CONT handler was paged out and needs to be
> > paged in before the signal can be delivered.
> >
> > You might quicken this up by installing your own handler for
> > SIGSTOP and SIGCONT....
>
> I don't know what RTOSes you've been working with recently, but none of
> the above is true for Linux. I don't think it ever has been.
>

I don't even think it was true for anything. It's his usual way of
saying things.
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Richard B. Johnson @ 2005-05-05 0:42 UTC
To: Alex Riesen; +Cc: Olivier Croquette, LKML

On Wed, 4 May 2005, Alex Riesen wrote:

> On 5/4/05, Daniel Jacobowitz <dan@debian.org> wrote:
>> On Wed, May 04, 2005 at 02:16:24PM -0400, Richard B. Johnson wrote:
>>> The kernel doesn't do SIGSTOP or SIGCONT. Within init, there is
>>> a SIGSTOP and SIGCONT handler. These can be inherited by others
>>> unless changed, perhaps by a 'C' runtime library. Basically,
>>> the SIGSTOP handler executes pause() until the SIGCONT signal
>>> is received.
>>>
>>> Any delay in stopping is the time necessary for the signal to
>>> be delivered. It is possible that the section of code that
>>> contains the STOP/CONT handler was paged out and needs to be
>>> paged in before the signal can be delivered.
>>>
>>> You might quicken this up by installing your own handler for
>>> SIGSTOP and SIGCONT....
>>
>> I don't know what RTOSes you've been working with recently, but none of
>> the above is true for Linux. I don't think it ever has been.
>>
>
> I don't even think it was true for anything. It's his usual way of
> saying things.
>

Nope, I thought he was talking about the terminal stopper/starter,
SIGTSTP used for X-ON and X-OFF. I thought he was sending that signal,
timing it, then restarting with SIGCONT. You can't restart or even
trap a SIGSTOP signal.

Cheers,
Dick Johnson
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Richard B. Johnson @ 2005-05-05 0:33 UTC
To: Daniel Jacobowitz; +Cc: Olivier Croquette, LKML

On Wed, 4 May 2005, Daniel Jacobowitz wrote:

> On Wed, May 04, 2005 at 02:16:24PM -0400, Richard B. Johnson wrote:
>> The kernel doesn't do SIGSTOP or SIGCONT. Within init, there is
>> a SIGSTOP and SIGCONT handler. These can be inherited by others
>> unless changed, perhaps by a 'C' runtime library. Basically,
>> the SIGSTOP handler executes pause() until the SIGCONT signal
>> is received.
>>
>> Any delay in stopping is the time necessary for the signal to
>> be delivered. It is possible that the section of code that
>> contains the STOP/CONT handler was paged out and needs to be
>> paged in before the signal can be delivered.
>>
>> You might quicken this up by installing your own handler for
>> SIGSTOP and SIGCONT....
>
> I don't know what RTOSes you've been working with recently, but none of
> the above is true for Linux. I don't think it ever has been.
>
> --
> Daniel Jacobowitz
> CodeSourcery, LLC
>

Grab a copy of your favorite init source. SIGSTOP and SIGCONT are
signals. They are handled by signal handlers, always have been
on Unix and Unix clones like Linux.

Cheers,
Dick Johnson
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Richard B. Johnson @ 2005-05-05 0:45 UTC
To: Daniel Jacobowitz; +Cc: Olivier Croquette, LKML

On Wed, 4 May 2005, linux-os (Dick Johnson) wrote:

> On Wed, 4 May 2005, Daniel Jacobowitz wrote:
>
>> On Wed, May 04, 2005 at 02:16:24PM -0400, Richard B. Johnson wrote:
>>> The kernel doesn't do SIGSTOP or SIGCONT. Within init, there is
>>> a SIGSTOP and SIGCONT handler. These can be inherited by others
>>> unless changed, perhaps by a 'C' runtime library. Basically,
>>> the SIGSTOP handler executes pause() until the SIGCONT signal
>>> is received.
>>>
>>> Any delay in stopping is the time necessary for the signal to
>>> be delivered. It is possible that the section of code that
>>> contains the STOP/CONT handler was paged out and needs to be
>>> paged in before the signal can be delivered.
>>>
>>> You might quicken this up by installing your own handler for
>>> SIGSTOP and SIGCONT....
>>
>> I don't know what RTOSes you've been working with recently, but none of
>> the above is true for Linux. I don't think it ever has been.
>>
>> --
>> Daniel Jacobowitz
>> CodeSourcery, LLC
>>
>
> Grab a copy of your favorite init source. SIGSTOP and SIGCONT are
> signals. They are handled by signal handlers, always have been
> on Unix and Unix clones like Linux.
>

Sorry. I thought he was talking about SIGTSTP and SIGCONT, the
X-ON X-OFF signals. I thought he was sending a SIGTSTP signal to
a task, timing it, then continuing with SIGCONT. He said that
it didn't operate fast enough.

Cheers,
Dick Johnson
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Richard B. Johnson @ 2005-05-05 12:24 UTC
To: Daniel Jacobowitz; +Cc: Olivier Croquette, LKML

I don't think the kernel handler gets a chance to do anything
because SYS-V init installs its own handler(s). There are comments
about Linux misbehavior in the code. It turns out that I was
right about SIGSTOP and SIGCONT...

Source-code header.....
Current init version is 2.85 but I can't find the source. This
is 2.62

/*
 * Init         A System-V Init Clone.
 *
 * Usage:       /sbin/init
 *                   init [0123456SsQqAaBbCc]
 *                telinit [0123456SsQqAaBbCc]
 *
 * Version:     @(#)init.c  2.62  29-May-1996  MvS
 *
 *              This file is part of the sysvinit suite,

[SNIPPED...]

/*
 *      Linux ignores all signals sent to init when the
 *      SIG_DFL handler is installed. Therefore we must catch SIGTSTP
 *      and SIGCONT, or else they won't work....
 *
 *      The SIGCONT handler
 */
void cont_handler()
{
  got_cont = 1;
}

/*
 *      The SIGSTOP & SIGTSTP handler
 */
void stop_handler()
{
  got_cont = 0;
  while(!got_cont) pause();
  got_cont = 0;
}

Now, if POSIX threads signals were implemented within the kernel,
without first purging the universe of all copies of the SYS-V
init that was distributed with early copies of RedHat and others
(don't know about current copies, a very long search failed to
find the source), then whatever you do in the kernel is wasted.

On Wed, 4 May 2005, Richard B. Johnson wrote:

> On Wed, 4 May 2005, Daniel Jacobowitz wrote:
>
>> On Wed, May 04, 2005 at 02:16:24PM -0400, Richard B. Johnson wrote:
>>> The kernel doesn't do SIGSTOP or SIGCONT. Within init, there is
>>> a SIGSTOP and SIGCONT handler. These can be inherited by others
>>> unless changed, perhaps by a 'C' runtime library. Basically,
>>> the SIGSTOP handler executes pause() until the SIGCONT signal
>>> is received.
>>>
>>> Any delay in stopping is the time necessary for the signal to
>>> be delivered. It is possible that the section of code that
>>> contains the STOP/CONT handler was paged out and needs to be
>>> paged in before the signal can be delivered.
>>>
>>> You might quicken this up by installing your own handler for
>>> SIGSTOP and SIGCONT....
>>
>> I don't know what RTOSes you've been working with recently, but none of
>> the above is true for Linux. I don't think it ever has been.
>>
>> --
>> Daniel Jacobowitz
>> CodeSourcery, LLC
>>
>
> Grab a copy of your favorite init source. SIGSTOP and SIGCONT are
> signals. They are handled by signal handlers, always have been
> on Unix and Unix clones like Linux.
>
> Cheers,
> Dick Johnson
>

Cheers,
Dick Johnson
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Denis Vlasenko @ 2005-05-05 13:14 UTC
To: linux-os, Daniel Jacobowitz; +Cc: Olivier Croquette, LKML

On Thursday 05 May 2005 15:24, Richard B. Johnson wrote:
>
> I don't think the kernel handler gets a chance to do anything
> because SYS-V init installs its own handler(s). There are comments
> about Linux misbehavior in the code. It turns out that I was
> right about SIGSTOP and SIGCONT...

No you are not.
--
vda
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Andreas Schwab @ 2005-05-05 13:30 UTC
To: linux-os; +Cc: Daniel Jacobowitz, Olivier Croquette, LKML

"Richard B. Johnson" <linux-os@analogic.com> writes:

> I don't think the kernel handler gets a chance to do anything
> because SYS-V init installs its own handler(s).

It's impossible to install a handler for SIGSTOP.

> There are comments about Linux misbehavior in the code. It turns out
> that I was right about SIGSTOP and SIGCONT...

No, you are wrong. SIGTSTP != SIGSTOP.

Andreas.

--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
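The point that no handler can be installed for SIGSTOP (while SIGTSTP and SIGCONT can be caught) is easy to check from user space. The following is only a minimal sketch: `noop_handler` and `can_install_handler` are names made up for this illustration, and it relies on POSIX specifying that sigaction() fails with EINVAL for signals that cannot be caught.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Illustrative no-op handler; the name is hypothetical. */
static void noop_handler(int sig)
{
    (void) sig;
}

/*
 * Try to install noop_handler for signo, then restore the previous
 * disposition. Returns 1 if the handler was accepted, 0 if sigaction()
 * refused it (errno is left as sigaction() set it).
 */
static int can_install_handler(int signo)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);

    if (sigaction(signo, &sa, &old) == -1)
        return 0;

    sigaction(signo, &old, NULL);   /* put things back as they were */
    return 1;
}
```

Calling `can_install_handler(SIGTSTP)` or `can_install_handler(SIGCONT)` should succeed, whereas `can_install_handler(SIGSTOP)` and `can_install_handler(SIGKILL)` should be refused with errno set to EINVAL, which is why a user-space "SIGSTOP handler" cannot be what stops a process.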
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Miquel van Smoorenburg @ 2005-05-05 22:04 UTC
To: linux-kernel

In article <Pine.LNX.4.61.0505050814340.24130@chaos.analogic.com>,
Richard B. Johnson <linux-os@analogic.com> wrote:
>
>I don't think the kernel handler gets a chance to do anything
>because SYS-V init installs its own handler(s). There are comments
>about Linux misbehavior in the code. It turns out that I was
>right about SIGSTOP and SIGCONT...

No, you're confused.

Sysvinit catches SIGTSTP and SIGCONT (not SIGSTOP) because pid #1
is special - unlike all other processes, SIG_DFL for pid #1 is
equal to SIG_IGN.

And remember - signal handlers are not inherited (how could they be..)
so there is no such thing as "init installing a signal handler
for all processes".

Right now you should go out and buy a copy of the Stevens book,
"Advanced programming in the Unix environment", and study it.

Mike.
* Problem while stopping many threads within a module
From: Yuly Finkelberg @ 2005-05-06 23:15 UTC
To: linux-kernel

Hello -

I'm having a strange thread scheduling issue in a project that I'm
working on. We have a module, with an interface that can be called by
many (currently 50) threads simultaneously. Threads that have entered
the kernel sleep on a wait queue until everyone else has entered. At
this point, a "master" process wakes up the first thread, which does
some work, then wakes up the second, etc. After waking up its
successor, each thread changes its state to STOPPED and sends itself a
SIGSTOP. Note that the threads are created with
CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND but NOT CLONE_THREAD so
there is no group stop.

Basically, the structure is the following:

kernel_entry_point()
{
    wait until it's your turn
    ......
    do some work .... (serialized)
    wake up the next thread
    send SIGSTOP to yourself
}

At the same time, a monitoring process polls until all the threads
have stopped themselves:

monitor()
{
repeat:
    for each thread
        if (thread->state < TASK_STOPPED)
            yield()
            goto repeat
}

Now, here's the problem. On 2.6.9 UP (Preempt), it is often the case
that one thread gets "stuck" in between the wake up of the next thread
and stopping itself -- this causes the monitor to poll for extended
periods of time, since the thread remains RUNNING. Strangely enough,
it generally gets unstuck by itself, sometimes within 10 seconds,
sometimes after as long as 10 minutes.

When peeking at the kernel stack of the offending process via the
monitor, I only see that it is in schedule and the stack looks like
this:

c55e7ad0 00000086 c55e6000 c55e7a94 00000046 c55e6000 c55e7ad0 c0109c2d
00000000 c03ddae0 00000001 fd0b6c12 0013bc9f c6502130 001770fe fd478e5c
0013bc9f c55d546c c05d3960 00002710 c05d3960 c55e6000 c0106f25 c05d3960
Call Trace:
 [<c0106f25>] need_resched+0x27/0x32

It also continues to be charged ticks, indicating that it's being
scheduled but is making no progress? However, I can't find anything
that this thread could be spinning on. Also, I don't understand why
there is no further context on the stack -- the thread does eventually
finish and never leaves the kernel, so the stack shouldn't be
corrupted... How can it finish if it has nowhere to return?

I realize that this is a long shot, but if anyone has any ideas, I'd
appreciate hearing them. Please let me know if I can provide any
further information.

Thanks,
-Yuly
* Re: Problem while stopping many threads within a module
From: shikha @ 2006-04-20 8:43 UTC
To: linux-kernel

Yuly Finkelberg <liquidicecube <at> gmail.com> writes:

>
> Hello -
>
> I'm having a strange thread scheduling issue in a project that I'm
> information.
>
> Thanks,
> -Yuly
>

Is there any patch for this problem?
We are facing the same problem with Java threads on Linux.

thanks
shikha
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Olivier Croquette @ 2005-05-10 20:59 UTC
To: linux-kernel; +Cc: roland, alexn, mingo

Hi all

I worked on my problem in the last days, and I came to these 2 main
questions:

- Can a SIGSTOP be in a pending state in Linux?
- If kill(pid, SIGSTOP) returns, does that mean that the corresponding
process is completely suspended?

I thought until now that SIGSTOP was so special that it could never be
pending, and that as soon as:
    kill(pid, SIGSTOP)
returned, then it was assured that the corresponding process (and all
its threads) were suspended.

This would make sense in my opinion, but apparently it is not always the
case, and the POSIX standard does not say anything about that.

Any hint?

I also did some experiments, with one program which fork()s into:
- a child which potentially starts threads and does some stuff
- a parent which regularly sends SIGSTOP to the child, checks if the
activity really stopped, and then sends SIGCONT again

You will find the source code below.

I tried that with different scheduling policies (SCHED_OTHER and
SCHED_RR) and different numbers of threads:
- 0: no thread started (i.e. single-threaded child)
- 1: 1 thread started, and the main task just pthread_join()s it
- 2: 2 threads started, and the main task pthread_join()s them

I came to the following results:

          Policy
Threads   OTHER   RR
0         OK      OK
1         FAIL    OK
2         FAIL    FAIL(1)

- the answers to my 2 questions (see above) seem to be No and Yes
respectively when no thread is started
- (1) For RR with 2 threads, there are 2 observed behaviours, apparently
happening randomly:
  * either the parent's kill() always stops all threads instantaneously
    (like when no thread is started), and that for a long time
  * or right at the beginning, we can observe that the parent cannot
    do that

I find this behaviour really strange. Any idea?

Can one rely on the fact that the SIGSTOP operates instantaneously for
non-threaded applications?
Would it be possible to provide that for all applications?


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>
#include <signal.h>     /* kill(), SIGSTOP, SIGCONT, SIGKILL */
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <pthread.h>

/* Set the scheduling policy and priority of a process (0 = ourselves) */
int set_process_sched(pid_t pid, int policy, int priority)
{
    struct sched_param p;

    p.sched_priority = priority;
    /* "1 ||" forces the call even if the policy is already set */
    if ( 1 || policy != sched_getscheduler(pid) ) {
        if ( sched_setscheduler(pid,policy,&p) ) {
            perror("sched_setscheduler()");
            return 1;
        }
    }
    return 0;
}

/* Current wall-clock time in microseconds (0 on error) */
unsigned long long gettime(void )
{
    struct timeval tv;

    if ( gettimeofday(&tv, NULL) ) {
        perror("gettimeofday()");
        return 0;
    }
    return (tv.tv_usec + tv.tv_sec * 1000000LL);
}

typedef struct {
    int thread_nb;        /* id defined by us */
    pthread_t thread_id;  /* system id of the thread */
} thread_data;

int cont_main_loop = 1;

void sigterm_handler(int dummy)
{
    printf("sigterm_handler\n");
    return;
}

/* We use a shared memory to communicate between the parent and the child
   They all only work in the first few bytes */
int shmid;
/* volatile: written by the child, read by the parent */
volatile unsigned long long int *shared_array;
#define SHM_SIZE 1024

static inline void conf_shmem(void )
{
    void *addr;

    shmid = shmget(IPC_PRIVATE, SHM_SIZE, 0666 | IPC_CREAT);
    if (shmid == -1) {
        perror("shmget()");
        exit(1);
    }
    addr = shmat(shmid, 0, 0);
    if ( addr == (void *) -1 ) {  /* shmat() reports failure as (void *) -1 */
        perror("shmat()");
        exit(1);
    }
    shared_array = addr;
}

void loop(int marker)
{
    unsigned long long int begin = gettime();

    /* run for 2 minutes at max (useful in case we
       end up with a busy loop in SCHED_RR... */
    while ( gettime() - begin < 120000000LL ) {
        /* write in the shared memory */
        shared_array[0] = marker;
    }
}

void *go_thread(void *dummy)
{
    thread_data *data = (thread_data *) dummy;

    loop(data->thread_nb);
    fprintf(stderr,"%llu\tQuitting!\n",gettime());
    return NULL;
}

#define MAX_THREADS 100

int main(int argc, char **argv)
{
    int pid;
    int test_failed = 0;
    unsigned long long exec_begin = gettime();
    int nb_threads = 0;

    conf_shmem();
    shared_array[0] = 0;

    if ( argc > 1 )
        nb_threads = atoi(argv[1]);
    if ( nb_threads > MAX_THREADS )
        nb_threads = MAX_THREADS;

    pid = fork();
    switch ( pid ) {
    case 0: /* child */
    {
        int thread;
        thread_data threads[MAX_THREADS];

        if ( nb_threads == 0 ) {
            /* no multi threading */
            loop(1);
            exit(0);
        }

        /* start the threads */
        for ( thread = 0 ; thread < nb_threads ; thread ++) {
            threads[thread].thread_nb = thread + 1;
            if ( pthread_create ( & threads[thread].thread_id,
                                  NULL,
                                  go_thread,
                                  (void *)&threads[thread]) )
                perror("pthread_create");
        }

        /* wait for all of them */
        for ( thread = 0 ; thread < nb_threads ; thread ++) {
            pthread_join ( threads[thread].thread_id, NULL);
        }

        exit(0);
    }

    default: /* parent */
    {
        unsigned long long begin = gettime();

        /* depending whether we set the priorities or not, we get
           different results. */
        set_process_sched(0,   SCHED_RR, 65);
        set_process_sched(pid, SCHED_RR, 60);

        /* run for 10s */
        while ( gettime() - begin < 10000000 ) {

            /* let the child run a little bit */
            usleep(1000);

            /* stop it */
            kill(pid, SIGSTOP);

            /* Reset our flag */
            shared_array[0] = 0;

            /* Wait to see if someone dare overwriting our nice zero */
            usleep(1000);

            if ( shared_array[0] > 0 ) {
                test_failed = shared_array[0];
                break;
            }

            kill(pid, SIGCONT);
        }

        kill(pid, SIGKILL);
        break;
    }

    case -1:
        perror("fork()");
        exit(1);
    }

    system("uname -a");
    printf("%d thread(s)\n",nb_threads);
    if ( ! test_failed )
        printf("test passed");
    else
        printf("test FAILED (%d)",test_failed);
    printf(" after %f s\n\n", ( gettime() - exec_begin) / 1000000.0 );

    return 0;
}
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Roland McGrath @ 2005-05-10 21:12 UTC
To: Olivier Croquette; +Cc: linux-kernel, alexn, mingo

> - Can a SIGSTOP be in a pending state in Linux?

For short periods.

> - If kill(pid, SIGSTOP) returns, does that mean that the corresponding
> process is completely suspended?

No. One or more threads of the process may still be running on another CPU
momentarily before they process the interrupt and stop for the signal.
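Since kill() returning does not mean the target has stopped, a parent that needs to know when the stop has actually taken effect can ask the kernel through waitpid() with WUNTRACED instead of sleeping and hoping. The following is a minimal sketch under stated assumptions: the helper names `stop_child_and_wait` and `spawn_idle_child` are invented for this illustration, it only works for the caller's own children, and whether the report is issued only after every thread of a multi-threaded child has stopped is an assumption about the kernel's group-stop behaviour that should be verified on the kernel in question.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Send SIGSTOP to a child and block until the kernel reports the stop.
 * Returns 0 once the child is reported stopped, -1 on error or if the
 * child terminated instead.
 */
static int stop_child_and_wait(pid_t child)
{
    int status;

    if (kill(child, SIGSTOP) == -1)
        return -1;

    /* kill() has returned, but the child may not have stopped yet. */
    for (;;) {
        pid_t r = waitpid(child, &status, WUNTRACED);
        if (r == child)
            break;
        if (r == -1 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        return -1;
    }
    return WIFSTOPPED(status) ? 0 : -1;
}

/* Helper for trying the sketch out: fork a child that only sleeps. */
static pid_t spawn_idle_child(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        for (;;)
            pause();
    }
    return pid;     /* child pid in the parent, -1 on failure */
}
```

In the test program from the previous message, the `usleep(1000)` after `kill(pid, SIGSTOP)` could be replaced by a wait of this kind, so that the check of the shared memory word only starts after the stop has been reported.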
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Olivier Croquette @ 2005-05-11 18:58 UTC
To: Roland McGrath; +Cc: linux-kernel, mingo

Hello Roland

Thanks for your reply.

>>- Can a SIGSTOP be in a pending state in Linux?
>
> For short periods.
>
>>- If kill(pid, SIGSTOP) returns, does that mean that the corresponding
>>process is completely suspended?
>
> No. One or more threads of the process may still be running on another CPU
> momentarily before they process the interrupt and stop for the signal.

I sometimes get a 150ms delay between the end of kill() and the
suspension of the last of the 3 threads, on a single-CPU system
(Pentium 4).

It seems understandable to me to have a delay of <=1ms, especially on
SMP systems, but I really can't understand:
- such big delays (like the 150ms)
- why only multi-threaded applications cause problems
- why the scheduling policy of the programs has an impact on the results
- why for some executions, the SIGSTOP effect is instantaneous 100s of
times in a row, until the end of the test, and the next execution shows
delays right from the beginning

I don't have much experience hacking the kernel, and these behaviours
are quite difficult for me to monitor or trace. I am beginning to run
out of ideas to test further :(

Could it be that my observations uncover a problem?
Or are they a consequence of the Linux implementation?
Or do I have a problem in my test bench?

Can anyone reproduce and/or validate these observations?

Any hint would be appreciated!
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Alex Riesen @ 2005-05-10 23:05 UTC
To: Olivier Croquette; +Cc: linux-kernel, roland, alexn, mingo

This: http://www.opengroup.org/onlinepubs/009695399/toc.htm
and probably all other issues of Open Group is very interesting reading.
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Andy Isaacson @ 2005-05-05 1:04 UTC
To: Richard B. Johnson; +Cc: Olivier Croquette, LKML

On Wed, May 04, 2005 at 02:16:24PM -0400, Richard B. Johnson wrote:
> On Wed, 4 May 2005, Olivier Croquette wrote:
> >On a 2.6.11 x86 system, I am SIGSTOP'ing processes which have started
> >several threads before.
>
> The kernel doesn't do SIGSTOP or SIGCONT.

Dear Wrongbot,

No.

-andy
* Re: Scheduler: SIGSTOP on multi threaded processes
From: Alexander Nyberg @ 2005-05-04 19:10 UTC
To: Olivier Croquette; +Cc: LKML

> On a 2.6.11 x86 system, I am SIGSTOP'ing processes which have started
> several threads before.
>
> As expected, all threads are suspended.
>
> But surprisingly, it can happen that some threads are still scheduled
> after the SIGSTOP has been issued.
>
> Typically, they get scheduled 2 times within the next 5ms, before being
> really stopped.
>
> Sadly, I could not reproduce that in a smaller example yet.
>
> As this behaviour is IMO against the SIGSTOP concept, I tried to analyze
> the kernel code responsible for that. I could not really find the exact
> lines.
>
> So here are my questions:
>
> 1. do you know any reason for which the SIGSTOP would not stop
> immediately all threads of a process?

The following scenario is possible:

program1 with a thread thread1

1) you send SIGSTOP to program1
2) thread1 is now scheduled and run.
3) program1 is now run and before it is scheduled off it notices it has a
   signal set, makes sure all threads in the group get SIGSTOP set.
4) thread1 is now scheduled and run again. Now before it is scheduled off
   it will find a signal pending and set itself in SIGSTOP.

There are absolutely no guarantees when a signal will be delivered.
Signals are delivered asynchronously.

> 2. where do the threads get suspended exactly in the kernel? I think it
> is in signal.c but I am not sure exactly where.

do_notify_resume()
  do_signal()
    get_signal_to_deliver()
      do_signal_stop()
        finish_stop()

> 3. can you confirm that the bug MUST be in my code? :)

You'll have to use reliable mechanisms to achieve what you're looking for.
Thread overview: 19+ messages (end of thread, newest ~2006-04-20 9:05 UTC)

2005-05-04 17:37 Scheduler: SIGSTOP on multi threaded processes Olivier Croquette
2005-05-04 18:16 ` Richard B. Johnson
2005-05-04 19:16   ` Daniel Jacobowitz
2005-05-04 21:06     ` Alex Riesen
2005-05-05  0:42       ` Richard B. Johnson
2005-05-05  0:33     ` Richard B. Johnson
2005-05-05  0:45       ` Richard B. Johnson
2005-05-05 12:24       ` Richard B. Johnson
2005-05-05 13:14         ` Denis Vlasenko
2005-05-05 13:30         ` Andreas Schwab
2005-05-05 22:04         ` Miquel van Smoorenburg
2005-05-06 23:15           ` Problem while stopping many threads within a module Yuly Finkelberg
2006-04-20  8:43             ` shikha
2005-05-10 20:59         ` Scheduler: SIGSTOP on multi threaded processes Olivier Croquette
2005-05-10 21:12           ` Roland McGrath
2005-05-11 18:58             ` Olivier Croquette
2005-05-10 23:05           ` Alex Riesen
2005-05-05  1:04   ` Andy Isaacson
2005-05-04 19:10 ` Alexander Nyberg