* Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time @ 2002-04-04 11:59 Adam J. Richter 2002-04-04 12:56 ` Stelian Pop 2002-04-04 16:23 ` David C. Hansen 0 siblings, 2 replies; 32+ messages in thread From: Adam J. Richter @ 2002-04-04 11:59 UTC (permalink / raw) To: torvalds, haveblue, linux-kernel [-- Attachment #1: Type: text/plain, Size: 786 bytes --] When I attempted to boot linux-2.5.8-pre1, I got a kernel BUG() for exit.c line 519. The was a small change to to kernel/exit.c in 2.5.8-pre1 which deleted a kernel_lock() call. Restoring that line resulted in a kernel that booted fine. I am sending this email from the machine running that kernel (so I guess a matching release of the kernel lock is already in the code). Here is the patch. Of course, it would be helpful if someone who actually understands this could take a look at it. -- Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite 104 adam@yggdrasil.com \ / San Jose, California 95129-1034 +1 408 261-6630 | g g d r a s i l United States of America fax +1 408 261-6631 "Free Software For The Rest Of Us." [-- Attachment #2: exit.diff --] [-- Type: text/plain, Size: 263 bytes --] --- linux-2.5.8-pre1/kernel/exit.c 2002-04-03 23:38:32.000000000 -0800 +++ linux/kernel/exit.c 2002-04-04 03:52:18.000000000 -0800 @@ -499,6 +499,7 @@ acct_process(code); __exit_mm(tsk); + lock_kernel(); sem_exit(); __exit_files(tsk); __exit_fs(tsk); ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 11:59 Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Adam J. Richter @ 2002-04-04 12:56 ` Stelian Pop 2002-04-04 13:40 ` Alessandro Suardi 2002-04-04 16:23 ` David C. Hansen 1 sibling, 1 reply; 32+ messages in thread From: Stelian Pop @ 2002-04-04 12:56 UTC (permalink / raw) To: Adam J. Richter; +Cc: Linux Kernel Mailing List On Thu, Apr 04, 2002 at 03:59:10AM -0800, Adam J. Richter wrote: > When I attempted to boot linux-2.5.8-pre1, I got a kernel > BUG() for exit.c line 519. The was a small change to to kernel/exit.c > in 2.5.8-pre1 which deleted a kernel_lock() call. Restoring that line > resulted in a kernel that booted fine. I am sending this email from > the machine running that kernel (so I guess a matching release of > the kernel lock is already in the code). It should be added that the bug is hit only if CONFIG_PREEMPT is on. Stelian. -- Stelian Pop <stelian.pop@fr.alcove.com> Alcove - http://www.alcove.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 12:56 ` Stelian Pop @ 2002-04-04 13:40 ` Alessandro Suardi 0 siblings, 0 replies; 32+ messages in thread From: Alessandro Suardi @ 2002-04-04 13:40 UTC (permalink / raw) To: Stelian Pop; +Cc: Adam J. Richter, Linux Kernel Mailing List Stelian Pop wrote: > On Thu, Apr 04, 2002 at 03:59:10AM -0800, Adam J. Richter wrote: > > >> When I attempted to boot linux-2.5.8-pre1, I got a kernel >>BUG() for exit.c line 519. The was a small change to to kernel/exit.c >>in 2.5.8-pre1 which deleted a kernel_lock() call. Restoring that line >>resulted in a kernel that booted fine. I am sending this email from >>the machine running that kernel (so I guess a matching release of >>the kernel lock is already in the code). > > > It should be added that the bug is hit only if CONFIG_PREEMPT is on. Just to say that * I hit the bug * I have CONFIG_PREEMPT * Adam's fix "works for me (TM)" --alessandro "time is never time at all / you can never ever leave without leaving a piece of youth" (Smashing Pumpkins, "Tonight, tonight") ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 11:59 Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Adam J. Richter 2002-04-04 12:56 ` Stelian Pop @ 2002-04-04 16:23 ` David C. Hansen 2002-04-04 18:28 ` Dave Hansen 1 sibling, 1 reply; 32+ messages in thread From: David C. Hansen @ 2002-04-04 16:23 UTC (permalink / raw) To: Adam J. Richter; +Cc: torvalds, linux-kernel Adam J. Richter wrote: > When I attempted to boot linux-2.5.8-pre1, I got a kernel >BUG() for exit.c line 519. > That bug is hit when the schedule() returns. In the do_exit() case, schedule is not supposed to return. After the current task is scheduled out, it is destroyed. I guess that most of the freeing of the task's resources is done in do_exit(), but I don't see where its kernel stack is freed. > The was a small change to to kernel/exit.c >in 2.5.8-pre1 which deleted a kernel_lock() call. Restoring that line >resulted in a kernel that booted fine. > I take it you don't have a copy of the BUG(). I was going to ask if preemption was enabled, but I see that it was from another message. I was guessing that preemption contributed to this, but now I know. The lock_kernel() has 2 different effects here. It locks the kernel_flag, AND it disables preemption. The correct fix here will probably be to disable preemption, rather than readd the lock_kernel(). For the preemption gurus: Is a preempt_disable() in do_exit() going to hurt anything? Shold we selectively skip the preempt_disable() in schedule() if it was schedule() called from do_exit()? -- Dave Hansen haveblue@us.ibm.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 16:23 ` David C. Hansen @ 2002-04-04 18:28 ` Dave Hansen 2002-04-04 18:51 ` Robert Love 2002-04-04 19:13 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Linus Torvalds 0 siblings, 2 replies; 32+ messages in thread From: Dave Hansen @ 2002-04-04 18:28 UTC (permalink / raw) To: torvalds; +Cc: Adam J. Richter, linux-kernel, Robert Love [-- Attachment #1: Type: text/plain, Size: 1142 bytes --] David C. Hansen wrote: > Adam J. Richter wrote: >> The was a small change to to kernel/exit.c >> in 2.5.8-pre1 which deleted a kernel_lock() call. Restoring that line >> resulted in a kernel that booted fine. >> > I take it you don't have a copy of the BUG(). I was going to ask if > preemption was enabled, but I see that it was from another message. I > was guessing that preemption contributed to this, but now I know. The > lock_kernel() has 2 different effects here. It locks the kernel_flag, > AND it disables preemption. The correct fix here will probably be to > disable preemption, rather than readd the lock_kernel(). I've replicated the problem too. I've diabled preemption in the area where it used to be disabled because of the old lock_kernel(). I'm sending this message from a machine with that patch applied, so the patch does fix it. As the comment says, this is something that the preempt experts need to take a look at. Linus, this is a hack, and there is probably still a window where preemption can happen. But, it is a band-aid until we find the real problem. -- Dave Hansen haveblue@us.ibm.com [-- Attachment #2: do_exit-add_preempt_disable.2.5.8-pre1.patch --] [-- Type: text/plain, Size: 633 bytes --] --- linux-2.5.8-pre1-clean/kernel/exit.c Thu Apr 4 08:58:31 2002 +++ linux/kernel/exit.c Thu Apr 4 10:19:37 2002 @@ -499,6 +499,11 @@ acct_process(code); __exit_mm(tsk); + /* I removed the lock_kernel() from here. It caused preempt kernels + to oops. This fixes it for now, but the real cause needs to + be found. + - Dave Hansen <haveblue@us.ibm.com> 04-04-2002 */ + preempt_disable(); sem_exit(); __exit_files(tsk); __exit_fs(tsk); @@ -515,6 +520,7 @@ tsk->exit_code = code; exit_notify(); + preempt_enable_no_resched(); /* partner of above preempt_disable(); */ schedule(); BUG(); /* ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 18:28 ` Dave Hansen @ 2002-04-04 18:51 ` Robert Love 2002-04-04 19:14 ` Linus Torvalds 2002-04-04 19:13 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Linus Torvalds 1 sibling, 1 reply; 32+ messages in thread From: Robert Love @ 2002-04-04 18:51 UTC (permalink / raw) To: Dave Hansen; +Cc: torvalds, Adam J. Richter, linux-kernel On Thu, 2002-04-04 at 13:28, Dave Hansen wrote: > I've replicated the problem too. > I've diabled preemption in the area where it used to be disabled because > of the old lock_kernel(). I'm sending this message from a machine with > that patch applied, so the patch does fix it. As the comment says, this > is something that the preempt experts need to take a look at. > Linus, this is a hack, and there is probably still a window where > preemption can happen. But, it is a band-aid until we find the real > problem. Thanks for the CC. I've been looking into this problem. I am not too sure why we require protection from concurrency via preemption and not via SMP. In other words, why are we SMP-safe but not preempt-safe here. I don't really have an answer. The problem I saw on boot was the BUG being triggered on the last line. If we are able to preempt here and cause some problem, what proof is there that we don't have a race wrt SMP? Robert Love ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 18:51 ` Robert Love @ 2002-04-04 19:14 ` Linus Torvalds 2002-04-04 19:26 ` Robert Love 0 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2002-04-04 19:14 UTC (permalink / raw) To: Robert Love; +Cc: Dave Hansen, Adam J. Richter, linux-kernel On 4 Apr 2002, Robert Love wrote: > > Thanks for the CC. I've been looking into this problem. I am not too > sure why we require protection from concurrency via preemption and not > via SMP. In other words, why are we SMP-safe but not preempt-safe here. > > I don't really have an answer. The answer is that preempt_schedule() illegally sets current->state = TASK_RUNNING; without asking the process whether that's ok. The SMP code never does anything like that. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 19:14 ` Linus Torvalds @ 2002-04-04 19:26 ` Robert Love 2002-04-04 19:41 ` Dave Hansen ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Robert Love @ 2002-04-04 19:26 UTC (permalink / raw) To: Linus Torvalds; +Cc: Dave Hansen, Adam J. Richter, linux-kernel On Thu, 2002-04-04 at 14:14, Linus Torvalds wrote: > The answer is that preempt_schedule() illegally sets > > current->state = TASK_RUNNING; > > without asking the process whether that's ok. The SMP code never does > anything like that. Well Ingo added that ;) We used to just set a flag in the preempt_count that marked the task as preempted and made sure on its next trip into schedule it ran again. Do you think it is better to deny preemption if state==TASK_ZOMBIE (note this requires code in preempt_schedule and the interrupt return path, since Ingo decoupled the two) or just disable preemption around critical regions caused by setting state to TASK_ZOMBIE ? I suspect this is the first occurrence of a problem of this kind ... and the attached patch handles it. Robert Love diff -urN linux-2.5.8-pre1/kernel/exit.c linux/kernel/exit.c --- linux-2.5.8-pre1/kernel/exit.c Wed Apr 3 20:57:37 2002 +++ linux/kernel/exit.c Thu Apr 4 14:22:30 2002 @@ -499,6 +499,8 @@ acct_process(code); __exit_mm(tsk); + preempt_disable(); + sem_exit(); __exit_files(tsk); __exit_fs(tsk); @@ -515,6 +517,7 @@ tsk->exit_code = code; exit_notify(); + preempt_enable_no_resched(); schedule(); BUG(); /* ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 19:26 ` Robert Love @ 2002-04-04 19:41 ` Dave Hansen 2002-04-04 20:02 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time george anzinger 2002-04-04 20:54 ` Andrew Morton 2002-04-04 21:34 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Roger Larsson 2 siblings, 1 reply; 32+ messages in thread From: Dave Hansen @ 2002-04-04 19:41 UTC (permalink / raw) To: Robert Love; +Cc: Linus Torvalds, Adam J. Richter, linux-kernel Robert Love wrote: > tsk->exit_code = code; > exit_notify(); > + preempt_enable_no_resched(); * PREEMPT HERE * > schedule(); > BUG(); Isn't there still a race here? A preempt CAN happen after you reenable, right? Admittedly, its a small window, but it _is_ a window. -- Dave Hansen haveblue@us.ibm.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time 2002-04-04 19:41 ` Dave Hansen @ 2002-04-04 20:02 ` george anzinger 0 siblings, 0 replies; 32+ messages in thread From: george anzinger @ 2002-04-04 20:02 UTC (permalink / raw) To: Dave Hansen; +Cc: Robert Love, Linus Torvalds, Adam J. Richter, linux-kernel Dave Hansen wrote: > > Robert Love wrote: > > tsk->exit_code = code; > > exit_notify(); > > + preempt_enable_no_resched(); > * PREEMPT HERE * > > schedule(); > > BUG(); > > Isn't there still a race here? A preempt CAN happen after you reenable, > right? Admittedly, its a small window, but it _is_ a window. Right, eliminate this line. Since the task is going away it doesn't matter. Actually it doesn't matter even if schedule() returns (i.e. for other than ZOMBIE) as long as preempt is enabled after the schedule() call, somewhere. It is designed to work this way for a reason, this being one of them. -g > > -- > Dave Hansen > haveblue@us.ibm.com > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Real time sched: http://sourceforge.net/projects/rtsched/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time 2002-04-04 19:26 ` Robert Love 2002-04-04 19:41 ` Dave Hansen @ 2002-04-04 20:54 ` Andrew Morton 2002-04-04 21:34 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Roger Larsson 2 siblings, 0 replies; 32+ messages in thread From: Andrew Morton @ 2002-04-04 20:54 UTC (permalink / raw) To: Robert Love; +Cc: Linus Torvalds, Dave Hansen, Adam J. Richter, linux-kernel Robert Love wrote: > > ... > Do you think it is better to deny preemption if state==TASK_ZOMBIE (note > this requires code in preempt_schedule and the interrupt return path, > since Ingo decoupled the two) or just disable preemption around critical > regions caused by setting state to TASK_ZOMBIE ? > > I suspect this is the first occurrence of a problem of this kind ... and > the attached patch handles it. > No, the problem goes deeper than this. I have code which does, effectively: sleeper() { spin_lock(&some_lock); set_current_state(TASK_UNINTERRUPTIBLE); some_flag = 0; spin_unlock(&lock); schedule(); if (some_flag == 0) i_am_horribly_confused(); } waker() { spin_lock(&some_lock); some_flag = 1; wake_up_process(sleeper); spin_unlock(&some_lock); } or something like that. See __pdflush() and pdflush_operation() in http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.8-pre1/delalloc/dallocbase-60-pdflush.patch The above code work fine, is nice and I want to keep it that way. But it fails on preempt. The spin_unlock() in sleeper() can sometimes set task->state to TASK_RUNNING(), so my schedule() call just falls straight through. Probably nobody has noticed this in other places because most sleep/wakeup stuff tends to be done inside a loop; the bogus "wakeup" is ignored. Although it can be worked around at the call site, I think this needs fixing. Otherwise we have the rule "spin_unlock will flip you into TASK_RUNNING 0.0001% of the time if CONFIG_PREEMPT=y". ug. I have thought deeply about this, and I then promptly forgot everything I thought about, but I ended up concluding that the sanest way of resolving this is inside __set_current_state(). If the new state is TASK_RUNNING and the old state is not TASK_RUNNING then enable preemption, call schedule() if necessary, etc. It is not acceptable to just say "don't preempt a task which is not in state TASK_RUNNING", because if an interrupt happens against a CPU which is running a task which is in state TASK_INTERRUPTIBLE (say), then that wakeup won't be serviced until the task exits the kernel. - ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 19:26 ` Robert Love 2002-04-04 19:41 ` Dave Hansen 2002-04-04 20:54 ` Andrew Morton @ 2002-04-04 21:34 ` Roger Larsson 2002-04-04 22:38 ` Andrew Morton 2 siblings, 1 reply; 32+ messages in thread From: Roger Larsson @ 2002-04-04 21:34 UTC (permalink / raw) To: Robert Love, Linus Torvalds; +Cc: Dave Hansen, Adam J. Richter, linux-kernel On torsdagen den 4 april 2002 21.26, Robert Love wrote: > On Thu, 2002-04-04 at 14:14, Linus Torvalds wrote: > > The answer is that preempt_schedule() illegally sets > > > > current->state = TASK_RUNNING; > > > > without asking the process whether that's OK. The SMP code never does > > anything like that. > > Well Ingo added that ;) > > We used to just set a flag in the preempt_count that marked the task as > preempted and made sure on its next trip into schedule it ran again. > How about doing: asmlinkage void preempt_schedule(void) { unsigned long saved_state; if (unlikely(preempt_get_count())) return; preempt_disable(); /* or use an atomic operation */ saved_state = current->state; current->state = TASK_RUNNING; preempt_enable_no_resched(); /* we are scheduling anyway... */ schedule(); current->state = saved_state; } It is unlikely to get preemption between schedule() and the setting since schedule it self checks - the window is small. And when it hits if will correctly restore the correct value. Note this code does not need to solve the FLAG problem. current->state |= FLAG * PREEMPT * current->state |= FLAG schedule() current->state &= ~FLAG schedule() with flag disabled /RogerL -- Roger Larsson Skellefteå Sweden ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 21:34 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Roger Larsson @ 2002-04-04 22:38 ` Andrew Morton 2002-04-04 22:42 ` Linus Torvalds 2002-04-04 22:55 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Robert Love 0 siblings, 2 replies; 32+ messages in thread From: Andrew Morton @ 2002-04-04 22:38 UTC (permalink / raw) To: Roger Larsson Cc: Robert Love, Linus Torvalds, Dave Hansen, Adam J. Richter, linux-kernel Roger Larsson wrote: > > ... > How about doing: > > asmlinkage void preempt_schedule(void) > { > unsigned long saved_state; > > if (unlikely(preempt_get_count())) > return; > > preempt_disable(); /* or use an atomic operation */ > saved_state = current->state; > current->state = TASK_RUNNING; > preempt_enable_no_resched(); /* we are scheduling anyway... */ > schedule(); Interrupt occurs, puts this task in state TASK_RUNNING. > current->state = saved_state; whoops. We went back into TASK_UNINTERRUPTIBLE. > } > We could fix this with changes to schedule(), but that's not nice. Another approach would be: preempt_schedule() { current->state2 = current->state; current->state = TASK_RUNNING; schedule(); current->state = current->state2; } and wake_up() would do: tsk->state = TASK_RUNNING; tsk->state2 = TASK_RUNNING; With the appropriate locking, memory barriers and other relevant goo I think this would work... - ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 22:38 ` Andrew Morton @ 2002-04-04 22:42 ` Linus Torvalds 2002-04-04 22:54 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time Andrew Morton 2002-04-04 23:07 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Robert Love 2002-04-04 22:55 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Robert Love 1 sibling, 2 replies; 32+ messages in thread From: Linus Torvalds @ 2002-04-04 22:42 UTC (permalink / raw) To: Andrew Morton Cc: Roger Larsson, Robert Love, Dave Hansen, Adam J. Richter, linux-kernel On Thu, 4 Apr 2002, Andrew Morton wrote: > Another approach would be: > > preempt_schedule() > { > current->state2 = current->state; > current->state = TASK_RUNNING; > schedule(); > current->state = current->state2; > } Yes, but please no. My current tree says asmlinkage void preempt_schedule(void) { if (unlikely(preempt_get_count())) return; if (current->state != TASK_RUNNING) return; schedule(); } and if people start getting latency problems due to loops with state != TASK_RUNNING, then I suspect we might just make "set_current_state()" check that case explicitly and do a conditional reschedule (ie make it the same as if we released a lock). That would be a hell of a lot cleaner, in my opinion. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time 2002-04-04 22:42 ` Linus Torvalds @ 2002-04-04 22:54 ` Andrew Morton 2002-04-04 23:07 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Robert Love 1 sibling, 0 replies; 32+ messages in thread From: Andrew Morton @ 2002-04-04 22:54 UTC (permalink / raw) To: Linus Torvalds Cc: Roger Larsson, Robert Love, Dave Hansen, Adam J. Richter, linux-kernel Linus Torvalds wrote: > > On Thu, 4 Apr 2002, Andrew Morton wrote: > > Another approach would be: > > > > preempt_schedule() > > { > > current->state2 = current->state; > > current->state = TASK_RUNNING; > > schedule(); > > current->state = current->state2; > > } > > Yes, but please no. > > My current tree says > > asmlinkage void preempt_schedule(void) > { > if (unlikely(preempt_get_count())) > return; > if (current->state != TASK_RUNNING) > return; > schedule(); > } > > and if people start getting latency problems due to loops with state != > TASK_RUNNING, then I suspect we might just make "set_current_state()" > check that case explicitly and do a conditional reschedule (ie make it the > same as if we released a lock). That would be a hell of a lot cleaner, in > my opinion. > That would work. And would also fix my "spin_unlock sometimes stomps on TASK_INTERRUPTIBLE" problem. It does mean that we'll need to convert many open-coded current->state = whatever; instances to use [__]set_current_state(). But that's not a bad thing. Janitorial patches for this are already floating about. Robert, do you have time to do the code-and-test thing? - ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 22:42 ` Linus Torvalds 2002-04-04 22:54 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time Andrew Morton @ 2002-04-04 23:07 ` Robert Love 2002-04-04 23:42 ` Linus Torvalds 1 sibling, 1 reply; 32+ messages in thread From: Robert Love @ 2002-04-04 23:07 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Roger Larsson, Dave Hansen, Adam J. Richter, linux-kernel On Thu, 2002-04-04 at 17:42, Linus Torvalds wrote: > My current tree says > > asmlinkage void preempt_schedule(void) > { > if (unlikely(preempt_get_count())) > return; > if (current->state != TASK_RUNNING) > return; > schedule(); > } You need to do the same thing in entry.S ... Ingo decoupled the two entry paths from each other (entry.S does preempt_schedule's work on its own, and then directly calls schedule). So, right now the ret_from_intr path sets the task state to TASK_RUNNING. That needs to be replaced with a jump out if it is NOT task running. Curiously, what is wrong with the way we did it in the original patch? I.e. set a bit in preempt_count and use that to skip to pick_next_task in schedule. This allows us to preempt any task of any state ... Robert Love ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 23:07 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Robert Love @ 2002-04-04 23:42 ` Linus Torvalds 2002-04-04 23:47 ` Robert Love 0 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2002-04-04 23:42 UTC (permalink / raw) To: Robert Love Cc: Andrew Morton, Roger Larsson, Dave Hansen, Adam J. Richter, linux-kernel On 4 Apr 2002, Robert Love wrote: > > Curiously, what is wrong with the way we did it in the original patch? > I.e. set a bit in preempt_count and use that to skip to pick_next_task > in schedule. This allows us to preempt any task of any state ... Because that requires that every user of "set_task_state()" needs to know about preemption. Btw, I think entry.S should just call preempt_schedule() instead, instead of knowing about these details. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 23:42 ` Linus Torvalds @ 2002-04-04 23:47 ` Robert Love 2002-04-04 23:55 ` Linus Torvalds 0 siblings, 1 reply; 32+ messages in thread From: Robert Love @ 2002-04-04 23:47 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Roger Larsson, Dave Hansen, Adam J. Richter, linux-kernel On Thu, 2002-04-04 at 18:42, Linus Torvalds wrote: > Because that requires that every user of "set_task_state()" needs to know > about preemption. Hm, how so? I contend not to rudely set the task state but instead mark the task as "preempted" in preempt_schedule and handle this case in schedule. It requires zero change to anything else; this is the behavior of the original patch I sent you. > Btw, I think entry.S should just call preempt_schedule() instead, instead > of knowing about these details. Agreed. Robert Love ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 23:47 ` Robert Love @ 2002-04-04 23:55 ` Linus Torvalds 2002-04-05 0:03 ` [PATCH] preemptive kernel behavior change: don't be rude Robert Love 0 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2002-04-04 23:55 UTC (permalink / raw) To: Robert Love Cc: Andrew Morton, Roger Larsson, Dave Hansen, Adam J. Richter, linux-kernel On 4 Apr 2002, Robert Love wrote: > > Hm, how so? I contend not to rudely set the task state but instead mark > the task as "preempted" in preempt_schedule and handle this case in > schedule. Ahh, I misunderstood - I thought you meant setting the bit when setting current->state. Fair enough. Send me a patch to look at. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] preemptive kernel behavior change: don't be rude 2002-04-04 23:55 ` Linus Torvalds @ 2002-04-05 0:03 ` Robert Love 2002-04-05 1:51 ` george anzinger 0 siblings, 1 reply; 32+ messages in thread From: Robert Love @ 2002-04-05 0:03 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, george, linux-kernel On Thu, 2002-04-04 at 18:55, Linus Torvalds wrote: > Fair enough. Send me a patch to look at. Linus et all, Here we go: - do not manually set task->state - instead, in preempt_schedule, set a flag in preempt_count that denotes that this task is entering schedule off a kernel preemption. - use this flag in schedule to jump to pick_next_task - in preempt_schedule, upon return from schedule, unset the flag - have entry.S just call preempt_schedule and not duplicate this work, as Linus suggested. I agree. Note this makes debugging easier as we keep a single point of entry for kernel preemptions. The result: we can safely preempt non-TASK_RUNNING tasks. If one is preempted, we can safely survive schedule because we won't handle the special casing of non-TASK_RUNNING at the top of schedule. Thus other tasks can run as desired and our non-TASK_RUNNING task will eventually be rescheduled, in its original state, and complete happily. This is the behavior we have in the 2.4 patches and 2.5 until ~2.5.6-pre. This works. It requires no other changes elsewhere (it actually removes some special-casing Ingo did in the signal code). This patch works in theory and compiles, but received minimal testing. Robert Love diff -urN linux-2.5.8-pre1/arch/i386/kernel/entry.S linux/arch/i386/kernel/entry.S --- linux-2.5.8-pre1/arch/i386/kernel/entry.S Wed Apr 3 20:57:24 2002 +++ linux/arch/i386/kernel/entry.S Thu Apr 4 18:32:03 2002 @@ -240,9 +240,7 @@ jnz restore_all incl TI_PRE_COUNT(%ebx) sti - movl TI_TASK(%ebx), %ecx # ti->task - movl $0,(%ecx) # current->state = TASK_RUNNING - call SYMBOL_NAME(schedule) + call SYMBOL_NAME(preempt_schedule) jmp ret_from_intr #endif diff -urN linux-2.5.8-pre1/arch/i386/kernel/ptrace.c linux/arch/i386/kernel/ptrace.c --- linux-2.5.8-pre1/arch/i386/kernel/ptrace.c Wed Apr 3 20:57:24 2002 +++ linux/arch/i386/kernel/ptrace.c Thu Apr 4 18:31:17 2002 @@ -455,11 +455,9 @@ between a syscall stop and SIGTRAP delivery */ current->exit_code = SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) ? 0x80 : 0); - preempt_disable(); current->state = TASK_STOPPED; notify_parent(current, SIGCHLD); schedule(); - preempt_enable(); /* * this isn't the same as continuing with a signal, but it will do * for normal use. strace only continues with a signal if the diff -urN linux-2.5.8-pre1/arch/i386/kernel/signal.c linux/arch/i386/kernel/signal.c --- linux-2.5.8-pre1/arch/i386/kernel/signal.c Wed Apr 3 20:57:24 2002 +++ linux/arch/i386/kernel/signal.c Thu Apr 4 18:31:00 2002 @@ -610,11 +610,9 @@ if ((current->ptrace & PT_PTRACED) && signr != SIGKILL) { /* Let the debugger run. */ current->exit_code = signr; - preempt_disable(); current->state = TASK_STOPPED; notify_parent(current, SIGCHLD); schedule(); - preempt_enable(); /* We're back. Did the debugger cancel the sig? */ if (!(signr = current->exit_code)) @@ -669,14 +667,12 @@ case SIGSTOP: { struct signal_struct *sig; + current->state = TASK_STOPPED; current->exit_code = signr; sig = current->parent->sig; - preempt_disable(); - current->state = TASK_STOPPED; if (sig && !(sig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP)) notify_parent(current, SIGCHLD); schedule(); - preempt_enable(); continue; } diff -urN linux-2.5.8-pre1/include/linux/sched.h linux/include/linux/sched.h --- linux-2.5.8-pre1/include/linux/sched.h Wed Apr 3 20:57:27 2002 +++ linux/include/linux/sched.h Thu Apr 4 18:29:53 2002 @@ -91,6 +91,7 @@ #define TASK_UNINTERRUPTIBLE 2 #define TASK_ZOMBIE 4 #define TASK_STOPPED 8 +#define PREEMPT_ACTIVE 0x4000000 #define __set_task_state(tsk, state_value) \ do { (tsk)->state = (state_value); } while (0) diff -urN linux-2.5.8-pre1/kernel/sched.c linux/kernel/sched.c --- linux-2.5.8-pre1/kernel/sched.c Wed Apr 3 20:57:37 2002 +++ linux/kernel/sched.c Thu Apr 4 18:29:24 2002 @@ -764,6 +764,13 @@ prev->sleep_timestamp = jiffies; spin_lock_irq(&rq->lock); + /* + * if entering from preempt_schedule, off a kernel preemption, + * go straight to picking the next task. + */ + if (unlikely(preempt_get_count() & PREEMPT_ACTIVE)) + goto pick_next_task; + switch (prev->state) { case TASK_INTERRUPTIBLE: if (unlikely(signal_pending(prev))) { @@ -775,7 +782,7 @@ case TASK_RUNNING: ; } -#if CONFIG_SMP +#if CONFIG_SMP || CONFIG_PREEMPT pick_next_task: #endif if (unlikely(!rq->nr_running)) { @@ -843,8 +850,11 @@ { if (unlikely(preempt_get_count())) return; - current->state = TASK_RUNNING; + + current_thread_info()->preempt_count += PREEMPT_ACTIVE; schedule(); + current_thread_info()->preempt_count -= PREEMPT_ACTIVE; + barrier(); } #endif /* CONFIG_PREEMPT */ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] preemptive kernel behavior change: don't be rude 2002-04-05 0:03 ` [PATCH] preemptive kernel behavior change: don't be rude Robert Love @ 2002-04-05 1:51 ` george anzinger 2002-04-05 2:06 ` Robert Love 0 siblings, 1 reply; 32+ messages in thread From: george anzinger @ 2002-04-05 1:51 UTC (permalink / raw) To: Robert Love, Andrew Morton, george, linux-kernel, Linus Torvalds Robert Love wrote: > > On Thu, 2002-04-04 at 18:55, Linus Torvalds wrote: > > Fair enough. Send me a patch to look at. > > Linus et all, > > Here we go: > > - do not manually set task->state > - instead, in preempt_schedule, set a flag in preempt_count that > denotes that this task is entering schedule off a kernel preemption. > - use this flag in schedule to jump to pick_next_task > - in preempt_schedule, upon return from schedule, unset the flag > - have entry.S just call preempt_schedule and not duplicate this work, > as Linus suggested. I agree. Note this makes debugging easier as > we keep a single point of entry for kernel preemptions. > > The result: we can safely preempt non-TASK_RUNNING tasks. If one is > preempted, we can safely survive schedule because we won't handle the > special casing of non-TASK_RUNNING at the top of schedule. Thus other > tasks can run as desired and our non-TASK_RUNNING task will eventually > be rescheduled, in its original state, and complete happily. > > This is the behavior we have in the 2.4 patches and 2.5 until > ~2.5.6-pre. This works. It requires no other changes elsewhere (it > actually removes some special-casing Ingo did in the signal code). > > This patch works in theory and compiles, but received minimal testing. > > Robert Love > > diff -urN linux-2.5.8-pre1/arch/i386/kernel/entry.S linux/arch/i386/kernel/entry.S > --- linux-2.5.8-pre1/arch/i386/kernel/entry.S Wed Apr 3 20:57:24 2002 > +++ linux/arch/i386/kernel/entry.S Thu Apr 4 18:32:03 2002 > @@ -240,9 +240,7 @@ > jnz restore_all > incl TI_PRE_COUNT(%ebx) > sti > - movl TI_TASK(%ebx), %ecx # ti->task > - movl $0,(%ecx) # current->state = TASK_RUNNING > - call SYMBOL_NAME(schedule) > + call SYMBOL_NAME(preempt_schedule) > jmp ret_from_intr > #endif > > diff -urN linux-2.5.8-pre1/arch/i386/kernel/ptrace.c linux/arch/i386/kernel/ptrace.c > --- linux-2.5.8-pre1/arch/i386/kernel/ptrace.c Wed Apr 3 20:57:24 2002 > +++ linux/arch/i386/kernel/ptrace.c Thu Apr 4 18:31:17 2002 > @@ -455,11 +455,9 @@ > between a syscall stop and SIGTRAP delivery */ > current->exit_code = SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) > ? 0x80 : 0); > - preempt_disable(); > current->state = TASK_STOPPED; > notify_parent(current, SIGCHLD); > schedule(); > - preempt_enable(); > /* > * this isn't the same as continuing with a signal, but it will do > * for normal use. strace only continues with a signal if the > diff -urN linux-2.5.8-pre1/arch/i386/kernel/signal.c linux/arch/i386/kernel/signal.c > --- linux-2.5.8-pre1/arch/i386/kernel/signal.c Wed Apr 3 20:57:24 2002 > +++ linux/arch/i386/kernel/signal.c Thu Apr 4 18:31:00 2002 > @@ -610,11 +610,9 @@ > if ((current->ptrace & PT_PTRACED) && signr != SIGKILL) { > /* Let the debugger run. */ > current->exit_code = signr; > - preempt_disable(); > current->state = TASK_STOPPED; > notify_parent(current, SIGCHLD); > schedule(); > - preempt_enable(); > > /* We're back. Did the debugger cancel the sig? */ > if (!(signr = current->exit_code)) > @@ -669,14 +667,12 @@ > > case SIGSTOP: { > struct signal_struct *sig; > + current->state = TASK_STOPPED; > current->exit_code = signr; > sig = current->parent->sig; > - preempt_disable(); > - current->state = TASK_STOPPED; > if (sig && !(sig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP)) > notify_parent(current, SIGCHLD); > schedule(); > - preempt_enable(); > continue; > } > > diff -urN linux-2.5.8-pre1/include/linux/sched.h linux/include/linux/sched.h > --- linux-2.5.8-pre1/include/linux/sched.h Wed Apr 3 20:57:27 2002 > +++ linux/include/linux/sched.h Thu Apr 4 18:29:53 2002 > @@ -91,6 +91,7 @@ > #define TASK_UNINTERRUPTIBLE 2 > #define TASK_ZOMBIE 4 > #define TASK_STOPPED 8 > +#define PREEMPT_ACTIVE 0x4000000 > > #define __set_task_state(tsk, state_value) \ > do { (tsk)->state = (state_value); } while (0) > diff -urN linux-2.5.8-pre1/kernel/sched.c linux/kernel/sched.c > --- linux-2.5.8-pre1/kernel/sched.c Wed Apr 3 20:57:37 2002 > +++ linux/kernel/sched.c Thu Apr 4 18:29:24 2002 > @@ -764,6 +764,13 @@ > prev->sleep_timestamp = jiffies; > spin_lock_irq(&rq->lock); > > + /* > + * if entering from preempt_schedule, off a kernel preemption, > + * go straight to picking the next task. > + */ > + if (unlikely(preempt_get_count() & PREEMPT_ACTIVE)) > + goto pick_next_task; > + > switch (prev->state) { > case TASK_INTERRUPTIBLE: > if (unlikely(signal_pending(prev))) { > @@ -775,7 +782,7 @@ > case TASK_RUNNING: > ; > } > -#if CONFIG_SMP > +#if CONFIG_SMP || CONFIG_PREEMPT > pick_next_task: > #endif > if (unlikely(!rq->nr_running)) { > @@ -843,8 +850,11 @@ > { > if (unlikely(preempt_get_count())) > return; This line implies that entry.S calls with preempt count of 0. It use to call with 1 or was that WAY back there? If this is the way it is and it works, then the += and -= below can be changed to = and the second reference to PREEMPT_ACTIVE becomes 0. I think I would rather have entry.S set preempt_count to PREEMPT_ACTIVE with the interrupt system off, turn it on, make the call directly to schedule(), and then set to zero on return. I really am concerned with taking an interrupt during the call, i.e. between the interrupt on and the store below. This can lead to stack overflow rather easily. > - current->state = TASK_RUNNING; > + > + current_thread_info()->preempt_count += PREEMPT_ACTIVE; > schedule(); > + current_thread_info()->preempt_count -= PREEMPT_ACTIVE; > + barrier(); > } > #endif /* CONFIG_PREEMPT */ > -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Real time sched: http://sourceforge.net/projects/rtsched/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] preemptive kernel behavior change: don't be rude 2002-04-05 1:51 ` george anzinger @ 2002-04-05 2:06 ` Robert Love 0 siblings, 0 replies; 32+ messages in thread From: Robert Love @ 2002-04-05 2:06 UTC (permalink / raw) To: george anzinger; +Cc: Andrew Morton, linux-kernel, Linus Torvalds On Thu, 2002-04-04 at 20:51, george anzinger wrote: > This line implies that entry.S calls with preempt count of 0. It use to > call with 1 or was that WAY back there? No you are right, this line needs to be pushed back out to preempt_enable OR we need to duplicate preempt_schedule in the entry.S code. For now, I will just push this back into preempt_enable. > If this is the way it is and it works, then the += and -= below can be > changed to = and the second reference to PREEMPT_ACTIVE becomes 0. > > I think I would rather have entry.S set preempt_count to PREEMPT_ACTIVE > with the interrupt system off, turn it on, make the call directly to > schedule(), and then set to zero on return. I really am concerned with > taking an interrupt during the call, i.e. between the interrupt on and > the store below. This can lead to stack overflow rather easily. Agreed. This is probably the best way to do it ... for now I'll do it like above. Robert Love ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 22:38 ` Andrew Morton 2002-04-04 22:42 ` Linus Torvalds @ 2002-04-04 22:55 ` Robert Love 2002-04-04 23:10 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time Andrew Morton 1 sibling, 1 reply; 32+ messages in thread From: Robert Love @ 2002-04-04 22:55 UTC (permalink / raw) To: Andrew Morton Cc: Roger Larsson, Linus Torvalds, Dave Hansen, Adam J. Richter, linux-kernel On Thu, 2002-04-04 at 17:38, Andrew Morton wrote: > With the appropriate locking, memory barriers and other > relevant goo I think this would work... Yah, I guess, but that isn't pretty at all ;) Andrew, remember how we used to do it (and still do it in the 2.4 patch)? Wouldn't that work? Specifically, when we enter preempt_schedule we set a flag value in preempt_count. This flag value is checked at the top of schedule and, if set, we skip the first chunk of code that handles sleeping tasks. The task->state never changes. Upon leaving schedule and returning to preempt_schedule, we unset the flag. This allows us to preempt tasks in any state, without problems or special cases. It also wasn't too much overhead - compared to now, basically just: if (unlikely(current->preempt_count() & PREEMPT_ACTIVE)) goto pick_next_task; at the top of schedule(). Robert Love ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time 2002-04-04 22:55 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Robert Love @ 2002-04-04 23:10 ` Andrew Morton 2002-04-04 23:16 ` Robert Love 0 siblings, 1 reply; 32+ messages in thread From: Andrew Morton @ 2002-04-04 23:10 UTC (permalink / raw) To: Robert Love Cc: Roger Larsson, Linus Torvalds, Dave Hansen, Adam J. Richter, linux-kernel Robert Love wrote: > > ... > This allows us to preempt tasks in any state, without problems or > special cases. It also wasn't too much overhead - compared to now, > basically just: > > if (unlikely(current->preempt_count() & PREEMPT_ACTIVE)) > goto pick_next_task; > > at the top of schedule(). > That's changing schedule(). Seems that I'd ruled out that option prematurely. As current->preempt_count() and PREEMPT_ACTIVE can both evaluate to constant zero if CONFIG_PREEMPT=n, it can be done ifdeflessly. Everything happens inside rq->lock. Looks solid to me. - ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time 2002-04-04 23:10 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time Andrew Morton @ 2002-04-04 23:16 ` Robert Love 0 siblings, 0 replies; 32+ messages in thread From: Robert Love @ 2002-04-04 23:16 UTC (permalink / raw) To: Andrew Morton Cc: Roger Larsson, Linus Torvalds, Dave Hansen, Adam J. Richter, linux-kernel On Thu, 2002-04-04 at 18:10, Andrew Morton wrote: > That's changing schedule(). Seems that I'd ruled out that > option prematurely. As current->preempt_count() and PREEMPT_ACTIVE > can both evaluate to constant zero if CONFIG_PREEMPT=n, it can > be done ifdeflessly. > > Everything happens inside rq->lock. Looks solid to me. It is solid - I did not just invent that approach, it was how we have always done it until around 2.5.6-pre (Ingo sent a patch to change it). The 2.4 patches do it this way (take a look) and 2.5 before the change obviously worked like this. It hasn't shown any problems in ~6 months of use. I'll cook up a patch to do it, but I'd like to hear Linus opinion ... Robert Love ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 18:28 ` Dave Hansen 2002-04-04 18:51 ` Robert Love @ 2002-04-04 19:13 ` Linus Torvalds 2002-04-04 19:16 ` Robert Love 2002-04-04 19:48 ` george anzinger 1 sibling, 2 replies; 32+ messages in thread From: Linus Torvalds @ 2002-04-04 19:13 UTC (permalink / raw) To: Dave Hansen; +Cc: Adam J. Richter, linux-kernel, Robert Love On Thu, 4 Apr 2002, Dave Hansen wrote: > > I've diabled preemption in the area where it used to be disabled because > of the old lock_kernel(). I'm sending this message from a machine with > that patch applied, so the patch does fix it. I think the real fix is to make sure that preemption never hits while current->state == TASK_ZOMBIE. In other words, I think the _correct_ fix is to just make preempt_schedule() check for something like if (current->state != TASK_RUNNING) return; and just getting rid of the current "unconditionally set state to running". (Side note: if the state isn't running, we're most likely going to reschedule in a controlled manner soon anyway. Sure, there are some loops which set state to non-runnable early for race avoidance, but is it a _problem_?) Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 19:13 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Linus Torvalds @ 2002-04-04 19:16 ` Robert Love 2002-04-04 19:45 ` Linus Torvalds 2002-04-04 19:48 ` george anzinger 1 sibling, 1 reply; 32+ messages in thread From: Robert Love @ 2002-04-04 19:16 UTC (permalink / raw) To: Linus Torvalds; +Cc: Dave Hansen, Adam J. Richter, linux-kernel On Thu, 2002-04-04 at 14:13, Linus Torvalds wrote: > I think the real fix is to make sure that preemption never hits while > current->state == TASK_ZOMBIE. > > In other words, I think the _correct_ fix is to just make > preempt_schedule() check for something like > > if (current->state != TASK_RUNNING) > return; > > and just getting rid of the current "unconditionally set state to > running". > > (Side note: if the state isn't running, we're most likely going to > reschedule in a controlled manner soon anyway. Sure, there are some loops > which set state to non-runnable early for race avoidance, but is it a > _problem_?) Eh, maybe - what about all the code that sets non-running before putting itself on a wait queue? Robert Love ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time 2002-04-04 19:16 ` Robert Love @ 2002-04-04 19:45 ` Linus Torvalds 2002-04-04 20:09 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time george anzinger 0 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2002-04-04 19:45 UTC (permalink / raw) To: Robert Love; +Cc: Dave Hansen, Adam J. Richter, linux-kernel On 4 Apr 2002, Robert Love wrote: > > Eh, maybe - what about all the code that sets non-running before putting > itself on a wait queue? In most cases that code will call a schedule itself. Of course, we might make just ZOMBIE a special case, but in general I think it's simply absolutely wrong for the preemption to change task internal data structures on its own. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time 2002-04-04 19:45 ` Linus Torvalds @ 2002-04-04 20:09 ` george anzinger 0 siblings, 0 replies; 32+ messages in thread From: george anzinger @ 2002-04-04 20:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: Robert Love, Dave Hansen, Adam J. Richter, linux-kernel Linus Torvalds wrote: > > On 4 Apr 2002, Robert Love wrote: > > > > Eh, maybe - what about all the code that sets non-running before putting > > itself on a wait queue? > > In most cases that code will call a schedule itself. > > Of course, we might make just ZOMBIE a special case, but in general I > think it's simply absolutely wrong for the preemption to change task > internal data structures on its own. > Amen! > Linus > -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Real time sched: http://sourceforge.net/projects/rtsched/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time 2002-04-04 19:13 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Linus Torvalds 2002-04-04 19:16 ` Robert Love @ 2002-04-04 19:48 ` george anzinger 1 sibling, 0 replies; 32+ messages in thread From: george anzinger @ 2002-04-04 19:48 UTC (permalink / raw) To: Linus Torvalds; +Cc: Dave Hansen, Adam J. Richter, linux-kernel, Robert Love Linus Torvalds wrote: > > On Thu, 4 Apr 2002, Dave Hansen wrote: > > > > I've diabled preemption in the area where it used to be disabled because > > of the old lock_kernel(). I'm sending this message from a machine with > > that patch applied, so the patch does fix it. > > I think the real fix is to make sure that preemption never hits while > current->state == TASK_ZOMBIE. > > In other words, I think the _correct_ fix is to just make > preempt_schedule() check for something like > > if (current->state != TASK_RUNNING) > return; > I agree that the set state to running code should go, but there are places in the kernel where a great deal of time is spent with the state other than running. AND most of them make sense. Adding a PREEMPTING super state, as was in the original patch, addressed this quite well IMHO. This super state was treated by the scheduler as RUNNING regardless of the actual state. As long as it is not in the actual "state" member, the condition waited for can still set the true state to RUNNING and all is well. What is the objection to this? > and just getting rid of the current "unconditionally set state to > running". > > (Side note: if the state isn't running, we're most likely going to > reschedule in a controlled manner soon anyway. Sure, there are some loops > which set state to non-runnable early for race avoidance, but is it a > _problem_?) > All I can say is that we thought it was when we did the original preemption work. Some of those loops are long. I tend to agree with not doing the random "set state to running". It breeds paranoia, and rightly so. And yes the ZOMBIE code must not be preempted. This is one of the reasons the preempt code was designed to allow schedule() to be called with preemption disabled. I gives a nice clean schedule() call. It might be a better solution than the interrupt off schedule() calls made in the sleep calls in sched.c. > Linus > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Real time sched: http://sourceforge.net/projects/rtsched/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <Pine.LNX.4.33.0204041740220.7731-100000@penguin.transmeta.com>]
* Re: [PATCH] preemptive kernel behavior change: don't be rude [not found] <Pine.LNX.4.33.0204041740220.7731-100000@penguin.transmeta.com> @ 2002-04-05 3:09 ` Robert Love 2002-04-05 20:03 ` Roger Larsson 0 siblings, 1 reply; 32+ messages in thread From: Robert Love @ 2002-04-05 3:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, George Anzinger, lkml Linus and company, Attached find a final version of the new behavior patch. It meets everyone's concerns and it compiles, boots, and survived a basic stress test here. The next step would be to push the preempt_count check back into preempt_schedule and duplicate preempt_schedule in entry.S and then have it call schedule directly. This would be both a space and time optimization. I will do this, but I'd like to give this as it is some testing first. Linus, if this is how you want it, please apply. Patch is against 2.5.8-pre1. Robert Love diff -urN linux-2.5.8-pre1/arch/i386/kernel/entry.S linux/arch/i386/kernel/entry.S --- linux-2.5.8-pre1/arch/i386/kernel/entry.S Wed Apr 3 20:57:24 2002 +++ linux/arch/i386/kernel/entry.S Thu Apr 4 20:44:40 2002 @@ -240,9 +240,7 @@ jnz restore_all incl TI_PRE_COUNT(%ebx) sti - movl TI_TASK(%ebx), %ecx # ti->task - movl $0,(%ecx) # current->state = TASK_RUNNING - call SYMBOL_NAME(schedule) + call SYMBOL_NAME(preempt_schedule) jmp ret_from_intr #endif diff -urN linux-2.5.8-pre1/arch/i386/kernel/ptrace.c linux/arch/i386/kernel/ptrace.c --- linux-2.5.8-pre1/arch/i386/kernel/ptrace.c Wed Apr 3 20:57:24 2002 +++ linux/arch/i386/kernel/ptrace.c Thu Apr 4 20:44:40 2002 @@ -455,11 +455,9 @@ between a syscall stop and SIGTRAP delivery */ current->exit_code = SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) ? 0x80 : 0); - preempt_disable(); current->state = TASK_STOPPED; notify_parent(current, SIGCHLD); schedule(); - preempt_enable(); /* * this isn't the same as continuing with a signal, but it will do * for normal use. strace only continues with a signal if the diff -urN linux-2.5.8-pre1/arch/i386/kernel/signal.c linux/arch/i386/kernel/signal.c --- linux-2.5.8-pre1/arch/i386/kernel/signal.c Wed Apr 3 20:57:24 2002 +++ linux/arch/i386/kernel/signal.c Thu Apr 4 20:44:40 2002 @@ -610,11 +610,9 @@ if ((current->ptrace & PT_PTRACED) && signr != SIGKILL) { /* Let the debugger run. */ current->exit_code = signr; - preempt_disable(); current->state = TASK_STOPPED; notify_parent(current, SIGCHLD); schedule(); - preempt_enable(); /* We're back. Did the debugger cancel the sig? */ if (!(signr = current->exit_code)) @@ -669,14 +667,12 @@ case SIGSTOP: { struct signal_struct *sig; + current->state = TASK_STOPPED; current->exit_code = signr; sig = current->parent->sig; - preempt_disable(); - current->state = TASK_STOPPED; if (sig && !(sig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP)) notify_parent(current, SIGCHLD); schedule(); - preempt_enable(); continue; } diff -urN linux-2.5.8-pre1/include/linux/sched.h linux/include/linux/sched.h --- linux-2.5.8-pre1/include/linux/sched.h Wed Apr 3 20:57:27 2002 +++ linux/include/linux/sched.h Thu Apr 4 21:56:11 2002 @@ -91,6 +91,7 @@ #define TASK_UNINTERRUPTIBLE 2 #define TASK_ZOMBIE 4 #define TASK_STOPPED 8 +#define PREEMPT_ACTIVE 0x4000000 #define __set_task_state(tsk, state_value) \ do { (tsk)->state = (state_value); } while (0) diff -urN linux-2.5.8-pre1/include/linux/spinlock.h linux/include/linux/spinlock.h --- linux-2.5.8-pre1/include/linux/spinlock.h Wed Apr 3 20:57:34 2002 +++ linux/include/linux/spinlock.h Thu Apr 4 21:56:11 2002 @@ -177,8 +177,9 @@ do { \ --current_thread_info()->preempt_count; \ barrier(); \ - if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \ - preempt_schedule(); \ + if (unlikely(!(current_thread_info()->preempt_count) && \ + test_thread_flag(TIF_NEED_RESCHED))) \ + preempt_schedule(); \ } while (0) #define spin_lock(lock) \ diff -urN linux-2.5.8-pre1/kernel/exit.c linux/kernel/exit.c --- linux-2.5.8-pre1/kernel/exit.c Wed Apr 3 20:57:37 2002 +++ linux/kernel/exit.c Thu Apr 4 21:55:29 2002 @@ -499,6 +495,7 @@ acct_process(code); __exit_mm(tsk); + preempt_disable(); sem_exit(); __exit_files(tsk); __exit_fs(tsk); diff -urN linux-2.5.8-pre1/kernel/sched.c linux/kernel/sched.c --- linux-2.5.8-pre1/kernel/sched.c Wed Apr 3 20:57:37 2002 +++ linux/kernel/sched.c Thu Apr 4 20:54:17 2002 @@ -764,6 +764,13 @@ prev->sleep_timestamp = jiffies; spin_lock_irq(&rq->lock); + /* + * if entering from preempt_schedule, off a kernel preemption, + * go straight to picking the next task. + */ + if (unlikely(preempt_get_count() & PREEMPT_ACTIVE)) + goto pick_next_task; + switch (prev->state) { case TASK_INTERRUPTIBLE: if (unlikely(signal_pending(prev))) { @@ -775,9 +782,7 @@ case TASK_RUNNING: ; } -#if CONFIG_SMP pick_next_task: -#endif if (unlikely(!rq->nr_running)) { #if CONFIG_SMP load_balance(rq, 1); @@ -841,10 +846,12 @@ */ asmlinkage void preempt_schedule(void) { - if (unlikely(preempt_get_count())) - return; - current->state = TASK_RUNNING; - schedule(); + do { + current_thread_info()->preempt_count += PREEMPT_ACTIVE; + schedule(); + current_thread_info()->preempt_count -= PREEMPT_ACTIVE; + barrier(); + } while(test_thread_flag(TIF_NEED_RESCHED)); } #endif /* CONFIG_PREEMPT */ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] preemptive kernel behavior change: don't be rude 2002-04-05 3:09 ` [PATCH] preemptive kernel behavior change: don't be rude Robert Love @ 2002-04-05 20:03 ` Roger Larsson 0 siblings, 0 replies; 32+ messages in thread From: Roger Larsson @ 2002-04-05 20:03 UTC (permalink / raw) To: Robert Love; +Cc: lkml Hi Robert, This does not look symmetrical, add and sub and bitops... > +#define PREEMPT_ACTIVE 0x4000000 > + /* > + * if entering from preempt_schedule, off a kernel preemption, > + * go straight to picking the next task. > + */ > + if (unlikely(preempt_get_count() & PREEMPT_ACTIVE)) > + goto pick_next_task; > + > + do { > + current_thread_info()->preempt_count += PREEMPT_ACTIVE; > + schedule(); > + current_thread_info()->preempt_count -= PREEMPT_ACTIVE; > + barrier(); And since it has to be zero to end up in the routine anyway... asmlinkage void preempt_schedule(void) { BUG_ON(current_thread_info()->preempt_count != 0); do { /* Problem: suppose a new interrupt happens before we got to set * preempt_count... then we will call this routine recursively. * innermost will select the correct process, but wont it be scheduled * away in enclosing preempt_schedule() - schedule() will be called * with PREEMPT_ACTIVE but not TIF_NEED_RESCHED... * (goto pick_next_task) */ current_thread_info()->preempt_count = PREEMPT_ACTIVE; /* interrupts here might cause calls to preempt_schedule() but those will bounce off above since preempt_count != 0 */ schedule(); current_thread_info()->preempt_count = 0; /* interrupt here will possibly preempt the right process - no problem */ /* need to check if any interrupt happened during the preemption off time, * this case is in fact unlikely */ while (unlikely(test_thread_flag(TIF_NEED_RESCHED))); } /RogerL -- Roger Larsson Skellefteå Sweden ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2002-04-05 20:02 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-04-04 11:59 Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Adam J. Richter
2002-04-04 12:56 ` Stelian Pop
2002-04-04 13:40 ` Alessandro Suardi
2002-04-04 16:23 ` David C. Hansen
2002-04-04 18:28 ` Dave Hansen
2002-04-04 18:51 ` Robert Love
2002-04-04 19:14 ` Linus Torvalds
2002-04-04 19:26 ` Robert Love
2002-04-04 19:41 ` Dave Hansen
2002-04-04 20:02 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time george anzinger
2002-04-04 20:54 ` Andrew Morton
2002-04-04 21:34 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Roger Larsson
2002-04-04 22:38 ` Andrew Morton
2002-04-04 22:42 ` Linus Torvalds
2002-04-04 22:54 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time Andrew Morton
2002-04-04 23:07 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Robert Love
2002-04-04 23:42 ` Linus Torvalds
2002-04-04 23:47 ` Robert Love
2002-04-04 23:55 ` Linus Torvalds
2002-04-05 0:03 ` [PATCH] preemptive kernel behavior change: don't be rude Robert Love
2002-04-05 1:51 ` george anzinger
2002-04-05 2:06 ` Robert Love
2002-04-04 22:55 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Robert Love
2002-04-04 23:10 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time Andrew Morton
2002-04-04 23:16 ` Robert Love
2002-04-04 19:13 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() at boot time Linus Torvalds
2002-04-04 19:16 ` Robert Love
2002-04-04 19:45 ` Linus Torvalds
2002-04-04 20:09 ` Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot time george anzinger
2002-04-04 19:48 ` george anzinger
[not found] <Pine.LNX.4.33.0204041740220.7731-100000@penguin.transmeta.com>
2002-04-05 3:09 ` [PATCH] preemptive kernel behavior change: don't be rude Robert Love
2002-04-05 20:03 ` Roger Larsson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox