public inbox for linux-kernel@vger.kernel.org
* Kernel 2.2: tq_scheduler functions scheduling and waiting
@ 2001-05-28 19:19 Arthur Naseef
  2001-05-29  2:27 ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Arthur Naseef @ 2001-05-28 19:19 UTC
  To: linux-kernel

All:

I have been diagnosing kernel panics for over a week and I have
concerns with the use of tq_scheduler for which I was hoping I
could get some assistance.

Is it considered acceptable for functions in the tq_scheduler
task list to call schedule?  Is it acceptable for such functions
to wait on wait queues?  What limitations exist?

As near as I can determine, the TTY driver code makes use of
the tq_scheduler list for such purposes.

In my testing, I am running with 96 TTY devices (talking to a
high-density modem card) and I consistently achieve kernel panics
when the system is under heavy swapping.  I am continuing to
diagnose the problem.  The kernel panics are triggered mostly in
goodness() and del_from_runqueue(), as indicated by ksym_oops and
gdb, and I suspect the run queue is getting corrupted.

Apart from this testing, I believe that I have an argument against
tq_scheduler functions waiting on wait queues, but I have not yet
determined whether (a) this is already known, and (b) this is
already happening in existing kernel code.

Any help is greatly appreciated.

-art

Arthur Naseef

P.S. If this information is available through existing documentation,
     searches, or other widely available resources, I would greatly
     appreciate references to this material.  All of my searches to
     date have yielded few results and nothing definitive.



* Re: Kernel 2.2: tq_scheduler functions scheduling and waiting
  2001-05-28 19:19 Kernel 2.2: tq_scheduler functions scheduling and waiting Arthur Naseef
@ 2001-05-29  2:27 ` Andrew Morton
  2001-05-29 11:21   ` Arthur Naseef
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2001-05-29  2:27 UTC
  To: Arthur Naseef; +Cc: linux-kernel

Arthur Naseef wrote:
> 
> All:
> 
> I have been diagnosing kernel panics for over a week and I have
> concerns with the use of tq_scheduler for which I was hoping I
> could get some assistance.
> 
> Is it considered acceptable for functions in the tq_scheduler
> task list to call schedule?  Is it acceptable for such functions
> to wait on wait queues?  What limitations exist?

When a task wants to exit, it cleans up all its stuff,
sets its state to TASK_ZOMBIE and then calls schedule().
The scheduler takes it off the runqueue and the task
is never again executed.  It's just a couple of stack
pages which are waiting for someone in wait4() to release.

But imagine what happens if the TASK_ZOMBIE task hits
schedule() and finds a tq_scheduler task to run.  And that
task calls schedule().  In state TASK_ZOMBIE.  Messy.

At the very least, the schedule() call will never return.

If the tq_scheduler task sets current->state to 
TASK_[UN]INTERRUPTIBLE (as it should) before calling
schedule() then it has overwritten TASK_ZOMBIE and the
task which is trying to exit has become magically
resurrected.  As far as I can tell, the "dead" task
will run again, do the `fake_volatile' thing in do_exit()
and try to go zombie again.

It would be very interesting to change the test in
schedule():

        sti();
-       if (tq_scheduler)
+       if (tq_scheduler && current->state != TASK_ZOMBIE)
                goto handle_tq_scheduler;
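
As a rough illustration of the resurrection described above, here is a
user-space C model (not kernel code). It assumes a tq_scheduler function
that overwrites current->state before sleeping and is later woken. The
names sleeping_tq_function, model_schedule and zombie_resurrected are
hypothetical, and the enum values are stand-ins for the real task states.

```c
#include <stdbool.h>

/* Simplified stand-ins for the 2.2 task states; values are illustrative. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_ZOMBIE };

struct task { enum task_state state; };

/* Hypothetical tq_scheduler callback that sleeps: it marks the current
 * task as sleeping and is later woken, which leaves the task RUNNING. */
static void sleeping_tq_function(struct task *current)
{
    current->state = TASK_INTERRUPTIBLE; /* overwrites whatever was there */
    /* a nested schedule() would go here; model the eventual wake_up() */
    current->state = TASK_RUNNING;
}

/* Model of the check at the top of schedule(). With guard == true it
 * mirrors the proposed `current->state != TASK_ZOMBIE` test. */
static void model_schedule(struct task *current, bool tq_pending, bool guard)
{
    if (tq_pending && (!guard || current->state != TASK_ZOMBIE))
        sleeping_tq_function(current);
}

/* Returns true if an exiting task ends up no longer being a zombie. */
bool zombie_resurrected(bool guard)
{
    struct task t = { TASK_ZOMBIE };
    model_schedule(&t, true, guard);
    return t.state != TASK_ZOMBIE;
}
```

In this model, zombie_resurrected(false) reports the overwrite and
zombie_resurrected(true) does not. It says nothing about whether the
guard is sufficient in the real scheduler.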

It's all rather unpleasant, and tq_scheduler was killed
in 2.4.  I suggest you take a look at all the serial
drivers in 2.4, see how I converted them to use schedule_task().
Someone kindly ported schedule_task() to 2.2.recent, so you
should be able to use that in the same way.

-


* RE: Kernel 2.2: tq_scheduler functions scheduling and waiting
  2001-05-29  2:27 ` Andrew Morton
@ 2001-05-29 11:21   ` Arthur Naseef
  2001-05-29 11:25     ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Arthur Naseef @ 2001-05-29 11:21 UTC
  To: Andrew Morton; +Cc: linux-kernel

Andrew:

Excellent.  I will look at the 2.4 sources.

In addition to the TASK_ZOMBIE issue you mention, I believe there
is an issue of false termination of wait queues.  Consider this:

	- Task places itself on a wait queue
	- Calls schedule()
	- tq_scheduler function does the same

Now, there are two events which could place the task in TASK_RUNNING
and no clear way to differentiate.  And, since most of the kernel
code does not check that the wait condition was actually met, this
could lead to all types of problems, right?
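
The ambiguity can be sketched with a toy user-space model in C. It
assumes, as a simplification, that a wakeup does nothing more than set
the sleeper's single state word to TASK_RUNNING. The struct wait_queue
here is a one-slot stand-in, not the kernel's type, and all function
names are hypothetical.

```c
#include <stddef.h>

enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct task { enum task_state state; };

/* Toy wait queue holding at most one sleeper. */
struct wait_queue { struct task *sleeper; };

static void sleep_on_queue(struct wait_queue *q, struct task *t)
{
    q->sleeper = t;
    t->state = TASK_INTERRUPTIBLE;
}

static void wake_up_queue(struct wait_queue *q)
{
    if (q->sleeper != NULL)
        q->sleeper->state = TASK_RUNNING;
}

/* The same task sleeps on two queues (its own and the one used by a
 * tq_scheduler function); wake exactly one and report the task state. */
enum task_state state_after_wake(int wake_second_queue)
{
    struct task t = { TASK_RUNNING };
    struct wait_queue own = { NULL }, from_tq = { NULL };

    sleep_on_queue(&own, &t);      /* the task's own wait */
    sleep_on_queue(&from_tq, &t);  /* the tq_scheduler function's wait */

    wake_up_queue(wake_second_queue ? &from_tq : &own);
    return t.state;
}
```

Both state_after_wake(0) and state_after_wake(1) leave the task in
TASK_RUNNING, so the state alone cannot tell a waiter which event fired.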

-art


* Re: Kernel 2.2: tq_scheduler functions scheduling and waiting
  2001-05-29 11:21   ` Arthur Naseef
@ 2001-05-29 11:25     ` Andrew Morton
  2001-05-30  1:16       ` Arthur Naseef
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2001-05-29 11:25 UTC
  To: Arthur Naseef; +Cc: linux-kernel

Arthur Naseef wrote:
> 
> Andrew:
> 
> Excellent.  I will look at the 2.4 sources.
> 
> In addition to the TASK_ZOMBIE issue you mention, I believe there
> is an issue of false termination of wait queues.  Consider this:
> 
>         - Task places itself on a wait queue
>         - Calls schedule()
>         - tq_scheduler function does the same
> 
> Now, there are two events which could place the task in TASK_RUNNING
> and no clear way to differentiate.  And, since most of the kernel
> code does not check that the wait condition was actually met, this
> could lead to all types of problems, right?
> 

Yes.  The situation where one task is on two waitqueues
is rare, but does happen.  And yes, there is code out there
which does a bare schedule() and *assumes* that once the
schedule has returned, the thing it was waiting for has
indeed occurred.

Generally this is poor practice - it's safer to loop
over the schedule() call until the condition you're
sleeping on has been tested.
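
A minimal user-space sketch of the difference between a bare wait and a
looped wait, in C. The wake_script type and model_schedule() are
hypothetical stand-ins: each scripted wakeup records whether it was
caused by the event the waiter cares about. This is a model of the
pattern, not the kernel's wait-queue API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Scripted wakeups: each entry says whether that wakeup was caused by
 * the event the waiter actually cares about. */
struct wake_script {
    const bool *is_my_event;
    size_t len;
    size_t next;
};

/* Stand-in for schedule(): "sleeps" until the next scripted wakeup and
 * sets the condition flag if that wakeup was the awaited event. */
static void model_schedule(struct wake_script *s, bool *condition)
{
    if (s->next < s->len && s->is_my_event[s->next++])
        *condition = true;
}

/* Bare wait: one schedule() call, then assume the event happened. */
bool wait_bare(struct wake_script *s)
{
    bool condition = false;
    model_schedule(s, &condition);
    return condition; /* may be false: woken for some other reason */
}

/* Looped wait: go back to sleep until the condition really holds.
 * The bound on the script length only keeps the model from spinning. */
bool wait_looped(struct wake_script *s)
{
    bool condition = false;
    while (!condition && s->next < s->len)
        model_schedule(s, &condition);
    return condition;
}
```

With a script of two unrelated wakeups followed by the real event,
wait_bare() returns with the condition still false, while wait_looped()
consumes all three wakeups and returns with it true.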

You really shouldn't be sleeping in this way on tq_scheduler
if there's any way in which the sleep can take an extended
period of time.  You may end up putting important kernel
tasks to sleep.

Best to use schedule_task(), or an independent kernel thread.

-


* RE: Kernel 2.2: tq_scheduler functions scheduling and waiting
  2001-05-29 11:25     ` Andrew Morton
@ 2001-05-30  1:16       ` Arthur Naseef
  0 siblings, 0 replies; 5+ messages in thread
From: Arthur Naseef @ 2001-05-30  1:16 UTC
  To: Andrew Morton; +Cc: linux-kernel

I have tested with a kernel thread running the tq_scheduler and it
is much more stable.  The kernel still ran into a problem in n_tty.c
in which the compiler optimized out the check "if (!tty)" in
n_tty_set_termios(); I am still investigating the right solution to
this.

As a long term fix, I will review the 2.4 and latest 2.2 sources.

> Yes.  The situation where one task is on two waitqueues
> is rare, but does happen.  And yes, there is code out there
> which does a bare schedule() and *assumes* that once the
> schedule has returned, the thing it was waiting for has
> indeed occurred.
> 
> Generally this is poor practice - it's safer to loop
> over the schedule() call until the condition you're
> sleeping on has been tested.

I see your point.  It would prevent this type of problem if all code
waiting for conditions made certain those conditions were met.  However,
given the way the kernel works, it is not necessary to check unless the
task specifically expects more than one condition to awaken it - at
least it wasn't until tq_scheduler was introduced.  Actually, that is
not fair either - only when functions in tq_scheduler started
"blocking" did this become a problem.

It would help me tremendously if these types of limitations and
requirements for working in the kernel were well documented.  It takes
significant effort to determine the requirements, and to verify that
my understanding is correct.

> 
> You really shouldn't be sleeping in this way on tq_scheduler
> if there's any way in which the sleep can take an extended
> period of time.  You may end up putting important kernel
> tasks to sleep.

I agree.  In addition, even if the tq_scheduler function did check for
its own condition, a problem still exists: the task can return to the
code using the first wait queue before that code's condition is met,
because the wakeup on the second wait queue sets the task state to
running and nothing sets it back (which the second waiter could not do
without knowing the conditions to check).
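
A small user-space C model of this case, assuming the tq_scheduler
function runs nested inside the outer task's schedule() call and shares
its single state word. The struct sim and both function names are
hypothetical, and the wakeup for event B is simulated inline.

```c
#include <stdbool.h>

enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct sim {
    enum task_state state;  /* the single state word both waits share */
    bool cond_a;            /* what the task itself is waiting for */
    bool cond_b;            /* what the tq_scheduler function waits for */
};

/* tq_scheduler function that does check its own condition in a loop.
 * In this model event B "arrives" on the first pass. */
static void careful_tq_function(struct sim *s)
{
    while (!s->cond_b) {
        s->state = TASK_INTERRUPTIBLE;
        /* nested schedule(); model the wake_up() for event B */
        s->cond_b = true;
        s->state = TASK_RUNNING;
    }
}

/* Outer bare wait on A: sleeps once, and schedule() runs the
 * tq_scheduler function in the meantime. Returns true if the task is
 * runnable again although nothing ever signalled A. */
bool outer_wait_returns_early(void)
{
    struct sim s = { TASK_RUNNING, false, false };

    s.state = TASK_INTERRUPTIBLE;   /* task sleeps waiting for A */
    careful_tq_function(&s);        /* runs inside schedule() */

    return s.state == TASK_RUNNING && !s.cond_a;
}
```

The inner wait behaves correctly, yet outer_wait_returns_early() is true
in this model: the inner wakeup leaves the shared state at TASK_RUNNING
and the outer waiter has no way to notice.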

> 
> Best to use schedule_task(), or an independent kernel thread.
> 
> -
