* scheduler problems
@ 2002-06-12  5:20 Anjali Kulkarni
  2002-06-12  6:20 ` Ingo Molnar
From: Anjali Kulkarni @ 2002-06-12  5:20 UTC
  To: linux-kernel


Hi,

I am seeing a problem in the schedule() function.

I am running an in-kernel proxy on Linux 2.2.16 and I hit an oops in
sched.c at line 384. It happens because the schedule() function does
not find the 'current' process on the runqueue. (A detailed explanation
of the oops message, which I get when running without serial-line
debugging, is given below.)

With serial-line debugging I got the following backtrace:

0# schedule() at sched.c:384
1# schedule_timeout(timeout=-806527036) at sched.c:653
2# kupdate (unused=0x0) at buffer.c:1921
3# kernel_thread(fn=0xb, arg=0xbffff86c, flags=0) at process.c:496
4# system_call at process.c:812

Note that the parameters passed to schedule_timeout() (a negative
timeout) and to kernel_thread() do not look right.
When I booted the kernel, I set breakpoints in init/main.c where
kupdate is created, and there the call chain
kernel_thread -> kupdate -> schedule_timeout -> schedule shows every
function being called with correct parameters.
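
One way to look at the odd timeout value is to view it as a bit pattern.
The following is a minimal sketch, assuming a 32-bit i386 build with the
default 3G/1G split (a kernel base of 0xC0000000 is an assumption about
this configuration, and the helper names are hypothetical). Read as an
unsigned 32-bit value, -806527036 is 0xCFED5FC4, which lies in that
assumed kernel range, so the debugger may be showing a pointer from a
misread or overwritten stack slot rather than a real timeout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed kernel base for i386 with the default 3G/1G split. */
#define ASSUMED_PAGE_OFFSET 0xC0000000u

/* Reinterpret a signed 32-bit value as its unsigned bit pattern. */
static uint32_t as_u32(int32_t v)
{
    return (uint32_t)v;
}

/* Hypothetical helper: is the bit pattern inside the assumed kernel range? */
static int looks_like_kernel_address(int32_t v)
{
    return as_u32(v) >= ASSUMED_PAGE_OFFSET;
}

/* Print the value the way a debugger would print a pointer. */
static void describe_timeout(int32_t timeout)
{
    printf("timeout=%ld -> 0x%08lX (%s)\n", (long)timeout,
           (unsigned long)as_u32(timeout),
           looks_like_kernel_address(timeout) ? "in assumed kernel range"
                                              : "outside assumed kernel range");
}

/* Sanity check for the value reported in the backtrace above. */
static void check_reported_timeout(void)
{
    assert(as_u32(-806527036) == 0xCFED5FC4u);
    assert(looks_like_kernel_address(-806527036));
}
```

This only classifies the number; it does not say which code wrote it
there.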

Can anyone tell me what's happening here? My kernel module is in no
way the cause of any of this. A detailed explanation is given below...

Thanks!
Anjali

/*--------------------In more detail-------------------------*/
I am running an in-kernel proxy on Linux 2.2.16, which generates heavy
network activity on the Linux machine. I repeatedly get an oops at one
particular place in the schedule() function. I am trying to analyse the
call trace, which looks something like the following:

schedule()
schedule_timeout()
process_timeout()
do_poll()
sys_poll()

After looking at the address where the oops reports a problem in
schedule() and comparing it with the objdump of sched.o, I found that
when schedule() calls del_from_runqueue(), the current process is *not*
on the runqueue. However, this current process *is* present in the list
of task structs, in TASK_INTERRUPTIBLE state. It is usually a process
such as inetd, or some other process doing network activity.
Now, if I kill all the processes on my Linux machine (just to check),
the problem becomes less frequent but still appears, and the process
missing from the runqueue is now something like init (pid 1) or
kupdate (pid 3). These are the processes which could not be killed.
From this, I concluded that what probably happens is that some process
called sys_poll(), which called do_poll(). In do_poll(), a
process_timeout() occurred (I assume this is a soft interrupt), which
tries to wake up the process that caused it, i.e. put the process back
on the runqueue. I don't know who calls schedule_timeout(), though. Is
it the process waking up from its call to schedule_timeout() after a
context switch occurred? I probably do not know exactly how to
interpret a call trace...
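
The failure described above can be modeled in user space. The following
is a simplified sketch, not the 2.2 scheduler source: the struct, the
next_run/prev_run field names and the _checked helper are assumptions
chosen for illustration. It shows why unlinking a task whose run-list
pointers are already cleared cannot work: the unlink step has to
dereference both neighbours.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for task_struct; only the run-list links are modeled. */
struct task {
    int pid;
    struct task *next_run;
    struct task *prev_run;
};

/* Circular list head; an empty list points back at itself. */
static struct task runqueue_head = { 0, &runqueue_head, &runqueue_head };
static int nr_running;

/* A task that is not linked has NULL run-list pointers in this model. */
static int task_on_runqueue(const struct task *p)
{
    return p->next_run != NULL;
}

static void add_to_runqueue(struct task *p)
{
    assert(!task_on_runqueue(p)); /* inserting twice would corrupt the list */
    p->next_run = runqueue_head.next_run;
    p->prev_run = &runqueue_head;
    runqueue_head.next_run->prev_run = p;
    runqueue_head.next_run = p;
    nr_running++;
}

/* Returns 0 on success, -1 if p is not on the list. An unchecked unlink
 * would dereference p->next_run and p->prev_run here and fault on NULL. */
static int del_from_runqueue_checked(struct task *p)
{
    if (!task_on_runqueue(p))
        return -1;
    p->next_run->prev_run = p->prev_run;
    p->prev_run->next_run = p->next_run;
    p->next_run = NULL;
    p->prev_run = NULL;
    nr_running--;
    return 0;
}
```

In this model, removing the same task twice returns -1 on the second
call instead of faulting, which is the situation the oops seems to
point at.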

Thanks again.

/*----------------------------------------------------------------*/


Anjali Kulkarni
Software Engineer
Indra Networks

~Living Well is the best Revenge~


* Re: scheduler problems
  2002-06-12  5:20 Anjali Kulkarni
@ 2002-06-12  6:20 ` Ingo Molnar
From: Ingo Molnar @ 2002-06-12  6:20 UTC
  To: Anjali Kulkarni; +Cc: linux-kernel


On Tue, 11 Jun 2002, Anjali Kulkarni wrote:

> I am seeing a problem in the schedule() function.
> 
> I am running an in-kernel proxy on Linux 2.2.16 and I hit an oops in
> sched.c at line 384. [...]

(given that the current 2.2 kernel is 2.2.21, the first thing would be to
test it there too.)

> [...] It is due to the fact that the schedule() function does not find
> the 'current' process in the runqueue. [...]

a crash in line 384 means that the runqueue got corrupted by something,
most likely caused by buggy kernel code outside of the scheduler.

> Can anyone tell me what's happening here? My kernel module is in no way
> the cause of any of this. [...]

does it happen if you do not run your kernel module after bootup, ever?

	Ingo



* Re: scheduler problems
@ 2002-06-12  7:14 Anjali Kulkarni
  2002-06-12 10:20 ` Richard Zidlicky
From: Anjali Kulkarni @ 2002-06-12  7:14 UTC
  To: mingo, Anjali Kulkarni, linux-kernel


> (given that the current 2.2 kernel is 2.2.21, the first thing would be
> to test it there too.)
> 

Thanks, I'll do that.

> > [...] It is due to the fact that the schedule() function does not find
> > the 'current' process in the runqueue. [...]
> 
> a crash in line 384 means that the runqueue got corrupted by something,
> most likely caused by buggy kernel code outside of the scheduler.

Right, I thought of that, but how is it that the corruption always hits
exactly the same offset in the task_struct, and every time in a
different process? (I have run it at least 20-30 times.) And why does it
not happen if I kill the process in question? (I couldn't kill kupdate,
so it happens anyway.) I have also checked the task_struct of that
process: next_task, prev_task and the other fields are not corrupted.
Of course, corruption is still possible, for example if memory
allocated and freed by my code is later reused by the kernel for a
task_struct and my code then mistakenly accesses it again at the same
offset.
But are you sure it is a runqueue corruption problem, and not anything
else? If so, is there any particular way to debug this?
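
One generic way to narrow down list corruption is a consistency walk
over the list, run at points of interest. The following is a user-space
sketch under stated assumptions, not code from the 2.2 kernel: the
struct, the field names and check_runqueue() are hypothetical. It
verifies that every node's successor points back at it and that the walk
returns to the head within a bounded number of steps.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for task_struct; only the run-list links are modeled. */
struct task {
    int pid;
    struct task *next_run;
    struct task *prev_run;
};

/*
 * Walk the circular list starting at head. Returns the number of tasks
 * found (the head itself is not counted), or -1 at the first broken link
 * or if more than max_tasks nodes are seen before returning to head.
 */
static int check_runqueue(const struct task *head, int max_tasks)
{
    const struct task *p = head;
    int count = 0;

    assert(head != NULL);
    do {
        const struct task *next = p->next_run;

        /* Each forward link must be mirrored by the backward link. */
        if (next == NULL || next->prev_run != p)
            return -1;
        p = next;
        if (p != head && ++count > max_tasks)
            return -1;
    } while (p != head);

    return count;
}
```

In a kernel, a check like this would have to run with the scheduler's
lock held, and where to call it (for example around the module's entry
points) is a judgment call. Comparing the returned count with the
scheduler's own count of runnable tasks is a further cross-check.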

> > Can anyone tell me what's happening here? My kernel module is in no
> > way the cause of any of this. [...]
> 
> does it happen if you do not run your kernel module after bootup, ever?

No, it does not :(

Thanks,
Anjali

> 
> 	Ingo
> 
> 


Anjali Kulkarni
Software Engineer
Indra Networks

~Living Well is the best Revenge~


* Re: scheduler problems
  2002-06-12  7:14 scheduler problems Anjali Kulkarni
@ 2002-06-12 10:20 ` Richard Zidlicky
From: Richard Zidlicky @ 2002-06-12 10:20 UTC
  To: Anjali Kulkarni; +Cc: mingo, linux-kernel

On Wed, Jun 12, 2002 at 12:14:09AM -0700, Anjali Kulkarni wrote:
> 
> > (given that the current 2.2 kernel is 2.2.21, the first thing would be
> > to test it there too.)
> > 
> 
> Thanks, I'll do that.
> 
> > > [...] It is due to the fact that the schedule() function does not find
> > > the 'current' process in the runqueue. [...]
> > 
> > a crash in line 384 means that the runqueue got corrupted by something,
> > most likely caused by buggy kernel code outside of the scheduler.
> 
> Right, I thought of that, but how is it that the corruption always hits
> exactly the same offset in the task_struct, and every time in a
> different process? (I have run it at least 20-30 times.) And why does it
> not happen if I kill the process in question?

I've had similar problems when some code invalidated the CPU cache
and an interrupt came in at the wrong time.

Richard


* Re: scheduler problems
@ 2002-06-13  5:38 Anjali Kulkarni
From: Anjali Kulkarni @ 2002-06-13  5:38 UTC
  To: Richard Zidlicky, Anjali Kulkarni, mingo, linux-kernel



> > > > [...] It is due to the fact that the schedule() function does not find
> > > > the 'current' process in the runqueue. [...]
> > > 
> > > a crash in line 384 means that the runqueue got corrupted by something,
> > > most likely caused by buggy kernel code outside of the scheduler.
> > 
> > Right, I thought of that, but how is it that the corruption always hits
> > exactly the same offset in the task_struct, and every time in a
> > different process? (I have run it at least 20-30 times.) And why does
> > it not happen if I kill the process in question?
> 
> I've had similar problems when some code invalidated the CPU cache
> and an interrupt came in at the wrong time.
> 

Hi!

I am not very clear on what you mean. Can you explain in more detail?

Thanks,
Anjali

> Richard
> 
> 


Anjali Kulkarni
Software Engineer
Indra Networks

~Living Well is the best Revenge~
