Re: -rt more realtime scheduling issues

From: Mike Kravetz <kravetz@us.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: -rt more realtime scheduling issues
Date: Wed, 10 Oct 2007 19:37:23 -0700
Message-ID: <20071011023723.GI5049@monkey.ibm.com>
In-Reply-To: <20071010115052.GA21391@goodmis.org>

On Wed, Oct 10, 2007 at 07:50:52AM -0400, Steven Rostedt wrote:
> On Tue, Oct 09, 2007 at 11:49:53AM -0700, Mike Kravetz wrote:
> > The more I try to understand the IPI handling the more confused I get. :(
> > At first I was concerned about an IPI happening in the middle of the
> > __schedule routine.  But, then it occurred to me that interrupts are
> > disabled when in this routine (when holding the runqueue lock).  So, IPIs
> > are not delivered during __schedule processing.  Right?
> > 
> > But, if this is the case then I don't understand the following code in
> > schedule():
> > 
> >         local_irq_disable();
> > 
> >         do {
> >                 __schedule();
> >         } while (unlikely(test_thread_flag(TIF_NEED_RESCHED) ||
> >                           test_thread_flag(TIF_NEED_RESCHED_DELAYED)));
> > 
> >         local_irq_enable();
> > 
> > How can the reschedule flags possibly be set AFTER running __schedule,
> > especially when the call is explicitly surrounded by local_irq_disable/
> > local_irq_enable?
> > 
> > Can someone help me?
> > 
> 
> Sure, another CPU can set the task's NEED_RESCHED flag. In try_to_wake_up,
> if the process that is waking up is on a runqueue on another CPU and it
> is of higher priority than the current running task, the process that is
> doing the waking will set the NEED_RESCHED flag for that task.

Yes right.  I guess I spent too much time thinking about the 'broadcast
IPI' case where NEED_RESCHED is only set by the handler.  In the case
where we 'reschedule' on a specific CPU the flag is set and IPI sent.


> So to prevent a race where we have called schedule and after getting to
> the new running task, a higher priority process just got scheduled in,
> we will catch that here.
> 
> Now if this is really needed? I don't know. It seems that it just wants
> to check here so we don't need to jump to the interrupt and then
> schedule while coming back out of the interrupt handler as a preemption
> schedule. This way we just schedule again and save a little overhead
> from doing that through the interrupt.

One more thing.  test_thread_flag() uses thread info of the current
thread.  But, if __schedule() did a context switch then it is possible
the NEED_RESCHED flag was set in the previous task, instead of current.
Does that make sense?  The resched flags get cleared at the beginning
of __schedule (as they should).  But, if we really want that loop to
be valid, shouldn't the flags be copied to the current task?  Something
like the following after the context switch:

	if (test_and_clear_tsk_thread_flag(prev, TIF_NEED_RESCHED))
		set_tsk_need_resched(current);
	if (test_and_clear_tsk_thread_flag(prev, TIF_NEED_RESCHED_DELAYED))
		set_tsk_need_resched_delayed(current);

Don't we also need to worry about the flags being left set in the
previous task?  That is why I think a test_and_clear would make sense.
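
To make the concern concrete, here is a minimal single-threaded user-space
sketch of the idea.  It is not kernel code: struct model_task and the
model_* helpers are hypothetical stand-ins for the task's thread flags,
test_and_clear_tsk_thread_flag() and the loop test in schedule(), and it
ignores the atomicity and locking the real flags need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task and its two resched thread flags. */
struct model_task {
	const char *name;
	bool need_resched;		/* models TIF_NEED_RESCHED */
	bool need_resched_delayed;	/* models TIF_NEED_RESCHED_DELAYED */
};

/* Models test_and_clear: return the old value and clear the flag. */
static bool model_test_and_clear(bool *flag)
{
	bool old = *flag;

	*flag = false;
	return old;
}

/*
 * Models the tail of a context switch from prev to next.  With
 * carry_flags set, any resched flag left on prev is moved to next,
 * which is the transfer proposed above.  Returns the new "current".
 */
static struct model_task *model_switch(struct model_task *prev,
				       struct model_task *next,
				       bool carry_flags)
{
	assert(prev != NULL && next != NULL);

	if (carry_flags) {
		if (model_test_and_clear(&prev->need_resched))
			next->need_resched = true;
		if (model_test_and_clear(&prev->need_resched_delayed))
			next->need_resched_delayed = true;
	}
	return next;
}

/* Models the do/while condition: only the current task's flags are seen. */
static bool model_loop_again(const struct model_task *current)
{
	return current->need_resched || current->need_resched_delayed;
}
```

If the flag is set on A just before switching from A to B, then without
the transfer model_loop_again(B) is false and A keeps a stale flag.  With
the transfer, the loop sees the flag on B and A is left clean.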

But, your argument about axing the loop altogether makes sense as well.

> But this brings up an interesting point. Since the IRQ handlers are run
> as threads, and the interrupt is what will wake them, this seems to add
> a bit of latency to interrupts.
> 
> For example:
> 
>   We schedule in process A of prio 1
> 
>   before exiting __schedule process B is woken up on that same rq
>   with a prio of 2 and sets A's NEED_RESCHED flag.
> 
>   Also an interrupt goes off and sent to this CPU. But since interrupts
>   are disabled, we wait.
> 
>   leaving __schedule() we see that A's NEED_RESCHED flag is set, so we
>   continue the do while loop and call __schedule again.
> 
>   We schedule in B of prio 2.
> 
>   Leave __schedule as well as the do while loop and then enable
>   interrupts.
> 
>   The interrupt that was pending is now triggered.
> 
>   Wakes up the handler of prio 90 and since it is higher in priority
>   than process B of prio 2 it sets B's NEED_RESCHED flag.
> 
>   On return from the interrupt we call schedule again.
> 
> This seems strange. I can imagine that on a box with a large number of
> CPUs this can happen quite often, leaving interrupts disabled for
> several rounds through schedule.
> 
> I say we ax that while loop.
> 
> Ingo?
> 
> -- Steve

-- 
Mike
