From mboxrd@z Thu Jan 1 00:00:00 1970
From: Frank Rowand
Subject: Re: Interrupt Bottom Half Scheduling
Date: Tue, 15 Feb 2011 10:38:07 -0800
Message-ID: <4D5AC80F.1090205@am.sony.com>
Reply-To: frank.rowand@am.sony.com
Cc: Frank Rowand, linux-rt-users@vger.kernel.org
To: Peter LaDow

On 02/15/11 08:42, Peter LaDow wrote:
> On Mon, Feb 14, 2011 at 5:58 PM, Frank Rowand wrote:
>> Just so we are speaking with a common definition of jitter, your first email
>> said that the duration of the priority 99 thread loop increased by
>> around 350us (average and maximum) when the lower priority task
>> timers were added to the system.
>
> Well, I'm only speaking to the maximum.  We do expect some increase in
> the maximum runtime of the loop when those other timers are added.
> However, we did not expect it to occasionally spike by 350us.
>
>>> Sure, we expect the timer interrupt to interfere.  But as we
>>
>> So what is the overhead of the timer interrupt?
>
> We are on a PPC platform, and the decrementer interrupt is in
> arch/powerpc/kernel/time.c on lines 541-593.  The only line that seems
> that it can have an impact (at least with regard to the timers) is on
> line 576:
>
>     evt->event_handler(evt);
>
> Which according to /proc/timer_list is hrtimer_interrupt.  This is
> found in kernel/hrtimer.c (lines 1195-1267).  And this does indeed
> seem to be where the bulk of the problem lies.  On line 1226 we have:
>
>     while ((node = base->first)) {
>
> Which loops through all the clock bases.
> This only checks the first
> timer on the rbtree (uses base->first).  It then calls __run_hrtimer
> with the timer at the head of the tree.  And __run_hrtimer calls the
> timer callback function.  In the case of these timers it is
> hrtimer_wakeup.  And each of these calls wake_up_process().
>
> So hmm, perhaps this is it.  There is no softirq that calls the wakeup
> function.  In fact, there doesn't seem to be a bottom half in this
> case at all.  The decrementer interrupt does all the work, rather than
> postpone it to a bottom half.  Looking at the call tree:
>
> timer_interrupt
>   |
>   + hrtimer_interrupt
>     |
>     + __run_hrtimer
>       |
>       + hrtimer_wakeup
>         |
>         + wake_up_process
>           |
>           + try_to_wake_up
>
> And the try_to_wake_up is the scheduler (no?).

try_to_wake_up() is in the scheduler code (kernel/sched.c), but it is
not "the scheduler".  If the task is not already running,
try_to_wake_up() will put the task on the run queue and set its state
to TASK_RUNNING.  If the priority of the newly woken thread is higher
than that of the current thread, then the newly woken thread will
preempt current, and in that case TIF_NEED_RESCHED is set.  The actual
"schedule" occurs on the exit path of the interrupt, and only if
TIF_NEED_RESCHED is set (see the call of preempt_schedule_irq()).

>
> So, if this is the chain of events, then what is sirq-hrtimer for?  I
> see in hrtimers_init (lines 1642-1650):
>
>     open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq);
>
> And run_hrtimer_softirq eventually calls hrtimer_interrupt.  But the
> prior mechanism seems to be the standard means.  Even on my x86 box
> (2.6.32-28) it shows hrtimer_interrupt as the event handler for the
> clocks.  And looking in arch/x86/kernel/time_32.c and
> arch/x86/kernel/time_64.c both take the same route.
>
> So, it seems to me that run_hrtimer_softirq never gets called via any
> interrupt mechanism.  In fact, it only seems to be called when
> creating timers such as in nanosleep.
> The HRTIMER_SOFTIRQ is only
> raised in hrtimer_enqueue_reprogram, which is called in
> hrtimer_start_range_ns.  And none of these have to do with timer
> expiration.
>
> So, it seems the problem really is interrupt overhead.  We had
> presumed that the timer sirq-hrtimer handled these timer expirations,
> and thus the scheduler.  Rather, we find that a full reschedule is
> being done every interrupt.

You should not have a full reschedule when a timer interrupt occurs
for a priority 50 process while the priority 99 process is executing
(see earlier explanation).

But yes, there is a possibility that the problem is interrupt overhead.
You could measure it to verify the theory.

>
> Does my analysis make sense?

Yes.  I did not double check the actual code that you described, and
I haven't been poking around in PPC for a while, but what you describe
sounds reasonable.

>
> Thanks,
> Pete
>