From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S272167AbTHNGJM (ORCPT ); Thu, 14 Aug 2003 02:09:12 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S272220AbTHNGJM (ORCPT ); Thu, 14 Aug 2003 02:09:12 -0400 Received: from holomorphy.com ([66.224.33.161]:38840 "EHLO holomorphy") by vger.kernel.org with ESMTP id S272167AbTHNGJA (ORCPT ); Thu, 14 Aug 2003 02:09:00 -0400 Date: Wed, 13 Aug 2003 23:09:59 -0700 From: William Lee Irwin III To: Timothy Miller Cc: rob@landley.net, Charlie Baylis , linux-kernel@vger.kernel.org, kernel@kolivas.org Subject: Re: [PATCH] O12.2int for interactivity Message-ID: <20030814060959.GK32488@holomorphy.com> Mail-Followup-To: William Lee Irwin III , Timothy Miller , rob@landley.net, Charlie Baylis , linux-kernel@vger.kernel.org, kernel@kolivas.org References: <20030804195058.GA8267@cray.fish.zetnet.co.uk> <3F303494.3030406@techsource.com> <200308110414.28569.rob@landley.net> <3F382B8B.9000301@techsource.com> <20030812001759.GS1715@holomorphy.com> <3F39020C.6040408@techsource.com> <20030812233201.GT1715@holomorphy.com> <3F3A5D61.7080207@techsource.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3F3A5D61.7080207@techsource.com> Organization: The Domain of Holomorphy User-Agent: Mutt/1.5.4i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org William Lee Irwin III wrote: >> The equivalent for the 2.4.x scheduler is the "epoch"; essentially what >> 2.6.x has implemented is incremental timeslice assignment for epoch >> expiry. On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote: > What do you mean by "incremental"? And by 2.6.x, are you talking about > vanilla, or interactive? I had no idea you had in mind Con Kolivas' changes vs. 2.6.x as "vanilla" vs. otherwise; I thought you meant 2.4.x vs. 2.6.x. In general, Con's patches just change how dynamic priorities are computed. The "incremental" business meant that tasks have their timeslices for the next epoch as they are moved onto the expired queue. In the 2.4.x scheduler this was not incremental, so when the epoch expired, every task on the system had its timeslice recomputed in one large (and time-consuming) loop. The 2.4.x case was also moderately wasteful as it recomputed priorities for tasks regardless of whether they were runnable, which IMHO is a flaw for which corrections are 2.4.x-mergeable. William Lee Irwin III wrote: >> The way that goes is that when a timeslice runs out, it's >> recomputed, and the expired queue is used as a placeholder for the >> information about what to run and in what order after the epoch expires >> (i.e. the active queue empties). On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote: > Where are blocked tasks placed? Are they put somewhere else and readded > to the active queue when their I/O (or whatever) completes? How is > their ordering in the queue determined? That is, does a task which > wakes up preempt the running task? Does it get run "next"? > In Operating Systems design class in college, we were told that when an > I/O interrupt arrives, the currently active task isn't immediately > returned to, because the overhead of the context switch and the fact > that we don't know how near to the end the task is to its timeslice > makes it not worth it, so we just move on to the next task. Blocked tasks are taken off the runqueues entirely, and are found by device drivers etc. by putting them on a waitqueue so whenever whatever they're blocked on happens, they can be found and put on the runqueue. William Lee Irwin III wrote: >> When the epoch expires/active queue >> empties, the two queues exchange places. The net effect is that the >> duelling queues end up representing a circular list maintained in some >> special order that can have insertions done into the middle of it for >> tasks designated as "interactive", and that priority preemption's >> effectiveness is unclear. On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote: > And by "middle", do you mean at the end of the active queue which is the > "middle" if you think of the expired queue as being a continuation of > the active queue? The "middle" is some arbitrary position in this list that makes it neither the last nor the first of the currently runnable tasks to be run after the currently running task. The above analogy removes the distinctions between active and expired queues, so it doesn't make sense to ask about the boundary between the two. To rehash, consider changing the representation of the scheduler's internals from two arrays to a single doubly-linked list (obviously it's not useful to implement this, but it is conceptually useful). Movement from the active queue to the expired queue would then be modeled by simply list_move_tail(&task->run_list, &run_list). Recycling interactive tasks to the active list or inserting newly woken tasks at particular priorities would involve hunting for a special position in the list and doing list_add() to there. Timeslices are entirely orthogonal; they control when list movement is done, but not how it's done. In this way, we can see that the priorities merely control how tasks are jumbled together into the list because the above alternative representation faithfully models the current arrangement. We're essentially cycling through this circular list and have to struggle to recover preferences for interactive tasks by fiddling with timeslices and the like, as otherwise such a circular queue would be completely fair, to the detriment of interactive tasks. More common arrangements avoid this "tying of the knot" with the queue swapping, and because of this, are capable of expressing long-term preferences directly in terms of priorities, as opposed to having to use timeslices to do so. Instead, timeslices are used to express the "scale" on which scheduling events should happen, and as tasks become more cpu-bound, they have longer timeslices, so that two cpu-bound tasks of identical priority will RR very slowly and have reduced context switch overhead, but are near infinitely preemptible by more interactive or short-running tasks. William Lee Irwin III wrote: >> Obviously this threatens to degenerate to something RR-like that fails >> to give the desired bonuses to short-running tasks. The way priorities >> are enforced is dynamic timeslice assignment, where longer tasks >> receive shorter timeslices and shorter tasks receive longer timeslices, On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote: > How do you vary timeslice? Can you tell the interrupt hardware to > "interrupt me after X clock ticks"? Or do you service the interrupt and > just do very little in the ISR if you're not going to switch contexts? > Isn't that a bit of overhead? task_timeslice() does a computation based on the dynamic priority to assign timeslices. This computation is effectively linear. William Lee Irwin III wrote: >> The active queue is scanned, but the queue data structure is organized >> so that it's composed of a separate list for each priority, and a >> bitmap is maintained alongside the array of lists for quicker searches >> for the highest nonempty priority (numerically low), and so scanning is >> an O(1) algorithm and touches very few cachelines (if more than one). On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote: > Cool. So, it's not so much a "queue" as it is a sort of an array of > lists, eh? The data structure itself is a priority queue; the implementation (or representation) of it used in the scheduler is an array of lists with a bitmap on the side. Other representations such as relaxed heaps or complete binary tree (CBT) heaps are possible, though probably not particularly useful for such small priority spaces, which are faster to exhaustively search with ffz() and the like than to use such structures to search. William Lee Irwin III wrote: >> Neither linearly nor exponentially; nice levels assign static >> priorities; dynamic priorities (the ones used for scheduling decisions) >> are restricted to the range where |dynamic_prio - static_prio| <= 5. >> Nice values do not keep tasks off the expired queue. Tasks on the >> expired queue are ordered by priority and not examined until the epoch >> expiry. On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote: > Doesn't it cause starvation of expired tasks if interactive tasks keep > hogging the active queue? Does it matter? It may very well cause starvation of expired tasks. It's unclear whether it matters. Instrumentation for these kinds of things is lacking, along with any results indicating what on earth is going on. William Lee Irwin III wrote: >> The 2.4.x scheduler had static priorities (nice numbers) and >> recomputed what amounted to dynamic priorities (the "goodness" rating) >> on the fly each time a task was examined. 2.4.x also recomputed >> priorities via the infamous for_each_task() loop >> for_each_task(p) >> p->count = (p->counter >> 1) + NICE_TO_TICKS(p->nice) On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote: > What does this do, exactly? It loops through every task on the system assigning timeslices to every task according to its nice number. William Lee Irwin III wrote: >> in the if (unlikely(!c)) case of the repeat_schedule: label where 2.6.x >> merely examines the priority when the task exhausts its timeslice to >> recompute it (and so the expired queue is a useful part of the >> mechanics of the algorithm: it's a placeholder for tasks whose >> timeslices have been recomputed but don't actually deserve to be run). On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote: > Is this recomputation of timeslice based on the heuristics in the > interactive scheduler, or does vanilla also do something like this? > What does it do differently? The timeslices in 2.6.x are controlled indirectly by the dynamic priority. But the dynamic priorities in 2.6.x are assigned according to interactivity heuristics, in both vanilla and Kolivas' patches. -- wli