* [PATCH] mm: yield during swap prefetching
@ 2006-03-07 23:13 Con Kolivas
From: Con Kolivas @ 2006-03-07 23:13 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, Andrew Morton, ck
Swap prefetching doesn't use very much cpu but spends a lot of time waiting on
disk in uninterruptible sleep. This means it won't get preempted often even at
a low nice level since it is seen as sleeping most of the time. We want to
minimise its cpu impact so yield where possible.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
---
mm/swap_prefetch.c | 1 +
1 file changed, 1 insertion(+)
Index: linux-2.6.15-ck5/mm/swap_prefetch.c
===================================================================
--- linux-2.6.15-ck5.orig/mm/swap_prefetch.c 2006-03-02 14:00:46.000000000 +1100
+++ linux-2.6.15-ck5/mm/swap_prefetch.c 2006-03-08 08:49:32.000000000 +1100
@@ -421,6 +421,7 @@ static enum trickle_return trickle_swap(
if (trickle_swap_cache_async(swp_entry, node) == TRICKLE_DELAY)
break;
+ yield();
}
if (sp_stat.prefetched_pages) {

* Re: [PATCH] mm: yield during swap prefetching
From: Andrew Morton @ 2006-03-07 23:26 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel, linux-mm, ck

Con Kolivas <kernel@kolivas.org> wrote:
>
> Swap prefetching doesn't use very much cpu but spends a lot of time waiting on
> disk in uninterruptible sleep. This means it won't get preempted often even at
> a low nice level since it is seen as sleeping most of the time. We want to
> minimise its cpu impact so yield where possible.
>
> Signed-off-by: Con Kolivas <kernel@kolivas.org>
> ---
> mm/swap_prefetch.c | 1 +
> 1 file changed, 1 insertion(+)
>
> Index: linux-2.6.15-ck5/mm/swap_prefetch.c
> ===================================================================
> --- linux-2.6.15-ck5.orig/mm/swap_prefetch.c 2006-03-02 14:00:46.000000000 +1100
> +++ linux-2.6.15-ck5/mm/swap_prefetch.c 2006-03-08 08:49:32.000000000 +1100
> @@ -421,6 +421,7 @@ static enum trickle_return trickle_swap(
>
> if (trickle_swap_cache_async(swp_entry, node) == TRICKLE_DELAY)
> break;
> + yield();
> }
>
> if (sp_stat.prefetched_pages) {

yield() really sucks if there are a lot of runnable tasks. And the amount
of CPU which that thread uses isn't likely to matter anyway.

I think it'd be better to just not do this. Perhaps alter the thread's
static priority instead? Does the scheduler have a knob which can be used
to disable a tasks's dynamic priority boost heuristic?

* Re: [PATCH] mm: yield during swap prefetching
From: Con Kolivas @ 2006-03-07 23:32 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, ck

Andrew Morton writes:

> Con Kolivas <kernel@kolivas.org> wrote:
>>
>> Swap prefetching doesn't use very much cpu but spends a lot of time waiting on
>> disk in uninterruptible sleep. This means it won't get preempted often even at
>> a low nice level since it is seen as sleeping most of the time. We want to
>> minimise its cpu impact so yield where possible.
>>
>> Signed-off-by: Con Kolivas <kernel@kolivas.org>
>> ---
>> mm/swap_prefetch.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> Index: linux-2.6.15-ck5/mm/swap_prefetch.c
>> ===================================================================
>> --- linux-2.6.15-ck5.orig/mm/swap_prefetch.c 2006-03-02 14:00:46.000000000 +1100
>> +++ linux-2.6.15-ck5/mm/swap_prefetch.c 2006-03-08 08:49:32.000000000 +1100
>> @@ -421,6 +421,7 @@ static enum trickle_return trickle_swap(
>>
>> if (trickle_swap_cache_async(swp_entry, node) == TRICKLE_DELAY)
>> break;
>> + yield();
>> }
>>
>> if (sp_stat.prefetched_pages) {
>
> yield() really sucks if there are a lot of runnable tasks. And the amount
> of CPU which that thread uses isn't likely to matter anyway.
>
> I think it'd be better to just not do this. Perhaps alter the thread's
> static priority instead? Does the scheduler have a knob which can be used
> to disable a tasks's dynamic priority boost heuristic?

We do have SCHED_BATCH but even that doesn't really have the desired effect.
I know how much yield sucks and I actually want it to suck as much as yield
does.

Cheers,
Con
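
(Not part of the original thread: for readers unfamiliar with the knob being
discussed, SCHED_BATCH is requested through the ordinary sched_setscheduler()
interface. A minimal userspace sketch, assuming headers new enough to define
the policy constant:)

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	/* Non-realtime policies require a static priority of 0. */
	struct sched_param sp = { .sched_priority = 0 };

	/* Mark this task as batch work: the scheduler stops treating its
	 * sleeps as "interactive" and gives it no dynamic priority bonus. */
	if (sched_setscheduler(0, SCHED_BATCH, &sp) == -1) {
		perror("sched_setscheduler(SCHED_BATCH)");
		return 1;
	}

	/* ...run the batch workload here... */
	return 0;
}

(kprefetchd is a kernel thread, so in its case the equivalent policy change
would be made in-kernel rather than from userspace; the sketch only
illustrates the knob itself.)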

* Re: [PATCH] mm: yield during swap prefetching
From: Andrew Morton @ 2006-03-08 0:05 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel, linux-mm, ck

Con Kolivas <kernel@kolivas.org> wrote:
>
> > yield() really sucks if there are a lot of runnable tasks. And the amount
> > of CPU which that thread uses isn't likely to matter anyway.
> >
> > I think it'd be better to just not do this. Perhaps alter the thread's
> > static priority instead? Does the scheduler have a knob which can be used
> > to disable a tasks's dynamic priority boost heuristic?
>
> We do have SCHED_BATCH but even that doesn't really have the desired effect.
> I know how much yield sucks and I actually want it to suck as much as yield
> does.

Why do you want that?

If prefetch is doing its job then it will save the machine from a pile of
major faults in the near future. The fact that the machine happens to be
running a number of busy tasks doesn't alter that. It's _worth_ stealing a
few cycles from those tasks now to avoid lengthy D-state sleeps in the near
future?

* Re: [PATCH] mm: yield during swap prefetching
From: Con Kolivas @ 2006-03-08 0:51 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, ck

On Wed, 8 Mar 2006 11:05 am, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > > yield() really sucks if there are a lot of runnable tasks. And the
> > > amount of CPU which that thread uses isn't likely to matter anyway.
> > >
> > > I think it'd be better to just not do this. Perhaps alter the thread's
> > > static priority instead? Does the scheduler have a knob which can be
> > > used to disable a tasks's dynamic priority boost heuristic?
> >
> > We do have SCHED_BATCH but even that doesn't really have the desired
> > effect. I know how much yield sucks and I actually want it to suck as
> > much as yield does.
>
> Why do you want that?
>
> If prefetch is doing its job then it will save the machine from a pile of
> major faults in the near future. The fact that the machine happens to be
> running a number of busy tasks doesn't alter that. It's _worth_ stealing a
> few cycles from those tasks now to avoid lengthy D-state sleeps in the near
> future?

The test case is the 3d (gaming) app that uses 100% cpu. It never sets delay
swap prefetch in any way so swap prefetching starts working. Once swap
prefetching starts reading it is mostly in uninterruptible sleep and always
wakes up on the active array ready for cpu, never expiring even with its
miniscule timeslice. The 3d app is always expiring and landing on the expired
array behind kprefetchd even though kprefetchd is nice 19. The practical
upshot of all this is that kprefetchd does a lot of prefetching with 3d
gaming going on, and no amount of priority fiddling stops it doing this. The
disk access is noticeable during 3d gaming unfortunately. Yielding regularly
means a heck of a lot less prefetching occurs and is no longer noticeable.
When idle, yield()ing doesn't seem to adversely affect the effectiveness of
the prefetching.

Cheers,
Con

* Re: [PATCH] mm: yield during swap prefetching
From: Andrew Morton @ 2006-03-08 1:11 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel, linux-mm, ck

Con Kolivas <kernel@kolivas.org> wrote:
>
> On Wed, 8 Mar 2006 11:05 am, Andrew Morton wrote:
> > Con Kolivas <kernel@kolivas.org> wrote:
> > > > yield() really sucks if there are a lot of runnable tasks. And the
> > > > amount of CPU which that thread uses isn't likely to matter anyway.
> > > >
> > > > I think it'd be better to just not do this. Perhaps alter the thread's
> > > > static priority instead? Does the scheduler have a knob which can be
> > > > used to disable a tasks's dynamic priority boost heuristic?
> > >
> > > We do have SCHED_BATCH but even that doesn't really have the desired
> > > effect. I know how much yield sucks and I actually want it to suck as
> > > much as yield does.
> >
> > Why do you want that?
> >
> > If prefetch is doing its job then it will save the machine from a pile of
> > major faults in the near future. The fact that the machine happens to be
> > running a number of busy tasks doesn't alter that. It's _worth_ stealing a
> > few cycles from those tasks now to avoid lengthy D-state sleeps in the near
> > future?
>
> The test case is the 3d (gaming) app that uses 100% cpu. It never sets delay
> swap prefetch in any way so swap prefetching starts working. Once swap
> prefetching starts reading it is mostly in uninterruptible sleep and always
> wakes up on the active array ready for cpu, never expiring even with its
> miniscule timeslice. The 3d app is always expiring and landing on the expired
> array behind kprefetchd even though kprefetchd is nice 19. The practical
> upshot of all this is that kprefetchd does a lot of prefetching with 3d
> gaming going on, and no amount of priority fiddling stops it doing this. The
> disk access is noticeable during 3d gaming unfortunately. Yielding regularly
> means a heck of a lot less prefetching occurs and is no longer noticeable.
> When idle, yield()ing doesn't seem to adversely affect the effectiveness of
> the prefetching.

but, but. If prefetching is prefetching stuff which that game will soon
use then it'll be an aggregate improvement. If prefetch is prefetching
stuff which that game _won't_ use then prefetch is busted. Using yield()
to artificially cripple kprefetchd is a rather sad workaround isn't it?

* Re: [PATCH] mm: yield during swap prefetching
From: Con Kolivas @ 2006-03-08 1:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, ck

On Wed, 8 Mar 2006 12:11 pm, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > On Wed, 8 Mar 2006 11:05 am, Andrew Morton wrote:
> > > Con Kolivas <kernel@kolivas.org> wrote:
> > > > > yield() really sucks if there are a lot of runnable tasks. And the
> > > > > amount of CPU which that thread uses isn't likely to matter anyway.
> > > > >
> > > > > I think it'd be better to just not do this. Perhaps alter the
> > > > > thread's static priority instead? Does the scheduler have a knob
> > > > > which can be used to disable a tasks's dynamic priority boost
> > > > > heuristic?
> > > >
> > > > We do have SCHED_BATCH but even that doesn't really have the desired
> > > > effect. I know how much yield sucks and I actually want it to suck as
> > > > much as yield does.
> > >
> > > Why do you want that?
> > >
> > > If prefetch is doing its job then it will save the machine from a pile
> > > of major faults in the near future. The fact that the machine happens
> > > to be running a number of busy tasks doesn't alter that. It's _worth_
> > > stealing a few cycles from those tasks now to avoid lengthy D-state
> > > sleeps in the near future?
> >
> > The test case is the 3d (gaming) app that uses 100% cpu. It never sets
> > delay swap prefetch in any way so swap prefetching starts working. Once
> > swap prefetching starts reading it is mostly in uninterruptible sleep and
> > always wakes up on the active array ready for cpu, never expiring even
> > with its miniscule timeslice. The 3d app is always expiring and landing
> > on the expired array behind kprefetchd even though kprefetchd is nice 19.
> > The practical upshot of all this is that kprefetchd does a lot of
> > prefetching with 3d gaming going on, and no amount of priority fiddling
> > stops it doing this. The disk access is noticeable during 3d gaming
> > unfortunately. Yielding regularly means a heck of a lot less prefetching
> > occurs and is no longer noticeable. When idle, yield()ing doesn't seem to
> > adversely affect the effectiveness of the prefetching.
>
> but, but. If prefetching is prefetching stuff which that game will soon
> use then it'll be an aggregate improvement. If prefetch is prefetching
> stuff which that game _won't_ use then prefetch is busted. Using yield()
> to artificially cripple kprefetchd is a rather sad workaround isn't it?

It's not the stuff that it prefetches that's the problem; it's the disk
access.

Con

* Re: [PATCH] mm: yield during swap prefetching
From: Con Kolivas @ 2006-03-08 1:19 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, ck

On Wed, 8 Mar 2006 12:12 pm, Con Kolivas wrote:
> On Wed, 8 Mar 2006 12:11 pm, Andrew Morton wrote:
> > Con Kolivas <kernel@kolivas.org> wrote:
> > > On Wed, 8 Mar 2006 11:05 am, Andrew Morton wrote:
> > > > Con Kolivas <kernel@kolivas.org> wrote:
> > > > > > yield() really sucks if there are a lot of runnable tasks. And
> > > > > > the amount of CPU which that thread uses isn't likely to matter
> > > > > > anyway.
> > > > > >
> > > > > > I think it'd be better to just not do this. Perhaps alter the
> > > > > > thread's static priority instead? Does the scheduler have a knob
> > > > > > which can be used to disable a tasks's dynamic priority boost
> > > > > > heuristic?
> > > > >
> > > > > We do have SCHED_BATCH but even that doesn't really have the
> > > > > desired effect. I know how much yield sucks and I actually want it
> > > > > to suck as much as yield does.
> > > >
> > > > Why do you want that?
> > > >
> > > > If prefetch is doing its job then it will save the machine from a
> > > > pile of major faults in the near future. The fact that the machine
> > > > happens to be running a number of busy tasks doesn't alter that.
> > > > It's _worth_ stealing a few cycles from those tasks now to avoid
> > > > lengthy D-state sleeps in the near future?
> > >
> > > The test case is the 3d (gaming) app that uses 100% cpu. It never sets
> > > delay swap prefetch in any way so swap prefetching starts working. Once
> > > swap prefetching starts reading it is mostly in uninterruptible sleep
> > > and always wakes up on the active array ready for cpu, never expiring
> > > even with its miniscule timeslice. The 3d app is always expiring and
> > > landing on the expired array behind kprefetchd even though kprefetchd
> > > is nice 19. The practical upshot of all this is that kprefetchd does a
> > > lot of prefetching with 3d gaming going on, and no amount of priority
> > > fiddling stops it doing this. The disk access is noticeable during 3d
> > > gaming unfortunately. Yielding regularly means a heck of a lot less
> > > prefetching occurs and is no longer noticeable. When idle, yield()ing
> > > doesn't seem to adversely affect the effectiveness of the prefetching.
> >
> > but, but. If prefetching is prefetching stuff which that game will soon
> > use then it'll be an aggregate improvement. If prefetch is prefetching
> > stuff which that game _won't_ use then prefetch is busted. Using yield()
> > to artificially cripple kprefetchd is a rather sad workaround isn't it?
>
> It's not the stuff that it prefetches that's the problem; it's the disk
> access.

I guess what I'm saying is there isn't enough information to delay swap
prefetch when cpu usage is high which was my intention as well. Yielding has
the desired effect without adding further accounting checks to swap_prefetch.

Cheers,
Con

* Re: [PATCH] mm: yield during swap prefetching
From: Andrew Morton @ 2006-03-08 1:23 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel, linux-mm, ck

Con Kolivas <kernel@kolivas.org> wrote:
>
> > but, but. If prefetching is prefetching stuff which that game will soon
> > use then it'll be an aggregate improvement. If prefetch is prefetching
> > stuff which that game _won't_ use then prefetch is busted. Using yield()
> > to artificially cripple kprefetchd is a rather sad workaround isn't it?
>
> It's not the stuff that it prefetches that's the problem; it's the disk
> access.

But the prefetch code tries to avoid prefetching when the disk is otherwise
busy (or it should - we discussed that a bit a while ago).

Sorry, I'm not trying to be awkward here - I think that nobbling prefetch
when there's a lot of CPU activity is just the wrong thing to do and it'll
harm other workloads.

* Re: [PATCH] mm: yield during swap prefetching
From: Con Kolivas @ 2006-03-08 1:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, ck

On Wed, 8 Mar 2006 12:23 pm, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > > but, but. If prefetching is prefetching stuff which that game will
> > > soon use then it'll be an aggregate improvement. If prefetch is
> > > prefetching stuff which that game _won't_ use then prefetch is busted.
> > > Using yield() to artificially cripple kprefetchd is a rather sad
> > > workaround isn't it?
> >
> > It's not the stuff that it prefetches that's the problem; it's the disk
> > access.
>
> But the prefetch code tries to avoid prefetching when the disk is otherwise
> busy (or it should - we discussed that a bit a while ago).

Anything that does disk access delays prefetch fine. Things that only do
heavy cpu do not delay prefetch. Anything reading from disk will be
noticeable during 3d gaming.

> Sorry, I'm not trying to be awkward here - I think that nobbling prefetch
> when there's a lot of CPU activity is just the wrong thing to do and it'll
> harm other workloads.

I can't distinguish between when cpu activity is important (game) and when it
is not (compile), and assuming worst case scenario and not doing any swap
prefetching is my intent. I could add cpu accounting to prefetch_suitable()
instead, but that gets rather messy and yielding achieves the same endpoint.

Cheers,
Con
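
(Not part of the original thread: the "cpu accounting in prefetch_suitable()"
alternative was never posted here, so the following is only a rough sketch of
the kind of check being described, built from the stock
nr_running()/num_online_cpus() helpers. The function name, placement and
return convention are assumptions; a real heuristic could look quite
different.)

/* Hypothetical sketch only -- not from the swap prefetch patches.
 * Treat the system as busy whenever there are more runnable tasks
 * than online CPUs, and report the moment as unsuitable for
 * prefetching. */
static inline int cpus_busy(void)
{
	return nr_running() > num_online_cpus();
}

/*
 * ...then, alongside the other tests in prefetch_suitable()
 * (assumed here to return 0 when prefetching should be skipped):
 */
	if (cpus_busy())
		return 0;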

* Re: [PATCH] mm: yield during swap prefetching
From: Lee Revell @ 2006-03-08 2:08 UTC (permalink / raw)
To: Con Kolivas; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wed, 2006-03-08 at 12:28 +1100, Con Kolivas wrote:
> I can't distinguish between when cpu activity is important (game) and when it
> is not (compile), and assuming worst case scenario and not doing any swap
> prefetching is my intent. I could add cpu accounting to prefetch_suitable()
> instead, but that gets rather messy and yielding achieves the same endpoint.

Shouldn't the game be running with RT priority or at least at a low nice
value?

Lee

* Re: [PATCH] mm: yield during swap prefetching
From: Con Kolivas @ 2006-03-08 2:12 UTC (permalink / raw)
To: Lee Revell; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wed, 8 Mar 2006 01:08 pm, Lee Revell wrote:
> On Wed, 2006-03-08 at 12:28 +1100, Con Kolivas wrote:
> > I can't distinguish between when cpu activity is important (game) and
> > when it is not (compile), and assuming worst case scenario and not doing
> > any swap prefetching is my intent. I could add cpu accounting to
> > prefetch_suitable() instead, but that gets rather messy and yielding
> > achieves the same endpoint.
>
> Shouldn't the game be running with RT priority or at least at a low nice
> value?

No way. Games run nice 0 SCHED_NORMAL.

Con

* Re: [PATCH] mm: yield during swap prefetching
From: Lee Revell @ 2006-03-08 2:18 UTC (permalink / raw)
To: Con Kolivas; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wed, 2006-03-08 at 13:12 +1100, Con Kolivas wrote:
> On Wed, 8 Mar 2006 01:08 pm, Lee Revell wrote:
> > On Wed, 2006-03-08 at 12:28 +1100, Con Kolivas wrote:
> > > I can't distinguish between when cpu activity is important (game) and
> > > when it is not (compile), and assuming worst case scenario and not doing
> > > any swap prefetching is my intent. I could add cpu accounting to
> > > prefetch_suitable() instead, but that gets rather messy and yielding
> > > achieves the same endpoint.
> >
> > Shouldn't the game be running with RT priority or at least at a low nice
> > value?
>
> No way. Games run nice 0 SCHED_NORMAL.

Maybe this is a stupid/OT question (answer off list if you think so) but
why not? Isn't that the standard way of telling the scheduler that you
have a realtime constraint? It's how pro audio stuff works which I
would think has similar RT requirements.

How is the scheduler supposed to know to penalize a kernel compile
taking 100% CPU but not a game using 100% CPU?

Lee

* Re: [PATCH] mm: yield during swap prefetching
From: Con Kolivas @ 2006-03-08 2:22 UTC (permalink / raw)
To: Lee Revell; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wed, 8 Mar 2006 01:18 pm, Lee Revell wrote:
> On Wed, 2006-03-08 at 13:12 +1100, Con Kolivas wrote:
> > On Wed, 8 Mar 2006 01:08 pm, Lee Revell wrote:
> > > On Wed, 2006-03-08 at 12:28 +1100, Con Kolivas wrote:
> > > > I can't distinguish between when cpu activity is important (game) and
> > > > when it is not (compile), and assuming worst case scenario and not
> > > > doing any swap prefetching is my intent. I could add cpu accounting
> > > > to prefetch_suitable() instead, but that gets rather messy and
> > > > yielding achieves the same endpoint.
> > >
> > > Shouldn't the game be running with RT priority or at least at a low
> > > nice value?
> >
> > No way. Games run nice 0 SCHED_NORMAL.
>
> Maybe this is a stupid/OT question (answer off list if you think so) but
> why not? Isn't that the standard way of telling the scheduler that you
> have a realtime constraint? It's how pro audio stuff works which I
> would think has similar RT requirements.
>
> How is the scheduler supposed to know to penalize a kernel compile
> taking 100% CPU but not a game using 100% CPU?

Because being a serious desktop operating system that we are (bwahahahaha)
means the user should not have special privileges to run something as simple
as a game. Games should not need special scheduling classes. We can always
use 'nice' for a compile though. Real time audio is a completely different
world to this.

Cheers,
Con

* Re: [PATCH] mm: yield during swap prefetching
From: Lee Revell @ 2006-03-08 2:27 UTC (permalink / raw)
To: Con Kolivas; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wed, 2006-03-08 at 13:22 +1100, Con Kolivas wrote:
> > How is the scheduler supposed to know to penalize a kernel compile
> > taking 100% CPU but not a game using 100% CPU?
>
> Because being a serious desktop operating system that we are (bwahahahaha)
> means the user should not have special privileges to run something as simple
> as a game. Games should not need special scheduling classes. We can always
> use 'nice' for a compile though. Real time audio is a completely different
> world to this.

Actually recent distros like the upcoming Ubuntu Dapper support the new
RLIMIT_NICE and RLIMIT_RTPRIO so this would Just Work without any
special privileges (well, not root anyway - you'd have to put the user
in the right group and add one line to /etc/security/limits.conf).

I think OSX also uses special scheduling classes for stuff with RT
constraints.

The only barrier I see is that games aren't specifically written to take
advantage of RT scheduling because historically it's only been available
to root.

Lee
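
(Not part of the original thread: the mechanism Lee describes is configured
with one line per limit in /etc/security/limits.conf, read by pam_limits.
The group name and the numbers below are examples only:)

# /etc/security/limits.conf -- example entries only
# Let members of the "games" group request realtime priority up to 50
# and raise priority down to nice -10 without being root.
@games    -    rtprio    50
@games    -    nice     -10

A process owned by such a user could then call sched_setscheduler() with
SCHED_FIFO or SCHED_RR, or setpriority() with a negative nice value, and the
kernel would permit it up to those limits.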

* Re: [PATCH] mm: yield during swap prefetching
From: Con Kolivas @ 2006-03-08 2:30 UTC (permalink / raw)
To: Lee Revell; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wed, 8 Mar 2006 01:27 pm, Lee Revell wrote:
> On Wed, 2006-03-08 at 13:22 +1100, Con Kolivas wrote:
> > > How is the scheduler supposed to know to penalize a kernel compile
> > > taking 100% CPU but not a game using 100% CPU?
> >
> > Because being a serious desktop operating system that we are
> > (bwahahahaha) means the user should not have special privileges to run
> > something as simple as a game. Games should not need special scheduling
> > classes. We can always use 'nice' for a compile though. Real time audio
> > is a completely different world to this.
>
> Actually recent distros like the upcoming Ubuntu Dapper support the new
> RLIMIT_NICE and RLIMIT_RTPRIO so this would Just Work without any
> special privileges (well, not root anyway - you'd have to put the user
> in the right group and add one line to /etc/security/limits.conf).
>
> I think OSX also uses special scheduling classes for stuff with RT
> constraints.
>
> The only barrier I see is that games aren't specifically written to take
> advantage of RT scheduling because historically it's only been available
> to root.

Well as I said in my previous reply, games should _not_ need special
scheduling classes. They are not written in a real time smart way and they do
not have any realtime constraints or requirements.

Cheers,
Con

* Re: [ck] Re: [PATCH] mm: yield during swap prefetching
From: André Goddard Rosa @ 2006-03-08 2:52 UTC (permalink / raw)
To: Con Kolivas; +Cc: Lee Revell, Andrew Morton, linux-mm, linux-kernel, ck

[...]
> > > Because being a serious desktop operating system that we are
> > > (bwahahahaha) means the user should not have special privileges to run
> > > something as simple as a game. Games should not need special scheduling
> > > classes. We can always use 'nice' for a compile though. Real time audio
> > > is a completely different world to this.
[...]
> Well as I said in my previous reply, games should _not_ need special
> scheduling classes. They are not written in a real time smart way and they do
> not have any realtime constraints or requirements.

Sorry Con, but I have to disagree with you on this.

Games are very complex software, involving heavy use of hardware resources
and they also have a lot of time constraints. So, I think they should use RT
priorities if it is necessary to get the resources needed in time.

Thanks,
--
[]s,
André Goddard

* Re: [ck] Re: [PATCH] mm: yield during swap prefetching
From: Lee Revell @ 2006-03-08 3:03 UTC (permalink / raw)
To: André Goddard Rosa
Cc: Con Kolivas, Andrew Morton, linux-mm, linux-kernel, ck

On Tue, 2006-03-07 at 22:52 -0400, André Goddard Rosa wrote:
> Sorry Con, but I have to disagree with you on this.
>
> Games are very complex software, involving heavy use of hardware
> resources and they also have a lot of time constraints. So, I think
> they should use RT priorities if it is necessary to get the resources
> needed in time.

The main reason I assumed games would want to use the POSIX realtime
features like priority scheduling etc. is that the simulation people all
use them - it seems like a very similar problem.

Lee

* Re: [PATCH] mm: yield during swap prefetching
From: Con Kolivas @ 2006-03-08 3:05 UTC (permalink / raw)
To: André Goddard Rosa
Cc: Lee Revell, Andrew Morton, linux-mm, linux-kernel, ck

André Goddard Rosa writes:

> [...]
>> > > Because being a serious desktop operating system that we are
>> > > (bwahahahaha) means the user should not have special privileges to run
>> > > something as simple as a game. Games should not need special scheduling
>> > > classes. We can always use 'nice' for a compile though. Real time audio
>> > > is a completely different world to this.
> [...]
>> Well as I said in my previous reply, games should _not_ need special
>> scheduling classes. They are not written in a real time smart way and they do
>> not have any realtime constraints or requirements.
>
> Sorry Con, but I have to disagree with you on this.
>
> Games are very complex software, involving heavy use of hardware resources
> and they also have a lot of time constraints. So, I think they should
> use RT priorities
> if it is necessary to get the resources needed in time.

Excellent, I've opened the can of worms.

Yes, games are an incredibly complex beast.

No they shouldn't need real time scheduling to work well if they are coded
properly. However, witness the fact that most of our games are windows
ports, therefore being lower quality than the original. Witness also the
fact that at last with dual core support, lots and lots (but not all) of
windows games on _windows_ are having scheduling trouble and jerky playback,
forcing them to crappily force binding to one cpu. As much as I'd love to
blame windows, it is almost certainly due to the coding of the application
since better games don't exhibit this problem. Now the games in question
can't be trusted to even run on SMP; do you really think they could cope
with good real time code? Good -complex- real time coding is very difficult.
If you take any game out there that currently exists and throw real time
scheduling at it, almost certainly it will hang the machine.

No, I don't believe games need realtime scheduling to work well; they just
need to be written well and the kernel needs to be unintrusive enough to
work well with them. Otherwise gaming would have needed realtime scheduling
from day one on all operating systems. Generic kernel activities should not
cause game stuttering either as users have little control over them. I do
expect users to not run too many userspace programs while trying to play
games though. I do not believe we should make games work well in the
presence of updatedb running for example.

Cheers,
Con

* Re: [PATCH] mm: yield during swap prefetching
From: Zan Lynx @ 2006-03-08 21:07 UTC (permalink / raw)
To: Con Kolivas
Cc: André Goddard Rosa, Lee Revell, Andrew Morton, linux-mm, linux-kernel, ck

On Wed, 2006-03-08 at 14:05 +1100, Con Kolivas wrote:
> André Goddard Rosa writes:
>
> > [...]
> >> > > Because being a serious desktop operating system that we are
> >> > > (bwahahahaha) means the user should not have special privileges to run
> >> > > something as simple as a game. Games should not need special scheduling
> >> > > classes. We can always use 'nice' for a compile though. Real time audio
> >> > > is a completely different world to this.
> > [...]
> >> Well as I said in my previous reply, games should _not_ need special
> >> scheduling classes. They are not written in a real time smart way and they do
> >> not have any realtime constraints or requirements.
> >
> > Sorry Con, but I have to disagree with you on this.
> >
> > Games are very complex software, involving heavy use of hardware resources
> > and they also have a lot of time constraints. So, I think they should
> > use RT priorities
> > if it is necessary to get the resources needed in time.
>
> Excellent, I've opened the can of worms.
>
> Yes, games are an incredibly complex beast.
>
> No they shouldn't need real time scheduling to work well if they are coded
> properly. However, witness the fact that most of our games are windows
> ports, therefore being lower quality than the original. Witness also the
> fact that at last with dual core support, lots and lots (but not all) of
> windows games on _windows_ are having scheduling trouble and jerky playback,
> forcing them to crappily force binding to one cpu.
[snip]

Games where you notice frame-rate chop because the *disk system* is
using too much CPU are perfect examples of applications that should be
using real-time.

Multiple CPU cores and multithreading in games is another perfect
example of programming that *needs* predictable real-time thread
priorities. There is no other way to guarantee that physics processing
takes priority over graphics updates or AI, once each task becomes
separated from a monolithic main loop and spread over several CPU cores.

Because games often *are* badly written, a user-friendly Linux gaming
system does need a high-priority real-time task watching for a special
keystroke, like C-A-Del for example, so that it can kill the other
real-time tasks and return to the UI shell.

Games and real-time go together like they were made for each other.
--
Zan Lynx <zlynx@acm.org>

* Re: [PATCH] mm: yield during swap prefetching
From: Con Kolivas @ 2006-03-08 23:00 UTC (permalink / raw)
To: Zan Lynx
Cc: André Goddard Rosa, Lee Revell, Andrew Morton, linux-mm, linux-kernel, ck

Zan Lynx writes:

> On Wed, 2006-03-08 at 14:05 +1100, Con Kolivas wrote:
>> André Goddard Rosa writes:
>>
>> > [...]
>> >> > > Because being a serious desktop operating system that we are
>> >> > > (bwahahahaha) means the user should not have special privileges to run
>> >> > > something as simple as a game. Games should not need special scheduling
>> >> > > classes. We can always use 'nice' for a compile though. Real time audio
>> >> > > is a completely different world to this.
>> > [...]
>> >> Well as I said in my previous reply, games should _not_ need special
>> >> scheduling classes. They are not written in a real time smart way and they do
>> >> not have any realtime constraints or requirements.
>> >
>> > Sorry Con, but I have to disagree with you on this.
>> >
>> > Games are very complex software, involving heavy use of hardware resources
>> > and they also have a lot of time constraints. So, I think they should
>> > use RT priorities
>> > if it is necessary to get the resources needed in time.
>>
>> Excellent, I've opened the can of worms.
>>
>> Yes, games are an incredibly complex beast.
>>
>> No they shouldn't need real time scheduling to work well if they are coded
>> properly. However, witness the fact that most of our games are windows
>> ports, therefore being lower quality than the original. Witness also the
>> fact that at last with dual core support, lots and lots (but not all) of
>> windows games on _windows_ are having scheduling trouble and jerky playback,
>> forcing them to crappily force binding to one cpu.
> [snip]
>
> Games where you notice frame-rate chop because the *disk system* is
> using too much CPU are perfect examples of applications that should be
> using real-time.
>
> Multiple CPU cores and multithreading in games is another perfect
> example of programming that *needs* predictable real-time thread
> priorities. There is no other way to guarantee that physics processing
> takes priority over graphics updates or AI, once each task becomes
> separated from a monolithic main loop and spread over several CPU cores.
>
> Because games often *are* badly written, a user-friendly Linux gaming
> system does need a high-priority real-time task watching for a special
> keystroke, like C-A-Del for example, so that it can kill the other
> real-time tasks and return to the UI shell.
>
> Games and real-time go together like they were made for each other.

I guess every single well working windows game since the dawn of time is
some sort of anomaly then.

Cheers,
Con

* Re: [PATCH] mm: yield during swap prefetching
From: Zan Lynx @ 2006-03-08 23:48 UTC (permalink / raw)
To: Con Kolivas
Cc: André Goddard Rosa, Lee Revell, Andrew Morton, linux-mm, linux-kernel, ck

On Thu, 2006-03-09 at 10:00 +1100, Con Kolivas wrote:
> Zan Lynx writes:
[snip]
> > Games and real-time go together like they were made for each other.
>
> I guess every single well working windows game since the dawn of time is
> some sort of anomaly then.

Yes, those Windows games are anomalies that rely on the OS scheduling
them AS IF they were real-time, but without actually claiming that
priority.

Because these games just assume they own the whole system and aren't
explicitly telling the OS about their real-time requirements, the OS has
to guess instead and can get it wrong, especially when hardware
capabilities advance in ways that force changes to the task scheduler
(multi-core, hyper-threading). And you said it yourself, many old games
don't work well on dual-core systems.

I think your effort to improve the guessing is a good idea, and thanks.

Just don't dismiss the idea that games do have real-time requirements
and if they did things correctly, games would explicitly specify those
requirements.
--
Zan Lynx <zlynx@acm.org>

* Re: [PATCH] mm: yield during swap prefetching
From: Con Kolivas @ 2006-03-09 0:07 UTC (permalink / raw)
To: Zan Lynx
Cc: André Goddard Rosa, Lee Revell, Andrew Morton, linux-mm, linux-kernel, ck

Zan Lynx writes:

> On Thu, 2006-03-09 at 10:00 +1100, Con Kolivas wrote:
>> Zan Lynx writes:
> [snip]
>> > Games and real-time go together like they were made for each other.
>>
>> I guess every single well working windows game since the dawn of time is
>> some sort of anomaly then.
>
> Yes, those Windows games are anomalies that rely on the OS scheduling
> them AS IF they were real-time, but without actually claiming that
> priority.
>
> Because these games just assume they own the whole system and aren't
> explicitly telling the OS about their real-time requirements, the OS has
> to guess instead and can get it wrong, especially when hardware
> capabilities advance in ways that force changes to the task scheduler
> (multi-core, hyper-threading). And you said it yourself, many old games
> don't work well on dual-core systems.
>
> I think your effort to improve the guessing is a good idea, and
> thanks.
>
> Just don't dismiss the idea that games do have real-time requirements
> and if they did things correctly, games would explicitly specify those
> requirements.

Games worked on windows for a decade on single core without real time
scheduling because that's what they were written for.

Now that games are written for windows with dual core they work well - again
without real time scheduling.

Why should a port of these games to linux require real time?

Cheers,
Con

* Re: [PATCH] mm: yield during swap prefetching
From: Zan Lynx @ 2006-03-09 3:13 UTC (permalink / raw)
To: Con Kolivas
Cc: André Goddard Rosa, Lee Revell, Andrew Morton, linux-mm, linux-kernel, ck

On Thu, 2006-03-09 at 11:07 +1100, Con Kolivas wrote:
> Games worked on windows for a decade on single core without real time
> scheduling because that's what they were written for.
>
> Now that games are written for windows with dual core they work well -
> again without real time scheduling.
>
> Why should a port of these games to linux require real time?

That isn't what I said. I said nothing about *requiring* anything, only
about how to do it better.

Here is what Con said that I was disagreeing with. All the rest was to
justify my disagreement.

Con said, "... games should _not_ need special scheduling classes. They
are not written in a real time smart way and they do not have any
realtime constraints or requirements."

And he said later, "No they shouldn't need real time scheduling to work
well if they are coded properly."

Here is a list of simple statements of what I am saying:
Games do have real-time requirements.
The OS guessing about real-time priorities will sometimes get it wrong.
Guessing task priority is worse than being told and knowing for sure.
Games should, in an ideal world, be using real-time OS scheduling.
Games would work better using real-time OS scheduling.

That is all from me.
--
Zan Lynx <zlynx@acm.org>

* Re: [PATCH] mm: yield during swap prefetching
From: Con Kolivas @ 2006-03-09 4:08 UTC (permalink / raw)
To: Zan Lynx
Cc: André Goddard Rosa, Lee Revell, Andrew Morton, linux-mm, linux-kernel, ck

Zan Lynx writes:

> On Thu, 2006-03-09 at 11:07 +1100, Con Kolivas wrote:
>> Games worked on windows for a decade on single core without real time
>> scheduling because that's what they were written for.
>>
>> Now that games are written for windows with dual core they work well -
>> again without real time scheduling.
>>
>> Why should a port of these games to linux require real time?
>
> That isn't what I said. I said nothing about *requiring* anything, only
> about how to do it better.
>
> Here is what Con said that I was disagreeing with. All the rest was to
> justify my disagreement.
>
> Con said, "... games should _not_ need special scheduling classes. They
> are not written in a real time smart way and they do not have any
> realtime constraints or requirements."
>
> And he said later, "No they shouldn't need real time scheduling to work
> well if they are coded properly."
>
> Here is a list of simple statements of what I am saying:
> Games do have real-time requirements.
> The OS guessing about real-time priorities will sometimes get it wrong.
> Guessing task priority is worse than being told and knowing for sure.
> Games should, in an ideal world, be using real-time OS scheduling.
> Games would work better using real-time OS scheduling.

At the risk of being repetitive to the point of tiresome, my point is that
there are no real time requirements in games. You're assuming that
everything will be better if we assume that there are rt requirements and
that we're simulating pseudo real time conditions currently. That's just not
the case and never has been. That's why it has worked fine for so long.

Cheers,
Con

* Re: [PATCH] mm: yield during swap prefetching
From: Lee Revell @ 2006-03-09 4:54 UTC (permalink / raw)
To: Con Kolivas
Cc: Zan Lynx, André Goddard Rosa, Andrew Morton, linux-mm, linux-kernel, ck

On Thu, 2006-03-09 at 15:08 +1100, Con Kolivas wrote:
> > Games do have real-time requirements.
> > The OS guessing about real-time priorities will sometimes get it wrong.
> > Guessing task priority is worse than being told and knowing for sure.
> > Games should, in an ideal world, be using real-time OS scheduling.
> > Games would work better using real-time OS scheduling.
>
> At the risk of being repetitive to the point of tiresome, my point is that
> there are no real time requirements in games. You're assuming that
> everything will be better if we assume that there are rt requirements and
> that we're simulating pseudo real time conditions currently. That's just not
> the case and never has been. That's why it has worked fine for so long.

I think you are talking past each other, and are both right - Con is
saying games don't need realtime scheduling (SCHED_FIFO, low nice value,
whatever) to function correctly (true), while Zan is saying that games
have RT constraints in that they must react as fast as possible to user
input (also true).

Anyway, this is getting OT, I wish I had not raised this issue in this
thread.

Lee

* Re: [PATCH] mm: yield during swap prefetching
From: Jan Knutar @ 2006-03-08 7:51 UTC (permalink / raw)
To: Con Kolivas; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wednesday 08 March 2006 03:28, Con Kolivas wrote:
> Anything that does disk access delays prefetch fine. Things that only do
> heavy cpu do not delay prefetch. Anything reading from disk will be
> noticeable during 3d gaming.

What exactly makes the disk accesses noticeable? Is it because they steal
time from the disk that the game otherwise would need, or do the disk
accesses themselves consume noticeable amounts of CPU time?

Or, do bits of the game's executable drop from memory to make room for the
new stuff being pulled in from memory, causing the game to halt while it
waits for its pages to come back? On a related note, through advanced use of
handwaving and guessing, this seems to be the thing that kills my desktop
experience (*buzzword alert*) most often. Checksumming a large file seems to
be less of an impact than things that seek a lot, like updatedb.

I remember playing vegastrike on my linux desktop machine few years ago, the
game leaked so much memory that it filled my 2G swap rather often, unleashing
OOM killer mayhem. I "solved" this by putting swap on floppy at lower
priority than the 2G, and a 128M swap file as "backup" at even lower priority
than the floppy. I didn't notice the swapping to harddrive, but when it
started to swap to floppy, it made the game run a bit slower for a few
seconds, plus the floppy light went on, and I knew I had 128M left to save my
position and quit.

If I needed floppy to make disk access noticeable on my very low end
machine... What are these new fancy things doing to make HD access
noticeable?

* Re: [PATCH] mm: yield during swap prefetching
From: Con Kolivas @ 2006-03-08 8:39 UTC (permalink / raw)
To: Jan Knutar; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wednesday 08 March 2006 18:51, Jan Knutar wrote:
> On Wednesday 08 March 2006 03:28, Con Kolivas wrote:
> > Anything that does disk access delays prefetch fine. Things that only do
> > heavy cpu do not delay prefetch. Anything reading from disk will be
> > noticeable during 3d gaming.
>
> What exactly makes the disk accesses noticeable? Is it because they steal
> time from the disk that the game otherwise would need, or do the disk
> accesses themselves consume noticeable amounts of CPU time?
> Or, do bits of the game's executable drop from memory to make room for the
> new stuff being pulled in from memory, causing the game to halt while it
> waits for its pages to come back? On a related note, through advanced use
> of handwaving and guessing, this seems to be the thing that kills my
> desktop experience (*buzzword alert*) most often. Checksumming a large
> file seems to be less of an impact than things that seek a lot, like
> updatedb.
>
> I remember playing vegastrike on my linux desktop machine few years ago,
> the game leaked so much memory that it filled my 2G swap rather often,
> unleashing OOM killer mayhem. I "solved" this by putting swap on floppy at
> lower priority than the 2G, and a 128M swap file as "backup" at even lower
> priority than the floppy. I didn't notice the swapping to harddrive, but
> when it started to swap to floppy, it made the game run a bit slower for a
> few seconds, plus the floppy light went on, and I knew I had 128M left to
> save my position and quit.
>
> If I needed floppy to make disk access noticeable on my very low end
> machine... What are these new fancy things doing to make HD access
> noticeable?

It's the cumulative effect of the cpu used by the in kernel code paths and
the kprefetchd kernel thread. Even running ultra low priority, if they read a
lot from the hard drive it costs us cpu time (seen as I/O wait in top for
example).

Swap prefetch _never_ displaces anything from ram; it only ever reads things
from swap if there is generous free ram available. Not only that but if it
reads something from swap it is put at the end of the "least recently used"
list meaning that if _anything_ needs ram, these are the first things
displaced again.

Cheers,
Con

* Re: [PATCH] mm: yield during swap prefetching
From: Helge Hafting @ 2006-03-09 8:57 UTC (permalink / raw)
To: Con Kolivas; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

Con Kolivas wrote:
> On Wed, 8 Mar 2006 12:11 pm, Andrew Morton wrote:
> > but, but. If prefetching is prefetching stuff which that game will soon
> > use then it'll be an aggregate improvement. If prefetch is prefetching
> > stuff which that game _won't_ use then prefetch is busted. Using yield()
> > to artificially cripple kprefetchd is a rather sad workaround isn't it?
>
> It's not the stuff that it prefetches that's the problem; it's the disk
> access.

Well, seems you have some sorry kind of disk driver then?
An ide disk not using dma?

A low-cpu task that only abuses the disk shouldn't make an impact
on a 3D game that hogs the cpu only. Unless the driver for your
harddisk is faulty, using way more cpu than it need.

Use hdparm, check the basics:
unmaskirq=1, using_dma=1, multcount is some positive number,
such as 8 or 16, readahead is some positive number.
Also use hdparm -i and verify that the disk is using some
nice udma mode. (too old for that, and it probably isn't worth
optimizing this for...)

Also make sure the disk driver isn't sharing an irq with the
3D card.

Come to think of it, if your 3D game happens to saturate the
pci bus for long times, then disk accesses might indeed
be noticeable as they too need the bus. Check if going to
a slower dma mode helps - this might free up the bus a bit.

Helge Hafting
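
(Not part of the original thread: the settings Helge lists correspond to the
classic hdparm switches for IDE disks, roughly as below; /dev/hda is only an
example device name.)

# show the drive's identification, including the selected (u)dma mode
hdparm -i /dev/hda

# query the current settings Helge mentions
hdparm -d -u -m -a /dev/hda

# enable DMA and IRQ unmasking, set a 16-sector multcount and readahead
hdparm -d1 -u1 -m16 -a64 /dev/hda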

* Re: [PATCH] mm: yield during swap prefetching
From: Con Kolivas @ 2006-03-09 9:08 UTC (permalink / raw)
To: Helge Hafting; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Thursday 09 March 2006 19:57, Helge Hafting wrote:
> Con Kolivas wrote:
> > On Wed, 8 Mar 2006 12:11 pm, Andrew Morton wrote:
> > > but, but. If prefetching is prefetching stuff which that game will soon
> > > use then it'll be an aggregate improvement. If prefetch is prefetching
> > > stuff which that game _won't_ use then prefetch is busted. Using yield()
> > > to artificially cripple kprefetchd is a rather sad workaround isn't it?
> >
> > It's not the stuff that it prefetches that's the problem; it's the disk
> > access.
>
> Well, seems you have some sorry kind of disk driver then?
> An ide disk not using dma?
>
> A low-cpu task that only abuses the disk shouldn't make an impact
> on a 3D game that hogs the cpu only. Unless the driver for your
> harddisk is faulty, using way more cpu than it need.
>
> Use hdparm, check the basics:
> unmaskirq=1, using_dma=1, multcount is some positive number,
> such as 8 or 16, readahead is some positive number.
> Also use hdparm -i and verify that the disk is using some
> nice udma mode. (too old for that, and it probably isn't worth
> optimizing this for...)
>
> Also make sure the disk driver isn't sharing an irq with the
> 3D card.
>
> Come to think of it, if your 3D game happens to saturate the
> pci bus for long times, then disk accesses might indeed
> be noticeable as they too need the bus. Check if going to
> a slower dma mode helps - this might free up the bus a bit.

Thanks for the hints. However I actually wrote the swap prefetch code and
this is all about changing its behaviour to make it do what I want. The
problem is that nice 19 will give it up to 5% cpu in the presence of a nice 0
task when I really don't want swap prefetch doing anything. Furthermore
because it is constantly waking up from sleep (after disk activity) it is
always given lower latency scheduling than a fully cpu bound nice 0 task -
this is normally appropriate behaviour. Yielding regularly works around that
issue.

Ideally taking into account cpu usage and only working below a certain cpu
threshold may be the better mechanism and it does appear this would be more
popular. It would not be hard to implement, but does add yet more code to an
increasingly complex heuristic used to detect "idleness". I am seriously
considering it.

Cheers,
Con

[parent not found: <4410AFD3.7090505@bigpond.net.au>]
* Re: [ck] Re: [PATCH] mm: yield during swap prefetching [not found] ` <4410AFD3.7090505@bigpond.net.au> @ 2006-03-10 9:01 ` Andreas Mohr 2006-03-10 9:11 ` Con Kolivas 0 siblings, 1 reply; 112+ messages in thread From: Andreas Mohr @ 2006-03-10 9:01 UTC (permalink / raw) To: Peter Williams Cc: Con Kolivas, Andrew Morton, linux-mm, linux-kernel, ck, Helge Hafting Hi, On Fri, Mar 10, 2006 at 09:44:35AM +1100, Peter Williams wrote: > I'm working on a patch to add soft and hard CPU rate caps to the > scheduler and the soft caps may be useful for what you're trying to do. > They are a generalization of your SCHED_BATCH implementation in > staircase (which would have been better called SCHED_BACKGROUND :-) Which SCHED_BATCH? ;) I only know it as SCHED_IDLEPRIO, which, come to think of it, is a better name, I believe :-) (renamed due to mainline introducing a *different* SCHED_BATCH mechanism) > IMHO) in that a task with a soft cap will only use more CPU than that > cap if it (the cpu) would otherwise go unused. The main difference > between this mechanism and staircase's SCHED_BATCH mechanism is that you > can specify how much (as parts per thousand of a CPU) the task can use > instead of just being background or not background. With the soft cap > set to zero the effect would be essentially the same. Interesting. Hopefully it will bring some nice results! Andreas ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [ck] Re: [PATCH] mm: yield during swap prefetching 2006-03-10 9:01 ` [ck] " Andreas Mohr @ 2006-03-10 9:11 ` Con Kolivas 0 siblings, 0 replies; 112+ messages in thread From: Con Kolivas @ 2006-03-10 9:11 UTC (permalink / raw) To: Andreas Mohr Cc: Peter Williams, Andrew Morton, linux-mm, linux-kernel, ck, Helge Hafting On Friday 10 March 2006 20:01, Andreas Mohr wrote: > Hi, > > On Fri, Mar 10, 2006 at 09:44:35AM +1100, Peter Williams wrote: > > I'm working on a patch to add soft and hard CPU rate caps to the > > scheduler and the soft caps may be useful for what you're trying to do. > > They are a generalization of your SCHED_BATCH implementation in > > staircase (which would have been better called SCHED_BACKGROUND :-) > > Which SCHED_BATCH? ;) I only know it as SCHED_IDLEPRIO, which, come to > think of it, is a better name, I believe :-) > (renamed due to mainline introducing a *different* SCHED_BATCH mechanism) Just to clarify what Andreas is saying: I was forced to rename my SCHED_BATCH to SCHED_IDLEPRIO which is a more descriptive name anyway. That is in my 2.6.16-rc based patches. SCHED_BATCH as you know is now used to mean "don't treat me as interactive" so I'm using this policy naming in 2.6.16-based patches. Cheers, Con ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH] mm: yield during swap prefetching 2006-03-08 0:05 ` Andrew Morton 2006-03-08 0:51 ` Con Kolivas @ 2006-03-08 22:24 ` Pavel Machek 2006-03-09 2:22 ` Nick Piggin 1 sibling, 1 reply; 112+ messages in thread From: Pavel Machek @ 2006-03-08 22:24 UTC (permalink / raw) To: Andrew Morton; +Cc: Con Kolivas, linux-kernel, linux-mm, ck On Út 07-03-06 16:05:15, Andrew Morton wrote: > Con Kolivas <kernel@kolivas.org> wrote: > > > > > yield() really sucks if there are a lot of runnable tasks. And the amount > > > of CPU which that thread uses isn't likely to matter anyway. > > > > > > I think it'd be better to just not do this. Perhaps alter the thread's > > > static priority instead? Does the scheduler have a knob which can be used > > > to disable a tasks's dynamic priority boost heuristic? > > > > We do have SCHED_BATCH but even that doesn't really have the desired effect. > > I know how much yield sucks and I actually want it to suck as much as yield > > does. > > Why do you want that? > > If prefetch is doing its job then it will save the machine from a pile of > major faults in the near future. The fact that the machine happens Or maybe not.... it is prefetch, it may prefetch wrongly, and you definitely want it doing nothing when system is loaded.... It only makes sense to prefetch when system is idle. Pavel -- Web maintainer for suspend.sf.net (www.sf.net/projects/suspend) wanted... ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH] mm: yield during swap prefetching 2006-03-08 22:24 ` Pavel Machek @ 2006-03-09 2:22 ` Nick Piggin 2006-03-09 2:30 ` Con Kolivas 0 siblings, 1 reply; 112+ messages in thread From: Nick Piggin @ 2006-03-09 2:22 UTC (permalink / raw) To: Pavel Machek; +Cc: Andrew Morton, Con Kolivas, linux-kernel, linux-mm, ck Pavel Machek wrote: >On Út 07-03-06 16:05:15, Andrew Morton wrote: > >>Why do you want that? >> >>If prefetch is doing its job then it will save the machine from a pile of >>major faults in the near future. The fact that the machine happens >> > >Or maybe not.... it is prefetch, it may prefetch wrongly, and you >definitely want it doing nothing when system is loaded.... It only >makes sense to prefetch when system is idle. > Right. Prefetching is obviously going to have a very low work/benefit, assuming your page reclaim is working properly, because a) it doesn't deal with file pages, and b) it is doing work to bring back pages that have already been deemed to be the least important. What it is good for is working around our interesting VM that apparently allows updatedb to swap everything out (although I haven't seen this problem myself), and artificial memory hogs. By moving work to times of low cost. No problem with the theory behind it. So as much as a major fault costs in terms of performance, the tiny chance that prefetching will avoid it means even the CPU usage is questionable. Using sched_yield() seems like a hack though. ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH] mm: yield during swap prefetching 2006-03-09 2:22 ` Nick Piggin @ 2006-03-09 2:30 ` Con Kolivas 2006-03-09 2:57 ` Nick Piggin 0 siblings, 1 reply; 112+ messages in thread From: Con Kolivas @ 2006-03-09 2:30 UTC (permalink / raw) To: Nick Piggin; +Cc: Pavel Machek, Andrew Morton, linux-kernel, linux-mm, ck On Thu, 9 Mar 2006 01:22 pm, Nick Piggin wrote: > Pavel Machek wrote: > >On Út 07-03-06 16:05:15, Andrew Morton wrote: > >>Why do you want that? > >> > >>If prefetch is doing its job then it will save the machine from a pile of > >>major faults in the near future. The fact that the machine happens > > > >Or maybe not.... it is prefetch, it may prefetch wrongly, and you > >definitely want it doing nothing when system is loaded.... It only > >makes sense to prefetch when system is idle. > > Right. Prefetching is obviously going to have a very low work/benefit, > assuming your page reclaim is working properly, because a) it doesn't > deal with file pages, and b) it is doing work to bring back pages that > have already been deemed to be the least important. > > What it is good for is working around our interesting VM that apparently > allows updatedb to swap everything out (although I haven't seen this > problem myself), and artificial memory hogs. By moving work to times of > low cost. No problem with the theory behind it. > > So as much as a major fault costs in terms of performance, the tiny > chance that prefetching will avoid it means even the CPU usage is > questionable. Using sched_yield() seems like a hack though. Yeah it's a hack alright. Funny how at last I find a place where yield does exactly what I want and because we hate yield so much no one wants me to use it at all. Cheers, Con ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH] mm: yield during swap prefetching 2006-03-09 2:30 ` Con Kolivas @ 2006-03-09 2:57 ` Nick Piggin 2006-03-09 9:11 ` Con Kolivas 0 siblings, 1 reply; 112+ messages in thread From: Nick Piggin @ 2006-03-09 2:57 UTC (permalink / raw) To: Con Kolivas; +Cc: Pavel Machek, Andrew Morton, linux-kernel, linux-mm, ck Con Kolivas wrote: >On Thu, 9 Mar 2006 01:22 pm, Nick Piggin wrote: > >> >>So as much as a major fault costs in terms of performance, the tiny >>chance that prefetching will avoid it means even the CPU usage is >>questionable. Using sched_yield() seems like a hack though. >> > >Yeah it's a hack alright. Funny how at last I find a place where yield does >exactly what I want and because we hate yield so much no one wants me to use >it at all. > > AFAIKS it is a hack for the same reason using it for locking is a hack, it's just that prefetch doesn't care if it doesn't get the CPU back for a while. Given a yield implementation which does something completely different for SCHED_OTHER tasks, your code may find it doesn't work so well anymore. This is no different to the java folk using it with decent results for locking. Just because it happened to work OK for them at the time didn't mean it was the right thing to do. I have always maintained that a SCHED_OTHER task calling sched_yield is basically a bug because it is utterly undefined behaviour. But being an in-kernel user that "knows" the implementation sort of does the right thing, maybe you justify it that way. ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH] mm: yield during swap prefetching 2006-03-09 2:57 ` Nick Piggin @ 2006-03-09 9:11 ` Con Kolivas 0 siblings, 0 replies; 112+ messages in thread From: Con Kolivas @ 2006-03-09 9:11 UTC (permalink / raw) To: Nick Piggin; +Cc: Pavel Machek, Andrew Morton, linux-kernel, linux-mm, ck On Thursday 09 March 2006 13:57, Nick Piggin wrote: > Con Kolivas wrote: > >On Thu, 9 Mar 2006 01:22 pm, Nick Piggin wrote: > >>So as much as a major fault costs in terms of performance, the tiny > >>chance that prefetching will avoid it means even the CPU usage is > >>questionable. Using sched_yield() seems like a hack though. > > > >Yeah it's a hack alright. Funny how at last I find a place where yield > > does exactly what I want and because we hate yield so much noone wants me > > to use it all. > > AFAIKS it is a hack for the same reason using it for locking is a hack, > it's just that prefetch doesn't care if it doesn't get the CPU back for > a while. > > Given a yield implementation which does something completely different > for SCHED_OTHER tasks, you code may find it doesn't work so well anymore. > This is no different to the java folk using it with decent results for > locking. Just because it happened to work OK for them at the time didn't > mean it was the right thing to do. > > I have always maintained that a SCHED_OTHER task calling sched_yield > is basically a bug because it is utterly undefined behaviour. > > But being an in-kernel user that "knows" the implementation sort of does > the right thin, maybe you justify it that way. You're right. Even if I do know exactly how yield works and am using it to my advantage, any solution that depends on the way yield works may well not work in the future. It does look like I should just check cpu usage as well in prefetch_suitable(). That will probably be the best generalised solution to this. Thanks. Cheers, Con ^ permalink raw reply [flat|nested] 112+ messages in thread
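To make the cpu-usage idea concrete: a gate of the sort Con describes could sit at the top of prefetch_suitable() and simply refuse to prefetch unless the machine has been close to idle over the last sampling window. The sketch below is only an illustration of that shape; the helper name, the 5% threshold and the per-cpu idle-time sampling are assumptions made for this example, not what the swap prefetch code actually grew, and it treats cputime as jiffies-based, as on most 2.6.16-era configurations.

#include <linux/kernel_stat.h>
#include <linux/jiffies.h>
#include <linux/cpumask.h>

/* Hypothetical tuning value: any cpu busier than this over the
 * sampling window vetoes prefetching. */
#define PREFETCH_MAX_BUSY_PCT	5

static int cpus_idle_enough(void)
{
	static unsigned long long last_idle[NR_CPUS];
	static unsigned long last_sample;
	unsigned long elapsed = jiffies - last_sample;
	int cpu, idle_enough = 1;

	if (!elapsed)
		return 0;

	for_each_online_cpu(cpu) {
		/* cpustat.idle is in cputime units; treated as jiffies here */
		unsigned long long idle = kstat_cpu(cpu).cpustat.idle;
		unsigned long long slept = idle - last_idle[cpu];

		/* idle for less than (100 - threshold)% of the window? */
		if (slept * 100 < (unsigned long long)elapsed *
				  (100 - PREFETCH_MAX_BUSY_PCT))
			idle_enough = 0;
		last_idle[cpu] = idle;
	}
	last_sample = jiffies;
	return idle_enough;
}

prefetch_suitable() would then bail out before doing any work whenever cpus_idle_enough() returns 0, alongside its existing checks.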
* Re: [ck] Re: [PATCH] mm: yield during swap prefetching 2006-03-07 23:32 ` Con Kolivas 2006-03-08 0:05 ` Andrew Morton @ 2006-03-08 13:36 ` Con Kolivas 2006-03-17 9:06 ` Ingo Molnar 1 sibling, 1 reply; 112+ messages in thread From: Con Kolivas @ 2006-03-08 13:36 UTC (permalink / raw) To: ck; +Cc: Andrew Morton, linux-mm, linux-kernel, Ingo Molnar cc'ing Ingo... On Wednesday 08 March 2006 10:32, Con Kolivas wrote: > Andrew Morton writes: > > Con Kolivas <kernel@kolivas.org> wrote: > >> Swap prefetching doesn't use very much cpu but spends a lot of time > >> waiting on disk in uninterruptible sleep. This means it won't get > >> preempted often even at a low nice level since it is seen as sleeping > >> most of the time. We want to minimise its cpu impact so yield where > >> possible. > >> > >> Signed-off-by: Con Kolivas <kernel@kolivas.org> > >> --- > >> mm/swap_prefetch.c | 1 + > >> 1 file changed, 1 insertion(+) > >> > >> Index: linux-2.6.15-ck5/mm/swap_prefetch.c > >> =================================================================== > >> --- linux-2.6.15-ck5.orig/mm/swap_prefetch.c 2006-03-02 > >> 14:00:46.000000000 +1100 +++ > >> linux-2.6.15-ck5/mm/swap_prefetch.c 2006-03-08 08:49:32.000000000 +1100 > >> @@ -421,6 +421,7 @@ static enum trickle_return trickle_swap( > >> > >> if (trickle_swap_cache_async(swp_entry, node) == TRICKLE_DELAY) > >> break; > >> + yield(); > >> } > >> > >> if (sp_stat.prefetched_pages) { > > > > yield() really sucks if there are a lot of runnable tasks. And the > > amount of CPU which that thread uses isn't likely to matter anyway. > > > > I think it'd be better to just not do this. Perhaps alter the thread's > > static priority instead? Does the scheduler have a knob which can be > > used to disable a tasks's dynamic priority boost heuristic? > > We do have SCHED_BATCH but even that doesn't really have the desired > effect. I know how much yield sucks and I actually want it to suck as much > as yield does. Thinking some more on this I wonder if SCHED_BATCH isn't a strong enough scheduling hint if it's not suitable for such an application. Ingo do you think we could make SCHED_BATCH tasks always wake up on the expired array? Cheers, Con ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [ck] Re: [PATCH] mm: yield during swap prefetching 2006-03-08 13:36 ` [ck] " Con Kolivas @ 2006-03-17 9:06 ` Ingo Molnar 2006-03-17 10:46 ` interactive task starvation Mike Galbraith 2006-03-17 12:38 ` [PATCH] sched: activate SCHED BATCH expired Con Kolivas 0 siblings, 2 replies; 112+ messages in thread From: Ingo Molnar @ 2006-03-17 9:06 UTC (permalink / raw) To: Con Kolivas; +Cc: ck, Andrew Morton, linux-mm, linux-kernel * Con Kolivas <kernel@kolivas.org> wrote: > > We do have SCHED_BATCH but even that doesn't really have the desired > > effect. I know how much yield sucks and I actually want it to suck as much > > as yield does. > > Thinking some more on this I wonder if SCHED_BATCH isn't a strong > enough scheduling hint if it's not suitable for such an application. > Ingo do you think we could make SCHED_BATCH tasks always wake up on > the expired array? yep, i think that's a good idea. In the worst case the starvation timeout should kick in. Ingo ^ permalink raw reply [flat|nested] 112+ messages in thread
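For illustration, the minimal form of what Con is asking about could be a wakeup-path change along these lines, against the 2.6.16-era __activate_task(). It is a sketch under stated assumptions, not the patch that was eventually posted: the policy test is open-coded, and the expired_timestamp handling shown is just one guess at how to keep the starvation timeout Ingo mentions meaningful when the only expired tasks are freshly woken SCHED_BATCH ones.

/*
 * Sketch: queue woken SCHED_BATCH tasks on the expired array so they
 * never preempt tasks in the active array; they get to run once the
 * arrays are switched.
 */
static inline void __activate_task(task_t *p, runqueue_t *rq)
{
	prio_array_t *target = rq->active;

	if (unlikely(p->policy == SCHED_BATCH)) {
		target = rq->expired;
		/* give the starvation timeout something to time against */
		if (!rq->expired_timestamp)
			rq->expired_timestamp = jiffies;
	}
	enqueue_task(p, target);
	rq->nr_running++;
}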
* interactive task starvation 2006-03-17 9:06 ` Ingo Molnar @ 2006-03-17 10:46 ` Mike Galbraith 2006-03-17 17:15 ` Mike Galbraith 0 siblings, 1 reply; 112+ messages in thread From: Mike Galbraith @ 2006-03-17 10:46 UTC (permalink / raw) To: Ingo Molnar; +Cc: lkml On Fri, 2006-03-17 at 10:06 +0100, Ingo Molnar wrote: > yep, i think that's a good idea. In the worst case the starvation > timeout should kick in. (I didn't want to hijack that thread ergo name change) Speaking of the starvation timeout... I'm beginning to wonder if it might not be a good idea to always have an expired_timestamp to ensure that there is a limit to how long interactive tasks can starve _each other_. Yesterday, I ran some tests with apache, and ended up waiting for over 3 minutes for a netstat| grep :81|wc -l to finish when competing with 10 copies of httpd. The problem with the expired_timestamp is that if nobody is already expired, and no non-interactive task exists, then there is no expired_timestamp, and hence no starvation limit. There are other ways to cure 'interactive starvation', but forcing an array switch if a non-interactive task hasn't run for pick-a-number time is the easiest. -Mike (yup, folks would certainly feel it, and would _very_ likely gripe, so it would probably have to be configurable) ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-17 10:46 ` interactive task starvation Mike Galbraith @ 2006-03-17 17:15 ` Mike Galbraith 2006-03-20 7:09 ` Mike Galbraith 0 siblings, 1 reply; 112+ messages in thread From: Mike Galbraith @ 2006-03-17 17:15 UTC (permalink / raw) To: Ingo Molnar; +Cc: lkml On Fri, 2006-03-17 at 11:46 +0100, Mike Galbraith wrote: > On Fri, 2006-03-17 at 10:06 +0100, Ingo Molnar wrote: > > > yep, i think that's a good idea. In the worst case the starvation > > timeout should kick in. > > (I didn't want to hijack that thread ergo name change) > > Speaking of the starvation timeout... > <snip day late $ short idea> Problem solved. I now know why the starvation logic doesn't work. Wakeups. In the face of 10+ copies of httpd constantly waking up, it seems it just takes ages to get around to switching arrays. With the (urp) patch below, I now get... [root]:# time netstat|grep :81|wc -l 1648 real 0m27.735s user 0m0.158s sys 0m0.111s [root]:# time netstat|grep :81|wc -l 1817 real 0m13.550s user 0m0.121s sys 0m0.186s [root]:# time netstat|grep :81|wc -l 1641 real 0m17.022s user 0m0.132s sys 0m0.143s [root]:# which certainly isn't pleasant, but it beats the heck out of minutes. -Mike --- kernel/sched.c.org 2006-03-17 14:48:35.000000000 +0100 +++ kernel/sched.c 2006-03-17 17:41:25.000000000 +0100 @@ -662,11 +662,30 @@ } /* + * We place interactive tasks back into the active array, if possible. + * + * To guarantee that this does not starve expired tasks we ignore the + * interactivity of a task if the first expired task had to wait more + * than a 'reasonable' amount of time. This deadline timeout is + * load-dependent, as the frequency of array switched decreases with + * increasing number of running tasks. We also ignore the interactivity + * if a better static_prio task has expired: + */ +#define EXPIRED_STARVING(rq) \ + ((STARVATION_LIMIT && ((rq)->expired_timestamp && \ + (jiffies - (rq)->expired_timestamp >= \ + STARVATION_LIMIT * ((rq)->nr_running) + 1))) || \ + ((rq)->curr->static_prio > (rq)->best_expired_prio)) + +/* * __activate_task - move a task to the runqueue. */ static inline void __activate_task(task_t *p, runqueue_t *rq) { - enqueue_task(p, rq->active); + prio_array_t *array = rq->active; + if (unlikely(EXPIRED_STARVING(rq))) + array = rq->expired; + enqueue_task(p, array); rq->nr_running++; } @@ -2461,22 +2480,6 @@ } /* - * We place interactive tasks back into the active array, if possible. - * - * To guarantee that this does not starve expired tasks we ignore the - * interactivity of a task if the first expired task had to wait more - * than a 'reasonable' amount of time. This deadline timeout is - * load-dependent, as the frequency of array switched decreases with - * increasing number of running tasks. We also ignore the interactivity - * if a better static_prio task has expired: - */ -#define EXPIRED_STARVING(rq) \ - ((STARVATION_LIMIT && ((rq)->expired_timestamp && \ - (jiffies - (rq)->expired_timestamp >= \ - STARVATION_LIMIT * ((rq)->nr_running) + 1))) || \ - ((rq)->curr->static_prio > (rq)->best_expired_prio)) - -/* * Account user cpu time to a process. * @p: the process that the cpu time gets accounted to * @hardirq_offset: the offset to subtract from hardirq_count() ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-17 17:15 ` Mike Galbraith @ 2006-03-20 7:09 ` Mike Galbraith 2006-03-20 10:22 ` Ingo Molnar 2006-03-21 6:47 ` Willy Tarreau 0 siblings, 2 replies; 112+ messages in thread From: Mike Galbraith @ 2006-03-20 7:09 UTC (permalink / raw) To: lkml; +Cc: Ingo Molnar, Andrew Morton, Con Kolivas [-- Attachment #1: Type: text/plain, Size: 1613 bytes --] On Fri, 2006-03-17 at 18:15 +0100, Mike Galbraith wrote: > Problem solved. I now know why the starvation logic doesn't work. > Wakeups. In the face of 10+ copies of httpd constantly waking up, it > seems it just takes ages to get around to switching arrays. > > With the (urp) patch below, I now get... > > [root]:# time netstat|grep :81|wc -l > 1648 > > real 0m27.735s > user 0m0.158s > sys 0m0.111s > [root]:# time netstat|grep :81|wc -l > 1817 > > real 0m13.550s > user 0m0.121s > sys 0m0.186s > [root]:# time netstat|grep :81|wc -l > 1641 > > real 0m17.022s > user 0m0.132s > sys 0m0.143s > [root]:# For those interested in these kinds of things, here are the numbers for 2.6.16-rc6-mm2 with my [tarball] throttle patches applied... [root]:# time netstat|grep :81|wc -l 1681 real 0m1.525s user 0m0.141s sys 0m0.136s [root]:# time netstat|grep :81|wc -l 1491 real 0m0.356s user 0m0.130s sys 0m0.114s [root]:# time netstat|grep :81|wc -l 1527 real 0m0.343s user 0m0.129s sys 0m0.114s [root]:# time netstat|grep :81|wc -l 1568 real 0m0.512s user 0m0.112s sys 0m0.138s ...while running with the same apache loadavg of over 10, and tunables set to server mode (0,0). <plug> Even a desktop running with these settings is so interactive that I could play a game of Maelstrom (asteroids like thing) while doing a make -j30 on a slow nfs mount and barely feel it. On a local filesystem, I couldn't feel it at all, so I added a thud 3, irman2 and a bonnie -s 2047 for good measure. Try that with stock :) </plug> [-- Attachment #2: throttle-V22-2.6.16-rc6-mm2.tar.gz --] [-- Type: application/x-compressed-tar, Size: 7205 bytes --] ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-20 7:09 ` Mike Galbraith @ 2006-03-20 10:22 ` Ingo Molnar 2006-03-21 6:47 ` Willy Tarreau 1 sibling, 0 replies; 112+ messages in thread From: Ingo Molnar @ 2006-03-20 10:22 UTC (permalink / raw) To: Mike Galbraith; +Cc: lkml, Andrew Morton, Con Kolivas * Mike Galbraith <efault@gmx.de> wrote: > <plug> > Even a desktop running with these settings is so interactive that I > could play a game of Maelstrom (asteroids like thing) while doing a > make -j30 in slow nfs mount and barely feel it. In a local > filesystem, I could't feel it at all, so I added a thud 3, irman2 and > a bonnie -s 2047 for good measure. Try that with stock :) > </plug> great! Please make sure all the patches make their way into -mm. We definitely want to try this for v2.6.17. Increasing starvation resistance _and_ interactivity via the same patchset is a rare feat ;-) Acked-by: Ingo Molnar <mingo@elte.hu> Ingo ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-20 7:09 ` Mike Galbraith 2006-03-20 10:22 ` Ingo Molnar @ 2006-03-21 6:47 ` Willy Tarreau 2006-03-21 7:51 ` Mike Galbraith 1 sibling, 1 reply; 112+ messages in thread From: Willy Tarreau @ 2006-03-21 6:47 UTC (permalink / raw) To: Mike Galbraith; +Cc: lkml, Ingo Molnar, Andrew Morton, Con Kolivas Hi Mike, On Mon, Mar 20, 2006 at 08:09:13AM +0100, Mike Galbraith wrote: (...) > For those interested in these kind of things, here are the numbers for > 2.6.16-rc6-mm2 with my [tarball] throttle patches applied... > > [root]:# time netstat|grep :81|wc -l > 1681 > > real 0m1.525s > user 0m0.141s > sys 0m0.136s > [root]:# time netstat|grep :81|wc -l > 1491 > > real 0m0.356s > user 0m0.130s > sys 0m0.114s > [root]:# time netstat|grep :81|wc -l > 1527 > > real 0m0.343s > user 0m0.129s > sys 0m0.114s > [root]:# time netstat|grep :81|wc -l > 1568 > > real 0m0.512s > user 0m0.112s > sys 0m0.138s > > ...while running with the same apache loadavg of over 10, and tunables > set to server mode (0,0). > > <plug> > Even a desktop running with these settings is so interactive that I > could play a game of Maelstrom (asteroids like thing) while doing a make > -j30 in slow nfs mount and barely feel it. In a local filesystem, I > could't feel it at all, so I added a thud 3, irman2 and a bonnie -s 2047 > for good measure. Try that with stock :) > </plug> Very good job ! I told Grant in a private email that I felt confident the problem would quickly be solved now that someone familiar with the scheduler could reliably reproduce it. Your numbers look excellent, I'm willing to test. Could you remind us what kernel and what patches we need to apply to try the same, please ? Cheers, Willy ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 6:47 ` Willy Tarreau @ 2006-03-21 7:51 ` Mike Galbraith 2006-03-21 9:13 ` Willy Tarreau 0 siblings, 1 reply; 112+ messages in thread From: Mike Galbraith @ 2006-03-21 7:51 UTC (permalink / raw) To: Willy Tarreau; +Cc: lkml, Ingo Molnar, Andrew Morton, Con Kolivas [-- Attachment #1: Type: text/plain, Size: 1864 bytes --] On Tue, 2006-03-21 at 07:47 +0100, Willy Tarreau wrote: > Hi Mike, Greetings! > On Mon, Mar 20, 2006 at 08:09:13AM +0100, Mike Galbraith wrote: > > real 0m0.512s > > user 0m0.112s > > sys 0m0.138s > > > > ...while running with the same apache loadavg of over 10, and tunables > > set to server mode (0,0). ... > Very good job ! > I told Grant in a private email that I felt confident the problem would > quickly be solved now that someone familiar with the scheduler could > reliably reproduce it. Your numbers look excellent, I'm willing to test. > Could you remind us what kernel and what patches we need to apply to > try the same, please ? You bet. I'm most happy to have someone try it other than me :) Apply the patches from the attached tarball in the obvious order to 2.6.16-rc6-mm2. As delivered, its knobs are set up for a desktop box. For a server, you'll probably want maximum starvation resistance, so echo 0 > /proc/sys/kernel/grace_g1 and grace_g2. This will set the time a task can exceed expected cpu (based upon sleep_avg) to zero seconds, ie immediate throttling upon detection. It will also disable some interactivity specific code in the scheduler. If you want to fiddle with the knobs, grace_g1 is the number of CPU seconds a new task is authorized to run completely free of any intervention... startup in a desktop environment. grace_g2 is the amount of CPU seconds a well behaved task can store for later usage. With the throttling patch, an interactive task must earn the right to exceed expected cpu by performing within expectations. The longer the task behaves, the more 'good karma' it earns. This allows interactive tasks to do a burst of activity, but the user determines how long that burst==starvation is authorized. Tasks which just use as much cpu as they can get run headlong into the throttle. -Mike [-- Attachment #2: throttle-V23-2.6.16-rc6-mm2.tar.gz --] [-- Type: application/x-compressed-tar, Size: 7259 bytes --] ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 7:51 ` Mike Galbraith @ 2006-03-21 9:13 ` Willy Tarreau 2006-03-21 9:14 ` Ingo Molnar 0 siblings, 1 reply; 112+ messages in thread From: Willy Tarreau @ 2006-03-21 9:13 UTC (permalink / raw) To: Mike Galbraith; +Cc: lkml, Ingo Molnar, Andrew Morton, Con Kolivas On Tue, Mar 21, 2006 at 08:51:38AM +0100, Mike Galbraith wrote: > On Tue, 2006-03-21 at 07:47 +0100, Willy Tarreau wrote: > > Hi Mike, > > Greetings! Thanks for the details, I'll try to find some time to test your code quickly. If this fixes this long standing problem, we should definitely try to get it into 2.6.17 ! Cheers, Willy ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 9:13 ` Willy Tarreau @ 2006-03-21 9:14 ` Ingo Molnar 2006-03-21 11:15 ` Willy Tarreau 0 siblings, 1 reply; 112+ messages in thread From: Ingo Molnar @ 2006-03-21 9:14 UTC (permalink / raw) To: Willy Tarreau; +Cc: Mike Galbraith, lkml, Andrew Morton, Con Kolivas * Willy Tarreau <willy@w.ods.org> wrote: > > On Tue, Mar 21, 2006 at 08:51:38AM +0100, Mike Galbraith wrote: > > On Tue, 2006-03-21 at 07:47 +0100, Willy Tarreau wrote: > > > Hi Mike, > > > > Greetings! > > Thanks for the details, > I'll try to find some time to test your code quickly. If this fixes this > long standing problem, we should definitely try to get it into 2.6.17 ! the time window is quickly closing for that to happen though. Ingo ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 9:14 ` Ingo Molnar @ 2006-03-21 11:15 ` Willy Tarreau 2006-03-21 11:18 ` Ingo Molnar 0 siblings, 1 reply; 112+ messages in thread From: Willy Tarreau @ 2006-03-21 11:15 UTC (permalink / raw) To: Ingo Molnar; +Cc: Mike Galbraith, lkml, Andrew Morton, Con Kolivas, bugsplatter On Tue, Mar 21, 2006 at 10:14:22AM +0100, Ingo Molnar wrote: > > * Willy Tarreau <willy@w.ods.org> wrote: > > > > > On Tue, Mar 21, 2006 at 08:51:38AM +0100, Mike Galbraith wrote: > > > On Tue, 2006-03-21 at 07:47 +0100, Willy Tarreau wrote: > > > > Hi Mike, > > > > > > Greetings! > > > > Thanks for the details, > > I'll try to find some time to test your code quickly. If this fixes this > > long standing problem, we should definitely try to get it into 2.6.17 ! > > the time window is quickly closing for that to happen though. Ingo, Mike, it's a great day :-) Right now, I'm typing this mail from my notebook which has 8 instances of my exploit running in background. Previously, 4 of them were enough on this machine to create pauses of up to 31 seconds. Right now, I can type normally, and I simply can say that my exploit has no effect anymore ! It's just consuming CPU and nothing else. I also tried to write 0 to grace_g[12] and I find it even more responsive with 0 in those values. I've not had time to do more extensive tests, but I can assure you that the problem is clearly solved for me. I'd like Grant to test ssh on his firewall with it too. Congratulations ! Willy ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 11:15 ` Willy Tarreau @ 2006-03-21 11:18 ` Ingo Molnar 2006-03-21 11:53 ` Con Kolivas 2006-03-21 12:07 ` Mike Galbraith 0 siblings, 2 replies; 112+ messages in thread From: Ingo Molnar @ 2006-03-21 11:18 UTC (permalink / raw) To: Willy Tarreau Cc: Mike Galbraith, lkml, Andrew Morton, Con Kolivas, bugsplatter * Willy Tarreau <willy@w.ods.org> wrote: > On Tue, Mar 21, 2006 at 10:14:22AM +0100, Ingo Molnar wrote: > > > > * Willy Tarreau <willy@w.ods.org> wrote: > > > > > > > > On Tue, Mar 21, 2006 at 08:51:38AM +0100, Mike Galbraith wrote: > > > > On Tue, 2006-03-21 at 07:47 +0100, Willy Tarreau wrote: > > > > > Hi Mike, > > > > > > > > Greetings! > > > > > > Thanks for the details, > > > I'll try to find some time to test your code quickly. If this fixes this > > > long standing problem, we should definitely try to get it into 2.6.17 ! > > > > the time window is quickly closing for that to happen though. > > Ingo, Mike, > > it's a great day :-) > > Right now, I'm typing this mail from my notebook which has 8 instances > of my exploit running in background. Previously, 4 of them were enough > on this machine to create pauses of up to 31 seconds. Right now, I can > type normally, and I simply can say that my exploit has no effect > anymore ! It's just consuming CPU and nothing else. I also tried to > write 0 to grace_g[12] and I find it even more responsive with 0 in > those values. I've not had time to do more extensive tests, but I can > assure you that the problem is clearly solved for me. I'd like Grant > to test ssh on his firewall with it too. great work by Mike! One detail: i'd like there to be just one default throttling value, i.e. no grace_g tunables [so that we have just one default scheduler behavior]. Is the default grace_g[12] setting good enough for your workload? Ingo ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 11:18 ` Ingo Molnar @ 2006-03-21 11:53 ` Con Kolivas 2006-03-21 13:10 ` Mike Galbraith 2006-03-21 12:07 ` Mike Galbraith 1 sibling, 1 reply; 112+ messages in thread From: Con Kolivas @ 2006-03-21 11:53 UTC (permalink / raw) To: Ingo Molnar Cc: Willy Tarreau, Mike Galbraith, lkml, Andrew Morton, bugsplatter On Tuesday 21 March 2006 22:18, Ingo Molnar wrote: > great work by Mike! One detail: i'd like there to be just one default > throttling value, i.e. no grace_g tunables [so that we have just one > default scheduler behavior]. Is the default grace_g[12] setting good > enough for your workload? I agree. If anything is required, a simple on/off tunable makes much more sense. Much like I suggested ages ago with an "interactive" switch which was rather unpopular when I first suggested it. Perhaps my marketing was wrong. Oh well. Cheers, Con ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 11:53 ` Con Kolivas @ 2006-03-21 13:10 ` Mike Galbraith 2006-03-21 13:13 ` Con Kolivas 0 siblings, 1 reply; 112+ messages in thread From: Mike Galbraith @ 2006-03-21 13:10 UTC (permalink / raw) To: Con Kolivas; +Cc: Ingo Molnar, Willy Tarreau, lkml, Andrew Morton, bugsplatter On Tue, 2006-03-21 at 22:53 +1100, Con Kolivas wrote: > On Tuesday 21 March 2006 22:18, Ingo Molnar wrote: > > great work by Mike! One detail: i'd like there to be just one default > > throttling value, i.e. no grace_g tunables [so that we have just one > > default scheduler behavior]. Is the default grace_g[12] setting good > > enough for your workload? > > I agree. If anything is required, a simple on/off tunable makes much more > sense. Much like I suggested ages ago with an "interactive" switch which was > rather unpopular when I first suggested it. Let me try to explain why on/off is not sufficient. You notice how Willy said that his notebook is more responsive with tunables set to 0,0? That's important, because it's absolutely true... depending what you're doing. Setting tunables to 0,0 cuts off the idle sleep logic, and the sleep_avg divisor - both of which were put there specifically for interactivity - and returns the scheduler to more or less original O(1) scheduler. You and I both know that these are most definitely needed in a Desktop environment. For instance, if Willy starts editing code in X, and scrolls while something is running in the background, he'll suddenly say hey, maybe this _ain't_ more responsive, because all of a sudden the starvation added with the interactivity logic will be sorely missed as my throttle wrings X's neck. How long should Willy be able to scroll without feeling the background, and how long should Apache be able to starve his shell. They are one and the same, and I can't say, because I'm not Willy. I don't know how to get there from here without tunables. Picking defaults is one thing, but I don't know how to make it one-size-fits-all. For the general case, the values delivered will work fine. For the apache case, they absolutely 100% guaranteed will not. -Mike ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 13:10 ` Mike Galbraith @ 2006-03-21 13:13 ` Con Kolivas 2006-03-21 13:33 ` Mike Galbraith 2006-03-21 13:38 ` Willy Tarreau 0 siblings, 2 replies; 112+ messages in thread From: Con Kolivas @ 2006-03-21 13:13 UTC (permalink / raw) To: Mike Galbraith Cc: Ingo Molnar, Willy Tarreau, lkml, Andrew Morton, bugsplatter On Wednesday 22 March 2006 00:10, Mike Galbraith wrote: > On Tue, 2006-03-21 at 22:53 +1100, Con Kolivas wrote: > > On Tuesday 21 March 2006 22:18, Ingo Molnar wrote: > > > great work by Mike! One detail: i'd like there to be just one default > > > throttling value, i.e. no grace_g tunables [so that we have just one > > > default scheduler behavior]. Is the default grace_g[12] setting good > > > enough for your workload? > > > > I agree. If anything is required, a simple on/off tunable makes much more > > sense. Much like I suggested ages ago with an "interactive" switch which > > was rather unpopular when I first suggested it. > > Let me try to explain why on/off is not sufficient. > > You notice how Willy said that his notebook is more responsive with > tunables set to 0,0? That's important, because it's absolutely true... > depending what you're doing. Setting tunables to 0,0 cuts off the idle > sleep logic, and the sleep_avg divisor - both of which were put there > specifically for interactivity - and returns the scheduler to more or > less original O(1) scheduler. You and I both know that these are most > definitely needed in a Desktop environment. For instance, if Willy > starts editing code in X, and scrolls while something is running in the > background, he'll suddenly say hey, maybe this _ain't_ more responsive, > because all of a sudden the starvation added with the interactivity > logic will be sorely missed as my throttle wrings X's neck. > > How long should Willy be able to scroll without feeling the background, > and how long should Apache be able to starve his shell. They are one > and the same, and I can't say, because I'm not Willy. I don't know how > to get there from here without tunables. Picking defaults is one thing, > but I don't know how to make it one-size-fits-all. For the general > case, the values delivered will work fine. For the apache case, they > absolutely 100% guaranteed will not. So how do you propose we tune such a beast then? Apache users will use off, everyone else will have no idea but to use the defaults. Cheers, Con ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 13:13 ` Con Kolivas @ 2006-03-21 13:33 ` Mike Galbraith 2006-03-21 13:37 ` Con Kolivas 2006-03-21 13:38 ` Willy Tarreau 1 sibling, 1 reply; 112+ messages in thread From: Mike Galbraith @ 2006-03-21 13:33 UTC (permalink / raw) To: Con Kolivas; +Cc: Ingo Molnar, Willy Tarreau, lkml, Andrew Morton, bugsplatter On Wed, 2006-03-22 at 00:13 +1100, Con Kolivas wrote: > On Wednesday 22 March 2006 00:10, Mike Galbraith wrote: > > How long should Willy be able to scroll without feeling the background, > > and how long should Apache be able to starve his shell. They are one > > and the same, and I can't say, because I'm not Willy. I don't know how > > to get there from here without tunables. Picking defaults is one thing, > > but I don't know how to make it one-size-fits-all. For the general > > case, the values delivered will work fine. For the apache case, they > > absolutely 100% guaranteed will not. > > So how do you propose we tune such a beast then? Apache users will use off, > everyone else will have no idea but to use the defaults. Set for desktop, which is intended to mostly emulate what we have right now, which most people are quite happy with. The throttle will still nail most of the corner cases, and the other adjustments nail the majority of what's left. That leaves the hefty server type loads as what certainly will require tuning. They always need tuning. -Mike ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 13:33 ` Mike Galbraith @ 2006-03-21 13:37 ` Con Kolivas 2006-03-21 13:44 ` Willy Tarreau 0 siblings, 1 reply; 112+ messages in thread From: Con Kolivas @ 2006-03-21 13:37 UTC (permalink / raw) To: Mike Galbraith Cc: Ingo Molnar, Willy Tarreau, lkml, Andrew Morton, bugsplatter On Wednesday 22 March 2006 00:33, Mike Galbraith wrote: > On Wed, 2006-03-22 at 00:13 +1100, Con Kolivas wrote: > > On Wednesday 22 March 2006 00:10, Mike Galbraith wrote: > > > How long should Willy be able to scroll without feeling the background, > > > and how long should Apache be able to starve his shell. They are one > > > and the same, and I can't say, because I'm not Willy. I don't know how > > > to get there from here without tunables. Picking defaults is one > > > thing, but I don't know how to make it one-size-fits-all. For the > > > general case, the values delivered will work fine. For the apache > > > case, they absolutely 100% guaranteed will not. > > > > So how do you propose we tune such a beast then? Apache users will use > > off, everyone else will have no idea but to use the defaults. > > Set for desktop, which is intended to mostly emulate what we have right > now, which most people are quite happy with. The throttle will still > nail most of the corner cases, and the other adjustments nail the > majority of what's left. That leaves the hefty server type loads as > what certainly will require tuning. They always need tuning. That still sounds like just on/off to me. Default for desktop and 0,0 for server. Am I missing something? Cheers, Con ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 13:37 ` Con Kolivas @ 2006-03-21 13:44 ` Willy Tarreau 2006-03-21 13:45 ` Con Kolivas 0 siblings, 1 reply; 112+ messages in thread From: Willy Tarreau @ 2006-03-21 13:44 UTC (permalink / raw) To: Con Kolivas; +Cc: Mike Galbraith, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wed, Mar 22, 2006 at 12:37:51AM +1100, Con Kolivas wrote: > On Wednesday 22 March 2006 00:33, Mike Galbraith wrote: > > On Wed, 2006-03-22 at 00:13 +1100, Con Kolivas wrote: > > > On Wednesday 22 March 2006 00:10, Mike Galbraith wrote: > > > > How long should Willy be able to scroll without feeling the background, > > > > and how long should Apache be able to starve his shell. They are one > > > > and the same, and I can't say, because I'm not Willy. I don't know how > > > > to get there from here without tunables. Picking defaults is one > > > > thing, but I don't know how to make it one-size-fits-all. For the > > > > general case, the values delivered will work fine. For the apache > > > > case, they absolutely 100% guaranteed will not. > > > > > > So how do you propose we tune such a beast then? Apache users will use > > > off, everyone else will have no idea but to use the defaults. > > > > Set for desktop, which is intended to mostly emulate what we have right > > now, which most people are quite happy with. The throttle will still > > nail most of the corner cases, and the other adjustments nail the > > majority of what's left. That leaves the hefty server type loads as > > what certainly will require tuning. They always need tuning. > > That still sounds like just on/off to me. Default for desktop and 0,0 for > server. Am I missing something? Believe it or not, there *are* people running their servers with full graphical environments. At the place we first encountered the interactivity problem with my load-balancer, they first installed it on a full FC2 with the OpenGL screen saver... No need to say they had scaling difficulties and trouble logging in ! Although that's a stupid thing to do, what I want to show is that even on servers, you can't easily predict the workload. Maybe a server which often forks processes for dedicated tasks (eg: monitoring) would prefer running between "desktop" and "server" mode. > Cheers, > Con Cheers, Willy ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 13:44 ` Willy Tarreau @ 2006-03-21 13:45 ` Con Kolivas 2006-03-21 14:01 ` Mike Galbraith 0 siblings, 1 reply; 112+ messages in thread From: Con Kolivas @ 2006-03-21 13:45 UTC (permalink / raw) To: Willy Tarreau Cc: Mike Galbraith, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wednesday 22 March 2006 00:44, Willy Tarreau wrote: > On Wed, Mar 22, 2006 at 12:37:51AM +1100, Con Kolivas wrote: > > On Wednesday 22 March 2006 00:33, Mike Galbraith wrote: > > > On Wed, 2006-03-22 at 00:13 +1100, Con Kolivas wrote: > > > > On Wednesday 22 March 2006 00:10, Mike Galbraith wrote: > > > > > How long should Willy be able to scroll without feeling the > > > > > background, and how long should Apache be able to starve his shell. > > > > > They are one and the same, and I can't say, because I'm not Willy. > > > > > I don't know how to get there from here without tunables. Picking > > > > > defaults is one thing, but I don't know how to make it > > > > > one-size-fits-all. For the general case, the values delivered will > > > > > work fine. For the apache case, they absolutely 100% guaranteed > > > > > will not. > > > > > > > > So how do you propose we tune such a beast then? Apache users will > > > > use off, everyone else will have no idea but to use the defaults. > > > > > > Set for desktop, which is intended to mostly emulate what we have right > > > now, which most people are quite happy with. The throttle will still > > > nail most of the corner cases, and the other adjustments nail the > > > majority of what's left. That leaves the hefty server type loads as > > > what certainly will require tuning. They always need tuning. > > > > That still sounds like just on/off to me. Default for desktop and 0,0 for > > server. Am I missing something? > > Believe it or not, there *are* people running their servers with full > graphical environments. At the place we first encountered the interactivity > problem with my load-balancer, they first installed in on a full FC2 with > the OpenGL screen saver... No need to say they had scaling difficulties and > trouble to log in ! > > Although that's a stupid thing to do, what I want to show is that even on > servers, you can't easily predict the workload. Maybe a server which often > forks processes for dedicated tasks (eg: monitoring) would prefer running > between "desktop" and "server" mode. I give up. Add as many tunables as you like in as many places as possible that even less people will understand. You've already told me you'll be running 0,0. Cheers, Con ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 13:45 ` Con Kolivas @ 2006-03-21 14:01 ` Mike Galbraith 2006-03-21 14:17 ` Con Kolivas 0 siblings, 1 reply; 112+ messages in thread From: Mike Galbraith @ 2006-03-21 14:01 UTC (permalink / raw) To: Con Kolivas; +Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wed, 2006-03-22 at 00:45 +1100, Con Kolivas wrote: > I give up. Add as many tunables as you like in as many places as possible that > even less people will understand. You've already told me you'll be running > 0,0. Instead of giving up, how about looking at the code and making a suggestion for improvement? It's not an easy problem, as you're well aware. I really don't see why you're (seemingly) getting irate. Tunables for this are no different than tunables like CHILD_PENALTY etc etc etc. How many casual users know those exist, much less understand them? -Mike ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 14:01 ` Mike Galbraith @ 2006-03-21 14:17 ` Con Kolivas 2006-03-21 15:20 ` Con Kolivas 0 siblings, 1 reply; 112+ messages in thread From: Con Kolivas @ 2006-03-21 14:17 UTC (permalink / raw) To: Mike Galbraith Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wednesday 22 March 2006 01:01, Mike Galbraith wrote: > On Wed, 2006-03-22 at 00:45 +1100, Con Kolivas wrote: > > I give up. Add as many tunables as you like in as many places as possible > > that even less people will understand. You've already told me you'll be > > running 0,0. > > Instead of giving up, how about look at the code and make a suggestion > for improvement? It's not an easy problem, as you're well aware. > > I really don't see why you're (seemingly) getting irate. Tunables for > this are no different that tunables like CHILD_PENALTY etc etc etc. How > many casual users know those exist, much less understand them? Because I strongly believe that tunables for this sort of thing are wrong. CHILD_PENALTY and friends have never been exported apart from out-of-tree patches. These were meant to be tuned in the kernel and never exported. Ingo didn't want *any* tunables so I'm relatively flexible with an on/off switch which he doesn't like. I really do believe most users will only have it on or off though. Don't think I'm ignoring your code. You inspired me to do the original patches 3 years ago. I have looked at your patch at length and basically what it does is variably convert the interactive estimator from full to zero over some timeframe choosable with your tunables. Since most users will use either full or zero I actually believe the same effect can be had by a tiny modification to enable/disable the estimator anyway. This is not to deny you've done a lot of work and confirmed that the estimator running indefinitely unthrottled is bad. What timeframe is correct to throttle is impossible to say though :-( Most desktop users would be quite happy with indefinite because they basically do not hit workloads that "exploit" it. Most server/hybrid setups are willing to sacrifice some interactivity for fairness, and the basic active->expired design gives them enough interactivity without virtually any boost anyway. Ironically, audio is fabulous on such a design since it virtually never consumes a full timeslice. So any value you place on the timeframe as the default ends up being a compromise, and this is what Ingo is suggesting. This is similar to when sleep_avg changed from 10 seconds to 30 seconds to 2 seconds at various times. Luckily the non linear decay of sleep_avg circumvents that being relevant... but it also leads to the exact issue you're trying to fix. Once again we're left with choosing some number, and as much as I'd like to help since I really care about the desktop, I don't think any compromise is correct. Just on or off. Cheers, Con ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 14:17 ` Con Kolivas @ 2006-03-21 15:20 ` Con Kolivas 2006-03-21 17:50 ` Willy Tarreau 2006-03-21 17:51 ` Mike Galbraith 0 siblings, 2 replies; 112+ messages in thread From: Con Kolivas @ 2006-03-21 15:20 UTC (permalink / raw) To: Mike Galbraith Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wednesday 22 March 2006 01:17, Con Kolivas wrote: > I actually believe the same effect can be had by a tiny > modification to enable/disable the estimator anyway. Just for argument's sake it would look something like this. Cheers, Con --- Add sysctl to enable/disable cpu scheduler interactivity estimator Signed-off-by: Con Kolivas <kernel@kolivas.org> --- include/linux/sched.h | 1 + include/linux/sysctl.h | 1 + kernel/sched.c | 14 +++++++++++--- kernel/sysctl.c | 8 ++++++++ 4 files changed, 21 insertions(+), 3 deletions(-) Index: linux-2.6.16-rc6-mm2/include/linux/sched.h =================================================================== --- linux-2.6.16-rc6-mm2.orig/include/linux/sched.h 2006-03-19 11:15:27.000000000 +1100 +++ linux-2.6.16-rc6-mm2/include/linux/sched.h 2006-03-22 02:13:55.000000000 +1100 @@ -104,6 +104,7 @@ extern unsigned long nr_uninterruptible( extern unsigned long nr_active(void); extern unsigned long nr_iowait(void); extern unsigned long weighted_cpuload(const int cpu); +extern int sched_interactive; #include <linux/time.h> #include <linux/param.h> Index: linux-2.6.16-rc6-mm2/include/linux/sysctl.h =================================================================== --- linux-2.6.16-rc6-mm2.orig/include/linux/sysctl.h 2006-03-19 11:15:27.000000000 +1100 +++ linux-2.6.16-rc6-mm2/include/linux/sysctl.h 2006-03-22 02:14:43.000000000 +1100 @@ -148,6 +148,7 @@ enum KERN_SPIN_RETRY=70, /* int: number of spinlock retries */ KERN_ACPI_VIDEO_FLAGS=71, /* int: flags for setting up video after ACPI sleep */ KERN_IA64_UNALIGNED=72, /* int: ia64 unaligned userland trap enable */ + KERN_INTERACTIVE=73, /* int: enable/disable interactivity estimator */ }; Index: linux-2.6.16-rc6-mm2/kernel/sched.c =================================================================== --- linux-2.6.16-rc6-mm2.orig/kernel/sched.c 2006-03-19 15:41:08.000000000 +1100 +++ linux-2.6.16-rc6-mm2/kernel/sched.c 2006-03-22 02:13:56.000000000 +1100 @@ -128,6 +128,9 @@ * too hard. */ +/* Sysctl enable/disable interactive estimator */ +int sched_interactive __read_mostly = 1; + #define CURRENT_BONUS(p) \ (NS_TO_JIFFIES((p)->sleep_avg) * MAX_BONUS / \ MAX_SLEEP_AVG) @@ -151,7 +154,8 @@ INTERACTIVE_DELTA) #define TASK_INTERACTIVE(p) \ - ((p)->prio <= (p)->static_prio - DELTA(p)) + ((p)->prio <= (p)->static_prio - DELTA(p) && \ + sched_interactive) #define INTERACTIVE_SLEEP(p) \ (JIFFIES_TO_NS(MAX_SLEEP_AVG * \ @@ -662,9 +666,13 @@ static int effective_prio(task_t *p) if (rt_task(p)) return p->prio; - bonus = CURRENT_BONUS(p) - MAX_BONUS / 2; + prio = p->static_prio; + + if (sched_interactive) { + bonus = CURRENT_BONUS(p) - MAX_BONUS / 2; - prio = p->static_prio - bonus; + prio -= bonus; + } if (prio < MAX_RT_PRIO) prio = MAX_RT_PRIO; if (prio > MAX_PRIO-1) Index: linux-2.6.16-rc6-mm2/kernel/sysctl.c =================================================================== --- linux-2.6.16-rc6-mm2.orig/kernel/sysctl.c 2006-03-19 11:15:27.000000000 +1100 +++ linux-2.6.16-rc6-mm2/kernel/sysctl.c 2006-03-22 02:15:23.000000000 +1100 @@ -684,6 +684,14 @@ static ctl_table kern_table[] = { .proc_handler = &proc_dointvec, }, #endif + { + .ctl_name = KERN_INTERACTIVE, + .procname = "interactive", + .data = &sched_interactive, + .maxlen = sizeof (int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, { .ctl_name = 0 } }; ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 15:20 ` Con Kolivas @ 2006-03-21 17:50 ` Willy Tarreau 2006-03-22 4:18 ` Mike Galbraith 2006-03-21 17:51 ` Mike Galbraith 1 sibling, 1 reply; 112+ messages in thread From: Willy Tarreau @ 2006-03-21 17:50 UTC (permalink / raw) To: Con Kolivas; +Cc: Mike Galbraith, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wed, Mar 22, 2006 at 02:20:10AM +1100, Con Kolivas wrote: > On Wednesday 22 March 2006 01:17, Con Kolivas wrote: > > I actually believe the same effect can be had by a tiny > > modification to enable/disable the estimator anyway. > > Just for argument's sake it would look something like this. > > Cheers, > Con > --- > Add sysctl to enable/disable cpu scheduer interactivity estimator At least, in May 2005, the equivalent of this patch I tested on 2.6.11.7 considerably improved responsiveness, but there was still this very annoying slowdown when the load increased. vmstat delays increased by one second every 10 processes. I retried again around 2.6.14 a few months ago, and it was the same. Perhaps Mike's code and other changes in 2.6-mm really fix the initial problem (array switching ?) and then only the interactivity boost is causing the remaining trouble ? Cheers, Willy ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 17:50 ` Willy Tarreau @ 2006-03-22 4:18 ` Mike Galbraith 0 siblings, 0 replies; 112+ messages in thread From: Mike Galbraith @ 2006-03-22 4:18 UTC (permalink / raw) To: Willy Tarreau; +Cc: Con Kolivas, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Tue, 2006-03-21 at 18:50 +0100, Willy Tarreau wrote: > On Wed, Mar 22, 2006 at 02:20:10AM +1100, Con Kolivas wrote: > > On Wednesday 22 March 2006 01:17, Con Kolivas wrote: > > > I actually believe the same effect can be had by a tiny > > > modification to enable/disable the estimator anyway. > > > > Just for argument's sake it would look something like this. > > > > Cheers, > > Con > > --- > > Add sysctl to enable/disable cpu scheduler interactivity estimator > > At least, in May 2005, the equivalent of this patch I tested on > > 2.6.11.7 considerably improved responsiveness, but there was still > > this very annoying slowdown when the load increased. vmstat delays > > increased by one second every 10 processes. I retried again around > > 2.6.14 a few months ago, and it was the same. Perhaps Mike's code > > and other changes in 2.6-mm really fix the initial problem (array > > switching ?) and then only the interactivity boost is causing the > > remaining trouble ? The slowdown you see is because a timeslice is 100ms, and that patch turned the scheduler into a non-preempting pure round-robin slug. Array switching is only one aspect, and one I hadn't thought of as I was tinkering with my patches; I discovered that aspect by accident. My code does a few things, and all of them are part of the picture. One of them is to deal with excessive interactive boost. Another is to tighten timeslice enforcement, and another is to close the fundamental hole in the concept of sleep_avg. That hole is causing the majority of the problems that crop up; the interactivity bits only make it worse. The hole is this. If priority is based solely upon % sleep time, even if there is no interactive boost, even if accumulation vs consumption is 1:1, if you sleep 51% of the time, you will inevitably rise to max priority, and be able to use 49% of the CPU at max priority forever. The current heuristics make that very close to but not quite 95%. The fact that we don't have _horrendous_ problems shows that the basic concept of sleep_avg is pretty darn good. Close the hole in any way you can think of (mine is one), and it's excellent. ^ permalink raw reply [flat|nested] 112+ messages in thread
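Mike's 51%/49% point is easy to see with a toy model of the 1:1 bookkeeping he describes: credit sleep time, debit run time, clamp at a cap. The little user-space program below is only that, with made-up numbers and none of the kernel's actual scaling, decay or interactivity bonuses, but it shows why any task sleeping more than half the time drifts to the cap and stays pinned there while still consuming just under half the cpu.

#include <stdio.h>

#define MAX_SLEEP_AVG 1000.0	/* arbitrary cap, not the kernel's value */

int main(void)
{
	double sleep_frac = 0.51;	/* task sleeps 51% of every cycle */
	double cycle = 10.0;		/* arbitrary cycle length */
	double sleep_avg = 0.0;
	int i;

	for (i = 0; i < 2000; i++) {
		sleep_avg += sleep_frac * cycle;	  /* credited while sleeping */
		sleep_avg -= (1.0 - sleep_frac) * cycle;  /* debited while running */
		if (sleep_avg > MAX_SLEEP_AVG)
			sleep_avg = MAX_SLEEP_AVG;
		if (sleep_avg < 0.0)
			sleep_avg = 0.0;
	}

	/* Net gain per cycle is (2 * sleep_frac - 1) * cycle, so anything
	 * over 50% sleep saturates: the task sits at maximum "bonus"
	 * indefinitely while still using 49% of the cpu. */
	printf("sleep_avg after %d cycles: %.1f (cap is %.1f)\n",
	       i, sleep_avg, MAX_SLEEP_AVG);
	return 0;
}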
* Re: interactive task starvation 2006-03-21 15:20 ` Con Kolivas 2006-03-21 17:50 ` Willy Tarreau @ 2006-03-21 17:51 ` Mike Galbraith 1 sibling, 0 replies; 112+ messages in thread From: Mike Galbraith @ 2006-03-21 17:51 UTC (permalink / raw) To: Con Kolivas; +Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wed, 2006-03-22 at 02:20 +1100, Con Kolivas wrote: > On Wednesday 22 March 2006 01:17, Con Kolivas wrote: > > I actually believe the same effect can be had by a tiny > > modification to enable/disable the estimator anyway. > > Just for argument's sake it would look something like this. That won't have the same effect. What you disabled isn't only about interactivity. It's also about preemption, throughput and fairness. -Mike (we now interrupt this thread for an evening of real life;) ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 13:13 ` Con Kolivas 2006-03-21 13:33 ` Mike Galbraith @ 2006-03-21 13:38 ` Willy Tarreau 2006-03-21 13:48 ` Mike Galbraith 1 sibling, 1 reply; 112+ messages in thread From: Willy Tarreau @ 2006-03-21 13:38 UTC (permalink / raw) To: Con Kolivas; +Cc: Mike Galbraith, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wed, Mar 22, 2006 at 12:13:15AM +1100, Con Kolivas wrote: > On Wednesday 22 March 2006 00:10, Mike Galbraith wrote: > > On Tue, 2006-03-21 at 22:53 +1100, Con Kolivas wrote: > > > On Tuesday 21 March 2006 22:18, Ingo Molnar wrote: > > > > great work by Mike! One detail: i'd like there to be just one default > > > > throttling value, i.e. no grace_g tunables [so that we have just one > > > > default scheduler behavior]. Is the default grace_g[12] setting good > > > > enough for your workload? > > > > > > I agree. If anything is required, a simple on/off tunable makes much more > > > sense. Much like I suggested ages ago with an "interactive" switch which > > > was rather unpopular when I first suggested it. > > > > Let me try to explain why on/off is not sufficient. > > > > You notice how Willy said that his notebook is more responsive with > > tunables set to 0,0? That's important, because it's absolutely true... > > depending what you're doing. Setting tunables to 0,0 cuts off the idle > > sleep logic, and the sleep_avg divisor - both of which were put there > > specifically for interactivity - and returns the scheduler to more or > > less original O(1) scheduler. You and I both know that these are most > > definitely needed in a Desktop environment. For instance, if Willy > > starts editing code in X, and scrolls while something is running in the > > background, he'll suddenly say hey, maybe this _ain't_ more responsive, > > because all of a sudden the starvation added with the interactivity > > logic will be sorely missed as my throttle wrings X's neck. > > > > How long should Willy be able to scroll without feeling the background, > > and how long should Apache be able to starve his shell. They are one > > and the same, and I can't say, because I'm not Willy. I don't know how > > to get there from here without tunables. Picking defaults is one thing, > > but I don't know how to make it one-size-fits-all. For the general > > case, the values delivered will work fine. For the apache case, they > > absolutely 100% guaranteed will not. > > So how do you propose we tune such a beast then? Apache users will use off, > everyone else will have no idea but to use the defaults. What you describe is exactly a case for a tunable. Different people with different workloads want different values. Seems fair enough. After all, we already have /proc/sys/vm/swappiness, and things like that for the same reason : the default value should suit most users, and the ones with knowledge and different needs can tune their system. Maybe grace_{g1,g2} should be renamed to be more explicit, may be we can automatically tune one from the other and let only one tunable. But if both have a useful effect, I don't see a reason for hiding them. > Cheers, > Con Cheers, Willy ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 13:38 ` Willy Tarreau @ 2006-03-21 13:48 ` Mike Galbraith 0 siblings, 0 replies; 112+ messages in thread From: Mike Galbraith @ 2006-03-21 13:48 UTC (permalink / raw) To: Willy Tarreau; +Cc: Con Kolivas, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Tue, 2006-03-21 at 14:38 +0100, Willy Tarreau wrote: > What you describe is exactly a case for a tunable. Different people with > different workloads want different values. Seems fair enough. After all, > we already have /proc/sys/vm/swappiness, and things like that for the same > reason : the default value should suit most users, and the ones with > knowledge and different needs can tune their system. Maybe grace_{g1,g2} > should be renamed to be more explicit, may be we can automatically tune > one from the other and let only one tunable. But if both have a useful > effect, I don't see a reason for hiding them. I'm wide open to suggestions. I tried to make it functional, flexible, and above all, dirt simple. Adding 'acceptable' would be cool :) -Mike ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 11:18 ` Ingo Molnar 2006-03-21 11:53 ` Con Kolivas @ 2006-03-21 12:07 ` Mike Galbraith 2006-03-21 12:59 ` Willy Tarreau 1 sibling, 1 reply; 112+ messages in thread From: Mike Galbraith @ 2006-03-21 12:07 UTC (permalink / raw) To: Ingo Molnar; +Cc: Willy Tarreau, lkml, Andrew Morton, Con Kolivas, bugsplatter On Tue, 2006-03-21 at 12:18 +0100, Ingo Molnar wrote: > great work by Mike! One detail: i'd like there to be just one default > throttling value, i.e. no grace_g tunables [so that we have just one > default scheduler behavior]. Is the default grace_g[12] setting good > enough for your workload? I can make the knobs compile time so we don't see random behavior reports, but I don't think they can be totally eliminated. Would that be sufficient? If so, the numbers as delivered should be fine for desktop boxen I think. People who are building custom kernels can bend to fit as always. -Mike ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 12:07 ` Mike Galbraith @ 2006-03-21 12:59 ` Willy Tarreau 2006-03-21 13:24 ` Mike Galbraith 0 siblings, 1 reply; 112+ messages in thread From: Willy Tarreau @ 2006-03-21 12:59 UTC (permalink / raw) To: Mike Galbraith; +Cc: Ingo Molnar, lkml, Andrew Morton, Con Kolivas, bugsplatter On Tue, Mar 21, 2006 at 01:07:58PM +0100, Mike Galbraith wrote: > On Tue, 2006-03-21 at 12:18 +0100, Ingo Molnar wrote: > > > great work by Mike! One detail: i'd like there to be just one default > > throttling value, i.e. no grace_g tunables [so that we have just one > > default scheduler behavior]. Is the default grace_g[12] setting good > > enough for your workload? The default values are infinitely better than mainline, but it is still a huge improvement to reduce them (at least grace_g2): - default: grace_g1=10, grace_g2=14400, loadavg oscillating between 7 and 12: willy@wtap:~$ time ls -la /data/src/tmp/|wc 2271 18250 212211 real 0m5.759s user 0m0.028s sys 0m0.008s willy@wtap:~$ time ls -la /data/src/tmp/|wc 2271 18250 212211 real 0m3.476s user 0m0.020s sys 0m0.016s willy@wtap:~$ I can still observe some occasional pauses of 1 to 3 seconds (one to four times per minute). - grace_g2 set to 0, load converges to a stable 8: willy@wtap:~$ time ls -la /data/src/tmp/|wc 2271 18250 212211 real 0m0.441s user 0m0.036s sys 0m0.004s willy@wtap:~$ time ls -la /data/src/tmp/|wc 2271 18250 212211 real 0m0.400s user 0m0.032s sys 0m0.008s I can still observe some rare cases of 1 second pauses (once or twice per minute). - grace_g2 and grace_g1 set to zero: willy@wtap:~$ time ls -la /data/src/tmp/|wc 2271 18250 212211 real 0m0.214s user 0m0.028s sys 0m0.008s willy@wtap:~$ time ls -la /data/src/tmp/|wc 2271 18250 212211 real 0m0.193s user 0m0.032s sys 0m0.008s => I never observe any pause, and the numbers above sometimes even get lower (around 75 ms). I have also tried injecting traffic on my proxy, and at 16000 hits/s it does not impact overall system responsiveness, whatever (g1,g2) is set to. > I can make the knobs compile time so we don't see random behavior > reports, but I don't think they can be totally eliminated. Would that > be sufficient? > > If so, the numbers as delivered should be fine for desktop boxen I > think. People who are building custom kernels can bend to fit as > always. That would suit me perfectly. I think I would set them both to zero. It's not clear to me what workload they can help, it seems that they try to allow a sometimes unfair scheduling. > -Mike Cheers, Willy ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 12:59 ` Willy Tarreau @ 2006-03-21 13:24 ` Mike Galbraith 2006-03-21 13:53 ` Con Kolivas 2006-03-21 22:51 ` Peter Williams 0 siblings, 2 replies; 112+ messages in thread From: Mike Galbraith @ 2006-03-21 13:24 UTC (permalink / raw) To: Willy Tarreau; +Cc: Ingo Molnar, lkml, Andrew Morton, Con Kolivas, bugsplatter On Tue, 2006-03-21 at 13:59 +0100, Willy Tarreau wrote: > On Tue, Mar 21, 2006 at 01:07:58PM +0100, Mike Galbraith wrote: > > I can make the knobs compile time so we don't see random behavior > > reports, but I don't think they can be totally eliminated. Would that > > be sufficient? > > > > If so, the numbers as delivered should be fine for desktop boxen I > > think. People who are building custom kernels can bend to fit as > > always. > > That would suit me perfectly. I think I would set them both to zero. > It's not clear to me what workload they can help, it seems that they > try to allow a sometimes unfair scheduling. Correct. Massively unfair scheduling is what interactivity requires. -Mike ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 13:24 ` Mike Galbraith @ 2006-03-21 13:53 ` Con Kolivas 2006-03-21 14:17 ` Mike Galbraith 2006-03-21 22:51 ` Peter Williams 1 sibling, 1 reply; 112+ messages in thread From: Con Kolivas @ 2006-03-21 13:53 UTC (permalink / raw) To: Mike Galbraith Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wednesday 22 March 2006 00:24, Mike Galbraith wrote: > On Tue, 2006-03-21 at 13:59 +0100, Willy Tarreau wrote: > > That would suit me perfectly. I think I would set them both to zero. > > It's not clear to me what workload they can help, it seems that they > > try to allow a sometimes unfair scheduling. > > Correct. Massively unfair scheduling is what interactivity requires. To some degree, yes. Transient unfairness was all that it was supposed to do and clearly it failed at being transient. I would argue that good interactivity is possible with fairness by changing the design. I won't go there (to try and push it that is), though, as the opposition to changing the whole scheduler in place or making it pluggable has already been voiced numerous times over, and it would kill me to try and promote such an alternative ever again. Especially since the number of people willing to test interactive patches and report to lkml has dropped to virtually nil. The yardstick for changes is now the speed of 'ls' scrolling in the console. Where exactly are those extra cycles going I wonder? Do you think the scheduler somehow makes the cpu idle doing nothing in that timespace? Clearly that's not true, and userspace is making something spin unnecessarily, but we're gonna fix that by modifying the scheduler.... sigh Cheers, Con ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 13:53 ` Con Kolivas @ 2006-03-21 14:17 ` Mike Galbraith 2006-03-21 14:19 ` Con Kolivas 0 siblings, 1 reply; 112+ messages in thread From: Mike Galbraith @ 2006-03-21 14:17 UTC (permalink / raw) To: Con Kolivas; +Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote: > The yardstick for changes is now the speed of 'ls' scrolling in the console. > Where exactly are those extra cycles going I wonder? Do you think the > scheduler somehow makes the cpu idle doing nothing in that timespace? Clearly > that's not true, and userspace is making something spin unnecessarily, but > we're gonna fix that by modifying the scheduler.... sigh *Blink* Are you having a bad hair day?? -Mike ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 14:17 ` Mike Galbraith @ 2006-03-21 14:19 ` Con Kolivas 2006-03-21 14:25 ` Ingo Molnar ` (2 more replies) 0 siblings, 3 replies; 112+ messages in thread From: Con Kolivas @ 2006-03-21 14:19 UTC (permalink / raw) To: Mike Galbraith Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wednesday 22 March 2006 01:17, Mike Galbraith wrote: > On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote: > > The yardstick for changes is now the speed of 'ls' scrolling in the > > console. Where exactly are those extra cycles going I wonder? Do you > > think the scheduler somehow makes the cpu idle doing nothing in that > > timespace? Clearly that's not true, and userspace is making something > > spin unnecessarily, but we're gonna fix that by modifying the > > scheduler.... sigh > > *Blink* > > Are you having a bad hair day?? My hair is approximately 3mm long so it's kinda hard for that to happen. What you're fixing with unfairness is worth pursuing. The 'ls' issue just blows my mind though for reasons I've just said. Where are the magic cycles going when nothing else is running that make it take ten times longer? Cheers, Con ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 14:19 ` Con Kolivas @ 2006-03-21 14:25 ` Ingo Molnar 2006-03-21 14:28 ` Con Kolivas 2006-03-21 14:28 ` Mike Galbraith 2006-03-21 14:39 ` Willy Tarreau 2 siblings, 1 reply; 112+ messages in thread From: Ingo Molnar @ 2006-03-21 14:25 UTC (permalink / raw) To: Con Kolivas Cc: Mike Galbraith, Willy Tarreau, lkml, Andrew Morton, bugsplatter * Con Kolivas <kernel@kolivas.org> wrote: > On Wednesday 22 March 2006 01:17, Mike Galbraith wrote: > > On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote: > > > The yardstick for changes is now the speed of 'ls' scrolling in the > > > console. Where exactly are those extra cycles going I wonder? Do you > > > think the scheduler somehow makes the cpu idle doing nothing in that > > > timespace? Clearly that's not true, and userspace is making something > > > spin unnecessarily, but we're gonna fix that by modifying the > > > scheduler.... sigh > > > > *Blink* > > > > Are you having a bad hair day?? > > My hair is approximately 3mm long so it's kinda hard for that to happen. > > What you're fixing with unfairness is worth pursuing. The 'ls' issue > just blows my mind though for reasons I've just said. Where are the > magic cycles going when nothing else is running that make it take ten > times longer? i believe such artifacts are due to array switches not happening (due to the workload getting queued back to rq->active, not rq->expired), and 'ls' only gets a timeslice once in a while, every STARVATION_LIMIT times. I.e. such workloads penalize the CPU-bound 'ls' process quite heavily. Ingo ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 14:25 ` Ingo Molnar @ 2006-03-21 14:28 ` Con Kolivas 2006-03-21 14:30 ` Ingo Molnar 0 siblings, 1 reply; 112+ messages in thread From: Con Kolivas @ 2006-03-21 14:28 UTC (permalink / raw) To: Ingo Molnar Cc: Mike Galbraith, Willy Tarreau, lkml, Andrew Morton, bugsplatter On Wednesday 22 March 2006 01:25, Ingo Molnar wrote: > * Con Kolivas <kernel@kolivas.org> wrote: > > What you're fixing with unfairness is worth pursuing. The 'ls' issue > > just blows my mind though for reasons I've just said. Where are the > > magic cycles going when nothing else is running that make it take ten > > times longer? > > i believe such artifacts are due to array switches not happening (due to > the workload getting queued back to rq->active, not rq->expired), and > 'ls' only gets a timeslice once in a while, every STARVATION_LIMIT > times. I.e. such workloads penalize the CPU-bound 'ls' process quite > heavily. With nothing else running on the machine it should still get all the cpu no matter which array it's on though. Con ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 14:28 ` Con Kolivas @ 2006-03-21 14:30 ` Ingo Molnar 0 siblings, 0 replies; 112+ messages in thread From: Ingo Molnar @ 2006-03-21 14:30 UTC (permalink / raw) To: Con Kolivas Cc: Mike Galbraith, Willy Tarreau, lkml, Andrew Morton, bugsplatter * Con Kolivas <kernel@kolivas.org> wrote: > On Wednesday 22 March 2006 01:25, Ingo Molnar wrote: > > * Con Kolivas <kernel@kolivas.org> wrote: > > > What you're fixing with unfairness is worth pursuing. The 'ls' issue > > > just blows my mind though for reasons I've just said. Where are the > > > magic cycles going when nothing else is running that make it take ten > > > times longer? > > > > i believe such artifacts are due to array switches not happening (due to > > the workload getting queued back to rq->active, not rq->expired), and > > 'ls' only gets a timeslice once in a while, every STARVATION_LIMIT > > times. I.e. such workloads penalize the CPU-bound 'ls' process quite > > heavily. > > With nothing else running on the machine it should still get all the > cpu no matter which array it's on though. yes. I thought you were asking why 'ls' pauses so long during the aforementioned workloads (of loadavg 7-8) - and i answered that. If you meant something else then please re-explain it to me. Ingo ^ permalink raw reply [flat|nested] 112+ messages in thread
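[ Illustrative aside: for readers without kernel/sched.c in their head, here is a self-contained toy model of the array-switch rule Ingo refers to. The STARVATION_LIMIT value, the fixed nr_running and the "the interactive load always wins" assumption are invented for the illustration -- it only mimics the shape of the O(1) scheduler's EXPIRED_STARVING() check, it is not the kernel code. ]

/*
 * Toy model: an "interactive" load keeps being requeued onto the
 * active array, so a CPU-bound task parked on the expired array only
 * runs when the starvation limit trips, i.e. in widely spaced bursts.
 */
#include <stdio.h>

#define STARVATION_LIMIT	10	/* ticks; stand-in value (the real limit is in jiffies) */

static unsigned long expired_timestamp;	/* tick when expired[] first got a task */
static unsigned long nr_running = 2;	/* one interactive hog + one 'ls'-style task */

static int expired_starving(unsigned long now)
{
	if (!expired_timestamp)
		return 0;
	return now - expired_timestamp > STARVATION_LIMIT * nr_running;
}

int main(void)
{
	unsigned long tick;

	for (tick = 1; tick <= 80; tick++) {
		if (!expired_timestamp)
			expired_timestamp = tick;	/* 'ls' lands on expired[] */

		if (!expired_starving(tick))
			continue;			/* interactive load keeps the CPU */

		printf("tick %3lu: array switch, expired task gets a slice\n", tick);
		expired_timestamp = 0;			/* expired[] drained, start over */
	}
	return 0;
}

With the numbers above, the parked task runs only once every couple of dozen ticks, which is the kind of bursty, many-times-slower 'ls' behaviour being discussed.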
* Re: interactive task starvation 2006-03-21 14:19 ` Con Kolivas 2006-03-21 14:25 ` Ingo Molnar @ 2006-03-21 14:28 ` Mike Galbraith 2006-03-21 14:30 ` Con Kolivas 2006-03-21 14:39 ` Willy Tarreau 2 siblings, 1 reply; 112+ messages in thread From: Mike Galbraith @ 2006-03-21 14:28 UTC (permalink / raw) To: Con Kolivas; +Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wed, 2006-03-22 at 01:19 +1100, Con Kolivas wrote: > On Wednesday 22 March 2006 01:17, Mike Galbraith wrote: > > On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote: > > > The yardstick for changes is now the speed of 'ls' scrolling in the > > > console. Where exactly are those extra cycles going I wonder? Do you > > > think the scheduler somehow makes the cpu idle doing nothing in that > > > timespace? Clearly that's not true, and userspace is making something > > > spin unnecessarily, but we're gonna fix that by modifying the > > > scheduler.... sigh > > > > *Blink* > > > > Are you having a bad hair day?? > > My hair is approximately 3mm long so it's kinda hard for that to happen. > > What you're fixing with unfairness is worth pursuing. The 'ls' issue just > blows my mind though for reasons I've just said. Where are the magic cycles > going when nothing else is running that make it take ten times longer? What I was talking about when I mentioned scrolling was rendering. -Mike ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 14:28 ` Mike Galbraith @ 2006-03-21 14:30 ` Con Kolivas 2006-03-21 14:32 ` Ingo Molnar 2006-03-21 14:36 ` Mike Galbraith 0 siblings, 2 replies; 112+ messages in thread From: Con Kolivas @ 2006-03-21 14:30 UTC (permalink / raw) To: Mike Galbraith Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wednesday 22 March 2006 01:28, Mike Galbraith wrote: > On Wed, 2006-03-22 at 01:19 +1100, Con Kolivas wrote: > > What you're fixing with unfairness is worth pursuing. The 'ls' issue just > > blows my mind though for reasons I've just said. Where are the magic > > cycles going when nothing else is running that make it take ten times > > longer? > > What I was talking about when I mentioned scrolling was rendering. I'm talking about the long standing report that 'ls' takes 10 times longer on 2.6 90% of the time you run it, and doing 'ls | cat' makes it run as fast as 2.4. This is what Willy has been fighting with. Cheers, Con ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 14:30 ` Con Kolivas @ 2006-03-21 14:32 ` Ingo Molnar 2006-03-21 14:44 ` Willy Tarreau 2006-03-21 14:36 ` Mike Galbraith 1 sibling, 1 reply; 112+ messages in thread From: Ingo Molnar @ 2006-03-21 14:32 UTC (permalink / raw) To: Con Kolivas Cc: Mike Galbraith, Willy Tarreau, lkml, Andrew Morton, bugsplatter * Con Kolivas <kernel@kolivas.org> wrote: > On Wednesday 22 March 2006 01:28, Mike Galbraith wrote: > > On Wed, 2006-03-22 at 01:19 +1100, Con Kolivas wrote: > > > What you're fixing with unfairness is worth pursuing. The 'ls' issue just > > > blows my mind though for reasons I've just said. Where are the magic > > > cycles going when nothing else is running that make it take ten times > > > longer? > > > > What I was talking about when I mentioned scrolling was rendering. > > I'm talking about the long standing report that 'ls' takes 10 times > longer on 2.6 90% of the time you run it, and doing 'ls | cat' makes > it run as fast as 2.4. This is what Willy has been fighting with. ah. That's i think a gnome-terminal artifact - it does some really stupid dynamic things while rendering, it 'skips' certain portions of rendering, depending on the speed of scrolling. Gnome 2.14 ought to have that fixed i think. Ingo ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 14:32 ` Ingo Molnar @ 2006-03-21 14:44 ` Willy Tarreau 2006-03-21 14:52 ` Ingo Molnar 0 siblings, 1 reply; 112+ messages in thread From: Willy Tarreau @ 2006-03-21 14:44 UTC (permalink / raw) To: Ingo Molnar; +Cc: Con Kolivas, Mike Galbraith, lkml, Andrew Morton, bugsplatter On Tue, Mar 21, 2006 at 03:32:40PM +0100, Ingo Molnar wrote: > > * Con Kolivas <kernel@kolivas.org> wrote: > > > On Wednesday 22 March 2006 01:28, Mike Galbraith wrote: > > > > On Wed, 2006-03-22 at 01:19 +1100, Con Kolivas wrote: > > > > > What you're fixing with unfairness is worth pursuing. The 'ls' issue just > > > > > blows my mind though for reasons I've just said. Where are the magic > > > > > cycles going when nothing else is running that make it take ten times > > > > > longer? > > > > > > > > What I was talking about when I mentioned scrolling was rendering. > > > > > > I'm talking about the long standing report that 'ls' takes 10 times > > > longer on 2.6 90% of the time you run it, and doing 'ls | cat' makes > > > it run as fast as 2.4. This is what Willy has been fighting with. > > > > ah. That's i think a gnome-terminal artifact - it does some really > > stupid dynamic things while rendering, it 'skips' certain portions of > > rendering, depending on the speed of scrolling. Gnome 2.14 ought to have > > that fixed i think. Ah no, I never use those monstrous environments! xterm is already heavy. Don't you remember? We found that doing "ls" in an xterm was waking the xterm process for every single line, which in turn woke the X server for a one-line scroll, while adding the "|cat" acted like a buffer with batched scrolls. Newer xterms have been improved to trigger jump scroll earlier and don't exhibit this behaviour even on non-patched kernels. However, sshd still shows the same problem IMHO. > Ingo Cheers, Willy ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 14:44 ` Willy Tarreau @ 2006-03-21 14:52 ` Ingo Molnar 2006-03-29 3:01 ` Lee Revell 0 siblings, 1 reply; 112+ messages in thread From: Ingo Molnar @ 2006-03-21 14:52 UTC (permalink / raw) To: Willy Tarreau Cc: Con Kolivas, Mike Galbraith, lkml, Andrew Morton, bugsplatter * Willy Tarreau <willy@w.ods.org> wrote: > Ah no, I never use those monstrous environments! xterm is already > heavy. [...] [ offtopic note: gnome-terminal developers claim some massive speedups in Gnome 2.14, and my experiments on Fedora rawhide seem to corroborate that - gnome-term is now faster (for me) than xterm. ] > [...] Don't you remember? We found that doing "ls" in an xterm was > waking the xterm process for every single line, which in turn woke the > X server for a one-line scroll, while adding the "|cat" acted like a > buffer with batched scrolls. Newer xterms have been improved to > trigger jump scroll earlier and don't exhibit this behaviour even on > non-patched kernels. However, sshd still shows the same problem IMHO. yeah. The "|cat" changes the workload, which gets rated by the scheduler differently. Such artifacts are inevitable once interactivity heuristics are strong enough to significantly distort the equal sharing of CPU time. Ingo ^ permalink raw reply [flat|nested] 112+ messages in thread
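[ Illustrative aside: part of what Willy describes is plain stdio behaviour rather than anything scheduler-specific -- glibc makes stdout line-buffered when it is a terminal and fully buffered when it is a pipe, so the same program does one write() (and causes one reader wakeup) per line on a tty, but only a few large writes through "| cat". The little demo below, with a made-up name, shows only that buffering half of the story, not the scheduler's reaction to it. ]

/* bufdemo.c -- compare write() counts on a tty vs. through a pipe */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int i;

	fprintf(stderr, "stdout is %s\n",
		isatty(STDOUT_FILENO) ? "a tty (line buffered)"
				      : "not a tty (fully buffered)");

	for (i = 0; i < 1000; i++)
		printf("line %d\n", i);	/* one write() per line on a tty */

	return 0;	/* on a pipe, exit flushes the output in a few big writes */
}

Running "strace -c -e trace=write ./bufdemo" against "strace -c -e trace=write ./bufdemo | cat" shows on the order of a thousand writes in the first case and only a handful in the second.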
* Re: interactive task starvation 2006-03-21 14:52 ` Ingo Molnar @ 2006-03-29 3:01 ` Lee Revell 2006-03-29 5:56 ` Ray Lee 0 siblings, 1 reply; 112+ messages in thread From: Lee Revell @ 2006-03-29 3:01 UTC (permalink / raw) To: Ingo Molnar Cc: Willy Tarreau, Con Kolivas, Mike Galbraith, lkml, Andrew Morton, bugsplatter On Tue, 2006-03-21 at 15:52 +0100, Ingo Molnar wrote: > * Willy Tarreau <willy@w.ods.org> wrote: > > > Ah no, I never use those montruous environments ! xterm is already > > heavy. [...] > > [ offtopic note: gnome-terminal developers claim some massive speedups > in Gnome 2.14, and my experiments on Fedora rawhide seem to > corraborate that - gnome-term is now faster (for me) than xterm. ] > > > [...] don't you remember, we found that doing "ls" in an xterm was > > waking the xterm process for every single line, which in turn woke the > > X server for a one-line scroll, while adding the "|cat" acted like a > > buffer with batched scrolls. Newer xterms have been improved to > > trigger jump scroll earlier and don't exhibit this behaviour even on > > non-patched kernels. However, sshd still shows the same problem IMHO. > > yeah. The "|cat" changes the workload, which gets rated by the scheduler > differently. Such artifacts are inevitable once interactivity heuristics > are strong enough to significantly distort the equal sharing of CPU > time. Can you explain why terminal output ping-pongs back and forth between taking a certain amount of time, and approximately 10x longer? For example here's the result of "time dmesg" 6 times in an xterm with a constant background workload: real 0m0.086s user 0m0.005s sys 0m0.012s real 0m0.078s user 0m0.008s sys 0m0.009s real 0m0.082s user 0m0.004s sys 0m0.013s real 0m0.084s user 0m0.005s sys 0m0.011s real 0m0.751s user 0m0.006s sys 0m0.017s real 0m0.749s user 0m0.005s sys 0m0.017s Why does it ping-pong between taking ~0.08s and ~0.75s like that? The behavior is completely reproducible. Lee ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-29 3:01 ` Lee Revell @ 2006-03-29 5:56 ` Ray Lee 2006-03-29 6:16 ` Lee Revell 0 siblings, 1 reply; 112+ messages in thread From: Ray Lee @ 2006-03-29 5:56 UTC (permalink / raw) To: Lee Revell Cc: Ingo Molnar, Willy Tarreau, Con Kolivas, Mike Galbraith, lkml, Andrew Morton, bugsplatter On 3/28/06, Lee Revell <rlrevell@joe-job.com> wrote: > Can you explain why terminal output ping-pongs back and forth between > taking a certain amount of time, and approximately 10x longer? [...] > Why does it ping-pong between taking ~0.08s and ~0.75s like that? The > behavior is completely reproducible. Does the scheduler have any concept of dependent tasks? (If so, hit <delete> and move on.) If not, then the producer and consumer will be scheduled randomly w/r/t each other, right? Sometimes producer then consumer, sometimes vice versa. If so, the ping pong should be half of the time slow, half of the time fast (+/- sqrt(N)), and the slow time should scale directly with the number of tasks running on the system. Do any of the above WAGs match what you see? If so, then perhaps it's random just due to the order in which the tasks get initially scheduled (dmesg vs ssh, or dmesg vs xterm vs X -- er, though I guess in that latter case there's really <thinks> three separate timings that you'd get back, as the triple set of tasks could be in one of six orderings, one fast, one slow, and four equally mixed between the two). I wonder if on a pipe write, moving the reader to be right after the writer in the list would even that out. (But only on cases where the reader didn't just run -- wouldn't want a back and forth conversation to starve everyone else...) But like I said, just a WAG. Ray ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-29 5:56 ` Ray Lee @ 2006-03-29 6:16 ` Lee Revell 0 siblings, 0 replies; 112+ messages in thread From: Lee Revell @ 2006-03-29 6:16 UTC (permalink / raw) To: ray-gmail Cc: Ingo Molnar, Willy Tarreau, Con Kolivas, Mike Galbraith, lkml, Andrew Morton, bugsplatter On Tue, 2006-03-28 at 21:56 -0800, Ray Lee wrote: > Do any of the above WAGs match what you see? If so, then perhaps it's > random just due to the order in which the tasks get initially > scheduled (dmesg vs ssh, or dmesg vs xterm vs X -- er, though I guess > in that latter case there's really <thinks> three separate timings > that you'd get back, as the triple set of tasks could be in one of six > orderings, one fast, one slow, and four equally mixed between the > two). > Possibly - *very* rarely, like 1 out of 50 or 100 times, it falls somewhere in the middle. Lee ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 14:30 ` Con Kolivas 2006-03-21 14:32 ` Ingo Molnar @ 2006-03-21 14:36 ` Mike Galbraith 2006-03-21 14:39 ` Con Kolivas 1 sibling, 1 reply; 112+ messages in thread From: Mike Galbraith @ 2006-03-21 14:36 UTC (permalink / raw) To: Con Kolivas; +Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wed, 2006-03-22 at 01:30 +1100, Con Kolivas wrote: > On Wednesday 22 March 2006 01:28, Mike Galbraith wrote: > > On Wed, 2006-03-22 at 01:19 +1100, Con Kolivas wrote: > > > What you're fixing with unfairness is worth pursuing. The 'ls' issue just > > > blows my mind though for reasons I've just said. Where are the magic > > > cycles going when nothing else is running that make it take ten times > > > longer? > > > > What I was talking about when I mentioned scrolling was rendering. > > I'm talking about the long standing report that 'ls' takes 10 times longer on > 2.6 90% of the time you run it, and doing 'ls | cat' makes it run as fast as > 2.4. This is what Willy has been fighting with. Oh. I thought you were calling me a _moron_ :) -Mike ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 14:36 ` Mike Galbraith @ 2006-03-21 14:39 ` Con Kolivas 0 siblings, 0 replies; 112+ messages in thread From: Con Kolivas @ 2006-03-21 14:39 UTC (permalink / raw) To: Mike Galbraith Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wednesday 22 March 2006 01:36, Mike Galbraith wrote: > Oh. I thought you were calling me a _moron_ :) No, never assume any emotion in email and I'm sorry if you interpreted it that way. Since I run my own mailing list I had to make a FAQ on this. http://ck.kolivas.org/faqs/replying-to-mailing-list.txt Extract: 4. Be polite Humans by nature don't realise how much they depend on seeing facial expressions, voice intonations and body language to determine the emotion associated with words. In the context of email it is very common to misinterpret people's emotions based on the text alone. English subtleties will often be misinterpreted even across English speaking nations, and for non-English speakers it becomes much harder. Without the author explicitly stating his emotions, assume neutrality and respond politely. Cheers, Con ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 14:19 ` Con Kolivas 2006-03-21 14:25 ` Ingo Molnar 2006-03-21 14:28 ` Mike Galbraith @ 2006-03-21 14:39 ` Willy Tarreau 2006-03-21 18:39 ` Rafael J. Wysocki 2 siblings, 1 reply; 112+ messages in thread From: Willy Tarreau @ 2006-03-21 14:39 UTC (permalink / raw) To: Con Kolivas; +Cc: Mike Galbraith, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Wed, Mar 22, 2006 at 01:19:49AM +1100, Con Kolivas wrote: > On Wednesday 22 March 2006 01:17, Mike Galbraith wrote: > > On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote: > > > The yardstick for changes is now the speed of 'ls' scrolling in the > > > console. Where exactly are those extra cycles going I wonder? Do you > > > think the scheduler somehow makes the cpu idle doing nothing in that > > > timespace? Clearly that's not true, and userspace is making something > > > spin unnecessarily, but we're gonna fix that by modifying the > > > scheduler.... sigh > > > > *Blink* > > > > Are you having a bad hair day?? > > My hair is approximately 3mm long so it's kinda hard for that to happen. > > What you're fixing with unfairness is worth pursuing. The 'ls' issue just > blows my mind though for reasons I've just said. Where are the magic cycles > going when nothing else is running that make it take ten times longer? Con, those cycles are not "magic", if you look at the numbers, the time is not spent in the process itself. From what has been observed since the beginning, it is spent : - in other processes which are starvating the CPU (eg: X11 when xterm scrolls) - in context switches when you have a pipe somewhere and the CPU is bouncing between tasks. Concerning your angriness about me being OK with (0,0) and still asking for tunables, it's precisely because I know that *my* workload is not everyone else's, and I don't want to conclude too quickly that there are only two types of workloads. Maybe you're right, maybe you're wrong. At least you're right for as long as no other workload has been identified. But thinking like this is like some time ago when we thought that "if it runs XMMS without skipping, it'll be OK for everyone". > Cheers, > Con Cheers, Willy ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 14:39 ` Willy Tarreau @ 2006-03-21 18:39 ` Rafael J. Wysocki 2006-03-21 19:32 ` Willy Tarreau 0 siblings, 1 reply; 112+ messages in thread From: Rafael J. Wysocki @ 2006-03-21 18:39 UTC (permalink / raw) To: Willy Tarreau Cc: Con Kolivas, Mike Galbraith, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Tuesday 21 March 2006 15:39, Willy Tarreau wrote: > On Wed, Mar 22, 2006 at 01:19:49AM +1100, Con Kolivas wrote: > > On Wednesday 22 March 2006 01:17, Mike Galbraith wrote: > > > On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote: > > > > The yardstick for changes is now the speed of 'ls' scrolling in the > > > > console. Where exactly are those extra cycles going I wonder? Do you > > > > think the scheduler somehow makes the cpu idle doing nothing in that > > > > timespace? Clearly that's not true, and userspace is making something > > > > spin unnecessarily, but we're gonna fix that by modifying the > > > > scheduler.... sigh > > > > > > *Blink* > > > > > > Are you having a bad hair day?? > > > > My hair is approximately 3mm long so it's kinda hard for that to happen. > > > > What you're fixing with unfairness is worth pursuing. The 'ls' issue just > > blows my mind though for reasons I've just said. Where are the magic cycles > > going when nothing else is running that make it take ten times longer? > > Con, those cycles are not "magic", if you look at the numbers, the time is > not spent in the process itself. From what has been observed since the > beginning, it is spent : > - in other processes which are starvating the CPU (eg: X11 when xterm > scrolls) > - in context switches when you have a pipe somewhere and the CPU is > bouncing between tasks. > > Concerning your angriness about me being OK with (0,0) and still > asking for tunables, it's precisely because I know that *my* workload > is not everyone else's, and I don't want to conclude too quickly that > there are only two types of workloads. Well, perhaps we can assume there are only two types of workloads and wait for a test case that will show the assumption is wrong? > Maybe you're right, maybe you're wrong. At least you're right for as long > as no other workload has been identified. But thinking like this is like > some time ago when we thought that "if it runs XMMS without skipping, > it'll be OK for everyone". However, we should not try to anticipate every possible kind of workload IMHO. Greetings, Rafael ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 18:39 ` Rafael J. Wysocki @ 2006-03-21 19:32 ` Willy Tarreau 2006-03-21 21:47 ` Rafael J. Wysocki 0 siblings, 1 reply; 112+ messages in thread From: Willy Tarreau @ 2006-03-21 19:32 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Con Kolivas, Mike Galbraith, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Tue, Mar 21, 2006 at 07:39:11PM +0100, Rafael J. Wysocki wrote: > On Tuesday 21 March 2006 15:39, Willy Tarreau wrote: > > On Wed, Mar 22, 2006 at 01:19:49AM +1100, Con Kolivas wrote: > > > On Wednesday 22 March 2006 01:17, Mike Galbraith wrote: > > > > On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote: > > > > > The yardstick for changes is now the speed of 'ls' scrolling in the > > > > > console. Where exactly are those extra cycles going I wonder? Do you > > > > > think the scheduler somehow makes the cpu idle doing nothing in that > > > > > timespace? Clearly that's not true, and userspace is making something > > > > > spin unnecessarily, but we're gonna fix that by modifying the > > > > > scheduler.... sigh > > > > > > > > *Blink* > > > > > > > > Are you having a bad hair day?? > > > > > > My hair is approximately 3mm long so it's kinda hard for that to happen. > > > > > > What you're fixing with unfairness is worth pursuing. The 'ls' issue just > > > blows my mind though for reasons I've just said. Where are the magic cycles > > > going when nothing else is running that make it take ten times longer? > > > > Con, those cycles are not "magic", if you look at the numbers, the time is > > not spent in the process itself. From what has been observed since the > > beginning, it is spent : > > - in other processes which are starvating the CPU (eg: X11 when xterm > > scrolls) > > - in context switches when you have a pipe somewhere and the CPU is > > bouncing between tasks. > > > > Concerning your angriness about me being OK with (0,0) and still > > asking for tunables, it's precisely because I know that *my* workload > > is not everyone else's, and I don't want to conclude too quickly that > > there are only two types of workloads. > > Well, perhaps we can assume there are only two types of workloads and > wait for a test case that will show the assumption is wrong? It would certainly fit most usages, but as soon as we find another group of users complaining, we will add another sysctl just for them ? Perhaps we could just resume the two current sysctls into one called "interactivity_boost" with a value between 0 and 100, with the ability for any user to increase or decrease it easily ? Mainline would be pre-configured with something reasonable, like what Mike proposed as default values for example, and server admins would only set it to zero while desktop-intensive users could increase it a bit if they like to. > > Maybe you're right, maybe you're wrong. At least you're right for as long > > as no other workload has been identified. But thinking like this is like > > some time ago when we thought that "if it runs XMMS without skipping, > > it'll be OK for everyone". > > However, we should not try to anticipate every possible kind of workload > IMHO. I generally agree on this, except that we got caught once in this area for this exact reason. > Greetings, > Rafael Regards, Willy ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 19:32 ` Willy Tarreau @ 2006-03-21 21:47 ` Rafael J. Wysocki 0 siblings, 0 replies; 112+ messages in thread From: Rafael J. Wysocki @ 2006-03-21 21:47 UTC (permalink / raw) To: Willy Tarreau Cc: Con Kolivas, Mike Galbraith, Ingo Molnar, lkml, Andrew Morton, bugsplatter On Tuesday 21 March 2006 20:32, Willy Tarreau wrote: > On Tue, Mar 21, 2006 at 07:39:11PM +0100, Rafael J. Wysocki wrote: > > On Tuesday 21 March 2006 15:39, Willy Tarreau wrote: > > > On Wed, Mar 22, 2006 at 01:19:49AM +1100, Con Kolivas wrote: > > > > On Wednesday 22 March 2006 01:17, Mike Galbraith wrote: > > > > > On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote: > > > > > > The yardstick for changes is now the speed of 'ls' scrolling in the > > > > > > console. Where exactly are those extra cycles going I wonder? Do you > > > > > > think the scheduler somehow makes the cpu idle doing nothing in that > > > > > > timespace? Clearly that's not true, and userspace is making something > > > > > > spin unnecessarily, but we're gonna fix that by modifying the > > > > > > scheduler.... sigh > > > > > > > > > > *Blink* > > > > > > > > > > Are you having a bad hair day?? > > > > > > > > My hair is approximately 3mm long so it's kinda hard for that to happen. > > > > > > > > What you're fixing with unfairness is worth pursuing. The 'ls' issue just > > > > blows my mind though for reasons I've just said. Where are the magic cycles > > > > going when nothing else is running that make it take ten times longer? > > > > > > Con, those cycles are not "magic", if you look at the numbers, the time is > > > not spent in the process itself. From what has been observed since the > > > beginning, it is spent : > > > - in other processes which are starvating the CPU (eg: X11 when xterm > > > scrolls) > > > - in context switches when you have a pipe somewhere and the CPU is > > > bouncing between tasks. > > > > > > Concerning your angriness about me being OK with (0,0) and still > > > asking for tunables, it's precisely because I know that *my* workload > > > is not everyone else's, and I don't want to conclude too quickly that > > > there are only two types of workloads. > > > > Well, perhaps we can assume there are only two types of workloads and > > wait for a test case that will show the assumption is wrong? > > It would certainly fit most usages, but as soon as we find another group > of users complaining, we will add another sysctl just for them ? Perhaps > we could just resume the two current sysctls into one called > "interactivity_boost" with a value between 0 and 100, with the ability > for any user to increase or decrease it easily ? Mainline would be > pre-configured with something reasonable, like what Mike proposed as > default values for example, and server admins would only set it to > zero while desktop-intensive users could increase it a bit if they like > to. Sounds reasonable to me. Greetings, Rafael ^ permalink raw reply [flat|nested] 112+ messages in thread
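[ Illustrative aside: wiring up a knob like the proposed "interactivity_boost" would only be a few lines with the sysctl interface of that era. The sketch below is hypothetical -- the variable, the KERN_INTERACTIVITY_BOOST number and, above all, how the scheduler would map the 0..100 value onto its internal boost/throttle limits are assumptions, not an existing patch. ]

#include <linux/sysctl.h>

/* hypothetical knob: 0 = server-style fairness, 100 = maximum desktop boost */
int sysctl_interactivity_boost = 50;

static int boost_min = 0;
static int boost_max = 100;

/* candidate entry for kern_table[] in kernel/sysctl.c */
static ctl_table interactivity_boost_table[] = {
	{
		.ctl_name	= KERN_INTERACTIVITY_BOOST,	/* would need a new number */
		.procname	= "interactivity_boost",
		.data		= &sysctl_interactivity_boost,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= &proc_dointvec_minmax,
		.strategy	= &sysctl_intvec,
		.extra1		= &boost_min,			/* writes clamped to 0..100 */
		.extra2		= &boost_max,
	},
	{ .ctl_name = 0 }
};

A server admin would then do "echo 0 > /proc/sys/kernel/interactivity_boost" and a desktop user could nudge it upward, much like the vm.swappiness precedent mentioned earlier in the thread.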
* Re: interactive task starvation 2006-03-21 13:24 ` Mike Galbraith 2006-03-21 13:53 ` Con Kolivas @ 2006-03-21 22:51 ` Peter Williams 2006-03-22 3:49 ` Mike Galbraith 1 sibling, 1 reply; 112+ messages in thread From: Peter Williams @ 2006-03-21 22:51 UTC (permalink / raw) To: Mike Galbraith Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, Con Kolivas, bugsplatter Mike Galbraith wrote: > On Tue, 2006-03-21 at 13:59 +0100, Willy Tarreau wrote: > >>On Tue, Mar 21, 2006 at 01:07:58PM +0100, Mike Galbraith wrote: > > >>>I can make the knobs compile time so we don't see random behavior >>>reports, but I don't think they can be totally eliminated. Would that >>>be sufficient? >>> >>>If so, the numbers as delivered should be fine for desktop boxen I >>>think. People who are building custom kernels can bend to fit as >>>always. >> >>That would suit me perfectly. I think I would set them both to zero. >>It's not clear to me what workload they can help, it seems that they >>try to allow a sometimes unfair scheduling. > > > Correct. Massively unfair scheduling is what interactivity requires. > Selective unfairness not massive unfairness is what's required. The hard part is automating the selectiveness especially when there are three quite different types of task that need special treatment: 1) the X server, 2) normal interactive tasks and 3) media streamers; each of which has different behavioural characteristics. A single mechanism that classifies all of these as "interactive" will unfortunately catch a lot of tasks that don't belong to any one of these types. Peter -- Peter Williams pwil3058@bigpond.net.au "Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-21 22:51 ` Peter Williams @ 2006-03-22 3:49 ` Mike Galbraith 2006-03-22 3:59 ` Peter Williams 2006-03-22 12:14 ` [interbench numbers] " Mike Galbraith 0 siblings, 2 replies; 112+ messages in thread From: Mike Galbraith @ 2006-03-22 3:49 UTC (permalink / raw) To: Peter Williams Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, Con Kolivas, bugsplatter On Wed, 2006-03-22 at 09:51 +1100, Peter Williams wrote: > Mike Galbraith wrote: > > On Tue, 2006-03-21 at 13:59 +0100, Willy Tarreau wrote: > >>That would suit me perfectly. I think I would set them both to zero. > >>It's not clear to me what workload they can help, it seems that they > >>try to allow a sometimes unfair scheduling. > > > > > > Correct. Massively unfair scheduling is what interactivity requires. > > > > Selective unfairness not massive unfairness is what's required. The > hard part is automating the selectiveness especially when there are > three quite different types of task that need special treatment: 1) the > X server, 2) normal interactive tasks and 3) media streamers; each of > which has different behavioural characteristics. A single mechanism > that classifies all of these as "interactive" will unfortunately catch a > lot of tasks that don't belong to any one of these types. Yes, selective would be nice, but it's still massively unfair that is required. There is no criteria available for discrimination, so my patches don't even try to classify, they only enforce the rules. I don't classify X as interactive, I merely provide a mechanism which enables X to accumulate the cycles an interactive task needs to be able to perform by actually _being_ interactive, by conforming to the definition of sleep_avg. Fortunately, it uses that mechanism. I do nothing more than trade stout rope for good behavior. I anchor one end to a boulder, the other to a task's neck. The mechanism is agnostic. The task determines whether it gets hung or not, and the user determines how long the rope is. -Mike ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: interactive task starvation 2006-03-22 3:49 ` Mike Galbraith @ 2006-03-22 3:59 ` Peter Williams 2006-03-22 12:14 ` [interbench numbers] " Mike Galbraith 1 sibling, 0 replies; 112+ messages in thread From: Peter Williams @ 2006-03-22 3:59 UTC (permalink / raw) To: Mike Galbraith Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, Con Kolivas, bugsplatter Mike Galbraith wrote: > On Wed, 2006-03-22 at 09:51 +1100, Peter Williams wrote: > >>Mike Galbraith wrote: >> >>>On Tue, 2006-03-21 at 13:59 +0100, Willy Tarreau wrote: >>> >>>>That would suit me perfectly. I think I would set them both to zero. >>>>It's not clear to me what workload they can help, it seems that they >>>>try to allow a sometimes unfair scheduling. >>> >>> >>>Correct. Massively unfair scheduling is what interactivity requires. >>> >> >>Selective unfairness not massive unfairness is what's required. The >>hard part is automating the selectiveness especially when there are >>three quite different types of task that need special treatment: 1) the >>X server, 2) normal interactive tasks and 3) media streamers; each of >>which has different behavioural characteristics. A single mechanism >>that classifies all of these as "interactive" will unfortunately catch a >>lot of tasks that don't belong to any one of these types. > > > Yes, selective would be nice, but it's still massively unfair that is > required. There is no criteria available for discrimination, so my > patches don't even try to classify, they only enforce the rules. I > don't classify X as interactive, I merely provide a mechanism which > enables X to accumulate the cycles an interactive task needs to be able > to perform by actually _being_ interactive, by conforming to the > definition of sleep_avg. That's what I mean by classification :-) > Fortunately, it uses that mechanism. I do > nothing more than trade stout rope for good behavior. I anchor one end > to a boulder, the other to a task's neck. The mechanism is agnostic. > The task determines whether it gets hung or not, and the user determines > how long the rope is. I view that as a modification (hopefully an improvement) of the classification rules :-). In particular, a variation in the persistence of a classification and the criteria for losing/downgrading it. Peter -- Peter Williams pwil3058@bigpond.net.au "Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce ^ permalink raw reply [flat|nested] 112+ messages in thread
* [interbench numbers] Re: interactive task starvation 2006-03-22 3:49 ` Mike Galbraith 2006-03-22 3:59 ` Peter Williams @ 2006-03-22 12:14 ` Mike Galbraith 2006-03-22 20:27 ` Con Kolivas 1 sibling, 1 reply; 112+ messages in thread From: Mike Galbraith @ 2006-03-22 12:14 UTC (permalink / raw) To: lkml Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, Con Kolivas, bugsplatter, Peter Williams Greetings, I was asked to do some interbench runs, with various throttle settings, see below. I'll not attempt to interpret results, only present raw data for others to examine. Tested throttling patches version is V24, because while I was compiling 2.6.16-rc6-mm2 in preparation for comparison, I found I'd introduced an SMP buglet in V23. Something good came from the added testing whether the results are informative or not :) -Mike 1. virgin 2.6.16-rc6-mm2. Using 1975961 loops per ms, running every load for 30 seconds Benchmarking kernel 2.6.16-rc6-mm2-smp at datestamp 200603221223 --- Benchmarking simulated cpu of Audio in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met None 0.024 +/- 0.0486 1 100 100 Video 0.996 +/- 1.31 6.05 100 100 X 0.336 +/- 0.739 5.01 100 100 Burn 0.028 +/- 0.0905 2.05 100 100 Write 0.058 +/- 0.508 12.1 100 100 Read 0.043 +/- 0.115 1.66 100 100 Compile 0.047 +/- 0.126 2.55 100 100 Memload 0.258 +/- 4.57 112 99.8 99.8 --- Benchmarking simulated cpu of Video in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met None 0.031 +/- 0.396 16.7 100 99.9 X 0.722 +/- 3.35 30.7 100 97 Burn 0.531 +/- 7.42 246 99.1 98 Write 0.302 +/- 2.31 40.4 99.9 98.5 Read 0.092 +/- 1.11 32.9 99.9 99.7 Compile 0.428 +/- 2.77 36.3 99.9 97.9 Memload 0.235 +/- 3.3 104 99.5 99.1 --- Benchmarking simulated cpu of X in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met None 1.25 +/- 6.46 70 85.8 83.2 Video 17.8 +/- 32 92 31.7 22.3 Burn 45.5 +/- 97.5 503 8.35 4.22 Write 3.55 +/- 12.2 66 79.9 73.6 Read 0.739 +/- 3.04 20 87.4 83 Compile 51.9 +/- 122 857 10.7 5.34 Memload 1.81 +/- 6.67 54 85.1 78.3 --- Benchmarking simulated cpu of Gaming in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU None 8.65 +/- 14.8 116 92 Video 77.9 +/- 78.5 107 56.2 X 64.2 +/- 72.9 124 60.9 Burn 301 +/- 317 524 24.9 Write 26.8 +/- 45.6 135 78.9 Read 13.1 +/- 16.8 67.9 88.4 Compile 478 +/- 519 765 17.3 Memload 21.1 +/- 28.8 148 82.6 2. 2.6.16-rc6-mm2x with no throttling. 
Using 1975961 loops per ms, running every load for 30 seconds Benchmarking kernel 2.6.16-rc6-mm2x-smp at datestamp 200603220914 --- Benchmarking simulated cpu of Audio in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met None 0.062 +/- 0.11 1.09 100 100 Video 1.15 +/- 1.53 11.4 100 100 X 0.223 +/- 0.609 6.09 100 100 Burn 0.039 +/- 0.258 6.01 100 100 Write 0.194 +/- 0.837 14 100 100 Read 0.05 +/- 0.202 3.01 100 100 Compile 0.216 +/- 1.36 19 100 100 Memload 0.218 +/- 2.22 51.4 100 99.8 --- Benchmarking simulated cpu of Video in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met None 0.185 +/- 1.6 18.8 100 99.1 X 1.27 +/- 4.47 27 100 94.3 Burn 1.57 +/- 13.3 345 98.1 93 Write 0.819 +/- 3.76 34.7 99.9 96 Read 0.301 +/- 2.05 18.7 100 98.5 Compile 4.22 +/- 12.9 233 92.4 80.2 Memload 0.624 +/- 3.46 66.7 99.6 97 --- Benchmarking simulated cpu of X in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met None 2.57 +/- 7.94 43 74.6 67.7 Video 17.6 +/- 32.2 99 31.2 22.3 Burn 40.1 +/- 79.4 716 12.9 6.65 Write 6.03 +/- 16.6 80 75.1 64.6 Read 2.52 +/- 7.49 42 74.8 66.7 Compile 54.1 +/- 79.3 410 15.6 6.56 Memload 2.08 +/- 6.93 48 77.3 71.7 --- Benchmarking simulated cpu of Gaming in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU None 12.3 +/- 16.6 65.3 89 Video 78.7 +/- 79.4 109 56 X 70.6 +/- 78.2 128 58.6 Burn 468 +/- 492 737 17.6 Write 36.6 +/- 52.7 300 73.2 Read 18.3 +/- 20.6 47.9 84.5 Compile 468 +/- 486 802 17.6 Memload 21.4 +/- 27 132 82.4 3. 2.6.16-rc6-mm2x with default settings. Using 1975961 loops per ms, running every load for 30 seconds Benchmarking kernel 2.6.16-rc6-mm2x-smp at datestamp 200603221006 --- Benchmarking simulated cpu of Audio in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met None 0.033 +/- 0.0989 1.05 100 100 Video 0.859 +/- 1.17 7.45 100 100 X 0.239 +/- 0.662 7.1 100 100 Burn 0.06 +/- 0.382 7.86 100 100 Write 0.123 +/- 0.422 4.12 100 100 Read 0.045 +/- 0.103 1.18 100 100 Compile 0.292 +/- 2.9 65.8 100 99.8 Memload 0.256 +/- 3.78 91.8 100 99.8 --- Benchmarking simulated cpu of Video in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met None 0.101 +/- 1.06 16.7 100 99.6 X 1.13 +/- 4.38 33.7 99.9 95.2 Burn 10.7 +/- 47.1 410 67.2 64.7 Write 1.17 +/- 10.9 417 98.2 94.8 Read 0.127 +/- 1.13 16.8 100 99.6 Compile 8.6 +/- 32.6 200 70.7 63.6 Memload 0.512 +/- 3.32 83.5 99.7 97.6 --- Benchmarking simulated cpu of X in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met None 2.2 +/- 7.75 51 81.9 74.9 Video 15.8 +/- 29.4 81 33 23.9 Burn 74.1 +/- 124 406 18.5 9.57 Write 4.6 +/- 14 86 55 48.5 Read 1.75 +/- 5.16 26 80.7 73.1 Compile 71.2 +/- 124 468 21.8 12.2 Memload 2.95 +/- 9.31 70 75.6 69.1 --- Benchmarking simulated cpu of Gaming in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU None 13.7 +/- 17.9 56.4 87.9 Video 74.6 +/- 75.4 98.5 57.3 X 68.2 +/- 76.1 128 59.4 Burn 515 +/- 526 735 16.3 Write 35.5 +/- 58.3 505 73.8 Read 15.7 +/- 17.8 45.8 86.4 Compile 436 +/- 453 863 18.7 Memload 22.3 +/- 30.1 227 81.8 4. 2.6.16-rc6-mm2x with max throttling. 
Using 1975961 loops per ms, running every load for 30 seconds Benchmarking kernel 2.6.16-rc6-mm2x-smp at datestamp 200603220938 --- Benchmarking simulated cpu of Audio in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met None 0.035 +/- 0.118 2.01 100 100 Video 0.043 +/- 0.231 5.02 100 100 X 0.109 +/- 0.737 12.3 100 100 Burn 0.072 +/- 0.574 9.78 100 100 Write 0.11 +/- 0.367 4.14 100 100 Read 0.052 +/- 0.141 2.02 100 100 Compile 0.5 +/- 4.84 112 99.8 99.8 Memload 0.093 +/- 0.461 9.13 100 100 --- Benchmarking simulated cpu of Video in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met None 0.187 +/- 1.59 16.7 100 99.1 X 2.4 +/- 6.26 32.8 99.9 90 Burn 59.7 +/- 130 478 27.1 23.8 Write 2.08 +/- 9.24 208 98.3 90.5 Read 0.154 +/- 1.3 18.8 100 99.4 Compile 57.9 +/- 130 714 28.3 22.4 Memload 0.743 +/- 3.7 66.7 99.8 96.3 --- Benchmarking simulated cpu of X in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met None 1.73 +/- 6.46 42 74.4 70 Video 13.3 +/- 24.5 74 39.8 29.2 Burn 142 +/- 206 579 9.11 4.69 Write 4.51 +/- 14.1 88.4 61.4 55.5 Read 1.38 +/- 4.38 24 85.3 78.3 Compile 126 +/- 190 619 12.4 6.51 Memload 3.61 +/- 11.7 70 61.7 55.8 --- Benchmarking simulated cpu of Gaming in the presence of simulated --- Load Latency +/- SD (ms) Max Latency % Desired CPU None 12.9 +/- 16.5 67.6 88.6 Video 67.7 +/- 69 97.3 59.6 X 70.7 +/- 77.7 130 58.6 Burn 355 +/- 367 625 22 Write 35.6 +/- 61.3 545 73.8 Read 23.1 +/- 28.4 115 81.3 Compile 467 +/- 485 793 17.6 Memload 25.6 +/- 32.9 138 79.6 ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [interbench numbers] Re: interactive task starvation 2006-03-22 12:14 ` [interbench numbers] " Mike Galbraith @ 2006-03-22 20:27 ` Con Kolivas 2006-03-23 3:22 ` Mike Galbraith 0 siblings, 1 reply; 112+ messages in thread From: Con Kolivas @ 2006-03-22 20:27 UTC (permalink / raw) To: Mike Galbraith Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter, Peter Williams On Wednesday 22 March 2006 23:14, Mike Galbraith wrote: > Greetings, > > I was asked to do some interbench runs, with various throttle settings, > see below. I'll not attempt to interpret results, only present raw data > for others to examine. > > Tested throttling patches version is V24, because while I was compiling > 2.6.16-rc6-mm2 in preparation for comparison, I found I'd introduced an > SMP buglet in V23. Something good came from the added testing whether > the results are informative or not :) Thanks! I wonder why the results are affected even without any throttling settings but just patched in? Specifically I'm talking about deadlines met with video being sensitive to this. Were there any other config differences between the tests? Changing HZ would invalidate the results for example. Comments? Cheers, Con ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [interbench numbers] Re: interactive task starvation
2006-03-22 20:27 ` Con Kolivas
@ 2006-03-23  3:22 ` Mike Galbraith
2006-03-23  5:43 ` Con Kolivas
0 siblings, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-23  3:22 UTC (permalink / raw)
To: Con Kolivas
Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter, Peter Williams

On Thu, 2006-03-23 at 07:27 +1100, Con Kolivas wrote:
> On Wednesday 22 March 2006 23:14, Mike Galbraith wrote:
> > Greetings,
> >
> > I was asked to do some interbench runs, with various throttle settings,
> > see below. I'll not attempt to interpret results, only present raw data
> > for others to examine.
> >
> > Tested throttling patches version is V24, because while I was compiling
> > 2.6.16-rc6-mm2 in preparation for comparison, I found I'd introduced an
> > SMP buglet in V23. Something good came from the added testing whether
> > the results are informative or not :)
>
> Thanks!
>
> I wonder why the results are affected even without any throttling settings
> but just patched in? Specifically I'm talking about deadlines met with video
> being sensitive to this. Were there any other config differences between the
> tests? Changing HZ would invalidate the results for example. Comments?

I wondered the same. The only difference then is the lower idle sleep
prio, tighter timeslice enforcement, and the SMP buglet fix for now <
p->timestamp due to SMP rounding. Configs are identical.

	-Mike

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [interbench numbers] Re: interactive task starvation
2006-03-23  3:22 ` Mike Galbraith
@ 2006-03-23  5:43 ` Con Kolivas
2006-03-23  5:53 ` Mike Galbraith
0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-23  5:43 UTC (permalink / raw)
To: Mike Galbraith
Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter, Peter Williams

On Thu, 23 Mar 2006 02:22 pm, Mike Galbraith wrote:
> On Thu, 2006-03-23 at 07:27 +1100, Con Kolivas wrote:
> > I wonder why the results are affected even without any throttling
> > settings but just patched in? Specifically I'm talking about deadlines
> > met with video being sensitive to this. Were there any other config
> > differences between the tests? Changing HZ would invalidate the results
> > for example. Comments?
>
> I wondered the same. The only difference then is the lower idle sleep
> prio, tighter timeslice enforcement, and the SMP buglet fix for now <
> p->timestamp due to SMP rounding. Configs are identical.

Ok well if we're going to run with this set of changes then we need to assess
the effect of each change, and splitting them up into separate patches would
be appropriate normally anyway. That will allow us to track down which
particular patch causes it. That won't mean we will turn down the change
based on that one result, though, it will just help us understand it better.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [interbench numbers] Re: interactive task starvation
2006-03-23  5:43 ` Con Kolivas
@ 2006-03-23  5:53 ` Mike Galbraith
2006-03-23 11:07 ` Mike Galbraith
0 siblings, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-23  5:53 UTC (permalink / raw)
To: Con Kolivas
Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter, Peter Williams

On Thu, 2006-03-23 at 16:43 +1100, Con Kolivas wrote:
> On Thu, 23 Mar 2006 02:22 pm, Mike Galbraith wrote:
> > On Thu, 2006-03-23 at 07:27 +1100, Con Kolivas wrote:
> > > I wonder why the results are affected even without any throttling
> > > settings but just patched in? Specifically I'm talking about deadlines
> > > met with video being sensitive to this. Were there any other config
> > > differences between the tests? Changing HZ would invalidate the results
> > > for example. Comments?
> >
> > I wondered the same. The only difference then is the lower idle sleep
> > prio, tighter timeslice enforcement, and the SMP buglet fix for now <
> > p->timestamp due to SMP rounding. Configs are identical.
>
> Ok well if we're going to run with this set of changes then we need to assess
> the effect of each change, and splitting them up into separate patches would
> be appropriate normally anyway. That will allow us to track down which
> particular patch causes it. That won't mean we will turn down the change
> based on that one result, though, it will just help us understand it better.

I'm investigating now.

	-Mike

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [interbench numbers] Re: interactive task starvation
2006-03-23  5:53 ` Mike Galbraith
@ 2006-03-23 11:07 ` Mike Galbraith
2006-03-24  0:21 ` Con Kolivas
0 siblings, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-23 11:07 UTC (permalink / raw)
To: Con Kolivas
Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter, Peter Williams

On Thu, 2006-03-23 at 06:53 +0100, Mike Galbraith wrote:
> On Thu, 2006-03-23 at 16:43 +1100, Con Kolivas wrote:
> > On Thu, 23 Mar 2006 02:22 pm, Mike Galbraith wrote:
> > > On Thu, 2006-03-23 at 07:27 +1100, Con Kolivas wrote:
> > > > I wonder why the results are affected even without any throttling
> > > > settings but just patched in? Specifically I'm talking about deadlines
> > > > met with video being sensitive to this. Were there any other config
> > > > differences between the tests? Changing HZ would invalidate the results
> > > > for example. Comments?
> > >
> > > I wondered the same. The only difference then is the lower idle sleep
> > > prio, tighter timeslice enforcement, and the SMP buglet fix for now <
> > > p->timestamp due to SMP rounding. Configs are identical.
> >
> > Ok well if we're going to run with this set of changes then we need to assess
> > the effect of each change, and splitting them up into separate patches would
> > be appropriate normally anyway. That will allow us to track down which
> > particular patch causes it. That won't mean we will turn down the change
> > based on that one result, though, it will just help us understand it better.
>
> I'm investigating now.

Nothing conclusive. Some of the difference may be because interbench
has a dependency on the idle sleep path popping tasks in at prio 16
instead of 18. Some of it may be because I'm not restricting IO; doing
that makes a bit of difference. Some of it is definitely plain old
jitter.

Six hours is long enough. I'm all done chasing interbench numbers.
	-Mike

virgin

--- Benchmarking simulated cpu of Video in the presence of simulated ---
Load      Latency +/- SD (ms)   Max Latency   % Desired CPU   % Deadlines Met
None        0.031 +/- 0.396       16.7           100            99.9
X           0.722 +/- 3.35        30.7           100              97
Burn        0.531 +/- 7.42         246          99.1              98
Write       0.302 +/- 2.31        40.4          99.9            98.5
Read        0.092 +/- 1.11        32.9          99.9            99.7
Compile     0.428 +/- 2.77        36.3          99.9            97.9
Memload     0.235 +/- 3.3          104          99.5            99.1

throttle patches with throttling disabled

--- Benchmarking simulated cpu of Video in the presence of simulated ---
Load      Latency +/- SD (ms)   Max Latency   % Desired CPU   % Deadlines Met
None        0.185 +/- 1.6         18.8           100            99.1
X            1.27 +/- 4.47          27           100            94.3
Burn         1.57 +/- 13.3         345          98.1              93
Write       0.819 +/- 3.76        34.7          99.9              96
Read        0.301 +/- 2.05        18.7           100            98.5
Compile      4.22 +/- 12.9         233          92.4            80.2
Memload     0.624 +/- 3.46        66.7          99.6              97

minus idle sleep

--- Benchmarking simulated cpu of Video in the presence of simulated ---
Load      Latency +/- SD (ms)   Max Latency   % Desired CPU   % Deadlines Met
None        0.222 +/- 1.82        16.8           100            98.8
X            1.02 +/- 3.9         30.7           100            95.7
Burn        0.208 +/- 3.67         141          99.8            99.3
Write       0.755 +/- 3.62        37.2          99.9            96.4
Read        0.265 +/- 1.94        16.9           100            98.6
Compile      2.16 +/- 15.2         333          96.7            90.7
Memload     0.723 +/- 3.5         37.4          99.8            96.3

minus don't restrict IO

--- Benchmarking simulated cpu of Video in the presence of simulated ---
Load      Latency +/- SD (ms)   Max Latency   % Desired CPU   % Deadlines Met
None        0.226 +/- 1.82        16.8           100            98.8
X            1.38 +/- 4.68        49.4          99.9            93.9
Burn        0.513 +/- 9.62         339          98.8            98.4
Write       0.418 +/- 2.7         30.8          99.9            97.9
Read        0.565 +/- 2.99        16.7           100            96.8
Compile      1.05 +/- 13.6         545          99.1            95.1
Memload     0.345 +/- 3.23        80.5          99.8            98.5

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [interbench numbers] Re: interactive task starvation
2006-03-23 11:07 ` Mike Galbraith
@ 2006-03-24  0:21 ` Con Kolivas
2006-03-24  5:02 ` Mike Galbraith
0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-24  0:21 UTC (permalink / raw)
To: Mike Galbraith
Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter, Peter Williams

On Thursday 23 March 2006 22:07, Mike Galbraith wrote:
> Nothing conclusive. Some of the difference may be because interbench
> has a dependency on the idle sleep path popping tasks in at prio 16
> instead of 18. Some of it may be because I'm not restricting IO; doing
> that makes a bit of difference. Some of it is definitely plain old
> jitter.

Thanks for those! Just a clarification please

> virgin

I assume 2.6.16-rc6-mm2 ?

> throttle patches with throttling disabled

With your full patchset but no throttling enabled?

> minus idle sleep

Full patchset -throttling-idlesleep ?

> minus don't restrict IO

Full patchset -throttling-idlesleep-restrictio ?

Can you please email the latest separate patches so we can see them in
isolation? I promise I won't ask for any more interbench numbers any time
soon :)

Thanks!

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [interbench numbers] Re: interactive task starvation
2006-03-24  0:21 ` Con Kolivas
@ 2006-03-24  5:02 ` Mike Galbraith
2006-03-24  5:04 ` Con Kolivas
0 siblings, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-24  5:02 UTC (permalink / raw)
To: Con Kolivas
Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter, Peter Williams

On Fri, 2006-03-24 at 11:21 +1100, Con Kolivas wrote:
> On Thursday 23 March 2006 22:07, Mike Galbraith wrote:
> > Nothing conclusive. Some of the difference may be because interbench
> > has a dependency on the idle sleep path popping tasks in at prio 16
> > instead of 18. Some of it may be because I'm not restricting IO; doing
> > that makes a bit of difference. Some of it is definitely plain old
> > jitter.
>
> Thanks for those! Just a clarification please
>
> > virgin
>
> I assume 2.6.16-rc6-mm2 ?

Yes.

> > throttle patches with throttling disabled
>
> With your full patchset but no throttling enabled?

Yes.

> > minus idle sleep
>
> Full patchset -throttling-idlesleep ?

Yes, using stock idle sleep bits.

> > minus don't restrict IO
>
> Full patchset -throttling-idlesleep-restrictio ?

Yes.

> Can you please email the latest separate patches so we can see them in
> isolation? I promise I won't ask for any more interbench numbers any time
> soon :)

I've separated the buglet fix parts from the rest, so there are four
patches instead of two. I've also hidden the knobs, though for the
testing phase at least, I personally think it would be better to leave
the knobs there for people to twiddle. Something Willy said indicated
to me that 'credit' would be more palatable than 'grace', so I've
renamed and updated comments to match. I think it might look better,
but can't know since 'grace' was perfectly fine for my taste buds ;-)

I'll post as soon as I do some more cleanup pondering and verification.

	-Mike

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [interbench numbers] Re: interactive task starvation
2006-03-24  5:02 ` Mike Galbraith
@ 2006-03-24  5:04 ` Con Kolivas
0 siblings, 0 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-24  5:04 UTC (permalink / raw)
To: Mike Galbraith
Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter, Peter Williams

On Friday 24 March 2006 16:02, Mike Galbraith wrote:
> I've separated the buglet fix parts from the rest, so there are four
> patches instead of two. I've also hidden the knobs, though for the
> testing phase at least, I personally think it would be better to leave
> the knobs there for people to twiddle. Something Willy said indicated
> to me that 'credit' would be more palatable than 'grace', so I've
> renamed and updated comments to match. I think it might look better,
> but can't know since 'grace' was perfectly fine for my taste buds ;-)
>
> I'll post as soon as I do some more cleanup pondering and verification.

Great. I suggest making the base patch have the values hard coded as #defines,
and then have a patch on top that turns those into userspace tunables we can
hand tune while in -mm, which can then be dropped if/when merged upstream.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread
* [PATCH] sched: activate SCHED BATCH expired
2006-03-17  9:06 ` Ingo Molnar
2006-03-17 10:46 ` interactive task starvation Mike Galbraith
@ 2006-03-17 12:38 ` Con Kolivas
2006-03-17 13:07 ` Ingo Molnar
2006-03-17 13:26 ` Nick Piggin
1 sibling, 2 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-17 12:38 UTC (permalink / raw)
To: Ingo Molnar; +Cc: ck, Andrew Morton, linux-kernel

On Friday 17 March 2006 20:06, Ingo Molnar wrote:
> * Con Kolivas <kernel@kolivas.org> wrote:
> > Thinking some more on this I wonder if SCHED_BATCH isn't a strong
> > enough scheduling hint if it's not suitable for such an application.
> > Ingo do you think we could make SCHED_BATCH tasks always wake up on
> > the expired array?
>
> yep, i think that's a good idea. In the worst case the starvation
> timeout should kick in.

Ok here's a patch that does exactly that. Without an "inline" hint, gcc 4.1.0
chooses not to inline this function. I can't say I have a strong opinion
about whether it should be inlined or not (93 bytes larger inlined), so I've
decided not to given the current trend.

Cheers,
Con
---
To increase the strength of SCHED_BATCH as a scheduling hint we can activate
batch tasks on the expired array since by definition they are latency
insensitive tasks.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 include/linux/sched.h |    1 +
 kernel/sched.c        |    9 ++++++---
 2 files changed, 7 insertions(+), 3 deletions(-)

Index: linux-2.6.16-rc6-mm1/include/linux/sched.h
===================================================================
--- linux-2.6.16-rc6-mm1.orig/include/linux/sched.h	2006-03-13 20:12:22.000000000 +1100
+++ linux-2.6.16-rc6-mm1/include/linux/sched.h	2006-03-17 23:08:31.000000000 +1100
@@ -485,6 +485,7 @@ struct signal_struct {
 #define MAX_PRIO	(MAX_RT_PRIO + 40)
 
 #define rt_task(p)	(unlikely((p)->prio < MAX_RT_PRIO))
+#define batch_task(p)	(unlikely((p)->policy == SCHED_BATCH))
 
 /*
  * Some day this will be a full-fledged user tracking system..
Index: linux-2.6.16-rc6-mm1/kernel/sched.c
===================================================================
--- linux-2.6.16-rc6-mm1.orig/kernel/sched.c	2006-03-13 20:12:15.000000000 +1100
+++ linux-2.6.16-rc6-mm1/kernel/sched.c	2006-03-17 23:08:12.000000000 +1100
@@ -737,9 +737,12 @@ static inline void dec_nr_running(task_t
 /*
  * __activate_task - move a task to the runqueue.
  */
-static inline void __activate_task(task_t *p, runqueue_t *rq)
+static void __activate_task(task_t *p, runqueue_t *rq)
 {
-	enqueue_task(p, rq->active);
+	if (batch_task(p))
+		enqueue_task(p, rq->expired);
+	else
+		enqueue_task(p, rq->active);
 	inc_nr_running(p, rq);
 }
 
@@ -758,7 +761,7 @@ static int recalc_task_prio(task_t *p, u
 	unsigned long long __sleep_time = now - p->timestamp;
 	unsigned long sleep_time;
 
-	if (unlikely(p->policy == SCHED_BATCH))
+	if (batch_task(p))
 		sleep_time = 0;
 	else {
 		if (__sleep_time > NS_MAX_SLEEP_AVG)

^ permalink raw reply	[flat|nested] 112+ messages in thread
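For readers following along: SCHED_BATCH is a hint a program requests for itself
from userspace via sched_setscheduler(). The fragment below is a minimal,
illustrative sketch of opting a process into batch scheduling; it is not code
from this thread, and depending on the glibc in use SCHED_BATCH may need
_GNU_SOURCE (or a manual define) to be visible.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	/* SCHED_BATCH tasks run at static priority 0, like SCHED_OTHER. */
	struct sched_param param = { .sched_priority = 0 };

	/* pid 0 means "the calling process" */
	if (sched_setscheduler(0, SCHED_BATCH, &param) == -1) {
		perror("sched_setscheduler");
		return 1;
	}

	/* ... bulk, latency-insensitive work goes here ... */
	return 0;
}

With the patch above applied, a task that has done this is activated on the
expired array when it wakes, rather than on the active array.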
* Re: [PATCH] sched: activate SCHED BATCH expired
2006-03-17 12:38 ` [PATCH] sched: activate SCHED BATCH expired Con Kolivas
@ 2006-03-17 13:07 ` Ingo Molnar
0 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2006-03-17 13:07 UTC (permalink / raw)
To: Con Kolivas; +Cc: ck, Andrew Morton, linux-kernel

* Con Kolivas <kernel@kolivas.org> wrote:

> To increase the strength of SCHED_BATCH as a scheduling hint we can activate
> batch tasks on the expired array since by definition they are latency
> insensitive tasks.
>
> Signed-off-by: Con Kolivas <kernel@kolivas.org>

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [PATCH] sched: activate SCHED BATCH expired
2006-03-17 12:38 ` [PATCH] sched: activate SCHED BATCH expired Con Kolivas
2006-03-17 13:07 ` Ingo Molnar
@ 2006-03-17 13:26 ` Nick Piggin
2006-03-17 13:36 ` Con Kolivas
1 sibling, 1 reply; 112+ messages in thread
From: Nick Piggin @ 2006-03-17 13:26 UTC (permalink / raw)
To: Con Kolivas; +Cc: Ingo Molnar, ck, Andrew Morton, linux-kernel

Con Kolivas wrote:
>
> Ok here's a patch that does exactly that. Without an "inline" hint, gcc 4.1.0
> chooses not to inline this function. I can't say I have a strong opinion
> about whether it should be inlined or not (93 bytes larger inlined), so I've
> decided not to given the current trend.
>

Sigh, sacrifice for the common case! :P

> Index: linux-2.6.16-rc6-mm1/kernel/sched.c
> ===================================================================
> --- linux-2.6.16-rc6-mm1.orig/kernel/sched.c	2006-03-13 20:12:15.000000000 +1100
> +++ linux-2.6.16-rc6-mm1/kernel/sched.c	2006-03-17 23:08:12.000000000 +1100
> @@ -737,9 +737,12 @@ static inline void dec_nr_running(task_t
>  /*
>   * __activate_task - move a task to the runqueue.
>   */
> -static inline void __activate_task(task_t *p, runqueue_t *rq)
> +static void __activate_task(task_t *p, runqueue_t *rq)
>  {
> -	enqueue_task(p, rq->active);
> +	if (batch_task(p))
> +		enqueue_task(p, rq->expired);
> +	else
> +		enqueue_task(p, rq->active);
>  	inc_nr_running(p, rq);
>  }
>

I prefer:

	prio_array_t *target = rq->active;
	if (batch_task(p))
		target = rq->expired;
	enqueue_task(p, target);

Because gcc can use things like predicated instructions for it.
But perhaps it is smart enough these days to recognise this?
At least in the past I have seen it start using cmov after doing
such a conversion.

At any rate, I think it looks nicer as well. IMO, of course.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [PATCH] sched: activate SCHED BATCH expired
2006-03-17 13:26 ` Nick Piggin
@ 2006-03-17 13:36 ` Con Kolivas
2006-03-17 13:46 ` Nick Piggin
2006-03-17 13:47 ` [ck] " Andreas Mohr
0 siblings, 2 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-17 13:36 UTC (permalink / raw)
To: Nick Piggin; +Cc: Ingo Molnar, ck, Andrew Morton, linux-kernel

On Saturday 18 March 2006 00:26, Nick Piggin wrote:
> Con Kolivas wrote:
> > -static inline void __activate_task(task_t *p, runqueue_t *rq)
> > +static void __activate_task(task_t *p, runqueue_t *rq)
> >  {
> > -	enqueue_task(p, rq->active);
> > +	if (batch_task(p))
> > +		enqueue_task(p, rq->expired);
> > +	else
> > +		enqueue_task(p, rq->active);
> >  	inc_nr_running(p, rq);
> >  }
>
> I prefer:
>
> 	prio_array_t *target = rq->active;
> 	if (batch_task(p))
> 		target = rq->expired;
> 	enqueue_task(p, target);
>
> Because gcc can use things like predicated instructions for it.
> But perhaps it is smart enough these days to recognise this?
> At least in the past I have seen it start using cmov after doing
> such a conversion.
>
> At any rate, I think it looks nicer as well. IMO, of course.

Well on my one boring architecture here is a before and after, gcc 4.1.0 with
optimise for size kernel config:

0xb01127da <__activate_task+0>:		push   %ebp
0xb01127db <__activate_task+1>:		mov    %esp,%ebp
0xb01127dd <__activate_task+3>:		push   %esi
0xb01127de <__activate_task+4>:		push   %ebx
0xb01127df <__activate_task+5>:		mov    %eax,%esi
0xb01127e1 <__activate_task+7>:		mov    %edx,%ebx
0xb01127e3 <__activate_task+9>:		cmpl   $0x3,0x58(%eax)
0xb01127e7 <__activate_task+13>:	jne    0xb01127ee <__activate_task+20>
0xb01127e9 <__activate_task+15>:	mov    0x44(%edx),%edx
0xb01127ec <__activate_task+18>:	jmp    0xb01127f1 <__activate_task+23>
0xb01127ee <__activate_task+20>:	mov    0x40(%edx),%edx
0xb01127f1 <__activate_task+23>:	mov    %esi,%eax
0xb01127f3 <__activate_task+25>:	call   0xb01124bb <enqueue_task>
0xb01127f8 <__activate_task+30>:	incl   0x8(%ebx)
0xb01127fb <__activate_task+33>:	mov    0x18(%esi),%eax
0xb01127fe <__activate_task+36>:	add    %eax,0xc(%ebx)
0xb0112801 <__activate_task+39>:	pop    %ebx
0xb0112802 <__activate_task+40>:	pop    %esi
0xb0112803 <__activate_task+41>:	pop    %ebp
0xb0112804 <__activate_task+42>:	ret

Your version:

0xb01127da <__activate_task+0>:		push   %ebp
0xb01127db <__activate_task+1>:		mov    %esp,%ebp
0xb01127dd <__activate_task+3>:		push   %esi
0xb01127de <__activate_task+4>:		push   %ebx
0xb01127df <__activate_task+5>:		mov    %eax,%esi
0xb01127e1 <__activate_task+7>:		mov    %edx,%ebx
0xb01127e3 <__activate_task+9>:		mov    0x40(%edx),%edx
0xb01127e6 <__activate_task+12>:	cmpl   $0x3,0x58(%eax)
0xb01127ea <__activate_task+16>:	jne    0xb01127ef <__activate_task+21>
0xb01127ec <__activate_task+18>:	mov    0x44(%ebx),%edx
0xb01127ef <__activate_task+21>:	mov    %esi,%eax
0xb01127f1 <__activate_task+23>:	call   0xb01124bb <enqueue_task>
0xb01127f6 <__activate_task+28>:	incl   0x8(%ebx)
0xb01127f9 <__activate_task+31>:	mov    0x18(%esi),%eax
0xb01127fc <__activate_task+34>:	add    %eax,0xc(%ebx)
0xb01127ff <__activate_task+37>:	pop    %ebx
0xb0112800 <__activate_task+38>:	pop    %esi
0xb0112801 <__activate_task+39>:	pop    %ebp
0xb0112802 <__activate_task+40>:	ret

I'm not attached to the style, just the feature. If you think it's warranted
I'll change it.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [PATCH] sched: activate SCHED BATCH expired
2006-03-17 13:36 ` Con Kolivas
@ 2006-03-17 13:46 ` Nick Piggin
2006-03-17 13:51 ` Nick Piggin
2006-03-17 14:11 ` Con Kolivas
0 siblings, 2 replies; 112+ messages in thread
From: Nick Piggin @ 2006-03-17 13:46 UTC (permalink / raw)
To: Con Kolivas; +Cc: Ingo Molnar, ck, Andrew Morton, linux-kernel

Con Kolivas wrote:
> On Saturday 18 March 2006 00:26, Nick Piggin wrote:
>
>> Con Kolivas wrote:
>>
>>> -static inline void __activate_task(task_t *p, runqueue_t *rq)
>>> +static void __activate_task(task_t *p, runqueue_t *rq)
>>>  {
>>> -	enqueue_task(p, rq->active);
>>> +	if (batch_task(p))
>>> +		enqueue_task(p, rq->expired);
>>> +	else
>>> +		enqueue_task(p, rq->active);
>>>  	inc_nr_running(p, rq);
>>>  }
>>
>> I prefer:
>>
>> 	prio_array_t *target = rq->active;
>> 	if (batch_task(p))
>> 		target = rq->expired;
>> 	enqueue_task(p, target);
>>
>> Because gcc can use things like predicated instructions for it.
>> But perhaps it is smart enough these days to recognise this?
>> At least in the past I have seen it start using cmov after doing
>> such a conversion.
>>
>> At any rate, I think it looks nicer as well. IMO, of course.
>
> Well on my one boring architecture here is a before and after, gcc 4.1.0 with
> optimise for size kernel config:
>
> I'm not attached to the style, just the feature. If you think it's warranted
> I'll change it.
>

I guess it isn't doing the cmov because it doesn't want to do the
extra load in the common case, which is fair enough (are you compiling
for a pentiumpro+, without generic x86 support? what about if you
turn off optimise for size?)

At least other architectures might be able to make better use of it,
and I agree even for i386 the code looks better (and slightly smaller).

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [PATCH] sched: activate SCHED BATCH expired
2006-03-17 13:46 ` Nick Piggin
@ 2006-03-17 13:51 ` Nick Piggin
0 siblings, 0 replies; 112+ messages in thread
From: Nick Piggin @ 2006-03-17 13:51 UTC (permalink / raw)
To: Con Kolivas; +Cc: Ingo Molnar, ck, Andrew Morton, linux-kernel

Nick Piggin wrote:
> Con Kolivas wrote:
>> I'm not attached to the style, just the feature. If you think it's
>> warranted I'll change it.
>>
>
> At least other architectures might be able to make better use of it,
> and I agree even for i386 the code looks better (and slightly smaller).
>

s/I agree/I think/

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [PATCH] sched: activate SCHED BATCH expired
2006-03-17 13:46 ` Nick Piggin
2006-03-17 13:51 ` Nick Piggin
@ 2006-03-17 14:11 ` Con Kolivas
2006-03-17 14:59 ` Ingo Molnar
1 sibling, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-17 14:11 UTC (permalink / raw)
To: Nick Piggin; +Cc: Ingo Molnar, ck, Andrew Morton, linux-kernel

On Saturday 18 March 2006 00:46, Nick Piggin wrote:
> I guess it isn't doing the cmov because it doesn't want to do the
> extra load in the common case, which is fair enough (are you compiling
> for a pentiumpro+, without generic x86 support?

For pentium4 with no generic support.

> what about if you
> turn off optimise for size?)

Dunno, sleep is taking me...

> At least other architectures might be able to make better use of it,
> and I agree even for i386 the code looks better (and slightly smaller).

Good enough for me. Here's a respin, thanks!

Cheers,
Con
---
To increase the strength of SCHED_BATCH as a scheduling hint we can activate
batch tasks on the expired array since by definition they are latency
insensitive tasks.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 include/linux/sched.h |    1 +
 kernel/sched.c        |   10 +++++++---
 2 files changed, 8 insertions(+), 3 deletions(-)

Index: linux-2.6.16-rc6-mm1/include/linux/sched.h
===================================================================
--- linux-2.6.16-rc6-mm1.orig/include/linux/sched.h	2006-03-13 20:12:22.000000000 +1100
+++ linux-2.6.16-rc6-mm1/include/linux/sched.h	2006-03-17 23:08:31.000000000 +1100
@@ -485,6 +485,7 @@ struct signal_struct {
 #define MAX_PRIO	(MAX_RT_PRIO + 40)
 
 #define rt_task(p)	(unlikely((p)->prio < MAX_RT_PRIO))
+#define batch_task(p)	(unlikely((p)->policy == SCHED_BATCH))
 
 /*
  * Some day this will be a full-fledged user tracking system..
Index: linux-2.6.16-rc6-mm1/kernel/sched.c
===================================================================
--- linux-2.6.16-rc6-mm1.orig/kernel/sched.c	2006-03-13 20:12:15.000000000 +1100
+++ linux-2.6.16-rc6-mm1/kernel/sched.c	2006-03-18 01:05:02.000000000 +1100
@@ -737,9 +737,13 @@ static inline void dec_nr_running(task_t
 /*
  * __activate_task - move a task to the runqueue.
  */
-static inline void __activate_task(task_t *p, runqueue_t *rq)
+static void __activate_task(task_t *p, runqueue_t *rq)
 {
-	enqueue_task(p, rq->active);
+	prio_array_t *target = rq->active;
+
+	if (batch_task(p))
+		target = rq->expired;
+	enqueue_task(p, target);
 	inc_nr_running(p, rq);
 }
 
@@ -758,7 +762,7 @@ static int recalc_task_prio(task_t *p, u
 	unsigned long long __sleep_time = now - p->timestamp;
 	unsigned long sleep_time;
 
-	if (unlikely(p->policy == SCHED_BATCH))
+	if (batch_task(p))
 		sleep_time = 0;
 	else {
 		if (__sleep_time > NS_MAX_SLEEP_AVG)

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [PATCH] sched: activate SCHED BATCH expired
2006-03-17 14:11 ` Con Kolivas
@ 2006-03-17 14:59 ` Ingo Molnar
0 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2006-03-17 14:59 UTC (permalink / raw)
To: Con Kolivas; +Cc: Nick Piggin, ck, Andrew Morton, linux-kernel

* Con Kolivas <kernel@kolivas.org> wrote:

> Good enough for me. Here's a respin, thanks!

> Signed-off-by: Con Kolivas <kernel@kolivas.org>

Still-Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [ck] Re: [PATCH] sched: activate SCHED BATCH expired
2006-03-17 13:36 ` Con Kolivas
2006-03-17 13:46 ` Nick Piggin
@ 2006-03-17 13:47 ` [ck] " Andreas Mohr
2006-03-17 13:59 ` Con Kolivas
2006-03-17 14:06 ` Nick Piggin
1 sibling, 2 replies; 112+ messages in thread
From: Andreas Mohr @ 2006-03-17 13:47 UTC (permalink / raw)
To: Con Kolivas; +Cc: Nick Piggin, ck, Andrew Morton, linux-kernel

Hi,

On Sat, Mar 18, 2006 at 12:36:10AM +1100, Con Kolivas wrote:
> I'm not attached to the style, just the feature. If you think it's warranted
> I'll change it.

Seconded.

An even nicer way (this solution seems somewhat asymmetric) than

	prio_array_t *target = rq->active;
	if (batch_task(p))
		target = rq->expired;
	enqueue_task(p, target);

may be

	prio_array_t *target;
	if (batch_task(p))
		target = rq->expired;
	else
		target = rq->active;
	enqueue_task(p, target);

and thus (but this coding style may be considered overloaded):

	prio_array_t *target;
	target = batch_task(p) ? rq->expired : rq->active;
	enqueue_task(p, target);

But this discussion is clearly growing out of control now ;)

Andreas Mohr

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [ck] Re: [PATCH] sched: activate SCHED BATCH expired
2006-03-17 13:47 ` [ck] " Andreas Mohr
@ 2006-03-17 13:59 ` Con Kolivas
0 siblings, 0 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-17 13:59 UTC (permalink / raw)
To: Andreas Mohr; +Cc: Nick Piggin, ck, Andrew Morton, linux-kernel

On Saturday 18 March 2006 00:47, Andreas Mohr wrote:
> Hi,
>
> On Sat, Mar 18, 2006 at 12:36:10AM +1100, Con Kolivas wrote:
> > I'm not attached to the style, just the feature. If you think it's
> > warranted I'll change it.
>
> Seconded.
>
> An even nicer way (this solution seems somewhat asymmetric) than
>
> 	prio_array_t *target = rq->active;
> 	if (batch_task(p))
> 		target = rq->expired;
> 	enqueue_task(p, target);
>
> may be
>
> 	prio_array_t *target;
> 	if (batch_task(p))
> 		target = rq->expired;
> 	else
> 		target = rq->active;
> 	enqueue_task(p, target);

Well I hadn't quite gone to bed so I tried yours for grins too, and
interestingly it produced code identical to my original version.

> But this discussion is clearly growing out of control now ;)

I prefer a month's worth of this over a single more email about
cd-fscking-record's amazing perfection.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [ck] Re: [PATCH] sched: activate SCHED BATCH expired
2006-03-17 13:47 ` [ck] " Andreas Mohr
2006-03-17 13:59 ` Con Kolivas
@ 2006-03-17 14:06 ` Nick Piggin
1 sibling, 0 replies; 112+ messages in thread
From: Nick Piggin @ 2006-03-17 14:06 UTC (permalink / raw)
To: Andreas Mohr; +Cc: Con Kolivas, ck, Andrew Morton, linux-kernel

Andreas Mohr wrote:
> Hi,
>
> On Sat, Mar 18, 2006 at 12:36:10AM +1100, Con Kolivas wrote:
>
>> I'm not attached to the style, just the feature. If you think it's warranted
>> I'll change it.
>
> Seconded.
>
> An even nicer way (this solution seems somewhat asymmetric) than
>
> 	prio_array_t *target = rq->active;
> 	if (batch_task(p))
> 		target = rq->expired;
> 	enqueue_task(p, target);
>
> may be
>
> 	prio_array_t *target;
> 	if (batch_task(p))
> 		target = rq->expired;
> 	else
> 		target = rq->active;
> 	enqueue_task(p, target);
>

It doesn't actually generate the same code here (I guess it is good
that gcc gives us this control).

I think my way is (ever so slightly) better because it gets the load
going earlier and comprises one less conditional jump (admittedly in
the slowpath).

You'd probably never be able to measure a difference between any of
the variants, however ;)

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

^ permalink raw reply	[flat|nested] 112+ messages in thread
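For anyone who wants to reproduce the codegen comparison above outside the
kernel, here is a small standalone sketch. The struct and the enqueue() stub
are stand-ins rather than the kernel's prio_array_t/runqueue_t types, and it
is only meant to be compiled to assembly (for example with "gcc -Os -S
compare.c", or -O2) so the two functions can be diffed.

/* Stand-in types; not the kernel's definitions. */
struct prio_array;

struct runqueue {
	struct prio_array *active;
	struct prio_array *expired;
};

extern void enqueue(struct prio_array *array, int task);

/* Branch around two separate calls (the first version in the thread). */
void activate_branchy(struct runqueue *rq, int task, int batch)
{
	if (batch)
		enqueue(rq->expired, task);
	else
		enqueue(rq->active, task);
}

/* Select the target array first, then make one call (Nick's preferred form). */
void activate_select(struct runqueue *rq, int task, int batch)
{
	struct prio_array *target = rq->active;

	if (batch)
		target = rq->expired;
	enqueue(target, task);
}

Whether the compiler turns the second form into a conditional move depends on
the target CPU and flags, which is consistent with the mixed results reported
in the thread.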
* Re: [ck] Re: [PATCH] mm: yield during swap prefetching
2006-03-07 23:26 ` Andrew Morton
2006-03-07 23:32 ` Con Kolivas
@ 2006-03-08  8:48 ` Andreas Mohr
2006-03-08  8:52 ` Con Kolivas
1 sibling, 1 reply; 112+ messages in thread
From: Andreas Mohr @ 2006-03-08  8:48 UTC (permalink / raw)
To: Andrew Morton; +Cc: Con Kolivas, ck, linux-mm, linux-kernel

Hi,

On Tue, Mar 07, 2006 at 03:26:36PM -0800, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> >
> > Swap prefetching doesn't use very much cpu but spends a lot of time waiting on
> > disk in uninterruptible sleep. This means it won't get preempted often even at
> > a low nice level since it is seen as sleeping most of the time. We want to
> > minimise its cpu impact so yield where possible.

> yield() really sucks if there are a lot of runnable tasks. And the amount
> of CPU which that thread uses isn't likely to matter anyway.
>
> I think it'd be better to just not do this. Perhaps alter the thread's
> static priority instead? Does the scheduler have a knob which can be used
> to disable a tasks's dynamic priority boost heuristic?

This problem occurs due to giving a priority boost to processes that are
sleeping a lot (e.g. in this case, I/O, from disk), right?

Forgive me my possibly less insightful comments, but maybe instead of
adding crude specific hacks (namely, yield()) to each specific problematic
process as it comes along (it just happens to be the swap prefetch thread
this time) there is a *general way* to give processes with lots of disk I/O
sleeping much smaller amounts of boost in order to get them preempted more
often in favour of an actually much more critical process (game)?

From the discussion here it seems this problem is caused by a *general*
miscalculation of processes sleeping on disk I/O a lot.
Thus IMHO this problem should be solved in a general way if at all possible.

Andreas Mohr

^ permalink raw reply	[flat|nested] 112+ messages in thread
* Re: [ck] Re: [PATCH] mm: yield during swap prefetching
2006-03-08  8:48 ` [ck] Re: [PATCH] mm: yield during swap prefetching Andreas Mohr
@ 2006-03-08  8:52 ` Con Kolivas
0 siblings, 0 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-08  8:52 UTC (permalink / raw)
To: Andreas Mohr; +Cc: Andrew Morton, ck, linux-mm, linux-kernel

On Wednesday 08 March 2006 19:48, Andreas Mohr wrote:
> Hi,
>
> On Tue, Mar 07, 2006 at 03:26:36PM -0800, Andrew Morton wrote:
> > Con Kolivas <kernel@kolivas.org> wrote:
> > > Swap prefetching doesn't use very much cpu but spends a lot of time
> > > waiting on disk in uninterruptible sleep. This means it won't get
> > > preempted often even at a low nice level since it is seen as sleeping
> > > most of the time. We want to minimise its cpu impact so yield where
> > > possible.
> >
> > yield() really sucks if there are a lot of runnable tasks. And the
> > amount of CPU which that thread uses isn't likely to matter anyway.
> >
> > I think it'd be better to just not do this. Perhaps alter the thread's
> > static priority instead? Does the scheduler have a knob which can be
> > used to disable a tasks's dynamic priority boost heuristic?
>
> This problem occurs due to giving a priority boost to processes that are
> sleeping a lot (e.g. in this case, I/O, from disk), right?
>
> Forgive me my possibly less insightful comments, but maybe instead of
> adding crude specific hacks (namely, yield()) to each specific problematic
> process as it comes along (it just happens to be the swap prefetch thread
> this time) there is a *general way* to give processes with lots of disk I/O
> sleeping much smaller amounts of boost in order to get them preempted more
> often in favour of an actually much more critical process (game)?
>
> From the discussion here it seems this problem is caused by a *general*
> miscalculation of processes sleeping on disk I/O a lot.
>
> Thus IMHO this problem should be solved in a general way if at all
> possible.

No. We already do special things for tasks waiting on uninterruptible sleep.
This is more about what is exaggerated on a dual array expiring scheduler
design that mainline has.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread
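The "dynamic priority boost heuristic" under discussion is the interactivity
estimator in the 2.6 O(1) scheduler: time a task spends sleeping feeds a
per-task sleep average, and that average is mapped to a bonus that improves
the task's effective priority. The fragment below is a deliberately
simplified sketch of that idea with made-up constants; it is not the kernel's
actual arithmetic, and is only meant to show why a thread that sleeps on disk
most of the time (like the swap prefetch kthread) keeps looking "interactive".

#define FAKE_MAX_SLEEP_AVG	1000	/* hypothetical cap on tracked sleep, ms */
#define FAKE_MAX_BONUS		10	/* hypothetical maximum priority bonus */

struct fake_task {
	int static_prio;		/* e.g. 120 for a nice-0 task */
	unsigned int sleep_avg;		/* recently accumulated sleep, ms */
};

/* More recorded sleep means a bigger bonus, i.e. a numerically lower
 * (better) effective priority. */
int effective_prio_sketch(const struct fake_task *p)
{
	int bonus = p->sleep_avg * FAKE_MAX_BONUS / FAKE_MAX_SLEEP_AVG;

	return p->static_prio - bonus + FAKE_MAX_BONUS / 2;
}

A task blocked on disk I/O accrues sleep time much like an interactive task
does, which is the general effect Andreas is pointing at; Con's reply notes
that uninterruptible sleep is already treated specially and that the behaviour
seen here is instead tied to the dual-array (active/expired) design.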
Thread overview: 112+ messages
2006-03-07 23:13 [PATCH] mm: yield during swap prefetching Con Kolivas
2006-03-07 23:26 ` Andrew Morton
2006-03-07 23:32 ` Con Kolivas
2006-03-08 0:05 ` Andrew Morton
2006-03-08 0:51 ` Con Kolivas
2006-03-08 1:11 ` Andrew Morton
2006-03-08 1:12 ` Con Kolivas
2006-03-08 1:19 ` Con Kolivas
2006-03-08 1:23 ` Andrew Morton
2006-03-08 1:28 ` Con Kolivas
2006-03-08 2:08 ` Lee Revell
2006-03-08 2:12 ` Con Kolivas
2006-03-08 2:18 ` Lee Revell
2006-03-08 2:22 ` Con Kolivas
2006-03-08 2:27 ` Lee Revell
2006-03-08 2:30 ` Con Kolivas
2006-03-08 2:52 ` [ck] " André Goddard Rosa
2006-03-08 3:03 ` Lee Revell
2006-03-08 3:05 ` Con Kolivas
2006-03-08 21:07 ` Zan Lynx
2006-03-08 23:00 ` Con Kolivas
2006-03-08 23:48 ` Zan Lynx
2006-03-09 0:07 ` Con Kolivas
2006-03-09 3:13 ` Zan Lynx
2006-03-09 4:08 ` Con Kolivas
2006-03-09 4:54 ` Lee Revell
2006-03-08 7:51 ` Jan Knutar
2006-03-08 8:39 ` Con Kolivas
2006-03-09 8:57 ` Helge Hafting
2006-03-09 9:08 ` Con Kolivas
[not found] ` <4410AFD3.7090505@bigpond.net.au>
2006-03-10 9:01 ` [ck] " Andreas Mohr
2006-03-10 9:11 ` Con Kolivas
2006-03-08 22:24 ` Pavel Machek
2006-03-09 2:22 ` Nick Piggin
2006-03-09 2:30 ` Con Kolivas
2006-03-09 2:57 ` Nick Piggin
2006-03-09 9:11 ` Con Kolivas
2006-03-08 13:36 ` [ck] " Con Kolivas
2006-03-17 9:06 ` Ingo Molnar
2006-03-17 10:46 ` interactive task starvation Mike Galbraith
2006-03-17 17:15 ` Mike Galbraith
2006-03-20 7:09 ` Mike Galbraith
2006-03-20 10:22 ` Ingo Molnar
2006-03-21 6:47 ` Willy Tarreau
2006-03-21 7:51 ` Mike Galbraith
2006-03-21 9:13 ` Willy Tarreau
2006-03-21 9:14 ` Ingo Molnar
2006-03-21 11:15 ` Willy Tarreau
2006-03-21 11:18 ` Ingo Molnar
2006-03-21 11:53 ` Con Kolivas
2006-03-21 13:10 ` Mike Galbraith
2006-03-21 13:13 ` Con Kolivas
2006-03-21 13:33 ` Mike Galbraith
2006-03-21 13:37 ` Con Kolivas
2006-03-21 13:44 ` Willy Tarreau
2006-03-21 13:45 ` Con Kolivas
2006-03-21 14:01 ` Mike Galbraith
2006-03-21 14:17 ` Con Kolivas
2006-03-21 15:20 ` Con Kolivas
2006-03-21 17:50 ` Willy Tarreau
2006-03-22 4:18 ` Mike Galbraith
2006-03-21 17:51 ` Mike Galbraith
2006-03-21 13:38 ` Willy Tarreau
2006-03-21 13:48 ` Mike Galbraith
2006-03-21 12:07 ` Mike Galbraith
2006-03-21 12:59 ` Willy Tarreau
2006-03-21 13:24 ` Mike Galbraith
2006-03-21 13:53 ` Con Kolivas
2006-03-21 14:17 ` Mike Galbraith
2006-03-21 14:19 ` Con Kolivas
2006-03-21 14:25 ` Ingo Molnar
2006-03-21 14:28 ` Con Kolivas
2006-03-21 14:30 ` Ingo Molnar
2006-03-21 14:28 ` Mike Galbraith
2006-03-21 14:30 ` Con Kolivas
2006-03-21 14:32 ` Ingo Molnar
2006-03-21 14:44 ` Willy Tarreau
2006-03-21 14:52 ` Ingo Molnar
2006-03-29 3:01 ` Lee Revell
2006-03-29 5:56 ` Ray Lee
2006-03-29 6:16 ` Lee Revell
2006-03-21 14:36 ` Mike Galbraith
2006-03-21 14:39 ` Con Kolivas
2006-03-21 14:39 ` Willy Tarreau
2006-03-21 18:39 ` Rafael J. Wysocki
2006-03-21 19:32 ` Willy Tarreau
2006-03-21 21:47 ` Rafael J. Wysocki
2006-03-21 22:51 ` Peter Williams
2006-03-22 3:49 ` Mike Galbraith
2006-03-22 3:59 ` Peter Williams
2006-03-22 12:14 ` [interbench numbers] " Mike Galbraith
2006-03-22 20:27 ` Con Kolivas
2006-03-23 3:22 ` Mike Galbraith
2006-03-23 5:43 ` Con Kolivas
2006-03-23 5:53 ` Mike Galbraith
2006-03-23 11:07 ` Mike Galbraith
2006-03-24 0:21 ` Con Kolivas
2006-03-24 5:02 ` Mike Galbraith
2006-03-24 5:04 ` Con Kolivas
2006-03-17 12:38 ` [PATCH] sched: activate SCHED BATCH expired Con Kolivas
2006-03-17 13:07 ` Ingo Molnar
2006-03-17 13:26 ` Nick Piggin
2006-03-17 13:36 ` Con Kolivas
2006-03-17 13:46 ` Nick Piggin
2006-03-17 13:51 ` Nick Piggin
2006-03-17 14:11 ` Con Kolivas
2006-03-17 14:59 ` Ingo Molnar
2006-03-17 13:47 ` [ck] " Andreas Mohr
2006-03-17 13:59 ` Con Kolivas
2006-03-17 14:06 ` Nick Piggin
2006-03-08 8:48 ` [ck] Re: [PATCH] mm: yield during swap prefetching Andreas Mohr
2006-03-08 8:52 ` Con Kolivas