From: Con Kolivas <kernel@kolivas.org>
To: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>,
Michal Piotrowski <michal.k.k.piotrowski@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Adrian Bunk <bunk@stusta.de>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org
Subject: Re: 2.6.21-rc1: known regressions (v2) (part 2)
Date: Wed, 28 Feb 2007 10:07:15 +1100
Message-ID: <200702281007.16316.kernel@kolivas.org>
In-Reply-To: <1172566453.7009.28.camel@Homer.simpson.net>
Apologies for the resend; the lkml address got mangled...
On Tuesday 27 February 2007 19:54, Mike Galbraith wrote:
> On Tue, 2007-02-27 at 09:33 +0100, Ingo Molnar wrote:
> > * Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:
> > > Thomas Gleixner wrote:
> > > > Adrian,
> > > >
> > > > On Mon, 2007-02-26 at 23:05 +0100, Adrian Bunk wrote:
> > > >> Subject    : kernel BUG at kernel/time/tick-sched.c:168 (CONFIG_NO_HZ)
> > > >> References : http://lkml.org/lkml/2007/2/16/346
> > > >> Submitter  : Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
> > > I can confirm that the bug is fixed (over 20 hours of testing should
> > > be enough).
> >
> > thanks a lot! I think this thing was a long-term performance/latency
> > regression in HT scheduling as well.
Ingo, I'm going to have to partially disagree with you on this.
This has only become a problem because of what happens with dynticks now when
rq->curr == rq->idle. Prior to this, that particular SMT code only led to
relative delays in scheduling for lower priority tasks. Whether or not that
task is ksoftirqd should not matter, because such tasks are not starved
indefinitely; nice 19 tasks are merely delayed relative to others, which by
definition is implied by using nice as a scheduler hint, wouldn't you
say? I know it has been discussed many times before whether 'nice'
means less cpu and/or more latency, but in our current implementation nice
means both less cpu and more latency. So to me, the kernels without dynticks
do not have a regression. This only seems to be a problem in the setting of
the new dynticks code IMHO. That's not to say it isn't a bug! Nor am I saying
that dynticks is a problem! Please don't misinterpret that.
The second issue is that this is only a problem because of the fuzzy definition
of what idle means for a runqueue in the setting of this SMT code. Normally,
rq->curr == rq->idle means the runqueue is idle, but not with this code, since
rq->nr_running can still be nonzero on that runqueue. What dynticks in this
implementation is doing is trying to idle a hyperthread sibling on a cpu
whose logical partner is busy. I did not find that this added any power saving
in my earlier dynticks implementation, and found it easier to keep that sibling
ticking at the same rate as its partner. Of course you may have found
something different, and I definitely agree with what you are likely to say
in response to this: we shouldn't have to special case logical siblings as
having a different definition of idle from any other smp case. Ultimately,
that leaves us with your simple patch as a reasonable solution for the
dynticks case, even though it changes the behaviour dramatically on a
more heavily loaded cpu. I don't see this code as presenting a problem without,
or prior to, the dynticks implementation. As the scheduler maintainer, you get
to choose the best way to tackle this problem.
> Agreed.
>
> I was recently looking at that spot because I found that niced tasks
> were taking latency hits, and disabled it, which helped a bunch.
Ok... as I said above to Ingo, nice means more latency too, and there is no
doubt that if we disable nice as a working feature then the niced tasks will
have less latency. Of course, this also means that un-niced tasks no
longer receive their better cpu performance. You're basically saying that
you prefer nice not to work in the setting of HT.
> I also
> can't understand why it would be OK to interleave a normal task with an
> RT task sometimes, but not others.. that's meaningless to the RT task.
Clearly there is a reason that code is there... The whole problem with
HT is that as soon as you load up one sibling, you slow down its logical
partner, which is why this code is there in the first place. Since I know
you're one to test things for yourself, I will put it to you this way:
Boot into UP. Time how long it takes to execute a million of these in a
realtime task:
asm volatile("" : : : "memory");
Then start up a fully cpu-bound SCHED_NORMAL task such as "yes > /dev/null"
and time that again. Obviously, because the former is a realtime task, it will
take the same amount of time, and the SCHED_NORMAL task will be starved until
the realtime task finishes.
Now try the same experiment with hyperthreading enabled and an ordinary SMP
kernel, so that the SCHED_NORMAL task can run on the other sibling. You'll find
the realtime task runs at only ~60% of its former performance. That's a serious
performance hit for a realtime task, considering the competing task is only
SCHED_NORMAL. The SMT code that you seem to dislike fixes this problem.
The reason for interleaving is that there are a few cycles to be gained by
using the second logical cpu for a separate SCHED_NORMAL task, and you don't
want to disable access to it entirely for as long as the realtime task
is running. Since there is no simple relationship between SCHED_NORMAL
timeslices and realtime timeslices, we have to use some form of interleaving
based on the expected extra cycles, and HZ is the obvious choice.
> IMHO, SMT scheduling should be a buyer beware thing. Maximizing your
> core utilization comes at a price, but so does disabling it, so I think
> letting the user decide what he wants is the right thing to do.
To me this is like arguing that we should not implement 'nice' within the cpu
scheduler at all, and only allow nice to work on the few architectures that
support hardware priorities in the cpu (like POWER5). Now there is no doubt
that if we do away with nice entirely everywhere in the scheduler we'll gain
some throughput. However, nice is a basic unix/linux function, and if hardware
comes along that breaks it, we should be working to make sure that it
keeps working in software. That is why smt nice and smp nice were implemented.
Of course it is our duty to ensure we do that with minimal overhead at all
times, but that's a different argument from the one you are making here.
Overall throughput should not be adversely affected by this SMT priority code
because, although the nice 19 task gets less throughput, the higher priority
task gets more as a result, which is essentially what nice is meant to do.
--
-ck