From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org,
Fernando Luis Vazquez Cao <fernando_b1@lab.ntt.co.jp>,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
Frederic Weisbecker <fweisbec@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Arjan van de Ven <arjan@linux.intel.com>,
Oleg Nesterov <oleg@redhat.com>,
Preeti U Murthy <preeti@linux.vnet.ibm.com>,
Denys Vlasenko <vda.linux@googlemail.com>,
stable@vger.kernel.org
Subject: Re: [PATCH 2/2] nohz: delayed iowait accounting for nohz idle time stats
Date: Thu, 24 Apr 2014 14:50:05 +0900 [thread overview]
Message-ID: <5358A60D.6070407@jp.fujitsu.com> (raw)
In-Reply-To: <20140423094119.GJ11096@twins.programming.kicks-ass.net>
(2014/04/23 18:41), Peter Zijlstra wrote:
> On Wed, Apr 23, 2014 at 04:40:18PM +0900, Hidetoshi Seto wrote:
>> (2014/04/23 4:45), Peter Zijlstra wrote:
>>> On Thu, Apr 17, 2014 at 06:41:41PM +0900, Hidetoshi Seto wrote:
>>>> [TARGET OF THIS PATCH]:
>>>>
>>>> Complete rework for iowait accounting implies that some user
>>>> interfaces might be replaced completely. It will introduce new
>>>> counter or so, and kill per-cpu iowait counter which is known to
>>>> being nonsense. It will take long time to be achieved, considering
>>>> time to make the problem to a public knowledge, and also time for
>>>> interface transition. Anyway as long as linux try to be reliable OS,
>>>> such drastic change must be placed on mainline kernel in development
>>>> sooner or later.
>>>>
>>>> OTOH, drastic change like above is not acceptable for conservative
>>>> environment, such as stable kernels, some distributor's kernel and
>>>> so on, due to respect for compatibility. Still these kernel expects
>>>> per-cpu iowait counter works as well as it might have been.
>>>> Therefore for such environment, applying a simple patch to fix
>>>> behavior of counter will be appreciated rather than replacing an
>>>> over-used interface or changing an existing usage/semantics.
>>>>
>>>> This patch targets latter kernels mainly. It does not do too much,
>>>> but intend to fix the idle stats counters to be monotonic. I believe
>>>> that mainline kernel should pick up this patch too, because it
>>>> surely fix a hidden bug and does not intercept upcoming drastic
>>>> change.
>>>>
>>>> Again, in summary, this patch does not do drastic change to cope
>>>> with problem 2. But it just fix behavior of idle/iowait counter of
>>>> /proc/stats.
>>>>
>>>> [WHAT THIS PATCH PROPOSED]:
>>>>
>>>> The main reason of the bug is that observers have no idea to
>>>> determine the delta to be idle or iowait at the first place.
>>>>
>>>> So I introduced delayed iowait accounting to allow observers to
>>>> assign delta based on status of observed cpu at idle entry.
>>>>
>>>
>>> So the problem I have with this is that it makes CONFIG_NOHZ=[ny]
>>> kernels behave quite differently.
>>
>> It is not true.
>> There are already differences before applying my patches.
>> The behavior of NOHZ=y kernel diverged from original since it was born.
>
> But if you argue about not actually fixing iowait properly, then this
> difference is the only actual regression.
A difference is not always a regression.
>> Please note that no one complained about this difference.
>
> Then why are you working on this? You're the one that said there was a
> regression between unnamed enterprise distro 5 and unnamed enterprise
> distro 6.
e.g.
regression: a counter loses monotonicity
improvement: drop power consumption by skipping tick during in idle
not matter: useless fuzzy crap value turned to be another crap
If you say every difference is regression, then all kernel config
must be nop.
>> I just want to fix a counter not to go backward.
>> It's a simple bug, isn't it?
>
> Seeing how we managed to send as many patches as we did, I'd say that's
> a fairly big clue as to how its not as simple.
I think the situation is that a simple small bug is glued to big
complicated background. I want to separate them and handle only
a small bug, therefore I wrote patch description a lot for background
and my target.
> Just make the value unconditionally 0 then. That's guaranteed not to go
> backwards and just about as useful as the random fwd walk you make it.
> Plus, its a lot easier.
I'd like to hire constant 0 if it is acceptable for stables.
I suppose it could be considered as improvement for mainline kernel
since it will be the first step for upcoming drastic changes.
But I also suppose it could be classified as regression for stable
kernels because one counter ordinarily used stops its progress.
>>> Ideally NOHZ=y and NOHZ=n behave the same, my proposed solution the
>>> other day does just that. This one OTOH seems to generate entirely
>>> different results between those kernels.
>>
>> As you know, behaviors of NOHZ=[ny] are both crap because of per-cpu
>> iowait accounting.
>>
>> I guess we should say:
>> Ideally NOHZ=[ny] behave the same "in the proper way."
>
> No, the premise of NOHZ is that behaviour should not change, we found a
> change in behaviour, we should make it go away.
>
> Secondly, with or without NOHZ iowait accounting is complete crap. We
> should also fix that.
Humm..? I could not catch the reason why you say no here.
I think wasting time to unify 2 craps into 1 crap is not necessary
before replacing craps by a proper thing.
>> What you proposed will do too much to make one nonsense to another
>> nonsense. It will be unhelpful for people...
>
> I proposed that if you don't want to fix the actual iowait is crap
> problem, you at least fix the NOHZ regression proper.
I'd like to target only a part of differences which is considered as
regression.
I'm being overly cautious since removing or stopping this crap counter
could be new regression.
>>> It also doesn't really simplify the code; there's quite a bit of
>>> complexity introduced to paper over this silly stuff.
>>
>> Don't complicate things. I want to talk about a small simple bug.
>>
>> a) today's NOHZ=n
>> - provides per-cpu iowait counter
>> (nonsense but still loved by innocent userland)
>> b) today's NOHZ=y
>> - provides per-cpu iowait counter
>> - use it's own idle time accounting different from a)
>> - have a *bug* that counter might go backward
>
> Like said, the intent of NOHZ is to not change accounting, its often
> more complicated from a because well, no ticks.
>
> But it really should give the same number, nonsense or not. We do the
> same for all other numbers -- we spend a ton of effort to fix the
> loadavg a year or two ago.
>
> loadavg is also another dubious number, fwiw.
I can understand how you feel.
>> b') NOHZ=y + my patch
>> - provides per-cpu iowait counter
>> - use it's own idle time accounting same as b)
>> - *bug* in b) have gone
>> - instead accept gap in iowait value from b)
>> - "pending" will not bloat more than one iowait span
>>
>> c) unified something for NOHZ=[ny] (you proposed)
>> - provides per-cpu iowait counter
>> - use new accounting different from a) and also b)
>>
>> d) ideal goal (=not designed & realized yet)
>> - no per-cpu iowait counter any more
>> - use new proper accounting different from all of above
>>
>> I just want to make b) to b') by a patch as small as possible.
>
> I really don't see the value of b', the actual value of its result is
> really no better than the constant 0, and the constant implementation is
> _way_ simpler.
a) monotonic per-cpu counter providing useless value
b) non-monotonic per-cpu counter providing useless value
b') monotonic per-cpu counter providing useless value
c) monotonic per-cpu counter providing useless value
d) monotonic counter(s) providing proper value
x) constant 0 counter
My evaluation is:
for stables:
(bad) (b,d,x) <<<<< (a,b',c) (good)
for mainline:
(bad) b <<<<< (a,b',c) <<<<< x <<<<< d (good)
>> What you proposed will make both of a) and b) to c).
>> I think it does too much and changing a) is not required here.
>> (from my conservative perspective, patch must be non-intrusive)
>>
>> Our final goal must be making d) to replace all nasty things
>> around there. But still there is no idea to do that.
>> And such big jump will not fit to stable environments.
>>
>> Again, I just want to fix a small bug here...
>>
>> I believe my patch is enough simple to do a minimum fix.
>> Please tell me if you have more simple/better way to fix this
>> long-standing minor bug, not only for mainline in development
>> but also for conservative stables like distributor's kernel.
>
> I really think only fixing the backward motion is retarded. Its papering
> over so much crap its not funny.
>
> The fact that it does go backwards is because b does not give the
> identical results from a, _that_ is a bug per the design principle of
> NOHZ.
>
> Fixing it to just return some random number that doesn't go backwards
> doesn't fix it. It just makes the immediately observed problem go away;
> entirely the wrong mindset.
I feel like that you are going to use cardboards instead of papers...
BTW, I found that Denys posted his updated patch set:
https://lkml.org/lkml/2014/4/23/579
I think it looks like taking almost same approach as what you
proposed. However I guess:
- It should not have #ifdef CONFIG_NOHZ_* because it make
differences between behavior of NOHZ=[ny].
- last_iowait need to be located in rq rather than in struct
tick_sched, considering cache hit at io_schedule*().
- last_iowait need to be referred in tick for NOHZ=n, not to make
difference from NOHZ=y.
Anyway, though I still wonder what you proposed is acceptable fix
for distro's quality assurance, I hope someone in charge may have
broad mind like you and may resolve the matter favorably.
Is it worth to do if I make v5 based on your proposal and post it
for review comparing with v4?
Thanks,
H.Seto
next prev parent reply other threads:[~2014-04-24 5:50 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-04-17 9:35 [PATCH v4 0/2] nohz: fix idle accounting in NO_HZ kernels Hidetoshi Seto
2014-04-17 9:38 ` [PATCH 1/2] nohz: make updating sleep stats local Hidetoshi Seto
2014-04-17 9:41 ` [PATCH 2/2] nohz: delayed iowait accounting for nohz idle time stats Hidetoshi Seto
2014-04-22 19:45 ` Peter Zijlstra
2014-04-23 7:40 ` Hidetoshi Seto
2014-04-23 9:41 ` Peter Zijlstra
2014-04-24 5:50 ` Hidetoshi Seto [this message]
2014-04-22 6:34 ` [PATCH v4 0/2] nohz: fix idle accounting in NO_HZ kernels Hidetoshi Seto
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5358A60D.6070407@jp.fujitsu.com \
--to=seto.hidetoshi@jp.fujitsu.com \
--cc=akpm@linux-foundation.org \
--cc=arjan@linux.intel.com \
--cc=fernando_b1@lab.ntt.co.jp \
--cc=fweisbec@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=peterz@infradead.org \
--cc=preeti@linux.vnet.ibm.com \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=vda.linux@googlemail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox