public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org,
	Fernando Luis Vazquez Cao <fernando_b1@lab.ntt.co.jp>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arjan van de Ven <arjan@linux.intel.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Preeti U Murthy <preeti@linux.vnet.ibm.com>,
	Denys Vlasenko <vda.linux@googlemail.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH 2/2] nohz: delayed iowait accounting for nohz idle time stats
Date: Thu, 24 Apr 2014 14:50:05 +0900	[thread overview]
Message-ID: <5358A60D.6070407@jp.fujitsu.com> (raw)
In-Reply-To: <20140423094119.GJ11096@twins.programming.kicks-ass.net>

(2014/04/23 18:41), Peter Zijlstra wrote:
> On Wed, Apr 23, 2014 at 04:40:18PM +0900, Hidetoshi Seto wrote:
>> (2014/04/23 4:45), Peter Zijlstra wrote:
>>> On Thu, Apr 17, 2014 at 06:41:41PM +0900, Hidetoshi Seto wrote:
>>>> [TARGET OF THIS PATCH]:
>>>>
>>>>   Complete rework for iowait accounting implies that some user
>>>>   interfaces might be replaced completely. It will introduce new
>>>>   counter or so, and kill per-cpu iowait counter which is known to
>>>>   being nonsense. It will take long time to be achieved, considering
>>>>   time to make the problem to a public knowledge, and also time for
>>>>   interface transition. Anyway as long as linux try to be reliable OS,
>>>>   such drastic change must be placed on mainline kernel in development
>>>>   sooner or later.
>>>>
>>>>   OTOH, drastic change like above is not acceptable for conservative
>>>>   environment, such as stable kernels, some distributor's kernel and
>>>>   so on, due to respect for compatibility. Still these kernel expects
>>>>   per-cpu iowait counter works as well as it might have been.
>>>>   Therefore for such environment, applying a simple patch to fix
>>>>   behavior of counter will be appreciated rather than replacing an
>>>>   over-used interface or changing an existing usage/semantics.
>>>>
>>>>   This patch targets latter kernels mainly. It does not do too much,
>>>>   but intend to fix the idle stats counters to be monotonic. I believe
>>>>   that mainline kernel should pick up this patch too, because it
>>>>   surely fix a hidden bug and does not intercept upcoming drastic
>>>>   change.
>>>>
>>>>   Again, in summary, this patch does not do drastic change to cope
>>>>   with problem 2. But it just fix behavior of idle/iowait counter of
>>>>   /proc/stats.
>>>>
>>>> [WHAT THIS PATCH PROPOSED]:
>>>>
>>>>   The main reason of the bug is that observers have no idea to
>>>>   determine the delta to be idle or iowait at the first place.
>>>>
>>>>   So I introduced delayed iowait accounting to allow observers to
>>>>   assign delta based on status of observed cpu at idle entry.
>>>>
>>>
>>> So the problem I have with this is that it makes CONFIG_NOHZ=[ny]
>>> kernels behave quite differently.
>>
>> It is not true.
>> There are already differences before applying my patches.
>> The behavior of NOHZ=y kernel diverged from original since it was born.
> 
> But if you argue about not actually fixing iowait properly, then this
> difference is the only actual regression.

A difference is not always a regression.

>> Please note that no one complained about this difference.
> 
> Then why are you working on this? You're the one that said there was a
> regression between unnamed enterprise distro 5 and unnamed enterprise
> distro 6.

e.g.
 regression: a counter loses monotonicity
 improvement: drop power consumption by skipping tick during in idle
 not matter: useless fuzzy crap value turned to be another crap

If you say every difference is regression, then all kernel config
must be nop.

>> I just want to fix a counter not to go backward.
>> It's a simple bug, isn't it?
> 
> Seeing how we managed to send as many patches as we did, I'd say that's
> a fairly big clue as to how its not as simple.

I think the situation is that a simple small bug is glued to big
complicated background. I want to separate them and handle only
a small bug, therefore I wrote patch description a lot for background
and my target.

> Just make the value unconditionally 0 then. That's guaranteed not to go
> backwards and just about as useful as the random fwd walk you make it.
> Plus, its a lot easier.

I'd like to hire constant 0 if it is acceptable for stables.

I suppose it could be considered as improvement for mainline kernel
since it will be the first step for upcoming drastic changes.
But I also suppose it could be classified as regression for stable
kernels because one counter ordinarily used stops its progress.

>>> Ideally NOHZ=y and NOHZ=n behave the same, my proposed solution the
>>> other day does just that. This one OTOH seems to generate entirely
>>> different results between those kernels.
>>
>> As you know, behaviors of NOHZ=[ny] are both crap because of per-cpu
>> iowait accounting.
>>
>> I guess we should say:
>>   Ideally NOHZ=[ny] behave the same "in the proper way."
> 
> No, the premise of NOHZ is that behaviour should not change, we found a
> change in behaviour, we should make it go away.
> 
> Secondly, with or without NOHZ iowait accounting is complete crap. We
> should also fix that.

Humm..? I could not catch the reason why you say no here.
I think wasting time to unify 2 craps into 1 crap is not necessary
before replacing craps by a proper thing. 

>> What you proposed will do too much to make one nonsense to another
>> nonsense. It will be unhelpful for people...
> 
> I proposed that if you don't want to fix the actual iowait is crap
> problem, you at least fix the NOHZ regression proper.

I'd like to target only a part of differences which is considered as
regression.

I'm being overly cautious since removing or stopping this crap counter
could be new regression.

>>> It also doesn't really simplify the code; there's quite a bit of
>>> complexity introduced to paper over this silly stuff.
>>
>> Don't complicate things. I want to talk about a small simple bug.
>>
>> a) today's NOHZ=n
>>    - provides per-cpu iowait counter
>>      (nonsense but still loved by innocent userland)
>> b) today's NOHZ=y
>>    - provides per-cpu iowait counter
>>    - use it's own idle time accounting different from a)
>>    - have a *bug* that counter might go backward
> 
> Like said, the intent of NOHZ is to not change accounting, its often
> more complicated from a because well, no ticks.
> 
> But it really should give the same number, nonsense or not. We do the
> same for all other numbers -- we spend a ton of effort to fix the
> loadavg a year or two ago.
> 
> loadavg is also another dubious number, fwiw.

I can understand how you feel.

>> b') NOHZ=y + my patch
>>    - provides per-cpu iowait counter
>>    - use it's own idle time accounting same as b)
>>    - *bug* in b) have gone
>>    - instead accept gap in iowait value from b)
>>      - "pending" will not bloat more than one iowait span
>>
>> c) unified something for NOHZ=[ny] (you proposed)
>>    - provides per-cpu iowait counter
>>    - use new accounting different from a) and also b)
>>
>> d) ideal goal (=not designed & realized yet)
>>    - no per-cpu iowait counter any more
>>    - use new proper accounting different from all of above
>>
>> I just want to make b) to b') by a patch as small as possible.
> 
> I really don't see the value of b', the actual value of its result is
> really no better than the constant 0, and the constant implementation is
> _way_ simpler.

 a) monotonic per-cpu counter providing useless value
 b) non-monotonic per-cpu counter providing useless value
 b') monotonic per-cpu counter providing useless value
 c) monotonic per-cpu counter providing useless value
 d) monotonic counter(s) providing proper value
 x) constant 0 counter

My evaluation is:
  for stables:
   (bad)  (b,d,x) <<<<< (a,b',c) (good)
  for mainline:
   (bad)  b <<<<< (a,b',c) <<<<< x <<<<< d  (good) 

>> What you proposed will make both of a) and b) to c).
>> I think it does too much and changing a) is not required here.
>> (from my conservative perspective, patch must be non-intrusive)
>>
>> Our final goal must be making d) to replace all nasty things
>> around there. But still there is no idea to do that.
>> And such big jump will not fit to stable environments.
>>
>> Again, I just want to fix a small bug here...
>>
>> I believe my patch is enough simple to do a minimum fix.
>> Please tell me if you have more simple/better way to fix this
>> long-standing minor bug, not only for mainline in development
>> but also for conservative stables like distributor's kernel.
> 
> I really think only fixing the backward motion is retarded. Its papering
> over so much crap its not funny.
> 
> The fact that it does go backwards is because b does not give the
> identical results from a, _that_ is a bug per the design principle of
> NOHZ.
> 
> Fixing it to just return some random number that doesn't go backwards
> doesn't fix it. It just makes the immediately observed problem go away;
> entirely the wrong mindset.

I feel like that you are going to use cardboards instead of papers...


BTW, I found that Denys posted his updated patch set: 
https://lkml.org/lkml/2014/4/23/579

I think it looks like taking almost same approach as what you
proposed. However I guess:
  - It should not have #ifdef CONFIG_NOHZ_* because it make
    differences between behavior of NOHZ=[ny].
  - last_iowait need to be located in rq rather than in struct
    tick_sched, considering cache hit at io_schedule*().
  - last_iowait need to be referred in tick for NOHZ=n, not to make
    difference from NOHZ=y.

Anyway, though I still wonder what you proposed is acceptable fix
for distro's quality assurance, I hope someone in charge may have
broad mind like you and may resolve the matter favorably.

Is it worth to do if I make v5 based on your proposal and post it
for review comparing with v4?


Thanks,
H.Seto


  reply	other threads:[~2014-04-24  5:50 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-17  9:35 [PATCH v4 0/2] nohz: fix idle accounting in NO_HZ kernels Hidetoshi Seto
2014-04-17  9:38 ` [PATCH 1/2] nohz: make updating sleep stats local Hidetoshi Seto
2014-04-17  9:41 ` [PATCH 2/2] nohz: delayed iowait accounting for nohz idle time stats Hidetoshi Seto
2014-04-22 19:45   ` Peter Zijlstra
2014-04-23  7:40     ` Hidetoshi Seto
2014-04-23  9:41       ` Peter Zijlstra
2014-04-24  5:50         ` Hidetoshi Seto [this message]
2014-04-22  6:34 ` [PATCH v4 0/2] nohz: fix idle accounting in NO_HZ kernels Hidetoshi Seto

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5358A60D.6070407@jp.fujitsu.com \
    --to=seto.hidetoshi@jp.fujitsu.com \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@linux.intel.com \
    --cc=fernando_b1@lab.ntt.co.jp \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=peterz@infradead.org \
    --cc=preeti@linux.vnet.ibm.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vda.linux@googlemail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox