Date: Tue, 9 Feb 2016 18:11:40 +0100
From: Frederic Weisbecker
To: riel@redhat.com
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@kernel.org, luto@amacapital.net, peterz@infradead.org, clark@redhat.com, eric.dumazet@gmail.com
Subject: Re: [PATCH 4/4] sched,time: only call account_{user,sys,guest,idle}_time once a jiffy
Message-ID: <20160209171139.GA7535@lerouge>
In-Reply-To: <1454433586-3234-5-git-send-email-riel@redhat.com>
References: <1454433586-3234-1-git-send-email-riel@redhat.com> <1454433586-3234-5-git-send-email-riel@redhat.com>

On Tue, Feb 02, 2016 at 12:19:46PM -0500, riel@redhat.com wrote:
> From: Rik van Riel
>
> After removing __acct_update_integrals from the profile,
> native_sched_clock remains as the top CPU user. This can be
> reduced by only calling account_{user,sys,guest,idle}_time
> once per jiffy for long running tasks on nohz_full CPUs.
>
> This will reduce timing accuracy on nohz_full CPUs to jiffy
> based sampling, just like on normal CPUs.

I wonder if that assumption is actually right. With tick-based sampling, we do have statistical accounting whose precision depends on HZ. But the time accounted when the tick fires is always a single unit: 1 jiffy. So the accounted value is well distributed, because it's constant and driven by the probability of a periodic event.
For any T_slice, a given CPU timeslice (in seconds) executed between two ring switches (user <-> kernel), we are going to account 1 * P(T_slice*HZ), where P() stands for probability.

Now, after this patch, the scenario is rather different. We are accounting the real time spent in a slice, with a similar probability. This becomes: T_slice * P(T_slice*HZ).

So it seems it could result in non-linear accounting: for T_slice shorter than a jiffy, P(T_slice*HZ) is about T_slice*HZ, so the expected accounted time is about T_slice^2 * HZ. Timeslices of 1 second will be accounted correctly, whereas repeated tiny timeslices may result in much lower values than expected.

To fix this, we should instead account jiffies_to_nsecs(jiffies - t->vtime_jiffies). Well, that would drop the use of the fine-grained clock, and even the need for nsecs-based cputime. But why not, if we still get acceptable results with much better performance.

I don't know if all the above actually makes sense. I suck at maths, so I may well be wrong.