public inbox for linux-kernel@vger.kernel.org
From: Giovanni Gherdovich <ggherdovich@suse.cz>
To: Stanislaw Gruszka <sgruszka@redhat.com>, linux-kernel@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Mike Galbraith <mgalbraith@suse.de>,
	Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH] sched/cputime: do not account thread group tasks pending runtime to improve performance
Date: Fri, 26 Aug 2016 17:24:26 +0200
Message-ID: <1472225066.1821.24.camel@suse.cz>
In-Reply-To: <20160817093043.GA25206@redhat.com>

On Wed, 2016-08-17 at 11:30 +0200, Stanislaw Gruszka wrote:
> Commit d670ec13178d0 ("posix-cpu-timers: Cure SMP wobbles") makes us
> account thread group tasks' pending runtime in thread_group_cputime().
> Another commit, 6e998916dfe32 ("sched/cputime:
> Fix clock_nanosleep()/clock_gettime() inconsistency"), makes us update
> scheduler runtime statistics (call update_curr()) when reading a task's
> pending runtime. Those changes cause bad performance of the times() and
> clock_gettime(CLOCK_PROCESS_CPUTIME_ID) syscalls.
> 
> While we would like to keep cpuclock monotonicity, i.e. have the
> problems fixed by the above commits stay fixed, we would also like to have
> good performance.
>
>                  [... snip ...]
>
> Reported-and-tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
> ---
>  kernel/sched/cputime.c | 33 ++++++++++++++++++++++++++++++++-
>  1 file changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 1934f65..4fca604 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -301,6 +301,26 @@ static inline cputime_t account_other_time(cputime_t max)
>  	return accounted;
>  }
>  
> +#ifdef CONFIG_64BIT
> +static inline u64 read_sum_exec_runtime(struct task_struct *t)
> +{
> +	return t->se.sum_exec_runtime;
> +}
> +#else
> +static u64 read_sum_exec_runtime(struct task_struct *t)
> +{
> +	u64 ns;
> +	struct rq_flags rf;
> +	struct rq *rq;
> +
> +	rq = task_rq_lock(t, &rf);
> +	ns = t->se.sum_exec_runtime;
> +	task_rq_unlock(rq, t, &rf);
> +
> +	return ns;
> +}
> +#endif
> +
>  /*
>   * Accumulate raw cputime values of dead tasks (sig->[us]time) and live
>   * tasks (sum on group iteration) belonging to @tsk's group.
> @@ -313,6 +333,17 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  	unsigned int seq, nextseq;
>  	unsigned long flags;
>  
> +	/*
> +	 * Update current task runtime to account pending time since last
> +	 * scheduler action or thread_group_cputime() call. This thread group
> +	 * might have other running tasks on different CPUs, but updating
> +	 * their runtime can affect syscall performance, so we skip account
> +	 * those pending times and rely only on values updated on tick or
> +	 * other scheduler action.
> +	 */
> +	if (same_thread_group(current, tsk))
> +		(void) task_sched_runtime(current);
> +
>  	rcu_read_lock();
>  	/* Attempt a lockless read on the first round. */
>  	nextseq = 0;
> @@ -327,7 +358,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  			task_cputime(t, &utime, &stime);
>  			times->utime += utime;
>  			times->stime += stime;
> -			times->sum_exec_runtime += task_sched_runtime(t);
> +			times->sum_exec_runtime += read_sum_exec_runtime(t);
>  		}
>  		/* If lockless access failed, take the lock. */
>  		nextseq = 1;

Hello Stanislaw and all,

I know I'm quite late to the party, as this patch has already been taken into
Ingo's "tip" repo, but I want to chime in anyway and give my positive review
and acknowledgment of the patch.

The patch works as advertised in the commit message: the time accounting
behaviour you're changing is consistent with what happened before
d670ec13178d0 ("posix-cpu-timers: Cure SMP wobbles"), i.e. only the runtime
statistics of the current task are up to date, not those of the other
threads in the group. As you say, that's how things used to work, and I'm in
favor of this trade-off.

You correctly address Mel Gorman's remark ("how do you know that tsk ==
current?") by using the "current" macro when you call task_sched_runtime().
As you note, task_sched_runtime(current) (which in turn calls update_curr() on
that task) is all you need to solve the problem that "the diff of 'process'
should always be >= the diff of 'thread'", which you originally addressed in
6e998916df ("sched/cputime: Fix clock_nanosleep()/clock_gettime()
inconsistency").
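To make that invariant concrete, here is a minimal userspace sketch, assuming
Linux and a single-threaded caller. It samples the clocks in nested order
(process, thread, work, thread, process); since task_sched_runtime(current)
brings the caller's runtime up to date on both paths, the process delta should
never be smaller than the thread delta. The helper names (read_clock_ns, spin,
process_delta_covers_thread_delta) are hypothetical and are not taken from the
original test case behind 6e998916df.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Read a CPU-time clock in nanoseconds; clock_gettime() returns 0 on success. */
static int64_t read_clock_ns(clockid_t clk)
{
	struct timespec ts;
	int ret = clock_gettime(clk, &ts);

	assert(ret == 0);
	(void)ret;	/* silence unused-variable warnings under NDEBUG */
	return ts_to_ns(&ts);
}

/* Burn some CPU; volatile keeps the compiler from removing the loop. */
static void spin(unsigned long iters)
{
	volatile unsigned long sink = 0;

	for (unsigned long i = 0; i < iters; i++)
		sink += i;
}

/*
 * Hypothetical check: returns 1 if the CLOCK_PROCESS_CPUTIME_ID delta is
 * at least the CLOCK_THREAD_CPUTIME_ID delta over the same nested window,
 * 0 otherwise.
 */
int process_delta_covers_thread_delta(unsigned long iters)
{
	int64_t p0 = read_clock_ns(CLOCK_PROCESS_CPUTIME_ID);
	int64_t t0 = read_clock_ns(CLOCK_THREAD_CPUTIME_ID);

	spin(iters);

	int64_t t1 = read_clock_ns(CLOCK_THREAD_CPUTIME_ID);
	int64_t p1 = read_clock_ns(CLOCK_PROCESS_CPUTIME_ID);

	return (p1 - p0) >= (t1 - t0);
}
```

Calling assert(process_delta_covers_thread_delta(10000000UL)) from main() is
expected to succeed on a kernel that keeps the invariant; the iteration count
only controls how much CPU time the window covers.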

Acked-by: Giovanni Gherdovich <ggherdovich@suse.cz>


--
Giovanni Gherdovich
SUSE Labs
