Re: [RFC PATCH 3/3] sched/fair: Add a per-shard overload flag

All of lore.kernel.org
 help / color / mirror / Atom feed

From: David Vernet <void@manifault.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
	mingo@redhat.com, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	bristot@redhat.com, vschneid@redhat.com, tj@kernel.org,
	roman.gushchin@linux.dev, gautham.shenoy@amd.com,
	aaron.lu@intel.com, wuyun.abel@bytedance.com,
	kernel-team@meta.com
Subject: Re: [RFC PATCH 3/3] sched/fair: Add a per-shard overload flag
Date: Thu, 31 Aug 2023 14:11:03 -0500	[thread overview]
Message-ID: <20230831191103.GC531917@maniforge> (raw)
In-Reply-To: <20230831104508.7619-4-kprateek.nayak@amd.com>

On Thu, Aug 31, 2023 at 04:15:08PM +0530, K Prateek Nayak wrote:

Hi Prateek,

> Even with the two patches, I still observe the following lock
> contention when profiling the tbench 128-clients run with IBS:
> 
>   -   12.61%  swapper          [kernel.vmlinux]         [k] native_queued_spin_lock_slowpath
>      - 10.94% native_queued_spin_lock_slowpath
>         - 10.73% _raw_spin_lock
>            - 9.57% __schedule
>                 schedule_idle
>                 do_idle
>               + cpu_startup_entry
>            - 0.82% task_rq_lock
>                 newidle_balance
>                 pick_next_task_fair
>                 __schedule
>                 schedule_idle
>                 do_idle
>               + cpu_startup_entry
> 
> Since David mentioned rq->avg_idle check is probably not the right step
> towards the solution, this experiment introduces a per-shard
> "overload" flag. Similar to "rq->rd->overload", per-shard overload flag
> notifies of the possibility of one or more rq covered in the shard's
> domain having a queued task. shard's overload flag is set at the same
> time as "rq->rd->overload", and is cleared when shard's list is found
> to be empty.

I think this is an interesting idea, but I feel that it's still working
against the core proposition of SHARED_RUNQ, which is to enable work
conservation.

> With these changes, following are the results for tbench 128-clients:

Just to make sure I understand, this is to address the contention we're
observing on tbench with 64 - 256 clients, right?  That's my
understanding from Gautham's reply in [0].

[0]: https://lore.kernel.org/all/ZOc7i7wM0x4hF4vL@BLR-5CG11610CF.amd.com/

If so, are we sure this change won't regress other workloads that would
have benefited from the work conservation?

Also, I assume that you don't see the improved contention without this,
even if you include your fix to the newidle_balance() that has us skip
over the <= LLC domain?

Thanks,
David

P.S. Taking off on vacation now, so any replies will be very delayed.
Thanks again for working on this!

> 
> tip				: 1.00 (var: 1.00%)
> tip + v3 + series till patch 2	: 0.41 (var: 1.15%) (diff: -58.81%)
> tip + v3 + full series		: 1.01 (var: 0.36%) (diff: +00.92%)
> 
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
>  kernel/sched/fair.c  | 13 +++++++++++--
>  kernel/sched/sched.h | 17 +++++++++++++++++
>  2 files changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 446ffdad49e1..31fe109fdaf0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -186,6 +186,7 @@ static void shared_runq_reassign_domains(void)
>  		rq->cfs.shared_runq = shared_runq;
>  		rq->cfs.shard = &shared_runq->shards[shard_idx];
>  		rq_unlock(rq, &rf);
> +		WRITE_ONCE(rq->cfs.shard->overload, 0);
>  	}
>  }
>  
> @@ -202,6 +203,7 @@ static void __shared_runq_drain(struct shared_runq *shared_runq)
>  		list_for_each_entry_safe(p, tmp, &shard->list, shared_runq_node)
>  			list_del_init(&p->shared_runq_node);
>  		raw_spin_unlock(&shard->lock);
> +		WRITE_ONCE(shard->overload, 0);
>  	}
>  }
>  
> @@ -258,13 +260,20 @@ shared_runq_pop_task(struct shared_runq_shard *shard, int target)
>  {
>  	struct task_struct *p;
>  
> -	if (list_empty(&shard->list))
> +	if (!READ_ONCE(shard->overload))
>  		return NULL;
>  
> +	if (list_empty(&shard->list)) {
> +		WRITE_ONCE(shard->overload, 0);
> +		return NULL;
> +	}
> +
>  	raw_spin_lock(&shard->lock);
>  	p = list_first_entry_or_null(&shard->list, struct task_struct,
>  				     shared_runq_node);
> -	if (p && is_cpu_allowed(p, target))
> +	if (!p)
> +		WRITE_ONCE(shard->overload, 0);
> +	else if (is_cpu_allowed(p, target))
>  		list_del_init(&p->shared_runq_node);
>  	else
>  		p = NULL;
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index f50176f720b1..e8d4d948f742 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -601,6 +601,20 @@ do {									\
>  struct shared_runq_shard {
>  	struct list_head list;
>  	raw_spinlock_t lock;
> +	/*
> +	 * shared_runq_shard can contain running tasks.
> +	 * In such cases where all the tasks are running,
> +	 * it is futile to attempt to pull tasks from the
> +	 * list. Overload flag is used to indicate case
> +	 * where one or more rq in the shard domain may
> +	 * have a queued task. If the flag is 0, it is
> +	 * very likely that all tasks in the shard are
> +	 * running and cannot be migrated. This is not
> +	 * guarded by the shard lock, and since it may
> +	 * be updated often, it is placed into its own
> +	 * cacheline.
> +	 */
> +	int overload ____cacheline_aligned;
>  } ____cacheline_aligned;
>  
>  /* This would likely work better as a configurable knob via debugfs */
> @@ -2585,6 +2599,9 @@ static inline void add_nr_running(struct rq *rq, unsigned count)
>  	if (prev_nr < 2 && rq->nr_running >= 2) {
>  		if (!READ_ONCE(rq->rd->overload))
>  			WRITE_ONCE(rq->rd->overload, 1);
> +
> +		if (rq->cfs.shard && !READ_ONCE(rq->cfs.shard->overload))
> +			WRITE_ONCE(rq->cfs.shard->overload, 1);
>  	}
>  #endif
>  
> -- 
> 2.34.1
>

next prev parent reply	other threads:[~2023-08-31 19:11 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-09 22:12 [PATCH v3 0/7] sched: Implement shared runqueue in CFS David Vernet
2023-08-09 22:12 ` [PATCH v3 1/7] sched: Expose move_queued_task() from core.c David Vernet
2023-08-09 22:12 ` [PATCH v3 2/7] sched: Move is_cpu_allowed() into sched.h David Vernet
2023-08-09 22:12 ` [PATCH v3 3/7] sched: Check cpu_active() earlier in newidle_balance() David Vernet
2023-08-09 22:12 ` [PATCH v3 4/7] sched: Enable sched_feat callbacks on enable/disable David Vernet
2023-08-09 22:12 ` [PATCH v3 5/7] sched/fair: Add SHARED_RUNQ sched feature and skeleton calls David Vernet
2023-08-09 22:12 ` [PATCH v3 6/7] sched: Implement shared runqueue in CFS David Vernet
2023-08-10  7:11   ` kernel test robot
2023-08-10  7:41   ` kernel test robot
2023-08-30  6:46   ` K Prateek Nayak
2023-08-31  1:34     ` David Vernet
2023-08-31  3:47       ` K Prateek Nayak
2023-08-09 22:12 ` [PATCH v3 7/7] sched: Shard per-LLC shared runqueues David Vernet
2023-08-09 23:46   ` kernel test robot
2023-08-10  0:12     ` David Vernet
2023-08-10  7:11   ` kernel test robot
2023-08-30  6:17   ` Chen Yu
2023-08-31  0:01     ` David Vernet
2023-08-31 10:45       ` Chen Yu
2023-08-31 19:14         ` David Vernet
2023-09-23  6:35           ` Chen Yu
2023-08-17  8:42 ` [PATCH v3 0/7] sched: Implement shared runqueue in CFS Gautham R. Shenoy
2023-08-18  5:03   ` David Vernet
2023-08-18  8:49     ` Gautham R. Shenoy
2023-08-24 11:14       ` Gautham R. Shenoy
2023-08-24 22:51         ` David Vernet
2023-08-30  9:56           ` K Prateek Nayak
2023-08-31  2:32             ` David Vernet
2023-08-31  4:21               ` K Prateek Nayak
2023-08-31 10:45             ` [RFC PATCH 0/3] DO NOT MERGE: Breaking down the experimantal diff K Prateek Nayak
2023-08-31 10:45               ` [RFC PATCH 1/3] sched/fair: Move SHARED_RUNQ related structs and definitions into sched.h K Prateek Nayak
2023-08-31 10:45               ` [RFC PATCH 2/3] sched/fair: Improve integration of SHARED_RUNQ feature within newidle_balance K Prateek Nayak
2023-08-31 18:45                 ` David Vernet
2023-08-31 19:47                   ` K Prateek Nayak
2023-08-31 10:45               ` [RFC PATCH 3/3] sched/fair: Add a per-shard overload flag K Prateek Nayak
2023-08-31 19:11                 ` David Vernet [this message]
2023-08-31 20:23                   ` K Prateek Nayak
2023-09-29 17:01                     ` David Vernet
2023-10-04  4:21                       ` K Prateek Nayak
2023-10-04 17:20                         ` David Vernet
2023-10-05  3:50                           ` K Prateek Nayak
2023-09-27  4:23                   ` K Prateek Nayak
2023-09-27  6:59                     ` Chen Yu
2023-09-27  8:36                       ` K Prateek Nayak
2023-09-28  8:41                         ` Chen Yu
2023-10-03 21:05                       ` David Vernet
2023-10-07  2:10                         ` Chen Yu
2023-09-27 13:08                     ` David Vernet
2023-11-27  8:28 ` [PATCH v3 0/7] sched: Implement shared runqueue in CFS Aboorva Devarajan
2023-11-27 19:49   ` David Vernet
2023-12-07  6:00     ` Aboorva Devarajan
2023-12-04 19:30 ` David Vernet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230831191103.GC531917@maniforge \
    --to=void@manifault.com \
    --cc=aaron.lu@intel.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=gautham.shenoy@amd.com \
    --cc=juri.lelli@redhat.com \
    --cc=kernel-team@meta.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=roman.gushchin@linux.dev \
    --cc=rostedt@goodmis.org \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=wuyun.abel@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.