From: David Vernet <void@manifault.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
mingo@redhat.com, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
bristot@redhat.com, vschneid@redhat.com, tj@kernel.org,
roman.gushchin@linux.dev, gautham.shenoy@amd.com,
aaron.lu@intel.com, wuyun.abel@bytedance.com,
kernel-team@meta.com
Subject: Re: [RFC PATCH 3/3] sched/fair: Add a per-shard overload flag
Date: Thu, 31 Aug 2023 14:11:03 -0500
Message-ID: <20230831191103.GC531917@maniforge>
In-Reply-To: <20230831104508.7619-4-kprateek.nayak@amd.com>
On Thu, Aug 31, 2023 at 04:15:08PM +0530, K Prateek Nayak wrote:
Hi Prateek,
> Even with the two patches, I still observe the following lock
> contention when profiling the tbench 128-clients run with IBS:
>
> - 12.61% swapper [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
>    - 10.94% native_queued_spin_lock_slowpath
>       - 10.73% _raw_spin_lock
>          - 9.57% __schedule
>               schedule_idle
>               do_idle
>             + cpu_startup_entry
>          - 0.82% task_rq_lock
>               newidle_balance
>               pick_next_task_fair
>               __schedule
>               schedule_idle
>               do_idle
>             + cpu_startup_entry
>
> Since David mentioned rq->avg_idle check is probably not the right step
> towards the solution, this experiment introduces a per-shard
> "overload" flag. Similar to "rq->rd->overload", per-shard overload flag
> notifies of the possibility of one or more rq covered in the shard's
> domain having a queued task. shard's overload flag is set at the same
> time as "rq->rd->overload", and is cleared when shard's list is found
> to be empty.
I think this is an interesting idea, but I feel that it's still working
against the core proposition of SHARED_RUNQ, which is to enable work
conservation.
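
The gating pattern described in the quoted text can be sketched in
userspace C. This is a minimal sketch under stated assumptions, not the
kernel code: `struct shard`, `shard_push()` and `shard_pop()` are
hypothetical names, a pthread mutex stands in for the raw spinlock,
relaxed C11 atomics stand in for READ_ONCE()/WRITE_ONCE(), the list is a
simple LIFO, and the flag is set at enqueue time rather than from
add_nr_running() as the patch does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued task. */
struct task {
	struct task *next;
	int id;
};

/* Hypothetical stand-in for struct shared_runq_shard. */
struct shard {
	pthread_mutex_t lock;
	struct task *head;	/* protected by lock */
	atomic_int overload;	/* hint, accessed without the lock */
};

static void shard_init(struct shard *s)
{
	pthread_mutex_init(&s->lock, NULL);
	s->head = NULL;
	atomic_init(&s->overload, 0);
}

/* Enqueue side: publish the task under the lock, then set the hint. */
static void shard_push(struct shard *s, struct task *t)
{
	assert(t != NULL);

	pthread_mutex_lock(&s->lock);
	t->next = s->head;
	s->head = t;
	pthread_mutex_unlock(&s->lock);

	/* Assumption: the patch sets its flag elsewhere (add_nr_running()). */
	atomic_store_explicit(&s->overload, 1, memory_order_relaxed);
}

/* Dequeue side: skip the lock entirely while the hint is clear. */
static struct task *shard_pop(struct shard *s)
{
	struct task *t;

	if (!atomic_load_explicit(&s->overload, memory_order_relaxed))
		return NULL;

	pthread_mutex_lock(&s->lock);
	t = s->head;
	if (t)
		s->head = t->next;
	else
		/* List found empty: clear the hint for later callers. */
		atomic_store_explicit(&s->overload, 0, memory_order_relaxed);
	pthread_mutex_unlock(&s->lock);

	return t;
}
```

In this sketch the pusher sets the flag only after dropping the lock, so
a popper that runs in between sees overload == 0 and returns NULL even
though a task is queued. A window of this kind is one way such a hint
can trade work conservation for reduced lock traffic.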
> With these changes, following are the results for tbench 128-clients:
Just to make sure I understand, this is to address the contention we're
observing on tbench with 64 - 256 clients, right? That's my
understanding from Gautham's reply in [0].
[0]: https://lore.kernel.org/all/ZOc7i7wM0x4hF4vL@BLR-5CG11610CF.amd.com/
If so, are we sure this change won't regress other workloads that would
have benefited from the work conservation?
Also, I assume that you don't see the reduced contention without this
patch, even if you include your fix to newidle_balance() that has us
skip over the <= LLC domains?
Thanks,
David
P.S. Taking off on vacation now, so any replies will be very delayed.
Thanks again for working on this!
>
> tip : 1.00 (var: 1.00%)
> tip + v3 + series till patch 2 : 0.41 (var: 1.15%) (diff: -58.81%)
> tip + v3 + full series : 1.01 (var: 0.36%) (diff: +00.92%)
>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
> kernel/sched/fair.c | 13 +++++++++++--
> kernel/sched/sched.h | 17 +++++++++++++++++
> 2 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 446ffdad49e1..31fe109fdaf0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -186,6 +186,7 @@ static void shared_runq_reassign_domains(void)
>  		rq->cfs.shared_runq = shared_runq;
>  		rq->cfs.shard = &shared_runq->shards[shard_idx];
>  		rq_unlock(rq, &rf);
> +		WRITE_ONCE(rq->cfs.shard->overload, 0);
>  	}
>  }
>
> @@ -202,6 +203,7 @@ static void __shared_runq_drain(struct shared_runq *shared_runq)
>  		list_for_each_entry_safe(p, tmp, &shard->list, shared_runq_node)
>  			list_del_init(&p->shared_runq_node);
>  		raw_spin_unlock(&shard->lock);
> +		WRITE_ONCE(shard->overload, 0);
>  	}
>  }
>
> @@ -258,13 +260,20 @@ shared_runq_pop_task(struct shared_runq_shard *shard, int target)
>  {
>  	struct task_struct *p;
>
> -	if (list_empty(&shard->list))
> +	if (!READ_ONCE(shard->overload))
>  		return NULL;
>
> +	if (list_empty(&shard->list)) {
> +		WRITE_ONCE(shard->overload, 0);
> +		return NULL;
> +	}
> +
>  	raw_spin_lock(&shard->lock);
>  	p = list_first_entry_or_null(&shard->list, struct task_struct,
>  				     shared_runq_node);
> -	if (p && is_cpu_allowed(p, target))
> +	if (!p)
> +		WRITE_ONCE(shard->overload, 0);
> +	else if (is_cpu_allowed(p, target))
>  		list_del_init(&p->shared_runq_node);
>  	else
>  		p = NULL;
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index f50176f720b1..e8d4d948f742 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -601,6 +601,20 @@ do { \
>  struct shared_runq_shard {
>  	struct list_head list;
>  	raw_spinlock_t lock;
> +	/*
> +	 * shared_runq_shard can contain running tasks.
> +	 * In such cases where all the tasks are running,
> +	 * it is futile to attempt to pull tasks from the
> +	 * list. Overload flag is used to indicate case
> +	 * where one or more rq in the shard domain may
> +	 * have a queued task. If the flag is 0, it is
> +	 * very likely that all tasks in the shard are
> +	 * running and cannot be migrated. This is not
> +	 * guarded by the shard lock, and since it may
> +	 * be updated often, it is placed into its own
> +	 * cacheline.
> +	 */
> +	int overload ____cacheline_aligned;
>  } ____cacheline_aligned;
>
> /* This would likely work better as a configurable knob via debugfs */
> @@ -2585,6 +2599,9 @@ static inline void add_nr_running(struct rq *rq, unsigned count)
>  	if (prev_nr < 2 && rq->nr_running >= 2) {
>  		if (!READ_ONCE(rq->rd->overload))
>  			WRITE_ONCE(rq->rd->overload, 1);
> +
> +		if (rq->cfs.shard && !READ_ONCE(rq->cfs.shard->overload))
> +			WRITE_ONCE(rq->cfs.shard->overload, 1);
>  	}
>  #endif
>
> --
> 2.34.1
>