From: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
To: Chengming Zhou <zhouchengming-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>
Cc: tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
mkoutny-IBi9RG/b67k@public.gmane.org,
surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org,
corbet-T1hC0tSOHrs@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-doc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
songmuchun-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org
Subject: Re: [PATCH v4 05/10] sched/psi: optimize task switch inside shared cgroups again
Date: Thu, 25 Aug 2022 13:16:07 -0400 [thread overview]
Message-ID: <YweuV/+G0DGn3ECV@cmpxchg.org> (raw)
In-Reply-To: <20220825164111.29534-6-zhouchengming-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>
On Fri, Aug 26, 2022 at 12:41:06AM +0800, Chengming Zhou wrote:
> Way back when PSI_MEM_FULL was accounted from the timer tick, task
> switching could simply iterate next and prev to the common ancestor to
> update TSK_ONCPU and be done.
>
> Then memstall ticks were replaced with checking curr->in_memstall
> directly in psi_group_change(). That meant that now if the task switch
> was between a memstall and a !memstall task, we had to iterate through
> the common ancestors at least ONCE to fix up their state_masks.
>
> We added the identical_state filter to make sure the common ancestor
> elimination was skipped in that case. It seems that was always a
> little too eager, because it caused us to walk the common ancestors
> *twice* instead of the required once: the iteration for next could
> have stopped at the common ancestor; prev could have updated TSK_ONCPU
> up to the common ancestor, then finish to the root without changing
> any flags, just to get the new curr->in_memstall into the state_masks.
>
> This patch recognizes this and makes it so that we walk to the root
> exactly once if state_mask needs updating, which is simply catching up
> on a missed optimization that could have been done in commit 7fae6c8171d2
> ("psi: Use ONCPU state tracking machinery to detect reclaim") directly.
>
> Apart from this, it's also necessary for the next patch "sched/psi: remove
> NR_ONCPU task accounting". Suppose we walk the common ancestors twice:
>
> (1) psi_group_change(.clear = 0, .set = TSK_ONCPU)
> (2) psi_group_change(.clear = TSK_ONCPU, .set = 0)
>
> We previously used tasks[NR_ONCPU] to record TSK_ONCPU, tasks[NR_ONCPU]++
> in (1) then tasks[NR_ONCPU]-- in (2), so tasks[NR_ONCPU] still be correct.
>
> The next patch change to use one bit in state mask to record TSK_ONCPU,
> PSI_ONCPU bit will be set in (1), but then be cleared in (2), which cause
> the psi_group_cpu has task running on CPU but without PSI_ONCPU bit set!
>
> With this patch, we will never walk the common ancestors twice, so won't
> have above problem.
>
> Suggested-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
> Signed-off-by: Chengming Zhou <zhouchengming-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>
Acked-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: Chengming Zhou <zhouchengming@bytedance.com>
Cc: tj@kernel.org, mkoutny@suse.com, surenb@google.com,
mingo@redhat.com, peterz@infradead.org,
gregkh@linuxfoundation.org, corbet@lwn.net,
cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, songmuchun@bytedance.com
Subject: Re: [PATCH v4 05/10] sched/psi: optimize task switch inside shared cgroups again
Date: Thu, 25 Aug 2022 13:16:07 -0400 [thread overview]
Message-ID: <YweuV/+G0DGn3ECV@cmpxchg.org> (raw)
In-Reply-To: <20220825164111.29534-6-zhouchengming@bytedance.com>
On Fri, Aug 26, 2022 at 12:41:06AM +0800, Chengming Zhou wrote:
> Way back when PSI_MEM_FULL was accounted from the timer tick, task
> switching could simply iterate next and prev to the common ancestor to
> update TSK_ONCPU and be done.
>
> Then memstall ticks were replaced with checking curr->in_memstall
> directly in psi_group_change(). That meant that now if the task switch
> was between a memstall and a !memstall task, we had to iterate through
> the common ancestors at least ONCE to fix up their state_masks.
>
> We added the identical_state filter to make sure the common ancestor
> elimination was skipped in that case. It seems that was always a
> little too eager, because it caused us to walk the common ancestors
> *twice* instead of the required once: the iteration for next could
> have stopped at the common ancestor; prev could have updated TSK_ONCPU
> up to the common ancestor, then finish to the root without changing
> any flags, just to get the new curr->in_memstall into the state_masks.
>
> This patch recognizes this and makes it so that we walk to the root
> exactly once if state_mask needs updating, which is simply catching up
> on a missed optimization that could have been done in commit 7fae6c8171d2
> ("psi: Use ONCPU state tracking machinery to detect reclaim") directly.
>
> Apart from this, it's also necessary for the next patch "sched/psi: remove
> NR_ONCPU task accounting". Suppose we walk the common ancestors twice:
>
> (1) psi_group_change(.clear = 0, .set = TSK_ONCPU)
> (2) psi_group_change(.clear = TSK_ONCPU, .set = 0)
>
> We previously used tasks[NR_ONCPU] to record TSK_ONCPU, tasks[NR_ONCPU]++
> in (1) then tasks[NR_ONCPU]-- in (2), so tasks[NR_ONCPU] still be correct.
>
> The next patch change to use one bit in state mask to record TSK_ONCPU,
> PSI_ONCPU bit will be set in (1), but then be cleared in (2), which cause
> the psi_group_cpu has task running on CPU but without PSI_ONCPU bit set!
>
> With this patch, we will never walk the common ancestors twice, so won't
> have above problem.
>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
next prev parent reply other threads:[~2022-08-25 17:16 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-25 16:41 [PATCH v4 00/10] sched/psi: some optimizations and extensions Chengming Zhou
2022-08-25 16:41 ` Chengming Zhou
2022-08-25 16:41 ` [PATCH v4 02/10] sched/psi: don't create cgroup PSI files when psi_disabled Chengming Zhou
2022-09-09 14:00 ` [tip: sched/psi] sched/psi: Don't " tip-bot2 for Chengming Zhou
2022-08-25 16:41 ` [PATCH v4 04/10] sched/psi: move private helpers to sched/stats.h Chengming Zhou
2022-09-09 14:00 ` [tip: sched/psi] sched/psi: Move " tip-bot2 for Chengming Zhou
2022-08-25 16:41 ` [PATCH v4 05/10] sched/psi: optimize task switch inside shared cgroups again Chengming Zhou
[not found] ` <20220825164111.29534-6-zhouchengming-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>
2022-08-25 17:16 ` Johannes Weiner [this message]
2022-08-25 17:16 ` Johannes Weiner
2022-09-09 14:00 ` [tip: sched/psi] sched/psi: Optimize " tip-bot2 for Chengming Zhou
2022-08-25 16:41 ` [PATCH v4 06/10] sched/psi: remove NR_ONCPU task accounting Chengming Zhou
2022-09-09 14:00 ` [tip: sched/psi] sched/psi: Remove " tip-bot2 for Johannes Weiner
2022-08-25 16:41 ` [PATCH v4 10/10] sched/psi: per-cgroup PSI accounting disable/re-enable interface Chengming Zhou
2022-09-07 9:03 ` [PATCH] sched/psi: Per-cgroup " Chengming Zhou
2022-09-09 14:00 ` [tip: sched/psi] " tip-bot2 for Chengming Zhou
[not found] ` <20220825164111.29534-1-zhouchengming-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>
2022-08-25 16:41 ` [PATCH v4 01/10] sched/psi: fix periodic aggregation shut off Chengming Zhou
2022-08-25 16:41 ` Chengming Zhou
2022-09-09 14:00 ` [tip: sched/psi] sched/psi: Fix " tip-bot2 for Chengming Zhou
2022-08-25 16:41 ` [PATCH v4 03/10] sched/psi: save percpu memory when !psi_cgroups_enabled Chengming Zhou
2022-08-25 16:41 ` Chengming Zhou
2022-09-09 14:00 ` [tip: sched/psi] sched/psi: Save " tip-bot2 for Chengming Zhou
2022-08-25 16:41 ` [PATCH v4 07/10] sched/psi: add PSI_IRQ to track IRQ/SOFTIRQ pressure Chengming Zhou
2022-08-25 16:41 ` Chengming Zhou
[not found] ` <20220825164111.29534-8-zhouchengming-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>
2022-08-25 17:17 ` Johannes Weiner
2022-08-25 17:17 ` Johannes Weiner
2022-09-09 14:00 ` [tip: sched/psi] sched/psi: Add " tip-bot2 for Chengming Zhou
2022-08-25 16:41 ` [PATCH v4 08/10] sched/psi: consolidate cgroup_psi() Chengming Zhou
2022-08-25 16:41 ` Chengming Zhou
2022-09-09 14:00 ` [tip: sched/psi] sched/psi: Consolidate cgroup_psi() tip-bot2 for Chengming Zhou
2022-08-25 16:41 ` [PATCH v4 09/10] sched/psi: cache parent psi_group to speed up groups iterate Chengming Zhou
2022-08-25 16:41 ` Chengming Zhou
2022-09-09 14:00 ` [tip: sched/psi] sched/psi: Cache parent psi_group to speed up group iteration tip-bot2 for Chengming Zhou
2022-09-06 13:13 ` [PATCH v4 00/10] sched/psi: some optimizations and extensions Chengming Zhou
2022-09-06 13:13 ` Chengming Zhou
[not found] ` <be071d5a-ff2d-d06e-2f89-f2ca247dd19e-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>
2022-09-06 14:43 ` Peter Zijlstra
2022-09-06 14:43 ` Peter Zijlstra
[not found] ` <YxdcfX4Ss/9k8qA9-Nxj+rRp3nVydTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2022-09-07 1:55 ` Chengming Zhou
2022-09-07 1:55 ` Chengming Zhou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YweuV/+G0DGn3ECV@cmpxchg.org \
--to=hannes-druugvl0lcnafugrpc6u6w@public.gmane.org \
--cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=corbet-T1hC0tSOHrs@public.gmane.org \
--cc=gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org \
--cc=linux-doc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=mkoutny-IBi9RG/b67k@public.gmane.org \
--cc=peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
--cc=songmuchun-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org \
--cc=surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
--cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
--cc=zhouchengming-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.