From: Peter Zijlstra <peterz@infradead.org>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: mingo@redhat.com, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
vschneid@redhat.com, hannes@cmpxchg.org, surenb@google.com,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 4/4] sched: Fix cgroup irq accounting for CONFIG_IRQ_TIME_ACCOUNTING
Date: Fri, 1 Nov 2024 11:28:42 +0100 [thread overview]
Message-ID: <20241101102842.GW14555@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <20241101031750.1471-5-laoar.shao@gmail.com>
On Fri, Nov 01, 2024 at 11:17:50AM +0800, Yafang Shao wrote:
> After enabling CONFIG_IRQ_TIME_ACCOUNTING to monitor IRQ pressure in our
> container environment, we observed several noticeable behavioral changes.
>
> One of our IRQ-heavy services, such as Redis, reported a significant
> reduction in CPU usage after upgrading to the new kernel with
> CONFIG_IRQ_TIME_ACCOUNTING enabled. However, despite adding more threads
> to handle an increased workload, the CPU usage could not be raised. In
> other words, even though the container’s CPU usage appeared low, it was
> unable to process more workloads to utilize additional CPU resources, which
> caused issues.
> We can verify the CPU usage of the test cgroup using cpuacct.stat. The
> output shows:
>
> system: 53
> user: 2
>
> The CPU usage of the cgroup is relatively low at around 55%, but this usage
> doesn't increase, even with more netperf tasks. The reason is that CPU0 is
> at 100% utilization, as confirmed by mpstat:
>
> 02:56:22 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
> 02:56:23 PM 0 0.99 0.00 55.45 0.00 0.99 42.57 0.00 0.00 0.00 0.00
>
> 02:56:23 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
> 02:56:24 PM 0 2.00 0.00 55.00 0.00 0.00 43.00 0.00 0.00 0.00 0.00
>
> It is clear that the %soft is not accounted into the cgroup of the
> interrupted task. This behavior is unexpected. We should account for IRQ
> time to the cgroup to reflect the pressure the group is under.
>
> After a thorough analysis, I discovered that this change in behavior is due
> to commit 305e6835e055 ("sched: Do not account irq time to current task"),
> which altered whether IRQ time should be charged to the interrupted task.
> While I agree that a task should not be penalized by random interrupts, the
> task itself cannot progress while interrupted. Therefore, the interrupted
> time should be reported to the user.
>
> The system metric in cpuacct.stat is crucial in indicating whether a
> container is under heavy system pressure, including IRQ/softirq activity.
> Hence, IRQ/softirq time should be accounted for in the cpuacct system
> usage, which also applies to cgroup2’s rstat.
>
> This patch reintroduces IRQ/softirq accounting to cgroups.
How !? what does it actually do?
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> ---
> kernel/sched/core.c | 33 +++++++++++++++++++++++++++++++--
> kernel/sched/psi.c | 14 +++-----------
> kernel/sched/stats.h | 7 ++++---
> 3 files changed, 38 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 06a06f0897c3..5ed2c5c8c911 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5579,6 +5579,35 @@ __setup("resched_latency_warn_ms=", setup_resched_latency_warn_ms);
> static inline u64 cpu_resched_latency(struct rq *rq) { return 0; }
> #endif /* CONFIG_SCHED_DEBUG */
>
> +#ifdef CONFIG_IRQ_TIME_ACCOUNTING
> +static void account_irqtime(struct rq *rq, struct task_struct *curr,
> + struct task_struct *prev)
> +{
> + int cpu = smp_processor_id();
> + s64 delta;
> + u64 irq;
> +
> + if (!static_branch_likely(&sched_clock_irqtime))
> + return;
> +
> + irq = irq_time_read(cpu);
> + delta = (s64)(irq - rq->psi_irq_time);
At this point the variable is no longer exclusive to PSI and should
probably be renamed.
> + if (delta < 0)
> + return;
> +
> + rq->psi_irq_time = irq;
> + psi_account_irqtime(rq, curr, prev, delta);
> + cgroup_account_cputime(curr, delta);
> + /* We account both softirq and irq into softirq */
> + cgroup_account_cputime_field(curr, CPUTIME_SOFTIRQ, delta);
This seems wrong.. we have CPUTIME_IRQ.
> +}
In fact, much of this seems like it's going about things sideways.
Why can't you just add the cgroup_account_*() garbage to
irqtime_account_irq()? That is were it's still split out into softirq
and irq.
But the much bigger question is -- how can you be sure that this
interrupt is in fact for the cgroup you're attributing it to? Could be
for an entirely different cgroup.
next prev parent reply other threads:[~2024-11-01 10:28 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-01 3:17 [PATCH v4 0/4] sched: Fix irq accounting for CONFIG_IRQ_TIME_ACCOUNTING Yafang Shao
2024-11-01 3:17 ` [PATCH v4 1/4] sched: Define sched_clock_irqtime as static key Yafang Shao
2024-11-01 10:06 ` Peter Zijlstra
2024-11-01 11:55 ` Yafang Shao
2024-11-01 3:17 ` [PATCH v4 2/4] sched: Don't account irq time if sched_clock_irqtime is disabled Yafang Shao
2024-11-01 3:17 ` [PATCH v4 3/4] sched, psi: " Yafang Shao
2024-11-01 3:17 ` [PATCH v4 4/4] sched: Fix cgroup irq accounting for CONFIG_IRQ_TIME_ACCOUNTING Yafang Shao
2024-11-01 10:28 ` Peter Zijlstra [this message]
2024-11-01 12:04 ` Yafang Shao
2024-11-01 10:54 ` [PATCH v4 0/4] sched: Fix " Peter Zijlstra
2024-11-01 11:54 ` Yafang Shao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241101102842.GW14555@noisy.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=bsegall@google.com \
--cc=cgroups@vger.kernel.org \
--cc=dietmar.eggemann@arm.com \
--cc=hannes@cmpxchg.org \
--cc=juri.lelli@redhat.com \
--cc=laoar.shao@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=rostedt@goodmis.org \
--cc=surenb@google.com \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox