public inbox for cgroups@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: mingo@redhat.com, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, hannes@cmpxchg.org, surenb@google.com,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 4/4] sched: Fix cgroup irq accounting for CONFIG_IRQ_TIME_ACCOUNTING
Date: Fri, 1 Nov 2024 11:28:42 +0100	[thread overview]
Message-ID: <20241101102842.GW14555@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <20241101031750.1471-5-laoar.shao@gmail.com>

On Fri, Nov 01, 2024 at 11:17:50AM +0800, Yafang Shao wrote:
> After enabling CONFIG_IRQ_TIME_ACCOUNTING to monitor IRQ pressure in our
> container environment, we observed several noticeable behavioral changes.
> 
> One of our IRQ-heavy services, such as Redis, reported a significant
> reduction in CPU usage after upgrading to the new kernel with
> CONFIG_IRQ_TIME_ACCOUNTING enabled. However, despite adding more threads
> to handle an increased workload, the CPU usage could not be raised. In
> other words, even though the container’s CPU usage appeared low, it was
> unable to process more workloads to utilize additional CPU resources, which
> caused issues.

> We can verify the CPU usage of the test cgroup using cpuacct.stat. The
> output shows:
> 
>   system: 53
>   user: 2
> 
> The CPU usage of the cgroup is relatively low at around 55%, but this usage
> doesn't increase, even with more netperf tasks. The reason is that CPU0 is
> at 100% utilization, as confirmed by mpstat:
> 
>   02:56:22 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
>   02:56:23 PM    0    0.99    0.00   55.45    0.00    0.99   42.57    0.00    0.00    0.00    0.00
> 
>   02:56:23 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
>   02:56:24 PM    0    2.00    0.00   55.00    0.00    0.00   43.00    0.00    0.00    0.00    0.00
> 
> It is clear that the %soft is not accounted into the cgroup of the
> interrupted task. This behavior is unexpected. We should account for IRQ
> time to the cgroup to reflect the pressure the group is under.
> 
> After a thorough analysis, I discovered that this change in behavior is due
> to commit 305e6835e055 ("sched: Do not account irq time to current task"),
> which altered whether IRQ time should be charged to the interrupted task.
> While I agree that a task should not be penalized by random interrupts, the
> task itself cannot progress while interrupted. Therefore, the interrupted
> time should be reported to the user.
> 
> The system metric in cpuacct.stat is crucial in indicating whether a
> container is under heavy system pressure, including IRQ/softirq activity.
> Hence, IRQ/softirq time should be accounted for in the cpuacct system
> usage, which also applies to cgroup2’s rstat.
> 
> This patch reintroduces IRQ/softirq accounting to cgroups.

How !? what does it actually do?

> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  kernel/sched/core.c  | 33 +++++++++++++++++++++++++++++++--
>  kernel/sched/psi.c   | 14 +++-----------
>  kernel/sched/stats.h |  7 ++++---
>  3 files changed, 38 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 06a06f0897c3..5ed2c5c8c911 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5579,6 +5579,35 @@ __setup("resched_latency_warn_ms=", setup_resched_latency_warn_ms);
>  static inline u64 cpu_resched_latency(struct rq *rq) { return 0; }
>  #endif /* CONFIG_SCHED_DEBUG */
>  
> +#ifdef CONFIG_IRQ_TIME_ACCOUNTING
> +static void account_irqtime(struct rq *rq, struct task_struct *curr,
> +			    struct task_struct *prev)
> +{
> +	int cpu = smp_processor_id();
> +	s64 delta;
> +	u64 irq;
> +
> +	if (!static_branch_likely(&sched_clock_irqtime))
> +		return;
> +
> +	irq = irq_time_read(cpu);
> +	delta = (s64)(irq - rq->psi_irq_time);

At this point the variable is no longer exclusive to PSI and should
probably be renamed.

> +	if (delta < 0)
> +		return;
> +
> +	rq->psi_irq_time = irq;
> +	psi_account_irqtime(rq, curr, prev, delta);
> +	cgroup_account_cputime(curr, delta);
> +	/* We account both softirq and irq into softirq */
> +	cgroup_account_cputime_field(curr, CPUTIME_SOFTIRQ, delta);

This seems wrong.. we have CPUTIME_IRQ.

> +}

In fact, much of this seems like it's going about things sideways.

Why can't you just add the cgroup_account_*() garbage to
irqtime_account_irq()? That is were it's still split out into softirq
and irq.

But the much bigger question is -- how can you be sure that this
interrupt is in fact for the cgroup you're attributing it to? Could be
for an entirely different cgroup.



  reply	other threads:[~2024-11-01 10:28 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-01  3:17 [PATCH v4 0/4] sched: Fix irq accounting for CONFIG_IRQ_TIME_ACCOUNTING Yafang Shao
2024-11-01  3:17 ` [PATCH v4 1/4] sched: Define sched_clock_irqtime as static key Yafang Shao
2024-11-01 10:06   ` Peter Zijlstra
2024-11-01 11:55     ` Yafang Shao
2024-11-01  3:17 ` [PATCH v4 2/4] sched: Don't account irq time if sched_clock_irqtime is disabled Yafang Shao
2024-11-01  3:17 ` [PATCH v4 3/4] sched, psi: " Yafang Shao
2024-11-01  3:17 ` [PATCH v4 4/4] sched: Fix cgroup irq accounting for CONFIG_IRQ_TIME_ACCOUNTING Yafang Shao
2024-11-01 10:28   ` Peter Zijlstra [this message]
2024-11-01 12:04     ` Yafang Shao
2024-11-01 10:54 ` [PATCH v4 0/4] sched: Fix " Peter Zijlstra
2024-11-01 11:54   ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241101102842.GW14555@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bsegall@google.com \
    --cc=cgroups@vger.kernel.org \
    --cc=dietmar.eggemann@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=juri.lelli@redhat.com \
    --cc=laoar.shao@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=surenb@google.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox