From: Jakub Kicinski <kuba@kernel.org>
To: Jason Xing <kerneljasonxing@gmail.com>
Cc: jbrouer@redhat.com, davem@davemloft.net, edumazet@google.com,
pabeni@redhat.com, ast@kernel.org, daniel@iogearbox.net,
hawk@kernel.org, john.fastabend@gmail.com,
stephen@networkplumber.org, simon.horman@corigine.com,
sinquersw@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org,
Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH v4 net-next 2/2] net: introduce budget_squeeze to help us tune rx behavior
Date: Thu, 16 Mar 2023 20:26:48 -0700
Message-ID: <20230316202648.1f8c2f80@kernel.org>
In-Reply-To: <CAL+tcoDNvMUenwNEH2QByEY7cS1qycTSw1TLFSnNKt4Q0dCJUw@mail.gmail.com>
On Fri, 17 Mar 2023 10:27:11 +0800 Jason Xing wrote:
> > That is the common case, and can be understood from the napi trace
>
> Thanks for your reply. This happens commonly, every day, on many servers.
Right, but the common issue is the time squeeze, not the budget squeeze,
and either way the budget squeeze doesn't really matter, because
the softirq loop will call us again soon, if softirq itself is
not scheduled out.
So if you want to monitor a meaningful event in your fleet, I think
a better event to monitor is the number of times ksoftirqd was woken
up and the latency of it getting onto the CPU.
Did you try to measure that?
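Something like this would do; a rough, untested bpftrace sketch
(basically runqlat restricted to the ksoftirqd threads), just to show
the idea:

  bpftrace -e '
  tracepoint:sched:sched_wakeup
  /strncmp(args->comm, "ksoftirqd/", 10) == 0/
  {
          /* remember when each ksoftirqd thread was woken */
          @woken[args->pid] = nsecs;
          @wakeups = count();
  }
  tracepoint:sched:sched_switch
  /@woken[args->next_pid]/
  {
          /* wakeup -> on-CPU latency, in microseconds */
          @run_delay_us = hist((nsecs - @woken[args->next_pid]) / 1000);
          delete(@woken[args->next_pid]);
  }'

A long tail in @run_delay_us is the kind of problem I have in mind.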
(Please do *not* send patches that touch the softirq code right now, just
measure first. We are trying to improve the situation but the core
kernel maintainers are wary of changes:
https://lwn.net/Articles/925540/
so if both of us start sending code they will probably take neither
set of patches :()
> > point and probing the kernel with bpftrace. We should only add
>
> We probably can deduce (or guess) which one causes the latency because
> trace_napi_poll() only counts the budget consumed per poll.
>
> Besides, tracing napi poll is totally fine on a testbed, but not on
> servers under heavy load, where bpftrace-based tools capturing data
> from the hot path may have a noticeable impact, especially on
> machines equipped with fast NICs, say, 100G cards. Resorting to the
> legacy softnet_stat file is relatively feasible, based on my limited
> knowledge.
Right, but we're still measuring something relatively irrelevant.
As I said, the softirq loop will call us again. In my experience
network queues get long when ksoftirqd is woken up but not scheduled
for a long time. That is the source of latency. You may have the same
problem (high latency) without consuming the entire budget.
I think if we wanna make new stats we should try to come up with a way
of capturing the problem rather than one of its symptoms.
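FWIW the per-poll work vs budget numbers are already visible from the
existing napi:napi_poll tracepoint, no new uAPI needed; another rough,
untested bpftrace sketch (field names as in current kernels):

  bpftrace -e '
  tracepoint:napi:napi_poll
  {
          /* distribution of packets processed per poll */
          @work = lhist(args->work, 0, 64, 8);
          /* polls that consumed the whole budget, per device */
          if (args->work == args->budget) {
                  @full_budget[str(args->dev_name)] = count();
          }
  }'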
> Paolo also added backlog queue data to this file in 2020 (see commit
> 7d58e6555870d). I believe that after this patch there is little or no
> new data that will need to be printed here for the next few years.
>
> > uAPI for statistics which must be maintained continuously. For
>
> In this patch, I didn't touch the old data, as suggested in the
> previous emails, and only split the old @time_squeeze counter into
> two parts (time_squeeze and budget_squeeze). Using budget_squeeze
> can help us profile the server and tune it more effectively.
>
> > investigations tracing will always be orders of magnitude more
> > powerful :(
>
> > On the time squeeze BTW, have you found out what the problem was?
> > In workloads I've seen the time problems are often because of noise
> > in how jiffies are accounted (cgroup code disables interrupts
> > for long periods of time, for example, making jiffies increment
> > by 2, 3 or 4 rather than by 1).
>
> Yes! The jiffies increment issue troubles those servers more often
> than not. For a small group of servers, the budget limit is also a
> problem. Sometimes we might treat guest OSes differently.