From: "Doug Smythies" <dsmythies@telus.net>
To: "'Alexander Egorenkov'" <egorenar@linux.ibm.com>, <peterz@infradead.org>
Cc: <linux-kernel@vger.kernel.org>, <mingo@kernel.org>,
<x86@kernel.org>, "Doug Smythies" <dsmythies@telus.net>
Subject: RE: [tip: sched/urgent] sched/fair: Fix EEVDF entity placement bug causing scheduling lag
Date: Sat, 26 Apr 2025 08:09:55 -0700
Message-ID: <002401dbb6bd$4527ec00$cf77c400$@telus.net>
In-Reply-To: <87msc6dmbz.fsf@li-0ccc18cc-2c67-11b2-a85c-a193851e4c5d.ibm.com>
Hi Alexander,
Thank you for your reply.
Note that I have adjusted the address list for this email, because I don't know if bots can get emails, and Peter was not on the
"To" line, and might not have noticed this thread.
@Peter : Off-list I will forward you the other emails, in case you missed them. I apologise if you did see them but haven't had time
to get to them or whatever.
Also note that I know nothing about the scheduler and was only on the original email because I had a "Reported-by" tag.
On 2025.04.24 00:57 Alexander Egorenkov wrote:
> Hi all,
[Doug wrote]
>> That is a very very stressful test. It crashes within a few seconds on my test computer,
>> with a " Segmentation fault (core dumped)" message.
>
> Yes, this is an artificial test i came up with to demonstrate the
> problem we have with another realistic test which i can hardly
> use here for the sake of demonstration. But it reveals the exact
> same problem we have with our CI test on s390x test systems.
>
> Let me explain shortly how it happens.
>
> Basically, we have a test system where we execute a test suite and
> simultaneously monitor this system on another system via simple SSH
> logins (approximately invoked every 15 seconds) whether the test system
> is still online and dump automatically if it remains unresponsive for
> 5m straight. We limit every such SSH login to 10 seconds because
> we had situations where SSH sometimes hanged for a long time due to
> various problems with networking, test system itself etc., just to make
> our monitoring robust.
>
> And since the commit "sched/fair: Fix EEVDF entity placement bug causing
> scheduling lag" we regularly see SSH logins (limited to 10s) failing for
> 5m straight, not a single SSH login succeeds. This happens regularly
> with test suites which compile software with GCC and use all CPUs
> at 100%. Before the commit, a SSH login required under 1 second.
> I cannot judge whether the problem really in this commit, or it is just an
> accumulated effect after multiple ones.
>
> FYI:
> One such system where it happens regularly has 7 cores (5.2Ghz SMT 2x, 14 cpus)
> and 8G of main memory with 20G of swap.
>
> Thanks
> Regards
> Alex
Thanks for the explanation.
I have recreated your situation with a workflow that, while it stresses the CPUs,
doesn't make any entries in /var/log/kern.log and /var/log/syslog.
Under the same conditions, I have confirmed that the ssh login lag doesn't occur
with kernel 6.12, but does with kernel 6.13.
My workflow is stuff I have used for many years and wrote myself.
Basically, I create a huge queue of running tasks, with each doing a little work
and then sleeping for a short period. I have 2 methods to achieve similar overall
workflow, and one shows the issue and one does not. I can also create a huge
queue by just increasing the number of "yes" tasks to a ridiculous number, but
that does not show your ssh login lag issue.
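For readers who want to try something similar, here is a minimal sketch of that kind of load: many tasks that each spin briefly and then sleep. It is not the tooling referred to above, which is not shown in this thread. The function names, the timing values and the use of Python's multiprocessing module are all assumptions made for illustration.

```python
import multiprocessing as mp
import time


def burst_then_sleep(work_s, sleep_s, cycles):
    """Spin for work_s seconds, sleep for sleep_s seconds, repeat.

    Returns the number of completed work/sleep cycles.
    """
    done = 0
    for _ in range(cycles):
        deadline = time.perf_counter() + work_s
        while time.perf_counter() < deadline:
            pass  # busy-wait: this is the "little work"
        time.sleep(sleep_s)  # the short sleep puts the task back to sleep/wakeup
        done += 1
    return done


def run_workload(n_tasks, work_s=0.001, sleep_s=0.01, cycles=1000):
    """Start n_tasks processes running burst_then_sleep and wait for them.

    All default values are hypothetical; tune them for the machine under test.
    Returns the list of process exit codes.
    """
    procs = [
        mp.Process(target=burst_then_sleep, args=(work_s, sleep_s, cycles))
        for _ in range(n_tasks)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]
```

To use it, call something like run_workload(200) from under an `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that do not fork. One Python process per task is memory-hungry, so reaching tens of thousands of tasks this way is probably impractical; a small C program or threads would scale further.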
Anyway, for the workflow that does show your issue, I had a load average of
about 19,500 (20,000 tasks) and ssh login times ranged from 10 to 38 seconds,
with an average of about 13 seconds. ssh login times using kernel 6.12 were
negligible.
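The measurement side can be sketched as well: time one login attempt with a limit, and apply the rule Alexander describes (a probe roughly every 15 seconds, 10 seconds allowed per login, a dump after 5 minutes without a single success). This is only an illustration under assumptions. The function names are hypothetical, and neither the IBM CI monitor nor the timing method used for the numbers above is shown in this thread.

```python
import subprocess
import time


def timed_run(cmd, limit_s=10.0):
    """Run cmd and return its elapsed time in seconds.

    Returns None if the command exits non-zero or exceeds limit_s.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            timeout=limit_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return None  # subprocess.run has already killed the child
    if result.returncode != 0:
        return None
    return time.monotonic() - start


def should_dump(samples, window_s=300.0):
    """Decide whether the current failure streak has lasted window_s seconds.

    samples is a list of (timestamp_s, elapsed_or_None), oldest first.
    """
    if not samples:
        return False
    streak_start = None
    for ts, elapsed in samples:
        if elapsed is not None:
            streak_start = None  # a successful login resets the streak
        elif streak_start is None:
            streak_start = ts  # first failure after a success
    latest = samples[-1][0]
    return streak_start is not None and latest - streak_start >= window_s
```

A probe could then be timed_run(["ssh", "-o", "BatchMode=yes", "testhost", "true"]), where "testhost" is a placeholder host name. To see login times above 10 seconds, such as the 38 second case, raise limit_s.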
... Doug