From: "Michael S. Tsirkin" <mst@redhat.com>
To: Tobias Huschle <huschle@linux.ibm.com>
Cc: Luis Machado <luis.machado@arm.com>,
Jason Wang <jasowang@redhat.com>,
Abel Wu <wuyun.abel@bytedance.com>,
Peter Zijlstra <peterz@infradead.org>,
Linux Kernel <linux-kernel@vger.kernel.org>,
kvm@vger.kernel.org, virtualization@lists.linux.dev,
netdev@vger.kernel.org, nd <nd@arm.com>
Subject: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)
Date: Fri, 15 Mar 2024 06:31:51 -0400
Message-ID: <20240315062839-mutt-send-email-mst@kernel.org>
In-Reply-To: <84704.124031504335801509@us-mta-515.us.mimecast.lan>
On Fri, Mar 15, 2024 at 09:33:49AM +0100, Tobias Huschle wrote:
> On Thu, Mar 14, 2024 at 11:09:25AM -0400, Michael S. Tsirkin wrote:
> >
> > Thanks a lot! To clarify it is not that I am opposed to changing vhost.
> > I would like however for some documentation to exist saying that if you
> > do abc then call API xyz. Then I hope we can feel a bit safer that
> > future scheduler changes will not break vhost (though as usual, nothing
> > is for sure). Right now we are going by the documentation and that says
> > cond_resched so we do that.
> >
> > --
> > MST
> >
>
> Here I'd like to add that we have two different problems:
>
> 1. cond_resched not working as expected
> This appears to me to be a bug in the scheduler where it lets the cgroup,
> which the vhost is running in, loop endlessly. In EEVDF terms, the cgroup
> is allowed to surpass its own deadline without consequences. One of my RFCs
> mentioned above addresses this issue (not happy yet with the implementation).
> This issue only appears in that specific scenario, so it's not a general
> issue, rather a corner case.
> But, this fix will still allow the vhost to reach its deadline, which is
> one full time slice. This brings down the max delays from 300+ms to whatever
> the timeslice is. This is not enough to fix the regression.
>
> 2. vhost relying on kworker being scheduled on wake up
> This is the bigger issue for the regression. There are rare cases where
> the vhost runs only for a very short amount of time before it wakes up
> the kworker. At the same time, the kworker takes longer than usual to
> complete its work, and longer than the vhost ran before. We
> are talking 4-digit to low 5-digit nanosecond values.
> With those two being the only tasks on the CPU, the scheduler now assumes
> that the kworker wants to unfairly consume more than the vhost and denies
> it being scheduled on wakeup.
> In the regular cases, the kworker is faster than the vhost, so the
> scheduler assumes that the kworker needs help, which benefits the
> scenario we are looking at.
> In the bad case, this unfortunately means that cond_resched cannot work
> as well as before.
> So, let's assume that problem 1 from above is fixed. It will take one
> full time slice to get the need_resched flag set by the scheduler
> because vhost surpasses its deadline. Before that point, the scheduler
> cannot know that the kworker should actually run. The kworker is unable
> to communicate that itself since it's not getting scheduled, and there
> is no external entity that could intervene.
> Hence my argument that cond_resched still works as expected. The
> crucial part is that the wake up behavior has changed, which is why I'm
> a bit reluctant to propose a documentation change on cond_resched.
> I could see proposing a doc change stating that cond_resched should not
> be used if a task heavily relies on a woken up task being scheduled.
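
For reference, the wakeup behavior described under problem 2 can be
illustrated with a toy model. This is only a sketch, not kernel code: it
assumes two equally weighted tasks on one CPU, uses the textbook EEVDF
definition of lag (weight times the difference between the load-weighted
average vruntime and the task's own vruntime), and treats "eligible" as
"lag is non-negative". The names Entity, avg_vruntime, lag, is_eligible and
runs_on_wakeup are made up for this example, and the nanosecond values are
illustrative picks from the range mentioned above. The real wakeup path also
compares deadlines, which is left out here.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    weight: int
    vruntime_ns: float  # virtual runtime consumed so far (illustrative value)


def avg_vruntime(entities):
    """Load-weighted average virtual runtime V of the run queue."""
    total_weight = sum(e.weight for e in entities)
    return sum(e.weight * e.vruntime_ns for e in entities) / total_weight


def lag(entity, entities):
    """EEVDF lag: positive means the entity is still owed service."""
    return entity.weight * (avg_vruntime(entities) - entity.vruntime_ns)


def is_eligible(entity, entities):
    """In this model an entity may be picked only if its lag is non-negative."""
    return lag(entity, entities) >= 0


def runs_on_wakeup(woken, current):
    """Toy wakeup check: the woken task preempts only if it is eligible.

    Simplification: the deadline comparison done by the real scheduler
    is ignored.
    """
    return is_eligible(woken, [woken, current])


# Regular case: the kworker used less CPU than the vhost, so it is owed service.
regular = runs_on_wakeup(Entity("kworker", 1, 5_000), Entity("vhost", 1, 12_000))

# Bad case: the kworker used more CPU than the vhost, so its lag is negative.
bad = runs_on_wakeup(Entity("kworker", 1, 12_000), Entity("vhost", 1, 5_000))

print(f"regular case, kworker runs on wakeup: {regular}")
print(f"bad case, kworker runs on wakeup: {bad}")
```

In the bad case the kworker stays off the CPU until the vhost has used up
its slice, which matches the delay described in the quoted text.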
Could you remind me pls, what is the kworker doing specifically that
vhost is relying on?
--
MST