From: Sean Christopherson <seanjc@google.com>
To: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	mingo@redhat.com, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, linux-kernel@vger.kernel.org,
	kprateek.nayak@amd.com, wuyun.abel@bytedance.com,
	youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de,
	kvm@vger.kernel.org
Subject: Re: [PATCH 17/24] sched/fair: Implement delayed dequeue
Date: Wed, 9 Oct 2024 19:49:54 -0700
Message-ID: <ZwdA0sbA2tJA3IKh@google.com>
In-Reply-To: <5618d029-769a-4690-a581-2df8939f26a9@samsung.com>

+KVM

On Thu, Aug 29, 2024, Marek Szyprowski wrote:
> On 27.07.2024 12:27, Peter Zijlstra wrote:
> > Extend / fix 86bfbb7ce4f6 ("sched/fair: Add lag based placement") by
> > noting that lag is fundamentally a temporal measure. It should not be
> > carried around indefinitely.
> >
> > OTOH it should also not be instantly discarded, doing so will allow a
> > task to game the system by purposefully (micro) sleeping at the end of
> > its time quantum.
> >
> > Since lag is intimately tied to the virtual time base, a wall-time
> > based decay is also insufficient, notably competition is required for
> > any of this to make sense.
> >
> > Instead, delay the dequeue and keep the 'tasks' on the runqueue,
> > competing until they are eligible.
> >
> > Strictly speaking, we only care about keeping them until the 0-lag
> > point, but that is a difficult proposition, instead carry them around
> > until they get picked again, and dequeue them at that point.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> This patch landed recently in linux-next as commit 152e11f6df29
> ("sched/fair: Implement delayed dequeue"). In my tests on some of the
> ARM 32bit boards it causes a regression in rtcwake tool behavior - from
> time to time this simple call never ends:
>
> # time rtcwake -s 10 -m on
>
> Reverting this commit (together with its compile dependencies) on top of
> linux-next fixes this issue. Let me know how can I help debugging this
> issue.

This commit broke KVM's posted interrupt handling (and other things), and the root
cause may be the same underlying issue.

TL;DR: Code that checks task_struct.on_rq may be broken by this commit.

KVM's breakage boils down to the preempt notifiers, i.e. kvm_sched_out(), being
invoked with current->on_rq "true" after KVM has explicitly called schedule().
kvm_sched_out() uses current->on_rq to determine if the vCPU is being preempted
(voluntarily or not, doesn't matter), and so waiting until some later point in
time to call __block_task() causes KVM to think the task was preempted, when in
reality it was not.

  static void kvm_sched_out(struct preempt_notifier *pn,
			    struct task_struct *next)
  {
	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

	WRITE_ONCE(vcpu->scheduled_out, true);

	if (current->on_rq && vcpu->wants_to_run) {  <================
		WRITE_ONCE(vcpu->preempted, true);
		WRITE_ONCE(vcpu->ready, true);
	}
	kvm_arch_vcpu_put(vcpu);
	__this_cpu_write(kvm_running_vcpu, NULL);
  }

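To make the false positive concrete, here is a minimal userspace sketch of the behavior described above. Everything in it is a hypothetical model (struct model_task, struct model_vcpu, model_block(), model_sched_out()), not kernel code, and it assumes only what is stated in this mail: with delayed dequeue a blocking task can still have on_rq set when the sched-out notifier runs.

```c
#include <stdbool.h>

/* Hypothetical userspace model; these are not the kernel's structures. */
struct model_task {
	bool on_rq;		/* task is still linked on a runqueue */
	bool sched_delayed;	/* the dequeue was deferred */
};

struct model_vcpu {
	bool wants_to_run;
	bool preempted;
};

/* Model of a voluntary block, i.e. the task itself called schedule(). */
static void model_block(struct model_task *t, bool delayed_dequeue)
{
	if (delayed_dequeue) {
		/* Task stays queued; on_rq is left set for now. */
		t->sched_delayed = true;
	} else {
		/* Behavior before the commit: dequeued right away. */
		t->on_rq = false;
	}
}

/* Mirrors the on_rq test in the kvm_sched_out() snippet quoted above. */
static void model_sched_out(struct model_vcpu *v, const struct model_task *t)
{
	if (t->on_rq && v->wants_to_run)
		v->preempted = true;
}
```

Calling model_block() with delayed_dequeue set and then model_sched_out() marks the model vCPU as preempted even though it blocked voluntarily; with delayed_dequeue clear it does not.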
KVM uses vcpu->preempted for a variety of things, but the most visibly problematic
is waking a vCPU from (virtual) HLT via posted interrupt wakeup.  When a vCPU
HLTs, KVM ultimately calls schedule() to schedule out the vCPU until it receives
a wake event.

When a device or another vCPU can post an interrupt as a wake event, KVM mucks
with the blocking vCPU's posted interrupt descriptor so that posted interrupts
that should be wake events get delivered on a dedicated host IRQ vector, so that
KVM can kick and wake the target vCPU.

But when vcpu->preempted is true, KVM suppresses posted interrupt notifications,
knowing that the vCPU will be scheduled back in.  Because a vCPU (task) can be
preempted while KVM is emulating HLT, KVM keys off vcpu->preempted to set PID.SN,
and doesn't exempt the blocking case.  In short, KVM uses vcpu->preempted, i.e.
current->on_rq, to differentiate between the vCPU getting preempted and KVM
executing schedule().

As a result, the false positive for vcpu->preempted causes KVM to suppress posted
interrupt notifications and the target vCPU never gets its wake event.
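
The lost wakeup can be sketched the same way. This is a deliberately simplified model, not KVM's posted interrupt code: struct pi_model_desc, struct pi_model_vcpu and both helpers are made-up names, and the descriptor is reduced to two flags standing in for PID.SN and a pending interrupt.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a posted interrupt descriptor. */
struct pi_model_desc {
	bool sn;	/* suppress notification, models PID.SN */
	bool pending;	/* an interrupt has been posted */
};

struct pi_model_vcpu {
	struct pi_model_desc pid;
	bool preempted;	/* as computed at sched-out time */
	bool blocking;	/* vCPU is in emulated HLT */
	bool woken;
};

/* Sched-out path: keys off 'preempted' only, no exemption for blocking. */
static void pi_model_vcpu_put(struct pi_model_vcpu *v)
{
	if (v->preempted)
		v->pid.sn = true;
}

/* A device or another vCPU posts an interrupt to the target. */
static void pi_model_post(struct pi_model_vcpu *v)
{
	v->pid.pending = true;

	/* The wakeup notification fires only if it is not suppressed. */
	if (v->blocking && !v->pid.sn)
		v->woken = true;
}
```

With blocking set and preempted clear, pi_model_post() wakes the model vCPU. With preempted falsely set, the interrupt is recorded as pending but the vCPU is never woken.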

Peter,

Any thoughts on how best to handle this?  The below hack-a-fix resolves the issue,
but it's obviously not appropriate.  KVM uses vcpu->preempted for more than just
posted interrupts, so KVM needs equivalent functionality to current->on_rq as it
was before this commit.

@@ -6387,7 +6390,7 @@ static void kvm_sched_out(struct preempt_notifier *pn,
 
 	WRITE_ONCE(vcpu->scheduled_out, true);
 
-	if (current->on_rq && vcpu->wants_to_run) {
+	if (se_runnable(&current->se) && vcpu->wants_to_run) {
 		WRITE_ONCE(vcpu->preempted, true);
 		WRITE_ONCE(vcpu->ready, true);
 	}
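
For reference, the kind of check the hack relies on can be sketched as below. This assumes, and it is only an assumption for illustration, that a delayed-dequeue entity is treated as not runnable even though it is still queued. struct model_entity and model_runnable() are hypothetical names, not the scheduler's se_runnable().

```c
#include <stdbool.h>

/* Hypothetical model of the two bits of entity state that matter here. */
struct model_entity {
	bool on_rq;		/* still linked on a runqueue */
	bool sched_delayed;	/* queued only because dequeue was deferred */
};

/*
 * "Runnable" in the sense KVM needs: queued and not merely waiting for
 * a deferred dequeue.  Assumed semantics, not the kernel implementation.
 */
static bool model_runnable(const struct model_entity *se)
{
	if (se->sched_delayed)
		return false;

	return se->on_rq;
}
```

Under that assumption, a task that blocked voluntarily and is sitting on the runqueue only because of delayed dequeue reports false, which matches what current->on_rq reported before this commit.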