All of lore.kernel.org
 help / color / mirror / Atom feed
From: Quentin Perret <qperret@google.com>
To: Colton Lewis <coltonlewis@google.com>
Cc: kvm@vger.kernel.org, maz@kernel.org, oliver.upton@linux.dev,
	james.morse@arm.com, suzuki.poulose@arm.com,
	yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
	pbonzini@redhat.com, mingo@redhat.com, peterz@infradead.org,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] KVM: arm64: Add KVM_CAP to control WFx trapping
Date: Thu, 28 Mar 2024 10:30:08 +0000	[thread overview]
Message-ID: <ZgVGsKoAoW4YwQD_@google.com> (raw)
In-Reply-To: <gsntjzlqax63.fsf@coltonlewis-kvm.c.googlers.com>

Hi Colton,

On Monday 25 Mar 2024 at 20:12:04 (+0000), Colton Lewis wrote:
> Thanks for the feedback.
> 
> Quentin Perret <qperret@google.com> writes:
> 
> > On Friday 22 Mar 2024 at 14:24:35 (+0000), Quentin Perret wrote:
> > > On Tuesday 19 Mar 2024 at 16:43:41 (+0000), Colton Lewis wrote:
> > > > Add a KVM_CAP to control WFx (WFI or WFE) trapping based on scheduler
> > > > runqueue depth. This is so they can be passed through if the runqueue
> > > > is shallow or the CPU has support for direct interrupt injection. They
> > > > may be always trapped by setting this value to 0. Technically this
> > > > means traps will be cleared when the runqueue depth is 0, but that
> > > > implies nothing is running anyway so there is no reason to care. The
> > > > default value is 1 to preserve previous behavior before adding this
> > > > option.
> 
> > > I recently discovered that this was enabled by default, but it's not
> > > obvious to me everyone will want this enabled, so I'm in favour of
> > > figuring out a way to turn it off (in fact we might want to make this
> > > feature opt in as the status quo used to be to always trap).
> 
> Setting the introduced threshold to zero will cause it to trap whenever
> something is running. Is there a problem with doing it that way?

No problem per se, I was simply hoping we could set the default to zero
to revert to the old behaviour. I don't think removing WFx traps was a
universally desirable behaviour, so it prob should have been opt-in from
the start.

> I'd also be interested to get more input before changing the current
> default behavior.

Ack, that is my personal opinion.

> > > There are a few potential issues I see with having this enabled:
> 
> > >   - a lone vcpu thread on a CPU will completely screw up the host
> > >     scheduler's load tracking metrics if the vCPU actually spends a
> > >     significant amount of time in WFI (the PELT signal will no longer
> > >     be a good proxy for "how much CPU time does this task need");
> 
> > >   - the scheduler's decision will impact massively the behaviour of the
> > >     vcpu task itself. Co-scheduling a task with a vcpu task (or not) will
> > >     impact massively the perceived behaviour of the vcpu task in a way
> > >     that is entirely unpredictable to the scheduler;
> 
> > >   - while the above problems might be OK for some users, I don't think
> > >     this will always be true, e.g. when running on big.LITTLE systems the
> > >     above sounds nightmare-ish;
> 
> > >   - the guest spending long periods of time in WFI prevents the host from
> > >     being able to enter deeper idle states, which will impact power very
> > >     negatively;
> 
> > > And probably a whole bunch of other things.
> 
> > > > Think about his option as a threshold. The instruction will be trapped
> > > > if the runqueue depth is higher than the threshold.
> 
> > > So talking about the exact interface, I'm not sure exposing this to
> > > userspace is really appropriate. The current rq depth is next to
> > > impossible for userspace to control well.
> 
> Using runqueue depth is going off of a suggestion from Oliver [1], who I've
> also talked to internally at Google a few times about this.
> 
> But hearing your comment makes me lean more towards having some
> enumeration of behaviors like TRAP_ALWAYS, TRAP_NEVER,
> TRAP_IF_MULTIPLE_TASKS.

Do you guys really expect to set this TRAP_IF_MULTIPLE_TASKS? Again, the
rq depth is quite hard to reason about from userspace, so not sure
anybody will really want that? A simpler on/off thing might be simpler.

> > > My gut feeling tells me we might want to gate all of this on
> > > PREEMPT_FULL instead, since PREEMPT_FULL is pretty much a way to say
> > > "I'm willing to give up scheduler tracking accuracy to gain throughput
> > > when I've got a task running alone on a CPU". Thoughts?
> 
> > And obviously I meant s/PREEMPT_FULL/NOHZ_FULL, but hopefully that was
> > clear :-)
> 
> Sounds good to me but I've not touched anything scheduling related before.

Do you guys use NOHZ_FULL in prod? If not that idea might very well be a
non-starter, because switching to NOHZ_FULL would be a big ask. So,
yeah, I'm curious :)

Thanks,
Quentin

WARNING: multiple messages have this Message-ID (diff)
From: Quentin Perret <qperret@google.com>
To: Colton Lewis <coltonlewis@google.com>
Cc: kvm@vger.kernel.org, maz@kernel.org, oliver.upton@linux.dev,
	james.morse@arm.com, suzuki.poulose@arm.com,
	yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
	pbonzini@redhat.com, mingo@redhat.com, peterz@infradead.org,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] KVM: arm64: Add KVM_CAP to control WFx trapping
Date: Thu, 28 Mar 2024 10:30:08 +0000	[thread overview]
Message-ID: <ZgVGsKoAoW4YwQD_@google.com> (raw)
In-Reply-To: <gsntjzlqax63.fsf@coltonlewis-kvm.c.googlers.com>

Hi Colton,

On Monday 25 Mar 2024 at 20:12:04 (+0000), Colton Lewis wrote:
> Thanks for the feedback.
> 
> Quentin Perret <qperret@google.com> writes:
> 
> > On Friday 22 Mar 2024 at 14:24:35 (+0000), Quentin Perret wrote:
> > > On Tuesday 19 Mar 2024 at 16:43:41 (+0000), Colton Lewis wrote:
> > > > Add a KVM_CAP to control WFx (WFI or WFE) trapping based on scheduler
> > > > runqueue depth. This is so they can be passed through if the runqueue
> > > > is shallow or the CPU has support for direct interrupt injection. They
> > > > may be always trapped by setting this value to 0. Technically this
> > > > means traps will be cleared when the runqueue depth is 0, but that
> > > > implies nothing is running anyway so there is no reason to care. The
> > > > default value is 1 to preserve previous behavior before adding this
> > > > option.
> 
> > > I recently discovered that this was enabled by default, but it's not
> > > obvious to me everyone will want this enabled, so I'm in favour of
> > > figuring out a way to turn it off (in fact we might want to make this
> > > feature opt in as the status quo used to be to always trap).
> 
> Setting the introduced threshold to zero will cause it to trap whenever
> something is running. Is there a problem with doing it that way?

No problem per se, I was simply hoping we could set the default to zero
to revert to the old behaviour. I don't think removing WFx traps was a
universally desirable behaviour, so it prob should have been opt-in from
the start.

> I'd also be interested to get more input before changing the current
> default behavior.

Ack, that is my personal opinion.

> > > There are a few potential issues I see with having this enabled:
> 
> > >   - a lone vcpu thread on a CPU will completely screw up the host
> > >     scheduler's load tracking metrics if the vCPU actually spends a
> > >     significant amount of time in WFI (the PELT signal will no longer
> > >     be a good proxy for "how much CPU time does this task need");
> 
> > >   - the scheduler's decision will impact massively the behaviour of the
> > >     vcpu task itself. Co-scheduling a task with a vcpu task (or not) will
> > >     impact massively the perceived behaviour of the vcpu task in a way
> > >     that is entirely unpredictable to the scheduler;
> 
> > >   - while the above problems might be OK for some users, I don't think
> > >     this will always be true, e.g. when running on big.LITTLE systems the
> > >     above sounds nightmare-ish;
> 
> > >   - the guest spending long periods of time in WFI prevents the host from
> > >     being able to enter deeper idle states, which will impact power very
> > >     negatively;
> 
> > > And probably a whole bunch of other things.
> 
> > > > Think about his option as a threshold. The instruction will be trapped
> > > > if the runqueue depth is higher than the threshold.
> 
> > > So talking about the exact interface, I'm not sure exposing this to
> > > userspace is really appropriate. The current rq depth is next to
> > > impossible for userspace to control well.
> 
> Using runqueue depth is going off of a suggestion from Oliver [1], who I've
> also talked to internally at Google a few times about this.
> 
> But hearing your comment makes me lean more towards having some
> enumeration of behaviors like TRAP_ALWAYS, TRAP_NEVER,
> TRAP_IF_MULTIPLE_TASKS.

Do you guys really expect to set this TRAP_IF_MULTIPLE_TASKS? Again, the
rq depth is quite hard to reason about from userspace, so not sure
anybody will really want that? A simpler on/off thing might be simpler.

> > > My gut feeling tells me we might want to gate all of this on
> > > PREEMPT_FULL instead, since PREEMPT_FULL is pretty much a way to say
> > > "I'm willing to give up scheduler tracking accuracy to gain throughput
> > > when I've got a task running alone on a CPU". Thoughts?
> 
> > And obviously I meant s/PREEMPT_FULL/NOHZ_FULL, but hopefully that was
> > clear :-)
> 
> Sounds good to me but I've not touched anything scheduling related before.

Do you guys use NOHZ_FULL in prod? If not that idea might very well be a
non-starter, because switching to NOHZ_FULL would be a big ask. So,
yeah, I'm curious :)

Thanks,
Quentin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2024-03-28 10:30 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-19 16:43 [PATCH v2] KVM: arm64: Add KVM_CAP to control WFx trapping Colton Lewis
2024-03-19 16:43 ` Colton Lewis
2024-03-22 14:24 ` Quentin Perret
2024-03-22 14:24   ` Quentin Perret
2024-03-22 14:34   ` Quentin Perret
2024-03-22 14:34     ` Quentin Perret
2024-03-25 20:12     ` Colton Lewis
2024-03-25 20:12       ` Colton Lewis
2024-03-28 10:30       ` Quentin Perret [this message]
2024-03-28 10:30         ` Quentin Perret

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZgVGsKoAoW4YwQD_@google.com \
    --to=qperret@google.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=coltonlewis@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=james.morse@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=suzuki.poulose@arm.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=will@kernel.org \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.