From: Sean Christopherson <seanjc@google.com>
To: Like Xu <like.xu.linux@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Wanpeng Li <wanpengli@tencent.com>
Subject: Re: [PATCH v3 5/7] KVM: x86/pmu: Defer counter emulated overflow via pmc->prev_counter
Date: Thu, 22 Sep 2022 23:59:03 +0000 [thread overview]
Message-ID: <Yyz2x5bSR/7ZTV0R@google.com> (raw)
In-Reply-To: <20220831085328.45489-6-likexu@tencent.com>
On Wed, Aug 31, 2022, Like Xu wrote:
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 7f391750ebd3..3c42df3a55ff 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -349,6 +349,10 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
> }
>
> reprogram_counter(pmc);
> +
> + if (pmc->counter < pmc->prev_counter)
> + __kvm_perf_overflow(pmc, false);
I would prefer to stick this in repgrogram_counter(), after pausing the counter
and checking that the event is enabled, but before the actual programming/resume.
I don't think false positives are actually possible, especially without my fixes
for clearing reprogram_pmi bits (incoming), but I don't like the twisty logic
that's required to suss out that prev_counter can be non-zero if and only if the
PMC is enabled.
The bigger issue is that calling __kvm_perf_overflow() here can get false negatives.
If reprogramming fails due to contention, the reprogram_pmi bit will be left set
and so this check in __kvm_perf_overflow() will suppress the PMI.
if (test_and_set_bit(pmc->idx, pmu->reprogram_pmi))
return;
And the related issue is that because __kvm_perf_overflow() sets the bit and
makes another KVM_REQ_PMU, overflow will cause KVM to reprogram the counter a
second time. That's especially inefficient since KVM will get quite far into the
VM-Enter flow before detecting the new event.
So, I think this? (goes on top of patches I'm about to post)
static void reprogram_counter(struct kvm_pmc *pmc)
{
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
u64 eventsel = pmc->eventsel;
u64 new_config = eventsel;
u8 fixed_ctr_ctrl;
pmc_pause_counter(pmc);
if (!pmc_speculative_in_use(pmc) || !pmc_is_enabled(pmc))
goto reprogram_complete;
if (!check_pmu_event_filter(pmc))
goto reprogram_complete;
if (pmc->counter < pmc->prev_counter)
__kvm_perf_overflow(pmc, false);
if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
printk_once("kvm pmu: pin control bit is ignored\n");
if (pmc_is_fixed(pmc)) {
fixed_ctr_ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl,
pmc->idx - INTEL_PMC_IDX_FIXED);
if (fixed_ctr_ctrl & 0x1)
eventsel |= ARCH_PERFMON_EVENTSEL_OS;
if (fixed_ctr_ctrl & 0x2)
eventsel |= ARCH_PERFMON_EVENTSEL_USR;
if (fixed_ctr_ctrl & 0x8)
eventsel |= ARCH_PERFMON_EVENTSEL_INT;
new_config = (u64)fixed_ctr_ctrl;
}
if (pmc->current_config == new_config && pmc_resume_counter(pmc))
goto reprogram_complete;
pmc_release_perf_event(pmc);
pmc->current_config = new_config;
/*
* If reprogramming fails, e.g. due to contention, leave the counter's
* regprogram bit set, i.e. opportunistically try again on the next PMU
* refresh. Don't make a new request as doing so can stall the guest
* if reprogramming repeatedly fails.
*/
if (pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
(eventsel & pmu->raw_event_mask),
!(eventsel & ARCH_PERFMON_EVENTSEL_USR),
!(eventsel & ARCH_PERFMON_EVENTSEL_OS),
eventsel & ARCH_PERFMON_EVENTSEL_INT))
return;
reprogram_complete:
clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
pmc->prev_counter = 0;
}
static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
{
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
bool skip_pmi = false;
if (pmc->perf_event && pmc->perf_event->attr.precise_ip) {
if (!in_pmi) {
/*
* TODO: KVM is currently _choosing_ to not generate records
* for emulated instructions, avoiding BUFFER_OVF PMI when
* there are no records. Strictly speaking, it should be done
* as well in the right context to improve sampling accuracy.
*/
skip_pmi = true;
} else {
/* Indicate PEBS overflow PMI to guest. */
skip_pmi = __test_and_set_bit(GLOBAL_STATUS_BUFFER_OVF_BIT,
(unsigned long *)&pmu->global_status);
}
} else {
__set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
}
if (!pmc->intr || skip_pmi)
return;
/*
* Inject PMI. If vcpu was in a guest mode during NMI PMI
* can be ejected on a guest mode re-entry. Otherwise we can't
* be sure that vcpu wasn't executing hlt instruction at the
* time of vmexit and is not going to re-enter guest mode until
* woken up. So we should wake it, but this is impossible from
* NMI context. Do it from irq work instead.
*/
if (in_pmi && !kvm_handling_nmi_from_guest(pmc->vcpu))
irq_work_queue(&pmc_to_pmu(pmc)->irq_work);
else
kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
}
static void kvm_perf_overflow(struct perf_event *perf_event,
struct perf_sample_data *data,
struct pt_regs *regs)
{
struct kvm_pmc *pmc = perf_event->overflow_handler_context;
/*
* Ignore overflow events for counters that are scheduled to be
* reprogrammed, e.g. if a PMI for the previous event races with KVM's
* handling of a related guest WRMSR.
*/
if (test_and_set_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi))
return;
__kvm_perf_overflow(pmc, true);
kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
}
next prev parent reply other threads:[~2022-09-22 23:59 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-31 8:53 [PATCH v3 0/7] x86/pmu: Corner cases fixes and optimization Like Xu
2022-08-31 8:53 ` [PATCH v3 1/7] KVM: x86/pmu: Avoid setting BIT_ULL(-1) to pmu->host_cross_mapped_mask Like Xu
2022-09-22 22:30 ` Sean Christopherson
2022-08-31 8:53 ` [PATCH v3 2/7] KVM: x86/pmu: Don't generate PEBS records for emulated instructions Like Xu
2022-08-31 8:53 ` [PATCH v3 3/7] KVM: x86/pmu: Avoid using PEBS perf_events for normal counters Like Xu
2022-09-22 22:35 ` Sean Christopherson
2022-08-31 8:53 ` [PATCH v3 4/7] KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event() Like Xu
2022-09-22 23:18 ` Sean Christopherson
2022-08-31 8:53 ` [PATCH v3 5/7] KVM: x86/pmu: Defer counter emulated overflow via pmc->prev_counter Like Xu
2022-09-22 23:59 ` Sean Christopherson [this message]
2022-10-14 8:08 ` Like Xu
2022-08-31 8:53 ` [PATCH v3 6/7] KVM: x86/svm/pmu: Direct access pmu->gp_counter[] to implement amd_*_to_pmc() Like Xu
2022-08-31 8:53 ` [PATCH v3 7/7] KVM: x86/svm/pmu: Rewrite get_gp_pmc_amd() for more counters scalability Like Xu
2022-09-07 9:52 ` [PATCH v3 0/7] x86/pmu: Corner cases fixes and optimization Like Xu
2022-09-19 8:58 ` Like Xu
2022-09-22 22:29 ` Sean Christopherson
2022-09-22 23:11 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yyz2x5bSR/7ZTV0R@google.com \
--to=seanjc@google.com \
--cc=kvm@vger.kernel.org \
--cc=like.xu.linux@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=wanpengli@tencent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.