From: Sean Christopherson <seanjc@google.com>
To: Like Xu <like.xu.linux@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Wanpeng Li <wanpengli@tencent.com>
Subject: Re: [PATCH v3 5/7] KVM: x86/pmu: Defer counter emulated overflow via pmc->prev_counter
Date: Thu, 22 Sep 2022 23:59:03 +0000 [thread overview]
Message-ID: <Yyz2x5bSR/7ZTV0R@google.com> (raw)
In-Reply-To: <20220831085328.45489-6-likexu@tencent.com>
On Wed, Aug 31, 2022, Like Xu wrote:
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 7f391750ebd3..3c42df3a55ff 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -349,6 +349,10 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
> }
>
> reprogram_counter(pmc);
> +
> + if (pmc->counter < pmc->prev_counter)
> + __kvm_perf_overflow(pmc, false);
I would prefer to stick this in repgrogram_counter(), after pausing the counter
and checking that the event is enabled, but before the actual programming/resume.
I don't think false positives are actually possible, especially without my fixes
for clearing reprogram_pmi bits (incoming), but I don't like the twisty logic
that's required to suss out that prev_counter can be non-zero if and only if the
PMC is enabled.
The bigger issue is that calling __kvm_perf_overflow() here can get false negatives.
If reprogramming fails due to contention, the reprogram_pmi bit will be left set
and so this check in __kvm_perf_overflow() will suppress the PMI.
if (test_and_set_bit(pmc->idx, pmu->reprogram_pmi))
return;
And the related issue is that because __kvm_perf_overflow() sets the bit and
makes another KVM_REQ_PMU, overflow will cause KVM to reprogram the counter a
second time. That's especially inefficient since KVM will get quite far into the
VM-Enter flow before detecting the new event.
So, I think this? (goes on top of patches I'm about to post)
static void reprogram_counter(struct kvm_pmc *pmc)
{
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
u64 eventsel = pmc->eventsel;
u64 new_config = eventsel;
u8 fixed_ctr_ctrl;
pmc_pause_counter(pmc);
if (!pmc_speculative_in_use(pmc) || !pmc_is_enabled(pmc))
goto reprogram_complete;
if (!check_pmu_event_filter(pmc))
goto reprogram_complete;
if (pmc->counter < pmc->prev_counter)
__kvm_perf_overflow(pmc, false);
if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
printk_once("kvm pmu: pin control bit is ignored\n");
if (pmc_is_fixed(pmc)) {
fixed_ctr_ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl,
pmc->idx - INTEL_PMC_IDX_FIXED);
if (fixed_ctr_ctrl & 0x1)
eventsel |= ARCH_PERFMON_EVENTSEL_OS;
if (fixed_ctr_ctrl & 0x2)
eventsel |= ARCH_PERFMON_EVENTSEL_USR;
if (fixed_ctr_ctrl & 0x8)
eventsel |= ARCH_PERFMON_EVENTSEL_INT;
new_config = (u64)fixed_ctr_ctrl;
}
if (pmc->current_config == new_config && pmc_resume_counter(pmc))
goto reprogram_complete;
pmc_release_perf_event(pmc);
pmc->current_config = new_config;
/*
* If reprogramming fails, e.g. due to contention, leave the counter's
* regprogram bit set, i.e. opportunistically try again on the next PMU
* refresh. Don't make a new request as doing so can stall the guest
* if reprogramming repeatedly fails.
*/
if (pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
(eventsel & pmu->raw_event_mask),
!(eventsel & ARCH_PERFMON_EVENTSEL_USR),
!(eventsel & ARCH_PERFMON_EVENTSEL_OS),
eventsel & ARCH_PERFMON_EVENTSEL_INT))
return;
reprogram_complete:
clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
pmc->prev_counter = 0;
}
static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
{
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
bool skip_pmi = false;
if (pmc->perf_event && pmc->perf_event->attr.precise_ip) {
if (!in_pmi) {
/*
* TODO: KVM is currently _choosing_ to not generate records
* for emulated instructions, avoiding BUFFER_OVF PMI when
* there are no records. Strictly speaking, it should be done
* as well in the right context to improve sampling accuracy.
*/
skip_pmi = true;
} else {
/* Indicate PEBS overflow PMI to guest. */
skip_pmi = __test_and_set_bit(GLOBAL_STATUS_BUFFER_OVF_BIT,
(unsigned long *)&pmu->global_status);
}
} else {
__set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
}
if (!pmc->intr || skip_pmi)
return;
/*
* Inject PMI. If vcpu was in a guest mode during NMI PMI
* can be ejected on a guest mode re-entry. Otherwise we can't
* be sure that vcpu wasn't executing hlt instruction at the
* time of vmexit and is not going to re-enter guest mode until
* woken up. So we should wake it, but this is impossible from
* NMI context. Do it from irq work instead.
*/
if (in_pmi && !kvm_handling_nmi_from_guest(pmc->vcpu))
irq_work_queue(&pmc_to_pmu(pmc)->irq_work);
else
kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
}
static void kvm_perf_overflow(struct perf_event *perf_event,
struct perf_sample_data *data,
struct pt_regs *regs)
{
struct kvm_pmc *pmc = perf_event->overflow_handler_context;
/*
* Ignore overflow events for counters that are scheduled to be
* reprogrammed, e.g. if a PMI for the previous event races with KVM's
* handling of a related guest WRMSR.
*/
if (test_and_set_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi))
return;
__kvm_perf_overflow(pmc, true);
kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
}
next prev parent reply other threads:[~2022-09-22 23:59 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-31 8:53 [PATCH v3 0/7] x86/pmu: Corner cases fixes and optimization Like Xu
2022-08-31 8:53 ` [PATCH v3 1/7] KVM: x86/pmu: Avoid setting BIT_ULL(-1) to pmu->host_cross_mapped_mask Like Xu
2022-09-22 22:30 ` Sean Christopherson
2022-08-31 8:53 ` [PATCH v3 2/7] KVM: x86/pmu: Don't generate PEBS records for emulated instructions Like Xu
2022-08-31 8:53 ` [PATCH v3 3/7] KVM: x86/pmu: Avoid using PEBS perf_events for normal counters Like Xu
2022-09-22 22:35 ` Sean Christopherson
2022-08-31 8:53 ` [PATCH v3 4/7] KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event() Like Xu
2022-09-22 23:18 ` Sean Christopherson
2022-08-31 8:53 ` [PATCH v3 5/7] KVM: x86/pmu: Defer counter emulated overflow via pmc->prev_counter Like Xu
2022-09-22 23:59 ` Sean Christopherson [this message]
2022-10-14 8:08 ` Like Xu
2022-08-31 8:53 ` [PATCH v3 6/7] KVM: x86/svm/pmu: Direct access pmu->gp_counter[] to implement amd_*_to_pmc() Like Xu
2022-08-31 8:53 ` [PATCH v3 7/7] KVM: x86/svm/pmu: Rewrite get_gp_pmc_amd() for more counters scalability Like Xu
2022-09-07 9:52 ` [PATCH v3 0/7] x86/pmu: Corner cases fixes and optimization Like Xu
2022-09-19 8:58 ` Like Xu
2022-09-22 22:29 ` Sean Christopherson
2022-09-22 23:11 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yyz2x5bSR/7ZTV0R@google.com \
--to=seanjc@google.com \
--cc=kvm@vger.kernel.org \
--cc=like.xu.linux@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=wanpengli@tencent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox