kvm.vger.kernel.org archive mirror
From: Gleb Natapov <gleb@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org, mtosatti@redhat.com, mingo@elte.hu,
	a.p.zijlstra@chello.nl, acme@ghostprotocols.net
Subject: Re: [PATCH 2/9] KVM: Expose a version 2 architectural PMU to a guests
Date: Wed, 2 Nov 2011 13:09:46 +0200
Message-ID: <20111102110945.GC14726@redhat.com>
In-Reply-To: <4EB1150F.3020509@redhat.com>

On Wed, Nov 02, 2011 at 12:01:51PM +0200, Avi Kivity wrote:
> On 11/01/2011 02:30 PM, Gleb Natapov wrote:
> > > > +
> > > > +/* mapping between fixed pmc index and arch_events array */
> > > > +int fixed_pmc_events[] = {1, 0, 2};
> > > > +
> > > > +static bool pmc_is_gp(struct kvm_pmc *pmc)
> > > > +{
> > > > +	return pmc->type == KVM_PMC_GP;
> > > > +}
> > > > +
> > > > +static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
> > > > +{
> > > > +	struct kvm_pmu *pmu = &pmc->vcpu->arch.pmu;
> > > > +
> > > > +	return pmc_is_gp(pmc) ? pmu->gp_counter_bitmask :
> > > > +		pmu->fixed_counter_bitmask;
> > > > +}
> > > 
> > > Nicer to just push the bitmask (or bitwidth) into the counter itself.
> > > 
> > Hmm, is it really nicer to replicate the same information 35 times?
> 
> If it were 35 times, you could do pmu->type->bitmask.  But it's just 5
> or 6 times.
> 
It is 35. Perf defines X86_PMC_MAX_GENERIC to be 32 and
X86_PMC_MAX_FIXED to be 3. I can do pmu->type->bitmask if you think it
is better.
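
To make the alternative concrete, here is a minimal userspace sketch of
the per-type layout, reading "pmu->type->bitmask" as each counter holding
a pointer to a shared per-type descriptor. All sketch_* names and the
widths passed to the init function are made up for illustration; this is
not the code from the patch.

```c
#include <stdint.h>

/* Hypothetical types, for illustration only. */
enum sketch_pmc_type { SKETCH_PMC_GP, SKETCH_PMC_FIXED, SKETCH_PMC_NTYPES };

struct sketch_pmc_type_info {
	uint64_t bitmask;	/* one copy per counter type */
};

struct sketch_pmc {
	const struct sketch_pmc_type_info *type;
	uint64_t counter;
};

struct sketch_pmu {
	struct sketch_pmc_type_info types[SKETCH_PMC_NTYPES];
	struct sketch_pmc gp[32];	/* X86_PMC_MAX_GENERIC */
	struct sketch_pmc fixed[3];	/* X86_PMC_MAX_FIXED */
};

/* Convert a counter width in bits to a mask, avoiding a shift by 64. */
static uint64_t sketch_width_to_mask(unsigned bits)
{
	return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* The pmu must not be copied afterwards: counters point into pmu->types. */
static void sketch_pmu_init(struct sketch_pmu *pmu, unsigned gp_bits,
			    unsigned fixed_bits)
{
	unsigned i;

	pmu->types[SKETCH_PMC_GP].bitmask = sketch_width_to_mask(gp_bits);
	pmu->types[SKETCH_PMC_FIXED].bitmask = sketch_width_to_mask(fixed_bits);

	for (i = 0; i < 32; i++) {
		pmu->gp[i].type = &pmu->types[SKETCH_PMC_GP];
		pmu->gp[i].counter = 0;
	}
	for (i = 0; i < 3; i++) {
		pmu->fixed[i].type = &pmu->types[SKETCH_PMC_FIXED];
		pmu->fixed[i].counter = 0;
	}
}

/* No branch on the counter type, and only two stored masks instead of 35. */
static inline uint64_t sketch_pmc_bitmask(const struct sketch_pmc *pmc)
{
	return pmc->type->bitmask;
}
```

The cost is one pointer per counter instead of one mask per counter, so
the saving is in keeping a single source of truth, not in memory.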

> > > > +
> > > > +static void kvm_perf_overflow_intr(struct perf_event *perf_event,
> > > > +		struct perf_sample_data *data, struct pt_regs *regs)
> > > > +{
> > > > +	struct kvm_pmc *pmc = perf_event->overflow_handler_context;
> > > > +	struct kvm_pmu *pmu = &pmc->vcpu->arch.pmu;
> > > > +	if (!__test_and_set_bit(pmc_to_global_idx(pmc),
> > > > +				(unsigned long *)&pmu->reprogram_pmi)) {
> > > > +		kvm_perf_overflow(perf_event, data, regs);
> > > > +		kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
> > > > +	}
> > > > +}
> > > 
> > > Is it safe to use the __ versions here?
> > >
> > It is supposed to run in an NMI context on the same CPU that just ran
> > the vcpu, so simultaneous access to the same variable from different
> > CPUs shouldn't be possible. But if your scenario below can happen then
> > that assumption may not hold. The question is whether PMI delivery can
> > be so skewed that it arrives long after vmexit (which switches perf msr
> > values btw).
> 
> The compiler/runtime is allowed to implement __test_and_set_bit() as
> multiple instructions, no? Do we have any similar sequences outside nmi
> context?
> 
Yes, we do: when handling a PMU event during guest entry and during event
reprogramming. On x86 the __ version differs from the non-__ version only
by the lock prefix. It would be a pity to use locked functions here,
though. We need local_ functions for bitops.
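
For readers following along, the difference can be illustrated in plain
userspace C11. This is only a sketch of the two flavours, not the kernel
bitops implementation, and the sketch_* names are made up:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Non-atomic flavour: a separate load, test and store. The compiler may
 * emit several instructions, so this is only safe when nothing else can
 * modify *word between the load and the store.
 */
static bool sketch_test_and_set_bit_nonatomic(unsigned nr, uint64_t *word)
{
	uint64_t mask = UINT64_C(1) << nr;
	bool old = (*word & mask) != 0;

	*word |= mask;
	return old;
}

/*
 * Atomic flavour: a single read-modify-write that is safe against
 * concurrent updates from other CPUs, at the cost of a locked
 * instruction on x86.
 */
static bool sketch_test_and_set_bit_atomic(unsigned nr, _Atomic uint64_t *word)
{
	uint64_t mask = UINT64_C(1) << nr;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}
```

Both return the previous state of the bit, so a caller that only acts when
the result is false runs its body once per set/clear cycle.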

> > > Do we need to follow kvm_make_request() with kvm_vcpu_kick()?  If there
> > > is a skew between the overflow and the host PMI, the guest might have
> > > executed a HLT.
> > Is kvm_vcpu_kick() safe for NMI context?
> 
> No.  There is irq_work_queue() for that.  Would be good to avoid it if
> we know that it's safe to (for example if we have PF_VCPU set).
> 
Checking PF_VCPU will not tell us whether the vcpu is going to reenter
guest mode.

> > > 
> > > > +
> > > > +static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 en_pmi, int idx)
> > > > +{
> > > > +	unsigned en = en_pmi & 0x3;
> > > > +	bool pmi = en_pmi & 0x8;
> > > > +
> > > > +	stop_counter(pmc);
> > > > +
> > > > +	if (!en || !pmc_enabled(pmc))
> > > > +		return;
> > > > +
> > > > +	reprogram_counter(pmc, PERF_TYPE_HARDWARE,
> > > > +			arch_events[fixed_pmc_events[idx]].event_type,
> > > > +			!(en & 0x2), /* exclude user */
> > > > +			!(en & 0x1), /* exclude kernel */
> > > > +			pmi);
> > > 
> > > Are there no #defines for those constants?
> > > 
> > Nope. perf_event_intel.c open codes them too.
> 
> Okay.
> 
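If names were wanted for those constants, a sketch could look like the
following. The SKETCH_* macros and the struct are hypothetical and exist
in neither perf nor the patch; the bit meanings follow the quoted code
(0x1 counts kernel mode, 0x2 counts user mode, 0x8 requests a PMI).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for one 4-bit per-counter fixed control field. */
#define SKETCH_FIXED_CTRL_OS	0x1	/* count in ring 0 */
#define SKETCH_FIXED_CTRL_USR	0x2	/* count in rings above 0 */
#define SKETCH_FIXED_CTRL_PMI	0x8	/* interrupt on overflow */

struct sketch_fixed_cfg {
	bool enabled;
	bool exclude_user;
	bool exclude_kernel;
	bool pmi;
};

/* Decode the field the same way reprogram_fixed_counter() does above. */
static struct sketch_fixed_cfg sketch_decode_fixed_ctrl(uint8_t en_pmi)
{
	struct sketch_fixed_cfg cfg;

	cfg.enabled = (en_pmi & (SKETCH_FIXED_CTRL_OS | SKETCH_FIXED_CTRL_USR)) != 0;
	cfg.exclude_user = !(en_pmi & SKETCH_FIXED_CTRL_USR);
	cfg.exclude_kernel = !(en_pmi & SKETCH_FIXED_CTRL_OS);
	cfg.pmi = (en_pmi & SKETCH_FIXED_CTRL_PMI) != 0;
	return cfg;
}
```
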
> > > 
> > > The user can cause this to be very small (even zero).  Can this cause an
> > > NMI storm?
> > > 
> > If the user sets it to zero, then attr.sample_period will always be 0,
> > perf will treat the event as non-sampling, and it will use max_period
> > instead. For a small value greater than zero, how is it different from
> > userspace creating an event with sample_period of 1?
> 
> I don't know.  Does the kernel survive it?
> 
I need to test, but I do not see anything in the kernel that prevents
userspace from setting it to any value.

--
			Gleb.
