From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 07 Oct 2019 11:48:33 +0100
Message-ID: <86sgo4zv9a.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Andrew Murray <andrew.murray@arm.com>, Mark Rutland <mark.rutland@arm.com>
Subject: Re: [PATCH 3/3] KVM: arm64: pmu: Reset sample period on overflow handling
In-Reply-To: <20191007094325.GX42880@e119886-lin.cambridge.arm.com>
References: <20191006104636.11194-1-maz@kernel.org> <20191006104636.11194-4-maz@kernel.org> <20191007094325.GX42880@e119886-lin.cambridge.arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Cc: kvm@vger.kernel.org, Suzuki K Poulose <suzuki.poulose@arm.com>, James Morse <james.morse@arm.com>, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, Julien Thierry <julien.thierry.kdev@gmail.com>

On Mon, 07 Oct 2019 10:43:27 +0100,
Andrew Murray <andrew.murray@arm.com> wrote:
> 
> On Sun, Oct 06, 2019 at 11:46:36AM +0100, maz@kernel.org wrote:
> > From: Marc Zyngier <maz@kernel.org>
> > 
> > The PMU emulation code uses the perf event sample period to trigger
> > the overflow detection. This works fine for the *first* overflow
> > handling
> 
> Although, even though the first overflow is timed correctly, the value
> the guest reads may be wrong...
> 
> Assuming a Linux guest with the arm_pmu.c driver, if I recall
> correctly this writes the negated remaining period to the counter upon
> stopping/starting. In the case of a perf_event that is pinned to a
> task, this will happen upon every context switch of that task. If the
> counter was getting close to overflow before the context switch, then
> the value written to the guest counter will be very high and thus the
> sample_period written in KVM will be very low...
> 
> The best scenario is when the host handles the overflow, the guest
> handles its overflow and rewrites the guest counter (resulting in a
> new host perf_event) - all before the first host perf_event fires
> again. This is clearly the assumption the code makes.
> 
> Or - the host handles its overflow and kicks the guest, but the guest
> doesn't respond in time, so we end up endlessly and pointlessly
> kicking it for each host overflow - thus resulting in the large
> difference between the number of interrupts on the host and in the
> guest. This isn't ideal, because when the guest does read its counter,
> the value isn't correct (because it overflowed a zillion times at a
> value less than the architected maximum).
> 
> Worse still is when the sample_period is so small that the host
> doesn't even keep up.

Well, there are plenty of ways to make this code go mad. The
overarching reason is that we abuse the notion of sampling period to
generate interrupts, while what we'd really like is something that
says "call me back after that many events", rather than a sampling
period, which doesn't match the architecture. Yes, small values will
result in large drifts. Nothing we can do about it.

> 
> > , but results in a huge number of interrupts on the host,
> > unrelated to the number of interrupts handled in the guest (a x20
> > factor is pretty common for the cycle counter). On a slow system
> > (such as a SW model), this can result in the guest only making
> > forward progress at a glacial pace.
> > 
> > It turns out that the clue is in the name. The sample period is
> > exactly that: a period. And once an overflow has occurred, the
> > following period should be the full width of the associated
> > counter, instead of whatever the guest had initially programmed.
> > 
> > Reset the sample period to the architected value in the overflow
> > handler, which now results in a number of host interrupts that is
> > much closer to the number of interrupts in the guest.
> 
> This seems a reasonable pragmatic approach - though of course you
> will end up counting slightly slower due to the host interrupt
> latency. But that's better than the status quo.

Slower than what?
> 
> It may be possible with perf to have a single-fire counter (this
> mitigates against my third scenario, but you still end up with a loss
> of precision) - see PERF_EVENT_IOC_REFRESH.

Unfortunately, that's a userspace interface, not something that's
available to the kernel at large...

> Ideally the PERF_EVENT_IOC_REFRESH type of functionality could be
> updated to reload to a different value after the first hit.

Which is what I was hinting at above. I'd like a way to reload the
next period on each expiration, much like a timer.

> 
> This problem also exists in arch/x86/kvm/pmu.c (though I'm not sure
> what their PMU drivers do with respect to the value they write).
> 
> > 
> > Fixes: b02386eb7dac ("arm64: KVM: Add PMU overflow interrupt routing")
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  virt/kvm/arm/pmu.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> > index c30c3a74fc7f..3ca4761fc0f5 100644
> > --- a/virt/kvm/arm/pmu.c
> > +++ b/virt/kvm/arm/pmu.c
> > @@ -444,6 +444,18 @@ static void kvm_pmu_perf_overflow(struct perf_event *perf_event,
> >  	struct kvm_pmc *pmc = perf_event->overflow_handler_context;
> >  	struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
> >  	int idx = pmc->idx;
> > +	u64 val, period;
> > +
> > +	/* Start by resetting the sample period to the architectural limit */
> > +	val = kvm_pmu_get_pair_counter_value(vcpu, pmc);
> > +
> > +	if (kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
> 
> This is correct, because in this case we *do* care about _PMCR_LC.
> 
> > +		period = (-val) & GENMASK(63, 0);
> > +	else
> > +		period = (-val) & GENMASK(31, 0);
> > +
> > +	pmc->perf_event->attr.sample_period = period;
> > +	pmc->perf_event->hw.sample_period = period;
> 
> I'm not sure about the above line - does direct manipulation of
> sample_period work on a running perf event?
> As far as I can tell this is already done in the kernel with
> __perf_event_period - however this also does other stuff (such as
> disabling and re-enabling the event).

I'm not sure you could do that in the handler, which runs in atomic
context. It doesn't look like anything bad happens when updating the
sample period directly (the whole thing has stopped going crazy), but
I'd really like someone who understands the perf internals to help
here (hence Mark being on cc).

Thanks,

	M.

-- 
Jazz is not dead, it just smells funny.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel