Date: Fri, 31 May 2024 14:01:08 +0100
From: Mark Rutland
To: James Clark, Peter Zijlstra
Cc: Anshuman Khandual, Mark Brown, Rob Herring, Marc Zyngier,
	Suzuki Poulose, Ingo Molnar, Arnaldo Carvalho de Melo,
	linux-perf-users@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, will@kernel.org,
	catalin.marinas@arm.com
Subject: Re: [PATCH V17 0/9] arm64/perf: Enable branch stack sampling
References: <20240405024639.1179064-1-anshuman.khandual@arm.com>
	<80d33844-bdd2-4fee-81dd-9cd37c63d42c@arm.com>

On Thu, May 30, 2024 at 06:41:14PM +0100, Mark Rutland wrote:
> On Thu, May 30, 2024 at 10:47:34AM +0100, James Clark wrote:
> > On 05/04/2024 03:46, Anshuman Khandual wrote:
> > > ------------------ Possible 'branch_sample_type' Mismatch -----------------
> > >
> > > Branch stack sampling attributes 'event->attr.branch_sample_type' generally
> > > remain the same for all the events during a perf record session.
> > >
> > > $perf record -e <event_1> -e <event_2> -j <branch_filters> [workload]
> > >
> > > event_1->attr.branch_sample_type == event_2->attr.branch_sample_type
> > >
> > > This 'branch_sample_type' is used to configure the BRBE hardware, when both
> > > events i.e <event_1> and <event_2> get scheduled on a given PMU.
> > > But during PMU HW event's privilege filter inheritance,
> > > 'branch_sample_type' does not remain the same for all events. Let's
> > > consider the following example
> > >
> > > $perf record -e cycles:u -e instructions:k -j any,save_type ls
> > >
> > > cycles->attr.branch_sample_type != instructions->attr.branch_sample_type
> > >
> > > Because cycles event inherits PERF_SAMPLE_BRANCH_USER and instruction event
> > > inherits PERF_SAMPLE_BRANCH_KERNEL. The proposed solution here configures
> > > BRBE hardware with 'branch_sample_type' from last event to be added in the
> > > PMU and hence captured branch records only get passed on to matching events
> > > during a PMU interrupt.
> > >
> >
> > Hi Anshuman,
> >
> > Surely because of this example we should merge? At least we have to try
> > to make the most common basic command lines work. Unless we expect all
> > tools to know whether the branch buffer is shared between PMUs on each
> > architecture or not. The driver knows though, so can merge the settings
> > because it all has to go into one BRBE.
>
> The difficulty here is that these are opened as independent events (not
> in the same event group), and so from the driver's PoV, this is no
> different two two users independently doing:
>
>   perf record -e event:u -j any,save_type -p ${SOME_PID}
>
>   perf record -e event:k -j any,save_type -p ${SOME_PID}
>
> .. where either would be surprised to get the merged result.

I took a look at how x86 handles this, and it looks like they may have
the problem we'd like to avoid. AFAICT, intel_pmu_lbr_add() blats
cpuc->br_sel with the branch selection of the last event added.

So I took a look at what happens on my x86-64 desktop running
v5.10.0-9-amd64 from Debian 11.
Running the following program:

| int main (int argc, char *argv[])
| {
| 	for (;;) {
| 		asm volatile("" ::: "memory");
| 	}
|
| 	return 0;
| }

I set /proc/sys/kernel/perf_event_paranoid to 2 and started two
independent perf sessions:

  perf record -e cycles:u -j any -o perf-user.data -p 1320224

  sudo perf record -e cycles:k -j any -o perf-kernel.data -p 1320224

... after ~10 seconds, I killed both sessions with ^C.

When I subsequently do 'perf report -i perf-kernel.data', I see:

| Samples: 295  of event 'cycles:k', Event count (approx.): 295
| Overhead  Command  Source Shared Object  Source Symbol               Target Symbol  Basic Block Cycles
|   99.66%  loop     loop                  [k] main                    [k] main       -
|    0.34%  loop     [kernel.kallsyms]     [k] native_irq_return_iret  [k] main       -

... where the user symbols are surprising.

Similarly for 'perf report -i perf-user.data', I see:

| Samples: 198K of event 'cycles:u', Event count (approx.): 198739
| Overhead  Command  Source Shared Object  Source Symbol           Target Symbol           Basic Block Cycles
|   99.99%  loop     loop                  [.] main                [.] main                -
|    0.00%  loop     [unknown]             [.] 0xffffffff87801007  [.] main                -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e05626  [.] 0xffffffff86e05629  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e0563d  [.] 0xffffffff86e0c850  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e0c86f  [.] 0xffffffff86e6b3f0  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e0c884  [.] 0xffffffff86e11ed0  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e0c88a  [.] 0xffffffff86e13850  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e11eee  [.] 0xffffffff86e0c889  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e13885  [.] 0xffffffff86e13888  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e13889  [.] 0xffffffff86e138a1  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e138a9  [.] 0xffffffff86e6b320  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e138c3  [.] 0xffffffff86e6b3f0  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e6b33a  [.] 0xffffffff86e138ae  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86e6b3fb  [.] 0xffffffff86e0c874  -
|    0.00%  loop     [unknown]             [.] 0xffffffff86ff6c91  [.] 0xffffffff87a01ca0  -
|    0.00%  loop     [unknown]             [.] 0xffffffff87a01ca0  [.] 0xffffffff87a01ca5  -
|    0.00%  loop     [unknown]             [.] 0xffffffff87a01ca5  [.] 0xffffffff87a01cb1  -
|    0.00%  loop     [unknown]             [.] 0xffffffff87a01cb5  [.] 0xffffffff86e05600  -

... where the unknown (kernel!) samples are surprising.

Peter, do you have any opinion on this?

My thinking is that the "last scheduled event branch selection wins"
isn't the behaviour we actually want, and either:

(a) Conflicting events shouldn't be scheduled concurrently (e.g. treat
    that like running out of counters).

(b) The HW filters should be configured to allow anything permitted by
    any of the events, and SW filtering should remove the unexpected
    records on a per-event basis.

... but I imagine (b) is hard maybe? I don't know if LBR tells you which
CPU mode the src/dst were in.

Mark.
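
For illustration only, option (b) could be sketched in plain userspace C
roughly as below. This is not kernel code and not what any existing
driver does. Every name in it (the SKETCH_* bits, struct sketch_branch,
the sketch_*() helpers) is hypothetical. It also leans on two
assumptions that the thread leaves open: that the privilege mode of a
record can be inferred from its addresses (here, "top address bit set
means kernel", as on x86-64), and that a record is kept only when both
its source and target are in a mode the event asked for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PERF_SAMPLE_BRANCH_USER / _KERNEL. */
#define SKETCH_BRANCH_USER	(1u << 0)
#define SKETCH_BRANCH_KERNEL	(1u << 1)

/* Hypothetical branch record: source and target address only. */
struct sketch_branch {
	uint64_t from;
	uint64_t to;
};

/* Assumption: kernel addresses have the top bit set (x86-64 style). */
static int sketch_is_kernel_addr(uint64_t addr)
{
	return (addr >> 63) != 0;
}

/* HW side of (b): permit anything permitted by any scheduled event. */
static unsigned int sketch_merge_filters(const unsigned int *filters,
					 size_t nr_events)
{
	unsigned int merged = 0;

	for (size_t i = 0; i < nr_events; i++)
		merged |= filters[i];

	return merged;
}

/* Assumed policy: both ends must be in a mode the event permits. */
static int sketch_record_allowed(const struct sketch_branch *br,
				 unsigned int filter)
{
	unsigned int from_mode = sketch_is_kernel_addr(br->from) ?
				 SKETCH_BRANCH_KERNEL : SKETCH_BRANCH_USER;
	unsigned int to_mode = sketch_is_kernel_addr(br->to) ?
			       SKETCH_BRANCH_KERNEL : SKETCH_BRANCH_USER;

	return (filter & from_mode) && (filter & to_mode);
}

/*
 * SW side of (b): copy only the records this event permits into 'out'
 * (which must have room for 'nr' entries) and return how many were kept.
 */
static size_t sketch_filter_records(const struct sketch_branch *in,
				    size_t nr, unsigned int filter,
				    struct sketch_branch *out)
{
	size_t kept = 0;

	assert(in != NULL && out != NULL);

	for (size_t i = 0; i < nr; i++) {
		if (sketch_record_allowed(&in[i], filter))
			out[kept++] = in[i];
	}

	return kept;
}
```

With something of this shape, the merged filter would be programmed once
for all concurrently scheduled events, and sketch_filter_records() would
run once per event at interrupt time with that event's own filter, so a
cycles:u event would never be handed kernel-to-kernel records. Whether
mode can be inferred reliably from addresses on real hardware is exactly
the open question above.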