From: Adrian Hunter
To: Arnaldo Carvalho de Melo
Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan, linux-kernel@vger.kernel.org
Subject: [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu
Date: Tue, 24 May 2022 10:54:21 +0300
Message-Id: <20220524075436.29144-1-adrian.hunter@intel.com>

Hi

Here are V4 patches to support capturing Intel PT sideband events such as
mmap, task, context switch, text poke etc, on every CPU even when tracing
selected user_requested_cpus.  That is, when using the perf record -C or
--cpu option.

This is needed for:
1. text poke: a text poke on any CPU affects all CPUs
2. tracing user space: a user space process can migrate between CPUs so
   mmap events that happen on a different CPU can be needed to decode a
   user_requested_cpus CPU.

For example:

        Trace on CPU 1:

                perf record --kcore -C 1 -e intel_pt// &

        Start a task on CPU 0:

                taskset 0x1 testprog &

        Migrate it to CPU 1:

                taskset -p 0x2 <testprog pid>

        Stop tracing:

                kill %1

        Prior to these changes there will be errors decoding testprog
        in userspace because the comm and mmap events for testprog will
        not have been captured.

There is quite a bit of preparation:

The first patch is a small Intel PT test for system-wide side band.  The
test fails before the patches are applied and passes afterwards.

    perf intel-pt: Add a test for system-wide side band [new in V1]

The next 5 patches (now already applied) stop auxtrace mixing up mmap idx
between evlist and evsel.
That is going to matter when
evlist->all_cpus != evlist->user_requested_cpus != evsel->cpus:

    libperf evsel: Factor out perf_evsel__ioctl() [now applied]
    libperf evsel: Add perf_evsel__enable_thread()
    perf evlist: Use libperf functions in evlist__enable_event_idx()
    perf auxtrace: Move evlist__enable_event_idx() to auxtrace.c
    perf auxtrace: Do not mix up mmap idx

The next 6 patches (first 4 now already applied) stop attempts to auxtrace
mmap when it is not an auxtrace event e.g. when mmapping the CPUs on which
only sideband is captured:

    libperf evlist: Remove ->idx() per_cpu parameter
    libperf evlist: Move ->idx() into mmap_per_evsel()
    libperf evlist: Add evsel as a parameter to ->idx()
    perf auxtrace: Record whether an auxtrace mmap is needed
    perf auxtrace: Add mmap_needed to auxtrace_mmap_params
    perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter

The next 5 patches switch to setting up dummy event maps before adding the
evsel so that the evsel is subject to map propagation, primarily to cause
addition of the evsel's CPUs to all_cpus.

    perf evlist: Factor out evlist__dummy_event()
    perf evlist: Add evlist__add_dummy_on_all_cpus()
    perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke()
    perf intel-pt: Use evlist__add_dummy_on_all_cpus() for switch tracking
    perf intel-pt: Track sideband system-wide when needed

The remaining patches make more significant changes.

First change from using user_requested_cpus to using all_cpus where necessary:

    perf tools: Allow all_cpus to be a superset of user_requested_cpus

Secondly, mmap all per-thread and all per-cpu events:

    libperf evlist: Allow mixing per-thread and per-cpu mmaps
    libperf evlist: Check nr_mmaps is correct [new in V1]

Stop using system_wide flag for uncore because it will not work anymore:

    perf stat: Add requires_cpu flag for uncore
    libperf evsel: Add comments for booleans [new in V1]

Finally change map propagation so that system-wide events retain their cpus
and (dummy) threads:

    perf tools: Allow system-wide events to keep their own CPUs
    perf tools: Allow system-wide events to keep their own threads


Changes in V4:

        Added Acked-by: Namhyung Kim
        Added a couple Acked-by: Ian Rogers

    perf intel-pt: Add a test for system-wide side band
        Put in commit message that test succeeds only after other patches applied

    libperf evsel: Add perf_evsel__enable_thread()
    perf evlist: Use libperf functions in evlist__enable_event_idx()
    perf auxtrace: Move evlist__enable_event_idx() to auxtrace.c
    perf auxtrace: Do not mix up mmap idx
    libperf evlist: Remove ->idx() per_cpu parameter
    libperf evlist: Move ->idx() into mmap_per_evsel()
    libperf evlist: Add evsel as a parameter to ->idx()
    perf auxtrace: Record whether an auxtrace mmap is needed
        Omitted because already applied

    libperf evsel: Add comments for booleans
        Amended comment about own_cpus

Changes in V3:

    perf auxtrace: Add mmap_needed to auxtrace_mmap_params
        Amended mmap_needed comment

    perf evlist: Add evlist__add_dummy_on_all_cpus()
        Amended comment about all CPUs.

Changes in V2:

        Added some Acked-by: Ian Rogers

    libperf evsel: Add perf_evsel__enable_thread()
        Use perf_cpu_map__for_each_cpu()

    perf auxtrace: Add mmap_needed to auxtrace_mmap_params
        Add documentation comment for mmap_needed

    perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter
        Fix missing auxtrace_mmap_params__set_idx change

    libperf evlist: Check nr_mmaps is correct
        Remove unused code

    libperf evsel: Add comments for booleans
        Amend comments

    perf evlist: Add evlist__add_dummy_on_all_cpus()
        Rename evlist__add_system_wide -> evlist__add_on_all_cpus
        Changed patch subject accordingly

    perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke()
        Rename evlist__add_system_wide -> evlist__add_on_all_cpus
        Changed patch subject accordingly

    perf intel-pt: Use evlist__add_dummy_on_all_cpus() for switch tracking
        Rename evlist__add_system_wide -> evlist__add_on_all_cpus
        Changed patch subject accordingly

Changes in V1:

    perf intel-pt: Add a test for system-wide side band
        New patch

    libperf evsel: Factor out perf_evsel__ioctl()
        Dropped because it has been applied.

    libperf evsel: Add perf_evsel__enable_thread()
        Rename variable i -> idx

    perf auxtrace: Do not mix up mmap idx
        Rename variable cpu to cpu_map_idx

    perf tools: Allow all_cpus to be a superset of user_requested_cpus
        Add Acked-by: Ian Rogers

    libperf evlist: Allow mixing per-thread and per-cpu mmaps
        Fix perf_evlist__nr_mmaps() calculation

    libperf evlist: Check nr_mmaps is correct
        New patch

    libperf evsel: Add comments for booleans
        New patch

    perf tools: Allow system-wide events to keep their own CPUs
    perf tools: Allow system-wide events to keep their own threads


Adrian Hunter (15):
  perf intel-pt: Add a test for system-wide side band
  perf auxtrace: Add mmap_needed to auxtrace_mmap_params
  perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter
  perf evlist: Factor out evlist__dummy_event()
  perf evlist: Add evlist__add_dummy_on_all_cpus()
  perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke()
  perf intel-pt: Use evlist__add_dummy_on_all_cpus() for switch tracking
  perf intel-pt: Track sideband system-wide when needed
  perf tools: Allow all_cpus to be a superset of user_requested_cpus
  libperf evlist: Allow mixing per-thread and per-cpu mmaps
  libperf evlist: Check nr_mmaps is correct
  perf stat: Add requires_cpu flag for uncore
  libperf evsel: Add comments for booleans
  perf tools: Allow system-wide events to keep their own CPUs
  perf tools: Allow system-wide events to keep their own threads

 tools/lib/perf/evlist.c                 | 71 ++++++++++++++-------------------
 tools/lib/perf/include/internal/evsel.h | 11 +++++
 tools/perf/arch/x86/util/intel-pt.c     | 31 ++++++--------
 tools/perf/builtin-record.c             | 39 +++++++-----------
 tools/perf/builtin-stat.c               |  5 +--
 tools/perf/tests/shell/test_intel_pt.sh | 71 +++++++++++++++++++++++++++++++++
 tools/perf/util/auxtrace.c              | 15 +++++--
 tools/perf/util/auxtrace.h              | 13 ++++--
 tools/perf/util/evlist.c                | 61 +++++++++++++++++++++++++---
 tools/perf/util/evlist.h                |  5 +++
 tools/perf/util/evsel.c                 |  1 +
 tools/perf/util/mmap.c                  |  4 +-
 tools/perf/util/parse-events.c          |  2 +-
 13 files changed, 226 insertions(+), 103 deletions(-)
 create mode 100755 tools/perf/tests/shell/test_intel_pt.sh


Regards
Adrian