From mboxrd@z Thu Jan 1 00:00:00 1970
From: Howard Chu
To: acme@kernel.org
Cc: namhyung@kernel.org, irogers@google.com, jolsa@kernel.org,
	adrian.hunter@intel.com, kan.liang@linux.intel.com,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	Howard Chu
Subject: [PATCH v1 2/2] perf trace: Use pid to index perf_event in BPF
Date: Tue, 27 Aug 2024 17:20:13 +0800
Message-Id: <20240827092013.1596-3-howardchu95@gmail.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20240827092013.1596-1-howardchu95@gmail.com>
References: <20240827092013.1596-1-howardchu95@gmail.com>
Precedence: bulk
X-Mailing-List: linux-perf-users@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Currently, perf trace -p PID is broken for some syscalls. This patch
fixes it.

Before:

  perf $ perf trace -e open -p 79768
           ? (         ):  ... [continued]: open()) = -1 ENOENT (No such file or directory)
           ? (         ):  ... [continued]: open()) = -1 ENOENT (No such file or directory)
           ? (         ):  ... [continued]: open()) = -1 ENOENT (No such file or directory)

After:

  perf $ ./perf trace -e open -p 79768
       0.000 ( 0.019 ms): open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory)
    1000.187 ( 0.031 ms): open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory)
    2000.377 ( 0.019 ms): open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory)

This happens because, when -p is used, perf trace mmaps the pids instead
of the cpus. In BPF, however, we tend to use a per-cpu mapped perf_event
to output the augmented data (such as using BPF_F_CURRENT_CPU), which
means the perf_event map is indexed by cpu.

When we are using -p PID, there is "cpu = -1, pid = PID":

  perf_event_map
  [-1]  =  target_perf_event_of_this_pid

This -1 index will never work in BPF.

My original solution was to map every cpu to this single pid:

  perf_event_map
  [0]  =  target_perf_event_of_this_pid
  [1]  =  target_perf_event_of_this_pid
  [2]  =  target_perf_event_of_this_pid
  [3]  =  target_perf_event_of_this_pid

But that multiplies the number of sys_perf_event_open calls.

Namhyung's solution is to introduce a new map, which I call
pid2perf_event:

  pid2perf_event_map
  [pid]  =  perf_event_index

and then:

  perf_event_map
  [perf_event_index]  =  target_perf_event_of_this_pid

We use the pid to get the correct index into the perf_event map, and
retrieve the correct perf_event using this index.

Suggested-by: Namhyung Kim
Signed-off-by: Howard Chu
---
 tools/perf/builtin-trace.c                    | 55 +++++++++++++++----
 .../bpf_skel/augmented_raw_syscalls.bpf.c     | 33 +++++++++--
 tools/perf/util/evlist.c                      |  2 +-
 3 files changed, 72 insertions(+), 18 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index d38e0b919e8e..f9ff65c3d4d2 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -3920,6 +3920,7 @@ static int trace__set_allowed_pids(struct trace *trace)
 	struct strlist *pids_slist = strlist__new(trace->opts.target.pid, NULL);
 
 	trace->skel->bss->task_specific = false;
+	trace->skel->bss->is_workload = false;
 
 	if (pids_slist) {
 		strlist__for_each_entry(pos, pids_slist) {
@@ -3944,6 +3945,7 @@ static int trace__set_allowed_pids(struct trace *trace)
 			return err;
 
 		trace->skel->bss->task_specific = true;
+		trace->skel->bss->is_workload = true;
 	}
 
 	strlist__delete(pids_slist);
@@ -4321,18 +4323,49 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
 		goto out_error_open;
 #ifdef HAVE_BPF_SKEL
 	if (trace->syscalls.events.bpf_output) {
-		struct perf_cpu cpu;
+		if (trace->opts.target.pid) {
+			/*
+			 * perf_event map is supposed to be a cpu to perf_event mapping, which
+			 * differs from the -p case, where cpu = -1 and pid = PID.
+			 * In this case, we treat perf_event map as an array and ignore the cpu
+			 * mapping side of it, and use pid to retrieve the correct index to its
+			 * corresponding perf_event.
+			 */
+			int j = 0;
+			struct perf_thread_map *threads;
+			struct evsel *evsel_aug_sys = evlist__find_evsel_by_str(trace->evlist, "__augmented_syscalls__");
 
-		/*
-		 * Set up the __augmented_syscalls__ BPF map to hold for each
-		 * CPU the bpf-output event's file descriptor.
-		 */
-		perf_cpu_map__for_each_cpu(cpu, i, trace->syscalls.events.bpf_output->core.cpus) {
-			bpf_map__update_elem(trace->skel->maps.__augmented_syscalls__,
-					&cpu.cpu, sizeof(int),
-					xyarray__entry(trace->syscalls.events.bpf_output->core.fd,
-						       cpu.cpu, 0),
-					sizeof(__u32), BPF_ANY);
+			if (evsel_aug_sys == NULL)
+				goto out_error;
+
+			threads = evsel_aug_sys->core.threads;
+
+			for (int thread = 0; thread < perf_thread_map__nr(threads); thread++, j++) {
+				pid_t pid = perf_thread_map__pid(threads, thread);
+
+				bpf_map__update_elem(trace->skel->maps.pid2perf_event, &pid, sizeof(pid_t),
+						     &j, sizeof(int), BPF_ANY);
+
+				bpf_map__update_elem(trace->skel->maps.__augmented_syscalls__,
+						     &j, sizeof(int),
+						     xyarray__entry(trace->syscalls.events.bpf_output->core.fd,
+								    0, j),
+						     sizeof(__u32), BPF_ANY);
+			}
+		} else {
+			struct perf_cpu cpu;
+
+			/*
+			 * Set up the __augmented_syscalls__ BPF map to hold for each
+			 * CPU the bpf-output event's file descriptor.
+			 */
+			perf_cpu_map__for_each_cpu(cpu, i, trace->syscalls.events.bpf_output->core.cpus) {
+				bpf_map__update_elem(trace->skel->maps.__augmented_syscalls__,
+						&cpu.cpu, sizeof(int),
+						xyarray__entry(trace->syscalls.events.bpf_output->core.fd,
+							       cpu.cpu, 0),
+						sizeof(__u32), BPF_ANY);
+			}
 		}
 	}
 #endif
diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
index 1ab0a56c8f35..ef8aa0bd2275 100644
--- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
+++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
@@ -25,6 +25,7 @@
 #define MAX_CPUS 4096
 
 volatile bool task_specific;
+volatile bool is_workload;
 
 /* bpf-output associated map */
 struct __augmented_syscalls__ {
@@ -90,6 +91,13 @@ struct pids_allowed {
 	__uint(max_entries, 512);
 } pids_allowed SEC(".maps");
 
+struct pid2perf_event {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, pid_t);
+	__type(value, int);
+	__uint(max_entries, MAX_CPUS);
+} pid2perf_event SEC(".maps");
+
 /*
  * Desired design of maximum size and alignment (see RFC2553)
  */
@@ -154,6 +162,11 @@ struct beauty_payload_enter_map {
 	__uint(max_entries, 1);
 } beauty_payload_enter_map SEC(".maps");
 
+static pid_t getpid(void)
+{
+	return bpf_get_current_pid_tgid();
+}
+
 static inline struct augmented_args_payload *augmented_args_payload(void)
 {
 	int key = 0;
@@ -168,7 +181,20 @@ static inline int augmented__output(void *ctx, struct augmented_args_payload *ar
 
 static inline int augmented__beauty_output(void *ctx, void *data, int len)
 {
-	return bpf_perf_event_output(ctx, &__augmented_syscalls__, BPF_F_CURRENT_CPU, data, len);
+	/*
+	 * When cpu = -1 and pid = PID, we look up the perf_event for this PID. A workload
+	 * is per-cpu mapped, so we skip the lookup.
+	 */
+	if (task_specific && !is_workload) {
+		pid_t pid = getpid();
+		u32 *perf_event = bpf_map_lookup_elem(&pid2perf_event, &pid);
+		if (perf_event)
+			return bpf_perf_event_output(ctx, &__augmented_syscalls__, *perf_event, data, len);
+	} else {
+		return bpf_perf_event_output(ctx, &__augmented_syscalls__, BPF_F_CURRENT_CPU, data, len);
+	}
+
+	return -1;
 }
 
 static inline
@@ -397,11 +423,6 @@ int sys_enter_nanosleep(struct syscall_enter_args *args)
 	return 1; /* Failure: don't filter */
 }
 
-static pid_t getpid(void)
-{
-	return bpf_get_current_pid_tgid();
-}
-
 static inline bool should_filter()
 {
 	pid_t pid = getpid();
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index f14b7e6ff1dc..ef58a7764318 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1067,7 +1067,7 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
 	if (!threads)
 		return -1;
 
-	if (target__uses_dummy_map(target) && !evlist__has_bpf_output(evlist))
+	if (target__uses_dummy_map(target))
 		cpus = perf_cpu_map__new_any_cpu();
 	else
 		cpus = perf_cpu_map__new(target->cpu_list);
-- 
2.46.0