From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.3 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, UNPARSEABLE_RELAY,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A5F30C433DF for ; Fri, 21 Aug 2020 12:52:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7910E2078D for ; Fri, 21 Aug 2020 12:52:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="iFej8B8a" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728253AbgHUMwH (ORCPT ); Fri, 21 Aug 2020 08:52:07 -0400 Received: from aserp2120.oracle.com ([141.146.126.78]:57720 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726975AbgHUMwA (ORCPT ); Fri, 21 Aug 2020 08:52:00 -0400 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 07LCpXDa159935; Fri, 21 Aug 2020 12:51:44 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=date : from : to : cc : subject : in-reply-to : message-id : references : mime-version : content-type; s=corp-2020-01-29; bh=QtlQgPEUKDqI5Mad7y9Qd6Mzu/D8Zuxsc3o+KleFeX8=; b=iFej8B8a/KDLZuAPCNHO/3vA+6IaN+IUIU4CI4xI/YmHvefQxixzK/RVURu6bJz3NRIB ACPMkx7Uxm78v+/3DRaENIt5OnMPW1KFoRKSaGQsLF0MBbWifWZeFkEg2KmESasH9/6q B0R01o/jW7B2KdD7/Q7vJYQ1BJBrMYIW/O/0aaeCKD/nrDPB/O6HBpKaiuiIOxt7yT35 m9jBUI1Kql/Hs17rVo0xFYSCTEhl1E7eTDewdTZAgrSohiFXTPitYVCJ1RFz9mMdNN+j KJJwWbKgqWHb99YFYKIxxBuWBMys43EP1NB5Ut11GeEuW+qJt8FUbytPjNHBnP3xMbry jA== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by aserp2120.oracle.com with ESMTP id 3322bjjdhk-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Fri, 21 Aug 2020 12:51:43 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 07LCmNbo108411; Fri, 21 Aug 2020 12:51:43 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userp3020.oracle.com with ESMTP id 332536q3f0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 21 Aug 2020 12:51:42 +0000 Received: from abhmp0005.oracle.com (abhmp0005.oracle.com [141.146.116.11]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 07LCped8028829; Fri, 21 Aug 2020 12:51:40 GMT Received: from dhcp-10-175-209-93.vpn.oracle.com (/10.175.209.93) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Fri, 21 Aug 2020 12:51:40 +0000 Date: Fri, 21 Aug 2020 13:51:31 +0100 (IST) From: Alan Maguire X-X-Sender: alan@localhost To: Andrii Nakryiko cc: bpf@vger.kernel.org, netdev@vger.kernel.org, ast@fb.com, daniel@iogearbox.net, andrii.nakryiko@gmail.com, kernel-team@fb.com Subject: Re: [PATCH bpf-next] libbpf: add perf_buffer APIs for better integration with outside epoll loop In-Reply-To: <20200821025448.2087055-1-andriin@fb.com> Message-ID: References: <20200821025448.2087055-1-andriin@fb.com> User-Agent: Alpine 2.21 (LRH 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9719 signatures=668679 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=11 mlxscore=0 spamscore=0 phishscore=0 malwarescore=0 adultscore=0 mlxlogscore=999 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2008210117 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9719 signatures=668679 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 adultscore=0 mlxscore=0 mlxlogscore=999 phishscore=0 suspectscore=11 lowpriorityscore=0 spamscore=0 impostorscore=0 clxscore=1011 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2008210118 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Thu, 20 Aug 2020, Andrii Nakryiko wrote: > Add a set of APIs to perf_buffer manage to allow applications to integrate > perf buffer polling into existing epoll-based infrastructure. One example is > applications using libevent already and wanting to plug perf_buffer polling, > instead of relying on perf_buffer__poll() and waste an extra thread to do it. > But perf_buffer is still extremely useful to set up and consume perf buffer > rings even for such use cases. > > So to accomodate such new use cases, add three new APIs: > - perf_buffer__buffer_cnt() returns number of per-CPU buffers maintained by > given instance of perf_buffer manager; > - perf_buffer__buffer_fd() returns FD of perf_event corresponding to > a specified per-CPU buffer; this FD is then polled independently; > - perf_buffer__consume_buffer() consumes data from single per-CPU buffer, > identified by its slot index. > > These APIs allow for great flexiblity, but do not sacrifice general usability > of perf_buffer. > This is great! If I understand correctly, you're supporting the retrieval and ultimately insertion of the individual per-cpu buffer fds into another epoll()ed fd. I've been exploring another possibility - hierarchical epoll, where the top-level perf_buffer epoll_fd field is used rather than the individual per-cpu buffers. In that context, would an interface to return the perf_buffer epoll_fd make sense too? i.e. int perf_buffer__fd(const struct perf_buffer *pb); ? When events occur for the perf_buffer__fd, we can simply call perf_buffer__poll(perf_buffer__fd(pb), ...) to handle them it seems. That approach _appears_ to work, though I do see occasional event loss. Is that method legit too or am I missing something? > Also exercise and check new APIs in perf_buffer selftest. > > Signed-off-by: Andrii Nakryiko A few question around the test below, but Reviewed-by: Alan Maguire > --- > tools/lib/bpf/libbpf.c | 51 ++++++++++++++- > tools/lib/bpf/libbpf.h | 3 + > tools/lib/bpf/libbpf.map | 7 +++ > .../selftests/bpf/prog_tests/perf_buffer.c | 62 +++++++++++++++---- > 4 files changed, 111 insertions(+), 12 deletions(-) > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > index 0bc1fd813408..a6359d49aa9d 100644 > --- a/tools/lib/bpf/libbpf.c > +++ b/tools/lib/bpf/libbpf.c > @@ -9390,6 +9390,55 @@ int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms) > return cnt < 0 ? -errno : cnt; > } > > +/* Return number of PERF_EVENT_ARRAY map slots set up by this perf_buffer > + * manager. > + */ > +size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb) > +{ > + return pb->cpu_cnt; > +} > + > +/* > + * Return perf_event FD of a ring buffer in *buf_idx* slot of > + * PERF_EVENT_ARRAY BPF map. This FD can be polled for new data using > + * select()/poll()/epoll() Linux syscalls. > + */ > +int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx) > +{ > + struct perf_cpu_buf *cpu_buf; > + > + if (buf_idx >= pb->cpu_cnt) > + return -EINVAL; > + > + cpu_buf = pb->cpu_bufs[buf_idx]; > + if (!cpu_buf) > + return -ENOENT; > + > + return cpu_buf->fd; > +} > + > +/* > + * Consume data from perf ring buffer corresponding to slot *buf_idx* in > + * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to > + * consume, do nothing and return success. > + * Returns: > + * - 0 on success; > + * - <0 on failure. > + */ > +int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx) > +{ > + struct perf_cpu_buf *cpu_buf; > + > + if (buf_idx >= pb->cpu_cnt) > + return -EINVAL; > + > + cpu_buf = pb->cpu_bufs[buf_idx]; > + if (!cpu_buf) > + return -ENOENT; > + > + return perf_buffer__process_records(pb, cpu_buf); > +} > + > int perf_buffer__consume(struct perf_buffer *pb) > { > int i, err; > @@ -9402,7 +9451,7 @@ int perf_buffer__consume(struct perf_buffer *pb) > > err = perf_buffer__process_records(pb, cpu_buf); > if (err) { > - pr_warn("error while processing records: %d\n", err); > + pr_warn("perf_buffer: failed to process records in buffer #%d: %d\n", i, err); > return err; > } > } > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h > index 5ecb4069a9f0..15e02dcda2c7 100644 > --- a/tools/lib/bpf/libbpf.h > +++ b/tools/lib/bpf/libbpf.h > @@ -590,6 +590,9 @@ perf_buffer__new_raw(int map_fd, size_t page_cnt, > LIBBPF_API void perf_buffer__free(struct perf_buffer *pb); > LIBBPF_API int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms); > LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb); > +LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx); > +LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb); > +LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx); > > typedef enum bpf_perf_event_ret > (*bpf_perf_event_print_t)(struct perf_event_header *hdr, > diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map > index e35bd6cdbdbf..77466958310a 100644 > --- a/tools/lib/bpf/libbpf.map > +++ b/tools/lib/bpf/libbpf.map > @@ -299,3 +299,10 @@ LIBBPF_0.1.0 { > btf__set_fd; > btf__set_pointer_size; > } LIBBPF_0.0.9; > + > +LIBBPF_0.2.0 { > + global: > + perf_buffer__buffer_cnt; > + perf_buffer__buffer_fd; > + perf_buffer__consume_buffer; > +} LIBBPF_0.1.0; > diff --git a/tools/testing/selftests/bpf/prog_tests/perf_buffer.c b/tools/testing/selftests/bpf/prog_tests/perf_buffer.c > index c33ec180b3f2..add224ce17af 100644 > --- a/tools/testing/selftests/bpf/prog_tests/perf_buffer.c > +++ b/tools/testing/selftests/bpf/prog_tests/perf_buffer.c > @@ -7,6 +7,8 @@ > #include "test_perf_buffer.skel.h" > #include "bpf/libbpf_internal.h" > > +static int duration; > + > /* AddressSanitizer sometimes crashes due to data dereference below, due to > * this being mmap()'ed memory. Disable instrumentation with > * no_sanitize_address attribute > @@ -24,13 +26,31 @@ static void on_sample(void *ctx, int cpu, void *data, __u32 size) > CPU_SET(cpu, cpu_seen); > } > > +int trigger_on_cpu(int cpu) > +{ > + cpu_set_t cpu_set; > + int err; > + > + CPU_ZERO(&cpu_set); > + CPU_SET(cpu, &cpu_set); > + > + err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set); > + if (err && CHECK(err, "set_affinity", "cpu #%d, err %d\n", cpu, err)) > + return err; > + > + usleep(1); > + > + return 0; > +} > + > void test_perf_buffer(void) > { > - int err, on_len, nr_on_cpus = 0, nr_cpus, i, duration = 0; > + int err, on_len, nr_on_cpus = 0, nr_cpus, i; > struct perf_buffer_opts pb_opts = {}; > struct test_perf_buffer *skel; > - cpu_set_t cpu_set, cpu_seen; > + cpu_set_t cpu_seen; > struct perf_buffer *pb; > + int last_fd = -1, fd; > bool *online; > > nr_cpus = libbpf_num_possible_cpus(); > @@ -71,16 +91,8 @@ void test_perf_buffer(void) > continue; > } > > - CPU_ZERO(&cpu_set); > - CPU_SET(i, &cpu_set); > - > - err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), > - &cpu_set); > - if (err && CHECK(err, "set_affinity", "cpu #%d, err %d\n", > - i, err)) > + if (trigger_on_cpu(i)) > goto out_close; > - > - usleep(1); > } > > /* read perf buffer */ > @@ -92,6 +104,34 @@ void test_perf_buffer(void) > "expect %d, seen %d\n", nr_on_cpus, CPU_COUNT(&cpu_seen))) > goto out_free_pb; > > + if (CHECK(perf_buffer__buffer_cnt(pb) != nr_cpus, "buf_cnt", > + "got %zu, expected %d\n", perf_buffer__buffer_cnt(pb), nr_cpus)) > + goto out_close; > + > + for (i = 0; i < nr_cpus; i++) { > + if (i >= on_len || !online[i]) > + continue; > + > + fd = perf_buffer__buffer_fd(pb, i); > + CHECK(last_fd == fd, "fd_check", "last fd %d == fd %d\n", last_fd, fd); > + last_fd = fd; > + I'm not sure why you're testing this way - shouldn't it just be a verification of whether we get an unexpected error code rather than a valid fd? > + err = perf_buffer__consume_buffer(pb, i); > + if (CHECK(err, "drain_buf", "cpu %d, err %d\n", i, err)) > + goto out_close; > + I think I'm a bit lost in what processes what here. The first perf_buffer__poll() should handle the processing of the records associated with the first set of per-cpu triggering I think. Is the above perf_buffer__consume_buffer() checking the "no data, return success" case? If that's right should we do something to explicitly check it indeed was a no-op, like CHECK()ing CPU_ISSET(i, &cpu_seen) to ensure the on_sample() handler wasn't called? The "drain_buf" description makes me think I'm misreading this and we're draining additional events, so I wanted to check what's going on here to make sure. Thanks! Alan