From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-ej1-f52.google.com (mail-ej1-f52.google.com [209.85.218.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4C9F81F63ED for ; Fri, 17 Jan 2025 10:31:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.52 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737109910; cv=none; b=VawSciyPUOo3YhDTLeg0K9WLGrGoIXI8Q+qPVibQT+A/sg1oEvAwpijP7/P+sKgoLI8XBv/E2eIcoPFdUQFWvrkkH+m6CGR08O0NaG9cPcxCcxL2TK/+YwWHJMG3wqMbDyII73S3+Iheo6dMZlnWt+gOobmmWWMW1LvgUqVJxjU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737109910; c=relaxed/simple; bh=jZtcWczJkWQSm7PfylScafoMlcdX3+Xl7ux2Pxfj7yc=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=mSYDA2st7FRDRVnuH0cniovxiG99n/ZwYYrwy7u7M9wQeEo94Elu4YlB/BLB7UCkJDl9lHIQNMm57OFUrmhOT0h148c5NaadZkuL1U434dGJrKCf3elHpMUmiuZgzrgFQvF4yzq3TkGagp6PXuYLVDYVz8COhzuiKDV0b8jSVlQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=isovalent.com; spf=pass smtp.mailfrom=isovalent.com; dkim=pass (2048-bit key) header.d=isovalent.com header.i=@isovalent.com header.b=ZAjNqbSR; arc=none smtp.client-ip=209.85.218.52 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=isovalent.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=isovalent.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=isovalent.com header.i=@isovalent.com header.b="ZAjNqbSR" Received: by mail-ej1-f52.google.com with SMTP id a640c23a62f3a-aaee2c5ee6eso343883766b.1 for ; Fri, 17 Jan 2025 02:31:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=isovalent.com; s=google; t=1737109906; x=1737714706; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=cNwFhA5vTka5BCVv0SiX0chrl8U101YQ0RQsv3+bOhc=; b=ZAjNqbSR4diMoiKphyzOXdVtkkBKt4p9T08X8+MTZsIXEYaqHaIpK80vpTYsGPTvKI qpNVMUvOs6sYfa4Y9InfS90VEmmxkMToKgODPM7S5e8vwliOcdzX10OaibnSF+wHIfrk Y99TZbQLlA4nGNiyXC/3Pqp+CdzJv/L5a5tk/R4TfXjpAwhp1UIG/+lg8SikUNdylDsb kc46WvPrq5oM9XZ+cyrQhX6SqUQDedukgmtVqF5OwBw6UvBr3j0tiIzJtXIOji6MgiwI roelck544PJYGR4C4FtjEG14HgXsFikD6oevXDxuEdkVHFLc94s9lAbPalAM8sZ2lf3C tklQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737109906; x=1737714706; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=cNwFhA5vTka5BCVv0SiX0chrl8U101YQ0RQsv3+bOhc=; b=ZI460vUo8mZBB15gYMPJ8oFz7SvPl92X8hKQ5BQIcSAHBmzFNRoxuTMXm6t8pik5En MIHIOeJvJE9nWBuOjSW/nHMxElyPB9Hm91oDbjpkfFqYn5L+HymvB/EBdYRDuinTXTci Bd9hw+uB1I7qK0Yfc5tLqCMXRjJoA4qkwhPyX+uJ/0h9jxF638nSYGJgIeLzBKbgNaV3 Q6I+tnXTfPyfZPn8Pk7jP7mW6SS7psxB0eN1wC6DaClj4fj38M7kQIX7yd3z//Z7/58i M+jElPNedPGdJk/W4dCrU/Uy+sCcNb7e83HFnXxJqglqvgpb14YaPnEP/QKc2TJwauXs /mYg== X-Forwarded-Encrypted: i=1; AJvYcCX+2ZEs5c9FgtasUyFFp8liEzAoQqsRQ+CUkN6/XokZsTzcOvWYXzS/psEiDXlMW1GsVic=@vger.kernel.org X-Gm-Message-State: AOJu0Yz2Q1NxVeKiAXH8/ZxdJ1OcklpfuwsoY577xRD5yqb5IxSxQnLx yvytRdrwFo6lai72IREK++gxwmglaoZrl57LSNX0o3CIzH1kGLEgT784iAHRiHZklRVSwqvS+2p f X-Gm-Gg: ASbGncsONkGCImUnjXlh0RaCAmi4Fz1QxytGtSXLEu0JR61aKuwagciriBEHR1Owxjv i3BUbCZsuKrw6pD5HBseL0RExi/Sq8IZwrAdm+Lq01lEU2eiZG7ygp/HKdPH3e+r+tElhXFc9QV OuVJNIiey6TRd6z3GwSYyRSMPXM8wIg9d0dQoj0T0a5S0mc4nEgOgzedpX0nTQnChmNhA6VWaj2 FZYXah5pWeU6TXygZtMaR2oKGOY6T87rQft/pRRYt1PmA== X-Google-Smtp-Source: AGHT+IHGh2AMhadR21zLBrsnmcBHQzANkF4mAMLKV1tuJsf9gdLnUp6JwYXKtjbhhliM6WEWTqdFqQ== X-Received: by 2002:a17:907:2da6:b0:ab3:25c9:fbf0 with SMTP id a640c23a62f3a-ab38b48c301mr198244466b.56.1737109906438; Fri, 17 Jan 2025 02:31:46 -0800 (PST) Received: from eis ([2a04:ee41:4:b2de:1ac0:4dff:fe0f:3782]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-ab384f87065sm146789566b.133.2025.01.17.02.31.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jan 2025 02:31:45 -0800 (PST) Date: Fri, 17 Jan 2025 10:35:47 +0000 From: Anton Protopopov To: Nick Zavaritsky Cc: Charalampos Stylianopoulos , Daniel Borkmann , bpf@vger.kernel.org, Alexei Starovoitov , aspsk2@gmail.com Subject: Re: [PATCH bpf-next 0/4] expose number of map entries to userspace Message-ID: References: <20250106145328.399610-1-charalampos.stylianopoulos@gmail.com> <28acb589-6632-4250-a8ca-00eacda03305@iogearbox.net> <68891842-975E-48B0-AED5-875F3ABC5F49@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: On 25/01/16 06:52PM, Nick Zavaritsky wrote: > > > On 16. Jan 2025, at 15:59, Anton Protopopov wrote: > > > > On 25/01/14 12:38PM, Nick Zavaritsky wrote: > >> > >>> On 9. Jan 2025, at 18:37, Anton Protopopov wrote: > >>> > >>> On 25/01/07 12:10PM, Charalampos Stylianopoulos wrote: > >>>> (sorry for double posting, this time in plain text) > >>>> Thanks a lot for the feedback! > >>>> > >>>> So, to double check, the suggestion is to only extend the libbpf API > >>>> with a new helper that does pretty much what get_cur_elements() does > >>>> in tools/testing/selftests/bpf/map_tests/map_percpu_stats.c ? > >>> > >>> What is your use case for getting the number of elements in a > >>> particular map? Will it work for you to just use a variant of > >>> get_cur_elements() from selftests vs. adding new API to libbpf? > >> > >> (On behalf of Charalampos Stylianopoulos) we would like to get the > >> number of elements in some maps for monitoring purposes. The end goal is > >> to get someone paged when a fixed-capacity map is about to start > >> rejecting inserts. > >> > >> We aim to operate a large number of apps in containers (custom packet > >> processing services, telekom). We find it most convenient for an app > >> itself to expose metrics concerning the maps it has created. > >> > >> We currently use a map iterator and a bunch of bpf_probe_read_kernel. We > >> foresee the number of maps in our systems getting significantly higher > >> in the near future. Therefore enumerating every map in the system to get > >> a number of elements in a particular map doesn't look sustainable. > >> > >> How do you feel about introducing bpf_map_sum_elem_count_by_fd kfunc, > >> available in syscall programs? > > > > This should work already, something like > > > > __s64 bpf_map_sum_elem_count(const struct bpf_map *map) __ksym; > > __s64 ret_user; > > > > struct { > > __uint(type, BPF_MAP_TYPE_HASH); > > __type(key, int); > > __type(value, int); > > __uint(max_entries, 4); > > } your_map SEC(".maps"); > > > > SEC("syscall") > > int sum(void *ctx) > > { > > struct bpf_map *map = (struct bpf_map *)&your_map; > > > > ret_user = bpf_map_sum_elem_count(map); > > > > return 0; > > } > > > > char _license[] SEC("license") = "GPL"; > > > > Is this sufficient for your use case? > > Technically it works. One can add a program similar to the snippet below > to their bpf code to expose the number of elements in every map of > interest. > > struct stats { __s64 a, b, c, d; }; > SEC(“.maps”) struct { ... } a, b, c, d; > > SEC(“syscall”) > int sum_element_count_bulk(void *ctx) > { > struct stats *stats = ctx; > stats->a = bpf_map_sum_element_count((void *)a); > stats->b = bpf_map_sum_element_count((void *)b); > ... > return 0; > } > > The downside is that it is boilerplate code that has to be written every > single time. With the proposed bpf_map_sum_element_count_by_fd, one can > have a library in user space that offers convenient > sum_element_count(int fd). > > It could leverage the following bpf program behind the scenes: > > SEC(“syscall”) > int sum_element_count(void *ctx) > { > *(__s64 *)ctx = bpf_map_sum_element_count_by_fd(*(int *)ctx); > return 0; > } Makes sense. And this can also be used for multiple maps in one call. I've quickly tested that the following implementation works, please send a patch + selftests. Note that unlike the bpf_map_sum_elem_count function, the bpf_map_sum_elem_count_by_fd should be only allowed for SYSCALL programs. __bpf_kfunc s64 bpf_map_sum_elem_count_by_fd(int fd) { struct bpf_map *map; s64 ret; map = bpf_map_get(fd); if (IS_ERR(map)) return 0; ret = bpf_map_sum_elem_count(map); bpf_map_put(map); return ret; } > > > >>> > >>> [Also, please try not to top-post, see https://www.idallen.com/topposting.html] > >>> > >>>>> On Tue, 7 Jan 2025 at 08:44, Anton Protopopov wrote: > >>>>>> > >>>>>> On 25/01/06 05:19PM, Daniel Borkmann wrote: > >>>>>>> On 1/6/25 3:53 PM, Charalampos Stylianopoulos wrote: > >>>>>>>> This patch series provides an easy way for userspace applications to > >>>>>>>> query the number of entries currently present in a map. > >>>>>>>> > >>>>>>>> Currently, the number of entries in a map is accessible only from kernel space > >>>>>>>> and eBPF programs. A userspace program that wants to track map utilization has to > >>>>>>>> create and attach an eBPF program solely for that purpose. > >>>>>>>> > >>>>>>>> This series makes the number of entries in a map easily accessible, by extending the > >>>>>>>> main bpf syscall with a new command. The command supports only maps that already > >>>>>>>> track utilization, namely hash maps, LPM maps and queue/stack maps. > >>>>>>> > >>>>>>> An earlier attempt to directly expose it to user space can be found here [0], which > >>>>>>> eventually led to [1] to only expose it via kfunc for BPF programs in order to avoid > >>>>>>> extending UAPI. > >>>>>>> > >>>>>>> Perhaps instead add a small libbpf helper (e.g. bpf_map__current_entries to complement > >>>>>>> bpf_map__max_entries) which does all the work to extract that info via [1] underneath? > >>>>>> > >>>>>> One small thingy here is that bpf_map_sum_elem_count() is only > >>>>>> available from the map iterator. Which means that to get the > >>>>>> bpf_map_sum_elem_count() for one map only, one have to iterate > >>>>>> through the whole set of maps (and filter out all but one). > >>>>>> > >>>>>> I wanted to follow up my series by either adding the result of > >>>>>> calling bpf_map_sum_elem_count() to map_info as u32 or to add > >>>>>> possibility to provide a map_fd/map_id when creating an iterator > >>>>>> (so that it is only called for one map). But so far I haven't > >>>>>> a real use case for getting the number of elements for one map only. > >>>>>> > >>>>>>> Thanks, > >>>>>>> Daniel > >>>>>>> > >>>>>>> [0] https://lore.kernel.org/bpf/20230531110511.64612-1-aspsk@isovalent.com/ > >>>>>>> [1] https://lore.kernel.org/bpf/20230705160139.19967-1-aspsk@isovalent.com/ > >>>>>>> https://lore.kernel.org/bpf/20230719092952.41202-1-aspsk@isovalent.com/ > >>>>>>> > >>>>>>>> Charalampos Stylianopoulos (4): > >>>>>>>> bpf: Add map_num_entries map op > >>>>>>>> bpf: Add bpf command to get number of map entries > >>>>>>>> libbpf: Add support for MAP_GET_NUM_ENTRIES command > >>>>>>>> selftests/bpf: Add tests for bpf_map_get_num_entries > >>>>>>>> > >>>>>>>> include/linux/bpf.h | 3 ++ > >>>>>>>> include/linux/bpf_local_storage.h | 1 + > >>>>>>>> include/uapi/linux/bpf.h | 17 +++++++++ > >>>>>>>> kernel/bpf/devmap.c | 14 ++++++++ > >>>>>>>> kernel/bpf/hashtab.c | 10 ++++++ > >>>>>>>> kernel/bpf/lpm_trie.c | 8 +++++ > >>>>>>>> kernel/bpf/queue_stack_maps.c | 11 +++++- > >>>>>>>> kernel/bpf/syscall.c | 32 +++++++++++++++++ > >>>>>>>> tools/include/uapi/linux/bpf.h | 17 +++++++++ > >>>>>>>> tools/lib/bpf/bpf.c | 16 +++++++++ > >>>>>>>> tools/lib/bpf/bpf.h | 2 ++ > >>>>>>>> tools/lib/bpf/libbpf.map | 1 + > >>>>>>>> .../bpf/map_tests/lpm_trie_map_basic_ops.c | 5 +++ > >>>>>>>> tools/testing/selftests/bpf/test_maps.c | 35 +++++++++++++++++++ > >>>>>>>> 14 files changed, 171 insertions(+), 1 deletion(-) > >