From: Slava Imameev
Subject: Re: Re: Re: Re: Re: [PATCH 2/2] libbpf: BPF programs dynamic loading and attaching
Date: Tue, 25 Feb 2025 12:12:31 +1100
Message-ID: <20250225011231.27681-1-slava.imameev@crowdstrike.com>
X-Mailing-List: linux-kselftest@vger.kernel.org

This reply was resent because the previous email was missing the
In-Reply-To header.

> > > On Mon, 2025-02-10 at 16:06 -0800, Andrii Nakryiko wrote:
> > > > Tracking associated maps for a program is not necessary. As long as
> > > > the last BPF program using the BPF map is unloaded, the kernel will
> > > > automatically free not-anymore-referenced BPF map. Note that
> > > > bpf_object itself will keep FDs for BPF maps, so you'd need to make
> > > > sure to do bpf_object__close() to release those references.
> > > >
> > > > But if you are going to ask to re-create BPF maps next time BPF
> > > > program is loaded... Well, I'll say you are asking for a bit too
> > > > much, tbh. If you want to be *that* sophisticated, it shouldn't be
> > > > too hard for you to get all this information from BPF program's
> > > > instructions.
> >
> > We really are that sophisticated (see below for more details). We could
> > scan program instructions, but we'd then tie our logic to BPF
> > implementation details and duplicate logic already present in libbpf
> > (https://elixir.bootlin.com/linux/v6.13.2/source/tools/lib/bpf/libbpf.c#L6087).
> > Obviously this *can* be done but it's not at all ideal from an
> > application perspective.
>
> I agree it's not ideal, but it's also not some complicated and
> bound-to-be-changed logic. What you point out in libbpf source code is
> a bit different thing, reality is much simpler. Only so-called ldimm64
> instruction (BPF_LD | BPF_IMM | BPF_DW opcode) can be referencing map
> FD, so analysing this is borderline trivial. And this is part of BPF
> ISA, so not going to change.

Our approach is to associate an array of maps with each BPF program as a
property that is initialised at the relocation stage. We therefore do not
need to parse BPF program instructions; we rely on the recorded
relocations instead. I think this is a more robust and cleaner solution:
all of the code stays in one place, and it works at a higher level of
abstraction, on the relocation table rather than on raw instructions.

Mainline libbpf keeps an array of maps per bpf_object. We extended this
by adding an array of maps associated with each bpf_program. For example,
this excerpt from our development branch associates a map with a
bpf_program during the relocation phase:

	insn[0].src_reg = BPF_PSEUDO_MAP_FD;
	insn[0].imm = map->fd;
	err = bpf_program__add_map(prog, map);

> > > > bpf_object is the unit of coherence in libbpf, so I don't see us
> > > > refcounting maps between bpf_objects. Kernel is doing refcounting
> > > > based on FDs, so see if you can use that.
> >
> > I can understand that. That said, I think if there's no logic across
> > objects, and bpf_object access is not thread-safe, it puts us into a
> > tough situation:
> > - Complex refcounting, code scanning, etc to keep consistency when
> >   manipulating maps used by multiple programs.
> > - Parallel loading not being well-balanced, if we split programs across
> >   objects.
> >
> > We could alternatively write our own custom loader, but then we’d have
> > to duplicate much of the useful logic that libbpf already implements:
> > skeleton generation, map/program association, embedding programs into
> > ELFs, loading logic and kernel probing, etc. We’d like some way to
> > handle dynamic/parallel loading without having to replicate all the
> > advantages libbpf grants us.
>
> Yeah, I can understand that as well, but bpf_object's single-threaded
> design and the fact that bpf_object__load is kind of the final step
> where programs are loaded (or not) is pretty backed in. I don't see
> bpf_object becoming multi-threaded.

We understand this, but the current bpf_object design allowed us to use
it in a multithreaded environment with only a minor modification to
bpf_program loading. We accept that libbpf's single-threaded design is
unlikely to be reconsidered.

> The dynamic program
> loading/unloading/loading again is something that I can't yet justify,
> tbh.
>
> So the best I can propose you is to use libbpf's skeleton and
> bpf_object concept for, effectively, ELF handling, relocations, all
> the preparations up to loading BPF programs. And after that you can
> take over loading and handling program lifetime outside of bpf_object.
>
> Dynamic map creation after bpf_object__load() I think is completely
> outside of the scope and you'll have to solve this problem for
> yourself. I would point out, though, that internally libbpf already
> switched to sort-of pre-creating stable FDs for maps before they are
> actually created in the kernel. So it's conceivable that we can have
> more granularity in bpf_object preparation. I.e., first step would be
> to parse ELF and handle relocations, prepare everything. After that we
> can have a step to create maps, and then another one to create
> programs. Usually people would do all that, but you can stop right
> before maps creation or before program creation, whatever fits your
> use case better.
>
> The key is that program instructions will be final and won't need
> adjustments regardless of maps actually being created or not. FDs, as
> I mentioned, are stable regardless.

We rely on this in our design: because the FDs are stable, we do not
need to scan BPF program instructions to fix up the map FDs they
reference when a bpf_program is loaded dynamically together with
dynamically created maps.

> > The use case here is that our security monitoring agent leverages eBPF
> > as its foundational technology to gather telemetry from the kernel.
> > As part of that, we hook many different kernel subsystems (process,
> > memory, filesystem, network, etc), tying them together and tracking
> > with maps. So we legitimately have a very large number of programs all
> > doing different work. For products of this scale, it increases security
> > and performance to load this set of programs and their maps in an
> > optimized, parallel fashion and subsequently change the loaded set of
> > programs and maps dynamically without disturbing the rest of the
> > application.
>
> Yes, makes sense. You'll need to decide for yourself if it's actually
> more meaningful to split those 200 programs into independent
> bpf_objects by features, and be rigorous about sharing state (maps)
> through bpf_map__reuse_fd(), which would allow to parallelize loading
> within confines of existing libbpf APIs. Or you can be a bit more
> low-level with program loading outside of bpf_object API, as I
> described above.

Yes, this is one way to share BPF maps across multiple bpf_objects and
to use existing libbpf for parallel BPF program loading if we want to
keep full libbpf compatibility. It comes at the cost of a more
complicated design, though: we would need to convert our single
bpf_object model into multiple bpf_objects, with a new layer to manage
them. In our case a bpf_program can map to multiple features, each of
which can be modified independently, so to achieve even load balancing
across multiple threads we would probably end up with one bpf_program
per bpf_object.
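
For reference, the instruction scan discussed earlier in the thread
(only ldimm64 can reference a map FD) could be sketched roughly as
follows. This is an illustration only, not code from libbpf or from the
development branch described above. struct insn is a local stand-in for
the kernel's struct bpf_insn, collect_map_fds() is a hypothetical helper,
and the sketch assumes the FD-based forms (BPF_PSEUDO_MAP_FD and
BPF_PSEUDO_MAP_VALUE); it ignores the BPF_PSEUDO_MAP_IDX variants, where
imm is an index into an fd_array rather than an FD.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for struct bpf_insn from <linux/bpf.h> (assumption:
 * same field layout; defined here only to keep the sketch self-contained). */
struct insn {
	uint8_t code;        /* opcode */
	uint8_t dst_reg : 4; /* destination register */
	uint8_t src_reg : 4; /* source register, or pseudo tag for ldimm64 */
	int16_t off;         /* signed offset */
	int32_t imm;         /* signed immediate */
};

#define OP_LDIMM64       0x18 /* BPF_LD | BPF_IMM | BPF_DW */
#define PSEUDO_MAP_FD    1    /* BPF_PSEUDO_MAP_FD */
#define PSEUDO_MAP_VALUE 2    /* BPF_PSEUDO_MAP_VALUE */

/*
 * Hypothetical helper: collect the distinct map FDs referenced by a
 * program. Returns the number of FDs written to fds[], or -1 if the
 * program ends in a truncated ldimm64 or fds[] is too small.
 */
static int collect_map_fds(const struct insn *insns, size_t cnt,
			   int *fds, size_t max_fds)
{
	size_t n = 0;

	for (size_t i = 0; i < cnt; i++) {
		if (insns[i].code != OP_LDIMM64)
			continue;
		/* ldimm64 always occupies two instruction slots. */
		if (i + 1 >= cnt)
			return -1;

		if (insns[i].src_reg == PSEUDO_MAP_FD ||
		    insns[i].src_reg == PSEUDO_MAP_VALUE) {
			int fd = insns[i].imm;
			size_t j;

			/* Record each FD only once. */
			for (j = 0; j < n; j++)
				if (fds[j] == fd)
					break;
			if (j == n) {
				if (n == max_fds)
					return -1;
				fds[n++] = fd;
			}
		}
		i++; /* skip the second slot of the ldimm64 pair */
	}
	return (int)n;
}
```

With real libbpf, the input would presumably come from
bpf_program__insns() and bpf_program__insn_cnt(), using struct bpf_insn
from <linux/bpf.h> in place of the local struct. The relocation-based
approach described above avoids this scan entirely.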