From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-189.mta0.migadu.com (out-189.mta0.migadu.com [91.218.175.189]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5B09D1E5B6D for ; Wed, 28 May 2025 17:23:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.189 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748453037; cv=none; b=HZ8QGTPUlhLEsHmpq4h6xywOXbqpjNJ8bicEHg1PfAbDnrK1BmS2nbvCtJtplQtVvNobEHLwIm2nK8o3T6RQZZX3tjuLEgDUZYFA6fGSfnBXYKVJbraPqXNlLHU4Ii2TTDEPS/ndLdgThoEBKEiFFUbQcS2IvL5l0KqJpGbWz1k= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748453037; c=relaxed/simple; bh=alXO9rRWMFzWnCVyWIsJ/ytdMT4PMZTdITLAW6S0gvE=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=u83W3PIy1/X7tSMjRGzJ6g1BWN9Z6h6J/uqd7FqPWChLrEYB5Ek9Hx960h0omLaH1UFIZcVnIitt8ZNi0pwzEHe7b9lfgvqQ7fy+nZYpm63r787MWgFJKrFWZ8qxDRF04xyRczf8tAgcbi7nN94sV1DhZ3pNntDGpDHgbyI3RGM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=EvxN6vTa; arc=none smtp.client-ip=91.218.175.189 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="EvxN6vTa" Message-ID: <5bfb5b19-40bd-4b37-a4a4-00e2a3473447@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1748453032; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=24Y5RQIoZksBEzEODLeAdhoOqGJGFsWHL6wbconqKnA=; b=EvxN6vTaY4BrxnRFoxAEIAF1b4q/f/MXrBrkXeqG1IfnPh9PHksspEK2Aaug8myvKD4Pfm t7QMV6JdsApRugOBbQqFaQIABGDZvK4mlPWDoBUGqvduggkt6F9ye9E1SdaxPBv0dxdOLj 59Fmmoh9skBszpnMK1ok8Aa69hjfZRI= Date: Wed, 28 May 2025 10:23:45 -0700 Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subject: Re: [PATCH bpf-next v3 2/5] bpf: Implement mprog API on top of existing cgroup progs Content-Language: en-GB To: Andrii Nakryiko Cc: bpf@vger.kernel.org, Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , kernel-team@fb.com, Martin KaFai Lau References: <20250517162720.4077882-1-yonghong.song@linux.dev> <20250517162731.4078451-1-yonghong.song@linux.dev> <067aec4f-6847-4c86-9e93-1be8145b252a@linux.dev> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Yonghong Song In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT On 5/27/25 2:36 PM, Andrii Nakryiko wrote: > On Fri, May 23, 2025 at 6:04 PM Yonghong Song wrote: >> >> >> On 5/22/25 1:45 PM, Andrii Nakryiko wrote: >>> On Sat, May 17, 2025 at 9:27 AM Yonghong Song wrote: >>>> Current cgroup prog ordering is appending at attachment time. This is not >>>> ideal. In some cases, users want specific ordering at a particular cgroup >>>> level. To address this, the existing mprog API seems an ideal solution with >>>> supporting BPF_F_BEFORE and BPF_F_AFTER flags. >>>> >>>> But there are a few obstacles to directly use kernel mprog interface. >>>> Currently cgroup bpf progs already support prog attach/detach/replace >>>> and link-based attach/detach/replace. For example, in struct >>>> bpf_prog_array_item, the cgroup_storage field needs to be together >>>> with bpf prog. But the mprog API struct bpf_mprog_fp only has bpf_prog >>>> as the member, which makes it difficult to use kernel mprog interface. >>>> >>>> In another case, the current cgroup prog detach tries to use the >>>> same flag as in attach. This is different from mprog kernel interface >>>> which uses flags passed from user space. >>>> >>>> So to avoid modifying existing behavior, I made the following changes to >>>> support mprog API for cgroup progs: >>>> - The support is for prog list at cgroup level. Cross-level prog list >>>> (a.k.a. effective prog list) is not supported. >>>> - Previously, BPF_F_PREORDER is supported only for prog attach, now >>>> BPF_F_PREORDER is also supported by link-based attach. >>>> - For attach, BPF_F_BEFORE/BPF_F_AFTER/BPF_F_ID/BPF_F_LINK is supported >>>> similar to kernel mprog but with different implementation. >>>> - For detach and replace, use the existing implementation. >>>> - For attach, detach and replace, the revision for a particular prog >>>> list, associated with a particular attach type, will be updated >>>> by increasing count by 1. >>>> >>>> Signed-off-by: Yonghong Song >>>> --- >>>> include/uapi/linux/bpf.h | 7 ++ >>>> kernel/bpf/cgroup.c | 195 +++++++++++++++++++++++++++++---- >>>> kernel/bpf/syscall.c | 43 +++++--- >>>> tools/include/uapi/linux/bpf.h | 7 ++ >>>> 4 files changed, 214 insertions(+), 38 deletions(-) >>>> [...] >>>> + } >>>> + >>>> + if (link) { >>>> + anchor_link = bpf_get_anchor_link(flags, id_or_fd, prog->type); >>>> + if (IS_ERR(anchor_link)) >>>> + return ERR_PTR(PTR_ERR(anchor_link)); >>>> + anchor_prog = anchor_link->prog; >>>> + } else if (id || id_or_fd) { >>>> + anchor_prog = bpf_get_anchor_prog(flags, id_or_fd, prog->type); >>>> + if (IS_ERR(anchor_prog)) >>>> + return ERR_PTR(PTR_ERR(anchor_prog)); >>>> + } >>>> + >>>> + if (!anchor_prog) { >>>> + /* if there is no anchor_prog, then BPF_F_PREORDER doesn't matter >>>> + * since either prepend or append to a combined list of progs will >>>> + * end up with correct result. >>>> + */ >>>> + hlist_for_each_entry(pltmp, progs, node) { >>>> + if (flags & BPF_F_BEFORE) >>>> + return pltmp; >>>> + if (pltmp->node.next) >>>> + continue; >>>> + return pltmp; >>>> + } >>>> + return NULL; >>>> + } >>>> + >>>> + hlist_for_each_entry(pltmp, progs, node) { >>>> + pltmp_prog = pltmp->link ? pltmp->link->link.prog : pltmp->prog; >>>> + if (pltmp_prog != anchor_prog) >>>> + continue; >>>> + if (!!(pltmp->flags & BPF_F_PREORDER) != preorder) >>>> + goto out; >>> hm... thinking about this a bit more, is it illegal to have the same >>> BPF program attached as PREORDER and POSTORDER? That seems legit to >>> me, do we artificially disallow this? >> Good question, in find_attach_entry(), we have >> >> hlist_for_each_entry(pl, progs, node) { >> if (prog && pl->prog == prog && prog != replace_prog) >> /* disallow attaching the same prog twice */ >> return ERR_PTR(-EINVAL); >> if (link && pl->link == link) >> /* disallow attaching the same link twice */ >> return ERR_PTR(-EINVAL); >> } >> >> Basically, two same progs are not allowed. Here we didn't check PREORDER flag. >> Should we relax this for this patch set? > So with BPF link-based attachment we already allow multiple same BPF > programs to be attached, regardless of PREORDER. I don't think we need > to make any relaxations for old-style program attachment. But your > mprog API is, effectively ignoring link stuff and checking for > uniqueness of the program (regardless of whether it came from link or > not), which is problematic, I think, no? You are correct. I should check link instead of prog for link-based attachment in mprog. Will fix. > > [...] > >>>> @@ -640,7 +765,8 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs, >>>> static int __cgroup_bpf_attach(struct cgroup *cgrp, >>>> struct bpf_prog *prog, struct bpf_prog *replace_prog, >>>> struct bpf_cgroup_link *link, >>>> - enum bpf_attach_type type, u32 flags) >>>> + enum bpf_attach_type type, u32 flags, u32 id_or_fd, >>>> + u64 revision) >>>> { >>>> u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI)); >>>> struct bpf_prog *old_prog = NULL; >>>> @@ -656,6 +782,9 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp, >>>> ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI))) >>>> /* invalid combination */ >>>> return -EINVAL; >>>> + if ((flags & BPF_F_REPLACE) && (flags & (BPF_F_BEFORE | BPF_F_AFTER))) >>> but can it be that neither is set? >> I would say it is possible. In that case, the new prog is appended to >> the end of prog list. > ok, makes sense > >>>> + /* only either replace or insertion with before/after */ >>>> + return -EINVAL; >>>> if (link && (prog || replace_prog)) >>>> /* only either link or prog/replace_prog can be specified */ >>>> return -EINVAL; >>> [...]