[RFC PATCH bpf-next 0/4] bpf: Implement mprog API on top of existing cgroup progs

* [RFC PATCH bpf-next 0/4] bpf: Implement mprog API on top of existing cgroup progs
@ 2025-04-11  1:15 Yonghong Song
  2025-04-11  1:15 ` [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf Yonghong Song
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Yonghong Song @ 2025-04-11  1:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau

Currently, cgroup progs are ordered by attachment time: each newly
attached prog is appended to the list. This is not ideal. In some cases,
users want a specific ordering at a particular cgroup level. For example,
at Meta, we have a case where three different applications all have
cgroup/setsockopt progs and they require a specific ordering. The current
approach is to use a bpfchainer, where one bpf prog contains multiple
global functions and each global function can be freplaced by a prog for
a specific application. The ordering of the global functions decides the
ordering of those application-specific bpf progs. Using bpfchainer is a
centralized approach and is not desirable, as one of the applications
acts as a daemon. A decentralized attachment approach is more favorable
for those applications.

To address this, the existing mprog API ([2]) seems an ideal solution:
support for the BPF_F_BEFORE and BPF_F_AFTER flags is added on top of the
existing cgroup bpf implementation, for both prog and link attachment.
The kernel mprog interface ([2]) itself is not used; the implementation
is done directly in the cgroup bpf code base. The mprog 'revision' is
also implemented for attach/detach/replace, so users can query the
revision number to check whether the cgroup prog list has changed.

The patch set contains 4 patches. Patch 1 adds revision support for
cgroup bpf progs. Patch 2 implements the mprog API for prog/link attach
and the revision update. Patch 3 adds a new libbpf API to do cgroup link
attach with flags like BPF_F_BEFORE/BPF_F_AFTER. Patch 4 adds two tests
to validate the implementation.

  [1] https://lore.kernel.org/r/20250224230116.283071-1-yonghong.song@linux.dev
  [2] https://lore.kernel.org/r/20230719140858.13224-2-daniel@iogearbox.net

Yonghong Song (4):
  cgroup: Add bpf prog revisions to struct cgroup_bpf
  bpf: Implement mprog API on top of existing cgroup progs
  libbpf: Support link-based cgroup attach with options
  selftests/bpf: Add two selftests for mprog API based cgroup progs

 include/linux/bpf-cgroup-defs.h               |   1 +
 include/uapi/linux/bpf.h                      |   7 +
 kernel/bpf/cgroup.c                           | 151 +++-
 kernel/bpf/syscall.c                          |  58 +-
 kernel/cgroup/cgroup.c                        |   5 +-
 tools/include/uapi/linux/bpf.h                |   7 +
 tools/lib/bpf/bpf.c                           |  44 +
 tools/lib/bpf/bpf.h                           |   5 +
 tools/lib/bpf/libbpf.c                        |  28 +
 tools/lib/bpf/libbpf.h                        |  15 +
 tools/lib/bpf/libbpf.map                      |   1 +
 .../bpf/prog_tests/cgroup_mprog_opts.c        | 752 ++++++++++++++++++
 .../bpf/prog_tests/cgroup_mprog_ordering.c    |  77 ++
 .../selftests/bpf/progs/cgroup_mprog.c        |  30 +
 14 files changed, 1138 insertions(+), 43 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_mprog_opts.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_mprog_ordering.c
 create mode 100644 tools/testing/selftests/bpf/progs/cgroup_mprog.c

-- 
2.47.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf
  2025-04-11  1:15 [RFC PATCH bpf-next 0/4] bpf: Implement mprog API on top of existing cgroup progs Yonghong Song
@ 2025-04-11  1:15 ` Yonghong Song
  2025-04-12  0:41   ` kernel test robot
                     ` (2 more replies)
  2025-04-11  1:15 ` [RFC PATCH bpf-next 2/4] bpf: Implement mprog API on top of existing cgroup progs Yonghong Song
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 13+ messages in thread
From: Yonghong Song @ 2025-04-11  1:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau

One of the key items in the mprog API is the revision of a prog list.
The revision number is increased whenever the prog list changes, e.g.,
on attach, detach or replace.

Add a 'revisions' field to struct cgroup_bpf, holding one revision per
cgroup-related attachment type. The initial revision value is set to 1,
the same as in the kernel mprog implementation.
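
As a rough illustration of the revision semantics described above (not
code from this patch), a userspace sketch might look like the following.
All demo_* names and DEMO_MAX_ATTACH_TYPE are hypothetical stand-ins. A
plain uint64_t is used for simplicity, and returning -ESTALE on an
expected-revision mismatch is an assumption modeled on how the in-kernel
mprog code is understood to behave; this series may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for MAX_CGROUP_BPF_ATTACH_TYPE. */
#define DEMO_MAX_ATTACH_TYPE 4

/* Simplified model: one revision counter per attach type. */
struct demo_cgroup_bpf {
	uint64_t revisions[DEMO_MAX_ATTACH_TYPE];
};

/* Every per-type revision starts at 1, as in the commit message. */
static void demo_init(struct demo_cgroup_bpf *bpf)
{
	for (int i = 0; i < DEMO_MAX_ATTACH_TYPE; i++)
		bpf->revisions[i] = 1;
}

/* Call after a successful attach, detach or replace for 'atype'. */
static void demo_bump(struct demo_cgroup_bpf *bpf, int atype)
{
	assert(atype >= 0 && atype < DEMO_MAX_ATTACH_TYPE);
	bpf->revisions[atype]++;
}

/*
 * Assumed behavior: expected == 0 means "caller does not care";
 * otherwise a mismatch reports that the list changed underneath.
 */
static int demo_check_revision(const struct demo_cgroup_bpf *bpf,
			       int atype, uint64_t expected)
{
	assert(atype >= 0 && atype < DEMO_MAX_ATTACH_TYPE);
	if (expected && expected != bpf->revisions[atype])
		return -ESTALE;
	return 0;
}
```

A caller would read the revision, decide where to attach, and pass the
value it read as the expected revision so that a concurrent change to
the list is detected instead of silently reordering progs.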

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 include/linux/bpf-cgroup-defs.h | 1 +
 kernel/cgroup/cgroup.c          | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
index 0985221d5478..a3cbbd00731a 100644
--- a/include/linux/bpf-cgroup-defs.h
+++ b/include/linux/bpf-cgroup-defs.h
@@ -62,6 +62,7 @@ struct cgroup_bpf {
 	 * when BPF_F_ALLOW_MULTI the list can have up to BPF_CGROUP_MAX_PROGS
 	 */
 	struct hlist_head progs[MAX_CGROUP_BPF_ATTACH_TYPE];
+	atomic64_t revisions[MAX_CGROUP_BPF_ATTACH_TYPE];
 	u8 flags[MAX_CGROUP_BPF_ATTACH_TYPE];
 
 	/* list of cgroup shared storages */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index ac2db99941ca..dea7d12c8927 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2053,7 +2053,7 @@ static int cgroup_reconfigure(struct fs_context *fc)
 static void init_cgroup_housekeeping(struct cgroup *cgrp)
 {
 	struct cgroup_subsys *ss;
-	int ssid;
+	int i, ssid;
 
 	INIT_LIST_HEAD(&cgrp->self.sibling);
 	INIT_LIST_HEAD(&cgrp->self.children);
@@ -2071,6 +2071,9 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 	for_each_subsys(ss, ssid)
 		INIT_LIST_HEAD(&cgrp->e_csets[ssid]);
 
+	for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)
+		atomic64_set(&cgrp->bpf.revisions[i], 1);
+
 	init_waitqueue_head(&cgrp->offline_waitq);
 	INIT_WORK(&cgrp->release_agent_work, cgroup1_release_agent);
 }
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf
  2025-04-11  1:15 ` [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf Yonghong Song
@ 2025-04-12  0:41   ` kernel test robot
  2025-05-08  4:18     ` Yonghong Song
  2025-04-12  1:13   ` kernel test robot
  2025-04-23 23:19   ` Andrii Nakryiko
  2 siblings, 1 reply; 13+ messages in thread
From: kernel test robot @ 2025-04-12  0:41 UTC (permalink / raw)
  To: Yonghong Song; +Cc: oe-kbuild-all

Hi Yonghong,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Yonghong-Song/cgroup-Add-bpf-prog-revisions-to-struct-cgroup_bpf/20250411-091743
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20250411011528.1839359-1-yonghong.song%40linux.dev
patch subject: [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf
config: x86_64-buildonly-randconfig-001-20250412 (https://download.01.org/0day-ci/archive/20250412/202504120853.I4bFTHhG-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250412/202504120853.I4bFTHhG-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504120853.I4bFTHhG-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/kernel.h:16,
                    from include/linux/cpumask.h:11,
                    from arch/x86/include/asm/tlbbatch.h:5,
                    from include/linux/mm_types_task.h:17,
                    from include/linux/sched.h:38,
                    from include/linux/cgroup.h:12,
                    from kernel/cgroup/cgroup-internal.h:5,
                    from kernel/cgroup/cgroup.c:31:
   kernel/cgroup/cgroup.c: In function 'init_cgroup_housekeeping':
>> kernel/cgroup/cgroup.c:2074:45: error: 'struct cgroup_bpf' has no member named 'revisions'
    2074 |         for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)
         |                                             ^
   include/linux/array_size.h:11:33: note: in definition of macro 'ARRAY_SIZE'
      11 | #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
         |                                 ^~~
>> kernel/cgroup/cgroup.c:2074:45: error: 'struct cgroup_bpf' has no member named 'revisions'
    2074 |         for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)
         |                                             ^
   include/linux/array_size.h:11:48: note: in definition of macro 'ARRAY_SIZE'
      11 | #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
         |                                                ^~~
   In file included from include/linux/build_bug.h:5,
                    from arch/x86/include/asm/current.h:5,
                    from include/linux/sched.h:12:
>> kernel/cgroup/cgroup.c:2074:45: error: 'struct cgroup_bpf' has no member named 'revisions'
    2074 |         for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)
         |                                             ^
   include/linux/compiler.h:197:79: note: in definition of macro '__BUILD_BUG_ON_ZERO_MSG'
     197 | #define __BUILD_BUG_ON_ZERO_MSG(e, msg) ((int)sizeof(struct {_Static_assert(!(e), msg);}))
         |                                                                               ^
   include/linux/compiler.h:201:35: note: in expansion of macro '__same_type'
     201 | #define __is_array(a)           (!__same_type((a), &(a)[0]))
         |                                   ^~~~~~~~~~~
   include/linux/compiler.h:202:58: note: in expansion of macro '__is_array'
     202 | #define __must_be_array(a)      __BUILD_BUG_ON_ZERO_MSG(!__is_array(a), \
         |                                                          ^~~~~~~~~~
   include/linux/array_size.h:11:59: note: in expansion of macro '__must_be_array'
      11 | #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
         |                                                           ^~~~~~~~~~~~~~~
   kernel/cgroup/cgroup.c:2074:25: note: in expansion of macro 'ARRAY_SIZE'
    2074 |         for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)
         |                         ^~~~~~~~~~
>> kernel/cgroup/cgroup.c:2074:45: error: 'struct cgroup_bpf' has no member named 'revisions'
    2074 |         for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)
         |                                             ^
   include/linux/compiler.h:197:79: note: in definition of macro '__BUILD_BUG_ON_ZERO_MSG'
     197 | #define __BUILD_BUG_ON_ZERO_MSG(e, msg) ((int)sizeof(struct {_Static_assert(!(e), msg);}))
         |                                                                               ^
   include/linux/compiler.h:201:35: note: in expansion of macro '__same_type'
     201 | #define __is_array(a)           (!__same_type((a), &(a)[0]))
         |                                   ^~~~~~~~~~~
   include/linux/compiler.h:202:58: note: in expansion of macro '__is_array'
     202 | #define __must_be_array(a)      __BUILD_BUG_ON_ZERO_MSG(!__is_array(a), \
         |                                                          ^~~~~~~~~~
   include/linux/array_size.h:11:59: note: in expansion of macro '__must_be_array'
      11 | #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
         |                                                           ^~~~~~~~~~~~~~~
   kernel/cgroup/cgroup.c:2074:25: note: in expansion of macro 'ARRAY_SIZE'
    2074 |         for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)
         |                         ^~~~~~~~~~
   include/linux/compiler.h:197:77: error: expression in static assertion is not an integer
     197 | #define __BUILD_BUG_ON_ZERO_MSG(e, msg) ((int)sizeof(struct {_Static_assert(!(e), msg);}))
         |                                                                             ^
   include/linux/compiler.h:202:33: note: in expansion of macro '__BUILD_BUG_ON_ZERO_MSG'
     202 | #define __must_be_array(a)      __BUILD_BUG_ON_ZERO_MSG(!__is_array(a), \
         |                                 ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/array_size.h:11:59: note: in expansion of macro '__must_be_array'
      11 | #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
         |                                                           ^~~~~~~~~~~~~~~
   kernel/cgroup/cgroup.c:2074:25: note: in expansion of macro 'ARRAY_SIZE'
    2074 |         for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)
         |                         ^~~~~~~~~~
   kernel/cgroup/cgroup.c:2075:40: error: 'struct cgroup_bpf' has no member named 'revisions'
    2075 |                 atomic64_set(&cgrp->bpf.revisions[i], 1);
         |                                        ^


vim +2074 kernel/cgroup/cgroup.c

  2052	
  2053	static void init_cgroup_housekeeping(struct cgroup *cgrp)
  2054	{
  2055		struct cgroup_subsys *ss;
  2056		int i, ssid;
  2057	
  2058		INIT_LIST_HEAD(&cgrp->self.sibling);
  2059		INIT_LIST_HEAD(&cgrp->self.children);
  2060		INIT_LIST_HEAD(&cgrp->cset_links);
  2061		INIT_LIST_HEAD(&cgrp->pidlists);
  2062		mutex_init(&cgrp->pidlist_mutex);
  2063		cgrp->self.cgroup = cgrp;
  2064		cgrp->self.flags |= CSS_ONLINE;
  2065		cgrp->dom_cgrp = cgrp;
  2066		cgrp->max_descendants = INT_MAX;
  2067		cgrp->max_depth = INT_MAX;
  2068		INIT_LIST_HEAD(&cgrp->rstat_css_list);
  2069		prev_cputime_init(&cgrp->prev_cputime);
  2070	
  2071		for_each_subsys(ss, ssid)
  2072			INIT_LIST_HEAD(&cgrp->e_csets[ssid]);
  2073	
> 2074		for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)
  2075			atomic64_set(&cgrp->bpf.revisions[i], 1);
  2076	
  2077		init_waitqueue_head(&cgrp->offline_waitq);
  2078		INIT_WORK(&cgrp->release_agent_work, cgroup1_release_agent);
  2079	}
  2080	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf
  2025-04-12  0:41   ` kernel test robot
@ 2025-05-08  4:18     ` Yonghong Song
  0 siblings, 0 replies; 13+ messages in thread
From: Yonghong Song @ 2025-05-08  4:18 UTC (permalink / raw)
  To: kernel test robot; +Cc: oe-kbuild-all



On 4/11/25 8:41 AM, kernel test robot wrote:
> Hi Yonghong,
>
> [This is a private test report for your RFC patch.]
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on bpf-next/master]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Yonghong-Song/cgroup-Add-bpf-prog-revisions-to-struct-cgroup_bpf/20250411-091743
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> patch link:    https://lore.kernel.org/r/20250411011528.1839359-1-yonghong.song%40linux.dev
> patch subject: [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf
> config: x86_64-buildonly-randconfig-001-20250412 (https://download.01.org/0day-ci/archive/20250412/202504120853.I4bFTHhG-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250412/202504120853.I4bFTHhG-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202504120853.I4bFTHhG-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
>     In file included from include/linux/kernel.h:16,
>                      from include/linux/cpumask.h:11,
>                      from arch/x86/include/asm/tlbbatch.h:5,
>                      from include/linux/mm_types_task.h:17,
>                      from include/linux/sched.h:38,
>                      from include/linux/cgroup.h:12,
>                      from kernel/cgroup/cgroup-internal.h:5,
>                      from kernel/cgroup/cgroup.c:31:
>     kernel/cgroup/cgroup.c: In function 'init_cgroup_housekeeping':
>>> kernel/cgroup/cgroup.c:2074:45: error: 'struct cgroup_bpf' has no member named 'revisions'
>      2074 |         for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)

This needs a CONFIG_CGROUP_BPF guard. Will fix in the next revision.

>           |                                             ^
>     include/linux/array_size.h:11:33: note: in definition of macro 'ARRAY_SIZE'
>        11 | #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
>           |                                 ^~~
>
[...]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf
  2025-04-11  1:15 ` [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf Yonghong Song
  2025-04-12  0:41   ` kernel test robot
@ 2025-04-12  1:13   ` kernel test robot
  2025-04-23 23:19   ` Andrii Nakryiko
  2 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2025-04-12  1:13 UTC (permalink / raw)
  To: Yonghong Song; +Cc: llvm, oe-kbuild-all

Hi Yonghong,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Yonghong-Song/cgroup-Add-bpf-prog-revisions-to-struct-cgroup_bpf/20250411-091743
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20250411011528.1839359-1-yonghong.song%40linux.dev
patch subject: [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf
config: x86_64-buildonly-randconfig-004-20250412 (https://download.01.org/0day-ci/archive/20250412/202504120859.YZRdhT38-lkp@intel.com/config)
compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250412/202504120859.YZRdhT38-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504120859.YZRdhT38-lkp@intel.com/

All errors (new ones prefixed by >>):

>> kernel/cgroup/cgroup.c:2074:39: error: no member named 'revisions' in 'struct cgroup_bpf'
    2074 |         for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)
         |                                    ~~~~~~~~~ ^
   include/linux/array_size.h:11:33: note: expanded from macro 'ARRAY_SIZE'
      11 | #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
         |                                 ^~~
>> kernel/cgroup/cgroup.c:2074:39: error: no member named 'revisions' in 'struct cgroup_bpf'
    2074 |         for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)
         |                                    ~~~~~~~~~ ^
   include/linux/array_size.h:11:48: note: expanded from macro 'ARRAY_SIZE'
      11 | #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
         |                                                ^~~
>> kernel/cgroup/cgroup.c:2074:39: error: no member named 'revisions' in 'struct cgroup_bpf'
    2074 |         for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)
         |                                    ~~~~~~~~~ ^
   include/linux/array_size.h:11:75: note: expanded from macro 'ARRAY_SIZE'
      11 | #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
         |                                                                           ^~~
   include/linux/compiler.h:202:64: note: expanded from macro '__must_be_array'
     202 | #define __must_be_array(a)      __BUILD_BUG_ON_ZERO_MSG(!__is_array(a), \
         |                                                                     ^
   include/linux/compiler.h:201:39: note: expanded from macro '__is_array'
     201 | #define __is_array(a)           (!__same_type((a), &(a)[0]))
         |                                                ^
   include/linux/compiler_types.h:498:63: note: expanded from macro '__same_type'
     498 | #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
         |                                                               ^
   include/linux/compiler.h:197:79: note: expanded from macro '__BUILD_BUG_ON_ZERO_MSG'
     197 | #define __BUILD_BUG_ON_ZERO_MSG(e, msg) ((int)sizeof(struct {_Static_assert(!(e), msg);}))
         |                                                                               ^
   kernel/cgroup/cgroup.c:2075:27: error: no member named 'revisions' in 'struct cgroup_bpf'
    2075 |                 atomic64_set(&cgrp->bpf.revisions[i], 1);
         |                               ~~~~~~~~~ ^
   4 errors generated.


vim +2074 kernel/cgroup/cgroup.c

  2052	
  2053	static void init_cgroup_housekeeping(struct cgroup *cgrp)
  2054	{
  2055		struct cgroup_subsys *ss;
  2056		int i, ssid;
  2057	
  2058		INIT_LIST_HEAD(&cgrp->self.sibling);
  2059		INIT_LIST_HEAD(&cgrp->self.children);
  2060		INIT_LIST_HEAD(&cgrp->cset_links);
  2061		INIT_LIST_HEAD(&cgrp->pidlists);
  2062		mutex_init(&cgrp->pidlist_mutex);
  2063		cgrp->self.cgroup = cgrp;
  2064		cgrp->self.flags |= CSS_ONLINE;
  2065		cgrp->dom_cgrp = cgrp;
  2066		cgrp->max_descendants = INT_MAX;
  2067		cgrp->max_depth = INT_MAX;
  2068		INIT_LIST_HEAD(&cgrp->rstat_css_list);
  2069		prev_cputime_init(&cgrp->prev_cputime);
  2070	
  2071		for_each_subsys(ss, ssid)
  2072			INIT_LIST_HEAD(&cgrp->e_csets[ssid]);
  2073	
> 2074		for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)
  2075			atomic64_set(&cgrp->bpf.revisions[i], 1);
  2076	
  2077		init_waitqueue_head(&cgrp->offline_waitq);
  2078		INIT_WORK(&cgrp->release_agent_work, cgroup1_release_agent);
  2079	}
  2080	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf
  2025-04-11  1:15 ` [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf Yonghong Song
  2025-04-12  0:41   ` kernel test robot
  2025-04-12  1:13   ` kernel test robot
@ 2025-04-23 23:19   ` Andrii Nakryiko
  2025-05-08  4:19     ` Yonghong Song
  2 siblings, 1 reply; 13+ messages in thread
From: Andrii Nakryiko @ 2025-04-23 23:19 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

On Thu, Apr 10, 2025 at 6:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> One of the key items in the mprog API is the revision of a prog list.
> The revision number is increased whenever the prog list changes, e.g.,
> on attach, detach or replace.
>
> Add a 'revisions' field to struct cgroup_bpf, holding one revision per
> cgroup-related attachment type. The initial revision value is set to 1,
> the same as in the kernel mprog implementation.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  include/linux/bpf-cgroup-defs.h | 1 +
>  kernel/cgroup/cgroup.c          | 5 ++++-
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> index 0985221d5478..a3cbbd00731a 100644
> --- a/include/linux/bpf-cgroup-defs.h
> +++ b/include/linux/bpf-cgroup-defs.h
> @@ -62,6 +62,7 @@ struct cgroup_bpf {
>          * when BPF_F_ALLOW_MULTI the list can have up to BPF_CGROUP_MAX_PROGS
>          */
>         struct hlist_head progs[MAX_CGROUP_BPF_ATTACH_TYPE];
> +       atomic64_t revisions[MAX_CGROUP_BPF_ATTACH_TYPE];

For cgroups, all attachment and detachment happens under cgroup_mutex,
so I don't think we need atomic64_t; a plain u64 will work.
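
A minimal userspace sketch of this point, assuming a pthread mutex as a
stand-in for cgroup_mutex (all demo_* names are hypothetical and none of
this is kernel code): when every writer, and here also the reader, runs
with the same lock held, a plain 64-bit counter needs no atomic
operations.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for MAX_CGROUP_BPF_ATTACH_TYPE. */
#define DEMO_MAX_ATTACH_TYPE 4

/* Stand-in for cgroup_mutex: serializes all list modifications. */
static pthread_mutex_t demo_cgroup_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Plain counters, initialized to 1; only touched under the mutex. */
static uint64_t demo_revisions[DEMO_MAX_ATTACH_TYPE] = { 1, 1, 1, 1 };

/* Models attach/detach/replace: modify the list, then bump the revision. */
static void demo_modify_list(int atype)
{
	assert(atype >= 0 && atype < DEMO_MAX_ATTACH_TYPE);
	pthread_mutex_lock(&demo_cgroup_mutex);
	/* ... the prog list for 'atype' would be changed here ... */
	demo_revisions[atype]++;
	pthread_mutex_unlock(&demo_cgroup_mutex);
}

/* Models a query: read the revision under the same mutex. */
static uint64_t demo_query_revision(int atype)
{
	uint64_t rev;

	assert(atype >= 0 && atype < DEMO_MAX_ATTACH_TYPE);
	pthread_mutex_lock(&demo_cgroup_mutex);
	rev = demo_revisions[atype];
	pthread_mutex_unlock(&demo_cgroup_mutex);
	return rev;
}
```

The sketch only holds if no path updates or reads the counter outside
the lock; a lockless reader would again need an atomic or READ_ONCE-style
access.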

>         u8 flags[MAX_CGROUP_BPF_ATTACH_TYPE];
>
>         /* list of cgroup shared storages */
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index ac2db99941ca..dea7d12c8927 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -2053,7 +2053,7 @@ static int cgroup_reconfigure(struct fs_context *fc)
>  static void init_cgroup_housekeeping(struct cgroup *cgrp)
>  {
>         struct cgroup_subsys *ss;
> -       int ssid;
> +       int i, ssid;
>
>         INIT_LIST_HEAD(&cgrp->self.sibling);
>         INIT_LIST_HEAD(&cgrp->self.children);
> @@ -2071,6 +2071,9 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
>         for_each_subsys(ss, ssid)
>                 INIT_LIST_HEAD(&cgrp->e_csets[ssid]);
>
> +       for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)
> +               atomic64_set(&cgrp->bpf.revisions[i], 1);
> +
>         init_waitqueue_head(&cgrp->offline_waitq);
>         INIT_WORK(&cgrp->release_agent_work, cgroup1_release_agent);
>  }
> --
> 2.47.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf
  2025-04-23 23:19   ` Andrii Nakryiko
@ 2025-05-08  4:19     ` Yonghong Song
  0 siblings, 0 replies; 13+ messages in thread
From: Yonghong Song @ 2025-05-08  4:19 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau



On 4/23/25 7:19 AM, Andrii Nakryiko wrote:
> On Thu, Apr 10, 2025 at 6:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> One of the key items in the mprog API is the revision of a prog list.
>> The revision number is increased whenever the prog list changes, e.g.,
>> on attach, detach or replace.
>>
>> Add a 'revisions' field to struct cgroup_bpf, holding one revision per
>> cgroup-related attachment type. The initial revision value is set to 1,
>> the same as in the kernel mprog implementation.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>   include/linux/bpf-cgroup-defs.h | 1 +
>>   kernel/cgroup/cgroup.c          | 5 ++++-
>>   2 files changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
>> index 0985221d5478..a3cbbd00731a 100644
>> --- a/include/linux/bpf-cgroup-defs.h
>> +++ b/include/linux/bpf-cgroup-defs.h
>> @@ -62,6 +62,7 @@ struct cgroup_bpf {
>>           * when BPF_F_ALLOW_MULTI the list can have up to BPF_CGROUP_MAX_PROGS
>>           */
>>          struct hlist_head progs[MAX_CGROUP_BPF_ATTACH_TYPE];
>> +       atomic64_t revisions[MAX_CGROUP_BPF_ATTACH_TYPE];
> For cgroups, all attachment and detachment happens under cgroup_mutex,
> so I don't think we need atomic64_t; a plain u64 will work.

Indeed. Makes sense.

>
>>          u8 flags[MAX_CGROUP_BPF_ATTACH_TYPE];
>>
>>          /* list of cgroup shared storages */
>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>> index ac2db99941ca..dea7d12c8927 100644
>> --- a/kernel/cgroup/cgroup.c
>> +++ b/kernel/cgroup/cgroup.c
>> @@ -2053,7 +2053,7 @@ static int cgroup_reconfigure(struct fs_context *fc)
>>   static void init_cgroup_housekeeping(struct cgroup *cgrp)
>>   {
>>          struct cgroup_subsys *ss;
>> -       int ssid;
>> +       int i, ssid;
>>
>>          INIT_LIST_HEAD(&cgrp->self.sibling);
>>          INIT_LIST_HEAD(&cgrp->self.children);
>> @@ -2071,6 +2071,9 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
>>          for_each_subsys(ss, ssid)
>>                  INIT_LIST_HEAD(&cgrp->e_csets[ssid]);
>>
>> +       for (i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++)
>> +               atomic64_set(&cgrp->bpf.revisions[i], 1);
>> +
>>          init_waitqueue_head(&cgrp->offline_waitq);
>>          INIT_WORK(&cgrp->release_agent_work, cgroup1_release_agent);
>>   }
>> --
>> 2.47.1
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH bpf-next 2/4] bpf: Implement mprog API on top of existing cgroup progs
  2025-04-11  1:15 [RFC PATCH bpf-next 0/4] bpf: Implement mprog API on top of existing cgroup progs Yonghong Song
  2025-04-11  1:15 ` [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf Yonghong Song
@ 2025-04-11  1:15 ` Yonghong Song
  2025-04-23 23:19   ` Andrii Nakryiko
  2025-04-11  1:15 ` [RFC PATCH bpf-next 3/4] libbpf: Support link-based cgroup attach with options Yonghong Song
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Yonghong Song @ 2025-04-11  1:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau

Currently, cgroup progs are ordered by attachment time: each newly
attached prog is appended to the list. This is not ideal. In some cases,
users want a specific ordering at a particular cgroup level. To address
this, the existing mprog API, with its support for the BPF_F_BEFORE and
BPF_F_AFTER flags, seems an ideal solution.

But there are a few obstacles to using the kernel mprog interface
directly. Cgroup bpf progs already support prog attach/detach/replace
and link-based attach/detach/replace. For example, in struct
bpf_prog_array_item, the cgroup_storage field needs to stay together
with the bpf prog, but the mprog API struct bpf_mprog_fp only has
bpf_prog as a member, which makes it difficult to use the kernel mprog
interface.

As another example, the current cgroup prog detach tries to use the same
flags as in attach. This differs from the kernel mprog interface, which
uses flags passed from user space.

So, to avoid modifying existing behavior, I made the following changes
to support the mprog API for cgroup progs:
 - The support is for the prog list at a single cgroup level. The
   cross-level prog list (a.k.a. the effective prog list) is not
   supported.
 - Previously, BPF_F_PREORDER was supported only for prog attach; now
   it is also supported by link-based attach.
 - For attach, BPF_F_BEFORE/BPF_F_AFTER/BPF_F_ID are supported, similar
   to kernel mprog but with a different implementation.
 - For detach and replace, the existing implementation is used.
 - For attach, detach and replace, the revision of the affected prog
   list, associated with a particular attach type, is increased by 1.
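
The attach rules listed above can be sketched in userspace as follows.
This is an illustration only, not the patch's implementation: it uses a
small array instead of an hlist, identifies progs by a plain ID, and
ignores BPF_F_PREORDER and the fd/ID distinction. The demo_* names and
DEMO_F_* flag values are hypothetical and do not match the uapi values.
Appending at the tail when no relative prog is given and BPF_F_BEFORE is
not set is an assumption based on the existing append-at-attach behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the real uapi flag values. */
#define DEMO_F_BEFORE	(1U << 0)
#define DEMO_F_AFTER	(1U << 1)
#define DEMO_MAX_PROGS	8

struct demo_prog_list {
	uint32_t ids[DEMO_MAX_PROGS];	/* prog IDs in execution order */
	int cnt;
	uint64_t revision;		/* bumped on every successful change */
};

static void demo_list_init(struct demo_prog_list *l)
{
	memset(l, 0, sizeof(*l));
	l->revision = 1;
}

/* relative_id == 0 means "no relative prog given". */
static int demo_attach(struct demo_prog_list *l, uint32_t id,
		       uint32_t flags, uint32_t relative_id)
{
	int pos, i;

	assert(l->cnt >= 0 && l->cnt <= DEMO_MAX_PROGS);

	/* BEFORE and AFTER are mutually exclusive. */
	if ((flags & DEMO_F_BEFORE) && (flags & DEMO_F_AFTER))
		return -EINVAL;
	/* A relative prog only makes sense with BEFORE or AFTER. */
	if (relative_id && !(flags & (DEMO_F_BEFORE | DEMO_F_AFTER)))
		return -EINVAL;
	if (l->cnt >= DEMO_MAX_PROGS)
		return -E2BIG;

	if (relative_id) {
		for (i = 0; i < l->cnt; i++)
			if (l->ids[i] == relative_id)
				break;
		if (i == l->cnt)
			return -ENOENT;	/* relative prog is not in this list */
		pos = (flags & DEMO_F_BEFORE) ? i : i + 1;
	} else {
		/* No relative prog: BEFORE means head, otherwise tail. */
		pos = (flags & DEMO_F_BEFORE) ? 0 : l->cnt;
	}

	/* Shift the tail right by one slot and insert at 'pos'. */
	memmove(&l->ids[pos + 1], &l->ids[pos],
		(size_t)(l->cnt - pos) * sizeof(l->ids[0]));
	l->ids[pos] = id;
	l->cnt++;
	l->revision++;
	return 0;
}
```

Failed attaches leave both the list and the revision untouched, so a
revision change always corresponds to an actual change in ordering or
membership.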

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 include/uapi/linux/bpf.h       |   7 ++
 kernel/bpf/cgroup.c            | 151 ++++++++++++++++++++++++++++-----
 kernel/bpf/syscall.c           |  58 ++++++++-----
 tools/include/uapi/linux/bpf.h |   7 ++
 4 files changed, 181 insertions(+), 42 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 71d5ac83cf5d..a5c7992e8f7c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1794,6 +1794,13 @@ union bpf_attr {
 				};
 				__u64		expected_revision;
 			} netkit;
+			struct {
+				union {
+					__u32	relative_fd;
+					__u32	relative_id;
+				};
+				__u64		expected_revision;
+			} cgroup;
 		};
 	} link_create;
 
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 84f58f3d028a..ffd455051131 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -624,6 +624,90 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
 	return NULL;
 }
 
+static struct bpf_prog *get_cmp_prog(struct hlist_head *progs, struct bpf_prog *prog,
+				     u32 flags, u32 id_or_fd, struct bpf_prog_list **ppltmp)
+{
+	struct bpf_prog *cmp_prog = NULL, *pltmp_prog;
+	bool preorder = !!(flags & BPF_F_PREORDER);
+	struct bpf_prog_list *pltmp;
+	bool id = flags & BPF_F_ID;
+	bool found;
+
+	if (id || id_or_fd) {
+		/* flags must have BPF_F_BEFORE or BPF_F_AFTER */
+		if (!(flags & (BPF_F_BEFORE | BPF_F_AFTER)))
+			return ERR_PTR(-EINVAL);
+
+		if (id)
+			cmp_prog = bpf_prog_by_id(id_or_fd);
+		else
+			cmp_prog = bpf_prog_get(id_or_fd);
+		if (IS_ERR(cmp_prog))
+			return cmp_prog;
+		if (cmp_prog->type != prog->type)
+			return ERR_PTR(-EINVAL);
+
+		found = false;
+		hlist_for_each_entry(pltmp, progs, node) {
+			pltmp_prog = pltmp->link ? pltmp->link->link.prog : pltmp->prog;
+			if (pltmp_prog == cmp_prog) {
+				if (!!(pltmp->flags & BPF_F_PREORDER) != preorder)
+					return ERR_PTR(-EINVAL);
+				found = true;
+				*ppltmp = pltmp;
+				break;
+			}
+		}
+		if (!found)
+			return ERR_PTR(-ENOENT);
+	}
+
+	return cmp_prog;
+}
+
+static int insert_pl_to_hlist(struct bpf_prog_list *pl, struct hlist_head *progs,
+			      struct bpf_prog *prog, u32 flags, u32 id_or_fd)
+{
+	struct hlist_node *last, *last_node = NULL;
+	struct bpf_prog_list *pltmp = NULL;
+	struct bpf_prog *cmp_prog;
+
+	/* flags cannot have both BPF_F_BEFORE and BPF_F_AFTER */
+	if ((flags & BPF_F_BEFORE) && (flags & BPF_F_AFTER))
+		return -EINVAL;
+
+	cmp_prog = get_cmp_prog(progs, prog, flags, id_or_fd, &pltmp);
+	if (IS_ERR(cmp_prog))
+		return PTR_ERR(cmp_prog);
+
+	if (hlist_empty(progs)) {
+		hlist_add_head(&pl->node, progs);
+	} else {
+		hlist_for_each(last, progs) {
+			if (last->next)
+				continue;
+			last_node = last;
+			break;
+		}
+
+		if (!cmp_prog) {
+			if (flags & BPF_F_BEFORE)
+				hlist_add_head(&pl->node, progs);
+			else
+				hlist_add_behind(&pl->node, last_node);
+		} else {
+			if (flags & BPF_F_BEFORE)
+				hlist_add_before(&pl->node, &pltmp->node);
+			else if (flags & BPF_F_AFTER)
+				hlist_add_behind(&pl->node, &pltmp->node);
+			else
+				hlist_add_behind(&pl->node, last_node);
+		}
+	}
+
+	return 0;
+}
+
 /**
  * __cgroup_bpf_attach() - Attach the program or the link to a cgroup, and
  *                         propagate the change to descendants
@@ -633,6 +717,8 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
  * @replace_prog: Previously attached program to replace if BPF_F_REPLACE is set
  * @type: Type of attach operation
  * @flags: Option flags
+ * @id_or_fd: Relative prog id or fd
+ * @revision: bpf_prog_list revision
  *
  * Exactly one of @prog or @link can be non-null.
  * Must be called with cgroup_mutex held.
@@ -640,7 +726,8 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
 static int __cgroup_bpf_attach(struct cgroup *cgrp,
 			       struct bpf_prog *prog, struct bpf_prog *replace_prog,
 			       struct bpf_cgroup_link *link,
-			       enum bpf_attach_type type, u32 flags)
+			       enum bpf_attach_type type, u32 flags, u32 id_or_fd,
+			       u64 revision)
 {
 	u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI));
 	struct bpf_prog *old_prog = NULL;
@@ -656,6 +743,9 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 	    ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI)))
 		/* invalid combination */
 		return -EINVAL;
+	if ((flags & BPF_F_REPLACE) && (flags & (BPF_F_BEFORE | BPF_F_AFTER)))
+		/* only either replace or insertion with before/after */
+		return -EINVAL;
 	if (link && (prog || replace_prog))
 		/* only either link or prog/replace_prog can be specified */
 		return -EINVAL;
@@ -663,9 +753,12 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 		/* replace_prog implies BPF_F_REPLACE, and vice versa */
 		return -EINVAL;
 
+
 	atype = bpf_cgroup_atype_find(type, new_prog->aux->attach_btf_id);
 	if (atype < 0)
 		return -EINVAL;
+	if (revision && revision != atomic64_read(&cgrp->bpf.revisions[atype]))
+		return -ESTALE;
 
 	progs = &cgrp->bpf.progs[atype];
 
@@ -694,22 +787,18 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 	if (pl) {
 		old_prog = pl->prog;
 	} else {
-		struct hlist_node *last = NULL;
-
 		pl = kmalloc(sizeof(*pl), GFP_KERNEL);
 		if (!pl) {
 			bpf_cgroup_storages_free(new_storage);
 			return -ENOMEM;
 		}
-		if (hlist_empty(progs))
-			hlist_add_head(&pl->node, progs);
-		else
-			hlist_for_each(last, progs) {
-				if (last->next)
-					continue;
-				hlist_add_behind(&pl->node, last);
-				break;
-			}
+
+		err = insert_pl_to_hlist(pl, progs, prog ? : link->link.prog, flags, id_or_fd);
+		if (err) {
+			kfree(pl);
+			bpf_cgroup_storages_free(new_storage);
+			return err;
+		}
 	}
 
 	pl->prog = prog;
@@ -728,6 +817,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
 	if (err)
 		goto cleanup_trampoline;
 
+	atomic64_inc(&cgrp->bpf.revisions[atype]);
 	if (old_prog) {
 		if (type == BPF_LSM_CGROUP)
 			bpf_trampoline_unlink_cgroup_shim(old_prog);
@@ -759,12 +849,13 @@ static int cgroup_bpf_attach(struct cgroup *cgrp,
 			     struct bpf_prog *prog, struct bpf_prog *replace_prog,
 			     struct bpf_cgroup_link *link,
 			     enum bpf_attach_type type,
-			     u32 flags)
+			     u32 flags, u32 id_or_fd, u64 revision)
 {
 	int ret;
 
 	cgroup_lock();
-	ret = __cgroup_bpf_attach(cgrp, prog, replace_prog, link, type, flags);
+	ret = __cgroup_bpf_attach(cgrp, prog, replace_prog, link, type, flags,
+				  id_or_fd, revision);
 	cgroup_unlock();
 	return ret;
 }
@@ -852,6 +943,7 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
 	if (!found)
 		return -ENOENT;
 
+	atomic64_inc(&cgrp->bpf.revisions[atype]);
 	old_prog = xchg(&link->link.prog, new_prog);
 	replace_effective_prog(cgrp, atype, link);
 	bpf_prog_put(old_prog);
@@ -977,12 +1069,14 @@ static void purge_effective_progs(struct cgroup *cgrp, struct bpf_prog *prog,
  * @prog: A program to detach or NULL
  * @link: A link to detach or NULL
  * @type: Type of detach operation
+ * @revision: bpf_prog_list revision
  *
  * At most one of @prog or @link can be non-NULL.
  * Must be called with cgroup_mutex held.
  */
 static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
-			       struct bpf_cgroup_link *link, enum bpf_attach_type type)
+			       struct bpf_cgroup_link *link, enum bpf_attach_type type,
+			       u64 revision)
 {
 	enum cgroup_bpf_attach_type atype;
 	struct bpf_prog *old_prog;
@@ -1000,6 +1094,9 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 	if (atype < 0)
 		return -EINVAL;
 
+	if (revision && revision != atomic64_read(&cgrp->bpf.revisions[atype]))
+		return -ESTALE;
+
 	progs = &cgrp->bpf.progs[atype];
 	flags = cgrp->bpf.flags[atype];
 
@@ -1025,6 +1122,7 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 
 	/* now can actually delete it from this cgroup list */
 	hlist_del(&pl->node);
+	atomic64_inc(&cgrp->bpf.revisions[atype]);
 
 	kfree(pl);
 	if (hlist_empty(progs))
@@ -1040,12 +1138,12 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 }
 
 static int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
-			     enum bpf_attach_type type)
+			     enum bpf_attach_type type, u64 revision)
 {
 	int ret;
 
 	cgroup_lock();
-	ret = __cgroup_bpf_detach(cgrp, prog, NULL, type);
+	ret = __cgroup_bpf_detach(cgrp, prog, NULL, type, revision);
 	cgroup_unlock();
 	return ret;
 }
@@ -1063,6 +1161,7 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 	struct bpf_prog_array *effective;
 	int cnt, ret = 0, i;
 	int total_cnt = 0;
+	u64 revision = 0;
 	u32 flags;
 
 	if (effective_query && prog_attach_flags)
@@ -1100,6 +1199,10 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		return -EFAULT;
 	if (copy_to_user(&uattr->query.prog_cnt, &total_cnt, sizeof(total_cnt)))
 		return -EFAULT;
+	if (!effective_query && from_atype == to_atype)
+		revision = atomic64_read(&cgrp->bpf.revisions[from_atype]);
+	if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
+		return -EFAULT;
 	if (attr->query.prog_cnt == 0 || !prog_ids || !total_cnt)
 		/* return early if user requested only program count + flags */
 		return 0;
@@ -1182,7 +1285,8 @@ int cgroup_bpf_prog_attach(const union bpf_attr *attr,
 	}
 
 	ret = cgroup_bpf_attach(cgrp, prog, replace_prog, NULL,
-				attr->attach_type, attr->attach_flags);
+				attr->attach_type, attr->attach_flags,
+				attr->relative_fd, attr->expected_revision);
 
 	if (replace_prog)
 		bpf_prog_put(replace_prog);
@@ -1204,7 +1308,7 @@ int cgroup_bpf_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
 	if (IS_ERR(prog))
 		prog = NULL;
 
-	ret = cgroup_bpf_detach(cgrp, prog, attr->attach_type);
+	ret = cgroup_bpf_detach(cgrp, prog, attr->attach_type, attr->expected_revision);
 	if (prog)
 		bpf_prog_put(prog);
 
@@ -1233,7 +1337,7 @@ static void bpf_cgroup_link_release(struct bpf_link *link)
 	}
 
 	WARN_ON(__cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link,
-				    cg_link->type));
+				    cg_link->type, 0));
 	if (cg_link->type == BPF_LSM_CGROUP)
 		bpf_trampoline_unlink_cgroup_shim(cg_link->link.prog);
 
@@ -1312,7 +1416,8 @@ int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	struct cgroup *cgrp;
 	int err;
 
-	if (attr->link_create.flags)
+	if (attr->link_create.flags &&
+	    (attr->link_create.flags & (~(BPF_F_ID | BPF_F_BEFORE | BPF_F_AFTER | BPF_F_PREORDER))))
 		return -EINVAL;
 
 	cgrp = cgroup_get_from_fd(attr->link_create.target_fd);
@@ -1336,7 +1441,9 @@ int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	}
 
 	err = cgroup_bpf_attach(cgrp, NULL, NULL, link,
-				link->type, BPF_F_ALLOW_MULTI);
+				link->type, BPF_F_ALLOW_MULTI | attr->link_create.flags,
+				attr->link_create.cgroup.relative_fd,
+				attr->link_create.cgroup.expected_revision);
 	if (err) {
 		bpf_link_cleanup(&link_primer);
 		goto out_put_cgroup;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9794446bc8c6..48cf855f949f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4183,6 +4183,23 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 	}
 }
 
+static bool bpf_cgroup_prog_attached(enum bpf_prog_type ptype)
+{
+	switch (ptype) {
+	case BPF_PROG_TYPE_CGROUP_DEVICE:
+	case BPF_PROG_TYPE_CGROUP_SKB:
+	case BPF_PROG_TYPE_CGROUP_SOCK:
+	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
+	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
+	case BPF_PROG_TYPE_CGROUP_SYSCTL:
+	case BPF_PROG_TYPE_SOCK_OPS:
+	case BPF_PROG_TYPE_LSM:
+		return true;
+	default:
+		return false;
+	}
+}
+
 #define BPF_PROG_ATTACH_LAST_FIELD expected_revision
 
 #define BPF_F_ATTACH_MASK_BASE	\
@@ -4214,11 +4231,15 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 		if (attr->attach_flags & ~BPF_F_ATTACH_MASK_MPROG)
 			return -EINVAL;
 	} else {
-		if (attr->attach_flags & ~BPF_F_ATTACH_MASK_BASE)
-			return -EINVAL;
-		if (attr->relative_fd ||
-		    attr->expected_revision)
-			return -EINVAL;
+		if (bpf_cgroup_prog_attached(ptype)) {
+			if (attr->attach_flags & BPF_F_LINK)
+				return -EINVAL;
+		} else {
+			if (attr->attach_flags & ~BPF_F_ATTACH_MASK_BASE)
+				return -EINVAL;
+			if (attr->relative_fd || attr->expected_revision)
+				return -EINVAL;
+		}
 	}
 
 	prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
@@ -4241,20 +4262,6 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 		ret = netns_bpf_prog_attach(attr, prog);
 		break;
-	case BPF_PROG_TYPE_CGROUP_DEVICE:
-	case BPF_PROG_TYPE_CGROUP_SKB:
-	case BPF_PROG_TYPE_CGROUP_SOCK:
-	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
-	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
-	case BPF_PROG_TYPE_CGROUP_SYSCTL:
-	case BPF_PROG_TYPE_SOCK_OPS:
-	case BPF_PROG_TYPE_LSM:
-		if (ptype == BPF_PROG_TYPE_LSM &&
-		    prog->expected_attach_type != BPF_LSM_CGROUP)
-			ret = -EINVAL;
-		else
-			ret = cgroup_bpf_prog_attach(attr, ptype, prog);
-		break;
 	case BPF_PROG_TYPE_SCHED_CLS:
 		if (attr->attach_type == BPF_TCX_INGRESS ||
 		    attr->attach_type == BPF_TCX_EGRESS)
@@ -4263,7 +4270,15 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 			ret = netkit_prog_attach(attr, prog);
 		break;
 	default:
-		ret = -EINVAL;
+		if (!bpf_cgroup_prog_attached(ptype)) {
+			ret = -EINVAL;
+		} else {
+			if (ptype == BPF_PROG_TYPE_LSM &&
+			    prog->expected_attach_type != BPF_LSM_CGROUP)
+				ret = -EINVAL;
+			else
+				ret = cgroup_bpf_prog_attach(attr, ptype, prog);
+		}
 	}
 
 	if (ret)
@@ -4293,6 +4308,9 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 			if (IS_ERR(prog))
 				return PTR_ERR(prog);
 		}
+	} else if (bpf_cgroup_prog_attached(ptype)) {
+		if (attr->attach_flags || attr->relative_fd)
+			return -EINVAL;
 	} else if (attr->attach_flags ||
 		   attr->relative_fd ||
 		   attr->expected_revision) {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 71d5ac83cf5d..a5c7992e8f7c 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1794,6 +1794,13 @@ union bpf_attr {
 				};
 				__u64		expected_revision;
 			} netkit;
+			struct {
+				union {
+					__u32	relative_fd;
+					__u32	relative_id;
+				};
+				__u64		expected_revision;
+			} cgroup;
 		};
 	} link_create;
 
-- 
2.47.1



* Re: [RFC PATCH bpf-next 2/4] bpf: Implement mprog API on top of existing cgroup progs
  2025-04-11  1:15 ` [RFC PATCH bpf-next 2/4] bpf: Implement mprog API on top of existing cgroup progs Yonghong Song
@ 2025-04-23 23:19   ` Andrii Nakryiko
  2025-05-08  4:42     ` Yonghong Song
  0 siblings, 1 reply; 13+ messages in thread
From: Andrii Nakryiko @ 2025-04-23 23:19 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

On Thu, Apr 10, 2025 at 6:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> Current cgroup prog ordering is appending at attachment time. This is not
> ideal. In some cases, users want specific ordering at a particular cgroup
> level. To address this, the existing mprog API seems an ideal solution with
> supporting BPF_F_BEFORE and BPF_F_AFTER flags.
>
> But there are a few obstacles to directly use kernel mprog interface.
> Currently cgroup bpf progs already support prog attach/detach/replace
> and link-based attach/detach/replace. For example, in struct
> bpf_prog_array_item, the cgroup_storage field needs to be together
> with bpf prog. But the mprog API struct bpf_mprog_fp only has bpf_prog
> as the member, which makes it difficult to use kernel mprog interface.
>
> In another case, the current cgroup prog detach tries to use the
> same flag as in attach. This is different from mprog kernel interface
> which uses flags passed from user space.
>
> So to avoid modifying existing behavior, I made the following changes to
> support mprog API for cgroup progs:
>  - The support is for prog list at cgroup level. Cross-level prog list
>    (a.k.a. effective prog list) is not supported.
>  - Previously, BPF_F_PREORDER is supported only for prog attach, now
>    BPF_F_PREORDER is also supported by link-based attach.
>  - For attach, BPF_F_BEFORE/BPF_F_AFTER/BPF_F_ID is supported similar to
>    kernel mprog but with different implementation.
>  - For detach and replace, use the existing implementation.
>  - For attach, detach and replace, the revision for a particular prog
>    list, associated with a particular attach type, will be updated
>    by increasing count by 1.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  include/uapi/linux/bpf.h       |   7 ++
>  kernel/bpf/cgroup.c            | 151 ++++++++++++++++++++++++++++-----
>  kernel/bpf/syscall.c           |  58 ++++++++-----
>  tools/include/uapi/linux/bpf.h |   7 ++
>  4 files changed, 181 insertions(+), 42 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 71d5ac83cf5d..a5c7992e8f7c 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1794,6 +1794,13 @@ union bpf_attr {
>                                 };
>                                 __u64           expected_revision;
>                         } netkit;
> +                       struct {
> +                               union {
> +                                       __u32   relative_fd;
> +                                       __u32   relative_id;
> +                               };
> +                               __u64           expected_revision;
> +                       } cgroup;
>                 };
>         } link_create;
>
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 84f58f3d028a..ffd455051131 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -624,6 +624,90 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
>         return NULL;
>  }
>
> +static struct bpf_prog *get_cmp_prog(struct hlist_head *progs, struct bpf_prog *prog,
> +                                    u32 flags, u32 id_or_fd, struct bpf_prog_list **ppltmp)
> +{
> +       struct bpf_prog *cmp_prog = NULL, *pltmp_prog;
> +       bool preorder = !!(flags & BPF_F_PREORDER);

nit: the !!() pattern is not necessary when assigning to bool and is just
visual and cognitive noise

> +       struct bpf_prog_list *pltmp;
> +       bool id = flags & BPF_F_ID;
> +       bool found;
> +
> +       if (id || id_or_fd) {

let's invert the condition and exit early? also I find "id" as a bool
very confusing, I think it's fine to just open-code it in the two places
where you actually check for the BPF_F_ID flag.

But also, isn't this `if (id)` part redundant? Would just `if
(id_or_fd)` be enough?


> +               /* flags must have BPF_F_BEFORE or BPF_F_AFTER */
> +               if (!(flags & (BPF_F_BEFORE | BPF_F_AFTER)))
> +                       return ERR_PTR(-EINVAL);
> +
> +               if (id)
> +                       cmp_prog = bpf_prog_by_id(id_or_fd);
> +               else
> +                       cmp_prog = bpf_prog_get(id_or_fd);

how about we use "anchor" terminology here? So this would be "anchor
program" or anchor_prog?

> +               if (IS_ERR(cmp_prog))
> +                       return cmp_prog;
> +               if (cmp_prog->type != prog->type)

bpf_prog_put?
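
The reference-dropping concern can be illustrated with a small userspace
model. This is only a sketch, not kernel code: struct sketch_prog,
sketch_prog_get(), sketch_prog_put() and sketch_get_anchor() are
hypothetical stand-ins for struct bpf_prog, bpf_prog_get(), bpf_prog_put()
and the lookup helper. It assumes the caller owns one reference on success
and that every error path must drop the reference taken at the top.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_prog: only a type and a refcount. */
struct sketch_prog {
	int type;
	int refcnt;
};

/* Stand-ins for bpf_prog_get()/bpf_prog_put(): take and drop one reference. */
static struct sketch_prog *sketch_prog_get(struct sketch_prog *p)
{
	p->refcnt++;
	return p;
}

static void sketch_prog_put(struct sketch_prog *p)
{
	assert(p->refcnt > 0);
	p->refcnt--;
}

/*
 * Look up an anchor among the n entries of attached[]. On success, return 0
 * and hand one reference to the caller through *out. On failure, return a
 * negative errno and leave the refcount as it was on entry.
 */
static int sketch_get_anchor(struct sketch_prog *candidate, int want_type,
			     struct sketch_prog **attached, size_t n,
			     struct sketch_prog **out)
{
	struct sketch_prog *anchor = sketch_prog_get(candidate);
	int err = -ENOENT;
	size_t i;

	if (anchor->type != want_type) {
		err = -EINVAL;
		goto put;
	}
	for (i = 0; i < n; i++) {
		if (attached[i] == anchor) {
			*out = anchor;
			return 0;
		}
	}
put:
	/* Single exit label: both the type mismatch and the miss land here. */
	sketch_prog_put(anchor);
	return err;
}
```

Funnelling both failures through one label keeps the get/put pairing easy
to audit when further checks are added later.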

pw-bot: cr

> +                       return ERR_PTR(-EINVAL);
> +
> +               found = false;
> +               hlist_for_each_entry(pltmp, progs, node) {
> +                       pltmp_prog = pltmp->link ? pltmp->link->link.prog : pltmp->prog;
> +                       if (pltmp_prog == cmp_prog) {

try keeping nesting minimal:

if (pltmp_prog != cmp_prog)
    continue;

> +                               if (!!(pltmp->flags & BPF_F_PREORDER) != preorder)
> +                                       return ERR_PTR(-EINVAL);
> +                               found = true;
> +                               *ppltmp = pltmp;

we don't need found flag if we set ppltmp to NULL before loop, and to
non-NULL if we find the match
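
A minimal sketch of that shape, combined with the early "continue" from
the previous comment. It is a userspace model under stated assumptions:
struct sketch_pl and sketch_find_anchor() are hypothetical names, a plain
singly linked list stands in for the hlist, and programs are compared by
an integer id instead of by pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_prog_list: one prog id per node. */
struct sketch_pl {
	int prog_id;
	struct sketch_pl *next;
};

/*
 * Search the list for anchor_id. *ppl is NULL on a miss and points at the
 * matching node on a hit, so no separate "found" variable is needed.
 */
static int sketch_find_anchor(struct sketch_pl *head, int anchor_id,
			      struct sketch_pl **ppl)
{
	struct sketch_pl *pl;

	*ppl = NULL;	/* NULL doubles as "not found" */
	for (pl = head; pl; pl = pl->next) {
		if (pl->prog_id != anchor_id)
			continue;	/* early continue keeps nesting flat */
		*ppl = pl;
		break;
	}
	return *ppl ? 0 : -ENOENT;
}
```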

> +                               break;
> +                       }
> +               }
> +               if (!found)

bpf_prog_put(cmp_prog)

> +                       return ERR_PTR(-ENOENT);
> +       }
> +
> +       return cmp_prog;
> +}
> +
> +static int insert_pl_to_hlist(struct bpf_prog_list *pl, struct hlist_head *progs,
> +                             struct bpf_prog *prog, u32 flags, u32 id_or_fd)
> +{
> +       struct hlist_node *last, *last_node = NULL;
> +       struct bpf_prog_list *pltmp = NULL;
> +       struct bpf_prog *cmp_prog;
> +
> +       /* flags cannot have both BPF_F_BEFORE and BPF_F_AFTER */
> +       if ((flags & BPF_F_BEFORE) && (flags & BPF_F_AFTER))
> +               return -EINVAL;
> +
> +       cmp_prog = get_cmp_prog(progs, prog, flags, id_or_fd, &pltmp);

why can't get_cmp_prog return this last_node if we have BPF_F_AFTER
with no id/fd specified? Then you wouldn't have to special-case
appending (same for prepending, actually)?
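
One way this could look, sketched as a userspace model rather than as the
actual patch. Assumptions: SK_BEFORE and SK_AFTER are hypothetical
stand-ins for BPF_F_BEFORE and BPF_F_AFTER, a doubly linked list with a
tail pointer replaces the kernel hlist for brevity, and flag validation
(for example rejecting an explicit anchor without BEFORE/AFTER) has
already happened in the caller.

```c
#include <assert.h>
#include <stddef.h>

#define SK_BEFORE 0x1u	/* hypothetical stand-in for BPF_F_BEFORE */
#define SK_AFTER  0x2u	/* hypothetical stand-in for BPF_F_AFTER */

struct sk_node {
	int id;
	struct sk_node *prev, *next;
};

struct sk_list {
	struct sk_node *first, *last;
};

/*
 * Resolve the node to insert next to. With no explicit anchor, BEFORE maps
 * to the first node and everything else to the last node, so prepend and
 * append stop being special cases. Returns NULL only for an empty list.
 */
static struct sk_node *sk_resolve_anchor(struct sk_list *l, unsigned int flags,
					 struct sk_node *explicit_anchor)
{
	if (explicit_anchor)
		return explicit_anchor;
	return (flags & SK_BEFORE) ? l->first : l->last;
}

static void sk_insert(struct sk_list *l, struct sk_node *n, unsigned int flags,
		      struct sk_node *explicit_anchor)
{
	struct sk_node *a = sk_resolve_anchor(l, flags, explicit_anchor);

	n->prev = n->next = NULL;
	if (!a) {				/* empty list */
		l->first = l->last = n;
	} else if (flags & SK_BEFORE) {		/* link n in front of a */
		n->next = a;
		n->prev = a->prev;
		if (a->prev)
			a->prev->next = n;
		else
			l->first = n;
		a->prev = n;
	} else {				/* link n behind a */
		n->prev = a;
		n->next = a->next;
		if (a->next)
			a->next->prev = n;
		else
			l->last = n;
		a->next = n;
	}
}
```

After the anchor is resolved, only three cases remain: empty list, insert
before, insert behind.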

> +       if (IS_ERR(cmp_prog))
> +               return PTR_ERR(cmp_prog);
> +
> +       if (hlist_empty(progs)) {
> +               hlist_add_head(&pl->node, progs);
> +       } else {
> +               hlist_for_each(last, progs) {
> +                       if (last->next)
> +                               continue;
> +                       last_node = last;
> +                       break;
> +               }
> +
> +               if (!cmp_prog) {
> +                       if (flags & BPF_F_BEFORE)
> +                               hlist_add_head(&pl->node, progs);
> +                       else
> +                               hlist_add_behind(&pl->node, last_node);
> +               } else {
> +                       if (flags & BPF_F_BEFORE)
> +                               hlist_add_before(&pl->node, &pltmp->node);
> +                       else if (flags & BPF_F_AFTER)
> +                               hlist_add_behind(&pl->node, &pltmp->node);
> +                       else
> +                               hlist_add_behind(&pl->node, last_node);
> +               }
> +       }
> +
> +       return 0;
> +}
> +
>  /**
>   * __cgroup_bpf_attach() - Attach the program or the link to a cgroup, and
>   *                         propagate the change to descendants
> @@ -633,6 +717,8 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
>   * @replace_prog: Previously attached program to replace if BPF_F_REPLACE is set
>   * @type: Type of attach operation
>   * @flags: Option flags
> + * @id_or_fd: Relative prog id or fd
> + * @revision: bpf_prog_list revision
>   *
>   * Exactly one of @prog or @link can be non-null.
>   * Must be called with cgroup_mutex held.
> @@ -640,7 +726,8 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
>  static int __cgroup_bpf_attach(struct cgroup *cgrp,
>                                struct bpf_prog *prog, struct bpf_prog *replace_prog,
>                                struct bpf_cgroup_link *link,
> -                              enum bpf_attach_type type, u32 flags)
> +                              enum bpf_attach_type type, u32 flags, u32 id_or_fd,
> +                              u64 revision)
>  {
>         u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI));
>         struct bpf_prog *old_prog = NULL;
> @@ -656,6 +743,9 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>             ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI)))
>                 /* invalid combination */
>                 return -EINVAL;
> +       if ((flags & BPF_F_REPLACE) && (flags & (BPF_F_BEFORE | BPF_F_AFTER)))
> +               /* only either replace or insertion with before/after */
> +               return -EINVAL;
>         if (link && (prog || replace_prog))
>                 /* only either link or prog/replace_prog can be specified */
>                 return -EINVAL;
> @@ -663,9 +753,12 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>                 /* replace_prog implies BPF_F_REPLACE, and vice versa */
>                 return -EINVAL;
>
> +
>         atype = bpf_cgroup_atype_find(type, new_prog->aux->attach_btf_id);
>         if (atype < 0)
>                 return -EINVAL;
> +       if (revision && revision != atomic64_read(&cgrp->bpf.revisions[atype]))

this is happening under lock, no need for atomic operations
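
A userspace sketch of what a non-atomic revision could look like. The
names are hypothetical, a pthread mutex stands in for cgroup_mutex, and
the starting revision used in the test is an assumption, not something
taken from the patch. The counter is a plain integer because every read
and write happens with the lock held.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

struct sk_cgroup_bpf {
	pthread_mutex_t lock;	/* stand-in for cgroup_mutex */
	uint64_t revision;	/* plain integer: only touched with lock held */
};

/*
 * Model of an attach: fail with -ESTALE if the caller's expected revision
 * is non-zero and out of date, otherwise "modify the list" and bump the
 * revision. An expected revision of 0 means "do not check".
 */
static int sk_attach(struct sk_cgroup_bpf *b, uint64_t expected_revision)
{
	int err = 0;

	pthread_mutex_lock(&b->lock);
	if (expected_revision && expected_revision != b->revision)
		err = -ESTALE;
	else
		b->revision++;
	pthread_mutex_unlock(&b->lock);
	return err;
}

/* Model of a query: read the revision under the same lock. */
static uint64_t sk_query_revision(struct sk_cgroup_bpf *b)
{
	uint64_t rev;

	pthread_mutex_lock(&b->lock);
	rev = b->revision;
	pthread_mutex_unlock(&b->lock);
	return rev;
}
```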

> +               return -ESTALE;
>
>         progs = &cgrp->bpf.progs[atype];
>

[...]

> @@ -1063,6 +1161,7 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>         struct bpf_prog_array *effective;
>         int cnt, ret = 0, i;
>         int total_cnt = 0;
> +       u64 revision = 0;
>         u32 flags;
>
>         if (effective_query && prog_attach_flags)
> @@ -1100,6 +1199,10 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>                 return -EFAULT;
>         if (copy_to_user(&uattr->query.prog_cnt, &total_cnt, sizeof(total_cnt)))
>                 return -EFAULT;
> +       if (!effective_query && from_atype == to_atype)
> +               revision = atomic64_read(&cgrp->bpf.revisions[from_atype]);

even here we hold cgroup_mutex

> +       if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
> +               return -EFAULT;
>         if (attr->query.prog_cnt == 0 || !prog_ids || !total_cnt)
>                 /* return early if user requested only program count + flags */
>                 return 0;

[...]

> @@ -1336,7 +1441,9 @@ int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>         }
>
>         err = cgroup_bpf_attach(cgrp, NULL, NULL, link,
> -                               link->type, BPF_F_ALLOW_MULTI);
> +                               link->type, BPF_F_ALLOW_MULTI | attr->link_create.flags,
> +                               attr->link_create.cgroup.relative_fd,
> +                               attr->link_create.cgroup.expected_revision);
>         if (err) {
>                 bpf_link_cleanup(&link_primer);
>                 goto out_put_cgroup;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 9794446bc8c6..48cf855f949f 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -4183,6 +4183,23 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
>         }
>  }
>
> +static bool bpf_cgroup_prog_attached(enum bpf_prog_type ptype)

I find this "attached" naming misleading. I think the name should call
out that we are just checking if program type is cgroup-attaching, so
maybe something like "is_cgroup_prog_type", or something along those
lines?


But for LSM we need to look at expected_attach_type to be
BPF_LSM_CGROUP, so maybe pass both prog type and expected attach type?
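
A sketch of such a predicate, written as a standalone model. The enums
below are hypothetical and abbreviated stand-ins for the UAPI program and
attach types, with arbitrary values, and sk_is_cgroup_prog_type() is only
one possible spelling of the suggested name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, abbreviated stand-ins for enum bpf_prog_type. */
enum sk_prog_type {
	SK_PROG_CGROUP_SKB,
	SK_PROG_CGROUP_SOCKOPT,
	SK_PROG_SOCK_OPS,
	SK_PROG_LSM,
	SK_PROG_SCHED_CLS,
};

/* Hypothetical, abbreviated stand-ins for enum bpf_attach_type. */
enum sk_attach_type {
	SK_ATTACH_OTHER,
	SK_ATTACH_LSM_MAC,
	SK_ATTACH_LSM_CGROUP,
};

/*
 * True if a program of this type attaches to a cgroup. LSM programs count
 * only when their expected attach type is the cgroup flavour.
 */
static bool sk_is_cgroup_prog_type(enum sk_prog_type ptype,
				   enum sk_attach_type expected)
{
	switch (ptype) {
	case SK_PROG_CGROUP_SKB:
	case SK_PROG_CGROUP_SOCKOPT:
	case SK_PROG_SOCK_OPS:
		return true;
	case SK_PROG_LSM:
		return expected == SK_ATTACH_LSM_CGROUP;
	default:
		return false;
	}
}
```

Folding the LSM check into the predicate would let callers drop their
separate expected_attach_type test.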

> +{
> +       switch (ptype) {
> +       case BPF_PROG_TYPE_CGROUP_DEVICE:
> +       case BPF_PROG_TYPE_CGROUP_SKB:
> +       case BPF_PROG_TYPE_CGROUP_SOCK:
> +       case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
> +       case BPF_PROG_TYPE_CGROUP_SOCKOPT:
> +       case BPF_PROG_TYPE_CGROUP_SYSCTL:
> +       case BPF_PROG_TYPE_SOCK_OPS:
> +       case BPF_PROG_TYPE_LSM:
> +               return true;
> +       default:
> +               return false;
> +       }
> +}
> +

[...]


* Re: [RFC PATCH bpf-next 2/4] bpf: Implement mprog API on top of existing cgroup progs
  2025-04-23 23:19   ` Andrii Nakryiko
@ 2025-05-08  4:42     ` Yonghong Song
  0 siblings, 0 replies; 13+ messages in thread
From: Yonghong Song @ 2025-05-08  4:42 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau



On 4/23/25 7:19 AM, Andrii Nakryiko wrote:
> On Thu, Apr 10, 2025 at 6:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> Current cgroup prog ordering is appending at attachment time. This is not
>> ideal. In some cases, users want specific ordering at a particular cgroup
>> level. To address this, the existing mprog API seems an ideal solution with
>> supporting BPF_F_BEFORE and BPF_F_AFTER flags.
>>
>> But there are a few obstacles to directly use kernel mprog interface.
>> Currently cgroup bpf progs already support prog attach/detach/replace
>> and link-based attach/detach/replace. For example, in struct
>> bpf_prog_array_item, the cgroup_storage field needs to be together
>> with bpf prog. But the mprog API struct bpf_mprog_fp only has bpf_prog
>> as the member, which makes it difficult to use kernel mprog interface.
>>
>> In another case, the current cgroup prog detach tries to use the
>> same flag as in attach. This is different from mprog kernel interface
>> which uses flags passed from user space.
>>
>> So to avoid modifying existing behavior, I made the following changes to
>> support mprog API for cgroup progs:
>>   - The support is for prog list at cgroup level. Cross-level prog list
>>     (a.k.a. effective prog list) is not supported.
>>   - Previously, BPF_F_PREORDER is supported only for prog attach, now
>>     BPF_F_PREORDER is also supported by link-based attach.
>>   - For attach, BPF_F_BEFORE/BPF_F_AFTER/BPF_F_ID is supported similar to
>>     kernel mprog but with different implementation.
>>   - For detach and replace, use the existing implementation.
>>   - For attach, detach and replace, the revision for a particular prog
>>     list, associated with a particular attach type, will be updated
>>     by increasing count by 1.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>   include/uapi/linux/bpf.h       |   7 ++
>>   kernel/bpf/cgroup.c            | 151 ++++++++++++++++++++++++++++-----
>>   kernel/bpf/syscall.c           |  58 ++++++++-----
>>   tools/include/uapi/linux/bpf.h |   7 ++
>>   4 files changed, 181 insertions(+), 42 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 71d5ac83cf5d..a5c7992e8f7c 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1794,6 +1794,13 @@ union bpf_attr {
>>                                  };
>>                                  __u64           expected_revision;
>>                          } netkit;
>> +                       struct {
>> +                               union {
>> +                                       __u32   relative_fd;
>> +                                       __u32   relative_id;
>> +                               };
>> +                               __u64           expected_revision;
>> +                       } cgroup;
>>                  };
>>          } link_create;
>>
>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>> index 84f58f3d028a..ffd455051131 100644
>> --- a/kernel/bpf/cgroup.c
>> +++ b/kernel/bpf/cgroup.c
>> @@ -624,6 +624,90 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
>>          return NULL;
>>   }
>>
>> +static struct bpf_prog *get_cmp_prog(struct hlist_head *progs, struct bpf_prog *prog,
>> +                                    u32 flags, u32 id_or_fd, struct bpf_prog_list **ppltmp)
>> +{
>> +       struct bpf_prog *cmp_prog = NULL, *pltmp_prog;
>> +       bool preorder = !!(flags & BPF_F_PREORDER);
> nit: the !!() pattern is not necessary when assigning to bool and is just
> visual and cognitive noise

Okay, will remove !! operators.

>
>> +       struct bpf_prog_list *pltmp;
>> +       bool id = flags & BPF_F_ID;
>> +       bool found;
>> +
>> +       if (id || id_or_fd) {
> let's invert the condition and exit early? also I find "id" as a bool
> very confusing, I think it's fine to just open-code it in the two places
> where you actually check for the BPF_F_ID flag.

Open-coding `flags & BPF_F_ID` is okay.

> But also, isn't this `if (id)` part redundant? Would just `if
> (id_or_fd)` be enough?

Yes, id_or_fd should be enough.

>
>
>> +               /* flags must have BPF_F_BEFORE or BPF_F_AFTER */
>> +               if (!(flags & (BPF_F_BEFORE | BPF_F_AFTER)))
>> +                       return ERR_PTR(-EINVAL);
>> +
>> +               if (id)
>> +                       cmp_prog = bpf_prog_by_id(id_or_fd);
>> +               else
>> +                       cmp_prog = bpf_prog_get(id_or_fd);
> how about we use "anchor" terminology here? So this would be "anchor
> program" or anchor_prog?

anchor_prog sounds good.

>
>> +               if (IS_ERR(cmp_prog))
>> +                       return cmp_prog;
>> +               if (cmp_prog->type != prog->type)
> bpf_prog_put?

Yes. Missed bpf_prog_put. Will fix.

>
> pw-bot: cr
>
>> +                       return ERR_PTR(-EINVAL);
>> +
>> +               found = false;
>> +               hlist_for_each_entry(pltmp, progs, node) {
>> +                       pltmp_prog = pltmp->link ? pltmp->link->link.prog : pltmp->prog;
>> +                       if (pltmp_prog == cmp_prog) {
> try keeping nesting minimal:
>
> if (pltmp_prog != cmp_prog)
>      continue;

Good point. Will do.

>
>> +                               if (!!(pltmp->flags & BPF_F_PREORDER) != preorder)
>> +                                       return ERR_PTR(-EINVAL);
>> +                               found = true;
>> +                               *ppltmp = pltmp;
> we don't need found flag if we set ppltmp to NULL before loop, and to
> non-NULL if we find the match

Ack.
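
For illustration only (not part of the posted patch): the loop-related
suggestions above (early continue, no 'found' flag, output pointer set to
NULL before the loop) could combine roughly as in the userspace sketch
below. struct entry, find_anchor() and the integer program handles are
stand-ins invented for this example; they are not the kernel's
bpf_prog_list or hlist API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct bpf_prog_list: one attached program per entry. */
struct entry {
	int prog;		/* hypothetical program handle */
	bool preorder;		/* models the BPF_F_PREORDER bit */
	struct entry *next;
};

/*
 * Look up the entry holding @anchor. On success return 0 and store the
 * entry in *out; otherwise return a negative errno and leave *out NULL.
 */
static int find_anchor(struct entry *head, int anchor, bool preorder,
		       struct entry **out)
{
	struct entry *e;

	assert(out);
	*out = NULL;
	for (e = head; e; e = e->next) {
		if (e->prog != anchor)
			continue;
		/* Anchor and new program must agree on preorder. */
		if (e->preorder != preorder)
			return -EINVAL;
		*out = e;
		break;
	}
	return *out ? 0 : -ENOENT;
}
```

In the kernel version, both error returns would additionally need to drop
the reference taken on the anchor program, as noted elsewhere in this
review.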

>
>> +                               break;
>> +                       }
>> +               }
>> +               if (!found)
> bpf_prog_put(cmp_prog)

Again, will do bpf_prog_put before returning the error.

>
>> +                       return ERR_PTR(-ENOENT);
>> +       }
>> +
>> +       return cmp_prog;
>> +}
>> +
>> +static int insert_pl_to_hlist(struct bpf_prog_list *pl, struct hlist_head *progs,
>> +                             struct bpf_prog *prog, u32 flags, u32 id_or_fd)
>> +{
>> +       struct hlist_node *last, *last_node = NULL;
>> +       struct bpf_prog_list *pltmp = NULL;
>> +       struct bpf_prog *cmp_prog;
>> +
>> +       /* flags cannot have both BPF_F_BEFORE and BPF_F_AFTER */
>> +       if ((flags & BPF_F_BEFORE) && (flags & BPF_F_AFTER))
>> +               return -EINVAL;
>> +
>> +       cmp_prog = get_cmp_prog(progs, prog, flags, id_or_fd, &pltmp);
> why get_cmp_prog can't return this last_node if we have BPF_F_AFTER
> with no id/fd specified? Then you wouldn't have to special-case
> appending (same for prepending, actually)?

We could do this. Let me try this.
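
For illustration only: the placement rules in the quoted
insert_pl_to_hlist() can be modeled in userspace with a plain array, where
"no anchor" resolves to the head (BEFORE) or the tail (otherwise) so that
appending and prepending need no separate branch. F_BEFORE, F_AFTER,
struct prog_list and insert_prog() are names made up for this sketch, and
the flag values are stand-ins rather than the uapi ones.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_PROGS 16
#define F_BEFORE  (1u << 0)	/* stand-in for BPF_F_BEFORE */
#define F_AFTER   (1u << 1)	/* stand-in for BPF_F_AFTER */

struct prog_list {
	int progs[MAX_PROGS];	/* hypothetical program handles, in list order */
	int n;
};

/* Insert @prog relative to @anchor (0 means "no anchor given"). */
static int insert_prog(struct prog_list *l, int prog, unsigned int flags,
		       int anchor)
{
	int pos, i;

	assert(l->n < MAX_PROGS);
	if ((flags & F_BEFORE) && (flags & F_AFTER))
		return -EINVAL;

	if (anchor) {
		/* An anchor only makes sense with BEFORE or AFTER. */
		if (!(flags & (F_BEFORE | F_AFTER)))
			return -EINVAL;
		for (i = 0; i < l->n && l->progs[i] != anchor; i++)
			;
		if (i == l->n)
			return -ENOENT;
		pos = (flags & F_BEFORE) ? i : i + 1;
	} else {
		/* No anchor: BEFORE prepends, anything else appends. */
		pos = (flags & F_BEFORE) ? 0 : l->n;
	}

	/* Shift the tail up by one slot and place the new program. */
	memmove(&l->progs[pos + 1], &l->progs[pos],
		(size_t)(l->n - pos) * sizeof(l->progs[0]));
	l->progs[pos] = prog;
	l->n++;
	return 0;
}
```

Attaching 1 with no flags, 2 with F_BEFORE, 3 with F_AFTER relative to 2,
and 4 with no flags yields the order 2, 3, 1, 4 in this model, which is the
sequence the selftests in patch 4 expect.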

>
>> +       if (IS_ERR(cmp_prog))
>> +               return PTR_ERR(cmp_prog);
>> +
>> +       if (hlist_empty(progs)) {
>> +               hlist_add_head(&pl->node, progs);
>> +       } else {
>> +               hlist_for_each(last, progs) {
>> +                       if (last->next)
>> +                               continue;
>> +                       last_node = last;
>> +                       break;
>> +               }
>> +
>> +               if (!cmp_prog) {
>> +                       if (flags & BPF_F_BEFORE)
>> +                               hlist_add_head(&pl->node, progs);
>> +                       else
>> +                               hlist_add_behind(&pl->node, last_node);
>> +               } else {
>> +                       if (flags & BPF_F_BEFORE)
>> +                               hlist_add_before(&pl->node, &pltmp->node);
>> +                       else if (flags & BPF_F_AFTER)
>> +                               hlist_add_behind(&pl->node, &pltmp->node);
>> +                       else
>> +                               hlist_add_behind(&pl->node, last_node);
>> +               }
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>>   /**
>>    * __cgroup_bpf_attach() - Attach the program or the link to a cgroup, and
>>    *                         propagate the change to descendants
>> @@ -633,6 +717,8 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
>>    * @replace_prog: Previously attached program to replace if BPF_F_REPLACE is set
>>    * @type: Type of attach operation
>>    * @flags: Option flags
>> + * @id_or_fd: Relative prog id or fd
>> + * @revision: bpf_prog_list revision
>>    *
>>    * Exactly one of @prog or @link can be non-null.
>>    * Must be called with cgroup_mutex held.
>> @@ -640,7 +726,8 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
>>   static int __cgroup_bpf_attach(struct cgroup *cgrp,
>>                                 struct bpf_prog *prog, struct bpf_prog *replace_prog,
>>                                 struct bpf_cgroup_link *link,
>> -                              enum bpf_attach_type type, u32 flags)
>> +                              enum bpf_attach_type type, u32 flags, u32 id_or_fd,
>> +                              u64 revision)
>>   {
>>          u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI));
>>          struct bpf_prog *old_prog = NULL;
>> @@ -656,6 +743,9 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>>              ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI)))
>>                  /* invalid combination */
>>                  return -EINVAL;
>> +       if ((flags & BPF_F_REPLACE) && (flags & (BPF_F_BEFORE | BPF_F_AFTER)))
>> +               /* only either replace or insertion with before/after */
>> +               return -EINVAL;
>>          if (link && (prog || replace_prog))
>>                  /* only either link or prog/replace_prog can be specified */
>>                  return -EINVAL;
>> @@ -663,9 +753,12 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>>                  /* replace_prog implies BPF_F_REPLACE, and vice versa */
>>                  return -EINVAL;
>>
>> +
>>          atype = bpf_cgroup_atype_find(type, new_prog->aux->attach_btf_id);
>>          if (atype < 0)
>>                  return -EINVAL;
>> +       if (revision && revision != atomic64_read(&cgrp->bpf.revisions[atype]))
> this is happening under lock, no need for atomic operations

Ack
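
For illustration only: with the lock held, the revision check reduces to a
plain integer comparison. The userspace sketch below models it; struct
rev_slot and update_with_revision() are invented names, and the starting
value of 1 is an assumption taken from the expected_revision values used in
the selftests.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in for one per-attach-type revision slot; caller holds the lock. */
struct rev_slot {
	uint64_t revision;
};

/*
 * Apply one attach/detach/replace step. @expected == 0 means "do not
 * check"; a non-zero mismatch is rejected with -ESTALE and leaves the
 * revision untouched.
 */
static int update_with_revision(struct rev_slot *s, uint64_t expected)
{
	assert(s);
	if (expected && expected != s->revision)
		return -ESTALE;
	s->revision++;
	return 0;
}
```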

>
>> +               return -ESTALE;
>>
>>          progs = &cgrp->bpf.progs[atype];
>>
> [...]
>
>> @@ -1063,6 +1161,7 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>>          struct bpf_prog_array *effective;
>>          int cnt, ret = 0, i;
>>          int total_cnt = 0;
>> +       u64 revision = 0;
>>          u32 flags;
>>
>>          if (effective_query && prog_attach_flags)
>> @@ -1100,6 +1199,10 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>>                  return -EFAULT;
>>          if (copy_to_user(&uattr->query.prog_cnt, &total_cnt, sizeof(total_cnt)))
>>                  return -EFAULT;
>> +       if (!effective_query && from_atype == to_atype)
>> +               revision = atomic64_read(&cgrp->bpf.revisions[from_atype]);
> even here we hold cgroup_mutex

Ack

>
>> +       if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
>> +               return -EFAULT;
>>          if (attr->query.prog_cnt == 0 || !prog_ids || !total_cnt)
>>                  /* return early if user requested only program count + flags */
>>                  return 0;
> [...]
>
>> @@ -1336,7 +1441,9 @@ int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>>          }
>>
>>          err = cgroup_bpf_attach(cgrp, NULL, NULL, link,
>> -                               link->type, BPF_F_ALLOW_MULTI);
>> +                               link->type, BPF_F_ALLOW_MULTI | attr->link_create.flags,
>> +                               attr->link_create.cgroup.relative_fd,
>> +                               attr->link_create.cgroup.expected_revision);
>>          if (err) {
>>                  bpf_link_cleanup(&link_primer);
>>                  goto out_put_cgroup;
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 9794446bc8c6..48cf855f949f 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -4183,6 +4183,23 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
>>          }
>>   }
>>
>> +static bool bpf_cgroup_prog_attached(enum bpf_prog_type ptype)
> I find this "attached" naming misleading. I think the name should call
> out that we are just checking if program type is cgroup-attaching, so
> maybe something like "is_cgroup_prog_type", or something along those
> lines?
>
>
> But for LSM we need to look at expected_attach_type to be
> BPF_LSM_CGROUP, so maybe pass both prog type and expected attach type?

Yes, let us pass both prog type and expected attach type, and name the
function is_cgroup_prog_type().
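
For illustration only, a userspace sketch of what such a helper might look
like. The enums are small stand-ins for enum bpf_prog_type and enum
bpf_attach_type (only a few values are modeled), so this shows the shape of
the check rather than the final kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for a few enum bpf_prog_type / enum bpf_attach_type values. */
enum model_prog_type {
	PT_OTHER, PT_CGROUP_SKB, PT_CGROUP_SOCKOPT, PT_SOCK_OPS, PT_LSM,
};
enum model_attach_type { AT_OTHER, AT_LSM_MAC, AT_LSM_CGROUP };

static bool is_cgroup_prog_type(enum model_prog_type ptype,
				enum model_attach_type atype)
{
	switch (ptype) {
	case PT_CGROUP_SKB:
	case PT_CGROUP_SOCKOPT:
	case PT_SOCK_OPS:
		return true;
	case PT_LSM:
		/* LSM programs count only when attached as BPF_LSM_CGROUP. */
		return atype == AT_LSM_CGROUP;
	default:
		return false;
	}
}
```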

>
>> +{
>> +       switch (ptype) {
>> +       case BPF_PROG_TYPE_CGROUP_DEVICE:
>> +       case BPF_PROG_TYPE_CGROUP_SKB:
>> +       case BPF_PROG_TYPE_CGROUP_SOCK:
>> +       case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
>> +       case BPF_PROG_TYPE_CGROUP_SOCKOPT:
>> +       case BPF_PROG_TYPE_CGROUP_SYSCTL:
>> +       case BPF_PROG_TYPE_SOCK_OPS:
>> +       case BPF_PROG_TYPE_LSM:
>> +               return true;
>> +       default:
>> +               return false;
>> +       }
>> +}
>> +
> [...]



* [RFC PATCH bpf-next 3/4] libbpf: Support link-based cgroup attach with options
  2025-04-11  1:15 [RFC PATCH bpf-next 0/4] bpf: Implement mprog API on top of existing cgroup progs Yonghong Song
  2025-04-11  1:15 ` [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf Yonghong Song
  2025-04-11  1:15 ` [RFC PATCH bpf-next 2/4] bpf: Implement mprog API on top of existing cgroup progs Yonghong Song
@ 2025-04-11  1:15 ` Yonghong Song
  2025-04-11  1:15 ` [RFC PATCH bpf-next 4/4] selftests/bpf: Add two selftests for mprog API based cgroup progs Yonghong Song
  2025-04-23 23:20 ` [RFC PATCH bpf-next 0/4] bpf: Implement mprog API on top of existing " Andrii Nakryiko
  4 siblings, 0 replies; 13+ messages in thread
From: Yonghong Song @ 2025-04-11  1:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau

Currently libbpf supports bpf_program__attach_cgroup() with signature:
  LIBBPF_API struct bpf_link *
  bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd);

To support mprog-style attachment, additional fields such as flags,
relative_{fd,id} and expected_revision are needed.

Add a new API:
  LIBBPF_API struct bpf_link *
  bpf_program__attach_cgroup_opts(const struct bpf_program *prog, int cgroup_fd,
                                  const struct bpf_cgroup_opts *opts);
where bpf_cgroup_opts contains all of the fields listed above.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 tools/lib/bpf/bpf.c      | 44 ++++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/bpf.h      |  5 +++++
 tools/lib/bpf/libbpf.c   | 28 +++++++++++++++++++++++++
 tools/lib/bpf/libbpf.h   | 15 ++++++++++++++
 tools/lib/bpf/libbpf.map |  1 +
 5 files changed, 93 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index a9c3e33d0f8a..6eb421ccf91b 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -837,6 +837,50 @@ int bpf_link_create(int prog_fd, int target_fd,
 		if (!OPTS_ZEROED(opts, netkit))
 			return libbpf_err(-EINVAL);
 		break;
+	case BPF_CGROUP_INET_INGRESS:
+	case BPF_CGROUP_INET_EGRESS:
+	case BPF_CGROUP_INET_SOCK_CREATE:
+	case BPF_CGROUP_INET_SOCK_RELEASE:
+	case BPF_CGROUP_INET4_BIND:
+	case BPF_CGROUP_INET6_BIND:
+	case BPF_CGROUP_INET4_POST_BIND:
+	case BPF_CGROUP_INET6_POST_BIND:
+	case BPF_CGROUP_INET4_CONNECT:
+	case BPF_CGROUP_INET6_CONNECT:
+	case BPF_CGROUP_UNIX_CONNECT:
+	case BPF_CGROUP_INET4_GETPEERNAME:
+	case BPF_CGROUP_INET6_GETPEERNAME:
+	case BPF_CGROUP_UNIX_GETPEERNAME:
+	case BPF_CGROUP_INET4_GETSOCKNAME:
+	case BPF_CGROUP_INET6_GETSOCKNAME:
+	case BPF_CGROUP_UNIX_GETSOCKNAME:
+	case BPF_CGROUP_UDP4_SENDMSG:
+	case BPF_CGROUP_UDP6_SENDMSG:
+	case BPF_CGROUP_UNIX_SENDMSG:
+	case BPF_CGROUP_UDP4_RECVMSG:
+	case BPF_CGROUP_UDP6_RECVMSG:
+	case BPF_CGROUP_UNIX_RECVMSG:
+	case BPF_CGROUP_SOCK_OPS:
+	case BPF_CGROUP_DEVICE:
+	case BPF_CGROUP_SYSCTL:
+	case BPF_CGROUP_GETSOCKOPT:
+	case BPF_CGROUP_SETSOCKOPT:
+	case BPF_LSM_CGROUP:
+		relative_fd = OPTS_GET(opts, cgroup.relative_fd, 0);
+		relative_id = OPTS_GET(opts, cgroup.relative_id, 0);
+		if (relative_fd && relative_id)
+			return libbpf_err(-EINVAL);
+		if (relative_id) {
+			attr.link_create.cgroup.relative_id = relative_id;
+			attr.link_create.flags |= BPF_F_ID;
+		} else {
+			attr.link_create.cgroup.relative_fd = relative_fd;
+		}
+		attr.link_create.cgroup.expected_revision =
+			OPTS_GET(opts, cgroup.expected_revision, 0);
+		if (!OPTS_ZEROED(opts, cgroup))
+			return libbpf_err(-EINVAL);
+		break;
 	default:
 		if (!OPTS_ZEROED(opts, flags))
 			return libbpf_err(-EINVAL);
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 777627d33d25..1342564214c8 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -438,6 +438,11 @@ struct bpf_link_create_opts {
 			__u32 relative_id;
 			__u64 expected_revision;
 		} netkit;
+		struct {
+			__u32 relative_fd;
+			__u32 relative_id;
+			__u64 expected_revision;
+		} cgroup;
 	};
 	size_t :0;
 };
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b2591f5cab65..129a3382abb9 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -12848,6 +12848,34 @@ struct bpf_link *bpf_program__attach_xdp(const struct bpf_program *prog, int ifi
 	return bpf_program_attach_fd(prog, ifindex, "xdp", NULL);
 }
 
+struct bpf_link *
+bpf_program__attach_cgroup_opts(const struct bpf_program *prog, int cgroup_fd,
+				const struct bpf_cgroup_opts *opts)
+{
+	LIBBPF_OPTS(bpf_link_create_opts, link_create_opts);
+	__u32 relative_id;
+	int relative_fd;
+
+	if (!OPTS_VALID(opts, bpf_cgroup_opts))
+		return libbpf_err_ptr(-EINVAL);
+
+	relative_id = OPTS_GET(opts, relative_id, 0);
+	relative_fd = OPTS_GET(opts, relative_fd, 0);
+
+	if (relative_fd && relative_id) {
+		pr_warn("prog '%s': relative_fd and relative_id cannot be set at the same time\n",
+			prog->name);
+		return libbpf_err_ptr(-EINVAL);
+	}
+
+	link_create_opts.cgroup.expected_revision = OPTS_GET(opts, expected_revision, 0);
+	link_create_opts.cgroup.relative_fd = relative_fd;
+	link_create_opts.cgroup.relative_id = relative_id;
+	link_create_opts.flags = OPTS_GET(opts, flags, 0);
+
+	return bpf_program_attach_fd(prog, cgroup_fd, "cgroup", &link_create_opts);
+}
+
 struct bpf_link *
 bpf_program__attach_tcx(const struct bpf_program *prog, int ifindex,
 			const struct bpf_tcx_opts *opts)
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index d39f19c8396d..622de1b932ee 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -877,6 +877,21 @@ LIBBPF_API struct bpf_link *
 bpf_program__attach_netkit(const struct bpf_program *prog, int ifindex,
 			   const struct bpf_netkit_opts *opts);
 
+struct bpf_cgroup_opts {
+	/* size of this struct, for forward/backward compatibility */
+	size_t sz;
+	__u32 flags;
+	__u32 relative_fd;
+	__u32 relative_id;
+	__u64 expected_revision;
+	size_t :0;
+};
+#define bpf_cgroup_opts__last_field expected_revision
+
+LIBBPF_API struct bpf_link *
+bpf_program__attach_cgroup_opts(const struct bpf_program *prog, int cgroup_fd,
+				const struct bpf_cgroup_opts *opts);
+
 struct bpf_map;
 
 LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 1205f9a4fe04..c7fc0bde5648 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -437,6 +437,7 @@ LIBBPF_1.6.0 {
 		bpf_linker__add_fd;
 		bpf_linker__new_fd;
 		bpf_object__prepare;
+		bpf_program__attach_cgroup_opts;
 		bpf_program__func_info;
 		bpf_program__func_info_cnt;
 		bpf_program__line_info;
-- 
2.47.1
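
For illustration only: the relative_fd/relative_id handling in the
bpf_link_create() hunk above can be modeled standalone. The two options
are mutually exclusive, and choosing the id form sets an extra flag so the
receiver knows how to interpret the single value it gets. MODEL_F_ID and
resolve_relative() are names invented for this sketch; the flag value is a
stand-in, not the uapi BPF_F_ID value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_F_ID (1u << 0)	/* stand-in for BPF_F_ID */

/* Collapse the (relative_fd, relative_id) pair into one value plus a flag. */
static int resolve_relative(uint32_t relative_fd, uint32_t relative_id,
			    uint32_t *flags, uint32_t *id_or_fd)
{
	assert(flags && id_or_fd);
	if (relative_fd && relative_id)
		return -EINVAL;	/* the two are mutually exclusive */
	if (relative_id) {
		*id_or_fd = relative_id;
		*flags |= MODEL_F_ID;
	} else {
		*id_or_fd = relative_fd;
	}
	return 0;
}
```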



* [RFC PATCH bpf-next 4/4] selftests/bpf: Add two selftests for mprog API based cgroup progs
  2025-04-11  1:15 [RFC PATCH bpf-next 0/4] bpf: Implement mprog API on top of existing cgroup progs Yonghong Song
                   ` (2 preceding siblings ...)
  2025-04-11  1:15 ` [RFC PATCH bpf-next 3/4] libbpf: Support link-based cgroup attach with options Yonghong Song
@ 2025-04-11  1:15 ` Yonghong Song
  2025-04-23 23:20 ` [RFC PATCH bpf-next 0/4] bpf: Implement mprog API on top of existing " Andrii Nakryiko
  4 siblings, 0 replies; 13+ messages in thread
From: Yonghong Song @ 2025-04-11  1:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau

Two tests are added:
  - cgroup_mprog_opts, which mimics tc_opts.c ([1]). Both prog and link
    attach are tested. Some negative tests are also included.
  - cgroup_mprog_ordering, which actually runs the program with some mprog
    API flags.

  [1] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/prog_tests/tc_opts.c

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 .../bpf/prog_tests/cgroup_mprog_opts.c        | 752 ++++++++++++++++++
 .../bpf/prog_tests/cgroup_mprog_ordering.c    |  77 ++
 .../selftests/bpf/progs/cgroup_mprog.c        |  30 +
 3 files changed, 859 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_mprog_opts.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_mprog_ordering.c
 create mode 100644 tools/testing/selftests/bpf/progs/cgroup_mprog.c

diff --git a/tools/testing/selftests/bpf/prog_tests/cgroup_mprog_opts.c b/tools/testing/selftests/bpf/prog_tests/cgroup_mprog_opts.c
new file mode 100644
index 000000000000..a8374ea2267b
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/cgroup_mprog_opts.c
@@ -0,0 +1,752 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+#include <test_progs.h>
+#include "cgroup_helpers.h"
+#include "cgroup_mprog.skel.h"
+
+static __u32 id_from_prog_fd(int fd)
+{
+	struct bpf_prog_info prog_info = {};
+	__u32 prog_info_len = sizeof(prog_info);
+	int err;
+
+	err = bpf_obj_get_info_by_fd(fd, &prog_info, &prog_info_len);
+	if (!ASSERT_OK(err, "id_from_prog_fd"))
+		return 0;
+
+	ASSERT_NEQ(prog_info.id, 0, "prog_info.id");
+	return prog_info.id;
+}
+
+static void assert_mprog_count(int cg, int atype, int expected)
+{
+	__u32 count = 0, attach_flags = 0;
+	int err;
+
+	err = bpf_prog_query(cg, atype, 0, &attach_flags,
+			     NULL, &count);
+	ASSERT_EQ(count, expected, "count");
+	ASSERT_EQ(err, 0, "prog_query");
+}
+
+static void test_prog_attach_detach(int atype)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+	struct cgroup_mprog *skel;
+	__u32 prog_ids[10];
+	int cg, err;
+
+	cg = test__join_cgroup("/prog_attach_detach");
+	if (!ASSERT_GE(cg, 0, "join_cgroup /prog_attach_detach"))
+		return;
+
+	skel = cgroup_mprog__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.getsockopt_1);
+	fd2 = bpf_program__fd(skel->progs.getsockopt_2);
+	fd3 = bpf_program__fd(skel->progs.getsockopt_3);
+	fd4 = bpf_program__fd(skel->progs.getsockopt_4);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+	id4 = id_from_prog_fd(fd4);
+
+	assert_mprog_count(cg, atype, 0);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI,
+		.expected_revision = 1,
+	);
+
+	/* ordering: [fd1] */
+	err = bpf_prog_attach_opts(fd1, cg, atype, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(cg, atype, 1);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI | BPF_F_BEFORE,
+		.expected_revision = 2,
+	);
+
+	/* ordering: [fd2, fd1] */
+	err = bpf_prog_attach_opts(fd2, cg, atype, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup1;
+
+	assert_mprog_count(cg, atype, 2);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI | BPF_F_AFTER,
+		.relative_fd = fd2,
+		.expected_revision = 3,
+	);
+
+	/* ordering: [fd2, fd3, fd1] */
+	err = bpf_prog_attach_opts(fd3, cg, atype, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup2;
+
+	assert_mprog_count(cg, atype, 3);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI,
+		.expected_revision = 4,
+	);
+
+	/* ordering: [fd2, fd3, fd1, fd4] */
+	err = bpf_prog_attach_opts(fd4, cg, atype, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup3;
+
+	assert_mprog_count(cg, atype, 4);
+
+	/* retrieve optq.prog_cnt */
+	err = bpf_prog_query_opts(cg, atype, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	/* optq.prog_cnt will be used in below query */
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.prog_ids = prog_ids;
+	err = bpf_prog_query_opts(cg, atype, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id1, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+	ASSERT_EQ(optq.link_ids, NULL, "link_ids");
+
+cleanup4:
+	optd.expected_revision = 5;
+	err = bpf_prog_detach_opts(fd4, cg, atype, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(cg, atype, 3);
+
+cleanup3:
+	LIBBPF_OPTS_RESET(optd);
+	err = bpf_prog_detach_opts(fd3, cg, atype, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(cg, atype, 2);
+
+	/* Check revision after two detach operations */
+	err = bpf_prog_query_opts(cg, atype, &optq);
+	ASSERT_OK(err, "prog_query");
+	ASSERT_EQ(optq.revision, 7, "revision");
+
+cleanup2:
+	err = bpf_prog_detach_opts(fd2, cg, atype, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(cg, atype, 1);
+
+cleanup1:
+	err = bpf_prog_detach_opts(fd1, cg, atype, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(cg, atype, 0);
+
+cleanup:
+	cgroup_mprog__destroy(skel);
+	close(cg);
+}
+
+static void test_link_attach_detach(int atype)
+{
+	LIBBPF_OPTS(bpf_cgroup_opts, opta);
+	LIBBPF_OPTS(bpf_cgroup_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	struct bpf_link *link1, *link2, *link3, *link4;
+	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+	struct cgroup_mprog *skel;
+	__u32 prog_ids[10];
+	int cg, err;
+
+	cg = test__join_cgroup("/link_attach_detach");
+	if (!ASSERT_GE(cg, 0, "join_cgroup /link_attach_detach"))
+		return;
+
+	skel = cgroup_mprog__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.getsockopt_1);
+	fd2 = bpf_program__fd(skel->progs.getsockopt_2);
+	fd3 = bpf_program__fd(skel->progs.getsockopt_3);
+	fd4 = bpf_program__fd(skel->progs.getsockopt_4);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+	id4 = id_from_prog_fd(fd4);
+
+	assert_mprog_count(cg, atype, 0);
+
+	LIBBPF_OPTS_RESET(opta,
+		.expected_revision = 1,
+	);
+
+	/* ordering: [fd1] */
+	link1 = bpf_program__attach_cgroup_opts(skel->progs.getsockopt_1, cg, &opta);
+	if (!ASSERT_OK_PTR(link1, "link_attach"))
+		goto cleanup;
+
+	assert_mprog_count(cg, atype, 1);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_BEFORE,
+		.expected_revision = 2,
+	);
+
+	/* ordering: [fd2, fd1] */
+	link2 = bpf_program__attach_cgroup_opts(skel->progs.getsockopt_2, cg, &opta);
+	if (!ASSERT_OK_PTR(link2, "link_attach"))
+		goto cleanup1;
+
+	assert_mprog_count(cg, atype, 2);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_AFTER,
+		.relative_fd = fd2,
+		.expected_revision = 3,
+	);
+
+	/* ordering: [fd2, fd3, fd1] */
+	link3 = bpf_program__attach_cgroup_opts(skel->progs.getsockopt_3, cg, &opta);
+	if (!ASSERT_OK_PTR(link3, "link_attach"))
+		goto cleanup2;
+
+	assert_mprog_count(cg, atype, 3);
+
+	LIBBPF_OPTS_RESET(opta,
+		.expected_revision = 4,
+	);
+
+	/* ordering: [fd2, fd3, fd1, fd4] */
+	link4 = bpf_program__attach_cgroup_opts(skel->progs.getsockopt_4, cg, &opta);
+	if (!ASSERT_OK_PTR(link4, "link_attach"))
+		goto cleanup3;
+
+	assert_mprog_count(cg, atype, 4);
+
+	/* retrieve optq.prog_cnt */
+	err = bpf_prog_query_opts(cg, atype, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	/* optq.prog_cnt will be used in below query */
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.prog_ids = prog_ids;
+	err = bpf_prog_query_opts(cg, atype, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id1, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+	ASSERT_EQ(optq.link_ids, NULL, "link_ids");
+
+cleanup4:
+	bpf_link__destroy(link4);
+	assert_mprog_count(cg, atype, 3);
+
+cleanup3:
+	bpf_link__destroy(link3);
+	assert_mprog_count(cg, atype, 2);
+
+	/* Check revision after two detach operations */
+	err = bpf_prog_query_opts(cg, atype, &optq);
+	ASSERT_OK(err, "prog_query");
+	ASSERT_EQ(optq.revision, 7, "revision");
+
+cleanup2:
+	bpf_link__destroy(link2);
+	assert_mprog_count(cg, atype, 1);
+
+cleanup1:
+	bpf_link__destroy(link1);
+	assert_mprog_count(cg, atype, 0);
+
+cleanup:
+	cgroup_mprog__destroy(skel);
+	close(cg);
+}
+
+static void test_mix_attach_detach(int atype)
+{
+	LIBBPF_OPTS(bpf_cgroup_opts, lopta);
+	LIBBPF_OPTS(bpf_cgroup_opts, loptd);
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+	struct bpf_link *link2, *link4;
+	struct cgroup_mprog *skel;
+	__u32 prog_ids[10];
+	int cg, err;
+
+	cg = test__join_cgroup("/mix_attach_detach");
+	if (!ASSERT_GE(cg, 0, "join_cgroup /mix_attach_detach"))
+		return;
+
+	skel = cgroup_mprog__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.getsockopt_1);
+	fd2 = bpf_program__fd(skel->progs.getsockopt_2);
+	fd3 = bpf_program__fd(skel->progs.getsockopt_3);
+	fd4 = bpf_program__fd(skel->progs.getsockopt_4);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+	id4 = id_from_prog_fd(fd4);
+
+	assert_mprog_count(cg, atype, 0);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI,
+		.expected_revision = 1,
+	);
+
+	/* ordering: [fd1] */
+	err = bpf_prog_attach_opts(fd1, cg, atype, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(cg, atype, 1);
+
+	LIBBPF_OPTS_RESET(lopta,
+		.flags = BPF_F_BEFORE,
+		.expected_revision = 2,
+	);
+
+	/* ordering: [fd2, fd1] */
+	link2 = bpf_program__attach_cgroup_opts(skel->progs.getsockopt_2, cg, &lopta);
+	if (!ASSERT_OK_PTR(link2, "link_attach"))
+		goto cleanup1;
+
+	assert_mprog_count(cg, atype, 2);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI | BPF_F_AFTER,
+		.relative_fd = fd2,
+		.expected_revision = 3,
+	);
+
+	/* ordering: [fd2, fd3, fd1] */
+	err = bpf_prog_attach_opts(fd3, cg, atype, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup2;
+
+	assert_mprog_count(cg, atype, 3);
+
+	LIBBPF_OPTS_RESET(lopta,
+		.expected_revision = 4,
+	);
+
+	/* ordering: [fd2, fd3, fd1, fd4] */
+	link4 = bpf_program__attach_cgroup_opts(skel->progs.getsockopt_4, cg, &lopta);
+	if (!ASSERT_OK_PTR(link4, "link_attach"))
+		goto cleanup3;
+
+	assert_mprog_count(cg, atype, 4);
+
+	/* retrieve optq.prog_cnt */
+	err = bpf_prog_query_opts(cg, atype, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	/* optq.prog_cnt will be used in below query */
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.prog_ids = prog_ids;
+	err = bpf_prog_query_opts(cg, atype, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id1, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+	ASSERT_EQ(optq.link_ids, NULL, "link_ids");
+
+cleanup4:
+	bpf_link__destroy(link4);
+	assert_mprog_count(cg, atype, 3);
+
+cleanup3:
+	err = bpf_prog_detach_opts(fd3, cg, atype, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(cg, atype, 2);
+
+	/* Check revision after two detach operations */
+	err = bpf_prog_query_opts(cg, atype, &optq);
+	ASSERT_OK(err, "prog_query");
+	ASSERT_EQ(optq.revision, 7, "revision");
+
+cleanup2:
+	bpf_link__destroy(link2);
+	assert_mprog_count(cg, atype, 1);
+
+cleanup1:
+	err = bpf_prog_detach_opts(fd1, cg, atype, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(cg, atype, 0);
+
+cleanup:
+	cgroup_mprog__destroy(skel);
+	close(cg);
+}
+
+static void test_preorder_prog_attach_detach(int atype)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	__u32 fd1, fd2, fd3, fd4;
+	struct cgroup_mprog *skel;
+	int cg, err;
+
+	cg = test__join_cgroup("/preorder_prog_attach_detach");
+	if (!ASSERT_GE(cg, 0, "join_cgroup /preorder_prog_attach_detach"))
+		return;
+
+	skel = cgroup_mprog__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.getsockopt_1);
+	fd2 = bpf_program__fd(skel->progs.getsockopt_2);
+	fd3 = bpf_program__fd(skel->progs.getsockopt_3);
+	fd4 = bpf_program__fd(skel->progs.getsockopt_4);
+
+	assert_mprog_count(cg, atype, 0);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI,
+		.expected_revision = 1,
+	);
+
+	/* ordering: [fd1] */
+	err = bpf_prog_attach_opts(fd1, cg, atype, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(cg, atype, 1);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI | BPF_F_PREORDER,
+		.expected_revision = 2,
+	);
+
+	/* ordering: [fd1, fd2] */
+	err = bpf_prog_attach_opts(fd2, cg, atype, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup1;
+
+	assert_mprog_count(cg, atype, 2);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI | BPF_F_AFTER,
+		.relative_fd = fd2,
+		.expected_revision = 3,
+	);
+
+	err = bpf_prog_attach_opts(fd3, cg, atype, &opta);
+	if (!ASSERT_EQ(err, -EINVAL, "prog_attach"))
+		goto cleanup2;
+
+	assert_mprog_count(cg, atype, 2);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI | BPF_F_AFTER | BPF_F_PREORDER,
+		.relative_fd = fd2,
+		.expected_revision = 3,
+	);
+
+	/* ordering: [fd1, fd2, fd3] */
+	err = bpf_prog_attach_opts(fd3, cg, atype, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup2;
+
+	assert_mprog_count(cg, atype, 3);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI,
+		.expected_revision = 4,
+	);
+
+	/* ordering: [fd2, fd3, fd1, fd4] */
+	err = bpf_prog_attach_opts(fd4, cg, atype, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup3;
+
+	assert_mprog_count(cg, atype, 4);
+
+	err = bpf_prog_detach_opts(fd4, cg, atype, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(cg, atype, 3);
+
+cleanup3:
+	err = bpf_prog_detach_opts(fd3, cg, atype, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(cg, atype, 2);
+
+cleanup2:
+	err = bpf_prog_detach_opts(fd2, cg, atype, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(cg, atype, 1);
+
+cleanup1:
+	err = bpf_prog_detach_opts(fd1, cg, atype, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(cg, atype, 0);
+
+cleanup:
+	cgroup_mprog__destroy(skel);
+	close(cg);
+}
+
+static void test_preorder_link_attach_detach(int atype)
+{
+	LIBBPF_OPTS(bpf_cgroup_opts, opta);
+	struct bpf_link *link1, *link2, *link3, *link4;
+	struct cgroup_mprog *skel;
+	__u32 fd2;
+	int cg;
+
+	cg = test__join_cgroup("/preorder_link_attach_detach");
+	if (!ASSERT_GE(cg, 0, "join_cgroup /preorder_link_attach_detach"))
+		return;
+
+	skel = cgroup_mprog__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd2 = bpf_program__fd(skel->progs.getsockopt_2);
+
+	assert_mprog_count(cg, atype, 0);
+
+	LIBBPF_OPTS_RESET(opta,
+		.expected_revision = 1,
+	);
+
+	/* ordering: [fd1] */
+	link1 = bpf_program__attach_cgroup_opts(skel->progs.getsockopt_1, cg, &opta);
+	if (!ASSERT_OK_PTR(link1, "link_attach"))
+		goto cleanup;
+
+	assert_mprog_count(cg, atype, 1);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_PREORDER,
+		.expected_revision = 2,
+	);
+
+	/* ordering: [fd1, fd2] */
+	link2 = bpf_program__attach_cgroup_opts(skel->progs.getsockopt_2, cg, &opta);
+	if (!ASSERT_OK_PTR(link2, "link_attach"))
+		goto cleanup1;
+
+	assert_mprog_count(cg, atype, 2);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_AFTER,
+		.relative_fd = fd2,
+		.expected_revision = 3,
+	);
+
+	link3 = bpf_program__attach_cgroup_opts(skel->progs.getsockopt_3, cg, &opta);
+	if (!ASSERT_ERR_PTR(link3, "link_attach"))
+		goto cleanup2;
+
+	assert_mprog_count(cg, atype, 2);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_AFTER | BPF_F_PREORDER,
+		.relative_fd = fd2,
+		.expected_revision = 3,
+	);
+
+	/* ordering: [fd1, fd2, fd3] */
+	link3 = bpf_program__attach_cgroup_opts(skel->progs.getsockopt_3, cg, &opta);
+	if (!ASSERT_OK_PTR(link3, "link_attach"))
+		goto cleanup2;
+
+	assert_mprog_count(cg, atype, 3);
+
+	LIBBPF_OPTS_RESET(opta,
+		.expected_revision = 4,
+	);
+
+	/* ordering: [fd2, fd3, fd1, fd4] */
+	link4 = bpf_program__attach_cgroup_opts(skel->progs.getsockopt_4, cg, &opta);
+	if (!ASSERT_OK_PTR(link4, "link_attach"))
+		goto cleanup3;
+
+	assert_mprog_count(cg, atype, 4);
+
+	bpf_link__destroy(link4);
+	assert_mprog_count(cg, atype, 3);
+
+cleanup3:
+	bpf_link__destroy(link3);
+	assert_mprog_count(cg, atype, 2);
+
+cleanup2:
+	bpf_link__destroy(link2);
+	assert_mprog_count(cg, atype, 1);
+
+cleanup1:
+	bpf_link__destroy(link1);
+	assert_mprog_count(cg, atype, 0);
+
+cleanup:
+	cgroup_mprog__destroy(skel);
+	close(cg);
+}
+
+static void test_invalid_attach_detach(int atype)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	__u32 fd1, fd2, id2;
+	struct cgroup_mprog *skel;
+	int cg, err;
+
+	cg = test__join_cgroup("/invalid_attach_detach");
+	if (!ASSERT_GE(cg, 0, "join_cgroup /invalid_attach_detach"))
+		return;
+
+	skel = cgroup_mprog__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.getsockopt_1);
+	fd2 = bpf_program__fd(skel->progs.getsockopt_2);
+
+	id2 = id_from_prog_fd(fd2);
+
+	assert_mprog_count(cg, atype, 0);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI | BPF_F_BEFORE | BPF_F_AFTER,
+	);
+
+	err = bpf_prog_attach_opts(fd1, cg, atype, &opta);
+	ASSERT_EQ(err, -EINVAL, "prog_attach");
+	assert_mprog_count(cg, atype, 0);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI | BPF_F_BEFORE | BPF_F_ID,
+	);
+
+	err = bpf_prog_attach_opts(fd1, cg, atype, &opta);
+	ASSERT_EQ(err, -ENOENT, "prog_attach");
+	assert_mprog_count(cg, atype, 0);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI | BPF_F_AFTER | BPF_F_ID,
+	);
+
+	err = bpf_prog_attach_opts(fd1, cg, atype, &opta);
+	ASSERT_EQ(err, -ENOENT, "prog_attach");
+	assert_mprog_count(cg, atype, 0);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI | BPF_F_BEFORE | BPF_F_AFTER,
+		.relative_id = id2,
+	);
+
+	err = bpf_prog_attach_opts(fd1, cg, atype, &opta);
+	ASSERT_EQ(err, -EINVAL, "prog_attach");
+	assert_mprog_count(cg, atype, 0);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI | BPF_F_ID,
+		.relative_id = id2,
+	);
+
+	err = bpf_prog_attach_opts(fd1, cg, atype, &opta);
+	ASSERT_EQ(err, -EINVAL, "prog_attach");
+	assert_mprog_count(cg, atype, 0);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI | BPF_F_BEFORE,
+		.relative_fd = fd1,
+	);
+
+	err = bpf_prog_attach_opts(fd1, cg, atype, &opta);
+	ASSERT_EQ(err, -ENOENT, "prog_attach");
+	assert_mprog_count(cg, atype, 0);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI | BPF_F_AFTER,
+		.relative_fd = fd1,
+	);
+
+	err = bpf_prog_attach_opts(fd1, cg, atype, &opta);
+	ASSERT_EQ(err, -ENOENT, "prog_attach");
+	assert_mprog_count(cg, atype, 0);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI,
+	);
+
+	err = bpf_prog_attach_opts(fd1, cg, atype, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+	assert_mprog_count(cg, atype, 1);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI | BPF_F_AFTER,
+	);
+
+	err = bpf_prog_attach_opts(fd1, cg, atype, &opta);
+	ASSERT_EQ(err, -EINVAL, "prog_attach");
+	assert_mprog_count(cg, atype, 1);
+
+	LIBBPF_OPTS_RESET(opta,
+		.flags = BPF_F_ALLOW_MULTI | BPF_F_REPLACE | BPF_F_AFTER,
+		.replace_prog_fd = fd1,
+	);
+
+	err = bpf_prog_attach_opts(fd1, cg, atype, &opta);
+	ASSERT_EQ(err, -EINVAL, "prog_attach");
+	assert_mprog_count(cg, atype, 1);
+cleanup:
+	cgroup_mprog__destroy(skel);
+	close(cg);
+}
+
+void test_cgroup_mprog_opts(void)
+{
+	if (test__start_subtest("prog_attach_detach"))
+		test_prog_attach_detach(BPF_CGROUP_GETSOCKOPT);
+	if (test__start_subtest("link_attach_detach"))
+		test_link_attach_detach(BPF_CGROUP_GETSOCKOPT);
+	if (test__start_subtest("mix_attach_detach"))
+		test_mix_attach_detach(BPF_CGROUP_GETSOCKOPT);
+	if (test__start_subtest("preorder_prog_attach_detach"))
+		test_preorder_prog_attach_detach(BPF_CGROUP_GETSOCKOPT);
+	if (test__start_subtest("preorder_link_attach_detach"))
+		test_preorder_link_attach_detach(BPF_CGROUP_GETSOCKOPT);
+	if (test__start_subtest("invalid_attach_detach"))
+		test_invalid_attach_detach(BPF_CGROUP_GETSOCKOPT);
+}
diff --git a/tools/testing/selftests/bpf/prog_tests/cgroup_mprog_ordering.c b/tools/testing/selftests/bpf/prog_tests/cgroup_mprog_ordering.c
new file mode 100644
index 000000000000..4a4e9710b474
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/cgroup_mprog_ordering.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+#include <test_progs.h>
+#include "cgroup_helpers.h"
+#include "cgroup_preorder.skel.h"
+
+static int run_getsockopt_test(int cg_parent, int sock_fd, bool has_relative_fd)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opts);
+	enum bpf_attach_type prog_p_atype, prog_p2_atype;
+	int prog_p_fd, prog_p2_fd;
+	struct cgroup_preorder *skel = NULL;
+	struct bpf_program *prog;
+	__u8 *result, buf;
+	socklen_t optlen = sizeof(buf);
+	int err = 0;
+
+	skel = cgroup_preorder__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "cgroup_preorder__open_and_load"))
+		return 0;
+
+	LIBBPF_OPTS_RESET(opts);
+	opts.flags = BPF_F_ALLOW_MULTI;
+	prog = skel->progs.parent;
+	prog_p_fd = bpf_program__fd(prog);
+	prog_p_atype = bpf_program__expected_attach_type(prog);
+	err = bpf_prog_attach_opts(prog_p_fd, cg_parent, prog_p_atype, &opts);
+	if (!ASSERT_OK(err, "bpf_prog_attach_opts-parent"))
+		goto close_skel;
+
+	opts.flags = BPF_F_ALLOW_MULTI | BPF_F_BEFORE;
+	if (has_relative_fd)
+		opts.relative_fd = prog_p_fd;
+	prog = skel->progs.parent_2;
+	prog_p2_fd = bpf_program__fd(prog);
+	prog_p2_atype = bpf_program__expected_attach_type(prog);
+	err = bpf_prog_attach_opts(prog_p2_fd, cg_parent, prog_p2_atype, &opts);
+	if (!ASSERT_OK(err, "bpf_prog_attach_opts-parent_2"))
+		goto detach_parent;
+
+	err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
+	if (!ASSERT_OK(err, "getsockopt"))
+		goto detach_parent_2;
+
+	result = skel->bss->result;
+	ASSERT_TRUE(result[0] == 4 && result[1] == 3, "result values");
+
+detach_parent_2:
+	ASSERT_OK(bpf_prog_detach2(prog_p2_fd, cg_parent, prog_p2_atype),
+		  "bpf_prog_detach2-parent_2");
+detach_parent:
+	ASSERT_OK(bpf_prog_detach2(prog_p_fd, cg_parent, prog_p_atype),
+		  "bpf_prog_detach2-parent");
+close_skel:
+	cgroup_preorder__destroy(skel);
+	return err;
+}
+
+void test_cgroup_mprog_ordering(void)
+{
+	int cg_parent = -1, sock_fd = -1;
+
+	cg_parent = test__join_cgroup("/parent");
+	if (!ASSERT_GE(cg_parent, 0, "join_cgroup /parent"))
+		goto out;
+
+	sock_fd = socket(AF_INET, SOCK_STREAM, 0);
+	if (!ASSERT_GE(sock_fd, 0, "socket"))
+		goto out;
+
+	ASSERT_OK(run_getsockopt_test(cg_parent, sock_fd, false), "getsockopt_test_1");
+	ASSERT_OK(run_getsockopt_test(cg_parent, sock_fd, true), "getsockopt_test_2");
+
+out:
+	close(sock_fd);
+	close(cg_parent);
+}
diff --git a/tools/testing/selftests/bpf/progs/cgroup_mprog.c b/tools/testing/selftests/bpf/progs/cgroup_mprog.c
new file mode 100644
index 000000000000..6a0ea02c4de2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/cgroup_mprog.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+SEC("cgroup/getsockopt")
+int getsockopt_1(struct bpf_sockopt *ctx)
+{
+	return 1;
+}
+
+SEC("cgroup/getsockopt")
+int getsockopt_2(struct bpf_sockopt *ctx)
+{
+	return 1;
+}
+
+SEC("cgroup/getsockopt")
+int getsockopt_3(struct bpf_sockopt *ctx)
+{
+	return 1;
+}
+
+SEC("cgroup/getsockopt")
+int getsockopt_4(struct bpf_sockopt *ctx)
+{
+	return 1;
+}
-- 
2.47.1



* Re: [RFC PATCH bpf-next 0/4] bpf: Implement mprog API on top of existing cgroup progs
  2025-04-11  1:15 [RFC PATCH bpf-next 0/4] bpf: Implement mprog API on top of existing cgroup progs Yonghong Song
                   ` (3 preceding siblings ...)
  2025-04-11  1:15 ` [RFC PATCH bpf-next 4/4] selftests/bpf: Add two selftests for mprog API based cgroup progs Yonghong Song
@ 2025-04-23 23:20 ` Andrii Nakryiko
  4 siblings, 0 replies; 13+ messages in thread
From: Andrii Nakryiko @ 2025-04-23 23:20 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

On Thu, Apr 10, 2025 at 6:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> Current cgroup prog ordering is appending at attachment time. This is not
> ideal. In some cases, users want specific ordering at a particular cgroup
> level. For example, in Meta, we have a case where three different
> applications all have cgroup/setsockopt progs and they require specific
> ordering. Current approach is to use a bpfchainer where one bpf prog
> contains multiple global functions and each global function can be
> freplaced by a prog for a specific application. The ordering of global
> functions decides the ordering of those application specific bpf progs.
> Using bpftrainer is a centralized approach and is not desirable as

typo: bpfchainer

> one of applications acts as a deamon. The decentralized attachment

typo: daemon


> approach is more favorable for those applications.
>
> To address this, the existing mprog API ([2]) seems an ideal solution with
> supporting BPF_F_BEFORE and BPF_F_AFTER flags on top of existing cgroup
> bpf implementation. More specifically, the support is added for prog/link
> attachment with BPF_F_BEFORE and BPF_F_AFTER. The kernel mprog
> interface ([2]) is not used and the implementation is directly done in
> cgroup bpf code base. The mprog 'revision' is also implemented in
> attach/detach/replace, so users can query revision number to check the
> change of cgroup prog list.
>
> The patch set contains 4 patches. Patch 1 adds revision support for
> cgroup bpf progs. Patch 2 implements mprog API implementation for
> prog/link attach and revision update. Patch 3 adds a new libbpf
> API to do cgroup link attach with flags like BPF_F_BEFORE/BPF_F_AFTER.
> Patch 4 adds two tests to validate the implementation.
>
>   [1] https://lore.kernel.org/r/20250224230116.283071-1-yonghong.song@linux.dev
>   [2] https://lore.kernel.org/r/20230719140858.13224-2-daniel@iogearbox.net
>
> Yonghong Song (4):
>   cgroup: Add bpf prog revisions to struct cgroup_bpf
>   bpf: Implement mprog API on top of existing cgroup progs
>   libbpf: Support link-based cgroup attach with options
>   selftests/bpf: Add two selftests for mprog API based cgroup progs
>
>  include/linux/bpf-cgroup-defs.h               |   1 +
>  include/uapi/linux/bpf.h                      |   7 +
>  kernel/bpf/cgroup.c                           | 151 +++-
>  kernel/bpf/syscall.c                          |  58 +-
>  kernel/cgroup/cgroup.c                        |   5 +-
>  tools/include/uapi/linux/bpf.h                |   7 +
>  tools/lib/bpf/bpf.c                           |  44 +
>  tools/lib/bpf/bpf.h                           |   5 +
>  tools/lib/bpf/libbpf.c                        |  28 +
>  tools/lib/bpf/libbpf.h                        |  15 +
>  tools/lib/bpf/libbpf.map                      |   1 +
>  .../bpf/prog_tests/cgroup_mprog_opts.c        | 752 ++++++++++++++++++
>  .../bpf/prog_tests/cgroup_mprog_ordering.c    |  77 ++
>  .../selftests/bpf/progs/cgroup_mprog.c        |  30 +
>  14 files changed, 1138 insertions(+), 43 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_mprog_opts.c
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_mprog_ordering.c
>  create mode 100644 tools/testing/selftests/bpf/progs/cgroup_mprog.c
>
> --
> 2.47.1
>
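
The attach semantics described in the cover letter can be illustrated with a
small userspace model. This is only a sketch, not the code from this series:
every name in it (model_list, model_attach, MODEL_F_*) is hypothetical. The
starting revision of 1 and the -EINVAL/-ENOENT cases follow what the selftests
in patch 4 expect; returning -ESTALE on a revision mismatch is an assumption
borrowed from the existing mprog code, and the fixed-size array with -E2BIG is
a limit of the model only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy userspace model; every name here is hypothetical, not kernel code. */
enum { MODEL_F_BEFORE = 1 << 0, MODEL_F_AFTER = 1 << 1 };
#define MODEL_MAX_PROGS 8

struct model_list {
	int ids[MODEL_MAX_PROGS];	/* attached prog ids, in run order */
	size_t count;
	unsigned long long revision;	/* bumped on each successful attach */
};

static void model_init(struct model_list *l)
{
	l->count = 0;
	l->revision = 1;	/* the selftests expect revision 1 on an empty list */
}

/* Return the index of id in the list, or -1 if it is not attached. */
static int model_find(const struct model_list *l, int id)
{
	for (size_t i = 0; i < l->count; i++)
		if (l->ids[i] == id)
			return (int)i;
	return -1;
}

/*
 * relative_id == 0 means "no anchor"; expected_revision == 0 skips the check.
 * -ESTALE on a revision mismatch is an assumption borrowed from mprog.
 */
static int model_attach(struct model_list *l, int id, unsigned int flags,
			int relative_id, unsigned long long expected_revision)
{
	size_t pos;

	if (expected_revision && expected_revision != l->revision)
		return -ESTALE;
	if ((flags & MODEL_F_BEFORE) && (flags & MODEL_F_AFTER))
		return -EINVAL;
	if (l->count == MODEL_MAX_PROGS)
		return -E2BIG;	/* model-only limit */

	if (relative_id) {
		int idx = model_find(l, relative_id);

		if (!(flags & (MODEL_F_BEFORE | MODEL_F_AFTER)))
			return -EINVAL;	/* anchor without a direction */
		if (idx < 0)
			return -ENOENT;	/* anchor is not attached */
		pos = (flags & MODEL_F_BEFORE) ? (size_t)idx : (size_t)idx + 1;
	} else {
		/* No anchor: BEFORE means head, otherwise append. */
		pos = (flags & MODEL_F_BEFORE) ? 0 : l->count;
	}

	assert(pos <= l->count);
	/* Shift the tail right by one slot and insert at pos. */
	for (size_t i = l->count; i > pos; i--)
		l->ids[i] = l->ids[i - 1];
	l->ids[pos] = id;
	l->count++;
	l->revision++;
	return 0;
}
```

Starting from an empty list, attaching id 1, then id 2 with MODEL_F_BEFORE
relative to 1, then id 3 with MODEL_F_AFTER relative to 2 leaves the order
[2, 3, 1] and the revision at 4. A caller that passes a stale expected
revision gets an error and the list is left unchanged.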


end of thread, other threads:[~2025-05-08  4:42 UTC | newest]

Thread overview: 13+ messages
2025-04-11  1:15 [RFC PATCH bpf-next 0/4] bpf: Implement mprog API on top of existing cgroup progs Yonghong Song
2025-04-11  1:15 ` [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf Yonghong Song
2025-04-12  0:41   ` kernel test robot
2025-05-08  4:18     ` Yonghong Song
2025-04-12  1:13   ` kernel test robot
2025-04-23 23:19   ` Andrii Nakryiko
2025-05-08  4:19     ` Yonghong Song
2025-04-11  1:15 ` [RFC PATCH bpf-next 2/4] bpf: Implement mprog API on top of existing cgroup progs Yonghong Song
2025-04-23 23:19   ` Andrii Nakryiko
2025-05-08  4:42     ` Yonghong Song
2025-04-11  1:15 ` [RFC PATCH bpf-next 3/4] libbpf: Support link-based cgroup attach with options Yonghong Song
2025-04-11  1:15 ` [RFC PATCH bpf-next 4/4] selftests/bpf: Add two selftests for mprog API based cgroup progs Yonghong Song
2025-04-23 23:20 ` [RFC PATCH bpf-next 0/4] bpf: Implement mprog API on top of existing " Andrii Nakryiko
