public inbox for cgroups@vger.kernel.org
* Using cgroup membership for resource access control?
From: Tony Luck @ 2023-02-06 21:21 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner; +Cc: cgroups@vger.kernel.org, Ramesh Thomas

Hi,

Cgroups' primary function seems to be to divide limited resources and
make sure that they are allocated "fairly" (where the sysadmin decides
what is fair, and how much of each resource should be made available to
groups of processes).

Intel has a h/w feature in the DSA (Data Streaming Accelerator) device
that will allow a process to offer access to bounded virtual windows
into its address space to other processes.

The case where one process wants to make this offer to just one other
process seems simple.

But the h/w allows, and a process might want, to offer a virtual window
to several other processes. As soon as anyone says the words "several
processes" the immediate thought is "can cgroups help with this?"

I'm thinking along these lines:

1) Sysadmin creates a cgroup for a "job". Initializes the limits on
how many of these virtual windows can be used (h/w has a fixed number).
Assigns tasks in the job to this cgroup.

2) Tasks in the job that want to offer virtual windows call into the
driver to allocate and partially set up windows tagged with "available
to any other process in my cgroup".

3) Other tasks in the group ask the driver to complete the h/w
initialization by adding them (their PASID) to the access list
for each window.
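
The three steps above can be modelled in a few lines. This is only a toy
sketch of the proposed policy, not the DSA driver's interface: the names
`ToyDriver`, `offer` and `attach`, and the use of a cgroup path string as
the tag, are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Window:
    owner_pid: int
    cgroup: str                      # cgroup of the owner when the window was offered
    pasids: set = field(default_factory=set)


class ToyDriver:
    """Illustrative model of the proposal; every name here is hypothetical."""

    def __init__(self, limits):
        # Step 1: the sysadmin sets a per-cgroup limit on windows.
        self.limits = dict(limits)
        self.windows = []

    def offer(self, pid, cgroup):
        # Step 2: allocate a window tagged with the owner's cgroup.
        used = sum(1 for w in self.windows if w.cgroup == cgroup)
        if used >= self.limits.get(cgroup, 0):
            raise RuntimeError("window limit reached for " + cgroup)
        window = Window(pid, cgroup)
        self.windows.append(window)
        return window

    def attach(self, window, cgroup, pasid):
        # Step 3: a peer asks to be added; only same-cgroup peers are accepted.
        if cgroup != window.cgroup:
            raise PermissionError("requester is outside the owner's cgroup")
        window.pasids.add(pasid)
```

Note that the check in `attach` is an exact match on the cgroup, so it
ignores hierarchy entirely.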

My questions:

1) Is this horrible - have I misunderstood cgroups?
	1a) If this is horrible, can it be rescued?

2) Will it work - is "membership in a cgroup" a valid security mechanism?

3) Has someone done something similar before (so I can learn from their code)?

4) Is there an existing exported API to help? I see task_cgroup_path()
which looks generally helpful (though I'd prefer a task_cgroup() that
just takes a task and gives me the cgroup to which it belongs.)

Thanks

-Tony


* Re: Using cgroup membership for resource access control?
From: Tejun Heo @ 2023-02-06 21:42 UTC (permalink / raw)
  To: Tony Luck; +Cc: Johannes Weiner, cgroups@vger.kernel.org, Ramesh Thomas

Hello,

On Mon, Feb 06, 2023 at 01:21:05PM -0800, Tony Luck wrote:
...
> I'm thinking along these lines:
> 
> 1) Sysadmin creates a cgroup for a "job". Initializes the limits on
> how many of these virtual windows can be used (h/w has a fixed number).
> Assigns tasks in the job to this cgroup.
> 
> 2) Tasks in the job that want to offer virtual windows call into the
> driver to allocate and partially set up windows tagged with "available
> to any other process in my cgroup".
> 
> 3) Other tasks in the group ask the driver to complete the h/w
> initialization by adding them (their PASID) to the access list
> for each window.
> 
> My questions:
> 
> 1) Is this horrible - have I misunderstood cgroups?
> 	1a) If this is horrible, can it be rescued?

I'm not sure whether I fully understand what the virtual windows are but
assuming that it's a hardware resource that's limited in number and if
you're okay with tying it to cgroup hierarchy, I think it can just be one of
the resources that are managed by the misc controller.

Tying the access domain to the cgroup is likely the challenging part. The
cgroup hierarchy is primarily composed according to the innate usage
hierarchy of the system and this might not match what you need. I can't
tell.

> 2) Will it work - is "membership in a cgroup" a valid security mechanism?

Again, I can't tell without knowing about the specifics but it gets used
that way in some cases - e.g. devcg in cgroup1 or the BPF counterpart in
cgroup2 are used to enforce raw device access restrictions based on cgroup
membership.

> 3) Has someone done something similar before (so I can learn from their code)?

For limiting the number available to a part of a system, see the misc
controller: kernel/cgroup/misc.c.
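
From userspace, the misc controller exposes per-cgroup files such as
misc.max and misc.current, each holding "name value" lines where a limit may
be the word "max". A minimal sketch of reading that format follows; the
resource name `dsa_window` is made up for illustration, and the check looks
at a single cgroup only, whereas the controller itself charges up the
hierarchy.

```python
import math


def parse_misc(text):
    """Parse 'name value' lines as found in misc.max or misc.current."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, value = line.split()
        table[name] = math.inf if value == "max" else int(value)
    return table


def can_allocate(current, limit, name, count=1):
    """Single-level check only; ancestors' limits are not consulted here."""
    return current.get(name, 0) + count <= limit.get(name, math.inf)
```

For example, with a limit of 4 and 3 in use, one more `dsa_window` fits and
two more do not.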

> 4) Is there an existing exported API to help. I see task_cgroup_path()
> which looks generally helpful (though I'd prefer a task_cgroup() that
> just takes a task and gives me the cgroup to which it belongs.)

task_dfl_cgroup(). However, please note that everything in cgroup is
supposed to be hierarchical, so that's something to keep in mind when
defining an external security mechanism based on cgroup membership.
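
To make the hierarchy point concrete, here is a userspace sketch (not kernel
code) of a membership test that treats a task in a descendant cgroup as a
member of the ancestor. It assumes the cgroup2 entry in /proc/<pid>/cgroup
has the form "0::/path"; the helper names are hypothetical.

```python
def cgroup2_path(proc_cgroup_text):
    """Return the cgroup2 path from the text of /proc/<pid>/cgroup, or None."""
    for line in proc_cgroup_text.splitlines():
        hier_id, controllers, path = line.split(":", 2)
        if hier_id == "0" and controllers == "":
            return path
    return None


def is_within(path, ancestor):
    """True if `path` is `ancestor` itself or lies anywhere beneath it."""
    if ancestor == "/":
        return True
    return path == ancestor or path.startswith(ancestor.rstrip("/") + "/")
```

With this definition, a task in /job/worker1 counts as inside /job, while a
task in /jobs/x does not.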

The flip side is that on vast majority of configurations, cgroup hierarchy
more or less coincides with process tree which has the benefit of being
available regardless of cgroups, so in a lot of cases, it can be better to
just go the traditional way and tie these things to the process tree.

Thanks.

-- 
tejun


* Re: Using cgroup membership for resource access control?
From: Tejun Heo @ 2023-02-06 21:43 UTC (permalink / raw)
  To: Tony Luck; +Cc: Johannes Weiner, cgroups@vger.kernel.org, Ramesh Thomas

On Mon, Feb 06, 2023 at 11:42:12AM -1000, Tejun Heo wrote:
> The flip side is that on vast majority of configurations, cgroup hierarchy
> more or less coincides with process tree which has the benefit of being
> available regardless of cgroups, so in a lot of cases, it can be better to
> just go the traditional way and tie these things to the process tree.

In case it wasn't clear - use the misc controller to restrict which cgroups
can get how many but as for sharing domain, use more traditional mechanisms
whether that's sharing through cloning, fd passing, shared path with perm
checks or whatever.
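
Of the traditional mechanisms listed, fd passing is the least obvious, so a
short sketch may help. It sends a descriptor over a Unix-domain socket with
`socket.send_fds` / `socket.recv_fds` (Python 3.9 or later, which wrap
SCM_RIGHTS). A pipe stands in for whatever handle a driver would return, and
both socket ends live in one process purely to keep the example
self-contained; both are assumptions of the sketch.

```python
import os
import socket


def send_handle(sock, fd):
    # One byte of payload is needed to carry the ancillary SCM_RIGHTS data.
    socket.send_fds(sock, [b"w"], [fd])


def recv_handle(sock):
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    return fds[0]


def demo():
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()                 # stand-in for a shareable handle
    try:
        send_handle(a, w)
        w2 = recv_handle(b)          # a new descriptor for the same pipe end
        os.write(w2, b"hello")
        os.close(w2)
        return os.read(r, 5)
    finally:
        os.close(r)
        os.close(w)
        a.close()
        b.close()
```

The receiver gets its own descriptor number referring to the same open
file, so access follows whoever was handed the descriptor.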

Thanks.

-- 
tejun


* RE: Using cgroup membership for resource access control?
From: Luck, Tony @ 2023-02-06 22:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Johannes Weiner, cgroups@vger.kernel.org,
	Thomas, Ramesh

Tejun,

Thanks for the quick response.

> In case it wasn't clear - use the misc controller to restrict which cgroups
> can get how many but as for sharing domain, use more traditional mechanisms
> whether that's sharing through cloning, fd passing, shared path with perm
> checks or whatever.

That's always an option, but it feels like a lot of setup complexity, and
I'd like to explore ways to avoid it.

Some extra details of a workload that will use these shared virtual windows.

Imagine some AI training application with one process running per core on
a server with a hundred or so cores. Each of these processes wants periodically
to share work so far on a subset of the problem with one or more other processes.
The "virtual windows" allow an accelerator device to copy data between a region
in the source process (the owner of the virtual window) and another process that
needs to access/supply updates.

Process tree is easy if the test is just "do these two tasks have the same getppid()?"
Seems harder if the process tree is more complex and I want "Are these two processes
both descended from a particular common ancestor?"
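
For reference, the common-ancestor test can be sketched from userspace by
walking parent pids. This is an illustration only: it reads
/proc/<pid>/stat on Linux, treats a process as its own descendant, and is
racy (pids can be reused, and a process is reparented when its parent
exits). The function names are hypothetical.

```python
def read_ppid(pid):
    """Parent pid from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # comm is parenthesised and may contain spaces, so split after the last ')'.
    fields = stat[stat.rindex(")") + 2:].split()
    return int(fields[1])            # fields[0] is the state, fields[1] the ppid


def is_descendant(pid, ancestor, ppid_of=read_ppid):
    """Walk up the parent chain from `pid` until `ancestor` or pid 0."""
    while pid > 0:
        if pid == ancestor:
            return True
        pid = ppid_of(pid)
    return False


def share_ancestor(a, b, ancestor, ppid_of=read_ppid):
    return (is_descendant(a, ancestor, ppid_of)
            and is_descendant(b, ancestor, ppid_of))
```

The `ppid_of` parameter exists only so the walk can be exercised against a
made-up process tree instead of the live /proc.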

Using fd passing would involve an O(N^2) step where each process talks to each
other process in turn to complete a link in the mesh of connections. This would need
to be repeated if additional processes are started.
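
To put a number on the O(N^2) step: a full mesh needs one exchange per
unordered pair of processes. The helper below is just that arithmetic.

```python
def pairwise_links(n):
    """Number of unordered pairs among n processes, i.e. links in a full mesh."""
    return n * (n - 1) // 2


def extra_links_for_newcomer(n):
    """Exchanges needed when one more process joins n existing ones."""
    return n
```

For 100 processes that is 4950 exchanges, and each later arrival adds one
exchange per process already running.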

It would be much nicer to have an operation that matches what the applications
want to do, namely "I want to broadcast-share this with all my peers".

[N.B. I've suggested that these folks should just re-write their applications to
simply attach to a giant blob of shared memory, and thus avoid all of this. But
that doesn't fit for various reasons]

-Tony


* Re: Using cgroup membership for resource access control?
From: Tejun Heo @ 2023-02-06 23:30 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Johannes Weiner, cgroups@vger.kernel.org,
	Thomas, Ramesh

Hello,

On Mon, Feb 06, 2023 at 10:18:11PM +0000, Luck, Tony wrote:
> Imagine some AI training application with one process running per core on
> a server with a hundred or so cores. Each of these processes wants periodically
> to share work so far on a subset of the problem with one or more other processes.
> The "virtual windows" allow an accelerator device to copy data between a region
> in the source process (the owner of the virtual window) and another process that
> needs to access/supply updates.
> 
> Process tree is easy if the test is just "do these two tasks have the same getppid()?"
> Seems harder if the process tree is more complex and I want "Are these two processes
> both descended from a particular common ancestor?"
> 
> Using fd passing would involve an O(N^2) step where each process talks to each
> other process in turn to complete a link in the mesh of connections. This would need
> to be repeated if additional processes are started.

Wouldn't it be more usual for the parent to create the fd and let all the
children share through it? Even if not necessarily the parent, there can
always be a main process that can send the fd to whoever needs it.
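
A minimal sketch of the parent-creates-the-fd pattern: the parent opens the
descriptor once and every forked child inherits it, with no per-pair
exchange. A pipe stands in for the shared handle, which is an assumption of
the example.

```python
import os


def run_children(n):
    r, w = os.pipe()                 # created once by the parent
    pids = []
    for i in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: `w` was inherited across fork, no message exchange needed.
            try:
                os.close(r)
                os.write(w, bytes([i]))
            finally:
                os._exit(0)
        pids.append(pid)
    os.close(w)
    for pid in pids:
        os.waitpid(pid, 0)
    data = os.read(r, n)             # each child wrote exactly one byte
    os.close(r)
    return sorted(data)
```

Processes started later, outside this fork loop, would still need the
descriptor sent to them by the main process.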

> It would be much nicer to have an operation that matches what the applications
> want to do, namely "I want to broadcast-share this with all my peers".
>
> [N.B. I've suggested that these folks should just re-write their applications to
> simply attach to a giant blob of shared memory, and thus avoid all of this. But
> that doesn't fit for various reasons]

I'm not sure it'd be a good idea to introduce a whole new mode of access
control for this when it's something which can be addressed with more
conventional mechanisms. Maybe it's a bit more upfront work but one-off
security / naming mechanism feels like they'd have a reasonable chance to
cause long term headaches.

Thanks.

-- 
tejun


* RE: Using cgroup membership for resource access control?
From: Luck, Tony @ 2023-02-07  0:48 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Johannes Weiner, cgroups@vger.kernel.org,
	Thomas, Ramesh

> I'm not sure it'd be a good idea to introduce a whole new mode of access
> control for this when it's something which can be addressed with more
> conventional mechanisms. Maybe it's a bit more upfront work but one-off
> security / naming mechanism feels like they'd have a reasonable chance to
> cause long term headaches.

Being the first to do something is always risky. But thanks for looking at my ideas.
I'm going to take your advice and try to build on existing mechanisms instead
of creating a new thing.

-Tony
