* [PATCH v6 0/6] Proposal for a GPU cgroup controller
@ 2022-05-02 23:19 T.J. Mercier
2022-05-02 23:19 ` [PATCH v6 1/6] gpu: rfc: " T.J. Mercier
0 siblings, 1 reply; 6+ messages in thread
From: T.J. Mercier @ 2022-05-02 23:19 UTC (permalink / raw)
To: tjmercier, Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
Martijn Coenen, Joel Fernandes, Christian Brauner,
Hridya Valsaraju, Suren Baghdasaryan, Sumit Semwal,
Christian König, Benjamin Gaignard, Liam Mark, Laura Abbott,
Brian Starkey, John Stultz, Shuah Khan
Cc: daniel, jstultz, cmllamas, kaleshsingh, Kenny.Ho, mkoutny, skhan,
kernel-team, cgroups, linux-doc, linux-kernel, linux-media,
dri-devel, linaro-mm-sig, linux-kselftest
This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
include resource limit enforcement.
Changelog:
v6:
Move documentation into cgroup-v2.rst per Tejun Heo.
Rename BINDER_FD{A}_FLAG_SENDER_NO_NEED ->
BINDER_FD{A}_FLAG_XFER_CHARGE per Carlos Llamas.
Return error on transfer failure per Carlos Llamas.
v5:
Rebase on top of v5.18-rc3
Drop the global GPU cgroup "total" (sum of all device totals) portion
of the design since there is no currently known use for this per
Tejun Heo.
Fix commit message which still contained the old name for
dma_buf_transfer_charge per Michal Koutný.
Remove all GPU cgroup code except what's necessary to support charge transfer
from dma_buf. Previously charging was done in export, but for non-Android
graphics use-cases this is not ideal since there may be a delay between
allocation and export, during which time there is no accounting.
Merge dmabuf: Use the GPU cgroup charge/uncharge APIs patch into
dmabuf: heaps: export system_heap buffers with GPU cgroup charging as a
result of above.
Put the charge and uncharge code in the same file (system_heap_allocate,
system_heap_dma_buf_release) instead of splitting them between the heap and
the dma_buf_release. This avoids asymmetric management of the gpucg charges.
Modify the dma_buf_transfer_charge API to accept a task_struct instead
of a gpucg. This avoids requiring the caller to manage the refcount
of the gpucg upon failure and confusing ownership transfer logic.
Support all strings for gpucg_register_bucket instead of just string
literals.
Enforce globally unique gpucg_bucket names.
Constrain gpucg_bucket name lengths to 64 bytes.
Append "-heap" to gpucg_bucket names from dmabuf-heaps.
Drop patch 7 from the series, which changed the types of
binder_transaction_data's sender_pid and sender_euid fields. This was
done in another commit here:
https://lore.kernel.org/all/20220210021129.3386083-4-masahiroy@kernel.org/
Rename:
gpucg_try_charge -> gpucg_charge
find_cg_rpool_locked -> cg_rpool_find_locked
init_cg_rpool -> cg_rpool_init
get_cg_rpool_locked -> cg_rpool_get_locked
"gpu cgroup controller" -> "GPU controller"
gpucg_device -> gpucg_bucket
usage -> size
Tests:
Support both binder_fd_array_object and binder_fd_object. This is
necessary because new versions of Android will use binder_fd_object
instead of binder_fd_array_object, and we need to support both.
Tests for both binder_fd_array_object and binder_fd_object.
For binder_utils return error codes instead of
struct binder{fs}_ctx.
Use ifdef __ANDROID__ to choose platform-dependent temp path instead
of a runtime fallback.
Ensure binderfs_mntpt ends with a trailing '/' character instead of
prepending it where used.
v4:
Skip test if not run as root per Shuah Khan
Add better test logging for abnormal child termination per Shuah Khan
Adjust ordering of charge/uncharge during transfer to avoid potentially
hitting cgroup limit per Michal Koutný
Adjust gpucg_try_charge critical section for charge transfer functionality
Fix uninitialized return code error for dmabuf_try_charge error case
v3:
Remove Upstreaming Plan from gpu-cgroup.rst per John Stultz
Use more common dual author commit message format per John Stultz
Remove android from binder changes title per Todd Kjos
Add a kselftest for this new behavior per Greg Kroah-Hartman
Include details on behavior for all combinations of kernel/userspace
versions in changelog (thanks Suren Baghdasaryan) per Greg Kroah-Hartman.
Fix pid and uid types in binder UAPI header
v2:
See the previous revision of this change submitted by Hridya Valsaraju
at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya@google.com/
Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König. Pointers to struct gpucg and struct gpucg_device
tracking the current associations were added to the dma_buf struct to
achieve this.
Fix incorrect Kconfig help section indentation per Randy Dunlap.
History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific
resources[2]. To help establish a unified memory accounting model for
all GPU and related subsystems, Daniel Vetter suggested moving it out of
the DRM subsystem so that it can be used by other DMA-BUF exporters as
well[3]. This RFC proposes an interface that follows that suggestion.
[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/
Hridya Valsaraju (3):
gpu: rfc: Proposal for a GPU cgroup controller
cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
memory
binder: Add flags to relinquish ownership of fds
T.J. Mercier (3):
dmabuf: heaps: export system_heap buffers with GPU cgroup charging
dmabuf: Add gpu cgroup charge transfer function
selftests: Add binder cgroup gpu memory transfer tests
Documentation/admin-guide/cgroup-v2.rst | 24 +
drivers/android/binder.c | 31 +-
drivers/dma-buf/dma-buf.c | 80 ++-
drivers/dma-buf/dma-heap.c | 39 ++
drivers/dma-buf/heaps/system_heap.c | 28 +-
include/linux/cgroup_gpu.h | 137 +++++
include/linux/cgroup_subsys.h | 4 +
include/linux/dma-buf.h | 49 +-
include/linux/dma-heap.h | 15 +
include/uapi/linux/android/binder.h | 23 +-
init/Kconfig | 7 +
kernel/cgroup/Makefile | 1 +
kernel/cgroup/gpu.c | 386 +++++++++++++
.../selftests/drivers/android/binder/Makefile | 8 +
.../drivers/android/binder/binder_util.c | 250 +++++++++
.../drivers/android/binder/binder_util.h | 32 ++
.../selftests/drivers/android/binder/config | 4 +
.../binder/test_dmabuf_cgroup_transfer.c | 526 ++++++++++++++++++
18 files changed, 1621 insertions(+), 23 deletions(-)
create mode 100644 include/linux/cgroup_gpu.h
create mode 100644 kernel/cgroup/gpu.c
create mode 100644 tools/testing/selftests/drivers/android/binder/Makefile
create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.c
create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.h
create mode 100644 tools/testing/selftests/drivers/android/binder/config
create mode 100644 tools/testing/selftests/drivers/android/binder/test_dmabuf_cgroup_transfer.c
--
2.36.0.464.gb9c8b46e94-goog
* [PATCH v6 1/6] gpu: rfc: Proposal for a GPU cgroup controller
2022-05-02 23:19 [PATCH v6 0/6] Proposal for a GPU cgroup controller T.J. Mercier
@ 2022-05-02 23:19 ` T.J. Mercier
2022-05-04 12:10 ` Michal Koutný
0 siblings, 1 reply; 6+ messages in thread
From: T.J. Mercier @ 2022-05-02 23:19 UTC (permalink / raw)
To: tjmercier, Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet
Cc: daniel, hridya, christian.koenig, jstultz, tkjos, cmllamas,
surenb, kaleshsingh, Kenny.Ho, mkoutny, skhan, kernel-team,
cgroups, linux-doc, linux-kernel
From: Hridya Valsaraju <hridya@google.com>
This patch adds a proposal for a new GPU cgroup controller for
accounting/limiting GPU and GPU-related memory allocations.
The proposed controller is based on the DRM cgroup controller[1] and
follows the design of the RDMA cgroup controller.
The new cgroup controller would:
* Allow setting per-device limits on the total size of buffers
allocated by a device within a cgroup.
* Expose a per-device/allocator breakdown of the buffers charged to a
cgroup.
The prototype in the following patches is only for memory accounting
using the GPU cgroup controller and does not implement limit setting.
[1]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
v6 changes
Move documentation into cgroup-v2.rst per Tejun Heo.
v5 changes
Drop the global GPU cgroup "total" (sum of all device totals) portion
of the design since there is no currently known use for this per
Tejun Heo.
Update for renamed functions/variables.
v3 changes
Remove Upstreaming Plan from gpu-cgroup.rst per John Stultz.
Use more common dual author commit message format per John Stultz.
---
Documentation/admin-guide/cgroup-v2.rst | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 69d7a6983f78..baeec096f1d8 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2352,6 +2352,30 @@ first, and stays charged to that cgroup until that resource is freed. Migrating
a process to a different cgroup does not move the charge to the destination
cgroup where the process has moved.
+
+GPU
+---
+
+The GPU controller accounts for device and system memory allocated by the GPU
+and related subsystems for graphics use. Resource limits are not currently
+supported.
+
+GPU Interface Files
+~~~~~~~~~~~~~~~~~~~~
+
+ gpu.memory.current
+ A read-only file containing memory allocations in flat-keyed format. The key
+ is a string representing the device name. The value is the size of the memory
+ charged to the device in bytes. The device names are globally unique.::
+
+ $ cat /sys/kernel/fs/cgroup1/gpu.memory.current
+ dev1 4194304
+ dev2 104857600
+
+ The device name string is set by a device driver when it registers with the
+ GPU cgroup controller to participate in resource accounting. Non-unique names
+ will be rejected at the point of registration.
+
Others
------
--
2.36.0.464.gb9c8b46e94-goog
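For readers who want to consume gpu.memory.current from userspace, the flat-keyed format described in the documentation hunk above (one "name size-in-bytes" pair per line) can be parsed along these lines. This is only a sketch and not part of the patch: the function name parse_flat_keyed is hypothetical, and the sample text simply reuses the dev1/dev2 values from the documentation example rather than output from a real system.

```python
def parse_flat_keyed(text):
    """Parse flat-keyed cgroup output ("KEY VALUE" per line) into a dict.

    Hypothetical helper for illustration; it assumes every non-empty line
    ends with an integer byte count, as in the documentation example.
    """
    usage = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split once from the right so the value is always the final field.
        name, value = line.rsplit(None, 1)
        usage[name] = int(value)
    return usage


# Sample contents copied from the documentation example in the patch.
sample = "dev1 4194304\ndev2 104857600\n"

for name, size in parse_flat_keyed(sample).items():
    print(f"{name}: {size} bytes")
```

On a real system the text would come from reading the gpu.memory.current file of the cgroup of interest; the path is left to the caller here because it depends on where cgroup2 is mounted.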
* Re: [PATCH v6 1/6] gpu: rfc: Proposal for a GPU cgroup controller
2022-05-02 23:19 ` [PATCH v6 1/6] gpu: rfc: " T.J. Mercier
@ 2022-05-04 12:10 ` Michal Koutný
2022-05-04 17:16 ` T.J. Mercier
0 siblings, 1 reply; 6+ messages in thread
From: Michal Koutný @ 2022-05-04 12:10 UTC (permalink / raw)
To: T.J. Mercier
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, daniel,
hridya, christian.koenig, jstultz, tkjos, cmllamas, surenb,
kaleshsingh, Kenny.Ho, skhan, kernel-team, cgroups, linux-doc,
linux-kernel
Hello.
On Mon, May 02, 2022 at 11:19:35PM +0000, "T.J. Mercier" <tjmercier@google.com> wrote:
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> [...]
> + The device name string is set by a device driver when it registers with the
> + GPU cgroup controller to participate in resource accounting.
Are these names available anywhere else for the user? (I.e. would
drivers add respective sysfs attributes or similar?)
> + Non-unique names will be rejected at the point of registration.
This doesn't seem relevant to the cgroupfs user, does it?
I think it should be mentioned at the respective API.
HTH,
Michal
* Re: [PATCH v6 1/6] gpu: rfc: Proposal for a GPU cgroup controller
2022-05-04 12:10 ` Michal Koutný
@ 2022-05-04 17:16 ` T.J. Mercier
2022-05-05 11:29 ` Michal Koutný
0 siblings, 1 reply; 6+ messages in thread
From: T.J. Mercier @ 2022-05-04 17:16 UTC (permalink / raw)
To: Michal Koutný
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
Daniel Vetter, Hridya Valsaraju, Christian König,
John Stultz, Todd Kjos, Carlos Llamas, Suren Baghdasaryan,
Kalesh Singh, Kenny.Ho, Shuah Khan, kernel-team, cgroups,
linux-doc, linux-kernel
On Wed, May 4, 2022 at 5:10 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hello.
>
> On Mon, May 02, 2022 at 11:19:35PM +0000, "T.J. Mercier" <tjmercier@google.com> wrote:
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > [...]
> > + The device name string is set by a device driver when it registers with the
> > + GPU cgroup controller to participate in resource accounting.
>
> Are these names available anywhere else for the user? (I.e. would
> drivers add respective sysfs attributes or similar?)
>
Hi, this sounds like it could be a good idea but it'd probably be best
to do this inside gpucg_register_bucket instead of requiring drivers
to perform this externally, possibly in a non-uniform way. Maybe a
sysfs file that prints each name of the gpucg_buckets elements?
However the only names that would result from this series are the
names of the dma-buf heaps, with "-heap" appended. So they are
predictable from the /dev/dma_heap/* names, and only the system and
cma heaps currently exist upstream.
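As a rough illustration of the naming rule just described (a sketch, not code from this series): if the bucket names are the dma-buf heap names with "-heap" appended, a userspace tool could predict the keys it should expect in gpu.memory.current from the entries under /dev/dma_heap. The helper names list_heap_names and expected_bucket_names are hypothetical, and "system" is used only because the system heap is mentioned above.

```python
import os


def list_heap_names(dev_dir="/dev/dma_heap"):
    """Return the heap names found under dev_dir, or [] if it is unavailable."""
    try:
        return sorted(os.listdir(dev_dir))
    except OSError:
        # Directory missing or unreadable, e.g. on a kernel without dma-buf heaps.
        return []


def expected_bucket_names(heap_names):
    """Apply the "-heap" suffix rule described in the thread to each heap name."""
    return [f"{name}-heap" for name in heap_names]


# Example with a fixed input rather than the live directory listing.
print(expected_bucket_names(["system"]))
```

Passing the result of list_heap_names() to expected_bucket_names() would give the predicted keys for the running system, under the assumption that only dma-buf heaps register buckets.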
For other future uses of this controller I thought we were headed in
the direction of "standardized" names which would be
predefined/hardcoded and documented, so these names wouldn't really
need to be made available to a user at runtime.
https://lore.kernel.org/lkml/CABdmKX3gTAohaOwkNccGrQyXN9tzT-oEVibO5ZPF+eP+Vq=AOg@mail.gmail.com/
>
> > + Non-unique names will be rejected at the point of registration.
>
> This doesn't seem relevant to the cgroupfs user, does it?
> I think it should be mentioned at the respective API.
>
Yeah you're right. Thank you.
> HTH,
> Michal
>
* Re: [PATCH v6 1/6] gpu: rfc: Proposal for a GPU cgroup controller
2022-05-04 17:16 ` T.J. Mercier
@ 2022-05-05 11:29 ` Michal Koutný
2022-05-05 23:56 ` T.J. Mercier
0 siblings, 1 reply; 6+ messages in thread
From: Michal Koutný @ 2022-05-05 11:29 UTC (permalink / raw)
To: T.J. Mercier
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
Daniel Vetter, Hridya Valsaraju, Christian König,
John Stultz, Todd Kjos, Carlos Llamas, Suren Baghdasaryan,
Kalesh Singh, Kenny.Ho, Shuah Khan, kernel-team, cgroups,
linux-doc, linux-kernel
On Wed, May 04, 2022 at 10:16:50AM -0700, "T.J. Mercier" <tjmercier@google.com> wrote:
> However the only names that would result from this series are the
> names of the dma-buf heaps, with "-heap" appended. So they are
> predictable from the /dev/dma_heap/* names, and only the system and
> cma heaps currently exist upstream.
It's not so important with the read-only stats currently posted (a
crafted sysfs file with these names would be overkill)...
>
> For other future uses of this controller I thought we were headed in
> the direction of "standardized" names which would be
> predefined/hardcoded and documented, so these names wouldn't really
> need to be made available to a user at runtime.
> https://lore.kernel.org/lkml/CABdmKX3gTAohaOwkNccGrQyXN9tzT-oEVibO5ZPF+eP+Vq=AOg@mail.gmail.com/
(Ah, I see.)
...but if writers (limits) are envisioned, the keys should represent
something that the user can derive/construct from available info -- e.g.
the documentation.
OK, so I understand the current form just presents some statistics.
Michal
* Re: [PATCH v6 1/6] gpu: rfc: Proposal for a GPU cgroup controller
2022-05-05 11:29 ` Michal Koutný
@ 2022-05-05 23:56 ` T.J. Mercier
0 siblings, 0 replies; 6+ messages in thread
From: T.J. Mercier @ 2022-05-05 23:56 UTC (permalink / raw)
To: Michal Koutný
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
Daniel Vetter, Hridya Valsaraju, Christian König,
John Stultz, Todd Kjos, Carlos Llamas, Suren Baghdasaryan,
Kalesh Singh, Kenny.Ho, Shuah Khan, kernel-team, cgroups,
linux-doc, linux-kernel
On Thu, May 5, 2022 at 4:29 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> On Wed, May 04, 2022 at 10:16:50AM -0700, "T.J. Mercier" <tjmercier@google.com> wrote:
> > However the only names that would result from this series are the
> > names of the dma-buf heaps, with "-heap" appended. So they are
> > predictable from the /dev/dma_heap/* names, and only the system and
> > cma heaps currently exist upstream.
>
> It's not so important with the read-only stats currently posted (a
> crafted sysfs file with these names would be overkill)...
>
> >
> > For other future uses of this controller I thought we were headed in
> > the direction of "standardized" names which would be
> > predefined/hardcoded and documented, so these names wouldn't really
> > need to be made available to a user at runtime.
> > https://lore.kernel.org/lkml/CABdmKX3gTAohaOwkNccGrQyXN9tzT-oEVibO5ZPF+eP+Vq=AOg@mail.gmail.com/
>
> (Ah, I see.)
>
> ...but if writers (limits) are envisioned, the keys should represent
> something that the user can derive/construct from available info -- e.g.
> the documentation.
>
> OK, so I understand the current form just presents some statistics.
>
Yup, thanks for taking a look.
> Michal