public inbox for cgroups@vger.kernel.org
* [PATCH v6 0/6] Proposal for a GPU cgroup controller
@ 2022-05-02 23:19 T.J. Mercier
  2022-05-02 23:19 ` [PATCH v6 1/6] gpu: rfc: " T.J. Mercier
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: T.J. Mercier @ 2022-05-02 23:19 UTC (permalink / raw)
  To: tjmercier-hpIqsD4AKlfQT0dZR+AlfA, Tejun Heo, Zefan Li,
	Johannes Weiner, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey
  Cc: daniel-/w4YWyX8dFk, jstultz-hpIqsD4AKlfQT0dZR+AlfA,
	cmllamas-hpIqsD4AKlfQT0dZR+AlfA,
	kaleshsingh-hpIqsD4AKlfQT0dZR+AlfA, Kenny.Ho-5C7GfCeVMHo,
	mkoutny-IBi9RG/b67k, skhan-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-doc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-media-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linaro-mm-sig-cunTk1MwBs8s++Sfvej+rw,
	linux-kselftest-u79uwXL29TY76Z2rM5mHXA

This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
include resource limit enforcement.

Changelog:
v6:
Move documentation into cgroup-v2.rst per Tejun Heo.

Rename BINDER_FD{A}_FLAG_SENDER_NO_NEED ->
BINDER_FD{A}_FLAG_XFER_CHARGE per Carlos Llamas.

Return error on transfer failure per Carlos Llamas.

v5:
Rebase on top of v5.18-rc3

Drop the global GPU cgroup "total" (sum of all device totals) portion
of the design since there is no currently known use for this per
Tejun Heo.

Fix commit message which still contained the old name for
dma_buf_transfer_charge per Michal Koutný.

Remove all GPU cgroup code except what's necessary to support charge transfer
from dma_buf. Previously charging was done in export, but for non-Android
graphics use-cases this is not ideal since there may be a delay between
allocation and export, during which time there is no accounting.

Merge dmabuf: Use the GPU cgroup charge/uncharge APIs patch into
dmabuf: heaps: export system_heap buffers with GPU cgroup charging as a
result of above.

Put the charge and uncharge code in the same file (system_heap_allocate,
system_heap_dma_buf_release) instead of splitting them between the heap and
the dma_buf_release. This avoids asymmetric management of the gpucg charges.

Modify the dma_buf_transfer_charge API to accept a task_struct instead
of a gpucg. This avoids requiring the caller to manage the refcount
of the gpucg upon failure and confusing ownership transfer logic.

Support all strings for gpucg_register_bucket instead of just string
literals.

Enforce globally unique gpucg_bucket names.

Constrain gpucg_bucket name lengths to 64 bytes.

Append "-heap" to gpucg_bucket names from dmabuf-heaps.

Drop patch 7 from the series, which changed the types of
binder_transaction_data's sender_pid and sender_euid fields. This was
done in another commit here:
https://lore.kernel.org/all/20220210021129.3386083-4-masahiroy-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org/

Rename:
  gpucg_try_charge -> gpucg_charge
  find_cg_rpool_locked -> cg_rpool_find_locked
  init_cg_rpool -> cg_rpool_init
  get_cg_rpool_locked -> cg_rpool_get_locked
  "gpu cgroup controller" -> "GPU controller"
  gpucg_device -> gpucg_bucket
  usage -> size

Tests:
  Support both binder_fd_array_object and binder_fd_object. This is
  necessary because new versions of Android will use binder_fd_object
  instead of binder_fd_array_object, and we need to support both.

  Tests for both binder_fd_array_object and binder_fd_object.

  For binder_utils return error codes instead of
  struct binder{fs}_ctx.

  Use ifdef __ANDROID__ to choose platform-dependent temp path instead
  of a runtime fallback.

  Ensure binderfs_mntpt ends with a trailing '/' character instead of
  prepending it where used.

v4:
Skip test if not run as root per Shuah Khan

Add better test logging for abnormal child termination per Shuah Khan

Adjust ordering of charge/uncharge during transfer to avoid potentially
hitting cgroup limit per Michal Koutný

Adjust gpucg_try_charge critical section for charge transfer functionality

Fix uninitialized return code error for dmabuf_try_charge error case

v3:
Remove Upstreaming Plan from gpu-cgroup.rst per John Stultz

Use more common dual author commit message format per John Stultz

Remove android from binder changes title per Todd Kjos

Add a kselftest for this new behavior per Greg Kroah-Hartman

Include details on behavior for all combinations of kernel/userspace
versions in changelog (thanks Suren Baghdasaryan) per Greg Kroah-Hartman.

Fix pid and uid types in binder UAPI header

v2:
See the previous revision of this change submitted by Hridya Valsaraju
at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org/

Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König. Pointers to struct gpucg and struct gpucg_device
tracking the current associations were added to the dma_buf struct to
achieve this.

Fix incorrect Kconfig help section indentation per Randy Dunlap.

History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific
resources[2]. To help establish a unified memory accounting model for the
GPU and all related subsystems, Daniel Vetter suggested moving it out of
the DRM subsystem so that it could also be used by other DMA-BUF
exporters[3]. This RFC proposes an interface that does the same.

[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org/#22624705
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org/

Hridya Valsaraju (3):
  gpu: rfc: Proposal for a GPU cgroup controller
  cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
    memory
  binder: Add flags to relinquish ownership of fds

T.J. Mercier (3):
  dmabuf: heaps: export system_heap buffers with GPU cgroup charging
  dmabuf: Add gpu cgroup charge transfer function
  selftests: Add binder cgroup gpu memory transfer tests

 Documentation/admin-guide/cgroup-v2.rst       |  24 +
 drivers/android/binder.c                      |  31 +-
 drivers/dma-buf/dma-buf.c                     |  80 ++-
 drivers/dma-buf/dma-heap.c                    |  39 ++
 drivers/dma-buf/heaps/system_heap.c           |  28 +-
 include/linux/cgroup_gpu.h                    | 137 +++++
 include/linux/cgroup_subsys.h                 |   4 +
 include/linux/dma-buf.h                       |  49 +-
 include/linux/dma-heap.h                      |  15 +
 include/uapi/linux/android/binder.h           |  23 +-
 init/Kconfig                                  |   7 +
 kernel/cgroup/Makefile                        |   1 +
 kernel/cgroup/gpu.c                           | 386 +++++++++++++
 .../selftests/drivers/android/binder/Makefile |   8 +
 .../drivers/android/binder/binder_util.c      | 250 +++++++++
 .../drivers/android/binder/binder_util.h      |  32 ++
 .../selftests/drivers/android/binder/config   |   4 +
 .../binder/test_dmabuf_cgroup_transfer.c      | 526 ++++++++++++++++++
 18 files changed, 1621 insertions(+), 23 deletions(-)
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c
 create mode 100644 tools/testing/selftests/drivers/android/binder/Makefile
 create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.c
 create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.h
 create mode 100644 tools/testing/selftests/drivers/android/binder/config
 create mode 100644 tools/testing/selftests/drivers/android/binder/test_dmabuf_cgroup_transfer.c

-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH v6 1/6] gpu: rfc: Proposal for a GPU cgroup controller
  2022-05-02 23:19 [PATCH v6 0/6] Proposal for a GPU cgroup controller T.J. Mercier
@ 2022-05-02 23:19 ` T.J. Mercier
       [not found]   ` <20220502231944.3891435-2-tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
       [not found] ` <20220502231944.3891435-1-tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2022-05-02 23:19 ` [PATCH v6 4/6] dmabuf: Add gpu cgroup charge transfer function T.J. Mercier
  2 siblings, 1 reply; 12+ messages in thread
From: T.J. Mercier @ 2022-05-02 23:19 UTC (permalink / raw)
  To: tjmercier, Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet
  Cc: daniel, hridya, christian.koenig, jstultz, tkjos, cmllamas,
	surenb, kaleshsingh, Kenny.Ho, mkoutny, skhan, kernel-team,
	cgroups, linux-doc, linux-kernel

From: Hridya Valsaraju <hridya@google.com>

This patch adds a proposal for a new GPU cgroup controller for
accounting/limiting GPU and GPU-related memory allocations.
The proposed controller is based on the DRM cgroup controller[1] and
follows the design of the RDMA cgroup controller.

The new cgroup controller would:
* Allow setting per-device limits on the total size of buffers
  allocated by a device within a cgroup.
* Expose a per-device/allocator breakdown of the buffers charged to a
  cgroup.

The prototype in the following patches is only for memory accounting
using the GPU cgroup controller and does not implement limit setting.

[1]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/

Signed-off-by: Hridya Valsaraju <hridya@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>

---
v6 changes
Move documentation into cgroup-v2.rst per Tejun Heo.

v5 changes
Drop the global GPU cgroup "total" (sum of all device totals) portion
of the design since there is no currently known use for this per
Tejun Heo.

Update for renamed functions/variables.

v3 changes
Remove Upstreaming Plan from gpu-cgroup.rst per John Stultz.

Use more common dual author commit message format per John Stultz.
---
 Documentation/admin-guide/cgroup-v2.rst | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 69d7a6983f78..baeec096f1d8 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2352,6 +2352,30 @@ first, and stays charged to that cgroup until that resource is freed. Migrating
 a process to a different cgroup does not move the charge to the destination
 cgroup where the process has moved.
 
+
+GPU
+---
+
+The GPU controller accounts for device and system memory allocated by the GPU
+and related subsystems for graphics use. Resource limits are not currently
+supported.
+
+GPU Interface Files
+~~~~~~~~~~~~~~~~~~~~
+
+  gpu.memory.current
+	A read-only file containing memory allocations in flat-keyed format. The key
+	is a string representing the device name. The value is the size of the memory
+	charged to the device in bytes. The device names are globally unique::
+
+	  $ cat /sys/fs/cgroup/gpu.memory.current
+	  dev1 4194304
+	  dev2 104857600
+
+	The device name string is set by a device driver when it registers with the
+	GPU cgroup controller to participate in resource accounting. Non-unique names
+	will be rejected at the point of registration.
+
 Others
 ------
 
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH v6 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
       [not found] ` <20220502231944.3891435-1-tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2022-05-02 23:19   ` T.J. Mercier
       [not found]     ` <20220502231944.3891435-3-tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: T.J. Mercier @ 2022-05-02 23:19 UTC (permalink / raw)
  To: tjmercier-hpIqsD4AKlfQT0dZR+AlfA, Tejun Heo, Zefan Li,
	Johannes Weiner
  Cc: daniel-/w4YWyX8dFk, hridya-hpIqsD4AKlfQT0dZR+AlfA,
	christian.koenig-5C7GfCeVMHo, jstultz-hpIqsD4AKlfQT0dZR+AlfA,
	tkjos-z5hGa2qSFaRBDgjK7y7TUQ, cmllamas-hpIqsD4AKlfQT0dZR+AlfA,
	surenb-hpIqsD4AKlfQT0dZR+AlfA, kaleshsingh-hpIqsD4AKlfQT0dZR+AlfA,
	Kenny.Ho-5C7GfCeVMHo, mkoutny-IBi9RG/b67k,
	skhan-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

From: Hridya Valsaraju <hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

The cgroup controller provides accounting for GPU and GPU-related
memory allocations. The memory being accounted can be device memory or
memory allocated from pools dedicated to serve GPU-related tasks.

This patch adds APIs to:
- allow a device to register for memory accounting using the GPU cgroup
  controller.
- charge and uncharge allocated memory to a cgroup.

When the cgroup controller is enabled, it exposes information about the
memory allocated by each device (registered for GPU cgroup memory
accounting) for each cgroup.

The API/UAPI can be extended to set per-device/total allocation limits
in the future.

The cgroup controller has been named following the discussion in [1].

[1]: https://lore.kernel.org/amd-gfx/YCJp%2F%2FkMC7YjVMXv-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org/

Signed-off-by: Hridya Valsaraju <hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Signed-off-by: T.J. Mercier <tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

---
v5 changes
Support all strings for gpucg_register_device instead of just string
literals.

Enforce globally unique gpucg_bucket names.

Constrain gpucg_bucket name lengths to 64 bytes.

Obtain just a single css refcount instead of nr_pages for each
charge.

Rename:
gpucg_try_charge -> gpucg_charge
find_cg_rpool_locked -> cg_rpool_find_locked
init_cg_rpool -> cg_rpool_init
get_cg_rpool_locked -> cg_rpool_get_locked
"gpu cgroup controller" -> "GPU controller"
gpucg_device -> gpucg_bucket
usage -> size

v4 changes
Adjust gpucg_try_charge critical section for future charge transfer
functionality.

v3 changes
Use more common dual author commit message format per John Stultz.

v2 changes
Fix incorrect Kconfig help section indentation per Randy Dunlap.
---
 include/linux/cgroup_gpu.h    | 123 +++++++++++++
 include/linux/cgroup_subsys.h |   4 +
 init/Kconfig                  |   7 +
 kernel/cgroup/Makefile        |   1 +
 kernel/cgroup/gpu.c           | 324 ++++++++++++++++++++++++++++++++++
 5 files changed, 459 insertions(+)
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c

diff --git a/include/linux/cgroup_gpu.h b/include/linux/cgroup_gpu.h
new file mode 100644
index 000000000000..4dfe633d6ec7
--- /dev/null
+++ b/include/linux/cgroup_gpu.h
@@ -0,0 +1,123 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ * Copyright (C) 2022 Google LLC.
+ */
+#ifndef _CGROUP_GPU_H
+#define _CGROUP_GPU_H
+
+#include <linux/cgroup.h>
+#include <linux/list.h>
+
+#define GPUCG_BUCKET_NAME_MAX_LEN 64
+
+#ifdef CONFIG_CGROUP_GPU
+ /* The GPU cgroup controller data structure */
+struct gpucg {
+	struct cgroup_subsys_state css;
+
+	/* list of all resource pools that belong to this cgroup */
+	struct list_head rpools;
+};
+
+/* A named entity representing bucket of tracked memory. */
+struct gpucg_bucket {
+	/* list of various resource pools in various cgroups that the bucket is part of */
+	struct list_head rpools;
+
+	/* list of all buckets registered for GPU cgroup accounting */
+	struct list_head bucket_node;
+
+	/* string to be used as identifier for accounting and limit setting */
+	const char *name;
+};
+
+/**
+ * css_to_gpucg - get the corresponding gpucg ref from a cgroup_subsys_state
+ * @css: the target cgroup_subsys_state
+ *
+ * Returns: gpu cgroup that contains the @css
+ */
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct gpucg, css) : NULL;
+}
+
+/**
+ * gpucg_get - get the gpucg reference that a task belongs to
+ * @task: the target task
+ *
+ * This increases the reference count of the css that the @task belongs to.
+ *
+ * Returns: reference to the gpu cgroup the task belongs to.
+ */
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	if (!cgroup_subsys_enabled(gpu_cgrp_subsys))
+		return NULL;
+	return css_to_gpucg(task_get_css(task, gpu_cgrp_id));
+}
+
+/**
+ * gpucg_put - put a gpucg reference
+ * @gpucg: the target gpucg
+ *
+ * Put a reference obtained via gpucg_get
+ */
+static inline void gpucg_put(struct gpucg *gpucg)
+{
+	if (gpucg)
+		css_put(&gpucg->css);
+}
+
+/**
+ * gpucg_parent - find the parent of a gpu cgroup
+ * @cg: the target gpucg
+ *
+ * This does not increase the reference count of the parent cgroup
+ *
+ * Returns: parent gpu cgroup of @cg
+ */
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return css_to_gpucg(cg->css.parent);
+}
+
+int gpucg_charge(struct gpucg *gpucg, struct gpucg_bucket *bucket, u64 size);
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_bucket *bucket, u64 size);
+int gpucg_register_bucket(struct gpucg_bucket *bucket, const char *name);
+#else /* CONFIG_CGROUP_GPU */
+
+struct gpucg;
+struct gpucg_bucket;
+
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return NULL;
+}
+
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	return NULL;
+}
+
+static inline void gpucg_put(struct gpucg *gpucg) {}
+
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return NULL;
+}
+
+static inline int gpucg_charge(struct gpucg *gpucg,
+			       struct gpucg_bucket *bucket,
+			       u64 size)
+{
+	return 0;
+}
+
+static inline void gpucg_uncharge(struct gpucg *gpucg,
+				  struct gpucg_bucket *bucket,
+				  u64 size) {}
+
+static inline int gpucg_register_bucket(struct gpucg_bucket *bucket, const char *name) { return 0; }
+#endif /* CONFIG_CGROUP_GPU */
+#endif /* _CGROUP_GPU_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 445235487230..46a2a7b93c41 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -65,6 +65,10 @@ SUBSYS(rdma)
 SUBSYS(misc)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_GPU)
+SUBSYS(gpu)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index ddcbefe535e9..2e00a190e170 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -984,6 +984,13 @@ config BLK_CGROUP
 
 	See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
 
+config CGROUP_GPU
+	bool "GPU controller (EXPERIMENTAL)"
+	select PAGE_COUNTER
+	help
+	  Provides accounting and limit setting for memory allocations by the GPU and
+	  GPU-related subsystems.
+
 config CGROUP_WRITEBACK
 	bool
 	depends on MEMCG && BLK_CGROUP
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 12f8457ad1f9..be95a5a532fc 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_CGROUP_RDMA) += rdma.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CGROUP_MISC) += misc.o
 obj-$(CONFIG_CGROUP_DEBUG) += debug.o
+obj-$(CONFIG_CGROUP_GPU) += gpu.o
diff --git a/kernel/cgroup/gpu.c b/kernel/cgroup/gpu.c
new file mode 100644
index 000000000000..34d0a5b85834
--- /dev/null
+++ b/kernel/cgroup/gpu.c
@@ -0,0 +1,324 @@
+// SPDX-License-Identifier: MIT
+// Copyright 2019 Advanced Micro Devices, Inc.
+// Copyright (C) 2022 Google LLC.
+
+#include <linux/cgroup.h>
+#include <linux/cgroup_gpu.h>
+#include <linux/err.h>
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/page_counter.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+static struct gpucg *root_gpucg __read_mostly;
+
+/*
+ * Protects list of resource pools maintained on per cgroup basis and list
+ * of buckets registered for memory accounting using the GPU cgroup controller.
+ */
+static DEFINE_MUTEX(gpucg_mutex);
+static LIST_HEAD(gpucg_buckets);
+
+struct gpucg_resource_pool {
+	/* The bucket whose resource usage is tracked by this resource pool */
+	struct gpucg_bucket *bucket;
+
+	/* list of all resource pools for the cgroup */
+	struct list_head cg_node;
+
+	/* list maintained by the gpucg_bucket to keep track of its resource pools */
+	struct list_head bucket_node;
+
+	/* tracks memory usage of the resource pool */
+	struct page_counter total;
+};
+
+static void free_cg_rpool_locked(struct gpucg_resource_pool *rpool)
+{
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_del(&rpool->cg_node);
+	list_del(&rpool->bucket_node);
+	kfree(rpool);
+}
+
+static void gpucg_css_free(struct cgroup_subsys_state *css)
+{
+	struct gpucg_resource_pool *rpool, *tmp;
+	struct gpucg *gpucg = css_to_gpucg(css);
+
+	// delete all resource pools
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry_safe(rpool, tmp, &gpucg->rpools, cg_node)
+		free_cg_rpool_locked(rpool);
+	mutex_unlock(&gpucg_mutex);
+
+	kfree(gpucg);
+}
+
+static struct cgroup_subsys_state *
+gpucg_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct gpucg *gpucg, *parent;
+
+	gpucg = kzalloc(sizeof(struct gpucg), GFP_KERNEL);
+	if (!gpucg)
+		return ERR_PTR(-ENOMEM);
+
+	parent = css_to_gpucg(parent_css);
+	if (!parent)
+		root_gpucg = gpucg;
+
+	INIT_LIST_HEAD(&gpucg->rpools);
+
+	return &gpucg->css;
+}
+
+static struct gpucg_resource_pool *cg_rpool_find_locked(
+	struct gpucg *cg,
+	struct gpucg_bucket *bucket)
+{
+	struct gpucg_resource_pool *rpool;
+
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_for_each_entry(rpool, &cg->rpools, cg_node)
+		if (rpool->bucket == bucket)
+			return rpool;
+
+	return NULL;
+}
+
+static struct gpucg_resource_pool *cg_rpool_init(struct gpucg *cg,
+						 struct gpucg_bucket *bucket)
+{
+	struct gpucg_resource_pool *rpool = kzalloc(sizeof(*rpool),
+							GFP_KERNEL);
+	if (!rpool)
+		return ERR_PTR(-ENOMEM);
+
+	rpool->bucket = bucket;
+
+	page_counter_init(&rpool->total, NULL);
+	INIT_LIST_HEAD(&rpool->cg_node);
+	INIT_LIST_HEAD(&rpool->bucket_node);
+	list_add_tail(&rpool->cg_node, &cg->rpools);
+	list_add_tail(&rpool->bucket_node, &bucket->rpools);
+
+	return rpool;
+}
+
+/**
+ * get_cg_rpool_locked - find the resource pool for the specified bucket and
+ * specified cgroup. If the resource pool does not exist for the cg, it is
+ * created in a hierarchical manner in the cgroup and its ancestor cgroups who
+ * do not already have a resource pool entry for the bucket.
+ *
+ * @cg: The cgroup to find the resource pool for.
+ * @bucket: The bucket associated with the returned resource pool.
+ *
+ * Return: return resource pool entry corresponding to the specified bucket in
+ * the specified cgroup (hierarchically creating them if not existing already).
+ *
+ */
+static struct gpucg_resource_pool *
+cg_rpool_get_locked(struct gpucg *cg, struct gpucg_bucket *bucket)
+{
+	struct gpucg *parent_cg, *p, *stop_cg;
+	struct gpucg_resource_pool *rpool, *tmp_rpool;
+	struct gpucg_resource_pool *parent_rpool = NULL, *leaf_rpool = NULL;
+
+	rpool = cg_rpool_find_locked(cg, bucket);
+	if (rpool)
+		return rpool;
+
+	stop_cg = cg;
+	do {
+		rpool = cg_rpool_init(stop_cg, bucket);
+		if (IS_ERR(rpool))
+			goto err;
+
+		if (!leaf_rpool)
+			leaf_rpool = rpool;
+
+		stop_cg = gpucg_parent(stop_cg);
+		if (!stop_cg)
+			break;
+
+		rpool = cg_rpool_find_locked(stop_cg, bucket);
+	} while (!rpool);
+
+	/*
+	 * Re-initialize page counters of all rpools created in this invocation
+	 * to enable hierarchical charging.
+	 * stop_cg is the first ancestor cg who already had a resource pool for
+	 * the bucket. It can also be NULL if no ancestors had a pre-existing
+	 * resource pool for the bucket before this invocation.
+	 */
+	rpool = leaf_rpool;
+	for (p = cg; p != stop_cg; p = parent_cg) {
+		parent_cg = gpucg_parent(p);
+		if (!parent_cg)
+			break;
+		parent_rpool = cg_rpool_find_locked(parent_cg, bucket);
+		page_counter_init(&rpool->total, &parent_rpool->total);
+
+		rpool = parent_rpool;
+	}
+
+	return leaf_rpool;
+err:
+	for (p = cg; p != stop_cg; p = gpucg_parent(p)) {
+		tmp_rpool = cg_rpool_find_locked(p, bucket);
+		free_cg_rpool_locked(tmp_rpool);
+	}
+	return rpool;
+}
+
+/**
+ * gpucg_charge - charge memory to the specified gpucg and gpucg_bucket.
+ * Caller must hold a reference to @gpucg obtained through gpucg_get(). The size
+ * of the memory is rounded up to be a multiple of the page size.
+ *
+ * @gpucg: The gpu cgroup to charge the memory to.
+ * @bucket: The bucket to charge the memory to.
+ * @size: The size of memory to charge in bytes.
+ *        This size will be rounded up to the nearest page size.
+ *
+ * Return: returns 0 if the charging is successful and otherwise returns an
+ * error code.
+ */
+int gpucg_charge(struct gpucg *gpucg, struct gpucg_bucket *bucket, u64 size)
+{
+	struct page_counter *counter;
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+	int ret = 0;
+
+	nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+	mutex_lock(&gpucg_mutex);
+	rp = cg_rpool_get_locked(gpucg, bucket);
+	/*
+	 * Continue to hold gpucg_mutex because we use it to block charges while transfers are in
+	 * progress to avoid potentially exceeding a limit.
+	 */
+	if (IS_ERR(rp)) {
+		mutex_unlock(&gpucg_mutex);
+		return PTR_ERR(rp);
+	}
+
+	if (page_counter_try_charge(&rp->total, nr_pages, &counter))
+		css_get(&gpucg->css);
+	else
+		ret = -ENOMEM;
+	mutex_unlock(&gpucg_mutex);
+
+	return ret;
+}
+
+/**
+ * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_bucket.
+ * The caller must hold a reference to @gpucg obtained through gpucg_get().
+ *
+ * @gpucg: The gpu cgroup to uncharge the memory from.
+ * @bucket: The bucket to uncharge the memory from.
+ * @size: The size of memory to uncharge in bytes.
+ *        This size will be rounded up to the nearest page size.
+ */
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_bucket *bucket, u64 size)
+{
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+
+	mutex_lock(&gpucg_mutex);
+	rp = cg_rpool_find_locked(gpucg, bucket);
+	/*
+	 * gpucg_mutex can be unlocked here, rp will stay valid until gpucg is freed and there are
+	 * active refs on gpucg. Uncharges are fine while transfers are in progress since there is
+	 * no potential to exceed a limit while uncharging and transferring.
+	 */
+	mutex_unlock(&gpucg_mutex);
+
+	if (unlikely(!rp)) {
+		pr_err("Resource pool not found, incorrect charge/uncharge ordering?\n");
+		return;
+	}
+
+	nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	page_counter_uncharge(&rp->total, nr_pages);
+	css_put(&gpucg->css);
+}
+
+/**
+ * gpucg_register_bucket - Registers a bucket for memory accounting using the
+ * GPU cgroup controller.
+ *
+ * @bucket: The bucket to register for memory accounting.
+ * @name: Pointer to a null-terminated string to denote the name of the bucket. This name should be
+ *        globally unique, and should not exceed %GPUCG_BUCKET_NAME_MAX_LEN bytes.
+ *
+ * @bucket must remain valid. @name will be copied.
+ *
+ * Returns 0 on success, or a negative errno code otherwise.
+ */
+int gpucg_register_bucket(struct gpucg_bucket *bucket, const char *name)
+{
+	struct gpucg_bucket *b;
+
+	if (!bucket || !name)
+		return -EINVAL;
+
+	if (strlen(name) >= GPUCG_BUCKET_NAME_MAX_LEN)
+		return -ENAMETOOLONG;
+
+	INIT_LIST_HEAD(&bucket->bucket_node);
+	INIT_LIST_HEAD(&bucket->rpools);
+	bucket->name = kstrdup_const(name, GFP_KERNEL);
+
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry(b, &gpucg_buckets, bucket_node) {
+		if (strncmp(b->name, bucket->name, GPUCG_BUCKET_NAME_MAX_LEN) == 0) {
+			mutex_unlock(&gpucg_mutex);
+			kfree_const(bucket->name);
+			return -EEXIST;
+		}
+	}
+	list_add_tail(&bucket->bucket_node, &gpucg_buckets);
+	mutex_unlock(&gpucg_mutex);
+
+	return 0;
+}
+
+static int gpucg_resource_show(struct seq_file *sf, void *v)
+{
+	struct gpucg_resource_pool *rpool;
+	struct gpucg *cg = css_to_gpucg(seq_css(sf));
+
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry(rpool, &cg->rpools, cg_node) {
+		seq_printf(sf, "%s %lu\n", rpool->bucket->name,
+			   page_counter_read(&rpool->total) * PAGE_SIZE);
+	}
+	mutex_unlock(&gpucg_mutex);
+
+	return 0;
+}
+
+static struct cftype files[] = {
+	{
+		.name = "memory.current",
+		.seq_show = gpucg_resource_show,
+	},
+	{ }     /* terminate */
+};
+
+struct cgroup_subsys gpu_cgrp_subsys = {
+	.css_alloc      = gpucg_css_alloc,
+	.css_free       = gpucg_css_free,
+	.early_init     = false,
+	.legacy_cftypes = files,
+	.dfl_cftypes    = files,
+};
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH v6 4/6] dmabuf: Add gpu cgroup charge transfer function
  2022-05-02 23:19 [PATCH v6 0/6] Proposal for a GPU cgroup controller T.J. Mercier
  2022-05-02 23:19 ` [PATCH v6 1/6] gpu: rfc: " T.J. Mercier
       [not found] ` <20220502231944.3891435-1-tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2022-05-02 23:19 ` T.J. Mercier
  2 siblings, 0 replies; 12+ messages in thread
From: T.J. Mercier @ 2022-05-02 23:19 UTC (permalink / raw)
  To: tjmercier, Sumit Semwal, Christian König, Tejun Heo,
	Zefan Li, Johannes Weiner
  Cc: daniel, hridya, jstultz, tkjos, cmllamas, surenb, kaleshsingh,
	Kenny.Ho, mkoutny, skhan, kernel-team, linux-media, dri-devel,
	linaro-mm-sig, linux-kernel, cgroups

The dma_buf_transfer_charge function provides a way for processes to
transfer charge of a buffer to a different process. This is essential
for the cases where a central allocator process does allocations for
various subsystems, hands over the fd to the client who requested the
memory and drops all references to the allocated memory.

Originally-by: Hridya Valsaraju <hridya@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>

---
v5 changes
Fix commit message which still contained the old name for
dma_buf_transfer_charge per Michal Koutný.

Modify the dma_buf_transfer_charge API to accept a task_struct instead
of a gpucg. This avoids requiring the caller to manage the refcount
of the gpucg upon failure and confusing ownership transfer logic.

v4 changes
Adjust ordering of charge/uncharge during transfer to avoid potentially
hitting cgroup limit per Michal Koutný.

v3 changes
Use more common dual author commit message format per John Stultz.

v2 changes
Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.
---
 drivers/dma-buf/dma-buf.c  | 57 +++++++++++++++++++++++++++++++++++
 include/linux/cgroup_gpu.h | 14 +++++++++
 include/linux/dma-buf.h    |  6 ++++
 kernel/cgroup/gpu.c        | 62 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 139 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index bc89c44bd9b9..f3fb844925e2 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1341,6 +1341,63 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map)
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap, DMA_BUF);
 
+/**
+ * dma_buf_transfer_charge - Change the GPU cgroup to which the provided dma_buf is charged.
+ * @dmabuf:	[in]	buffer whose charge will be migrated to a different GPU cgroup
+ * @target:	[in]	the task_struct of the destination process for the GPU cgroup charge
+ *
+ * Only tasks that belong to the same cgroup that the buffer is currently
+ * charged to may call this function; otherwise it returns -EPERM.
+ *
+ * Returns 0 on success, or a negative errno code otherwise.
+ */
+int dma_buf_transfer_charge(struct dma_buf *dmabuf, struct task_struct *target)
+{
+	struct gpucg *current_gpucg, *target_gpucg, *to_release;
+	int ret;
+
+	if (!dmabuf->gpucg || !dmabuf->gpucg_bucket) {
+		/* This dmabuf is not tracked under GPU cgroup accounting */
+		return 0;
+	}
+
+	current_gpucg = gpucg_get(current);
+	target_gpucg = gpucg_get(target);
+	to_release = target_gpucg;
+
+	/* If the source and destination cgroups are the same, don't do anything. */
+	if (current_gpucg == target_gpucg) {
+		ret = 0;
+		goto skip_transfer;
+	}
+
+	/*
+	 * Verify that the cgroup of the process requesting the transfer
+	 * is the same as the one the buffer is currently charged to.
+	 */
+	mutex_lock(&dmabuf->lock);
+	if (current_gpucg != dmabuf->gpucg) {
+		ret = -EPERM;
+		goto err;
+	}
+
+	ret = gpucg_transfer_charge(
+		dmabuf->gpucg, target_gpucg, dmabuf->gpucg_bucket, dmabuf->size);
+	if (ret)
+		goto err;
+
+	to_release = dmabuf->gpucg;
+	dmabuf->gpucg = target_gpucg;
+
+err:
+	mutex_unlock(&dmabuf->lock);
+skip_transfer:
+	gpucg_put(current_gpucg);
+	gpucg_put(to_release);
+	return ret;
+}
+EXPORT_SYMBOL_NS_GPL(dma_buf_transfer_charge, DMA_BUF);
+
 #ifdef CONFIG_DEBUG_FS
 static int dma_buf_debug_show(struct seq_file *s, void *unused)
 {
diff --git a/include/linux/cgroup_gpu.h b/include/linux/cgroup_gpu.h
index 4dfe633d6ec7..f5973ef9f926 100644
--- a/include/linux/cgroup_gpu.h
+++ b/include/linux/cgroup_gpu.h
@@ -83,7 +83,13 @@ static inline struct gpucg *gpucg_parent(struct gpucg *cg)
 }
 
 int gpucg_charge(struct gpucg *gpucg, struct gpucg_bucket *bucket, u64 size);
+
 void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_bucket *bucket, u64 size);
+
+int gpucg_transfer_charge(struct gpucg *source,
+			  struct gpucg *dest,
+			  struct gpucg_bucket *bucket,
+			  u64 size);
 int gpucg_register_bucket(struct gpucg_bucket *bucket, const char *name);
 #else /* CONFIG_CGROUP_GPU */
 
@@ -118,6 +124,14 @@ static inline void gpucg_uncharge(struct gpucg *gpucg,
 				  struct gpucg_bucket *bucket,
 				  u64 size) {}
 
+static inline int gpucg_transfer_charge(struct gpucg *source,
+					struct gpucg *dest,
+					struct gpucg_bucket *bucket,
+					u64 size)
+{
+	return 0;
+}
+
 static inline int gpucg_register_bucket(struct gpucg_bucket *bucket, const char *name) { return 0; }
 #endif /* CONFIG_CGROUP_GPU */
 #endif /* _CGROUP_GPU_H */
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 8e7c55c830b3..438ad8577b76 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -18,6 +18,7 @@
 #include <linux/file.h>
 #include <linux/err.h>
 #include <linux/scatterlist.h>
+#include <linux/sched.h>
 #include <linux/list.h>
 #include <linux/dma-mapping.h>
 #include <linux/fs.h>
@@ -650,9 +651,14 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map);
 void dma_buf_exp_info_set_gpucg(struct dma_buf_export_info *exp_info,
 				struct gpucg *gpucg,
 				struct gpucg_bucket *gpucg_bucket);
+
+int dma_buf_transfer_charge(struct dma_buf *dmabuf, struct task_struct *target);
 #else/* CONFIG_CGROUP_GPU */
 static inline void dma_buf_exp_info_set_gpucg(struct dma_buf_export_info *exp_info,
 					      struct gpucg *gpucg,
 					      struct gpucg_bucket *gpucg_bucket) {}
+
+static inline int dma_buf_transfer_charge(struct dma_buf *dmabuf, struct task_struct *target)
+{ return 0; }
 #endif /* CONFIG_CGROUP_GPU */
 #endif /* __DMA_BUF_H__ */
diff --git a/kernel/cgroup/gpu.c b/kernel/cgroup/gpu.c
index 34d0a5b85834..7dfbe0fd7e45 100644
--- a/kernel/cgroup/gpu.c
+++ b/kernel/cgroup/gpu.c
@@ -252,6 +252,68 @@ void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_bucket *bucket, u64 size)
 	css_put(&gpucg->css);
 }
 
+/**
+ * gpucg_transfer_charge - Transfer a GPU charge from one cgroup to another.
+ *
+ * @source:	[in]	The GPU cgroup the charge will be transferred from.
+ * @dest:	[in]	The GPU cgroup the charge will be transferred to.
+ * @bucket:	[in]	The GPU cgroup bucket corresponding to the charge.
+ * @size:	[in]	The size of the memory in bytes.
+ *                      This size will be rounded up to the nearest page size.
+ *
+ * Returns 0 on success, or a negative errno code otherwise.
+ */
+int gpucg_transfer_charge(struct gpucg *source,
+			  struct gpucg *dest,
+			  struct gpucg_bucket *bucket,
+			  u64 size)
+{
+	struct page_counter *counter;
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp_source, *rp_dest;
+	int ret = 0;
+
+	nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+	mutex_lock(&gpucg_mutex);
+	rp_source = cg_rpool_find_locked(source, bucket);
+	if (unlikely(!rp_source)) {
+		ret = -ENOENT;
+		goto exit_early;
+	}
+
+	rp_dest = cg_rpool_get_locked(dest, bucket);
+	if (IS_ERR(rp_dest)) {
+		ret = PTR_ERR(rp_dest);
+		goto exit_early;
+	}
+
+	/*
+	 * First uncharge from the pool it's currently charged to. This ordering avoids double
+	 * charging while the transfer is in progress, which could cause us to hit a limit.
+	 * If the try_charge fails for this transfer, we need to be able to reverse this uncharge,
+	 * so we continue to hold the gpucg_mutex here.
+	 */
+	page_counter_uncharge(&rp_source->total, nr_pages);
+	css_put(&source->css);
+
+	/* Now attempt the new charge */
+	if (page_counter_try_charge(&rp_dest->total, nr_pages, &counter)) {
+		css_get(&dest->css);
+	} else {
+		/*
+		 * The new charge failed, so reverse the uncharge from above. This should always
+		 * succeed since charges on source are blocked by gpucg_mutex.
+		 */
+		WARN_ON(!page_counter_try_charge(&rp_source->total, nr_pages, &counter));
+		css_get(&source->css);
+		ret = -ENOMEM;
+	}
+exit_early:
+	mutex_unlock(&gpucg_mutex);
+	return ret;
+}
+
 /**
  * gpucg_register_bucket - Registers a bucket for memory accounting using the
  * GPU cgroup controller.
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 1/6] gpu: rfc: Proposal for a GPU cgroup controller
       [not found]   ` <20220502231944.3891435-2-tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2022-05-04 12:10     ` Michal Koutný
       [not found]       ` <20220504121052.GA24172-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Koutný @ 2022-05-04 12:10 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	daniel-/w4YWyX8dFk, hridya-hpIqsD4AKlfQT0dZR+AlfA,
	christian.koenig-5C7GfCeVMHo, jstultz-hpIqsD4AKlfQT0dZR+AlfA,
	tkjos-z5hGa2qSFaRBDgjK7y7TUQ, cmllamas-hpIqsD4AKlfQT0dZR+AlfA,
	surenb-hpIqsD4AKlfQT0dZR+AlfA, kaleshsingh-hpIqsD4AKlfQT0dZR+AlfA,
	Kenny.Ho-5C7GfCeVMHo, skhan-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-doc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hello.

On Mon, May 02, 2022 at 11:19:35PM +0000, "T.J. Mercier" <tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> [...]
> +	The device name string is set by a device driver when it registers with the
> +	GPU cgroup controller to participate in resource accounting. 

Are these names available anywhere else for the user? (I.e. would
drivers add respective sysfs attributes or similar?)


> +     Non-unique names will be rejected at the point of registration.

This doesn't seem relevant to the cgroupfs user, does it?
I think it should be mentioned at the respective API.

HTH,
Michal



* Re: [PATCH v6 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
       [not found]     ` <20220502231944.3891435-3-tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2022-05-04 12:25       ` Michal Koutný
       [not found]         ` <20220504122558.GB24172-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Koutný @ 2022-05-04 12:25 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, daniel-/w4YWyX8dFk,
	hridya-hpIqsD4AKlfQT0dZR+AlfA, christian.koenig-5C7GfCeVMHo,
	jstultz-hpIqsD4AKlfQT0dZR+AlfA, tkjos-z5hGa2qSFaRBDgjK7y7TUQ,
	cmllamas-hpIqsD4AKlfQT0dZR+AlfA, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	kaleshsingh-hpIqsD4AKlfQT0dZR+AlfA, Kenny.Ho-5C7GfCeVMHo,
	skhan-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Hello.

On Mon, May 02, 2022 at 11:19:36PM +0000, "T.J. Mercier" <tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> This patch adds APIs to:
> -allow a device to register for memory accounting using the GPU cgroup
> controller.
> -charge and uncharge allocated memory to a cgroup.

Is this API for separately built consumers?
The respective functions should be exported (EXPORT_SYMBOL_GPL) if I
haven't missed anything.

> +#ifdef CONFIG_CGROUP_GPU
> + /* The GPU cgroup controller data structure */
> +struct gpucg {
> +	struct cgroup_subsys_state css;
> +
> +	/* list of all resource pools that belong to this cgroup */
> +	struct list_head rpools;
> +};
> +
> +/* A named entity representing bucket of tracked memory. */
> +struct gpucg_bucket {
> +	/* list of various resource pools in various cgroups that the bucket is part of */
> +	struct list_head rpools;
> +
> +	/* list of all buckets registered for GPU cgroup accounting */
> +	struct list_head bucket_node;
> +
> +	/* string to be used as identifier for accounting and limit setting */
> +	const char *name;
> +};

Do these struct have to be defined "publicly"?
I.e. the driver code could just work with gpucg and gpucg_bucket
pointers.

> +int gpucg_register_bucket(struct gpucg_bucket *bucket, const char *name)

...and the registration function would return a pointer to newly
(internally) allocated gpucg_bucket.

Regards,
Michal


* Re: [PATCH v6 1/6] gpu: rfc: Proposal for a GPU cgroup controller
       [not found]       ` <20220504121052.GA24172-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
@ 2022-05-04 17:16         ` T.J. Mercier
  2022-05-05 11:29           ` Michal Koutný
  0 siblings, 1 reply; 12+ messages in thread
From: T.J. Mercier @ 2022-05-04 17:16 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Daniel Vetter, Hridya Valsaraju, Christian König,
	John Stultz, Todd Kjos, Carlos Llamas, Suren Baghdasaryan,
	Kalesh Singh, Kenny.Ho-5C7GfCeVMHo, Shuah Khan,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-doc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, May 4, 2022 at 5:10 AM Michal Koutný <mkoutny-IBi9RG/b67k@public.gmane.org> wrote:
>
> Hello.
>
> On Mon, May 02, 2022 at 11:19:35PM +0000, "T.J. Mercier" <tjmercier@google.com> wrote:
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > [...]
> > +     The device name string is set by a device driver when it registers with the
> > +     GPU cgroup controller to participate in resource accounting.
>
> Are these names available anywhere else for the user? (I.e. would
> drivers add respective sysfs attributes or similar?)
>
Hi, this sounds like it could be a good idea but it'd probably be best
to do this inside gpucg_register_bucket instead of requiring drivers
to perform this externally, possibly in a non-uniform way. Maybe a
sysfs file that prints each name of the gpucg_buckets elements?
However the only names that would result from this series are the
names of the dma-buf heaps, with "-heap" appended. So they are
predictable from the /dev/dma_heap/* names, and only the system and
cma heaps currently exist upstream.

For other future uses of this controller I thought we were headed in
the direction of "standardized" names which would be
predefined/hardcoded and documented, so these names wouldn't really
need to be made available to a user at runtime.
https://lore.kernel.org/lkml/CABdmKX3gTAohaOwkNccGrQyXN9tzT-oEVibO5ZPF+eP+Vq=AOg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org/
>
> > +     Non-unique names will be rejected at the point of registration.
>
> This doesn't seem relevant to the cgroupfs user, does it?
> I think it should be mentioned at the respective API.
>
Yeah you're right. Thank you.

> HTH,
> Michal
>


* Re: [PATCH v6 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
       [not found]         ` <20220504122558.GB24172-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
@ 2022-05-04 17:19           ` T.J. Mercier
       [not found]             ` <CABdmKX2DJy0i3XAP7xTduZ8KFVKtgto24w714YJNUb_=pfYiKw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: T.J. Mercier @ 2022-05-04 17:19 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Daniel Vetter,
	Hridya Valsaraju, Christian König, John Stultz, Todd Kjos,
	Carlos Llamas, Suren Baghdasaryan, Kalesh Singh,
	Kenny.Ho-5C7GfCeVMHo, Shuah Khan,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed, May 4, 2022 at 5:26 AM Michal Koutný <mkoutny-IBi9RG/b67k@public.gmane.org> wrote:
>
> Hello.
>
> On Mon, May 02, 2022 at 11:19:36PM +0000, "T.J. Mercier" <tjmercier@google.com> wrote:
> > This patch adds APIs to:
> > -allow a device to register for memory accounting using the GPU cgroup
> > controller.
> > -charge and uncharge allocated memory to a cgroup.
>
> Is this API for separately built consumers?
> The respective functions should be exported (EXPORT_SYMBOL_GPL) if I
> haven't missed anything.
>
As the only users are dmabuf heaps and dmabuf, and those cannot be
built as modules, I did not export the symbols here. However, these
would definitely need to be exported to support use by modules, and I
have had to do that in one of my device test trees for this change.
Should I export these now for this series?

> > +#ifdef CONFIG_CGROUP_GPU
> > + /* The GPU cgroup controller data structure */
> > +struct gpucg {
> > +     struct cgroup_subsys_state css;
> > +
> > +     /* list of all resource pools that belong to this cgroup */
> > +     struct list_head rpools;
> > +};
> > +
> > +/* A named entity representing bucket of tracked memory. */
> > +struct gpucg_bucket {
> > +     /* list of various resource pools in various cgroups that the bucket is part of */
> > +     struct list_head rpools;
> > +
> > +     /* list of all buckets registered for GPU cgroup accounting */
> > +     struct list_head bucket_node;
> > +
> > +     /* string to be used as identifier for accounting and limit setting */
> > +     const char *name;
> > +};
>
> Do these struct have to be defined "publicly"?
> I.e. the driver code could just work with gpucg and gpucg_bucket
> pointers.
>
> > +int gpucg_register_bucket(struct gpucg_bucket *bucket, const char *name)
>
> ...and the registration function would return a pointer to newly
> (internally) allocated gpucg_bucket.
>
No, except maybe the gpucg_bucket name which I can add an accessor
function for. Won't this mean depending on LTO for potential inlining
of the functions currently implemented in the header? I'm happy to
make this change, but I wonder why some parts of the kernel take this
approach and others do not.

> Regards,
> Michal


* Re: [PATCH v6 1/6] gpu: rfc: Proposal for a GPU cgroup controller
  2022-05-04 17:16         ` T.J. Mercier
@ 2022-05-05 11:29           ` Michal Koutný
  2022-05-05 23:56             ` T.J. Mercier
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Koutný @ 2022-05-05 11:29 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Daniel Vetter, Hridya Valsaraju, Christian König,
	John Stultz, Todd Kjos, Carlos Llamas, Suren Baghdasaryan,
	Kalesh Singh, Kenny.Ho, Shuah Khan, kernel-team, cgroups,
	linux-doc, linux-kernel

On Wed, May 04, 2022 at 10:16:50AM -0700, "T.J. Mercier" <tjmercier@google.com> wrote:
> However the only names that would result from this series are the
> names of the dma-buf heaps, with "-heap" appended. So they are
> predictable from the /dev/dma_heap/* names, and only the system and
> cma heaps currently exist upstream.

It's not so important with the read-only stats currently posted (a
crafted sysfs file with these names would be overkill)...

> 
> For other future uses of this controller I thought we were headed in
> the direction of "standardized" names which would be
> predefined/hardcoded and documented, so these names wouldn't really
> need to be made available to a user at runtime.
> https://lore.kernel.org/lkml/CABdmKX3gTAohaOwkNccGrQyXN9tzT-oEVibO5ZPF+eP+Vq=AOg@mail.gmail.com/

(Ah, I see.)

...but if writers (limits) are envisioned, the keys should represent
something that the user can derive/construct from available info -- e.g.
the documentation.

OK, so I understand current form just presents some statistics.

Michal


* Re: [PATCH v6 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
       [not found]             ` <CABdmKX2DJy0i3XAP7xTduZ8KFVKtgto24w714YJNUb_=pfYiKw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2022-05-05 11:50               ` Michal Koutný
       [not found]                 ` <20220505115015.GD10890-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Koutný @ 2022-05-05 11:50 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Daniel Vetter,
	Hridya Valsaraju, Christian König, John Stultz, Todd Kjos,
	Carlos Llamas, Suren Baghdasaryan, Kalesh Singh,
	Kenny.Ho-5C7GfCeVMHo, Shuah Khan,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed, May 04, 2022 at 10:19:20AM -0700, "T.J. Mercier" <tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> Should I export these now for this series?

Hehe, _I_ don't know.
Depends on the likelihood this lands in and is built upon.

> No, except maybe the gpucg_bucket name which I can add an accessor
> function for. Won't this mean depending on LTO for potential inlining
> of the functions currently implemented in the header?

Yes.  Also depends how much inlining here would be performance relevant.
I suggested this with an OS vendor hat on, i.e. the less such ABI, the
simpler.

> I'm happy to make this change, but I wonder why some parts of the
> kernel take this approach and others do not.

I think there is no convention (see also
Documentation/process/stable-api-nonsense.rst ;-)).

Regards,
Michal


* Re: [PATCH v6 1/6] gpu: rfc: Proposal for a GPU cgroup controller
  2022-05-05 11:29           ` Michal Koutný
@ 2022-05-05 23:56             ` T.J. Mercier
  0 siblings, 0 replies; 12+ messages in thread
From: T.J. Mercier @ 2022-05-05 23:56 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Daniel Vetter, Hridya Valsaraju, Christian König,
	John Stultz, Todd Kjos, Carlos Llamas, Suren Baghdasaryan,
	Kalesh Singh, Kenny.Ho, Shuah Khan, kernel-team, cgroups,
	linux-doc, linux-kernel

On Thu, May 5, 2022 at 4:29 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> On Wed, May 04, 2022 at 10:16:50AM -0700, "T.J. Mercier" <tjmercier@google.com> wrote:
> > However the only names that would result from this series are the
> > names of the dma-buf heaps, with "-heap" appended. So they are
> > predictable from the /dev/dma_heap/* names, and only the system and
> > cma heaps currently exist upstream.
>
> It's not so important with the read-only stats currently posted (a
> crafted sysfs file with these names would be overkill)...
>
> >
> > For other future uses of this controller I thought we were headed in
> > the direction of "standardized" names which would be
> > predefined/hardcoded and documented, so these names wouldn't really
> > need to be made available to a user at runtime.
> > https://lore.kernel.org/lkml/CABdmKX3gTAohaOwkNccGrQyXN9tzT-oEVibO5ZPF+eP+Vq=AOg@mail.gmail.com/
>
> (Ah, I see.)
>
> ...but if writers (limits) are envisioned, the keys should represent
> something that the user can derive/construct from available info -- e.g.
> the documentation.
>
> OK, so I understand current form just presents some statistics.
>
Yup, thanks for taking a look.

> Michal


* Re: [PATCH v6 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
       [not found]                 ` <20220505115015.GD10890-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
@ 2022-05-05 23:56                   ` T.J. Mercier
  0 siblings, 0 replies; 12+ messages in thread
From: T.J. Mercier @ 2022-05-05 23:56 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Daniel Vetter,
	Hridya Valsaraju, Christian König, John Stultz, Todd Kjos,
	Carlos Llamas, Suren Baghdasaryan, Kalesh Singh,
	Kenny.Ho-5C7GfCeVMHo, Shuah Khan,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Thu, May 5, 2022 at 4:50 AM 'Michal Koutný' via kernel-team
<kernel-team-z5hGa2qSFaRBDgjK7y7TUQ@public.gmane.org> wrote:
>
> On Wed, May 04, 2022 at 10:19:20AM -0700, "T.J. Mercier" <tjmercier@google.com> wrote:
> > Should I export these now for this series?
>
> Hehe, _I_ don't know.
> Depends on the likelihood this lands in and is built upon.
>
Ok, I'll leave these unexported for now unless I hear otherwise.

> > No, except maybe the gpucg_bucket name which I can add an accessor
> > function for. Won't this mean depending on LTO for potential inlining
> > of the functions currently implemented in the header?
>
> Yes.  Also depends how much inlining here would be performance relevant.
> I suggested this with an OS vendor hat on, i.e. the less such ABI, the
> simpler.
>
> > I'm happy to make this change, but I wonder why some parts of the
> > kernel take this approach and others do not.
>
> I think there is no convention (see also
> Documentation/process/stable-api-nonsense.rst ;-)).
>
Alright I'll queue this change up for the next rev.

> Regards,
> Michal

Thanks again!



end of thread, other threads:[~2022-05-05 23:56 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2022-05-02 23:19 [PATCH v6 0/6] Proposal for a GPU cgroup controller T.J. Mercier
2022-05-02 23:19 ` [PATCH v6 1/6] gpu: rfc: " T.J. Mercier
     [not found]   ` <20220502231944.3891435-2-tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2022-05-04 12:10     ` Michal Koutný
     [not found]       ` <20220504121052.GA24172-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
2022-05-04 17:16         ` T.J. Mercier
2022-05-05 11:29           ` Michal Koutný
2022-05-05 23:56             ` T.J. Mercier
     [not found] ` <20220502231944.3891435-1-tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2022-05-02 23:19   ` [PATCH v6 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory T.J. Mercier
     [not found]     ` <20220502231944.3891435-3-tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2022-05-04 12:25       ` Michal Koutný
     [not found]         ` <20220504122558.GB24172-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
2022-05-04 17:19           ` T.J. Mercier
     [not found]             ` <CABdmKX2DJy0i3XAP7xTduZ8KFVKtgto24w714YJNUb_=pfYiKw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2022-05-05 11:50               ` Michal Koutný
     [not found]                 ` <20220505115015.GD10890-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
2022-05-05 23:56                   ` T.J. Mercier
2022-05-02 23:19 ` [PATCH v6 4/6] dmabuf: Add gpu cgroup charge transfer function T.J. Mercier
