* [RFC v4 00/10] DRM scheduling cgroup controller
@ 2023-03-14 14:18 Tvrtko Ursulin
2023-03-14 14:18 ` [RFC 01/10] drm: Track clients by tgid and not tid Tvrtko Ursulin
` (10 more replies)
0 siblings, 11 replies; 19+ messages in thread
From: Tvrtko Ursulin @ 2023-03-14 14:18 UTC (permalink / raw)
To: Intel-gfx, dri-devel
Cc: Rob Clark, Kenny.Ho, Daniel Vetter, Johannes Weiner, linux-kernel,
Stéphane Marchesin, Christian König, Zefan Li,
Dave Airlie, Tejun Heo, cgroups, T . J . Mercier
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
This series contains a proposal for a DRM scheduling cgroup controller which
implements weight based, hierarchical GPU time budgeting, similar in concept
to some of the existing controllers.
Motivation mostly comes from my earlier proposal, where I identified that GPU
scheduling lags significantly behind what is available for CPU and IO. Whereas
back then I proposed tying this to process nice, the feedback was mostly that
people wanted cgroups. So here it is - in the world of heterogeneous computing
pipelines I think it is time to do something about this gap.
Code is not finished but should survive some light experimentation. I am
sharing it early since the topic has been controversial in the past. I hope to
demonstrate there are gains to be had in real world usage(*), today, and that
the concepts the proposal relies on are well enough established and stable.
*) Specifically under ChromeOS, which uses cgroups to control CPU bandwidth for
VMs based on the window focused status. It can be demonstrated how GPU
scheduling control can easily be integrated into that setup.
*) Another real world example later in the cover letter.
There should be no conflict between this proposal and any efforts to implement
a memory usage based controller. The skeleton DRM cgroup controller is
deliberately a pure skeleton patch onto which any further functionality can be
added with no real conflicts. [In fact, perhaps scheduling is even easier to
deal with than memory accounting.]
Structure of the series is as follows:
1-2) Improve client ownership tracking in DRM core.
3) Add a skeleton DRM cgroup controller with no functionality.
4-7) Lay down some infrastructure to enable the controller.
8) The controller itself.
9-10) i915 support for the controller.
The proposal defines a delegation of duties between three parties: the cgroup
controller, the DRM core and individual drivers. Two-way communication
interfaces are then defined to enable the delegation to work.
DRM scheduling soft limits
~~~~~~~~~~~~~~~~~~~~~~~~~~
Because of the heterogeneous capabilities of DRM hardware and drivers, soft
limits are implemented as a loose, co-operative (bi-directional) interface
between the controller and the DRM core.
The controller configures the GPU time allowed per group and periodically scans
the tasks belonging to each group to detect an over-budget condition, at which
point it invokes a callback notifying the DRM core of the condition.
The DRM core provides an API to query per-process GPU utilization and a second
API to receive notifications from the cgroup controller when a group enters or
exits the over-budget condition.
Individual DRM drivers which implement the interface are expected to act on
this in a best-effort manner only. There are no guarantees that the soft
limits will be respected.
DRM scheduling soft limits interface files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drm.weight
Standard cgroup weight based control [1, 10000] used to configure the
relative distribution of GPU time between the sibling groups.
This builds upon the per client GPU utilisation work which landed recently for
a few drivers. My thinking is that, in principle, the intersection of drivers
which support both that and some sort of scheduling control, like priorities,
could also in theory support this controller.
Another really interesting angle for this controller is that it mimics the
same control method used by the CPU scheduler, that is, proportional/weight
based GPU time budgeting. This makes it easy to configure and does not
require a new mental model.
However, as the introduction mentions, GPUs are much more heterogeneous and
the controller therefore uses very "soft" wording as to what it promises. The
general statement is that it can define budgets, notify clients when they are
over them, and let individual drivers implement best-effort handling of those
conditions.
Delegation of duties in the implementation goes like this:
* DRM cgroup controller implements the control files, the scanning loop and
tracks the DRM clients associated with each cgroup. It provides API DRM
core needs to call to (de)register and migrate clients.
* DRM core defines two callbacks which it calls directly: the first for
querying GPU time used by a client and the second for notifying a client
that it is over budget. It calls the controller API for (de)registering
clients and migrating them between tasks on file descriptor hand over.
* Individual drivers implement the above mentioned callbacks and register
them with the DRM core.
What I have demonstrated in practice is that when wired to i915, in a really
primitive way where the over-budget condition simply lowers the scheduling
priority, the concept can be almost as effective as static priority control.
I say almost because the design, in which budget control depends on periodic
usage scanning, has a fundamental delay, so responsiveness will depend on the
scanning period, which may or may not be a problem for a particular use case.
There are also interesting conversations to be had around mental models for
what GPU usage means as a single number when faced with GPUs which have
different execution engines. To an extent this is similar to the multi-core
and cgroup CPU controller problems, but it definitely goes further than that.
I deliberately did not want to include any such complications in the controller
itself and left it to the individual drivers to handle. For instance, the i915
over-budget callback will not do anything unless the client's GPU usage is on a
physical engine which is oversubscribed. This enables multiple clients to be
harmlessly over budget, as long as they are not competing for the same GPU
resource.
Example usage from within a Linux desktop
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Standard Linux distributions like Ubuntu already use cgroups heavily for
session management, and that could easily be extended with the DRM controller.
After logging into the system graphically we can enable the DRM controller
throughout the cgroups hierarchy:
echo +drm > /sys/fs/cgroup/cgroup.subtree_control
echo +drm > /sys/fs/cgroup/user.slice/cgroup.subtree_control
echo +drm > /sys/fs/cgroup/user.slice/user-$(id -u).slice/cgroup.subtree_control
Next we will open two SSH sessions, just so separate cgroups are handily created
by systemd for this experiment.
Roughly simultaneously we run the following two benchmarks in each session
respectively:
1)
./GpuTest /test=pixmark_julia_fp32 /width=1920 /height=1080 /fullscreen /no_scorebox /benchmark /benchmark_duration_ms=60000
2)
vblank_mode=0 bin/testfw_app --gl_api=desktop_core --width=1920 --height=1080 --fullscreen 1 --gfx=glfw -t gl_manhattan
(The only reason for vsync off here is that I struggled to find an easily
runnable and demanding enough benchmark, or a screen large enough to make
even simpler ones demanding.)
With this test we get 252fps from GpuTest and 96fps from GfxBenchmark.
The premise here is that one of these GPU intensive benchmarks is intended to
be run by the user at lower priority. Imagine kicking off some background
compute processing and continuing to use the UI for other tasks. Hence the
user will now re-run the test after first lowering the weight control of the
first session (DRM cgroup):
1)
echo 50 | sudo tee /sys/fs/cgroup/`cut -d':' -f3 /proc/self/cgroup`/drm.weight
./GpuTest /test=pixmark_julia_fp32 /width=1920 /height=1080 /fullscreen /no_scorebox /benchmark /benchmark_duration_ms=60000
2)
vblank_mode=0 bin/testfw_app --gl_api=desktop_core --width=1920 --height=1080 --fullscreen 1 --gfx=glfw -t gl_manhattan
In this case we will see that GpuTest has recorded 208fps (~18% down) and
GfxBenchmark 114fps (18% up), demonstrating that even a very simple approach of
wiring up i915 to the DRM cgroup controller can enable external GPU scheduling
control.
* Note here that default weight is 100, so setting 50 for the background session
is asking the controller to give it half as much GPU bandwidth.
v2:
* Prefaced the series with some core DRM work as suggested by Christian.
* Dropped the priority based controller for now.
* Dropped the introspection cgroup controller file.
* Implemented unused budget sharing/propagation.
* Some small fixes/tweak as per review feedback and in general.
v3:
* Dropped one upstreamed patch.
* Logging cleanup (use DRM macros where available).
v4:
* Dropped the struct pid tracking indirection in favour of tracking individual
DRM clients directly in the controller. (Michal Koutný)
* Added boot time param for configuring the scanning period. (Tejun Heo)
* Improved spreading of unused budget to over budget clients, regardless of
their location in the tree so that all unused budget can be utilized.
* Made scanning more robust by not re-starting it on every client
deregistration and removal. Instead, new baseline GPU activity data is simply
collected on those events and the next scan invocation can proceed as
scheduled.
* Dropped the debugging aids from the series.
TODOs/Opens:
* For now (RFC) I haven't implemented the 2nd suggestion from Tejun of having
a shadow tree which would only contain groups with DRM clients. (The purpose
being fewer nodes to traverse in the scanning loop.)
* Is the global state passing from can_attach to attach really okay? (I need
source and destination css.)
Tvrtko Ursulin (10):
drm: Track clients by tgid and not tid
drm: Update file owner during use
cgroup: Add the DRM cgroup controller
drm/cgroup: Track DRM clients per cgroup
drm/cgroup: Add ability to query drm cgroup GPU time
drm/cgroup: Add over budget signalling callback
drm/cgroup: Only track clients which are providing drm_cgroup_ops
cgroup/drm: Introduce weight based drm cgroup control
drm/i915: Wire up with drm controller GPU time query
drm/i915: Implement cgroup controller over budget throttling
Documentation/admin-guide/cgroup-v2.rst | 31 +
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 +-
drivers/gpu/drm/drm_auth.c | 3 +-
drivers/gpu/drm/drm_debugfs.c | 12 +-
drivers/gpu/drm/drm_file.c | 46 +-
drivers/gpu/drm/drm_ioctl.c | 3 +
.../gpu/drm/i915/gem/i915_gem_execbuffer.c | 38 +-
drivers/gpu/drm/i915/i915_driver.c | 11 +
drivers/gpu/drm/i915/i915_drm_client.c | 209 ++++++-
drivers/gpu/drm/i915/i915_drm_client.h | 13 +
drivers/gpu/drm/nouveau/nouveau_drm.c | 5 +-
drivers/gpu/drm/vmwgfx/vmwgfx_gem.c | 6 +-
include/drm/drm_drv.h | 36 ++
include/drm/drm_file.h | 19 +-
include/linux/cgroup_drm.h | 29 +
include/linux/cgroup_subsys.h | 4 +
init/Kconfig | 7 +
kernel/cgroup/Makefile | 1 +
kernel/cgroup/drm.c | 568 ++++++++++++++++++
19 files changed, 1020 insertions(+), 27 deletions(-)
create mode 100644 include/linux/cgroup_drm.h
create mode 100644 kernel/cgroup/drm.c
--
2.37.2
^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC 01/10] drm: Track clients by tgid and not tid
2023-03-14 14:18 [RFC v4 00/10] DRM scheduling cgroup controller Tvrtko Ursulin
@ 2023-03-14 14:18 ` Tvrtko Ursulin
2023-03-14 15:33 ` Christian König
2023-03-14 14:18 ` [RFC 02/10] drm: Update file owner during use Tvrtko Ursulin
` (9 subsequent siblings)
10 siblings, 1 reply; 19+ messages in thread
From: Tvrtko Ursulin @ 2023-03-14 14:18 UTC (permalink / raw)
To: Intel-gfx, dri-devel
Cc: Rob Clark, Kenny.Ho, Daniel Vetter, Johannes Weiner, linux-kernel,
Stéphane Marchesin, Christian König, Alex Deucher,
linux-graphics-maintainer, Zefan Li, Dave Airlie, Tejun Heo,
cgroups, T . J . Mercier, Zack Rusin
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Thread group id (aka the pid from the userspace point of view) is a more
interesting thing to show as the owner of a DRM fd, so track and show that
instead of the thread id.
In the next patch we will make the owner be updated after file descriptor
handover, which will also be tgid based to avoid ping-pong when multiple
threads access the fd.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Zack Rusin <zackr@vmware.com>
Cc: linux-graphics-maintainer@vmware.com
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +-
drivers/gpu/drm/drm_debugfs.c | 4 ++--
drivers/gpu/drm/drm_file.c | 2 +-
drivers/gpu/drm/vmwgfx/vmwgfx_gem.c | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d8e683688daa..863cb668e000 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -969,7 +969,7 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file *m, void *unused)
* Therefore, we need to protect this ->comm access using RCU.
*/
rcu_read_lock();
- task = pid_task(file->pid, PIDTYPE_PID);
+ task = pid_task(file->pid, PIDTYPE_TGID);
seq_printf(m, "pid %8d command %s:\n", pid_nr(file->pid),
task ? task->comm : "<unknown>");
rcu_read_unlock();
diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c
index 4f643a490dc3..4855230ba2c6 100644
--- a/drivers/gpu/drm/drm_debugfs.c
+++ b/drivers/gpu/drm/drm_debugfs.c
@@ -80,7 +80,7 @@ static int drm_clients_info(struct seq_file *m, void *data)
seq_printf(m,
"%20s %5s %3s master a %5s %10s\n",
"command",
- "pid",
+ "tgid",
"dev",
"uid",
"magic");
@@ -94,7 +94,7 @@ static int drm_clients_info(struct seq_file *m, void *data)
bool is_current_master = drm_is_current_master(priv);
rcu_read_lock(); /* locks pid_task()->comm */
- task = pid_task(priv->pid, PIDTYPE_PID);
+ task = pid_task(priv->pid, PIDTYPE_TGID);
uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID;
seq_printf(m, "%20s %5d %3d %c %c %5d %10u\n",
task ? task->comm : "<unknown>",
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index a51ff8cee049..c1018c470047 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -156,7 +156,7 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
if (!file)
return ERR_PTR(-ENOMEM);
- file->pid = get_pid(task_pid(current));
+ file->pid = get_pid(task_tgid(current));
file->minor = minor;
/* for compatibility root is always authenticated */
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
index d6baf73a6458..c0da89e16e6f 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
@@ -241,7 +241,7 @@ static int vmw_debugfs_gem_info_show(struct seq_file *m, void *unused)
* Therefore, we need to protect this ->comm access using RCU.
*/
rcu_read_lock();
- task = pid_task(file->pid, PIDTYPE_PID);
+ task = pid_task(file->pid, PIDTYPE_TGID);
seq_printf(m, "pid %8d command %s:\n", pid_nr(file->pid),
task ? task->comm : "<unknown>");
rcu_read_unlock();
--
2.37.2
* [RFC 02/10] drm: Update file owner during use
2023-03-14 14:18 [RFC v4 00/10] DRM scheduling cgroup controller Tvrtko Ursulin
2023-03-14 14:18 ` [RFC 01/10] drm: Track clients by tgid and not tid Tvrtko Ursulin
@ 2023-03-14 14:18 ` Tvrtko Ursulin
2023-03-14 15:49 ` Christian König
[not found] ` <20230314141904.1210824-3-tvrtko.ursulin-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2023-03-14 14:18 ` [RFC 03/10] cgroup: Add the DRM cgroup controller Tvrtko Ursulin
` (8 subsequent siblings)
10 siblings, 2 replies; 19+ messages in thread
From: Tvrtko Ursulin @ 2023-03-14 14:18 UTC (permalink / raw)
To: Intel-gfx, dri-devel
Cc: Rob Clark, Kenny.Ho, Daniel Vetter, Daniel Vetter,
Johannes Weiner, linux-kernel, Stéphane Marchesin,
Christian König, Zefan Li, Dave Airlie, Tejun Heo, cgroups,
T . J . Mercier
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
With the typical model where the display server opens the file descriptor
and then hands it over to the client, we were showing stale data in
debugfs.
Fix it by updating the drm_file->pid on ioctl access from a different
process.
The field is also made RCU protected to allow for lockless readers. The
update side is protected with dev->filelist_mutex.
Before:
$ cat /sys/kernel/debug/dri/0/clients
command pid dev master a uid magic
Xorg 2344 0 y y 0 0
Xorg 2344 0 n y 0 2
Xorg 2344 0 n y 0 3
Xorg 2344 0 n y 0 4
After:
$ cat /sys/kernel/debug/dri/0/clients
command tgid dev master a uid magic
Xorg 830 0 y y 0 0
xfce4-session 880 0 n y 0 1
xfwm4 943 0 n y 0 2
neverball 1095 0 n y 0 3
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 ++--
drivers/gpu/drm/drm_auth.c | 3 +-
drivers/gpu/drm/drm_debugfs.c | 10 ++++---
drivers/gpu/drm/drm_file.c | 40 +++++++++++++++++++++++--
drivers/gpu/drm/drm_ioctl.c | 3 ++
drivers/gpu/drm/nouveau/nouveau_drm.c | 5 +++-
drivers/gpu/drm/vmwgfx/vmwgfx_gem.c | 6 ++--
include/drm/drm_file.h | 13 ++++++--
8 files changed, 71 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 863cb668e000..67ce634992f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -960,6 +960,7 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file *m, void *unused)
list_for_each_entry(file, &dev->filelist, lhead) {
struct task_struct *task;
struct drm_gem_object *gobj;
+ struct pid *pid;
int id;
/*
@@ -969,8 +970,9 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file *m, void *unused)
* Therefore, we need to protect this ->comm access using RCU.
*/
rcu_read_lock();
- task = pid_task(file->pid, PIDTYPE_TGID);
- seq_printf(m, "pid %8d command %s:\n", pid_nr(file->pid),
+ pid = rcu_dereference(file->pid);
+ task = pid_task(pid, PIDTYPE_TGID);
+ seq_printf(m, "pid %8d command %s:\n", pid_nr(pid),
task ? task->comm : "<unknown>");
rcu_read_unlock();
diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index cf92a9ae8034..2ed2585ded37 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -235,7 +235,8 @@ static int drm_new_set_master(struct drm_device *dev, struct drm_file *fpriv)
static int
drm_master_check_perm(struct drm_device *dev, struct drm_file *file_priv)
{
- if (file_priv->pid == task_pid(current) && file_priv->was_master)
+ if (file_priv->was_master &&
+ rcu_access_pointer(file_priv->pid) == task_pid(current))
return 0;
if (!capable(CAP_SYS_ADMIN))
diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c
index 4855230ba2c6..b46f5ceb24c6 100644
--- a/drivers/gpu/drm/drm_debugfs.c
+++ b/drivers/gpu/drm/drm_debugfs.c
@@ -90,15 +90,17 @@ static int drm_clients_info(struct seq_file *m, void *data)
*/
mutex_lock(&dev->filelist_mutex);
list_for_each_entry_reverse(priv, &dev->filelist, lhead) {
- struct task_struct *task;
bool is_current_master = drm_is_current_master(priv);
+ struct task_struct *task;
+ struct pid *pid;
- rcu_read_lock(); /* locks pid_task()->comm */
- task = pid_task(priv->pid, PIDTYPE_TGID);
+ rcu_read_lock(); /* Locks priv->pid and pid_task()->comm! */
+ pid = rcu_dereference(priv->pid);
+ task = pid_task(pid, PIDTYPE_TGID);
uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID;
seq_printf(m, "%20s %5d %3d %c %c %5d %10u\n",
task ? task->comm : "<unknown>",
- pid_vnr(priv->pid),
+ pid_vnr(pid),
priv->minor->index,
is_current_master ? 'y' : 'n',
priv->authenticated ? 'y' : 'n',
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index c1018c470047..f2f8175ece15 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -156,7 +156,7 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
if (!file)
return ERR_PTR(-ENOMEM);
- file->pid = get_pid(task_tgid(current));
+ rcu_assign_pointer(file->pid, get_pid(task_tgid(current)));
file->minor = minor;
/* for compatibility root is always authenticated */
@@ -196,7 +196,7 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
drm_syncobj_release(file);
if (drm_core_check_feature(dev, DRIVER_GEM))
drm_gem_release(dev, file);
- put_pid(file->pid);
+ put_pid(rcu_access_pointer(file->pid));
kfree(file);
return ERR_PTR(ret);
@@ -287,7 +287,7 @@ void drm_file_free(struct drm_file *file)
WARN_ON(!list_empty(&file->event_list));
- put_pid(file->pid);
+ put_pid(rcu_access_pointer(file->pid));
kfree(file);
}
@@ -501,6 +501,40 @@ int drm_release(struct inode *inode, struct file *filp)
}
EXPORT_SYMBOL(drm_release);
+void drm_file_update_pid(struct drm_file *filp)
+{
+ struct drm_device *dev;
+ struct pid *pid, *old;
+
+ /*
+ * Master nodes need to keep the original ownership in order for
+ * drm_master_check_perm to keep working correctly. (See comment in
+ * drm_auth.c.)
+ */
+ if (filp->was_master)
+ return;
+
+ pid = task_tgid(current);
+
+ /*
+ * Quick unlocked check since the model is a single handover followed by
+ * exclusive repeated use.
+ */
+ if (pid == rcu_access_pointer(filp->pid))
+ return;
+
+ dev = filp->minor->dev;
+ mutex_lock(&dev->filelist_mutex);
+ old = rcu_replace_pointer(filp->pid, pid, 1);
+ mutex_unlock(&dev->filelist_mutex);
+
+ if (pid != old) {
+ get_pid(pid);
+ synchronize_rcu();
+ put_pid(old);
+ }
+}
+
/**
* drm_release_noglobal - release method for DRM file
* @inode: device inode
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 7c9d66ee917d..305b18d9d7b6 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -775,6 +775,9 @@ long drm_ioctl_kernel(struct file *file, drm_ioctl_t *func, void *kdata,
struct drm_device *dev = file_priv->minor->dev;
int retcode;
+ /* Update drm_file owner if fd was passed along. */
+ drm_file_update_pid(file_priv);
+
if (drm_dev_is_unplugged(dev))
return -ENODEV;
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index cc7c5b4a05fd..57aeaf7af613 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -1095,7 +1095,10 @@ nouveau_drm_open(struct drm_device *dev, struct drm_file *fpriv)
}
get_task_comm(tmpname, current);
- snprintf(name, sizeof(name), "%s[%d]", tmpname, pid_nr(fpriv->pid));
+ rcu_read_lock();
+ snprintf(name, sizeof(name), "%s[%d]",
+ tmpname, pid_nr(rcu_dereference(fpriv->pid)));
+ rcu_read_unlock();
if (!(cli = kzalloc(sizeof(*cli), GFP_KERNEL))) {
ret = -ENOMEM;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
index c0da89e16e6f..a07e5b7e2f2f 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
@@ -232,6 +232,7 @@ static int vmw_debugfs_gem_info_show(struct seq_file *m, void *unused)
list_for_each_entry(file, &dev->filelist, lhead) {
struct task_struct *task;
struct drm_gem_object *gobj;
+ struct pid *pid;
int id;
/*
@@ -241,8 +242,9 @@ static int vmw_debugfs_gem_info_show(struct seq_file *m, void *unused)
* Therefore, we need to protect this ->comm access using RCU.
*/
rcu_read_lock();
- task = pid_task(file->pid, PIDTYPE_TGID);
- seq_printf(m, "pid %8d command %s:\n", pid_nr(file->pid),
+ pid = rcu_dereference(file->pid);
+ task = pid_task(pid, PIDTYPE_TGID);
+ seq_printf(m, "pid %8d command %s:\n", pid_nr(pid),
task ? task->comm : "<unknown>");
rcu_read_unlock();
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 0d1f853092ab..27d545131d4a 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -255,8 +255,15 @@ struct drm_file {
/** @master_lookup_lock: Serializes @master. */
spinlock_t master_lookup_lock;
- /** @pid: Process that opened this file. */
- struct pid *pid;
+ /**
+ * @pid: Process that is using this file.
+ *
+ * Must only be dereferenced under a rcu_read_lock or equivalent.
+ *
+ * Updates are guarded with dev->filelist_mutex and reference must be
+ * dropped after a RCU grace period to accommodate lockless readers.
+ */
+ struct pid __rcu *pid;
/** @magic: Authentication magic, see @authenticated. */
drm_magic_t magic;
@@ -415,6 +422,8 @@ static inline bool drm_is_accel_client(const struct drm_file *file_priv)
return file_priv->minor->type == DRM_MINOR_ACCEL;
}
+void drm_file_update_pid(struct drm_file *);
+
int drm_open(struct inode *inode, struct file *filp);
int drm_open_helper(struct file *filp, struct drm_minor *minor);
ssize_t drm_read(struct file *filp, char __user *buffer,
--
2.37.2
* [RFC 03/10] cgroup: Add the DRM cgroup controller
2023-03-14 14:18 [RFC v4 00/10] DRM scheduling cgroup controller Tvrtko Ursulin
2023-03-14 14:18 ` [RFC 01/10] drm: Track clients by tgid and not tid Tvrtko Ursulin
2023-03-14 14:18 ` [RFC 02/10] drm: Update file owner during use Tvrtko Ursulin
@ 2023-03-14 14:18 ` Tvrtko Ursulin
2023-03-14 14:18 ` [RFC 04/10] drm/cgroup: Track DRM clients per cgroup Tvrtko Ursulin
` (7 subsequent siblings)
10 siblings, 0 replies; 19+ messages in thread
From: Tvrtko Ursulin @ 2023-03-14 14:18 UTC (permalink / raw)
To: Intel-gfx, dri-devel
Cc: Rob Clark, Kenny.Ho, Daniel Vetter, Johannes Weiner, linux-kernel,
Stéphane Marchesin, Christian König, Zefan Li,
Dave Airlie, Tejun Heo, cgroups, T . J . Mercier
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Skeleton controller without any functionality.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
include/linux/cgroup_drm.h | 9 ++++++
include/linux/cgroup_subsys.h | 4 +++
init/Kconfig | 7 ++++
kernel/cgroup/Makefile | 1 +
kernel/cgroup/drm.c | 60 +++++++++++++++++++++++++++++++++++
5 files changed, 81 insertions(+)
create mode 100644 include/linux/cgroup_drm.h
create mode 100644 kernel/cgroup/drm.c
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
new file mode 100644
index 000000000000..8ef66a47619f
--- /dev/null
+++ b/include/linux/cgroup_drm.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef _CGROUP_DRM_H
+#define _CGROUP_DRM_H
+
+#endif /* _CGROUP_DRM_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 445235487230..49460494a010 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -65,6 +65,10 @@ SUBSYS(rdma)
SUBSYS(misc)
#endif
+#if IS_ENABLED(CONFIG_CGROUP_DRM)
+SUBSYS(drm)
+#endif
+
/*
* The following subsystems are not supported on the default hierarchy.
*/
diff --git a/init/Kconfig b/init/Kconfig
index 1fb5f313d18f..1679229143c0 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1093,6 +1093,13 @@ config CGROUP_RDMA
Attaching processes with active RDMA resources to the cgroup
hierarchy is allowed even if can cross the hierarchy's limit.
+config CGROUP_DRM
+ bool "DRM controller"
+ help
+ Provides the DRM subsystem controller.
+
+ ...
+
config CGROUP_FREEZER
bool "Freezer controller"
help
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 12f8457ad1f9..849bd2917477 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -6,4 +6,5 @@ obj-$(CONFIG_CGROUP_PIDS) += pids.o
obj-$(CONFIG_CGROUP_RDMA) += rdma.o
obj-$(CONFIG_CPUSETS) += cpuset.o
obj-$(CONFIG_CGROUP_MISC) += misc.o
+obj-$(CONFIG_CGROUP_DRM) += drm.o
obj-$(CONFIG_CGROUP_DEBUG) += debug.o
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
new file mode 100644
index 000000000000..02c8eaa633d3
--- /dev/null
+++ b/kernel/cgroup/drm.c
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include <linux/cgroup.h>
+#include <linux/cgroup_drm.h>
+#include <linux/slab.h>
+
+struct drm_cgroup_state {
+ struct cgroup_subsys_state css;
+};
+
+struct drm_root_cgroup_state {
+ struct drm_cgroup_state drmcs;
+};
+
+static struct drm_root_cgroup_state root_drmcs;
+
+static inline struct drm_cgroup_state *
+css_to_drmcs(struct cgroup_subsys_state *css)
+{
+ return container_of(css, struct drm_cgroup_state, css);
+}
+
+static void drmcs_free(struct cgroup_subsys_state *css)
+{
+ struct drm_cgroup_state *drmcs = css_to_drmcs(css);
+
+ if (drmcs != &root_drmcs.drmcs)
+ kfree(drmcs);
+}
+
+static struct cgroup_subsys_state *
+drmcs_alloc(struct cgroup_subsys_state *parent_css)
+{
+ struct drm_cgroup_state *drmcs;
+
+ if (!parent_css) {
+ drmcs = &root_drmcs.drmcs;
+ } else {
+ drmcs = kzalloc(sizeof(*drmcs), GFP_KERNEL);
+ if (!drmcs)
+ return ERR_PTR(-ENOMEM);
+ }
+
+ return &drmcs->css;
+}
+
+struct cftype files[] = {
+ { } /* Zero entry terminates. */
+};
+
+struct cgroup_subsys drm_cgrp_subsys = {
+ .css_alloc = drmcs_alloc,
+ .css_free = drmcs_free,
+ .early_init = false,
+ .legacy_cftypes = files,
+ .dfl_cftypes = files,
+};
--
2.37.2
* [RFC 04/10] drm/cgroup: Track DRM clients per cgroup
2023-03-14 14:18 [RFC v4 00/10] DRM scheduling cgroup controller Tvrtko Ursulin
` (2 preceding siblings ...)
2023-03-14 14:18 ` [RFC 03/10] cgroup: Add the DRM cgroup controller Tvrtko Ursulin
@ 2023-03-14 14:18 ` Tvrtko Ursulin
2023-03-14 14:18 ` [RFC 05/10] drm/cgroup: Add ability to query drm cgroup GPU time Tvrtko Ursulin
` (6 subsequent siblings)
10 siblings, 0 replies; 19+ messages in thread
From: Tvrtko Ursulin @ 2023-03-14 14:18 UTC (permalink / raw)
To: Intel-gfx, dri-devel
Cc: Rob Clark, Kenny.Ho, Daniel Vetter, Johannes Weiner, linux-kernel,
Stéphane Marchesin, Christian König, Zefan Li,
Dave Airlie, Tejun Heo, cgroups, T . J . Mercier
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
To enable propagation of settings from the cgroup DRM controller to DRM
and vice-versa, we need to start tracking to which cgroups DRM clients
belong.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/drm_file.c | 6 ++++
include/drm/drm_file.h | 6 ++++
include/linux/cgroup_drm.h | 20 ++++++++++++
kernel/cgroup/drm.c | 62 +++++++++++++++++++++++++++++++++++++-
4 files changed, 93 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index f2f8175ece15..f6bad820b7ee 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -32,6 +32,7 @@
*/
#include <linux/anon_inodes.h>
+#include <linux/cgroup_drm.h>
#include <linux/dma-fence.h>
#include <linux/file.h>
#include <linux/module.h>
@@ -300,6 +301,8 @@ static void drm_close_helper(struct file *filp)
list_del(&file_priv->lhead);
mutex_unlock(&dev->filelist_mutex);
+ drmcgroup_client_close(file_priv);
+
drm_file_free(file_priv);
}
@@ -363,6 +366,8 @@ int drm_open_helper(struct file *filp, struct drm_minor *minor)
list_add(&priv->lhead, &dev->filelist);
mutex_unlock(&dev->filelist_mutex);
+ drmcgroup_client_open(priv);
+
#ifdef CONFIG_DRM_LEGACY
#ifdef __alpha__
/*
@@ -529,6 +534,7 @@ void drm_file_update_pid(struct drm_file *filp)
mutex_unlock(&dev->filelist_mutex);
if (pid != old) {
+ drmcgroup_client_migrate(filp);
get_pid(pid);
synchronize_rcu();
put_pid(old);
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 27d545131d4a..e3e0de0a8ec4 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -30,6 +30,7 @@
#ifndef _DRM_FILE_H_
#define _DRM_FILE_H_
+#include <linux/cgroup.h>
#include <linux/types.h>
#include <linux/completion.h>
#include <linux/idr.h>
@@ -279,6 +280,11 @@ struct drm_file {
/** @minor: &struct drm_minor for this file. */
struct drm_minor *minor;
+#if IS_ENABLED(CONFIG_CGROUP_DRM)
+ struct cgroup_subsys_state *__css;
+ struct list_head clink;
+#endif
+
/**
* @object_idr:
*
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 8ef66a47619f..176431842d8e 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -6,4 +6,24 @@
#ifndef _CGROUP_DRM_H
#define _CGROUP_DRM_H
+#include <drm/drm_file.h>
+
+#if IS_ENABLED(CONFIG_CGROUP_DRM)
+void drmcgroup_client_open(struct drm_file *file_priv);
+void drmcgroup_client_close(struct drm_file *file_priv);
+void drmcgroup_client_migrate(struct drm_file *file_priv);
+#else
+static inline void drmcgroup_client_open(struct drm_file *file_priv)
+{
+}
+
+static inline void drmcgroup_client_close(struct drm_file *file_priv)
+{
+}
+
+static inline void drmcgroup_client_migrate(struct drm_file *file_priv)
+{
+}
+#endif
+
#endif /* _CGROUP_DRM_H */
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 02c8eaa633d3..d702be1b441f 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -5,17 +5,25 @@
#include <linux/cgroup.h>
#include <linux/cgroup_drm.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
#include <linux/slab.h>
struct drm_cgroup_state {
struct cgroup_subsys_state css;
+
+ struct list_head clients;
};
struct drm_root_cgroup_state {
struct drm_cgroup_state drmcs;
};
-static struct drm_root_cgroup_state root_drmcs;
+static struct drm_root_cgroup_state root_drmcs = {
+ .drmcs.clients = LIST_HEAD_INIT(root_drmcs.drmcs.clients),
+};
+
+static DEFINE_MUTEX(drmcg_mutex);
static inline struct drm_cgroup_state *
css_to_drmcs(struct cgroup_subsys_state *css)
@@ -42,11 +50,63 @@ drmcs_alloc(struct cgroup_subsys_state *parent_css)
drmcs = kzalloc(sizeof(*drmcs), GFP_KERNEL);
if (!drmcs)
return ERR_PTR(-ENOMEM);
+
+ INIT_LIST_HEAD(&drmcs->clients);
}
return &drmcs->css;
}
+void drmcgroup_client_open(struct drm_file *file_priv)
+{
+ struct drm_cgroup_state *drmcs;
+
+ drmcs = css_to_drmcs(task_get_css(current, drm_cgrp_id));
+
+ mutex_lock(&drmcg_mutex);
+ file_priv->__css = &drmcs->css; /* Keeps the reference. */
+ list_add_tail(&file_priv->clink, &drmcs->clients);
+ mutex_unlock(&drmcg_mutex);
+}
+EXPORT_SYMBOL_GPL(drmcgroup_client_open);
+
+void drmcgroup_client_close(struct drm_file *file_priv)
+{
+ struct drm_cgroup_state *drmcs;
+
+ drmcs = css_to_drmcs(file_priv->__css);
+
+ mutex_lock(&drmcg_mutex);
+ list_del(&file_priv->clink);
+ file_priv->__css = NULL;
+ mutex_unlock(&drmcg_mutex);
+
+ css_put(&drmcs->css);
+}
+EXPORT_SYMBOL_GPL(drmcgroup_client_close);
+
+void drmcgroup_client_migrate(struct drm_file *file_priv)
+{
+ struct drm_cgroup_state *src, *dst;
+ struct cgroup_subsys_state *old;
+
+ mutex_lock(&drmcg_mutex);
+
+ old = file_priv->__css;
+ src = css_to_drmcs(old);
+ dst = css_to_drmcs(task_get_css(current, drm_cgrp_id));
+
+ if (src != dst) {
+ file_priv->__css = &dst->css; /* Keeps the reference. */
+ list_move_tail(&file_priv->clink, &dst->clients);
+ }
+
+ mutex_unlock(&drmcg_mutex);
+
+ css_put(old);
+}
+EXPORT_SYMBOL_GPL(drmcgroup_client_migrate);
+
struct cftype files[] = {
{ } /* Zero entry terminates. */
};
--
2.37.2
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [RFC 05/10] drm/cgroup: Add ability to query drm cgroup GPU time
From: Tvrtko Ursulin @ 2023-03-14 14:18 UTC (permalink / raw)
To: Intel-gfx, dri-devel
Cc: Rob Clark, Kenny.Ho, Daniel Vetter, Johannes Weiner, linux-kernel,
Stéphane Marchesin, Christian König, Zefan Li,
Dave Airlie, Tejun Heo, cgroups, T . J . Mercier
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Add a driver callback and core helper which allow querying the time spent
on GPUs for processes belonging to a group.
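For illustration, here is a minimal userspace sketch of the shape such a driver callback could take. All `mock_*` names and fields are invented for this example and are not part of the proposed API; the only contract is that the callback returns an absolute, monotonically growing GPU time for the client, since the controller only looks at deltas between scans:

```c
#include <stdint.h>

/* Mocked-up stand-ins for the kernel types; names are illustrative only. */
struct mock_client {
	uint64_t past_runtime_us;	/* accumulated time of retired work */
	uint64_t active_runtime_us;	/* time of still-running work */
};

struct mock_drm_file {
	struct mock_client *client;
};

/*
 * Shape of a drm_cgroup_ops->active_time_us style callback: report the
 * total GPU time this client has consumed so far, in microseconds, as
 * the sum of already-retired and still-active work.
 */
static uint64_t mock_active_time_us(struct mock_drm_file *file)
{
	struct mock_client *c = file->client;

	return c->past_runtime_us + c->active_runtime_us;
}
```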
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
include/drm/drm_drv.h | 28 ++++++++++++++++++++++++++++
kernel/cgroup/drm.c | 20 ++++++++++++++++++++
2 files changed, 48 insertions(+)
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 5b86bb7603e7..01953d6b98d6 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -151,6 +151,24 @@ enum drm_driver_feature {
DRIVER_HAVE_IRQ = BIT(30),
};
+/**
+ * struct drm_cgroup_ops
+ *
+ * This structure contains a number of callbacks that drivers can provide if
+ * they are able to support one or more of the functionalities implemented by
+ * the DRM cgroup controller.
+ */
+struct drm_cgroup_ops {
+ /**
+ * @active_time_us:
+ *
+ * Optional callback for reporting the GPU time consumed by this client.
+ *
+ * Used by the DRM core when queried by the DRM cgroup controller.
+ */
+ u64 (*active_time_us) (struct drm_file *);
+};
+
/**
* struct drm_driver - DRM driver structure
*
@@ -443,6 +461,16 @@ struct drm_driver {
*/
const struct file_operations *fops;
+#ifdef CONFIG_CGROUP_DRM
+ /**
+ * @cg_ops:
+ *
+ * Optional pointer to driver callbacks facilitating integration with
+ * the DRM cgroup controller.
+ */
+ const struct drm_cgroup_ops *cg_ops;
+#endif
+
#ifdef CONFIG_DRM_LEGACY
/* Everything below here is for legacy driver, never use! */
/* private: */
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index d702be1b441f..acdb76635b60 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -9,6 +9,8 @@
#include <linux/mutex.h>
#include <linux/slab.h>
+#include <drm/drm_drv.h>
+
struct drm_cgroup_state {
struct cgroup_subsys_state css;
@@ -31,6 +33,24 @@ css_to_drmcs(struct cgroup_subsys_state *css)
return container_of(css, struct drm_cgroup_state, css);
}
+static u64 drmcs_get_active_time_us(struct drm_cgroup_state *drmcs)
+{
+ struct drm_file *fpriv;
+ u64 total = 0;
+
+ lockdep_assert_held(&drmcg_mutex);
+
+ list_for_each_entry(fpriv, &drmcs->clients, clink) {
+ const struct drm_cgroup_ops *cg_ops =
+ fpriv->minor->dev->driver->cg_ops;
+
+ if (cg_ops && cg_ops->active_time_us)
+ total += cg_ops->active_time_us(fpriv);
+ }
+
+ return total;
+}
+
static void drmcs_free(struct cgroup_subsys_state *css)
{
struct drm_cgroup_state *drmcs = css_to_drmcs(css);
--
2.37.2
* [RFC 06/10] drm/cgroup: Add over budget signalling callback
From: Tvrtko Ursulin @ 2023-03-14 14:19 UTC (permalink / raw)
To: Intel-gfx, dri-devel
Cc: Rob Clark, Kenny.Ho, Daniel Vetter, Johannes Weiner, linux-kernel,
Stéphane Marchesin, Christian König, Zefan Li,
Dave Airlie, Tejun Heo, cgroups, T . J . Mercier
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Add a new callback via which the drm cgroup controller notifies the
drm core that a certain process is above its allotted GPU time.
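The notification is bi-directional in the sense that the same callback carries both entering and exiting the over-budget state, because both usage and budget are passed in. A hedged userspace sketch of that semantic (the `mock_*` names are invented for this example; a real driver would react by throttling rather than just recording a flag):

```c
#include <stdint.h>

struct mock_client {
	int throttled;	/* driver-side over-budget state */
};

struct mock_drm_file {
	struct mock_client *client;
};

/*
 * Shape of a drm_cgroup_ops->signal_budget style callback: the
 * controller passes both the observed usage and the allotted budget,
 * so one callback signals entering (used > budget) and exiting
 * (used <= budget) the over-budget condition.
 */
static int mock_signal_budget(struct mock_drm_file *file,
			      uint64_t used, uint64_t budget)
{
	file->client->throttled = used > budget;
	return 0;
}
```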
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
include/drm/drm_drv.h | 8 ++++++++
kernel/cgroup/drm.c | 16 ++++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 01953d6b98d6..d8386b64eab5 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -167,6 +167,14 @@ struct drm_cgroup_ops {
* Used by the DRM core when queried by the DRM cgroup controller.
*/
u64 (*active_time_us) (struct drm_file *);
+
+ /**
+ * @signal_budget:
+ *
+ * Optional callback used by the DRM core to forward over/under GPU time
+ * messages sent by the DRM cgroup controller.
+ */
+ int (*signal_budget) (struct drm_file *, u64 used, u64 budget);
};
/**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index acdb76635b60..68f31797c4f0 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -51,6 +51,22 @@ static u64 drmcs_get_active_time_us(struct drm_cgroup_state *drmcs)
return total;
}
+static void
+drmcs_signal_budget(struct drm_cgroup_state *drmcs, u64 usage, u64 budget)
+{
+ struct drm_file *fpriv;
+
+ lockdep_assert_held(&drmcg_mutex);
+
+ list_for_each_entry(fpriv, &drmcs->clients, clink) {
+ const struct drm_cgroup_ops *cg_ops =
+ fpriv->minor->dev->driver->cg_ops;
+
+ if (cg_ops && cg_ops->signal_budget)
+ cg_ops->signal_budget(fpriv, usage, budget);
+ }
+}
+
static void drmcs_free(struct cgroup_subsys_state *css)
{
struct drm_cgroup_state *drmcs = css_to_drmcs(css);
--
2.37.2
* [RFC 07/10] drm/cgroup: Only track clients which are providing drm_cgroup_ops
From: Tvrtko Ursulin @ 2023-03-14 14:19 UTC (permalink / raw)
To: Intel-gfx, dri-devel
Cc: Rob Clark, Kenny.Ho, Daniel Vetter, Johannes Weiner, linux-kernel,
Stéphane Marchesin, Christian König, Zefan Li,
Dave Airlie, Tejun Heo, cgroups, T . J . Mercier
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
To reduce the amount of tracking going on, especially with drivers which
will not support any sort of control from the drm cgroup controller side,
let's express the functionality as opt-in and use the presence of
drm_cgroup_ops as the activation criteria.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
kernel/cgroup/drm.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 68f31797c4f0..60e1f3861576 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -97,6 +97,9 @@ void drmcgroup_client_open(struct drm_file *file_priv)
{
struct drm_cgroup_state *drmcs;
+ if (!file_priv->minor->dev->driver->cg_ops)
+ return;
+
drmcs = css_to_drmcs(task_get_css(current, drm_cgrp_id));
mutex_lock(&drmcg_mutex);
@@ -112,6 +115,9 @@ void drmcgroup_client_close(struct drm_file *file_priv)
drmcs = css_to_drmcs(file_priv->__css);
+ if (!file_priv->minor->dev->driver->cg_ops)
+ return;
+
mutex_lock(&drmcg_mutex);
list_del(&file_priv->clink);
file_priv->__css = NULL;
@@ -126,6 +132,9 @@ void drmcgroup_client_migrate(struct drm_file *file_priv)
struct drm_cgroup_state *src, *dst;
struct cgroup_subsys_state *old;
+ if (!file_priv->minor->dev->driver->cg_ops)
+ return;
+
mutex_lock(&drmcg_mutex);
old = file_priv->__css;
--
2.37.2
* [RFC 08/10] cgroup/drm: Introduce weight based drm cgroup control
From: Tvrtko Ursulin @ 2023-03-14 14:19 UTC (permalink / raw)
To: Intel-gfx, dri-devel
Cc: Rob Clark, Kenny.Ho, Daniel Vetter, Johannes Weiner, linux-kernel,
Stéphane Marchesin, Christian König, Michal Koutný,
Zefan Li, Dave Airlie, Tejun Heo, cgroups, T . J . Mercier
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Similar to CPU scheduling, implement a concept of weight in the drm cgroup
controller.
Uses the same range and default as the CPU controller - CGROUP_WEIGHT_MIN,
CGROUP_WEIGHT_DFL and CGROUP_WEIGHT_MAX.
Later each cgroup is assigned a time budget proportionally based on the
relative weights of its siblings. This time budget is in turn split by
the group's children and so on.
This will be used to implement a soft, best-effort signal from the drm
cgroup controller to the drm core, notifying it about groups which are
over their allotted budget.
No guarantees that the limit can be enforced are provided or implied.
Checking of GPU usage is done periodically by the controller, which can
be configured via the drmcg_period_ms kernel boot parameter and which
defaults to 2s.
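The core arithmetic of the weight-based split can be sketched in a few lines of userspace C. This mirrors how the scan worker below computes a sibling's `per_s_budget_us` from the parent's budget and the sum of sibling weights; `div_round_up_ull` is a userspace stand-in for the kernel's `DIV_ROUND_UP_ULL()` and everything else here is illustrative, not the proposed API:

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's DIV_ROUND_UP_ULL(). */
static uint64_t div_round_up_ull(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * A sibling's share of the parent's budget is proportional to its
 * weight relative to the sum of all sibling weights.
 */
static uint64_t sibling_budget_us(uint64_t parent_budget_us,
				  unsigned int weight,
				  unsigned int sum_children_weights)
{
	return div_round_up_ull(parent_budget_us * (uint64_t)weight,
				sum_children_weights);
}
```

So with a 2s period, a group whose weight is 100 among siblings summing to 300 is allotted roughly a third of the parent's budget, rounded up.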
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Tejun Heo <tj@kernel.org>
---
Documentation/admin-guide/cgroup-v2.rst | 31 ++
kernel/cgroup/drm.c | 409 +++++++++++++++++++++++-
2 files changed, 437 insertions(+), 3 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index f67c0829350b..8906fdf5c8d6 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2407,6 +2407,37 @@ HugeTLB Interface Files
hugetlb pages of <hugepagesize> in this cgroup. Only active in
use hugetlb pages are included. The per-node values are in bytes.
+DRM
+---
+
+The DRM controller allows configuring scheduling soft limits.
+
+DRM scheduling soft limits
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Because of the heterogeneous hardware and driver DRM capabilities, soft limits
+are implemented as a loose co-operative (bi-directional) interface between the
+controller and DRM core.
+
+The controller configures the GPU time allowed per group and periodically scans
+the tasks belonging to it to detect the over budget condition, at which point it
+invokes a callback notifying the DRM core of the condition.
+
+DRM core provides an API to query per-process GPU utilization and a second API to
+receive notification from the cgroup controller when the group enters or exits
+the over budget condition.
+
+Individual DRM drivers which implement the interface are expected to act on this
+in the best-effort manner only. There are no guarantees that the soft limits
+will be respected.
+
+DRM scheduling soft limits interface files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ drm.weight
+ Standard cgroup weight based control [1, 10000] used to configure the
+ relative distribution of GPU time between the sibling groups.
+
Misc
----
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 60e1f3861576..b244e3d828cc 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -6,7 +6,9 @@
#include <linux/cgroup.h>
#include <linux/cgroup_drm.h>
#include <linux/list.h>
+#include <linux/moduleparam.h>
#include <linux/mutex.h>
+#include <linux/signal.h>
#include <linux/slab.h>
#include <drm/drm_drv.h>
@@ -15,10 +17,28 @@ struct drm_cgroup_state {
struct cgroup_subsys_state css;
struct list_head clients;
+
+ unsigned int weight;
+
+ unsigned int sum_children_weights;
+
+ bool over;
+ bool over_budget;
+
+ u64 per_s_budget_us;
+ u64 prev_active_us;
+ u64 active_us;
};
struct drm_root_cgroup_state {
struct drm_cgroup_state drmcs;
+
+ unsigned int period_us;
+
+ unsigned int last_scan_duration_us;
+ ktime_t prev_timestamp;
+
+ struct delayed_work scan_work;
};
static struct drm_root_cgroup_state root_drmcs = {
@@ -27,6 +47,9 @@ static struct drm_root_cgroup_state root_drmcs = {
static DEFINE_MUTEX(drmcg_mutex);
+static int drmcg_period_ms = 2000;
+module_param(drmcg_period_ms, int, 0644);
+
static inline struct drm_cgroup_state *
css_to_drmcs(struct cgroup_subsys_state *css)
{
@@ -67,12 +90,263 @@ drmcs_signal_budget(struct drm_cgroup_state *drmcs, u64 usage, u64 budget)
}
}
-static void drmcs_free(struct cgroup_subsys_state *css)
+static u64
+drmcs_read_weight(struct cgroup_subsys_state *css, struct cftype *cft)
+{
+ struct drm_cgroup_state *drmcs = css_to_drmcs(css);
+
+ return drmcs->weight;
+}
+
+static int
+drmcs_write_weight(struct cgroup_subsys_state *css, struct cftype *cftype,
+ u64 weight)
{
struct drm_cgroup_state *drmcs = css_to_drmcs(css);
+ int ret;
+
+ if (weight < CGROUP_WEIGHT_MIN || weight > CGROUP_WEIGHT_MAX)
+ return -ERANGE;
+
+ ret = mutex_lock_interruptible(&drmcg_mutex);
+ if (ret)
+ return ret;
+ drmcs->weight = weight;
+ mutex_unlock(&drmcg_mutex);
+
+ return 0;
+}
+
+static bool __start_scanning(unsigned int period_us)
+{
+ struct drm_cgroup_state *root = &root_drmcs.drmcs;
+ struct cgroup_subsys_state *node;
+ ktime_t start, now;
+ bool ok = false;
+
+ lockdep_assert_held(&drmcg_mutex);
+
+ start = ktime_get();
+ if (period_us > root_drmcs.last_scan_duration_us)
+ period_us -= root_drmcs.last_scan_duration_us;
+
+ rcu_read_lock();
+
+ css_for_each_descendant_post(node, &root->css) {
+ struct drm_cgroup_state *drmcs = css_to_drmcs(node);
+
+ if (!css_tryget_online(node))
+ goto out;
- if (drmcs != &root_drmcs.drmcs)
- kfree(drmcs);
+ drmcs->active_us = 0;
+ drmcs->sum_children_weights = 0;
+
+ if (period_us && node == &root->css)
+ drmcs->per_s_budget_us =
+ DIV_ROUND_UP_ULL((u64)period_us * USEC_PER_SEC,
+ USEC_PER_SEC);
+ else
+ drmcs->per_s_budget_us = 0;
+
+ css_put(node);
+ }
+
+ css_for_each_descendant_post(node, &root->css) {
+ struct drm_cgroup_state *drmcs = css_to_drmcs(node);
+ struct drm_cgroup_state *parent;
+ u64 active;
+
+ if (!css_tryget_online(node))
+ goto out;
+ if (!node->parent) {
+ css_put(node);
+ continue;
+ }
+ if (!css_tryget_online(node->parent)) {
+ css_put(node);
+ goto out;
+ }
+ parent = css_to_drmcs(node->parent);
+
+ active = drmcs_get_active_time_us(drmcs);
+ if (period_us && active > drmcs->prev_active_us)
+ drmcs->active_us += active - drmcs->prev_active_us;
+ drmcs->prev_active_us = active;
+
+ parent->active_us += drmcs->active_us;
+ parent->sum_children_weights += drmcs->weight;
+
+ css_put(node);
+ css_put(&parent->css);
+ }
+
+ ok = true;
+ now = ktime_get();
+ root_drmcs.last_scan_duration_us = ktime_to_us(ktime_sub(now, start));
+ root_drmcs.prev_timestamp = now;
+
+out:
+ rcu_read_unlock();
+
+ return ok;
+}
+
+static void scan_worker(struct work_struct *work)
+{
+ struct drm_cgroup_state *root = &root_drmcs.drmcs;
+ struct cgroup_subsys_state *node;
+ unsigned int period_us;
+
+ mutex_lock(&drmcg_mutex);
+
+ rcu_read_lock();
+
+ if (WARN_ON_ONCE(!css_tryget_online(&root->css))) {
+ rcu_read_unlock();
+ mutex_unlock(&drmcg_mutex);
+ return;
+ }
+
+ period_us = ktime_to_us(ktime_sub(ktime_get(),
+ root_drmcs.prev_timestamp));
+
+ /*
+ * 1st pass - reset working values and update hierarchical weights and
+ * GPU utilisation.
+ */
+	/*
+	 * Always come back later if the scanner races with core cgroup
+	 * management. (Repeated pattern throughout this function.)
+	 */
+	if (!__start_scanning(period_us))
+		goto out_retry;
+
+ css_for_each_descendant_pre(node, &root->css) {
+ struct drm_cgroup_state *drmcs = css_to_drmcs(node);
+ struct cgroup_subsys_state *css;
+ u64 reused_us = 0, unused_us = 0;
+ unsigned int over_weights = 0;
+
+ if (!css_tryget_online(node))
+ goto out_retry;
+
+ /*
+ * 2nd pass - calculate initial budgets, mark over budget
+ * siblings and add up unused budget for the group.
+ */
+ css_for_each_child(css, &drmcs->css) {
+ struct drm_cgroup_state *sibling = css_to_drmcs(css);
+
+ if (!css_tryget_online(css)) {
+ css_put(node);
+ goto out_retry;
+ }
+
+ sibling->per_s_budget_us =
+ DIV_ROUND_UP_ULL(drmcs->per_s_budget_us *
+ sibling->weight,
+ drmcs->sum_children_weights);
+
+ sibling->over = sibling->active_us >
+ sibling->per_s_budget_us;
+ if (sibling->over)
+ over_weights += sibling->weight;
+ else
+ unused_us += sibling->per_s_budget_us -
+ sibling->active_us;
+
+ css_put(css);
+ }
+
+ /*
+ * 3rd pass - spread unused budget according to relative weights
+ * of over budget siblings.
+ */
+ while (over_weights && reused_us < unused_us) {
+ unsigned int under = 0;
+
+ unused_us -= reused_us;
+ reused_us = 0;
+
+ css_for_each_child(css, &drmcs->css) {
+ struct drm_cgroup_state *sibling;
+ u64 extra_us, max_us, need_us;
+
+ if (!css_tryget_online(css)) {
+ css_put(node);
+ goto out_retry;
+ }
+
+ sibling = css_to_drmcs(css);
+ if (!sibling->over) {
+ css_put(css);
+ continue;
+ }
+
+ extra_us = DIV_ROUND_UP_ULL(unused_us *
+ sibling->weight,
+ over_weights);
+ max_us = sibling->per_s_budget_us + extra_us;
+ if (max_us > sibling->active_us)
+ need_us = sibling->active_us -
+ sibling->per_s_budget_us;
+ else
+ need_us = extra_us;
+ reused_us += need_us;
+ sibling->per_s_budget_us += need_us;
+ sibling->over = sibling->active_us >
+ sibling->per_s_budget_us;
+ if (!sibling->over)
+ under += sibling->weight;
+
+ css_put(css);
+ }
+
+ over_weights -= under;
+ }
+
+ css_put(node);
+ }
+
+ /*
+ * 4th pass - send out over/under budget notifications.
+ */
+ css_for_each_descendant_post(node, &root->css) {
+ struct drm_cgroup_state *drmcs = css_to_drmcs(node);
+
+ if (!css_tryget_online(node))
+ goto out_retry;
+
+ if (drmcs->over || drmcs->over_budget)
+ drmcs_signal_budget(drmcs,
+ drmcs->active_us,
+ drmcs->per_s_budget_us);
+ drmcs->over_budget = drmcs->over;
+
+ css_put(node);
+ }
+
+out_retry:
+ rcu_read_unlock();
+ mutex_unlock(&drmcg_mutex);
+
+ period_us = READ_ONCE(root_drmcs.period_us);
+ if (period_us)
+ schedule_delayed_work(&root_drmcs.scan_work,
+ usecs_to_jiffies(period_us));
+
+ css_put(&root->css);
+}
+
+static void drmcs_free(struct cgroup_subsys_state *css)
+{
+ if (css != &root_drmcs.drmcs.css)
+ kfree(css_to_drmcs(css));
+}
+
+static void record_baseline_utilisation(void)
+{
+ /* Re-capture baseline group GPU times to avoid downward jumps. */
+ WARN_ON_ONCE(!__start_scanning(0)); /* QQQ Retry if it fails? */
}
static struct cgroup_subsys_state *
@@ -82,6 +356,7 @@ drmcs_alloc(struct cgroup_subsys_state *parent_css)
if (!parent_css) {
drmcs = &root_drmcs.drmcs;
+ INIT_DELAYED_WORK(&root_drmcs.scan_work, scan_worker);
} else {
drmcs = kzalloc(sizeof(*drmcs), GFP_KERNEL);
if (!drmcs)
@@ -90,9 +365,124 @@ drmcs_alloc(struct cgroup_subsys_state *parent_css)
INIT_LIST_HEAD(&drmcs->clients);
}
+ drmcs->weight = CGROUP_WEIGHT_DFL;
+
return &drmcs->css;
}
+static int drmcs_online(struct cgroup_subsys_state *css)
+{
+ if (css == &root_drmcs.drmcs.css && drmcg_period_ms) {
+ mutex_lock(&drmcg_mutex);
+ record_baseline_utilisation();
+ root_drmcs.period_us = max(500, drmcg_period_ms) * 1000;
+ mod_delayed_work(system_wq,
+ &root_drmcs.scan_work,
+ usecs_to_jiffies(root_drmcs.period_us));
+ mutex_unlock(&drmcg_mutex);
+ }
+
+ return 0;
+}
+
+static void drmcs_offline(struct cgroup_subsys_state *css)
+{
+ bool flush = false;
+
+ if (css != &root_drmcs.drmcs.css)
+ return;
+
+ mutex_lock(&drmcg_mutex);
+ if (root_drmcs.period_us) {
+ root_drmcs.period_us = 0;
+ cancel_delayed_work(&root_drmcs.scan_work);
+ flush = true;
+ }
+ mutex_unlock(&drmcg_mutex);
+
+ if (flush)
+ flush_delayed_work(&root_drmcs.scan_work);
+}
+
+static struct drm_cgroup_state *old_drmcs;
+
+static int drmcs_can_attach(struct cgroup_taskset *tset)
+{
+ struct cgroup_subsys_state *css;
+ struct task_struct *task;
+
+ /*
+ * QQQ
+ * Same passing of state via global as cpuset_can_attach to
+ * cpuset_attach. Always serialized?
+ */
+
+ task = cgroup_taskset_first(tset, &css);
+ old_drmcs = css_to_drmcs(task_css(task, drm_cgrp_id));
+
+ return 0;
+}
+
+static void drmcs_attach(struct cgroup_taskset *tset)
+{
+ struct drm_cgroup_state *old = old_drmcs;
+ struct cgroup_subsys_state *css;
+ struct drm_file *fpriv, *next;
+ struct drm_cgroup_state *new;
+ struct task_struct *task;
+ bool migrated = false;
+
+ if (!old)
+ return;
+
+ task = cgroup_taskset_first(tset, &css);
+ new = css_to_drmcs(task_css(task, drm_cgrp_id));
+ if (new == old)
+ return;
+
+ mutex_lock(&drmcg_mutex);
+
+ list_for_each_entry_safe(fpriv, next, &old->clients, clink) {
+ cgroup_taskset_for_each(task, css, tset) {
+ struct cgroup_subsys_state *old_css;
+
+ if (task->flags & PF_KTHREAD)
+ continue;
+ if (!thread_group_leader(task))
+ continue;
+
+ new = css_to_drmcs(task_css(task, drm_cgrp_id));
+ if (WARN_ON_ONCE(new == old))
+ continue;
+
+ if (rcu_access_pointer(fpriv->pid) != task_tgid(task))
+ continue;
+
+ if (WARN_ON_ONCE(fpriv->__css != &old->css))
+ continue;
+
+ old_css = fpriv->__css;
+ fpriv->__css = &new->css;
+ css_get(fpriv->__css);
+ list_move_tail(&fpriv->clink, &new->clients);
+ css_put(old_css);
+ migrated = true;
+ }
+ }
+
+ if (migrated)
+ record_baseline_utilisation();
+
+ mutex_unlock(&drmcg_mutex);
+
+ old_drmcs = NULL;
+}
+
+static void drmcs_cancel_attach(struct cgroup_taskset *tset)
+{
+ old_drmcs = NULL;
+}
+
void drmcgroup_client_open(struct drm_file *file_priv)
{
struct drm_cgroup_state *drmcs;
@@ -121,6 +511,7 @@ void drmcgroup_client_close(struct drm_file *file_priv)
mutex_lock(&drmcg_mutex);
list_del(&file_priv->clink);
file_priv->__css = NULL;
+ record_baseline_utilisation();
mutex_unlock(&drmcg_mutex);
css_put(&drmcs->css);
@@ -144,6 +535,7 @@ void drmcgroup_client_migrate(struct drm_file *file_priv)
if (src != dst) {
file_priv->__css = &dst->css; /* Keeps the reference. */
list_move_tail(&file_priv->clink, &dst->clients);
+ record_baseline_utilisation();
}
mutex_unlock(&drmcg_mutex);
@@ -153,12 +545,23 @@ void drmcgroup_client_migrate(struct drm_file *file_priv)
EXPORT_SYMBOL_GPL(drmcgroup_client_migrate);
struct cftype files[] = {
+ {
+ .name = "weight",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .read_u64 = drmcs_read_weight,
+ .write_u64 = drmcs_write_weight,
+ },
{ } /* Zero entry terminates. */
};
struct cgroup_subsys drm_cgrp_subsys = {
.css_alloc = drmcs_alloc,
.css_free = drmcs_free,
+ .css_online = drmcs_online,
+ .css_offline = drmcs_offline,
+ .can_attach = drmcs_can_attach,
+ .attach = drmcs_attach,
+ .cancel_attach = drmcs_cancel_attach,
.early_init = false,
.legacy_cftypes = files,
.dfl_cftypes = files,
--
2.37.2
* [RFC 09/10] drm/i915: Wire up with drm controller GPU time query
From: Tvrtko Ursulin @ 2023-03-14 14:19 UTC (permalink / raw)
To: Intel-gfx, dri-devel
Cc: Rob Clark, Kenny.Ho, Daniel Vetter, Johannes Weiner, linux-kernel,
Stéphane Marchesin, Christian König, Zefan Li,
Dave Airlie, Tejun Heo, cgroups, T . J . Mercier
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Implement the drm_cgroup_ops->active_time_us callback.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_driver.c | 10 ++++
drivers/gpu/drm/i915/i915_drm_client.c | 76 ++++++++++++++++++++++----
drivers/gpu/drm/i915/i915_drm_client.h | 2 +
3 files changed, 78 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index da249337c23b..e956990a9870 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -1777,6 +1777,12 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
};
+#ifdef CONFIG_CGROUP_DRM
+static const struct drm_cgroup_ops i915_drm_cgroup_ops = {
+ .active_time_us = i915_drm_cgroup_get_active_time_us,
+};
+#endif
+
/*
* Interface history:
*
@@ -1805,6 +1811,10 @@ static const struct drm_driver i915_drm_driver = {
.lastclose = i915_driver_lastclose,
.postclose = i915_driver_postclose,
+#ifdef CONFIG_CGROUP_DRM
+ .cg_ops = &i915_drm_cgroup_ops,
+#endif
+
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
.gem_prime_import = i915_gem_prime_import,
diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
index b09d1d386574..c9754cb0277f 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -75,7 +75,7 @@ void i915_drm_clients_fini(struct i915_drm_clients *clients)
xa_destroy(&clients->xarray);
}
-#ifdef CONFIG_PROC_FS
+#if defined(CONFIG_PROC_FS) || defined(CONFIG_CGROUP_DRM)
static const char * const uabi_class_names[] = {
[I915_ENGINE_CLASS_RENDER] = "render",
[I915_ENGINE_CLASS_COPY] = "copy",
@@ -100,22 +100,78 @@ static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
return total;
}
-static void
-show_client_class(struct seq_file *m,
- struct i915_drm_client *client,
- unsigned int class)
+static u64 get_class_active_ns(struct i915_drm_client *client,
+ unsigned int class,
+ unsigned int *capacity)
{
- const struct list_head *list = &client->ctx_list;
- u64 total = atomic64_read(&client->past_runtime[class]);
- const unsigned int capacity =
- client->clients->i915->engine_uabi_class_count[class];
struct i915_gem_context *ctx;
+ u64 total;
+
+ *capacity =
+ client->clients->i915->engine_uabi_class_count[class];
+ if (!*capacity)
+ return 0;
+
+ total = atomic64_read(&client->past_runtime[class]);
rcu_read_lock();
- list_for_each_entry_rcu(ctx, list, client_link)
+ list_for_each_entry_rcu(ctx, &client->ctx_list, client_link)
total += busy_add(ctx, class);
rcu_read_unlock();
+ return total;
+}
+#endif
+
+#ifdef CONFIG_CGROUP_DRM
+static bool supports_stats(struct drm_i915_private *i915)
+{
+ if (GRAPHICS_VER(i915) < 8)
+ return false;
+
+ /* temporary... */
+ if (intel_uc_uses_guc_submission(&to_gt(i915)->uc))
+ return false;
+
+ return true;
+}
+
+u64 i915_drm_cgroup_get_active_time_us(struct drm_file *file)
+{
+ struct drm_i915_file_private *fpriv = file->driver_priv;
+ struct i915_drm_client *client = fpriv->client;
+ unsigned int i;
+ u64 busy = 0;
+
+ if (!supports_stats(client->clients->i915))
+ return 0;
+
+ for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++) {
+ unsigned int capacity;
+ u64 b;
+
+ b = get_class_active_ns(client, i, &capacity);
+ if (capacity) {
+ b = DIV_ROUND_UP_ULL(b, capacity * 1000);
+ busy += b;
+ }
+ }
+
+ return busy;
+}
+#endif
+
+#ifdef CONFIG_PROC_FS
+static void
+show_client_class(struct seq_file *m,
+ struct i915_drm_client *client,
+ unsigned int class)
+{
+ unsigned int capacity;
+ u64 total;
+
+ total = get_class_active_ns(client, class, &capacity);
+
if (capacity)
seq_printf(m, "drm-engine-%s:\t%llu ns\n",
uabi_class_names[class], total);
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
index 69496af996d9..c8439eaa89be 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -65,4 +65,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f);
void i915_drm_clients_fini(struct i915_drm_clients *clients);
+u64 i915_drm_cgroup_get_active_time_us(struct drm_file *file);
+
#endif /* !__I915_DRM_CLIENT_H__ */
--
2.37.2
* [RFC 10/10] drm/i915: Implement cgroup controller over budget throttling
From: Tvrtko Ursulin @ 2023-03-14 14:19 UTC (permalink / raw)
To: Intel-gfx, dri-devel
Cc: Rob Clark, Kenny.Ho, Daniel Vetter, Johannes Weiner, linux-kernel,
Stéphane Marchesin, Christian König, Zefan Li,
Dave Airlie, Tejun Heo, cgroups, T . J . Mercier
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
When notified by the drm core that we are over our allotted time budget,
the i915 instance will check whether any of the GPU engines it is
responsible for is fully saturated. If it is, and the client in question
is using that engine, it will throttle it.
For now throttling is done simplistically by lowering the scheduling
priority while clients are throttled.
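The throttling decision reduces to a small predicate, sketched here in plain userspace C. The names and the throttled priority value are invented for illustration (they are not i915's actual constants); the point is only the shape of the logic: lower the effective submission priority only while both conditions hold, and otherwise leave the context's own priority untouched:

```c
#define MOCK_PRIO_NORMAL	0
#define MOCK_PRIO_THROTTLED	(-512)	/* illustrative value only */

/*
 * Sketch of the throttling idea: when the client is over budget and
 * the engine it uses is saturated, submit further work at a lowered
 * priority; otherwise use the context's configured priority.
 */
static int mock_effective_priority(int context_prio, int over_budget,
				   int engine_saturated)
{
	if (over_budget && engine_saturated)
		return MOCK_PRIO_THROTTLED;
	return context_prio;
}
```

Because only the priority of newly submitted work changes, the throttle naturally lifts once the group drops back under budget and subsequent submissions revert to the context priority.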
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
.../gpu/drm/i915/gem/i915_gem_execbuffer.c | 38 ++++-
drivers/gpu/drm/i915/i915_driver.c | 1 +
drivers/gpu/drm/i915/i915_drm_client.c | 133 ++++++++++++++++++
drivers/gpu/drm/i915/i915_drm_client.h | 11 ++
4 files changed, 182 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 9dce2957b4e5..25ac3e065d1c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3065,6 +3065,42 @@ static void retire_requests(struct intel_timeline *tl, struct i915_request *end)
break;
}
+#ifdef CONFIG_CGROUP_DRM
+static unsigned int
+__get_class(struct drm_i915_file_private *fpriv, const struct i915_request *rq)
+{
+ unsigned int class;
+
+ class = rq->context->engine->uabi_class;
+
+ if (WARN_ON_ONCE(class >= ARRAY_SIZE(fpriv->client->throttle)))
+ class = 0;
+
+ return class;
+}
+
+static void copy_priority(struct i915_sched_attr *attr,
+ const struct i915_execbuffer *eb,
+ const struct i915_request *rq)
+{
+ struct drm_i915_file_private *file_priv = eb->file->driver_priv;
+ int prio;
+
+ *attr = eb->gem_context->sched;
+
+ prio = file_priv->client->throttle[__get_class(file_priv, rq)];
+ if (prio)
+ attr->priority = prio;
+}
+#else
+static void copy_priority(struct i915_sched_attr *attr,
+ const struct i915_execbuffer *eb,
+ const struct i915_request *rq)
+{
+ *attr = eb->gem_context->sched;
+}
+#endif
+
static int eb_request_add(struct i915_execbuffer *eb, struct i915_request *rq,
int err, bool last_parallel)
{
@@ -3081,7 +3117,7 @@ static int eb_request_add(struct i915_execbuffer *eb, struct i915_request *rq,
/* Check that the context wasn't destroyed before submission */
if (likely(!intel_context_is_closed(eb->context))) {
- attr = eb->gem_context->sched;
+ copy_priority(&attr, eb, rq);
} else {
/* Serialise with context_close via the add_to_timeline */
i915_request_set_error_once(rq, -ENOENT);
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index e956990a9870..ebe1bd45ced1 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -1780,6 +1780,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
#ifdef CONFIG_CGROUP_DRM
static const struct drm_cgroup_ops i915_drm_cgroup_ops = {
.active_time_us = i915_drm_cgroup_get_active_time_us,
+ .signal_budget = i915_drm_cgroup_signal_budget,
};
#endif
diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
index c9754cb0277f..72d92ee292ae 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -4,6 +4,7 @@
*/
#include <linux/kernel.h>
+#include <linux/ktime.h>
#include <linux/slab.h>
#include <linux/types.h>
@@ -159,6 +160,138 @@ u64 i915_drm_cgroup_get_active_time_us(struct drm_file *file)
return busy;
}
+
+int i915_drm_cgroup_signal_budget(struct drm_file *file, u64 usage, u64 budget)
+{
+ struct drm_i915_file_private *fpriv = file->driver_priv;
+ u64 class_usage[I915_LAST_UABI_ENGINE_CLASS + 1];
+ u64 class_last[I915_LAST_UABI_ENGINE_CLASS + 1];
+ struct drm_i915_private *i915 = fpriv->dev_priv;
+ struct i915_drm_client *client = fpriv->client;
+ struct intel_engine_cs *engine;
+ bool over = usage > budget;
+ struct task_struct *task;
+ struct pid *pid;
+ unsigned int i;
+ ktime_t unused;
+ int ret = 0;
+ u64 t;
+
+ if (!supports_stats(i915))
+ return -EINVAL;
+
+ if (usage == 0 && budget == 0)
+ return 0;
+
+ rcu_read_lock();
+ pid = rcu_dereference(file->pid);
+ task = pid_task(pid, PIDTYPE_TGID);
+ if (over) {
+ client->over_budget++;
+ if (!client->over_budget)
+ client->over_budget = 2;
+
+ drm_dbg(&i915->drm, "%s[%u] over budget (%llu/%llu)\n",
+ task ? task->comm : "<unknown>", pid_vnr(pid),
+ usage, budget);
+ } else {
+ client->over_budget = 0;
+ memset(client->class_last, 0, sizeof(client->class_last));
+ memset(client->throttle, 0, sizeof(client->throttle));
+
+ drm_dbg(&i915->drm, "%s[%u] un-throttled; under budget\n",
+ task ? task->comm : "<unknown>", pid_vnr(pid));
+
+ rcu_read_unlock();
+ return 0;
+ }
+ rcu_read_unlock();
+
+ memset(class_usage, 0, sizeof(class_usage));
+ for_each_uabi_engine(engine, i915)
+ class_usage[engine->uabi_class] +=
+ ktime_to_ns(intel_engine_get_busy_time(engine, &unused));
+
+ memcpy(class_last, client->class_last, sizeof(class_last));
+ memcpy(client->class_last, class_usage, sizeof(class_last));
+
+ for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
+ class_usage[i] -= class_last[i];
+
+ t = client->last;
+ client->last = ktime_get_raw_ns();
+ t = client->last - t;
+
+ if (client->over_budget == 1)
+ return 0;
+
+ /* Convert the sampling period to microseconds once, not per class. */
+ t = DIV_ROUND_UP_ULL(t, 1000);
+
+ for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++) {
+ u64 client_class_usage[I915_LAST_UABI_ENGINE_CLASS + 1];
+ unsigned int capacity, rel_usage;
+
+ if (!i915->engine_uabi_class_count[i])
+ continue;
+
+ class_usage[i] = DIV_ROUND_CLOSEST_ULL(class_usage[i], 1000);
+ rel_usage = DIV_ROUND_CLOSEST_ULL(class_usage[i] * 100ULL,
+ t *
+ i915->engine_uabi_class_count[i]);
+ if (rel_usage < 95) {
+ /* Physical class not oversubscribed. */
+ if (client->throttle[i]) {
+ client->throttle[i] = 0;
+
+ rcu_read_lock();
+ pid = rcu_dereference(file->pid);
+ task = pid_task(pid, PIDTYPE_TGID);
+ drm_dbg(&i915->drm,
+ "%s[%u] un-throttled; physical class %s utilisation %u%%\n",
+ task ? task->comm : "<unknown>",
+ pid_vnr(pid),
+ uabi_class_names[i],
+ rel_usage);
+ rcu_read_unlock();
+ }
+ continue;
+ }
+
+ client_class_usage[i] =
+ get_class_active_ns(client, i, &capacity);
+ if (client_class_usage[i]) {
+ int permille;
+
+ ret |= 1;
+
+ permille = DIV_ROUND_CLOSEST_ULL((usage - budget) *
+ 1000,
+ budget);
+ client->throttle[i] =
+ DIV_ROUND_CLOSEST(permille *
+ I915_CONTEXT_MIN_USER_PRIORITY,
+ 1000);
+ if (client->throttle[i] <
+ I915_CONTEXT_MIN_USER_PRIORITY)
+ client->throttle[i] =
+ I915_CONTEXT_MIN_USER_PRIORITY;
+
+ rcu_read_lock();
+ pid = rcu_dereference(file->pid);
+ task = pid_task(pid, PIDTYPE_TGID);
+ drm_dbg(&i915->drm,
+ "%s[%u] %d‰ over budget, throttled to priority %d; physical class %s utilisation %u%%\n",
+ task ? task->comm : "<unknown>",
+ pid_vnr(pid),
+ permille,
+ client->throttle[i],
+ uabi_class_names[i],
+ rel_usage);
+ rcu_read_unlock();
+ }
+ }
+
+ return ret;
+}
#endif
#ifdef CONFIG_PROC_FS
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
index c8439eaa89be..092a7952a67b 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -15,6 +15,8 @@
#define I915_LAST_UABI_ENGINE_CLASS I915_ENGINE_CLASS_COMPUTE
+struct drm_file;
+
struct drm_i915_private;
struct i915_drm_clients {
@@ -38,6 +40,13 @@ struct i915_drm_client {
* @past_runtime: Accumulation of pphwsp runtimes from closed contexts.
*/
atomic64_t past_runtime[I915_LAST_UABI_ENGINE_CLASS + 1];
+
+#ifdef CONFIG_CGROUP_DRM
+ int throttle[I915_LAST_UABI_ENGINE_CLASS + 1];
+ unsigned int over_budget;
+ u64 last;
+ u64 class_last[I915_LAST_UABI_ENGINE_CLASS + 1];
+#endif
};
void i915_drm_clients_init(struct i915_drm_clients *clients,
@@ -66,5 +75,7 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f);
void i915_drm_clients_fini(struct i915_drm_clients *clients);
u64 i915_drm_cgroup_get_active_time_us(struct drm_file *file);
+int i915_drm_cgroup_signal_budget(struct drm_file *file,
+ u64 usage, u64 budget);
#endif /* !__I915_DRM_CLIENT_H__ */
--
2.37.2
* Re: [RFC 01/10] drm: Track clients by tgid and not tid
2023-03-14 14:18 ` [RFC 01/10] drm: Track clients by tgid and not tid Tvrtko Ursulin
@ 2023-03-14 15:33 ` Christian König
2023-03-15 9:52 ` Tvrtko Ursulin
0 siblings, 1 reply; 19+ messages in thread
From: Christian König @ 2023-03-14 15:33 UTC (permalink / raw)
To: Tvrtko Ursulin, Intel-gfx, dri-devel
Cc: Rob Clark, Kenny.Ho, Daniel Vetter, Johannes Weiner, linux-kernel,
Stéphane Marchesin, Alex Deucher, linux-graphics-maintainer,
Zefan Li, Dave Airlie, Tejun Heo, cgroups, T . J . Mercier,
Zack Rusin
On 14.03.23 15:18, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Thread group id (aka pid from userspace point of view) is a more
> interesting thing to show as an owner of a DRM fd, so track and show that
> instead of the thread id.
>
> In the next patch we will make the owner updated post file descriptor
> handover, which will also be tgid based to avoid ping-pong when multiple
> threads access the fd.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Zack Rusin <zackr@vmware.com>
> Cc: linux-graphics-maintainer@vmware.com
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Should we push the already reviewed cleanups like this one to
drm-misc-next? That makes sense even without the rest of the
functionality and reduces the number of patches to re-send.
Christian.
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +-
> drivers/gpu/drm/drm_debugfs.c | 4 ++--
> drivers/gpu/drm/drm_file.c | 2 +-
> drivers/gpu/drm/vmwgfx/vmwgfx_gem.c | 2 +-
> 4 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index d8e683688daa..863cb668e000 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -969,7 +969,7 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file *m, void *unused)
> * Therefore, we need to protect this ->comm access using RCU.
> */
> rcu_read_lock();
> - task = pid_task(file->pid, PIDTYPE_PID);
> + task = pid_task(file->pid, PIDTYPE_TGID);
> seq_printf(m, "pid %8d command %s:\n", pid_nr(file->pid),
> task ? task->comm : "<unknown>");
> rcu_read_unlock();
> diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c
> index 4f643a490dc3..4855230ba2c6 100644
> --- a/drivers/gpu/drm/drm_debugfs.c
> +++ b/drivers/gpu/drm/drm_debugfs.c
> @@ -80,7 +80,7 @@ static int drm_clients_info(struct seq_file *m, void *data)
> seq_printf(m,
> "%20s %5s %3s master a %5s %10s\n",
> "command",
> - "pid",
> + "tgid",
> "dev",
> "uid",
> "magic");
> @@ -94,7 +94,7 @@ static int drm_clients_info(struct seq_file *m, void *data)
> bool is_current_master = drm_is_current_master(priv);
>
> rcu_read_lock(); /* locks pid_task()->comm */
> - task = pid_task(priv->pid, PIDTYPE_PID);
> + task = pid_task(priv->pid, PIDTYPE_TGID);
> uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID;
> seq_printf(m, "%20s %5d %3d %c %c %5d %10u\n",
> task ? task->comm : "<unknown>",
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index a51ff8cee049..c1018c470047 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -156,7 +156,7 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
> if (!file)
> return ERR_PTR(-ENOMEM);
>
> - file->pid = get_pid(task_pid(current));
> + file->pid = get_pid(task_tgid(current));
> file->minor = minor;
>
> /* for compatibility root is always authenticated */
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
> index d6baf73a6458..c0da89e16e6f 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
> @@ -241,7 +241,7 @@ static int vmw_debugfs_gem_info_show(struct seq_file *m, void *unused)
> * Therefore, we need to protect this ->comm access using RCU.
> */
> rcu_read_lock();
> - task = pid_task(file->pid, PIDTYPE_PID);
> + task = pid_task(file->pid, PIDTYPE_TGID);
> seq_printf(m, "pid %8d command %s:\n", pid_nr(file->pid),
> task ? task->comm : "<unknown>");
> rcu_read_unlock();
* Re: [RFC 02/10] drm: Update file owner during use
2023-03-14 14:18 ` [RFC 02/10] drm: Update file owner during use Tvrtko Ursulin
@ 2023-03-14 15:49 ` Christian König
[not found] ` <20230314141904.1210824-3-tvrtko.ursulin-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
1 sibling, 0 replies; 19+ messages in thread
From: Christian König @ 2023-03-14 15:49 UTC (permalink / raw)
To: Tvrtko Ursulin, Intel-gfx, dri-devel
Cc: Rob Clark, Kenny.Ho, Daniel Vetter, Daniel Vetter,
Johannes Weiner, linux-kernel, Stéphane Marchesin, Zefan Li,
Dave Airlie, Tejun Heo, cgroups, T . J . Mercier
On 14.03.23 15:18, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> With the typical model where the display server opens the file descriptor
> and then hands it over to the client we were showing stale data in
> debugfs.
>
> Fix it by updating the drm_file->pid on ioctl access from a different
> process.
>
> The field is also made RCU protected to allow for lockless readers. Update
> side is protected with dev->filelist_mutex.
>
> Before:
>
> $ cat /sys/kernel/debug/dri/0/clients
> command pid dev master a uid magic
> Xorg 2344 0 y y 0 0
> Xorg 2344 0 n y 0 2
> Xorg 2344 0 n y 0 3
> Xorg 2344 0 n y 0 4
>
> After:
>
> $ cat /sys/kernel/debug/dri/0/clients
> command tgid dev master a uid magic
> Xorg 830 0 y y 0 0
> xfce4-session 880 0 n y 0 1
> xfwm4 943 0 n y 0 2
> neverball 1095 0 n y 0 3
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
Looks completely correct to me, but I can't claim that I understand all
those nasty details around drm_master handling.
So only Acked-by: Christian König <christian.koenig@amd.com>
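The commit message's "single handover followed by exclusive repeated use"
fast path can be illustrated with a userspace analogue (an assumed
simplification; the kernel code in the patch below uses RCU plus
dev->filelist_mutex instead, and waits for a grace period before dropping
the old struct pid reference):

```c
/*
 * Userspace sketch (assumed simplification) of the drm_file_update_pid()
 * pattern: an unlocked check covers the common case where the owner is
 * unchanged, and the lock is only taken on the rare ownership handover.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static _Atomic long owner_tgid;
static pthread_mutex_t owner_lock = PTHREAD_MUTEX_INITIALIZER;

static void update_owner(long tgid)
{
	/* Quick unlocked check: after a single handover the same task
	 * keeps issuing ioctls, so this almost always matches. */
	if (atomic_load_explicit(&owner_tgid, memory_order_acquire) == tgid)
		return;

	/* Rare slow path: take the lock and publish the new owner. */
	pthread_mutex_lock(&owner_lock);
	atomic_store_explicit(&owner_tgid, tgid, memory_order_release);
	pthread_mutex_unlock(&owner_lock);
}
```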
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 ++--
> drivers/gpu/drm/drm_auth.c | 3 +-
> drivers/gpu/drm/drm_debugfs.c | 10 ++++---
> drivers/gpu/drm/drm_file.c | 40 +++++++++++++++++++++++--
> drivers/gpu/drm/drm_ioctl.c | 3 ++
> drivers/gpu/drm/nouveau/nouveau_drm.c | 5 +++-
> drivers/gpu/drm/vmwgfx/vmwgfx_gem.c | 6 ++--
> include/drm/drm_file.h | 13 ++++++--
> 8 files changed, 71 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 863cb668e000..67ce634992f6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -960,6 +960,7 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file *m, void *unused)
> list_for_each_entry(file, &dev->filelist, lhead) {
> struct task_struct *task;
> struct drm_gem_object *gobj;
> + struct pid *pid;
> int id;
>
> /*
> @@ -969,8 +970,9 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file *m, void *unused)
> * Therefore, we need to protect this ->comm access using RCU.
> */
> rcu_read_lock();
> - task = pid_task(file->pid, PIDTYPE_TGID);
> - seq_printf(m, "pid %8d command %s:\n", pid_nr(file->pid),
> + pid = rcu_dereference(file->pid);
> + task = pid_task(pid, PIDTYPE_TGID);
> + seq_printf(m, "pid %8d command %s:\n", pid_nr(pid),
> task ? task->comm : "<unknown>");
> rcu_read_unlock();
>
> diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
> index cf92a9ae8034..2ed2585ded37 100644
> --- a/drivers/gpu/drm/drm_auth.c
> +++ b/drivers/gpu/drm/drm_auth.c
> @@ -235,7 +235,8 @@ static int drm_new_set_master(struct drm_device *dev, struct drm_file *fpriv)
> static int
> drm_master_check_perm(struct drm_device *dev, struct drm_file *file_priv)
> {
> - if (file_priv->pid == task_pid(current) && file_priv->was_master)
> + if (file_priv->was_master &&
> + rcu_access_pointer(file_priv->pid) == task_pid(current))
> return 0;
>
> if (!capable(CAP_SYS_ADMIN))
> diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c
> index 4855230ba2c6..b46f5ceb24c6 100644
> --- a/drivers/gpu/drm/drm_debugfs.c
> +++ b/drivers/gpu/drm/drm_debugfs.c
> @@ -90,15 +90,17 @@ static int drm_clients_info(struct seq_file *m, void *data)
> */
> mutex_lock(&dev->filelist_mutex);
> list_for_each_entry_reverse(priv, &dev->filelist, lhead) {
> - struct task_struct *task;
> bool is_current_master = drm_is_current_master(priv);
> + struct task_struct *task;
> + struct pid *pid;
>
> - rcu_read_lock(); /* locks pid_task()->comm */
> - task = pid_task(priv->pid, PIDTYPE_TGID);
> + rcu_read_lock(); /* Locks priv->pid and pid_task()->comm! */
> + pid = rcu_dereference(priv->pid);
> + task = pid_task(pid, PIDTYPE_TGID);
> uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID;
> seq_printf(m, "%20s %5d %3d %c %c %5d %10u\n",
> task ? task->comm : "<unknown>",
> - pid_vnr(priv->pid),
> + pid_vnr(pid),
> priv->minor->index,
> is_current_master ? 'y' : 'n',
> priv->authenticated ? 'y' : 'n',
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index c1018c470047..f2f8175ece15 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -156,7 +156,7 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
> if (!file)
> return ERR_PTR(-ENOMEM);
>
> - file->pid = get_pid(task_tgid(current));
> + rcu_assign_pointer(file->pid, get_pid(task_tgid(current)));
> file->minor = minor;
>
> /* for compatibility root is always authenticated */
> @@ -196,7 +196,7 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
> drm_syncobj_release(file);
> if (drm_core_check_feature(dev, DRIVER_GEM))
> drm_gem_release(dev, file);
> - put_pid(file->pid);
> + put_pid(rcu_access_pointer(file->pid));
> kfree(file);
>
> return ERR_PTR(ret);
> @@ -287,7 +287,7 @@ void drm_file_free(struct drm_file *file)
>
> WARN_ON(!list_empty(&file->event_list));
>
> - put_pid(file->pid);
> + put_pid(rcu_access_pointer(file->pid));
> kfree(file);
> }
>
> @@ -501,6 +501,40 @@ int drm_release(struct inode *inode, struct file *filp)
> }
> EXPORT_SYMBOL(drm_release);
>
> +void drm_file_update_pid(struct drm_file *filp)
> +{
> + struct drm_device *dev;
> + struct pid *pid, *old;
> +
> + /*
> + * Master nodes need to keep the original ownership in order for
> + * drm_master_check_perm to keep working correctly. (See comment in
> + * drm_auth.c.)
> + */
> + if (filp->was_master)
> + return;
> +
> + pid = task_tgid(current);
> +
> + /*
> + * Quick unlocked check since the model is a single handover followed by
> + * exclusive repeated use.
> + */
> + if (pid == rcu_access_pointer(filp->pid))
> + return;
> +
> + dev = filp->minor->dev;
> + mutex_lock(&dev->filelist_mutex);
> + old = rcu_replace_pointer(filp->pid, pid, 1);
> + mutex_unlock(&dev->filelist_mutex);
> +
> + if (pid != old) {
> + get_pid(pid);
> + synchronize_rcu();
> + put_pid(old);
> + }
> +}
> +
> /**
> * drm_release_noglobal - release method for DRM file
> * @inode: device inode
> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> index 7c9d66ee917d..305b18d9d7b6 100644
> --- a/drivers/gpu/drm/drm_ioctl.c
> +++ b/drivers/gpu/drm/drm_ioctl.c
> @@ -775,6 +775,9 @@ long drm_ioctl_kernel(struct file *file, drm_ioctl_t *func, void *kdata,
> struct drm_device *dev = file_priv->minor->dev;
> int retcode;
>
> + /* Update drm_file owner if fd was passed along. */
> + drm_file_update_pid(file_priv);
> +
> if (drm_dev_is_unplugged(dev))
> return -ENODEV;
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index cc7c5b4a05fd..57aeaf7af613 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -1095,7 +1095,10 @@ nouveau_drm_open(struct drm_device *dev, struct drm_file *fpriv)
> }
>
> get_task_comm(tmpname, current);
> - snprintf(name, sizeof(name), "%s[%d]", tmpname, pid_nr(fpriv->pid));
> + rcu_read_lock();
> + snprintf(name, sizeof(name), "%s[%d]",
> + tmpname, pid_nr(rcu_dereference(fpriv->pid)));
> + rcu_read_unlock();
>
> if (!(cli = kzalloc(sizeof(*cli), GFP_KERNEL))) {
> ret = -ENOMEM;
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
> index c0da89e16e6f..a07e5b7e2f2f 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
> @@ -232,6 +232,7 @@ static int vmw_debugfs_gem_info_show(struct seq_file *m, void *unused)
> list_for_each_entry(file, &dev->filelist, lhead) {
> struct task_struct *task;
> struct drm_gem_object *gobj;
> + struct pid *pid;
> int id;
>
> /*
> @@ -241,8 +242,9 @@ static int vmw_debugfs_gem_info_show(struct seq_file *m, void *unused)
> * Therefore, we need to protect this ->comm access using RCU.
> */
> rcu_read_lock();
> - task = pid_task(file->pid, PIDTYPE_TGID);
> - seq_printf(m, "pid %8d command %s:\n", pid_nr(file->pid),
> + pid = rcu_dereference(file->pid);
> + task = pid_task(pid, PIDTYPE_TGID);
> + seq_printf(m, "pid %8d command %s:\n", pid_nr(pid),
> task ? task->comm : "<unknown>");
> rcu_read_unlock();
>
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index 0d1f853092ab..27d545131d4a 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -255,8 +255,15 @@ struct drm_file {
> /** @master_lookup_lock: Serializes @master. */
> spinlock_t master_lookup_lock;
>
> - /** @pid: Process that opened this file. */
> - struct pid *pid;
> + /**
> + * @pid: Process that is using this file.
> + *
> + * Must only be dereferenced under a rcu_read_lock or equivalent.
> + *
> + * Updates are guarded with dev->filelist_mutex and reference must be
> + * dropped after a RCU grace period to accommodate lockless readers.
> + */
> + struct pid __rcu *pid;
>
> /** @magic: Authentication magic, see @authenticated. */
> drm_magic_t magic;
> @@ -415,6 +422,8 @@ static inline bool drm_is_accel_client(const struct drm_file *file_priv)
> return file_priv->minor->type == DRM_MINOR_ACCEL;
> }
>
> +void drm_file_update_pid(struct drm_file *);
> +
> int drm_open(struct inode *inode, struct file *filp);
> int drm_open_helper(struct file *filp, struct drm_minor *minor);
> ssize_t drm_read(struct file *filp, char __user *buffer,
* Re: [RFC 01/10] drm: Track clients by tgid and not tid
2023-03-14 15:33 ` Christian König
@ 2023-03-15 9:52 ` Tvrtko Ursulin
2023-03-15 13:19 ` Christian König
0 siblings, 1 reply; 19+ messages in thread
From: Tvrtko Ursulin @ 2023-03-15 9:52 UTC (permalink / raw)
To: Christian König, Intel-gfx, dri-devel
Cc: Rob Clark, Kenny.Ho, Daniel Vetter, Johannes Weiner, linux-kernel,
Stéphane Marchesin, Alex Deucher, linux-graphics-maintainer,
Zefan Li, Dave Airlie, Tejun Heo, cgroups, T . J . Mercier,
Zack Rusin
Hi,
On 14/03/2023 15:33, Christian König wrote:
> On 14.03.23 15:18, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Thread group id (aka pid from userspace point of view) is a more
>> interesting thing to show as an owner of a DRM fd, so track and show that
>> instead of the thread id.
>>
>> In the next patch we will make the owner updated post file descriptor
>> handover, which will also be tgid based to avoid ping-pong when multiple
>> threads access the fd.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Zack Rusin <zackr@vmware.com>
>> Cc: linux-graphics-maintainer@vmware.com
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: "Christian König" <christian.koenig@amd.com>
>> Reviewed-by: Zack Rusin <zackr@vmware.com>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Should we push the already reviewed cleanups like this one to
> drm-misc-next? That makes sense even without the rest of the
> functionality and reduces the number of patches to re-send.
I don't have the commit rights so if you could do that I certainly would
not mind, thanks!
Regards,
Tvrtko
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +-
>> drivers/gpu/drm/drm_debugfs.c | 4 ++--
>> drivers/gpu/drm/drm_file.c | 2 +-
>> drivers/gpu/drm/vmwgfx/vmwgfx_gem.c | 2 +-
>> 4 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> index d8e683688daa..863cb668e000 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> @@ -969,7 +969,7 @@ static int amdgpu_debugfs_gem_info_show(struct
>> seq_file *m, void *unused)
>> * Therefore, we need to protect this ->comm access using RCU.
>> */
>> rcu_read_lock();
>> - task = pid_task(file->pid, PIDTYPE_PID);
>> + task = pid_task(file->pid, PIDTYPE_TGID);
>> seq_printf(m, "pid %8d command %s:\n", pid_nr(file->pid),
>> task ? task->comm : "<unknown>");
>> rcu_read_unlock();
>> diff --git a/drivers/gpu/drm/drm_debugfs.c
>> b/drivers/gpu/drm/drm_debugfs.c
>> index 4f643a490dc3..4855230ba2c6 100644
>> --- a/drivers/gpu/drm/drm_debugfs.c
>> +++ b/drivers/gpu/drm/drm_debugfs.c
>> @@ -80,7 +80,7 @@ static int drm_clients_info(struct seq_file *m, void
>> *data)
>> seq_printf(m,
>> "%20s %5s %3s master a %5s %10s\n",
>> "command",
>> - "pid",
>> + "tgid",
>> "dev",
>> "uid",
>> "magic");
>> @@ -94,7 +94,7 @@ static int drm_clients_info(struct seq_file *m, void
>> *data)
>> bool is_current_master = drm_is_current_master(priv);
>> rcu_read_lock(); /* locks pid_task()->comm */
>> - task = pid_task(priv->pid, PIDTYPE_PID);
>> + task = pid_task(priv->pid, PIDTYPE_TGID);
>> uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID;
>> seq_printf(m, "%20s %5d %3d %c %c %5d %10u\n",
>> task ? task->comm : "<unknown>",
>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>> index a51ff8cee049..c1018c470047 100644
>> --- a/drivers/gpu/drm/drm_file.c
>> +++ b/drivers/gpu/drm/drm_file.c
>> @@ -156,7 +156,7 @@ struct drm_file *drm_file_alloc(struct drm_minor
>> *minor)
>> if (!file)
>> return ERR_PTR(-ENOMEM);
>> - file->pid = get_pid(task_pid(current));
>> + file->pid = get_pid(task_tgid(current));
>> file->minor = minor;
>> /* for compatibility root is always authenticated */
>> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
>> b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
>> index d6baf73a6458..c0da89e16e6f 100644
>> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
>> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
>> @@ -241,7 +241,7 @@ static int vmw_debugfs_gem_info_show(struct
>> seq_file *m, void *unused)
>> * Therefore, we need to protect this ->comm access using RCU.
>> */
>> rcu_read_lock();
>> - task = pid_task(file->pid, PIDTYPE_PID);
>> + task = pid_task(file->pid, PIDTYPE_TGID);
>> seq_printf(m, "pid %8d command %s:\n", pid_nr(file->pid),
>> task ? task->comm : "<unknown>");
>> rcu_read_unlock();
>
* Re: [RFC 01/10] drm: Track clients by tgid and not tid
2023-03-15 9:52 ` Tvrtko Ursulin
@ 2023-03-15 13:19 ` Christian König
0 siblings, 0 replies; 19+ messages in thread
From: Christian König @ 2023-03-15 13:19 UTC (permalink / raw)
To: Tvrtko Ursulin, Intel-gfx, dri-devel
Cc: Rob Clark, Kenny.Ho, Daniel Vetter, Johannes Weiner, linux-kernel,
Stéphane Marchesin, Alex Deucher, linux-graphics-maintainer,
Zefan Li, Dave Airlie, Tejun Heo, cgroups, T . J . Mercier,
Zack Rusin
On 15.03.23 10:52, Tvrtko Ursulin wrote:
>
> Hi,
>
> On 14/03/2023 15:33, Christian König wrote:
>> On 14.03.23 15:18, Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> Thread group id (aka pid from userspace point of view) is a more
>>> interesting thing to show as an owner of a DRM fd, so track and show
>>> that
>>> instead of the thread id.
>>>
>>> In the next patch we will make the owner updated post file descriptor
>>> handover, which will also be tgid based to avoid ping-pong when
>>> multiple
>>> threads access the fd.
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Cc: Zack Rusin <zackr@vmware.com>
>>> Cc: linux-graphics-maintainer@vmware.com
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: "Christian König" <christian.koenig@amd.com>
>>> Reviewed-by: Zack Rusin <zackr@vmware.com>
>>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>> Should we push the already reviewed cleanups like this one to
>> drm-misc-next? That makes sense even without the rest of the
>> functionality and reduces the number of patches to re-send.
>
> I don't have the commit rights so if you could do that I certainly
> would not mind, thanks!
I've just pushed this patch here to drm-misc-next. As soon as Daniel or
Dave gives his ok as well I'm going to also push the second one.
Regards,
Christian.
>
> Regards,
>
> Tvrtko
>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +-
>>> drivers/gpu/drm/drm_debugfs.c | 4 ++--
>>> drivers/gpu/drm/drm_file.c | 2 +-
>>> drivers/gpu/drm/vmwgfx/vmwgfx_gem.c | 2 +-
>>> 4 files changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> index d8e683688daa..863cb668e000 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> @@ -969,7 +969,7 @@ static int amdgpu_debugfs_gem_info_show(struct
>>> seq_file *m, void *unused)
>>> * Therefore, we need to protect this ->comm access using
>>> RCU.
>>> */
>>> rcu_read_lock();
>>> - task = pid_task(file->pid, PIDTYPE_PID);
>>> + task = pid_task(file->pid, PIDTYPE_TGID);
>>> seq_printf(m, "pid %8d command %s:\n", pid_nr(file->pid),
>>> task ? task->comm : "<unknown>");
>>> rcu_read_unlock();
>>> diff --git a/drivers/gpu/drm/drm_debugfs.c
>>> b/drivers/gpu/drm/drm_debugfs.c
>>> index 4f643a490dc3..4855230ba2c6 100644
>>> --- a/drivers/gpu/drm/drm_debugfs.c
>>> +++ b/drivers/gpu/drm/drm_debugfs.c
>>> @@ -80,7 +80,7 @@ static int drm_clients_info(struct seq_file *m,
>>> void *data)
>>> seq_printf(m,
>>> "%20s %5s %3s master a %5s %10s\n",
>>> "command",
>>> - "pid",
>>> + "tgid",
>>> "dev",
>>> "uid",
>>> "magic");
>>> @@ -94,7 +94,7 @@ static int drm_clients_info(struct seq_file *m,
>>> void *data)
>>> bool is_current_master = drm_is_current_master(priv);
>>> rcu_read_lock(); /* locks pid_task()->comm */
>>> - task = pid_task(priv->pid, PIDTYPE_PID);
>>> + task = pid_task(priv->pid, PIDTYPE_TGID);
>>> uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID;
>>> seq_printf(m, "%20s %5d %3d %c %c %5d %10u\n",
>>> task ? task->comm : "<unknown>",
>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>> index a51ff8cee049..c1018c470047 100644
>>> --- a/drivers/gpu/drm/drm_file.c
>>> +++ b/drivers/gpu/drm/drm_file.c
>>> @@ -156,7 +156,7 @@ struct drm_file *drm_file_alloc(struct drm_minor
>>> *minor)
>>> if (!file)
>>> return ERR_PTR(-ENOMEM);
>>> - file->pid = get_pid(task_pid(current));
>>> + file->pid = get_pid(task_tgid(current));
>>> file->minor = minor;
>>> /* for compatibility root is always authenticated */
>>> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
>>> b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
>>> index d6baf73a6458..c0da89e16e6f 100644
>>> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
>>> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
>>> @@ -241,7 +241,7 @@ static int vmw_debugfs_gem_info_show(struct
>>> seq_file *m, void *unused)
>>> * Therefore, we need to protect this ->comm access using
>>> RCU.
>>> */
>>> rcu_read_lock();
>>> - task = pid_task(file->pid, PIDTYPE_PID);
>>> + task = pid_task(file->pid, PIDTYPE_TGID);
>>> seq_printf(m, "pid %8d command %s:\n", pid_nr(file->pid),
>>> task ? task->comm : "<unknown>");
>>> rcu_read_unlock();
>>
* Re: [RFC v4 00/10] DRM scheduling cgroup controller
[not found] ` <20230314141904.1210824-1-tvrtko.ursulin-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
@ 2023-03-25 1:43 ` Tejun Heo
0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2023-03-25 1:43 UTC (permalink / raw)
To: Tvrtko Ursulin
Cc: Intel-gfx, dri-devel, cgroups, linux-kernel, Johannes Weiner, Zefan Li,
Dave Airlie, Daniel Vetter, Rob Clark, Stéphane Marchesin,
T . J . Mercier, Kenny.Ho, Christian König, Brian Welty, Tvrtko Ursulin
Hello, Tvrtko.
On Tue, Mar 14, 2023 at 02:18:54PM +0000, Tvrtko Ursulin wrote:
> DRM scheduling soft limits
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Because of the heterogeneous hardware and driver DRM capabilities, soft limits
> are implemented as a loose co-operative (bi-directional) interface between the
> controller and DRM core.
>
> The controller configures the GPU time allowed per group and periodically scans
> the tasks belonging to the group to detect the over budget condition, at which
> point it invokes a callback notifying the DRM core of the condition.
>
> DRM core provides an API to query per process GPU utilization and a second API
> to receive notifications from the cgroup controller when the group enters or
> exits the over budget condition.
>
> Individual DRM drivers which implement the interface are expected to act on this
> in the best-effort manner only. There are no guarantees that the soft limits
> will be respected.
>
> DRM scheduling soft limits interface files
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In general, I'm in favor of your approach, but can you please stop using the
term "soft limit"? That's a term with a specific historical meaning in
cgroup, so it gets really confusing when you use the term for hierarchical
weighted control. If you need a term to refer to how the weighted control is
implemented by throttling cgroups at target rates, please just come up with
a different term: "usage threshold based control", "usage throttling based
control", or whichever you may like.
> drm.weight
> Standard cgroup weight based control [1, 10000] used to configure the
> relative distribution of GPU time between the sibling groups.
>
> This builds upon the per client GPU utilisation work which landed recently for a
> few drivers. My thinking is that, in principle, the intersection of drivers which
> support both that and some sort of scheduling control, like priorities, could
> also in theory support this controller.
>
> Another really interesting angle for this controller is that it mimics the same
> control method used by the CPU scheduler, that is, the proportional/weight based
> GPU time budgeting, which makes it easy to configure and does not need a new
> mental model.
FWIW, the hierarchical weighted distribution is also implemented by IO
control.
> However, as the introduction mentions, GPUs are much more heterogeneous and
> therefore the controller uses very "soft" wording as to what it promises. The
> general statement is that it can define budgets, notify clients when they are
> over them, and let individual drivers implement best effort handling of those
> conditions.
Maybe "best effort" is more suited than "soft"?
...
> Roughly simultaneously we run the following two benchmarks in each session
> respectively:
>
> 1)
> ./GpuTest /test=pixmark_julia_fp32 /width=1920 /height=1080 /fullscreen /no_scorebox /benchmark /benchmark_duration_ms=60000
>
> 2)
> vblank_mode=0 bin/testfw_app --gl_api=desktop_core --width=1920 --height=1080 --fullscreen 1 --gfx=glfw -t gl_manhattan
>
> (The only reason for vsync off here is because I struggled to find an easily
> runnable and demanding enough benchmark, or to run on a screen large enough to
> make even simpler ones demanding.)
>
> With this test we get 252fps from GpuTest and 96fps from GfxBenchmark.
>
> The premise here is that one of these GPU intensive benchmarks is intended to be run
> by the user with lower priority. Imagine kicking off some background compute
> processing and continuing to use the UI for other tasks. Hence the user will now
> re-run the test by first lowering the weight control of the first session (DRM
> cgroup):
>
> 1)
> echo 50 | sudo tee /sys/fs/cgroup/`cut -d':' -f3 /proc/self/cgroup`/drm.weight
> ./GpuTest /test=pixmark_julia_fp32 /width=1920 /height=1080 /fullscreen /no_scorebox /benchmark /benchmark_duration_ms=60000
>
> 2)
> vblank_mode=0 bin/testfw_app --gl_api=desktop_core --width=1920 --height=1080 --fullscreen 1 --gfx=glfw -t gl_manhattan
>
> In this case we will see that GpuTest has recorded 208fps (~18% down) and
> GfxBenchmark 114fps (18% up), demonstrating that even a very simple approach of
> wiring up i915 to the DRM cgroup controller can enable external GPU scheduling
> control.
It's really nice to see it working pretty intuitively.
> * For now (RFC) I haven't implemented the 2nd suggestion from Tejun of having
> a shadow tree which would only contain groups with DRM clients. (The purpose
> being fewer nodes to traverse in the scanning loop.)
>
> * Is the global state passing from can_attach to attach really okay? (I need
> source and destination css.)
Right now it is, and there are other places that depend on it. Obviously,
it's not great and we probably want to add explicit context passed around
instead in the future, but for now, it should be okay.
While not fully polished, from cgroup POV, the series looks pretty good and
I'd be happy to see it merged.
Thanks.
--
tejun
* Re: [RFC 02/10] drm: Update file owner during use
[not found] ` <20230314141904.1210824-3-tvrtko.ursulin-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
@ 2023-04-21 12:13 ` Emil Velikov
2023-06-08 14:26 ` Tvrtko Ursulin
0 siblings, 1 reply; 19+ messages in thread
From: Emil Velikov @ 2023-04-21 12:13 UTC (permalink / raw)
To: Tvrtko Ursulin
Cc: Intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Rob Clark,
Brian Welty, Kenny.Ho-5C7GfCeVMHo, Tvrtko Ursulin, Daniel Vetter,
Johannes Weiner, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
Stéphane Marchesin, Christian König, Zefan Li,
Dave Airlie, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
T . J . Mercier
Greetings everyone,
Above all - hell yeah. Thank you Tvrtko, this has been annoying the
hell out of me for ages.
On Tue, 14 Mar 2023 at 14:19, Tvrtko Ursulin
<tvrtko.ursulin-VuQAYsv1563Yd54FQh9/CA@public.gmane.org> wrote:
>
> From: Tvrtko Ursulin <tvrtko.ursulin-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>
> With the typical model where the display server opends the file descriptor
> and then hands it over to the client we were showing stale data in
> debugfs.
s/opends/opens/
But as a whole the sentence is fairly misleading. Story time:
In the traditional model, the server was the orchestrator managing the
primary device node: from the fd, to the master status and
authentication. But looking at the fd alone, this has varied across
the years.
IIRC in the DRI1 days, Xorg (libdrm really) would have a list of open
fd(s) and reuse those whenever needed; with DRI2 the clients were
responsible for open() themselves; and with DRI3 the fd was passed to
the client.
Around the inception of DRI3 and systemd-logind, the latter became
another possible orchestrator, whereby Xorg and Wayland compositors
could ask it for the fd. For various reasons (hysterical and genuine
ones) Xorg has a fallback path going via open(), whereas Wayland
compositors are moving to solely relying on logind... some never had
a fallback at all.
Over the past few years, more projects have emerged which provide
functionality similar (be that on API level, Dbus, or otherwise) to
systemd-logind.
Apart from that, the commit is spot on. I like the use of rcu and the
was_master handling is correct. With some message polish this commit
is:
Reviewed-by: Emil Velikov <emil.l.velikov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
I also had a brief look at 01/10, although I cannot find many
references for the pid <> tgid mappings. Be that on the kernel side
or userspace - do you have any links so that I can educate myself?
Thanks
Emil
* Re: [RFC 02/10] drm: Update file owner during use
2023-04-21 12:13 ` Emil Velikov
@ 2023-06-08 14:26 ` Tvrtko Ursulin
2023-06-20 14:48 ` Emil Velikov
0 siblings, 1 reply; 19+ messages in thread
From: Tvrtko Ursulin @ 2023-06-08 14:26 UTC (permalink / raw)
To: Emil Velikov
Cc: Rob Clark, cgroups, Dave Airlie, Kenny.Ho, Intel-gfx,
linux-kernel, dri-devel, T . J . Mercier, Johannes Weiner,
Zefan Li, Daniel Vetter, Tejun Heo, Stéphane Marchesin,
Christian König
On 21/04/2023 13:13, Emil Velikov wrote:
> Greetings everyone,
>
> Above all - hell yeah. Thank you Tvrtko, this has been annoying the
> hell out of me for ages.
Yay!
> On Tue, 14 Mar 2023 at 14:19, Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
>>
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> With the typical model where the display server opends the file descriptor
>> and then hands it over to the client we were showing stale data in
>> debugfs.
>
> s/opends/opens/
Thanks!
> But as a whole the sentence is fairly misleading. Story time:
>
> In the traditional model, the server was the orchestrator managing the
> primary device node: from the fd, to the master status and
> authentication. But looking at the fd alone, this has varied across
> the years.
>
> IIRC in the DRI1 days, Xorg (libdrm really) would have a list of open
> fd(s) and reuse those whenever needed; with DRI2 the clients were
> responsible for open() themselves; and with DRI3 the fd was passed to
> the client.
>
> Around the inception of DRI3 and systemd-logind, the latter became
> another possible orchestrator, whereby Xorg and Wayland compositors
> could ask it for the fd. For various reasons (hysterical and genuine
> ones) Xorg has a fallback path going via open(), whereas Wayland
> compositors are moving to solely relying on logind... some never had
> a fallback at all.
>
> Over the past few years, more projects have emerged which provide
> functionality similar (be that on API level, Dbus, or otherwise) to
> systemd-logind.
>
>
> Apart from that, the commit is spot on. I like the use of rcu and the
> was_master handling is correct. With some message polish this commit
> is:
> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Are you okay if I just paste your very fine explanation verbatim, with
credits?
> I also had a brief look at 01/10, although I cannot find many
> references for the pid <> tgid mappings. Be that on the kernel side
> or userspace - do you have any links so that I can educate myself?
TGID is the thread group ID, i.e. the PID of the thread group leader. For
single threaded userspace TGID equals PID, while for multi-threaded the
first thread's TGID equals its PID/TID, and the additional threads' PIDs/TIDs
do not equal the TGID. Clear as mud? :) My POSIX book is misplaced
somewhere, having not consulted it in years... :)
Regards,
Tvrtko
* Re: [RFC 02/10] drm: Update file owner during use
2023-06-08 14:26 ` Tvrtko Ursulin
@ 2023-06-20 14:48 ` Emil Velikov
0 siblings, 0 replies; 19+ messages in thread
From: Emil Velikov @ 2023-06-20 14:48 UTC (permalink / raw)
To: Tvrtko Ursulin
Cc: Intel-gfx, dri-devel, Rob Clark, Brian Welty, Kenny.Ho,
Tvrtko Ursulin, Daniel Vetter, Johannes Weiner, linux-kernel,
Stéphane Marchesin, Christian König, Zefan Li,
Dave Airlie, Tejun Heo, cgroups, T . J . Mercier
Hi Tvrtko
Sorry for the delay, real life and other obligations got in the way.
On Thu, 8 Jun 2023 at 15:26, Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
> On 21/04/2023 13:13, Emil Velikov wrote:
> Are you okay if I just paste your very fine explanation verbatim, with
> credits?
>
Yes, feel free to use as much of it as you see reasonable.
> > I also had a brief look at 01/10, although I cannot find many
> > references for the pid <> tgid mappings. Be that on the kernel side
> > or userspace - do you have any links so that I can educate myself?
>
> TGID is the thread group ID, i.e. the PID of the thread group leader. For
> single threaded userspace TGID equals PID, while for multi-threaded the
> first thread's TGID equals its PID/TID, and the additional threads' PIDs/TIDs
> do not equal the TGID. Clear as mud? :) My POSIX book is misplaced
> somewhere, having not consulted it in years... :)
>
Ack. /me looks into actually buying one, perhaps
Thanks
Emil
Thread overview: 19+ messages
2023-03-14 14:18 [RFC v4 00/10] DRM scheduling cgroup controller Tvrtko Ursulin
2023-03-14 14:18 ` [RFC 01/10] drm: Track clients by tgid and not tid Tvrtko Ursulin
2023-03-14 15:33 ` Christian König
2023-03-15 9:52 ` Tvrtko Ursulin
2023-03-15 13:19 ` Christian König
2023-03-14 14:18 ` [RFC 02/10] drm: Update file owner during use Tvrtko Ursulin
2023-03-14 15:49 ` Christian König
[not found] ` <20230314141904.1210824-3-tvrtko.ursulin-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2023-04-21 12:13 ` Emil Velikov
2023-06-08 14:26 ` Tvrtko Ursulin
2023-06-20 14:48 ` Emil Velikov
2023-03-14 14:18 ` [RFC 03/10] cgroup: Add the DRM cgroup controller Tvrtko Ursulin
2023-03-14 14:18 ` [RFC 04/10] drm/cgroup: Track DRM clients per cgroup Tvrtko Ursulin
2023-03-14 14:18 ` [RFC 05/10] drm/cgroup: Add ability to query drm cgroup GPU time Tvrtko Ursulin
2023-03-14 14:19 ` [RFC 06/10] drm/cgroup: Add over budget signalling callback Tvrtko Ursulin
2023-03-14 14:19 ` [RFC 07/10] drm/cgroup: Only track clients which are providing drm_cgroup_ops Tvrtko Ursulin
2023-03-14 14:19 ` [RFC 08/10] cgroup/drm: Introduce weight based drm cgroup control Tvrtko Ursulin
2023-03-14 14:19 ` [RFC 09/10] drm/i915: Wire up with drm controller GPU time query Tvrtko Ursulin
2023-03-14 14:19 ` [RFC 10/10] drm/i915: Implement cgroup controller over budget throttling Tvrtko Ursulin
[not found] ` <20230314141904.1210824-1-tvrtko.ursulin-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2023-03-25 1:43 ` [RFC v4 00/10] DRM scheduling cgroup controller Tejun Heo