From: Tvrtko Ursulin
Subject: Re: [Intel-gfx] [RFC PATCH 2/4] drm/cgroup: Add memory accounting to DRM cgroup
Date: Wed, 3 May 2023 16:31:19 +0100
References: <20230503083500.645848-1-maarten.lankhorst@linux.intel.com>
 <20230503083500.645848-3-maarten.lankhorst@linux.intel.com>
In-Reply-To: <20230503083500.645848-3-maarten.lankhorst-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
To: Maarten Lankhorst,
 dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
 cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 intel-xe-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Cc: Daniel Vetter, Thomas Zimmermann,
 intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
 linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
 Maxime Ripard, Zefan Li, Johannes Weiner, Tejun Heo, David Airlie

On 03/05/2023 09:34, Maarten Lankhorst wrote:
> Based roughly on the rdma and misc cgroup controllers, with a lot of
> the accounting code borrowed from rdma.
>
> The interface is simple:
> - populate drmcgroup_device->regions[..] name and size for each active
>   region.
> - Call drm(m)cg_register_device()
> - Use drmcg_try_charge to check if you can allocate a chunk of memory,
>   use drmcg_uncharge when freeing it. This may return an error code,
>   or -EAGAIN when the cgroup limit is reached.
>
> The ttm code transforms -EAGAIN back to -ENOSPC since it has specific
> logic for -ENOSPC, and returning -EAGAIN to userspace causes drmIoctl
> to restart infinitely.
>
> This API allows you to limit stuff with cgroups.
> You can see the supported cards in /sys/fs/cgroup/drm.capacity
> You need to echo +drm to cgroup.subtree_control, and then you can
> partition memory.
>
> In each cgroup subdir:
> drm.max shows the current limits of the cgroup.
> drm.current shows the current amount of allocated memory used by this cgroup.
> drm.events shows the number of times the max memory limit was reached.

Events is not in the patch?

> Signed-off-by: Maarten Lankhorst
> ---
>  Documentation/admin-guide/cgroup-v2.rst |  46 ++
>  Documentation/gpu/drm-compute.rst       |  54 +++
>  include/linux/cgroup_drm.h              |  81 ++++
>  kernel/cgroup/drm.c                     | 539 +++++++++++++++++++++++-
>  4 files changed, 699 insertions(+), 21 deletions(-)
>  create mode 100644 Documentation/gpu/drm-compute.rst
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index f67c0829350b..b858d99cb2ef 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -2374,6 +2374,52 @@ RDMA Interface Files
>  	  mlx4_0 hca_handle=1 hca_object=20
>  	  ocrdma1 hca_handle=1 hca_object=23
>
> +DRM
> +----
> +
> +The "drm" controller regulates the distribution and accounting of
> +DRM resources.
> +
> +DRM Interface Files
> +~~~~~~~~~~~~~~~~~~~~
> +
> +  drm.max
> +	A readwrite nested-keyed file that exists for all the cgroups
> +	except root that describes current configured resource limit
> +	for a DRM device.
> +
> +	Lines are keyed by device name and are not ordered.
> +	Each line contains space separated resource name and its configured
> +	limit that can be distributed.
> +
> +	The following nested keys are defined.
> +
> +	  ==========  ==================================================
> +	  region.*    Maximum number of bytes allocatable in this region
> +	  ==========  ==================================================
> +
> +	An example for xe follows::
> +
> +	  0000:03:00.0 region.vram0=1073741824 region.stolen=max
> +
> +  drm.capacity
> +	A read-only file that describes maximum region capacity.
> +	It only exists on the root cgroup. Not all memory can be
> +	allocated by cgroups, as the kernel reserves some for
> +	internal use.
> +
> +	An example for xe follows::
> +
> +	  0000:03:00.0 region.vram0=8514437120 region.stolen=67108864
> +
> +  drm.current
> +	A read-only file that describes current resource usage.
> +	It exists for all the cgroups except root.
> +
> +	An example for xe follows::
> +
> +	  0000:03:00.0 region.vram0=12550144 region.stolen=8650752
> +
>  HugeTLB
>  -------
>
> diff --git a/Documentation/gpu/drm-compute.rst b/Documentation/gpu/drm-compute.rst
> new file mode 100644
> index 000000000000..116270976ef7
> --- /dev/null
> +++ b/Documentation/gpu/drm-compute.rst
> @@ -0,0 +1,54 @@
> +==================================
> +Long running workloads and compute
> +==================================
> +
> +Long running workloads (compute) are workloads that will not complete in 10
> +seconds (the time a user will wait before reaching for the power button).
> +This means that other techniques need to be used to manage those workloads,
> +which cannot use fences.
> +
> +Some hardware may schedule compute jobs, and have no way to pre-empt them,
> +or have their memory swapped out from them. Or they simply want their
> +workload not to be preempted or swapped out at all.
> +
> +This means that it differs from what is described in driver-api/dma-buf.rst.
> +
> +As with normal compute jobs, dma-fence may not be used at all. In this case,
> +not even to force preemption. The driver is simply forced to unmap a BO
> +from the long compute job's address space on unbind immediately, not even
> +waiting for the workload to complete. Effectively this terminates the
> +workload when there is no hardware support to recover.
> +
> +Since this is undesirable, there need to be mitigations to prevent a
> +workload from being terminated. There are several possible approaches, all
> +with their advantages and drawbacks.
> +
> +The first approach you will likely try is to pin all buffers used by
> +compute. This guarantees that the job will run uninterrupted, but also
> +allows an easy denial of service attack by pinning as much memory as
> +possible, hogging all GPU memory, and possibly a huge chunk of CPU memory.
> +
> +A second approach that will work slightly better on its own is adding an
> +option not to evict when creating a new job (any kind). If all of userspace
> +opts in to this flag, it would prevent cooperating userspace from forcefully
> +terminating older compute jobs to start a new one.
> +
> +If job preemption and recoverable pagefaults are not available, those are
> +the only approaches possible. So even with those, you want a separate way
> +of controlling resources. The standard kernel way of doing so is cgroups.
> +
> +This creates a third option, using cgroups to prevent eviction.
> +Both GPU and driver-allocated CPU memory would be accounted to the
> +correct cgroup, and eviction would be made cgroup aware. This allows the
> +GPU to be partitioned into cgroups, which will allow jobs to run next to
> +each other without interference.

The 3rd approach is only valid if used strictly with device local
memory, right? Because as soon as system memory backed buffers are used
this approach cannot guarantee no eviction can be triggered.

> +
> +The interface to the cgroup would be similar to the current CPU memory
> +interface, with similar semantics for min/low/high/max, if eviction can
> +be made cgroup aware. For now only max is implemented.
> +
> +What should be noted is that each memory region (tiled memory for example)
> +should have its own accounting, using $card key0=value0 key1=value1.
> +
> +The key is set to the regionid set by the driver, for example "tile0".
> +For the value of $card, we use drmGetUnique().
> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> index 8ef66a47619f..4f17b1c85f47 100644
> --- a/include/linux/cgroup_drm.h
> +++ b/include/linux/cgroup_drm.h
> @@ -6,4 +6,85 @@
>  #ifndef _CGROUP_DRM_H
>  #define _CGROUP_DRM_H
>
> +#include
> +
> +#include
> +
> +struct drm_device;
> +struct drm_file;
> +
> +struct drmcgroup_state;
> +
> +/*
> + * Use 8 as max, because of N^2 lookup when setting things, can be bumped if needed
> + * Identical to TTM_NUM_MEM_TYPES to allow simplifying that code.
> + */
> +#define DRMCG_MAX_REGIONS 8
> +
> +struct drmcgroup_device {
> +	struct list_head list;
> +	struct list_head pools;
> +
> +	struct {
> +		u64 size;
> +		const char *name;
> +	} regions[DRMCG_MAX_REGIONS];
> +
> +	/* Name describing the card, set by drmcg_register_device */
> +	const char *name;
> +
> +};
> +
> +#if IS_ENABLED(CONFIG_CGROUP_DRM)
> +int drmcg_register_device(struct drm_device *dev,
> +			  struct drmcgroup_device *drm_cg);
> +void drmcg_unregister_device(struct drmcgroup_device *cgdev);
> +int drmcg_try_charge(struct drmcgroup_state **drmcg,
> +		     struct drmcgroup_device *cgdev,
> +		     u32 index, u64 size);
> +void drmcg_uncharge(struct drmcgroup_state *drmcg,
> +		    struct drmcgroup_device *cgdev,
> +		    u32 index, u64 size);
> +#else
> +static inline int
> +drmcg_register_device(struct drm_device *dev,
> +		      struct drmcgroup_device *drm_cg)
> +{
> +	return 0;
> +}
> +
> +static inline void drmcg_unregister_device(struct drmcgroup_device *cgdev)
> +{
> +}
> +
> +static inline int drmcg_try_charge(struct drmcgroup_state **drmcg,
> +				   struct drmcgroup_device *cgdev,
> +				   u32 index, u64 size)
> +{
> +	*drmcg = NULL;
> +	return 0;
> +}
> +
> +static inline void drmcg_uncharge(struct drmcgroup_state *drmcg,
> +				  struct drmcgroup_device *cgdev,
> +				  u32 index, u64 size)
> +{ }
> +#endif
> +
> +static inline void drmmcg_unregister_device(struct drm_device *dev, void *arg)
> +{
> +	drmcg_unregister_device(arg);
> +}
> +
> +/*
> + * This needs to be done as inline, because cgroup lives in the core
> + * kernel and it cannot call drm calls directly
> + */
> +static inline int drmmcg_register_device(struct drm_device *dev,
> +					 struct drmcgroup_device *cgdev)
> +{
> +	return drmcg_register_device(dev, cgdev) ?:
> +	       drmm_add_action_or_reset(dev, drmmcg_unregister_device, cgdev);
> +}
> +
>  #endif /* _CGROUP_DRM_H */
> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> index 02c8eaa633d3..a93d9344fd36 100644
> --- a/kernel/cgroup/drm.c
> +++ b/kernel/cgroup/drm.c
> @@ -1,60 +1,557 @@
> -/* SPDX-License-Identifier: MIT */
> +// SPDX-License-Identifier: GPL-2.0
>  /*
> - * Copyright © 2023 Intel Corporation
> + * Copyright 2023 Intel
> + * Partially based on the rdma and misc controllers, which bear the following copyrights:
> + *
> + * Copyright 2020 Google LLC
> + * Copyright (C) 2016 Parav Pandit
>   */
>
>  #include
>  #include
> +#include
> +#include
> +#include
>  #include
>
> -struct drm_cgroup_state {

As a side note, it'd be easier to read the diff if you left the name as
is, and some other details too, like the static root group (I need to
remind myself if/why I needed it, but does it harm you?) and my missed
static keywords and needless static struct initialization. I will fix
that up in my patch locally. Anyway, that way there would maybe be less
churn from one patch to the next in the series.

> +#include
> +#include
> +#include
> +#include
> +
> +struct drmcgroup_state {
>  	struct cgroup_subsys_state css;
> +
> +	struct list_head pools;
>  };
>
> -struct drm_root_cgroup_state {
> -	struct drm_cgroup_state drmcs;
> +struct drmcgroup_pool_state {
> +	struct drmcgroup_device *device;
> +	struct drmcgroup_resource {
> +		s64 max, used;
> +	} resources[DRMCG_MAX_REGIONS];
> +
> +	s64 usage_sum;
> +
> +	struct list_head cg_node;

cg always makes me think cgroup and not css, so it is a bit confusing.
Why are two lists needed?
> +	struct list_head dev_node;
>  };
>
> -static struct drm_root_cgroup_state root_drmcs;
> +static DEFINE_MUTEX(drmcg_mutex);
> +static LIST_HEAD(drmcg_devices);
>
> -static inline struct drm_cgroup_state *
> +static inline struct drmcgroup_state *
>  css_to_drmcs(struct cgroup_subsys_state *css)
>  {
> -	return container_of(css, struct drm_cgroup_state, css);
> +	return container_of(css, struct drmcgroup_state, css);
> +}
> +
> +static inline struct drmcgroup_state *get_current_drmcg(void)
> +{
> +	return css_to_drmcs(task_get_css(current, drm_cgrp_id));
> +}
> +
> +static struct drmcgroup_state *parent_drmcg(struct drmcgroup_state *cg)
> +{
> +	return css_to_drmcs(cg->css.parent);
> +}
> +
> +static void free_cg_pool_locked(struct drmcgroup_pool_state *pool)
> +{
> +	lockdep_assert_held(&drmcg_mutex);
> +
> +	list_del(&pool->cg_node);
> +	list_del(&pool->dev_node);
> +	kfree(pool);
> +}
> +
> +static void
> +set_resource_max(struct drmcgroup_pool_state *pool, int i, u64 new_max)
> +{
> +	pool->resources[i].max = new_max;
> +}
> +
> +static void set_all_resource_max_limit(struct drmcgroup_pool_state *rpool)
> +{
> +	int i;
> +
> +	for (i = 0; i < DRMCG_MAX_REGIONS; i++)
> +		set_resource_max(rpool, i, S64_MAX);
> +}
> +
> +static void drmcs_offline(struct cgroup_subsys_state *css)
> +{
> +	struct drmcgroup_state *drmcs = css_to_drmcs(css);
> +	struct drmcgroup_pool_state *pool, *next;
> +
> +	mutex_lock(&drmcg_mutex);
> +	list_for_each_entry_safe(pool, next, &drmcs->pools, cg_node) {
> +		if (!pool->usage_sum) {
> +			free_cg_pool_locked(pool);
> +		} else {
> +			/* Reset all regions, last uncharge will remove pool */
> +			set_all_resource_max_limit(pool);
> +		}
> +	}
> +	mutex_unlock(&drmcg_mutex);
>  }
>
>  static void drmcs_free(struct cgroup_subsys_state *css)
>  {
> -	struct drm_cgroup_state *drmcs = css_to_drmcs(css);
> +	struct drmcgroup_state *drmcs = css_to_drmcs(css);
>
> -	if (drmcs != &root_drmcs.drmcs)
> -		kfree(drmcs);
> +	kfree(drmcs);
>  }
>
>  static struct cgroup_subsys_state *
>  drmcs_alloc(struct cgroup_subsys_state *parent_css)
>  {
> -	struct drm_cgroup_state *drmcs;
> +	struct drmcgroup_state *drmcs = kzalloc(sizeof(*drmcs), GFP_KERNEL);
> +	if (!drmcs)
> +		return ERR_PTR(-ENOMEM);
> +
> +	INIT_LIST_HEAD(&drmcs->pools);
> +	return &drmcs->css;
> +}
> +
> +static struct drmcgroup_pool_state *
> +find_cg_pool_locked(struct drmcgroup_state *drmcs, struct drmcgroup_device *dev)
> +{
> +	struct drmcgroup_pool_state *pool;
> +
> +	list_for_each_entry(pool, &drmcs->pools, cg_node)
> +		if (pool->device == dev)
> +			return pool;
> +
> +	return NULL;
> +}
> +
> +static struct drmcgroup_pool_state *
> +get_cg_pool_locked(struct drmcgroup_state *drmcs, struct drmcgroup_device *dev)
> +{
> +	struct drmcgroup_pool_state *pool;
> +
> +	pool = find_cg_pool_locked(drmcs, dev);
> +	if (pool)
> +		return pool;
> +
> +	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
> +	if (!pool)
> +		return ERR_PTR(-ENOMEM);
> +
> +	pool->device = dev;
> +	set_all_resource_max_limit(pool);
>
> -	if (!parent_css) {
> -		drmcs = &root_drmcs.drmcs;
> -	} else {
> -		drmcs = kzalloc(sizeof(*drmcs), GFP_KERNEL);
> -		if (!drmcs)
> -			return ERR_PTR(-ENOMEM);
> +	INIT_LIST_HEAD(&pool->cg_node);
> +	INIT_LIST_HEAD(&pool->dev_node);
> +	list_add_tail(&pool->cg_node, &drmcs->pools);
> +	list_add_tail(&pool->dev_node, &dev->pools);
> +	return pool;
> +}
> +
> +void drmcg_unregister_device(struct drmcgroup_device *cgdev)
> +{
> +	struct drmcgroup_pool_state *pool, *next;
> +
> +	mutex_lock(&drmcg_mutex);
> +	list_del(&cgdev->list);
> +
> +	list_for_each_entry_safe(pool, next, &cgdev->pools, dev_node)
> +		free_cg_pool_locked(pool);
> +	mutex_unlock(&drmcg_mutex);
> +	kfree(cgdev->name);
> +}
> +
> +EXPORT_SYMBOL_GPL(drmcg_unregister_device);
> +
> +int drmcg_register_device(struct drm_device *dev,
> +			  struct drmcgroup_device *cgdev)
> +{
> +	char *name = kstrdup(dev->unique, GFP_KERNEL);
> +	if (!name)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&cgdev->pools);
> +	mutex_lock(&drmcg_mutex);
> +	cgdev->name = name;
> +	list_add_tail(&cgdev->list, &drmcg_devices);
> +	mutex_unlock(&drmcg_mutex);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(drmcg_register_device);
> +
> +static int drmcg_max_show(struct seq_file *sf, void *v)
> +{
> +	struct drmcgroup_state *drmcs = css_to_drmcs(seq_css(sf));
> +	struct drmcgroup_pool_state *pool;
> +
> +	mutex_lock(&drmcg_mutex);
> +	list_for_each_entry(pool, &drmcs->pools, cg_node) {
> +		struct drmcgroup_device *dev = pool->device;
> +		int i;
> +
> +		seq_puts(sf, dev->name);
> +
> +		for (i = 0; i < DRMCG_MAX_REGIONS; i++) {
> +			if (!dev->regions[i].name)
> +				continue;
> +
> +			if (pool->resources[i].max < S64_MAX)
> +				seq_printf(sf, " region.%s=%lld", dev->regions[i].name,
> +					   pool->resources[i].max);
> +			else
> +				seq_printf(sf, " region.%s=max", dev->regions[i].name);
> +		}
> +
> +		seq_putc(sf, '\n');
>  	}
> +	mutex_unlock(&drmcg_mutex);
>
> -	return &drmcs->css;
> +	return 0;
> +}
> +
> +static struct drmcgroup_device *drmcg_get_device_locked(const char *name)
> +{
> +	struct drmcgroup_device *dev;
> +
> +	lockdep_assert_held(&drmcg_mutex);
> +
> +	list_for_each_entry(dev, &drmcg_devices, list)
> +		if (!strcmp(name, dev->name))
> +			return dev;
> +
> +	return NULL;
> +}
> +
> +static void try_to_free_cg_pool_locked(struct drmcgroup_pool_state *pool)
> +{
> +	struct drmcgroup_device *dev = pool->device;
> +	u32 i;
> +
> +	/* Memory charged to this pool */
> +	if (pool->usage_sum)
> +		return;
> +
> +	for (i = 0; i < DRMCG_MAX_REGIONS; i++) {
> +		if (!dev->regions[i].name)
> +			continue;
> +
> +		/* Is a specific limit set? */
> +		if (pool->resources[i].max < S64_MAX)
> +			return;
> +	}
> +
> +	/*
> +	 * No user of the pool and all entries are set to defaults;
> +	 * safe to delete this pool.
> +	 */
> +	free_cg_pool_locked(pool);
> +}
> +
> +
> +static void
> +uncharge_cg_locked(struct drmcgroup_state *drmcs,
> +		   struct drmcgroup_device *cgdev,
> +		   u32 index, u64 size)
> +{
> +	struct drmcgroup_pool_state *pool;
> +
> +	pool = find_cg_pool_locked(drmcs, cgdev);
> +
> +	if (unlikely(!pool)) {
> +		pr_warn("Invalid device %p or drm cgroup %p\n", cgdev, drmcs);
> +		return;
> +	}
> +
> +	pool->resources[index].used -= size;
> +
> +	/*
> +	 * A negative count (or overflow) is invalid,
> +	 * it indicates a bug in the drm cgroup controller.
> +	 */
> +	WARN_ON_ONCE(pool->resources[index].used < 0);
> +	pool->usage_sum--;
> +	try_to_free_cg_pool_locked(pool);
> +}
> +
> +static void drmcg_uncharge_hierarchy(struct drmcgroup_state *drmcs,
> +				     struct drmcgroup_device *cgdev,
> +				     struct drmcgroup_state *stop_cg,
> +				     u32 index, u64 size)
> +{
> +	struct drmcgroup_state *p;
> +
> +	mutex_lock(&drmcg_mutex);
> +
> +	for (p = drmcs; p != stop_cg; p = parent_drmcg(p))
> +		uncharge_cg_locked(p, cgdev, index, size);
> +
> +	mutex_unlock(&drmcg_mutex);
> +
> +	css_put(&drmcs->css);
> +}
> +
> +void drmcg_uncharge(struct drmcgroup_state *drmcs,
> +		    struct drmcgroup_device *cgdev,
> +		    u32 index,
> +		    u64 size)
> +{
> +	if (index >= DRMCG_MAX_REGIONS)
> +		return;
> +
> +	drmcg_uncharge_hierarchy(drmcs, cgdev, NULL, index, size);
> +}
> +EXPORT_SYMBOL_GPL(drmcg_uncharge);
> +
> +int drmcg_try_charge(struct drmcgroup_state **drmcs,
> +		     struct drmcgroup_device *cgdev,
> +		     u32 index,
> +		     u64 size)
> +{
> +	struct drmcgroup_state *cg, *p;
> +	struct drmcgroup_pool_state *pool;
> +	u64 new;
> +	int ret = 0;
> +
> +	if (index >= DRMCG_MAX_REGIONS)
> +		return -EINVAL;
> +
> +	/*
> +	 * hold on to css, as cgroup can be removed but resource
> +	 * accounting happens on css.
> +	 */
> +	cg = get_current_drmcg();

1) I am not familiar with the Xe flows - charging is at the point of
actual backing store allocation? What about buffer sharing?
Also, given how the css is permanently stored in the caller - you
deliberately decided not to deal with task migrations? I am not sure
that will work. Or maybe it was just omitted for RFC v1?

2) Buffer objects which Xe can migrate between memory regions will be
correctly charged/uncharged as they are moved?

Regards,

Tvrtko

> +
> +	mutex_lock(&drmcg_mutex);
> +	for (p = cg; p; p = parent_drmcg(p)) {
> +		pool = get_cg_pool_locked(p, cgdev);
> +		if (IS_ERR(pool)) {
> +			ret = PTR_ERR(pool);
> +			goto err;
> +		} else {
> +			new = pool->resources[index].used + size;
> +			if (new > pool->resources[index].max || new > S64_MAX) {
> +				ret = -EAGAIN;
> +				goto err;
> +			} else {
> +				pool->resources[index].used = new;
> +				pool->usage_sum++;
> +			}
> +		}
> +	}
> +	mutex_unlock(&drmcg_mutex);
> +
> +	*drmcs = cg;
> +	return 0;
> +
> +err:
> +	mutex_unlock(&drmcg_mutex);
> +	drmcg_uncharge_hierarchy(cg, cgdev, p, index, size);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(drmcg_try_charge);
> +
> +static s64 parse_resource(char *c, char **retname)
> +{
> +	substring_t argstr;
> +	char *name, *value = c;
> +	size_t len;
> +	int ret;
> +	u64 retval;
> +
> +	name = strsep(&value, "=");
> +	if (!name || !value)
> +		return -EINVAL;
> +
> +	/* Only support region setting for now */
> +	if (strncmp(name, "region.", 7))
> +		return -EINVAL;
> +	else
> +		name += 7;
> +
> +	*retname = name;
> +	len = strlen(value);
> +
> +	argstr.from = value;
> +	argstr.to = value + len;
> +
> +	ret = match_u64(&argstr, &retval);
> +	if (ret >= 0) {
> +		if (retval > S64_MAX)
> +			return -EINVAL;
> +		return retval;
> +	}
> +	if (!strncmp(value, "max", len))
> +		return S64_MAX;
> +
> +	/* Not u64 or max, error */
> +	return -EINVAL;
> +}
> +
> +static int drmcg_parse_limits(char *options,
> +			      u64 *limits, char **enables)
> +{
> +	char *c;
> +	int num_limits = 0;
> +
> +	/* parse resource options */
> +	while ((c = strsep(&options, " ")) != NULL) {
> +		s64 limit;
> +
> +		if (num_limits >= DRMCG_MAX_REGIONS)
> +			return -EINVAL;
> +
> +		limit = parse_resource(c, &enables[num_limits]);
> +		if (limit < 0)
> +			return limit;
> +
> +		limits[num_limits++] = limit;
> +	}
> +	return num_limits;
> +}
> +
> +static ssize_t drmcg_max_write(struct kernfs_open_file *of,
> +			       char *buf, size_t nbytes, loff_t off)
> +{
> +	struct drmcgroup_state *drmcs = css_to_drmcs(of_css(of));
> +	struct drmcgroup_device *dev;
> +	struct drmcgroup_pool_state *pool;
> +	char *options = strstrip(buf);
> +	char *dev_name = strsep(&options, " ");
> +	u64 limits[DRMCG_MAX_REGIONS];
> +	u64 new_limits[DRMCG_MAX_REGIONS];
> +	char *regions[DRMCG_MAX_REGIONS];
> +	int num_limits, i;
> +	unsigned long set_mask = 0;
> +	int err = 0;
> +
> +	if (!dev_name)
> +		return -EINVAL;
> +
> +	num_limits = drmcg_parse_limits(options, limits, regions);
> +	if (num_limits < 0)
> +		return num_limits;
> +	if (!num_limits)
> +		return -EINVAL;
> +
> +	/*
> +	 * Everything is parsed into key=value pairs now, take lock and
> +	 * attempt to update. For good measure, return -EINVAL when a key
> +	 * is set twice.
> +	 */
> +	mutex_lock(&drmcg_mutex);
> +
> +	dev = drmcg_get_device_locked(dev_name);
> +	if (!dev) {
> +		err = -ENODEV;
> +		goto err;
> +	}
> +
> +	pool = get_cg_pool_locked(drmcs, dev);
> +	if (IS_ERR(pool)) {
> +		err = PTR_ERR(pool);
> +		goto err;
> +	}
> +
> +	/* Lookup region names and set new_limits to the index */
> +	for (i = 0; i < num_limits; i++) {
> +		int j;
> +
> +		for (j = 0; j < DRMCG_MAX_REGIONS; j++)
> +			if (dev->regions[j].name &&
> +			    !strcmp(regions[i], dev->regions[j].name))
> +				break;
> +
> +		if (j == DRMCG_MAX_REGIONS ||
> +		    set_mask & BIT(j)) {
> +			err = -EINVAL;
> +			goto err_put;
> +		}
> +
> +		set_mask |= BIT(j);
> +		new_limits[j] = limits[i];
> +	}
> +
> +	/* And commit */
> +	for_each_set_bit(i, &set_mask, DRMCG_MAX_REGIONS)
> +		set_resource_max(pool, i, new_limits[i]);
> +
> +err_put:
> +	try_to_free_cg_pool_locked(pool);
> +err:
> +	mutex_unlock(&drmcg_mutex);
> +
> +	return err ?: nbytes;
> +}
> +
> +static int drmcg_current_show(struct seq_file *sf, void *v)
> +{
> +	struct drmcgroup_state *drmcs = css_to_drmcs(seq_css(sf));
> +	struct drmcgroup_device *dev;
> +
> +	mutex_lock(&drmcg_mutex);
> +	list_for_each_entry(dev, &drmcg_devices, list) {
> +		struct drmcgroup_pool_state *pool = find_cg_pool_locked(drmcs, dev);
> +		int i;
> +
> +		seq_puts(sf, dev->name);
> +
> +		for (i = 0; i < DRMCG_MAX_REGIONS; i++) {
> +			if (!dev->regions[i].name)
> +				continue;
> +
> +			seq_printf(sf, " region.%s=%lld", dev->regions[i].name,
> +				   pool ? pool->resources[i].used : 0ULL);
> +		}
> +
> +		seq_putc(sf, '\n');
> +	}
> +	mutex_unlock(&drmcg_mutex);
> +
> +	return 0;
> +}
> +
> +static int drmcg_capacity_show(struct seq_file *sf, void *v)
> +{
> +	struct drmcgroup_device *dev;
> +	int i;
> +
> +	list_for_each_entry(dev, &drmcg_devices, list) {
> +		seq_puts(sf, dev->name);
> +		for (i = 0; i < DRMCG_MAX_REGIONS; i++)
> +			if (dev->regions[i].name)
> +				seq_printf(sf, " region.%s=%lld",
> +					   dev->regions[i].name,
> +					   dev->regions[i].size);
> +		seq_putc(sf, '\n');
> +	}
> +	return 0;
>  }
>
> -struct cftype files[] = {
> +static struct cftype files[] = {
> +	{
> +		.name = "max",
> +		.write = drmcg_max_write,
> +		.seq_show = drmcg_max_show,
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +	},
> +	{
> +		.name = "current",
> +		.seq_show = drmcg_current_show,
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +	},
> +	{
> +		.name = "capacity",
> +		.seq_show = drmcg_capacity_show,
> +		.flags = CFTYPE_ONLY_ON_ROOT,
> +	},
>  	{ } /* Zero entry terminates. */
>  };
>
>  struct cgroup_subsys drm_cgrp_subsys = {
>  	.css_alloc	= drmcs_alloc,
>  	.css_free	= drmcs_free,
> -	.early_init	= false,
> +	.css_offline	= drmcs_offline,
>  	.legacy_cftypes	= files,
>  	.dfl_cftypes	= files,
>  };