From: Tvrtko Ursulin <tvrtko.ursulin-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
To: "Michal Koutný" <mkoutny-IBi9RG/b67k@public.gmane.org>
Cc: Intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
"Tejun Heo" <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
"Johannes Weiner"
<hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
"Zefan Li" <lizefan.x-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>,
"Dave Airlie" <airlied-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
"Daniel Vetter" <daniel.vetter-/w4YWyX8dFk@public.gmane.org>,
"Rob Clark" <robdclark-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>,
"Stéphane Marchesin"
<marcheu-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>,
"T . J . Mercier"
<tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Kenny.Ho-5C7GfCeVMHo@public.gmane.org,
"Christian König" <christian.koenig-5C7GfCeVMHo@public.gmane.org>,
"Brian Welty"
<brian.welty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
"Tvrtko Ursulin"
<tvrtko.ursulin-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Subject: Re: [RFC 10/12] cgroup/drm: Introduce weight based drm cgroup control
Date: Fri, 27 Jan 2023 13:31:54 +0000
Message-ID: <a96e6b5c-b538-f7e7-d603-cabb29137de7@linux.intel.com>
In-Reply-To: <20230127130134.GA15846-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
On 27/01/2023 13:01, Michal Koutný wrote:
> On Thu, Jan 12, 2023 at 04:56:07PM +0000, Tvrtko Ursulin <tvrtko.ursulin-VuQAYsv1563Yd54FQh9/CA@public.gmane.org> wrote:
>> +static int drmcs_can_attach(struct cgroup_taskset *tset)
>> +{
>> + int ret;
>> +
>> + /*
>> + * As processes are getting moved between groups we need to ensure
>> + * both that the old group does not see a sudden downward jump in the
>> + * GPU utilisation, and that the new group does not see a sudden jump
>> + * up with all the GPU time clients belonging to the migrated process
>> + * have accumulated.
>> + *
>> + * To achieve that we suspend the scanner until the migration is
>> + * completed where the resume at the end ensures both groups start
>> + * observing GPU utilisation from a reset state.
>> + */
>> +
>> + ret = mutex_lock_interruptible(&drmcg_mutex);
>> + if (ret)
>> + return ret;
>> + start_suspend_scanning();
>> + mutex_unlock(&drmcg_mutex);
>> +
>> + finish_suspend_scanning();
>
> Here's scanning suspension, communicated via
>
> root_drmcs.scanning_suspended = true;
> root_drmcs.suspended_period_us = root_drmcs.period_us;
> root_drmcs.period_us = 0;
>
> but I don't see those used in scan_worker() and the scanning traversal
> can apparently run concurrently with a task migration.
I think you missed the finish_suspend_scanning() part:

	if (root_drmcs.suspended_period_us)
		cancel_delayed_work_sync(&root_drmcs.scan_work);

So if a scan was in progress, migration will wait until it finishes.
Scanning is then re-started only once migration is done (drmcs_attach),
or once it has failed (drmcs_cancel_attach).
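
To make the ordering easier to follow, here is a minimal userspace sketch of
the suspend/resume handshake. It is not the patch code: the field names
follow the snippet quoted above, but scan_work_queued is a hypothetical
stand-in for the delayed work, resume_scanning() is a hypothetical name for
whatever drmcs_attach/drmcs_cancel_attach end up calling, and the 2000us
period is just an illustrative value.

```c
#include <stdbool.h>

/* Userspace stand-in for the root controller state. */
struct root_state {
	bool scanning_suspended;
	unsigned int period_us;
	unsigned int suspended_period_us;
	bool scan_work_queued;	/* hypothetical: models the delayed work */
};

/* Illustrative starting point: scanning enabled with a 2000us period. */
struct root_state root_drmcs = {
	.period_us = 2000,
	.scan_work_queued = true,
};

/* Step 1, done under drmcg_mutex in the patch: mark scanning as suspended. */
void start_suspend_scanning(void)
{
	root_drmcs.scanning_suspended = true;
	root_drmcs.suspended_period_us = root_drmcs.period_us;
	root_drmcs.period_us = 0;
}

/*
 * Step 2, done after dropping the mutex: wait for a running scan and
 * prevent it from re-arming. Clearing the flag models
 * cancel_delayed_work_sync().
 */
void finish_suspend_scanning(void)
{
	if (root_drmcs.suspended_period_us)
		root_drmcs.scan_work_queued = false;
}

/*
 * Step 3 (hypothetical shape): restore the period and re-queue the scan
 * once migration has either completed or been cancelled.
 */
void resume_scanning(void)
{
	root_drmcs.period_us = root_drmcs.suspended_period_us;
	root_drmcs.suspended_period_us = 0;
	root_drmcs.scanning_suspended = false;
	if (root_drmcs.period_us)
		root_drmcs.scan_work_queued = true;
}
```

The point of the model is only the ordering: between steps 2 and 3 no scan
is queued, which is the window in which the task migration happens.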
I am not claiming I have not missed something, because I was totally new
to cgroup internals when I started working on this. So it is definitely
useful to have more eyes looking.
>> [...]
>> +static bool
>> +__start_scanning(struct drm_cgroup_state *root, unsigned int period_us)
>> [...]
>> + css_for_each_descendant_post(node, &root->css) {
>> [...]
>> + active = drmcs_get_active_time_us(drmcs);
>> + if (period_us && active > drmcs->prev_active_us)
>> + drmcs->active_us += active - drmcs->prev_active_us;
>> + drmcs->prev_active_us = active;
>
> drmcs_get_active_time_us() could count a task's contribution here,
> the task would migrate to a different drmcs,
> and it'd be counted 2nd time.
Let's see... __start_scanning() can be called from the worker, where at
most one instance runs at a time, so no issue there.

Then from resume scanning, where it is guaranteed the worker is not running
and cannot restart, since the mutex guards the re-start.

Finally from drmcs_write_period_us(): yes, there __start_scanning() can
race with it being invoked by the worker. Oops! However, this file is just
a debugging aid, as the cover letter explains. It is not intended to be
present in the final version; rather, as per the earlier discussion with
Tejun, the idea is to only have a boot time option to control the
functionality (enable/disable or period).

I will nevertheless try to fix this race for the next posting to avoid
further confusion!
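
For illustration, here is a small userspace model of why two concurrent
__start_scanning() callers are a problem for the accounting. scan_step()
mirrors the loop body quoted above; struct drmcs_model and
racy_two_callers() are hypothetical names, and the interleaving shown is
just one possible ordering, replayed sequentially so it is deterministic.

```c
#include <stdint.h>

/* Only the two accounting fields touched by the quoted loop body. */
struct drmcs_model {
	uint64_t active_us;	 /* GPU time accumulated in this period */
	uint64_t prev_active_us; /* baseline sampled by the previous scan */
};

/* One scan step, same shape as the quoted __start_scanning() loop body. */
void scan_step(struct drmcs_model *d, unsigned int period_us, uint64_t active)
{
	if (period_us && active > d->prev_active_us)
		d->active_us += active - d->prev_active_us;
	d->prev_active_us = active;
}

/*
 * Replays one bad interleaving: both callers read the baseline before
 * either of them updates it, so the same delta is added twice.
 */
uint64_t racy_two_callers(struct drmcs_model *d, unsigned int period_us,
			  uint64_t active)
{
	uint64_t prev_a = d->prev_active_us;	/* caller A reads baseline */
	uint64_t prev_b = d->prev_active_us;	/* caller B reads the same */

	if (period_us && active > prev_a)
		d->active_us += active - prev_a;
	d->prev_active_us = active;

	if (period_us && active > prev_b)	/* stale baseline */
		d->active_us += active - prev_b;
	d->prev_active_us = active;

	return d->active_us;
}
```

With a baseline of 100 and a sample of 150, two serialized scan_step() calls
account 50us in total, while the interleaved version accounts 100us. Note
also that calling scan_step() with period_us == 0 only moves the baseline
and accumulates nothing.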
Regards,
Tvrtko