From: Jani Nikula
To: Aakash Deep Sarkar, intel-xe@lists.freedesktop.org
Cc: jeevaka.badrappan@intel.com, rodrigo.vivi@intel.com, matthew.brost@intel.com, carlos.santa@intel.com, matthew.auld@intel.com, Aakash Deep Sarkar
Subject: Re: [PATCH v4 6/9] drm/xe: Implement xe_work_period_worker
In-Reply-To: <20250926104521.1815428-7-aakash.deep.sarkar@intel.com>
Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
References: <20250926104521.1815428-1-aakash.deep.sarkar@intel.com> <20250926104521.1815428-7-aakash.deep.sarkar@intel.com>
Date: Fri, 26 Sep 2025 14:31:08 +0300
Message-ID: <3cf9a71439b5d79751f6b7879bdc6b917d023d2d@intel.com>
List-Id: Intel Xe graphics driver

On Fri, 26 Sep 2025, Aakash Deep Sarkar wrote:
> Collecting the GPU run time for a given xe_user and emitting its work
> period event is done by the xe_work_period_worker kworker. When a new
> xe_user is created, we simultaneously start a delayed kworker with an
> execution delay of 500 ms. After completing its work, the kworker
> reschedules itself for the next execution, for as long as the
> reference to the xe_user pointer remains valid.
>
> During each execution cycle the xe_work_period_worker
> iterates over all the xe files in xe_user::filelist and
> accumulates their corresponding GPU runtime into
> xe_user::active_duration_ns, while also updating each
> xe_file::active_duration_ns. The total runtime for this
> uid in the current sampling period is the delta between
> the previous and the current xe_user::active_duration_ns.
>
> We also record the current timestamp at the end of each
> invocation of the xe_work_period_worker function in
> xe_user::last_timestamp_ns. The sampling period for this
> uid is the delta between the previous and the current
> timestamp.
>
> Signed-off-by: Aakash Deep Sarkar
> ---
>  drivers/gpu/drm/xe/xe_device.c |  13 ++--
>  drivers/gpu/drm/xe/xe_pm.c     |   5 ++
>  drivers/gpu/drm/xe/xe_user.c   | 127 +++++++++++++++++++++++++++++++--
>  drivers/gpu/drm/xe/xe_user.h   |  21 ++++-
>  4 files changed, 149 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 837c23784388..5569a27abb09 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -81,11 +81,9 @@ static int xe_file_open(struct drm_device *dev, struct drm_file *file)
>  {
>  	struct xe_device *xe = to_xe_device(dev);
>  	struct xe_drm_client *client;
> -	struct xe_user *user;
>  	struct xe_file *xef;
>  	int ret = -ENOMEM;
>  	int uid = -EINVAL;
> -	u32 idx;
>  	struct task_struct *task = NULL;
>  	const struct cred *cred = NULL;
>
> @@ -142,11 +140,12 @@ static void xe_file_destroy(struct kref *ref)
>  	xe_drm_client_put(xef->client);
>  	kfree(xef->process_name);
>
> -	mutex_lock(&xef->user->filelist_lock);
> -	list_del(&xef->user_link);
> -	mutex_unlock(&xef->user->filelist_lock);
> -
> -	xe_user_put(xef->user);
> +	if (xef->user) {
> +		mutex_lock(&xef->user->lock);
> +		list_del(&xef->user_link);
> +		xe_user_put(xef->user);
> +		mutex_unlock(&xef->user->lock);
> +	}
>  	kfree(xef);
>  }
>
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index b7e3094f8acf..c7add2616189 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -26,6 +26,7 @@
>  #include "xe_pxp.h"
>  #include "xe_sriov_vf_ccs.h"
>  #include "xe_trace.h"
> +#include "xe_user.h"
>  #include "xe_vm.h"
>  #include "xe_wa.h"
>
> @@ -598,6 +599,8 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
>
>  	xe_i2c_pm_suspend(xe);
>
> +	xe_user_cancel_workers(xe);
> +
>  	xe_rpm_lockmap_release(xe);
>  	xe_pm_write_callback_task(xe, NULL);
>  	return 0;
> @@ -650,6 +653,8 @@ int xe_pm_runtime_resume(struct xe_device *xe)
>
>  	xe_i2c_pm_resume(xe, xe->d3cold.allowed);
>
> +	xe_user_resume_workers(xe);
> +
>  	xe_irq_resume(xe);
>
>  	for_each_gt(gt, xe, id)
> diff --git a/drivers/gpu/drm/xe/xe_user.c b/drivers/gpu/drm/xe/xe_user.c
> index 846c6451140b..19b28bcada0f 100644
> --- a/drivers/gpu/drm/xe/xe_user.c
> +++ b/drivers/gpu/drm/xe/xe_user.c
> @@ -6,17 +6,95 @@
>  #include
>  #include
>
> +#include "xe_assert.h"
> +#include "xe_device_types.h"
> +#include "xe_exec_queue.h"
> +#include "xe_pm.h"
>  #include "xe_user.h"
>
> +#define CREATE_TRACE_POINTS
> +#include
> +
> +static inline void schedule_next_work(struct xe_device *xe, unsigned int id)
> +{
> +	struct xe_user *user;
> +
> +	mutex_lock(&xe->work_period.lock);
> +	user = xa_load(&xe->work_period.users, id);
> +	if (user && xe_user_get_unless_zero(user))
> +		schedule_delayed_work(&user->delay_work,
> +				      msecs_to_jiffies(XE_WORK_PERIOD_INTERVAL));
> +	mutex_unlock(&xe->work_period.lock);
> +}
>  /**
>   * worker thread to emit gpu work period event for this xe user
>   * @work: work instance for this xe user
>   *
>   * Return: void
>   */
> -static inline void work_period_worker(struct work_struct *work)
> +static void xe_work_period_worker(struct work_struct *work)
>  {
> -	//TODO: Implement this worker
> +	struct xe_user *user = container_of(work, struct xe_user, delay_work.work);
> +	struct xe_device *xe = user->xe;
> +	struct xe_file *xef;
> +	struct xe_exec_queue *q;
> +
> +	/*
> +	 * The GPU work period event requires the following parameters
> +	 *
> +	 * gpuid: GPU index in case the platform has more than one GPU
> +	 * uid: user id of the app
> +	 * start_time: start time for the sampling period in nanosecs
> +	 * end_time: end time for the sampling period in nanosecs
> +	 * active_duration: Total runtime in nanosecs for this uid in
> +	 * the current sampling period.
> +	 */
> +	u32 gpuid = 0, uid = user->uid, id = user->id;
> +	u64 start_time, end_time, active_duration;
> +	u64 last_active_duration, last_timestamp;
> +	unsigned long i;
> +
> +	mutex_lock(&user->lock);
> +
> +	// Save the last recorded active duration and timestamp
> +	last_active_duration = user->active_duration_ns;
> +	last_timestamp = user->last_timestamp_ns;
> +
> +	if (xe_pm_runtime_get_if_active(xe)) {
> +
> +		list_for_each_entry(xef, &user->filelist, user_link) {
> +
> +			wait_var_event(&xef->exec_queue.pending_removal,
> +				       !atomic_read(&xef->exec_queue.pending_removal));
> +
> +			/* Accumulate all the exec queues from this file */
> +			mutex_lock(&xef->exec_queue.lock);
> +			xa_for_each(&xef->exec_queue.xa, i, q) {
> +				xe_exec_queue_get(q);
> +				mutex_unlock(&xef->exec_queue.lock);
> +
> +				xe_exec_queue_update_run_ticks(q);
> +
> +				mutex_lock(&xef->exec_queue.lock);
> +				xe_exec_queue_put(q);
> +			}
> +			mutex_unlock(&xef->exec_queue.lock);
> +			user->active_duration_ns += xef->active_duration_ns;
> +		}
> +
> +		xe_pm_runtime_put(xe);
> +
> +		start_time = last_timestamp + 1;
> +		end_time = ktime_get_raw_ns();
> +		active_duration = user->active_duration_ns - last_active_duration;
> +		trace_gpu_work_period(gpuid, uid, start_time, end_time, active_duration);
> +		user->last_timestamp_ns = end_time;
> +		xe_user_put(user);
> +	}
> +
> +	mutex_unlock(&user->lock);
> +
> +	schedule_next_work(xe, id);
>  }
>
>  /**
> @@ -38,9 +116,9 @@ static struct xe_user *xe_user_alloc(void)
>  		return NULL;
>
>  	kref_init(&user->refcount);
> -	mutex_init(&user->filelist_lock);
> +	mutex_init(&user->lock);
>  	INIT_LIST_HEAD(&user->filelist);
> -	INIT_WORK(&user->work, work_period_worker);
> +	INIT_DELAYED_WORK(&user->delay_work, xe_work_period_worker);
>  	return user;
>  }
>
> @@ -120,12 +198,49 @@ int xe_user_init(struct xe_device *xe, struct xe_file *xef, unsigned int uid)
>
>  		user->id = idx;
>  		drm_dev_get(&xe->drm);
> +
> +		xe_user_get(user);
> +		if (!schedule_delayed_work(&user->delay_work,
> +					   msecs_to_jiffies(XE_WORK_PERIOD_INTERVAL)))
> +			xe_user_put(user);
>  	}
>
> -	mutex_lock(&user->filelist_lock);
> +	mutex_lock(&user->lock);
>  	list_add(&xef->user_link, &user->filelist);
> -	mutex_unlock(&user->filelist_lock);
> +	mutex_unlock(&user->lock);
>  	xef->user = user;
>
>  	return 0;
>  }
> +
> +void xe_user_cancel_workers(struct xe_device *xe)
> +{
> +	struct xe_user *user = NULL;
> +	unsigned long i = 0;
> +
> +	mutex_lock(&xe->work_period.lock);
> +	xa_for_each(&xe->work_period.users, i, user) {
> +		if (user && xe_user_get_unless_zero(user)) {
> +			cancel_delayed_work_sync(&user->delay_work);
> +			xe_user_put(user);
> +		}
> +	}
> +	mutex_unlock(&xe->work_period.lock);
> +}
> +
> +void xe_user_resume_workers(struct xe_device *xe)
> +{
> +	struct xe_user *user = NULL;
> +	unsigned long i = 0;
> +
> +	mutex_lock(&xe->work_period.lock);
> +	xa_for_each(&xe->work_period.users, i, user) {
> +		if (user && xe_user_get_unless_zero(user)) {
> +			if (!schedule_delayed_work(&user->delay_work,
> +						   msecs_to_jiffies(XE_WORK_PERIOD_INTERVAL)))
> +				xe_user_put(user);
> +		}
> +	}
> +	mutex_unlock(&xe->work_period.lock);
> +}
> +
> diff --git a/drivers/gpu/drm/xe/xe_user.h b/drivers/gpu/drm/xe/xe_user.h
> index b13130cc9492..ded816be7334 100644
> --- a/drivers/gpu/drm/xe/xe_user.h
> +++ b/drivers/gpu/drm/xe/xe_user.h
> @@ -11,9 +11,11 @@
>  #include
>  #include
>
> -#include "xe_device.h"
> +#include "xe_device_types.h"

This shouldn't be needed, neither should.
>
>
> +#define XE_WORK_PERIOD_INTERVAL 500
> +
>  /**
>   * This is a per process/user id structure for a xe device
>   * client. It is allocated when a new process/app opens the
> @@ -32,9 +34,9 @@ struct xe_user {
>  	struct xe_device *xe;
>
>  	/**
> -	 * @filelist_lock: lock protecting the filelist
> +	 * @filelist_lock: lock protecting this structure
>  	 */
> -	struct mutex filelist_lock;
> +	struct mutex lock;
>
>  	/**
>  	 * @filelist: list of xe files belonging to this xe user
> @@ -45,7 +47,7 @@ struct xe_user {
>  	 * @work: work to emit the gpu work period event for this
>  	 * xe user
>  	 */
> -	struct work_struct work;
> +	struct delayed_work delay_work;
>
>  	/**
>  	 * @id: index of this user into the xe device users array
> @@ -72,6 +74,17 @@ struct xe_user {
>
>  int xe_user_init(struct xe_device *xe, struct xe_file *xef, unsigned int uid);
>
> +void xe_user_cancel_workers(struct xe_device *xe);
> +
> +void xe_user_resume_workers(struct xe_device *xe);
> +
> +static inline struct xe_user *
> +xe_user_get_unless_zero(struct xe_user *user)
> +{
> +	if (kref_get_unless_zero(&user->refcount))
> +		return user;
> +	return NULL;
> +}

This is only ever used in xe_user.c. There's no need to add it to a header.

>
>  static inline struct xe_user *
>  xe_user_get(struct xe_user *user)

-- 
Jani Nikula, Intel