From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0ABA3C6FA8F for ; Tue, 29 Aug 2023 17:37:21 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A1B7410E3EA; Tue, 29 Aug 2023 17:37:20 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7990310E3E8 for ; Tue, 29 Aug 2023 17:37:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1693330636; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KyEGNQ+lTvchkCom6b2unYu88z//ucaip98vI4R0Pg8=; b=cIrCIj3YSmk3pP/9N0ttzJ/pWiIjIzZQ9vf6sHVpZkG12ZNb/VymFI+ZJtKXOTvNywJ5Gm fOZPICkCAeZBHIHCkoDl5aoNYVkAbdCGSvkPmcFqnA1CwBDPGVB1sHwQAvRPf2pkjiGmYD N5AUauij3IE8Y1Ci1iU57TIYg1Om48Y= Received: from mail-ed1-f70.google.com (mail-ed1-f70.google.com [209.85.208.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-403-TKnxCOTQPTGPVVrjARNxWg-1; Tue, 29 Aug 2023 13:37:14 -0400 X-MC-Unique: TKnxCOTQPTGPVVrjARNxWg-1 Received: by mail-ed1-f70.google.com with SMTP id 4fb4d7f45d1cf-52310058f1eso65539a12.0 for ; Tue, 29 Aug 2023 10:37:14 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693330633; x=1693935433; h=content-transfer-encoding:in-reply-to:organization:from:references 
:cc:to:content-language:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=KyEGNQ+lTvchkCom6b2unYu88z//ucaip98vI4R0Pg8=; b=GRohMxlPkxo/k69UdI2PV2nI0bsugbR6sMzNEJ+ZcavQX7Q4moUSFaqxyEUiKqm6mH 3OqDprrs8dsZf1iJm/crN5TxU1lfsS7CeF3DC4viYYV119tKj57GfPOoj6wGE9AFxQbL Ilrmt9E8N1cFYWVu6IgCTSwezkNQWPLR6yx/yzHGiNw5768DabRqGXOloaNXnSLJAhHa oiyG1Qpz2kVXtInRFLRw029XgXutMzq8R0By+G/TVz6CxnR9XYYLYbpy628woDEVTpn9 2meAmc7HtDl9Gxi7swXUNABZ1RgKSziuAi5/e1h8tZ44rH165xxR1Qj7j8cHhwPcBZ5p Z3lg== X-Gm-Message-State: AOJu0YxJR8q41tE/9m3eH2xvJ+wqBpEufxrMpubfdRZIL4IGQ8Bu+6iF vlKp4UBt1Nzpn/dfM4fziSOnzxQl/ilIxHU46wyxrSb7wI3S49BzPPujCkn7RdPAoAJKGImxlm1 /pK8F7Gf5aMJpr3O/ptKc8WaPf/g= X-Received: by 2002:a05:6402:5111:b0:51f:ef58:da87 with SMTP id m17-20020a056402511100b0051fef58da87mr3893459edd.2.1693330633361; Tue, 29 Aug 2023 10:37:13 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHoEjrZOiwz2sA2UFiYfMz3t8ux+EeFQKNkyB36iNEgXzUfZTe1Bc+dq0hrkpsHmkUxUJ1wFA== X-Received: by 2002:a05:6402:5111:b0:51f:ef58:da87 with SMTP id m17-20020a056402511100b0051fef58da87mr3893398edd.2.1693330632828; Tue, 29 Aug 2023 10:37:12 -0700 (PDT) Received: from ?IPV6:2a02:810d:4b3f:de9c:642:1aff:fe31:a15c? 
([2a02:810d:4b3f:de9c:642:1aff:fe31:a15c]) by smtp.gmail.com with ESMTPSA id y9-20020aa7d509000000b0052239012c65sm5883259edq.82.2023.08.29.10.37.11 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 29 Aug 2023 10:37:12 -0700 (PDT) Message-ID: <21b82a4e-ded8-cb63-0013-72e4043c3344@redhat.com> Date: Tue, 29 Aug 2023 19:37:10 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 To: Matthew Brost References: <20230811023137.659037-1-matthew.brost@intel.com> <20230811023137.659037-4-matthew.brost@intel.com> From: Danilo Krummrich Organization: RedHat In-Reply-To: <20230811023137.659037-4-matthew.brost@intel.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Intel-xe] [PATCH v2 3/9] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: robdclark@chromium.org, sarah.walker@imgtec.com, ketil.johnsen@arm.com, lina@asahilina.net, Liviu.Dudau@arm.com, dri-devel@lists.freedesktop.org, intel-xe@lists.freedesktop.org, luben.tuikov@amd.com, donald.robson@imgtec.com, boris.brezillon@collabora.com, christian.koenig@amd.com, faith.ekstrand@collabora.com Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On 8/11/23 04:31, Matthew Brost wrote: > DRM_SCHED_POLICY_SINGLE_ENTITY creates a 1 to 1 relationship between > scheduler and entity. No priorities or run queue used in this mode. > Intended for devices with firmware schedulers. 
> > v2: > - Drop sched / rq union (Luben) > > Signed-off-by: Matthew Brost > --- > drivers/gpu/drm/scheduler/sched_entity.c | 69 ++++++++++++++++++------ > drivers/gpu/drm/scheduler/sched_fence.c | 2 +- > drivers/gpu/drm/scheduler/sched_main.c | 63 +++++++++++++++++++--- > include/drm/gpu_scheduler.h | 8 +++ > 4 files changed, 119 insertions(+), 23 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c > index 65a972b52eda..1dec97caaba3 100644 > --- a/drivers/gpu/drm/scheduler/sched_entity.c > +++ b/drivers/gpu/drm/scheduler/sched_entity.c > @@ -83,6 +83,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, > memset(entity, 0, sizeof(struct drm_sched_entity)); > INIT_LIST_HEAD(&entity->list); > entity->rq = NULL; > + entity->single_sched = NULL; > entity->guilty = guilty; > entity->num_sched_list = num_sched_list; > entity->priority = priority; > @@ -90,8 +91,17 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, > RCU_INIT_POINTER(entity->last_scheduled, NULL); > RB_CLEAR_NODE(&entity->rb_tree_node); > > - if(num_sched_list) > - entity->rq = &sched_list[0]->sched_rq[entity->priority]; > + if (num_sched_list) { > + if (sched_list[0]->sched_policy != > + DRM_SCHED_POLICY_SINGLE_ENTITY) { > + entity->rq = &sched_list[0]->sched_rq[entity->priority]; > + } else { > + if (num_sched_list != 1 || sched_list[0]->single_entity) > + return -EINVAL; > + sched_list[0]->single_entity = entity; > + entity->single_sched = sched_list[0]; > + } > + } > > init_completion(&entity->entity_idle); > > @@ -124,7 +134,8 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity, > struct drm_gpu_scheduler **sched_list, > unsigned int num_sched_list) > { > - WARN_ON(!num_sched_list || !sched_list); > + WARN_ON(!num_sched_list || !sched_list || > + !!entity->single_sched); > > entity->sched_list = sched_list; > entity->num_sched_list = num_sched_list; > @@ -231,13 +242,15 @@ static void 
drm_sched_entity_kill(struct drm_sched_entity *entity) > { > struct drm_sched_job *job; > struct dma_fence *prev; > + bool single_entity = !!entity->single_sched; > > - if (!entity->rq) > + if (!entity->rq && !single_entity) > return; > > spin_lock(&entity->rq_lock); > entity->stopped = true; > - drm_sched_rq_remove_entity(entity->rq, entity); > + if (!single_entity) > + drm_sched_rq_remove_entity(entity->rq, entity);

Looks like nothing prevents drm_sched_run_job_work() from fetching more jobs from the entity, so if this is called for an entity that still has queued-up jobs while the scheduler is still running, drm_sched_entity_kill() and drm_sched_run_job_work() would race for jobs, right?

Not sure if this is intentional, because we don't expect the driver to call drm_sched_entity_fini() as long as there are still queued-up jobs. At least this is inconsistent with what drm_sched_entity_kill() does without DRM_SCHED_POLICY_SINGLE_ENTITY, and it should either be fixed, or documented if we agree nothing else makes sense.

I think it also touches my question on how to tear down the scheduler once a ring is removed or deinitialized. I know XE is going its own way in this respect, but I also feel like we're leaving drivers that are potentially interested in DRM_SCHED_POLICY_SINGLE_ENTITY a bit alone on that. I think we should give drivers a bit more guidance on how to do that. Currently, I see two approaches.

(1) Do what XE does, which means letting the scheduler run dry — both the entity's job queue and the scheduler's pending_list. Jobs from the entity's queue don't push any more work to the ring on tear down, but just "flow through" to get freed up eventually. (Hopefully I got that right.)

(2) Kill the entity to clean up jobs from the entity's queue, stop the scheduler, and either just wait for pending jobs or signal them right away, and finally free them up.
Actually there'd also be (3), which could be a mix of both: discard the entity's queued jobs, but let the pending_list run dry.

I'm not saying we should provide a whole bunch of infrastructure for drivers; e.g. for (1), as you've mentioned already, there is probably not much to generalize anyway. However, I think we should document the options drivers have to tear things down, and do enough to enable drivers to use any option (as long as we agree it is reasonable).

For Nouveau specifically, I'd probably like to go with (3).

- Danilo

> spin_unlock(&entity->rq_lock); > > /* Make sure this entity is not used by the scheduler at the moment */ > @@ -259,6 +272,20 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity) > dma_fence_put(prev); > } > > +/** > + * drm_sched_entity_to_scheduler - Schedule entity to GPU scheduler > + * @entity: scheduler entity > + * > + * Returns GPU scheduler for the entity > + */ > +struct drm_gpu_scheduler * > +drm_sched_entity_to_scheduler(struct drm_sched_entity *entity) > +{ > + bool single_entity = !!entity->single_sched; > + > + return single_entity ?
entity->single_sched : entity->rq->sched; > +} > + > /** > * drm_sched_entity_flush - Flush a context entity > * > @@ -276,11 +303,12 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout) > struct drm_gpu_scheduler *sched; > struct task_struct *last_user; > long ret = timeout; > + bool single_entity = !!entity->single_sched; > > - if (!entity->rq) > + if (!entity->rq && !single_entity) > return 0; > > - sched = entity->rq->sched; > + sched = drm_sched_entity_to_scheduler(entity); > /** > * The client will not queue more IBs during this fini, consume existing > * queued IBs or discard them on SIGKILL > @@ -373,7 +401,7 @@ static void drm_sched_entity_wakeup(struct dma_fence *f, > container_of(cb, struct drm_sched_entity, cb); > > drm_sched_entity_clear_dep(f, cb); > - drm_sched_wakeup_if_can_queue(entity->rq->sched); > + drm_sched_wakeup_if_can_queue(drm_sched_entity_to_scheduler(entity)); > } > > /** > @@ -387,6 +415,8 @@ static void drm_sched_entity_wakeup(struct dma_fence *f, > void drm_sched_entity_set_priority(struct drm_sched_entity *entity, > enum drm_sched_priority priority) > { > + WARN_ON(!!entity->single_sched); > + > spin_lock(&entity->rq_lock); > entity->priority = priority; > spin_unlock(&entity->rq_lock); > @@ -399,7 +429,7 @@ EXPORT_SYMBOL(drm_sched_entity_set_priority); > */ > static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity) > { > - struct drm_gpu_scheduler *sched = entity->rq->sched; > + struct drm_gpu_scheduler *sched = drm_sched_entity_to_scheduler(entity); > struct dma_fence *fence = entity->dependency; > struct drm_sched_fence *s_fence; > > @@ -501,7 +531,8 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) > * Update the entity's location in the min heap according to > * the timestamp of the next job, if any. 
> */ > - if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO) { > + if (drm_sched_entity_to_scheduler(entity)->sched_policy == > + DRM_SCHED_POLICY_FIFO) { > struct drm_sched_job *next; > > next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); > @@ -524,6 +555,8 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity) > struct drm_gpu_scheduler *sched; > struct drm_sched_rq *rq; > > + WARN_ON(!!entity->single_sched); > + > /* single possible engine and already selected */ > if (!entity->sched_list) > return; > @@ -573,12 +606,13 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity) > void drm_sched_entity_push_job(struct drm_sched_job *sched_job) > { > struct drm_sched_entity *entity = sched_job->entity; > - bool first, fifo = entity->rq->sched->sched_policy == > - DRM_SCHED_POLICY_FIFO; > + bool single_entity = !!entity->single_sched; > + bool first; > ktime_t submit_ts; > > trace_drm_sched_job(sched_job, entity); > - atomic_inc(entity->rq->sched->score); > + if (!single_entity) > + atomic_inc(entity->rq->sched->score); > WRITE_ONCE(entity->last_user, current->group_leader); > > /* > @@ -591,6 +625,10 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) > > /* first job wakes up scheduler */ > if (first) { > + struct drm_gpu_scheduler *sched = > + drm_sched_entity_to_scheduler(entity); > + bool fifo = sched->sched_policy == DRM_SCHED_POLICY_FIFO; > + > /* Add the entity to the run queue */ > spin_lock(&entity->rq_lock); > if (entity->stopped) { > @@ -600,13 +638,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) > return; > } > > - drm_sched_rq_add_entity(entity->rq, entity); > + if (!single_entity) > + drm_sched_rq_add_entity(entity->rq, entity); > spin_unlock(&entity->rq_lock); > > if (fifo) > drm_sched_rq_update_fifo(entity, submit_ts); > > - drm_sched_wakeup_if_can_queue(entity->rq->sched); > + drm_sched_wakeup_if_can_queue(sched); > } > } > EXPORT_SYMBOL(drm_sched_entity_push_job); > 
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c > index 06cedfe4b486..f6b926f5e188 100644 > --- a/drivers/gpu/drm/scheduler/sched_fence.c > +++ b/drivers/gpu/drm/scheduler/sched_fence.c > @@ -225,7 +225,7 @@ void drm_sched_fence_init(struct drm_sched_fence *fence, > { > unsigned seq; > > - fence->sched = entity->rq->sched; > + fence->sched = drm_sched_entity_to_scheduler(entity); > seq = atomic_inc_return(&entity->fence_seq); > dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled, > &fence->lock, entity->fence_context, seq); > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index 545d5298c086..cede47afc800 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -32,7 +32,8 @@ > * backend operations to the scheduler like submitting a job to hardware run queue, > * returning the dependencies of a job etc. > * > - * The organisation of the scheduler is the following: > + * The organisation of the scheduler is the following for scheduling policies > + * DRM_SCHED_POLICY_RR and DRM_SCHED_POLICY_FIFO: > * > * 1. Each hw run queue has one scheduler > * 2. Each scheduler has multiple run queues with different priorities > @@ -43,6 +44,23 @@ > * > * The jobs in a entity are always scheduled in the order that they were pushed. > * > + * The organisation of the scheduler is the following for scheduling policy > + * DRM_SCHED_POLICY_SINGLE_ENTITY: > + * > + * 1. One to one relationship between scheduler and entity > + * 2. No priorities implemented per scheduler (single job queue) > + * 3. No run queues in scheduler rather jobs are directly dequeued from entity > + * 4. The entity maintains a queue of jobs that will be scheduled on the > + * hardware > + * > + * The jobs in a entity are always scheduled in the order that they were pushed > + * regardless of scheduling policy. 
> + * > + * A policy of DRM_SCHED_POLICY_RR or DRM_SCHED_POLICY_FIFO is expected to used > + * when the KMD is scheduling directly on the hardware while a scheduling policy > + * of DRM_SCHED_POLICY_SINGLE_ENTITY is expected to be used when there is a > + * firmware scheduler. > + * > * Note that once a job was taken from the entities queue and pushed to the > * hardware, i.e. the pending queue, the entity must not be referenced anymore > * through the jobs entity pointer. > @@ -96,6 +114,8 @@ static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *enti > > void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) > { > + WARN_ON(!!entity->single_sched); > + > /* > * Both locks need to be grabbed, one to protect from entity->rq change > * for entity from within concurrent drm_sched_entity_select_rq and the > @@ -126,6 +146,8 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) > static void drm_sched_rq_init(struct drm_gpu_scheduler *sched, > struct drm_sched_rq *rq) > { > + WARN_ON(sched->sched_policy == DRM_SCHED_POLICY_SINGLE_ENTITY); > + > spin_lock_init(&rq->lock); > INIT_LIST_HEAD(&rq->entities); > rq->rb_tree_root = RB_ROOT_CACHED; > @@ -144,6 +166,8 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched, > void drm_sched_rq_add_entity(struct drm_sched_rq *rq, > struct drm_sched_entity *entity) > { > + WARN_ON(!!entity->single_sched); > + > if (!list_empty(&entity->list)) > return; > > @@ -166,6 +190,8 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq, > void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, > struct drm_sched_entity *entity) > { > + WARN_ON(!!entity->single_sched); > + > if (list_empty(&entity->list)) > return; > > @@ -641,7 +667,7 @@ int drm_sched_job_init(struct drm_sched_job *job, > struct drm_sched_entity *entity, > void *owner) > { > - if (!entity->rq) > + if (!entity->rq && !entity->single_sched) > return -ENOENT; > > job->entity = entity; > @@ -674,13 
+700,16 @@ void drm_sched_job_arm(struct drm_sched_job *job) > { > struct drm_gpu_scheduler *sched; > struct drm_sched_entity *entity = job->entity; > + bool single_entity = !!entity->single_sched; > > BUG_ON(!entity); > - drm_sched_entity_select_rq(entity); > - sched = entity->rq->sched; > + if (!single_entity) > + drm_sched_entity_select_rq(entity); > + sched = drm_sched_entity_to_scheduler(entity); > > job->sched = sched; > - job->s_priority = entity->rq - sched->sched_rq; > + if (!single_entity) > + job->s_priority = entity->rq - sched->sched_rq; > job->id = atomic64_inc_return(&sched->job_id_count); > > drm_sched_fence_init(job->s_fence, job->entity); > @@ -896,6 +925,13 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) > if (!drm_sched_can_queue(sched)) > return NULL; > > + if (sched->single_entity) { > + if (drm_sched_entity_is_ready(sched->single_entity)) > + return sched->single_entity; > + > + return NULL; > + } > + > /* Kernel run queue has higher priority than normal run queue*/ > for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { > entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ? > @@ -1091,6 +1127,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, > return -EINVAL; > > sched->ops = ops; > + sched->single_entity = NULL; > sched->hw_submission_limit = hw_submission; > sched->name = name; > sched->submit_wq = submit_wq ? 
: system_wq; > @@ -1103,7 +1140,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, > sched->sched_policy = default_drm_sched_policy; > else > sched->sched_policy = sched_policy; > - for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++) > + for (i = DRM_SCHED_PRIORITY_MIN; sched_policy != > + DRM_SCHED_POLICY_SINGLE_ENTITY && i < DRM_SCHED_PRIORITY_COUNT; > + i++) > drm_sched_rq_init(sched, &sched->sched_rq[i]); > > init_waitqueue_head(&sched->job_scheduled); > @@ -1135,7 +1174,15 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched) > > drm_sched_submit_stop(sched); > > - for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { > + if (sched->single_entity) { > + spin_lock(&sched->single_entity->rq_lock); > + sched->single_entity->stopped = true; > + spin_unlock(&sched->single_entity->rq_lock); > + } > + > + for (i = DRM_SCHED_PRIORITY_COUNT - 1; sched->sched_policy != > + DRM_SCHED_POLICY_SINGLE_ENTITY && i >= DRM_SCHED_PRIORITY_MIN; > + i--) { > struct drm_sched_rq *rq = &sched->sched_rq[i]; > > spin_lock(&rq->lock); > @@ -1176,6 +1223,8 @@ void drm_sched_increase_karma(struct drm_sched_job *bad) > struct drm_sched_entity *entity; > struct drm_gpu_scheduler *sched = bad->sched; > > + WARN_ON(sched->sched_policy == DRM_SCHED_POLICY_SINGLE_ENTITY); > + > /* don't change @bad's karma if it's from KERNEL RQ, > * because sometimes GPU hang would cause kernel jobs (like VM updating jobs) > * corrupt but keep in mind that kernel jobs always considered good. 
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h > index 897d52a4ff4f..04eec2d7635f 100644 > --- a/include/drm/gpu_scheduler.h > +++ b/include/drm/gpu_scheduler.h > @@ -79,6 +79,7 @@ enum drm_sched_policy { > DRM_SCHED_POLICY_DEFAULT, > DRM_SCHED_POLICY_RR, > DRM_SCHED_POLICY_FIFO, > + DRM_SCHED_POLICY_SINGLE_ENTITY, > DRM_SCHED_POLICY_COUNT, > }; > > @@ -112,6 +113,9 @@ struct drm_sched_entity { > */ > struct drm_sched_rq *rq; > > + /** @single_sched: Single scheduler */ > + struct drm_gpu_scheduler *single_sched; > + > /** > * @sched_list: > * > @@ -473,6 +477,7 @@ struct drm_sched_backend_ops { > * struct drm_gpu_scheduler - scheduler instance-specific data > * > * @ops: backend operations provided by the driver. > + * @single_entity: Single entity for the scheduler > * @hw_submission_limit: the max size of the hardware queue. > * @timeout: the time after which a job is removed from the scheduler. > * @name: name of the ring for which this scheduler is being used. 
> @@ -503,6 +508,7 @@ struct drm_sched_backend_ops { > */ > struct drm_gpu_scheduler { > const struct drm_sched_backend_ops *ops; > + struct drm_sched_entity *single_entity; > uint32_t hw_submission_limit; > long timeout; > const char *name; > @@ -585,6 +591,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, > struct drm_gpu_scheduler **sched_list, > unsigned int num_sched_list, > atomic_t *guilty); > +struct drm_gpu_scheduler * > +drm_sched_entity_to_scheduler(struct drm_sched_entity *entity); > long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout); > void drm_sched_entity_fini(struct drm_sched_entity *entity); > void drm_sched_entity_destroy(struct drm_sched_entity *entity); From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D1524C71153 for ; Tue, 29 Aug 2023 17:37:20 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0573010E3E8; Tue, 29 Aug 2023 17:37:20 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7A9C610E3EA for ; Tue, 29 Aug 2023 17:37:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1693330636; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KyEGNQ+lTvchkCom6b2unYu88z//ucaip98vI4R0Pg8=; b=cIrCIj3YSmk3pP/9N0ttzJ/pWiIjIzZQ9vf6sHVpZkG12ZNb/VymFI+ZJtKXOTvNywJ5Gm 
fOZPICkCAeZBHIHCkoDl5aoNYVkAbdCGSvkPmcFqnA1CwBDPGVB1sHwQAvRPf2pkjiGmYD N5AUauij3IE8Y1Ci1iU57TIYg1Om48Y= Received: from mail-ed1-f72.google.com (mail-ed1-f72.google.com [209.85.208.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-108-6bt8X-wpN0mDALUKRLfeWA-1; Tue, 29 Aug 2023 13:37:14 -0400 X-MC-Unique: 6bt8X-wpN0mDALUKRLfeWA-1 Received: by mail-ed1-f72.google.com with SMTP id 4fb4d7f45d1cf-52310058f1eso65536a12.0 for ; Tue, 29 Aug 2023 10:37:14 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693330633; x=1693935433; h=content-transfer-encoding:in-reply-to:organization:from:references :cc:to:content-language:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=KyEGNQ+lTvchkCom6b2unYu88z//ucaip98vI4R0Pg8=; b=jjHQAYIFioSBfo0KWrCTe8UULVW86QSe5rKhAkp9AxPnwhjsJb5XyYdFjO7OsPBYtl 3CW2Gj/DAquXwEbFxBSzXVeTMbuWzZV5MJKB1jfu+ybZ/r4CdabC2tiztw7WnKMaS2+N 3NaFi3Nr7hKDRH3Zi2p5ZZMBYMyI7KO77GH9eCZb93P3mf1EBUpX2QDEepn5PBkVzHzH OHfsDQWq4d3pkI2vBuZUvEckscUJmwTEK8PZO1XgexwXFgblYfIHaL35uMUqEhw+z+Uu eZb1dSY6STADtgbW1gb6KzeetenPlU4NYCnS253FEL2RCe/vY2GA6YPvqfS1HExnTQtS nVWQ== X-Gm-Message-State: AOJu0YylOPWykG5jZX/NI2cmAEA2WzN/9Tj7srAnTKL/K5IqH/RSwdoB epo2tHxf2E8UIRMNtFzXKWkQ+XxGcUHAKk5H7rvl/PKdq+efjZX0hYPf7WtJKKc9UqDM19UHUty p8BDgBCK3hrXjHwbvaY2Hej1VXUwk X-Received: by 2002:a05:6402:5111:b0:51f:ef58:da87 with SMTP id m17-20020a056402511100b0051fef58da87mr3893454edd.2.1693330633336; Tue, 29 Aug 2023 10:37:13 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHoEjrZOiwz2sA2UFiYfMz3t8ux+EeFQKNkyB36iNEgXzUfZTe1Bc+dq0hrkpsHmkUxUJ1wFA== X-Received: by 2002:a05:6402:5111:b0:51f:ef58:da87 with SMTP id m17-20020a056402511100b0051fef58da87mr3893398edd.2.1693330632828; Tue, 29 Aug 2023 10:37:12 -0700 (PDT) Received: from ?IPV6:2a02:810d:4b3f:de9c:642:1aff:fe31:a15c? 
([2a02:810d:4b3f:de9c:642:1aff:fe31:a15c]) by smtp.gmail.com with ESMTPSA id y9-20020aa7d509000000b0052239012c65sm5883259edq.82.2023.08.29.10.37.11 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 29 Aug 2023 10:37:12 -0700 (PDT) Message-ID: <21b82a4e-ded8-cb63-0013-72e4043c3344@redhat.com> Date: Tue, 29 Aug 2023 19:37:10 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Subject: Re: [PATCH v2 3/9] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy To: Matthew Brost References: <20230811023137.659037-1-matthew.brost@intel.com> <20230811023137.659037-4-matthew.brost@intel.com> From: Danilo Krummrich Organization: RedHat In-Reply-To: <20230811023137.659037-4-matthew.brost@intel.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: robdclark@chromium.org, thomas.hellstrom@linux.intel.com, sarah.walker@imgtec.com, ketil.johnsen@arm.com, lina@asahilina.net, Liviu.Dudau@arm.com, dri-devel@lists.freedesktop.org, intel-xe@lists.freedesktop.org, luben.tuikov@amd.com, donald.robson@imgtec.com, boris.brezillon@collabora.com, christian.koenig@amd.com, faith.ekstrand@collabora.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" On 8/11/23 04:31, Matthew Brost wrote: > DRM_SCHED_POLICY_SINGLE_ENTITY creates a 1 to 1 relationship between > scheduler and entity. No priorities or run queue used in this mode. > Intended for devices with firmware schedulers. 
> > v2: > - Drop sched / rq union (Luben) > > Signed-off-by: Matthew Brost > --- > drivers/gpu/drm/scheduler/sched_entity.c | 69 ++++++++++++++++++------ > drivers/gpu/drm/scheduler/sched_fence.c | 2 +- > drivers/gpu/drm/scheduler/sched_main.c | 63 +++++++++++++++++++--- > include/drm/gpu_scheduler.h | 8 +++ > 4 files changed, 119 insertions(+), 23 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c > index 65a972b52eda..1dec97caaba3 100644 > --- a/drivers/gpu/drm/scheduler/sched_entity.c > +++ b/drivers/gpu/drm/scheduler/sched_entity.c > @@ -83,6 +83,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, > memset(entity, 0, sizeof(struct drm_sched_entity)); > INIT_LIST_HEAD(&entity->list); > entity->rq = NULL; > + entity->single_sched = NULL; > entity->guilty = guilty; > entity->num_sched_list = num_sched_list; > entity->priority = priority; > @@ -90,8 +91,17 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, > RCU_INIT_POINTER(entity->last_scheduled, NULL); > RB_CLEAR_NODE(&entity->rb_tree_node); > > - if(num_sched_list) > - entity->rq = &sched_list[0]->sched_rq[entity->priority]; > + if (num_sched_list) { > + if (sched_list[0]->sched_policy != > + DRM_SCHED_POLICY_SINGLE_ENTITY) { > + entity->rq = &sched_list[0]->sched_rq[entity->priority]; > + } else { > + if (num_sched_list != 1 || sched_list[0]->single_entity) > + return -EINVAL; > + sched_list[0]->single_entity = entity; > + entity->single_sched = sched_list[0]; > + } > + } > > init_completion(&entity->entity_idle); > > @@ -124,7 +134,8 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity, > struct drm_gpu_scheduler **sched_list, > unsigned int num_sched_list) > { > - WARN_ON(!num_sched_list || !sched_list); > + WARN_ON(!num_sched_list || !sched_list || > + !!entity->single_sched); > > entity->sched_list = sched_list; > entity->num_sched_list = num_sched_list; > @@ -231,13 +242,15 @@ static void 
drm_sched_entity_kill(struct drm_sched_entity *entity) > { > struct drm_sched_job *job; > struct dma_fence *prev; > + bool single_entity = !!entity->single_sched; > > - if (!entity->rq) > + if (!entity->rq && !single_entity) > return; > > spin_lock(&entity->rq_lock); > entity->stopped = true; > - drm_sched_rq_remove_entity(entity->rq, entity); > + if (!single_entity) > + drm_sched_rq_remove_entity(entity->rq, entity); Looks like nothing prevents drm_sched_run_job_work() to fetch more jobs from the entity, hence if this is called for an entity still having queued up jobs and a still running scheduler, drm_sched_entity_kill() and drm_sched_run_job_work() would race for jobs, right? Not sure if this is by intention because we don't expect the driver to drm_sched_entity_fini() as long as there are still queued up jobs. At least this is inconsistant to what drm_sched_entity_kill() does without DRM_SCHED_POLICY_SINGLE_ENTITY and should either be fixed or documented if we agree nothing else makes sense. I think it also touches my question on how to tear down the scheduler once a ring is removed or deinitialized. I know XE is going its own way in this respect, but I also feel like we're leaving drivers potentially being interested in DRM_SCHED_POLICY_SINGLE_ENTITY a bit alone on that. I think we should probably give drivers a bit more guidance on how to do that. Currently, I see two approaches. (1) Do what XE does, which means letting the scheduler run dry, which includes both the entity's job queue and the schedulers pending_list. While jobs from the entity's queue aren't pushing any more work to the ring on tear down, but just "flow through" to get freed up eventually. (Hopefully I got that right.) (2) Kill the entity to cleanup jobs from the entity's queue, stop the scheduler and either just wait for pending jobs or signal them right away and finally free them up. 
Actually there'd also be (3), which could be a mix of both, discard the entity's queued jobs, but let the pending_list run dry. I'm not saying we should provide a whole bunch of infrastructure for drivers, e.g. for (1) as you've mentioned already as well, there is probably not much to generalize anyway. However, I think we should document the options drivers have to tear things down and do enough to enable drivers using any option (as long as we agree it is reasonable). For Nouveau specifically, I'd probably like to go with (3). - Danilo > spin_unlock(&entity->rq_lock); > > /* Make sure this entity is not used by the scheduler at the moment */ > @@ -259,6 +272,20 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity) > dma_fence_put(prev); > } > > +/** > + * drm_sched_entity_to_scheduler - Schedule entity to GPU scheduler > + * @entity: scheduler entity > + * > + * Returns GPU scheduler for the entity > + */ > +struct drm_gpu_scheduler * > +drm_sched_entity_to_scheduler(struct drm_sched_entity *entity) > +{ > + bool single_entity = !!entity->single_sched; > + > + return single_entity ? 
entity->single_sched : entity->rq->sched;
> +}
> +
> /**
>  * drm_sched_entity_flush - Flush a context entity
>  *
> @@ -276,11 +303,12 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
> 	struct drm_gpu_scheduler *sched;
> 	struct task_struct *last_user;
> 	long ret = timeout;
> +	bool single_entity = !!entity->single_sched;
>
> -	if (!entity->rq)
> +	if (!entity->rq && !single_entity)
> 		return 0;
>
> -	sched = entity->rq->sched;
> +	sched = drm_sched_entity_to_scheduler(entity);
> 	/**
> 	 * The client will not queue more IBs during this fini, consume existing
> 	 * queued IBs or discard them on SIGKILL
> @@ -373,7 +401,7 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
> 		container_of(cb, struct drm_sched_entity, cb);
>
> 	drm_sched_entity_clear_dep(f, cb);
> -	drm_sched_wakeup_if_can_queue(entity->rq->sched);
> +	drm_sched_wakeup_if_can_queue(drm_sched_entity_to_scheduler(entity));
> }
>
> /**
> @@ -387,6 +415,8 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
> void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
> 				   enum drm_sched_priority priority)
> {
> +	WARN_ON(!!entity->single_sched);
> +
> 	spin_lock(&entity->rq_lock);
> 	entity->priority = priority;
> 	spin_unlock(&entity->rq_lock);
> @@ -399,7 +429,7 @@ EXPORT_SYMBOL(drm_sched_entity_set_priority);
>  */
> static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
> {
> -	struct drm_gpu_scheduler *sched = entity->rq->sched;
> +	struct drm_gpu_scheduler *sched = drm_sched_entity_to_scheduler(entity);
> 	struct dma_fence *fence = entity->dependency;
> 	struct drm_sched_fence *s_fence;
>
> @@ -501,7 +531,8 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
> 	 * Update the entity's location in the min heap according to
> 	 * the timestamp of the next job, if any.
> 	 */
> -	if (entity->rq->sched->sched_policy == DRM_SCHED_POLICY_FIFO) {
> +	if (drm_sched_entity_to_scheduler(entity)->sched_policy ==
> +	    DRM_SCHED_POLICY_FIFO) {
> 		struct drm_sched_job *next;
>
> 		next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> @@ -524,6 +555,8 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> 	struct drm_gpu_scheduler *sched;
> 	struct drm_sched_rq *rq;
>
> +	WARN_ON(!!entity->single_sched);
> +
> 	/* single possible engine and already selected */
> 	if (!entity->sched_list)
> 		return;
> @@ -573,12 +606,13 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> {
> 	struct drm_sched_entity *entity = sched_job->entity;
> -	bool first, fifo = entity->rq->sched->sched_policy ==
> -		DRM_SCHED_POLICY_FIFO;
> +	bool single_entity = !!entity->single_sched;
> +	bool first;
> 	ktime_t submit_ts;
>
> 	trace_drm_sched_job(sched_job, entity);
> -	atomic_inc(entity->rq->sched->score);
> +	if (!single_entity)
> +		atomic_inc(entity->rq->sched->score);
> 	WRITE_ONCE(entity->last_user, current->group_leader);
>
> 	/*
> @@ -591,6 +625,10 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>
> 	/* first job wakes up scheduler */
> 	if (first) {
> +		struct drm_gpu_scheduler *sched =
> +			drm_sched_entity_to_scheduler(entity);
> +		bool fifo = sched->sched_policy == DRM_SCHED_POLICY_FIFO;
> +
> 		/* Add the entity to the run queue */
> 		spin_lock(&entity->rq_lock);
> 		if (entity->stopped) {
> @@ -600,13 +638,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> 			return;
> 		}
>
> -		drm_sched_rq_add_entity(entity->rq, entity);
> +		if (!single_entity)
> +			drm_sched_rq_add_entity(entity->rq, entity);
> 		spin_unlock(&entity->rq_lock);
>
> 		if (fifo)
> 			drm_sched_rq_update_fifo(entity, submit_ts);
>
> -		drm_sched_wakeup_if_can_queue(entity->rq->sched);
> +		drm_sched_wakeup_if_can_queue(sched);
> 	}
> }
> EXPORT_SYMBOL(drm_sched_entity_push_job);
>
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> index 06cedfe4b486..f6b926f5e188 100644
> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> @@ -225,7 +225,7 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
> {
> 	unsigned seq;
>
> -	fence->sched = entity->rq->sched;
> +	fence->sched = drm_sched_entity_to_scheduler(entity);
> 	seq = atomic_inc_return(&entity->fence_seq);
> 	dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
> 		       &fence->lock, entity->fence_context, seq);
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 545d5298c086..cede47afc800 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -32,7 +32,8 @@
>  * backend operations to the scheduler like submitting a job to hardware run queue,
>  * returning the dependencies of a job etc.
>  *
> - * The organisation of the scheduler is the following:
> + * The organisation of the scheduler is the following for scheduling policies
> + * DRM_SCHED_POLICY_RR and DRM_SCHED_POLICY_FIFO:
>  *
>  * 1. Each hw run queue has one scheduler
>  * 2. Each scheduler has multiple run queues with different priorities
> @@ -43,6 +44,23 @@
>  *
>  * The jobs in a entity are always scheduled in the order that they were pushed.
>  *
> + * The organisation of the scheduler is the following for scheduling policy
> + * DRM_SCHED_POLICY_SINGLE_ENTITY:
> + *
> + * 1. One to one relationship between scheduler and entity
> + * 2. No priorities implemented per scheduler (single job queue)
> + * 3. No run queues in scheduler rather jobs are directly dequeued from entity
> + * 4. The entity maintains a queue of jobs that will be scheduled on the
> + *    hardware
> + *
> + * The jobs in a entity are always scheduled in the order that they were pushed
> + * regardless of scheduling policy.
> + *
> + * A policy of DRM_SCHED_POLICY_RR or DRM_SCHED_POLICY_FIFO is expected to used
> + * when the KMD is scheduling directly on the hardware while a scheduling policy
> + * of DRM_SCHED_POLICY_SINGLE_ENTITY is expected to be used when there is a
> + * firmware scheduler.
> + *
>  * Note that once a job was taken from the entities queue and pushed to the
>  * hardware, i.e. the pending queue, the entity must not be referenced anymore
>  * through the jobs entity pointer.
> @@ -96,6 +114,8 @@ static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *enti
>
> void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
> {
> +	WARN_ON(!!entity->single_sched);
> +
> 	/*
> 	 * Both locks need to be grabbed, one to protect from entity->rq change
> 	 * for entity from within concurrent drm_sched_entity_select_rq and the
> @@ -126,6 +146,8 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
> static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
> 			      struct drm_sched_rq *rq)
> {
> +	WARN_ON(sched->sched_policy == DRM_SCHED_POLICY_SINGLE_ENTITY);
> +
> 	spin_lock_init(&rq->lock);
> 	INIT_LIST_HEAD(&rq->entities);
> 	rq->rb_tree_root = RB_ROOT_CACHED;
> @@ -144,6 +166,8 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
> void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
> 			     struct drm_sched_entity *entity)
> {
> +	WARN_ON(!!entity->single_sched);
> +
> 	if (!list_empty(&entity->list))
> 		return;
>
> @@ -166,6 +190,8 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
> void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> 				struct drm_sched_entity *entity)
> {
> +	WARN_ON(!!entity->single_sched);
> +
> 	if (list_empty(&entity->list))
> 		return;
>
> @@ -641,7 +667,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
> 		       struct drm_sched_entity *entity,
> 		       void *owner)
> {
> -	if (!entity->rq)
> +	if (!entity->rq && !entity->single_sched)
> 		return -ENOENT;
>
> 	job->entity = entity;
> @@ -674,13
+700,16 @@ void drm_sched_job_arm(struct drm_sched_job *job)
> {
> 	struct drm_gpu_scheduler *sched;
> 	struct drm_sched_entity *entity = job->entity;
> +	bool single_entity = !!entity->single_sched;
>
> 	BUG_ON(!entity);
> -	drm_sched_entity_select_rq(entity);
> -	sched = entity->rq->sched;
> +	if (!single_entity)
> +		drm_sched_entity_select_rq(entity);
> +	sched = drm_sched_entity_to_scheduler(entity);
>
> 	job->sched = sched;
> -	job->s_priority = entity->rq - sched->sched_rq;
> +	if (!single_entity)
> +		job->s_priority = entity->rq - sched->sched_rq;
> 	job->id = atomic64_inc_return(&sched->job_id_count);
>
> 	drm_sched_fence_init(job->s_fence, job->entity);
> @@ -896,6 +925,13 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
> 	if (!drm_sched_can_queue(sched))
> 		return NULL;
>
> +	if (sched->single_entity) {
> +		if (drm_sched_entity_is_ready(sched->single_entity))
> +			return sched->single_entity;
> +
> +		return NULL;
> +	}
> +
> 	/* Kernel run queue has higher priority than normal run queue*/
> 	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> 		entity = sched->sched_policy == DRM_SCHED_POLICY_FIFO ?
> @@ -1091,6 +1127,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> 		return -EINVAL;
>
> 	sched->ops = ops;
> +	sched->single_entity = NULL;
> 	sched->hw_submission_limit = hw_submission;
> 	sched->name = name;
> 	sched->submit_wq = submit_wq ?
: system_wq;
> @@ -1103,7 +1140,9 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> 		sched->sched_policy = default_drm_sched_policy;
> 	else
> 		sched->sched_policy = sched_policy;
> -	for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
> +	for (i = DRM_SCHED_PRIORITY_MIN; sched_policy !=
> +	     DRM_SCHED_POLICY_SINGLE_ENTITY && i < DRM_SCHED_PRIORITY_COUNT;
> +	     i++)
> 		drm_sched_rq_init(sched, &sched->sched_rq[i]);
>
> 	init_waitqueue_head(&sched->job_scheduled);
> @@ -1135,7 +1174,15 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
>
> 	drm_sched_submit_stop(sched);
>
> -	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> +	if (sched->single_entity) {
> +		spin_lock(&sched->single_entity->rq_lock);
> +		sched->single_entity->stopped = true;
> +		spin_unlock(&sched->single_entity->rq_lock);
> +	}
> +
> +	for (i = DRM_SCHED_PRIORITY_COUNT - 1; sched->sched_policy !=
> +	     DRM_SCHED_POLICY_SINGLE_ENTITY && i >= DRM_SCHED_PRIORITY_MIN;
> +	     i--) {
> 		struct drm_sched_rq *rq = &sched->sched_rq[i];
>
> 		spin_lock(&rq->lock);
> @@ -1176,6 +1223,8 @@ void drm_sched_increase_karma(struct drm_sched_job *bad)
> 	struct drm_sched_entity *entity;
> 	struct drm_gpu_scheduler *sched = bad->sched;
>
> +	WARN_ON(sched->sched_policy == DRM_SCHED_POLICY_SINGLE_ENTITY);
> +
> 	/* don't change @bad's karma if it's from KERNEL RQ,
> 	 * because sometimes GPU hang would cause kernel jobs (like VM updating jobs)
> 	 * corrupt but keep in mind that kernel jobs always considered good.
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 897d52a4ff4f..04eec2d7635f 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -79,6 +79,7 @@ enum drm_sched_policy {
> 	DRM_SCHED_POLICY_DEFAULT,
> 	DRM_SCHED_POLICY_RR,
> 	DRM_SCHED_POLICY_FIFO,
> +	DRM_SCHED_POLICY_SINGLE_ENTITY,
> 	DRM_SCHED_POLICY_COUNT,
> };
>
> @@ -112,6 +113,9 @@ struct drm_sched_entity {
> 	 */
> 	struct drm_sched_rq *rq;
>
> +	/** @single_sched: Single scheduler */
> +	struct drm_gpu_scheduler *single_sched;
> +
> 	/**
> 	 * @sched_list:
> 	 *
> @@ -473,6 +477,7 @@ struct drm_sched_backend_ops {
>  * struct drm_gpu_scheduler - scheduler instance-specific data
>  *
>  * @ops: backend operations provided by the driver.
> + * @single_entity: Single entity for the scheduler
>  * @hw_submission_limit: the max size of the hardware queue.
>  * @timeout: the time after which a job is removed from the scheduler.
>  * @name: name of the ring for which this scheduler is being used.
> @@ -503,6 +508,7 @@ struct drm_sched_backend_ops {
>  */
> struct drm_gpu_scheduler {
> 	const struct drm_sched_backend_ops	*ops;
> +	struct drm_sched_entity		*single_entity;
> 	uint32_t			hw_submission_limit;
> 	long				timeout;
> 	const char			*name;
> @@ -585,6 +591,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
> 			  struct drm_gpu_scheduler **sched_list,
> 			  unsigned int num_sched_list,
> 			  atomic_t *guilty);
> +struct drm_gpu_scheduler *
> +drm_sched_entity_to_scheduler(struct drm_sched_entity *entity);
> long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout);
> void drm_sched_entity_fini(struct drm_sched_entity *entity);
> void drm_sched_entity_destroy(struct drm_sched_entity *entity);