Subject: Re: [Intel-xe] [PATCH v2 1/9] drm/sched: Convert drm scheduler to use a work queue rather than kthread
From: Danilo Krummrich
Organization: RedHat
Date: Tue, 12 Sep 2023 18:52:36 +0200
To: Boris Brezillon
Cc: robdclark@chromium.org, sarah.walker@imgtec.com, ketil.johnsen@arm.com, lina@asahilina.net, Liviu.Dudau@arm.com, dri-devel@lists.freedesktop.org, intel-xe@lists.freedesktop.org, luben.tuikov@amd.com, donald.robson@imgtec.com, Christian König, faith.ekstrand@collabora.com
Message-ID: <2982d2c1-99e2-03a4-7b54-e57471394bd3@redhat.com>
In-Reply-To: <20230912164909.018d13c8@collabora.com>
References: <20230811023137.659037-1-matthew.brost@intel.com> <20230811023137.659037-2-matthew.brost@intel.com> <69b648f8-c6b3-5846-0d03-05a380d010d8@redhat.com> <069e6cd0-abd3-fdd9-217d-173e8f8e1d29@amd.com> <982800c1-e7d3-f276-51d0-1a431f92eacb@amd.com> <5fdf7d59-3323-24b5-a35a-bd60b06b4ce5@redhat.com> <0bf839df-db7f-41fa-8b34-59792d2ba8be@amd.com> <20230912162838.34135959@collabora.com> <20230912164909.018d13c8@collabora.com>

On 9/12/23 16:49, Boris Brezillon wrote:
> On Tue, 12 Sep 2023 16:33:01 +0200
> Danilo Krummrich wrote:
>
>> On 9/12/23 16:28, Boris Brezillon wrote:
>>> On Thu, 17 Aug 2023 13:13:31 +0200
>>> Danilo Krummrich wrote:
>>>
>>>> I think that's a misunderstanding. I'm not trying to say that it is
>>>> *always* beneficial to fill up the ring as much as possible. But I
>>>> think it is under certain circumstances, exactly those circumstances
>>>> I described for Nouveau.
>>>>
>>>> As mentioned, in Nouveau the size of a job is only really limited by
>>>> the ring size, which means that a single job can (but does not
>>>> necessarily) fill up the whole ring. We both agree that this is
>>>> inefficient, because it potentially lets the HW run dry due to
>>>> hw_submission_limit == 1.
>>>>
>>>> I recognize you said that one should define hw_submission_limit and
>>>> adjust the other parts of the equation accordingly; the options I see
>>>> are:
>>>>
>>>> (1) Increase the ring size while keeping the maximum job size.
>>>> (2) Decrease the maximum job size while keeping the ring size.
>>>> (3) Let the scheduler track the actual job size rather than the
>>>> maximum job size.
>>>>
>>>> (1) results in potentially wasted ring memory, because we're not
>>>> always reaching the maximum job size, but the scheduler assumes we
>>>> are.
>>>>
>>>> (2) results in more IOCTLs from userspace for the same amount of IBs,
>>>> and more jobs result in more memory allocations and more work being
>>>> submitted to the workqueue (with Matt's patches).
>>>>
>>>> (3) doesn't seem to have any of those drawbacks.
>>>>
>>>> What would be your take on that?
>>>>
>>>> Actually, if none of the other drivers is interested in a more
>>>> precise way of keeping track of the ring utilization, I'd be totally
>>>> fine to do it in a driver-specific way. However, unfortunately I
>>>> don't see how this would be possible.
>>>
>>> I'm not entirely sure, but I think PowerVR is pretty close to your
>>> description: job sizes are dynamic, and the ring buffer size is picked
>>> by the driver at queue initialization time. What we did was to set
>>> hw_submission_limit to an arbitrarily high value of 64k (we could have
>>> used something like ringbuf_size/min_job_size instead), and then have
>>> the flow control implemented with ->prepare_job() [1] (CCCB is the
>>> PowerVR ring buffer). This allows us to maximize ring buffer
>>> utilization while still allowing dynamically sized jobs.
>>
>> I guess this would work, but I think it would be better to bake this
>> in, especially if more drivers do have this need. I already have an
>> implementation [1] for doing that in the scheduler. My plan was to push
>> that as soon as Matt sends out v3.
>>
>> [1] https://gitlab.freedesktop.org/nouvelles/kernel/-/commit/269f05d6a2255384badff8b008b3c32d640d2d95
>
> PowerVR's ->can_fit_in_ringbuf() logic is a bit more involved in that
> native fence waits are passed to the FW, and those add to the job size.
> When we know our job is ready for execution (all non-native deps are
> signaled), we evict already-signaled native deps (or native fences) to
> shrink the job size further, but that's something we need to calculate
> late if we want the job size to be minimal. Of course, we can always
> over-estimate the job size, but if we go for a full-blown drm_sched
> integration, I wonder if it wouldn't be preferable to have a
> ->get_job_size() callback returning the number of units needed by the
> job, and have the core pick 1 when the hook is not implemented.

Sure, why not. Sounds reasonable to me.
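Just to make sure we read the proposal the same way, here is a rough
sketch of the accounting I'd expect from such a hook. All names and the
struct layout are made up for illustration; this is not the actual
drm_sched API, nor literally what my linked implementation does. The
point is just that the core falls back to 1 unit per job when the hook
is absent, which preserves today's hw_submission_limit semantics:

#include <linux/types.h>

struct drm_sched_job;

struct sched_size_ops {
	/*
	 * Optional: number of ring units (e.g. bytes or dwords of ring
	 * buffer space) the job occupies. NULL means "1 unit per job".
	 */
	u32 (*get_job_size)(struct drm_sched_job *job);
};

static u32 sched_job_units(const struct sched_size_ops *ops,
			   struct drm_sched_job *job)
{
	return ops->get_job_size ? ops->get_job_size(job) : 1;
}

static bool sched_can_queue(u32 units_in_flight, u32 units_limit,
			    const struct sched_size_ops *ops,
			    struct drm_sched_job *job)
{
	/* Push the job to the HW only if it still fits the ring budget. */
	return units_in_flight + sched_job_units(ops, job) <= units_limit;
}

A driver with dynamically sized jobs would then set units_limit to its
ring size in those units, while drivers that don't implement the hook
keep counting whole jobs, exactly as with hw_submission_limit today.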