From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D5723C4743C for ; Wed, 23 Jun 2021 15:12:19 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5D3066102A for ; Wed, 23 Jun 2021 15:12:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5D3066102A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=ffwll.ch Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0B4D56E930; Wed, 23 Jun 2021 15:12:19 +0000 (UTC) Received: from mail-ej1-x62b.google.com (mail-ej1-x62b.google.com [IPv6:2a00:1450:4864:20::62b]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2A6186E930 for ; Wed, 23 Jun 2021 15:12:18 +0000 (UTC) Received: by mail-ej1-x62b.google.com with SMTP id ji1so4600704ejc.4 for ; Wed, 23 Jun 2021 08:12:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=3y+KpMWjSfkunjXUwYFJ6kZNhxpq+5TW5mhxInmbGtQ=; b=f9nn3ewCrvOiVb3SWt5bua/7cj86/IBQcQMF1DdCGwJlxfQWvebySb8uxawzkSA9CN rX2zZKh2lC3wyJ0K779VU+fBuEB6e/sTNpa3sa136q6hgOTSxHv/Mfu0DbjvwCQlGMgZ tgFzgzibr8TZupKFlnfvS76s7KNMf5rpO5s54= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=3y+KpMWjSfkunjXUwYFJ6kZNhxpq+5TW5mhxInmbGtQ=; b=UrKTIgtL72SRrICJLglA+8M2rh7pRT82NAFMX2SoFChycQhqghCE0O1nDmHk5fvvXj SQZdHDXeuapX0zkfr2IJR9NZm7sK3SDJke/Efi4CpCUvdWm6xkcd95IVW72JTL/bV7rx pNbdOTHuCH7PX2Z9mKpWEoU5+OZ2ZrKRRSZHpgTG+bI/aNY3qZ4nGfwn3so53gJMoaIl 0rw+JdGGTRTITeC8Lo+OeY2vJDLhGNN3vjUAC42Huxa1HE75IctARW6INey8b6KX+9kS W1WOXCC4h877Ti4I8sLphJosEDqqcnvc2In8mC2ziLoM4n71tuO2Gi6w/o/RYA/r5Cqp 1yRw== X-Gm-Message-State: AOAM533Y48mi+tS/uM40/4tvyeFL2LfaxI7G9DUE4PQZLfMkeQAd9QF9 QYSQNG8b1DKWzHZoaR7y1XV/MA== X-Google-Smtp-Source: ABdhPJy2MWq3jqWweX7Me0GG0dXd1oIOG+tty2iSml+Y5qyNGuE2xo4pxVIIqI9VGfKR1D38BZCf5A== X-Received: by 2002:a17:907:a8f:: with SMTP id by15mr502158ejc.91.1624461136049; Wed, 23 Jun 2021 08:12:16 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id yh13sm43863ejb.28.2021.06.23.08.12.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 23 Jun 2021 08:12:15 -0700 (PDT) Date: Wed, 23 Jun 2021 17:12:13 +0200 From: Daniel Vetter To: Christian =?iso-8859-1?Q?K=F6nig?= Message-ID: References: <3bf45006-4256-763d-601b-3a25a7057820@amd.com> <421cb10d-92a7-0780-3d38-d4cabd008c0c@amd.com> <682d2f3b-8ba3-cccb-1385-1f74f0b06ada@amd.com> <5a3c9ea8-adb9-4662-55e2-19672e14b163@amd.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <5a3c9ea8-adb9-4662-55e2-19672e14b163@amd.com> X-Operating-System: Linux phenom 5.10.0-7-amd64 Subject: Re: [Intel-gfx] [PATCH 15/15] RFC: drm/amdgpu: Implement a proper implicit fencing uapi X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Rob Clark , Daniel Stone , Daniel Vetter , Daniel Vetter , Intel Graphics Development , Kevin Wang , DRI Development , Sumit Semwal , "moderated list:DMA BUFFER SHARING FRAMEWORK" , Luben Tuikov , "Kristian H . Kristensen" , Chen Li , Bas Nieuwenhuizen , Alex Deucher , mesa-dev , Dave Airlie , Michel =?iso-8859-1?Q?D=E4nzer?= , Dennis Li , Deepak R Varma Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" On Wed, Jun 23, 2021 at 05:07:17PM +0200, Christian K=F6nig wrote: > Am 23.06.21 um 17:03 schrieb Daniel Vetter: > > On Wed, Jun 23, 2021 at 04:58:27PM +0200, Bas Nieuwenhuizen wrote: > > > On Wed, Jun 23, 2021 at 4:50 PM Daniel Vetter wrote: > > > > On Wed, Jun 23, 2021 at 4:02 PM Christian K=F6nig > > > > wrote: > > > > > Am 23.06.21 um 15:49 schrieb Daniel Vetter: > > > > > > On Wed, Jun 23, 2021 at 3:44 PM Christian K=F6nig > > > > > > wrote: > > > > > > > Am 23.06.21 um 15:38 schrieb Bas Nieuwenhuizen: > > > > > > > > On Wed, Jun 23, 2021 at 2:59 PM Christian K=F6nig > > > > > > > > wrote: > > > > > > > > > Am 23.06.21 um 14:18 schrieb Daniel Vetter: > > > > > > > > > > On Wed, Jun 23, 2021 at 11:45 AM Bas Nieuwenhuizen > > > > > > > > > > wrote: > > > > > > > > > > > On Tue, Jun 22, 2021 at 6:55 PM Daniel Vetter wrote: > > > > > > > > > > > > WARNING: Absolutely untested beyond "gcc isn't dyin= g in agony". > > > > > > > > > > > > = > > > > > > > > > > > > Implicit fencing done properly needs to treat the i= mplicit fencing > > > > > > > > > > > > slots like a funny kind of IPC mailbox. In other wo= rds it needs to be > > > > > > > > > > > > explicitly. This is the only way it will mesh well = with explicit > > > > > > > > > > > > fencing userspace like vk, and it's also the bare m= inimum required to > > > > > > > > > > > > be able to manage anything else that wants to use t= he same buffer on > > > > > > > > > > > > multiple engines in parallel, and still be able to = share it through > > > > > > > > > > > > implicit sync. > > > > > > > > > > > > = > > > > > > > > > > > > amdgpu completely lacks such an uapi. Fix this. > > > > > > > > > > > > = > > > > > > > > > > > > Luckily the concept of ignoring implicit fences exi= sts already, and > > > > > > > > > > > > takes care of all the complexities of making sure t= hat non-optional > > > > > > > > > > > > fences (like bo moves) are not ignored. This suppor= t was added in > > > > > > > > > > > > = > > > > > > > > > > > > commit 177ae09b5d699a5ebd1cafcee78889db968abf54 > > > > > > > > > > > > Author: Andres Rodriguez > > > > > > > > > > > > Date: Fri Sep 15 20:44:06 2017 -0400 > > > > > > > > > > > > = > > > > > > > > > > > > drm/amdgpu: introduce AMDGPU_GEM_CREATE_EXP= LICIT_SYNC v2 > > > > > > > > > > > > = > > > > > > > > > > > > Unfortuantely it's the wrong semantics, because it'= s a bo flag and > > > > > > > > > > > > disables implicit sync on an allocated buffer compl= etely. > > > > > > > > > > > > = > > > > > > > > > > > > We _do_ want implicit sync, but control it explicit= ly. For this we > > > > > > > > > > > > need a flag on the drm_file, so that a given usersp= ace (like vulkan) > > > > > > > > > > > > can manage the implicit sync slots explicitly. The = other side of the > > > > > > > > > > > > pipeline (compositor, other process or just differe= nt stage in a media > > > > > > > > > > > > pipeline in the same process) can then either do th= e same, or fully > > > > > > > > > > > > participate in the implicit sync as implemented by = the kernel by > > > > > > > > > > > > default. > > > > > > > > > > > > = > > > > > > > > > > > > By building on the existing flag for buffers we avo= id any issues with > > > > > > > > > > > > opening up additional security concerns - anything = this new flag here > > > > > > > > > > > > allows is already. > > > > > > > > > > > > = > > > > > > > > > > > > All drivers which supports this concept of a usersp= ace-specific > > > > > > > > > > > > opt-out of implicit sync have a flag in their CS io= ctl, but in reality > > > > > > > > > > > > that turned out to be a bit too inflexible. See the= discussion below, > > > > > > > > > > > > let's try to do a bit better for amdgpu. > > > > > > > > > > > > = > > > > > > > > > > > > This alone only allows us to completely avoid any s= talls due to > > > > > > > > > > > > implicit sync, it does not yet allow us to use impl= icit sync as a > > > > > > > > > > > > strange form of IPC for sync_file. > > > > > > > > > > > > = > > > > > > > > > > > > For that we need two more pieces: > > > > > > > > > > > > = > > > > > > > > > > > > - a way to get the current implicit sync fences out= of a buffer. Could > > > > > > > > > > > > be done in a driver ioctl, but everyone needs= this, and generally a > > > > > > > > > > > > dma-buf is involved anyway to establish the s= haring. So an ioctl on > > > > > > > > > > > > the dma-buf makes a ton more sense: > > > > > > > > > > > > = > > > > > > > > > > > > https://nam11.safelinks.protection.outlook.co= m/?url=3Dhttps%3A%2F%2Flore.kernel.org%2Fdri-devel%2F20210520190007.534046-= 4-jason%40jlekstrand.net%2F&data=3D04%7C01%7Cchristian.koenig%40amd.com= %7C517f0d3467324e7ce05008d936581f60%7C3dd8961fe4884e608e11a82d994e183d%7C0%= 7C0%7C637600574408265873%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIj= oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=3DgntXLzlrqPxYj4Q= 3mQflD3arT9ad40S9AqsvtOXV4nk%3D&reserved=3D0 > > > > > > > > > > > > = > > > > > > > > > > > > Current drivers in upstream solves this by ha= ving the opt-out flag > > > > > > > > > > > > on their CS ioctl. This has the downside that= very often the CS > > > > > > > > > > > > which must actually stall for the implicit fe= nce is run a while > > > > > > > > > > > > after the implicit fence point was logically = sampled per the api > > > > > > > > > > > > spec (vk passes an explicit syncobj around fo= r that afaiui), and so > > > > > > > > > > > > results in oversync. Converting the implicit = sync fences into a > > > > > > > > > > > > snap-shot sync_file is actually accurate. > > > > > > > > > > > > = > > > > > > > > > > > > - Simillar we need to be able to set the exclusive = implicit fence. > > > > > > > > > > > > Current drivers again do this with a CS ioctl= flag, with again the > > > > > > > > > > > > same problems that the time the CS happens ad= ditional dependencies > > > > > > > > > > > > have been added. An explicit ioctl to only in= sert a sync_file (while > > > > > > > > > > > > respecting the rules for how exclusive and sh= ared fence slots must > > > > > > > > > > > > be update in struct dma_resv) is much better.= This is proposed here: > > > > > > > > > > > > = > > > > > > > > > > > > https://nam11.safelinks.protection.outlook.co= m/?url=3Dhttps%3A%2F%2Flore.kernel.org%2Fdri-devel%2F20210520190007.534046-= 5-jason%40jlekstrand.net%2F&data=3D04%7C01%7Cchristian.koenig%40amd.com= %7C517f0d3467324e7ce05008d936581f60%7C3dd8961fe4884e608e11a82d994e183d%7C0%= 7C0%7C637600574408265873%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIj= oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=3DYtqHT756jlt5NX7= Ydr3Kk1UMTb98nQhlcOlrnr%2B48HE%3D&reserved=3D0 > > > > > > > > > > > > = > > > > > > > > > > > > These three pieces together allow userspace to full= y control implicit > > > > > > > > > > > > fencing and remove all unecessary stall points due = to them. > > > > > > > > > > > > = > > > > > > > > > > > > Well, as much as the implicit fencing model fundame= ntally allows: > > > > > > > > > > > > There is only one set of fences, you can only choos= e to sync against > > > > > > > > > > > > only writers (exclusive slot), or everyone. Hence s= uballocating > > > > > > > > > > > > multiple buffers or anything else like this is fund= amentally not > > > > > > > > > > > > possible, and can only be fixed by a proper explici= t fencing model. > > > > > > > > > > > > = > > > > > > > > > > > > Aside from that caveat this model gets implicit fen= cing as closely to > > > > > > > > > > > > explicit fencing semantics as possible: > > > > > > > > > > > > = > > > > > > > > > > > > On the actual implementation I opted for a simple s= etparam ioctl, no > > > > > > > > > > > > locking (just atomic reads/writes) for simplicity. = There is a nice > > > > > > > > > > > > flag parameter in the VM ioctl which we could use, = except: > > > > > > > > > > > > - it's not checked, so userspace likely passes garb= age > > > > > > > > > > > > - there's already a comment that userspace _does_ p= ass garbage in the > > > > > > > > > > > > priority field > > > > > > > > > > > > So yeah unfortunately this flag parameter for setti= ng vm flags is > > > > > > > > > > > > useless, and we need to hack up a new one. > > > > > > > > > > > > = > > > > > > > > > > > > v2: Explain why a new SETPARAM (Jason) > > > > > > > > > > > > = > > > > > > > > > > > > v3: Bas noticed I forgot to hook up the dependency-= side shortcut. We > > > > > > > > > > > > need both, or this doesn't do much. > > > > > > > > > > > > = > > > > > > > > > > > > v4: Rebase over the amdgpu patch to always set the = implicit sync > > > > > > > > > > > > fences. > > > > > > > > > > > So I think there is still a case missing in this impl= ementation. > > > > > > > > > > > Consider these 3 cases > > > > > > > > > > > = > > > > > > > > > > > (format: a->b: b waits on a. Yes, I know arrows are h= ard) > > > > > > > > > > > = > > > > > > > > > > > explicit->explicit: This doesn't wait now, which is g= ood > > > > > > > > > > > Implicit->explicit: This doesn't wait now, which is g= ood > > > > > > > > > > > explicit->implicit : This still waits as the explicit= submission still > > > > > > > > > > > adds shared fences and most things that set an exclus= ive fence for > > > > > > > > > > > implicit sync will hence wait on it. > > > > > > > > > > > = > > > > > > > > > > > This is probably good enough for what radv needs now = but also sounds > > > > > > > > > > > like a risk wrt baking in new uapi behavior that we d= on't want to be > > > > > > > > > > > the end result. > > > > > > > > > > > = > > > > > > > > > > > Within AMDGPU this is probably solvable in two ways: > > > > > > > > > > > = > > > > > > > > > > > 1) Downgrade AMDGPU_SYNC_NE_OWNER to AMDGPU_SYNC_EXPL= ICIT for shared fences. > > > > > > > > > > I'm not sure that works. I think the right fix is that = radeonsi also > > > > > > > > > > switches to this model, with maybe a per-bo CS flag to = set indicate > > > > > > > > > > write access, to cut down on the number of ioctls that = are needed > > > > > > > > > > otherwise on shared buffers. This per-bo flag would ess= entially select > > > > > > > > > > between SYNC_NE_OWNER and SYNC_EXPLICIT on a per-buffer= basis. > > > > > > > > > Yeah, but I'm still not entirely sure why that approach i= sn't sufficient? > > > > > > > > > = > > > > > > > > > Problem with the per context or per vm flag is that you t= hen don't get > > > > > > > > > any implicit synchronization any more when another proces= s starts using > > > > > > > > > the buffer. > > > > > > > > That is exactly what I want for Vulkan :) > > > > > > > Yeah, but as far as I know this is not something we can do. > > > > > > > = > > > > > > > See we have use cases like screen capture and debug which rel= y on that > > > > > > > behavior. > > > > > > They will keep working, if (and only if) the vulkan side sets t= he > > > > > > winsys fences correctly. Also, everything else in vulkan aside = from > > > > > > winsys is explicitly not synced at all, you have to import drm = syncobj > > > > > > timeline on the gl side. > > > > > > = > > > > > > > The only thing we can do is to say on a per buffer flag that = a buffer > > > > > > > should not participate in implicit sync at all. > > > > > > Nah, this doesn't work. Because it's not a global decision, is = a local > > > > > > decision for the rendered. Vulkan wants to control implicit sync > > > > > > explicitly, and the kernel can't force more synchronization. If= a > > > > > > buffer is shared as a winsys buffer between vulkan client and g= l using > > > > > > compositor, then you _have_ to use implicit sync on it. But vk = needs > > > > > > to set the fences directly (and if the app gets it wrong, you g= et > > > > > > misrendering, but that is the specified behavour of vulkan). > > > > > Yeah, but that's exactly what we tried to avoid. > > > > > = > > > > > Mhm, when we attach the flag to the process/VM then this would br= eak the > > > > > use case of VA-API and Vulkan in the same process. > > > > > = > > > > > But I think if you attach the flag to the context that should ind= eed > > > > > work fine. > > > > Yeah that's a question I have, whether the drm_file is shared within > > > > one process among everything, or whether radeonsi/libva/vk each have > > > > their own. If each have their own drm_file, then we should be fine, > > > > otherwise we need to figure out another place to put this (worst ca= se > > > > as a CS extension that vk just sets on every submit). > > > libdrm_amdgpu dedupes it all so we mostly end up with one drm_file per > > > process (modulo minigbm on chromeos and modulo a master fd). > > > = > > > That said the current proposal is for the context right? And on the > > > context this should pretty much work? So I'm not sure why this is the > > > part we are discussing? > > It's on the fpriv->vm, so on the FD. I assumed vulkan at least would wa= nt > > to have it's private VM for this. And on the quick I didn't see any oth= er > > way to create a VM than to have an FD of your own. > = > You can't have your own FD in libdrm_amdgpu userspace. We had a pretty ha= rd > design discussion about that already. > = > What you could do is to load your own copy of libdrm_amdgpu, but I won't > recommend that. > = > Just putting the flag on the context instead of the VM is much cleaner as > far as I can see anyway. Helper for the blind? If you gues expect me to move that myself ... -Daniel > = > Christian. > = > > If there's something else that means "gpu context with it's own vm" then > > the flag would need to be moved there, pointers appreciated (but maybe > > someone with hw + userspace can do that quicker). > > -Daniel > > = > > > > Also yes this risks that a vk app which was violationing the winsys > > > > spec will now break, which is why I think we should do this sooner > > > > than later. Otherwise the list of w/a we might need to apply in vk > > > > userspace will become very long :-( At least since this is purely > > > > opt-in from userspace, we only need to have the w/a list in userspa= ce, > > > > where mesa has the infrastructure for that already. > > > > -Daniel > > > > = > > > > > Christian. > > > > > = > > > > > > -Daniel > > > > > > = > > > > > > > Regards, > > > > > > > Christian. > > > > > > > = > > > > > > > > > > The current amdgpu uapi just doesn't allow any other mo= del without an > > > > > > > > > > explicit opt-in. So current implicit sync userspace jus= t has to > > > > > > > > > > oversync, there's not much choice. > > > > > > > > > > = > > > > > > > > > > > 2) Have an EXPLICIT fence owner that is used for expl= icit submissions > > > > > > > > > > > that is ignored by AMDGPU_SYNC_NE_OWNER. > > > > > > > > > > > = > > > > > > > > > > > But this doesn't solve cross-driver interactions here. > > > > > > > > > > Yeah cross-driver is still entirely unsolved, because > > > > > > > > > > amdgpu_bo_explicit_sync() on the bo didn't solve that e= ither. > > > > > > > > > Hui? You have lost me. Why is that still unsolved? > > > > > > > > The part we're trying to solve with this patch is Vulkan sh= ould not > > > > > > > > participate in any implicit sync at all wrt submissions (an= d then > > > > > > > > handle the implicit sync for WSI explicitly using the fence > > > > > > > > import/export stuff that Jason wrote). As long we add share= d fences to > > > > > > > > the dma_resv we participate in implicit sync (at the level = of an > > > > > > > > implicit sync read) still, at least from the perspective of= later jobs > > > > > > > > waiting on these fences. > > > > > > > > = > > > > > > > > > Regards, > > > > > > > > > Christian. > > > > > > > > > = > > > > > > > > > > -Daniel > > > > > > > > > > = > > > > > > > > > > > > Cc: mesa-dev@lists.freedesktop.org > > > > > > > > > > > > Cc: Bas Nieuwenhuizen > > > > > > > > > > > > Cc: Dave Airlie > > > > > > > > > > > > Cc: Rob Clark > > > > > > > > > > > > Cc: Kristian H. Kristensen > > > > > > > > > > > > Cc: Michel D=E4nzer > > > > > > > > > > > > Cc: Daniel Stone > > > > > > > > > > > > Cc: Sumit Semwal > > > > > > > > > > > > Cc: "Christian K=F6nig" > > > > > > > > > > > > Cc: Alex Deucher > > > > > > > > > > > > Cc: Daniel Vetter > > > > > > > > > > > > Cc: Deepak R Varma > > > > > > > > > > > > Cc: Chen Li > > > > > > > > > > > > Cc: Kevin Wang > > > > > > > > > > > > Cc: Dennis Li > > > > > > > > > > > > Cc: Luben Tuikov > > > > > > > > > > > > Cc: linaro-mm-sig@lists.linaro.org > > > > > > > > > > > > Signed-off-by: Daniel Vetter > > > > > > > > > > > > --- > > > > > > > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 7 += ++++-- > > > > > > > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 21 += ++++++++++++++++++++ > > > > > > > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 6 += +++++ > > > > > > > > > > > > include/uapi/drm/amdgpu_drm.h | 10 += +++++++++ > > > > > > > > > > > > 4 files changed, 42 insertions(+), 2 deletions= (-) > > > > > > > > > > > > = > > > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c= b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > > > > > > > > > > > > index 65df34c17264..c5386d13eb4a 100644 > > > > > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > > > > > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > > > > > > > > > > > > @@ -498,6 +498,7 @@ static int amdgpu_cs_parser_bos= (struct amdgpu_cs_parser *p, > > > > > > > > > > > > struct amdgpu_bo *gds; > > > > > > > > > > > > struct amdgpu_bo *gws; > > > > > > > > > > > > struct amdgpu_bo *oa; > > > > > > > > > > > > + bool no_implicit_sync =3D READ_ONCE(fpriv->= vm.no_implicit_sync); > > > > > > > > > > > > int r; > > > > > > > > > > > > = > > > > > > > > > > > > INIT_LIST_HEAD(&p->validated); > > > > > > > > > > > > @@ -577,7 +578,8 @@ static int amdgpu_cs_parser_bos= (struct amdgpu_cs_parser *p, > > > > > > > > > > > > = > > > > > > > > > > > > e->bo_va =3D amdgpu_vm_bo_find(= vm, bo); > > > > > > > > > > > > = > > > > > > > > > > > > - if (bo->tbo.base.dma_buf && !amdgpu= _bo_explicit_sync(bo)) { > > > > > > > > > > > > + if (bo->tbo.base.dma_buf && > > > > > > > > > > > > + !(no_implicit_sync || amdgpu_bo= _explicit_sync(bo))) { > > > > > > > > > > > > e->chain =3D dma_fence_= chain_alloc(); > > > > > > > > > > > > if (!e->chain) { > > > > > > > > > > > > r =3D -ENOMEM; > > > > > > > > > > > > @@ -649,6 +651,7 @@ static int amdgpu_cs_sync_rings= (struct amdgpu_cs_parser *p) > > > > > > > > > > > > { > > > > > > > > > > > > struct amdgpu_fpriv *fpriv =3D p->filp-= >driver_priv; > > > > > > > > > > > > struct amdgpu_bo_list_entry *e; > > > > > > > > > > > > + bool no_implicit_sync =3D READ_ONCE(fpriv->= vm.no_implicit_sync); > > > > > > > > > > > > int r; > > > > > > > > > > > > = > > > > > > > > > > > > list_for_each_entry(e, &p->validated, t= v.head) { > > > > > > > > > > > > @@ -656,7 +659,7 @@ static int amdgpu_cs_sync_rings= (struct amdgpu_cs_parser *p) > > > > > > > > > > > > struct dma_resv *resv =3D bo->t= bo.base.resv; > > > > > > > > > > > > enum amdgpu_sync_mode sync_mode; > > > > > > > > > > > > = > > > > > > > > > > > > - sync_mode =3D amdgpu_bo_explicit_sy= nc(bo) ? > > > > > > > > > > > > + sync_mode =3D no_implicit_sync || a= mdgpu_bo_explicit_sync(bo) ? > > > > > > > > > > > > AMDGPU_SYNC_EXPLICIT : = AMDGPU_SYNC_NE_OWNER; > > > > > > > > > > > > r =3D amdgpu_sync_resv(p->adev,= &p->job->sync, resv, sync_mode, > > > > > > > > > > > > &fpriv->vm= ); > > > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.= c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > > > > > > > > > > > > index c080ba15ae77..f982626b5328 100644 > > > > > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > > > > > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > > > > > > > > > > > > @@ -1724,6 +1724,26 @@ int amdgpu_file_to_fpriv(str= uct file *filp, struct amdgpu_fpriv **fpriv) > > > > > > > > > > > > return 0; > > > > > > > > > > > > } > > > > > > > > > > > > = > > > > > > > > > > > > +int amdgpu_setparam_ioctl(struct drm_device *dev, = void *data, > > > > > > > > > > > > + struct drm_file *filp) > > > > > > > > > > > > +{ > > > > > > > > > > > > + struct drm_amdgpu_setparam *setparam =3D da= ta; > > > > > > > > > > > > + struct amdgpu_fpriv *fpriv =3D filp->driver= _priv; > > > > > > > > > > > > + > > > > > > > > > > > > + switch (setparam->param) { > > > > > > > > > > > > + case AMDGPU_SETPARAM_NO_IMPLICIT_SYNC: > > > > > > > > > > > > + if (setparam->value) > > > > > > > > > > > > + WRITE_ONCE(fpriv->vm.no_imp= licit_sync, true); > > > > > > > > > > > > + else > > > > > > > > > > > > + WRITE_ONCE(fpriv->vm.no_imp= licit_sync, false); > > > > > > > > > > > > + break; > > > > > > > > > > > > + default: > > > > > > > > > > > > + return -EINVAL; > > > > > > > > > > > > + } > > > > > > > > > > > > + > > > > > > > > > > > > + return 0; > > > > > > > > > > > > +} > > > > > > > > > > > > + > > > > > > > > > > > > const struct drm_ioctl_desc amdgpu_ioctls_kms[= ] =3D { > > > > > > > > > > > > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_CREATE, am= dgpu_gem_create_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > > > > > > > > > > > > DRM_IOCTL_DEF_DRV(AMDGPU_CTX, amdgpu_ct= x_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > > > > > > > > > > > > @@ -1742,6 +1762,7 @@ const struct drm_ioctl_desc a= mdgpu_ioctls_kms[] =3D { > > > > > > > > > > > > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu= _gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > > > > > > > > > > > > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu= _gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > > > > > > > > > > > > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, a= mdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > > > > > > > > > > > > + DRM_IOCTL_DEF_DRV(AMDGPU_SETPARAM, amdgpu_s= etparam_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > > > > > > > > > > > > }; > > > > > > > > > > > > = > > > > > > > > > > > > static const struct drm_driver amdgpu_kms_driv= er =3D { > > > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h= b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h > > > > > > > > > > > > index ddb85a85cbba..0e8c440c6303 100644 > > > > > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h > > > > > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h > > > > > > > > > > > > @@ -321,6 +321,12 @@ struct amdgpu_vm { > > > > > > > > > > > > bool bulk_moveable; > > > > > > > > > > > > /* Flag to indicate if VM is used for c= ompute */ > > > > > > > > > > > > bool is_compute_cont= ext; > > > > > > > > > > > > + /* > > > > > > > > > > > > + * Flag to indicate whether implicit sync s= hould always be skipped on > > > > > > > > > > > > + * this context. We do not care about races= at all, userspace is allowed > > > > > > > > > > > > + * to shoot itself with implicit sync to it= s fullest liking. > > > > > > > > > > > > + */ > > > > > > > > > > > > + bool no_implicit_sync; > > > > > > > > > > > > }; > > > > > > > > > > > > = > > > > > > > > > > > > struct amdgpu_vm_manager { > > > > > > > > > > > > diff --git a/include/uapi/drm/amdgpu_drm.h b/includ= e/uapi/drm/amdgpu_drm.h > > > > > > > > > > > > index 0cbd1540aeac..9eae245c14d6 100644 > > > > > > > > > > > > --- a/include/uapi/drm/amdgpu_drm.h > > > > > > > > > > > > +++ b/include/uapi/drm/amdgpu_drm.h > > > > > > > > > > > > @@ -54,6 +54,7 @@ extern "C" { > > > > > > > > > > > > #define DRM_AMDGPU_VM 0x13 > > > > > > > > > > > > #define DRM_AMDGPU_FENCE_TO_HANDLE 0x14 > > > > > > > > > > > > #define DRM_AMDGPU_SCHED 0x15 > > > > > > > > > > > > +#define DRM_AMDGPU_SETPARAM 0x16 > > > > > > > > > > > > = > > > > > > > > > > > > #define DRM_IOCTL_AMDGPU_GEM_CREATE DRM_IOW= R(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create) > > > > > > > > > > > > #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOW= R(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap) > > > > > > > > > > > > @@ -71,6 +72,7 @@ extern "C" { > > > > > > > > > > > > #define DRM_IOCTL_AMDGPU_VM DRM_IOW= R(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm) > > > > > > > > > > > > #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_I= OWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_t= o_handle) > > > > > > > > > > > > #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW= (DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched) > > > > > > > > > > > > +#define DRM_IOCTL_AMDGPU_SETPARAM DRM_IOW(DRM= _COMMAND_BASE + DRM_AMDGPU_SETPARAM, struct drm_amdgpu_setparam) > > > > > > > > > > > > = > > > > > > > > > > > > /** > > > > > > > > > > > > * DOC: memory domains > > > > > > > > > > > > @@ -306,6 +308,14 @@ union drm_amdgpu_sched { > > > > > > > > > > > > struct drm_amdgpu_sched_in in; > > > > > > > > > > > > }; > > > > > > > > > > > > = > > > > > > > > > > > > +#define AMDGPU_SETPARAM_NO_IMPLICIT_SYNC 1 > > > > > > > > > > > > + > > > > > > > > > > > > +struct drm_amdgpu_setparam { > > > > > > > > > > > > + /* AMDGPU_SETPARAM_* */ > > > > > > > > > > > > + __u32 param; > > > > > > > > > > > > + __u32 value; > > > > > > > > > > > > +}; > > > > > > > > > > > > + > > > > > > > > > > > > /* > > > > > > > > > > > > * This is not a reliable API and you should e= xpect it to fail for any > > > > > > > > > > > > * number of reasons and have fallback path th= at do not use userptr to > > > > > > > > > > > > -- > > > > > > > > > > > > 2.32.0.rc2 > > > > > > > > > > > > = > > > > = > > > > -- > > > > Daniel Vetter > > > > Software Engineer, Intel Corporation > > > > https://nam11.safelinks.protection.outlook.com/?url=3Dhttp%3A%2F%2F= blog.ffwll.ch%2F&data=3D04%7C01%7Cchristian.koenig%40amd.com%7C517f0d34= 67324e7ce05008d936581f60%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C63760= 0574408265873%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiL= CJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=3Dtc6ZdgYzOXpER4vpuOiOlyIsr7= YTAHLMcuFaNjSs6YE%3D&reserved=3D0 > = -- = Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx