From mboxrd@z Thu Jan 1 00:00:00 1970 From: Thomas Hellstrom Subject: Re: [PATCH 3/6] drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock held Date: Wed, 28 Nov 2012 16:10:02 +0100 Message-ID: <50B6294A.7070906@vmware.com> References: <1354101944-10455-1-git-send-email-maarten.lankhorst@canonical.com> <1354101944-10455-3-git-send-email-maarten.lankhorst@canonical.com> <50B5FB82.9020904@vmware.com> <50B6006A.6030107@canonical.com> <50B60FBC.6090907@vmware.com> <50B617D2.4090109@canonical.com> <50B61E66.1040805@vmware.com> <50B623CE.5070801@canonical.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: Received: from smtp-outbound-2.vmware.com (smtp-outbound-2.vmware.com [208.91.2.13]) by gabe.freedesktop.org (Postfix) with ESMTP id 933BBE5EB1 for ; Wed, 28 Nov 2012 07:10:07 -0800 (PST) In-Reply-To: <50B623CE.5070801@canonical.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org Errors-To: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org To: Maarten Lankhorst Cc: Maarten Lankhorst , dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org On 11/28/2012 03:46 PM, Maarten Lankhorst wrote: > Op 28-11-12 15:23, Thomas Hellstrom schreef: >> On 11/28/2012 02:55 PM, Maarten Lankhorst wrote: >>> Op 28-11-12 14:21, Thomas Hellstrom schreef: >>>> On 11/28/2012 01:15 PM, Maarten Lankhorst wrote: >>>>> Op 28-11-12 12:54, Thomas Hellstrom schreef: >>>>>> On 11/28/2012 12:25 PM, Maarten Lankhorst wrote: >>>>>>> By removing the unlocking of lru and retaking it immediately, a race is >>>>>>> removed where the bo is taken off the swap list or the lru list between >>>>>>> the unlock and relock. As such the cleanup_refs code can be simplified, >>>>>>> it will attempt to call ttm_bo_wait non-blockingly, and if it fails >>>>>>> it will drop the locks and perform a blocking wait, or return an error >>>>>>> if no_wait_gpu was set. >>>>>>> >>>>>>> The need for looping is also eliminated, since swapout and evict_mem_first >>>>>>> will always follow the destruction path, so no new fence is allowed >>>>>>> to be attached. As far as I can see this may already have been the case, >>>>>>> but the unlocking / relocking required a complicated loop to deal with >>>>>>> re-reservation. >>>>>>> >>>>>>> The downside is that ttm_bo_cleanup_memtype_use is no longer called with >>>>>>> reservation held, so drivers must be aware that move_notify with a null >>>>>>> parameter doesn't require a reservation. >>>>>> Why can't we unreserve *after* ttm_bo_cleanup_memtype_use? That's not >>>>>> immediately clear from this patch. >>>>> Because we would hold the reservation while waiting and with the object still >>>>> on swap and lru lists still, that would defeat the whole purpose of keeping >>>>> the object on multiple lists, plus break current code that assumes bo's on the >>>>> those lists can always be reserved. >>>>> >>>>> the if (ret && !no_wait_gpu) path has to drop the reservation and lru lock, and >>>>> isn't guaranteed to be able to retake it. Maybe it could be guaranteed now, but >>>>> I'm sure that would not be the case if the reservations were shared across >>>>> devices. >>>> The evict path removes the BO from the LRU lists, drops the LRU lock but hangs on to the reservation, >>>> and in case the wait goes wrong, re-adds the bo to the LRU lists and returns an error. >>> If you really want to, we could hang on to the !no_wait_gpu path, wait shouldn't ever fail there, so I suppose >>> leaving it off the lru lists and not re-add on any list in case of wait fail is fine. It's still on the ddestroy list in that >>> case, so not adding it back to the other lists is harmless. >>> >> Well I'm a bit afraid that theoretically, other callers may have a bo reserved, while cleanup_refs_and_unlock >> more or less runs the whole destroy path on that buffer. Sure, we have control over those other reservers, >> but it may come back and bite us. > That's why initially I moved all the destruction to ttm_bo_release_list, to have all destruction in > only 1 place. But even now it's serialized with the lru lock, while the destruction may not happen > right away, it still happens before last list ref to the bo is dropped. > > But it's your call, just choose the approach you want and I'll resubmit this. :-) > >> Also the wait might fail if a signal is hit, so it's definitely possible, and even likely in the case of the X server process. >> >> Anyway, I prefer if we could try to keep the reservation across the ttm_cleanup_memtype_use function, and as far >> as I can tell, the only thing preventing that is the reservation release in the (!no_wait_gpu) path. So if we alter that to >> do the same as the evict path I think without looking to deeply into the consequences that we should be safe. > I think returning success early without removing off ddestroy list if re-reserving fails > with lru lock held would be better. > > We completed the wait and attempt to reserve the bo, which failed. Without the lru > lock atomicity, this can't happen since the only places that would do it call this with > the lru lock held. > > With the atomicity removal, the only place that could do this is ttm_bo_delayed_delete > with remove_all set to true. And even if that happened the destruction code would run > *anyway* since we completed the waiting part already, it would just not necessarily be > run from this thread, but that guarantee didn't exist anyway. >> Then we should be able to skip patch 2 as well. > If my tryreserve approach sounds sane, second patch should still be skippable. :-) Sure, Lets go for that approach. > > ~Maarten /Thomas