From mboxrd@z Thu Jan 1 00:00:00 1970
From: Maarten Lankhorst
Subject: Re: [PATCH] drm/nv84-: write fence value on exit, and restore value on init.
Date: Wed, 04 Sep 2013 14:37:07 +0200
Message-ID: <52272973.9010303@canonical.com>
References: <1378132262-19453-1-git-send-email-maarten.lankhorst@canonical.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: nouveau-bounces+gcfxn-nouveau=m.gmane.org-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Errors-To: nouveau-bounces+gcfxn-nouveau=m.gmane.org-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
To: Ben Skeggs
Cc: "nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org", "dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
List-Id: nouveau.vger.kernel.org

On 04-09-13 05:21, Ben Skeggs wrote:
> On Tue, Sep 3, 2013 at 12:31 AM, Maarten Lankhorst wrote:
>> This increases the chance slightly that recovery from lockup can happen
>> successfully.
> I'd *really* love to see proof of this. When channels die, all
> outstanding fences are marked as signalled. This should do absolutely
> nothing...

nv84+ relies heavily on fences though, and a race like this is possible:
- channel 0 uses a bo from channel 1 and queues a wait for it somewhere in the command stream.
- channel 1 dies cleanly, but userspace creates a new channel in its place, and the fence counter is reset to 0.
- channel 0 reaches the NV84_SUBCHAN_SEMAPHORE_TRIGGER.ACQUIRE_GEQUAL op and waits forever for the fence in channel 1 to signal.

Channel 0 could be the global drm channel used for buffer moves, which would result in a hang. This may seem unlikely, but I believe that parallel piglit runs could trigger it.
If not, it could be triggered by creating an operation that takes a few seconds in channel 0, then queuing a command that uses a bo from channel 1 while channel 1 is still busy, and then deleting and recreating channel 1.

~Maarten