From: Christian König
Subject: Re: Reworking of GPU reset logic
Date: Sat, 21 Apr 2012 11:42:01 +0200
Message-ID: <4F9280E9.8030100@vodafone.de>
References: <1334875160-5454-1-git-send-email-deathsimple@vodafone.de>
To: Jerome Glisse
Cc: dri-devel@lists.freedesktop.org

On 20.04.2012 01:47, Jerome Glisse wrote:
> 2012/4/19 Christian König:
>> This includes mostly fixes for multi ring lockups and GPU resets, but it
>> should generally improve the behavior of the kernel mode driver in case
>> something goes badly wrong.
>>
>> On the other hand it completely rewrites the IB pool and semaphore
>> handling, so I think there are still a couple of problems in it.
>>
>> The first four patches were already sent to the list, but the current
>> set depends on them so I am resending them.
>>
>> Cheers,
>> Christian.
> I did a quick review, it looks mostly good, but as it's sensitive code
> I would like to spend some time on it. Probably next week. Note that I
> had some work in this area too, I mostly want to drop all the debugfs
> related to this and add something new and more useful (basically
> something that allows you to read all the data needed to replay a
> locking up IB).
> I also was looking into Dave's reset thread, and your solution of moving
> the reset into the ioctl return path sounds good too, but I need to
> convince myself that it encompasses all possible cases.
>
> Cheers,
> Jerome
>
After sleeping on it for a night I have already reworked the patch for
improving the SA performance, so please wait at least for v2 before
taking a look at it :)

Regarding the debugging of lockups, I had the following on my mental
todo list:

1. Rework the chip-specific lockup detection code a bit more and
   probably clean it up a bit.
2. Make the timeout a module parameter, because compute tasks sometimes
   block a ring for more than 10 seconds.
3. Keep track of the actual RPTR offset a fence is emitted to.
4. Keep track of all the BOs an IB is touching.
5. Now if a lockup happens, start with the last successfully signaled
   fence and dump the ring content after that RPTR offset up to the
   first unsignaled fence.
6. Then, if this fence refers to an IB, dump its content and the BOs
   it is touching.
7. Dump everything on the ring after that fence until you reach the RPTR
   of the next fence or the WPTR of the ring.
8. If there is a next fence, repeat the whole thing from number 6.

If I'm not completely wrong, that should give you practically all the
information available, and we should probably put that behind another
module option, because we are going to spam syslog quite heavily here.
Feel free to add to or modify the ideas on this list.

Christian.
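
To make the dump walk from the list above a bit more concrete, here is a
rough userspace sketch in plain C. It is not driver code: struct
sketch_fence, dump_ring() and dump_lockup() are made-up names, fences are
assumed to be kept in emission order and to signal in order, and the BO
dumping is left out completely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified bookkeeping; not the radeon driver's structures. */
struct sketch_fence {
    uint32_t rptr;        /* ring offset (in dwords) the fence was emitted to */
    int signaled;         /* nonzero once the GPU has signaled this fence */
    const uint32_t *ib;   /* content of the IB the fence refers to, or NULL */
    unsigned ib_len;      /* IB length in dwords */
};

/* Dump ring dwords in [from, to), wrapping at ring_size.
 * Returns the number of dwords printed. */
static unsigned dump_ring(const uint32_t *ring, unsigned ring_size,
                          unsigned from, unsigned to)
{
    unsigned n = 0;

    assert(ring_size > 0);
    for (unsigned i = from % ring_size; i != to % ring_size;
         i = (i + 1) % ring_size, n++)
        printf("ring[%04u] = 0x%08x\n", i, (unsigned)ring[i]);
    return n;
}

/* Walk the ring from the last signaled fence to the WPTR.
 * fences[] is ordered by emission; returns the total dwords dumped. */
static unsigned dump_lockup(const uint32_t *ring, unsigned ring_size,
                            unsigned wptr,
                            const struct sketch_fence *fences,
                            unsigned nfences)
{
    unsigned first = 0, pos = 0, total = 0;

    /* Start after the last successfully signaled fence. If no fence
     * has signaled yet, start at offset 0 (an assumption of this sketch). */
    while (first < nfences && fences[first].signaled) {
        pos = fences[first].rptr;
        first++;
    }

    for (unsigned i = first; i < nfences; i++) {
        /* Ring content up to this unsignaled fence. */
        total += dump_ring(ring, ring_size, pos, fences[i].rptr);

        /* If the fence refers to an IB, dump its content as well. */
        for (unsigned j = 0; fences[i].ib && j < fences[i].ib_len;
             j++, total++)
            printf("  ib[%04u] = 0x%08x\n", j, (unsigned)fences[i].ib[j]);

        pos = fences[i].rptr;
    }

    /* Whatever is left between the last fence and the WPTR. */
    return total + dump_ring(ring, ring_size, pos, wptr);
}
```

In the driver the printf() calls would be kernel log output behind the
module option mentioned above, and the per-fence data would have to come
from the RPTR/BO tracking the list proposes. The sketch only shows the
order in which things get dumped.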