From: Christian König
Subject: Re: CIK hangs with kernel 3.15, bisected
Date: Thu, 29 May 2014 18:30:51 +0200
Message-ID: <538760BB.4090208@vodafone.de>
References: <53735D79.6050904@vodafone.de> <5385BCBD.7010803@vodafone.de>
In-Reply-To: <5385BCBD.7010803@vodafone.de>
To: Marek Olšák, Alex Deucher
Cc: dri-devel
List-Id: dri-devel@lists.freedesktop.org

Hi Marek & Alex,

I've found the reason why forcefully evicting page tables sometimes
crashes the box.

This is a hexdump of a typical page table before it is moved to GART:

000117f000 02914061 00000000
000117f008 02915061 00000000
000117f010 02916061 00000000
000117f018 02917061 00000000
000117f020 02918061 00000000

And it looks like this when it comes back:

0006102000 00000000 00000000
*

Ideas? I don't really have an explanation for this. Moving buffers
around otherwise seems to work perfectly fine.

Thanks,
Christian.

On 28.05.2014 12:38, Christian König wrote:
> I already tried a similar patch as well, without any more noticeable
> crashes.
> But going to give this another round with your patch and openarena.
>
> Thanks,
> Christian.
>
> On 27.05.2014 23:55, Marek Olšák wrote:
>> Hi Christian,
>>
>> I test on Bonaire (ChipID = 0x665c). Unfortunately, the hangs are not
>> fixed yet. They are very rare and very random. Therefore, I have come
>> up with a patch which evicts page tables between IBs. See the
>> attachment. With that patch applied, the system starts fine, compiz
>> and glxgears work, but once I start playing openarena, it locks up
>> pretty quickly.
>>
>> The patch shouldn't do anything in theory, because pages are moved
>> back to VRAM immediately after that. However, the VRAM address of page
>> tables may end up being different from before, which might be the root
>> cause.
>>
>> Marek
>>
>> On Wed, May 14, 2014 at 2:11 PM, Christian König
>> wrote:
>>> Crap, any chance you can narrow it down a bit more?
>>>
>>> I've just tried a piglit quick test on my Bonaire and it seems to work
>>> perfectly fine.
>>>
>>> What hw do you test on?
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 13.05.2014 23:21, Marek Olšák wrote:
>>>
>>>> Hi Christian,
>>>>
>>>> Even though some regressions are fixed by these patches:
>>>>
>>>>     drm/radeon: fix page directory update size estimation
>>>>     drm/radeon: fix buffer placement under memory pressure v2
>>>>
>>>> and indeed, the texelFetch tests no longer hang, there is one more
>>>> hang which needs to be fixed. :( All I know is the exact same commit
>>>> causes it and it can only be reproduced by running whole piglit with
>>>> concurrency enabled.
>>>>
>>>> My kernel git log:
>>>>
>>>> * 2ba22c8 - drm/radeon: fix buffer placement under memory pressure v2
>>>> (10 hours ago)
>>>> * 3af91e5 - drm/radeon: fix page directory update size estimation (21
>>>> hours ago)
>>>> * 6d2f294 - drm/radeon: use normal BOs for the page tables v4 (2
>>>> months ago)
>>>> * fa68834 - drm/radeon: further cleanup vm flushing & fencing (2
>>>> months ago)
>>>>
>>>> fa68834 doesn't hang, but 2ba22c8 hangs, which means 6d2f294 or either
>>>> of the two fixes is the first bad commit.
>>>>
>>>> Marek
>>>>
>>>> On Fri, May 9, 2014 at 8:03 PM, Marek Olšák wrote:
>>>>> Hi Christian,
>>>>>
>>>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>>>
>>>>> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
>>>>> Author: Christian König
>>>>> Date: Thu Feb 20 13:42:17 2014 +0100
>>>>>
>>>>>     drm/radeon: use normal BOs for the page tables v4
>>>>>
>>>>>     No need to make it more complicated than necessary,
>>>>>     just allocate the page tables as normal BO and
>>>>>     flush whenever the address change.
>>>>>
>>>>>     v2: update comments and function name
>>>>>     v3: squash bug fixes, page directory and tables patch
>>>>>     v4: rebased on Mareks changes
>>>>>
>>>>>     Signed-off-by: Christian König
>>>>>
>>>>>
>>>>> Reverting the commit gives me a lot of merge conflicts.
>>>>>
>>>>> The simplest way to reproduce the hangs is to run piglit with these
>>>>> parameters:
>>>>> -t texelFetch.fs
>>>>>
>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>> causes buffer evictions.
>>>>>
>>>>> Any idea what is wrong with it?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Marek
>>>
>
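
For comparing the two hexdumps at the top of this message, a minimal
sketch of a script that flags page table entries that came back invalid.
It assumes each dump line is "address, low dword, high dword" of one
64-bit entry and that bit 0 of an entry is its valid flag; neither
assumption is stated in the thread, and the helper names `parse_dump`
and `invalid_entries` are made up for illustration.

```python
# Sketch only: the dump layout (address, low dword, high dword) and the
# use of bit 0 as the "valid" flag are assumptions, not taken from the thread.
PTE_VALID = 1 << 0  # assumed valid bit


def parse_dump(text):
    """Return a list of (address, entry) tuples from 'addr lo hi' lines."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip '*' repeat markers and blank lines
        addr, lo, hi = (int(field, 16) for field in fields)
        entries.append((addr, (hi << 32) | lo))
    return entries


def invalid_entries(entries):
    """Return the addresses of entries whose assumed valid bit is clear."""
    return [addr for addr, pte in entries if not pte & PTE_VALID]


# Dumps copied from the message above.
before = """\
000117f000 02914061 00000000
000117f008 02915061 00000000
000117f010 02916061 00000000
000117f018 02917061 00000000
000117f020 02918061 00000000
"""
after = """\
0006102000 00000000 00000000
*
"""

print(invalid_entries(parse_dump(before)))
print([hex(addr) for addr in invalid_entries(parse_dump(after))])
```

The '*' line in a hexdump stands for repeated identical rows, so the
sketch only reports the first zeroed entry of such a run.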