From mboxrd@z Thu Jan 1 00:00:00 1970
From: Grigori Goronzy
Subject: Re: CIK hangs with kernel 3.15, bisected
Date: Fri, 30 May 2014 02:30:00 +0200
Message-ID: <5387D108.3050503@chown.ath.cx>
References: <536D4B08.8030000@chown.ath.cx> <536DE204.5060308@vodafone.de>
 <536E552F.6010401@vodafone.de> <536F3D93.6050604@vodafone.de>
 <5370C3AB.1080903@vodafone.de> <53715B76.9040304@chown.ath.cx>
 <537224C9.60804@vodafone.de> <53723ABD.6030704@vodafone.de>
 <53727E52.20502@chown.ath.cx>
Mime-Version: 1.0
Content-Type: text/plain; charset="windows-1252"; Format="flowed"
Content-Transfer-Encoding: quoted-printable
Received: from pygmy.kinoho.net (pygmy.kinoho.net [134.0.27.24])
 by gabe.freedesktop.org (Postfix) with ESMTP id D90C66E2CB
 for ; Thu, 29 May 2014 17:30:05 -0700 (PDT)
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: Marek Olšák
Cc: dri-devel
List-Id: dri-devel@lists.freedesktop.org

On 13.05.2014 22:27, Marek Olšák wrote:
> I applied these two patches Christian sent to dri-devel:
>
> drm/radeon: fix page directory update size estimation
> drm/radeon: fix buffer placement under memory pressure v2
>
> on top of torvalds's master branch.
>

With latest kernel master (a991639c) I still see a regression compared
to 3.13 or 3.14, which have similar performance. Xonotic is about 7%
slower. OpenArena and Unigine Tropics are also noticeably slower, but I
didn't record accurate numbers.

Maybe the improved memory management has some overhead, but this is not
acceptable IMHO. I'll try to investigate further.

Best regards
Grigori

> Marek
>
> On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy wrote:
>> On 13.05.2014 21:50, Marek Olšák wrote:
>>>
>>> Hi Christian,
>>>
>>> The performance regression I saw with piglit seems to be fixed with
>>> latest kernel git.
>>> It's difficult to bisect the kernel, because there
>>> are only merges between 3.14 and 3.15 and the merged commits are
>>> actually based on 3.14-rc1 and 3.14-rc4.
>>>
>>> All seems to be fine with your fixes.
>>>
>>
>> Which fixes have you applied? There are quite a few pending patches on
>> dri-devel that aren't yet part of drm-fixes-3.15.
>>
>> Grigori
>>
>>
>>> Marek
>>>
>>> On Tue, May 13, 2014 at 5:31 PM, Christian König
>>> wrote:
>>>>
>>>> Is the performance regression caused by the page table changes or
>>>> something else?
>>>>
>>>> I ran some tests with xonotic while developing it and they didn't
>>>> show anything obvious, but I didn't test on different systems.
>>>>
>>>> Christian.
>>>>
>>>> On 13.05.2014 17:19, Marek Olšák wrote:
>>>>
>>>>> Your latest patches fix the regression.
>>>>>
>>>>> The performance regression can also be reproduced with piglit "-t
>>>>> texelFetch.fs".
>>>>>
>>>>> Kernel 3.14:
>>>>> real 0m17.724s
>>>>> user 0m41.905s
>>>>> sys 0m11.299s
>>>>>
>>>>> The problematic commit checked out + your fixes (without the PTE patch I
>>>>> think):
>>>>> real 0m23.474s
>>>>> user 1m1.008s
>>>>> sys 0m13.812s
>>>>>
>>>>> Marek
>>>>>
>>>>>
>>>>> On Tue, May 13, 2014 at 3:57 PM, Christian König
>>>>> wrote:
>>>>>>
>>>>>> On 13.05.2014 15:22, Alex Deucher wrote:
>>>>>>
>>>>>>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I can confirm this fixes it for me, too.
>>>>>>>>
>>>>>>>> 3.15 with these fixes and the large PTE patches actually ends up being
>>>>>>>> noticeably slower than earlier kernels with Xonotic, though. I wonder
>>>>>>>> what's going on.
>>>>>>>
>>>>>>> Allocation overhead?
>>>>>>
>>>>>> Unlikely, Xonotic just allocates a single page table at start, which then
>>>>>> gets extended at a certain rate until they no longer need more address
>>>>>> space and are done with it.
>>>>>>
>>>>>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>>> Grigori
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12.05.2014 14:50, Christian König wrote:
>>>>>>>>>
>>>>>>>>> I could reproduce the problem with xonotic and I think I've found
>>>>>>>>> the issue.
>>>>>>>>>
>>>>>>>>> Please test the attached patch.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> On 11.05.2014 11:06, Christian König wrote:
>>>>>>>>>>>
>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>
>>>>>>>>>> Yeah, thought so. Well it was just a guess.
>>>>>>>>>>
>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>>>> added for userspace buffers.)
>>>>>>>>>>
>>>>>>>>>> Actually it shouldn't affect that. The alternative domain always
>>>>>>>>>> contains GART even when userspace only specified VRAM as placement
>>>>>>>>>> (as long as it is technically possible to do so).
>>>>>>>>>>
>>>>>>>>>> So what should happen is that TTM sees the current placement,
>>>>>>>>>> matches that with the desired placement and should find that it
>>>>>>>>>> doesn't need to move the buffer (we should just test if this
>>>>>>>>>> behavior really works as expected).
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>> On 10.05.2014 23:38, Marek Olšák wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Christian,
>>>>>>>>>>>
>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>>
>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>>>> added for userspace buffers.)
>>>>>>>>>>>
>>>>>>>>>>> Marek
>>>>>>>>>>>
>>>>>>>>>>> On Sat, May 10, 2014 at 6:34 PM, Christian König
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Couldn't reproduce the issue so far.
>>>>>>>>>>>> So the attached patch is just a
>>>>>>>>>>>> complete shot in the dark found by rereading the code, but it might
>>>>>>>>>>>> actually be the problem.
>>>>>>>>>>>>
>>>>>>>>>>>> Please give it a try.
>>>>>>>>>>>>
>>>>>>>>>>>> Going to keep testing in the meantime,
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>> On 10.05.2014 10:23, Christian König wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g.
>>>>>>>>>>>>>> if I boot
>>>>>>>>>>>>>> with radeon.vramlimit=256 and then run Xonotic timedemo with
>>>>>>>>>>>>>> high settings.
>>>>>>>>>>>>>> I haven't had a chance to bisect it yet, but it might be a
>>>>>>>>>>>>>> similar problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sounds like the same issue to me. Thx for the good test case.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any idea what is wrong with it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Actually, I was already wondering why it had gone so smoothly
>>>>>>>>>>>>> without any regression so far; I hadn't noticed the bug in
>>>>>>>>>>>>> bugzilla.kernel.org yet.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests
>>>>>>>>>>>>>> also
>>>>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and
>>>>>>>>>>>>>> probably
>>>>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sounds like the underlying problem to me. We probably evict some
>>>>>>>>>>>>> part of a page table without updating the page directory. Going
>>>>>>>>>>>>> to dig into it today, it's probably just a one-liner missing
>>>>>>>>>>>>> somewhere in the VM code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 09.05.2014 23:39, Grigori Goronzy wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 09.05.2014 20:03, Marek Olšák wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This commit which first appeared in 3.15-rc1 causes hangs on
>>>>>>>>>>>>>>> Bonaire:
>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The simplest way to reproduce the hangs is to run piglit with
>>>>>>>>>>>>>>> these parameters:
>>>>>>>>>>>>>>> -t texelFetch.fs
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the
>>>>>>>>>>>>>>> tests also
>>>>>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and
>>>>>>>>>>>>>>> probably
>>>>>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g.
>>>>>>>>>>>>>> if I boot
>>>>>>>>>>>>>> with radeon.vramlimit=256 and then run Xonotic timedemo with
>>>>>>>>>>>>>> high settings.
>>>>>>>>>>>>>> I haven't had a chance to bisect it yet, but it might be a
>>>>>>>>>>>>>> similar problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Grigori
>>>>>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> dri-devel mailing list
>>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>
>>>>
>>
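
For reference, the piglit "-t texelFetch.fs" timings quoted in this thread (kernel 3.14 versus the problematic commit plus fixes) can be turned into relative slowdowns with a few lines of Python. This is only a rough sketch: the helper names parse_time and slowdown_percent are made up for illustration, and the only data used are the six values from the quoted `time` output.

```python
# Hypothetical helpers for illustration; only the timing values come from the thread.

def parse_time(value: str) -> float:
    """Convert a `time`-style duration such as '1m1.008s' to seconds."""
    minutes, seconds = value.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def slowdown_percent(baseline: str, candidate: str) -> float:
    """Return how much longer `candidate` took than `baseline`, in percent."""
    base = parse_time(baseline)
    return (parse_time(candidate) - base) / base * 100.0


# piglit "-t texelFetch.fs" timings as quoted in the thread.
KERNEL_3_14 = {"real": "0m17.724s", "user": "0m41.905s", "sys": "0m11.299s"}
COMMIT_PLUS_FIXES = {"real": "0m23.474s", "user": "1m1.008s", "sys": "0m13.812s"}

if __name__ == "__main__":
    for key in ("real", "user", "sys"):
        pct = slowdown_percent(KERNEL_3_14[key], COMMIT_PLUS_FIXES[key])
        print(f"{key}: {pct:+.1f}%")
```

On these two runs, wall-clock time comes out about 32% higher than on 3.14. That is a single measurement per kernel, so it should be read as an indication rather than a benchmark result.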