* CIK hangs with kernel 3.15, bisected
@ 2014-05-09 18:03 Marek Olšák
  2014-05-09 18:10 ` Rafał Miłecki
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Marek Olšák @ 2014-05-09 18:03 UTC (permalink / raw)
  To: Christian König, dri-devel

Hi Christian,

This commit, which first appeared in 3.15-rc1, causes hangs on Bonaire:

commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
Author: Christian König <christian.koenig@amd.com>
Date:   Thu Feb 20 13:42:17 2014 +0100

    drm/radeon: use normal BOs for the page tables v4

    No need to make it more complicated than necessary,
    just allocate the page tables as normal BO and
    flush whenever the address change.

    v2: update comments and function name
    v3: squash bug fixes, page directory and tables patch
    v4: rebased on Mareks changes

    Signed-off-by: Christian König <christian.koenig@amd.com>


Reverting the commit gives me a lot of merge conflicts.

The simplest way to reproduce the hangs is to run piglit with these parameters:
-t texelFetch.fs

Some of the tests allocate a lot of MSAA textures and the tests also
run in parallel, which creates a lot of memory pressure and probably
causes buffer evictions.

Any idea what is wrong with it?

Thanks,

Marek
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-09 18:03 CIK hangs with kernel 3.15, bisected Marek Olšák
@ 2014-05-09 18:10 ` Rafał Miłecki
  2014-05-09 21:39 ` Grigori Goronzy
  2014-05-13 21:21 ` Marek Olšák
  2 siblings, 0 replies; 31+ messages in thread
From: Rafał Miłecki @ 2014-05-09 18:10 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

On 9 May 2014 20:03, Marek Olšák <maraeo@gmail.com> wrote:
> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>
> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
> Author: Christian König <christian.koenig@amd.com>
> Date:   Thu Feb 20 13:42:17 2014 +0100
>
>     drm/radeon: use normal BOs for the page tables v4

Also reported in:
https://bugzilla.kernel.org/show_bug.cgi?id=75651

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-09 18:03 CIK hangs with kernel 3.15, bisected Marek Olšák
  2014-05-09 18:10 ` Rafał Miłecki
@ 2014-05-09 21:39 ` Grigori Goronzy
  2014-05-10  8:23   ` Christian König
  2014-05-13 21:21 ` Marek Olšák
  2 siblings, 1 reply; 31+ messages in thread
From: Grigori Goronzy @ 2014-05-09 21:39 UTC (permalink / raw)
  To: Marek Olšák, Christian König, dri-devel

On 09.05.2014 20:03, Marek Olšák wrote:
>
> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>[...]
>
> The simplest way to reproduce the hangs is to run piglit with these parameters:
> -t texelFetch.fs
>
> Some of the tests allocate a lot of MSAA textures and the tests also
> run in parallel, which creates a lot of memory pressure and probably
> causes buffer evictions.
>

I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I 
boot with radeon.vramlimit=256 and then run Xonotic timedemo with high 
settings. I haven't had a chance to bisect it yet, but it might be a 
similar problem.

Grigori

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-09 21:39 ` Grigori Goronzy
@ 2014-05-10  8:23   ` Christian König
  2014-05-10 16:34     ` Christian König
  0 siblings, 1 reply; 31+ messages in thread
From: Christian König @ 2014-05-10  8:23 UTC (permalink / raw)
  To: Grigori Goronzy, Marek Olšák, dri-devel

> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I 
> boot with radeon.vramlimit=256 and then run Xonotic timedemo with high 
> settings. I haven't had a chance to bisect it yet, but it might be a 
> similar problem.
Sounds like the same issue to me. Thx for the good test case.

> Any idea what is wrong with it?
Actually, I was already wondering how it had gone so smoothly without any
regressions so far; I hadn't noticed the bug in bugzilla.kernel.org yet.

> Some of the tests allocate a lot of MSAA textures and the tests also
> run in parallel, which creates a lot of memory pressure and probably
> causes buffer evictions.
Sounds like the underlying problem to me. We probably evict part of a
page table without updating the page directory. I'm going to dig into it
today; it's probably just a missing one-liner somewhere in the VM code.

Christian.
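The commit message says the page tables are now normal BOs and that the page directory is flushed whenever their addresses change. A minimal model of that idea, under stated assumptions, is a pass that compares each page table's current GPU address with the value last written into its directory entry and rewrites only the entries that differ. If an evicted page table moves but this pass does not pick it up, the directory keeps pointing at the stale address, which is the failure mode suspected above. All names here (struct page_table, struct page_directory, update_page_directory, NUM_PDES) are hypothetical; this is not the radeon_vm.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, deliberately tiny directory size for illustration. */
#define NUM_PDES 8

/* A page table BO: its GPU address can change when it is evicted and moved. */
struct page_table {
	uint64_t gpu_addr;
};

/* A page directory: one entry (PDE) per page table slot. */
struct page_directory {
	struct page_table *pt[NUM_PDES]; /* NULL if the slot has no table */
	uint64_t pde[NUM_PDES];          /* address last written into each PDE */
};

/*
 * Rewrite every PDE whose page table has moved since the last update and
 * return how many entries were rewritten. In a real driver each rewrite
 * would be emitted as GPU commands rather than a plain store.
 */
static size_t update_page_directory(struct page_directory *pd)
{
	size_t updated = 0;

	assert(pd != NULL);
	for (size_t i = 0; i < NUM_PDES; i++) {
		const struct page_table *pt = pd->pt[i];

		if (pt == NULL)
			continue;
		if (pd->pde[i] != pt->gpu_addr) {
			pd->pde[i] = pt->gpu_addr;
			updated++;
		}
	}
	return updated;
}
```

In this model, a second pass with no moves rewrites nothing, and moving a single page table causes exactly one entry to be rewritten.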

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-10  8:23   ` Christian König
@ 2014-05-10 16:34     ` Christian König
  2014-05-10 21:38       ` Marek Olšák
  0 siblings, 1 reply; 31+ messages in thread
From: Christian König @ 2014-05-10 16:34 UTC (permalink / raw)
  To: Grigori Goronzy, Marek Olšák, dri-devel

[-- Attachment #1: Type: text/plain, Size: 1979 bytes --]

I couldn't reproduce the issue so far, so the attached patch is just a
complete shot in the dark based on rereading the code, but it might
actually be the problem.

Please give it a try.

Going to keep testing in the meantime,
Christian.

[-- Attachment #2: 0001-drm-radeon-fix-buffer-placement-under-memory-pressur.patch --]
[-- Type: text/x-diff; name="0001-drm-radeon-fix-buffer-placement-under-memory-pressur.patch", Size: 1944 bytes --]

From 93a89ae1bdf359a4261ae0120ba893039a6f05be Mon Sep 17 00:00:00 2001
From: Christian König <christian.koenig@amd.com>
Date: Sat, 10 May 2014 18:17:09 +0200
Subject: [PATCH] drm/radeon: fix buffer placement under memory pressure
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Some buffers (UVD/VM page tables) must be placed in VRAM,
but the byte restriction for moving buffers didn't take this
into account.

This patch not only fixes that bug but also improves
the situation when we run out of GART space.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/radeon_object.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 72705fb..92ff6be 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -447,8 +447,6 @@ int radeon_bo_list_validate(struct radeon_device *rdev,
 		bo = lobj->robj;
 		if (!bo->pin_count) {
 			u32 domain = lobj->domain;
-			u32 current_domain =
-				radeon_mem_type_to_domain(bo->tbo.mem.mem_type);
 
 			/* Check if this buffer will be moved and don't move it
 			 * if we have moved too many buffers for this IB already.
@@ -458,11 +456,10 @@ int radeon_bo_list_validate(struct radeon_device *rdev,
 			 * into account. We don't want to disallow buffer moves
 			 * completely.
 			 */
-			if (current_domain != RADEON_GEM_DOMAIN_CPU &&
-			    (domain & current_domain) == 0 && /* will be moved */
-			    bytes_moved > bytes_moved_threshold) {
-				/* don't move it */
-				domain = current_domain;
+			if (bytes_moved > bytes_moved_threshold) {
+				/* if we already moved too many bytes, accept
+				   the alternative domain as well */
+				domain = lobj->alt_domain;
 			}
 
 		retry:
-- 
1.9.1
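The patch above changes the move-budget rule in radeon_bo_list_validate(). A simplified, standalone model of the old and new rules makes the difference visible; it strips out TTM, pinning and the retry path. The DOMAIN_* values are local stand-ins assumed to match RADEON_GEM_DOMAIN_CPU/GTT/VRAM, and the sketch assumes, as the commit message implies, that a VRAM-only buffer such as a VM page table also has VRAM as its alternative domain. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins, assumed to match RADEON_GEM_DOMAIN_CPU/GTT/VRAM. */
#define DOMAIN_CPU  0x1u
#define DOMAIN_GTT  0x2u
#define DOMAIN_VRAM 0x4u

/*
 * Pre-patch rule: once more than threshold bytes have been moved for this
 * IB, a buffer that would otherwise move stays in its current domain.
 */
static uint32_t pick_domain_old(uint32_t domain, uint32_t current_domain,
				uint64_t bytes_moved, uint64_t threshold)
{
	assert(domain != 0);
	if (current_domain != DOMAIN_CPU &&
	    (domain & current_domain) == 0 && /* will be moved */
	    bytes_moved > threshold)
		return current_domain;            /* don't move it */
	return domain;
}

/* Patched rule: once over budget, accept the alternative domain instead. */
static uint32_t pick_domain_new(uint32_t domain, uint32_t alt_domain,
				uint64_t bytes_moved, uint64_t threshold)
{
	assert(domain != 0);
	return bytes_moved > threshold ? alt_domain : domain;
}
```

With the budget exceeded, pick_domain_old(DOMAIN_VRAM, DOMAIN_GTT, 200, 100) leaves a VRAM-only page table in GTT, while pick_domain_new(DOMAIN_VRAM, DOMAIN_VRAM, 200, 100) still returns VRAM.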


* Re: CIK hangs with kernel 3.15, bisected
  2014-05-10 16:34     ` Christian König
@ 2014-05-10 21:38       ` Marek Olšák
  2014-05-11  9:06         ` Christian König
  0 siblings, 1 reply; 31+ messages in thread
From: Marek Olšák @ 2014-05-10 21:38 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

Hi Christian,

I have tested it and it doesn't fix the hangs.

(Also, I don't like the patch, because it reverts the behavior I added
for userspace buffers.)

Marek

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-10 21:38       ` Marek Olšák
@ 2014-05-11  9:06         ` Christian König
  2014-05-12 12:50           ` Christian König
  0 siblings, 1 reply; 31+ messages in thread
From: Christian König @ 2014-05-11  9:06 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

> I have tested it and it doesn't fix the hangs.
Yeah, thought so. Well it was just a guess.

> (Also, I don't like the patch, because it reverts the behavior I added
> for userspace buffers.)
Actually it shouldn't affect that. The alternative domain always
contains GART, even when userspace only specified VRAM as placement (as
long as that is technically possible).

So what should happen is that TTM compares the current placement with
the desired placement and finds that it doesn't need to move the buffer
(we should just test whether this really works as expected).

Christian.
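The argument above can be modeled with bitmasks: if the alternative domain is the requested domain plus GTT whenever GTT is allowed, then a VRAM-only request that currently sits in GTT is already acceptable and needs no move. This is a hypothetical sketch of the reasoning, not TTM's placement code; alt_domain_for, needs_move and the DOMAIN_* values are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins, assumed to match RADEON_GEM_DOMAIN_GTT/VRAM. */
#define DOMAIN_GTT  0x2u
#define DOMAIN_VRAM 0x4u

/*
 * Hypothetical: the alternative domain is the requested domain plus GTT,
 * unless the buffer cannot be placed in GTT at all.
 */
static uint32_t alt_domain_for(uint32_t requested, bool gtt_allowed)
{
	assert(requested != 0);
	return gtt_allowed ? (requested | DOMAIN_GTT) : requested;
}

/* A buffer has to move only if its current domain is not acceptable. */
static bool needs_move(uint32_t acceptable, uint32_t current_domain)
{
	return (acceptable & current_domain) == 0;
}
```

Under these assumptions a VRAM-only userspace buffer already resident in GTT is left in place, which is why the patch is not expected to change userspace behavior; whether TTM really behaves this way is exactly what remains to be tested.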

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-11  9:06         ` Christian König
@ 2014-05-12 12:50           ` Christian König
  2014-05-12 23:38             ` Grigori Goronzy
  0 siblings, 1 reply; 31+ messages in thread
From: Christian König @ 2014-05-12 12:50 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

[-- Attachment #1: Type: text/plain, Size: 3374 bytes --]

I could reproduce the problem with xonotic and I think I've found the issue.

Please test the attached patch.

Thanks,
Christian.

[-- Attachment #2: 0001-drm-radeon-fix-page-directory-update-size-estimation.patch --]
[-- Type: text/x-diff; name="0001-drm-radeon-fix-page-directory-update-size-estimation.patch", Size: 1017 bytes --]

From a81682b06f702ea35ccc7fdc176c3f8db6cff138 Mon Sep 17 00:00:00 2001
From: Christian König <christian.koenig@amd.com>
Date: Mon, 12 May 2014 14:46:11 +0200
Subject: [PATCH] drm/radeon: fix page directory update size estimation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Take padding into account as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/radeon_vm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index 2aae6ce..d9ab99f 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -595,7 +595,7 @@ int radeon_vm_update_page_directory(struct radeon_device *rdev,
 	ndw = 64;
 
 	/* assume the worst case */
-	ndw += vm->max_pde_used * 12;
+	ndw += vm->max_pde_used * 16;
 
 	/* update too big for an IB */
 	if (ndw > 0xfffff)
-- 
1.9.1
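This patch only changes the worst-case dword estimate for a page directory update from 12 to 16 dwords per used PDE, on top of a fixed 64 dwords, and still rejects anything above 0xfffff. The sketch below reproduces that arithmetic. One plausible reading, not stated in the patch itself, is that the indirect buffer is sized from this estimate, so an estimate that ignores padding lets the emitted commands outgrow the buffer. pd_update_ndw and pd_update_fits are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper bound used by the patch ("update too big for an IB"). */
#define PD_UPDATE_MAX_NDW 0xfffffu

/*
 * Worst-case dword count for a page directory update: a fixed 64 dwords plus
 * dw_per_pde for each PDE up to max_pde_used (12 before the fix, 16 after).
 * 64-bit arithmetic avoids overflow for large max_pde_used values.
 */
static uint64_t pd_update_ndw(uint32_t max_pde_used, uint32_t dw_per_pde)
{
	assert(dw_per_pde > 0);
	return 64u + (uint64_t)max_pde_used * dw_per_pde;
}

/* True if an update of ndw dwords is small enough for a single IB. */
static bool pd_update_fits(uint64_t ndw)
{
	return ndw <= PD_UPDATE_MAX_NDW;
}
```

For max_pde_used = 100 the old formula reserves 64 + 100 * 12 = 1264 dwords and the new one 64 + 100 * 16 = 1664, so the fix adds 4 dwords of headroom per PDE.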


* Re: CIK hangs with kernel 3.15, bisected
  2014-05-12 12:50           ` Christian König
@ 2014-05-12 23:38             ` Grigori Goronzy
  2014-05-13 13:22               ` Alex Deucher
  0 siblings, 1 reply; 31+ messages in thread
From: Grigori Goronzy @ 2014-05-12 23:38 UTC (permalink / raw)
  To: Christian König, Marek Olšák; +Cc: dri-devel

I can confirm this fixes it for me, too.

3.15 with these fixes and the large PTE patches actually ends up being 
noticeably slower than earlier kernels with Xonotic, though. I wonder 
what's going on.

Grigori

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-12 23:38             ` Grigori Goronzy
@ 2014-05-13 13:22               ` Alex Deucher
  2014-05-13 13:57                 ` Christian König
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Deucher @ 2014-05-13 13:22 UTC (permalink / raw)
  To: Grigori Goronzy; +Cc: dri-devel

On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy <greg@chown.ath.cx> wrote:
> I can confirm this fixes it for me, too.
>
> 3.15 with these fixes and the large PTE patches actually ends up being
> noticeably slower than earlier kernels with Xonotic, though. I wonder what's
> going on.

Allocation overhead?

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-13 13:22               ` Alex Deucher
@ 2014-05-13 13:57                 ` Christian König
  2014-05-13 15:19                   ` Marek Olšák
  0 siblings, 1 reply; 31+ messages in thread
From: Christian König @ 2014-05-13 13:57 UTC (permalink / raw)
  To: Alex Deucher, Grigori Goronzy; +Cc: dri-devel

Am 13.05.2014 15:22, schrieb Alex Deucher:
> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy <greg@chown.ath.cx> wrote:
>> I can confirm this fixes it for me, too.
>>
>> 3.15 with these fixes and the large PTE patches actually ends up being
>> noticeably slower than earlier kernels with Xonotic, though. I wonder what's
>> going on.
> Allocation overhead?

Unlikely. Xonotic just allocates a single page table at startup, which
then gets extended up to a certain point, until the game no longer needs
more address space and is done with it.

Grigori, can you bisect and/or try to figure out what's wrong here?

Christian.

>
>
>> Grigori
>>
>>
>> On 12.05.2014 14:50, Christian König wrote:
>>> I could reproduce the problem with xonotic and I think I've found the
>>> issue.
>>>
>>> Please test the attached patch.
>>>
>>> Thanks,
>>> Christian.
>>>
>>> Am 11.05.2014 11:06, schrieb Christian König:
>>>>> I have tested it and it doesn't fix the hangs.
>>>> Yeah, thought so. Well it was just a guess.
>>>>
>>>>> (Also, I don't like the patch, because it reverts the behavior I added
>>>>> for userspace buffers.)
>>>> Actually it shouldn't affect that. The alternative domain always
>>>> contains GART even when userspace only specified VRAM as placement (as
>>>> long as it is technically possible to do so).
>>>>
>>>> So what should happen is that TTM sees the current placement, matches
>>>> that with the desired placement and should find that it doesn't need
>>>> to move the buffer (we should just test if this behavior really works
>>>> as expected).
>>>>
>>>> Christian.
>>>>
>>>> Am 10.05.2014 23:38, schrieb Marek Olšák:
>>>>> Hi Christian,
>>>>>
>>>>> I have tested it and it doesn't fix the hangs.
>>>>>
>>>>> (Also, I don't like the patch, because it reverts the behavior I added
>>>>> for userspace buffers.)
>>>>>
>>>>> Marek
>>>>>
>>>>>
>>>>>
>>>>> On Sat, May 10, 2014 at 6:34 PM, Christian König
>>>>> <deathsimple@vodafone.de> wrote:
>>>>>> Couldn't reproduce the issue so far. So the attached patch is just a
>>>>>> complete shot in the dark, found by rereading the code, but it might
>>>>>> actually be the problem.
>>>>>>
>>>>>> Please give it a try.
>>>>>>
>>>>>> Going to keep testing in the meantime,
>>>>>> Christian.
>>>>>>
>>>>>> Am 10.05.2014 10:23, schrieb Christian König:
>>>>>>
>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I boot
>>>>>>>> with radeon.vramlimit=256 and then run Xonotic timedemo with high settings.
>>>>>>>> I haven't had a chance to bisect it yet, but it might be a similar problem.
>>>>>>> Sounds like the same issue to me. Thx for the good test case.
>>>>>>>
>>>>>>>> Any idea what is wrong with it?
>>>>>>> Actually, I was already wondering why it went so smoothly without any
>>>>>>> regressions so far; I hadn't noticed the bug in bugzilla.kernel.org yet.
>>>>>>>
>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>>>> causes buffer evictions.
>>>>>>> Sounds like the underlying problem to me. We probably evict some part of
>>>>>>> a page table without updating the page directory. Going to dig into it
>>>>>>> today; it's probably just a one-liner missing somewhere in the VM code.
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 09.05.2014 23:39, schrieb Grigori Goronzy:
>>>>>>>> On 09.05.2014 20:03, Marek Olšák wrote:
>>>>>>>>>
>>>>>>>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>> The simplest way to reproduce the hangs is to run piglit with these parameters:
>>>>>>>>> -t texelFetch.fs
>>>>>>>>>
>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>>>>> causes buffer evictions.
>>>>>>>>>
>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I boot
>>>>>>>> with radeon.vramlimit=256 and then run Xonotic timedemo with high settings.
>>>>>>>> I haven't had a chance to bisect it yet, but it might be a similar problem.
>>>>>>>>
>>>>>>>> Grigori
>>>>>>>

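The hypothesis quoted above (part of a page table evicted without the page directory being updated) can be pictured with a small Python toy model. It is only an illustration under stated assumptions: the class and function names are hypothetical, and the real radeon VM code is not structured like this. The idea it shows is that once page tables are ordinary, movable buffer objects, every move has to mark the page directory entry for rewriting, otherwise the GPU keeps walking through a stale address.

```python
from dataclasses import dataclass, field


@dataclass
class PageTable:
    gpu_addr: int  # where the page table buffer currently lives


@dataclass
class PageDirectory:
    tables: list
    # Addresses the GPU would see in each page directory entry (PDE).
    pdes: list = field(default_factory=list)
    dirty: bool = True

    def flush(self):
        """Rewrite all PDEs from the tables' current addresses, if marked dirty."""
        if self.dirty:
            self.pdes = [pt.gpu_addr for pt in self.tables]
            self.dirty = False

    def stale_entries(self):
        """Indices whose PDE no longer matches the table's real address."""
        return [i for i, pt in enumerate(self.tables)
                if self.pdes[i] != pt.gpu_addr]


def evict(pd, index, new_addr, mark_dirty):
    """Move one page table (e.g. evicted under memory pressure)."""
    pd.tables[index].gpu_addr = new_addr
    if mark_dirty:
        pd.dirty = True  # the step a missing "one-liner" would skip
```

Evicting table 1 with `mark_dirty=False` and then calling `flush()` leaves `stale_entries()` returning `[1]`; repeating the move with `mark_dirty=True` makes it return `[]`.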

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-13 13:57                 ` Christian König
@ 2014-05-13 15:19                   ` Marek Olšák
  2014-05-13 15:31                     ` Christian König
  0 siblings, 1 reply; 31+ messages in thread
From: Marek Olšák @ 2014-05-13 15:19 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

Your latest patches fix the regression.

The performance regression can also be reproduced with piglit "-t texelFetch.fs".

Kernel 3.14:
   real    0m17.724s
   user    0m41.905s
   sys    0m11.299s

The problematic commit checked out plus your fixes (without the PTE patch, I think):
   real    0m23.474s
   user    1m1.008s
   sys    0m13.812s
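
To put the timings above in proportion, a few lines of Python turn them into relative slowdowns. The helper names are only for illustration; note that 1m1.008s is 61.008 s.

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a `time`-style 'XmY.YYYs' reading to seconds."""
    return minutes * 60 + seconds


def slowdown_percent(before: float, after: float) -> float:
    """Relative increase of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


baseline = {"real": to_seconds(0, 17.724),
            "user": to_seconds(0, 41.905),
            "sys": to_seconds(0, 11.299)}
regressed = {"real": to_seconds(0, 23.474),
             "user": to_seconds(1, 1.008),
             "sys": to_seconds(0, 13.812)}

for key in ("real", "user", "sys"):
    print(f"{key}: {slowdown_percent(baseline[key], regressed[key]):+.1f}%")
```

This prints `real: +32.4%`, `user: +45.6%` and `sys: +22.2%`, so wall-clock time for the run grew by roughly a third.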

Marek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-13 15:19                   ` Marek Olšák
@ 2014-05-13 15:31                     ` Christian König
  2014-05-13 16:08                       ` Marek Olšák
  2014-05-13 19:50                       ` Marek Olšák
  0 siblings, 2 replies; 31+ messages in thread
From: Christian König @ 2014-05-13 15:31 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

Is the performance regression caused by the page table changes or
something else?

I did run some tests with Xonotic while developing it and they didn't
show anything obvious, but I didn't test on different systems.

Christian.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-13 15:31                     ` Christian König
@ 2014-05-13 16:08                       ` Marek Olšák
  2014-05-13 19:50                       ` Marek Olšák
  1 sibling, 0 replies; 31+ messages in thread
From: Marek Olšák @ 2014-05-13 16:08 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

I think it's caused by something else. I'll continue testing and bisecting.

Marek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-13 15:31                     ` Christian König
  2014-05-13 16:08                       ` Marek Olšák
@ 2014-05-13 19:50                       ` Marek Olšák
  2014-05-13 20:19                         ` Grigori Goronzy
  1 sibling, 1 reply; 31+ messages in thread
From: Marek Olšák @ 2014-05-13 19:50 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

Hi Christian,

The performance regression I saw with piglit seems to be fixed in the
latest kernel git. It's difficult to bisect the kernel, because there
are only merges between 3.14 and 3.15, and the merged commits are
actually based on 3.14-rc1 and 3.14-rc4.
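
At its core, `git bisect` is a binary search for the first commit at which a test starts failing. The toy Python function below sketches that idea under a simplifying assumption that does not hold here: a linear history in which every commit after the first bad one is also bad. With only merges between 3.14 and 3.15, and the merged commits based on 3.14-rc1 and 3.14-rc4, many commits a real bisection would test do not contain the final 3.14 release at all, which plausibly makes timing comparisons against 3.14 harder to interpret. The name `first_bad` is hypothetical.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search a linear history for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    that once a commit is bad, every later commit is bad too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

For ten commits `c0` to `c9` where everything from `c6` on is bad, `first_bad` returns `"c6"` after testing only three commits.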

All seems to be fine with your fixes.

Marek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-13 19:50                       ` Marek Olšák
@ 2014-05-13 20:19                         ` Grigori Goronzy
  2014-05-13 20:27                           ` Marek Olšák
  0 siblings, 1 reply; 31+ messages in thread
From: Grigori Goronzy @ 2014-05-13 20:19 UTC (permalink / raw)
  To: Marek Olšák, Christian König; +Cc: dri-devel

On 13.05.2014 21:50, Marek Olšák wrote:
> Hi Christian,
>
> The performance regression I saw with piglit seems to be fixed with
> latest kernel git. It's difficult to bisect the kernel, because there
> are only merges between 3.14 and 3.15 and the merged commits are
> actually based on 3.14-rc1 and 3.14-rc4.
>
> All seems to be fine with your fixes.
>

Which fixes have you applied? There are quite a few pending patches on
dri-devel that aren't part of drm-fixes-3.15 yet.

Grigori

> Marek
>
> On Tue, May 13, 2014 at 5:31 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>> Is the performance regression regression caused by the page table changes or
>> something else?
>>
>> I did made some tests with xonotic while developing it and it didn't showed
>> anything obvious, but I didn't made tests on different systems.
>>
>> Christian.
>>
>> Am 13.05.2014 17:19, schrieb Marek Olšák:
>>
>>> Your latest patches fix the regression.
>>>
>>> The performance regression can also be reproduced with piglit "-t
>>> texelFetch.fs".
>>>
>>> Kernel 3.14:
>>>      real    0m17.724s
>>>      user    0m41.905s
>>>      sys    0m11.299s
>>>
>>> The problematic commit checked out + your fixes (without the PTE patch I
>>> think):
>>>      real    0m23.474s
>>>      user    1m1.008s
>>>      sys    0m13.812s
>>>
>>> Marek
>>>
>>>
>>> On Tue, May 13, 2014 at 3:57 PM, Christian König
>>> <deathsimple@vodafone.de> wrote:
>>>>
>>>> Am 13.05.2014 15:22, schrieb Alex Deucher:
>>>>
>>>>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy <greg@chown.ath.cx>
>>>>> wrote:
>>>>>>
>>>>>> I can confirm this fixes it for me, too.
>>>>>>
>>>>>> 3.15 with these fixes and the large PTE patches actually ends up being
>>>>>> noticeably slower than earlier kernels with Xonotic, though. I wonder
>>>>>> what's
>>>>>> going on.
>>>>>
>>>>> Allocation overhead?
>>>>
>>>>
>>>> Unlikely, Xonotic just allocates a single page table at start, which then
>>>> gets extended to a certain rate until they no longer need more address
>>>> space
>>>> and are done with it.
>>>>
>>>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>>>
>>>> Christian.
>>>>
>>>>
>>>>>
>>>>>> Grigori
>>>>>>
>>>>>>
>>>>>> On 12.05.2014 14:50, Christian König wrote:
>>>>>>>
>>>>>>> I could reproduce the problem with xonotic and I think I've found the
>>>>>>> issue.
>>>>>>>
>>>>>>> Please test the attached patch.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 11.05.2014 11:06, schrieb Christian König:
>>>>>>>>>
>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>
>>>>>>>> Yeah, thought so. Well it was just a guess.
>>>>>>>>
>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>> added for userspace buffers.)
>>>>>>>>
>>>>>>>> Actually it shouldn't affect that. The alternative domain always
>>>>>>>> contains GART even when userspace only specified VRAM as placement
>>>>>>>> (as long as it is technically possible to do so).
>>>>>>>>
>>>>>>>> So what should happen is that TTM sees the current placement, matches
>>>>>>>> that with the desired placement and should find that it doesn't need
>>>>>>>> to move the buffer (we should just test if this behavior really works
>>>>>>>> as expected).
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> Am 10.05.2014 23:38, schrieb Marek Olšák:
>>>>>>>>>
>>>>>>>>> Hi Christian,
>>>>>>>>>
>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>
>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>> added for userspace buffers.)
>>>>>>>>>
>>>>>>>>> Marek
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, May 10, 2014 at 6:34 PM, Christian König
>>>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>>>
>>>>>>>>>> Couldn't reproduce the issue so far. So the attached patch is just a
>>>>>>>>>> complete shot in the dark found by rereading the code, but it might
>>>>>>>>>> actually be the problem.
>>>>>>>>>>
>>>>>>>>>> Please give it a try.
>>>>>>>>>>
>>>>>>>>>> Going to keep testing in the meantime,
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>> Am 10.05.2014 10:23, schrieb Christian König:
>>>>>>>>>>
>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I boot
>>>>>>>>>>>> with radeon.vramlimit=256 and then run Xonotic timedemo with high settings.
>>>>>>>>>>>> I haven't had a chance to bisect it yet, but it might be a similar problem.
>>>>>>>>>>>
>>>>>>>>>>> Sounds like the same issue to me. Thx for the good test case.
>>>>>>>>>>>
>>>>>>>>>>>> Any idea what is wrong with it?
>>>>>>>>>>>
>>>>>>>>>>> Actually I already wondered that it went so smoothly without any regression
>>>>>>>>>>> so far; I hadn't noticed the bug in bugzilla.kernel.org yet.
>>>>>>>>>>>
>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>>
>>>>>>>>>>> Sounds like the underlying problem to me. We probably evict some part of a
>>>>>>>>>>> page table without updating the page directory. Going to dig into it today,
>>>>>>>>>>> it's probably just a one liner missing somewhere in the VM code.
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>> Am 09.05.2014 23:39, schrieb Grigori Goronzy:
>>>>>>>>>>>>
>>>>>>>>>>>> On 09.05.2014 20:03, Marek Olšák wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>
>>>>>>>>>>>>> The simplest way to reproduce the hangs is to run piglit with these parameters:
>>>>>>>>>>>>> -t texelFetch.fs
>>>>>>>>>>>>>
>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>>>>
>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I boot
>>>>>>>>>>>> with radeon.vramlimit=256 and then run Xonotic timedemo with high settings.
>>>>>>>>>>>> I haven't had a chance to bisect it yet, but it might be a similar problem.
>>>>>>>>>>>>
>>>>>>>>>>>> Grigori
>>>>>>>>>>>
>>>>>>>>>>>
>>>>
>>>>
>>

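To put the piglit "-t texelFetch.fs" timings quoted above in perspective (kernel 3.14 versus the problematic commit plus fixes), here is a small C sketch that converts the time(1) readings to seconds and computes the relative slowdown. The helper names are illustrative only; the input numbers are the ones Marek reported.

```c
#include <stdio.h>

/* Convert a time(1) reading such as "1m1.008s" (minutes + seconds) to seconds. */
double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Relative slowdown of `after` versus `before`, in percent (positive = slower). */
double slowdown_pct(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Print one row of a before/after comparison. */
void report(const char *label, double before, double after)
{
    printf("%-4s %8.3fs -> %8.3fs  (%+.1f%%)\n",
           label, before, after, slowdown_pct(before, after));
}
```

Calling report("real", 17.724, 23.474), report("user", to_seconds(0, 41.905), to_seconds(1, 1.008)) and report("sys", 11.299, 13.812) works out to roughly +32.4% wall-clock, +45.6% user and +22.2% system time, so the regression is well outside run-to-run noise for a test of this length.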

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-13 20:19                         ` Grigori Goronzy
@ 2014-05-13 20:27                           ` Marek Olšák
  2014-05-30  0:30                             ` Grigori Goronzy
  0 siblings, 1 reply; 31+ messages in thread
From: Marek Olšák @ 2014-05-13 20:27 UTC (permalink / raw)
  To: Grigori Goronzy; +Cc: dri-devel

I applied these two patches Christian sent to dri-devel:

drm/radeon: fix page directory update size estimation
drm/radeon: fix buffer placement under memory pressure v2

on top of torvalds's master branch.

Marek

On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy <greg@chown.ath.cx> wrote:
> On 13.05.2014 21:50, Marek Olšák wrote:
>>
>> Hi Christian,
>>
>> The performance regression I saw with piglit seems to be fixed with
>> latest kernel git. It's difficult to bisect the kernel, because there
>> are only merges between 3.14 and 3.15 and the merged commits are
>> actually based on 3.14-rc1 and 3.14-rc4.
>>
>> All seems to be fine with your fixes.
>>
>
> Which fixes have you applied? There are quite a few pending patches on
> dri-devel that aren't yet part of drm-fixes-3.15.
>
> Grigori
>
>
>> Marek
>>
>> On Tue, May 13, 2014 at 5:31 PM, Christian König
>> <deathsimple@vodafone.de> wrote:
>>>
>>> Is the performance regression caused by the page table changes or
>>> something else?
>>>
>>> I did some tests with xonotic while developing it and they didn't show
>>> anything obvious, but I didn't test on different systems.
>>>
>>> Christian.
>>>
>>> Am 13.05.2014 17:19, schrieb Marek Olšák:
>>>
>>>> Your latest patches fix the regression.
>>>>
>>>> The performance regression can also be reproduced with piglit "-t
>>>> texelFetch.fs".
>>>>
>>>> Kernel 3.14:
>>>>      real    0m17.724s
>>>>      user    0m41.905s
>>>>      sys    0m11.299s
>>>>
>>>> The problematic commit checked out + your fixes (without the PTE patch I
>>>> think):
>>>>      real    0m23.474s
>>>>      user    1m1.008s
>>>>      sys    0m13.812s
>>>>
>>>> Marek
>>>>
>>>>
>>>> On Tue, May 13, 2014 at 3:57 PM, Christian König
>>>> <deathsimple@vodafone.de> wrote:
>>>>>
>>>>>
>>>>> Am 13.05.2014 15:22, schrieb Alex Deucher:
>>>>>
>>>>>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy <greg@chown.ath.cx>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> I can confirm this fixes it for me, too.
>>>>>>>
>>>>>>> 3.15 with these fixes and the large PTE patches actually ends up being
>>>>>>> noticeably slower than earlier kernels with Xonotic, though. I wonder
>>>>>>> what's going on.
>>>>>>
>>>>>>
>>>>>> Allocation overhead?
>>>>>
>>>>>
>>>>>
>>>>> Unlikely, Xonotic just allocates a single page table at start, which then
>>>>> gets extended to a certain rate until they no longer need more address
>>>>> space and are done with it.
>>>>>
>>>>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>>>>
>>>>> Christian.
>>>>>
>>>>>
>>>>>>
>>>>>>> Grigori
>>>>>>>
>>>>>>>
>>>>>>> On 12.05.2014 14:50, Christian König wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> I could reproduce the problem with xonotic and I think I've found the
>>>>>>>> issue.
>>>>>>>>
>>>>>>>> Please test the attached patch.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> Am 11.05.2014 11:06, schrieb Christian König:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yeah, thought so. Well it was just a guess.
>>>>>>>>>
>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>>> added for userspace buffers.)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Actually it shouldn't affect that. The alternative domain always
>>>>>>>>> contains GART even when userspace only specified VRAM as placement
>>>>>>>>> (as long as it is technically possible to do so).
>>>>>>>>>
>>>>>>>>> So what should happen is that TTM sees the current placement, matches
>>>>>>>>> that with the desired placement and should find that it doesn't need
>>>>>>>>> to move the buffer (we should just test if this behavior really works
>>>>>>>>> as expected).
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> Am 10.05.2014 23:38, schrieb Marek Olšák:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Christian,
>>>>>>>>>>
>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>
>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>>> added for userspace buffers.)
>>>>>>>>>>
>>>>>>>>>> Marek
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, May 10, 2014 at 6:34 PM, Christian König
>>>>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Couldn't reproduce the issue so far. So the attached patch is just a
>>>>>>>>>>> complete shot in the dark found by rereading the code, but it might
>>>>>>>>>>> actually be the problem.
>>>>>>>>>>>
>>>>>>>>>>> Please give it a try.
>>>>>>>>>>>
>>>>>>>>>>> Going to keep testing in the meantime,
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>> Am 10.05.2014 10:23, schrieb Christian König:
>>>>>>>>>>>
>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I boot
>>>>>>>>>>>>> with radeon.vramlimit=256 and then run Xonotic timedemo with high settings.
>>>>>>>>>>>>> I haven't had a chance to bisect it yet, but it might be a similar problem.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Sounds like the same issue to me. Thx for the good test case.
>>>>>>>>>>>>
>>>>>>>>>>>>> Any idea what is wrong with it?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Actually I already wondered that it went so smoothly without any regression
>>>>>>>>>>>> so far; I hadn't noticed the bug in bugzilla.kernel.org yet.
>>>>>>>>>>>>
>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Sounds like the underlying problem to me. We probably evict some part of a
>>>>>>>>>>>> page table without updating the page directory. Going to dig into it today,
>>>>>>>>>>>> it's probably just a one liner missing somewhere in the VM code.
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>> Am 09.05.2014 23:39, schrieb Grigori Goronzy:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 09.05.2014 20:03, Marek Olšák wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The simplest way to reproduce the hangs is to run piglit with these parameters:
>>>>>>>>>>>>>> -t texelFetch.fs
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I boot
>>>>>>>>>>>>> with radeon.vramlimit=256 and then run Xonotic timedemo with high settings.
>>>>>>>>>>>>> I haven't had a chance to bisect it yet, but it might be a similar problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Grigori
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>
>>>>>
>>>>>
>>>
>
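Christian's point about the alternative domain (quoted above) is that a buffer already sitting in an acceptable domain should simply stay put. The C sketch below is a toy model of that rule under stated assumptions, not TTM's actual validation algorithm: the domain values are chosen to mirror RADEON_GEM_DOMAIN_CPU/GTT/VRAM, and toy_validate and its "has_room" mask are hypothetical names for illustration.

```c
/* Domain bits; values chosen to mirror RADEON_GEM_DOMAIN_CPU/GTT/VRAM. */
enum {
    TOY_CPU  = 0x1,
    TOY_GTT  = 0x2,
    TOY_VRAM = 0x4,
};

/* Lowest set bit of a domain mask (0 if the mask is empty). */
static unsigned lowest_domain(unsigned mask)
{
    return mask & (~mask + 1u);
}

/*
 * Toy model of the behaviour described in the thread, NOT TTM's code:
 * a buffer that already sits in a domain allowed by either the preferred
 * or the alternative placement is left where it is; otherwise it goes to
 * the lowest-numbered allowed domain that still has room.
 * Returns the resulting domain, or 0 if nothing fits (a real driver would
 * have to evict something or fail).
 */
unsigned toy_validate(unsigned current, unsigned preferred,
                      unsigned alternative, unsigned has_room)
{
    if (current & (preferred | alternative))
        return current;                 /* compatible: no move needed */
    if (preferred & has_room)
        return lowest_domain(preferred & has_room);
    if (alternative & has_room)
        return lowest_domain(alternative & has_room);
    return 0;
}
```

In this model a buffer that userspace asked for in VRAM, but which was evicted to GTT, reports GTT as acceptable and is not moved back on the next validation; whether real TTM behaves this way is exactly what Christian suggests testing.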


* Re: CIK hangs with kernel 3.15, bisected
  2014-05-09 18:03 CIK hangs with kernel 3.15, bisected Marek Olšák
  2014-05-09 18:10 ` Rafał Miłecki
  2014-05-09 21:39 ` Grigori Goronzy
@ 2014-05-13 21:21 ` Marek Olšák
  2014-05-14 12:11   ` Christian König
  2 siblings, 1 reply; 31+ messages in thread
From: Marek Olšák @ 2014-05-13 21:21 UTC (permalink / raw)
  To: Christian König, dri-devel

Hi Christian,

Even though some regressions are fixed by these patches:

drm/radeon: fix page directory update size estimation
drm/radeon: fix buffer placement under memory pressure v2

and indeed, the texelFetch tests no longer hang, there is one more
hang which needs to be fixed. :( All I know is that the same commit
causes it and that it can only be reproduced by running the whole piglit
suite with concurrency enabled.

My kernel git log:

* 2ba22c8 - drm/radeon: fix buffer placement under memory pressure v2
(10 hours ago) <Christian König>
* 3af91e5 - drm/radeon: fix page directory update size estimation (21
hours ago) <Christian König>
* 6d2f294 - drm/radeon: use normal BOs for the page tables v4 (2
months ago) <Christian König>
* fa68834 - drm/radeon: further cleanup vm flushing & fencing (2
months ago) <Christian König>

fa68834 doesn't hang, but 2ba22c8 hangs, which means the first bad
commit is either 6d2f294 or one of the two fixes.
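Narrowing this down further is a plain binary search over the four commits in the log above (oldest first: fa68834, 6d2f294, 3af91e5, 2ba22c8), which is what `git bisect` does for a linear history. A minimal C sketch, assuming a linear history and a caller-supplied test; first_bad and verdict_from_table are hypothetical helpers, and no real test results are implied.

```c
#include <stddef.h>

/*
 * Binary search for the first bad commit in a linear history ordered
 * oldest to newest. Preconditions: n >= 2, commit 0 is known good and
 * commit n-1 is known bad. is_bad(idx, ctx) stands for "build, boot and
 * run the test at commit idx"; it returns nonzero if the hang shows up.
 */
size_t first_bad(size_t n, int (*is_bad)(size_t idx, const void *ctx),
                 const void *ctx)
{
    size_t good = 0, bad = n - 1;   /* invariant: good is good, bad is bad */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Example predicate: look the verdict up in a table of recorded results. */
int verdict_from_table(size_t idx, const void *ctx)
{
    const int *verdicts = ctx;
    return verdicts[idx];
}
```

With four commits and both ends already known, at most two more test runs are needed. Because this hang is rare and random, a single "good" run at a commit is weak evidence, so each run would need to be long (or repeated) before trusting the verdict.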

Marek

On Fri, May 9, 2014 at 8:03 PM, Marek Olšák <maraeo@gmail.com> wrote:
> Hi Christian,
>
> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>
> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
> Author: Christian König <christian.koenig@amd.com>
> Date:   Thu Feb 20 13:42:17 2014 +0100
>
>     drm/radeon: use normal BOs for the page tables v4
>
>     No need to make it more complicated than necessary,
>     just allocate the page tables as normal BO and
>     flush whenever the address change.
>
>     v2: update comments and function name
>     v3: squash bug fixes, page directory and tables patch
>     v4: rebased on Mareks changes
>
>     Signed-off-by: Christian König <christian.koenig@amd.com>
>
>
> Reverting the commit gives me a lot of merge conflicts.
>
> The simplest way to reproduce the hangs is to run piglit with these parameters:
> -t texelFetch.fs
>
> Some of the tests allocate a lot of MSAA textures and the tests also
> run in parallel, which creates a lot of memory pressure and probably
> causes buffer evictions.
>
> Any idea what is wrong with it?
>
> Thanks,
>
> Marek


* Re: CIK hangs with kernel 3.15, bisected
  2014-05-13 21:21 ` Marek Olšák
@ 2014-05-14 12:11   ` Christian König
  2014-05-27 21:55     ` Marek Olšák
  0 siblings, 1 reply; 31+ messages in thread
From: Christian König @ 2014-05-14 12:11 UTC (permalink / raw)
  To: Marek Olšák, dri-devel

Crap, any chance you can narrow it down a bit more?

I've just tried a piglit quick test on my Bonaire and it seems to work 
perfectly fine.

What hw do you test on?

Regards,
Christian.

Am 13.05.2014 23:21, schrieb Marek Olšák:
> Hi Christian,
>
> Even though some regressions are fixed by these patches:
>
> drm/radeon: fix page directory update size estimation
> drm/radeon: fix buffer placement under memory pressure v2
>
> and indeed, the texelFetch tests no longer hang, there is one more
> hang which needs to be fixed. :( All I know is the exact same commit
> causes it and it can only be reproduced by running whole piglit with
> concurrency enabled.
>
> My kernel git log:
>
> * 2ba22c8 - drm/radeon: fix buffer placement under memory pressure v2
> (10 hours ago) <Christian König>
> * 3af91e5 - drm/radeon: fix page directory update size estimation (21
> hours ago) <Christian König>
> * 6d2f294 - drm/radeon: use normal BOs for the page tables v4 (2
> months ago) <Christian König>
> * fa68834 - drm/radeon: further cleanup vm flushing & fencing (2
> months ago) <Christian König>
>
> fa68834 doesn't hang, but 2ba22c8 hangs, which means 6d2f294 or either
> of the two fixes is the first bad commit.
>
> Marek
>
> On Fri, May 9, 2014 at 8:03 PM, Marek Olšák <maraeo@gmail.com> wrote:
>> Hi Christian,
>>
>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>
>> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
>> Author: Christian König <christian.koenig@amd.com>
>> Date:   Thu Feb 20 13:42:17 2014 +0100
>>
>>      drm/radeon: use normal BOs for the page tables v4
>>
>>      No need to make it more complicated than necessary,
>>      just allocate the page tables as normal BO and
>>      flush whenever the address change.
>>
>>      v2: update comments and function name
>>      v3: squash bug fixes, page directory and tables patch
>>      v4: rebased on Mareks changes
>>
>>      Signed-off-by: Christian König <christian.koenig@amd.com>
>>
>>
>> Reverting the commit gives me a lot of merge conflicts.
>>
>> The simplest way to reproduce the hangs is to run piglit with these parameters:
>> -t texelFetch.fs
>>
>> Some of the tests allocate a lot of MSAA textures and the tests also
>> run in parallel, which creates a lot of memory pressure and probably
>> causes buffer evictions.
>>
>> Any idea what is wrong with it?
>>
>> Thanks,
>>
>> Marek



* Re: CIK hangs with kernel 3.15, bisected
  2014-05-14 12:11   ` Christian König
@ 2014-05-27 21:55     ` Marek Olšák
  2014-05-28 10:38       ` Christian König
  0 siblings, 1 reply; 31+ messages in thread
From: Marek Olšák @ 2014-05-27 21:55 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

[-- Attachment #1: Type: text/plain, Size: 3269 bytes --]

Hi Christian,

I test on Bonaire (ChipID = 0x665c). Unfortunately, the hangs are not
fixed yet. They are very rare and very random, so to provoke them I
have come up with a patch which evicts page tables between IBs. See the
attachment. With that patch applied, the system starts fine, compiz
and glxgears work, but once I start playing openarena, it locks up
pretty quickly.

In theory the patch shouldn't change anything, because the page tables
are moved back to VRAM immediately afterwards. However, their VRAM
address may end up being different from before, which might be the
root cause.
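The failure mode suspected here (and in Christian's earlier guess that part of a page table is evicted without updating the page directory) can be illustrated with a toy model. This C sketch is an assumption-laden illustration, not radeon's data structures: one directory entry points at one page table in a word-addressed "VRAM" array, and toy_vm, toy_translate and toy_relocate_table are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_VRAM_WORDS 64

/* Toy model, NOT radeon's structures: one page directory entry pointing
 * at one page table stored in a word-addressed "VRAM" array. */
struct toy_vm {
    uint64_t vram[TOY_VRAM_WORDS];
    size_t pde;            /* VRAM offset of the page table */
};

/* GPU-side lookup: read PTE number idx through the page directory. */
uint64_t toy_translate(const struct toy_vm *vm, size_t idx)
{
    return vm->vram[vm->pde + idx];
}

/*
 * Simulate a page table of len entries making a round trip through GTT
 * and landing at a different VRAM offset. The old location is cleared to
 * stand in for "freed and reused". The two ranges must not overlap.
 * If update_pde is 0, the directory keeps pointing at the old location.
 */
void toy_relocate_table(struct toy_vm *vm, size_t to, size_t len,
                        int update_pde)
{
    size_t from = vm->pde;
    memcpy(&vm->vram[to], &vm->vram[from], len * sizeof(vm->vram[0]));
    memset(&vm->vram[from], 0, len * sizeof(vm->vram[0]));
    if (update_pde)
        vm->pde = to;
}
```

If the directory is updated, lookups still return the original entries; if it is not, the GPU reads whatever now occupies the old location (zeros in this model), which would presumably show up as faults or hangs rather than a clean error.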

Marek

On Wed, May 14, 2014 at 2:11 PM, Christian König
<deathsimple@vodafone.de> wrote:
> Crap, any chance you can narrow it down a bit more?
>
> I've just tried a piglit quick test on my Bonaire and it seems to work
> perfectly fine.
>
> What hw do you test on?
>
> Regards,
> Christian.
>
> Am 13.05.2014 23:21, schrieb Marek Olšák:
>
>> Hi Christian,
>>
>> Even though some regressions are fixed by these patches:
>>
>> drm/radeon: fix page directory update size estimation
>> drm/radeon: fix buffer placement under memory pressure v2
>>
>> and indeed, the texelFetch tests no longer hang, there is one more
>> hang which needs to be fixed. :( All I know is the exact same commit
>> causes it and it can only be reproduced by running whole piglit with
>> concurrency enabled.
>>
>> My kernel git log:
>>
>> * 2ba22c8 - drm/radeon: fix buffer placement under memory pressure v2
>> (10 hours ago) <Christian König>
>> * 3af91e5 - drm/radeon: fix page directory update size estimation (21
>> hours ago) <Christian König>
>> * 6d2f294 - drm/radeon: use normal BOs for the page tables v4 (2
>> months ago) <Christian König>
>> * fa68834 - drm/radeon: further cleanup vm flushing & fencing (2
>> months ago) <Christian König>
>>
>> fa68834 doesn't hang, but 2ba22c8 hangs, which means 6d2f294 or either
>> of the two fixes is the first bad commit.
>>
>> Marek
>>
>> On Fri, May 9, 2014 at 8:03 PM, Marek Olšák <maraeo@gmail.com> wrote:
>>>
>>> Hi Christian,
>>>
>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>
>>> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
>>> Author: Christian König <christian.koenig@amd.com>
>>> Date:   Thu Feb 20 13:42:17 2014 +0100
>>>
>>>      drm/radeon: use normal BOs for the page tables v4
>>>
>>>      No need to make it more complicated than necessary,
>>>      just allocate the page tables as normal BO and
>>>      flush whenever the address change.
>>>
>>>      v2: update comments and function name
>>>      v3: squash bug fixes, page directory and tables patch
>>>      v4: rebased on Mareks changes
>>>
>>>      Signed-off-by: Christian König <christian.koenig@amd.com>
>>>
>>>
>>> Reverting the commit gives me a lot of merge conflicts.
>>>
>>> The simplest way to reproduce the hangs is to run piglit with these
>>> parameters:
>>> -t texelFetch.fs
>>>
>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>> run in parallel, which creates a lot of memory pressure and probably
>>> causes buffer evictions.
>>>
>>> Any idea what is wrong with it?
>>>
>>> Thanks,
>>>
>>> Marek
>
>

[-- Attachment #2: vm_move_page_tables.diff --]
[-- Type: text/plain, Size: 1227 bytes --]

diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index d9ab99f..365e36f 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -116,6 +116,19 @@ void radeon_vm_manager_fini(struct radeon_device *rdev)
 	rdev->vm_manager.enabled = false;
 }
 
+static void force_gtt(struct radeon_bo *bo)
+{
+	if (radeon_bo_reserve(bo, false))
+		return;
+
+	radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_GTT);
+
+	if (ttm_bo_validate(&bo->tbo, &bo->placement, true, false)) {
+		DRM_ERROR("failed to force a GTT placement\n");
+	}
+	radeon_bo_unreserve(bo);
+}
+
 /**
  * radeon_vm_get_bos - add the vm BOs to a validation list
  *
@@ -147,6 +160,8 @@ struct radeon_cs_reloc *radeon_vm_get_bos(struct radeon_device *rdev,
 	list[0].handle = 0;
 	list_add(&list[0].tv.head, head);
 
+	force_gtt(vm->page_directory);
+
 	for (i = 0, idx = 1; i <= vm->max_pde_used; i++) {
 		if (!vm->page_tables[i].bo)
 			continue;
@@ -159,6 +174,8 @@ struct radeon_cs_reloc *radeon_vm_get_bos(struct radeon_device *rdev,
 		list[idx].tiling_flags = 0;
 		list[idx].handle = 0;
 		list_add(&list[idx++].tv.head, head);
+
+		force_gtt(vm->page_tables[i].bo);
 	}
 
 	return list;



* Re: CIK hangs with kernel 3.15, bisected
  2014-05-27 21:55     ` Marek Olšák
@ 2014-05-28 10:38       ` Christian König
  2014-05-29 16:30         ` Christian König
  0 siblings, 1 reply; 31+ messages in thread
From: Christian König @ 2014-05-28 10:38 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

I already tried a similar patch, without noticing any more crashes. But
I'm going to give this another round with your patch and openarena.

Thanks,
Christian.

Am 27.05.2014 23:55, schrieb Marek Olšák:
> Hi Christian,
>
> I test on Bonaire (ChipID = 0x665c). Unfortunately, the hangs are not
> fixed yet. They are very rare and very random. Therefore, I have come
> up with a patch which evicts page tables between IBs. See the
> attachment. With that patch applied, the system starts fine, compiz
> and glxgears work, but once I start playing openarena, it locks up
> pretty quickly.
>
> The patch shouldn't do anything in theory, because pages are moved
> back to VRAM immediately after that. However, the VRAM address of page
> tables may end up being different from before, which might be the root
> cause.
>
> Marek
>
> On Wed, May 14, 2014 at 2:11 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>> Crap, any chance you can narrow it down a bit more?
>>
>> I've just tried a piglit quick test on my Bonaire and it seems to work
>> perfectly fine.
>>
>> What hw do you test on?
>>
>> Regards,
>> Christian.
>>
>> Am 13.05.2014 23:21, schrieb Marek Olšák:
>>
>>> Hi Christian,
>>>
>>> Even though some regressions are fixed by these patches:
>>>
>>> drm/radeon: fix page directory update size estimation
>>> drm/radeon: fix buffer placement under memory pressure v2
>>>
>>> and indeed, the texelFetch tests no longer hang, there is one more
>>> hang which needs to be fixed. :( All I know is the exact same commit
>>> causes it and it can only be reproduced by running whole piglit with
>>> concurrency enabled.
>>>
>>> My kernel git log:
>>>
>>> * 2ba22c8 - drm/radeon: fix buffer placement under memory pressure v2
>>> (10 hours ago) <Christian König>
>>> * 3af91e5 - drm/radeon: fix page directory update size estimation (21
>>> hours ago) <Christian König>
>>> * 6d2f294 - drm/radeon: use normal BOs for the page tables v4 (2
>>> months ago) <Christian König>
>>> * fa68834 - drm/radeon: further cleanup vm flushing & fencing (2
>>> months ago) <Christian König>
>>>
>>> fa68834 doesn't hang, but 2ba22c8 hangs, which means 6d2f294 or either
>>> of the two fixes is the first bad commit.
>>>
>>> Marek
>>>
>>> On Fri, May 9, 2014 at 8:03 PM, Marek Olšák <maraeo@gmail.com> wrote:
>>>> Hi Christian,
>>>>
>>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>>
>>>> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
>>>> Author: Christian König <christian.koenig@amd.com>
>>>> Date:   Thu Feb 20 13:42:17 2014 +0100
>>>>
>>>>       drm/radeon: use normal BOs for the page tables v4
>>>>
>>>>       No need to make it more complicated than necessary,
>>>>       just allocate the page tables as normal BO and
>>>>       flush whenever the address change.
>>>>
>>>>       v2: update comments and function name
>>>>       v3: squash bug fixes, page directory and tables patch
>>>>       v4: rebased on Mareks changes
>>>>
>>>>       Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>
>>>>
>>>> Reverting the commit gives me a lot of merge conflicts.
>>>>
>>>> The simplest way to reproduce the hangs is to run piglit with these
>>>> parameters:
>>>> -t texelFetch.fs
>>>>
>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>> run in parallel, which creates a lot of memory pressure and probably
>>>> causes buffer evictions.
>>>>
>>>> Any idea what is wrong with it?
>>>>
>>>> Thanks,
>>>>
>>>> Marek
>>


* Re: CIK hangs with kernel 3.15, bisected
  2014-05-28 10:38       ` Christian König
@ 2014-05-29 16:30         ` Christian König
  2014-05-29 16:51           ` Marek Olšák
  2014-05-29 16:52           ` Alex Deucher
  0 siblings, 2 replies; 31+ messages in thread
From: Christian König @ 2014-05-29 16:30 UTC (permalink / raw)
  To: Marek Olšák, Alex Deucher; +Cc: dri-devel

Hi Marek & Alex,

I've found why forcefully evicting page tables sometimes crashes
the box.

This is a hexdump of a typical page table before it is moved to GART:
000117f000  02914061 00000000
000117f008  02915061 00000000
000117f010  02916061 00000000
000117f018  02917061 00000000
000117f020  02918061 00000000

And it looks like this when it comes back:
0006102000  00000000 00000000
*

Ideas? I don't really have an explanation for this. Moving buffers 
around otherwise seems to work perfectly fine.
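Decoding the first dump shows what is lost. The sketch below assumes each 64-bit entry is printed as two 32-bit words, low word first, and that the flag bits follow radeon's R600_PTE_* definitions as I recall them (valid = bit 0, readable = bit 5, writeable = bit 6); verify against drivers/gpu/drm/radeon/radeon.h. Under those assumptions 0x02914061 maps page address 0x02914000 with flags 0x061 (valid | readable | writeable), and each entry maps the next 4 KiB page. The second dump is all zeros (in hexdump-style output a lone `*` means the previous line repeats), so no entry is valid.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits as recalled from radeon's R600_PTE_* macros (assumption). */
#define PTE_VALID     (1u << 0)
#define PTE_SYSTEM    (1u << 1)
#define PTE_SNOOPED   (1u << 2)
#define PTE_READABLE  (1u << 5)
#define PTE_WRITEABLE (1u << 6)

#define GPU_PAGE_SIZE 4096u

/* Reassemble a 64-bit entry printed as two 32-bit words, low word first. */
uint64_t pte_from_dump(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Low 12 bits carry flags, the rest is the 4 KiB-aligned page address. */
uint64_t pte_addr(uint64_t pte)  { return pte & ~(uint64_t)(GPU_PAGE_SIZE - 1); }
uint32_t pte_flags(uint64_t pte) { return (uint32_t)(pte & (GPU_PAGE_SIZE - 1)); }

/* Number of entries with the valid bit set. */
size_t count_valid(const uint64_t *table, size_t n)
{
    size_t valid = 0;
    for (size_t i = 0; i < n; i++)
        if (table[i] & PTE_VALID)
            valid++;
    return valid;
}

/* True if consecutive entries map consecutive 4 KiB pages. */
bool is_contiguous(const uint64_t *table, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (pte_addr(table[i]) != pte_addr(table[i - 1]) + GPU_PAGE_SIZE)
            return false;
    return true;
}
```

Running count_valid over a dumped table before and after a forced move is a cheap way to catch the "comes back empty" case automatically instead of eyeballing hexdumps.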

Thanks,
Christian.

Am 28.05.2014 12:38, schrieb Christian König:
> I already tried a similar patch as well, without any more noticeable 
> crashes. But going to give this another round with your patch and 
> openarena.
>
> Thanks,
> Christian.
>
> Am 27.05.2014 23:55, schrieb Marek Olšák:
>> Hi Christian,
>>
>> I test on Bonaire (ChipID = 0x665c). Unfortunately, the hangs are not
>> fixed yet. They are very rare and very random. Therefore, I have come
>> up with a patch which evicts page tables between IBs. See the
>> attachment. With that patch applied, the system starts fine, compiz
>> and glxgears work, but once I start playing openarena, it locks up
>> pretty quickly.
>>
>> The patch shouldn't do anything in theory, because pages are moved
>> back to VRAM immediately after that. However, the VRAM address of page
>> tables may end up being different from before, which might be the root
>> cause.
>>
>> Marek
>>
>> On Wed, May 14, 2014 at 2:11 PM, Christian König
>> <deathsimple@vodafone.de> wrote:
>>> Crap, any chance you can narrow it down a bit more?
>>>
>>> I've just tried a piglit quick test on my Bonaire and it seems to work
>>> perfectly fine.
>>>
>>> What hw do you test on?
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 13.05.2014 23:21, schrieb Marek Olšák:
>>>
>>>> Hi Christian,
>>>>
>>>> Even though some regressions are fixed by these patches:
>>>>
>>>> drm/radeon: fix page directory update size estimation
>>>> drm/radeon: fix buffer placement under memory pressure v2
>>>>
>>>> and indeed, the texelFetch tests no longer hang, there is one more
>>>> hang which needs to be fixed. :( All I know is the exact same commit
>>>> causes it and it can only be reproduced by running whole piglit with
>>>> concurrency enabled.
>>>>
>>>> My kernel git log:
>>>>
>>>> * 2ba22c8 - drm/radeon: fix buffer placement under memory pressure v2
>>>> (10 hours ago) <Christian König>
>>>> * 3af91e5 - drm/radeon: fix page directory update size estimation (21
>>>> hours ago) <Christian König>
>>>> * 6d2f294 - drm/radeon: use normal BOs for the page tables v4 (2
>>>> months ago) <Christian König>
>>>> * fa68834 - drm/radeon: further cleanup vm flushing & fencing (2
>>>> months ago) <Christian König>
>>>>
>>>> fa68834 doesn't hang, but 2ba22c8 hangs, which means 6d2f294 or either
>>>> of the two fixes is the first bad commit.
>>>>
>>>> Marek
>>>>
>>>> On Fri, May 9, 2014 at 8:03 PM, Marek Olšák <maraeo@gmail.com> wrote:
>>>>> Hi Christian,
>>>>>
>>>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>>>
>>>>> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
>>>>> Author: Christian König <christian.koenig@amd.com>
>>>>> Date:   Thu Feb 20 13:42:17 2014 +0100
>>>>>
>>>>>       drm/radeon: use normal BOs for the page tables v4
>>>>>
>>>>>       No need to make it more complicated than necessary,
>>>>>       just allocate the page tables as normal BO and
>>>>>       flush whenever the address change.
>>>>>
>>>>>       v2: update comments and function name
>>>>>       v3: squash bug fixes, page directory and tables patch
>>>>>       v4: rebased on Mareks changes
>>>>>
>>>>>       Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>
>>>>>
>>>>> Reverting the commit gives me a lot of merge conflicts.
>>>>>
>>>>> The simplest way to reproduce the hangs is to run piglit with these
>>>>> parameters:
>>>>> -t texelFetch.fs
>>>>>
>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>> causes buffer evictions.
>>>>>
>>>>> Any idea what is wrong with it?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Marek
>>>
>


* Re: CIK hangs with kernel 3.15, bisected
  2014-05-29 16:30         ` Christian König
@ 2014-05-29 16:51           ` Marek Olšák
  2014-05-29 16:59             ` Christian König
  2014-05-29 16:52           ` Alex Deucher
  1 sibling, 1 reply; 31+ messages in thread
From: Marek Olšák @ 2014-05-29 16:51 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

Can we disable evictions for page tables, e.g. by removing them from the LRU list?
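The idea works because an eviction path only picks victims from the LRU list, so a buffer that is not on the list can never be chosen. The C sketch below is a toy intrusive LRU illustrating just that property; it is not TTM's implementation, and every name in it (toy_bo, toy_lru, toy_evict_one, ...) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy buffer object on an intrusive, circular LRU list (NOT TTM's code). */
struct toy_bo {
    const char *name;
    struct toy_bo *prev, *next;
    bool on_lru;
};

/* The list head is a sentinel; head.next is the least recently used BO. */
struct toy_lru {
    struct toy_bo head;
};

void toy_lru_init(struct toy_lru *lru)
{
    lru->head.prev = lru->head.next = &lru->head;
}

/* Append a BO as the most recently used entry. */
void toy_lru_add_tail(struct toy_lru *lru, struct toy_bo *bo)
{
    bo->prev = lru->head.prev;
    bo->next = &lru->head;
    lru->head.prev->next = bo;
    lru->head.prev = bo;
    bo->on_lru = true;
}

/* Take a BO off the list; it can no longer be picked for eviction. */
void toy_lru_del(struct toy_bo *bo)
{
    if (!bo->on_lru)
        return;
    bo->prev->next = bo->next;
    bo->next->prev = bo->prev;
    bo->prev = bo->next = NULL;
    bo->on_lru = false;
}

/* Evict the least recently used BO, or return NULL if the list is empty. */
struct toy_bo *toy_evict_one(struct toy_lru *lru)
{
    struct toy_bo *victim = lru->head.next;
    if (victim == &lru->head)
        return NULL;
    toy_lru_del(victim);
    return victim;
}
```

The trade-off is that page tables kept off the list permanently occupy their memory, so under heavy pressure other buffers have to absorb all evictions; this would work around the bug rather than explain why an evicted page table comes back zeroed.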

Marek

On Thu, May 29, 2014 at 6:30 PM, Christian König
<deathsimple@vodafone.de> wrote:
> Hi Marek & Alex,
>
> I've found out why forcefully evicting page tables sometimes crashes
> the box.
>
> Well, this is a hexdump of a typical page table before it is moved to GART:
> 000117f000  02914061 00000000
> 000117f008  02915061 00000000
> 000117f010  02916061 00000000
> 000117f018  02917061 00000000
> 000117f020  02918061 00000000
>
> And it looks like this when it comes back:
> 0006102000  00000000 00000000
> *
>
> Ideas? I don't really have an explanation for this. Moving buffers around
> otherwise seems to work perfectly fine.
>
> Thanks,
> Christian.

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-29 16:30         ` Christian König
  2014-05-29 16:51           ` Marek Olšák
@ 2014-05-29 16:52           ` Alex Deucher
  2014-05-30 15:57             ` Christian König
  1 sibling, 1 reply; 31+ messages in thread
From: Alex Deucher @ 2014-05-29 16:52 UTC
  To: Christian König; +Cc: dri-devel

On Thu, May 29, 2014 at 12:30 PM, Christian König
<deathsimple@vodafone.de> wrote:
> Hi Marek & Alex,
>
> I've found out why forcefully evicting page tables sometimes crashes
> the box.
>
> Well, this is a hexdump of a typical page table before it is moved to GART:
> 000117f000  02914061 00000000
> 000117f008  02915061 00000000
> 000117f010  02916061 00000000
> 000117f018  02917061 00000000
> 000117f020  02918061 00000000
>
> And it looks like this when it comes back:
> 0006102000  00000000 00000000
> *
>
> Ideas? I don't really have an explanation for this. Moving buffers around
> otherwise seems to work perfectly fine.

Nothing I can think of offhand. Might be worth trying CP DMA rather
than SDMA for BO moves to see if we can narrow it down a bit more.
Might also try the other SDMA ring.

Alex


* Re: CIK hangs with kernel 3.15, bisected
  2014-05-29 16:51           ` Marek Olšák
@ 2014-05-29 16:59             ` Christian König
  0 siblings, 0 replies; 31+ messages in thread
From: Christian König @ 2014-05-29 16:59 UTC
  To: Marek Olšák; +Cc: dri-devel

Yeah, that will work around it for now.

But the general problem is that we have memory corruption here. We just
didn't notice it earlier because clearing a texture or vectors with zero
only results in random misrendering.

It only really manifests as a bad crash when it hits a shader or, as in
this case, a page table.

Going to dig deeper into this,
Christian.

Am 29.05.2014 18:51, schrieb Marek Olšák:
> Can we disable evictions for page tables, e.g. by removing them from the LRU list?
>
> Marek
>
> On Thu, May 29, 2014 at 6:30 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>> Hi Marek & Alex,
>>
>> I've found out why forcefully evicting page tables sometimes crashes
>> the box.
>>
>> Well, this is a hexdump of a typical page table before it is moved to GART:
>> 000117f000  02914061 00000000
>> 000117f008  02915061 00000000
>> 000117f010  02916061 00000000
>> 000117f018  02917061 00000000
>> 000117f020  02918061 00000000
>>
>> And it looks like this when it comes back:
>> 0006102000  00000000 00000000
>> *
>>
>> Ideas? I don't really have an explanation for this. Moving buffers around
>> otherwise seems to work perfectly fine.
>>
>> Thanks,
>> Christian.


* Re: CIK hangs with kernel 3.15, bisected
  2014-05-13 20:27                           ` Marek Olšák
@ 2014-05-30  0:30                             ` Grigori Goronzy
  2014-05-30 11:30                               ` Marek Olšák
  0 siblings, 1 reply; 31+ messages in thread
From: Grigori Goronzy @ 2014-05-30  0:30 UTC
  To: Marek Olšák; +Cc: dri-devel

On 13.05.2014 22:27, Marek Olšák wrote:
> I applied these two patches Christian sent to dri-devel:
>
> drm/radeon: fix page directory update size estimation
> drm/radeon: fix buffer placement under memory pressure v2
>
> on top of torvalds's master branch.
>

With latest kernel master (a991639c) I still see a regression, compared 
to 3.13 or 3.14, which have similar performance. Xonotic is about 7% 
slower. OpenArena and Unigine Tropics are also noticeably slower, but I 
didn't record accurate numbers.

Maybe the improved memory management has some overhead, but this is not 
acceptable IMHO. I'll try to investigate further.

Best regards
Grigori

> Marek
>
> On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy <greg@chown.ath.cx> wrote:
>> On 13.05.2014 21:50, Marek Olšák wrote:
>>>
>>> Hi Christian,
>>>
>>> The performance regression I saw with piglit seems to be fixed with
>>> latest kernel git. It's difficult to bisect the kernel, because there
>>> are only merges between 3.14 and 3.15 and the merged commits are
>>> actually based on 3.14-rc1 and 3.14-rc4.
>>>
>>> All seems to be fine with your fixes.
>>>
>>
>> Which fixes have you applied? There are quite a few pending patches on
>> dri-devel that aren't yet part of drm-fixes-3.15.
>>
>> Grigori
>>
>>
>>> Marek
>>>
>>> On Tue, May 13, 2014 at 5:31 PM, Christian König
>>> <deathsimple@vodafone.de> wrote:
>>>>
>>>> Is the performance regression caused by the page table changes or
>>>> something else?
>>>>
>>>> I did some tests with Xonotic while developing it and they didn't show
>>>> anything obvious, but I didn't test on different systems.
>>>>
>>>> Christian.
>>>>
>>>> Am 13.05.2014 17:19, schrieb Marek Olšák:
>>>>
>>>>> Your latest patches fix the regression.
>>>>>
>>>>> The performance regression can also be reproduced with piglit
>>>>> "-t texelFetch.fs".
>>>>>
>>>>> Kernel 3.14:
>>>>>       real    0m17.724s
>>>>>       user    0m41.905s
>>>>>       sys    0m11.299s
>>>>>
>>>>> The problematic commit checked out + your fixes (without the PTE patch I
>>>>> think):
>>>>>       real    0m23.474s
>>>>>       user    1m1.008s
>>>>>       sys    0m13.812s
>>>>>
>>>>> Marek
>>>>>
>>>>>
>>>>> On Tue, May 13, 2014 at 3:57 PM, Christian König
>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>
>>>>>>
>>>>>> Am 13.05.2014 15:22, schrieb Alex Deucher:
>>>>>>
>>>>>>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy <greg@chown.ath.cx>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> I can confirm this fixes it for me, too.
>>>>>>>>
>>>>>>>> 3.15 with these fixes and the large PTE patches actually ends up being
>>>>>>>> noticeably slower than earlier kernels with Xonotic, though. I wonder
>>>>>>>> what's going on.
>>>>>>>
>>>>>>>
>>>>>>> Allocation overhead?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Unlikely. Xonotic just allocates a single page table at start, which
>>>>>> then gets extended at a certain rate until it no longer needs more
>>>>>> address space and is done with it.
>>>>>>
>>>>>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Grigori
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12.05.2014 14:50, Christian König wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I could reproduce the problem with Xonotic and I think I've found the
>>>>>>>>> issue.
>>>>>>>>>
>>>>>>>>> Please test the attached patch.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> Am 11.05.2014 11:06, schrieb Christian König:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yeah, thought so. Well it was just a guess.
>>>>>>>>>>
>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I added
>>>>>>>>>>> for userspace buffers.)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Actually, it shouldn't affect that. The alternative domain always
>>>>>>>>>> contains GART even when userspace only specified VRAM as placement
>>>>>>>>>> (as long as it is technically possible to do so).
>>>>>>>>>>
>>>>>>>>>> So what should happen is that TTM sees the current placement, matches
>>>>>>>>>> it with the desired placement and should find that it doesn't need to
>>>>>>>>>> move the buffer (we should just test whether this behavior really
>>>>>>>>>> works as expected).
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>> Am 10.05.2014 23:38, schrieb Marek Olšák:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Christian,
>>>>>>>>>>>
>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>>
>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I added
>>>>>>>>>>> for userspace buffers.)
>>>>>>>>>>>
>>>>>>>>>>> Marek
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sat, May 10, 2014 at 6:34 PM, Christian König
>>>>>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Couldn't reproduce the issue so far. So the attached patch is just a
>>>>>>>>>>>> complete shot in the dark found by rereading the code, but it might
>>>>>>>>>>>> actually be the problem.
>>>>>>>>>>>>
>>>>>>>>>>>> Please give it a try.
>>>>>>>>>>>>
>>>>>>>>>>>> Going to keep testing in the meantime,
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>> Am 10.05.2014 10:23, schrieb Christian König:
>>>>>>>>>>>>
>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I
>>>>>>>>>>>>>> boot with radeon.vramlimit=256 and then run Xonotic timedemo with high
>>>>>>>>>>>>>> settings. I haven't had a chance to bisect it yet, but it might be a
>>>>>>>>>>>>>> similar problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sounds like the same issue to me. Thx for the good test case.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any idea what is wrong with it?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Actually, I had already wondered why it went so smoothly without any
>>>>>>>>>>>>> regressions so far; I hadn't noticed the bug in bugzilla.kernel.org yet.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sounds like the underlying problem to me. We probably evict some part
>>>>>>>>>>>>> of a page table without updating the page directory. Going to dig into
>>>>>>>>>>>>> it today, it's probably just a one-liner missing somewhere in the VM
>>>>>>>>>>>>> code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am 09.05.2014 23:39, schrieb Grigori Goronzy:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 09.05.2014 20:03, Marek Olšák wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The simplest way to reproduce the hangs is to run piglit with these
>>>>>>>>>>>>>>>> parameters:
>>>>>>>>>>>>>>>> -t texelFetch.fs
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I
>>>>>>>>>>>>>>> boot with radeon.vramlimit=256 and then run Xonotic timedemo with high
>>>>>>>>>>>>>>> settings. I haven't had a chance to bisect it yet, but it might be a
>>>>>>>>>>>>>>> similar problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Grigori
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>

* Re: CIK hangs with kernel 3.15, bisected
  2014-05-30  0:30                             ` Grigori Goronzy
@ 2014-05-30 11:30                               ` Marek Olšák
  2014-05-30 11:46                                 ` Grigori Goronzy
  0 siblings, 1 reply; 31+ messages in thread
From: Marek Olšák @ 2014-05-30 11:30 UTC
  To: Grigori Goronzy; +Cc: dri-devel

Grigori,

you can git-checkout the commit before and after the memory management
changes, compile both and test them.

Marek

On Fri, May 30, 2014 at 2:30 AM, Grigori Goronzy <greg@chown.ath.cx> wrote:
> On 13.05.2014 22:27, Marek Olšák wrote:
>>
>> I applied these two patches Christian sent to dri-devel:
>>
>> drm/radeon: fix page directory update size estimation
>> drm/radeon: fix buffer placement under memory pressure v2
>>
>> on top of torvalds's master branch.
>>
>
> With latest kernel master (a991639c) I still see a regression, compared to
> 3.13 or 3.14, which have similar performance. Xonotic is about 7% slower.
> OpenArena and Unigine Tropics are also noticeably slower, but I didn't
> record accurate numbers.
>
> Maybe the improved memory management has some overhead, but this is not
> acceptable IMHO. I'll try to investigate further.
>
> Best regards
>
> Grigori
>
>> Marek
>>
>> On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy <greg@chown.ath.cx>
>> wrote:
>>>
>>> On 13.05.2014 21:50, Marek Olšák wrote:
>>>>
>>>>
>>>> Hi Christian,
>>>>
>>>> The performance regression I saw with piglit seems to be fixed with
>>>> latest kernel git. It's difficult to bisect the kernel, because there
>>>> are only merges between 3.14 and 3.15 and the merged committs are
>>>> actually based on 3.14-rc1 and 3.14-rc4.
>>>>
>>>> All seems to be fine with your fixes.
>>>>
>>>
>>> Which fixes have you applied? There are quite a few pending patches on
>>> dri-devel, that aren't yet part of drm-fixes-3.15.
>>>
>>> Grigori
>>>
>>>
>>>> Marek
>>>>
>>>> On Tue, May 13, 2014 at 5:31 PM, Christian König
>>>> <deathsimple@vodafone.de> wrote:
>>>>>
>>>>>
>>>>> Is the performance regression regression caused by the page table
>>>>> changes
>>>>> or
>>>>> something else?
>>>>>
>>>>> I did made some tests with xonotic while developing it and it didn't
>>>>> showed
>>>>> anything obvious, but I didn't made tests on different systems.
>>>>>
>>>>> Christian.
>>>>>
>>>>> Am 13.05.2014 17:19, schrieb Marek Olšák:
>>>>>
>>>>>> Your latest patches fix the regression.
>>>>>>
>>>>>> The performance regression can also be reproduced with piglit "-t
>>>>>> texelFetch.fs".
>>>>>>
>>>>>> Kernel 3.14:
>>>>>>       real    0m17.724s
>>>>>>       user    0m41.905s
>>>>>>       sys    0m11.299s
>>>>>>
>>>>>> The problematic commit checked out + your fixes (without the PTE patch
>>>>>> I
>>>>>> think):
>>>>>>       real    0m23.474s
>>>>>>       user    1m1.008s
>>>>>>       sys    0m13.812s
>>>>>>
>>>>>> Marek
>>>>>>
>>>>>>
>>>>>> On Tue, May 13, 2014 at 3:57 PM, Christian König
>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Am 13.05.2014 15:22, schrieb Alex Deucher:
>>>>>>>
>>>>>>>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy <greg@chown.ath.cx>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I can confirm this fixes it for me, too.
>>>>>>>>>
>>>>>>>>> 3.15 with these fixes and the large PTE patches actually ends up
>>>>>>>>> being
>>>>>>>>> noticeably slower than earlier kernels with Xonotic, though. I
>>>>>>>>> wonder
>>>>>>>>> what's
>>>>>>>>> going on.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Allocation overhead?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Unlikely, Xonotic just allocates a single page table at start, which
>>>>>>> then
>>>>>>> gets extended to a certain rate until they no longer need more
>>>>>>> address
>>>>>>> space
>>>>>>> and are done with it.
>>>>>>>
>>>>>>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> Grigori
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 12.05.2014 14:50, Christian König wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I could reproduce the problem with xonotic and I think I've found
>>>>>>>>>> the
>>>>>>>>>> issue.
>>>>>>>>>>
>>>>>>>>>> Please test the attached patch.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>> Am 11.05.2014 11:06, schrieb Christian König:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yeah, thought so. Well it was just a guess.
>>>>>>>>>>>
>>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>>>>> added
>>>>>>>>>>>> for userspace buffers.)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Actually it shouldn't affect that. The alternative domain always
>>>>>>>>>>> contains GART even when userspace only specified VRAM as
>>>>>>>>>>> placement
>>>>>>>>>>> (as
>>>>>>>>>>> long as it is technical possible to do so).
>>>>>>>>>>>
>>>>>>>>>>> So what should happen is that TTM sees the current placement,
>>>>>>>>>>> matches
>>>>>>>>>>> that with the desired placement and should find that it doesn't
>>>>>>>>>>> need
>>>>>>>>>>> to move the buffer (we should just test if this behavior really
>>>>>>>>>>> works
>>>>>>>>>>> as expected).
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>> Am 10.05.2014 23:38, schrieb Marek Olšák:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Christian,
>>>>>>>>>>>>
>>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>>>
>>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>>>>> added
>>>>>>>>>>>> for userspace buffers.)
>>>>>>>>>>>>
>>>>>>>>>>>> Marek
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, May 10, 2014 at 6:34 PM, Christian König
>>>>>>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Couldn't reproduce the issue so far. So the attached patch is
>>>>>>>>>>>>> just
>>>>>>>>>>>>> a
>>>>>>>>>>>>> complete shoot into the dark found by rereading the code, but
>>>>>>>>>>>>> it
>>>>>>>>>>>>> might
>>>>>>>>>>>>> actually be the problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please give it a try.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Going to keep testing in the meantime,
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am 10.05.2014 10:23, schrieb Christian König:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure,
>>>>>>>>>>>>>>> e.g.
>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>> I boot
>>>>>>>>>>>>>>> with radeon.vramlimit=256 and then run Xonotic timedemo with
>>>>>>>>>>>>>>> high
>>>>>>>>>>>>>>> settings.
>>>>>>>>>>>>>>> I haven't had a chance to bisect it yet, but it might be a
>>>>>>>>>>>>>>> similar
>>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sounds like the same issue to me. Thx for the good test case.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any idea what is wrong with it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Actually I already wondered that it went so smooth without any
>>>>>>>>>>>>>> regression
>>>>>>>>>>>>>> so far, didn't noticed the bug in bugzilla.kernel.org yet.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the
>>>>>>>>>>>>>>> tests
>>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and
>>>>>>>>>>>>>>> probably
>>>>>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sounds like the underlying problem to me. We probably evict
>>>>>>>>>>>>>> some
>>>>>>>>>>>>>> part of a
>>>>>>>>>>>>>> page table without updating the page directory. Going to dig
>>>>>>>>>>>>>> into
>>>>>>>>>>>>>> it today,
>>>>>>>>>>>>>> it's probably just a one liner missing somewhere in the VM
>>>>>>>>>>>>>> code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 09.05.2014 23:39, schrieb Grigori Goronzy:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 09.05.2014 20:03, Marek Olšák wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The simplest way to reproduce the hangs is to run piglit with these
>>>>>>>>>>>>>>>> parameters:
>>>>>>>>>>>>>>>> -t texelFetch.fs
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I
>>>>>>>>>>>>>>> boot with radeon.vramlimit=256 and then run Xonotic timedemo with high
>>>>>>>>>>>>>>> settings. I haven't had a chance to bisect it yet, but it might be a
>>>>>>>>>>>>>>> similar problem.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Grigori
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>
>


* Re: CIK hangs with kernel 3.15, bisected
  2014-05-30 11:30                               ` Marek Olšák
@ 2014-05-30 11:46                                 ` Grigori Goronzy
  2014-05-30 11:51                                   ` Marek Olšák
  2014-05-30 18:01                                   ` Grigori Goronzy
  0 siblings, 2 replies; 31+ messages in thread
From: Grigori Goronzy @ 2014-05-30 11:46 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

On 30.05.2014 13:30, Marek Olšák wrote:
> Grigori,
>
> you can git-checkout the commit before and after the memory management
> changes, compile both and test them.
>

I was trying to revert the changes, but it looks like too much changed 
in the meantime. The suitable commits to check out should be 0bc490a8 
(before) and 19dff56a (after), right?

Best regards
Grigori

> Marek
>
> On Fri, May 30, 2014 at 2:30 AM, Grigori Goronzy <greg@chown.ath.cx> wrote:
>> On 13.05.2014 22:27, Marek Olšák wrote:
>>>
>>> I applied these two patches Christian sent to dri-devel:
>>>
>>> drm/radeon: fix page directory update size estimation
>>> drm/radeon: fix buffer placement under memory pressure v2
>>>
>>> on top of torvalds's master branch.
>>>
>>
>> With latest kernel master (a991639c) I still see a regression, compared to
>> 3.13 or 3.14, which have similar performance. Xonotic is about 7% slower.
>> OpenArena and Unigine Tropics are also noticeably slower, but I didn't
>> record accurate numbers.
>>
>> Maybe the improved memory management has some overhead, but this is not
>> acceptable IMHO. I'll try to investigate further.
>>
>> Best regards
>>
>> Grigori
>>
>>> Marek
>>>
>>> On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy <greg@chown.ath.cx>
>>> wrote:
>>>>
>>>> On 13.05.2014 21:50, Marek Olšák wrote:
>>>>>
>>>>>
>>>>> Hi Christian,
>>>>>
>>>>> The performance regression I saw with piglit seems to be fixed with
>>>>> latest kernel git. It's difficult to bisect the kernel, because there
>>>>> are only merges between 3.14 and 3.15 and the merged commits are
>>>>> actually based on 3.14-rc1 and 3.14-rc4.
>>>>>
>>>>> All seems to be fine with your fixes.
>>>>>
>>>>
>>>> Which fixes have you applied? There are quite a few pending patches on
>>>> dri-devel that aren't yet part of drm-fixes-3.15.
>>>>
>>>> Grigori
>>>>
>>>>
>>>>> Marek
>>>>>
>>>>> On Tue, May 13, 2014 at 5:31 PM, Christian König
>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>
>>>>>>
>>>>>> Is the performance regression caused by the page table changes or
>>>>>> something else?
>>>>>>
>>>>>> I did make some tests with Xonotic while developing it and they didn't
>>>>>> show anything obvious, but I didn't test on different systems.
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>> On 13.05.2014 17:19, Marek Olšák wrote:
>>>>>>
>>>>>>> Your latest patches fix the regression.
>>>>>>>
>>>>>>> The performance regression can also be reproduced with piglit "-t
>>>>>>> texelFetch.fs".
>>>>>>>
>>>>>>> Kernel 3.14:
>>>>>>>        real    0m17.724s
>>>>>>>        user    0m41.905s
>>>>>>>        sys    0m11.299s
>>>>>>>
>>>>>>> The problematic commit checked out + your fixes (without the PTE patch,
>>>>>>> I think):
>>>>>>>        real    0m23.474s
>>>>>>>        user    1m1.008s
>>>>>>>        sys    0m13.812s
>>>>>>>
>>>>>>> Marek
>>>>>>>
>>>>>>>
>>>>>>> On Tue, May 13, 2014 at 3:57 PM, Christian König
>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 13.05.2014 15:22, Alex Deucher wrote:
>>>>>>>>
>>>>>>>>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy <greg@chown.ath.cx>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I can confirm this fixes it for me, too.
>>>>>>>>>>
>>>>>>>>>> 3.15 with these fixes and the large PTE patches actually ends up being
>>>>>>>>>> noticeably slower than earlier kernels with Xonotic, though. I wonder
>>>>>>>>>> what's going on.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Allocation overhead?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Unlikely. Xonotic just allocates a single page table at start, which
>>>>>>>> then gets extended at a certain rate until it no longer needs more
>>>>>>>> address space and is done with it.
>>>>>>>>
>>>>>>>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Grigori
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 12.05.2014 14:50, Christian König wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I could reproduce the problem with Xonotic and I think I've found the
>>>>>>>>>>> issue.
>>>>>>>>>>>
>>>>>>>>>>> Please test the attached patch.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>> On 11.05.2014 11:06, Christian König wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Yeah, thought so. Well it was just a guess.
>>>>>>>>>>>>
>>>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I added
>>>>>>>>>>>>> for userspace buffers.)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Actually it shouldn't affect that. The alternative domain always
>>>>>>>>>>>> contains GART even when userspace only specified VRAM as placement (as
>>>>>>>>>>>> long as it is technically possible to do so).
>>>>>>>>>>>>
>>>>>>>>>>>> So what should happen is that TTM sees the current placement, matches
>>>>>>>>>>>> it against the desired placement and should find that it doesn't need
>>>>>>>>>>>> to move the buffer (we should just test whether this behavior really
>>>>>>>>>>>> works as expected).
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>> On 10.05.2014 23:38, Marek Olšák wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Christian,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I added
>>>>>>>>>>>>> for userspace buffers.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Marek
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, May 10, 2014 at 6:34 PM, Christian König
>>>>>>>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Couldn't reproduce the issue so far. So the attached patch is just a
>>>>>>>>>>>>>> complete shot in the dark found by rereading the code, but it might
>>>>>>>>>>>>>> actually be the problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please give it a try.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Going to keep testing in the meantime,
>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10.05.2014 10:23, Christian König wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I
>>>>>>>>>>>>>>>> boot with radeon.vramlimit=256 and then run Xonotic timedemo with high
>>>>>>>>>>>>>>>> settings. I haven't had a chance to bisect it yet, but it might be a
>>>>>>>>>>>>>>>> similar problem.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sounds like the same issue to me. Thx for the good test case.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any idea what is wrong with it?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Actually, I was already wondering why it went so smoothly without any
>>>>>>>>>>>>>>> regressions so far; I hadn't noticed the bug in bugzilla.kernel.org yet.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sounds like the underlying problem to me. We probably evict some part
>>>>>>>>>>>>>>> of a page table without updating the page directory. Going to dig into
>>>>>>>>>>>>>>> it today; it's probably just a one-liner missing somewhere in the VM
>>>>>>>>>>>>>>> code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 09.05.2014 23:39, Grigori Goronzy wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 09.05.2014 20:03, Marek Olšák wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The simplest way to reproduce the hangs is to run piglit with these
>>>>>>>>>>>>>>>>> parameters:
>>>>>>>>>>>>>>>>> -t texelFetch.fs
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>>>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>>>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I
>>>>>>>>>>>>>>>> boot with radeon.vramlimit=256 and then run Xonotic timedemo with high
>>>>>>>>>>>>>>>> settings. I haven't had a chance to bisect it yet, but it might be a
>>>>>>>>>>>>>>>> similar problem.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Grigori
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
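Christian's explanation quoted above (the alternative domain always contains GART, so TTM should compare the current placement with the desired one and leave the buffer where it is) can be illustrated with a toy model. This is a hypothetical sketch: `Domain`, `needs_move` and the list-based placements are illustrative names, not TTM's actual data structures or API, and the real check works on TTM placement flags rather than a two-value enum.

```python
from enum import Flag, auto


class Domain(Flag):
    """Hypothetical memory domains, loosely modelled on radeon's VRAM/GTT."""
    VRAM = auto()
    GTT = auto()


def needs_move(current, preferred, alternative):
    """Return True if a buffer currently in `current` must be migrated.

    Toy version of the check described in the thread: the buffer may stay
    where it is as long as its current domain appears in either the
    preferred or the alternative (busy) placement list.
    """
    allowed = list(preferred) + list(alternative)
    return not any(current & domain for domain in allowed)


# Userspace asked for VRAM only; the alternative list also contains GTT.
preferred = [Domain.VRAM]
alternative = [Domain.VRAM, Domain.GTT]

print(needs_move(Domain.GTT, preferred, alternative))  # GTT is acceptable
print(needs_move(Domain.GTT, preferred, []))           # no GTT fallback
```

The point of the model is only that a buffer already sitting in GTT should not bounce back and forth if GTT is in the allowed set; whether TTM really behaves this way is exactly what Christian suggests testing.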



* Re: CIK hangs with kernel 3.15, bisected
  2014-05-30 11:46                                 ` Grigori Goronzy
@ 2014-05-30 11:51                                   ` Marek Olšák
  2014-05-30 18:01                                   ` Grigori Goronzy
  1 sibling, 0 replies; 31+ messages in thread
From: Marek Olšák @ 2014-05-30 11:51 UTC (permalink / raw)
  To: Grigori Goronzy; +Cc: dri-devel

That's right.

Also, you probably want to enable automatic addition of the git-sha1
to the kernel version in menuconfig (there is an option for it), so
that you can have several kernels with the same version but different
sha1 installed.

Marek

On Fri, May 30, 2014 at 1:46 PM, Grigori Goronzy <greg@chown.ath.cx> wrote:
> On 30.05.2014 13:30, Marek Olšák wrote:
>>
>> Grigori,
>>
>> you can git-checkout the commit before and after the memory management
>> changes, compile both and test them.
>>
>
> I was trying to revert the changes, but it looks like too much changed in
> the meantime. The suitable commits to check out should be 0bc490a8 (before)
> and 19dff56a (after), right?
>
>
> Best regards
> Grigori
>
>> Marek
>>
>> On Fri, May 30, 2014 at 2:30 AM, Grigori Goronzy <greg@chown.ath.cx>
>> wrote:
>>>
>>> On 13.05.2014 22:27, Marek Olšák wrote:
>>>>
>>>>
>>>> I applied these two patches Christian sent to dri-devel:
>>>>
>>>> drm/radeon: fix page directory update size estimation
>>>> drm/radeon: fix buffer placement under memory pressure v2
>>>>
>>>> on top of torvalds's master branch.
>>>>
>>>
>>> With latest kernel master (a991639c) I still see a regression,
>>> compared to 3.13 or 3.14, which have similar performance. Xonotic is
>>> about 7% slower.
>>> OpenArena and Unigine Tropics are also noticeably slower, but I didn't
>>> record accurate numbers.
>>>
>>> Maybe the improved memory management has some overhead, but this is not
>>> acceptable IMHO. I'll try to investigate further.
>>>
>>> Best regards
>>>
>>> Grigori
>>>
>>>> Marek
>>>>
>>>> On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy <greg@chown.ath.cx>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 13.05.2014 21:50, Marek Olšák wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Christian,
>>>>>>
>>>>>> The performance regression I saw with piglit seems to be fixed with
>>>>>> latest kernel git. It's difficult to bisect the kernel, because there
>>>>>> are only merges between 3.14 and 3.15 and the merged commits are
>>>>>> actually based on 3.14-rc1 and 3.14-rc4.
>>>>>>
>>>>>> All seems to be fine with your fixes.
>>>>>>
>>>>>
>>>>> Which fixes have you applied? There are quite a few pending patches on
>>>>> dri-devel that aren't yet part of drm-fixes-3.15.
>>>>>
>>>>> Grigori
>>>>>
>>>>>
>>>>>> Marek
>>>>>>
>>>>>> On Tue, May 13, 2014 at 5:31 PM, Christian König
>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Is the performance regression caused by the page table changes or
>>>>>>> something else?
>>>>>>>
>>>>>>> I did make some tests with Xonotic while developing it and they didn't
>>>>>>> show anything obvious, but I didn't test on different systems.
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>> On 13.05.2014 17:19, Marek Olšák wrote:
>>>>>>>
>>>>>>>> Your latest patches fix the regression.
>>>>>>>>
>>>>>>>> The performance regression can also be reproduced with piglit "-t
>>>>>>>> texelFetch.fs".
>>>>>>>>
>>>>>>>> Kernel 3.14:
>>>>>>>>        real    0m17.724s
>>>>>>>>        user    0m41.905s
>>>>>>>>        sys    0m11.299s
>>>>>>>>
>>>>>>>> The problematic commit checked out + your fixes (without the PTE patch,
>>>>>>>> I think):
>>>>>>>>        real    0m23.474s
>>>>>>>>        user    1m1.008s
>>>>>>>>        sys    0m13.812s
>>>>>>>>
>>>>>>>> Marek
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, May 13, 2014 at 3:57 PM, Christian König
>>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 13.05.2014 15:22, Alex Deucher wrote:
>>>>>>>>>
>>>>>>>>>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy
>>>>>>>>>> <greg@chown.ath.cx>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I can confirm this fixes it for me, too.
>>>>>>>>>>>
>>>>>>>>>>> 3.15 with these fixes and the large PTE patches actually ends up being
>>>>>>>>>>> noticeably slower than earlier kernels with Xonotic, though. I wonder
>>>>>>>>>>> what's going on.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Allocation overhead?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Unlikely. Xonotic just allocates a single page table at start, which
>>>>>>>>> then gets extended at a certain rate until it no longer needs more
>>>>>>>>> address space and is done with it.
>>>>>>>>>
>>>>>>>>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Grigori
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 12.05.2014 14:50, Christian König wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I could reproduce the problem with Xonotic and I think I've found the
>>>>>>>>>>>> issue.
>>>>>>>>>>>>
>>>>>>>>>>>> Please test the attached patch.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>> On 11.05.2014 11:06, Christian König wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yeah, thought so. Well it was just a guess.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>>>>>>> added for userspace buffers.)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Actually it shouldn't affect that. The alternative domain always
>>>>>>>>>>>>> contains GART even when userspace only specified VRAM as placement
>>>>>>>>>>>>> (as long as it is technically possible to do so).
>>>>>>>>>>>>>
>>>>>>>>>>>>> So what should happen is that TTM sees the current placement, matches
>>>>>>>>>>>>> it against the desired placement and should find that it doesn't need
>>>>>>>>>>>>> to move the buffer (we should just test whether this behavior really
>>>>>>>>>>>>> works as expected).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10.05.2014 23:38, Marek Olšák wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Christian,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>>>>>>> added for userspace buffers.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Marek
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, May 10, 2014 at 6:34 PM, Christian König
>>>>>>>>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Couldn't reproduce the issue so far. So the attached patch is just a
>>>>>>>>>>>>>>> complete shot in the dark found by rereading the code, but it might
>>>>>>>>>>>>>>> actually be the problem.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please give it a try.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Going to keep testing in the meantime,
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 10.05.2014 10:23, Christian König wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I
>>>>>>>>>>>>>>>>> boot with radeon.vramlimit=256 and then run Xonotic timedemo with
>>>>>>>>>>>>>>>>> high settings. I haven't had a chance to bisect it yet, but it might
>>>>>>>>>>>>>>>>> be a similar problem.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sounds like the same issue to me. Thx for the good test case.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any idea what is wrong with it?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Actually, I was already wondering why it went so smoothly without
>>>>>>>>>>>>>>>> any regressions so far; I hadn't noticed the bug in
>>>>>>>>>>>>>>>> bugzilla.kernel.org yet.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>>>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>>>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sounds like the underlying problem to me. We probably evict some
>>>>>>>>>>>>>>>> part of a page table without updating the page directory. Going to
>>>>>>>>>>>>>>>> dig into it today; it's probably just a one-liner missing somewhere
>>>>>>>>>>>>>>>> in the VM code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 09.05.2014 23:39, Grigori Goronzy wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 09.05.2014 20:03, Marek Olšák wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This commit which first appeared in 3.15-rc1 causes hangs on
>>>>>>>>>>>>>>>>>> Bonaire:
>>>>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The simplest way to reproduce the hangs is to run piglit with
>>>>>>>>>>>>>>>>>> these parameters:
>>>>>>>>>>>>>>>>>> -t texelFetch.fs
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests
>>>>>>>>>>>>>>>>>> also run in parallel, which creates a lot of memory pressure and
>>>>>>>>>>>>>>>>>> probably causes buffer evictions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I
>>>>>>>>>>>>>>>>> boot with radeon.vramlimit=256 and then run Xonotic timedemo with
>>>>>>>>>>>>>>>>> high settings. I haven't had a chance to bisect it yet, but it might
>>>>>>>>>>>>>>>>> be a similar problem.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Grigori
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>


* Re: CIK hangs with kernel 3.15, bisected
  2014-05-29 16:52           ` Alex Deucher
@ 2014-05-30 15:57             ` Christian König
  0 siblings, 0 replies; 31+ messages in thread
From: Christian König @ 2014-05-30 15:57 UTC (permalink / raw)
  To: Alex Deucher; +Cc: dri-devel

Well, the good news is that when I use the CP DMA instead of the SDMA,
everything seems to work fine.

Unfortunately, using the CP DMA has completely different timing
(because of the additional sync needed), so I'm not sure if it's
really fixed or just masked.

Christian.
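For readers decoding the page-table hexdump quoted below, a minimal sketch, assuming (1) each 64-bit entry keeps its flags in the low 12 bits with the layout of the R600_PTE_* defines in the radeon driver (valid = bit 0, system = bit 1, snooped = bit 2, readable = bit 5, writeable = bit 6), and (2) the dump prints the low dword first. Under those assumptions every "before" entry is a valid, readable, writeable 4 KiB page and consecutive entries point at consecutive pages, while the entry read back afterwards has lost both its address and its valid bit. This only restates what the dump shows; it does not explain why the contents came back zeroed.

```python
# Assumed flag layout, modelled on the R600_PTE_* defines in the radeon
# driver: bit 0 valid, bit 1 system, bit 2 snooped, bit 5 readable,
# bit 6 writeable. The remaining high bits hold the 4 KiB page address.
FLAGS = [
    (1 << 0, "valid"),
    (1 << 1, "system"),
    (1 << 2, "snooped"),
    (1 << 5, "readable"),
    (1 << 6, "writeable"),
]
PAGE_MASK = ~0xFFF  # clears the low 12 flag bits


def parse_dump_line(line):
    """Split '000117f000  02914061 00000000' into (offset, 64-bit entry).

    Assumes the dump prints the low dword first (little-endian).
    """
    offset, lo, hi = line.split()
    return int(offset, 16), (int(hi, 16) << 32) | int(lo, 16)


def decode_pte(pte):
    """Return (page address, names of the set flag bits)."""
    names = [name for bit, name in FLAGS if pte & bit]
    return pte & PAGE_MASK, names


before = [
    "000117f000  02914061 00000000",
    "000117f008  02915061 00000000",
    "000117f010  02916061 00000000",
]
after = "0006102000  00000000 00000000"

for line in before:
    offset, pte = parse_dump_line(line)
    address, names = decode_pte(pte)
    print(f"{offset:#012x}: page {address:#010x} {names}")

# The entry read back after the round trip has no address and no valid bit.
print(decode_pte(parse_dump_line(after)[1]))
```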

On 29.05.2014 18:52, Alex Deucher wrote:
> On Thu, May 29, 2014 at 12:30 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>> Hi Marek & Alex,
>>
>> I've found the issue why forcefully evicting page tables sometimes crashes
>> the box.
>>
>> Well, this is a typical hexdump of a page table before it is moved to GART:
>> 000117f000  02914061 00000000
>> 000117f008  02915061 00000000
>> 000117f010  02916061 00000000
>> 000117f018  02917061 00000000
>> 000117f020  02918061 00000000
>>
>> And it looks like this when it comes back:
>> 0006102000  00000000 00000000
>> *
>>
>> Ideas? I don't really have an explanation for this. Moving buffers around
>> otherwise seems to work perfectly fine.
> Nothing I can think of off hand.  Might be worth trying CP DMA rather
> than SDMA for BO moves to see if we can narrow it down a bit more.
> Might also try the other SDMA ring.
>
> Alex
>
>> Thanks,
>> Christian.
>>
>> On 28.05.2014 12:38, Christian König wrote:
>>
>>> I already tried a similar patch as well, without any more noticeable
>>> crashes. But going to give this another round with your patch and openarena.
>>>
>>> Thanks,
>>> Christian.
>>>
>>> On 27.05.2014 23:55, Marek Olšák wrote:
>>>> Hi Christian,
>>>>
>>>> I test on Bonaire (ChipID = 0x665c). Unfortunately, the hangs are not
>>>> fixed yet. They are very rare and very random. Therefore, I have come
>>>> up with a patch which evicts page tables between IBs. See the
>>>> attachment. With that patch applied, the system starts fine, compiz
>>>> and glxgears work, but once I start playing openarena, it locks up
>>>> pretty quickly.
>>>>
>>>> The patch shouldn't do anything in theory, because pages are moved
>>>> back to VRAM immediately after that. However, the VRAM address of page
>>>> tables may end up being different from before, which might be the root
>>>> cause.
>>>>
>>>> Marek
>>>>
>>>> On Wed, May 14, 2014 at 2:11 PM, Christian König
>>>> <deathsimple@vodafone.de> wrote:
>>>>> Crap, any chance you can narrow it down a bit more?
>>>>>
>>>>> I've just tried a piglit quick test on my Bonaire and it seems to work
>>>>> perfectly fine.
>>>>>
>>>>> What hw do you test on?
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> On 13.05.2014 23:21, Marek Olšák wrote:
>>>>>
>>>>>> Hi Christian,
>>>>>>
>>>>>> Even though some regressions are fixed by these patches:
>>>>>>
>>>>>> drm/radeon: fix page directory update size estimation
>>>>>> drm/radeon: fix buffer placement under memory pressure v2
>>>>>>
>>>>>> and indeed, the texelFetch tests no longer hang, there is one more
>>>>>> hang which needs to be fixed. :( All I know is the exact same commit
>>>>>> causes it and it can only be reproduced by running whole piglit with
>>>>>> concurrency enabled.
>>>>>>
>>>>>> My kernel git log:
>>>>>>
>>>>>> * 2ba22c8 - drm/radeon: fix buffer placement under memory pressure v2 (10 hours ago) <Christian König>
>>>>>> * 3af91e5 - drm/radeon: fix page directory update size estimation (21 hours ago) <Christian König>
>>>>>> * 6d2f294 - drm/radeon: use normal BOs for the page tables v4 (2 months ago) <Christian König>
>>>>>> * fa68834 - drm/radeon: further cleanup vm flushing & fencing (2 months ago) <Christian König>
>>>>>>
>>>>>> fa68834 doesn't hang, but 2ba22c8 hangs, which means 6d2f294 or either
>>>>>> of the two fixes is the first bad commit.
>>>>>>
>>>>>> Marek
>>>>>>
>>>>>> On Fri, May 9, 2014 at 8:03 PM, Marek Olšák <maraeo@gmail.com> wrote:
>>>>>>> Hi Christian,
>>>>>>>
>>>>>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>>>>>
>>>>>>> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
>>>>>>> Author: Christian König <christian.koenig@amd.com>
>>>>>>> Date:   Thu Feb 20 13:42:17 2014 +0100
>>>>>>>
>>>>>>>        drm/radeon: use normal BOs for the page tables v4
>>>>>>>
>>>>>>>        No need to make it more complicated than necessary,
>>>>>>>        just allocate the page tables as normal BO and
>>>>>>>        flush whenever the address change.
>>>>>>>
>>>>>>>        v2: update comments and function name
>>>>>>>        v3: squash bug fixes, page directory and tables patch
>>>>>>>        v4: rebased on Mareks changes
>>>>>>>
>>>>>>>        Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>>>
>>>>>>>
>>>>>>> Reverting the commit gives me a lot of merge conflicts.
>>>>>>>
>>>>>>> The simplest way to reproduce the hangs is to run piglit with these
>>>>>>> parameters:
>>>>>>> -t texelFetch.fs
>>>>>>>
>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>>> causes buffer evictions.
>>>>>>>
>>>>>>> Any idea what is wrong with it?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Marek
>>>>>
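Marek's reasoning quoted above (fa68834 does not hang, 2ba22c8 does, so the first bad commit is 6d2f294 or one of the two fixes on top of it) is the usual bisection argument, and testing 6d2f294 on its own would split the remaining candidates. A minimal sketch of that search over the linear history from his git log, oldest first; `hypothetical_test` is a placeholder for "build, boot and run piglit" and simply declares 6d2f294 and later bad to exercise the code, so it is not a test result.

```python
def first_bad(commits, is_bad):
    """Binary-search an oldest-to-newest list for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit is bad every later one is bad too (the same assumption
    git bisect makes).
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


# Linear history from the git log quoted above, oldest first.
history = ["fa68834", "6d2f294", "3af91e5", "2ba22c8"]


def hypothetical_test(sha):
    # Placeholder predicate: marks 6d2f294 and everything after it as bad.
    return history.index(sha) >= history.index("6d2f294")


print(first_bad(history, hypothetical_test))
```

With four commits and both ends already known, the search needs at most two builds, which is why checking out 6d2f294 first is the natural next step.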



* Re: CIK hangs with kernel 3.15, bisected
  2014-05-30 11:46                                 ` Grigori Goronzy
  2014-05-30 11:51                                   ` Marek Olšák
@ 2014-05-30 18:01                                   ` Grigori Goronzy
  1 sibling, 0 replies; 31+ messages in thread
From: Grigori Goronzy @ 2014-05-30 18:01 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On 30.05.2014 13:46, Grigori Goronzy wrote:
> On 30.05.2014 13:30, Marek Olšák wrote:
>> Grigori,
>>
>> you can git-checkout the commit before and after the memory management
>> changes, compile both and test them.
>>
>
> I was trying to revert the changes, but it looks like too much changed
> in the meantime. The suitable commits to check out should be 0bc490a8
> (before) and 19dff56a (after), right?
>

Turns out these changes weren't the problem; instead it's the page
tables rework (commit 6d2f2944), which seems to also cause a bunch of
other issues. The latest drm-fixes code doesn't change it, either.

According to my (not very scientific) testing with radeontop and the 
"time" utility, this appears to be a CPU overhead problem. The "sys" 
duration reported by time for a Xonotic benchmark run is over 3x as long 
after the regression, and radeontop seems to report about 10% reduced 
GPU load on average.

Best regards
Grigori
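To express the piglit timings quoted below (Marek, 13 May) in the same terms as the "sys" comparison above, a small helper can convert `time` output to seconds and compute slowdown ratios. This is only arithmetic on the quoted numbers; note that they come from a different workload (piglit `-t texelFetch.fs`, with Christian's fixes applied) than the Xonotic runs described above, so the ratios are not directly comparable with the 3x figure.

```python
import re


def to_seconds(value):
    """Convert `time` output such as '1m1.008s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unexpected time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Numbers copied from the piglit runs quoted below.
kernel_3_14 = {"real": "0m17.724s", "user": "0m41.905s", "sys": "0m11.299s"}
regressed = {"real": "0m23.474s", "user": "1m1.008s", "sys": "0m13.812s"}

ratios = {
    key: to_seconds(regressed[key]) / to_seconds(kernel_3_14[key])
    for key in kernel_3_14
}
for key, ratio in ratios.items():
    print(f"{key}: {ratio:.2f}x ({(ratio - 1) * 100:+.0f}%)")
```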

> Best regards
> Grigori
>
>> Marek
>>
>> On Fri, May 30, 2014 at 2:30 AM, Grigori Goronzy <greg@chown.ath.cx>
>> wrote:
>>> On 13.05.2014 22:27, Marek Olšák wrote:
>>>>
>>>> I applied these two patches Christian sent to dri-devel:
>>>>
>>>> drm/radeon: fix page directory update size estimation
>>>> drm/radeon: fix buffer placement under memory pressure v2
>>>>
>>>> on top of torvalds's master branch.
>>>>
>>>
>>> With latest kernel master (a991639c) I still see a regression,
>>> compared to 3.13 or 3.14, which have similar performance. Xonotic is
>>> about 7% slower.
>>> OpenArena and Unigine Tropics are also noticeably slower, but I didn't
>>> record accurate numbers.
>>>
>>> Maybe the improved memory management has some overhead, but this is not
>>> acceptable IMHO. I'll try to investigate further.
>>>
>>> Best regards
>>>
>>> Grigori
>>>
>>>> Marek
>>>>
>>>> On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy <greg@chown.ath.cx>
>>>> wrote:
>>>>>
>>>>> On 13.05.2014 21:50, Marek Olšák wrote:
>>>>>>
>>>>>>
>>>>>> Hi Christian,
>>>>>>
>>>>>> The performance regression I saw with piglit seems to be fixed with
>>>>>> latest kernel git. It's difficult to bisect the kernel, because there
>>>>>> are only merges between 3.14 and 3.15 and the merged commits are
>>>>>> actually based on 3.14-rc1 and 3.14-rc4.
>>>>>>
>>>>>> All seems to be fine with your fixes.
>>>>>>
>>>>>
>>>>> Which fixes have you applied? There are quite a few pending patches on
>>>>> dri-devel that aren't yet part of drm-fixes-3.15.
>>>>>
>>>>> Grigori
>>>>>
>>>>>
>>>>>> Marek
>>>>>>
>>>>>> On Tue, May 13, 2014 at 5:31 PM, Christian König
>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>
>>>>>>> Is the performance regression caused by the page table changes or
>>>>>>> something else?
>>>>>>>
>>>>>>> I ran some tests with Xonotic while developing it and they didn't
>>>>>>> show anything obvious, but I didn't test on different systems.
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>> On 13.05.2014 17:19, Marek Olšák wrote:
>>>>>>>
>>>>>>>> Your latest patches fix the regression.
>>>>>>>>
>>>>>>>> The performance regression can also be reproduced with piglit
>>>>>>>> "-t texelFetch.fs".
>>>>>>>>
>>>>>>>> Kernel 3.14:
>>>>>>>>        real    0m17.724s
>>>>>>>>        user    0m41.905s
>>>>>>>>        sys    0m11.299s
>>>>>>>>
>>>>>>>> The problematic commit checked out + your fixes (without the PTE
>>>>>>>> patch, I think):
>>>>>>>>        real    0m23.474s
>>>>>>>>        user    1m1.008s
>>>>>>>>        sys    0m13.812s
>>>>>>>>
>>>>>>>> Marek
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, May 13, 2014 at 3:57 PM, Christian König
>>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>>
>>>>>>>>> On 13.05.2014 15:22, Alex Deucher wrote:
>>>>>>>>>
>>>>>>>>>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy
>>>>>>>>>> <greg@chown.ath.cx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I can confirm this fixes it for me, too.
>>>>>>>>>>>
>>>>>>>>>>> 3.15 with these fixes and the large PTE patches actually ends up
>>>>>>>>>>> being noticeably slower than earlier kernels with Xonotic, though.
>>>>>>>>>>> I wonder what's going on.
>>>>>>>>>>
>>>>>>>>>> Allocation overhead?
>>>>>>>>>
>>>>>>>>> Unlikely; Xonotic just allocates a single page table at start, which
>>>>>>>>> then gets extended to a certain rate until they no longer need more
>>>>>>>>> address space and are done with it.
>>>>>>>>>
>>>>>>>>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Grigori
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 12.05.2014 14:50, Christian König wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I could reproduce the problem with Xonotic and I think I've found
>>>>>>>>>>>> the issue.
>>>>>>>>>>>> Please test the attached patch.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>> On 11.05.2014 11:06, Christian König wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yeah, thought so. Well, it was just a guess.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>>>>>>> added for userspace buffers.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Actually, it shouldn't affect that. The alternative domain always
>>>>>>>>>>>>> contains GART even when userspace only specified VRAM as placement
>>>>>>>>>>>>> (as long as it is technically possible to do so).
>>>>>>>>>>>>>
>>>>>>>>>>>>> So what should happen is that TTM sees the current placement,
>>>>>>>>>>>>> matches that with the desired placement and should find that it
>>>>>>>>>>>>> doesn't need to move the buffer (we should just test whether this
>>>>>>>>>>>>> behavior really works as expected).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10.05.2014 23:38, Marek Olšák wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Christian,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>>>>>>>> added for userspace buffers.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Marek
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, May 10, 2014 at 6:34 PM, Christian König
>>>>>>>>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Couldn't reproduce the issue so far, so the attached patch is
>>>>>>>>>>>>>>> just a complete shot in the dark found by rereading the code,
>>>>>>>>>>>>>>> but it might actually be the problem.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please give it a try.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Going to keep testing in the meantime,
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 10.05.2014 10:23, Christian König wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g.
>>>>>>>>>>>>>>>>> if I boot with radeon.vramlimit=256 and then run Xonotic
>>>>>>>>>>>>>>>>> timedemo with high settings. I haven't had a chance to bisect
>>>>>>>>>>>>>>>>> it yet, but it might be a similar problem.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sounds like the same issue to me. Thx for the good test case.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any idea what is wrong with it?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Actually, I was already surprised that it went so smoothly without
>>>>>>>>>>>>>>>> any regressions so far; I hadn't noticed the bug in
>>>>>>>>>>>>>>>> bugzilla.kernel.org yet.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests
>>>>>>>>>>>>>>>>> also run in parallel, which creates a lot of memory pressure and
>>>>>>>>>>>>>>>>> probably causes buffer evictions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sounds like the underlying problem to me. We probably evict some
>>>>>>>>>>>>>>>> part of a page table without updating the page directory. Going
>>>>>>>>>>>>>>>> to dig into it today; it's probably just a one-liner missing
>>>>>>>>>>>>>>>> somewhere in the VM code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 09.05.2014 23:39, Grigori Goronzy wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 09.05.2014 20:03, Marek Olšák wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This commit which first appeared in 3.15-rc1 causes hangs on
>>>>>>>>>>>>>>>>>> Bonaire:
>>>>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The simplest way to reproduce the hangs is to run piglit with
>>>>>>>>>>>>>>>>>> these parameters:
>>>>>>>>>>>>>>>>>> -t texelFetch.fs
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests
>>>>>>>>>>>>>>>>>> also run in parallel, which creates a lot of memory pressure
>>>>>>>>>>>>>>>>>> and probably causes buffer evictions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g.
>>>>>>>>>>>>>>>>> if I boot with radeon.vramlimit=256 and then run Xonotic
>>>>>>>>>>>>>>>>> timedemo with high settings. I haven't had a chance to bisect
>>>>>>>>>>>>>>>>> it yet, but it might be a similar problem.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Grigori

Thread overview: 31+ messages
2014-05-09 18:03 CIK hangs with kernel 3.15, bisected Marek Olšák
2014-05-09 18:10 ` Rafał Miłecki
2014-05-09 21:39 ` Grigori Goronzy
2014-05-10  8:23   ` Christian König
2014-05-10 16:34     ` Christian König
2014-05-10 21:38       ` Marek Olšák
2014-05-11  9:06         ` Christian König
2014-05-12 12:50           ` Christian König
2014-05-12 23:38             ` Grigori Goronzy
2014-05-13 13:22               ` Alex Deucher
2014-05-13 13:57                 ` Christian König
2014-05-13 15:19                   ` Marek Olšák
2014-05-13 15:31                     ` Christian König
2014-05-13 16:08                       ` Marek Olšák
2014-05-13 19:50                       ` Marek Olšák
2014-05-13 20:19                         ` Grigori Goronzy
2014-05-13 20:27                           ` Marek Olšák
2014-05-30  0:30                             ` Grigori Goronzy
2014-05-30 11:30                               ` Marek Olšák
2014-05-30 11:46                                 ` Grigori Goronzy
2014-05-30 11:51                                   ` Marek Olšák
2014-05-30 18:01                                   ` Grigori Goronzy
2014-05-13 21:21 ` Marek Olšák
2014-05-14 12:11   ` Christian König
2014-05-27 21:55     ` Marek Olšák
2014-05-28 10:38       ` Christian König
2014-05-29 16:30         ` Christian König
2014-05-29 16:51           ` Marek Olšák
2014-05-29 16:59             ` Christian König
2014-05-29 16:52           ` Alex Deucher
2014-05-30 15:57             ` Christian König