From: bugzilla-daemon@freedesktop.org
Subject: [Bug 93264] Tonga VM Faults since llvm ScheduleDAGInstrs: Rework schedule graph builder.
Date: Sat, 05 Dec 2015 18:24:42 +0000
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=93264

    Bug ID: 93264
   Summary: Tonga VM Faults since llvm ScheduleDAGInstrs: Rework schedule graph builder.
   Product: DRI
   Version: DRI git
  Hardware: x86-64 (AMD64)
        OS: Linux (All)
    Status: NEW
  Severity: normal
  Priority: medium
 Component: DRM/AMDgpu
  Assignee: dri-devel@lists.freedesktop.org
  Reporter: adf.lists@gmail.com

R9 285, using Unreal ElementalDemo to trigger this.

The faults don't start until well into the demo, at the same place that
triggered an older, now-resolved issue:

https://bugs.freedesktop.org/show_bug.cgi?id=93015
(so maybe Nicolai knows what happens at this point in the demo)

Bisecting LLVM came up with:

c0a189c3792865257c1383f176e5401373ed2270 is the first bad commit
commit c0a189c3792865257c1383f176e5401373ed2270
Author: Matthias Braun <matze@braunis.de>
Date:   Thu Dec 3 02:05:27 2015 +0000

    ScheduleDAGInstrs: Rework schedule graph builder.

    The new algorithm remembers the uses encountered while walking backwards
    until a matching def is found. Contrary to the previous version this:
    - Works without LiveIntervals being available
    - Allows to increase the precision to subregisters/lanemasks
      (not used for now)

    The changes in the AMDGPU tests are necessary because the R600 scheduler
    is not stable with respect to the order of nodes in the ready queues.

    Differential Revision: http://reviews.llvm.org/D9068
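
The commit message above describes walking the instructions backwards and remembering uses until a matching def is found. A toy sketch of that idea follows. It is not LLVM's code: the tuple layout, the register names and the function name `build_data_edges` are all hypothetical, and it only tracks def-to-use data edges, ignoring anti and output dependencies, subregisters, lanemasks and memory operands.

```python
from collections import defaultdict


def build_data_edges(instrs):
    """Return def->use edges for a straight-line instruction list.

    instrs is a list of (name, defs, uses) tuples in program order,
    where defs and uses are lists of register names (hypothetical layout).
    Each edge is a (defining_instr, using_instr, register) tuple.
    """
    # register -> instructions below the current one that read it and
    # have not yet been matched to a def
    pending_uses = defaultdict(list)
    edges = set()

    for name, defs, uses in reversed(instrs):
        # A def reaches every use remembered so far for that register.
        # Popping the list means earlier defs no longer see those uses.
        for reg in defs:
            for user in pending_uses.pop(reg, []):
                edges.add((name, user, reg))
        # Record this instruction's own uses only after its defs were
        # handled, so "r0 = f(r0)" does not create an edge to itself.
        for reg in uses:
            pending_uses[reg].append(name)

    return edges


if __name__ == "__main__":
    program = [
        ("i0", ["r0"], []),
        ("i1", ["r1"], ["r0"]),
        ("i2", ["r0"], ["r0", "r1"]),
        ("i3", [], ["r0"]),
    ]
    for edge in sorted(build_data_edges(program)):
        print(edge)
```

Because the walk only needs the uses seen so far, a sketch like this needs no precomputed liveness information, which is the property the commit message highlights.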


The demo continues to run and render OK, but I get thousands of faults like these:

amdgpu 0000:01:00.0: GPU fault detected: 147 0x07d04401
amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x092D80FA
amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A044001
VM fault (0x01, vmid 5) at page 153977082, read from 'TC7' (0x54433700) (68)
amdgpu 0000:01:00.0: GPU fault detected: 147 0x07d00401
amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0022D16B
amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A0C4002
VM fault (0x02, vmid 5) at page 2281835, read from 'TC4' (0x54433400) (196)
amdgpu 0000:01:00.0: GPU fault detected: 147 0x07d04001
amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0022D163
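
For anyone triaging logs like the above, the following is a minimal sketch of how the raw register values relate to the decoded "VM fault" lines. The bit positions are an assumption inferred from the log lines in this report, not taken from a register specification, so check them against the kernel's register headers before relying on them. The function names are hypothetical.

```python
import re

# Matches the two raw register lines printed by the driver.
REG_RE = re.compile(
    r"VM_CONTEXT1_PROTECTION_FAULT_(ADDR|STATUS)\s+0x([0-9A-Fa-f]{8})"
)


def parse_registers(log_text):
    """Return a list of (register_name, value) pairs found in log_text."""
    return [(m.group(1), int(m.group(2), 16)) for m in REG_RE.finditer(log_text)]


def decode_status(status):
    """Split a STATUS value into fields.

    Field positions are assumed from the log in this report:
    bits 0-7 protections, bits 12-19 memory client id,
    bit 24 read/write, bits 25-28 VMID.
    """
    return {
        "protections": status & 0xFF,
        "memory_client_id": (status >> 12) & 0xFF,
        "write": bool((status >> 24) & 0x1),
        "vmid": (status >> 25) & 0xF,
    }


def decode_client(mc_client):
    """Turn a packed ASCII value such as 0x54433700 into a short name."""
    return mc_client.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    sample = (
        "amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x092D80FA\n"
        "amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A044001\n"
    )
    for name, value in parse_registers(sample):
        if name == "ADDR":
            print("page", value)
        else:
            print(decode_status(value))
    print(decode_client(0x54433700))
```

Under these assumptions the page number in the decoded line is simply the ADDR register printed in decimal, and the quoted client name is the four-byte client value read as ASCII.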

