From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
Date: Thu, 28 Jun 2018 21:09:09 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=102322

Comment #19 on bug 102322 from Andrey Grodzovsky
Can you use addr2line or gdb with 'list' command to give the line number
matching

(In reply to dwagner from comment #18)
> The good news: So far no crashes during normal uptime with
> amdgpu.vm_update_mode=3
>
> The bad news: System crashes immediately upon S3 resume (with messages quite
> different from the ones I saw with earlier S3-resume crashes) - I filed bug
> report https://bugs.freedesktop.org/show_bug.cgi?id=107065 on this.
>
> (In reply to Andrey Grodzovsky from comment #17)
> > dwagner, this is obviously just a work around and not a fix. It points to
> > some problem with SDMA packets, if you want to continue exploring we can try
> > to dump some fence traces and SDMA HW ring content to examine the latest
> > packets before the hang happened.
>
> If you can include some debug output into "amd-staging-drm-next" that helps
> finding the root cause, I might be able to provide some output - if the
> kernel survives long enough after the crash to write the system journal -
> this has not always been the case.

No need to recompile; I just need to see the content of the SDMA ring buffer
when the hang occurs.

Clone and build our register analyzer from here -
https://cgit.freedesktop.org/amd/umr/ and once the hang happens just run

sudo umr -lb
sudo umr -R gfx[.]
sudo umr -R sdma0[.]
sudo umr -R sdma1[.]

I will probably need more info later but let's try this first.
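
Since the reporter noted that the kernel does not always survive long enough to write the journal, it may help to capture the four umr commands above into files in one go. This is only a minimal sketch, assuming umr is already built and on PATH and that the script is run as root (for example under sudo). The `dump` helper and the `UMR` and `OUT_DIR` variables are hypothetical names chosen for illustration, not part of umr:

```shell
#!/bin/sh
# Hypothetical helper script: save the output of each umr command to its own
# file so the dumps are on disk even if the machine locks up soon afterwards.
# UMR and OUT_DIR are illustrative names, not umr options.
UMR="${UMR:-umr}"
OUT_DIR="${OUT_DIR:-$PWD/umr-dump-$(date +%Y%m%d-%H%M%S)}"

dump() {
    # $1 = base name of the output file, remaining args = arguments for umr
    name="$1"; shift
    # Record stdout and stderr; keep going if one command fails.
    $UMR "$@" > "$OUT_DIR/$name.txt" 2>&1 || echo "umr $* failed" >&2
    sync    # flush to disk before running the next command
}

mkdir -p "$OUT_DIR"
dump blocks -lb
dump gfx    -R 'gfx[.]'
dump sdma0  -R 'sdma0[.]'
dump sdma1  -R 'sdma1[.]'
echo "Saved dumps to $OUT_DIR"
```

The ring names are quoted so the shell passes `gfx[.]` and friends to umr literally instead of treating the brackets as a glob pattern. The resulting text files can then be attached to the bug report.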

