From: Carsten Emde <C.Emde@osadl.org>
To: "Michel Dänzer" <michel@daenzer.net>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Linux RT Users <linux-rt-users@vger.kernel.org>,
DRI Development <dri-devel@lists.freedesktop.org>
Subject: Re: [OSADL QA 3.18.9-rt4 #1] Radeon driver hangs
Date: Sun, 22 Mar 2015 23:14:32 +0100
Message-ID: <550F3EC8.1080109@osadl.org>
In-Reply-To: <550791E4.5070301@daenzer.net>
Hi Michel,
>>>> [..]
>>>> The most striking problem of kernel 3.18.9-rt4 affects all systems that
>>>> are equipped with Radeon graphics (irrespective whether PCIe cards or
>>>> APUs with on-chip graphics). They suffer from a hanging radeon driver.
>>>> The block occurs when accelerated graphics load is created by x11perf or
>>>> gltestperf. Sometimes only the graphics are frozen while ssh login still
>>>> is possible, sometimes the entire box is no longer accessible at all. In
>>>> any case, a reboot is needed to recover from this situation.
>>>>
>>>> Here is a selection of kernel messages:
>>> [...]
>>> The commits from
>>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=f957063fee6392bb9365370db6db74dc0b2dce0a
>>>
>>> to
>>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=cffefd9bb31cd35ab745d3b49005d10616d25bdc
>>>
>>> and
>>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=b6610101718d4ab90d793c482625e98eb1262cad
>>>
>>> might help for this.
>>
>> Thanks a lot. I have applied these patches to a number of systems:
>> # quilt applied | tail -7
>> patches/drm-radeon-do-a-posting-read-in-r100_set_irq.patch
>> patches/drm-radeon-do-a-posting-read-in-rs600_set_irq.patch
>> patches/drm-radeon-do-a-posting-read-in-r600_set_irq.patch
>> patches/drm-radeon-do-a-posting-read-in-evergreen_set_irq.patch
>> patches/drm-radeon-do-a-posting-read-in-si_set_irq.patch
>> patches/drm-radeon-do-a-posting-read-in-cik_set_irq.patch
>> patches/drm-radeon-fix-wait-to-actually-occur-after-the-signaling-callback.patch
>>
>>
>> The graphic boards still crash and freeze the screen, but in contrast
>> to the earlier situation the systems remain accessible, and the X
>> Window server can be restarted after the offensive programs are
>> removed. The crashes were reliably triggered by
>> - gltestperf
>> or
>> - x11perf -repeat 3 -subs 25 -time 2 -rect10
This is not entirely correct: gltestperf does not reliably crash the
graphics controller. However, "x11perf -repeat 3 -subs 25 -time 2
-rect10" triggers the crash every time.
>> but the crashes also occur several times per day during normal work
>> such as browsing the Internet or writing a text document. If you wish
>> me to provide additional diagnostic information such as running test
>> programs while the graphic boards are unresponsive, I certainly can do
>> that.
>
> Does it also happen with a kernel built from a current drm-fixes tree?
> http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes
No. Apparently, you need full preemption to expose the problem.
The following list shows, for each kernel, whether the command "x11perf
-repeat 3 -subs 25 -time 2 -rect10" freezes the Radeon board under test
(Radeon HD 7970 XFS / R9 280X):
Kernel                            Freeze
linux-3.12.33-rt47                no
linux-3.14.34-rt32                no
linux-3.14.34-drm-3.16.7-rt32*    no
linux-3.18.7-rt1                  YES
linux-3.18.9-rt4                  YES
linux-3.18.9-rt5                  YES
linux-3.18.9-drm-3.16.7-rt5**     no
linux-4.0.0-rc4                   no
linux-drm-fixes                   no
*DRM subsystem backported from linux-3.16.7 to linux-3.14.34-rt32.
**DRM subsystem ported from linux-3.16.7 to linux-3.18.9-rt5.
More observations:
If full function tracing is enabled (which makes the system about five
times slower), the graphics controller no longer freezes. With partial
function tracing such as "echo *drm* >set_ftrace_filter", the
controller still freezes. The trace then contains only vblank interrupt
processing; ioctls are no longer executed.
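The partial-tracing setup described above can be sketched as a small
shell function. This is only a sketch under assumptions: the function
name trace_drm_only and its directory argument are illustrative, the
default path assumes debugfs is mounted at /sys/kernel/debug, and root
privileges are needed on a real system. The filter pattern is quoted so
that the shell does not expand it against files in the current directory.

```shell
#!/bin/sh
# Sketch: restrict the ftrace function tracer to DRM-related functions.
# trace_drm_only and its directory argument are illustrative names,
# not part of ftrace itself.
trace_drm_only() {
    dir="${1:-/sys/kernel/debug/tracing}"    # assumed debugfs mount point
    echo 0 > "$dir/tracing_on"               # pause tracing while reconfiguring
    echo function > "$dir/current_tracer"    # select the function tracer
    echo '*drm*' > "$dir/set_ftrace_filter"  # trace only functions matching *drm*
    echo 1 > "$dir/tracing_on"               # resume tracing
}
```

After calling trace_drm_only, the x11perf command can be started and the
result inspected in the "trace" file of the same directory.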
This is the location where the driver hangs:
[25104.509258] INFO: task Xorg.bin:16591 blocked for more than 120 seconds.
[25104.516322] Not tainted 3.18.9-rt5 #2
[25104.520715] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25104.528853] Xorg.bin D ffffffff8171ed90 0 16591 16239 0x10400080
[25104.536102] ffff8800ba0bb8d8 0000000000000002 ffff8800ba0bbfd8 0000000000000006
[25104.536103] 000000000000dc08 ffff880626d0dc08 ffff8800ba0bbfd8 000000000000dc08
[25104.536104] ffff88061b2cdcd0 ffff880616d3a940 ffff880035c10000 ffff880616d3a940
[25104.559274] Call Trace:
[25104.561844] [<ffffffff8171bb54>] schedule+0x34/0xa0
[25104.561846] [<ffffffff8171e2ac>] schedule_timeout+0x23c/0x2a0
[25104.561870] [<ffffffffa00e3ab6>] ? radeon_fence_process+0x16/0x40 [radeon]
[25104.561879] [<ffffffffa00e3b24>] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
[25104.561887] [<ffffffffa00e3e97>] radeon_fence_wait_seq_timeout.constprop.8+0x327/0x380 [radeon]
[25104.561889] [<ffffffff810d19c0>] ? __wake_up_sync+0x20/0x20
[25104.561898] [<ffffffffa00e4287>] radeon_fence_wait_any+0x57/0x70 [radeon]
[25104.561914] [<ffffffffa015a36f>] radeon_sa_bo_new+0x2af/0x4b0 [radeon]
[25104.561916] [<ffffffff81379b07>] ? debug_smp_processor_id+0x17/0x20
[25104.561918] [<ffffffff811d0b4a>] ? __kmalloc+0x8a/0x300
[25104.561932] [<ffffffffa01b2197>] radeon_ib_get+0x37/0xe0 [radeon]
[25104.561943] [<ffffffffa01003ee>] radeon_cs_ioctl+0x22e/0x860 [radeon]
[25104.561952] [<ffffffffa0005bc7>] drm_ioctl+0x197/0x670 [drm]
[25104.561954] [<ffffffff81379b07>] ? debug_smp_processor_id+0x17/0x20
[25104.561956] [<ffffffff810901ba>] ? unpin_current_cpu+0x1a/0x80
[25104.561959] [<ffffffff810ba200>] ? migrate_enable+0x90/0x1a0
[25104.561966] [<ffffffffa00c604c>] radeon_drm_ioctl+0x4c/0x80 [radeon]
[25104.561967] [<ffffffff811fdb88>] do_vfs_ioctl+0x2c8/0x4c0
[25104.561969] [<ffffffff81208a92>] ? __fget+0x72/0xb0
[25104.561970] [<ffffffff811fde01>] SyS_ioctl+0x81/0xa0
[25104.561971] [<ffffffff8171f99e>] tracesys_phase2+0xd4/0xd9
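To check a kernel log for this kind of hang without reading it by hand,
the hung-task reports can be extracted with a short filter. A minimal
sketch, assuming the log text arrives on stdin (for example from dmesg);
the function name hung_tasks is illustrative.

```shell
#!/bin/sh
# Sketch: print NAME:PID for every task reported by the hung-task detector.
# Reads kernel log text on stdin. hung_tasks is an illustrative name.
hung_tasks() {
    # Match "INFO: task NAME:PID blocked for more than N seconds."
    # and print only the NAME:PID part.
    sed -n 's/.*INFO: task \(.*\) blocked for more than [0-9][0-9]* seconds\..*/\1/p'
}
```

Used as "dmesg | hung_tasks", it prints one line per report. Reports
only appear while /proc/sys/kernel/hung_task_timeout_secs is non-zero,
as the kernel message itself notes.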
Conclusion:
A change in the DRM subsystem between 3.16.7 and 3.18.9
introduced a race condition that freezes Radeon graphics. It requires
full preemption to be exposed reliably.
Thanks,
-Carsten.