* RT is freezing
@ 2015-01-05 22:47 Gustavo Bittencourt
2015-01-06 1:26 ` Gustavo Bittencourt
0 siblings, 1 reply; 6+ messages in thread
From: Gustavo Bittencourt @ 2015-01-05 22:47 UTC (permalink / raw)
To: linux-rt-users
Hi everybody
I compiled 3.14.25-rt22, but my system freezes when I start Unity
and some programs such as Chrome or Thunderbird. The problem happens only
when PREEMPT_RT_FULL=y. No log is generated. I would like to find the
root cause of this problem, but I don't know how. Do you have any suggestions?
Best regards,
Gustavo Bittencourt
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: RT is freezing
From: Gustavo Bittencourt @ 2015-01-06 1:26 UTC
To: linux-rt-users
It seems that the problem is in the nouveau driver. When I boot in
failsafe graphics mode, the system works well. Here is my video
configuration:
$ lshw -c video
  *-display
       description: VGA compatible controller
       product: GF108M [GeForce GT 540M]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nouveau latency=0
       resources: irq:53 memory:f4000000-f4ffffff memory:d0000000-dfffffff memory:e0000000-e1ffffff ioport:d000(size=128) memory:f5000000-f507ffff
* Re: RT is freezing
From: Joakim Hernberg @ 2015-01-07 10:24 UTC
To: linux-rt-users
On Mon, 05 Jan 2015 23:26:42 -0200
Gustavo Bittencourt <gbitten@gmail.com> wrote:
> It seems that the problem is with the nouveau driver. When I boot in
> failsafe graphic mode, the system works well. Here is my video
> configuration:
I don't know if this is related, and I'm sorry for mentioning nvidia on
the mailing list, but if it applies to nouveau too, I hope it's
alright :)
I have had the same experience using the nvidia driver on a test system.
This patch was brought to my attention and I use it for Arch Linux's
realtime kernel. It appears to fix the X hangs on my nvidia test
machine (note that for me only X hangs):
NOTE: this patch is a rebase of John Blackwood's patch. His kernel must be using
an older simple wait patch, as his version applies to kernel/sched/core.c, while the simple wait
completion code lives in kernel/sched/completion.c. I have ported it to test with
nvidia, as I would like to see whether it fixes the semaphore issues I have seen.
I've kept the original patch comment intact:
I'm not 100% sure that the patch below will fix your problem, but we
saw something that sounds very similar to your issue involving the
nvidia driver and the preempt-rt patch. The nvidia driver uses the
completion API to implement its own internal semaphore.
Fix a race in the preempt-rt wait_for_completion() simple wait code.
A wait_for_completion() waiter task can be awoken by a task calling
complete(), but fail to consume the 'done' completion resource if it
loses a race with another task that calls wait_for_completion() just as
it is waking up.
In this case, the awoken task will call schedule_timeout() again
without being on the simple wait queue.
So if the awoken task is unable to claim the 'done' completion resource,
check whether it needs to be re-inserted into the wait list before
waiting again in schedule_timeout().
Fix-by: John Blackwood <john.blackwood@ccur.com>
--- linux-3.14/kernel/sched/completion.c	2014-05-22 14:01:03.879734869 -0400
+++ linux-3.14/kernel/sched/completion.c	2014-05-22 14:13:59.181688658 -0400
@@ -61,11 +61,19 @@
 do_wait_for_common(struct completion *x,
 		   long (*action)(long), long timeout, int state)
 {
+	int again = 0;
+
 	if (!x->done) {
 		DEFINE_SWAITER(wait);
 
 		swait_prepare_locked(&x->wait, &wait);
 		do {
+			/* Check to see if we lost race for 'done' and are
+			 * no longer in the wait list.
+			 */
+			if (unlikely(again) && list_empty(&wait.node))
+				swait_prepare_locked(&x->wait, &wait);
+
 			if (signal_pending_state(state, current)) {
 				timeout = -ERESTARTSYS;
 				break;
@@ -74,6 +82,7 @@
 			raw_spin_unlock_irq(&x->wait.lock);
 			timeout = action(timeout);
 			raw_spin_lock_irq(&x->wait.lock);
+			again = 1;
 		} while (!x->done && timeout);
 		swait_finish_locked(&x->wait, &wait);
 		if (!x->done)
--
Joakim
* Re: RT is freezing
From: Gustavo Bittencourt @ 2015-01-07 23:39 UTC
To: Joakim Hernberg, linux-rt-users
Unfortunately, the patch didn't help, but I was able to capture a stack
trace (see below). This trace repeats more than 1500 times within one second.
[ 139.532236] BUG: scheduling while atomic: Xorg/1273/0x00000002
[ 139.532252] Modules linked in: ctr ccm arc4 ath9k ath9k_common nouveau ath9k_hw bnep rfcomm ath snd_hda_codec_hdmi mac80211 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec uvcvideo videobuf2_vmalloc snd_pcm videobuf2_memops videobuf2_core mxm_wmi videodev wmi snd_hwdep snd_seq_midi i2c_algo_bit drm_kms_helper snd_seq_midi_event ttm snd_rawmidi snd_seq drm intel_rapl btusb x86_pkg_temp_thermal snd_timer cfg80211 bluetooth snd_seq_device intel_powerclamp coretemp joydev parport_pc serio_raw crc32_pclmul snd ppdev 6lowpan_iphc lp parport mac_hid mei_me mei soundcore sony_laptop video lpc_ich psmouse firewire_ohci firewire_core r8169 ahci sdhci_pci libahci mii sdhci crc_itu_t
[ 139.532253] CPU: 7 PID: 1273 Comm: Xorg Tainted: G W 3.14.25-rt22+ #17
[ 139.532254] Hardware name: Sony Corporation VPCF215FB/VAIO, BIOS R0200V3 02/10/2011
[ 139.532257] 00000000 00000000 e9d13c80 c1653d1b f77cbcc0 e9d13c98 c1650b4c c182aed0
[ 139.532259] c0529300 000004f9 00000002 e9d13d14 c165708c 0000001e e9d13cc4 c1650e91
[ 139.532262] e9d12000 7adb3ab0 00000020 c1a84cc0 c0528f20 c0528f20 e9d13cd8 c105abdb
[ 139.532262] Call Trace:
[ 139.532264] [<c1653d1b>] dump_stack+0x48/0x76
[ 139.532266] [<c1650b4c>] __schedule_bug+0x54/0x62
[ 139.532268] [<c165708c>] __schedule+0x5dc/0x680
[ 139.532270] [<c1650e91>] ? printk+0x50/0x52
[ 139.532273] [<c105abdb>] ? print_oops_end_marker+0x3b/0x40
[ 139.532275] [<c105ac6f>] ? warn_slowpath_common+0x8f/0xa0
[ 139.532278] [<c16585be>] ? rt_mutex_slowlock+0x15e/0x1e0
[ 139.532280] [<c16585be>] ? rt_mutex_slowlock+0x15e/0x1e0
[ 139.532282] [<c165715b>] schedule+0x2b/0x90
[ 139.532284] [<c16585df>] rt_mutex_slowlock+0x17f/0x1e0
[ 139.532287] [<c1151fbd>] ? pagefault_disable+0xd/0x20
[ 139.532290] [<c1658662>] __ww_mutex_lock_interruptible+0x22/0x30
[ 139.532307] [<f8a3d33b>] nouveau_gem_ioctl_pushbuf+0x68b/0x11b0 [nouveau]
[ 139.532309] [<c1087953>] ? migrate_enable+0x83/0x190
[ 139.532326] [<f8a3ccb0>] ? nouveau_gem_ioctl_new+0x1d0/0x1d0 [nouveau]
[ 139.532334] [<f865b73e>] drm_ioctl+0x43e/0x4d0 [drm]
[ 139.532351] [<f8a3ccb0>] ? nouveau_gem_ioctl_new+0x1d0/0x1d0 [nouveau]
[ 139.532354] [<c1087953>] ? migrate_enable+0x83/0x190
[ 139.532356] [<c1426101>] ? __pm_runtime_resume+0x41/0x50
[ 139.532373] [<f8a34ea1>] nouveau_drm_ioctl+0x41/0x70 [nouveau]
[ 139.532390] [<f8a34e60>] ? nouveau_pmops_thaw+0x60/0x60 [nouveau]
[ 139.532392] [<c1196c92>] do_vfs_ioctl+0x2e2/0x4e0
[ 139.532394] [<c10bcb48>] ? ktime_get_ts+0x48/0x140
[ 139.532397] [<c1196ef0>] SyS_ioctl+0x60/0x90
[ 139.532398] [<c16609c6>] sysenter_do_call+0x12/0x12
* Re: RT is freezing
From: Sebastian Andrzej Siewior @ 2015-02-17 17:16 UTC
To: Gustavo Bittencourt; +Cc: Joakim Hernberg, linux-rt-users
* Gustavo Bittencourt | 2015-01-07 21:39:24 [-0200]:
>Unfortunately, the patch didn't work. But now I was able to get the
>stack (see below). This stack repeats more than 1500 times during 1
>second.
I suggest you try RT without the graphics. I recall a backtrace
posted here from a crash, and I was told that there was no
proper locking around it. I think it was v3.12, but I am not sure how
much has changed since then.
Sebastian
* Re: RT is freezing
From: Gustavo Bittencourt @ 2015-02-18 0:40 UTC
To: Sebastian Andrzej Siewior; +Cc: Joakim Hernberg, linux-rt-users
On 02/17/2015 03:16 PM, Sebastian Andrzej Siewior wrote:
> I suggest you try RT without the graphics. I recall a backtrace
> posted here from a crash, and I was told that there was no
> proper locking around it. I think it was v3.12, but I am not sure how
> much has changed since then.
I've found the source of the problem
(https://www.marc.info/?l=linux-rt-users&m=142178416026907); it is
working smoothly now. Thanks.