* RT is freezing
From: Gustavo Bittencourt @ 2015-01-05 22:47 UTC
To: linux-rt-users

Hi everybody,

I compiled 3.14.25-rt22, but my system freezes when I start Unity
and some programs like Chrome or Thunderbird. The problem happens only
when PREEMPT_RT_FULL=y. No log is generated. I would like to find the
root of this problem, but I don't know how. Do you have any suggestions?

Best regards,
Gustavo Bittencourt

* Re: RT is freezing
From: Gustavo Bittencourt @ 2015-01-06 1:26 UTC
To: linux-rt-users

It seems that the problem is with the nouveau driver. When I boot in
failsafe graphic mode, the system works well. Here is my video
configuration:

$ lshw -c video
  *-display
       description: VGA compatible controller
       product: GF108M [GeForce GT 540M]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nouveau latency=0
       resources: irq:53 memory:f4000000-f4ffffff memory:d0000000-dfffffff memory:e0000000-e1ffffff ioport:d000(size=128) memory:f5000000-f507ffff

* Re: RT is freezing
From: Joakim Hernberg @ 2015-01-07 10:24 UTC
To: linux-rt-users

On Mon, 05 Jan 2015 23:26:42 -0200
Gustavo Bittencourt <gbitten@gmail.com> wrote:

> It seems that the problem is with the nouveau driver. When I boot in
> failsafe graphic mode, the system works well.

I don't know if this is related, and I'm sorry for mentioning nvidia on
the mailing list, but if it applies to nouveau too, I hope it's
alright :)

I have the same experience using the nvidia driver on a test system.
This patch was brought to my attention and I use it for Archlinux'
realtime kernel. It appears to fix the X hangs on my nvidia test
machine (note that for me it's just X that hangs):

-NOTE: this patch is a rebase of John Blackwood's patch. On his kernel, he must be using
-an older simple wait patch - as his applies to kernel/sched/core.c, while the simple wait
-completion code lives in kernel/sched/completion.c ... I have ported this to test with
-nvidia, as I would like to see if it fixes the semaphore issues I have seen.

-I've kept the original patch comment intact;

I'm not 100% sure that the patch below will fix your problem, but we
saw something that sounds pretty similar to your issue involving the
nvidia driver and the preempt-rt patch. The nvidia driver uses the
completion support to create their own driver's notion of an internally
used semaphore.

Fix a race in the PRT wait for completion simple wait code.

A wait_for_completion() waiter task can be awoken by a task calling
complete(), but fail to consume the 'done' completion resource if it
loses a race with another task calling wait_for_completion() just as
it is waking up.

In this case, the awoken task will call schedule_timeout() again
without being in the simple wait queue.

So if the awoken task is unable to claim the 'done' completion resource,
check to see if it needs to be re-inserted into the wait list before
waiting again in schedule_timeout().

Fix-by: John Blackwood <john.blackwood@ccur.com>

--- linux-3.14/kernel/sched/completion.c	2014-05-22 14:01:03.879734869 -0400
+++ linux-3.14/kernel/sched/completion.c	2014-05-22 14:13:59.181688658 -0400
@@ -61,11 +61,19 @@
 do_wait_for_common(struct completion *x,
 		   long (*action)(long), long timeout, int state)
 {
+	int again = 0;
+
 	if (!x->done) {
 		DEFINE_SWAITER(wait);
 
 		swait_prepare_locked(&x->wait, &wait);
 		do {
+			/* Check to see if we lost race for 'done' and are
+			 * no longer in the wait list.
+			 */
+			if (unlikely(again) && list_empty(&wait.node))
+				swait_prepare_locked(&x->wait, &wait);
+
 			if (signal_pending_state(state, current)) {
 				timeout = -ERESTARTSYS;
 				break;
@@ -74,6 +82,7 @@
 			raw_spin_unlock_irq(&x->wait.lock);
 			timeout = action(timeout);
 			raw_spin_lock_irq(&x->wait.lock);
+			again = 1;
 		} while (!x->done && timeout);
 		swait_finish_locked(&x->wait, &wait);
 		if (!x->done)

-- 
Joakim

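The race described in the patch comment above can be replayed in a small userspace model. What follows is only a sketch and not kernel code: it is single-threaded, every toy_* name is hypothetical, and it assumes (as the patch comment implies) that waking a waiter also removes it from the wait list. It shows why a woken waiter that loses 'done' to a late arrival has to put itself back on the list before sleeping again.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy, single-threaded model of the race described in the patch comment.
 * This is NOT kernel code: every toy_* name is hypothetical and only
 * mirrors the roles of x->done, the simple wait list and list_empty().
 */
#define TOY_MAX_WAITERS 4

struct toy_waiter {
	bool queued;	/* true while the waiter sits in the wait list */
};

struct toy_completion {
	unsigned int done;				/* pending completions */
	struct toy_waiter *list[TOY_MAX_WAITERS];	/* FIFO wait list */
	size_t nr;					/* entries in list[] */
};

/* Models swait_prepare_locked(): enqueue the waiter if it is not queued. */
static void toy_prepare(struct toy_completion *x, struct toy_waiter *w)
{
	if (!w->queued && x->nr < TOY_MAX_WAITERS) {
		x->list[x->nr++] = w;
		w->queued = true;
	}
}

/*
 * Models complete(): post one 'done' and wake the first waiter. Waking
 * removes the waiter from the list (assumption taken from the patch
 * comment). Returns the woken waiter, or NULL if nobody was queued.
 */
static struct toy_waiter *toy_complete(struct toy_completion *x)
{
	struct toy_waiter *w = NULL;

	x->done++;
	if (x->nr > 0) {
		w = x->list[0];
		for (size_t i = 1; i < x->nr; i++)
			x->list[i - 1] = x->list[i];
		x->nr--;
		w->queued = false;
	}
	return w;
}

/* Consume one 'done' if available; this is what wins or loses the race. */
static bool toy_try_consume(struct toy_completion *x)
{
	if (x->done == 0)
		return false;
	x->done--;
	return true;
}

/*
 * Replays the race: waiter A is woken, a late task B takes 'done' first,
 * then A goes around its loop again. Returns true if a second
 * toy_complete() can still find A on the wait list.
 */
bool toy_second_wakeup_reaches_a(bool requeue_fix)
{
	struct toy_completion x = { 0 };
	struct toy_waiter a = { false };

	toy_prepare(&x, &a);		/* A starts waiting */
	toy_complete(&x);		/* A is woken and dequeued */
	toy_try_consume(&x);		/* B consumes 'done' before A runs */

	if (!toy_try_consume(&x)) {	/* A lost the race for 'done' */
		if (requeue_fix && !a.queued)
			toy_prepare(&x, &a);	/* the step the patch adds */
	}
	return toy_complete(&x) == &a;
}
```

Calling toy_second_wakeup_reaches_a(false) models the unpatched loop, where A goes back to sleep while off the list and the next complete() has nobody to wake; passing true models the patched loop with the re-insert check.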
* Re: RT is freezing
From: Gustavo Bittencourt @ 2015-01-07 23:39 UTC
To: Joakim Hernberg, linux-rt-users

Unfortunately, the patch didn't work. But now I was able to get the
stack (see below). This stack repeats more than 1500 times during 1
second.

[ 139.532236] BUG: scheduling while atomic: Xorg/1273/0x00000002
[ 139.532252] Modules linked in: ctr ccm arc4 ath9k ath9k_common nouveau ath9k_hw bnep rfcomm ath snd_hda_codec_hdmi mac80211 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec uvcvideo videobuf2_vmalloc snd_pcm videobuf2_memops videobuf2_core mxm_wmi videodev wmi snd_hwdep snd_seq_midi i2c_algo_bit drm_kms_helper snd_seq_midi_event ttm snd_rawmidi snd_seq drm intel_rapl btusb x86_pkg_temp_thermal snd_timer cfg80211 bluetooth snd_seq_device intel_powerclamp coretemp joydev parport_pc serio_raw crc32_pclmul snd ppdev 6lowpan_iphc lp parport mac_hid mei_me mei soundcore sony_laptop video lpc_ich psmouse firewire_ohci firewire_core r8169 ahci sdhci_pci libahci mii sdhci crc_itu_t
[ 139.532253] CPU: 7 PID: 1273 Comm: Xorg Tainted: G W 3.14.25-rt22+ #17
[ 139.532254] Hardware name: Sony Corporation VPCF215FB/VAIO, BIOS R0200V3 02/10/2011
[ 139.532257] 00000000 00000000 e9d13c80 c1653d1b f77cbcc0 e9d13c98 c1650b4c c182aed0
[ 139.532259] c0529300 000004f9 00000002 e9d13d14 c165708c 0000001e e9d13cc4 c1650e91
[ 139.532262] e9d12000 7adb3ab0 00000020 c1a84cc0 c0528f20 c0528f20 e9d13cd8 c105abdb
[ 139.532262] Call Trace:
[ 139.532264] [<c1653d1b>] dump_stack+0x48/0x76
[ 139.532266] [<c1650b4c>] __schedule_bug+0x54/0x62
[ 139.532268] [<c165708c>] __schedule+0x5dc/0x680
[ 139.532270] [<c1650e91>] ? printk+0x50/0x52
[ 139.532273] [<c105abdb>] ? print_oops_end_marker+0x3b/0x40
[ 139.532275] [<c105ac6f>] ? warn_slowpath_common+0x8f/0xa0
[ 139.532278] [<c16585be>] ? rt_mutex_slowlock+0x15e/0x1e0
[ 139.532280] [<c16585be>] ? rt_mutex_slowlock+0x15e/0x1e0
[ 139.532282] [<c165715b>] schedule+0x2b/0x90
[ 139.532284] [<c16585df>] rt_mutex_slowlock+0x17f/0x1e0
[ 139.532287] [<c1151fbd>] ? pagefault_disable+0xd/0x20
[ 139.532290] [<c1658662>] __ww_mutex_lock_interruptible+0x22/0x30
[ 139.532307] [<f8a3d33b>] nouveau_gem_ioctl_pushbuf+0x68b/0x11b0 [nouveau]
[ 139.532309] [<c1087953>] ? migrate_enable+0x83/0x190
[ 139.532326] [<f8a3ccb0>] ? nouveau_gem_ioctl_new+0x1d0/0x1d0 [nouveau]
[ 139.532334] [<f865b73e>] drm_ioctl+0x43e/0x4d0 [drm]
[ 139.532351] [<f8a3ccb0>] ? nouveau_gem_ioctl_new+0x1d0/0x1d0 [nouveau]
[ 139.532354] [<c1087953>] ? migrate_enable+0x83/0x190
[ 139.532356] [<c1426101>] ? __pm_runtime_resume+0x41/0x50
[ 139.532373] [<f8a34ea1>] nouveau_drm_ioctl+0x41/0x70 [nouveau]
[ 139.532390] [<f8a34e60>] ? nouveau_pmops_thaw+0x60/0x60 [nouveau]
[ 139.532392] [<c1196c92>] do_vfs_ioctl+0x2e2/0x4e0
[ 139.532394] [<c10bcb48>] ? ktime_get_ts+0x48/0x140
[ 139.532397] [<c1196ef0>] SyS_ioctl+0x60/0x90
[ 139.532398] [<c16609c6>] sysenter_do_call+0x12/0x12

* Re: RT is freezing
From: Sebastian Andrzej Siewior @ 2015-02-17 17:16 UTC
To: Gustavo Bittencourt
Cc: Joakim Hernberg, linux-rt-users

* Gustavo Bittencourt | 2015-01-07 21:39:24 [-0200]:

>Unfortunately, the patch didn't work. But now I was able to get the
>stack (see below). This stack repeats more than 1500 times during 1
>second.

I suggest you play with RT without the graphics. I recall a backtrace
posted here from a crash and I've been told that there was no
proper locking around it. I think it was v3.12 but I am not sure how
much has changed since then.

Sebastian

* Re: RT is freezing
From: Gustavo Bittencourt @ 2015-02-18 0:40 UTC
To: Sebastian Andrzej Siewior
Cc: Joakim Hernberg, linux-rt-users

On 02/17/2015 03:16 PM, Sebastian Andrzej Siewior wrote:
> * Gustavo Bittencourt | 2015-01-07 21:39:24 [-0200]:
>
>> Unfortunately, the patch didn't work. But now I was able to get the
>> stack (see below). This stack repeats more than 1500 times during 1
>> second.
> I suggest you play with RT without the graphics. I recall a backtrace
> posted here from a crash and I've been told that there was no
> proper locking around it. I think it was v3.12 but I am not sure how
> much has changed since then.
>
> Sebastian

I've found the source of the problem
(https://www.marc.info/?l=linux-rt-users&m=142178416026907); it is
working smoothly now.

Thanks.

Thread overview: 6+ messages
2015-01-05 22:47 RT is freezing Gustavo Bittencourt
2015-01-06  1:26 ` Gustavo Bittencourt
2015-01-07 10:24   ` Joakim Hernberg
2015-01-07 23:39     ` Gustavo Bittencourt
2015-02-17 17:16       ` Sebastian Andrzej Siewior
2015-02-18  0:40         ` Gustavo Bittencourt