virtualization.lists.linux-foundation.org archive mirror
* Re: Xorg indefinitely hangs in kernelspace
@ 2019-09-06  5:53 Hillf Danton
  2019-09-06 20:27 ` [Spice-devel] " Frediano Ziglio
  2019-09-09  5:52 ` Gerd Hoffmann
  0 siblings, 2 replies; 8+ messages in thread
From: Hillf Danton @ 2019-09-06  5:53 UTC (permalink / raw)
  To: Jaak Ristioja
  Cc: David Airlie, linux-kernel, dri-devel, virtualization,
	Daniel Vetter, spice-devel, Dave Airlie


On Tue, 6 Aug 2019 21:00:10 +0300, Jaak Ristioja <jaak@ristioja.ee> wrote:
> Hello!
> 
> I'm writing to report a crash in the QXL / DRM code in the Linux kernel.
> I originally filed the issue on LaunchPad and more details can be found
> there, although I doubt whether these details are useful.
> 
>   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1813620
> 
> I first experienced these issues with:
> 
> * Ubuntu 18.04 (probably kernel 4.15.something)
> * Ubuntu 18.10 (kernel 4.18.0-13)
> * Ubuntu 19.04 (kernel 5.0.0-13-generic)
> * Ubuntu 19.04 (mainline kernel 5.1-rc7)
> * Ubuntu 19.04 (mainline kernel 5.2.0-050200rc1-generic)
> 
> Here is the crash output from dmesg:
> 
> [354073.713350] INFO: task Xorg:920 blocked for more than 120 seconds.
> [354073.717755]       Not tainted 5.2.0-050200rc1-generic #201905191930
> [354073.722277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [354073.738332] Xorg            D    0   920    854 0x00404004
> [354073.738334] Call Trace:
> [354073.738340]  __schedule+0x2ba/0x650
> [354073.738342]  schedule+0x2d/0x90
> [354073.738343]  schedule_preempt_disabled+0xe/0x10
> [354073.738345]  __ww_mutex_lock.isra.11+0x3e0/0x750
> [354073.738346]  __ww_mutex_lock_slowpath+0x16/0x20
> [354073.738347]  ww_mutex_lock+0x34/0x50
> [354073.738352]  ttm_eu_reserve_buffers+0x1f9/0x2e0 [ttm]
> [354073.738356]  qxl_release_reserve_list+0x67/0x150 [qxl]
> [354073.738358]  ? qxl_bo_pin+0xaa/0x190 [qxl]
> [354073.738359]  qxl_cursor_atomic_update+0x1b0/0x2e0 [qxl]
> [354073.738367]  drm_atomic_helper_commit_planes+0xb9/0x220 [drm_kms_helper]
> [354073.738371]  drm_atomic_helper_commit_tail+0x2b/0x70 [drm_kms_helper]
> [354073.738374]  commit_tail+0x67/0x70 [drm_kms_helper]
> [354073.738378]  drm_atomic_helper_commit+0x113/0x120 [drm_kms_helper]
> [354073.738390]  drm_atomic_commit+0x4a/0x50 [drm]
> [354073.738394]  drm_atomic_helper_update_plane+0xe9/0x100 [drm_kms_helper]
> [354073.738402]  __setplane_atomic+0xd3/0x120 [drm]
> [354073.738410]  drm_mode_cursor_universal+0x142/0x270 [drm]
> [354073.738418]  drm_mode_cursor_common+0xcb/0x220 [drm]
> [354073.738425]  ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
> [354073.738432]  drm_mode_cursor2_ioctl+0xe/0x10 [drm]
> [354073.738438]  drm_ioctl_kernel+0xb0/0x100 [drm]
> [354073.738440]  ? ___sys_recvmsg+0x16c/0x200
> [354073.738445]  drm_ioctl+0x233/0x410 [drm]
> [354073.738452]  ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
> [354073.738454]  ? timerqueue_add+0x57/0x90
> [354073.738456]  ? enqueue_hrtimer+0x3c/0x90
> [354073.738458]  do_vfs_ioctl+0xa9/0x640
> [354073.738459]  ? fput+0x13/0x20
> [354073.738461]  ? __sys_recvmsg+0x88/0xa0
> [354073.738462]  ksys_ioctl+0x67/0x90
> [354073.738463]  __x64_sys_ioctl+0x1a/0x20
> [354073.738465]  do_syscall_64+0x5a/0x140
> [354073.738467]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [354073.738468] RIP: 0033:0x7ffad14d3417
> [354073.738472] Code: Bad RIP value.
> [354073.738472] RSP: 002b:00007ffdd5679978 EFLAGS: 00003246 ORIG_RAX:
> 0000000000000010
> [354073.738473] RAX: ffffffffffffffda RBX: 000056428a474610 RCX:
> 00007ffad14d3417
> [354073.738474] RDX: 00007ffdd56799b0 RSI: 00000000c02464bb RDI:
> 000000000000000e
> [354073.738474] RBP: 00007ffdd56799b0 R08: 0000000000000040 R09:
> 0000000000000010
> [354073.738475] R10: 000000000000003f R11: 0000000000003246 R12:
> 00000000c02464bb
> [354073.738475] R13: 000000000000000e R14: 0000000000000000 R15:
> 000056428a4721d0
> [354073.738511] INFO: task kworker/1:0:27625 blocked for more than 120 seconds.
> [354073.745154]       Not tainted 5.2.0-050200rc1-generic #201905191930
> [354073.751900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [354073.762197] kworker/1:0     D    0 27625      2 0x80004000
> [354073.762205] Workqueue: events qxl_client_monitors_config_work_func [qxl]
> [354073.762206] Call Trace:
> [354073.762211]  __schedule+0x2ba/0x650
> [354073.762214]  schedule+0x2d/0x90
> [354073.762215]  schedule_preempt_disabled+0xe/0x10
> [354073.762216]  __ww_mutex_lock.isra.11+0x3e0/0x750
> [354073.762217]  ? __switch_to_asm+0x34/0x70
> [354073.762218]  ? __switch_to_asm+0x40/0x70
> [354073.762219]  ? __switch_to_asm+0x40/0x70
> [354073.762220]  __ww_mutex_lock_slowpath+0x16/0x20
> [354073.762221]  ww_mutex_lock+0x34/0x50
> [354073.762235]  drm_modeset_lock+0x35/0xb0 [drm]
> [354073.762243]  drm_modeset_lock_all_ctx+0x5d/0xe0 [drm]
> [354073.762251]  drm_modeset_lock_all+0x5e/0xb0 [drm]
> [354073.762252]  qxl_display_read_client_monitors_config+0x1e1/0x370 [qxl]
> [354073.762254]  qxl_client_monitors_config_work_func+0x15/0x20 [qxl]
> [354073.762256]  process_one_work+0x20f/0x410
> [354073.762257]  worker_thread+0x34/0x400
> [354073.762259]  kthread+0x120/0x140
> [354073.762260]  ? process_one_work+0x410/0x410
> [354073.762261]  ? __kthread_parkme+0x70/0x70
> [354073.762262]  ret_from_fork+0x35/0x40
> 

--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -97,8 +97,9 @@ int ttm_eu_reserve_buffers(struct ww_acq
 			   struct list_head *dups, bool del_lru)
 {
 	struct ttm_bo_global *glob;
-	struct ttm_validate_buffer *entry;
+	struct ttm_validate_buffer *entry, *last_entry;
 	int ret;
+	bool locked = false;
 
 	if (list_empty(list))
 		return 0;
@@ -112,7 +113,10 @@ int ttm_eu_reserve_buffers(struct ww_acq
 	list_for_each_entry(entry, list, head) {
 		struct ttm_buffer_object *bo = entry->bo;
 
+		last_entry = entry;
 		ret = __ttm_bo_reserve(bo, intr, (ticket == NULL), ticket);
+		if (!ret)
+			locked = true;
 		if (!ret && unlikely(atomic_read(&bo->cpu_writers) > 0)) {
 			reservation_object_unlock(bo->resv);
 
@@ -151,6 +155,10 @@ int ttm_eu_reserve_buffers(struct ww_acq
 				ret = 0;
 			}
 		}
+		if (!ret)
+			locked = true;
+		else
+			locked = false;
 
 		if (!ret && entry->num_shared)
 			ret = reservation_object_reserve_shared(bo->resv,
@@ -163,6 +171,8 @@ int ttm_eu_reserve_buffers(struct ww_acq
 				ww_acquire_done(ticket);
 				ww_acquire_fini(ticket);
 			}
+			if (locked)
+				ttm_eu_backoff_reservation_reverse(list, entry);
 			return ret;
 		}
 
@@ -172,6 +182,8 @@ int ttm_eu_reserve_buffers(struct ww_acq
 		list_del(&entry->head);
 		list_add(&entry->head, list);
 	}
+	if (locked)
+		ttm_eu_backoff_reservation_reverse(list, last_entry);
 
 	if (del_lru) {
 		spin_lock(&glob->lru_lock);
--

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Spice-devel] Xorg indefinitely hangs in kernelspace
  2019-09-06  5:53 Xorg indefinitely hangs in kernelspace Hillf Danton
@ 2019-09-06 20:27 ` Frediano Ziglio
  2019-09-07  2:00   ` Hillf Danton
  2019-09-09  5:52 ` Gerd Hoffmann
  1 sibling, 1 reply; 8+ messages in thread
From: Frediano Ziglio @ 2019-09-06 20:27 UTC (permalink / raw)
  To: Hillf Danton
  Cc: David Airlie, linux-kernel, dri-devel, virtualization,
	Daniel Vetter, Dave Airlie, Jaak Ristioja, spice-devel

> 
> On Tue, 6 Aug 2019 21:00:10 +0300, Jaak Ristioja <jaak@ristioja.ee> wrote:
> > Hello!
> > 
> > I'm writing to report a crash in the QXL / DRM code in the Linux kernel.
> > I originally filed the issue on LaunchPad and more details can be found
> > there, although I doubt whether these details are useful.
> > 
> >   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1813620
> > 
> > I first experienced these issues with:
> > 
> > * Ubuntu 18.04 (probably kernel 4.15.something)
> > * Ubuntu 18.10 (kernel 4.18.0-13)
> > * Ubuntu 19.04 (kernel 5.0.0-13-generic)
> > * Ubuntu 19.04 (mainline kernel 5.1-rc7)
> > * Ubuntu 19.04 (mainline kernel 5.2.0-050200rc1-generic)
> > 
> > Here is the crash output from dmesg:
> > 
> > [354073.713350] INFO: task Xorg:920 blocked for more than 120 seconds.
> > [354073.717755]       Not tainted 5.2.0-050200rc1-generic #201905191930
> > [354073.722277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [354073.738332] Xorg            D    0   920    854 0x00404004
> > [354073.738334] Call Trace:
> > [354073.738340]  __schedule+0x2ba/0x650
> > [354073.738342]  schedule+0x2d/0x90
> > [354073.738343]  schedule_preempt_disabled+0xe/0x10
> > [354073.738345]  __ww_mutex_lock.isra.11+0x3e0/0x750
> > [354073.738346]  __ww_mutex_lock_slowpath+0x16/0x20
> > [354073.738347]  ww_mutex_lock+0x34/0x50
> > [354073.738352]  ttm_eu_reserve_buffers+0x1f9/0x2e0 [ttm]
> > [354073.738356]  qxl_release_reserve_list+0x67/0x150 [qxl]
> > [354073.738358]  ? qxl_bo_pin+0xaa/0x190 [qxl]
> > [354073.738359]  qxl_cursor_atomic_update+0x1b0/0x2e0 [qxl]
> > [354073.738367]  drm_atomic_helper_commit_planes+0xb9/0x220
> > [drm_kms_helper]
> > [354073.738371]  drm_atomic_helper_commit_tail+0x2b/0x70 [drm_kms_helper]
> > [354073.738374]  commit_tail+0x67/0x70 [drm_kms_helper]
> > [354073.738378]  drm_atomic_helper_commit+0x113/0x120 [drm_kms_helper]
> > [354073.738390]  drm_atomic_commit+0x4a/0x50 [drm]
> > [354073.738394]  drm_atomic_helper_update_plane+0xe9/0x100 [drm_kms_helper]
> > [354073.738402]  __setplane_atomic+0xd3/0x120 [drm]
> > [354073.738410]  drm_mode_cursor_universal+0x142/0x270 [drm]
> > [354073.738418]  drm_mode_cursor_common+0xcb/0x220 [drm]
> > [354073.738425]  ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
> > [354073.738432]  drm_mode_cursor2_ioctl+0xe/0x10 [drm]
> > [354073.738438]  drm_ioctl_kernel+0xb0/0x100 [drm]
> > [354073.738440]  ? ___sys_recvmsg+0x16c/0x200
> > [354073.738445]  drm_ioctl+0x233/0x410 [drm]
> > [354073.738452]  ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
> > [354073.738454]  ? timerqueue_add+0x57/0x90
> > [354073.738456]  ? enqueue_hrtimer+0x3c/0x90
> > [354073.738458]  do_vfs_ioctl+0xa9/0x640
> > [354073.738459]  ? fput+0x13/0x20
> > [354073.738461]  ? __sys_recvmsg+0x88/0xa0
> > [354073.738462]  ksys_ioctl+0x67/0x90
> > [354073.738463]  __x64_sys_ioctl+0x1a/0x20
> > [354073.738465]  do_syscall_64+0x5a/0x140
> > [354073.738467]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [354073.738468] RIP: 0033:0x7ffad14d3417
> > [354073.738472] Code: Bad RIP value.
> > [354073.738472] RSP: 002b:00007ffdd5679978 EFLAGS: 00003246 ORIG_RAX:
> > 0000000000000010
> > [354073.738473] RAX: ffffffffffffffda RBX: 000056428a474610 RCX:
> > 00007ffad14d3417
> > [354073.738474] RDX: 00007ffdd56799b0 RSI: 00000000c02464bb RDI:
> > 000000000000000e
> > [354073.738474] RBP: 00007ffdd56799b0 R08: 0000000000000040 R09:
> > 0000000000000010
> > [354073.738475] R10: 000000000000003f R11: 0000000000003246 R12:
> > 00000000c02464bb
> > [354073.738475] R13: 000000000000000e R14: 0000000000000000 R15:
> > 000056428a4721d0
> > [354073.738511] INFO: task kworker/1:0:27625 blocked for more than 120
> > seconds.
> > [354073.745154]       Not tainted 5.2.0-050200rc1-generic #201905191930
> > [354073.751900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [354073.762197] kworker/1:0     D    0 27625      2 0x80004000
> > [354073.762205] Workqueue: events qxl_client_monitors_config_work_func
> > [qxl]
> > [354073.762206] Call Trace:
> > [354073.762211]  __schedule+0x2ba/0x650
> > [354073.762214]  schedule+0x2d/0x90
> > [354073.762215]  schedule_preempt_disabled+0xe/0x10
> > [354073.762216]  __ww_mutex_lock.isra.11+0x3e0/0x750
> > [354073.762217]  ? __switch_to_asm+0x34/0x70
> > [354073.762218]  ? __switch_to_asm+0x40/0x70
> > [354073.762219]  ? __switch_to_asm+0x40/0x70
> > [354073.762220]  __ww_mutex_lock_slowpath+0x16/0x20
> > [354073.762221]  ww_mutex_lock+0x34/0x50
> > [354073.762235]  drm_modeset_lock+0x35/0xb0 [drm]
> > [354073.762243]  drm_modeset_lock_all_ctx+0x5d/0xe0 [drm]
> > [354073.762251]  drm_modeset_lock_all+0x5e/0xb0 [drm]
> > [354073.762252]  qxl_display_read_client_monitors_config+0x1e1/0x370 [qxl]
> > [354073.762254]  qxl_client_monitors_config_work_func+0x15/0x20 [qxl]
> > [354073.762256]  process_one_work+0x20f/0x410
> > [354073.762257]  worker_thread+0x34/0x400
> > [354073.762259]  kthread+0x120/0x140
> > [354073.762260]  ? process_one_work+0x410/0x410
> > [354073.762261]  ? __kthread_parkme+0x70/0x70
> > [354073.762262]  ret_from_fork+0x35/0x40
> > 
> 
> --- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> @@ -97,8 +97,9 @@ int ttm_eu_reserve_buffers(struct ww_acq
>  			   struct list_head *dups, bool del_lru)
>  {
>  	struct ttm_bo_global *glob;
> -	struct ttm_validate_buffer *entry;
> +	struct ttm_validate_buffer *entry, *last_entry;
>  	int ret;
> +	bool locked = false;
>  
>  	if (list_empty(list))
>  		return 0;
> @@ -112,7 +113,10 @@ int ttm_eu_reserve_buffers(struct ww_acq
>  	list_for_each_entry(entry, list, head) {
>  		struct ttm_buffer_object *bo = entry->bo;
>  
> +		last_entry = entry;
>  		ret = __ttm_bo_reserve(bo, intr, (ticket == NULL), ticket);
> +		if (!ret)
> +			locked = true;
>  		if (!ret && unlikely(atomic_read(&bo->cpu_writers) > 0)) {
>  			reservation_object_unlock(bo->resv);
>  
> @@ -151,6 +155,10 @@ int ttm_eu_reserve_buffers(struct ww_acq
>  				ret = 0;
>  			}
>  		}
> +		if (!ret)
> +			locked = true;
> +		else
> +			locked = false;
>  

locked = !ret; 

?

>  		if (!ret && entry->num_shared)
>  			ret = reservation_object_reserve_shared(bo->resv,
> @@ -163,6 +171,8 @@ int ttm_eu_reserve_buffers(struct ww_acq
>  				ww_acquire_done(ticket);
>  				ww_acquire_fini(ticket);
>  			}
> +			if (locked)
> +				ttm_eu_backoff_reservation_reverse(list, entry);
>  			return ret;
>  		}
>  
> @@ -172,6 +182,8 @@ int ttm_eu_reserve_buffers(struct ww_acq
>  		list_del(&entry->head);
>  		list_add(&entry->head, list);
>  	}
> +	if (locked)
> +		ttm_eu_backoff_reservation_reverse(list, last_entry);
>  
>  	if (del_lru) {
>  		spin_lock(&glob->lru_lock);

Where did this patch come from? Is it already posted somewhere?
Is it supposed to fix this issue?
Does it affect other cards besides QXL?

Frediano

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Spice-devel] Xorg indefinitely hangs in kernelspace
  2019-09-06 20:27 ` [Spice-devel] " Frediano Ziglio
@ 2019-09-07  2:00   ` Hillf Danton
  0 siblings, 0 replies; 8+ messages in thread
From: Hillf Danton @ 2019-09-07  2:00 UTC (permalink / raw)
  To: Frediano Ziglio
  Cc: David Airlie, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org,
	virtualization@lists.linux-foundation.org, Daniel Vetter,
	Dave Airlie, Jaak Ristioja, spice-devel@lists.freedesktop.org



From Frediano Ziglio <fziglio@redhat.com>
>
> Where did this patch come from?

My fingers tapping the keyboard.

> Is it already posted somewhere?

No idea yet.

> Is it supposed to fix this issue?

It should do nothing else, as far as I can tell.

> Does it affect other cards besides QXL?

Perhaps.





_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Xorg indefinitely hangs in kernelspace
  2019-09-06  5:53 Xorg indefinitely hangs in kernelspace Hillf Danton
  2019-09-06 20:27 ` [Spice-devel] " Frediano Ziglio
@ 2019-09-09  5:52 ` Gerd Hoffmann
  2019-09-09  7:13   ` Hillf Danton
  1 sibling, 1 reply; 8+ messages in thread
From: Gerd Hoffmann @ 2019-09-09  5:52 UTC (permalink / raw)
  To: Hillf Danton
  Cc: David Airlie, linux-kernel, dri-devel, virtualization,
	Daniel Vetter, spice-devel, Jaak Ristioja, Dave Airlie

  Hi,

--verbose please.  Do you see the same hang?  Does the patch fix it?

> --- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> @@ -97,8 +97,9 @@ int ttm_eu_reserve_buffers(struct ww_acq
>  			   struct list_head *dups, bool del_lru)
[ ... ]

> +			if (locked)
> +				ttm_eu_backoff_reservation_reverse(list, entry);

Hmm, I think the patch is wrong.  As far as I know it is the qxl driver's
job to call ttm_eu_backoff_reservation().  Doing that automatically in
ttm will most likely break other ttm users.

So I guess the call is missing in the qxl driver somewhere, most likely
in some error handling code path given that this bug is a relatively
rare event.
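The contract being described — whoever successfully reserves the buffer list, not TTM itself, must back the reservation off again on every exit path — can be sketched in plain user-space C. This is an illustrative analogy only, not kernel code: boolean flags stand in for the per-BO ww_mutex reservation locks, and names like reserve_all()/release_all() are made up for the sketch:

```c
/*
 * Illustrative user-space sketch (not kernel code) of the reserve/backoff
 * contract: whoever successfully reserves a list of buffers must release
 * every lock again on all exit paths, including error paths.
 * Boolean flags stand in for the per-BO reservation locks;
 * reserve_all()/release_all() are made-up names for illustration.
 */
#include <assert.h>
#include <stdbool.h>

#define NBUF 3
static bool buf_locked[NBUF];

static bool trylock_buf(int i)
{
    if (buf_locked[i])
        return false;       /* contended, like a failing trylock */
    buf_locked[i] = true;
    return true;
}

static void unlock_buf(int i)
{
    buf_locked[i] = false;
}

/* Reserve every buffer; on any failure, back off the locks already
 * taken, in reverse order, so the caller never holds a partial set. */
static bool reserve_all(void)
{
    for (int i = 0; i < NBUF; i++) {
        if (!trylock_buf(i)) {
            while (--i >= 0)
                unlock_buf(i);  /* the "backoff" step */
            return false;
        }
    }
    return true;
}

/* The matching release for a successful reservation. */
static void release_all(void)
{
    for (int i = NBUF - 1; i >= 0; i--)
        unlock_buf(i);
}
```

In these terms, the suspected bug is a caller that takes an error return after reserve_all() succeeded and never calls release_all(); the next reserve_all() (the next cursor update in the traces above) then blocks forever on a lock nobody will release.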

There is only a single ttm_eu_reserve_buffers() call in qxl.
So how about this?

----------------------- cut here --------------------
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 312216caeea2..2f9950fa0b8d 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -262,18 +262,20 @@ int qxl_release_reserve_list(struct qxl_release *release, bool no_intr)
 	ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos,
 				     !no_intr, NULL, true);
 	if (ret)
-		return ret;
+		goto err_backoff;
 
 	list_for_each_entry(entry, &release->bos, tv.head) {
 		struct qxl_bo *bo = to_qxl_bo(entry->tv.bo);
 
 		ret = qxl_release_validate_bo(bo);
-		if (ret) {
-			ttm_eu_backoff_reservation(&release->ticket, &release->bos);
-			return ret;
-		}
+		if (ret)
+			goto err_backoff;
 	}
 	return 0;
+
+err_backoff:
+	ttm_eu_backoff_reservation(&release->ticket, &release->bos);
+	return ret;
 }
 
 void qxl_release_backoff_reserve_list(struct qxl_release *release)
----------------------- cut here --------------------

cheers,
  Gerd

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Xorg indefinitely hangs in kernelspace
  2019-09-09  5:52 ` Gerd Hoffmann
@ 2019-09-09  7:13   ` Hillf Danton
  0 siblings, 0 replies; 8+ messages in thread
From: Hillf Danton @ 2019-09-09  7:13 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: David Airlie, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org,
	virtualization@lists.linux-foundation.org, Daniel Vetter,
	spice-devel@lists.freedesktop.org, Jaak Ristioja, Dave Airlie



Hi,

On Mon, 9 Sep 2019, Gerd Hoffmann <kraxel@redhat.com> wrote:
>
> Hmm, I think the patch is wrong.  As far as I know it is the qxl driver's
> job to call ttm_eu_backoff_reservation().  Doing that automatically in
> ttm will most likely break other ttm users.
>
Perhaps.

> So I guess the call is missing in the qxl driver somewhere, most likely
> in some error handling code path given that this bug is a relatively
> rare event.
>
> There is only a single ttm_eu_reserve_buffers() call in qxl.
> So how about this?
>
No preference either way, if it is the right cure.

BTW, a quick look at the mainline tree shows that not every
ttm_eu_reserve_buffers() call is paired with a ttm_eu_backoff_reservation(),
even without taking qxl into account.

Hillf



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Spice-devel] Xorg indefinitely hangs in kernelspace
       [not found]     ` <ccafdbaf-7f8e-8616-5543-2a178bd63828@ristioja.ee>
@ 2019-09-30 13:29       ` Frediano Ziglio
       [not found]       ` <1174991123.3693721.1569850187145.JavaMail.zimbra@redhat.com>
  1 sibling, 0 replies; 8+ messages in thread
From: Frediano Ziglio @ 2019-09-30 13:29 UTC (permalink / raw)
  To: Jaak Ristioja
  Cc: David Airlie, linux-kernel, dri-devel, virtualization,
	Daniel Vetter, Dave Airlie, spice-devel

> 
> On 05.09.19 15:34, Jaak Ristioja wrote:
> > On 05.09.19 10:14, Gerd Hoffmann wrote:
> >> On Tue, Aug 06, 2019 at 09:00:10PM +0300, Jaak Ristioja wrote:
> >>> Hello!
> >>>
> >>> I'm writing to report a crash in the QXL / DRM code in the Linux kernel.
> >>> I originally filed the issue on LaunchPad and more details can be found
> >>> there, although I doubt whether these details are useful.
> >>
> >> Any change with kernel 5.3-rc7 ?
> > 
> > Didn't try. Did you change something? I could try, but I've done so
> > before, and every time this bug manifests itself with MAJOR.MINOR-rc# I
> > get asked to try version MAJOR.(MINOR+1)-rc#, so I guess I might as well
> > give up?
> > 
> > Alright, I'll install 5.3-rc7, but once more it might take some time for
> > this bug to expose itself.
> 
> Just got the issue with 5.3.0-050300rc7-generic:
> 
> [124212.547403] INFO: task Xorg:797 blocked for more than 120 seconds.
> [124212.548639]       Not tainted 5.3.0-050300rc7-generic #201909021831
> [124212.549839] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [124212.551329] Xorg            D    0   797    773 0x00404004
> [124212.551331] Call Trace:
> [124212.551336]  __schedule+0x2b9/0x6c0
> [124212.551337]  schedule+0x42/0xb0
> [124212.551338]  schedule_preempt_disabled+0xe/0x10
> [124212.551340]  __ww_mutex_lock.isra.0+0x261/0x7f0
> [124212.551345]  ? ttm_bo_init+0x6b/0x100 [ttm]
> [124212.551346]  __ww_mutex_lock_slowpath+0x16/0x20
> [124212.551347]  ww_mutex_lock+0x38/0x90
> [124212.551352]  ttm_eu_reserve_buffers+0x1cc/0x2f0 [ttm]
> [124212.551371]  qxl_release_reserve_list+0x6d/0x150 [qxl]
> [124212.551373]  ? qxl_bo_pin+0xf4/0x190 [qxl]
> [124212.551375]  qxl_cursor_atomic_update+0x1ab/0x2e0 [qxl]
> [124212.551376]  ? qxl_bo_pin+0xf4/0x190 [qxl]
> [124212.551384]  drm_atomic_helper_commit_planes+0xd5/0x220 [drm_kms_helper]
> [124212.551388]  drm_atomic_helper_commit_tail+0x2c/0x70 [drm_kms_helper]
> [124212.551392]  commit_tail+0x68/0x70 [drm_kms_helper]
> [124212.551395]  drm_atomic_helper_commit+0x118/0x120 [drm_kms_helper]
> [124212.551407]  drm_atomic_commit+0x4a/0x50 [drm]
> [124212.551411]  drm_atomic_helper_update_plane+0xea/0x100 [drm_kms_helper]
> [124212.551418]  __setplane_atomic+0xcb/0x110 [drm]
> [124212.551428]  drm_mode_cursor_universal+0x140/0x260 [drm]
> [124212.551435]  drm_mode_cursor_common+0xcc/0x220 [drm]
> [124212.551441]  ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
> [124212.551447]  drm_mode_cursor2_ioctl+0xe/0x10 [drm]
> [124212.551452]  drm_ioctl_kernel+0xae/0xf0 [drm]
> [124212.551458]  drm_ioctl+0x234/0x3d0 [drm]
> [124212.551464]  ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
> [124212.551466]  ? timerqueue_add+0x5f/0xa0
> [124212.551469]  ? enqueue_hrtimer+0x3d/0x90
> [124212.551471]  do_vfs_ioctl+0x407/0x670
> [124212.551473]  ? fput+0x13/0x20
> [124212.551475]  ? __sys_recvmsg+0x88/0xa0
> [124212.551476]  ksys_ioctl+0x67/0x90
> [124212.551477]  __x64_sys_ioctl+0x1a/0x20
> [124212.551479]  do_syscall_64+0x5a/0x130
> [124212.551480]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [124212.551481] RIP: 0033:0x7f07c79ee417
> [124212.551485] Code: Bad RIP value.
> [124212.551485] RSP: 002b:00007ffc6b1de1a8 EFLAGS: 00003246 ORIG_RAX:
> 0000000000000010
> [124212.551486] RAX: ffffffffffffffda RBX: 00005612f109a610 RCX:
> 00007f07c79ee417
> [124212.551487] RDX: 00007ffc6b1de1e0 RSI: 00000000c02464bb RDI:
> 000000000000000e
> [124212.551487] RBP: 00007ffc6b1de1e0 R08: 0000000000000040 R09:
> 0000000000000004
> [124212.551488] R10: 000000000000003f R11: 0000000000003246 R12:
> 00000000c02464bb
> [124212.551488] R13: 000000000000000e R14: 0000000000000000 R15:
> 00005612f10981d0
> [124333.376328] INFO: task Xorg:797 blocked for more than 241 seconds.
> [124333.377474]       Not tainted 5.3.0-050300rc7-generic #201909021831
> [124333.378609] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [124333.380060] Xorg            D    0   797    773 0x00404004
> [124333.380062] Call Trace:
> [124333.380067]  __schedule+0x2b9/0x6c0
> [124333.380068]  schedule+0x42/0xb0
> [124333.380069]  schedule_preempt_disabled+0xe/0x10
> [124333.380070]  __ww_mutex_lock.isra.0+0x261/0x7f0
> [124333.380076]  ? ttm_bo_init+0x6b/0x100 [ttm]
> [124333.380077]  __ww_mutex_lock_slowpath+0x16/0x20
> [124333.380077]  ww_mutex_lock+0x38/0x90
> [124333.380080]  ttm_eu_reserve_buffers+0x1cc/0x2f0 [ttm]
> [124333.380083]  qxl_release_reserve_list+0x6d/0x150 [qxl]
> [124333.380085]  ? qxl_bo_pin+0xf4/0x190 [qxl]
> [124333.380087]  qxl_cursor_atomic_update+0x1ab/0x2e0 [qxl]
> [124333.380088]  ? qxl_bo_pin+0xf4/0x190 [qxl]
> [124333.380096]  drm_atomic_helper_commit_planes+0xd5/0x220 [drm_kms_helper]
> [124333.380101]  drm_atomic_helper_commit_tail+0x2c/0x70 [drm_kms_helper]
> [124333.380105]  commit_tail+0x68/0x70 [drm_kms_helper]
> [124333.380109]  drm_atomic_helper_commit+0x118/0x120 [drm_kms_helper]
> [124333.380128]  drm_atomic_commit+0x4a/0x50 [drm]
> [124333.380132]  drm_atomic_helper_update_plane+0xea/0x100 [drm_kms_helper]
> [124333.380140]  __setplane_atomic+0xcb/0x110 [drm]
> [124333.380147]  drm_mode_cursor_universal+0x140/0x260 [drm]
> [124333.380153]  drm_mode_cursor_common+0xcc/0x220 [drm]
> [124333.380160]  ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
> [124333.380166]  drm_mode_cursor2_ioctl+0xe/0x10 [drm]
> [124333.380171]  drm_ioctl_kernel+0xae/0xf0 [drm]
> [124333.380176]  drm_ioctl+0x234/0x3d0 [drm]
> [124333.380182]  ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
> [124333.380184]  ? timerqueue_add+0x5f/0xa0
> [124333.380186]  ? enqueue_hrtimer+0x3d/0x90
> [124333.380188]  do_vfs_ioctl+0x407/0x670
> [124333.380190]  ? fput+0x13/0x20
> [124333.380192]  ? __sys_recvmsg+0x88/0xa0
> [124333.380193]  ksys_ioctl+0x67/0x90
> [124333.380194]  __x64_sys_ioctl+0x1a/0x20
> [124333.380195]  do_syscall_64+0x5a/0x130
> [124333.380197]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [124333.380198] RIP: 0033:0x7f07c79ee417
> [124333.380202] Code: Bad RIP value.
> [124333.380203] RSP: 002b:00007ffc6b1de1a8 EFLAGS: 00003246 ORIG_RAX:
> 0000000000000010
> [124333.380204] RAX: ffffffffffffffda RBX: 00005612f109a610 RCX:
> 00007f07c79ee417
> [124333.380204] RDX: 00007ffc6b1de1e0 RSI: 00000000c02464bb RDI:
> 000000000000000e
> [124333.380205] RBP: 00007ffc6b1de1e0 R08: 0000000000000040 R09:
> 0000000000000004
> [124333.380205] R10: 000000000000003f R11: 0000000000003246 R12:
> 00000000c02464bb
> [124333.380206] R13: 000000000000000e R14: 0000000000000000 R15:
> 00005612f10981d0
> 
> 
> Best regards,
> J

Hi Jaak,
  Why didn't you update the bug at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1813620?
I know it can seem tedious, but it would help with tracking.
It seems you have been having this issue for quite some time and with
multiple kernel versions.
Are you still using Kubuntu? Maybe it happens more with KDE.
From the kernel log it looks like a deadlock.

Frediano

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Spice-devel] Xorg indefinitely hangs in kernelspace
       [not found]       ` <1174991123.3693721.1569850187145.JavaMail.zimbra@redhat.com>
@ 2019-10-03  6:45         ` Jaak Ristioja
  2019-10-03  8:23         ` Hillf Danton
  1 sibling, 0 replies; 8+ messages in thread
From: Jaak Ristioja @ 2019-10-03  6:45 UTC (permalink / raw)
  To: Frediano Ziglio
  Cc: David Airlie, linux-kernel, dri-devel, virtualization,
	Daniel Vetter, Dave Airlie, spice-devel

On 30.09.19 16:29, Frediano Ziglio wrote:
>   Why didn't you update the bug at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1813620?
> I know it can seem tedious, but it would help with tracking.

I suppose the lack of centralized tracking and handling of Linux kernel
bugs is a delicate topic, so I don't want to rant much more about it.
Updating that bug would be tedious and time-consuming indeed, which is why
I haven't done so. To be honest, I don't have enough time and motivation.

I would have posted a link to an upstream (kernel) bug tracker entry for
this, but lacking one I only posted a link to my original e-mail in the
virtualization list Pipermail archive. Could you please provide a
better URL to a reasonably browsable index of this whole e-mail thread
in some web-based mailing list archive? Perhaps posting that to
Launchpad would suffice.


> It seems you have been having this issue for quite some time and with
> multiple kernel versions.
> Are you still using Kubuntu? Maybe it happens more with KDE.
> From the kernel log it looks like a deadlock.

Yes, I'm using Kubuntu 19.04.


Best regards,
Jaak Ristioja

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Spice-devel] Xorg indefinitely hangs in kernelspace
       [not found]       ` <1174991123.3693721.1569850187145.JavaMail.zimbra@redhat.com>
  2019-10-03  6:45         ` Jaak Ristioja
@ 2019-10-03  8:23         ` Hillf Danton
  1 sibling, 0 replies; 8+ messages in thread
From: Hillf Danton @ 2019-10-03  8:23 UTC (permalink / raw)
  To: Frediano Ziglio, Jaak Ristioja
  Cc: David Airlie, linux-kernel, dri-devel, virtualization,
	Daniel Vetter, Dave Airlie, spice-devel


On Thu, 3 Oct 2019 09:45:55 +0300 Jaak Ristioja wrote:
> On 30.09.19 16:29, Frediano Ziglio wrote:
> >   Why didn't you update the bug at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1813620?
> > I know it can seem tedious, but it would help with tracking.
> 
> I suppose the lack of centralized tracking and handling of Linux kernel
> bugs is a delicate topic, so I don't want to rant much more about it.
> Updating that bug would be tedious and time-consuming indeed, which is why
> I haven't done so. To be honest, I don't have enough time and motivation.

Give the diff below a go only when it is convenient and only if it makes
a bit of sense to you.

--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -110,6 +110,7 @@ int ttm_eu_reserve_buffers(struct ww_acq
 		ww_acquire_init(ticket, &reservation_ww_class);
 
 	list_for_each_entry(entry, list, head) {
+		bool lockon = false;
 		struct ttm_buffer_object *bo = entry->bo;
 
 		ret = __ttm_bo_reserve(bo, intr, (ticket == NULL), ticket);
@@ -150,6 +151,7 @@ int ttm_eu_reserve_buffers(struct ww_acq
 				dma_resv_lock_slow(bo->base.resv, ticket);
 				ret = 0;
 			}
+			lockon = !ret;
 		}
 
 		if (!ret && entry->num_shared)
@@ -157,6 +159,8 @@ int ttm_eu_reserve_buffers(struct ww_acq
 								entry->num_shared);
 
 		if (unlikely(ret != 0)) {
+			if (lockon)
+				dma_resv_unlock(bo->base.resv);
 			if (ret == -EINTR)
 				ret = -ERESTARTSYS;
 			if (ticket) {

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-10-03  8:23 UTC | newest]

Thread overview: 8+ messages
2019-09-06  5:53 Xorg indefinitely hangs in kernelspace Hillf Danton
2019-09-06 20:27 ` [Spice-devel] " Frediano Ziglio
2019-09-07  2:00   ` Hillf Danton
2019-09-09  5:52 ` Gerd Hoffmann
2019-09-09  7:13   ` Hillf Danton
  -- strict thread matches above, loose matches on Subject: below --
2019-08-06 18:00 Jaak Ristioja
     [not found] ` <20190905071407.47iywqcqomizs3yr@sirius.home.kraxel.org>
     [not found]   ` <e4b7d889-15f3-0c90-3b9f-d395344499c0@ristioja.ee>
     [not found]     ` <ccafdbaf-7f8e-8616-5543-2a178bd63828@ristioja.ee>
2019-09-30 13:29       ` [Spice-devel] " Frediano Ziglio
     [not found]       ` <1174991123.3693721.1569850187145.JavaMail.zimbra@redhat.com>
2019-10-03  6:45         ` Jaak Ristioja
2019-10-03  8:23         ` Hillf Danton

This is a public inbox; see the mirroring instructions for how to clone
and mirror all data and code used for this inbox, as well as URLs for
NNTP newsgroup(s).