* Re: task btrfs-transacti:651 blocked for more than 120 seconds
  [not found] <1506593789.26660.28.camel@daevel.fr>
@ 2017-09-28 11:18 ` Nikolay Borisov
  [not found] ` <ed2732d4-966a-3f17-bb5e-27f7615668ea@gmail.com>
  2017-09-28 15:04 ` Olivier Bonvalet
  2 siblings, 0 replies; 9+ messages in thread
From: Nikolay Borisov @ 2017-09-28 11:18 UTC (permalink / raw)
  To: Olivier Bonvalet, linux-btrfs; +Cc: xen-devel

On 28.09.2017 13:16, Olivier Bonvalet wrote:
> Hi!
>
> I have a virtual server (Xen) which very frequently hangs with only
> this error in the logs:
>
> [ 1330.144124] INFO: task btrfs-transacti:651 blocked for more than 120 seconds.
> [ 1330.144141]       Not tainted 4.9-dae-xen #2
> [ 1330.144146] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1330.144179] btrfs-transacti D    0   651      2 0x00000000
> [ 1330.144184]  ffff8803a6c85b40 0000000000000000 ffff8803af857880 ffff8803a9762180
> [ 1330.144190]  ffff8803a7bb8140 ffffc900173bfb10 ffffffff8150ff1f 0000000000000000
> [ 1330.144195]  ffff8803a7bb8140 7fffffffffffffff ffffffff81510710 ffffc900173bfc18
> [ 1330.144200] Call Trace:
> [ 1330.144211]  [<ffffffff8150ff1f>] ? __schedule+0x17f/0x530
> [ 1330.144215]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
> [ 1330.144218]  [<ffffffff815102fd>] ? schedule+0x2d/0x80
> [ 1330.144221]  [<ffffffff815132be>] ? schedule_timeout+0x17e/0x2a0
> [ 1330.144226]  [<ffffffff8101bb71>] ? xen_clocksource_get_cycles+0x11/0x20
> [ 1330.144231]  [<ffffffff810f2196>] ? ktime_get+0x36/0xa0
> [ 1330.144234]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
> [ 1330.144237]  [<ffffffff8150fd38>] ? io_schedule_timeout+0x98/0x100
> [ 1330.144240]  [<ffffffff81513de1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
> [ 1330.144246]  [<ffffffff81510722>] ? bit_wait_io+0x12/0x60
> [ 1330.144250]  [<ffffffff815107be>] ? __wait_on_bit+0x4e/0x80
> [ 1330.144256]  [<ffffffff8113772c>] ? wait_on_page_bit+0x6c/0x80
> [ 1330.144261]  [<ffffffff810d4ab0>] ? autoremove_wake_function+0x30/0x30
> [ 1330.144265]  [<ffffffff81137808>] ? __filemap_fdatawait_range+0xc8/0x110
> [ 1330.144270]  [<ffffffff81137859>] ? filemap_fdatawait_range+0x9/0x20
> [ 1330.144298]  [<ffffffffa014b033>] ? btrfs_wait_ordered_range+0x63/0x100 [btrfs]
> [ 1330.144310]  [<ffffffffa0175a68>] ? btrfs_wait_cache_io+0x58/0x1e0 [btrfs]
> [ 1330.144320]  [<ffffffffa011ded2>] ? btrfs_start_dirty_block_groups+0x1c2/0x450 [btrfs]
> [ 1330.144328]  [<ffffffff810a2ba5>] ? do_group_exit+0x35/0xa0
> [ 1330.144338]  [<ffffffffa012efa7>] ? btrfs_commit_transaction+0x147/0x9b0 [btrfs]
> [ 1330.144348]  [<ffffffffa012f8a2>] ? start_transaction+0x92/0x3f0 [btrfs]
> [ 1330.144357]  [<ffffffffa012a0e7>] ? transaction_kthread+0x1d7/0x1f0 [btrfs]
> [ 1330.144366]  [<ffffffffa0129f10>] ? btrfs_cleanup_transaction+0x4f0/0x4f0 [btrfs]
> [ 1330.144373]  [<ffffffff810ba352>] ? kthread+0xc2/0xe0
> [ 1330.144377]  [<ffffffff810ba290>] ? kthread_create_on_node+0x40/0x40
> [ 1330.144381]  [<ffffffff81514405>] ? ret_from_fork+0x25/0x30

What this stack trace means is that the transaction commit has hung,
judging by the called functions (assuming they are correct — the '?'
markers mean the unwinder is not certain of them, which isn't very
encouraging). Concretely, it means that I/O has been started for a
certain range of addresses and the transaction commit is now waiting to
be woken upon completion of that write. When this occurs, can you check
whether there is I/O activity from that particular guest (assuming you
have access to the hypervisor)? It might be a bug in btrfs, or you might
be hitting something else in the hypervisor.

>
> It's a Debian Stretch system, running a 4.9.52 Linux kernel (on a Xen
> 4.8.2 hypervisor).
> With an old 4.1.x Linux kernel, I didn't have any problems.
>
> Is it a Btrfs bug? Should I try a more recent kernel? (Which one?)
>
> Thanks in advance,
>
> Olivier
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* Re : task btrfs-transacti:651 blocked for more than 120 seconds
  [not found] ` <ed2732d4-966a-3f17-bb5e-27f7615668ea@gmail.com>
@ 2017-09-28 14:28 ` Olivier Bonvalet
  [not found] ` <1506608901.2373.10.camel@daevel.fr>
  1 sibling, 0 replies; 9+ messages in thread
From: Olivier Bonvalet @ 2017-09-28 14:28 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: xen-devel

On Thursday, 28 September 2017 at 14:18 +0300, Nikolay Borisov wrote:
> So what this stack trace means is that transaction commit has hung.
> And judging by the called functions (assuming they are correct, though
> the ? aren't very encouraging). Concretely, it means that an io has
> been started for a certain range of addresses and transaction commit
> is now waiting to be awaken upon completion of write. When this occurs
> can you see if there is io activity from that particular guest
> (assuming you have access to the hypervisor)? It might be a bug in
> btrfs or you might be hitting something else in the hypervisor

Hello, thanks for your answer.

From the hypervisor, I don't see any I/O during this hang.

I cloned the VM to try to reproduce the problem, and I also see the
problem without Btrfs:

[ 3263.452023] INFO: task systemd:1 blocked for more than 120 seconds.
[ 3263.452040]       Tainted: G        W       4.9-dae-xen #2
[ 3263.452044] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3263.452052] systemd         D    0     1      0 0x00000000
[ 3263.452060]  ffff8803a71ca000 0000000000000000 ffff8803af857880 ffff8803a9762dc0
[ 3263.452070]  ffff8803a96fcc80 ffffc9001623f990 ffffffff8150ff1f 0000000000000000
[ 3263.452079]  ffff8803a96fcc80 7fffffffffffffff ffffffff81510710 ffffc9001623faa0
[ 3263.452087] Call Trace:
[ 3263.452099]  [<ffffffff8150ff1f>] ? __schedule+0x17f/0x530
[ 3263.452105]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
[ 3263.452110]  [<ffffffff815102fd>] ? schedule+0x2d/0x80
[ 3263.452116]  [<ffffffff815132be>] ? schedule_timeout+0x17e/0x2a0
[ 3263.452121]  [<ffffffff8101bb71>] ? xen_clocksource_get_cycles+0x11/0x20
[ 3263.452126]  [<ffffffff810f2196>] ? ktime_get+0x36/0xa0
[ 3263.452130]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
[ 3263.452134]  [<ffffffff8150fd38>] ? io_schedule_timeout+0x98/0x100
[ 3263.452137]  [<ffffffff81513de1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[ 3263.452141]  [<ffffffff81510722>] ? bit_wait_io+0x12/0x60
[ 3263.452145]  [<ffffffff815107be>] ? __wait_on_bit+0x4e/0x80
[ 3263.452149]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
[ 3263.452153]  [<ffffffff81510859>] ? out_of_line_wait_on_bit+0x69/0x80
[ 3263.452157]  [<ffffffff810d4ab0>] ? autoremove_wake_function+0x30/0x30
[ 3263.452163]  [<ffffffff81220ed0>] ? ext4_find_entry+0x350/0x5d0
[ 3263.452168]  [<ffffffff811b9020>] ? d_alloc_parallel+0xa0/0x480
[ 3263.452172]  [<ffffffff811b6d18>] ? __d_lookup_done+0x68/0xd0
[ 3263.452175]  [<ffffffff811b7f38>] ? d_splice_alias+0x158/0x3b0
[ 3263.452179]  [<ffffffff81221662>] ? ext4_lookup+0x42/0x1f0
[ 3263.452184]  [<ffffffff811ab28e>] ? lookup_slow+0x8e/0x130
[ 3263.452187]  [<ffffffff811ab71a>] ? walk_component+0x1ca/0x300
[ 3263.452193]  [<ffffffff811ac0fe>] ? link_path_walk+0x18e/0x570
[ 3263.452199]  [<ffffffff811abe13>] ? path_init+0x1c3/0x320
[ 3263.452207]  [<ffffffff811ae4c2>] ? path_openat+0xe2/0x1380
[ 3263.452214]  [<ffffffff811b0329>] ? do_filp_open+0x79/0xd0
[ 3263.452222]  [<ffffffff81185fc1>] ? kmem_cache_alloc+0x71/0x400
[ 3263.452228]  [<ffffffff8119d507>] ? __check_object_size+0xf7/0x1c4
[ 3263.452235]  [<ffffffff8119f8cf>] ? do_sys_open+0x11f/0x1f0
[ 3263.452238]  [<ffffffff815141b7>] ? entry_SYSCALL_64_fastpath+0x1a/0xa9

So I will follow up with the Xen developers.

Thanks,
Olivier
* Re : Re : task btrfs-transacti:651 blocked for more than 120 seconds
  [not found] ` <1506608901.2373.10.camel@daevel.fr>
@ 2017-09-28 16:49 ` Olivier Bonvalet
  0 siblings, 0 replies; 9+ messages in thread
From: Olivier Bonvalet @ 2017-09-28 16:49 UTC (permalink / raw)
  To: xen-devel; +Cc: linux-btrfs

On Thursday, 28 September 2017 at 16:28 +0200, Olivier Bonvalet wrote:
> [ 3263.452023] INFO: task systemd:1 blocked for more than 120 seconds.
> [ 3263.452040]       Tainted: G        W       4.9-dae-xen #2
> [ 3263.452044] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 3263.452052] systemd         D    0     1      0 0x00000000
> [ 3263.452060]  ffff8803a71ca000 0000000000000000 ffff8803af857880 ffff8803a9762dc0
> [ 3263.452070]  ffff8803a96fcc80 ffffc9001623f990 ffffffff8150ff1f 0000000000000000
> [ 3263.452079]  ffff8803a96fcc80 7fffffffffffffff ffffffff81510710 ffffc9001623faa0
> [ 3263.452087] Call Trace:
> [ 3263.452099]  [<ffffffff8150ff1f>] ? __schedule+0x17f/0x530
> [ 3263.452105]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
> [ 3263.452110]  [<ffffffff815102fd>] ? schedule+0x2d/0x80
> [ 3263.452116]  [<ffffffff815132be>] ? schedule_timeout+0x17e/0x2a0
> [ 3263.452121]  [<ffffffff8101bb71>] ? xen_clocksource_get_cycles+0x11/0x20
> [ 3263.452126]  [<ffffffff810f2196>] ? ktime_get+0x36/0xa0
> [ 3263.452130]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
> [ 3263.452134]  [<ffffffff8150fd38>] ? io_schedule_timeout+0x98/0x100
> [ 3263.452137]  [<ffffffff81513de1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
> [ 3263.452141]  [<ffffffff81510722>] ? bit_wait_io+0x12/0x60
> [ 3263.452145]  [<ffffffff815107be>] ? __wait_on_bit+0x4e/0x80
> [ 3263.452149]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
> [ 3263.452153]  [<ffffffff81510859>] ? out_of_line_wait_on_bit+0x69/0x80
> [ 3263.452157]  [<ffffffff810d4ab0>] ? autoremove_wake_function+0x30/0x30
> [ 3263.452163]  [<ffffffff81220ed0>] ? ext4_find_entry+0x350/0x5d0
> [ 3263.452168]  [<ffffffff811b9020>] ? d_alloc_parallel+0xa0/0x480
> [ 3263.452172]  [<ffffffff811b6d18>] ? __d_lookup_done+0x68/0xd0
> [ 3263.452175]  [<ffffffff811b7f38>] ? d_splice_alias+0x158/0x3b0
> [ 3263.452179]  [<ffffffff81221662>] ? ext4_lookup+0x42/0x1f0
> [ 3263.452184]  [<ffffffff811ab28e>] ? lookup_slow+0x8e/0x130
> [ 3263.452187]  [<ffffffff811ab71a>] ? walk_component+0x1ca/0x300
> [ 3263.452193]  [<ffffffff811ac0fe>] ? link_path_walk+0x18e/0x570
> [ 3263.452199]  [<ffffffff811abe13>] ? path_init+0x1c3/0x320
> [ 3263.452207]  [<ffffffff811ae4c2>] ? path_openat+0xe2/0x1380
> [ 3263.452214]  [<ffffffff811b0329>] ? do_filp_open+0x79/0xd0
> [ 3263.452222]  [<ffffffff81185fc1>] ? kmem_cache_alloc+0x71/0x400
> [ 3263.452228]  [<ffffffff8119d507>] ? __check_object_size+0xf7/0x1c4
> [ 3263.452235]  [<ffffffff8119f8cf>] ? do_sys_open+0x11f/0x1f0
> [ 3263.452238]  [<ffffffff815141b7>] ? entry_SYSCALL_64_fastpath+0x1a/0xa9

Just in case, another example:

[ 1088.476044] INFO: task jbd2/xvdb-8:494 blocked for more than 120 seconds.
[ 1088.476058]       Tainted: G        W       4.9-dae-xen #2
[ 1088.476061] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1088.476066] jbd2/xvdb-8     D    0   494      2 0x00000000
[ 1088.476072]  ffff8800fd036480 0000000000000000 ffff8803af8d7880 ffff8803a8c6e580
[ 1088.476079]  ffff88038756d280 ffffc9001737fb90 ffffffff8150ff1f 0000100000000001
[ 1088.476085]  ffff88038756d280 7fffffffffffffff ffffffff81510710 ffffc9001737fc98
[ 1088.476091] Call Trace:
[ 1088.476102]  [<ffffffff8150ff1f>] ? __schedule+0x17f/0x530
[ 1088.476107]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
[ 1088.476114]  [<ffffffff815102fd>] ? schedule+0x2d/0x80
[ 1088.476117]  [<ffffffff815132be>] ? schedule_timeout+0x17e/0x2a0
[ 1088.476123]  [<ffffffff8101bb71>] ? xen_clocksource_get_cycles+0x11/0x20
[ 1088.476126]  [<ffffffff8101bb71>] ? xen_clocksource_get_cycles+0x11/0x20
[ 1088.476132]  [<ffffffff810f2196>] ? ktime_get+0x36/0xa0
[ 1088.476136]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
[ 1088.476139]  [<ffffffff8150fd38>] ? io_schedule_timeout+0x98/0x100
[ 1088.476143]  [<ffffffff81513de1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[ 1088.476147]  [<ffffffff81510722>] ? bit_wait_io+0x12/0x60
[ 1088.476151]  [<ffffffff815107be>] ? __wait_on_bit+0x4e/0x80
[ 1088.476155]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
[ 1088.476159]  [<ffffffff81510859>] ? out_of_line_wait_on_bit+0x69/0x80
[ 1088.476163]  [<ffffffff810d4ab0>] ? autoremove_wake_function+0x30/0x30
[ 1088.476170]  [<ffffffff812528ee>] ? jbd2_journal_commit_transaction+0xe7e/0x1610
[ 1088.476177]  [<ffffffff810eb7f6>] ? lock_timer_base+0x76/0x90
[ 1088.476182]  [<ffffffff81255b0d>] ? kjournald2+0xad/0x230
[ 1088.476189]  [<ffffffff810d4a80>] ? wake_atomic_t_function+0x50/0x50
[ 1088.476193]  [<ffffffff81255a60>] ? commit_timeout+0x10/0x10
[ 1088.476197]  [<ffffffff810a2ba5>] ? do_group_exit+0x35/0xa0
[ 1088.476201]  [<ffffffff810ba352>] ? kthread+0xc2/0xe0
[ 1088.476205]  [<ffffffff810ba290>] ? kthread_create_on_node+0x40/0x40
[ 1088.476209]  [<ffffffff81514405>] ? ret_from_fork+0x25/0x30

And also from the Dom0 (transcribed from a screenshot):

watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [kworker/11:0:26273]
Modules linked in: ...
CPU: 11 PID: 26273 Comm: kworker/11:0 Tainted: G D W L 4.13-dae-dom0 #2
Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016
Workqueue: events wait_rcu_exp_gp
task: ... task.stack: ...
RIP: e030:smp_call_function_single+0x6b/0xc0
...
Call Trace:
 ? sync_rcu_exp_select_cpus+0x2b5/0x410
 ? rcu_barrier_func+0x40/0x40
 ? wait_rcu_exp_gp+0x16/0x30
 ? process_one_work+0x1ad/0x340
 ? worker_thread+0x45/0x3f0
 ? kthread+0xf2/0x130
 ? process_one_work+0x340/0x340
 ? kthread_create_on_node+0x40/0x40
 ? do_group_exit+0x35/0xa0
 ? ret_from_fork+0x25/0x30
...
* Re : task btrfs-transacti:651 blocked for more than 120 seconds
  [not found] <1506593789.26660.28.camel@daevel.fr>
  2017-09-28 11:18 ` task btrfs-transacti:651 blocked for more than 120 seconds Nikolay Borisov
  [not found] ` <ed2732d4-966a-3f17-bb5e-27f7615668ea@gmail.com>
@ 2017-09-28 15:04 ` Olivier Bonvalet
  2017-09-28 16:12 ` Roger Pau Monné
  2 siblings, 1 reply; 9+ messages in thread
From: Olivier Bonvalet @ 2017-09-28 15:04 UTC (permalink / raw)
  To: xen-devel

On Thursday, 28 September 2017 at 12:16 +0200, Olivier Bonvalet wrote:
> It's a Debian Stretch system, running a 4.9.52 Linux kernel (on a Xen
> 4.8.2 hypervisor).
> With an old 4.1.x Linux kernel, I haven't any problem.

One detail: this VM has 26 block devices attached. I don't know whether
that could be related.
* Re: Re : task btrfs-transacti:651 blocked for more than 120 seconds
  2017-09-28 15:04 ` Olivier Bonvalet
@ 2017-09-28 16:12 ` Roger Pau Monné
  2017-09-28 17:27   ` Re : " Olivier Bonvalet
  0 siblings, 1 reply; 9+ messages in thread
From: Roger Pau Monné @ 2017-09-28 16:12 UTC (permalink / raw)
  To: Olivier Bonvalet; +Cc: xen-devel

On Thu, Sep 28, 2017 at 03:04:02PM +0000, Olivier Bonvalet wrote:
> On Thursday, 28 September 2017 at 12:16 +0200, Olivier Bonvalet wrote:
> > It's a Debian Stretch system, running a 4.9.52 Linux kernel (on a Xen
> > 4.8.2 hypervisor).
> > With an old 4.1.x Linux kernel, I haven't any problem.
>
> One detail: this VM has 26 block devices attached. I don't know
> whether that could be related.

Quite likely. With so many PV block devices attached you either have to
limit the number of queues and persistent grants per device, or expand
the number of grants allowed by Xen.

Can you try setting the following on the Xen command line [0] and see
whether that solves your issue:

gnttab_max_frames=64

Roger.

[0] http://xenbits.xenproject.org/docs/unstable/misc/xen-command-line.html
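[Editor's note: for readers wondering where that option goes — on a grub2-based Dom0 the hypervisor command line usually comes from /etc/default/grub. A minimal sketch, assuming a Debian-style grub2/Xen setup; the variable name and the update-grub step are assumptions, not taken from this thread:]

```shell
# /etc/default/grub — sketch: pass gnttab_max_frames to the hypervisor.
# GRUB_CMDLINE_XEN_DEFAULT feeds options to Xen itself (not to the
# Dom0 Linux kernel) on Debian-style grub2 setups.
GRUB_CMDLINE_XEN_DEFAULT="gnttab_max_frames=64"

# Afterwards, regenerate the grub config and reboot the host:
#   update-grub && reboot
```

The exact mechanism varies by distribution; some Xen packages read a separate drop-in such as /etc/default/grub.d/xen.cfg instead, so check your distribution's documentation.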
* Re : Re : task btrfs-transacti:651 blocked for more than 120 seconds
  2017-09-28 16:12 ` Roger Pau Monné
@ 2017-09-28 17:27 ` Olivier Bonvalet
  2017-09-29  9:20   ` Roger Pau Monné
  0 siblings, 1 reply; 9+ messages in thread
From: Olivier Bonvalet @ 2017-09-28 17:27 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel

On Thursday, 28 September 2017 at 17:12 +0100, Roger Pau Monné wrote:
> Quite likely. With so many PV block devices attached you either have
> to limit the number of queues and persistent grants per device, or
> expand the number of grants allowed by Xen.
>
> Can you try setting the following on the Xen command line [0] and see
> whether that solves your issue:
>
> gnttab_max_frames=64
>
> Roger.
>
> [0] http://xenbits.xenproject.org/docs/unstable/misc/xen-command-line.html

Oh, from Novell's documentation [0] I read:

« General recommendation for determining the proper value for
"gnttab_max_frames" is to multiply by 2 the number of attached disks. »

Since I have about 250 RBD devices, I suppose I should try directly with
gnttab_max_frames=512, right?

Thanks,
Olivier

[0] https://www.novell.com/support/kb/doc.php?id=7018590
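[Editor's note: the rule of thumb quoted above is simple enough to sketch as a shell helper. The doubling is Novell's recommendation; rounding up to the next power of two is my own assumption, added only because the values usually quoted (64, 256, 512) are powers of two:]

```shell
#!/bin/sh
# gnttab_max_frames ≈ 2 × number of attached PV disks (per the quoted
# recommendation), rounded up here to the next power of two.
suggest_frames() {
    frames=$(( $1 * 2 ))
    p=1
    while [ "$p" -lt "$frames" ]; do p=$(( p * 2 )); done
    echo "$p"
}

suggest_frames 26    # the guest discussed in this thread -> 64
suggest_frames 250   # ~250 RBD devices -> 512
```

For the 26-disk guest this lands on 64, matching Roger's earlier suggestion; for 250 devices it lands on 512, matching Olivier's reading of the Novell article.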
* Re: Re : Re : task btrfs-transacti:651 blocked for more than 120 seconds
  2017-09-28 17:27 ` Re : " Olivier Bonvalet
@ 2017-09-29  9:20 ` Roger Pau Monné
  2017-10-02 16:32   ` Re : " Olivier Bonvalet
  0 siblings, 1 reply; 9+ messages in thread
From: Roger Pau Monné @ 2017-09-29  9:20 UTC (permalink / raw)
  To: Olivier Bonvalet; +Cc: xen-devel

On Thu, Sep 28, 2017 at 05:27:54PM +0000, Olivier Bonvalet wrote:
> Oh, from Novell's documentation [0] I read:
>
> « General recommendation for determining the proper value for
> "gnttab_max_frames" is to multiply by 2 the number of attached disks. »
>
> Since I have about 250 RBD devices, I suppose I should try directly
> with gnttab_max_frames=512, right?

Do you have 250 devices attached to the same guest? If so I guess 512
might be sensible, although you said earlier that you had 26 devices
attached, not 250.

Roger.
* Re : Re : Re : task btrfs-transacti:651 blocked for more than 120 seconds
  2017-09-29  9:20 ` Roger Pau Monné
@ 2017-10-02 16:32 ` Olivier Bonvalet
  2017-10-03  9:10   ` Roger Pau Monné
  0 siblings, 1 reply; 9+ messages in thread
From: Olivier Bonvalet @ 2017-10-02 16:32 UTC (permalink / raw)
  To: Roger Pau Monné, Olivier Bonvalet; +Cc: xen-devel

On Friday, 29 September 2017 at 10:20 +0100, Roger Pau Monné wrote:
> Do you have 250 devices attached to the same guest? If so I guess 512
> might be sensible, although you said earlier that you had 26 devices
> attached, not 250.
>
> Roger.

Hi,

no VM has more than 26 devices, except for the Dom0, which has about
300 devices to handle. Is Dom0 affected by this gnttab_max_frames
setting?

Anyway, after booting each hypervisor with gnttab_max_frames=256 (or
greater), it seems I no longer hit this timeout.

Thanks!
Olivier
* Re: Re : Re : Re : task btrfs-transacti:651 blocked for more than 120 seconds
  2017-10-02 16:32 ` Re : " Olivier Bonvalet
@ 2017-10-03  9:10 ` Roger Pau Monné
  0 siblings, 0 replies; 9+ messages in thread
From: Roger Pau Monné @ 2017-10-03  9:10 UTC (permalink / raw)
  To: Olivier Bonvalet; +Cc: Olivier Bonvalet, xen-devel

On Mon, Oct 02, 2017 at 04:32:39PM +0000, Olivier Bonvalet wrote:
> Hi,
>
> no VM has more than 26 devices, except for the Dom0, which has about
> 300 devices to handle. Is Dom0 affected by this gnttab_max_frames
> setting?
No, this limit is only meaningful for DomUs, not for Dom0 (the disks
attached to Dom0 are not PV).

> Anyway, after booting each hypervisor with gnttab_max_frames=256 (or
> greater), it seems I no longer hit this timeout.

I think 256 is quite high; 64 should probably be enough. In any case,
256 frames is 1 MB of memory used by the grant table, which I guess is
not that bad.

Roger.
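[Editor's note: Roger's 1 MB figure follows from the frame size — each grant-table frame is one 4 KiB page. A sketch of the arithmetic; the 8-byte entry size assumes grant table v1 (v2 entries are larger), which is my assumption, not stated in the thread:]

```shell
#!/bin/sh
# Memory cost and capacity of gnttab_max_frames=256 (grant table v1 assumed).
frames=256
page=4096        # a grant-table frame is one 4 KiB page
entry=8          # bytes per v1 grant entry (assumption)

echo "table size:       $(( frames * page )) bytes"   # 1048576 = 1 MiB
echo "grants per frame: $(( page / entry ))"          # 512
echo "total grants:     $(( frames * page / entry ))" # 131072
```

So 256 frames buys on the order of 131,072 simultaneously grantable pages under v1, which puts the "multiply by 2 the number of disks" rule of thumb in perspective.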