* Re: task btrfs-transacti:651 blocked for more than 120 seconds
  [not found] <1506593789.26660.28.camel@daevel.fr>
@ 2017-09-28 11:18 ` Nikolay Borisov
  [not found] ` <ed2732d4-966a-3f17-bb5e-27f7615668ea@gmail.com>
  2017-09-28 15:04 ` Olivier Bonvalet
  2 siblings, 0 replies; 9+ messages in thread
From: Nikolay Borisov @ 2017-09-28 11:18 UTC (permalink / raw)
  To: Olivier Bonvalet, linux-btrfs; +Cc: xen-devel

On 28.09.2017 13:16, Olivier Bonvalet wrote:
> Hi!
>
> I have a virtual server (Xen) which very frequently hangs with only
> this error in the logs:
>
> [ 1330.144124] INFO: task btrfs-transacti:651 blocked for more than 120 seconds.
> [ 1330.144141]       Not tainted 4.9-dae-xen #2
> [ 1330.144146] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1330.144179] btrfs-transacti D    0   651      2 0x00000000
> [ 1330.144184]  ffff8803a6c85b40 0000000000000000 ffff8803af857880 ffff8803a9762180
> [ 1330.144190]  ffff8803a7bb8140 ffffc900173bfb10 ffffffff8150ff1f 0000000000000000
> [ 1330.144195]  ffff8803a7bb8140 7fffffffffffffff ffffffff81510710 ffffc900173bfc18
> [ 1330.144200] Call Trace:
> [ 1330.144211]  [<ffffffff8150ff1f>] ? __schedule+0x17f/0x530
> [ 1330.144215]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
> [ 1330.144218]  [<ffffffff815102fd>] ? schedule+0x2d/0x80
> [ 1330.144221]  [<ffffffff815132be>] ? schedule_timeout+0x17e/0x2a0
> [ 1330.144226]  [<ffffffff8101bb71>] ? xen_clocksource_get_cycles+0x11/0x20
> [ 1330.144231]  [<ffffffff810f2196>] ? ktime_get+0x36/0xa0
> [ 1330.144234]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
> [ 1330.144237]  [<ffffffff8150fd38>] ? io_schedule_timeout+0x98/0x100
> [ 1330.144240]  [<ffffffff81513de1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
> [ 1330.144246]  [<ffffffff81510722>] ? bit_wait_io+0x12/0x60
> [ 1330.144250]  [<ffffffff815107be>] ? __wait_on_bit+0x4e/0x80
> [ 1330.144256]  [<ffffffff8113772c>] ? wait_on_page_bit+0x6c/0x80
> [ 1330.144261]  [<ffffffff810d4ab0>] ? autoremove_wake_function+0x30/0x30
> [ 1330.144265]  [<ffffffff81137808>] ? __filemap_fdatawait_range+0xc8/0x110
> [ 1330.144270]  [<ffffffff81137859>] ? filemap_fdatawait_range+0x9/0x20
> [ 1330.144298]  [<ffffffffa014b033>] ? btrfs_wait_ordered_range+0x63/0x100 [btrfs]
> [ 1330.144310]  [<ffffffffa0175a68>] ? btrfs_wait_cache_io+0x58/0x1e0 [btrfs]
> [ 1330.144320]  [<ffffffffa011ded2>] ? btrfs_start_dirty_block_groups+0x1c2/0x450 [btrfs]
> [ 1330.144328]  [<ffffffff810a2ba5>] ? do_group_exit+0x35/0xa0
> [ 1330.144338]  [<ffffffffa012efa7>] ? btrfs_commit_transaction+0x147/0x9b0 [btrfs]
> [ 1330.144348]  [<ffffffffa012f8a2>] ? start_transaction+0x92/0x3f0 [btrfs]
> [ 1330.144357]  [<ffffffffa012a0e7>] ? transaction_kthread+0x1d7/0x1f0 [btrfs]
> [ 1330.144366]  [<ffffffffa0129f10>] ? btrfs_cleanup_transaction+0x4f0/0x4f0 [btrfs]
> [ 1330.144373]  [<ffffffff810ba352>] ? kthread+0xc2/0xe0
> [ 1330.144377]  [<ffffffff810ba290>] ? kthread_create_on_node+0x40/0x40
> [ 1330.144381]  [<ffffffff81514405>] ? ret_from_fork+0x25/0x30

What this stack trace means is that the transaction commit has hung,
judging by the called functions (assuming they are correct — the '?'
markers mean the unwinder is not certain of them, which isn't very
encouraging). Concretely, it means that I/O has been started for a
certain range of addresses and the transaction commit is now waiting to
be woken upon completion of that write. When this occurs, can you check
whether there is I/O activity from that particular guest (assuming you
have access to the hypervisor)? It might be a bug in btrfs, or you might
be hitting something else in the hypervisor.

>
> It's a Debian Stretch system, running a 4.9.52 Linux kernel (on a Xen
> 4.8.2 hypervisor).
> With an old 4.1.x Linux kernel, I didn't have any problems.
>
> Is it a Btrfs bug? Should I try a more recent kernel? (Which one?)
>
> Thanks in advance,
>
> Olivier
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* Re : task btrfs-transacti:651 blocked for more than 120 seconds
  [not found] ` <ed2732d4-966a-3f17-bb5e-27f7615668ea@gmail.com>
@ 2017-09-28 14:28 ` Olivier Bonvalet
  [not found] ` <1506608901.2373.10.camel@daevel.fr>
  1 sibling, 0 replies; 9+ messages in thread
From: Olivier Bonvalet @ 2017-09-28 14:28 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: xen-devel

On Thursday, 28 September 2017 at 14:18 +0300, Nikolay Borisov wrote:
> So what this stack trace means is that transaction commit has hung.
> And judging by the called functions (assuming they are correct, though
> the ? aren't very encouraging). Concretely, it means that an io has
> been started for a certain range of addresses and transaction commit
> is now waiting to be awaken upon completion of write. When this occurs
> can you see if there is io activity from that particular guest
> (assuming you have access to the hypervisor)? It might be a bug in
> btrfs or you might be hitting something else in the hypervisor

Hello, thanks for your answer.

From the hypervisor, I don't see any I/O during this hang.

I cloned the VM to try to reproduce the problem, and I also see the
problem without Btrfs:

[ 3263.452023] INFO: task systemd:1 blocked for more than 120 seconds.
[ 3263.452040]       Tainted: G        W       4.9-dae-xen #2
[ 3263.452044] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3263.452052] systemd         D    0     1      0 0x00000000
[ 3263.452060]  ffff8803a71ca000 0000000000000000 ffff8803af857880 ffff8803a9762dc0
[ 3263.452070]  ffff8803a96fcc80 ffffc9001623f990 ffffffff8150ff1f 0000000000000000
[ 3263.452079]  ffff8803a96fcc80 7fffffffffffffff ffffffff81510710 ffffc9001623faa0
[ 3263.452087] Call Trace:
[ 3263.452099]  [<ffffffff8150ff1f>] ? __schedule+0x17f/0x530
[ 3263.452105]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
[ 3263.452110]  [<ffffffff815102fd>] ? schedule+0x2d/0x80
[ 3263.452116]  [<ffffffff815132be>] ? schedule_timeout+0x17e/0x2a0
[ 3263.452121]  [<ffffffff8101bb71>] ? xen_clocksource_get_cycles+0x11/0x20
[ 3263.452126]  [<ffffffff810f2196>] ? ktime_get+0x36/0xa0
[ 3263.452130]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
[ 3263.452134]  [<ffffffff8150fd38>] ? io_schedule_timeout+0x98/0x100
[ 3263.452137]  [<ffffffff81513de1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[ 3263.452141]  [<ffffffff81510722>] ? bit_wait_io+0x12/0x60
[ 3263.452145]  [<ffffffff815107be>] ? __wait_on_bit+0x4e/0x80
[ 3263.452149]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
[ 3263.452153]  [<ffffffff81510859>] ? out_of_line_wait_on_bit+0x69/0x80
[ 3263.452157]  [<ffffffff810d4ab0>] ? autoremove_wake_function+0x30/0x30
[ 3263.452163]  [<ffffffff81220ed0>] ? ext4_find_entry+0x350/0x5d0
[ 3263.452168]  [<ffffffff811b9020>] ? d_alloc_parallel+0xa0/0x480
[ 3263.452172]  [<ffffffff811b6d18>] ? __d_lookup_done+0x68/0xd0
[ 3263.452175]  [<ffffffff811b7f38>] ? d_splice_alias+0x158/0x3b0
[ 3263.452179]  [<ffffffff81221662>] ? ext4_lookup+0x42/0x1f0
[ 3263.452184]  [<ffffffff811ab28e>] ? lookup_slow+0x8e/0x130
[ 3263.452187]  [<ffffffff811ab71a>] ? walk_component+0x1ca/0x300
[ 3263.452193]  [<ffffffff811ac0fe>] ? link_path_walk+0x18e/0x570
[ 3263.452199]  [<ffffffff811abe13>] ? path_init+0x1c3/0x320
[ 3263.452207]  [<ffffffff811ae4c2>] ? path_openat+0xe2/0x1380
[ 3263.452214]  [<ffffffff811b0329>] ? do_filp_open+0x79/0xd0
[ 3263.452222]  [<ffffffff81185fc1>] ? kmem_cache_alloc+0x71/0x400
[ 3263.452228]  [<ffffffff8119d507>] ? __check_object_size+0xf7/0x1c4
[ 3263.452235]  [<ffffffff8119f8cf>] ? do_sys_open+0x11f/0x1f0
[ 3263.452238]  [<ffffffff815141b7>] ? entry_SYSCALL_64_fastpath+0x1a/0xa9

So I will follow up with the Xen developers.

Thanks,
Olivier
* Re : Re : task btrfs-transacti:651 blocked for more than 120 seconds
  [not found] ` <1506608901.2373.10.camel@daevel.fr>
@ 2017-09-28 16:49 ` Olivier Bonvalet
  0 siblings, 0 replies; 9+ messages in thread
From: Olivier Bonvalet @ 2017-09-28 16:49 UTC (permalink / raw)
  To: xen-devel; +Cc: linux-btrfs

On Thursday, 28 September 2017 at 16:28 +0200, Olivier Bonvalet wrote:
> [ 3263.452023] INFO: task systemd:1 blocked for more than 120 seconds.
> [ 3263.452040]       Tainted: G        W       4.9-dae-xen #2
> [ 3263.452044] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 3263.452052] systemd         D    0     1      0 0x00000000
> [ 3263.452060]  ffff8803a71ca000 0000000000000000 ffff8803af857880 ffff8803a9762dc0
> [ 3263.452070]  ffff8803a96fcc80 ffffc9001623f990 ffffffff8150ff1f 0000000000000000
> [ 3263.452079]  ffff8803a96fcc80 7fffffffffffffff ffffffff81510710 ffffc9001623faa0
> [ 3263.452087] Call Trace:
> [ 3263.452099]  [<ffffffff8150ff1f>] ? __schedule+0x17f/0x530
> [ 3263.452105]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
> [ 3263.452110]  [<ffffffff815102fd>] ? schedule+0x2d/0x80
> [ 3263.452116]  [<ffffffff815132be>] ? schedule_timeout+0x17e/0x2a0
> [ 3263.452121]  [<ffffffff8101bb71>] ? xen_clocksource_get_cycles+0x11/0x20
> [ 3263.452126]  [<ffffffff810f2196>] ? ktime_get+0x36/0xa0
> [ 3263.452130]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
> [ 3263.452134]  [<ffffffff8150fd38>] ? io_schedule_timeout+0x98/0x100
> [ 3263.452137]  [<ffffffff81513de1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
> [ 3263.452141]  [<ffffffff81510722>] ? bit_wait_io+0x12/0x60
> [ 3263.452145]  [<ffffffff815107be>] ? __wait_on_bit+0x4e/0x80
> [ 3263.452149]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
> [ 3263.452153]  [<ffffffff81510859>] ? out_of_line_wait_on_bit+0x69/0x80
> [ 3263.452157]  [<ffffffff810d4ab0>] ? autoremove_wake_function+0x30/0x30
> [ 3263.452163]  [<ffffffff81220ed0>] ? ext4_find_entry+0x350/0x5d0
> [ 3263.452168]  [<ffffffff811b9020>] ? d_alloc_parallel+0xa0/0x480
> [ 3263.452172]  [<ffffffff811b6d18>] ? __d_lookup_done+0x68/0xd0
> [ 3263.452175]  [<ffffffff811b7f38>] ? d_splice_alias+0x158/0x3b0
> [ 3263.452179]  [<ffffffff81221662>] ? ext4_lookup+0x42/0x1f0
> [ 3263.452184]  [<ffffffff811ab28e>] ? lookup_slow+0x8e/0x130
> [ 3263.452187]  [<ffffffff811ab71a>] ? walk_component+0x1ca/0x300
> [ 3263.452193]  [<ffffffff811ac0fe>] ? link_path_walk+0x18e/0x570
> [ 3263.452199]  [<ffffffff811abe13>] ? path_init+0x1c3/0x320
> [ 3263.452207]  [<ffffffff811ae4c2>] ? path_openat+0xe2/0x1380
> [ 3263.452214]  [<ffffffff811b0329>] ? do_filp_open+0x79/0xd0
> [ 3263.452222]  [<ffffffff81185fc1>] ? kmem_cache_alloc+0x71/0x400
> [ 3263.452228]  [<ffffffff8119d507>] ? __check_object_size+0xf7/0x1c4
> [ 3263.452235]  [<ffffffff8119f8cf>] ? do_sys_open+0x11f/0x1f0
> [ 3263.452238]  [<ffffffff815141b7>] ? entry_SYSCALL_64_fastpath+0x1a/0xa9

Just in case, another example:

[ 1088.476044] INFO: task jbd2/xvdb-8:494 blocked for more than 120 seconds.
[ 1088.476058]       Tainted: G        W       4.9-dae-xen #2
[ 1088.476061] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1088.476066] jbd2/xvdb-8     D    0   494      2 0x00000000
[ 1088.476072]  ffff8800fd036480 0000000000000000 ffff8803af8d7880 ffff8803a8c6e580
[ 1088.476079]  ffff88038756d280 ffffc9001737fb90 ffffffff8150ff1f 0000100000000001
[ 1088.476085]  ffff88038756d280 7fffffffffffffff ffffffff81510710 ffffc9001737fc98
[ 1088.476091] Call Trace:
[ 1088.476102]  [<ffffffff8150ff1f>] ? __schedule+0x17f/0x530
[ 1088.476107]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
[ 1088.476114]  [<ffffffff815102fd>] ? schedule+0x2d/0x80
[ 1088.476117]  [<ffffffff815132be>] ? schedule_timeout+0x17e/0x2a0
[ 1088.476123]  [<ffffffff8101bb71>] ? xen_clocksource_get_cycles+0x11/0x20
[ 1088.476126]  [<ffffffff8101bb71>] ? xen_clocksource_get_cycles+0x11/0x20
[ 1088.476132]  [<ffffffff810f2196>] ? ktime_get+0x36/0xa0
[ 1088.476136]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
[ 1088.476139]  [<ffffffff8150fd38>] ? io_schedule_timeout+0x98/0x100
[ 1088.476143]  [<ffffffff81513de1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[ 1088.476147]  [<ffffffff81510722>] ? bit_wait_io+0x12/0x60
[ 1088.476151]  [<ffffffff815107be>] ? __wait_on_bit+0x4e/0x80
[ 1088.476155]  [<ffffffff81510710>] ? bit_wait+0x50/0x50
[ 1088.476159]  [<ffffffff81510859>] ? out_of_line_wait_on_bit+0x69/0x80
[ 1088.476163]  [<ffffffff810d4ab0>] ? autoremove_wake_function+0x30/0x30
[ 1088.476170]  [<ffffffff812528ee>] ? jbd2_journal_commit_transaction+0xe7e/0x1610
[ 1088.476177]  [<ffffffff810eb7f6>] ? lock_timer_base+0x76/0x90
[ 1088.476182]  [<ffffffff81255b0d>] ? kjournald2+0xad/0x230
[ 1088.476189]  [<ffffffff810d4a80>] ? wake_atomic_t_function+0x50/0x50
[ 1088.476193]  [<ffffffff81255a60>] ? commit_timeout+0x10/0x10
[ 1088.476197]  [<ffffffff810a2ba5>] ? do_group_exit+0x35/0xa0
[ 1088.476201]  [<ffffffff810ba352>] ? kthread+0xc2/0xe0
[ 1088.476205]  [<ffffffff810ba290>] ? kthread_create_on_node+0x40/0x40
[ 1088.476209]  [<ffffffff81514405>] ? ret_from_fork+0x25/0x30

And also from the Dom0 (transcribed from a screenshot):

watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [kworker/11:0:26273]
Modules linked in: ...
CPU: 11 PID: 26273 Comm: kworker/11:0 Tainted: G D W L 4.13-dae-dom0 #2
Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016
Workqueue: events wait_rcu_exp_gp
task: ... task.stack: ...
RIP: e030:smp_call_function_single+0x6b/0xc0
...
Call Trace:
 ? sync_rcu_exp_select_cpus+0x2b5/0x410
 ? rcu_barrier_func+0x40/0x40
 ? wait_rcu_exp_gp+0x16/0x30
 ? process_one_work+0x1ad/0x340
 ? worker_thread+0x45/0x3f0
 ? kthread+0xf2/0x130
 ? process_one_work+0x340/0x340
 ? kthread_create_on_node+0x40/0x40
 ? do_group_exit+0x35/0xa0
 ? ret_from_fork+0x25/0x30
...
* Re : task btrfs-transacti:651 blocked for more than 120 seconds
  [not found] <1506593789.26660.28.camel@daevel.fr>
  2017-09-28 11:18 ` task btrfs-transacti:651 blocked for more than 120 seconds Nikolay Borisov
  [not found] ` <ed2732d4-966a-3f17-bb5e-27f7615668ea@gmail.com>
@ 2017-09-28 15:04 ` Olivier Bonvalet
  2017-09-28 16:12 ` Roger Pau Monné
  2 siblings, 1 reply; 9+ messages in thread
From: Olivier Bonvalet @ 2017-09-28 15:04 UTC (permalink / raw)
  To: xen-devel

On Thursday, 28 September 2017 at 12:16 +0200, Olivier Bonvalet wrote:
> It's a Debian Stretch system, running a 4.9.52 Linux kernel (on a Xen
> 4.8.2 hypervisor).
> With an old 4.1.x Linux kernel, I haven't any problem.

One detail: this VM has 26 block devices attached. I don't know whether
that could be related.
* Re: Re : task btrfs-transacti:651 blocked for more than 120 seconds
  2017-09-28 15:04 ` Olivier Bonvalet
@ 2017-09-28 16:12 ` Roger Pau Monné
  2017-09-28 17:27   ` Re : " Olivier Bonvalet
  0 siblings, 1 reply; 9+ messages in thread
From: Roger Pau Monné @ 2017-09-28 16:12 UTC (permalink / raw)
  To: Olivier Bonvalet; +Cc: xen-devel

On Thu, Sep 28, 2017 at 03:04:02PM +0000, Olivier Bonvalet wrote:
> On Thursday, 28 September 2017 at 12:16 +0200, Olivier Bonvalet wrote:
> > It's a Debian Stretch system, running a 4.9.52 Linux kernel (on a Xen
> > 4.8.2 hypervisor).
> > With an old 4.1.x Linux kernel, I haven't any problem.
>
> One detail: this VM has 26 block devices attached. I don't know
> whether that could be related.

Quite likely. With so many PV block devices attached you either have to
limit the number of queues and persistent grants per device, or expand
the number of grants allowed by Xen.

Can you try setting the following on the Xen command line [0] and see
whether that solves your issue:

gnttab_max_frames=64

Roger.

[0] http://xenbits.xenproject.org/docs/unstable/misc/xen-command-line.html
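[Editor's note: for readers wondering where that option goes — on a grub2-based Dom0 the hypervisor command line usually comes from /etc/default/grub. A minimal sketch, assuming a Debian-style grub2/Xen setup; the variable name and the update-grub step are assumptions, not taken from this thread:]

```shell
# /etc/default/grub — sketch: pass gnttab_max_frames to the hypervisor.
# GRUB_CMDLINE_XEN_DEFAULT feeds options to Xen itself (not to the
# Dom0 Linux kernel) on Debian-style grub2 setups.
GRUB_CMDLINE_XEN_DEFAULT="gnttab_max_frames=64"

# Afterwards, regenerate the grub config and reboot the host:
#   update-grub && reboot
```

The exact mechanism varies by distribution; some Xen packages read a separate drop-in such as /etc/default/grub.d/xen.cfg instead, so check your distribution's documentation.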
* Re : Re : task btrfs-transacti:651 blocked for more than 120 seconds
  2017-09-28 16:12 ` Roger Pau Monné
@ 2017-09-28 17:27 ` Olivier Bonvalet
  2017-09-29  9:20   ` Roger Pau Monné
  0 siblings, 1 reply; 9+ messages in thread
From: Olivier Bonvalet @ 2017-09-28 17:27 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel

On Thursday, 28 September 2017 at 17:12 +0100, Roger Pau Monné wrote:
> Quite likely. With so many PV block devices attached you either have
> to limit the number of queues and persistent grants per device, or
> expand the number of grants allowed by Xen.
>
> Can you try setting the following on the Xen command line [0] and see
> whether that solves your issue:
>
> gnttab_max_frames=64
>
> Roger.
>
> [0] http://xenbits.xenproject.org/docs/unstable/misc/xen-command-line.html

Oh, from Novell's documentation [0] I read:

« General recommendation for determining the proper value for
"gnttab_max_frames" is to multiply by 2 the number of attached disks. »

Since I have about 250 RBD devices, I suppose I should try directly with
gnttab_max_frames=512, right?

Thanks,
Olivier

[0] https://www.novell.com/support/kb/doc.php?id=7018590
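[Editor's note: the rule of thumb quoted above is simple enough to sketch as a shell helper. The doubling is Novell's recommendation; rounding up to the next power of two is my own assumption, added only because the values usually quoted (64, 256, 512) are powers of two:]

```shell
#!/bin/sh
# gnttab_max_frames ≈ 2 × number of attached PV disks (per the quoted
# recommendation), rounded up here to the next power of two.
suggest_frames() {
    frames=$(( $1 * 2 ))
    p=1
    while [ "$p" -lt "$frames" ]; do p=$(( p * 2 )); done
    echo "$p"
}

suggest_frames 26    # the guest discussed in this thread -> 64
suggest_frames 250   # ~250 RBD devices -> 512
```

For the 26-disk guest this lands on 64, matching Roger's earlier suggestion; for 250 devices it lands on 512, matching Olivier's reading of the Novell article.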
* Re: Re : Re : task btrfs-transacti:651 blocked for more than 120 seconds
  2017-09-28 17:27 ` Re : " Olivier Bonvalet
@ 2017-09-29  9:20 ` Roger Pau Monné
  2017-10-02 16:32   ` Re : " Olivier Bonvalet
  0 siblings, 1 reply; 9+ messages in thread
From: Roger Pau Monné @ 2017-09-29  9:20 UTC (permalink / raw)
  To: Olivier Bonvalet; +Cc: xen-devel

On Thu, Sep 28, 2017 at 05:27:54PM +0000, Olivier Bonvalet wrote:
> Oh, from Novell's documentation [0] I read:
>
> « General recommendation for determining the proper value for
> "gnttab_max_frames" is to multiply by 2 the number of attached disks. »
>
> Since I have about 250 RBD devices, I suppose I should try directly
> with gnttab_max_frames=512, right?

Do you have 250 devices attached to the same guest? If so I guess 512
might be sensible, although you said earlier that you had 26 devices
attached, not 250.

Roger.
* Re : Re : Re : task btrfs-transacti:651 blocked for more than 120 seconds
  2017-09-29  9:20 ` Roger Pau Monné
@ 2017-10-02 16:32 ` Olivier Bonvalet
  2017-10-03  9:10   ` Roger Pau Monné
  0 siblings, 1 reply; 9+ messages in thread
From: Olivier Bonvalet @ 2017-10-02 16:32 UTC (permalink / raw)
  To: Roger Pau Monné, Olivier Bonvalet; +Cc: xen-devel

On Friday, 29 September 2017 at 10:20 +0100, Roger Pau Monné wrote:
> Do you have 250 devices attached to the same guest? If so I guess 512
> might be sensible, although you said earlier that you had 26 devices
> attached, not 250.
>
> Roger.

Hi,

no VM has more than 26 devices, except for the Dom0, which has about
300 devices to handle. Is Dom0 affected by this gnttab_max_frames
setting?

Anyway, after booting each hypervisor with gnttab_max_frames=256 (or
greater), it seems I no longer hit this timeout.

Thanks!
Olivier
* Re: Re : Re : Re : task btrfs-transacti:651 blocked for more than 120 seconds
  2017-10-02 16:32 ` Re : " Olivier Bonvalet
@ 2017-10-03  9:10 ` Roger Pau Monné
  0 siblings, 0 replies; 9+ messages in thread
From: Roger Pau Monné @ 2017-10-03  9:10 UTC (permalink / raw)
  To: Olivier Bonvalet; +Cc: Olivier Bonvalet, xen-devel

On Mon, Oct 02, 2017 at 04:32:39PM +0000, Olivier Bonvalet wrote:
> Hi,
>
> no VM has more than 26 devices, except for the Dom0, which has about
> 300 devices to handle. Is Dom0 affected by this gnttab_max_frames
> setting?
No, this limit is only meaningful for DomUs, not for Dom0 (the disks
attached to Dom0 are not PV).

> Anyway, after booting each hypervisor with gnttab_max_frames=256 (or
> greater), it seems I no longer hit this timeout.

I think 256 is quite high; 64 should probably be enough. In any case,
256 frames is 1 MB of memory used by the grant table, which I guess is
not that bad.

Roger.
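[Editor's note: Roger's 1 MB figure follows from the frame size — each grant-table frame is one 4 KiB page. A sketch of the arithmetic; the 8-byte entry size assumes grant table v1 (v2 entries are larger), which is my assumption, not stated in the thread:]

```shell
#!/bin/sh
# Memory cost and capacity of gnttab_max_frames=256 (grant table v1 assumed).
frames=256
page=4096        # a grant-table frame is one 4 KiB page
entry=8          # bytes per v1 grant entry (assumption)

echo "table size:       $(( frames * page )) bytes"   # 1048576 = 1 MiB
echo "grants per frame: $(( page / entry ))"          # 512
echo "total grants:     $(( frames * page / entry ))" # 131072
```

So 256 frames buys on the order of 131,072 simultaneously grantable pages under v1, which puts the "multiply by 2 the number of disks" rule of thumb in perspective.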