From mboxrd@z Thu Jan 1 00:00:00 1970 From: Nikolay Borisov Subject: Re: bcache causes RCU stalls/bcache_gc hogs CPU Date: Wed, 15 Apr 2015 11:41:31 +0300 Message-ID: <552E243B.7050806@siteground.com> References: <552D29B8.9040704@siteground.com> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Return-path: Received: from mail.siteground.com ([67.19.240.234]:38287 "EHLO mail.siteground.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752958AbbDOIle (ORCPT ); Wed, 15 Apr 2015 04:41:34 -0400 In-Reply-To: Sender: linux-bcache-owner@vger.kernel.org List-Id: linux-bcache@vger.kernel.org To: Eric Wheeler Cc: linux-bcache@vger.kernel.org Thanks for the patches, I've applied them to 4.0 and am in the process of testing that. Do you happen to know in which (if any) repo do those patches live and if there is a way to "reliably" (e.g. a repo where they are being applied) track them or have you just collected them from misc postings to the mailing list? Regards, Nikolay On 04/14/2015 11:03 PM, Eric Wheeler wrote: > Apply all of the attached patches to your kernel and try again. > > I wish somebody would apply these upstream and get it into the official > kernel. I have been carrying all of these patches with me for some time > and they definitely make bcache more stable. > > -Eric > > > -- > Eric Wheeler, President eWheeler, Inc. dba Global Linux Security > 888-LINUX26 (888-546-8926) Fax: 503-716-3878 PO Box 25107 > www.GlobalLinuxSecurity.pro Linux since 1996! Portland, OR 97298 > > On Tue, 14 Apr 2015, Nikolay Borisov wrote: > >> Hello list, >> >> >> I'm currently testing bcache with the following setup: >> >> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT >> >> sda 8:0 0 1.8T 0 disk >> ??sda1 8:1 0 2M 0 part >> ??sda2 8:2 0 191M 0 part >> ??sda3 8:3 0 1.8T 0 part >> ??main-os (dm-0) 254:0 0 1.8T 0 lvm / >> sdb 8:16 0 223.1G 0 disk >> ??sdb1 8:17 0 188.2M 0 part /boot >> ??sdb2 8:18 0 222.9G 0 part >> ??main-ssd (dm-1) 254:1 0 40G 0 lvm >> ? ??bcache0 253:0 0 182G 0 disk /sdb >> ??main-db (dm-2) 254:2 0 182G 0 lvm >> ??bcache0 253:0 0 182G 0 disk /sdb >> >> So a 40gig ssd (main-ssd, lvm2 volume) backed by a 180gig hdd (main-db, lvm 2 >> volume), using the writeback cache policy. Every other setting is at its >> default. I'm running the 4.0-rc6 (!CONFIG_PREEMPT). After running fio (using a >> 30gb file) with a mix of sequential and random i/o and I'm getting the >> following RCU warn: >> >> ====================================================== >> INFO: rcu_sched self-detected stall on CPU >> 4: (2099 ticks this GP) idle=fcf/140000000000001/0 >> softirq=1031582/1031582 fqs=2100 >> INFO: rcu_sched detected stalls on CPUs/tasks: >> 4: (2099 ticks this GP) idle=fcf/140000000000001/0 >> softirq=1031582/1031582 fqs=2100 >> (detected by 16, t=2104 jiffies, g=2176431, c=2176430, q=3098) >> Task dump for CPU 4: >> bcache_gc R running task 12728 18115 2 0x00000008 >> ffff880079e85720 fffffffffffffffc ffff88046c180e20 fffffffffffffffc >> ffffffff81091693 fffffffffffffffc ffff88086aa3d000 ffff88046c180000 >> ffff88086aa3d000 ffff88046c180000 ffff88086aa3d060 ffff88086aa3d000 >> Call Trace: >> [] ? __wake_up+0x53/0x70 >> [] ? bch_btree_gc+0x2f4/0x560 [bcache] >> [] ? __switch_to+0xbb/0x5f0 >> [] ? woken_wake_function+0x20/0x20 >> [] ? bch_gc_thread+0x38/0x120 [bcache] >> [] ? bch_btree_gc+0x560/0x560 [bcache] >> [] ? bch_btree_gc+0x560/0x560 [bcache] >> [] ? kthread+0xce/0xf0 >> [] ? kthread_freezable_should_stop+0x70/0x70 >> [] ? ret_from_fork+0x58/0x90 >> [] ? kthread_freezable_should_stop+0x70/0x70 >> (t=2228 jiffies g=2176431 c=2176430 q=3161) >> Task dump for CPU 4: >> bcache_gc R running task 12728 18115 2 0x00000008 >> 0000000000000005 ffff88046fc83ca8 ffffffff8107720b 0000000000000004 >> ffffffff8183d040 ffff88046fc83cc8 ffffffff810772af ffff88046fc83cc8 >> ffffffff8183d100 ffff88046fc83cf8 ffffffff810a5101 ffff88046fc94500 >> Call Trace: >> [] sched_show_task+0xcb/0x130 >> [] dump_cpu_task+0x3f/0x50 >> [] rcu_dump_cpu_stacks+0x91/0xd0 >> [] rcu_check_callbacks+0x65f/0xc30 >> [] ? account_process_tick+0x6c/0x170 >> [] update_process_times+0x39/0x70 >> [] tick_sched_handle+0x40/0x50 >> [] tick_sched_timer+0x52/0xa0 >> [] __run_hrtimer+0x86/0x1d0 >> [] ? tick_nohz_handler+0xc0/0xc0 >> [] hrtimer_interrupt+0x102/0x240 >> [] ? bch_ptr_invalid+0x10/0x10 [bcache] >> [] local_apic_timer_interrupt+0x39/0x60 >> [] smp_apic_timer_interrupt+0x45/0x59 >> [] ? bch_ptr_invalid+0x10/0x10 [bcache] >> [] apic_timer_interrupt+0x6d/0x80 >> [] ? __bch_extent_invalid+0xa5/0xd0 [bcache] >> [] ? __bch_extent_invalid+0x1/0xd0 [bcache] >> [] ? bch_extent_invalid+0x12/0x20 [bcache] >> [] bch_extent_bad+0x2d/0x1c0 [bcache] >> [] bch_ptr_bad+0xa/0x10 [bcache] >> [] bch_btree_iter_next_filter+0x39/0x50 [bcache] >> [] btree_gc_count_keys+0x50/0x70 [bcache] >> [] btree_gc_recurse+0x1bf/0x2e0 [bcache] >> [] ? btree_gc_mark_node+0xdc/0x210 [bcache] >> [] ? __wake_up+0x53/0x70 >> [] bch_btree_gc+0x2f4/0x560 [bcache] >> [] ? __switch_to+0xbb/0x5f0 >> [] ? woken_wake_function+0x20/0x20 >> [] bch_gc_thread+0x38/0x120 [bcache] >> [] ? bch_btree_gc+0x560/0x560 [bcache] >> [] ? bch_btree_gc+0x560/0x560 [bcache] >> [] kthread+0xce/0xf0 >> [] ? kthread_freezable_should_stop+0x70/0x70 >> [] ret_from_fork+0x58/0x90 >> >> Naturally, checking >> /sys/fs/bcache/b9bcddd1-7a9a-4f2f-88e6-cb5bef6abcf2/internal/btree_gc_max_duration_ms >> shows: 31593 Clearly at some point the GC overhead becomes so large that it >> causes RCU grace period stalls. I'm puzzled since bch_btree_gc_finish(...) is >> not listed and this is the only function that pertains to bcache gc AND >> executes code in RCU critical read section. >> >> In addition to that I also observed that the after this RCU stall warn occurs >> the bcache_gc thread hogs the machine at 100% rendering it unusable. I managed >> to get 2 call stack dumps via magic sysrq as follows: >> >> ============= >> NMI backtrace for cpu 4 >> CPU: 4 PID: 18115 Comm: bcache_gc Not tainted 4.0.0-rc6bcache1-nikbor #4 >> Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.0b 12/05/2013 >> task: ffff88086ab093e0 ti: ffff880868024000 task.ti: ffff880868024000 >> RIP: 0010:[] [] >> bch_btree_iter_next_filter+0xb/0x50 [bcache] >> RSP: 0018:ffff880868027bd8 EFLAGS: 00000202 >> RAX: 0000000000000001 RBX: 0000000000002034 RCX: 000000000000000a >> RDX: ffffffffa0109920 RSI: ffff88086aa3dcd0 RDI: ffff880868027c08 >> RBP: ffff880868027bf8 R08: 0000000000000001 R09: 0000000000000001 >> R10: 000007ffffffffff R11: 0000000000000008 R12: ffff88086aa3dcd0 >> R13: ffff880868027c08 R14: ffff880868027cf8 R15: ffff880868027dd8 >> FS: 0000000000000000(0000) GS:ffff88046fc80000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 00007f4f6b410008 CR3: 000000000180e000 CR4: 00000000001406e0 >> Stack: >> 0000000000002034 ffff88086aa3dcd0 ffff880868027c08 ffff880868027cf8 >> ffff880868027c78 ffffffffa0109c80 0000000000000004 0000000000000001 >> ffff8807740101c0 ffff88077402a9f8 ffff88086aa3dc00 ffff88086aa3d000 >> Call Trace: >> [] btree_gc_count_keys+0x50/0x70 [bcache] >> [] btree_gc_recurse+0x1bf/0x2e0 [bcache] >> [] ? btree_gc_mark_node+0xdc/0x210 [bcache] >> [] ? __wake_up+0x53/0x70 >> [] bch_btree_gc+0x2f4/0x560 [bcache] >> [] ? __switch_to+0xbb/0x5f0 >> [] ? woken_wake_function+0x20/0x20 >> [] bch_gc_thread+0x38/0x120 [bcache] >> [] ? bch_btree_gc+0x560/0x560 [bcache] >> [] ? bch_btree_gc+0x560/0x560 [bcache] >> [] kthread+0xce/0xf0 >> [] ? kthread_freezable_should_stop+0x70/0x70 >> [] ret_from_fork+0x58/0x90 >> [] ? kthread_freezable_should_stop+0x70/0x70 >> Code: ff 48 89 d7 4c 29 cf eb a7 48 29 f2 48 89 d6 e9 18 ff ff ff 66 66 66 2e >> 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 56 41 55 41 54 53 <0f> 1f 44 00 00 48 >> 89 fb 49 89 f4 49 89 d6 0f 1f 80 00 00 00 00 >> >> >> =========================================== >> >> NMI backtrace for cpu 4 >> CPU: 4 PID: 18115 Comm: bcache_gc Not tainted 4.0.0-rc6bcache1-nikbor #4 >> Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.0b 12/05/2013 >> task: ffff88086ab093e0 ti: ffff880868024000 task.ti: ffff880868024000 >> RIP: 0010:[] [] >> bch_extent_bad+0x106/0x1c0 [bcache] >> RSP: 0018:ffff880868027ba8 EFLAGS: 00000202 >> RAX: 000000000000bd2a RBX: ffff88077400a550 RCX: 000000000000000a >> RDX: 0000000000000004 RSI: 00000000fc390004 RDI: ffff88046c180000 >> RBP: ffff880868027bb8 R08: 0000000000000001 R09: 0000000000000000 >> R10: 000007ffffffffff R11: 0000000000000008 R12: ffff88086aa3dcd0 >> R13: ffff88077400a550 R14: ffffffffa0109920 R15: ffff880868027dd8 >> FS: 0000000000000000(0000) GS:ffff88046fc80000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 00007f4f6b410008 CR3: 000000000180e000 CR4: 00000000001406e0 >> Stack: >> ffff880868027c08 ffff88086aa3dcd0 ffff880868027bc8 ffffffffa010992a >> ffff880868027bf8 ffffffffa01098f9 00000000000014a6 ffff88086aa3dcd0 >> ffff880868027c08 ffff880868027cf8 ffff880868027c78 ffffffffa0109c80 >> Call Trace: >> [] bch_ptr_bad+0xa/0x10 [bcache] >> [] bch_btree_iter_next_filter+0x39/0x50 [bcache] >> [] btree_gc_count_keys+0x50/0x70 [bcache] >> [] btree_gc_recurse+0x1bf/0x2e0 [bcache] >> [] ? btree_gc_mark_node+0xdc/0x210 [bcache] >> [] ? __wake_up+0x53/0x70 >> [] bch_btree_gc+0x2f4/0x560 [bcache] >> [] ? __switch_to+0xbb/0x5f0 >> [] ? woken_wake_function+0x20/0x20 >> [] bch_gc_thread+0x38/0x120 [bcache] >> [] ? bch_btree_gc+0x560/0x560 [bcache] >> [] ? bch_btree_gc+0x560/0x560 [bcache] >> [] kthread+0xce/0xf0 >> [] ? kthread_freezable_should_stop+0x70/0x70 >> [] ret_from_fork+0x58/0x90 >> [] ? kthread_freezable_should_stop+0x70/0x70 >> Code: 33 25 ff 0f 00 00 48 8b 94 c7 40 0c 00 00 48 89 f0 48 8b 92 d8 0a 00 00 >> 48 c1 e8 08 4c 21 d0 48 d3 e8 48 8d 04 40 0f b6 54 82 06 <40> 28 f2 80 fa 80 >> 0f 87 7e 00 00 00 0f b6 d2 83 fa 60 76 66 31 >> >> >> In the mean time I'm running the stable 4.0.0 where I observe better results ( >> no bcache_gc thread hog but still the occasional stall warn) >> >> Regards, >> Nikolay >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html