All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nikolay Borisov <n.borisov@siteground.com>
To: linux-bcache@vger.kernel.org
Subject: bcache causes RCU stalls/bcache_gc hogs CPU
Date: Tue, 14 Apr 2015 17:52:40 +0300	[thread overview]
Message-ID: <552D29B8.9040704@siteground.com> (raw)

Hello list,


I'm currently testing bcache with the following setup:

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT

sda                   8:0    0   1.8T  0 disk
├─sda1                8:1    0     2M  0 part
├─sda2                8:2    0   191M  0 part
└─sda3                8:3    0   1.8T  0 part
   └─main-os (dm-0)  254:0    0   1.8T  0 lvm  /
sdb                   8:16   0 223.1G  0 disk
├─sdb1                8:17   0 188.2M  0 part /boot
└─sdb2                8:18   0 222.9G  0 part
   ├─main-ssd (dm-1) 254:1    0    40G  0 lvm
   │ └─bcache0       253:0    0   182G  0 disk /sdb
   └─main-db (dm-2)  254:2    0   182G  0 lvm
     └─bcache0       253:0    0   182G  0 disk /sdb

So a 40gig ssd (main-ssd, lvm2 volume) backed by a 180gig hdd (main-db, lvm 2 volume), using the writeback cache policy. Every other setting is at its default. I'm running the 4.0-rc6 (!CONFIG_PREEMPT). After running fio (using a 30gb file) with a mix of sequential and random i/o and I'm getting the following RCU warn:

======================================================
INFO: rcu_sched self-detected stall on CPU
         4: (2099 ticks this GP) idle=fcf/140000000000001/0 softirq=1031582/1031582 fqs=2100
INFO: rcu_sched detected stalls on CPUs/tasks:
         4: (2099 ticks this GP) idle=fcf/140000000000001/0 softirq=1031582/1031582 fqs=2100
         (detected by 16, t=2104 jiffies, g=2176431, c=2176430, q=3098)
Task dump for CPU 4:
bcache_gc	R  running task    12728 18115      2 0x00000008
  ffff880079e85720 fffffffffffffffc ffff88046c180e20 fffffffffffffffc
  ffffffff81091693 fffffffffffffffc ffff88086aa3d000 ffff88046c180000
  ffff88086aa3d000 ffff88046c180000 ffff88086aa3d060 ffff88086aa3d000
Call Trace:
  [<ffffffff81091693>] ? __wake_up+0x53/0x70
  [<ffffffffa01103d4>] ? bch_btree_gc+0x2f4/0x560 [bcache]
  [<ffffffff8100180b>] ? __switch_to+0xbb/0x5f0
  [<ffffffff810911f0>] ? woken_wake_function+0x20/0x20
  [<ffffffffa0110678>] ? bch_gc_thread+0x38/0x120 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffff81070a9e>] ? kthread+0xce/0xf0
  [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70
  [<ffffffff815b8818>] ? ret_from_fork+0x58/0x90
  [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70
          (t=2228 jiffies g=2176431 c=2176430 q=3161)
Task dump for CPU 4:
bcache_gc	R  running task    12728 18115      2 0x00000008
  0000000000000005 ffff88046fc83ca8 ffffffff8107720b 0000000000000004
  ffffffff8183d040 ffff88046fc83cc8 ffffffff810772af ffff88046fc83cc8
  ffffffff8183d100 ffff88046fc83cf8 ffffffff810a5101 ffff88046fc94500
Call Trace:
  <IRQ>  [<ffffffff8107720b>] sched_show_task+0xcb/0x130
  [<ffffffff810772af>] dump_cpu_task+0x3f/0x50
  [<ffffffff810a5101>] rcu_dump_cpu_stacks+0x91/0xd0
  [<ffffffff810a68cf>] rcu_check_callbacks+0x65f/0xc30
  [<ffffffff81080ecc>] ? account_process_tick+0x6c/0x170
  [<ffffffff810acf29>] update_process_times+0x39/0x70
  [<ffffffff810beba0>] tick_sched_handle+0x40/0x50
  [<ffffffff810bedb2>] tick_sched_timer+0x52/0xa0
  [<ffffffff810afa16>] __run_hrtimer+0x86/0x1d0
  [<ffffffff810bed60>] ? tick_nohz_handler+0xc0/0xc0
  [<ffffffff810afd92>] hrtimer_interrupt+0x102/0x240
  [<ffffffffa0109920>] ? bch_ptr_invalid+0x10/0x10 [bcache]
  [<ffffffff81032e79>] local_apic_timer_interrupt+0x39/0x60
  [<ffffffff815bb355>] smp_apic_timer_interrupt+0x45/0x59
  [<ffffffffa0109920>] ? bch_ptr_invalid+0x10/0x10 [bcache]
  [<ffffffff815b972d>] apic_timer_interrupt+0x6d/0x80
  <EOI>  [<ffffffffa01117c5>] ? __bch_extent_invalid+0xa5/0xd0 [bcache]
  [<ffffffffa0111721>] ? __bch_extent_invalid+0x1/0xd0 [bcache]
  [<ffffffffa0111802>] ? bch_extent_invalid+0x12/0x20 [bcache]
  [<ffffffffa011183d>] bch_extent_bad+0x2d/0x1c0 [bcache]
  [<ffffffffa010992a>] bch_ptr_bad+0xa/0x10 [bcache]
  [<ffffffffa01098f9>] bch_btree_iter_next_filter+0x39/0x50 [bcache]
  [<ffffffffa0109c80>] btree_gc_count_keys+0x50/0x70 [bcache]
  [<ffffffffa010ffbf>] btree_gc_recurse+0x1bf/0x2e0 [bcache]
  [<ffffffffa010c4ac>] ? btree_gc_mark_node+0xdc/0x210 [bcache]
  [<ffffffff81091693>] ? __wake_up+0x53/0x70
  [<ffffffffa01103d4>] bch_btree_gc+0x2f4/0x560 [bcache]
  [<ffffffff8100180b>] ? __switch_to+0xbb/0x5f0
  [<ffffffff810911f0>] ? woken_wake_function+0x20/0x20
  [<ffffffffa0110678>] bch_gc_thread+0x38/0x120 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffff81070a9e>] kthread+0xce/0xf0
  [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70
  [<ffffffff815b8818>] ret_from_fork+0x58/0x90

Naturally, checking /sys/fs/bcache/b9bcddd1-7a9a-4f2f-88e6-cb5bef6abcf2/internal/btree_gc_max_duration_ms shows: 31593  Clearly at some point the GC overhead becomes so large that it causes RCU grace period stalls. I'm puzzled since bch_btree_gc_finish(...) is not listed and this is the only function that pertains to bcache gc AND executes code in RCU critical read section.

In addition to that I also observed that the after this RCU stall warn occurs the bcache_gc thread hogs the machine at 100% rendering it unusable. I managed to get 2 call stack dumps via magic sysrq as follows:

=============
NMI backtrace for cpu 4
CPU: 4 PID: 18115 Comm: bcache_gc Not tainted 4.0.0-rc6bcache1-nikbor #4
Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.0b 12/05/2013
task: ffff88086ab093e0 ti: ffff880868024000 task.ti: ffff880868024000
RIP: 0010:[<ffffffffa01098cb>]  [<ffffffffa01098cb>] bch_btree_iter_next_filter+0xb/0x50 [bcache]
RSP: 0018:ffff880868027bd8  EFLAGS: 00000202
RAX: 0000000000000001 RBX: 0000000000002034 RCX: 000000000000000a
RDX: ffffffffa0109920 RSI: ffff88086aa3dcd0 RDI: ffff880868027c08
RBP: ffff880868027bf8 R08: 0000000000000001 R09: 0000000000000001
R10: 000007ffffffffff R11: 0000000000000008 R12: ffff88086aa3dcd0
R13: ffff880868027c08 R14: ffff880868027cf8 R15: ffff880868027dd8
FS:  0000000000000000(0000) GS:ffff88046fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4f6b410008 CR3: 000000000180e000 CR4: 00000000001406e0
Stack:
  0000000000002034 ffff88086aa3dcd0 ffff880868027c08 ffff880868027cf8
  ffff880868027c78 ffffffffa0109c80 0000000000000004 0000000000000001
  ffff8807740101c0 ffff88077402a9f8 ffff88086aa3dc00 ffff88086aa3d000
Call Trace:
  [<ffffffffa0109c80>] btree_gc_count_keys+0x50/0x70 [bcache]
  [<ffffffffa010ffbf>] btree_gc_recurse+0x1bf/0x2e0 [bcache]
  [<ffffffffa010c4ac>] ? btree_gc_mark_node+0xdc/0x210 [bcache]
  [<ffffffff81091693>] ? __wake_up+0x53/0x70
  [<ffffffffa01103d4>] bch_btree_gc+0x2f4/0x560 [bcache]
  [<ffffffff8100180b>] ? __switch_to+0xbb/0x5f0
  [<ffffffff810911f0>] ? woken_wake_function+0x20/0x20
  [<ffffffffa0110678>] bch_gc_thread+0x38/0x120 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffff81070a9e>] kthread+0xce/0xf0
  [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70
  [<ffffffff815b8818>] ret_from_fork+0x58/0x90
  [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70
Code: ff 48 89 d7 4c 29 cf eb a7 48 29 f2 48 89 d6 e9 18 ff ff ff 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 56 41 55 41 54 53 <0f> 1f 44 00 00 48 89 fb 49 89 f4 49 89 d6 0f 1f 80 00 00 00 00


===========================================

NMI backtrace for cpu 4
CPU: 4 PID: 18115 Comm: bcache_gc Not tainted 4.0.0-rc6bcache1-nikbor #4
Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.0b 12/05/2013
task: ffff88086ab093e0 ti: ffff880868024000 task.ti: ffff880868024000
RIP: 0010:[<ffffffffa0111916>]  [<ffffffffa0111916>] bch_extent_bad+0x106/0x1c0 [bcache]
RSP: 0018:ffff880868027ba8  EFLAGS: 00000202
RAX: 000000000000bd2a RBX: ffff88077400a550 RCX: 000000000000000a
RDX: 0000000000000004 RSI: 00000000fc390004 RDI: ffff88046c180000
RBP: ffff880868027bb8 R08: 0000000000000001 R09: 0000000000000000
R10: 000007ffffffffff R11: 0000000000000008 R12: ffff88086aa3dcd0
R13: ffff88077400a550 R14: ffffffffa0109920 R15: ffff880868027dd8
FS:  0000000000000000(0000) GS:ffff88046fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4f6b410008 CR3: 000000000180e000 CR4: 00000000001406e0
Stack:
  ffff880868027c08 ffff88086aa3dcd0 ffff880868027bc8 ffffffffa010992a
  ffff880868027bf8 ffffffffa01098f9 00000000000014a6 ffff88086aa3dcd0
  ffff880868027c08 ffff880868027cf8 ffff880868027c78 ffffffffa0109c80
Call Trace:
  [<ffffffffa010992a>] bch_ptr_bad+0xa/0x10 [bcache]
  [<ffffffffa01098f9>] bch_btree_iter_next_filter+0x39/0x50 [bcache]
  [<ffffffffa0109c80>] btree_gc_count_keys+0x50/0x70 [bcache]
  [<ffffffffa010ffbf>] btree_gc_recurse+0x1bf/0x2e0 [bcache]
  [<ffffffffa010c4ac>] ? btree_gc_mark_node+0xdc/0x210 [bcache]
  [<ffffffff81091693>] ? __wake_up+0x53/0x70
  [<ffffffffa01103d4>] bch_btree_gc+0x2f4/0x560 [bcache]
  [<ffffffff8100180b>] ? __switch_to+0xbb/0x5f0
  [<ffffffff810911f0>] ? woken_wake_function+0x20/0x20
  [<ffffffffa0110678>] bch_gc_thread+0x38/0x120 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffff81070a9e>] kthread+0xce/0xf0
  [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70
  [<ffffffff815b8818>] ret_from_fork+0x58/0x90
  [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70
Code: 33 25 ff 0f 00 00 48 8b 94 c7 40 0c 00 00 48 89 f0 48 8b 92 d8 0a 00 00 48 c1 e8 08 4c 21 d0 48 d3 e8 48 8d 04 40 0f b6 54 82 06 <40> 28 f2 80 fa 80 0f 87 7e 00 00 00 0f b6 d2 83 fa 60 76 66 31


In the mean time I'm running the stable 4.0.0 where I observe better results ( no bcache_gc thread hog but still the occasional stall warn)

Regards,
Nikolay

             reply	other threads:[~2015-04-14 15:33 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-14 14:52 Nikolay Borisov [this message]
2015-04-14 20:03 ` bcache causes RCU stalls/bcache_gc hogs CPU Eric Wheeler
2015-04-15  8:41   ` Nikolay Borisov
2015-04-15 19:49     ` Eric Wheeler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=552D29B8.9040704@siteground.com \
    --to=n.borisov@siteground.com \
    --cc=linux-bcache@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.