public inbox for linux-bcache@vger.kernel.org
 help / color / mirror / Atom feed
* bcache deadlock with /dev/ramX + partition
@ 2014-02-20 18:23 Vivek Dasmohapatra
  2014-02-21 17:08 ` Max Kutsevol
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Vivek Dasmohapatra @ 2014-02-20 18:23 UTC (permalink / raw)
  To: linux-bcache

Hi: We've been using bcache to fashion an "overflow" ramdisc: We use
/dev/ramX as the cache, and an actual block device on spinning rust
as the backing device. This works pretty well, but we've bumped into
what appears to be a locking problem in the kernel:

When the backing device is a top level block device ( eg /dev/sdb )
everything seems to work just fine. When the backing device is a 
partition ( eg /dev/sda11 ) processes seem to end up in the D state
when they fsync.

Under kernel 3.11.2 this can be triggered with a simple fsync. Under 
kernels 3.12.9, 3.13 and 3.13 running with the nosmp command line
argument it is slightly harder to provoke ( a simple fsync won't always
trigger it but our chroot install script triggers it every time ).

Further investigation hinted at this being a memory alignment problem: 
We haven't confirmed it yet, but certain offsets for partitions don't 
trigger the deadlock:

[ In all cases it's an msdos partition table, the device is bcache0 and is
   mounted under /srv which happens to be on sda10. The cache device is
   always /dev/ram0 ]:

Deadlocking configurations:

    backing: sda11;  default offset from sda (NOT a multiple of 4 KiB)
    backing: sdb1;   default offset from sdb (!= 4 KiB)
    backing: sda11;  offset from sda is:
                     - a multiple of 4  KiB
                     - a multiple of 4  MiB
                     - a multplie of 16 MiB

Non-deadlocking configurations:

    backing: sdb
    backing: sdb1;   offset from start of sdb is 4096 B

So, to sum up: a top level block device as a backing device never
seems to deadlock, some offsets for some partitions (ok, PAGE_SIZE
alignment for sdb1) _also_ do not deadlock. All other cases of
use of a partition have deadlocked so far.

I'm currently testiung under 3.12.9 and get the following trace from the
deadlocked process:

[82440.244111] INFO: task dpkg:14059 blocked for more than 120 seconds.
[82440.244141]       Not tainted 3.12-0.bpo.1-amd64 #1
[82440.244153] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[82440.244167] dpkg            D ffff88021fc14300     0 14059  14056 0x00000000
[82440.244173]  ffff8800c0edd800 0000000000000086 0000000000000088 ffffffff81813480
[82440.244177]  ffff8801d5d5bfd8 ffff8801d5d5bfd8 ffff8801d5d5bfd8 ffff8800c0edd800
[82440.244181]  0000000000000000 ffff88021fc14b40 ffff8800c0edd800 ffffffff8111f800
[82440.244185] Call Trace:
[82440.244195]  [<ffffffff8111f800>] ? __lock_page+0x70/0x70
[82440.244201]  [<ffffffff814c2797>] ? io_schedule+0x87/0xd0
[82440.244204]  [<ffffffff8111f809>] ? sleep_on_page+0x9/0x10
[82440.244208]  [<ffffffff814c00c2>] ? __wait_on_bit+0x52/0x80
[82440.244212]  [<ffffffff8111f943>] ? wait_on_page_bit+0x73/0x80
[82440.244217]  [<ffffffff81082d80>] ? wake_atomic_t_function+0x30/0x30
[82440.244220]  [<ffffffff8111fa46>] ? filemap_fdatawait_range+0xf6/0x170
[82440.244225]  [<ffffffff81121058>] ? filemap_write_and_wait_range+0x48/0x90
[82440.244230]  [<ffffffff811a948d>] ? generic_file_fsync+0x2d/0xa0
[82440.244247]  [<ffffffffa0159543>] ? ext4_sync_file+0x203/0x320 [ext4]
[82440.244251]  [<ffffffff811b3298>] ? do_fsync+0x58/0x90
[82440.244255]  [<ffffffff811b362b>] ? SyS_fsync+0xb/0x20
[82440.244259]  [<ffffffff814cb7b9>] ? system_call_fastpath+0x16/0x1b

Other traces from previous tests:

3.11.2:
[501840.292105] Call Trace:
[501840.292141]  [<ffffffff8110b270>] ? wait_on_page_read+0x60/0x60
[501840.292159]  [<ffffffff814787e4>] ? io_schedule+0x94/0x120
[501840.292167]  [<ffffffff8110b275>] ? sleep_on_page+0x5/0x10
[501840.292171]  [<ffffffff81476824>] ? __wait_on_bit+0x54/0x80
[501840.292178]  [<ffffffff8110b08f>] ? wait_on_page_bit+0x7f/0x90
[501840.292194]  [<ffffffff81078c40>] ? wake_atomic_t_function+0x30/0x30
[501840.292207]  [<ffffffff811175e8>] ? pagevec_lookup_tag+0x18/0x20
[501840.292211]  [<ffffffff8110b178>] ? filemap_fdatawait_range+0xd8/0x150
[501840.292217]  [<ffffffff8110c765>] ? filemap_write_and_wait_range+0x35/0x60
[501840.292229]  [<ffffffff8118cd8b>] ? generic_file_fsync+0x1b/0x90
[501840.292259]  [<ffffffffa01750ea>] ? ext4_sync_file+0x10a/0x2e0 [ext4]
[501840.292264]  [<ffffffff8119515c>] ? do_fsync+0x4c/0x80
[501840.292267]  [<ffffffff811953d7>] ? SyS_fsync+0x7/0x10
[501840.292275]  [<ffffffff81481de9>] ? system_call_fastpath+0x16/0x1b
[501960.292133] INFO: task dpkg:17778 blocked for more than 120 seconds.

3.13:
[ 1560.256491] Call Trace:
[ 1560.256502]  [<ffffffff81120410>] ? __lock_page+0x70/0x70
[ 1560.256509]  [<ffffffff814c96d8>] ? io_schedule+0x88/0xd0
[ 1560.256513]  [<ffffffff81120419>] ? sleep_on_page+0x9/0x10
[ 1560.256517]  [<ffffffff814c9c52>] ? __wait_on_bit+0x52/0x80
[ 1560.256521]  [<ffffffff81120adb>] ? find_get_pages_tag+0xcb/0x180
[ 1560.256526]  [<ffffffff81120533>] ? wait_on_page_bit+0x73/0x80
[ 1560.256531]  [<ffffffff8109c230>] ? wake_atomic_t_function+0x30/0x30
[ 1560.256535]  [<ffffffff81120610>] ? filemap_fdatawait_range+0xd0/0x150
[ 1560.256540]  [<ffffffff8112193c>] ? __filemap_fdatawrite_range+0x4c/0x60
[ 1560.256544]  [<ffffffff81121997>] ? filemap_write_and_wait_range+0x47/0x90
[ 1560.256549]  [<ffffffff811abf8d>] ? generic_file_fsync+0x2d/0xa0
[ 1560.256570]  [<ffffffffa018d3e3>] ? ext4_sync_file+0x153/0x300 [ext4]
[ 1560.256576]  [<ffffffff811b5be3>] ? do_fsync+0x53/0x90
[ 1560.256580]  [<ffffffff811b5e9b>] ? SyS_fsync+0xb/0x20
[ 1560.256586]  [<ffffffff814d4279>] ? system_call_fastpath+0x16/0x1b

3.13 + nosmp:
[  240.236653]  [<ffffffff81120410>] ? __lock_page+0x70/0x70
[  240.236660]  [<ffffffff814c96d8>] ? io_schedule+0x88/0xd0
[  240.236664]  [<ffffffff81120419>] ? sleep_on_page+0x9/0x10
[  240.236668]  [<ffffffff814c9c52>] ? __wait_on_bit+0x52/0x80
[  240.236672]  [<ffffffff81120adb>] ? find_get_pages_tag+0xcb/0x180
[  240.236676]  [<ffffffff81120533>] ? wait_on_page_bit+0x73/0x80
[  240.236681]  [<ffffffff8109c230>] ? wake_atomic_t_function+0x30/0x30
[  240.236685]  [<ffffffff81120610>] ? filemap_fdatawait_range+0xd0/0x150
[  240.236691]  [<ffffffff811b602b>] ? SyS_sync_file_range+0x15b/0x1a0
[  240.236696]  [<ffffffff814d4279>] ? system_call_fastpath+0x16/0x1b

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: bcache deadlock with /dev/ramX + partition
  2014-02-20 18:23 bcache deadlock with /dev/ramX + partition Vivek Dasmohapatra
@ 2014-02-21 17:08 ` Max Kutsevol
  2014-02-21 18:23   ` Ryan Tucker
  2014-02-27 19:14 ` Vivek Dasmohapatra
  2014-02-28 19:06 ` Vivek Dasmohapatra
  2 siblings, 1 reply; 5+ messages in thread
From: Max Kutsevol @ 2014-02-21 17:08 UTC (permalink / raw)
  To: linux-bcache

Hello.

Try this patch: http://evilpiepirate.org/git/linux-bcache.git/commit/?
h=bcache-for-3.14&id=947174476701fbc84ea8c7ec9664270f9d80b076

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: bcache deadlock with /dev/ramX + partition
  2014-02-21 17:08 ` Max Kutsevol
@ 2014-02-21 18:23   ` Ryan Tucker
  0 siblings, 0 replies; 5+ messages in thread
From: Ryan Tucker @ 2014-02-21 18:23 UTC (permalink / raw)
  To: Max Kutsevol; +Cc: linux-bcache@vger.kernel.org

On Fri, Feb 21, 2014 at 12:08 PM, Max Kutsevol <max@kutsevol.com> wrote:
> Hello.
>
> Try this patch: http://evilpiepirate.org/git/linux-bcache.git/commit/?
> h=bcache-for-3.14&id=947174476701fbc84ea8c7ec9664270f9d80b076

Hmm, from what I've seen, the problem fixed by that patch manifested
itself differently (bch_ptr_invalid at the top of the call trace, RIP
set to __bch_btree_mark_key), and I believe it was only present in
3.13.x where 0 <= x < 4.  Doesn't look like Vivek's runs with 3.13
lasted long enough to hit that bug, alas!  -rt

-- 
Ryan Tucker <rtucker@gmail.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: bcache deadlock with /dev/ramX + partition
  2014-02-20 18:23 bcache deadlock with /dev/ramX + partition Vivek Dasmohapatra
  2014-02-21 17:08 ` Max Kutsevol
@ 2014-02-27 19:14 ` Vivek Dasmohapatra
  2014-02-28 19:06 ` Vivek Dasmohapatra
  2 siblings, 0 replies; 5+ messages in thread
From: Vivek Dasmohapatra @ 2014-02-27 19:14 UTC (permalink / raw)
  To: linux-bcache

To follow up: Brought up the same kernel but with PROVE_LOCKING configured,
and initialising the bcache set triggers the following lockdep warning:

------------[ cut here ]------------

WARNING: CPU: 0 PID: 3716 at /home/kernel/linux-3.12.9/kernel/lockdep.c:708
   __lock_acquire+0x1d0f/0x1d80()

Modules linked in: brd nfsd auth_rpcgss oid_registry nfs_acl nfs
lockd fscache sunrpc loop snd_pcm snd_page_alloc snd_timer snd
soundcore i2c_piix4 i2c_core evdev virtio_balloon processor
thermal_sys pcspkr button psmouse serio_raw ext4 crc16 mbcache
jbd2 bcache sg sd_mod crc_t10dif crct10dif_common ata_generic
floppy ata_piix libata virtio_pci virtio_ring 8139too scsi_mod
uhci_hcd ehci_hcd virtio 8139cp mii usbcore usb_common

CPU: 0
PID: 3716
Comm: bcache-register Not tainted 3.12-0.bpo.2-amd64 #1 Debian 3.12.9-1~bpo70+1

Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
  00000000000002c4 ffff880214baf798 ffffffff814fe566 ffff880214b8ad00
  0000000000000000 ffff880214baf7d8 ffffffff81061cd7 ffffffff8109ada1
  ffff8800dbbac8b0 ffff880214b8ad00 0000000000000000 0000000000000001

Call Trace:
  [<ffffffff814fe566>] dump_stack+0x46/0x58
  [<ffffffff81061cd7>] warn_slowpath_common+0x87/0xb0
  [<ffffffff8109ada1>] ? select_task_rq_fair+0x51/0x7f0
  [<ffffffff81061d15>] warn_slowpath_null+0x15/0x20
  [<ffffffff810cb1bf>] __lock_acquire+0x1d0f/0x1d80
  [<ffffffff810c97a3>] ? __lock_acquire+0x2f3/0x1d80
  [<ffffffff810cc164>] ? mark_held_locks+0x74/0x140
  [<ffffffff81505c0a>] ? _raw_spin_unlock_irqrestore+0x3a/0x70
  [<ffffffffa015a42e>] ? mca_reap+0x2e/0x1f0 [bcache]
  [<ffffffff810cb844>] lock_acquire+0x94/0x130
  [<ffffffffa015a42e>] ? mca_reap+0x2e/0x1f0 [bcache]
  [<ffffffff8108ae1e>] down_write_trylock+0x5e/0x70
  [<ffffffffa015a42e>] ? mca_reap+0x2e/0x1f0 [bcache]
  [<ffffffffa015a42e>] mca_reap+0x2e/0x1f0 [bcache]
  [<ffffffffa015a692>] mca_alloc+0xa2/0x560 [bcache]
  [<ffffffffa015bb72>] bch_btree_node_alloc+0xa2/0x2a0 [bcache]
  [<ffffffffa016ff19>] run_cache_set+0x769/0x960 [bcache]
  [<ffffffff812a531b>] ? idr_get_empty_slot+0x16b/0x3e0
  [<ffffffffa0170d34>] register_cache_set+0x2b4/0x2c0 [bcache]
  [<ffffffffa01712b1>] register_cache+0x571/0x7a0 [bcache]
  [<ffffffff81184154>] ? kmem_cache_alloc_trace+0x1b4/0x1e0
  [<ffffffffa0171b45>] ? register_bcache+0x665/0xec0 [bcache]
  [<ffffffffa0171b67>] register_bcache+0x687/0xec0 [bcache]
  [<ffffffff81210438>] ? sysfs_write_file+0xc8/0x160
  [<ffffffff81159aab>] ? might_fault+0x3b/0x90
  [<ffffffff812a653f>] kobj_attr_store+0xf/0x30
  [<ffffffff81210451>] sysfs_write_file+0xe1/0x160
  [<ffffffff8119b119>] vfs_write+0xc9/0x1e0
  [<ffffffff8119b600>] SyS_write+0x50/0xa0
  [<ffffffff8150e479>] system_call_fastpath+0x16/0x1b
---[ end trace 4a908a17006489d4 ]---

And the relevant part of lockdep.c says:

list_for_each_entry(class, hash_head, hash_entry) {
     if (class->key == key) {
     /*
      * Huh! same key, different name? Did someone trample
      * on some memory? We're most confused.
      */
     WARN_ON_ONCE(class->name != lock->name);

So it looks like there's something about the btree initialisation that
isn't quite kosher, maybe?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: bcache deadlock with /dev/ramX + partition
  2014-02-20 18:23 bcache deadlock with /dev/ramX + partition Vivek Dasmohapatra
  2014-02-21 17:08 ` Max Kutsevol
  2014-02-27 19:14 ` Vivek Dasmohapatra
@ 2014-02-28 19:06 ` Vivek Dasmohapatra
  2 siblings, 0 replies; 5+ messages in thread
From: Vivek Dasmohapatra @ 2014-02-28 19:06 UTC (permalink / raw)
  To: linux-bcache

On Thu, 20 Feb 2014, Vivek Dasmohapatra wrote:

Update: Found an offset for sda11 which does not deadlock:

Deadlocking configurations:

   backing: sda11;  default offset from sda (NOT a multiple of 4 KiB)
   backing: sdb1;   default offset from sdb (!= 4 KiB)
   backing: sda11;  offset from sda is:
                    - a multiple of 4  KiB
                    - a multiple of 4  MiB
                    - a multplie of 16 MiB

Non-deadlocking configurations:

   backing: sdb
   backing: sdb1;   offset from start of sdb is 4096 B
   backing: sda11;  offset from start of sda is 78486397440 B
                    (153293745 sectors)
                    for reference, sda1's offset is 1048576 B (2048 sectors)

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-02-28 19:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-20 18:23 bcache deadlock with /dev/ramX + partition Vivek Dasmohapatra
2014-02-21 17:08 ` Max Kutsevol
2014-02-21 18:23   ` Ryan Tucker
2014-02-27 19:14 ` Vivek Dasmohapatra
2014-02-28 19:06 ` Vivek Dasmohapatra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox