* bcache deadlock with /dev/ramX + partition
@ 2014-02-20 18:23 Vivek Dasmohapatra
2014-02-21 17:08 ` Max Kutsevol
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Vivek Dasmohapatra @ 2014-02-20 18:23 UTC (permalink / raw)
To: linux-bcache
Hi: We've been using bcache to fashion an "overflow" ramdisc: We use
/dev/ramX as the cache, and an actual block device on spinning rust
as the backing device. This works pretty well, but we've bumped into
what appears to be a locking problem in the kernel:
When the backing device is a top level block device ( eg /dev/sdb )
everything seems to work just fine. When the backing device is a
partition ( eg /dev/sda11 ) processes seem to end up in the D state
when they fsync.
Under kernel 3.11.2 this can be triggered with a simple fsync. Under
kernels 3.12.9, 3.13 and 3.13 running with the nosmp command line
argument it is slightly harder to provoke ( a simple fsync won't always
trigger it but our chroot install script triggers it every time ).
Further investigation hinted at this being a memory alignment problem:
We haven't confirmed it yet, but certain offsets for partitions don't
trigger the deadlock:
[ In all cases it's an msdos partition table, the device is bcache0 and is
mounted under /srv which happens to be on sda10. The cache device is
always /dev/ram0 ]:
Deadlocking configurations:
backing: sda11; default offset from sda (NOT a multiple of 4 KiB)
backing: sdb1; default offset from sdb (!= 4 KiB)
backing: sda11; offset from sda is:
- a multiple of 4 KiB
- a multiple of 4 MiB
- a multplie of 16 MiB
Non-deadlocking configurations:
backing: sdb
backing: sdb1; offset from start of sdb is 4096 B
So, to sum up: a top level block device as a backing device never
seems to deadlock, some offsets for some partitions (ok, PAGE_SIZE
alignment for sdb1) _also_ do not deadlock. All other cases of
use of a partition have deadlocked so far.
I'm currently testiung under 3.12.9 and get the following trace from the
deadlocked process:
[82440.244111] INFO: task dpkg:14059 blocked for more than 120 seconds.
[82440.244141] Not tainted 3.12-0.bpo.1-amd64 #1
[82440.244153] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[82440.244167] dpkg D ffff88021fc14300 0 14059 14056 0x00000000
[82440.244173] ffff8800c0edd800 0000000000000086 0000000000000088 ffffffff81813480
[82440.244177] ffff8801d5d5bfd8 ffff8801d5d5bfd8 ffff8801d5d5bfd8 ffff8800c0edd800
[82440.244181] 0000000000000000 ffff88021fc14b40 ffff8800c0edd800 ffffffff8111f800
[82440.244185] Call Trace:
[82440.244195] [<ffffffff8111f800>] ? __lock_page+0x70/0x70
[82440.244201] [<ffffffff814c2797>] ? io_schedule+0x87/0xd0
[82440.244204] [<ffffffff8111f809>] ? sleep_on_page+0x9/0x10
[82440.244208] [<ffffffff814c00c2>] ? __wait_on_bit+0x52/0x80
[82440.244212] [<ffffffff8111f943>] ? wait_on_page_bit+0x73/0x80
[82440.244217] [<ffffffff81082d80>] ? wake_atomic_t_function+0x30/0x30
[82440.244220] [<ffffffff8111fa46>] ? filemap_fdatawait_range+0xf6/0x170
[82440.244225] [<ffffffff81121058>] ? filemap_write_and_wait_range+0x48/0x90
[82440.244230] [<ffffffff811a948d>] ? generic_file_fsync+0x2d/0xa0
[82440.244247] [<ffffffffa0159543>] ? ext4_sync_file+0x203/0x320 [ext4]
[82440.244251] [<ffffffff811b3298>] ? do_fsync+0x58/0x90
[82440.244255] [<ffffffff811b362b>] ? SyS_fsync+0xb/0x20
[82440.244259] [<ffffffff814cb7b9>] ? system_call_fastpath+0x16/0x1b
Other traces from previous tests:
3.11.2:
[501840.292105] Call Trace:
[501840.292141] [<ffffffff8110b270>] ? wait_on_page_read+0x60/0x60
[501840.292159] [<ffffffff814787e4>] ? io_schedule+0x94/0x120
[501840.292167] [<ffffffff8110b275>] ? sleep_on_page+0x5/0x10
[501840.292171] [<ffffffff81476824>] ? __wait_on_bit+0x54/0x80
[501840.292178] [<ffffffff8110b08f>] ? wait_on_page_bit+0x7f/0x90
[501840.292194] [<ffffffff81078c40>] ? wake_atomic_t_function+0x30/0x30
[501840.292207] [<ffffffff811175e8>] ? pagevec_lookup_tag+0x18/0x20
[501840.292211] [<ffffffff8110b178>] ? filemap_fdatawait_range+0xd8/0x150
[501840.292217] [<ffffffff8110c765>] ? filemap_write_and_wait_range+0x35/0x60
[501840.292229] [<ffffffff8118cd8b>] ? generic_file_fsync+0x1b/0x90
[501840.292259] [<ffffffffa01750ea>] ? ext4_sync_file+0x10a/0x2e0 [ext4]
[501840.292264] [<ffffffff8119515c>] ? do_fsync+0x4c/0x80
[501840.292267] [<ffffffff811953d7>] ? SyS_fsync+0x7/0x10
[501840.292275] [<ffffffff81481de9>] ? system_call_fastpath+0x16/0x1b
[501960.292133] INFO: task dpkg:17778 blocked for more than 120 seconds.
3.13:
[ 1560.256491] Call Trace:
[ 1560.256502] [<ffffffff81120410>] ? __lock_page+0x70/0x70
[ 1560.256509] [<ffffffff814c96d8>] ? io_schedule+0x88/0xd0
[ 1560.256513] [<ffffffff81120419>] ? sleep_on_page+0x9/0x10
[ 1560.256517] [<ffffffff814c9c52>] ? __wait_on_bit+0x52/0x80
[ 1560.256521] [<ffffffff81120adb>] ? find_get_pages_tag+0xcb/0x180
[ 1560.256526] [<ffffffff81120533>] ? wait_on_page_bit+0x73/0x80
[ 1560.256531] [<ffffffff8109c230>] ? wake_atomic_t_function+0x30/0x30
[ 1560.256535] [<ffffffff81120610>] ? filemap_fdatawait_range+0xd0/0x150
[ 1560.256540] [<ffffffff8112193c>] ? __filemap_fdatawrite_range+0x4c/0x60
[ 1560.256544] [<ffffffff81121997>] ? filemap_write_and_wait_range+0x47/0x90
[ 1560.256549] [<ffffffff811abf8d>] ? generic_file_fsync+0x2d/0xa0
[ 1560.256570] [<ffffffffa018d3e3>] ? ext4_sync_file+0x153/0x300 [ext4]
[ 1560.256576] [<ffffffff811b5be3>] ? do_fsync+0x53/0x90
[ 1560.256580] [<ffffffff811b5e9b>] ? SyS_fsync+0xb/0x20
[ 1560.256586] [<ffffffff814d4279>] ? system_call_fastpath+0x16/0x1b
3.13 + nosmp:
[ 240.236653] [<ffffffff81120410>] ? __lock_page+0x70/0x70
[ 240.236660] [<ffffffff814c96d8>] ? io_schedule+0x88/0xd0
[ 240.236664] [<ffffffff81120419>] ? sleep_on_page+0x9/0x10
[ 240.236668] [<ffffffff814c9c52>] ? __wait_on_bit+0x52/0x80
[ 240.236672] [<ffffffff81120adb>] ? find_get_pages_tag+0xcb/0x180
[ 240.236676] [<ffffffff81120533>] ? wait_on_page_bit+0x73/0x80
[ 240.236681] [<ffffffff8109c230>] ? wake_atomic_t_function+0x30/0x30
[ 240.236685] [<ffffffff81120610>] ? filemap_fdatawait_range+0xd0/0x150
[ 240.236691] [<ffffffff811b602b>] ? SyS_sync_file_range+0x15b/0x1a0
[ 240.236696] [<ffffffff814d4279>] ? system_call_fastpath+0x16/0x1b
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: bcache deadlock with /dev/ramX + partition
2014-02-20 18:23 bcache deadlock with /dev/ramX + partition Vivek Dasmohapatra
@ 2014-02-21 17:08 ` Max Kutsevol
2014-02-21 18:23 ` Ryan Tucker
2014-02-27 19:14 ` Vivek Dasmohapatra
2014-02-28 19:06 ` Vivek Dasmohapatra
2 siblings, 1 reply; 5+ messages in thread
From: Max Kutsevol @ 2014-02-21 17:08 UTC (permalink / raw)
To: linux-bcache
Hello.
Try this patch: http://evilpiepirate.org/git/linux-bcache.git/commit/?
h=bcache-for-3.14&id=947174476701fbc84ea8c7ec9664270f9d80b076
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: bcache deadlock with /dev/ramX + partition
2014-02-21 17:08 ` Max Kutsevol
@ 2014-02-21 18:23 ` Ryan Tucker
0 siblings, 0 replies; 5+ messages in thread
From: Ryan Tucker @ 2014-02-21 18:23 UTC (permalink / raw)
To: Max Kutsevol; +Cc: linux-bcache@vger.kernel.org
On Fri, Feb 21, 2014 at 12:08 PM, Max Kutsevol <max@kutsevol.com> wrote:
> Hello.
>
> Try this patch: http://evilpiepirate.org/git/linux-bcache.git/commit/?
> h=bcache-for-3.14&id=947174476701fbc84ea8c7ec9664270f9d80b076
Hmm, from what I've seen, the problem fixed by that patch manifested
itself differently (bch_ptr_invalid at the top of the call trace, RIP
set to __bch_btree_mark_key), and I believe it was only present in
3.13.x where 0 <= x < 4. Doesn't look like Vivek's runs with 3.13
lasted long enough to hit that bug, alas! -rt
--
Ryan Tucker <rtucker@gmail.com>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: bcache deadlock with /dev/ramX + partition
2014-02-20 18:23 bcache deadlock with /dev/ramX + partition Vivek Dasmohapatra
2014-02-21 17:08 ` Max Kutsevol
@ 2014-02-27 19:14 ` Vivek Dasmohapatra
2014-02-28 19:06 ` Vivek Dasmohapatra
2 siblings, 0 replies; 5+ messages in thread
From: Vivek Dasmohapatra @ 2014-02-27 19:14 UTC (permalink / raw)
To: linux-bcache
To follow up: Brought up the same kernel but with PROVE_LOCKING configured,
and initialising the bcache set triggers the following lockdep warning:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 3716 at /home/kernel/linux-3.12.9/kernel/lockdep.c:708
__lock_acquire+0x1d0f/0x1d80()
Modules linked in: brd nfsd auth_rpcgss oid_registry nfs_acl nfs
lockd fscache sunrpc loop snd_pcm snd_page_alloc snd_timer snd
soundcore i2c_piix4 i2c_core evdev virtio_balloon processor
thermal_sys pcspkr button psmouse serio_raw ext4 crc16 mbcache
jbd2 bcache sg sd_mod crc_t10dif crct10dif_common ata_generic
floppy ata_piix libata virtio_pci virtio_ring 8139too scsi_mod
uhci_hcd ehci_hcd virtio 8139cp mii usbcore usb_common
CPU: 0
PID: 3716
Comm: bcache-register Not tainted 3.12-0.bpo.2-amd64 #1 Debian 3.12.9-1~bpo70+1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
00000000000002c4 ffff880214baf798 ffffffff814fe566 ffff880214b8ad00
0000000000000000 ffff880214baf7d8 ffffffff81061cd7 ffffffff8109ada1
ffff8800dbbac8b0 ffff880214b8ad00 0000000000000000 0000000000000001
Call Trace:
[<ffffffff814fe566>] dump_stack+0x46/0x58
[<ffffffff81061cd7>] warn_slowpath_common+0x87/0xb0
[<ffffffff8109ada1>] ? select_task_rq_fair+0x51/0x7f0
[<ffffffff81061d15>] warn_slowpath_null+0x15/0x20
[<ffffffff810cb1bf>] __lock_acquire+0x1d0f/0x1d80
[<ffffffff810c97a3>] ? __lock_acquire+0x2f3/0x1d80
[<ffffffff810cc164>] ? mark_held_locks+0x74/0x140
[<ffffffff81505c0a>] ? _raw_spin_unlock_irqrestore+0x3a/0x70
[<ffffffffa015a42e>] ? mca_reap+0x2e/0x1f0 [bcache]
[<ffffffff810cb844>] lock_acquire+0x94/0x130
[<ffffffffa015a42e>] ? mca_reap+0x2e/0x1f0 [bcache]
[<ffffffff8108ae1e>] down_write_trylock+0x5e/0x70
[<ffffffffa015a42e>] ? mca_reap+0x2e/0x1f0 [bcache]
[<ffffffffa015a42e>] mca_reap+0x2e/0x1f0 [bcache]
[<ffffffffa015a692>] mca_alloc+0xa2/0x560 [bcache]
[<ffffffffa015bb72>] bch_btree_node_alloc+0xa2/0x2a0 [bcache]
[<ffffffffa016ff19>] run_cache_set+0x769/0x960 [bcache]
[<ffffffff812a531b>] ? idr_get_empty_slot+0x16b/0x3e0
[<ffffffffa0170d34>] register_cache_set+0x2b4/0x2c0 [bcache]
[<ffffffffa01712b1>] register_cache+0x571/0x7a0 [bcache]
[<ffffffff81184154>] ? kmem_cache_alloc_trace+0x1b4/0x1e0
[<ffffffffa0171b45>] ? register_bcache+0x665/0xec0 [bcache]
[<ffffffffa0171b67>] register_bcache+0x687/0xec0 [bcache]
[<ffffffff81210438>] ? sysfs_write_file+0xc8/0x160
[<ffffffff81159aab>] ? might_fault+0x3b/0x90
[<ffffffff812a653f>] kobj_attr_store+0xf/0x30
[<ffffffff81210451>] sysfs_write_file+0xe1/0x160
[<ffffffff8119b119>] vfs_write+0xc9/0x1e0
[<ffffffff8119b600>] SyS_write+0x50/0xa0
[<ffffffff8150e479>] system_call_fastpath+0x16/0x1b
---[ end trace 4a908a17006489d4 ]---
And the relevant part of lockdep.c says:
list_for_each_entry(class, hash_head, hash_entry) {
if (class->key == key) {
/*
* Huh! same key, different name? Did someone trample
* on some memory? We're most confused.
*/
WARN_ON_ONCE(class->name != lock->name);
So it looks like there's something about the btree initialisation that
isn't quite kosher, maybe?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: bcache deadlock with /dev/ramX + partition
2014-02-20 18:23 bcache deadlock with /dev/ramX + partition Vivek Dasmohapatra
2014-02-21 17:08 ` Max Kutsevol
2014-02-27 19:14 ` Vivek Dasmohapatra
@ 2014-02-28 19:06 ` Vivek Dasmohapatra
2 siblings, 0 replies; 5+ messages in thread
From: Vivek Dasmohapatra @ 2014-02-28 19:06 UTC (permalink / raw)
To: linux-bcache
On Thu, 20 Feb 2014, Vivek Dasmohapatra wrote:
Update: Found an offset for sda11 which does not deadlock:
Deadlocking configurations:
backing: sda11; default offset from sda (NOT a multiple of 4 KiB)
backing: sdb1; default offset from sdb (!= 4 KiB)
backing: sda11; offset from sda is:
- a multiple of 4 KiB
- a multiple of 4 MiB
- a multplie of 16 MiB
Non-deadlocking configurations:
backing: sdb
backing: sdb1; offset from start of sdb is 4096 B
backing: sda11; offset from start of sda is 78486397440 B
(153293745 sectors)
for reference, sda1's offset is 1048576 B (2048 sectors)
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-02-28 19:06 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-20 18:23 bcache deadlock with /dev/ramX + partition Vivek Dasmohapatra
2014-02-21 17:08 ` Max Kutsevol
2014-02-21 18:23 ` Ryan Tucker
2014-02-27 19:14 ` Vivek Dasmohapatra
2014-02-28 19:06 ` Vivek Dasmohapatra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox