* Deadlock on 3.18.5
@ 2015-02-04 23:05 Juergen Fitschen
  2015-02-05  4:03 ` Duncan
  2015-02-05 11:04 ` Juergen Fitschen
  0 siblings, 2 replies; 6+ messages in thread
From: Juergen Fitschen @ 2015-02-04 23:05 UTC (permalink / raw)
To: linux-btrfs

Hey,

I just got a deadlock on Linux 3.18.5. It occurred while processing some data: all in all, reading from the btrfs volume, decompressing it (the bottleneck in this process), doing some calculations and finally writing the results back to a new file on the volume.

I am using a freshly installed Ubuntu 14.04 in a virtual machine. The block device used for btrfs is an LVM volume offered by the KVM hypervisor.

Unfortunately I was using an old version of my user space tools (version 3.12-1), so it might be related to that?

The volume was mounted with the following mount options:

> default,compress-force=lzo,subvol=data

Related kernel stack trace:

> Feb 4 22:00:41 qdock03 kernel: [22560.284120] INFO: task btrfs-transacti:508 blocked for more than 120 seconds.
> Feb 4 22:00:41 qdock03 kernel: [22560.284200] Tainted: G W 3.18.5-031805-generic #201501292218
> Feb 4 22:00:41 qdock03 kernel: [22560.284267] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Feb 4 22:00:41 qdock03 kernel: [22560.284339] btrfs-transacti D 0000000000000001 0 508 2 0x00000000
> Feb 4 22:00:41 qdock03 kernel: [22560.284345] ffff881ece617b28 0000000000000046 ffffffffc0136e90 00000000008bee9a
> Feb 4 22:00:41 qdock03 kernel: [22560.284349] ffff881ece617fd8 0000000000013640 ffff881ece095b00 0000000000013640
> Feb 4 22:00:41 qdock03 kernel: [22560.284352] ffff881ed0638000 ffff881ecfdf5000 ffff881ece617b98 ffff881f79c33f40
> Feb 4 22:00:41 qdock03 kernel: [22560.284356] Call Trace:
> Feb 4 22:00:41 qdock03 kernel: [22560.284397] [<ffffffffc0136e90>] ? btree_io_failed_hook+0x60/0x60 [btrfs]
> Feb 4 22:00:41 qdock03 kernel: [22560.284406] [<ffffffff817b0750>] ? bit_wait_timeout+0x80/0x80
> Feb 4 22:00:41 qdock03 kernel: [22560.284411] [<ffffffff817afd29>] schedule+0x29/0x70
> Feb 4 22:00:41 qdock03 kernel: [22560.284415] [<ffffffff817afdff>] io_schedule+0x8f/0xd0
> Feb 4 22:00:41 qdock03 kernel: [22560.284419] [<ffffffff817b077c>] bit_wait_io+0x2c/0x50
> Feb 4 22:00:41 qdock03 kernel: [22560.284423] [<ffffffff817b04e5>] __wait_on_bit+0x65/0x90
> Feb 4 22:00:41 qdock03 kernel: [22560.284428] [<ffffffff8117888b>] ? find_get_pages_tag+0xcb/0x170
> Feb 4 22:00:41 qdock03 kernel: [22560.284449] [<ffffffffc015ce8e>] ? clear_state_bit+0xae/0x170 [btrfs]
> Feb 4 22:00:41 qdock03 kernel: [22560.284455] [<ffffffff81177e37>] wait_on_page_bit+0xc7/0xd0
> Feb 4 22:00:41 qdock03 kernel: [22560.284460] [<ffffffff810b4cf0>] ? wake_atomic_t_function+0x40/0x40
> Feb 4 22:00:41 qdock03 kernel: [22560.284465] [<ffffffff81178004>] filemap_fdatawait_range+0xf4/0x180
> Feb 4 22:00:41 qdock03 kernel: [22560.284483] [<ffffffffc013f41f>] btrfs_wait_marked_extents+0x8f/0x140 [btrfs]
> Feb 4 22:00:41 qdock03 kernel: [22560.284498] [<ffffffffc013f532>] btrfs_write_and_wait_transaction+0x62/0xb0 [btrfs]
> Feb 4 22:00:41 qdock03 kernel: [22560.284514] [<ffffffffc013fe04>] btrfs_commit_transaction+0x734/0xae0 [btrfs]
> Feb 4 22:00:41 qdock03 kernel: [22560.284528] [<ffffffffc013d4d5>] transaction_kthread+0x1d5/0x250 [btrfs]
> Feb 4 22:00:41 qdock03 kernel: [22560.284543] [<ffffffffc013d300>] ? open_ctree+0x1f50/0x1f50 [btrfs]
> Feb 4 22:00:41 qdock03 kernel: [22560.284548] [<ffffffff81093a79>] kthread+0xc9/0xe0
> Feb 4 22:00:41 qdock03 kernel: [22560.284552] [<ffffffff810939b0>] ? flush_kthread_worker+0x90/0x90
> Feb 4 22:00:41 qdock03 kernel: [22560.284556] [<ffffffff817b3ebc>] ret_from_fork+0x7c/0xb0
> Feb 4 22:00:41 qdock03 kernel: [22560.284559] [<ffffffff810939b0>] ? flush_kthread_worker+0x90/0x90

Regards, Juergen

^ permalink raw reply	[flat|nested] 6+ messages in thread
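Should the hang show up again, a minimal sketch of what could be captured from the guest before rebooting, in addition to the hung-task trace above (the thread name and output path are only examples, and /proc/<pid>/stack is only available with CONFIG_STACKTRACE):

  # make sure magic SysRq is enabled, then dump all blocked (D-state) tasks to the kernel log
  echo 1 > /proc/sys/kernel/sysrq
  echo w > /proc/sysrq-trigger
  # kernel stack of the stuck transaction kthread (name truncated to 15 chars by the kernel)
  cat /proc/$(pgrep btrfs-transacti)/stack
  # save the surrounding kernel log for the report
  dmesg > /tmp/hung-task-dmesg.txt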
* Re: Deadlock on 3.18.5
  2015-02-04 23:05 Deadlock on 3.18.5 Juergen Fitschen
@ 2015-02-05  4:03 ` Duncan
  2015-02-05 11:04 ` Juergen Fitschen
  1 sibling, 0 replies; 6+ messages in thread
From: Duncan @ 2015-02-05 4:03 UTC (permalink / raw)
To: linux-btrfs

Juergen Fitschen posted on Thu, 05 Feb 2015 00:05:33 +0100 as excerpted:

> I just got a deadlock on Linux 3.18.5. [...]
> I am using a freshly installed Ubuntu 14.04 in a virtual machine. The
> block device used for btrfs is an LVM volume offered by the KVM
> hypervisor.
> Unfortunately I was using an old version of my user space tools
> (version 3.12-1), so it might be related to that?

I know little to nothing about the complexities of btrfs in a VM, so I won't attempt an answer there. But about userspace specifically I can say this...

For normal operation, the btrfs userspace version doesn't matter so much. It's the kernel that's vital, as userspace mostly just tells the kernel what to do at a relatively high level (mount/umount/snapshot/delete-snapshot/etc.) and the kernel code handles pretty much everything at the more critical lower levels.

OTOH, if there are problems and you are trying to fix them using userspace tools such as btrfs check, restore, etc., /that/ is when the /userspace/ version needs to be current, as it's then userspace doing the low-level stuff as well.

Since it was during normal operations that this deadlock occurred, it's almost certainly a kernel problem (unless the problem is ultimately traced to a bug in the original mkfs.btrfs that created the filesystem, or the like), and the kernel was current, so it's a current problem. =:^(

Hopefully a dev or other regular with more VM experience will be along to help you further, but I could deal with this much, so I did. =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

^ permalink raw reply	[flat|nested] 6+ messages in thread
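A quick sketch of how to check which side is old — the running kernel versus the userspace tools (the Ubuntu package name is an assumption for 14.04):

  uname -r             # running kernel, e.g. 3.18.5-031805-generic
  btrfs --version      # btrfs-progs (userspace) version
  dpkg -l btrfs-tools  # assumed to be the Ubuntu 14.04 package that ships the btrfs tool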
* Re: Deadlock on 3.18.5
  2015-02-04 23:05 Deadlock on 3.18.5 Juergen Fitschen
  2015-02-05  4:03 ` Duncan
@ 2015-02-05 11:04 ` Juergen Fitschen
  2015-02-05 12:47   ` Austin S Hemmelgarn
  1 sibling, 1 reply; 6+ messages in thread
From: Juergen Fitschen @ 2015-02-05 11:04 UTC (permalink / raw)
To: linux-btrfs

Hey,

It's me again. First of all: thanks for the reply, Duncan :)

After detecting the deadlock and posting the stack trace yesterday evening, I left the machine alone and didn't reboot it. The monitoring told me that the whole server (including the hypervisor) became unreachable a few minutes after I fetched the stack trace from syslog.

But now - as I just realised - the server is back alive, the process finished successfully and the volume is accessible. So the deadlock was not a real deadlock, or it resolved itself magically without me doing anything. The syslog reports several stack traces from kworker for about two hours (the time the server was not reachable). Within the first hour syslog was completely silent. Within the second hour the kernel complained that rcu_sched detected stalls on the CPU, at an interval of about three minutes.

What do you think?

Regards, Juergen

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: Deadlock on 3.18.5
  2015-02-05 11:04 ` Juergen Fitschen
@ 2015-02-05 12:47   ` Austin S Hemmelgarn
  2015-02-05 15:24     ` Juergen Fitschen
  0 siblings, 1 reply; 6+ messages in thread
From: Austin S Hemmelgarn @ 2015-02-05 12:47 UTC (permalink / raw)
To: Juergen Fitschen, linux-btrfs

On 2015-02-05 06:04, Juergen Fitschen wrote:
> Hey,
>
> It's me again. First of all: thanks for the reply, Duncan :)
>
> After detecting the deadlock and posting the stack trace yesterday evening, I left the machine alone and didn't reboot it. The monitoring told me that the whole server (including the hypervisor) became unreachable a few minutes after I fetched the stack trace from syslog.
>
> But now - as I just realised - the server is back alive, the process finished successfully and the volume is accessible. So the deadlock was not a real deadlock, or it resolved itself magically without me doing anything. The syslog reports several stack traces from kworker for about two hours (the time the server was not reachable). Within the first hour syslog was completely silent. Within the second hour the kernel complained that rcu_sched detected stalls on the CPU, at an interval of about three minutes.
>
> What do you think?

I've actually seen similar behavior without the virtualization when doing large, filesystem-intensive operations with compression enabled. I don't know if this is significant, but it seems to be worse with lzo compression than zlib, and also seems to be worse when compression is enabled at the filesystem level instead of through 'chattr +c'.

I'm not certain, but I think it might have something to do with the somewhat brain-dead default parameters in the default I/O scheduler (the so-called 'completely fair queue', which, as I've said before, was obviously named by a mathematician and not based on its actual behavior), although it seems to be much worse when using the deadline and no-op I/O schedulers.

^ permalink raw reply	[flat|nested] 6+ messages in thread
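For reference, a minimal sketch of the two ways of enabling compression being compared here (device, mount point and directory names are just examples):

  # filesystem-wide, via mount option (the original report used compress-force=lzo)
  mount -o compress=lzo /dev/vdb /mnt/data
  # per-directory/per-file, via the inode attribute; files created underneath inherit it
  chattr +c /mnt/data/results
  lsattr -d /mnt/data/results   # should now show the 'c' attribute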
* Re: Deadlock on 3.18.5
  2015-02-05 12:47 ` Austin S Hemmelgarn
@ 2015-02-05 15:24   ` Juergen Fitschen
  2015-02-05 16:48     ` Austin S Hemmelgarn
  0 siblings, 1 reply; 6+ messages in thread
From: Juergen Fitschen @ 2015-02-05 15:24 UTC (permalink / raw)
To: Austin S Hemmelgarn, linux-btrfs

> On 05 Feb 2015, at 13:47, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:
>
> I've actually seen similar behavior without the virtualization when doing large, filesystem-intensive operations with compression enabled.
> I don't know if this is significant, but it seems to be worse with lzo compression than zlib, and also seems to be worse when compression is enabled at the filesystem level instead of through 'chattr +c'.

Zlib isn't as performant as lzo, so zlib creates a bottleneck at the CPU and thereby limits the I/O the volume is exposed to. So our problem might be related to intensive operations on the volume.

> I'm not certain, but I think it might have something to do with the somewhat brain-dead default parameters in the default I/O scheduler (the so-called 'completely fair queue', which, as I've said before, was obviously named by a mathematician and not based on its actual behavior), although it seems to be much worse when using the deadline and no-op I/O schedulers.

Good idea. I had a look at my configuration of the "stack" for the block devices and their queuing and caching. My setup looks like this (with default settings - I made no adjustments):

* 2 HDDs
* Hewlett-Packard Company Smart Array Gen8 Controllers (rev 01)
  [With 1 GB write cache. Other black magic seems to be included. Combines both HDDs into a RAID1.]
* Block device driver
* IO scheduler: deadline
* LVM
* QEMU
  [With writeback cache. Should I change it to "none"? The storage controller has a write cache included.]
* virtio-blk
* btrfs

As you can see, only one I/O scheduler is involved. The VM by default seems not to use any I/O scheduler. I checked this by executing "cat /sys/block/vd*/queue/scheduler" on the VM and it reported "none".

Regards, Juergen

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: Deadlock on 3.18.5
  2015-02-05 15:24 ` Juergen Fitschen
@ 2015-02-05 16:48   ` Austin S Hemmelgarn
  0 siblings, 0 replies; 6+ messages in thread
From: Austin S Hemmelgarn @ 2015-02-05 16:48 UTC (permalink / raw)
To: Juergen Fitschen, linux-btrfs

On 2015-02-05 10:24, Juergen Fitschen wrote:
>
>> On 05 Feb 2015, at 13:47, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:
>>
>> I've actually seen similar behavior without the virtualization when doing large, filesystem-intensive operations with compression enabled.
>> I don't know if this is significant, but it seems to be worse with lzo compression than zlib, and also seems to be worse when compression is enabled at the filesystem level instead of through 'chattr +c'.
> Zlib isn't as performant as lzo, so zlib creates a bottleneck at the CPU and thereby limits the I/O the volume is exposed to. So our problem might be related to intensive operations on the volume.
>
>> I'm not certain, but I think it might have something to do with the somewhat brain-dead default parameters in the default I/O scheduler (the so-called 'completely fair queue', which, as I've said before, was obviously named by a mathematician and not based on its actual behavior), although it seems to be much worse when using the deadline and no-op I/O schedulers.
> Good idea. I had a look at my configuration of the "stack" for the block devices and their queuing and caching. My setup looks like this (with default settings - I made no adjustments):
>
> * 2 HDDs
> * Hewlett-Packard Company Smart Array Gen8 Controllers (rev 01)
>   [With 1 GB write cache. Other black magic seems to be included. Combines both HDDs into a RAID1.]
> * Block device driver
> * IO scheduler: deadline
> * LVM
> * QEMU
>   [With writeback cache. Should I change it to "none"? The storage controller has a write cache included.]
> * virtio-blk
> * btrfs
>
> As you can see, only one I/O scheduler is involved. The VM by default seems not to use any I/O scheduler. I checked this by executing "cat /sys/block/vd*/queue/scheduler" on the VM and it reported "none".

Yeah, thankfully Linux is smart enough to turn off the I/O scheduler for block devices that it can see are virtualized.

At the very least, I would suggest changing QEMU to not use caching. I've found that host-side caching for virtualized block devices tends to just make things slower unless the block device is imported over the network (i.e. iSCSI/ATAoE/NBD). This is especially significant when you have a storage controller with such a big write cache (I would make sure that the write cache on the storage controller is non-volatile first, though; if it isn't, you should probably use writethrough mode for QEMU's caching).

Additionally, you might want to try using CFQ for the I/O scheduler on the host side, albeit with some non-default parameters (the deadline scheduler tends to get very laggy with really heavy random-access workloads). I've found that it does do well when you actually take the time to fine tune things.

The particular parameters I would suggest some experimentation with for CFQ are:

* Under /sys/block/<device>/queue: nomerges, rq_affinity, max_sectors_kb
* Under /sys/block/<device>/queue/iosched: group_idle, quantum, slice_idle, back_seek_max, back_seek_penalty

There is good information on what each of these does in the kernel sources under Documentation/block/queue-sysfs.txt and Documentation/block/cfq-iosched.txt.

Once you find a set of parameters that works well, I'd suggest writing some simple udev rules to automatically set them on boot/device enumeration.

FWIW, I've found that the following parameters provide near-optimal performance for the SSD in my laptop:

queue/nomerges=1
queue/rq_affinity=2
queue/max_sectors_kb=16387 (16MB, which is 4x the erase-block size on the SSD)
queue/iosched/group_idle=8
queue/iosched/quantum=128 (this also happens to be equal to the device's NCQ queue depth)
queue/iosched/slice_idle=0

Using these settings, the time from the boot-loader handing off execution to the kernel to having a login prompt is about 45 seconds. With the default CFQ parameters, it takes almost 150 seconds, so fine tuning here can provide a very noticeable performance improvement.

^ permalink raw reply	[flat|nested] 6+ messages in thread
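A sketch of the udev-rule approach suggested above, reusing the example values from the laptop SSD (the rule file name and matched device are assumptions, and the numbers would need retuning for the HP RAID1 setup):

  cat <<'EOF' > /etc/udev/rules.d/60-io-tuning.rules
  # select CFQ first so that the iosched/* attributes below actually exist
  ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sda", ATTR{queue/scheduler}="cfq"
  ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sda", ATTR{queue/nomerges}="1", ATTR{queue/rq_affinity}="2", ATTR{queue/iosched/group_idle}="8", ATTR{queue/iosched/quantum}="128", ATTR{queue/iosched/slice_idle}="0"
  EOF
  # apply without waiting for the next boot/device enumeration
  udevadm control --reload-rules
  udevadm trigger --subsystem-match=block --action=change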
end of thread, other threads:[~2015-02-05 16:48 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-04 23:05 Deadlock on 3.18.5 Juergen Fitschen
2015-02-05  4:03 ` Duncan
2015-02-05 11:04 ` Juergen Fitschen
2015-02-05 12:47   ` Austin S Hemmelgarn
2015-02-05 15:24     ` Juergen Fitschen
2015-02-05 16:48       ` Austin S Hemmelgarn