From: "Austin S. Hemmelgarn"
To: Chris Murphy
Cc: Christian Rohmann, linux-btrfs
Subject: Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core?
Date: Thu, 28 Jan 2016 07:27:08 -0500

On 2016-01-27 16:53, Chris Murphy wrote:
> On Wed, Jan 27, 2016 at 9:34 AM, Austin S. Hemmelgarn wrote:
>
>> Hmm, I did some automated testing in a couple of VM's last night, and I
>> have to agree, this _really_ needs to get optimized. Using the same
>> data-set on otherwise identical VM's, I saw an average 28x slowdown
>> (best case was 16x, worst was almost 100x) for balancing a RAID6 set
>> versus a RAID1 set. While the parity computations add to the time,
>> there is absolutely no way that just that can explain why this is
>> taking so long. The closest comparison using MD or DM RAID is probably
>> a full verification of the array, and the greatest difference there
>> that I've seen is around 10x.
>
> I can't exactly reproduce this. I'm using +C qcow2 on Btrfs on one SSD
> to back the drives in the VM.
In my case I was using a set of 8 thinly-provisioned 256G (virtual size)
LVM volumes exposed directly to a Xen VM as virtual block devices,
physically backed by traditional hard drives. For both tests, I used a
filesystem spanning all the disks which had a lot of sparse files and a
lot of data chunks that had been force-allocated and then left almost
empty. I made a point of using snapshots to ensure that the filesystem
itself was not a variable in this. It's probably worth noting that the
system I ran this on does have other VM's running at the same time on
the same physical CPU's, but we need to plan for that use case as well.
>
> 2x btrfs raid1 with files totalling 5G consistently takes ~1 minute
> [1] to balance (no filters)
Similar times here.
>
> 4x btrfs raid6 with the same files *inconsistently* takes ~1m15s [2]
> to balance (no filters)
On this I was getting around 30 minutes on average, with one case where
it only took 16 and one where it took 97. On both configurations, I did
12 runs total.
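For anyone who wants to try reproducing this, the basic shape of the
test is just a loop along these lines (the device names and mount point
are placeholders, and the step that restores the identical data set
before each run is elided):

#!/bin/bash
# Placeholder devices and mount point; substitute your own.
DEVS="/dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde"
DEVS="$DEVS /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi"
MNT=/mnt/test

for profile in raid1 raid6; do
    mkfs.btrfs -f -d "$profile" -m "$profile" $DEVS
    mount "${DEVS%% *}" "$MNT"
    # ... restore the same data set / snapshot here ...
    time btrfs balance start "$MNT"
    umount "$MNT"
done

The only number being compared is the elapsed time of the balance
itself; everything else stays constant between the two profiles.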
> iotop is all over the place, from 21MB/s writes to 527MB/s
Similar results with iotop here, with values ranging from 2MB/s up to
spikes of 100MB/s (which is about 150% of the measured streaming write
speed from the VM going straight to the virtual disk).
>
> Do both of you get something like this:
> [root@f23m ~]# dmesg | grep -i raid
> [    1.518682] raid6: sse2x1   gen()  4531 MB/s
> [    1.535663] raid6: sse2x1   xor()  3783 MB/s
> [    1.552683] raid6: sse2x2   gen() 10140 MB/s
> [    1.569658] raid6: sse2x2   xor()  7306 MB/s
> [    1.586673] raid6: sse2x4   gen() 11261 MB/s
> [    1.603683] raid6: sse2x4   xor()  7009 MB/s
> [    1.603685] raid6: using algorithm sse2x4 gen() 11261 MB/s
> [    1.603686] raid6: .... xor() 7009 MB/s, rmw enabled
> [    1.603687] raid6: using ssse3x2 recovery algorithm
My system picks avx2x4, which supposedly gets 6.6 GB/s on this
hardware, although I've never seen any RAID recovery, even on RAM
disks, manage that kind of computational throughput.
>
> [1] Did it 3 times
> 1m8
> 0m58
> 0m40
>
> [2] Did this multiple times
> 1m15s
> 0m55s
> 0m49s
> And then from that point all attempts were 2+m, but never more than
> 2m29s. I'm not sure why, but there were a lot of drop outs in iotop
> where it'd go to 0MB/s for a couple seconds. I captured some sysrq+t
> for this.
I saw similar drops in IO performance as well, although I didn't get
any traces for it.
>
> https://drive.google.com/open?id=0B_2Asp8DGjJ9SE5ZNTBGQUV1ZUk
>
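If I get a chance to re-run this, the rough plan for grabbing traces as
well is something like the following (the mount point is a placeholder
and the 30-second interval is an arbitrary choice):

#!/bin/bash
# Untested sketch: log iotop in batch mode and periodically dump task
# states with sysrq+t while the balance runs.
MNT=/mnt/test                          # placeholder mount point

iotop -obtqqq -d 5 > iotop-balance.log &
IOTOP=$!
btrfs balance start "$MNT" &
BALANCE=$!

while kill -0 "$BALANCE" 2>/dev/null; do
    echo t > /proc/sysrq-trigger       # task dump goes to the kernel log
    sleep 30
done

kill "$IOTOP"
dmesg > balance-task-dumps.txt

That should at least show where things are sitting when the throughput
drops to zero.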