* Very slow balance / btrfs-transaction
@ 2017-02-03 22:13 jb
  2017-02-03 23:25 ` Goldwyn Rodrigues
  2017-02-04  0:30 ` Jorg Bornschein
  0 siblings, 2 replies; 22+ messages in thread

From: jb @ 2017-02-03 22:13 UTC (permalink / raw)
To: linux-btrfs

Hi,

I'm currently running a balance (without any filters) on a 4-drive raid1 filesystem. The array contains 3 3TB drives and one 6TB drive; I'm running the rebalance because the 6TB drive recently replaced a 2TB drive.

I know that balance is not supposed to be a fast operation, but this one has now been running for ~6 days and it has managed to balance ~18% (754 out of about 4250 chunks balanced (755 considered), 82% left) -- so I expect it to take another ~4 weeks.

That seems excessively slow for ~8TiB of data.

Is this expected behavior? In case it's not: is there anything I can do to help debug it?

The 4 individual devices are bcache devices with currently no SSD cache partition attached; the bcache backing devices sit on top of LUKS-encrypted devices.

A few words about the history of this fs: it used to be a single-drive btrfs on top of a bcache partition with a 30GiB SSD cache (actively used for >1 year). During the last month, I gradually added devices (always with active bcaches). At some point, after adding the 4th device, I detached the bcache caching device, activated raid1 for data and metadata, and ran a rebalance (which was reasonably fast -- I don't remember how fast exactly, but probably <24h). The final steps that led to the current situation: I activated "nossd" and replaced the smallest device with "btrfs dev replace" (which was also reasonably fast, <12h).

Best & thanks,

   j

--

[joerg@dorsal ~]$ lsblk
NAME                MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                   8:0    0 111.8G  0 disk
├─sda1                8:1    0     1G  0 part  /boot
└─sda2                8:2    0 110.8G  0 part
  └─crypted         254:0    0 110.8G  0 crypt
    ├─ssd-root      254:1    0  72.8G  0 lvm   /
    ├─ssd-swap      254:2    0     8G  0 lvm   [SWAP]
    └─ssd-cache     254:3    0    30G  0 lvm
sdb                   8:16   0   2.7T  0 disk
└─sdb1                8:17   0   2.7T  0 part
  └─crypted-sdb     254:7    0   2.7T  0 crypt
    └─bcache2       253:2    0   2.7T  0 disk
sdc                   8:32   0   2.7T  0 disk
└─sdc1                8:33   0   2.7T  0 part
  └─crypted-sdc     254:4    0   2.7T  0 crypt
    └─bcache1       253:1    0   2.7T  0 disk
sdd                   8:48   0   2.7T  0 disk
└─sdd1                8:49   0   2.7T  0 part
  └─crypted-sdd     254:6    0   2.7T  0 crypt
    └─bcache0       253:0    0   2.7T  0 disk
sde                   8:64   0   5.5T  0 disk
└─sde1                8:65   0   5.5T  0 part
  └─crypted-sde     254:5    0   5.5T  0 crypt
    └─bcache3       253:3    0   5.5T  0 disk  /storage

--

[joerg@dorsal ~]$ sudo btrfs fi usage -h /storage/
Overall:
    Device size:                  13.64TiB
    Device allocated:              8.35TiB
    Device unallocated:            5.29TiB
    Device missing:                  0.00B
    Used:                          8.34TiB
    Free (estimated):              2.65TiB  (min: 2.65TiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB  (used: 15.77MiB)

Data,RAID1: Size:4.17TiB, Used:4.16TiB
   /dev/bcache0    2.38TiB
   /dev/bcache1    2.37TiB
   /dev/bcache2    2.38TiB
   /dev/bcache3    1.20TiB

Metadata,RAID1: Size:9.00GiB, Used:7.49GiB
   /dev/bcache1    8.00GiB
   /dev/bcache2    1.00GiB
   /dev/bcache3    9.00GiB

System,RAID1: Size:32.00MiB, Used:624.00KiB
   /dev/bcache1   32.00MiB
   /dev/bcache3   32.00MiB

Unallocated:
   /dev/bcache0  355.52GiB
   /dev/bcache1  356.49GiB
   /dev/bcache2  355.52GiB
   /dev/bcache3    4.25TiB

--

[joerg@dorsal ~]$ ps -xal | grep btrfs
1     0   227     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-worker]
1     0   229     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-worker-hi]
1     0   230     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-delalloc]
1     0   231     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-flush_del]
1     0   232     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-cache]
1     0   233     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-submit]
1     0   234     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-fixup]
1     0   235     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio]
1     0   236     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio-met]
1     0   237     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio-met]
1     0   238     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio-rai]
1     0   239     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio-rep]
1     0   240     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-rmw]
1     0   241     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio-wri]
1     0   242     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-freespace]
1     0   243     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-delayed-m]
1     0   244     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-readahead]
1     0   245     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-qgroup-re]
1     0   246     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-extent-re]
1     0   247     2  20   0      0     0 -      S    ?          0:00 [btrfs-cleaner]
1     0   248     2  20   0      0     0 -      S    ?          0:30 [btrfs-transacti]
1     0  2283     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-worker]
1     0  2285     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-worker-hi]
1     0  2286     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-delalloc]
1     0  2287     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-flush_del]
1     0  2288     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-cache]
1     0  2289     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-submit]
1     0  2290     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-fixup]
1     0  2291     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio]
1     0  2292     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio-met]
1     0  2293     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio-met]
1     0  2294     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio-rai]
1     0  2295     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio-rep]
1     0  2296     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-rmw]
1     0  2297     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-endio-wri]
1     0  2298     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-freespace]
1     0  2299     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-delayed-m]
1     0  2300     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-readahead]
1     0  2301     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-qgroup-re]
1     0  2302     2   0 -20      0     0 -      S<   ?          0:00 [btrfs-extent-re]
1     0  2303     2  20   0      0     0 -      D    ?          0:10 [btrfs-cleaner]
1     0  2304     2  20   0      0     0 -      D    ?       3247:49 [btrfs-transacti]
4     0 10316  9321  20   0  83604  4868 -      S+   pts/1      0:00 sudo btrfs balance start /storage/
4     0 10317 10316  20   0  15788  1044 -      R+   pts/1   5352:22 btrfs balance start /storage/
0  1000 17901 13293  20   0  12288  2136 -      R+   pts/5      0:00 grep --color=auto btrfs

^ permalink raw reply	[flat|nested] 22+ messages in thread
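The balance progress and the quota state that become central later in this thread can both be inspected with stock btrfs-progs commands. A minimal sketch, assuming the /storage mount point from the report above (command names are standard btrfs-progs; exact output wording varies by version):

    # Show how many chunks the running balance has processed so far
    sudo btrfs balance status /storage

    # If quotas (qgroups) are enabled this prints the qgroup table;
    # if they are disabled the command exits with an error
    sudo btrfs qgroup show /storage

    # Accumulated CPU time of the btrfs transaction kthread so far
    ps ax -o pid,time,comm | grep '[b]trfs-transacti'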
* Re: Very slow balance / btrfs-transaction 2017-02-03 22:13 Very slow balance / btrfs-transaction jb @ 2017-02-03 23:25 ` Goldwyn Rodrigues 2017-02-04 0:30 ` Jorg Bornschein 1 sibling, 0 replies; 22+ messages in thread From: Goldwyn Rodrigues @ 2017-02-03 23:25 UTC (permalink / raw) To: jb, linux-btrfs On 02/03/2017 04:13 PM, jb@capsec.org wrote: > Hi, > > > I'm currently running a balance (without any filters) on a 4 drives raid1 filesystem. The array contains 3 3TB drives and one 6TB drive; I'm running the rebalance because the 6TB drive recently replaced a 2TB drive. > > > I know that balance is not supposed to be a fast operation, but this one is now running for ~6 days and it managed to balance ~18% (754 out of about 4250 chunks balanced (755 considered), 82% left) -- so I expect it to take another ~4 weeks. > > That seems excessively slow for ~8TiB of data. > > > Is this expected behavior? In case it's not: Is there anything I can do to help debug it? Do you have quotas enabled? -- Goldwyn ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Very slow balance / btrfs-transaction
  2017-02-03 22:13 Very slow balance / btrfs-transaction jb
  2017-02-03 23:25 ` Goldwyn Rodrigues
@ 2017-02-04  0:30 ` Jorg Bornschein
  2017-02-04  1:07   ` Goldwyn Rodrigues
                     ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread

From: Jorg Bornschein @ 2017-02-04 0:30 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-btrfs

February 3, 2017 11:26 PM, "Goldwyn Rodrigues" <rgoldwyn@suse.com> wrote:

>> Hi,
>>
>> I'm currently running a balance (without any filters) on a 4 drives raid1 filesystem. The array
>> contains 3 3TB drives and one 6TB drive; I'm running the rebalance because the 6TB drive recently
>> replaced a 2TB drive.
>>
>> I know that balance is not supposed to be a fast operation, but this one is now running for ~6 days
>> and it managed to balance ~18% (754 out of about 4250 chunks balanced (755 considered), 82% left)
>> -- so I expect it to take another ~4 weeks.
>>
>> That seems excessively slow for ~8TiB of data.
>>
>> Is this expected behavior? In case it's not: Is there anything I can do to help debug it?
>
> Do you have quotas enabled?

I might have activated it when playing with "snapper" -- I remember using some quota command without knowing what it does.

How can I check whether it's active? Shall I just disable it with "btrfs quota disable"?

   j

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Very slow balance / btrfs-transaction 2017-02-04 0:30 ` Jorg Bornschein @ 2017-02-04 1:07 ` Goldwyn Rodrigues 2017-02-04 1:47 ` Jorg Bornschein 2017-02-04 20:50 ` Jorg Bornschein 2 siblings, 0 replies; 22+ messages in thread From: Goldwyn Rodrigues @ 2017-02-04 1:07 UTC (permalink / raw) To: Jorg Bornschein, linux-btrfs On 02/03/2017 06:30 PM, Jorg Bornschein wrote: > February 3, 2017 11:26 PM, "Goldwyn Rodrigues" <rgoldwyn@suse.com> wrote: > >>> Hi, >>> >>> I'm currently running a balance (without any filters) on a 4 drives raid1 filesystem. The array >>> contains 3 3TB drives and one 6TB drive; I'm running the rebalance because the 6TB drive recently >>> replaced a 2TB drive. >>> >>> I know that balance is not supposed to be a fast operation, but this one is now running for ~6 days >>> and it managed to balance ~18% (754 out of about 4250 chunks balanced (755 considered), 82% left) >>> -- so I expect it to take another ~4 weeks. >>> >>> That seems excessively slow for ~8TiB of data. >>> >>> Is this expected behavior? In case it's not: Is there anything I can do to help debug it? >> >> Do you have quotas enabled? > > > I might have activated it when playing with "snapper" -- I remember using some quota command without knowing what it does. > > How can I check its active? Shall I just disable it wit "btrfs quota disable"? > To check your quota limits: # btrfs qgroup show <mountpoint> To disable # btrfs quota disable <mountpoint> Yes, please check if disabling quotas makes a difference in execution time of btrfs balance. -- Goldwyn ^ permalink raw reply [flat|nested] 22+ messages in thread
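A running balance does not have to be cancelled to try this. One possible sequence, as a sketch only, assuming a reasonably recent btrfs-progs with balance pause/resume support and the /storage mount point from this thread:

    # Pause the running balance, drop quotas, then resume
    sudo btrfs balance pause /storage
    sudo btrfs quota disable /storage
    sudo btrfs balance resume /storage

    # Check that chunks are being processed again
    sudo btrfs balance status /storage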
* Re: Very slow balance / btrfs-transaction
  2017-02-04  0:30 ` Jorg Bornschein
  2017-02-04  1:07   ` Goldwyn Rodrigues
@ 2017-02-04  1:47   ` Jorg Bornschein
  2017-02-04  2:55     ` Lakshmipathi.G
                       ` (2 more replies)
  2017-02-04 20:50   ` Jorg Bornschein
  2 siblings, 3 replies; 22+ messages in thread

From: Jorg Bornschein @ 2017-02-04 1:47 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-btrfs

February 4, 2017 1:07 AM, "Goldwyn Rodrigues" <rgoldwyn@suse.de> wrote:

> On 02/03/2017 06:30 PM, Jorg Bornschein wrote:
>
>> February 3, 2017 11:26 PM, "Goldwyn Rodrigues" <rgoldwyn@suse.com> wrote:
>>
>> Hi,
>>
>> I'm currently running a balance (without any filters) on a 4 drives raid1 filesystem. The array
>> contains 3 3TB drives and one 6TB drive; I'm running the rebalance because the 6TB drive recently
>> replaced a 2TB drive.
>>
>> I know that balance is not supposed to be a fast operation, but this one is now running for ~6 days
>> and it managed to balance ~18% (754 out of about 4250 chunks balanced (755 considered), 82% left)
>> -- so I expect it to take another ~4 weeks.
>>
>> That seems excessively slow for ~8TiB of data.
>>
>> Is this expected behavior? In case it's not: Is there anything I can do to help debug it?
>>> Do you have quotas enabled?
>>
>> I might have activated it when playing with "snapper" -- I remember using some quota command
>> without knowing what it does.
>>
>> How can I check whether it's active? Shall I just disable it with "btrfs quota disable"?
>
> To check your quota limits:
> # btrfs qgroup show <mountpoint>
>
> To disable
> # btrfs quota disable <mountpoint>
>
> Yes, please check if disabling quotas makes a difference in execution
> time of btrfs balance.

Quota support was indeed active -- and it warned me that the qgroup data was inconsistent.

Disabling quotas had an immediate impact on balance throughput -- it's *much* faster now!
From a quick glance at iostat I would guess it's at least a factor 100 faster.

Should quota support generally be disabled during balances? Or did I somehow push my fs into a weird state where it triggered a slow path?

Thanks!

   j

^ permalink raw reply	[flat|nested] 22+ messages in thread
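If the qgroups were only enabled for snapper's space-aware cleanup, they can be switched back on once the balance has completed. A sketch, assuming standard btrfs-progs; note that the rescan itself can take a long time on a filesystem of this size:

    # After the balance has finished, re-enable quotas and rebuild the numbers
    sudo btrfs quota enable /storage
    sudo btrfs quota rescan -w /storage   # -w waits until the rescan finishes

    # Verify the qgroup numbers look sane again
    sudo btrfs qgroup show /storage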
* Re: Very slow balance / btrfs-transaction
  2017-02-04  1:47 ` Jorg Bornschein
@ 2017-02-04  2:55   ` Lakshmipathi.G
  2017-02-04  8:22     ` Duncan
  0 siblings, 1 reply; 22+ messages in thread

From: Lakshmipathi.G @ 2017-02-04 2:55 UTC (permalink / raw)
To: Jorg Bornschein; +Cc: Goldwyn Rodrigues, btrfs

> Should quota support generally be disabled during balances?

If this is true and quota impacts balance throughput, at least there should be an alert message like "Running balance with quota will affect performance" or similar before starting.

----
Cheers,
Lakshmipathi.G

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Very slow balance / btrfs-transaction 2017-02-04 2:55 ` Lakshmipathi.G @ 2017-02-04 8:22 ` Duncan 0 siblings, 0 replies; 22+ messages in thread From: Duncan @ 2017-02-04 8:22 UTC (permalink / raw) To: linux-btrfs Lakshmipathi.G posted on Sat, 04 Feb 2017 08:25:04 +0530 as excerpted: >>Should quota support generally be disabled during balances? > > If this true and quota impacts balance throughput, at-least there should > an alert message like "Running Balance with quota will affect > performance" or similar before starting. The problem isn't that, exactly, tho that's part of it. The problem with quotas is that the feature itself isn't yet mature. At least until very recently, and possibly still, quotas couldn't be depended upon to work correctly (various not entirely uncommon corner-cases would trigger negative numbers, etc), and even when they do work correctly, they simply don't scale well in combination with balance, check, etc -- that 10X difference isn't uncommon. So my recommendation for quotas has been and remains, unless you're actively working with the devs on improving them, it's probably better to keep them disabled. Either you actually need quota functionality or you don't. If you do, it's better to use a mature filesystem where quotas are a mature feature that works dependably. If you don't, just leave the feature off, as it continues to simply not be worth the troubles and scaling issues it triggers. IOW, btrfs quotas might work and scale well some day, but that day isn't today, and it's not going to be tomorrow or next kernel cycle, either. It's going to take awhile, and you'll be much happier with btrfs in the mean time if you don't have them enabled. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Very slow balance / btrfs-transaction 2017-02-04 1:47 ` Jorg Bornschein 2017-02-04 2:55 ` Lakshmipathi.G @ 2017-02-06 1:45 ` Qu Wenruo 2017-02-06 16:09 ` Goldwyn Rodrigues 2017-02-06 9:14 ` Jorg Bornschein 2 siblings, 1 reply; 22+ messages in thread From: Qu Wenruo @ 2017-02-06 1:45 UTC (permalink / raw) To: Jorg Bornschein, Goldwyn Rodrigues, linux-btrfs At 02/04/2017 09:47 AM, Jorg Bornschein wrote: > February 4, 2017 1:07 AM, "Goldwyn Rodrigues" <rgoldwyn@suse.de> wrote: > >> On 02/03/2017 06:30 PM, Jorg Bornschein wrote: >> >>> February 3, 2017 11:26 PM, "Goldwyn Rodrigues" <rgoldwyn@suse.com> wrote: >>> >>> Hi, >>> >>> I'm currently running a balance (without any filters) on a 4 drives raid1 filesystem. The array >>> contains 3 3TB drives and one 6TB drive; I'm running the rebalance because the 6TB drive recently >>> replaced a 2TB drive. >>> >>> I know that balance is not supposed to be a fast operation, but this one is now running for ~6 days >>> and it managed to balance ~18% (754 out of about 4250 chunks balanced (755 considered), 82% left) >>> -- so I expect it to take another ~4 weeks. >>> >>> That seems excessively slow for ~8TiB of data. >>> >>> Is this expected behavior? In case it's not: Is there anything I can do to help debug it? >>>> Do you have quotas enabled? >>> >>> I might have activated it when playing with "snapper" -- I remember using some quota command >>> without knowing what it does. >>> >>> How can I check its active? Shall I just disable it wit "btrfs quota disable"? >> >> To check your quota limits: >> # btrfs qgroup show <mountpoint> >> >> To disable >> # btrfs quota disable <mountpoint> >> >> Yes, please check if disabling quotas makes a difference in execution >> time of btrfs balance. > > > Quata support was indeed active -- and it warned me that the qroup data was inconsistent. > > Disabling quotas had an immediate impact on balance throughput -- it's *much* faster now! > From a quick glance at iostat I would guess it's at least a factor 100 faster. > > > Should quota support generally be disabled during balances? Or did I somehow push my fs into a weired state where it triggered a slow-path? > > > > Thanks! > > j Would you please provide the kernel version? v4.9 introduced a bad fix for qgroup balance, which doesn't completely fix qgroup bytes leaking, but also hugely slow down the balance process: commit 62b99540a1d91e46422f0e04de50fc723812c421 Author: Qu Wenruo <quwenruo@cn.fujitsu.com> Date: Mon Aug 15 10:36:51 2016 +0800 btrfs: relocation: Fix leaking qgroups numbers on data extents Sorry for that. And in v4.10, a better method is applied to fix the byte leaking problem, and should be a little faster than previous one. commit 824d8dff8846533c9f1f9b1eabb0c03959e989ca Author: Qu Wenruo <quwenruo@cn.fujitsu.com> Date: Tue Oct 18 09:31:29 2016 +0800 btrfs: qgroup: Fix qgroup data leaking by using subtree tracing However, using balance with qgroup is still slower than balance without qgroup, the root fix needs us to rework current backref iteration. Thanks, Qu > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 22+ messages in thread
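For anyone trying to work out whether their kernel carries the two commits cited above, one way is to ask git directly. A sketch, assuming a local clone of the mainline kernel tree (the commit IDs are taken from the message above):

    # Release tags that already contain the v4.9-era fix
    git tag --contains 62b99540a1d91e46422f0e04de50fc723812c421 | head

    # ...and the reworked v4.10 fix
    git tag --contains 824d8dff8846533c9f1f9b1eabb0c03959e989ca | head

    # Kernel actually running on the affected machine
    uname -r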
* Re: Very slow balance / btrfs-transaction 2017-02-06 1:45 ` Qu Wenruo @ 2017-02-06 16:09 ` Goldwyn Rodrigues 2017-02-07 0:22 ` Qu Wenruo 0 siblings, 1 reply; 22+ messages in thread From: Goldwyn Rodrigues @ 2017-02-06 16:09 UTC (permalink / raw) To: Qu Wenruo, Jorg Bornschein, linux-btrfs Hi Qu, On 02/05/2017 07:45 PM, Qu Wenruo wrote: > > > At 02/04/2017 09:47 AM, Jorg Bornschein wrote: >> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" <rgoldwyn@suse.de> wrote: <snipped> >> >> >> Quata support was indeed active -- and it warned me that the qroup >> data was inconsistent. >> >> Disabling quotas had an immediate impact on balance throughput -- it's >> *much* faster now! >> From a quick glance at iostat I would guess it's at least a factor 100 >> faster. >> >> >> Should quota support generally be disabled during balances? Or did I >> somehow push my fs into a weired state where it triggered a slow-path? >> >> >> >> Thanks! >> >> j > > Would you please provide the kernel version? > > v4.9 introduced a bad fix for qgroup balance, which doesn't completely > fix qgroup bytes leaking, but also hugely slow down the balance process: > > commit 62b99540a1d91e46422f0e04de50fc723812c421 > Author: Qu Wenruo <quwenruo@cn.fujitsu.com> > Date: Mon Aug 15 10:36:51 2016 +0800 > > btrfs: relocation: Fix leaking qgroups numbers on data extents > > Sorry for that. > > And in v4.10, a better method is applied to fix the byte leaking > problem, and should be a little faster than previous one. > > commit 824d8dff8846533c9f1f9b1eabb0c03959e989ca > Author: Qu Wenruo <quwenruo@cn.fujitsu.com> > Date: Tue Oct 18 09:31:29 2016 +0800 > > btrfs: qgroup: Fix qgroup data leaking by using subtree tracing > > > However, using balance with qgroup is still slower than balance without > qgroup, the root fix needs us to rework current backref iteration. > This patch has made the btrfs balance performance worse. The balance task has become more CPU intensive compared to earlier and takes longer to complete, besides hogging resources. While correctness is important, we need to figure out how this can be made more efficient. -- Goldwyn ^ permalink raw reply [flat|nested] 22+ messages in thread
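A rough way to put numbers on that claim is to watch the accumulated CPU time of the transaction kthread and the balance command while a balance runs, mirroring the ps output in the original report; a sketch only:

    # Re-run a few minutes apart and compare the TIME column
    ps ax -o pid,time,args | grep '[b]trfs'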
* Re: Very slow balance / btrfs-transaction 2017-02-06 16:09 ` Goldwyn Rodrigues @ 2017-02-07 0:22 ` Qu Wenruo 2017-02-07 15:55 ` Filipe Manana 0 siblings, 1 reply; 22+ messages in thread From: Qu Wenruo @ 2017-02-07 0:22 UTC (permalink / raw) To: Goldwyn Rodrigues, Jorg Bornschein, linux-btrfs At 02/07/2017 12:09 AM, Goldwyn Rodrigues wrote: > > Hi Qu, > > On 02/05/2017 07:45 PM, Qu Wenruo wrote: >> >> >> At 02/04/2017 09:47 AM, Jorg Bornschein wrote: >>> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" <rgoldwyn@suse.de> wrote: > > <snipped> > >>> >>> >>> Quata support was indeed active -- and it warned me that the qroup >>> data was inconsistent. >>> >>> Disabling quotas had an immediate impact on balance throughput -- it's >>> *much* faster now! >>> From a quick glance at iostat I would guess it's at least a factor 100 >>> faster. >>> >>> >>> Should quota support generally be disabled during balances? Or did I >>> somehow push my fs into a weired state where it triggered a slow-path? >>> >>> >>> >>> Thanks! >>> >>> j >> >> Would you please provide the kernel version? >> >> v4.9 introduced a bad fix for qgroup balance, which doesn't completely >> fix qgroup bytes leaking, but also hugely slow down the balance process: >> >> commit 62b99540a1d91e46422f0e04de50fc723812c421 >> Author: Qu Wenruo <quwenruo@cn.fujitsu.com> >> Date: Mon Aug 15 10:36:51 2016 +0800 >> >> btrfs: relocation: Fix leaking qgroups numbers on data extents >> >> Sorry for that. >> >> And in v4.10, a better method is applied to fix the byte leaking >> problem, and should be a little faster than previous one. >> >> commit 824d8dff8846533c9f1f9b1eabb0c03959e989ca >> Author: Qu Wenruo <quwenruo@cn.fujitsu.com> >> Date: Tue Oct 18 09:31:29 2016 +0800 >> >> btrfs: qgroup: Fix qgroup data leaking by using subtree tracing >> >> >> However, using balance with qgroup is still slower than balance without >> qgroup, the root fix needs us to rework current backref iteration. >> > > This patch has made the btrfs balance performance worse. The balance > task has become more CPU intensive compared to earlier and takes longer > to complete, besides hogging resources. While correctness is important, > we need to figure out how this can be made more efficient. > The cause is already known. It's find_parent_node() which takes most of the time to find all referencer of an extent. And it's also the cause for FIEMAP softlockup (fixed in recent release by early quit). The biggest problem is, current find_parent_node() uses list to iterate, which is quite slow especially it's done in a loop. In real world find_parent_node() is about O(n^3). We can either improve find_parent_node() by using rb_tree, or introduce some cache for find_parent_node(). IIRC SUSE guys(maybe Jeff?) are working on it with the first method, but I didn't hear anything about it recently. Thanks, Qu ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Very slow balance / btrfs-transaction 2017-02-07 0:22 ` Qu Wenruo @ 2017-02-07 15:55 ` Filipe Manana 2017-02-08 0:39 ` Qu Wenruo 0 siblings, 1 reply; 22+ messages in thread From: Filipe Manana @ 2017-02-07 15:55 UTC (permalink / raw) To: Qu Wenruo; +Cc: Goldwyn Rodrigues, Jorg Bornschein, linux-btrfs@vger.kernel.org On Tue, Feb 7, 2017 at 12:22 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote: > > > At 02/07/2017 12:09 AM, Goldwyn Rodrigues wrote: >> >> >> Hi Qu, >> >> On 02/05/2017 07:45 PM, Qu Wenruo wrote: >>> >>> >>> >>> At 02/04/2017 09:47 AM, Jorg Bornschein wrote: >>>> >>>> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" <rgoldwyn@suse.de> wrote: >> >> >> <snipped> >> >>>> >>>> >>>> Quata support was indeed active -- and it warned me that the qroup >>>> data was inconsistent. >>>> >>>> Disabling quotas had an immediate impact on balance throughput -- it's >>>> *much* faster now! >>>> From a quick glance at iostat I would guess it's at least a factor 100 >>>> faster. >>>> >>>> >>>> Should quota support generally be disabled during balances? Or did I >>>> somehow push my fs into a weired state where it triggered a slow-path? >>>> >>>> >>>> >>>> Thanks! >>>> >>>> j >>> >>> >>> Would you please provide the kernel version? >>> >>> v4.9 introduced a bad fix for qgroup balance, which doesn't completely >>> fix qgroup bytes leaking, but also hugely slow down the balance process: >>> >>> commit 62b99540a1d91e46422f0e04de50fc723812c421 >>> Author: Qu Wenruo <quwenruo@cn.fujitsu.com> >>> Date: Mon Aug 15 10:36:51 2016 +0800 >>> >>> btrfs: relocation: Fix leaking qgroups numbers on data extents >>> >>> Sorry for that. >>> >>> And in v4.10, a better method is applied to fix the byte leaking >>> problem, and should be a little faster than previous one. >>> >>> commit 824d8dff8846533c9f1f9b1eabb0c03959e989ca >>> Author: Qu Wenruo <quwenruo@cn.fujitsu.com> >>> Date: Tue Oct 18 09:31:29 2016 +0800 >>> >>> btrfs: qgroup: Fix qgroup data leaking by using subtree tracing >>> >>> >>> However, using balance with qgroup is still slower than balance without >>> qgroup, the root fix needs us to rework current backref iteration. >>> >> >> This patch has made the btrfs balance performance worse. The balance >> task has become more CPU intensive compared to earlier and takes longer >> to complete, besides hogging resources. While correctness is important, >> we need to figure out how this can be made more efficient. >> > The cause is already known. > > It's find_parent_node() which takes most of the time to find all referencer > of an extent. > > And it's also the cause for FIEMAP softlockup (fixed in recent release by > early quit). > > The biggest problem is, current find_parent_node() uses list to iterate, > which is quite slow especially it's done in a loop. > In real world find_parent_node() is about O(n^3). > We can either improve find_parent_node() by using rb_tree, or introduce some > cache for find_parent_node(). Even if anyone is able to reduce that function's complexity from O(n^3) down to lets say O(n^2) or O(n log n) for example, the current implementation of qgroups will always be a problem. The real problem is that this more recent rework of qgroups does all this accounting inside the critical section of a transaction - blocking any other tasks that want to start a new transaction or attempt to join the current transaction. 
Not to mention that on systems with small amounts of memory (2Gb or 4Gb from what I've seen from user reports) we also OOM due this allocation of struct btrfs_qgroup_extent_record per delayed data reference head, that are used for that accounting phase in the critical section of a transaction commit. Let's face it and be realistic, even if someone manages to make find_parent_node() much much better, like O(n) for example, it will always be a problem due to the reasons mentioned before. Many extents touched per transaction and many subvolumes/snapshots, will always expose that root problem - doing the accounting in the transaction commit critical section. > > > IIRC SUSE guys(maybe Jeff?) are working on it with the first method, but I > didn't hear anything about it recently. > > Thanks, > Qu > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Filipe David Manana, "People will forget what you said, people will forget what you did, but people will never forget how you made them feel." ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Very slow balance / btrfs-transaction 2017-02-07 15:55 ` Filipe Manana @ 2017-02-08 0:39 ` Qu Wenruo 2017-02-08 13:56 ` Filipe Manana 0 siblings, 1 reply; 22+ messages in thread From: Qu Wenruo @ 2017-02-08 0:39 UTC (permalink / raw) To: fdmanana; +Cc: Goldwyn Rodrigues, Jorg Bornschein, linux-btrfs@vger.kernel.org At 02/07/2017 11:55 PM, Filipe Manana wrote: > On Tue, Feb 7, 2017 at 12:22 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote: >> >> >> At 02/07/2017 12:09 AM, Goldwyn Rodrigues wrote: >>> >>> >>> Hi Qu, >>> >>> On 02/05/2017 07:45 PM, Qu Wenruo wrote: >>>> >>>> >>>> >>>> At 02/04/2017 09:47 AM, Jorg Bornschein wrote: >>>>> >>>>> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" <rgoldwyn@suse.de> wrote: >>> >>> >>> <snipped> >>> >>>>> >>>>> >>>>> Quata support was indeed active -- and it warned me that the qroup >>>>> data was inconsistent. >>>>> >>>>> Disabling quotas had an immediate impact on balance throughput -- it's >>>>> *much* faster now! >>>>> From a quick glance at iostat I would guess it's at least a factor 100 >>>>> faster. >>>>> >>>>> >>>>> Should quota support generally be disabled during balances? Or did I >>>>> somehow push my fs into a weired state where it triggered a slow-path? >>>>> >>>>> >>>>> >>>>> Thanks! >>>>> >>>>> j >>>> >>>> >>>> Would you please provide the kernel version? >>>> >>>> v4.9 introduced a bad fix for qgroup balance, which doesn't completely >>>> fix qgroup bytes leaking, but also hugely slow down the balance process: >>>> >>>> commit 62b99540a1d91e46422f0e04de50fc723812c421 >>>> Author: Qu Wenruo <quwenruo@cn.fujitsu.com> >>>> Date: Mon Aug 15 10:36:51 2016 +0800 >>>> >>>> btrfs: relocation: Fix leaking qgroups numbers on data extents >>>> >>>> Sorry for that. >>>> >>>> And in v4.10, a better method is applied to fix the byte leaking >>>> problem, and should be a little faster than previous one. >>>> >>>> commit 824d8dff8846533c9f1f9b1eabb0c03959e989ca >>>> Author: Qu Wenruo <quwenruo@cn.fujitsu.com> >>>> Date: Tue Oct 18 09:31:29 2016 +0800 >>>> >>>> btrfs: qgroup: Fix qgroup data leaking by using subtree tracing >>>> >>>> >>>> However, using balance with qgroup is still slower than balance without >>>> qgroup, the root fix needs us to rework current backref iteration. >>>> >>> >>> This patch has made the btrfs balance performance worse. The balance >>> task has become more CPU intensive compared to earlier and takes longer >>> to complete, besides hogging resources. While correctness is important, >>> we need to figure out how this can be made more efficient. >>> >> The cause is already known. >> >> It's find_parent_node() which takes most of the time to find all referencer >> of an extent. >> >> And it's also the cause for FIEMAP softlockup (fixed in recent release by >> early quit). >> >> The biggest problem is, current find_parent_node() uses list to iterate, >> which is quite slow especially it's done in a loop. >> In real world find_parent_node() is about O(n^3). >> We can either improve find_parent_node() by using rb_tree, or introduce some >> cache for find_parent_node(). > > Even if anyone is able to reduce that function's complexity from > O(n^3) down to lets say O(n^2) or O(n log n) for example, the current > implementation of qgroups will always be a problem. The real problem > is that this more recent rework of qgroups does all this accounting > inside the critical section of a transaction - blocking any other > tasks that want to start a new transaction or attempt to join the > current transaction. 
Not to mention that on systems with small amounts > of memory (2Gb or 4Gb from what I've seen from user reports) we also > OOM due this allocation of struct btrfs_qgroup_extent_record per > delayed data reference head, that are used for that accounting phase > in the critical section of a transaction commit. > > Let's face it and be realistic, even if someone manages to make > find_parent_node() much much better, like O(n) for example, it will > always be a problem due to the reasons mentioned before. Many extents > touched per transaction and many subvolumes/snapshots, will always > expose that root problem - doing the accounting in the transaction > commit critical section. You must accept the fact that we must call find_parent_node() at least twice to get correct owner modification for each touched extent. Or qgroup number will never be correct. One for old_roots by searching commit root, and one for new_roots by searching current root. You can call find_parent_node() as many time as you like, but that's just wasting your CPU time. Only the final find_parent_node() will determine new_roots for that extent, and there is no better timing than commit_transaction(). Or you can wasting more time calling find_parent_node() every time you touched a extent, saving one find_parent_node() in commit_transaction() with the cost of more find_parent_node() in other place. Is that what you want? I can move the find_parent_node() for old_roots out of commit_transaction(). But that will only reduce 50% of the time spent on commit_transaction(). Compared to O(n^3) find_parent_node(), that's not the determining fact even. Thanks, Qu > >> >> >> IIRC SUSE guys(maybe Jeff?) are working on it with the first method, but I >> didn't hear anything about it recently. >> >> Thanks, >> Qu >> >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Very slow balance / btrfs-transaction 2017-02-08 0:39 ` Qu Wenruo @ 2017-02-08 13:56 ` Filipe Manana 2017-02-09 1:13 ` Qu Wenruo 0 siblings, 1 reply; 22+ messages in thread From: Filipe Manana @ 2017-02-08 13:56 UTC (permalink / raw) To: Qu Wenruo; +Cc: Goldwyn Rodrigues, Jorg Bornschein, linux-btrfs@vger.kernel.org On Wed, Feb 8, 2017 at 12:39 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote: > > > At 02/07/2017 11:55 PM, Filipe Manana wrote: >> >> On Tue, Feb 7, 2017 at 12:22 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> >> wrote: >>> >>> >>> >>> At 02/07/2017 12:09 AM, Goldwyn Rodrigues wrote: >>>> >>>> >>>> >>>> Hi Qu, >>>> >>>> On 02/05/2017 07:45 PM, Qu Wenruo wrote: >>>>> >>>>> >>>>> >>>>> >>>>> At 02/04/2017 09:47 AM, Jorg Bornschein wrote: >>>>>> >>>>>> >>>>>> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" <rgoldwyn@suse.de> >>>>>> wrote: >>>> >>>> >>>> >>>> <snipped> >>>> >>>>>> >>>>>> >>>>>> Quata support was indeed active -- and it warned me that the qroup >>>>>> data was inconsistent. >>>>>> >>>>>> Disabling quotas had an immediate impact on balance throughput -- it's >>>>>> *much* faster now! >>>>>> From a quick glance at iostat I would guess it's at least a factor 100 >>>>>> faster. >>>>>> >>>>>> >>>>>> Should quota support generally be disabled during balances? Or did I >>>>>> somehow push my fs into a weired state where it triggered a slow-path? >>>>>> >>>>>> >>>>>> >>>>>> Thanks! >>>>>> >>>>>> j >>>>> >>>>> >>>>> >>>>> Would you please provide the kernel version? >>>>> >>>>> v4.9 introduced a bad fix for qgroup balance, which doesn't completely >>>>> fix qgroup bytes leaking, but also hugely slow down the balance >>>>> process: >>>>> >>>>> commit 62b99540a1d91e46422f0e04de50fc723812c421 >>>>> Author: Qu Wenruo <quwenruo@cn.fujitsu.com> >>>>> Date: Mon Aug 15 10:36:51 2016 +0800 >>>>> >>>>> btrfs: relocation: Fix leaking qgroups numbers on data extents >>>>> >>>>> Sorry for that. >>>>> >>>>> And in v4.10, a better method is applied to fix the byte leaking >>>>> problem, and should be a little faster than previous one. >>>>> >>>>> commit 824d8dff8846533c9f1f9b1eabb0c03959e989ca >>>>> Author: Qu Wenruo <quwenruo@cn.fujitsu.com> >>>>> Date: Tue Oct 18 09:31:29 2016 +0800 >>>>> >>>>> btrfs: qgroup: Fix qgroup data leaking by using subtree tracing >>>>> >>>>> >>>>> However, using balance with qgroup is still slower than balance without >>>>> qgroup, the root fix needs us to rework current backref iteration. >>>>> >>>> >>>> This patch has made the btrfs balance performance worse. The balance >>>> task has become more CPU intensive compared to earlier and takes longer >>>> to complete, besides hogging resources. While correctness is important, >>>> we need to figure out how this can be made more efficient. >>>> >>> The cause is already known. >>> >>> It's find_parent_node() which takes most of the time to find all >>> referencer >>> of an extent. >>> >>> And it's also the cause for FIEMAP softlockup (fixed in recent release by >>> early quit). >>> >>> The biggest problem is, current find_parent_node() uses list to iterate, >>> which is quite slow especially it's done in a loop. >>> In real world find_parent_node() is about O(n^3). >>> We can either improve find_parent_node() by using rb_tree, or introduce >>> some >>> cache for find_parent_node(). >> >> >> Even if anyone is able to reduce that function's complexity from >> O(n^3) down to lets say O(n^2) or O(n log n) for example, the current >> implementation of qgroups will always be a problem. 
The real problem >> is that this more recent rework of qgroups does all this accounting >> inside the critical section of a transaction - blocking any other >> tasks that want to start a new transaction or attempt to join the >> current transaction. Not to mention that on systems with small amounts >> of memory (2Gb or 4Gb from what I've seen from user reports) we also >> OOM due this allocation of struct btrfs_qgroup_extent_record per >> delayed data reference head, that are used for that accounting phase >> in the critical section of a transaction commit. >> >> Let's face it and be realistic, even if someone manages to make >> find_parent_node() much much better, like O(n) for example, it will >> always be a problem due to the reasons mentioned before. Many extents >> touched per transaction and many subvolumes/snapshots, will always >> expose that root problem - doing the accounting in the transaction >> commit critical section. > > > You must accept the fact that we must call find_parent_node() at least twice > to get correct owner modification for each touched extent. > Or qgroup number will never be correct. > > One for old_roots by searching commit root, and one for new_roots by > searching current root. > > You can call find_parent_node() as many time as you like, but that's just > wasting your CPU time. > > Only the final find_parent_node() will determine new_roots for that extent, > and there is no better timing than commit_transaction(). You're missing my point. My point is not about needing to call find_parent_nodes() nor how many times to call it, or whether it's needed or not. My point is about doing expensive things inside the critical section of a transaction commit, which leads not only to low performance but getting a system becoming unresponsive and with too high latency - and this is not theory or speculation, there are upstream reports about this as well as several in suse's bugzilla, all caused when qgroups are enabled on 4.2+ kernels (when the last qgroups major changes landed). Judging from that code and from your reply to this and other threads it seems you didn't understand the consequences of doing all that accounting stuff inside the critical section of a transaction commit. Forget find_parent_nodes() for a moment, yes it has problems, but it's not what I'm trying to make you understand. Just look at btrfs_qgroup_account_extents(), called within the critical section - it iterates all elements of a red black tree, and each element corresponds to some data extent allocated in the current transaction - if we have thousands, tens of thousands, or more, even if whatever the loop calls had an awesome complexity of O(1) or O(log N) it would still be bad, exactly because it's blocking future transactions to start and tasks from joining the current transaction. CPU time and memory consumption (used for those struct btrfs_qgroup_extent_record) are also concerns, but to a smaller extent imo. > > Or you can wasting more time calling find_parent_node() every time you > touched a extent, saving one find_parent_node() in commit_transaction() with > the cost of more find_parent_node() in other place. > Is that what you want? > > I can move the find_parent_node() for old_roots out of commit_transaction(). > But that will only reduce 50% of the time spent on commit_transaction(). > > Compared to O(n^3) find_parent_node(), that's not the determining fact even. > > Thanks, > Qu > > >> >>> >>> >>> IIRC SUSE guys(maybe Jeff?) 
are working on it with the first method, but >>> I >>> didn't hear anything about it recently. >>> >>> Thanks, >>> Qu >>> >>> >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> >> > > -- Filipe David Manana, "People will forget what you said, people will forget what you did, but people will never forget how you made them feel." ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Very slow balance / btrfs-transaction 2017-02-08 13:56 ` Filipe Manana @ 2017-02-09 1:13 ` Qu Wenruo 0 siblings, 0 replies; 22+ messages in thread From: Qu Wenruo @ 2017-02-09 1:13 UTC (permalink / raw) To: fdmanana; +Cc: Goldwyn Rodrigues, Jorg Bornschein, linux-btrfs@vger.kernel.org At 02/08/2017 09:56 PM, Filipe Manana wrote: > On Wed, Feb 8, 2017 at 12:39 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote: >> >> >> At 02/07/2017 11:55 PM, Filipe Manana wrote: >>> >>> On Tue, Feb 7, 2017 at 12:22 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> >>> wrote: >>>> >>>> >>>> >>>> At 02/07/2017 12:09 AM, Goldwyn Rodrigues wrote: >>>>> >>>>> >>>>> >>>>> Hi Qu, >>>>> >>>>> On 02/05/2017 07:45 PM, Qu Wenruo wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> At 02/04/2017 09:47 AM, Jorg Bornschein wrote: >>>>>>> >>>>>>> >>>>>>> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" <rgoldwyn@suse.de> >>>>>>> wrote: >>>>> >>>>> >>>>> >>>>> <snipped> >>>>> >>>>>>> >>>>>>> >>>>>>> Quata support was indeed active -- and it warned me that the qroup >>>>>>> data was inconsistent. >>>>>>> >>>>>>> Disabling quotas had an immediate impact on balance throughput -- it's >>>>>>> *much* faster now! >>>>>>> From a quick glance at iostat I would guess it's at least a factor 100 >>>>>>> faster. >>>>>>> >>>>>>> >>>>>>> Should quota support generally be disabled during balances? Or did I >>>>>>> somehow push my fs into a weired state where it triggered a slow-path? >>>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> j >>>>>> >>>>>> >>>>>> >>>>>> Would you please provide the kernel version? >>>>>> >>>>>> v4.9 introduced a bad fix for qgroup balance, which doesn't completely >>>>>> fix qgroup bytes leaking, but also hugely slow down the balance >>>>>> process: >>>>>> >>>>>> commit 62b99540a1d91e46422f0e04de50fc723812c421 >>>>>> Author: Qu Wenruo <quwenruo@cn.fujitsu.com> >>>>>> Date: Mon Aug 15 10:36:51 2016 +0800 >>>>>> >>>>>> btrfs: relocation: Fix leaking qgroups numbers on data extents >>>>>> >>>>>> Sorry for that. >>>>>> >>>>>> And in v4.10, a better method is applied to fix the byte leaking >>>>>> problem, and should be a little faster than previous one. >>>>>> >>>>>> commit 824d8dff8846533c9f1f9b1eabb0c03959e989ca >>>>>> Author: Qu Wenruo <quwenruo@cn.fujitsu.com> >>>>>> Date: Tue Oct 18 09:31:29 2016 +0800 >>>>>> >>>>>> btrfs: qgroup: Fix qgroup data leaking by using subtree tracing >>>>>> >>>>>> >>>>>> However, using balance with qgroup is still slower than balance without >>>>>> qgroup, the root fix needs us to rework current backref iteration. >>>>>> >>>>> >>>>> This patch has made the btrfs balance performance worse. The balance >>>>> task has become more CPU intensive compared to earlier and takes longer >>>>> to complete, besides hogging resources. While correctness is important, >>>>> we need to figure out how this can be made more efficient. >>>>> >>>> The cause is already known. >>>> >>>> It's find_parent_node() which takes most of the time to find all >>>> referencer >>>> of an extent. >>>> >>>> And it's also the cause for FIEMAP softlockup (fixed in recent release by >>>> early quit). >>>> >>>> The biggest problem is, current find_parent_node() uses list to iterate, >>>> which is quite slow especially it's done in a loop. >>>> In real world find_parent_node() is about O(n^3). >>>> We can either improve find_parent_node() by using rb_tree, or introduce >>>> some >>>> cache for find_parent_node(). 
>>> >>> >>> Even if anyone is able to reduce that function's complexity from >>> O(n^3) down to lets say O(n^2) or O(n log n) for example, the current >>> implementation of qgroups will always be a problem. The real problem >>> is that this more recent rework of qgroups does all this accounting >>> inside the critical section of a transaction - blocking any other >>> tasks that want to start a new transaction or attempt to join the >>> current transaction. Not to mention that on systems with small amounts >>> of memory (2Gb or 4Gb from what I've seen from user reports) we also >>> OOM due this allocation of struct btrfs_qgroup_extent_record per >>> delayed data reference head, that are used for that accounting phase >>> in the critical section of a transaction commit. >>> >>> Let's face it and be realistic, even if someone manages to make >>> find_parent_node() much much better, like O(n) for example, it will >>> always be a problem due to the reasons mentioned before. Many extents >>> touched per transaction and many subvolumes/snapshots, will always >>> expose that root problem - doing the accounting in the transaction >>> commit critical section. >> >> >> You must accept the fact that we must call find_parent_node() at least twice >> to get correct owner modification for each touched extent. >> Or qgroup number will never be correct. >> >> One for old_roots by searching commit root, and one for new_roots by >> searching current root. >> >> You can call find_parent_node() as many time as you like, but that's just >> wasting your CPU time. >> >> Only the final find_parent_node() will determine new_roots for that extent, >> and there is no better timing than commit_transaction(). > > You're missing my point. > > My point is not about needing to call find_parent_nodes() nor how many > times to call it, or whether it's needed or not. My point is about > doing expensive things inside the critical section of a transaction > commit, which leads not only to low performance but getting a system > becoming unresponsive and with too high latency - and this is not > theory or speculation, there are upstream reports about this as well > as several in suse's bugzilla, all caused when qgroups are enabled on > 4.2+ kernels (when the last qgroups major changes landed). > > Judging from that code and from your reply to this and other threads > it seems you didn't understand the consequences of doing all that > accounting stuff inside the critical section of a transaction commit. NO, I know what you're talking about. Or I won't send the patch to move half of the find_all_roots() call out of commit_trans(). (OK, I just like refer to commit_trans() to its critical section, as quick exit or other part doesn't make much impact in this context) While it seems that you still don't understand the necessity to call find_all_roots() in critical section. It's the base stone of qgroup, or we get back to the qgroup mismatch days. As I already mentioned, only before we switch fs commit roots, we can avoid calling extra expensive find_all_roots() and get correct new_roots. That's to say, at least one find_all_roots() should be called at critical section for extent modified extent, to keep qgroup correct. If you could have any better solution to keep qgroup correct and avoid calling find_all_roots() in critical section, I'm happy to listen. > > Forget find_parent_nodes() for a moment, yes it has problems, but it's > not what I'm trying to make you understand. 
Just look at > btrfs_qgroup_account_extents(), called within the critical section - > it iterates all elements of a red black tree, and each element > corresponds to some data extent allocated in the current transaction - > if we have thousands, tens of thousands, or more, even if whatever the > loop calls had an awesome complexity of O(1) or O(log N) it would > still be bad, exactly because it's blocking future transactions to > start and tasks from joining the current transaction. CPU time and > memory consumption (used for those struct btrfs_qgroup_extent_record) > are also concerns, but to a smaller extent imo. That's another problem. For worst case, we could limit the number of delayed ref head before triggering a transaction commit. But I still doubt if qgroup extent record is the cause for OOM. Memory consumption of qgroup_extent_record is quite small, it's only 48 bytes for one extent. (And delayed ref head is much larger than qgroup_extent_record) While for one extent, its minimum size is 4K, overhead would be at most 1%. IIRC before we hit OOM, memory pressure should trigger commit_transaction(), and freeing all the qgroup_extent_record(). So there is some other problem related to this. > >> >> Or you can wasting more time calling find_parent_node() every time you >> touched a extent, saving one find_parent_node() in commit_transaction() with >> the cost of more find_parent_node() in other place. >> Is that what you want? >> >> I can move the find_parent_node() for old_roots out of commit_transaction(). >> But that will only reduce 50% of the time spent on commit_transaction(). >> >> Compared to O(n^3) find_parent_node(), that's not the determining fact even. >> >> Thanks, >> Qu >> >> >>> >>>> >>>> >>>> IIRC SUSE guys(maybe Jeff?) are working on it with the first method, but >>>> I >>>> didn't hear anything about it recently. >>>> >>>> Thanks, >>>> Qu >>>> >>>> >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> >>> >>> >> >> > > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Very slow balance / btrfs-transaction 2017-02-04 1:47 ` Jorg Bornschein 2017-02-04 2:55 ` Lakshmipathi.G 2017-02-06 1:45 ` Qu Wenruo @ 2017-02-06 9:14 ` Jorg Bornschein 2017-02-06 9:29 ` Qu Wenruo 2 siblings, 1 reply; 22+ messages in thread From: Jorg Bornschein @ 2017-02-06 9:14 UTC (permalink / raw) To: Qu Wenruo, Goldwyn Rodrigues, linux-btrfs February 6, 2017 1:45 AM, "Qu Wenruo" <quwenruo@cn.fujitsu.com> > Would you please provide the kernel version? > > v4.9 introduced a bad fix for qgroup balance, which doesn't completely fix qgroup bytes leaking, > but also hugely slow down the balance process: > I'm a bit behind the times: 4.8.13-1-ARCH j ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Very slow balance / btrfs-transaction 2017-02-06 9:14 ` Jorg Bornschein @ 2017-02-06 9:29 ` Qu Wenruo 0 siblings, 0 replies; 22+ messages in thread From: Qu Wenruo @ 2017-02-06 9:29 UTC (permalink / raw) To: Jorg Bornschein, Goldwyn Rodrigues, linux-btrfs At 02/06/2017 05:14 PM, Jorg Bornschein wrote: > February 6, 2017 1:45 AM, "Qu Wenruo" <quwenruo@cn.fujitsu.com> > >> Would you please provide the kernel version? >> >> v4.9 introduced a bad fix for qgroup balance, which doesn't completely fix qgroup bytes leaking, >> but also hugely slow down the balance process: >> > > I'm a bit behind the times: 4.8.13-1-ARCH > > > > j > > Unfortunately, v4.8 also has that bad commit :(. So if you have your spare time, you could try v4.10. Although for Archlinux it would take some time before v4.10 moved from [testing] to [core]. Thanks, Qu ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Very slow balance / btrfs-transaction
  2017-02-04  0:30 ` Jorg Bornschein
  2017-02-04  1:07   ` Goldwyn Rodrigues
  2017-02-04  1:47   ` Jorg Bornschein
@ 2017-02-04 20:50   ` Jorg Bornschein
  2017-02-04 21:10     ` Kai Krakow
  2 siblings, 1 reply; 22+ messages in thread

From: Jorg Bornschein @ 2017-02-04 20:50 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-btrfs

February 4, 2017 1:07 AM, "Goldwyn Rodrigues" <rgoldwyn@suse.de> wrote:

> Yes, please check if disabling quotas makes a difference in execution
> time of btrfs balance.

Just FYI: with quotas disabled it took ~20h to finish the balance instead of the projected >30 days. In my case, therefore, that was a speedup by a factor of ~35.

And thanks for the quick reply! (And for btrfs in general!)

BTW: I'm wondering how much sense it makes to activate the underlying bcache for my raid1 fs again. I guess btrfs chooses randomly (or based on predicted disk latency?) which copy of a given extent to load? I guess that would mean the effective cache size would only be half of the actual cache-set size (+- additional overhead)? Or does btrfs try a deterministically chosen copy of each extent first?

   j

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Very slow balance / btrfs-transaction 2017-02-04 20:50 ` Jorg Bornschein @ 2017-02-04 21:10 ` Kai Krakow 2017-02-06 13:19 ` Austin S. Hemmelgarn 0 siblings, 1 reply; 22+ messages in thread From: Kai Krakow @ 2017-02-04 21:10 UTC (permalink / raw) To: linux-btrfs Am Sat, 04 Feb 2017 20:50:03 +0000 schrieb "Jorg Bornschein" <jb@capsec.org>: > February 4, 2017 1:07 AM, "Goldwyn Rodrigues" <rgoldwyn@suse.de> > wrote: > > > Yes, please check if disabling quotas makes a difference in > > execution time of btrfs balance. > > Just FYI: With quotas disabled it took ~20h to finish the balance > instead of the projected >30 days. Therefore, in my case, there was a > speedup of factor ~35. > > > and thanks for the quick reply! (and for btrfs general!) > > > BTW: I'm wondering how much sense it makes to activate the underlying > bcache for my raid1 fs again. I guess btrfs chooses randomly (or > based predicted of disk latency?) which copy of a given extend to > load? As far as I know, it uses PID modulo only currently, no round-robin, no random value. There are no performance optimizations going into btrfs yet because there're still a lot of ongoing feature implementations. I think there were patches to include a rotator value in the stripe selection. They don't apply to the current kernel. I tried it once and didn't see any subjective difference for normal desktop workloads. But that's probably because I use RAID1 for metadata only. MDRAID uses stripe selection based on latency and other measurements (like head position). It would be nice if btrfs implemented similar functionality. This would also be helpful for selecting a disk if there're more disks than stripesets (for example, I have 3 disks in my btrfs array). This could write new blocks to the most idle disk always. I think this wasn't covered by the above mentioned patch. Currently, selection is based only on the disk with most free space. > I guess that would mean the effective cache size would only be > half of the actual cache-set size (+-additional overhead)? Or does > btrfs try a deterministically determined copy of each extend first? I'm currently using 500GB bcache, it helps a lot during system start - and probably also while using using the system. I think that bcache mostly caches metadata access which should improve a lot of btrfs performance issues. The downside of RAID1 profile is, that probably every second access is a cache-miss unless it has already been cached. Thus, it's only half-effective as it could be. I'm using write-back bcache caching, and RAID0 for data (I do daily backups with borgbackup, I can easily recover broken files). So writing with bcache is not such an issue for me. The cache is big enough that double metadata writes are no problem. -- Regards, Kai Replies to list-only preferred. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Very slow balance / btrfs-transaction 2017-02-04 21:10 ` Kai Krakow @ 2017-02-06 13:19 ` Austin S. Hemmelgarn 2017-02-07 19:47 ` Kai Krakow 0 siblings, 1 reply; 22+ messages in thread From: Austin S. Hemmelgarn @ 2017-02-06 13:19 UTC (permalink / raw) To: linux-btrfs On 2017-02-04 16:10, Kai Krakow wrote: > Am Sat, 04 Feb 2017 20:50:03 +0000 > schrieb "Jorg Bornschein" <jb@capsec.org>: > >> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" <rgoldwyn@suse.de> >> wrote: >> >>> Yes, please check if disabling quotas makes a difference in >>> execution time of btrfs balance. >> >> Just FYI: With quotas disabled it took ~20h to finish the balance >> instead of the projected >30 days. Therefore, in my case, there was a >> speedup of factor ~35. >> >> >> and thanks for the quick reply! (and for btrfs general!) >> >> >> BTW: I'm wondering how much sense it makes to activate the underlying >> bcache for my raid1 fs again. I guess btrfs chooses randomly (or >> based predicted of disk latency?) which copy of a given extend to >> load? > > As far as I know, it uses PID modulo only currently, no round-robin, > no random value. There are no performance optimizations going into btrfs > yet because there're still a lot of ongoing feature implementations. > > I think there were patches to include a rotator value in the stripe > selection. They don't apply to the current kernel. I tried it once and > didn't see any subjective difference for normal desktop workloads. But > that's probably because I use RAID1 for metadata only. I had tested similar patches myself using raid1 for everything, and saw near zero improvement unless I explicitly tried to create a worst-case performance situation. The reality is that the current algorithm is actually remarkably close to being optimal for most use cases while using an insanely small amount of processing power and memory compared to an optimal algorithm (and a truly optimal algorithm is in fact functionally impossible in almost all cases because it would require predicting the future). > > MDRAID uses stripe selection based on latency and other measurements > (like head position). It would be nice if btrfs implemented similar > functionality. This would also be helpful for selecting a disk if > there're more disks than stripesets (for example, I have 3 disks in my > btrfs array). This could write new blocks to the most idle disk always. > I think this wasn't covered by the above mentioned patch. Currently, > selection is based only on the disk with most free space. You're confusing read selection and write selection. MDADM and DM-RAID both use a load-balancing read selection algorithm that takes latency and other factors into account. However, they use a round-robin write selection algorithm that only cares about the position of the block in the virtual device modulo the number of physical devices. As an example, say you have a 3 disk RAID10 array set up using MDADM (this is functionally the same as a 3-disk raid1 mode BTRFS filesystem). Every third block starting from block 0 will be on disks 1 and 2, every third block starting from block 1 will be on disks 3 and 1, and every third block starting from block 2 will be on disks 2 and 3. No latency measurements are taken, literally nothing is factored in except the block's position in the virtual device. Now, that said, BTRFS does behave differently under the same circumstances, but this is because the striping is different for BTRFS. It happens at the chunk level instead of the block level. 
If we look at an example using the same 3 devices as the MDADM example,
and for simplicity assume that you end up allocating alternating data
and metadata chunks, things might look a bit like this:

* System chunk:     Devices 1 and 2
* Metadata chunk 0: Devices 3 and 1
* Data chunk 0:     Devices 2 and 3
* Metadata chunk 1: Devices 1 and 2
* Data chunk 1:     Devices 1 and 2

Overall, there is technically a pattern, but it has a very long
repetition period. This is still, however, a near-optimal allocation
pattern given the constraints. It also gives (just like the MDADM and
DM-RAID method) 100% deterministic behavior; the only difference is
that it depends on a slightly different factor.

Changing this to select the most idle disk, as you suggest, would
remove that determinism, increase the likelihood of sub-optimal
layouts in terms of space usage, increase the number of cases where
you could get ENOSPC, and provide near zero net performance benefit
except under heavy load. IOW, it would be a pretty clear net loss.

What actually needs to happen to improve write performance is that
BTRFS needs to quit serializing writes when writing chunks across
multiple devices. In a raid1 setup, it writes first to one device,
then to the other, alternating back and forth as it updates each
extent. This, combined with the write amplification caused by the COW
behavior, is what makes write performance so horrible for BTRFS
compared to MDADM or DM-RAID. It's not that we have bad device
selection for writes; it's that we don't even try any kind of
practical parallelization, despite it being an embarrassingly parallel
task (and yes, that really is what something that's trivial to
parallelize is called in scientific papers...).

^ permalink raw reply	[flat|nested] 22+ messages in thread
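The "devices with the most free space" selection that the chunk
allocation above boils down to is also easy to model. This is a
minimal sketch under the assumption that every new raid1 chunk simply
goes to the two devices with the most unallocated space; the names are
invented, and the real allocator also deals with stripe sizes,
per-device limits, and different chunk sizes for data and metadata,
which is why the sequence in Austin's example deviates from a pure
rotation.

    GiB = 1 << 30

    # Simplified model: each new raid1 chunk is mirrored onto the two
    # devices that currently have the most unallocated space.
    def allocate_raid1_chunk(unallocated, chunk_size=1 * GiB):
        devs = sorted(unallocated, key=unallocated.get, reverse=True)[:2]
        for d in devs:
            unallocated[d] -= chunk_size
        return devs

    # Three equal-sized devices, equal-sized chunks: the pairs rotate.
    free = {"dev1": 100 * GiB, "dev2": 100 * GiB, "dev3": 100 * GiB}
    for _ in range(5):
        print(allocate_raid1_chunk(free))
    # ['dev1', 'dev2'], ['dev3', 'dev1'], ['dev2', 'dev3'],
    # ['dev1', 'dev2'], ['dev3', 'dev1'], ...

With equal devices and equal chunks the model produces the same
rotating pairs as the start of the chunk list above; once chunk sizes
differ (small metadata chunks next to large data chunks) the rotation
gets perturbed, which is where the very long repetition period comes
from.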
* Re: Very slow balance / btrfs-transaction
  2017-02-06 13:19     ` Austin S. Hemmelgarn
@ 2017-02-07 19:47       ` Kai Krakow
  2017-02-07 19:58         ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 22+ messages in thread
From: Kai Krakow @ 2017-02-07 19:47 UTC (permalink / raw)
  To: linux-btrfs

Am Mon, 6 Feb 2017 08:19:37 -0500
schrieb "Austin S. Hemmelgarn" <ahferroin7@gmail.com>:

> > MDRAID uses stripe selection based on latency and other
> > measurements (like head position). It would be nice if btrfs
> > implemented similar functionality. This would also be helpful for
> > selecting a disk when there are more disks than stripesets (for
> > example, I have 3 disks in my btrfs array): new blocks could
> > always be written to the most idle disk. I don't think this was
> > covered by the patch mentioned above. Currently, selection is
> > based only on the disk with the most free space.
> You're confusing read selection and write selection. MDADM and
> DM-RAID both use a load-balancing read selection algorithm that
> takes latency and other factors into account. However, they use a
> round-robin write selection algorithm that cares about nothing
> except the position of the block in the virtual device modulo the
> number of physical devices.

Thanks for clearing that up.

> As an example, say you have a 3-disk RAID10 array set up using MDADM
> (this is functionally the same as a 3-disk raid1 mode BTRFS
> filesystem). Every third block starting from block 0 will be on
> disks 1 and 2, every third block starting from block 1 will be on
> disks 3 and 1, and every third block starting from block 2 will be
> on disks 2 and 3. No latency measurements are taken; literally
> nothing is factored in except the block's position in the virtual
> device.

I didn't know MDADM can use RAID10 on an odd number of disks...
Nice. I'll keep that in mind. :-)

--
Regards,
Kai

Replies to list-only preferred.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Very slow balance / btrfs-transaction
  2017-02-07 19:47       ` Kai Krakow
@ 2017-02-07 19:58         ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 22+ messages in thread
From: Austin S. Hemmelgarn @ 2017-02-07 19:58 UTC (permalink / raw)
  To: Kai Krakow, linux-btrfs

On 2017-02-07 14:47, Kai Krakow wrote:
> Am Mon, 6 Feb 2017 08:19:37 -0500
> schrieb "Austin S. Hemmelgarn" <ahferroin7@gmail.com>:
>
>>> MDRAID uses stripe selection based on latency and other
>>> measurements (like head position). It would be nice if btrfs
>>> implemented similar functionality. This would also be helpful for
>>> selecting a disk when there are more disks than stripesets (for
>>> example, I have 3 disks in my btrfs array): new blocks could
>>> always be written to the most idle disk. I don't think this was
>>> covered by the patch mentioned above. Currently, selection is
>>> based only on the disk with the most free space.
>> You're confusing read selection and write selection. MDADM and
>> DM-RAID both use a load-balancing read selection algorithm that
>> takes latency and other factors into account. However, they use a
>> round-robin write selection algorithm that cares about nothing
>> except the position of the block in the virtual device modulo the
>> number of physical devices.
>
> Thanks for clearing that up.
>
>> As an example, say you have a 3-disk RAID10 array set up using
>> MDADM (this is functionally the same as a 3-disk raid1 mode BTRFS
>> filesystem). Every third block starting from block 0 will be on
>> disks 1 and 2, every third block starting from block 1 will be on
>> disks 3 and 1, and every third block starting from block 2 will be
>> on disks 2 and 3. No latency measurements are taken; literally
>> nothing is factored in except the block's position in the virtual
>> device.
>
> I didn't know MDADM can use RAID10 on an odd number of disks...
> Nice. I'll keep that in mind. :-)

It's one of those neat features that I stumbled across by accident a
while back and that not many people know about. It's kind of ironic
when you think about it, too, since the MD RAID10 profile with only 2
replicas is actually a more accurate comparison for the BTRFS raid1
profile than the MD RAID1 profile is. FWIW, it can (somewhat
paradoxically) sometimes get better read and write performance than
MD RAID0 across the same number of disks.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Very slow balance / btrfs-transaction
@ 2017-07-01 14:24 Sidney San Martín
0 siblings, 0 replies; 22+ messages in thread
From: Sidney San Martín @ 2017-07-01 14:24 UTC (permalink / raw)
To: rgoldwyn, linux-btrfs; +Cc: jb
February 3, 2017 11:26 PM, "Goldwyn Rodrigues" <rgoldwyn@suse.com> wrote:
> On 02/03/2017 04:13 PM, jb@capsec.org wrote:
> > Hi,
> >
> >
> > I'm currently running a balance (without any filters) on a 4 drives raid1
> > filesystem. The array contains 3 3TB drives and one 6TB drive; I'm running
> > the rebalance because the 6TB drive recently replaced a 2TB drive.
> >
> >
> > I know that balance is not supposed to be a fast operation, but this one is
> > now running for ~6 days and it managed to balance ~18% (754 out of about 4250
> > chunks balanced (755 considered), 82% left) -- so I expect it to take
> > another ~4 weeks.
> >
> > That seems excessively slow for ~8TiB of data.
> >
> >
> > Is this expected behavior? In case it's not: Is there anything I can do to
> > help debug it?
>
> Do you have quotas enabled?
>
> --
> Goldwyn
Just dropping in -- I don't normally follow the list, but I found this thread while troubleshooting balance issues of my own (kernel 4.11, converting raid1 to raid10). Disabling quotas had an immense impact on performance, and it would be helpful if a note about this were added to the documentation in *lots* of places. With quotas on, each block group took 30 minutes to over an hour to convert, and the system was only usable for a few seconds per iteration:
Jun 28 00:42:41 overkill kernel: BTRFS info (device sdc2): relocating block group 7141922439168 flags data|raid1
Jun 28 01:32:13 overkill kernel: BTRFS info (device sdc2): relocating block group 7140848697344 flags data|raid1
Jun 28 02:48:59 overkill kernel: BTRFS info (device sdc2): relocating block group 7139774955520 flags data|raid1
Jun 28 03:50:12 overkill kernel: BTRFS info (device sdc2): relocating block group 7138701213696 flags data|raid1
Jun 28 05:20:58 overkill kernel: BTRFS info (device sdc2): relocating block group 7137627471872 flags data|raid1
Jun 28 06:49:00 overkill kernel: BTRFS info (device sdc2): relocating block group 7136553730048 flags data|raid1
Jun 28 07:23:58 overkill kernel: BTRFS info (device sdc2): relocating block group 7135479988224 flags data|raid1
Jun 28 08:03:39 overkill kernel: BTRFS info (device sdc2): relocating block group 7134406246400 flags data|raid1
Jun 28 08:40:11 overkill kernel: BTRFS info (device sdc2): relocating block group 7133332504576 flags data|raid1
Jun 28 09:44:46 overkill kernel: BTRFS info (device sdc2): relocating block group 7132258762752 flags data|raid1
Jun 28 10:24:17 overkill kernel: BTRFS info (device sdc2): relocating block group 7131185020928 flags data|raid1
Jun 28 11:35:39 overkill kernel: BTRFS info (device sdc2): relocating block group 7130111279104 flags data|raid1
Jun 28 12:53:56 overkill kernel: BTRFS info (device sdc2): relocating block group 7129037537280 flags data|raid1
Jun 28 13:37:00 overkill kernel: BTRFS info (device sdc2): relocating block group 7127963795456 flags data|raid1
Jun 28 14:32:19 overkill kernel: BTRFS info (device sdc2): relocating block group 7126890053632 flags data|raid1
Jun 28 15:45:19 overkill kernel: BTRFS info (device sdc2): relocating block group 7125816311808 flags data|raid1
Jun 28 16:30:01 overkill kernel: BTRFS info (device sdc2): relocating block group 7124742569984 flags data|raid1
Jun 28 17:26:57 overkill kernel: BTRFS info (device sdc2): relocating block group 7123668828160 flags data|raid1
Jun 28 18:15:01 overkill kernel: BTRFS info (device sdc2): relocating block group 7122595086336 flags data|raid1
Jun 28 18:48:05 overkill kernel: BTRFS info (device sdc2): relocating block group 7121521344512 flags data|raid1
Jun 28 19:25:59 overkill kernel: BTRFS info (device sdc2): relocating block group 7120447602688 flags data|raid1
Jun 28 19:55:46 overkill kernel: BTRFS info (device sdc2): relocating block group 7119373860864 flags data|raid1
Jun 28 20:30:41 overkill kernel: BTRFS info (device sdc2): relocating block group 7118300119040 flags data|raid1
Jun 28 21:28:43 overkill kernel: BTRFS info (device sdc2): relocating block group 7117226377216 flags data|raid1
Jun 28 22:55:34 overkill kernel: BTRFS info (device sdc2): relocating block group 7114005151744 flags data|raid1
Jun 28 23:19:06 overkill kernel: BTRFS info (device sdc2): relocating block group 7110783926272 flags data|raid1
With quotas off, it takes ~20 seconds to convert each block group and the system is completely usable:
Jul 01 09:56:42 overkill kernel: BTRFS info (device sde): relocating block group 7085014122496 flags data|raid1
Jul 01 09:56:59 overkill kernel: BTRFS info (device sde): relocating block group 7083940380672 flags data|raid1
Jul 01 09:57:18 overkill kernel: BTRFS info (device sde): relocating block group 7082866638848 flags data|raid1
Jul 01 09:57:39 overkill kernel: BTRFS info (device sde): relocating block group 7081792897024 flags data|raid1
Jul 01 09:58:01 overkill kernel: BTRFS info (device sde): relocating block group 7080719155200 flags data|raid1
Jul 01 09:58:27 overkill kernel: BTRFS info (device sde): relocating block group 7079645413376 flags data|raid1
Jul 01 09:58:45 overkill kernel: BTRFS info (device sde): relocating block group 7078571671552 flags data|raid1
Jul 01 09:59:00 overkill kernel: BTRFS info (device sde): relocating block group 7077497929728 flags data|raid1
Jul 01 09:59:16 overkill kernel: BTRFS info (device sde): relocating block group 7076424187904 flags data|raid1
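The per-block-group times are easy to pull straight out of log lines like these. A small sketch (the script name is hypothetical; it assumes the syslog-style timestamps shown above and that all lines fall within one year):

    # intervals.py -- print the time between consecutive
    # "relocating block group" messages fed on stdin, e.g.:
    #   journalctl -k | grep 'relocating block group' | python3 intervals.py
    import sys
    from datetime import datetime

    prev = None
    for line in sys.stdin:
        if "relocating block group" not in line:
            continue
        # "Jul 01 09:56:42 ..." -> the first 15 characters are the timestamp
        ts = datetime.strptime(line[:15], "%b %d %H:%M:%S")
        if prev is not None:
            print(f"{(ts - prev).total_seconds():7.0f}s  {line.rstrip()}")
        prev = ts

Run over the two logs above, this shows the jump from thousands of seconds per block group with quotas enabled down to roughly 15-26 seconds with quotas disabled.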
Cheers,
Sidney
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads: [~2017-07-01 14:24 UTC | newest]

Thread overview: 22+ messages
2017-02-03 22:13 Very slow balance / btrfs-transaction jb
2017-02-03 23:25 ` Goldwyn Rodrigues
2017-02-04  0:30 ` Jorg Bornschein
2017-02-04  1:07 ` Goldwyn Rodrigues
2017-02-04  1:47 ` Jorg Bornschein
2017-02-04  2:55 ` Lakshmipathi.G
2017-02-04  8:22 ` Duncan
2017-02-06  1:45 ` Qu Wenruo
2017-02-06 16:09 ` Goldwyn Rodrigues
2017-02-07  0:22 ` Qu Wenruo
2017-02-07 15:55 ` Filipe Manana
2017-02-08  0:39 ` Qu Wenruo
2017-02-08 13:56 ` Filipe Manana
2017-02-09  1:13 ` Qu Wenruo
2017-02-06  9:14 ` Jorg Bornschein
2017-02-06  9:29 ` Qu Wenruo
2017-02-04 20:50 ` Jorg Bornschein
2017-02-04 21:10 ` Kai Krakow
2017-02-06 13:19 ` Austin S. Hemmelgarn
2017-02-07 19:47 ` Kai Krakow
2017-02-07 19:58 ` Austin S. Hemmelgarn
-- strict thread matches above, loose matches on Subject: below --
2017-07-01 14:24 Sidney San Martín