To: linux-btrfs@vger.kernel.org, Qu Wenruo, Chris Mason
From: Dave Hansen
Subject: qgroup code slowing down rebalance
Message-ID: <56E9C7BB.7060509@sr71.net>
Date: Wed, 16 Mar 2016 13:53:15 -0700

I have a medium-sized multi-device btrfs filesystem (4 disks, 16TB
total) running under 4.5.0-rc5. I recently added a disk and needed to
rebalance. I started a rebalance operation three days ago. It was on
the order of 20% done after those three days. :)

During this rebalance, the disks were pretty lightly used. I would see
a small burst of tens of MB/s, then it would go back to no activity for
a few minutes, small burst, no activity, etc... During the quiet times
(for the disks) one processor would be pegged inside the kernel and
would have virtually no I/O wait time.

Also during this time, the filesystem was unbearably slow. An ls of a
small directory would hang for minutes.

A perf profile shows 92% of the CPU time being spent in
btrfs_find_all_roots(), called under this call path:

	btrfs_commit_transaction
	 -> btrfs_qgroup_prepare_account_extents
	  -> btrfs_find_all_roots

So I tried disabling quotas:

	btrfs quota disable /mnt/foo

which took a few minutes to complete, but once it did, the disks went
back up to doing ~200MB/s, the kernel time went down to ~20%, and the
system now has lots of I/O wait time. It looks to be behaving nicely.

Is this expected? From my perspective, it makes quotas pretty much
unusable, at least during a rebalance.

I have a full 'perf record' profile with call graphs if it would be
helpful.
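
[Editor's note: for readers wanting to reproduce this kind of diagnosis,
a profile like the one described above can be captured roughly as
follows. The 30-second sampling window and the /mnt/foo mount point are
illustrative choices, not taken from the report; the commands themselves
are standard perf(1) and btrfs-progs usage.]

```shell
# Sample the whole system with call graphs for ~30 seconds while the
# rebalance is running (requires root; writes ./perf.data).
perf record -a -g -- sleep 30

# Summarize by symbol; if qgroup accounting is the bottleneck,
# btrfs_find_all_roots() should dominate the report.
perf report --stdio --sort symbol | head -40

# Check how far the balance has progressed, and optionally disable
# quotas for the duration of the rebalance, as done in the report.
btrfs balance status /mnt/foo
btrfs quota disable /mnt/foo
```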