From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from cn.fujitsu.com ([59.151.112.132]:48220 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S932868AbbKFBCS (ORCPT ); Thu, 5 Nov 2015 20:02:18 -0500 Subject: Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete To: Mark Fasheh References: <56367AE8.9030509@profihost.ag> <5636BDA0.4020200@cn.fujitsu.com> <20151103192625.GE15575@wotan.suse.de> <563958F0.6030904@cn.fujitsu.com> <20151105192346.GI15575@wotan.suse.de> CC: Stefan Priebe , "linux-btrfs@vger.kernel.org" , , Chris Mason From: Qu Wenruo Message-ID: <563BFC15.4070705@cn.fujitsu.com> Date: Fri, 6 Nov 2015 09:02:13 +0800 MIME-Version: 1.0 In-Reply-To: <20151105192346.GI15575@wotan.suse.de> Content-Type: text/plain; charset="utf-8"; format=flowed Sender: linux-btrfs-owner@vger.kernel.org List-ID: Mark Fasheh wrote on 2015/11/05 11:23 -0800: > On Wed, Nov 04, 2015 at 09:01:36AM +0800, Qu Wenruo wrote: >> >> >> Mark Fasheh wrote on 2015/11/03 11:26 -0800: >>> On Mon, Nov 02, 2015 at 09:34:24AM +0800, Qu Wenruo wrote: >>>> >>>> >>>> Stefan Priebe wrote on 2015/11/01 21:49 +0100: >>>>> Hi, >>>>> >>>>> this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html >>>>> >>>>> adds a regression to my test systems with very large disks (30tb and 50tb). >>>>> >>>>> btrfs balance is super slow afterwards while heavily making use of cp >>>>> --reflink=always on big files (200gb - 500gb). >>>>> >>>>> Sorry didn't know how to correctly reply to that "old" message. >>>>> >>>>> Greets, >>>>> Stefan >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>>> Thanks for the testing. >>>> >>>> Are you using qgroup or just doing normal balance with qgroup disabled? >>>> >>>> For the latter case, that's should be optimized to skip the dirty >>>> extent insert in qgroup disabled case. >>>> >>>> For qgroup enabled case, I'm afraid that's the design. >>>> As relocation will drop a subtree to relocate, and to ensure qgroup >>>> consistent, we must walk down all the tree blocks and mark them >>>> dirty for later qgroup accounting. >>> >>> Qu, we're always going to have to walk the tree when deleting it, this is >>> part of removing a subvolume. We've walked shared subtrees in this code for >>> numerous kernel releases without incident before it was removed in 4.2. >>> >>> Do you have any actual evidence that this is a major performance regression? >>> From our previous conversations you seemed convinced of this, without even >>> having a working subtree walk to test. I remember the hand wringing >>> about an individual commit being too heavy with the qgroup code (even though >>> I pointed out that tree walk is a restartable transaction). >>> >>> It seems that you are confused still about how we handle removing a volume >>> wrt qgroups. >>> >>> If you have questions or concerns I would be happy to explain them but >>> IMHO your statements there are opinion and not based in fact. >> >> Yes, I don't deny it. >> But it's quite hard to prove it, as we need such a huge storage like Stefan. >> What I have is only several hundred GB test storage. >> Even accounting all my home NAS, I only have 2T, far from the >> storage Stefan has. >> >> And what Stefan report should already give some hint about the >> performance issue. >> >> In your word "it won't be doing anything (ok some kmalloc/free of a >> very tiny object)", it's already slowing down balance, since balance >> also use btrfs_drop_subtree(). > > When I wrote that I was under the impression that the qgroup code was doing > it's own sanity checking (it used to) and since Stephan had them disabled > they couldn't be causing the problem. I read your e-mail explaining that the > qgroup api was now intertwined with delayed ref locking after this one. My fault, as btrfs_qgroup_mark_exntent_dirty() is an exception which doesn't have the qgroup status check and depend on existing locks. > > The same exact code ran in either case before and after your patches, so my > guess is that the issue is actually inside the qgroup code that shouldn't > have been run. I wonder if we even just filled up his memory but never > cleaned the objects. The only other thing I can think of is if > account_leaf_items() got run in a really tight loop for some reason. > > Kmalloc in the way we are using it is not usually a performance issue, > especially if we've been reading off disk in the same process. Ask yourself > this - your own patch series does the same kmalloc for every qgroup > operation. Did you notice a complete and massive performance slowdown like > the one Stefan reported? You're right, such memory allocation may impact performance but not so noticeable, compared to other operations which may kick disk IO, like btrfs_find_all_roots(). But at least, enabling qgroup will impact performance. Yeah, this time I has test data now. In a environment with 100 different snapshot, sysbench shows an overall performance drop about 5%, and in some case, up to 7%, with qgroup enabled. Not sure about the kmalloc impact, maybe less than 1% or maybe 2~3%, but at least it's worthy trying to use kmem cache. > > I will say that we never had this problem reported before, and > account_leaf_items() is always run in all kernels, even without qgroups > enabled. That will change with my new patch though. > > What we can say for sure is that drop_snapshot in the qgroup case will read > more disk and obviously that will have a negative impact depending on what > the tree looks like. So IMHO we ought to be focusing on reducing the amount > of I/O involved. Totally agree. Thanks, Qu > > >> But we can't just ignore such "possible" performance issue just >> because old code did the same thing.(Although not the same now, >> we're marking all subtree blocks dirty other than shared one). > > Well, I can't disagree with that - the only reason we are talking right now > is because you intentionally ignored the qgroup code in drop_snapshot(). So > let's start with this - no more 'fixing' code by tearing it out and replacing > it with /* TODO: somebody else re-implement this */ ;) > --Mark > > -- > Mark Fasheh >