From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Mark Fasheh <mfasheh@suse.de>
Cc: Stefan Priebe <s.priebe@profihost.ag>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
<jbacik@fb.com>, Chris Mason <clm@fb.com>
Subject: Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
Date: Fri, 6 Nov 2015 09:02:13 +0800
Message-ID: <563BFC15.4070705@cn.fujitsu.com>
In-Reply-To: <20151105192346.GI15575@wotan.suse.de>
Mark Fasheh wrote on 2015/11/05 11:23 -0800:
> On Wed, Nov 04, 2015 at 09:01:36AM +0800, Qu Wenruo wrote:
>>
>>
>> Mark Fasheh wrote on 2015/11/03 11:26 -0800:
>>> On Mon, Nov 02, 2015 at 09:34:24AM +0800, Qu Wenruo wrote:
>>>>
>>>>
>>>> Stefan Priebe wrote on 2015/11/01 21:49 +0100:
>>>>> Hi,
>>>>>
>>>>> this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html
>>>>>
>>>>> adds a regression to my test systems with very large disks (30TB and 50TB).
>>>>>
>>>>> btrfs balance is super slow afterwards while heavily making use of cp
>>>>> --reflink=always on big files (200GB - 500GB).
>>>>>
>>>>> Sorry, I didn't know how to correctly reply to that "old" message.
>>>>>
>>>>> Greets,
>>>>> Stefan
>>>>
>>>> Thanks for the testing.
>>>>
>>>> Are you using qgroup or just doing normal balance with qgroup disabled?
>>>>
>>>> For the latter case, that should be optimized to skip the dirty
>>>> extent insert when qgroup is disabled.
>>>>
>>>> For the qgroup enabled case, I'm afraid that's by design.
>>>> Relocation drops a subtree in order to relocate it, and to keep
>>>> qgroup numbers consistent we must walk down all the tree blocks
>>>> and mark them dirty for later qgroup accounting.
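
(Side note, to make concrete what that walk involves, a heavily
simplified sketch. qgroup_mark_subtree_dirty() and record_one_extent()
are made-up names for illustration; the real logic lives in the
drop_snapshot / btrfs_drop_subtree walkers:)

static int qgroup_mark_subtree_dirty(struct btrfs_trans_handle *trans,
				     struct btrfs_root *root,
				     struct extent_buffer *eb)
{
	int i, ret;

	/* Queue this tree block as a dirty extent so the next qgroup
	 * accounting pass re-counts its references. */
	ret = record_one_extent(trans, eb->start, eb->len);
	if (ret)
		return ret;

	/* Leaves have no children to descend into. */
	if (btrfs_header_level(eb) == 0)
		return 0;

	/* Read and recurse into every child block; this is where the
	 * extra disk IO during balance/snapshot delete comes from. */
	for (i = 0; i < btrfs_header_nritems(eb); i++) {
		struct extent_buffer *child;

		child = read_tree_block(root, btrfs_node_blockptr(eb, i),
					btrfs_node_ptr_generation(eb, i));
		if (!child)
			return -EIO;
		ret = qgroup_mark_subtree_dirty(trans, root, child);
		free_extent_buffer(child);
		if (ret)
			return ret;
	}
	return 0;
}
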
>>>
>>> Qu, we're always going to have to walk the tree when deleting it; this is
>>> part of removing a subvolume. We walked shared subtrees in this code for
>>> numerous kernel releases without incident before the code was removed in 4.2.
>>>
>>> Do you have any actual evidence that this is a major performance regression?
>>> From our previous conversations you seemed convinced of this, without even
>>> having a working subtree walk to test. I remember the hand-wringing
>>> about an individual commit being too heavy with the qgroup code (even though
>>> I pointed out that the tree walk is a restartable transaction).
>>>
>>> It seems that you are still confused about how we handle removing a volume
>>> wrt qgroups.
>>>
>>> If you have questions or concerns I would be happy to explain, but
>>> IMHO your statements there are opinion and not based in fact.
>>
>> Yes, I don't deny it.
>> But it's quite hard to prove, as we'd need huge storage like Stefan's.
>> What I have is only several hundred GB of test storage.
>> Even counting my whole home NAS, I only have 2T, far from the
>> storage Stefan has.
>>
>> And what Stefan reported should already give some hint about the
>> performance issue.
>>
>> In your words, "it won't be doing anything (ok some kmalloc/free of a
>> very tiny object)", it's already slowing down balance, since balance
>> also uses btrfs_drop_subtree().
>
> When I wrote that I was under the impression that the qgroup code was doing
> its own sanity checking (it used to), and since Stefan had qgroups disabled
> they couldn't be causing the problem. I read your e-mail explaining that the
> qgroup API is now intertwined with delayed ref locking after this one.
My fault, as btrfs_qgroup_mark_extent_dirty() is an exception: it
doesn't have the qgroup status check and depends on existing locks.
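
(For illustration, this is the kind of early bail-out the other qgroup
entry points do; whether it is safe here depends on exactly that
locking, and the field name is from memory:)

	/* Sketch only: skip recording the dirty extent entirely when
	 * qgroups are off, so a plain balance never pays the cost.
	 * The locking dependency above is why the real function can't
	 * simply do this today. */
	if (!root->fs_info->quota_enabled)
		return 0;
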
>
> The exact same code ran in either case before and after your patches, so my
> guess is that the issue is actually inside the qgroup code that shouldn't
> have been run. I wonder if we simply filled up his memory but never
> freed the objects. The only other thing I can think of is if
> account_leaf_items() got run in a really tight loop for some reason.
>
> Kmalloc in the way we are using it is not usually a performance issue,
> especially if we've been reading off disk in the same process. Ask yourself
> this - your own patch series does the same kmalloc for every qgroup
> operation. Did you notice a complete and massive performance slowdown like
> the one Stefan reported?
You're right, such memory allocation may impact performance, but not
noticeably compared to other operations which can kick off disk IO,
like btrfs_find_all_roots().
But at the very least, enabling qgroup will impact performance.
Yeah, this time I have test data.
In an environment with 100 different snapshots, sysbench shows an
overall performance drop of about 5%, and in some cases up to 7%, with
qgroup enabled.
Not sure about the kmalloc impact, maybe less than 1% or maybe 2~3%, but
at least it's worth trying a kmem cache.
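
(A minimal sketch of what that could look like, assuming the existing
record struct; the cache, helper names, and init/exit hooks are
illustrative, not existing code:)

/* A dedicated slab for qgroup dirty-extent records, created once at
 * module init, instead of a plain kmalloc() per qgroup operation. */
static struct kmem_cache *btrfs_qgroup_record_cachep;

int __init btrfs_qgroup_record_init(void)
{
	btrfs_qgroup_record_cachep = kmem_cache_create(
			"btrfs_qgroup_extent_record",
			sizeof(struct btrfs_qgroup_extent_record),
			0, SLAB_MEM_SPREAD, NULL);
	if (!btrfs_qgroup_record_cachep)
		return -ENOMEM;
	return 0;
}

void btrfs_qgroup_record_exit(void)
{
	kmem_cache_destroy(btrfs_qgroup_record_cachep);
}

/* Then, at each allocation site: */
record = kmem_cache_alloc(btrfs_qgroup_record_cachep, GFP_NOFS);
...
kmem_cache_free(btrfs_qgroup_record_cachep, record);

Whether that buys the 1% back is exactly what would need measuring.
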
>
> I will say that we never had this problem reported before, and
> account_leaf_items() is always run in all kernels, even without qgroups
> enabled. That will change with my new patch though.
>
> What we can say for sure is that drop_snapshot in the qgroup case will read
> more disk and obviously that will have a negative impact depending on what
> the tree looks like. So IMHO we ought to be focusing on reducing the amount
> of I/O involved.
Totally agree.
Thanks,
Qu
>
>
>> But we can't just ignore such a "possible" performance issue just
>> because the old code did the same thing. (Although it's not the same
>> now; we're marking all subtree blocks dirty rather than only the
>> shared ones.)
>
> Well, I can't disagree with that - the only reason we are talking right now
> is because you intentionally ignored the qgroup code in drop_snapshot(). So
> let's start with this - no more 'fixing' code by tearing it out and replacing
> it with /* TODO: somebody else re-implement this */ ;)
> --Mark
>
> --
> Mark Fasheh
>