Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Mark Fasheh <mfasheh@suse.de>
Cc: Stefan Priebe <s.priebe@profihost.ag>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	<jbacik@fb.com>, Chris Mason <clm@fb.com>
Subject: Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
Date: Fri, 6 Nov 2015 09:02:13 +0800	[thread overview]
Message-ID: <563BFC15.4070705@cn.fujitsu.com> (raw)
In-Reply-To: <20151105192346.GI15575@wotan.suse.de>



Mark Fasheh wrote on 2015/11/05 11:23 -0800:
> On Wed, Nov 04, 2015 at 09:01:36AM +0800, Qu Wenruo wrote:
>>
>>
>> Mark Fasheh wrote on 2015/11/03 11:26 -0800:
>>> On Mon, Nov 02, 2015 at 09:34:24AM +0800, Qu Wenruo wrote:
>>>>
>>>>
>>>> Stefan Priebe wrote on 2015/11/01 21:49 +0100:
>>>>> Hi,
>>>>>
>>>>> this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html
>>>>>
>>>>> adds a regression to my test systems with very large disks (30tb and 50tb).
>>>>>
>>>>> btrfs balance is super slow afterwards while heavily making use of cp
>>>>> --reflink=always on big files (200gb - 500gb).
>>>>>
>>>>> Sorry didn't know how to correctly reply to that "old" message.
>>>>>
>>>>> Greets,
>>>>> Stefan
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>> Thanks for the testing.
>>>>
>>>> Are you using qgroup or just doing normal balance with qgroup disabled?
>>>>
>>>> For the latter case, that's should be optimized to skip the dirty
>>>> extent insert in qgroup disabled case.
>>>>
>>>> For qgroup enabled case, I'm afraid that's the design.
>>>> As relocation will drop a subtree to relocate, and to ensure qgroup
>>>> consistent, we must walk down all the tree blocks and mark them
>>>> dirty for later qgroup accounting.
>>>
>>> Qu, we're always going to have to walk the tree when deleting it, this is
>>> part of removing a subvolume. We've walked shared subtrees in this code for
>>> numerous kernel releases without incident before it was removed in 4.2.
>>>
>>> Do you have any actual evidence that this is a major performance regression?
>>>  From our previous conversations you seemed convinced of this, without even
>>> having a working subtree walk to test. I remember the hand wringing
>>> about an individual commit being too heavy with the qgroup code (even though
>>> I pointed out that tree walk is a restartable transaction).
>>>
>>> It seems that you are confused still about how we handle removing a volume
>>> wrt qgroups.
>>>
>>> If you have questions or concerns I would be happy to explain them but
>>> IMHO your statements there are opinion and not based in fact.
>>
>> Yes, I don't deny it.
>> But it's quite hard to prove it, as we need such a huge storage like Stefan.
>> What I have is only several hundred GB test storage.
>> Even accounting all my home NAS, I only have 2T, far from the
>> storage Stefan has.
>>
>> And what Stefan report should already give some hint about the
>> performance issue.
>>
>> In your word "it won't be doing anything (ok some kmalloc/free of a
>> very tiny object)", it's already slowing down balance, since balance
>> also use btrfs_drop_subtree().
>
> When I wrote that I was under the impression that the qgroup code was doing
> it's own sanity checking (it used to) and since Stephan had them disabled
> they couldn't be causing the problem. I read your e-mail explaining that the
> qgroup api was now intertwined with delayed ref locking after this one.

My fault, as btrfs_qgroup_mark_exntent_dirty() is an exception which 
doesn't have the qgroup status check and depend on existing locks.

>
> The same exact code ran in either case before and after your patches, so my
> guess is that the issue is actually inside the qgroup code that shouldn't
> have been run. I wonder if we even just filled up his memory but never
> cleaned the objects. The only other thing I can think of is if
> account_leaf_items() got run in a really tight loop for some reason.
>
> Kmalloc in the way we are using it is not usually a performance issue,
> especially if we've been reading off disk in the same process. Ask yourself
> this - your own patch series does the same kmalloc for every qgroup
> operation. Did you notice a complete and massive performance slowdown like
> the one Stefan reported?

You're right, such memory allocation may impact performance but not so 
noticeable, compared to other operations which may kick disk IO, like 
btrfs_find_all_roots().

But at least, enabling qgroup will impact performance.

Yeah, this time I has test data now.
In a environment with 100 different snapshot, sysbench shows an overall 
performance drop about 5%, and in some case, up to 7%, with qgroup enabled.

Not sure about the kmalloc impact, maybe less than 1% or maybe 2~3%, but 
at least it's worthy trying to use kmem cache.

>
> I will say that we never had this problem reported before, and
> account_leaf_items() is always run in all kernels, even without qgroups
> enabled. That will change with my new patch though.
>
> What we can say for sure is that drop_snapshot in the qgroup case will read
> more disk and obviously that will have a negative impact depending on what
> the tree looks like. So IMHO we ought to be focusing on reducing the amount
> of I/O involved.

Totally agree.

Thanks,
Qu

>
>
>> But we can't just ignore such "possible" performance issue just
>> because old code did the same thing.(Although not the same now,
>> we're marking all subtree blocks dirty other than shared one).
>
> Well, I can't disagree with that - the only reason we are talking right now
> is because you intentionally ignored the qgroup code in drop_snapshot(). So
> let's start with this - no more 'fixing' code by tearing it out and replacing
> it with /* TODO: somebody else re-implement this */   ;)
> 	--Mark
>
> --
> Mark Fasheh
>

next prev parent reply	other threads:[~2015-11-06  1:02 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-01 20:49 Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete Stefan Priebe
2015-11-01 22:57 ` Duncan
2015-11-02  1:34 ` Qu Wenruo
2015-11-02  5:46   ` Stefan Priebe
2015-11-03 19:15     ` Mark Fasheh
2015-11-03 19:26   ` Mark Fasheh
2015-11-03 19:42     ` Stefan Priebe
2015-11-03 23:31       ` Mark Fasheh
2015-11-04  2:22         ` Chris Mason
2015-11-04  1:01     ` Qu Wenruo
2015-11-05 19:23       ` Mark Fasheh
2015-11-06  1:02         ` Qu Wenruo [this message]
2015-11-06  3:15           ` Mark Fasheh
2015-11-06  3:25             ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=563BFC15.4070705@cn.fujitsu.com \
    --to=quwenruo@cn.fujitsu.com \
    --cc=clm@fb.com \
    --cc=jbacik@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=mfasheh@suse.de \
    --cc=s.priebe@profihost.ag \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.