* Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete @ 2015-11-01 20:49 Stefan Priebe 2015-11-01 22:57 ` Duncan 2015-11-02 1:34 ` Qu Wenruo 0 siblings, 2 replies; 14+ messages in thread From: Stefan Priebe @ 2015-11-01 20:49 UTC (permalink / raw) To: linux-btrfs@vger.kernel.org, mfasheh; +Cc: jbacik, Chris Mason Hi, this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html adds a regression to my test systems with very large disks (30tb and 50tb). btrfs balance is super slow afterwards while heavily making use of cp --reflink=always on big files (200gb - 500gb). Sorry didn't know how to correctly reply to that "old" message. Greets, Stefan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete 2015-11-01 20:49 Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete Stefan Priebe @ 2015-11-01 22:57 ` Duncan 2015-11-02 1:34 ` Qu Wenruo 1 sibling, 0 replies; 14+ messages in thread From: Duncan @ 2015-11-01 22:57 UTC (permalink / raw) To: linux-btrfs Stefan Priebe posted on Sun, 01 Nov 2015 21:49:44 +0100 as excerpted: > this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html > > adds a regression to my test systems with very large disks (30tb and > 50tb). > > btrfs balance is super slow afterwards while heavily making use of cp > --reflink=always on big files (200gb - 500gb). > > Sorry didn't know how to correctly reply to that "old" message. Just on the message-reply bit... Gmane.org carries this list (among many), archiving the posts with both nntp/news and http/web interfaces. Both the web and news interfaces normally allow replies to both old and current messages via the gmane gateway forwarding to the list, tho the first time you reply to a list via gmane, it'll respond with a confirmation to the email address you used, requiring you to reply to that before forwarding the mail on to the list. If you don't reply within a week, the message is dropped. However, at least for the news interface (not sure about the web interface), you only have to confirm for a particular list/newsgroup once, after that, it forwards to the list without further confirmations. That's how I follow all my lists, reading and replying to them as newsgroups via the gmane list2news interface. http://gmane.org for more info. The one caveat is that while on a lot of lists replies to the list only is the norm, on the Linux kernel and vger.kernel.org hosted lists (including this one), replying to all, list and previous posters, is the norm, and I'm not sure if the web interface allows that. On the news interface it of course depends on your news client -- mine is more adapted to news than mail, and while it allows forwarding to your normal mail client for the mail side, normal followups are to news only, and it's not easy to reply to all, so I generally reply to list (as newsgroup) only, unless a poster specifically requests to be CCed on replies. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete 2015-11-01 20:49 Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete Stefan Priebe 2015-11-01 22:57 ` Duncan @ 2015-11-02 1:34 ` Qu Wenruo 2015-11-02 5:46 ` Stefan Priebe 2015-11-03 19:26 ` Mark Fasheh 1 sibling, 2 replies; 14+ messages in thread From: Qu Wenruo @ 2015-11-02 1:34 UTC (permalink / raw) To: Stefan Priebe, linux-btrfs@vger.kernel.org, mfasheh; +Cc: jbacik, Chris Mason Stefan Priebe wrote on 2015/11/01 21:49 +0100: > Hi, > > this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html > > adds a regression to my test systems with very large disks (30tb and 50tb). > > btrfs balance is super slow afterwards while heavily making use of cp > --reflink=always on big files (200gb - 500gb). > > Sorry didn't know how to correctly reply to that "old" message. > > Greets, > Stefan > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Thanks for the testing. Are you using qgroups, or just doing a normal balance with qgroups disabled? For the latter case, that should be optimized to skip the dirty extent insert when qgroups are disabled. For the qgroup-enabled case, I'm afraid that's the design. Relocation drops a subtree in order to relocate it, and to keep qgroups consistent we must walk down all the tree blocks and mark them dirty for later qgroup accounting. But there is still some room for optimization. For example, if all subtree blocks are already relocated, we can skip the tree walk-down routine. Anyway, in your case of huge files, as the tree level grows rapidly, any workload involving tree iteration, such as snapshot deletion or relocation, will be very time consuming. BTW, thanks for your regression report; I also found another problem with the patch. I'll reply to the author to improve the patchset. Thanks, Qu ^ permalink raw reply [flat|nested] 14+ messages in thread
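[For reference, a minimal sketch of the qgroup-disabled bail-out Qu describes above. The function name is made up, and the fs_info->quota_enabled test is only an assumption about the field name of that era; this is an illustration, not the actual patch.]

/*
 * Illustrative sketch only: skip the subtree qgroup accounting
 * entirely when qgroups are not enabled, so balance and snapshot
 * delete never pay for the per-extent dirty record handling.
 */
static int account_subtree_for_qgroups(struct btrfs_trans_handle *trans,
				       struct btrfs_root *root,
				       struct extent_buffer *root_eb,
				       u64 root_gen, int root_level)
{
	/* assumed flag name for the qgroup-disabled case */
	if (!root->fs_info->quota_enabled)
		return 0;	/* nothing to account, avoid the walk */

	/* ... walk down the subtree and record dirty extents as today ... */
	return 0;
}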
* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete 2015-11-02 1:34 ` Qu Wenruo @ 2015-11-02 5:46 ` Stefan Priebe 2015-11-03 19:15 ` Mark Fasheh 2015-11-03 19:26 ` Mark Fasheh 1 sibling, 1 reply; 14+ messages in thread From: Stefan Priebe @ 2015-11-02 5:46 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs@vger.kernel.org, mfasheh; +Cc: jbacik, Chris Mason Am 02.11.2015 um 02:34 schrieb Qu Wenruo: > > > Stefan Priebe wrote on 2015/11/01 21:49 +0100: >> Hi, >> >> this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html >> >> adds a regression to my test systems with very large disks (30tb and >> 50tb). >> >> btrfs balance is super slow afterwards while heavily making use of cp >> --reflink=always on big files (200gb - 500gb). >> >> Sorry didn't know how to correctly reply to that "old" message. >> >> Greets, >> Stefan >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > Thanks for the testing. > > Are you using qgroup or just doing normal balance with qgroup disabled? just doing normal balance with qgroup disabled. > For the latter case, that's should be optimized to skip the dirty extent > insert in qgroup disabled case. > > For qgroup enabled case, I'm afraid that's the design. > As relocation will drop a subtree to relocate, and to ensure qgroup > consistent, we must walk down all the tree blocks and mark them dirty > for later qgroup accounting. > > But there should be some hope left for optimization. > For example, if all subtree blocks are already relocated, we can skip > the tree down walk routine. > > Anyway, for your case of huge files, as tree level grows rapidly, any > workload involving tree iteration will be very time consuming. > Like snapshot deletion and relocation. > > BTW, thanks for you regression report, I also found another problem of > the patch. > I'll reply to the author to improve the patchset. Thanks, Stefan > Thanks, > Qu ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete 2015-11-02 5:46 ` Stefan Priebe @ 2015-11-03 19:15 ` Mark Fasheh 0 siblings, 0 replies; 14+ messages in thread From: Mark Fasheh @ 2015-11-03 19:15 UTC (permalink / raw) To: Stefan Priebe; +Cc: Qu Wenruo, linux-btrfs@vger.kernel.org, jbacik, Chris Mason On Mon, Nov 02, 2015 at 06:46:06AM +0100, Stefan Priebe wrote: > Am 02.11.2015 um 02:34 schrieb Qu Wenruo: > > > > > >Stefan Priebe wrote on 2015/11/01 21:49 +0100: > >>Hi, > >> > >>this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html > >> > >>adds a regression to my test systems with very large disks (30tb and > >>50tb). > >> > >>btrfs balance is super slow afterwards while heavily making use of cp > >>--reflink=always on big files (200gb - 500gb). > >> > >>Sorry didn't know how to correctly reply to that "old" message. > >> > >>Greets, > >>Stefan > >>-- > >>To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > >>the body of a message to majordomo@vger.kernel.org > >>More majordomo info at http://vger.kernel.org/majordomo-info.html > > > >Thanks for the testing. > > > >Are you using qgroup or just doing normal balance with qgroup disabled? > > just doing normal balance with qgroup disabled. Then that patch is very unlikely to be your actual problem as it won't be doing anything (ok some kmalloc/free of a very tiny object) since qgroups are disabled. Also, btrfs had working subtree accounting in that code for the last N releases (doing the same exact thing) and it only changed for the one release that Qu's rework was in (which lazily tore it out). --Mark -- Mark Fasheh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete 2015-11-02 1:34 ` Qu Wenruo 2015-11-02 5:46 ` Stefan Priebe @ 2015-11-03 19:26 ` Mark Fasheh 2015-11-03 19:42 ` Stefan Priebe 2015-11-04 1:01 ` Qu Wenruo 1 sibling, 2 replies; 14+ messages in thread From: Mark Fasheh @ 2015-11-03 19:26 UTC (permalink / raw) To: Qu Wenruo; +Cc: Stefan Priebe, linux-btrfs@vger.kernel.org, jbacik, Chris Mason On Mon, Nov 02, 2015 at 09:34:24AM +0800, Qu Wenruo wrote: > > > Stefan Priebe wrote on 2015/11/01 21:49 +0100: > >Hi, > > > >this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html > > > >adds a regression to my test systems with very large disks (30tb and 50tb). > > > >btrfs balance is super slow afterwards while heavily making use of cp > >--reflink=always on big files (200gb - 500gb). > > > >Sorry didn't know how to correctly reply to that "old" message. > > > >Greets, > >Stefan > >-- > >To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > >the body of a message to majordomo@vger.kernel.org > >More majordomo info at http://vger.kernel.org/majordomo-info.html > > Thanks for the testing. > > Are you using qgroup or just doing normal balance with qgroup disabled? > > For the latter case, that's should be optimized to skip the dirty > extent insert in qgroup disabled case. > > For qgroup enabled case, I'm afraid that's the design. > As relocation will drop a subtree to relocate, and to ensure qgroup > consistent, we must walk down all the tree blocks and mark them > dirty for later qgroup accounting. Qu, we're always going to have to walk the tree when deleting it; this is part of removing a subvolume. We've walked shared subtrees in this code for numerous kernel releases without incident before it was removed in 4.2. Do you have any actual evidence that this is a major performance regression? From our previous conversations you seemed convinced of this, without even having a working subtree walk to test. I remember the hand-wringing about an individual commit being too heavy with the qgroup code (even though I pointed out that the tree walk is a restartable transaction). It seems that you are still confused about how we handle removing a volume wrt qgroups. If you have questions or concerns I would be happy to explain them, but IMHO your statements there are opinion and not based in fact. Yes, btw, we might have to do more work for the uncommon case of a qgroup being referenced by higher-level groups, but that is clearly not happening here (and honestly it's not a common case at all). --Mark -- Mark Fasheh ^ permalink raw reply [flat|nested] 14+ messages in thread
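[A rough sketch of the "restartable transaction" pattern Mark mentions: a long subtree walk periodically ends the running transaction and starts a fresh one, so no single commit has to carry the whole walk. The walk helpers below are hypothetical placeholders, and the two-argument transaction calls reflect the interfaces of that kernel era as best recalled; this is not the actual btrfs_drop_snapshot() code.]

static int walk_subtree_restartable(struct btrfs_trans_handle *trans,
				    struct btrfs_root *root,
				    struct btrfs_path *path,
				    struct walk_control *wc)
{
	int ret = 0;

	while (walk_has_more_work(wc)) {	/* placeholder helper */
		/* placeholder helper doing one step of the walk-down */
		ret = walk_down_one_level(trans, root, path, wc);
		if (ret < 0)
			break;

		if (!btrfs_should_end_transaction(trans, root))
			continue;

		/* commit what we have so far and restart with a fresh handle */
		ret = btrfs_end_transaction(trans, root);
		if (ret)
			return ret;
		trans = btrfs_start_transaction(root, 0);
		if (IS_ERR(trans))
			return PTR_ERR(trans);
	}
	btrfs_end_transaction(trans, root);
	return ret;
}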
* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete 2015-11-03 19:26 ` Mark Fasheh @ 2015-11-03 19:42 ` Stefan Priebe 2015-11-03 23:31 ` Mark Fasheh 0 siblings, 1 reply; 14+ messages in thread From: Stefan Priebe @ 2015-11-03 19:42 UTC (permalink / raw) To: Mark Fasheh, Qu Wenruo; +Cc: linux-btrfs@vger.kernel.org, jbacik, Chris Mason Am 03.11.2015 um 20:26 schrieb Mark Fasheh: > On Mon, Nov 02, 2015 at 09:34:24AM +0800, Qu Wenruo wrote: >> >> >> Stefan Priebe wrote on 2015/11/01 21:49 +0100: >>> Hi, >>> >>> this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html >>> >>> adds a regression to my test systems with very large disks (30tb and 50tb). >>> >>> btrfs balance is super slow afterwards while heavily making use of cp >>> --reflink=always on big files (200gb - 500gb). >>> >>> Sorry didn't know how to correctly reply to that "old" message. >>> >>> Greets, >>> Stefan >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> Thanks for the testing. >> >> Are you using qgroup or just doing normal balance with qgroup disabled? >> >> For the latter case, that's should be optimized to skip the dirty >> extent insert in qgroup disabled case. >> >> For qgroup enabled case, I'm afraid that's the design. >> As relocation will drop a subtree to relocate, and to ensure qgroup >> consistent, we must walk down all the tree blocks and mark them >> dirty for later qgroup accounting. > > Qu, we're always going to have to walk the tree when deleting it, this is > part of removing a subvolume. We've walked shared subtrees in this code for > numerous kernel releases without incident before it was removed in 4.2. > > Do you have any actual evidence that this is a major performance regression? > From our previous conversations you seemed convinced of this, without even > having a working subtree walk to test. I remember the hand wringing > about an individual commit being too heavy with the qgroup code (even though > I pointed out that tree walk is a restartable transaction). > > It seems that you are confused still about how we handle removing a volume > wrt qgroups. > > If you have questions or concerns I would be happy to explain them but > IMHO your statements there are opinion and not based in fact. > > Yes btw, we might have to do more work for the uncommon case of a > qgroup being referenced by higher level groups but that is clearly not > happening here (and honestly it's not a common case at all). > --Mark Sorry, I don't know much about the btrfs internals. I can just reproduce this by switching between a kernel with this patch and one without. With it, balance takes ages; without it, it's super fast. I proved this several times by simply rebooting into the other kernel. Stefan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete 2015-11-03 19:42 ` Stefan Priebe @ 2015-11-03 23:31 ` Mark Fasheh 2015-11-04 2:22 ` Chris Mason 0 siblings, 1 reply; 14+ messages in thread From: Mark Fasheh @ 2015-11-03 23:31 UTC (permalink / raw) To: Stefan Priebe; +Cc: Qu Wenruo, linux-btrfs@vger.kernel.org, jbacik, Chris Mason On Tue, Nov 03, 2015 at 08:42:33PM +0100, Stefan Priebe wrote: > Sorry don't know much about the btrfs internals. > > I just can reproduce this. Switching to a kernel with this patch and > without. With it takes ages - without it's super fast. I prooved > this several times by just rebooting to the other kernel. That's fine, disregard my previous e-mail - I just saw the mail Qu sent me. There's a problem in the code that the patch calls which is causing your performance issues. I'll CC you when I put out a fix. Thanks, --Mark -- Mark Fasheh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete 2015-11-03 23:31 ` Mark Fasheh @ 2015-11-04 2:22 ` Chris Mason 0 siblings, 0 replies; 14+ messages in thread From: Chris Mason @ 2015-11-04 2:22 UTC (permalink / raw) To: Mark Fasheh; +Cc: Stefan Priebe, Qu Wenruo, linux-btrfs@vger.kernel.org, jbacik On Tue, Nov 03, 2015 at 03:31:15PM -0800, Mark Fasheh wrote: > On Tue, Nov 03, 2015 at 08:42:33PM +0100, Stefan Priebe wrote: > > Sorry don't know much about the btrfs internals. > > > > I just can reproduce this. Switching to a kernel with this patch and > > without. With it takes ages - without it's super fast. I prooved > > this several times by just rebooting to the other kernel. > > That's fine, disregard my previous e-mail - I just saw the mail Qu sent me. > There's a problem in the code that the patch calls which is causing your > performance issues. I'll CC you when I put out a fix. Thanks Mark (and Qu), I'll get the fixed version into integration once it is out. -chris ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete 2015-11-03 19:26 ` Mark Fasheh 2015-11-03 19:42 ` Stefan Priebe @ 2015-11-04 1:01 ` Qu Wenruo 2015-11-05 19:23 ` Mark Fasheh 1 sibling, 1 reply; 14+ messages in thread From: Qu Wenruo @ 2015-11-04 1:01 UTC (permalink / raw) To: Mark Fasheh Cc: Stefan Priebe, linux-btrfs@vger.kernel.org, jbacik, Chris Mason Mark Fasheh wrote on 2015/11/03 11:26 -0800: > On Mon, Nov 02, 2015 at 09:34:24AM +0800, Qu Wenruo wrote: >> >> >> Stefan Priebe wrote on 2015/11/01 21:49 +0100: >>> Hi, >>> >>> this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html >>> >>> adds a regression to my test systems with very large disks (30tb and 50tb). >>> >>> btrfs balance is super slow afterwards while heavily making use of cp >>> --reflink=always on big files (200gb - 500gb). >>> >>> Sorry didn't know how to correctly reply to that "old" message. >>> >>> Greets, >>> Stefan >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> Thanks for the testing. >> >> Are you using qgroup or just doing normal balance with qgroup disabled? >> >> For the latter case, that's should be optimized to skip the dirty >> extent insert in qgroup disabled case. >> >> For qgroup enabled case, I'm afraid that's the design. >> As relocation will drop a subtree to relocate, and to ensure qgroup >> consistent, we must walk down all the tree blocks and mark them >> dirty for later qgroup accounting. > > Qu, we're always going to have to walk the tree when deleting it, this is > part of removing a subvolume. We've walked shared subtrees in this code for > numerous kernel releases without incident before it was removed in 4.2. > > Do you have any actual evidence that this is a major performance regression? > From our previous conversations you seemed convinced of this, without even > having a working subtree walk to test. I remember the hand wringing > about an individual commit being too heavy with the qgroup code (even though > I pointed out that tree walk is a restartable transaction). > > It seems that you are confused still about how we handle removing a volume > wrt qgroups. > > If you have questions or concerns I would be happy to explain them but > IMHO your statements there are opinion and not based in fact. Yes, I don't deny it. But it's quite hard to prove, as we would need storage as huge as Stefan's. What I have is only several hundred GB of test storage. Even counting my whole home NAS, I only have 2T, far from the storage Stefan has. And what Stefan reported should already give some hint about the performance issue. In your words, "it won't be doing anything (ok some kmalloc/free of a very tiny object)", yet it is already slowing down balance, since balance also uses btrfs_drop_subtree(). You're right that the tree walk can happen across several transactions, and normally the user won't notice anything, as subvolume deletion happens in the background. But in the relocation case, it makes relocation slower than it was, due to that "nothing" (kmalloc/free of tiny objects). Yes, you can fix it by avoiding the memory allocation in the qgroup-disabled case, but what will happen if the user has qgroups enabled? I'm not saying there is anything wrong with your patch; in fact I'm quite happy you solved such a problem with such small changes.
But we can't just ignore such a "possible" performance issue just because the old code did the same thing. (Although it's not quite the same now: we're marking all subtree blocks dirty, not just the shared ones.) Thanks, Qu > > Yes btw, we might have to do more work for the uncommon case of a > qgroup being referenced by higher level groups but that is clearly not > happening here (and honestly it's not a common case at all). > --Mark > > > -- > Mark Fasheh > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete 2015-11-04 1:01 ` Qu Wenruo @ 2015-11-05 19:23 ` Mark Fasheh 2015-11-06 1:02 ` Qu Wenruo 0 siblings, 1 reply; 14+ messages in thread From: Mark Fasheh @ 2015-11-05 19:23 UTC (permalink / raw) To: Qu Wenruo; +Cc: Stefan Priebe, linux-btrfs@vger.kernel.org, jbacik, Chris Mason On Wed, Nov 04, 2015 at 09:01:36AM +0800, Qu Wenruo wrote: > > > Mark Fasheh wrote on 2015/11/03 11:26 -0800: > >On Mon, Nov 02, 2015 at 09:34:24AM +0800, Qu Wenruo wrote: > >> > >> > >>Stefan Priebe wrote on 2015/11/01 21:49 +0100: > >>>Hi, > >>> > >>>this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html > >>> > >>>adds a regression to my test systems with very large disks (30tb and 50tb). > >>> > >>>btrfs balance is super slow afterwards while heavily making use of cp > >>>--reflink=always on big files (200gb - 500gb). > >>> > >>>Sorry didn't know how to correctly reply to that "old" message. > >>> > >>>Greets, > >>>Stefan > >>>-- > >>>To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > >>>the body of a message to majordomo@vger.kernel.org > >>>More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > >>Thanks for the testing. > >> > >>Are you using qgroup or just doing normal balance with qgroup disabled? > >> > >>For the latter case, that's should be optimized to skip the dirty > >>extent insert in qgroup disabled case. > >> > >>For qgroup enabled case, I'm afraid that's the design. > >>As relocation will drop a subtree to relocate, and to ensure qgroup > >>consistent, we must walk down all the tree blocks and mark them > >>dirty for later qgroup accounting. > > > >Qu, we're always going to have to walk the tree when deleting it, this is > >part of removing a subvolume. We've walked shared subtrees in this code for > >numerous kernel releases without incident before it was removed in 4.2. > > > >Do you have any actual evidence that this is a major performance regression? > > From our previous conversations you seemed convinced of this, without even > >having a working subtree walk to test. I remember the hand wringing > >about an individual commit being too heavy with the qgroup code (even though > >I pointed out that tree walk is a restartable transaction). > > > >It seems that you are confused still about how we handle removing a volume > >wrt qgroups. > > > >If you have questions or concerns I would be happy to explain them but > >IMHO your statements there are opinion and not based in fact. > > Yes, I don't deny it. > But it's quite hard to prove it, as we need such a huge storage like Stefan. > What I have is only several hundred GB test storage. > Even accounting all my home NAS, I only have 2T, far from the > storage Stefan has. > > And what Stefan report should already give some hint about the > performance issue. > > In your word "it won't be doing anything (ok some kmalloc/free of a > very tiny object)", it's already slowing down balance, since balance > also use btrfs_drop_subtree(). When I wrote that I was under the impression that the qgroup code was doing its own sanity checking (it used to) and since Stefan had them disabled they couldn't be causing the problem. I read your e-mail explaining that the qgroup API was now intertwined with delayed ref locking after this one.
I wonder if we even just filled up his memory but never cleaned the objects. The only other thing I can think of is if account_leaf_items() got run in a really tight loop for some reason. Kmalloc in the way we are using it is not usually a performance issue, especially if we've been reading off disk in the same process. Ask yourself this - your own patch series does the same kmalloc for every qgroup operation. Did you notice a complete and massive performance slowdown like the one Stefan reported? I will say that we never had this problem reported before, and account_leaf_items() is always run in all kernels, even without qgroups enabled. That will change with my new patch though. What we can say for sure is that drop_snapshot in the qgroup case will read more disk and obviously that will have a negative impact depending on what the tree looks like. So IMHO we ought to be focusing on reducing the amount of I/O involved. > But we can't just ignore such "possible" performance issue just > because old code did the same thing.(Although not the same now, > we're marking all subtree blocks dirty other than shared one). Well, I can't disagree with that - the only reason we are talking right now is because you intentionally ignored the qgroup code in drop_snapshot(). So let's start with this - no more 'fixing' code by tearing it out and replacing it with /* TODO: somebody else re-implement this */ ;) --Mark -- Mark Fasheh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete 2015-11-05 19:23 ` Mark Fasheh @ 2015-11-06 1:02 ` Qu Wenruo 2015-11-06 3:15 ` Mark Fasheh 0 siblings, 1 reply; 14+ messages in thread From: Qu Wenruo @ 2015-11-06 1:02 UTC (permalink / raw) To: Mark Fasheh Cc: Stefan Priebe, linux-btrfs@vger.kernel.org, jbacik, Chris Mason Mark Fasheh wrote on 2015/11/05 11:23 -0800: > On Wed, Nov 04, 2015 at 09:01:36AM +0800, Qu Wenruo wrote: >> >> >> Mark Fasheh wrote on 2015/11/03 11:26 -0800: >>> On Mon, Nov 02, 2015 at 09:34:24AM +0800, Qu Wenruo wrote: >>>> >>>> >>>> Stefan Priebe wrote on 2015/11/01 21:49 +0100: >>>>> Hi, >>>>> >>>>> this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html >>>>> >>>>> adds a regression to my test systems with very large disks (30tb and 50tb). >>>>> >>>>> btrfs balance is super slow afterwards while heavily making use of cp >>>>> --reflink=always on big files (200gb - 500gb). >>>>> >>>>> Sorry didn't know how to correctly reply to that "old" message. >>>>> >>>>> Greets, >>>>> Stefan >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>>> Thanks for the testing. >>>> >>>> Are you using qgroup or just doing normal balance with qgroup disabled? >>>> >>>> For the latter case, that's should be optimized to skip the dirty >>>> extent insert in qgroup disabled case. >>>> >>>> For qgroup enabled case, I'm afraid that's the design. >>>> As relocation will drop a subtree to relocate, and to ensure qgroup >>>> consistent, we must walk down all the tree blocks and mark them >>>> dirty for later qgroup accounting. >>> >>> Qu, we're always going to have to walk the tree when deleting it, this is >>> part of removing a subvolume. We've walked shared subtrees in this code for >>> numerous kernel releases without incident before it was removed in 4.2. >>> >>> Do you have any actual evidence that this is a major performance regression? >>> From our previous conversations you seemed convinced of this, without even >>> having a working subtree walk to test. I remember the hand wringing >>> about an individual commit being too heavy with the qgroup code (even though >>> I pointed out that tree walk is a restartable transaction). >>> >>> It seems that you are confused still about how we handle removing a volume >>> wrt qgroups. >>> >>> If you have questions or concerns I would be happy to explain them but >>> IMHO your statements there are opinion and not based in fact. >> >> Yes, I don't deny it. >> But it's quite hard to prove it, as we need such a huge storage like Stefan. >> What I have is only several hundred GB test storage. >> Even accounting all my home NAS, I only have 2T, far from the >> storage Stefan has. >> >> And what Stefan report should already give some hint about the >> performance issue. >> >> In your word "it won't be doing anything (ok some kmalloc/free of a >> very tiny object)", it's already slowing down balance, since balance >> also use btrfs_drop_subtree(). > > When I wrote that I was under the impression that the qgroup code was doing > it's own sanity checking (it used to) and since Stephan had them disabled > they couldn't be causing the problem. I read your e-mail explaining that the > qgroup api was now intertwined with delayed ref locking after this one. 
My fault, as btrfs_qgroup_mark_extent_dirty() is an exception which doesn't have the qgroup status check and depends on the existing locks. > > The same exact code ran in either case before and after your patches, so my > guess is that the issue is actually inside the qgroup code that shouldn't > have been run. I wonder if we even just filled up his memory but never > cleaned the objects. The only other thing I can think of is if > account_leaf_items() got run in a really tight loop for some reason. > > Kmalloc in the way we are using it is not usually a performance issue, > especially if we've been reading off disk in the same process. Ask yourself > this - your own patch series does the same kmalloc for every qgroup > operation. Did you notice a complete and massive performance slowdown like > the one Stefan reported? You're right, such memory allocation may impact performance, but not as noticeably as other operations which can kick off disk IO, like btrfs_find_all_roots(). But at the very least, enabling qgroups does impact performance. Yeah, this time I have test data. In an environment with 100 different snapshots, sysbench shows an overall performance drop of about 5%, and in some cases up to 7%, with qgroups enabled. Not sure about the kmalloc impact, maybe less than 1% or maybe 2~3%, but at least it's worth trying to use a kmem cache. > > I will say that we never had this problem reported before, and > account_leaf_items() is always run in all kernels, even without qgroups > enabled. That will change with my new patch though. > > What we can say for sure is that drop_snapshot in the qgroup case will read > more disk and obviously that will have a negative impact depending on what > the tree looks like. So IMHO we ought to be focusing on reducing the amount > of I/O involved. Totally agree. Thanks, Qu > > >> But we can't just ignore such "possible" performance issue just >> because old code did the same thing.(Although not the same now, >> we're marking all subtree blocks dirty other than shared one). > > Well, I can't disagree with that - the only reason we are talking right now > is because you intentionally ignored the qgroup code in drop_snapshot(). So > let's start with this - no more 'fixing' code by tearing it out and replacing > it with /* TODO: somebody else re-implement this */ ;) > --Mark > > -- > Mark Fasheh > ^ permalink raw reply [flat|nested] 14+ messages in thread
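[A minimal sketch of the kmem cache idea discussed above, assuming the btrfs_qgroup_extent_record structure from the qgroup rework; the cache and helper names are made up for illustration, not an actual patch.]

#include <linux/init.h>
#include <linux/slab.h>
#include "qgroup.h"		/* assumed home of struct btrfs_qgroup_extent_record */

/* Hypothetical dedicated slab cache for the per-extent qgroup records,
 * replacing the plain kmalloc()/kfree() done on every operation. */
static struct kmem_cache *btrfs_qgroup_record_cachep;

int __init btrfs_qgroup_record_cache_init(void)
{
	btrfs_qgroup_record_cachep = kmem_cache_create("btrfs_qgroup_extent_record",
			sizeof(struct btrfs_qgroup_extent_record), 0,
			SLAB_MEM_SPREAD, NULL);
	return btrfs_qgroup_record_cachep ? 0 : -ENOMEM;
}

void btrfs_qgroup_record_cache_exit(void)
{
	kmem_cache_destroy(btrfs_qgroup_record_cachep);
}

/* allocation/free helpers the accounting paths would call instead of kmalloc */
static inline struct btrfs_qgroup_extent_record *alloc_qgroup_record(gfp_t flags)
{
	return kmem_cache_zalloc(btrfs_qgroup_record_cachep, flags);
}

static inline void free_qgroup_record(struct btrfs_qgroup_extent_record *record)
{
	kmem_cache_free(btrfs_qgroup_record_cachep, record);
}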
* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete 2015-11-06 1:02 ` Qu Wenruo @ 2015-11-06 3:15 ` Mark Fasheh 2015-11-06 3:25 ` Qu Wenruo 0 siblings, 1 reply; 14+ messages in thread From: Mark Fasheh @ 2015-11-06 3:15 UTC (permalink / raw) To: Qu Wenruo; +Cc: Stefan Priebe, linux-btrfs@vger.kernel.org, jbacik, Chris Mason On Fri, Nov 06, 2015 at 09:02:13AM +0800, Qu Wenruo wrote: > >The same exact code ran in either case before and after your patches, so my > >guess is that the issue is actually inside the qgroup code that shouldn't > >have been run. I wonder if we even just filled up his memory but never > >cleaned the objects. The only other thing I can think of is if > >account_leaf_items() got run in a really tight loop for some reason. > > > >Kmalloc in the way we are using it is not usually a performance issue, > >especially if we've been reading off disk in the same process. Ask yourself > >this - your own patch series does the same kmalloc for every qgroup > >operation. Did you notice a complete and massive performance slowdown like > >the one Stefan reported? > > You're right, such memory allocation may impact performance but not > so noticeable, compared to other operations which may kick disk IO, > like btrfs_find_all_roots(). > > But at least, enabling qgroup will impact performance. > > Yeah, this time I has test data now. > In a environment with 100 different snapshot, sysbench shows an > overall performance drop about 5%, and in some case, up to 7%, with > qgroup enabled. > > Not sure about the kmalloc impact, maybe less than 1% or maybe 2~3%, > but at least it's worthy trying to use kmem cache. Ok cool, what'd you do to generate the snapshots? I can try a similar test on one of my machines and see what I get. I'm not surprised that the overhead is noticeable, and I agree it's easy enough to try things like replacing the allocation once we have a test going. Thanks, --Mark -- Mark Fasheh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete 2015-11-06 3:15 ` Mark Fasheh @ 2015-11-06 3:25 ` Qu Wenruo 0 siblings, 0 replies; 14+ messages in thread From: Qu Wenruo @ 2015-11-06 3:25 UTC (permalink / raw) To: Mark Fasheh Cc: Stefan Priebe, linux-btrfs@vger.kernel.org, jbacik, Chris Mason Mark Fasheh wrote on 2015/11/05 19:15 -0800: > On Fri, Nov 06, 2015 at 09:02:13AM +0800, Qu Wenruo wrote: >>> The same exact code ran in either case before and after your patches, so my >>> guess is that the issue is actually inside the qgroup code that shouldn't >>> have been run. I wonder if we even just filled up his memory but never >>> cleaned the objects. The only other thing I can think of is if >>> account_leaf_items() got run in a really tight loop for some reason. >>> >>> Kmalloc in the way we are using it is not usually a performance issue, >>> especially if we've been reading off disk in the same process. Ask yourself >>> this - your own patch series does the same kmalloc for every qgroup >>> operation. Did you notice a complete and massive performance slowdown like >>> the one Stefan reported? >> >> You're right, such memory allocation may impact performance but not >> so noticeable, compared to other operations which may kick disk IO, >> like btrfs_find_all_roots(). >> >> But at least, enabling qgroup will impact performance. >> >> Yeah, this time I has test data now. >> In a environment with 100 different snapshot, sysbench shows an >> overall performance drop about 5%, and in some case, up to 7%, with >> qgroup enabled. >> >> Not sure about the kmalloc impact, maybe less than 1% or maybe 2~3%, >> but at least it's worthy trying to use kmem cache. > > Ok cool, what'd you do to generate the snapshots? I can try a similar test > on one of my machines and see what I get. I'm not surprised that the > overhead is noticable, and I agree it's easy enough to try things like > replacing the allocation once we have a test going. > > Thanks, > --Mark Running fsstress in a subvolume with 4 threads, and creating a snapshot of that subvolume about every 5 seconds. Then running sysbench inside the 50th snapshot. Such a test includes the overhead of both btrfs_find_all_roots() and kmalloc(), so I'm not sure which overhead is bigger. Thanks, Qu > > -- > Mark Fasheh > ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread. Thread overview: 14+ messages: 2015-11-01 20:49 Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete Stefan Priebe 2015-11-01 22:57 ` Duncan 2015-11-02 1:34 ` Qu Wenruo 2015-11-02 5:46 ` Stefan Priebe 2015-11-03 19:15 ` Mark Fasheh 2015-11-03 19:26 ` Mark Fasheh 2015-11-03 19:42 ` Stefan Priebe 2015-11-03 23:31 ` Mark Fasheh 2015-11-04 2:22 ` Chris Mason 2015-11-04 1:01 ` Qu Wenruo 2015-11-05 19:23 ` Mark Fasheh 2015-11-06 1:02 ` Qu Wenruo 2015-11-06 3:15 ` Mark Fasheh 2015-11-06 3:25 ` Qu Wenruo