public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC] btrfs: reflink: disable cross-subvolume clone/dedupe for simple quota
@ 2024-03-21  3:39 Qu Wenruo
  2024-03-21  7:34 ` Roman Mamedov
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Qu Wenruo @ 2024-03-21  3:39 UTC (permalink / raw)
  To: linux-btrfs

Unlike the full qgroup, simple quota no longer makes backref walk to
properly make accurate accounting for each subvolume.

Instead it goes a much faster and much simpler way, anything modified by
the subvolume would be accounted to that subvolume.

Although it brings some small accuracy problem, mostly related to shared
extents between different subvolumes, the reduced overhead is more than
good enough.

Considering there are only 2 ways to share extents between subvolumes:

- Snapshotting
- Cross-subvolume clone/dedupe

And since snapshotting is the core functionality of btrfs, we will never
disable that.

But on the other hand, cross-subvolume snapshotting is not so critical,
and disabling that for simple quota would improve the accuracy of it,
I'd say it's worthy to do that.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/reflink.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
---
[REASON FOR RFC]
I'm not sure how important the cross-subvolume clone functionality is in
real-world.

But considering squota is mostly designed for container usage, in that
case disabling cross-subvolume clone should be completely fine.

diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index 08d0fb46ceec..906cb6166b67 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -15,6 +15,7 @@
 #include "file-item.h"
 #include "file.h"
 #include "super.h"
+#include "qgroup.h"
 
 #define BTRFS_MAX_DEDUPE_LEN	SZ_16M
 
@@ -350,6 +351,19 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 	u64 last_dest_end = destoff;
 	u64 prev_extent_end = off;
 
+	/*
+	 * If squota is enabled, disable cloning between different subvolumes.
+	 *
+	 * As clone/reflink/dedupe is the only other way to share data between
+	 * different subvolumes other than snapshotting.
+	 * With it disabled, squota can be way more accurate.
+	 */
+	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) {
+		if (BTRFS_I(src)->root->root_key.objectid !=
+		    BTRFS_I(inode)->root->root_key.objectid)
+			return -EXDEV;
+	}
+
 	ret = -ENOMEM;
 	buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
 	if (!buf)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC] btrfs: reflink: disable cross-subvolume clone/dedupe for simple quota
  2024-03-21  3:39 [PATCH RFC] btrfs: reflink: disable cross-subvolume clone/dedupe for simple quota Qu Wenruo
@ 2024-03-21  7:34 ` Roman Mamedov
  2024-03-21 13:25 ` David Disseldorp
  2024-03-21 18:51 ` Josef Bacik
  2 siblings, 0 replies; 7+ messages in thread
From: Roman Mamedov @ 2024-03-21  7:34 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, 21 Mar 2024 14:09:38 +1030
Qu Wenruo <wqu@suse.com> wrote:

> Unlike the full qgroup, simple quota no longer makes backref walk to
> properly make accurate accounting for each subvolume.
> 
> Instead it goes a much faster and much simpler way, anything modified by
> the subvolume would be accounted to that subvolume.
> 
> Although it brings some small accuracy problem, mostly related to shared
> extents between different subvolumes, the reduced overhead is more than
> good enough.
> 
> Considering there are only 2 ways to share extents between subvolumes:
> 
> - Snapshotting
> - Cross-subvolume clone/dedupe
> 
> And since snapshotting is the core functionality of btrfs, we will never
> disable that.
> 
> But on the other hand, cross-subvolume snapshotting is not so critical,

I think you meant cross-subvolume clone/dedupe in this case ^

> and disabling that for simple quota would improve the accuracy of it,
> I'd say it's worthy to do that.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/reflink.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> ---
> [REASON FOR RFC]
> I'm not sure how important the cross-subvolume clone functionality is in
> real-world.
> 
> But considering squota is mostly designed for container usage, in that
> case disabling cross-subvolume clone should be completely fine.

Use case that comes to mind, create multiple LXC guests as snapshots from the
template container. They run for a few months, and then you upgrade OS and
software in each, using "apt dist-upgrade", to the latest releases.

Now it would be neat to be able to run dedupe on them to bring disk usage back
to about that of a single container, since the OS and newly installed files
would be mostly the same in all of them.

It also could be that the containers are managed by different users or
customers, in which case recreating them from template on every new OS releases
would not be feasible, and upgrading each separately is the only option.

> diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
> index 08d0fb46ceec..906cb6166b67 100644
> --- a/fs/btrfs/reflink.c
> +++ b/fs/btrfs/reflink.c
> @@ -15,6 +15,7 @@
>  #include "file-item.h"
>  #include "file.h"
>  #include "super.h"
> +#include "qgroup.h"
>  
>  #define BTRFS_MAX_DEDUPE_LEN	SZ_16M
>  
> @@ -350,6 +351,19 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
>  	u64 last_dest_end = destoff;
>  	u64 prev_extent_end = off;
>  
> +	/*
> +	 * If squota is enabled, disable cloning between different subvolumes.
> +	 *
> +	 * As clone/reflink/dedupe is the only other way to share data between
> +	 * different subvolumes other than snapshotting.
> +	 * With it disabled, squota can be way more accurate.
> +	 */
> +	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) {
> +		if (BTRFS_I(src)->root->root_key.objectid !=
> +		    BTRFS_I(inode)->root->root_key.objectid)
> +			return -EXDEV;
> +	}
> +
>  	ret = -ENOMEM;
>  	buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
>  	if (!buf)


-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC] btrfs: reflink: disable cross-subvolume clone/dedupe for simple quota
  2024-03-21  3:39 [PATCH RFC] btrfs: reflink: disable cross-subvolume clone/dedupe for simple quota Qu Wenruo
  2024-03-21  7:34 ` Roman Mamedov
@ 2024-03-21 13:25 ` David Disseldorp
  2024-03-21 20:17   ` Qu Wenruo
  2024-03-21 18:51 ` Josef Bacik
  2 siblings, 1 reply; 7+ messages in thread
From: David Disseldorp @ 2024-03-21 13:25 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, 21 Mar 2024 14:09:38 +1030, Qu Wenruo wrote:
...
> [REASON FOR RFC]
> I'm not sure how important the cross-subvolume clone functionality is in
> real-world.
> 
> But considering squota is mostly designed for container usage, in that
> case disabling cross-subvolume clone should be completely fine.

I think copy_file_range() is reasonably common nowadays, and this would
impact such workloads. I don't find the creator-subvol-pays simple quota
behaviour too confusing, so would vote too keep things as is.

Cheers, David

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC] btrfs: reflink: disable cross-subvolume clone/dedupe for simple quota
  2024-03-21  3:39 [PATCH RFC] btrfs: reflink: disable cross-subvolume clone/dedupe for simple quota Qu Wenruo
  2024-03-21  7:34 ` Roman Mamedov
  2024-03-21 13:25 ` David Disseldorp
@ 2024-03-21 18:51 ` Josef Bacik
  2024-03-21 19:08   ` Boris Burkov
  2 siblings, 1 reply; 7+ messages in thread
From: Josef Bacik @ 2024-03-21 18:51 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Mar 21, 2024 at 02:09:38PM +1030, Qu Wenruo wrote:
> Unlike the full qgroup, simple quota no longer makes backref walk to
> properly make accurate accounting for each subvolume.
> 
> Instead it goes a much faster and much simpler way, anything modified by
> the subvolume would be accounted to that subvolume.
> 
> Although it brings some small accuracy problem, mostly related to shared
> extents between different subvolumes, the reduced overhead is more than
> good enough.
> 
> Considering there are only 2 ways to share extents between subvolumes:
> 
> - Snapshotting
> - Cross-subvolume clone/dedupe
> 
> And since snapshotting is the core functionality of btrfs, we will never
> disable that.
> 
> But on the other hand, cross-subvolume snapshotting is not so critical,
> and disabling that for simple quota would improve the accuracy of it,
> I'd say it's worthy to do that.
> 

We did this on purpose, and absolutely want to leave this functionality in
place.  Boris made sure to document this behavior explicitly, because we are
absolutely taking advantage of this internally by having the package management
subvolume managed under a different quota, and then reflinking those packages
into their containers volume.  This is the price of squotas, you aren't getting
full tracking, but you're getting limits and speed.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC] btrfs: reflink: disable cross-subvolume clone/dedupe for simple quota
  2024-03-21 18:51 ` Josef Bacik
@ 2024-03-21 19:08   ` Boris Burkov
  2024-03-21 23:59     ` Qu Wenruo
  0 siblings, 1 reply; 7+ messages in thread
From: Boris Burkov @ 2024-03-21 19:08 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Qu Wenruo, linux-btrfs

On Thu, Mar 21, 2024 at 02:51:35PM -0400, Josef Bacik wrote:
> On Thu, Mar 21, 2024 at 02:09:38PM +1030, Qu Wenruo wrote:
> > Unlike the full qgroup, simple quota no longer makes backref walk to
> > properly make accurate accounting for each subvolume.
> > 
> > Instead it goes a much faster and much simpler way, anything modified by
> > the subvolume would be accounted to that subvolume.
> > 
> > Although it brings some small accuracy problem, mostly related to shared
> > extents between different subvolumes, the reduced overhead is more than
> > good enough.
> > 
> > Considering there are only 2 ways to share extents between subvolumes:
> > 
> > - Snapshotting
> > - Cross-subvolume clone/dedupe
> > 
> > And since snapshotting is the core functionality of btrfs, we will never
> > disable that.
> > 
> > But on the other hand, cross-subvolume snapshotting is not so critical,
> > and disabling that for simple quota would improve the accuracy of it,
> > I'd say it's worthy to do that.
> > 
> 
> We did this on purpose, and absolutely want to leave this functionality in
> place.  Boris made sure to document this behavior explicitly, because we are
> absolutely taking advantage of this internally by having the package management
> subvolume managed under a different quota, and then reflinking those packages
> into their containers volume.  This is the price of squotas, you aren't getting
> full tracking, but you're getting limits and speed.  Thanks,
> 
> Josef

For a little extra context, if we hadn't wanted to support reflinking,
then squotas would have been yet simplER, and wouldn't have needed the
new inline owner refs. For better or worse, we decided it was in fact
quite important, so we added the owner refs. Now that we did the hard
work and committed to the incompat change, I certainly think it makes
sense to leave the support.

Boris

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC] btrfs: reflink: disable cross-subvolume clone/dedupe for simple quota
  2024-03-21 13:25 ` David Disseldorp
@ 2024-03-21 20:17   ` Qu Wenruo
  0 siblings, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2024-03-21 20:17 UTC (permalink / raw)
  To: David Disseldorp, Qu Wenruo; +Cc: linux-btrfs



在 2024/3/21 23:55, David Disseldorp 写道:
> On Thu, 21 Mar 2024 14:09:38 +1030, Qu Wenruo wrote:
> ...
>> [REASON FOR RFC]
>> I'm not sure how important the cross-subvolume clone functionality is in
>> real-world.
>>
>> But considering squota is mostly designed for container usage, in that
>> case disabling cross-subvolume clone should be completely fine.
>
> I think copy_file_range() is reasonably common nowadays, and this would
> impact such workloads. I don't find the creator-subvol-pays simple quota
> behaviour too confusing, so would vote too keep things as is.
>
> Cheers, David
>
OK, thanks for all the feedback.

Please drop this patch.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC] btrfs: reflink: disable cross-subvolume clone/dedupe for simple quota
  2024-03-21 19:08   ` Boris Burkov
@ 2024-03-21 23:59     ` Qu Wenruo
  0 siblings, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2024-03-21 23:59 UTC (permalink / raw)
  To: Boris Burkov, Josef Bacik; +Cc: linux-btrfs



在 2024/3/22 05:38, Boris Burkov 写道:
> On Thu, Mar 21, 2024 at 02:51:35PM -0400, Josef Bacik wrote:
>> On Thu, Mar 21, 2024 at 02:09:38PM +1030, Qu Wenruo wrote:
>>> Unlike the full qgroup, simple quota no longer makes backref walk to
>>> properly make accurate accounting for each subvolume.
>>>
>>> Instead it goes a much faster and much simpler way, anything modified by
>>> the subvolume would be accounted to that subvolume.
>>>
>>> Although it brings some small accuracy problem, mostly related to shared
>>> extents between different subvolumes, the reduced overhead is more than
>>> good enough.
>>>
>>> Considering there are only 2 ways to share extents between subvolumes:
>>>
>>> - Snapshotting
>>> - Cross-subvolume clone/dedupe
>>>
>>> And since snapshotting is the core functionality of btrfs, we will never
>>> disable that.
>>>
>>> But on the other hand, cross-subvolume snapshotting is not so critical,
>>> and disabling that for simple quota would improve the accuracy of it,
>>> I'd say it's worthy to do that.
>>>
>>
>> We did this on purpose, and absolutely want to leave this functionality in
>> place.  Boris made sure to document this behavior explicitly, because we are
>> absolutely taking advantage of this internally by having the package management
>> subvolume managed under a different quota, and then reflinking those packages
>> into their containers volume.  This is the price of squotas, you aren't getting
>> full tracking, but you're getting limits and speed.  Thanks,
>>
>> Josef
> 
> For a little extra context, if we hadn't wanted to support reflinking,
> then squotas would have been yet simplER,

I guess that's why my initial impression on squota, which can be even 
simpler without the extra extent tree change.

> and wouldn't have needed the
> new inline owner refs. For better or worse, we decided it was in fact
> quite important, so we added the owner refs. Now that we did the hard
> work and committed to the incompat change, I certainly think it makes
> sense to leave the support.

Considering you guys have the real world usage and have already 
committed so much to this feature, it's definitely worthy keeping.

The patch would be dropped.

Thanks,
Qu

> 
> Boris

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-03-21 23:59 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-21  3:39 [PATCH RFC] btrfs: reflink: disable cross-subvolume clone/dedupe for simple quota Qu Wenruo
2024-03-21  7:34 ` Roman Mamedov
2024-03-21 13:25 ` David Disseldorp
2024-03-21 20:17   ` Qu Wenruo
2024-03-21 18:51 ` Josef Bacik
2024-03-21 19:08   ` Boris Burkov
2024-03-21 23:59     ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox