* Re: [PATCH] btrfs: don't commit transaction for every subvol create
2023-04-11 19:10 [PATCH] btrfs: don't commit transaction for every subvol create Sweet Tea Dorminy
@ 2023-04-12 1:45 ` Neal Gompa
2023-04-12 2:34 ` Qu Wenruo
2023-04-12 13:49 ` Josef Bacik
2023-04-12 2:29 ` Qu Wenruo
` (3 subsequent siblings)
4 siblings, 2 replies; 11+ messages in thread
From: Neal Gompa @ 2023-04-12 1:45 UTC (permalink / raw)
To: Sweet Tea Dorminy
Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs, kernel-team
On Tue, Apr 11, 2023 at 3:22 PM Sweet Tea Dorminy
<sweettea-kernel@dorminy.me> wrote:
>
> Recently a Meta-internal workload encountered subvolume creation taking
> up to 2s each, significantly slower than directory creation. As they
> were hoping to be able to use subvolumes instead of directories, and
> were looking to create hundreds, this was a significant issue. After
> Josef investigated, it turned out to be due to the transaction commit
> currently performed at the end of subvolume creation.
>
> This change improves the workload by not doing transaction commit for every
> subvolume creation, and merely requiring a transaction commit on fsync.
> In the worst case, of doing a subvolume create and fsync in a loop, this
> should require an equal amount of time to the current scheme; and in the
> best case, the internal workload creating hundreds of subvols before
> fsyncing is greatly improved.
>
> While it would be nice to be able to use the log tree and use the normal
> fsync path, logtree replay can't deal with new subvolume inodes
> presently.
>
> It's possible that there's some reason that the transaction commit is
> necessary for correctness during subvolume creation; however,
> git logs indicate that the commit dates back to the beginning of
> subvolume creation, and there are no notes on why it would be necessary.
>
> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
> ---
> fs/btrfs/ioctl.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 25833b4eeaf5..a6f1ee2dc1b9 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -647,6 +647,8 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> }
> trans->block_rsv = &block_rsv;
> trans->bytes_reserved = block_rsv.size;
> + /* tree log can't currently deal with an inode which is a new root */
> + btrfs_set_log_full_commit(trans);
>
> ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
> if (ret)
> @@ -755,10 +757,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> trans->bytes_reserved = 0;
> btrfs_subvolume_release_metadata(root, &block_rsv);
>
> - if (ret)
> - btrfs_end_transaction(trans);
> - else
> - ret = btrfs_commit_transaction(trans);
> + btrfs_end_transaction(trans);
> out_new_inode_args:
> btrfs_new_inode_args_destroy(&new_inode_args);
> out_inode:
> --
> 2.40.0
>
Wouldn't the consequence of this mean that we lose some guarantees
around subvolume creation? That is, without it being committed, we
would have a window in which the subvolume and data can be written
without being written to disk? If so, how large is that window?
--
真実はいつも一つ!/ Always, there's only one truth!
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] btrfs: don't commit transaction for every subvol create
2023-04-12 1:45 ` Neal Gompa
@ 2023-04-12 2:34 ` Qu Wenruo
2023-04-12 2:35 ` Qu Wenruo
2023-04-12 13:49 ` Josef Bacik
1 sibling, 1 reply; 11+ messages in thread
From: Qu Wenruo @ 2023-04-12 2:34 UTC (permalink / raw)
To: Neal Gompa, Sweet Tea Dorminy
Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs, kernel-team
On 2023/4/12 09:45, Neal Gompa wrote:
> On Tue, Apr 11, 2023 at 3:22 PM Sweet Tea Dorminy
> <sweettea-kernel@dorminy.me> wrote:
>>
>> Recently a Meta-internal workload encountered subvolume creation taking
>> up to 2s each, significantly slower than directory creation. As they
>> were hoping to be able to use subvolumes instead of directories, and
>> were looking to create hundreds, this was a significant issue. After
>> Josef investigated, it turned out to be due to the transaction commit
>> currently performed at the end of subvolume creation.
>>
>> This change improves the workload by not doing transaction commit for every
>> subvolume creation, and merely requiring a transaction commit on fsync.
>> In the worst case, of doing a subvolume create and fsync in a loop, this
>> should require an equal amount of time to the current scheme; and in the
>> best case, the internal workload creating hundreds of subvols before
>> fsyncing is greatly improved.
>>
>> While it would be nice to be able to use the log tree and use the normal
>> fsync path, logtree replay can't deal with new subvolume inodes
>> presently.
>>
>> It's possible that there's some reason that the transaction commit is
>> necessary for correctness during subvolume creation; however,
>> git logs indicate that the commit dates back to the beginning of
>> subvolume creation, and there are no notes on why it would be necessary.
>>
>> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
>> ---
>> fs/btrfs/ioctl.c | 7 +++----
>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 25833b4eeaf5..a6f1ee2dc1b9 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -647,6 +647,8 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
>> }
>> trans->block_rsv = &block_rsv;
>> trans->bytes_reserved = block_rsv.size;
>> + /* tree log can't currently deal with an inode which is a new root */
>> + btrfs_set_log_full_commit(trans);
>>
>> ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
>> if (ret)
>> @@ -755,10 +757,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
>> trans->bytes_reserved = 0;
>> btrfs_subvolume_release_metadata(root, &block_rsv);
>>
>> - if (ret)
>> - btrfs_end_transaction(trans);
>> - else
>> - ret = btrfs_commit_transaction(trans);
>> + btrfs_end_transaction(trans);
>> out_new_inode_args:
>> btrfs_new_inode_args_destroy(&new_inode_args);
>> out_inode:
>> --
>> 2.40.0
>>
>
> Wouldn't the consequence of this mean that we lose some guarantees
> around subvolume creation? That is, without it being committed, we
> would have a window in which the subvolume and data can be written
> without being written to disk? If so, how large is that window?
That can be avoided in btrfs-progs, by adding some -C|-c options just
like subvolume deletion to ensure the case.
If you're really concern about the window size, it's no smaller than
commit interval (by default 30s).
But that's not the full story, if you do fsync() on any inode of the new
subvolume, it should cause a full commit transaction, which would write
every needed metadata, including the subvolume itself, to the disk.
So the end result is, you don't really need to bother the subvolume
itself, just do your usual fsync() work on the inode you're really
concerned, no need to bother the subvolume itself.
Thanks,
Qu
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] btrfs: don't commit transaction for every subvol create
2023-04-12 2:34 ` Qu Wenruo
@ 2023-04-12 2:35 ` Qu Wenruo
0 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2023-04-12 2:35 UTC (permalink / raw)
To: Neal Gompa, Sweet Tea Dorminy
Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs, kernel-team
On 2023/4/12 10:34, Qu Wenruo wrote:
>
>
> On 2023/4/12 09:45, Neal Gompa wrote:
>> On Tue, Apr 11, 2023 at 3:22 PM Sweet Tea Dorminy
>> <sweettea-kernel@dorminy.me> wrote:
>>>
>>> Recently a Meta-internal workload encountered subvolume creation taking
>>> up to 2s each, significantly slower than directory creation. As they
>>> were hoping to be able to use subvolumes instead of directories, and
>>> were looking to create hundreds, this was a significant issue. After
>>> Josef investigated, it turned out to be due to the transaction commit
>>> currently performed at the end of subvolume creation.
>>>
>>> This change improves the workload by not doing transaction commit for
>>> every
>>> subvolume creation, and merely requiring a transaction commit on fsync.
>>> In the worst case, of doing a subvolume create and fsync in a loop, this
>>> should require an equal amount of time to the current scheme; and in the
>>> best case, the internal workload creating hundreds of subvols before
>>> fsyncing is greatly improved.
>>>
>>> While it would be nice to be able to use the log tree and use the normal
>>> fsync path, logtree replay can't deal with new subvolume inodes
>>> presently.
>>>
>>> It's possible that there's some reason that the transaction commit is
>>> necessary for correctness during subvolume creation; however,
>>> git logs indicate that the commit dates back to the beginning of
>>> subvolume creation, and there are no notes on why it would be necessary.
>>>
>>> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
>>> ---
>>> fs/btrfs/ioctl.c | 7 +++----
>>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>> index 25833b4eeaf5..a6f1ee2dc1b9 100644
>>> --- a/fs/btrfs/ioctl.c
>>> +++ b/fs/btrfs/ioctl.c
>>> @@ -647,6 +647,8 @@ static noinline int create_subvol(struct
>>> mnt_idmap *idmap,
>>> }
>>> trans->block_rsv = &block_rsv;
>>> trans->bytes_reserved = block_rsv.size;
>>> + /* tree log can't currently deal with an inode which is a new
>>> root */
>>> + btrfs_set_log_full_commit(trans);
>>>
>>> ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
>>> if (ret)
>>> @@ -755,10 +757,7 @@ static noinline int create_subvol(struct
>>> mnt_idmap *idmap,
>>> trans->bytes_reserved = 0;
>>> btrfs_subvolume_release_metadata(root, &block_rsv);
>>>
>>> - if (ret)
>>> - btrfs_end_transaction(trans);
>>> - else
>>> - ret = btrfs_commit_transaction(trans);
>>> + btrfs_end_transaction(trans);
>>> out_new_inode_args:
>>> btrfs_new_inode_args_destroy(&new_inode_args);
>>> out_inode:
>>> --
>>> 2.40.0
>>>
>>
>> Wouldn't the consequence of this mean that we lose some guarantees
>> around subvolume creation? That is, without it being committed, we
>> would have a window in which the subvolume and data can be written
>> without being written to disk? If so, how large is that window?
>
> That can be avoided in btrfs-progs, by adding some -C|-c options just
> like subvolume deletion to ensure the case.
>
>
> If you're really concern about the window size, it's no smaller than
> commit interval (by default 30s).
s/smaller/larger/
>
>
> But that's not the full story, if you do fsync() on any inode of the new
> subvolume, it should cause a full commit transaction, which would write
> every needed metadata, including the subvolume itself, to the disk.
>
> So the end result is, you don't really need to bother the subvolume
> itself, just do your usual fsync() work on the inode you're really
> concerned, no need to bother the subvolume itself.
>
> Thanks,
> Qu
>
>>
>>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] btrfs: don't commit transaction for every subvol create
2023-04-12 1:45 ` Neal Gompa
2023-04-12 2:34 ` Qu Wenruo
@ 2023-04-12 13:49 ` Josef Bacik
1 sibling, 0 replies; 11+ messages in thread
From: Josef Bacik @ 2023-04-12 13:49 UTC (permalink / raw)
To: Neal Gompa
Cc: Sweet Tea Dorminy, Chris Mason, David Sterba, linux-btrfs,
kernel-team
On Tue, Apr 11, 2023 at 09:45:34PM -0400, Neal Gompa wrote:
> On Tue, Apr 11, 2023 at 3:22 PM Sweet Tea Dorminy
> <sweettea-kernel@dorminy.me> wrote:
> >
> > Recently a Meta-internal workload encountered subvolume creation taking
> > up to 2s each, significantly slower than directory creation. As they
> > were hoping to be able to use subvolumes instead of directories, and
> > were looking to create hundreds, this was a significant issue. After
> > Josef investigated, it turned out to be due to the transaction commit
> > currently performed at the end of subvolume creation.
> >
> > This change improves the workload by not doing transaction commit for every
> > subvolume creation, and merely requiring a transaction commit on fsync.
> > In the worst case, of doing a subvolume create and fsync in a loop, this
> > should require an equal amount of time to the current scheme; and in the
> > best case, the internal workload creating hundreds of subvols before
> > fsyncing is greatly improved.
> >
> > While it would be nice to be able to use the log tree and use the normal
> > fsync path, logtree replay can't deal with new subvolume inodes
> > presently.
> >
> > It's possible that there's some reason that the transaction commit is
> > necessary for correctness during subvolume creation; however,
> > git logs indicate that the commit dates back to the beginning of
> > subvolume creation, and there are no notes on why it would be necessary.
> >
> > Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
> > ---
> > fs/btrfs/ioctl.c | 7 +++----
> > 1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 25833b4eeaf5..a6f1ee2dc1b9 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -647,6 +647,8 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> > }
> > trans->block_rsv = &block_rsv;
> > trans->bytes_reserved = block_rsv.size;
> > + /* tree log can't currently deal with an inode which is a new root */
> > + btrfs_set_log_full_commit(trans);
> >
> > ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
> > if (ret)
> > @@ -755,10 +757,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> > trans->bytes_reserved = 0;
> > btrfs_subvolume_release_metadata(root, &block_rsv);
> >
> > - if (ret)
> > - btrfs_end_transaction(trans);
> > - else
> > - ret = btrfs_commit_transaction(trans);
> > + btrfs_end_transaction(trans);
> > out_new_inode_args:
> > btrfs_new_inode_args_destroy(&new_inode_args);
> > out_inode:
> > --
> > 2.40.0
> >
>
> Wouldn't the consequence of this mean that we lose some guarantees
> around subvolume creation? That is, without it being committed, we
> would have a window in which the subvolume and data can be written
> without being written to disk? If so, how large is that window?
It's the same as a normal directory, we create a subvol and then create a file
in that subvol and start writing to that file, then lose power before the
transaction commits, we lose everything. The same would happen for a new
directory.
The only concern here that's different is with the directory case you could
fsync() the new file and it would all go into the tree log. As pointed out
above the tree log stuff can't handle a subvolume that doesn't exist yet, so in
the above case if the user fsync()'s that new file it'll commit the whole
transaction and everything will be fine. Thanks,
Josef
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] btrfs: don't commit transaction for every subvol create
2023-04-11 19:10 [PATCH] btrfs: don't commit transaction for every subvol create Sweet Tea Dorminy
2023-04-12 1:45 ` Neal Gompa
@ 2023-04-12 2:29 ` Qu Wenruo
2023-04-12 13:51 ` Josef Bacik
` (2 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2023-04-12 2:29 UTC (permalink / raw)
To: Sweet Tea Dorminy, Chris Mason, Josef Bacik, David Sterba,
linux-btrfs, kernel-team
On 2023/4/12 03:10, Sweet Tea Dorminy wrote:
> Recently a Meta-internal workload encountered subvolume creation taking
> up to 2s each, significantly slower than directory creation. As they
> were hoping to be able to use subvolumes instead of directories, and
> were looking to create hundreds, this was a significant issue. After
> Josef investigated, it turned out to be due to the transaction commit
> currently performed at the end of subvolume creation.
>
> This change improves the workload by not doing transaction commit for every
> subvolume creation, and merely requiring a transaction commit on fsync.
> In the worst case, of doing a subvolume create and fsync in a loop, this
> should require an equal amount of time to the current scheme; and in the
> best case, the internal workload creating hundreds of subvols before
> fsyncing is greatly improved.
>
> While it would be nice to be able to use the log tree and use the normal
> fsync path, logtree replay can't deal with new subvolume inodes
> presently.
>
> It's possible that there's some reason that the transaction commit is
> necessary for correctness during subvolume creation; however,
> git logs indicate that the commit dates back to the beginning of
> subvolume creation, and there are no notes on why it would be necessary.
For subvolume creation it looks fine.
My main concern related to this topic is mostly around snapshots:
- Snapshots can only be created during commit transaction
- Snapshots qgroup accounting
For now we only support quick path (aka, inherit the old numbers from
the source, and that's no other qgroup level involved).
But for subvolume creation, those are not involved at all.
So the idea of skipping transaction commit looks solid to me.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Thanks,
Qu
>
> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
> ---
> fs/btrfs/ioctl.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 25833b4eeaf5..a6f1ee2dc1b9 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -647,6 +647,8 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> }
> trans->block_rsv = &block_rsv;
> trans->bytes_reserved = block_rsv.size;
> + /* tree log can't currently deal with an inode which is a new root */
> + btrfs_set_log_full_commit(trans);
>
> ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
> if (ret)
> @@ -755,10 +757,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> trans->bytes_reserved = 0;
> btrfs_subvolume_release_metadata(root, &block_rsv);
>
> - if (ret)
> - btrfs_end_transaction(trans);
> - else
> - ret = btrfs_commit_transaction(trans);
> + btrfs_end_transaction(trans);
> out_new_inode_args:
> btrfs_new_inode_args_destroy(&new_inode_args);
> out_inode:
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] btrfs: don't commit transaction for every subvol create
2023-04-11 19:10 [PATCH] btrfs: don't commit transaction for every subvol create Sweet Tea Dorminy
2023-04-12 1:45 ` Neal Gompa
2023-04-12 2:29 ` Qu Wenruo
@ 2023-04-12 13:51 ` Josef Bacik
2023-04-12 22:35 ` Neal Gompa
2023-04-17 18:46 ` David Sterba
4 siblings, 0 replies; 11+ messages in thread
From: Josef Bacik @ 2023-04-12 13:51 UTC (permalink / raw)
To: Sweet Tea Dorminy; +Cc: Chris Mason, David Sterba, linux-btrfs, kernel-team
On Tue, Apr 11, 2023 at 03:10:53PM -0400, Sweet Tea Dorminy wrote:
> Recently a Meta-internal workload encountered subvolume creation taking
> up to 2s each, significantly slower than directory creation. As they
> were hoping to be able to use subvolumes instead of directories, and
> were looking to create hundreds, this was a significant issue. After
> Josef investigated, it turned out to be due to the transaction commit
> currently performed at the end of subvolume creation.
>
> This change improves the workload by not doing transaction commit for every
> subvolume creation, and merely requiring a transaction commit on fsync.
> In the worst case, of doing a subvolume create and fsync in a loop, this
> should require an equal amount of time to the current scheme; and in the
> best case, the internal workload creating hundreds of subvols before
> fsyncing is greatly improved.
>
> While it would be nice to be able to use the log tree and use the normal
> fsync path, logtree replay can't deal with new subvolume inodes
> presently.
>
> It's possible that there's some reason that the transaction commit is
> necessary for correctness during subvolume creation; however,
> git logs indicate that the commit dates back to the beginning of
> subvolume creation, and there are no notes on why it would be necessary.
>
> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] btrfs: don't commit transaction for every subvol create
2023-04-11 19:10 [PATCH] btrfs: don't commit transaction for every subvol create Sweet Tea Dorminy
` (2 preceding siblings ...)
2023-04-12 13:51 ` Josef Bacik
@ 2023-04-12 22:35 ` Neal Gompa
2023-04-17 18:46 ` David Sterba
4 siblings, 0 replies; 11+ messages in thread
From: Neal Gompa @ 2023-04-12 22:35 UTC (permalink / raw)
To: Sweet Tea Dorminy
Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs, kernel-team
On Tue, Apr 11, 2023 at 3:22 PM Sweet Tea Dorminy
<sweettea-kernel@dorminy.me> wrote:
>
> Recently a Meta-internal workload encountered subvolume creation taking
> up to 2s each, significantly slower than directory creation. As they
> were hoping to be able to use subvolumes instead of directories, and
> were looking to create hundreds, this was a significant issue. After
> Josef investigated, it turned out to be due to the transaction commit
> currently performed at the end of subvolume creation.
>
> This change improves the workload by not doing transaction commit for every
> subvolume creation, and merely requiring a transaction commit on fsync.
> In the worst case, of doing a subvolume create and fsync in a loop, this
> should require an equal amount of time to the current scheme; and in the
> best case, the internal workload creating hundreds of subvols before
> fsyncing is greatly improved.
>
> While it would be nice to be able to use the log tree and use the normal
> fsync path, logtree replay can't deal with new subvolume inodes
> presently.
>
> It's possible that there's some reason that the transaction commit is
> necessary for correctness during subvolume creation; however,
> git logs indicate that the commit dates back to the beginning of
> subvolume creation, and there are no notes on why it would be necessary.
>
> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
> ---
> fs/btrfs/ioctl.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 25833b4eeaf5..a6f1ee2dc1b9 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -647,6 +647,8 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> }
> trans->block_rsv = &block_rsv;
> trans->bytes_reserved = block_rsv.size;
> + /* tree log can't currently deal with an inode which is a new root */
> + btrfs_set_log_full_commit(trans);
>
> ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
> if (ret)
> @@ -755,10 +757,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> trans->bytes_reserved = 0;
> btrfs_subvolume_release_metadata(root, &block_rsv);
>
> - if (ret)
> - btrfs_end_transaction(trans);
> - else
> - ret = btrfs_commit_transaction(trans);
> + btrfs_end_transaction(trans);
> out_new_inode_args:
> btrfs_new_inode_args_destroy(&new_inode_args);
> out_inode:
> --
> 2.40.0
>
Reviewed-by: Neal Gompa <neal@gompa.dev>
--
真実はいつも一つ!/ Always, there's only one truth!
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] btrfs: don't commit transaction for every subvol create
2023-04-11 19:10 [PATCH] btrfs: don't commit transaction for every subvol create Sweet Tea Dorminy
` (3 preceding siblings ...)
2023-04-12 22:35 ` Neal Gompa
@ 2023-04-17 18:46 ` David Sterba
2023-04-21 12:58 ` Sweet Tea Dorminy
4 siblings, 1 reply; 11+ messages in thread
From: David Sterba @ 2023-04-17 18:46 UTC (permalink / raw)
To: Sweet Tea Dorminy
Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs, kernel-team
On Tue, Apr 11, 2023 at 03:10:53PM -0400, Sweet Tea Dorminy wrote:
> Recently a Meta-internal workload encountered subvolume creation taking
> up to 2s each, significantly slower than directory creation. As they
> were hoping to be able to use subvolumes instead of directories, and
> were looking to create hundreds, this was a significant issue. After
> Josef investigated, it turned out to be due to the transaction commit
> currently performed at the end of subvolume creation.
>
> This change improves the workload by not doing transaction commit for every
> subvolume creation, and merely requiring a transaction commit on fsync.
> In the worst case, of doing a subvolume create and fsync in a loop, this
> should require an equal amount of time to the current scheme; and in the
> best case, the internal workload creating hundreds of subvols before
> fsyncing is greatly improved.
>
> While it would be nice to be able to use the log tree and use the normal
> fsync path, logtree replay can't deal with new subvolume inodes
> presently.
>
> It's possible that there's some reason that the transaction commit is
> necessary for correctness during subvolume creation; however,
> git logs indicate that the commit dates back to the beginning of
> subvolume creation, and there are no notes on why it would be necessary.
>
> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
For subvolume creation the situation is much easier than with deletion,
as said by others, the semantics is similar to creating a directory,
fsync works to make sure the data are persistent.
So I think it's safe and allows the use case of creating many subvolumes
without the full transaction commit overhead. And we don't need the
commandline options. Added to misc-next, thanks.
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] btrfs: don't commit transaction for every subvol create
2023-04-17 18:46 ` David Sterba
@ 2023-04-21 12:58 ` Sweet Tea Dorminy
2023-04-21 15:23 ` David Sterba
0 siblings, 1 reply; 11+ messages in thread
From: Sweet Tea Dorminy @ 2023-04-21 12:58 UTC (permalink / raw)
To: dsterba; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs, kernel-team
>
> So I think it's safe and allows the use case of creating many subvolumes
> without the full transaction commit overhead. And we don't need the
> commandline options. Added to misc-next, thanks.
Just to check, have you pushed misc-next containing this to github? I
can't find it in git log.
Thanks!
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] btrfs: don't commit transaction for every subvol create
2023-04-21 12:58 ` Sweet Tea Dorminy
@ 2023-04-21 15:23 ` David Sterba
0 siblings, 0 replies; 11+ messages in thread
From: David Sterba @ 2023-04-21 15:23 UTC (permalink / raw)
To: Sweet Tea Dorminy
Cc: dsterba, Chris Mason, Josef Bacik, David Sterba, linux-btrfs,
kernel-team
On Fri, Apr 21, 2023 at 08:58:54AM -0400, Sweet Tea Dorminy wrote:
>
> >
> > So I think it's safe and allows the use case of creating many subvolumes
> > without the full transaction commit overhead. And we don't need the
> > commandline options. Added to misc-next, thanks.
>
> Just to check, have you pushed misc-next containing this to github? I
> can't find it in git log.
Maybe I forgot to push the latest changes, now pushed.
^ permalink raw reply [flat|nested] 11+ messages in thread