From: Josef Bacik <josef@toxicpanda.com>
To: Neal Gompa <neal@gompa.dev>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>,
Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.com>,
linux-btrfs@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH] btrfs: don't commit transaction for every subvol create
Date: Wed, 12 Apr 2023 09:49:25 -0400 [thread overview]
Message-ID: <20230412134925.GB3162142@perftesting> (raw)
In-Reply-To: <CAEg-Je_BJpdr+K8rA_NHzykBRT6qmGjzfei3NCzjWBW2kVObmg@mail.gmail.com>
On Tue, Apr 11, 2023 at 09:45:34PM -0400, Neal Gompa wrote:
> On Tue, Apr 11, 2023 at 3:22 PM Sweet Tea Dorminy
> <sweettea-kernel@dorminy.me> wrote:
> >
> > Recently a Meta-internal workload encountered subvolume creation taking
> > up to 2s each, significantly slower than directory creation. As they
> > were hoping to be able to use subvolumes instead of directories, and
> > were looking to create hundreds, this was a significant issue. After
> > Josef investigated, it turned out to be due to the transaction commit
> > currently performed at the end of subvolume creation.
> >
> > This change improves the workload by not doing transaction commit for every
> > subvolume creation, and merely requiring a transaction commit on fsync.
> > In the worst case, of doing a subvolume create and fsync in a loop, this
> > should require an equal amount of time to the current scheme; and in the
> > best case, the internal workload creating hundreds of subvols before
> > fsyncing is greatly improved.
> >
> > While it would be nice to be able to use the log tree and use the normal
> > fsync path, logtree replay can't deal with new subvolume inodes
> > presently.
> >
> > It's possible that there's some reason that the transaction commit is
> > necessary for correctness during subvolume creation; however,
> > git logs indicate that the commit dates back to the beginning of
> > subvolume creation, and there are no notes on why it would be necessary.
> >
> > Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
> > ---
> > fs/btrfs/ioctl.c | 7 +++----
> > 1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 25833b4eeaf5..a6f1ee2dc1b9 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -647,6 +647,8 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> > }
> > trans->block_rsv = &block_rsv;
> > trans->bytes_reserved = block_rsv.size;
> > + /* tree log can't currently deal with an inode which is a new root */
> > + btrfs_set_log_full_commit(trans);
> >
> > ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
> > if (ret)
> > @@ -755,10 +757,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> > trans->bytes_reserved = 0;
> > btrfs_subvolume_release_metadata(root, &block_rsv);
> >
> > - if (ret)
> > - btrfs_end_transaction(trans);
> > - else
> > - ret = btrfs_commit_transaction(trans);
> > + btrfs_end_transaction(trans);
> > out_new_inode_args:
> > btrfs_new_inode_args_destroy(&new_inode_args);
> > out_inode:
> > --
> > 2.40.0
> >
>
> Wouldn't the consequence of this mean that we lose some guarantees
> around subvolume creation? That is, without it being committed, we
> would have a window in which the subvolume and data can be written
> without being written to disk? If so, how large is that window?
It's the same as a normal directory, we create a subvol and then create a file
in that subvol and start writing to that file, then lose power before the
transaction commits, we lose everything. The same would happen for a new
directory.
The only concern here that's different is with the directory case you could
fsync() the new file and it would all go into the tree log. As pointed out
above the tree log stuff can't handle a subvolume that doesn't exist yet, so in
the above case if the user fsync()'s that new file it'll commit the whole
transaction and everything will be fine. Thanks,
Josef
next prev parent reply other threads:[~2023-04-12 13:49 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-11 19:10 [PATCH] btrfs: don't commit transaction for every subvol create Sweet Tea Dorminy
2023-04-12 1:45 ` Neal Gompa
2023-04-12 2:34 ` Qu Wenruo
2023-04-12 2:35 ` Qu Wenruo
2023-04-12 13:49 ` Josef Bacik [this message]
2023-04-12 2:29 ` Qu Wenruo
2023-04-12 13:51 ` Josef Bacik
2023-04-12 22:35 ` Neal Gompa
2023-04-17 18:46 ` David Sterba
2023-04-21 12:58 ` Sweet Tea Dorminy
2023-04-21 15:23 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230412134925.GB3162142@perftesting \
--to=josef@toxicpanda.com \
--cc=clm@fb.com \
--cc=dsterba@suse.com \
--cc=kernel-team@meta.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=neal@gompa.dev \
--cc=sweettea-kernel@dorminy.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.