From: Jakob Unterwurzacher <jakobunt@gmail.com>
To: Chris Mason <chris.mason@oracle.com>, linux-btrfs@vger.kernel.org
Subject: Re: Rename+crash behaviour of btrfs - nearly ext3!
Date: Tue, 18 May 2010 20:24:48 +0200
Message-ID: <4BF2DB70.4080702@gmail.com>
In-Reply-To: <20100518161013.GD8635@think>
On 18/05/10 18:10, Chris Mason wrote:
>>
>> I'm not sure how much memory a queued rename takes up, but the time that
>> would be spent flushing it to disk would then be spent flushing file
>> data, draining the write buffer and freeing memory, no?
>>
>> That would be writing to disk
>>
>> [Data..................][Rename] or
>> [Rename][Data..................]
>
> Actually it is:
>
> [Data..................][allow the transaction commit to complete] or
> [allow the transaction commit to complete][Data..................]
>
> The problem is that people think of the rename as a tiny thing, but it
> is really bundled in with all of the other metadata operations that were
> done in the current transaction. The space that was allocated to hold
> the new file name, the space that was freed to remove the old file name,
> the directory entries, the directory inode etc etc.
>
> This means that holding back that one rename requires holding back every
> operation done to the filesystem.
>
> In btrfs, we're still able to do fsyncs quickly in this case
> because we have a dedicated log for that. But there are a few different
> types of operations (like disk management) that require us to wait for
> the transaction to complete even when we use the dedicated log.
>
>>
>> Whether you drain the file data queue or the rename queue first, in the
>> end you'd have to write it all....
>
> It's about latency. The latency required to write the entire file is
> unbounded (the size of the file is unbounded). The latency required to
> commit the transaction without the file data is bounded because we are
> able to control the amount of metadata in each transaction.
>
> See the Firefox vs. ext3 wars for an example of all of this; it's the
> latency the Firefox people were (rightly) complaining about.
>
>>
>> I thought the problem of delaying the renames was complexity; well, at
>> least Ted Ts'o said it was [1] - I'm not sure if this applies to btrfs as well.
>
> I'm afraid there are lots and lots of different issues at play. The
> most important way to look at it is that forcing data to disk is very
> slow, which is why we try to avoid it whenever we can.
>
> Applications can request that the data go to disk in lots of different
> ways. Rename was never ever meant to be one of them, but it really does
> make sense to provide atomic replacement of old good data with new good
> data, so we've implemented that extra syncing.
>
> Implementing syncing when userland doesn't expect extra syncing usually
> just makes userland very unhappy. It's not that we can't do it; it's that
> doing it has implications for every application that uses rename.
>
> -chris
Thanks for all the insight.
I will update the wiki FAQ to make clear what "data=ordered" in btrfs
means, what it does not mean, and why (or something like that).
Jakob
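
For reference, the application-side pattern this thread keeps coming back to
(explicitly syncing the new data, then renaming it over the old file) can be
sketched roughly as follows. This is a minimal Python sketch, not code from
the thread: the helper name atomic_replace and the ".tmp-" prefix are made up
for illustration, and it assumes a POSIX system where a directory can be
opened and fsync'ed.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the file at `path` with `data` (bytes): write, fsync, rename.

    Hypothetical helper for illustration; assumes a POSIX filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory (same filesystem),
    # otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Explicitly request that the new data reach disk before the
            # rename, instead of relying on the rename to imply a sync.
            os.fsync(tmp.fileno())
        # Swap the new file into place; this is rename(2) on POSIX.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the new directory entry is persisted too.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The fsync on the temporary file is the explicit "request that the data go to
disk" mentioned above; an application that skips it depends on whatever
extra syncing the filesystem chooses to do at rename time.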