linux-btrfs.vger.kernel.org archive mirror
From: Jakob Unterwurzacher <jakobunt@gmail.com>
To: Chris Mason <chris.mason@oracle.com>, linux-btrfs@vger.kernel.org
Subject: Re: Rename+crash behaviour of btrfs - nearly ext3!
Date: Tue, 18 May 2010 20:24:48 +0200
Message-ID: <4BF2DB70.4080702@gmail.com>
In-Reply-To: <20100518161013.GD8635@think>

On 18/05/10 18:10, Chris Mason wrote:
>>
>> I'm not sure how much memory a queued rename takes up, but the time that
>> would be spent flushing it to disk would then be spent flushing file
>> data, draining the write buffer and freeing memory, no?
>>
>> That would be writing to disk
>>
>>  [Data..................][Rename]  or
>>  [Rename][Data..................]
> 
> Actually it is:
> 
> [Data..................][allow the transaction commit to complete]  or
> [allow the transaction commit to complete][Data..................]
> 
> The problem is that people think of the rename as a tiny thing, but it
> is really bundled in with all of the other metadata operations that were
> done in the current transaction.   The space that was allocated to hold
> the new file name, the space that was freed to remove the old file name,
> the directory entries, the directory inode etc etc.
> 
> This means that holding back that one rename requires holding back every
> operation done to the filesystem.
> 
> In btrfs, we're still able to do fsyncs quickly in this case
> because we have a dedicated log for that.  But there are a few different
> types of operations (like disk management) that require us to wait for
> the transaction to complete even when we use the dedicated log.
> 
>>
>> Whether you drain the file data queue or the rename queue first, in the
>> end you'd have to write it all....
> 
> It's about latency.  The latency required to write the entire file is
> unbounded (the size of the file is unbounded).  The latency required to
> commit the transaction without the file data is bounded because we are
> able to control the amount of metadata in each transaction.
> 
> See the Firefox vs ext3 wars for an example of all of this; it's the
> latency the Firefox people were (rightly) complaining about.
> 
>>
>> I thought the problem of delaying the renames was complexity, well, at
>> least Ted Ts'o said it was [1] - I'm not sure if this applies to btrfs as well.
> 
> I'm afraid there are lots and lots of different issues at play.  The
> most important way to look at it is that forcing data to disk is very
> slow, which is why we try to avoid it whenever we can.
> 
> Applications can request that the data go to disk in lots of different
> ways.  Rename was never meant to be one of them, but it really does
> make sense to provide atomic replacement of old good data with new good
> data, so we've implemented that extra syncing.
> 
> Implementing syncing when userland doesn't expect extra syncing usually
> just makes userland very unhappy.  It's not that we can't do it; it's that
> doing it has implications for every application that uses rename.
> 
> -chris

Thanks for all the insight.

I will update the wiki FAQ to make clear what "data=ordered" in btrfs
means, what it does not mean, and why (or something like that).
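
For the FAQ, the application-side pattern that does not depend on rename
implying any sync could be sketched roughly as below. This is only an
illustration under assumptions, not anything taken from btrfs itself: the
helper name atomic_replace is made up, it assumes a POSIX system (Linux),
and it spells out the explicit fsync-before-rename sequence that an
application can use when it needs the new data on disk before the name
switches over.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace `path` with `data` (bytes) via write + fsync + rename.

    Hypothetical helper for illustration; assumes POSIX rename semantics.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays
    # within one filesystem. Note that mkstemp creates it with mode 0600.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Explicitly request that the file data reach the disk,
            # instead of relying on rename to do it.
            os.fsync(tmp.fileno())
        # Atomically point the name at the new, already-synced data.
        os.rename(tmp_path, path)
    except BaseException:
        # Best-effort cleanup of the temp file if anything failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself is made durable too.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The fsync on the temp file is exactly the unbounded-latency part discussed
above, which is why an application should only pay for it when it really
needs the new contents to survive a crash.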


Jakob
