From: Jakob Unterwurzacher <jakobunt@gmail.com>
To: Chris Mason <chris.mason@oracle.com>, linux-btrfs@vger.kernel.org
Subject: Re: Rename+crash behaviour of btrfs - nearly ext3!
Date: Tue, 18 May 2010 20:24:48 +0200
Message-ID: <4BF2DB70.4080702@gmail.com>
In-Reply-To: <20100518161013.GD8635@think>

On 18/05/10 18:10, Chris Mason wrote:
>>
>> I'm not sure how much memory a queued rename takes up, but the time that
>> would be spent flushing it to disk would then be spent flushing file
>> data, draining the write buffer and freeing memory, no?
>>
>> That would be writing to disk
>>
>>  [Data..................][Rename]  or
>>  [Rename][Data..................]
> 
> Actually it is:
> 
> [Data..................][allow the transaction commit to complete]  or
> [allow the transaction commit to complete][Data..................]
> 
> The problem is that people think of the rename as a tiny thing, but it
> is really bundled in with all of the other metadata operations that were
> done in the current transaction.   The space that was allocated to hold
> the new file name, the space that was freed to remove the old file name,
> the directory entries, the directory inode etc etc.
> 
> This means that holding back that one rename requires holding back every
> operation done to the filesystem.
> 
> In btrfs, we're still able to do fsyncs quickly in this case
> because we have a dedicated log for that.  But there are a few different
> types of operations (like disk management) that require us to wait for
> the transaction to complete even when we use the dedicated log.
> 
>>
>> Whether you drain the file data queue or the rename queue first, in the
>> end you'd have to write it all....
> 
> It's about latency.  The latency required to write the entire file is
> unbounded (the size of the file is unbounded).  The latency required to
> commit the transaction without the file data is bounded because we are
> able to control the amount of metadata in each transaction.
> 
> See the firefox vs ext3 wars for an example of all of this, it's the
> latency the firefox people were (rightly) complaining about.
> 
>>
>> I thought the problem of delaying the renames was complexity, well, at
>> least Ts'o said it was [1] - I'm not sure if this applies to btrfs as well.
> 
> I'm afraid there are lots and lots of different issues at play.  The
> most important way to look at it is that forcing data to disk is very
> slow, which is why we try to avoid it whenever we can.
> 
> Applications can request that the data go to disk via lots of different
> ways.  Rename was never ever meant to be one of them, but it really does
> make sense to provide atomic replacement of old good data with new good
> data, so we've implemented that extra syncing.
> 
> Implementing syncing when userland doesn't expect extra syncing usually
> just makes userland very unhappy.  It's not that we can't do it; it's that
> doing it has implications for every application that uses rename.
> 
> -chris

Thanks for all the insight.

I will update the wiki FAQ to make clear what "data=ordered" in btrfs
means, what it does not mean, and why (or something like that).
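
For the FAQ, a minimal sketch of the userland side of this discussion might
help: an application that needs the new data to be on disk before the rename
takes effect asks for that explicitly with fsync, rather than relying on
rename to imply it. The sketch below is only an illustration in Python, not
code from btrfs or from this thread. The function name atomic_replace and the
".tmp-" prefix are made up for the example, and it assumes a POSIX system
where rename() replaces an existing target atomically.

```python
import os
import tempfile


def atomic_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data` via write-to-temp, fsync, rename.

    Hypothetical helper for illustration; assumes POSIX rename() semantics.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename stays
    # within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Explicitly request that the file data reach disk before
            # the new name becomes visible.
            os.fsync(f.fileno())
        # Swap the new file into place over the old one.
        os.rename(tmp_path, path)
    except BaseException:
        # On any failure, remove the temp file and leave the old data alone.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also fsync the directory so the rename itself is made durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The fsync calls are where the latency Chris describes shows up: the cost of
the first one grows with the amount of file data, which is why an application
would only pay it when it really needs durability and not just atomicity.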


Jakob
