From: jim owens <jowens@hp.com>
To: Jamie Lokier <jamie@shareable.org>
Cc: Chris Mason <chris.mason@oracle.com>, linux-fsdevel@vger.kernel.org
Subject: Re: btrfs O_DIRECT was [rfc] fsync_range?
Date: Thu, 22 Jan 2009 08:50:37 -0500
Message-ID: <497879AD.30204@hp.com>
In-Reply-To: <20090122000636.GC20407@shareable.org>
Jamie Lokier wrote:
> jim owens wrote:
>> Jamie Lokier wrote:
>>> Writing in place or new-place on a *non-shared* (i.e. non-snapshotted)
>>> file is the choice which is useful. It's a filesystem implementation
>>> detail, not a semantic difference. I'm suggesting writing in place
>>> may do no harm and be more like the expected behaviour with programs
>>> that use O_DIRECT, which are usually databases.
>>>
>>> How about a btrfs mount option?
>>> in_place_write=never/always/direct_only. (Default direct_only).
>> The harm is creating a special guarantee for just one case
>> of "don't move my data" based on a transient file open mode.
>>
>> What about defragmenting or moving the extent to another
>> device for performance or for (failing) device removal?
>>
>> We are on a slippery slope for presumed expectations.
>
> Don't make it a guarantee, just a hint to filesystem write strategy.
>
> It's ok to move data around when useful, we're not talking about a
> hard requirement, but a performance knob.
>
> The question is just what performance and fragmentation
> characteristics do programs that use O_DIRECT have?
>
> They are nearly all databases, filesystems-in-a-file, or virtual
> machine disks. I'm guessing virtually all of those _particular_
> application programs would perform significantly differently with a
> write-in-place strategy for most writes, although you'd still want
> access to the bells and whistles of snapshots and COW and so on when
> requested.
>
> Note I said differently :-) I'm not sure write-in-place performs
> better for those sort of applications. It's just a guess.
I'm very certain that write-in-place performs much better
than COW because, as we all know, doing storage allocation
is expensive. That is why many databases preallocate their files.
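As a rough illustration of the preallocation mentioned above, here is a minimal sketch of how a program might reserve a file's storage up front. The helper name `preallocate` and the file name in the usage note are hypothetical; `os.posix_fallocate` is the standard Python wrapper for posix_fallocate(3) on Unix. Whether later writes actually avoid new allocation depends on the filesystem's write strategy, which is exactly what is being debated in this thread.

```python
import os


def preallocate(path, size):
    """Create path if needed and reserve `size` bytes of storage for it.

    Hypothetical helper for illustration only. Returns the resulting
    file size in bytes.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Ask the filesystem to allocate the range [0, size) now. On a
        # filesystem that writes in place, later writes inside this range
        # should not need to allocate storage again. A COW filesystem may
        # still allocate new blocks on overwrite.
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)
    return os.stat(path).st_size
```

For example, `preallocate("db.dat", 1 << 30)` would reserve 1 GiB before the database issues its first write.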
> Oracle probably has a really good idea how it performs on ZFS compared
> with a block device (which is always in place) - and knows whether ZFS
> does in-place writes with O_DIRECT or not. Chris?
We only disagree on how the rule for write-in-place is defined
and, more importantly, documented so it is easy to understand.
Btrfs allows each individual file to have "nodatacow" set
as an attribute. That is an easy rule to document for
the db admin. Much easier than "if nothing else takes
precedence to make it COW, O_DIRECT will write-in-place".
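To make the per-file rule concrete, here is a minimal sketch of setting a no-COW attribute on one file. The message does not say how the attribute is set, so this assumes a present-day Linux kernel where it is exposed as the `FS_NOCOW_FL` inode flag (the one `chattr +C` sets) through the `FS_IOC_GETFLAGS`/`FS_IOC_SETFLAGS` ioctls. The numeric ioctl values below assume a 64-bit Linux build, and the helper names `add_nocow` and `set_nocow` are hypothetical.

```python
import fcntl
import os
import struct

# Values from <linux/fs.h>; the ioctl numbers assume 64-bit Linux.
FS_IOC_GETFLAGS = 0x80086601
FS_IOC_SETFLAGS = 0x40086602
FS_NOCOW_FL = 0x00800000


def add_nocow(flags):
    """Return the inode flag word with the no-COW bit set."""
    return flags | FS_NOCOW_FL


def set_nocow(path):
    """Set the no-COW flag on path, keeping its other inode flags.

    Raises OSError if the filesystem does not support the flag.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # The kernel transfers a 32-bit flag word for these ioctls.
        raw = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("I", 0))
        flags = struct.unpack("I", raw)[0]
        fcntl.ioctl(fd, FS_IOC_SETFLAGS, struct.pack("I", add_nocow(flags)))
    finally:
        os.close(fd)
```

According to the chattr(1) documentation, on btrfs the flag should be set on new or empty files, so an admin would typically call something like `set_nocow()` right after creating the database file and before writing to it.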
jim