From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jamie Lokier
Subject: Re: btrfs O_DIRECT was [rfc] fsync_range?
Date: Thu, 22 Jan 2009 00:06:36 +0000
Message-ID: <20090122000636.GC20407@shareable.org>
References: <20090121045921.GA3944@shareable.org>
 <20090121062306.GK24891@wotan.suse.de>
 <20090121121308.GA31253@mit.edu>
 <20090121123711.GA10637@shareable.org>
 <20090121141207.GD31253@mit.edu>
 <1232548550.17244.3.camel@think.oraclecorp.com>
 <20090121204105.GA16133@shareable.org>
 <4977926E.30703@hp.com>
 <20090121215921.GG16133@shareable.org>
 <4977AAFA.7050503@hp.com>
In-Reply-To: <4977AAFA.7050503@hp.com>
To: jim owens
Cc: Chris Mason, linux-fsdevel@vger.kernel.org
Sender: linux-fsdevel-owner@vger.kernel.org

jim owens wrote:
> Jamie Lokier wrote:
> >
> >Writing in place or new-place on a *non-shared* (i.e. non-snapshotted)
> >file is the choice which is useful.  It's a filesystem implementation
> >detail, not a semantic difference.  I'm suggesting writing in place
> >may do no harm and be more like the expected behaviour with programs
> >that use O_DIRECT, which are usually databases.
> >
> >How about a btrfs mount option?
> >in_place_write=never/always/direct_only.  (Default direct_only).
>
> The harm is creating a special guarantee for just one case
> of "don't move my data" based on a transient file open mode.
>
> What about defragmenting or moving the extent to another
> device for performance or for (failing) device removal?
>
> We are on a slippery slope for presumed expectations.

Don't make it a guarantee, just a hint to filesystem write strategy.

It's ok to move data around when useful, we're not talking about a
hard requirement, but a performance knob.

The question is just what performance and fragmentation
characteristics do programs that use O_DIRECT have?  They are nearly
all databases, filesystems-in-a-file, or virtual machine disks.

I'm guessing virtually all of those _particular_ application programs
would perform significantly differently with a write-in-place strategy
for most writes, although you'd still want access to the bells and
whistles of snapshots and COW and so on when requested.

Note I said differently :-)  I'm not sure write-in-place performs
better for those sorts of applications.  It's just a guess.

Oracle probably has a really good idea how it performs on ZFS compared
with a block device (which is always in place) - and knows whether ZFS
does in-place writes with O_DIRECT or not.  Chris?

-- Jamie