From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jamie Lokier
Subject: Re: [rfc] fsync_range?
Date: Wed, 21 Jan 2009 21:59:21 +0000
Message-ID: <20090121215921.GG16133@shareable.org>
References: <20090121031500.GA2354@shareable.org>
 <20090121041604.GI24891@wotan.suse.de>
 <20090121045921.GA3944@shareable.org>
 <20090121062306.GK24891@wotan.suse.de>
 <20090121121308.GA31253@mit.edu>
 <20090121123711.GA10637@shareable.org>
 <20090121141207.GD31253@mit.edu>
 <1232548550.17244.3.camel@think.oraclecorp.com>
 <20090121204105.GA16133@shareable.org>
 <4977926E.30703@hp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Chris Mason, Theodore Tso, Nick Piggin,
 linux-fsdevel@vger.kernel.org, Eric Sandeen
To: jim owens
Return-path:
Received: from mail2.shareable.org ([80.68.89.115]:48525 "EHLO
 mail2.shareable.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1752750AbZAUV7f (ORCPT );
 Wed, 21 Jan 2009 16:59:35 -0500
Content-Disposition: inline
In-Reply-To: <4977926E.30703@hp.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

jim owens wrote:
> Jamie Lokier wrote:
> >
> >Does O_DIRECT on btrfs still allocate new data blocks?
> >That's not very direct :-)
> >
> >I'm thinking if O_DIRECT is set, considering what's likely to request
> >it, it may be reasonable for it to mean "overwrite in place" too
> >(except for files which are actually COW-shared with others of course).
>
> O_DIRECT for databases is to bypass the OS file data cache.
>
> Those (oracle) who have long experience with it on unix
> know that the physical storage location can change on
> a filesystem.
>
> I do not think we want to make a special case,
> it should be up to the db admin to choose cow/nocow
> because if they want SNAPSHOTS they need cow.

SNAPSHOTS is what "except for files which are actually COW-shared with
others of course" refers to.

An option to "choose" to corrupt snapshots would be very silly.

Writing in place or new-place on a *non-shared* (i.e. non-snapshotted)
file is the choice which is useful. It's a filesystem implementation
detail, not a semantic difference. I'm suggesting writing in place
may do no harm and be more like the expected behaviour with programs
that use O_DIRECT, which are usually databases.

How about a btrfs mount option?
in_place_write=never/always/direct_only.
(Default direct_only).

-- Jamie
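
A rough sketch of the decision the suggested mount option implies, written in Python only for brevity. None of this is btrfs code: InPlacePolicy, parse_policy, write_in_place and the is_direct / is_cow_shared flags are hypothetical names made up for illustration. The only things taken from the message above are the three option values, the direct_only default, and the rule that COW-shared (snapshotted) data is never overwritten in place.

```python
from enum import Enum


class InPlacePolicy(Enum):
    """Hypothetical values for the suggested in_place_write= mount option."""
    NEVER = "never"
    ALWAYS = "always"
    DIRECT_ONLY = "direct_only"


# Default suggested in the message.
DEFAULT_POLICY = InPlacePolicy.DIRECT_ONLY


def parse_policy(mount_opts: str) -> InPlacePolicy:
    """Pick in_place_write=<value> out of a comma-separated option string."""
    for opt in mount_opts.split(","):
        key, _, value = opt.strip().partition("=")
        if key == "in_place_write":
            # Raises ValueError for a value other than the three above.
            return InPlacePolicy(value)
    return DEFAULT_POLICY


def write_in_place(policy: InPlacePolicy, is_direct: bool,
                   is_cow_shared: bool) -> bool:
    """Return True if a write may overwrite existing blocks in place.

    is_direct:     the file was opened with O_DIRECT (assumed to be known).
    is_cow_shared: the target blocks are shared with a snapshot or another
                   file (assumed to be known).
    """
    # Shared data is always written to a new place, whatever the option
    # says; overwriting it would corrupt the snapshot.
    if is_cow_shared:
        return False
    if policy is InPlacePolicy.ALWAYS:
        return True
    if policy is InPlacePolicy.DIRECT_ONLY:
        return is_direct
    return False  # InPlacePolicy.NEVER
```

With the default, write_in_place(parse_policy("noatime"), is_direct=True, is_cow_shared=False) allows the overwrite, while the same call with is_cow_shared=True does not. The shared check comes first so that no option value can "choose" to corrupt a snapshot.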