From: Pádraig Brady
Subject: Re: fallocate vs ENOSPC
Date: Tue, 29 Nov 2011 14:11:48 +0000
Message-ID: <4ED4E824.4030107@draigBrady.com>
References: <4ECF6D41.2040801@draigBrady.com> <20111125104050.GA26729@infradead.org> <20111127031455.GK5167@thunk.org> <20111127234331.GW2386@dastard> <20111128051054.GZ2386@dastard> <4ED34C66.8050300@draigBrady.com> <20111129002432.GA2386@dastard>
Cc: Theodore Tso, Christoph Hellwig, linux-fsdevel@vger.kernel.org
To: Dave Chinner
In-Reply-To: <20111129002432.GA2386@dastard>

On 11/29/2011 12:24 AM, Dave Chinner wrote:
> On Mon, Nov 28, 2011 at 08:55:02AM +0000, Pádraig Brady wrote:
>> On 11/28/2011 05:10 AM, Dave Chinner wrote:
>>> Quite frankly, if system utilities like cp and tar start to abuse
>>> fallocate() by default so they can get "upfront ENOSPC detection",
>>> then I will seriously consider making XFS use delayed allocation for
>>> fallocate rather than unwritten extents so we don't lose the past 15
>>> years worth of IO and aging optimisations that delayed allocation
>>> provides us with....
>>
>> For the record I was considering fallocate() for these reasons.
>>
>> 1. Improved file layout for subsequent access
>> 2. Immediate indication of ENOSPC
>> 3. Efficient writing of NUL portions
>>
>> You lucidly detailed issues with 1. which I suppose could be somewhat
>> mitigated by not fallocating < say 1MB, though I suppose file systems
>> could be smarter here and not preallocate small chunks (or when
>> otherwise not appropriate).
>
> When you consider that some high end filesystem deployments have alignment
> characteristics over 50MB (e.g. so each uncompressed 4k resolution
> video frame is located on a different set of non-overlapping disks),
> arbitrary "don't fallocate below this amount" heuristics will always
> have unforeseen failure cases...

So about this alignment policy: I don't understand the issues, so I'm
guessing here. You say delalloc packs files, while fallocate() will align
on XFS according to the stripe config. Does that assume that, when writing
lots of files, they are more likely to be read together than independently?
That's a big assumption if true. The converse is also a big assumption:
that fallocate()d files should be aligned because they are more likely to
be read independently.

> In short: leave optimising general allocation strategies to the
> filesystems and their developers - there is no One True Solution for
> optimal file layout in a given filesystem, let alone across
> different filesystems. In fact, I don't even want to think about the
> mess fallocate() on everything would make of btrfs because of its
> COW structure - it seems to me to guarantee worse fragmentation than
> using delayed allocation...
>
>> We can already get ENOSPC from a write()
>> after an fallocate() in certain edge cases, so it would probably make
>> sense to expand those cases.
>
> fallocate is for preallocation, not for ENOSPC detection. If you
> want efficient and effective ENOSPC detection before writing
> anything, then you really want a space -reservation- extension to
> fallocate. Filesystems that use delayed allocation already have a
> space reservation subsystem - it's how they account for space that is
> reserved by delayed allocation prior to the real allocation being
> done. IMO, allowing userspace some level of access to those
> reservations would be more appropriate for early detection of ENOSPC
> than using preallocation for everything...
Fair enough, so fallocate() would be a superset of reserve(),
though I'm having a hard time thinking of why one might ever need
to fallocate() then.

> As to efficient writing of NULL ranges - that's what sparse files
> are for - you do not need to write or even preallocate NULL ranges
> when copying files. Indeed, the most efficient way of dealing with
> NULL ranges is to punch a hole and let the filesystem deal with
> it.....

Well, not for `cp --sparse=never`, which might be used so that
processing of the copy will not result in ENOSPC.

I'm also linking here to a related discussion:
http://oss.sgi.com/archives/xfs/2011-06/msg00064.html

Note also that the gold linker does fallocate() on output files by default.

cheers,
Pádraig.
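For illustration, a minimal C sketch of the three mechanisms weighed in this thread: preallocating with posix_fallocate() so that ENOSPC is reported before any data is written, leaving NUL ranges as holes by seeking over them (similar in spirit to what `cp --sparse=always` aims for), and punching a hole with fallocate(FALLOC_FL_PUNCH_HOLE). It assumes Linux with glibc or musl, where defining _GNU_SOURCE exposes fallocate() and the FALLOC_FL_* flags through <fcntl.h>. The helper names preallocate(), punch_hole(), all_zero() and write_sparse() are hypothetical, and hole punching only works where the filesystem supports it.

```c
#define _GNU_SOURCE   /* exposes fallocate() and FALLOC_FL_* via <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Preallocate [0, len) before writing any data, so running out of space
 * is reported here (ENOSPC) rather than part-way through a copy.
 * posix_fallocate() returns an error number instead of setting errno.
 * When the filesystem lacks fallocate support, glibc falls back to
 * writing each block, which is slow and not a true preallocation.
 */
static int preallocate(int fd, off_t len)
{
    return posix_fallocate(fd, 0, len);
}

/*
 * Deallocate [off, off + len) while keeping the file size unchanged;
 * reads of that range then return NUL bytes. PUNCH_HOLE must be combined
 * with KEEP_SIZE. Returns 0 or an errno value (EOPNOTSUPP if the
 * filesystem cannot punch holes).
 */
static int punch_hole(int fd, off_t off, off_t len)
{
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, off, len) == -1)
        return errno;
    return 0;
}

/* Return 1 if buf[0..n) contains only NUL bytes. */
static int all_zero(const char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (buf[i] != '\0')
            return 0;
    return 1;
}

/*
 * Write n bytes of buf starting at the current offset, seeking over each
 * all-NUL chunk of up to blksize bytes instead of writing it, so those
 * ranges stay unallocated (holes). The final ftruncate() sets the file
 * length in case the data ends in a hole. Returns 0 or an errno value.
 */
static int write_sparse(int fd, const char *buf, size_t n, size_t blksize)
{
    for (size_t done = 0; done < n; ) {
        size_t chunk = (n - done < blksize) ? n - done : blksize;
        if (all_zero(buf + done, chunk)) {
            if (lseek(fd, (off_t)chunk, SEEK_CUR) == (off_t)-1)
                return errno;
            done += chunk;
            continue;
        }
        ssize_t w = write(fd, buf + done, chunk);
        if (w == -1) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            return errno;
        }
        done += (size_t)w;         /* a short write just advances less */
    }
    off_t pos = lseek(fd, 0, SEEK_CUR);
    if (pos == (off_t)-1 || ftruncate(fd, pos) == -1)
        return errno;
    return 0;
}
```

The sketch only shows the calls, not when to use them; as Dave notes above, whether preallocation helps or hurts layout depends on each filesystem's allocator.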