From: Pádraig Brady
Subject: Re: fallocate vs ENOSPC
Date: Tue, 29 Nov 2011 14:11:48 +0000
Message-ID: <4ED4E824.4030107@draigBrady.com>
References: <4ECF6D41.2040801@draigBrady.com> <20111125104050.GA26729@infradead.org> <20111127031455.GK5167@thunk.org> <20111127234331.GW2386@dastard> <20111128051054.GZ2386@dastard> <4ED34C66.8050300@draigBrady.com> <20111129002432.GA2386@dastard>
Cc: Theodore Tso, Christoph Hellwig, linux-fsdevel@vger.kernel.org
To: Dave Chinner
In-Reply-To: <20111129002432.GA2386@dastard>
On 11/29/2011 12:24 AM, Dave Chinner wrote:
> On Mon, Nov 28, 2011 at 08:55:02AM +0000, Pádraig Brady wrote:
>> On 11/28/2011 05:10 AM, Dave Chinner wrote:
>>> Quite frankly, if system utilities like cp and tar start to abuse
>>> fallocate() by default so they can get "upfront ENOSPC detection",
>>> then I will seriously consider making XFS use delayed allocation for
>>> fallocate rather than unwritten extents so we don't lose the past 15
>>> years' worth of IO and aging optimisations that delayed allocation
>>> provides us with....
>>
>> For the record I was considering fallocate() for these reasons.
>>
>> 1. Improved file layout for subsequent access
>> 2. Immediate indication of ENOSPC
>> 3. Efficient writing of NUL portions
>>
>> You lucidly detailed issues with 1., which I suppose could be somewhat
>> mitigated by not fallocating < say 1MB, though I suppose file systems
>> could be smarter here and not preallocate small chunks (or when
>> otherwise not appropriate).
>
> When you consider that some high-end filesystem deployments have alignment
> characteristics over 50MB (e.g. so each uncompressed 4k resolution
> video frame is located on a different set of non-overlapping disks),
> arbitrary "don't fallocate below this amount" heuristics will always
> have unforeseen failure cases...

So about this alignment policy: I don't understand the issues, so I'm guessing here.
You say delalloc packs files, while fallocate() will align on XFS according to
the stripe config. Is that assuming that, when lots of files are written together,
they are more likely to be read together rather than independently?
That's a big assumption if true. The converse is also a big assumption: that
fallocate()d files should be aligned because they are more likely to be read
independently.

> In short: leave optimising general allocation strategies to the
> filesystems and their developers - there is no One True Solution for
> optimal file layout in a given filesystem, let alone across
> different filesystems. In fact, I don't even want to think about the
> mess fallocate() on everything would make of btrfs because of its
> COW structure - it seems to me to guarantee worse fragmentation than
> using delayed allocation...
>
>> We can already get ENOSPC from a write()
>> after an fallocate() in certain edge cases, so it would probably make
>> sense to expand those cases.
>
> fallocate is for preallocation, not for ENOSPC detection. If you
> want efficient and effective ENOSPC detection before writing
> anything, then you really want a space -reservation- extension to
> fallocate. Filesystems that use delayed allocation already have a
> space reservation subsystem - it's how they account for space that is
> reserved by delayed allocation prior to the real allocation being
> done. IMO, allowing userspace some level of access to those
> reservations would be more appropriate for early detection of ENOSPC
> than using preallocation for everything...

Fair enough, so fallocate() would be a superset of reserve(),
though I'm then having a hard time thinking of why one would ever
need fallocate().

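For concreteness, here is a minimal C sketch (not code from cp, tar or gold) of the "preallocate to detect ENOSPC up front" approach under discussion. Since no userspace space-reservation call like the one Dave describes exists, it uses plain preallocation via posix_fallocate(). The helper name preallocate_for_copy() is hypothetical. As noted above, write() can still return ENOSPC after fallocate() in certain edge cases, so this is an early check rather than a guarantee.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* posix_fallocate(), open() flags */
#include <sys/stat.h>   /* fstat(), struct stat */
#include <unistd.h>

/* Hypothetical helper: preallocate in dst_fd (from offset 0) as many bytes
 * as src_fd currently holds, so a lack of space is reported before any
 * data is copied.  Returns 0 on success or an errno value such as ENOSPC.
 * Note that posix_fallocate() returns the error number itself; it does
 * not set errno. */
static int preallocate_for_copy(int src_fd, int dst_fd)
{
    struct stat st;

    assert(src_fd >= 0 && dst_fd >= 0);
    if (fstat(src_fd, &st) != 0)
        return errno;
    if (st.st_size == 0)
        return 0;   /* nothing to reserve; a zero length may be rejected with EINVAL */
    return posix_fallocate(dst_fd, 0, st.st_size);
}
```

A caller would treat a return of ENOSPC as "don't start the copy at all", instead of discovering the shortage part way through and leaving a truncated destination behind. Note that posix_fallocate() with mode 0 also extends the destination's file size to the preallocated length.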
> As to efficient writing of NULL ranges - that's what sparse files
> are for - you do not need to write or even preallocate NULL ranges
> when copying files. Indeed, the most efficient way of dealing with
> NULL ranges is to punch a hole and let the filesystem deal with
> it.....

Well, not for `cp --sparse=never`, which might be used
so that subsequent processing of the copy will not result in ENOSPC.
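The difference between the two copy modes can be sketched in C as follows. This is a simplified illustration, not coreutils' implementation: copy_fd(), write_all(), all_zero() and the fixed COPY_BLK block size are all hypothetical. In sparse mode, all-zero blocks are skipped with lseek(), leaving holes the filesystem does not allocate; in non-sparse mode (comparable to `cp --sparse=never`) the zeros are written like any other data, so their space is allocated during the copy rather than when the copy is later written to.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open() flags for callers */
#include <stdbool.h>
#include <string.h>     /* memcmp() */
#include <sys/types.h>
#include <unistd.h>

#define COPY_BLK 4096   /* hypothetical block size used for zero detection */

/* True if every byte of p[0..n) is zero; n must not exceed COPY_BLK. */
static bool all_zero(const char *p, size_t n)
{
    static const char zeros[COPY_BLK];   /* static, so zero-initialised */
    assert(n <= sizeof zeros);
    return memcmp(p, zeros, n) == 0;
}

/* Write all n bytes, retrying short writes and EINTR. */
static int write_all(int fd, const char *p, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Copy src_fd to dst_fd from their current offsets; dst_fd is assumed to
 * be a freshly created or truncated file.
 *   sparse = true : all-zero blocks are skipped with lseek(), leaving holes.
 *   sparse = false: zero blocks are written out, so space is allocated now.
 * Returns 0 on success, -1 with errno set on failure. */
static int copy_fd(int src_fd, int dst_fd, bool sparse)
{
    char buf[COPY_BLK];
    ssize_t n;

    assert(src_fd >= 0 && dst_fd >= 0);
    while ((n = read(src_fd, buf, sizeof buf)) > 0) {
        if (sparse && all_zero(buf, (size_t)n)) {
            if (lseek(dst_fd, n, SEEK_CUR) == (off_t)-1)
                return -1;
        } else if (write_all(dst_fd, buf, (size_t)n) < 0) {
            return -1;
        }
    }
    if (n < 0)
        return -1;
    /* Seeking past EOF does not extend the file by itself, so a trailing
     * hole has to be made explicit by setting the final size. */
    off_t end = lseek(dst_fd, 0, SEEK_CUR);
    if (end == (off_t)-1)
        return -1;
    return ftruncate(dst_fd, end);
}
```

Both modes produce byte-identical contents; they differ only in whether the zero ranges occupy disk blocks. That difference determines whether a later in-place write into those ranges can fail with ENOSPC.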
I'm also linking here to a related discussion.
http://oss.sgi.com/archives/xfs/2011-06/msg00064.html
Note also that the gold linker does fallocate() on output files by default.
cheers,
Pádraig.