* [FAQ v2] XFS speculative preallocation
From: Brian Foster @ 2014-04-07 15:39 UTC
To: xfs
Hi all,
This is v2 of the speculative preallocation FAQ bits. The initial
proposal was here:
http://oss.sgi.com/archives/xfs/2014-03/msg00316.html
This version includes some updates based on review from arekm and
dchinner. Most notably, the content has been broken down into a few more
questions. Unless there are further major changes required, I'll plan to
post something along these lines to the wiki when my account is
approved. Thanks for the feedback!
Brian
---
Q: Why do files on XFS use more data blocks than expected?
A:
The XFS speculative preallocation algorithm allocates extra blocks
beyond end of file (EOF) to minimise file fragmentation during buffered
write workloads. Workloads that benefit from this behaviour include
slowly growing files, concurrent writers and mixed reader/writer
workloads. It also provides fragmentation resistance in situations where
memory pressure prevents adequate buffering of dirty data to allow
formation of large contiguous regions of data in memory.
This post-EOF block allocation is accounted identically to blocks within
EOF. It is visible in 'st_blocks' counts via stat() system calls,
accounted as globally allocated space and against quotas that apply to
the associated file. The space is reported by various userspace
utilities (stat, du, df, ls) and thus provides a common source of
confusion for administrators. Post-EOF blocks are temporary in most
situations and are usually reclaimed via several possible mechanisms in
XFS.
See the FAQ entry on speculative preallocation for details.
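One way to see the effect described above is to compare a file's length with the space allocated to it. The following is a minimal Python sketch, not part of the FAQ proposal; the helper name allocation_report is made up for illustration, and it assumes Linux, where st_blocks is reported in 512-byte units.

```python
import os


def allocation_report(path):
    """Compare a file's apparent size with the space allocated to it."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units (true on Linux),
    # independent of the filesystem block size.
    allocated = st.st_blocks * 512
    return {
        "apparent_bytes": st.st_size,
        "allocated_bytes": allocated,
        # Positive when more space is allocated than the file length,
        # for example through block rounding or post-EOF preallocation.
        "excess_bytes": max(0, allocated - st.st_size),
    }
```

A large excess on a recently written file is consistent with post-EOF preallocation, but it is not proof: ordinary block rounding also produces a small excess, and sparse files report less allocated space than their length.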
Q: What is speculative preallocation?
A:
XFS speculatively preallocates post-EOF blocks on file extending writes
in anticipation of future extending writes. The size of a preallocation
is dynamic and depends on the runtime state of the file and fs.
Generally speaking, preallocation is disabled for very small files and
preallocation sizes grow as files grow larger.
Preallocations are capped to the maximum extent size supported by the
filesystem. Preallocation size is throttled automatically as the
filesystem approaches low free space conditions or other allocation
limits on a file (such as a quota).
In most cases, speculative preallocation is automatically reclaimed when
a file is closed. Preallocation may also persist beyond the lifecycle of
the file descriptor. Certain application behaviors that are known to
cause fragmentation, such as file server workloads, slowly growing
files, etc., benefit from this and delay the removal of preallocated
blocks beyond fd close.
Q: How can I speed up or avoid delayed removal of speculative
preallocation?
A:
Remove the inode from the VFS cache or unmount the filesystem to remove
speculative preallocations associated with an inode.
Linux 3.8 (and later) includes a scanner to perform background trimming
of files with lingering post-EOF preallocations. The scanner bypasses
dirty files to avoid interference with ongoing writes. A 5 minute scan
interval is used by default and can be adjusted via the following file
(value in seconds):
/proc/sys/fs/xfs/speculative_prealloc_lifetime
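As a rough illustration, the current interval can be read from that file like any other text file. This is a sketch only; the function name read_prealloc_lifetime is hypothetical, and it assumes the file holds a single integer, as the "value in seconds" note suggests.

```python
from pathlib import Path

# Path taken from the FAQ text above.
DEFAULT_TUNABLE = Path("/proc/sys/fs/xfs/speculative_prealloc_lifetime")


def read_prealloc_lifetime(tunable=DEFAULT_TUNABLE):
    """Return the scan interval in seconds, or None if the file is absent.

    The file is missing on kernels without the background scanner or
    without XFS support loaded.
    """
    try:
        return int(Path(tunable).read_text().strip())
    except FileNotFoundError:
        return None
```

Changing the value means writing a new number to the same file, which normally requires root privileges.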
Q: Is speculative preallocation permanent?
A:
Although speculative preallocation can lead to reports of excess space
usage, the preallocated space is not permanent unless explicitly made so
via fallocate or a similar interface. Preallocated space can also be
encoded permanently in situations where file size is extended beyond a
range of post-EOF blocks (i.e., via truncate). Otherwise, preallocated
blocks are reclaimed on file close, inode reclaim, unmount or in the
background once file write activity subsides.
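For contrast with speculative preallocation, explicit preallocation looks roughly like the sketch below. It uses Python's os.posix_fallocate, which is one of the "similar interfaces" rather than fallocate(2) itself; the helper name preallocate is made up for illustration. Note that posix_fallocate also extends the file size to cover the range, so the space ends up within EOF rather than beyond it.

```python
import os


def preallocate(path, length):
    """Explicitly reserve `length` bytes for `path`, starting at offset 0.

    Returns the resulting file size. Space reserved this way is a
    persistent allocation requested by the application, not a
    speculative one, so it is not trimmed in the background.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, length)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)
```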
Q: My workload has known characteristics - can I tune speculative
preallocation to an optimal fixed size?
A:
The 'allocsize=' mount option configures the XFS block allocation
algorithm to use a fixed allocation size. Speculative preallocation is
not dynamically resized when the allocsize mount option is set and thus
the potential for fragmentation is increased. XFS historically set
allocsize to 64k by default.
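The option value is passed at mount time, for example through mount -o or the options field of /etc/fstab. The sketch below builds such an option string. The helper name allocsize_option is hypothetical, and the limits it enforces (a power of two between the page size and 1 GiB) are an assumption based on the xfs(5) manual page, not something stated in this FAQ; 4096 is used as a typical page size.

```python
def allocsize_option(size_bytes, page_size=4096, max_size=1 << 30):
    """Format an XFS 'allocsize=' mount option from a size in bytes.

    Assumed limits (see xfs(5)): a power of two between the page size
    and 1 GiB inclusive.
    """
    if not page_size <= size_bytes <= max_size:
        raise ValueError("allocsize out of range")
    if size_bytes & (size_bytes - 1):
        raise ValueError("allocsize must be a power of two")
    # Prefer the largest unit suffix that divides the size evenly.
    for suffix, unit in (("g", 1 << 30), ("m", 1 << 20), ("k", 1 << 10)):
        if size_bytes % unit == 0:
            return f"allocsize={size_bytes // unit}{suffix}"
    return f"allocsize={size_bytes}"
```

For instance, allocsize_option(65536) yields the string for the historical 64k default mentioned above.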
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: [FAQ v2] XFS speculative preallocation
From: Eric Sandeen @ 2014-04-07 19:08 UTC
To: Brian Foster, xfs

On 4/7/14, 10:39 AM, Brian Foster wrote:
> Hi all,
>
> This is v2 of the speculative preallocation FAQ bits. The initial
> proposal was here:
>
> http://oss.sgi.com/archives/xfs/2014-03/msg00316.html
>
> This version includes some updates based on review from arekm and
> dchinner. Most notably, the content has been broken down into a few more
> questions. Unless there are further major changes required, I'll plan to
> post something along these lines to the wiki when my account is
> approved. Thanks for the feedback!
>
> Brian
>
> ---
>
> Q: Why do files on XFS use more data blocks than expected?
>
> A:
>
> The XFS speculative preallocation algorithm allocates extra blocks
> beyond end of file (EOF) to minimise file fragmentation during buffered

s/minimise/minimize/

> write workloads. Workloads that benefit from this behaviour include
> slowly growing files, concurrent writers and mixed reader/writer
> workloads. It also provides fragmentation resistence in situations where

s/resistence/resistance/

> memory pressure prevents adequate buffering of dirty data to allow
> formation of large contiguous regions of data in memory.
>
> This post-EOF block allocation is accounted identically to blocks within
> EOF. It is visible in 'st_blocks' counts via stat() system calls,
> accounted as globally allocated space and against quotas that apply to
> the associated file. The space is reported by various userspace
> utilities (stat, du, df, ls) and thus provides a common source of
> confusion for administrators. Post-EOF blocks are temporary in most
> situations and are usually reclaimed via several possible mechanisms in
> XFS.

"usually reclaimed" - is it ever "never" reclaimed, then?

> See the FAQ entry on speculative preallocation for details.
>
> Q: What is speculative preallocation?
>
> A:
>
> XFS speculatively preallocates post-EOF blocks on file extending writes
> in anticipation of future extending writes. The size of a preallocation
> is dynamic and depends on the runtime state of the file and fs.
> Generally speaking, preallocation is disabled for very small files and
> preallocation sizes grow as files grow larger.
>
> Preallocations are capped to the maximum extent size supported by the
> filesystem. Preallocation size is throttled automatically as the
> filesystem approaches low free space conditions or other allocation
> limits on a file (such as a quota).
>
> In most cases, speculative preallocation is automatically reclaimed when
> a file is closed. Preallocation may also persist beyond the lifecycle of
> the file descriptor. Certain application behaviors that are known to
> cause fragmentation, such as file server workloads, slowly growing
> files, etc., benefit from this and delay the removal of preallocated
> blocks beyond fd close.

this is a little handwavy. "It's reclaimed when it's closed, except
when it's not?" Can we say something more informative here?

> Q: How can I speed up or avoid delayed removal of speculative
> preallocation?
>
> A:
>
> Remove the inode from the VFS cache or unmount the filesystem to remove
> speculative preallocations associated with an inode.

How does a user remove an inode from the VFS cache? ;)

So far the answer to this question sounds like "no."

We can't remove a single inode; drop_caches is way too heavy weight,
and unmount isn't really viable in most cases.

> Linux 3.8 (and later) includes a scanner to perform background trimming
> of files with lingering post-EOF preallocations. The scanner bypasses
> dirty files to avoid interference with ongoing writes. A 5 minute scan
> interval is used by default and can be adjusted via the following file
> (value in seconds):
>
> /proc/sys/fs/xfs/speculative_prealloc_lifetime
>
> Q: Is speculative preallocation permanent?
>
> A:
>
> Although speculative preallocation can lead to reports of excess space
> usage, the preallocated space is not permanent unless explicitly made so
> via fallocate or a similar interface. Preallocated space can also be
> encoded permanently in situations where file size is extended beyond a
> range of post-EOF blocks (i.e., via truncate). Otherwise, preallocated

(maybe "an extending truncate")

> blocks are reclaimed on file close, inode reclaim, unmount or in the
> background once file write activity subsides.
>
> Q: My workload has known characteristics - can I tune speculative
> preallocation to an optimal fixed size?
>
> A:
>
> The 'allocsize=' mount option configures the XFS block allocation
> algorithm to use a fixed allocation size. Speculative preallocation is
> not dynamically resized when the allocsize mount option is set and thus
> the potential for fragmentation is increased. XFS historically set
> allocsize to 64k by default.

Thanks,
-Eric
* Re: [FAQ v2] XFS speculative preallocation
From: Brian Foster @ 2014-04-07 19:56 UTC
To: Eric Sandeen
Cc: xfs

On Mon, Apr 07, 2014 at 02:08:31PM -0500, Eric Sandeen wrote:
> On 4/7/14, 10:39 AM, Brian Foster wrote:
> > Hi all,
> >
> > This is v2 of the speculative preallocation FAQ bits. The initial
> > proposal was here:
> >
> > http://oss.sgi.com/archives/xfs/2014-03/msg00316.html
> >
> > This version includes some updates based on review from arekm and
> > dchinner. Most notably, the content has been broken down into a few more
> > questions. Unless there are further major changes required, I'll plan to
> > post something along these lines to the wiki when my account is
> > approved. Thanks for the feedback!
> >
> > Brian
> >
> > ---
> >
> > Q: Why do files on XFS use more data blocks than expected?
> >
> > A:
> >
> > The XFS speculative preallocation algorithm allocates extra blocks
> > beyond end of file (EOF) to minimise file fragmentation during buffered
>
> s/minimise/minimize/
>

Fixed.

> > write workloads. Workloads that benefit from this behaviour include
> > slowly growing files, concurrent writers and mixed reader/writer
> > workloads. It also provides fragmentation resistence in situations where
>
> s/resistence/resistance/
>

Fixed.

> > memory pressure prevents adequate buffering of dirty data to allow
> > formation of large contiguous regions of data in memory.
> >
> > This post-EOF block allocation is accounted identically to blocks within
> > EOF. It is visible in 'st_blocks' counts via stat() system calls,
> > accounted as globally allocated space and against quotas that apply to
> > the associated file. The space is reported by various userspace
> > utilities (stat, du, df, ls) and thus provides a common source of
> > confusion for administrators. Post-EOF blocks are temporary in most
> > situations and are usually reclaimed via several possible mechanisms in
> > XFS.
>
> "usually reclaimed" - is it ever "never" reclaimed, then?
>

I worded it that way because of the several little corner cases that can
turn preallocations permanent. E.g., the extending truncate case and
IIRC, an fallocate on an inode means the space won't be trimmed either.

> > See the FAQ entry on speculative preallocation for details.
> >
> > Q: What is speculative preallocation?
> >
> > A:
> >
> > XFS speculatively preallocates post-EOF blocks on file extending writes
> > in anticipation of future extending writes. The size of a preallocation
> > is dynamic and depends on the runtime state of the file and fs.
> > Generally speaking, preallocation is disabled for very small files and
> > preallocation sizes grow as files grow larger.
> >
> > Preallocations are capped to the maximum extent size supported by the
> > filesystem. Preallocation size is throttled automatically as the
> > filesystem approaches low free space conditions or other allocation
> > limits on a file (such as a quota).
> >
> > In most cases, speculative preallocation is automatically reclaimed when
> > a file is closed. Preallocation may also persist beyond the lifecycle of
> > the file descriptor. Certain application behaviors that are known to
> > cause fragmentation, such as file server workloads, slowly growing
> > files, etc., benefit from this and delay the removal of preallocated
> > blocks beyond fd close.
>
> this is a little handwavy. "It's reclaimed when it's closed, except
> when it's not?" Can we say something more informative here?
>

This used to say:

"In most cases, speculative preallocation is automatically reclaimed
when a file is closed. The preallocation may persist after file close if
an open, write, close pattern is repeated on a file. In this scenario,
post-EOF preallocation is trimmed once the inode is reclaimed from cache
or the filesystem unmounted."

The point I want to get it across here is simply that the default case
is to reclaim on close. The delayed reclaim scenario is the exception
based on a heuristic. How about this?

"In most cases, speculative preallocation is automatically reclaimed
when a file is closed. Applications that repeatedly trigger
preallocation and reclaim cycles (e.g., this is common in file server or
log file workloads) can cause fragmentation. Therefore, this pattern is
detected and causes the preallocation to persist beyond the lifecycle of
the file descriptor."

> > Q: How can I speed up or avoid delayed removal of speculative
> > preallocation?
> >
> > A:
> >
> > Remove the inode from the VFS cache or unmount the filesystem to remove
> > speculative preallocations associated with an inode.
>
> How does a user remove an inode from the VFS cache? ;)
>
> So far the answer to this question sounds like "no."
>
> We can't remove a single inode; drop_caches is way too heavy weight,
> and unmount isn't really viable in most cases.
>

I guess there's a fine line between informing what mechanisms remove the
preallocations and what is potentially recommending people take
inappropriate actions to clear preallocated blocks. My initial intent
was to simply inform that the traditional post-eof preallocation is not
permanent (e.g. "don't worry, in the worst case this space is reclaimed
on inode reclaim or umount").

Given that and this is a user FAQ, I'm sympathetic to nuking the "remove
from cache" bit. The answer to this question becomes "use the scanner"
(as described below) and the bits about reclaim/umount remain referenced
indirectly in the answer to the next question. Thoughts?

> > Linux 3.8 (and later) includes a scanner to perform background trimming
> > of files with lingering post-EOF preallocations. The scanner bypasses
> > dirty files to avoid interference with ongoing writes. A 5 minute scan
> > interval is used by default and can be adjusted via the following file
> > (value in seconds):
> >
> > /proc/sys/fs/xfs/speculative_prealloc_lifetime
> >
> > Q: Is speculative preallocation permanent?
> >
> > A:
> >
> > Although speculative preallocation can lead to reports of excess space
> > usage, the preallocated space is not permanent unless explicitly made so
> > via fallocate or a similar interface. Preallocated space can also be
> > encoded permanently in situations where file size is extended beyond a
> > range of post-EOF blocks (i.e., via truncate). Otherwise, preallocated
>
> (maybe "an extending truncate")
>

Ok. Thanks for the feedback.

Brian

> > blocks are reclaimed on file close, inode reclaim, unmount or in the
> > background once file write activity subsides.
> >
> > Q: My workload has known characteristics - can I tune speculative
> > preallocation to an optimal fixed size?
> >
> > A:
> >
> > The 'allocsize=' mount option configures the XFS block allocation
> > algorithm to use a fixed allocation size. Speculative preallocation is
> > not dynamically resized when the allocsize mount option is set and thus
> > the potential for fragmentation is increased. XFS historically set
> > allocsize to 64k by default.
>
> Thanks,
> -Eric
* Re: [FAQ v2] XFS speculative preallocation
From: Arkadiusz Miśkiewicz @ 2014-04-07 19:08 UTC
To: xfs

On Monday 07 of April 2014, Brian Foster wrote:

> Q: How can I speed up or avoid delayed removal of speculative
> preallocation?
>
> A:
>
> Remove the inode from the VFS cache or unmount the filesystem to remove
> speculative preallocations associated with an inode.

"Remove all inodes from the VFS cache" + ? AFAIK there is no way to remove
single inode from cache. Example would be nice, too.

--
Arkadiusz Miśkiewicz, arekm / maven.pl
* Re: [FAQ v2] XFS speculative preallocation
From: Brian Foster @ 2014-04-07 19:58 UTC
To: Arkadiusz Miśkiewicz
Cc: xfs

On Mon, Apr 07, 2014 at 09:08:59PM +0200, Arkadiusz Miśkiewicz wrote:
> On Monday 07 of April 2014, Brian Foster wrote:
>
> > Q: How can I speed up or avoid delayed removal of speculative
> > preallocation?
> >
> > A:
> >
> > Remove the inode from the VFS cache or unmount the filesystem to remove
> > speculative preallocations associated with an inode.
>
> "Remove all inodes from the VFS cache" + ? AFAIK there is no way to remove
> single inode from cache. Example would be nice, too.
>

Yeah, that's confusing. If people prefer this bit to stay (I prefer to
nuke it, as noted in my previous reply), then I'll have to word it more
accurately and cautiously. Thanks.

Brian

> --
> Arkadiusz Miśkiewicz, arekm / maven.pl
* Re: [FAQ v2] XFS speculative preallocation
From: Mark Tinguely @ 2014-04-07 19:58 UTC
To: Brian Foster
Cc: xfs

On 04/07/14 10:39, Brian Foster wrote:
> Hi all,
>
> This is v2 of the speculative preallocation FAQ bits. The initial
> proposal was here:
>
> http://oss.sgi.com/archives/xfs/2014-03/msg00316.html
>
> This version includes some updates based on review from arekm and
> dchinner. Most notably, the content has been broken down into a few more
> questions. Unless there are further major changes required, I'll plan to
> post something along these lines to the wiki when my account is
> approved. Thanks for the feedback!
>
> Brian
>
> ---
>
> Q: Why do files on XFS use more data blocks than expected?
>
> A:
>
> The XFS speculative preallocation algorithm allocates extra blocks
> beyond end of file (EOF) to minimise file fragmentation during buffered
  ^^^ beyond here and then later adopt post-EOF phrasing.

...

> See the FAQ entry on speculative preallocation for details.
>
> Q: What is speculative preallocation?
>
> A:
>
> XFS speculatively preallocates post-EOF blocks on file extending writes
> in anticipation of future extending writes. The size of a preallocation
> is dynamic and depends on the runtime state of the file and fs.
> Generally speaking, preallocation is disabled for very small files and
                       vague what is very small? ^^^
...

> Q: Is speculative preallocation permanent?
>
> A:
>
> Although speculative preallocation can lead to reports of excess space
> usage, the preallocated space is not permanent unless explicitly made so
> via fallocate or a similar interface. Preallocated space can also be
> encoded permanently in situations where file size is extended beyond a
> range of post-EOF blocks (i.e., via truncate). Otherwise, preallocated
> blocks are reclaimed on file close, inode reclaim, unmount or in the
> background once file write activity subsides.

Switch order?

Normally, preallocated
blocks are reclaimed on file close, inode reclaim, unmount or in the
background once file write activity subsides. They can be explictly
made permanent .

>
> Q: My workload has known characteristics - can I tune speculative
> preallocation to an optimal fixed size?
>
> A:
>
> The 'allocsize=' mount option configures the XFS block allocation
> algorithm to use a fixed allocation size. Speculative preallocation is
> not dynamically resized when the allocsize mount option is set and thus
> the potential for fragmentation is increased. XFS historically set

                                                     sets the

> allocsize to 64k by default.
>


Q: Can I disable S-P-A ?

-Mark.
* Re: [FAQ v2] XFS speculative preallocation
From: Brian Foster @ 2014-04-07 21:45 UTC
To: Mark Tinguely
Cc: xfs

On Mon, Apr 07, 2014 at 02:58:45PM -0500, Mark Tinguely wrote:
> On 04/07/14 10:39, Brian Foster wrote:
> >Hi all,
> >
> >This is v2 of the speculative preallocation FAQ bits. The initial
> >proposal was here:
> >
> >http://oss.sgi.com/archives/xfs/2014-03/msg00316.html
> >
> >This version includes some updates based on review from arekm and
> >dchinner. Most notably, the content has been broken down into a few more
> >questions. Unless there are further major changes required, I'll plan to
> >post something along these lines to the wiki when my account is
> >approved. Thanks for the feedback!
> >
> >Brian
> >
> >---
> >
> >Q: Why do files on XFS use more data blocks than expected?
> >
> >A:
> >
> >The XFS speculative preallocation algorithm allocates extra blocks
> >beyond end of file (EOF) to minimise file fragmentation during buffered
>   ^^^ beyond here and then later adopt post-EOF phrasing.
>

I think you're suggesting a broader terminology change, but I'm not
quite following. Could you be specific about what "later" bits should
change? What phrasing in particular..?

> ...
>
> >See the FAQ entry on speculative preallocation for details.
> >
> >Q: What is speculative preallocation?
> >
> >A:
> >
> >XFS speculatively preallocates post-EOF blocks on file extending writes
> >in anticipation of future extending writes. The size of a preallocation
> >is dynamic and depends on the runtime state of the file and fs.
> >Generally speaking, preallocation is disabled for very small files and
>                        vague what is very small? ^^^
> ...

I originally pointed out 64k, but that and other heuristic details that
are subject to change were purged in v2. I'm personally not against
including something that indicates the default and the notion that it's
subject to change. I don't feel too strongly about it either way.
Thoughts appreciated.

>
> >
> >Q: Is speculative preallocation permanent?
> >
> >A:
> >
> >Although speculative preallocation can lead to reports of excess space
> >usage, the preallocated space is not permanent unless explicitly made so
> >via fallocate or a similar interface. Preallocated space can also be
> >encoded permanently in situations where file size is extended beyond a
> >range of post-EOF blocks (i.e., via truncate). Otherwise, preallocated
> >blocks are reclaimed on file close, inode reclaim, unmount or in the
> >background once file write activity subsides.
>
> Switch order?
>
> Normally, preallocated
> blocks are reclaimed on file close, inode reclaim, unmount or in the
> background once file write activity subsides. They can be explictly
> made permanent .
>

Thoughts on the following?

"Preallocated blocks are normally reclaimed on file close, inode
reclaim, unmount or in the background once file write activity subsides.
They can be explicitly made permanent via fallocate or a similar
interface. They can be implicitly made permanent in situations where
file size is extended beyond a range of post-EOF blocks (i.e., via an
extending truncate)."

> >
> >Q: My workload has known characteristics - can I tune speculative
> >preallocation to an optimal fixed size?
> >
> >A:
> >
> >The 'allocsize=' mount option configures the XFS block allocation
> >algorithm to use a fixed allocation size. Speculative preallocation is
> >not dynamically resized when the allocsize mount option is set and thus
> >the potential for fragmentation is increased. XFS historically set
>
>                                                      sets the
>
> >allocsize to 64k by default.
> >
>
>
> Q: Can I disable S-P-A ?
>

A: No..? ;)

Are you proposing this with the similar intent to the previous Q (i.e.,
"what's the alternative to the default behavior?"), or with the notion
that Dave pointed out how technically preallocation is not really "off?"
Or something else? If the former, we could modify the question:

"My workload has known characteristics - can I disable speculative
preallocation or tune it to an optimal fixed size?"

Or something along those lines. Would anybody object to also pointing
out that 'allocsize=4k' (or allocsize=<blocksize>?) could be considered
"speculative preallocation == off" from the user's perspective?

Thanks for the feedback.

Brian

> -Mark.
* Re: [FAQ v2] XFS speculative preallocation
From: Mark Tinguely @ 2014-04-07 22:21 UTC
To: Brian Foster
Cc: xfs

On 04/07/14 16:45, Brian Foster wrote:
> On Mon, Apr 07, 2014 at 02:58:45PM -0500, Mark Tinguely wrote:
>> On 04/07/14 10:39, Brian Foster wrote:
>>> Hi all,
>>>
>>> This is v2 of the speculative preallocation FAQ bits. The initial
>>> proposal was here:
>>>
>>> http://oss.sgi.com/archives/xfs/2014-03/msg00316.html
>>>
>>> This version includes some updates based on review from arekm and
>>> dchinner. Most notably, the content has been broken down into a few more
>>> questions. Unless there are further major changes required, I'll plan to
>>> post something along these lines to the wiki when my account is
>>> approved. Thanks for the feedback!
>>>
>>> Brian
>>>
>>> ---
>>>
>>> Q: Why do files on XFS use more data blocks than expected?
>>>
>>> A:
>>>
>>> The XFS speculative preallocation algorithm allocates extra blocks
>>> beyond end of file (EOF) to minimise file fragmentation during buffered
>>    ^^^ beyond here and then later adopt post-EOF phrasing.
>>
>
> I think you're suggesting a broader terminology change, but I'm not
> quite following. Could you be specific about what "later" bits should
> change? What phrasing in particular..?

You use "blocks beyond end of file (EOF)" here and then later use
the terminology of "post-EOF" through the rest of the document. Just
pointing out the change in terminology.

>
>> ...
>>
>>> See the FAQ entry on speculative preallocation for details.
>>>
>>> Q: What is speculative preallocation?
>>>
>>> A:
>>>
>>> XFS speculatively preallocates post-EOF blocks on file extending writes
>>> in anticipation of future extending writes. The size of a preallocation
>>> is dynamic and depends on the runtime state of the file and fs.
>>> Generally speaking, preallocation is disabled for very small files and
>>                        vague what is very small? ^^^
>> ...
>
> I originally pointed out 64k, but that and other heuristic details that
> are subject to change were purged in v2. I'm personally not against
> including something that indicates the default and the notion that it's
> subject to change. I don't feel too strongly about it either way.
> Thoughts appreciated.

I think the details are good since everyone has a different idea on
"very small". The FAQ can be changed with the code. You can expect
the TOT FAQ to represent Linux 3.0-stable.

>>
>>
>>> Q: Is speculative preallocation permanent?
>>>
>>> A:
>>>
>>> Although speculative preallocation can lead to reports of excess space
>>> usage, the preallocated space is not permanent unless explicitly made so
>>> via fallocate or a similar interface. Preallocated space can also be
>>> encoded permanently in situations where file size is extended beyond a
>>> range of post-EOF blocks (i.e., via truncate). Otherwise, preallocated
>>> blocks are reclaimed on file close, inode reclaim, unmount or in the
>>> background once file write activity subsides.
>>
>> Switch order?
>>
>> Normally, preallocated
>> blocks are reclaimed on file close, inode reclaim, unmount or in the
>> background once file write activity subsides. They can be explictly
>> made permanent .
>>
>
> Thoughts on the following?
>
> "Preallocated blocks are normally reclaimed on file close, inode
> reclaim, unmount or in the background once file write activity subsides.
> They can be explicitly made permanent via fallocate or a similar
> interface. They can be implicitly made permanent in situations where
> file size is extended beyond a range of post-EOF blocks (i.e., via an
> extending truncate)."
>

Looks good to me.

>>>
>>> Q: My workload has known characteristics - can I tune speculative
>>> preallocation to an optimal fixed size?
>>>
>>> A:
>>>
>>> The 'allocsize=' mount option configures the XFS block allocation
>>> algorithm to use a fixed allocation size. Speculative preallocation is
>>> not dynamically resized when the allocsize mount option is set and thus
>>> the potential for fragmentation is increased. XFS historically set
>>
>>                                                      sets the
>>
>>> allocsize to 64k by default.
>>>
>>
>>
>> Q: Can I disable S-P-A ?
>>
>
> A: No..? ;)
>
> Are you proposing this with the similar intent to the previous Q (i.e.,
> "what's the alternative to the default behavior?"), or with the notion
> that Dave pointed out how technically preallocation is not really "off?"
> Or something else? If the former, we could modify the question:
>
> "My workload has known characteristics - can I disable speculative
> preallocation or tune it to an optimal fixed size?"
>
> Or something along those lines. Would anybody object to also pointing
> out that 'allocsize=4k' (or allocsize=<blocksize>?) could be considered
> "speculative preallocation == off" from the user's perspective?
>

That sounds good to me. If they know it is there, eventually someone
will ask "can I turn it off?". I would be happy with the answer of "no,
but it can be tuned" and don't tell them how to effectively turn it off.

> Thanks for the feedback.
>
> Brian
>

Thanks for the FAQ.

--Mark.
* Re: [FAQ v2] XFS speculative preallocation
2014-04-07 22:21 ` Mark Tinguely
@ 2014-04-07 22:57 ` Dave Chinner
2014-04-08 12:04 ` Brian Foster
1 sibling, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2014-04-07 22:57 UTC (permalink / raw)
To: Mark Tinguely; +Cc: Brian Foster, xfs

On Mon, Apr 07, 2014 at 05:21:04PM -0500, Mark Tinguely wrote:
> On 04/07/14 16:45, Brian Foster wrote:
> >On Mon, Apr 07, 2014 at 02:58:45PM -0500, Mark Tinguely wrote:
> >>On 04/07/14 10:39, Brian Foster wrote:
> >>>XFS speculatively preallocates post-EOF blocks on file extending writes
> >>>in anticipation of future extending writes. The size of a preallocation
> >>>is dynamic and depends on the runtime state of the file and fs.
> >>>Generally speaking, preallocation is disabled for very small files and
> >> vague what is very small?                              ^^^
> >>...
> >
> >I originally pointed out 64k, but that and other heuristic details that
> >are subject to change were purged in v2. I'm personally not against
> >including something that indicates the default and the notion that it's
> >subject to change. I don't feel too strongly about it either way.
> >Thoughts appreciated.
> >
>
> I think the details are good since everyone has a different idea on
> "very small". The FAQ can be changed with the code. You can expect
> the TOT FAQ to represent Linux 3.0-stable.

What's that supposed to mean? The FAQ on the xfs.org website does
not represent a specific release. It is supposed to contain the most
up-to-date information we have about various topics. If there's
something specific to a kernel version we need to mention, then
that's explicitly stated in the FAQ entry....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: [FAQ v2] XFS speculative preallocation
2014-04-07 22:21 ` Mark Tinguely
2014-04-07 22:57 ` Dave Chinner
@ 2014-04-08 12:04 ` Brian Foster
1 sibling, 0 replies; 12+ messages in thread
From: Brian Foster @ 2014-04-08 12:04 UTC (permalink / raw)
To: Mark Tinguely; +Cc: xfs

On Mon, Apr 07, 2014 at 05:21:04PM -0500, Mark Tinguely wrote:
> On 04/07/14 16:45, Brian Foster wrote:
> >On Mon, Apr 07, 2014 at 02:58:45PM -0500, Mark Tinguely wrote:
> >>On 04/07/14 10:39, Brian Foster wrote:
> >>>Hi all,
> >>>
> >>>This is v2 of the speculative preallocation FAQ bits. The initial
> >>>proposal was here:
> >>>
> >>>http://oss.sgi.com/archives/xfs/2014-03/msg00316.html
> >>>
> >>>This version includes some updates based on review from arekm and
> >>>dchinner. Most notably, the content has been broken down into a few more
> >>>questions. Unless there are further major changes required, I'll plan to
> >>>post something along these lines to the wiki when my account is
> >>>approved. Thanks for the feedback!
> >>>
> >>>Brian
> >>>
> >>>---
> >>>
> >>>Q: Why do files on XFS use more data blocks than expected?
> >>>
> >>>A:
> >>>
> >>>The XFS speculative preallocation algorithm allocates extra blocks
> >>>beyond end of file (EOF) to minimise file fragmentation during buffered
> >> ^^^ beyond here and then later adopt post-EOF phrasing.
> >>
> >
> >I think you're suggesting a broader terminology change, but I'm not
> >quite following. Could you be specific about what "later" bits should
> >change? What phrasing in particular..?
>
> You use "blocks beyond end of file (EOF)" here and then later use
> the terminology of "post-EOF" through the rest of the document. Just
> pointing out the change in terminology.
>
>

Ok. I was just trying to be more descriptive here, this being the
initial question so to speak (i.e., spelling out "end of file"). The
remainder uses the abbreviation introduced here.

Brian

> >>...
> >>
> >>>See the FAQ entry on speculative preallocation for details.
* Re: [FAQ v2] XFS speculative preallocation
2014-04-07 21:45 ` Brian Foster
2014-04-07 22:21 ` Mark Tinguely
@ 2014-04-07 22:54 ` Dave Chinner
1 sibling, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2014-04-07 22:54 UTC (permalink / raw)
To: Brian Foster; +Cc: Mark Tinguely, xfs

On Mon, Apr 07, 2014 at 05:45:27PM -0400, Brian Foster wrote:
> On Mon, Apr 07, 2014 at 02:58:45PM -0500, Mark Tinguely wrote:
> > On 04/07/14 10:39, Brian Foster wrote:
> > >See the FAQ entry on speculative preallocation for details.
> > >
> > >Q: What is speculative preallocation?
> > >
> > >A:
> > >
> > >XFS speculatively preallocates post-EOF blocks on file extending writes
> > >in anticipation of future extending writes. The size of a preallocation
> > >is dynamic and depends on the runtime state of the file and fs.
> > >Generally speaking, preallocation is disabled for very small files and
> > vague what is very small?                               ^^^
> > ...
>
> I originally pointed out 64k, but that and other heuristic details that
> are subject to change were purged in v2. I'm personally not against
> including something that indicates the default and the notion that it's
> subject to change. I don't feel too strongly about it either way.
> Thoughts appreciated.

As I said in the original - if we put specific values in here or
describe the exact heuristics, then it will be wrong the moment we
change the code. If someone wants to know the exact details on how
it works, then they can look at the code for the kernel they are
running, because the behaviour can (and does) change from kernel to
kernel.

> > >Q: Is speculative preallocation permanent?
> > >
> > >A:
> > >
> > >Although speculative preallocation can lead to reports of excess space
> > >usage, the preallocated space is not permanent unless explicitly made so
> > >via fallocate or a similar interface. Preallocated space can also be
> > >encoded permanently in situations where file size is extended beyond a
> > >range of post-EOF blocks (i.e., via truncate). Otherwise, preallocated
> > >blocks are reclaimed on file close, inode reclaim, unmount or in the
> > >background once file write activity subsides.
> >
> > Switch order?
> >
> > Normally, preallocated
> > blocks are reclaimed on file close, inode reclaim, unmount or in the
> > background once file write activity subsides. They can be explicitly
> > made permanent.
> >
>
> Thoughts on the following?
>
> "Preallocated blocks are normally reclaimed on file close, inode
> reclaim, unmount or in the background once file write activity subsides.
> They can be explicitly made permanent via fallocate or a similar
> interface. They can be implicitly made permanent in situations where
> file size is extended beyond a range of post-EOF blocks (i.e., via an
> extending truncate)."

Speculative prealloc may end up permanent on a crash, because the
in-memory state used to track and reclaim the speculative
preallocation is lost.

> > >Q: My workload has known characteristics - can I tune speculative
> > >preallocation to an optimal fixed size?
> > >
> > >A:
> > >
> > >The 'allocsize=' mount option configures the XFS block allocation
> > >algorithm to use a fixed allocation size. Speculative preallocation is
> > >not dynamically resized when the allocsize mount option is set and thus
> > >the potential for fragmentation is increased. XFS historically set
> >
> > sets the
> >
> > >allocsize to 64k by default.
> > >
> >
> >
> > Q: Can I disable S-P-A ?
> >
>
> A: No..? ;)

That's the correct answer ;)

> Are you proposing this with the similar intent to the previous Q (i.e.,
> "what's the alternative to the default behavior?"), or with the notion
> that Dave pointed out how technically preallocation is not really "off?"
> Or something else? If the former, we could modify the question:
>
> "My workload has known characteristics - can I disable speculative
> preallocation or tune it to an optimal fixed size?"

Yup, I'd change the questions like that.

> Or something along those lines. Would anybody object to also pointing
> out that 'allocsize=4k' (or allocsize=<blocksize>?) could be considered
> "speculative preallocation == off" from the user's perspective?

From a user's perspective, fixed size preallocation with anything
less than allocsize=64k would be "off". That was the old default,
and no user ever noticed that speculative prealloc was occurring
except on large files when it failed to prevent fragmentation....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
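As an aside to the allocsize discussion above, here is a minimal sketch (not from the thread) of how one might check which XFS mounts have a fixed allocation size configured. It parses text in /proc/mounts format and converts an 'allocsize=' value to bytes. The function names are hypothetical, and two things are assumptions rather than facts taken from the thread: that the kernel lists 'allocsize=' among the mount options only when it has been set, and that the value is written as a number with an optional k/m/g suffix.

```python
import re

# Multipliers for the size suffixes handled here. Treating the value as
# "<number><k|m|g>" is an assumption about how the option is written.
_SUFFIX = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def allocsize_bytes(mount_options):
    """Return the allocsize= value in bytes, or None if the option is absent.

    `mount_options` is a comma-separated option string, e.g. the fourth
    field of a line in /proc/mounts.
    """
    for option in mount_options.split(","):
        match = re.fullmatch(r"allocsize=(\d+)([kmg]?)", option.strip().lower())
        if match:
            return int(match.group(1)) * _SUFFIX[match.group(2)]
    return None


def xfs_allocsizes(mounts_text):
    """Map each XFS mount point in /proc/mounts-style text to its allocsize."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[2] == "xfs":
            result[fields[1]] = allocsize_bytes(fields[3])
    return result
```

On a live system this could be fed the contents of /proc/mounts. Under the assumption above, a value of None would mean the filesystem is using the default dynamic preallocation rather than a fixed size.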
* Re: [FAQ v2] XFS speculative preallocation
2014-04-07 15:39 [FAQ v2] XFS speculative preallocation Brian Foster
` (2 preceding siblings ...)
2014-04-07 19:58 ` Mark Tinguely
@ 2014-04-17 13:07 ` Brian Foster
3 siblings, 0 replies; 12+ messages in thread
From: Brian Foster @ 2014-04-17 13:07 UTC (permalink / raw)
To: xfs

On Mon, Apr 07, 2014 at 11:39:06AM -0400, Brian Foster wrote:
> Hi all,
>
> This is v2 of the speculative preallocation FAQ bits. The initial
> proposal was here:
>
> http://oss.sgi.com/archives/xfs/2014-03/msg00316.html
>
> This version includes some updates based on review from arekm and
> dchinner. Most notably, the content has been broken down into a few more
> questions. Unless there are further major changes required, I'll plan to
> post something along these lines to the wiki when my account is
> approved. Thanks for the feedback!
>
> Brian
>
> ---

I've updated the wiki with this content plus the feedback in this
thread. The new FAQs are here:

http://xfs.org/index.php/XFS_FAQ#Q:_Why_do_files_on_XFS_use_more_data_blocks_than_expected.3F
http://xfs.org/index.php/XFS_FAQ#Q:_What_is_speculative_preallocation.3F
http://xfs.org/index.php/XFS_FAQ#Q:_How_can_I_speed_up_or_avoid_delayed_removal_of_speculative_preallocation.3F
http://xfs.org/index.php/XFS_FAQ#Q:_Is_speculative_preallocation_permanent.3F
http://xfs.org/index.php/XFS_FAQ#Q:_My_workload_has_known_characteristics_-_can_I_disable_speculative_preallocation_or_tune_it_to_an_optimal_fixed_size.3F

Thanks for all of the reviews and feedback. If there are any further
suggestions... well, it's a wiki! Feel free to modify it. ;)

Brian

>
> Q: Why do files on XFS use more data blocks than expected?
>
> A:
>
> The XFS speculative preallocation algorithm allocates extra blocks
> beyond end of file (EOF) to minimise file fragmentation during buffered
> write workloads. Workloads that benefit from this behaviour include
> slowly growing files, concurrent writers and mixed reader/writer
> workloads. It also provides fragmentation resistance in situations where
> memory pressure prevents adequate buffering of dirty data to allow
> formation of large contiguous regions of data in memory.
>
> This post-EOF block allocation is accounted identically to blocks within
> EOF. It is visible in 'st_blocks' counts via stat() system calls,
> accounted as globally allocated space and against quotas that apply to
> the associated file. The space is reported by various userspace
> utilities (stat, du, df, ls) and thus provides a common source of
> confusion for administrators. Post-EOF blocks are temporary in most
> situations and are usually reclaimed via several possible mechanisms in
> XFS.
>
> See the FAQ entry on speculative preallocation for details.
>
> Q: What is speculative preallocation?
>
> A:
>
> XFS speculatively preallocates post-EOF blocks on file extending writes
> in anticipation of future extending writes. The size of a preallocation
> is dynamic and depends on the runtime state of the file and fs.
> Generally speaking, preallocation is disabled for very small files and
> preallocation sizes grow as files grow larger.
>
> Preallocations are capped to the maximum extent size supported by the
> filesystem. Preallocation size is throttled automatically as the
> filesystem approaches low free space conditions or other allocation
> limits on a file (such as a quota).
>
> In most cases, speculative preallocation is automatically reclaimed when
> a file is closed. Preallocation may also persist beyond the lifecycle of
> the file descriptor. Certain application behaviors that are known to
> cause fragmentation, such as file server workloads, slowly growing
> files, etc., benefit from this and delay the removal of preallocated
> blocks beyond fd close.
>
> Q: How can I speed up or avoid delayed removal of speculative
> preallocation?
>
> A:
>
> Remove the inode from the VFS cache or unmount the filesystem to remove
> speculative preallocations associated with an inode.
>
> Linux 3.8 (and later) includes a scanner to perform background trimming
> of files with lingering post-EOF preallocations. The scanner bypasses
> dirty files to avoid interference with ongoing writes. A 5 minute scan
> interval is used by default and can be adjusted via the following file
> (value in seconds):
>
> /proc/sys/fs/xfs/speculative_prealloc_lifetime
>
> Q: Is speculative preallocation permanent?
>
> A:
>
> Although speculative preallocation can lead to reports of excess space
> usage, the preallocated space is not permanent unless explicitly made so
> via fallocate or a similar interface. Preallocated space can also be
> encoded permanently in situations where file size is extended beyond a
> range of post-EOF blocks (i.e., via truncate). Otherwise, preallocated
> blocks are reclaimed on file close, inode reclaim, unmount or in the
> background once file write activity subsides.
>
> Q: My workload has known characteristics - can I tune speculative
> preallocation to an optimal fixed size?
>
> A:
>
> The 'allocsize=' mount option configures the XFS block allocation
> algorithm to use a fixed allocation size. Speculative preallocation is
> not dynamically resized when the allocsize mount option is set and thus
> the potential for fragmentation is increased. XFS historically set
> allocsize to 64k by default.
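The FAQ text above notes that post-EOF blocks show up in 'st_blocks' and that the background scan interval lives in /proc/sys/fs/xfs/speculative_prealloc_lifetime. A minimal sketch (not from the thread) of inspecting both from userspace follows. The function and type names are hypothetical. The sketch assumes Linux, where st_blocks is counted in 512-byte units, and it cannot tell speculative preallocation apart from ordinary rounding up to the filesystem block size or from space reserved with fallocate; it only reports that more space is allocated than the file size accounts for.

```python
import os
from collections import namedtuple

# size: apparent file size in bytes (st_size)
# allocated: bytes allocated on disk (st_blocks * 512)
# excess: allocated bytes beyond the apparent size, never negative
AllocationReport = namedtuple("AllocationReport", "size allocated excess")


def allocation_report(path):
    """Compare a file's apparent size with the space allocated to it."""
    st = os.stat(path)
    # On Linux st_blocks is in 512-byte units, independent of the
    # filesystem block size.
    allocated = st.st_blocks * 512
    # Sparse files can have allocated < size, so clamp at zero.
    excess = max(0, allocated - st.st_size)
    return AllocationReport(st.st_size, allocated, excess)


def prealloc_lifetime_seconds(
    path="/proc/sys/fs/xfs/speculative_prealloc_lifetime",
):
    """Read the background scan interval in seconds, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

An excess far larger than one filesystem block on a file that is still being appended to would be consistent with the behaviour the FAQ describes, and checking again after the file is closed or after the scan interval has passed should show whether the space was reclaimed.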
Thread overview: 12+ messages
2014-04-07 15:39 [FAQ v2] XFS speculative preallocation Brian Foster
2014-04-07 19:08 ` Eric Sandeen
2014-04-07 19:56   ` Brian Foster
2014-04-07 19:08 ` Arkadiusz Miśkiewicz
2014-04-07 19:58   ` Brian Foster
2014-04-07 19:58 ` Mark Tinguely
2014-04-07 21:45   ` Brian Foster
2014-04-07 22:21     ` Mark Tinguely
2014-04-07 22:57       ` Dave Chinner
2014-04-08 12:04       ` Brian Foster
2014-04-07 22:54     ` Dave Chinner
2014-04-17 13:07 ` Brian Foster