* [QUESTION] about the freelist allocator in XFS
@ 2016-07-07 11:01 Kaho Ng
2016-07-07 12:13 ` Brian Foster
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Kaho Ng @ 2016-07-07 11:01 UTC (permalink / raw)
To: xfs
I am trying to investigate how freelist allocator in xfs interacts
with freespace B+Tree allocator.
First I prepared a patch
<https://gist.github.com/22ffca35929e67c08759b057779b7566> on
linux-source/fs/xfs/libxfs/xfs_alloc.c to print debugging messages
(The kernel version used is linux-3.10.0-327.22.2.el7).
Then, I wrote a simple utility
<https://gist.github.com/992364ceca984d3f14099ec94aaacd9d> to make
TONS of
holes in a filesystem by calling fallocate() to punch holes in a file
that is almost as large as the volume size.
I created an XFS filesystem image by the following steps:
1. fallocate -l 80G /mnt/disk2/xfs
2. mkfs.xfs -f -d agcount=1 /mnt/disk2/xfs
Then I created a large file by fallocate:
fallocate -l 85823746048 /mnt/test/abc
which left only 4 blocks available in the volume finally:
/dev/loop0 20961280 20961276 4 100% /mnt/test
The result of xfs_bmap against /mnt/test/abc:
/mnt/test/abc:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..167624503]: 83000..167707503 0 (83000..167707503) 167624504 10000
After that, I used the hole-punching utility above to create holes on
the files, and captured the output of kmsg.
When reading the log output
<https://gist.github.com/890076405e1c13c0a952a579e25e6afe> , I
realised that there is no B+Tree split
triggered by xfs_alloc_fix_freelist() when calling xfs_free_extent().
Isn't B+Tree split possible in by-size B+Tree even when truncating a
longer freespace record to shorter one? But what I found in the log is
only a few tree shrinks... And when reading the source code of
freespace allocator I found that a B+Tree growth in this case is
impossible at least...
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [QUESTION] about the freelist allocator in XFS 2016-07-07 11:01 [QUESTION] about the freelist allocator in XFS Kaho Ng @ 2016-07-07 12:13 ` Brian Foster 2016-07-07 22:28 ` Dave Chinner ` (2 subsequent siblings) 3 siblings, 0 replies; 13+ messages in thread From: Brian Foster @ 2016-07-07 12:13 UTC (permalink / raw) To: Kaho Ng; +Cc: xfs On Thu, Jul 07, 2016 at 07:01:35PM +0800, Kaho Ng wrote: > I am trying to investigate how freelist allocator in xfs interacts > with freespace B+Tree allocator. > First I prepared a patch > <https://gist.github.com/22ffca35929e67c08759b057779b7566> on > linux-source/fs/xfs/libxfs/xfs_alloc.c to print debugging messages > (The kernel version used is linux-3.10.0-327.22.2.el7). > Then, I wrote a simple utility > <https://gist.github.com/992364ceca984d3f14099ec94aaacd9d> to make > TONS of > holes in a filesystem by calling fallocate() to punch holes in a file > that is almost as large as the volume size. > > I created an XFS filesystem image by the following steps: > 1. fallocate -l 80G /mnt/disk2/xfs > 2. mkfs.xfs -f -d agcount=1 /mnt/disk2/xfs > > Then I created a large file by fallocate: > fallocate -l 85823746048 /mnt/test/abc > > which left only 4 blocks available in the volume finally: > /dev/loop0 20961280 20961276 4 100% /mnt/test > > The result of xfs_bmap against /mnt/test/abc: > /mnt/test/abc: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..167624503]: 83000..167707503 0 (83000..167707503) 167624504 10000 > > After that, I used the hole-punching utility above to create holes on > the files, and captured the output of kmsg. > > When reading the log output > <https://gist.github.com/890076405e1c13c0a952a579e25e6afe> , I > realised that there is no B+Tree split > triggered by xfs_alloc_fix_freelist() when calling xfs_free_extent(). > Isn't B+Tree split possible in by-size B+Tree even when truncating a > longer freespace record to shorter one? But what I found in the log is > only a few tree shrinks... And when reading the source code of > freespace allocator I found that a B+Tree growth in this case is > impossible at least... > I'd suggest to use a combination of xfs_db and tracepoints/xfsstats to identify what's happening in your test sequence. E.g., unmount and use xfs_db to identify the state of the free space btree(s) before and after various points of your test. See [1] for examples of how to use xfs_db to explore on-disk data structures. See 'man trace-cmd' to work with tracepoints and /proc/fs/xfs/stats (and /proc/sys/fs/xfs/stats_clear) to view runtime statistics (which I believe already includes the number of btree splits). Brian [1] http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure//tmp/en-US/html/index.html > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [QUESTION] about the freelist allocator in XFS 2016-07-07 11:01 [QUESTION] about the freelist allocator in XFS Kaho Ng 2016-07-07 12:13 ` Brian Foster @ 2016-07-07 22:28 ` Dave Chinner [not found] ` <CAGeO4WNAdmeXgL4+CAQ1Yqo18XFgv3NZxWVbDTS0xDZLyb3e2w@mail.gmail.com> 2016-07-08 5:48 ` Kaho Ng 2016-07-08 19:17 ` Kaho Ng 2016-07-10 16:57 ` Kaho Ng 3 siblings, 2 replies; 13+ messages in thread From: Dave Chinner @ 2016-07-07 22:28 UTC (permalink / raw) To: Kaho Ng; +Cc: xfs On Thu, Jul 07, 2016 at 07:01:35PM +0800, Kaho Ng wrote: > I am trying to investigate how freelist allocator in xfs interacts > with freespace B+Tree allocator. > First I prepared a patch > <https://gist.github.com/22ffca35929e67c08759b057779b7566> on > linux-source/fs/xfs/libxfs/xfs_alloc.c to print debugging messages > (The kernel version used is linux-3.10.0-327.22.2.el7). ...... > When reading the log output > <https://gist.github.com/890076405e1c13c0a952a579e25e6afe> , I > realised that there is no B+Tree split > triggered by xfs_alloc_fix_freelist() when calling xfs_free_extent(). > Isn't B+Tree split possible in by-size B+Tree even when truncating a > longer freespace record to shorter one? But what I found in the log is > only a few tree shrinks... And when reading the source code of > freespace allocator I found that a B+Tree growth in this case is > impossible at least... args->isfl doesn't mean what you think it means. args->isfl is only set when moving blocks from the freespace btree to the AGFL, which only occurs when a previous operation allocated a new freespace btree block and depleted the current freelist. i.e. "AG Free List" != "AG freespace btree" - they are different structures on disk... And when you consider that a freelist refill can only remove records from the the freespace btree, it's should be clear that a btree split won't occur during a freelist refill... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <CAGeO4WNAdmeXgL4+CAQ1Yqo18XFgv3NZxWVbDTS0xDZLyb3e2w@mail.gmail.com>]
* Fwd: [QUESTION] about the freelist allocator in XFS [not found] ` <CAGeO4WNAdmeXgL4+CAQ1Yqo18XFgv3NZxWVbDTS0xDZLyb3e2w@mail.gmail.com> @ 2016-07-08 2:29 ` Kaho Ng [not found] ` <20160708034710.GL12670@dastard> 1 sibling, 0 replies; 13+ messages in thread From: Kaho Ng @ 2016-07-08 2:29 UTC (permalink / raw) To: xfs ---------- Forwarded message ---------- From: Kaho Ng <ngkaho1234@gmail.com> Date: Fri, Jul 8, 2016 at 10:26 AM Subject: Re: [QUESTION] about the freelist allocator in XFS To: Dave Chinner <david@fromorbit.com> Hmm, wouldn't xfs_alloc_ag_vextent_size() first remove the free extent record, and insert a new extent record into the freespace by-size btree if the found free extent record is longer than args->maxlen? On Fri, Jul 8, 2016 at 6:28 AM, Dave Chinner <david@fromorbit.com> wrote: > On Thu, Jul 07, 2016 at 07:01:35PM +0800, Kaho Ng wrote: >> I am trying to investigate how freelist allocator in xfs interacts >> with freespace B+Tree allocator. >> First I prepared a patch >> <https://gist.github.com/22ffca35929e67c08759b057779b7566> on >> linux-source/fs/xfs/libxfs/xfs_alloc.c to print debugging messages >> (The kernel version used is linux-3.10.0-327.22.2.el7). > ...... >> When reading the log output >> <https://gist.github.com/890076405e1c13c0a952a579e25e6afe> , I >> realised that there is no B+Tree split >> triggered by xfs_alloc_fix_freelist() when calling xfs_free_extent(). >> Isn't B+Tree split possible in by-size B+Tree even when truncating a >> longer freespace record to shorter one? But what I found in the log is >> only a few tree shrinks... And when reading the source code of >> freespace allocator I found that a B+Tree growth in this case is >> impossible at least... > > args->isfl doesn't mean what you think it means. > > args->isfl is only set when moving blocks from the freespace btree > to the AGFL, which only occurs when a previous operation allocated a > new freespace btree block and depleted the current freelist. i.e. > "AG Free List" != "AG freespace btree" - they are different > structures on disk... > > And when you consider that a freelist refill can only remove records > from the the freespace btree, it's should be clear that a btree > split won't occur during a freelist refill... > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <20160708034710.GL12670@dastard>]
* Re: [QUESTION] about the freelist allocator in XFS [not found] ` <20160708034710.GL12670@dastard> @ 2016-07-08 4:05 ` Kaho Ng 0 siblings, 0 replies; 13+ messages in thread From: Kaho Ng @ 2016-07-08 4:05 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs Hmm, wouldn't xfs_alloc_ag_vextent_size() first remove the free extent record, and insert a new extent record into the freespace by-size btree if the found free extent record is longer than args->maxlen? On Fri, Jul 8, 2016 at 11:47 AM, Dave Chinner <david@fromorbit.com> wrote: > On Fri, Jul 08, 2016 at 10:26:33AM +0800, Kaho Ng wrote: >> Hmm, wouldn't xfs_alloc_ag_vextent_size() first remove the free extent >> record, and insert a new extent record into the freespace by-size >> btree if the found free extent record is longer than args->maxlen? > > Please reply to the list, not privately. > > -Dave. > >> >> On Fri, Jul 8, 2016 at 6:28 AM, Dave Chinner <david@fromorbit.com> wrote: >> > On Thu, Jul 07, 2016 at 07:01:35PM +0800, Kaho Ng wrote: >> >> I am trying to investigate how freelist allocator in xfs interacts >> >> with freespace B+Tree allocator. >> >> First I prepared a patch >> >> <https://gist.github.com/22ffca35929e67c08759b057779b7566> on >> >> linux-source/fs/xfs/libxfs/xfs_alloc.c to print debugging messages >> >> (The kernel version used is linux-3.10.0-327.22.2.el7). >> > ...... >> >> When reading the log output >> >> <https://gist.github.com/890076405e1c13c0a952a579e25e6afe> , I >> >> realised that there is no B+Tree split >> >> triggered by xfs_alloc_fix_freelist() when calling xfs_free_extent(). >> >> Isn't B+Tree split possible in by-size B+Tree even when truncating a >> >> longer freespace record to shorter one? But what I found in the log is >> >> only a few tree shrinks... And when reading the source code of >> >> freespace allocator I found that a B+Tree growth in this case is >> >> impossible at least... >> > >> > args->isfl doesn't mean what you think it means. >> > >> > args->isfl is only set when moving blocks from the freespace btree >> > to the AGFL, which only occurs when a previous operation allocated a >> > new freespace btree block and depleted the current freelist. i.e. >> > "AG Free List" != "AG freespace btree" - they are different >> > structures on disk... >> > >> > And when you consider that a freelist refill can only remove records >> > from the the freespace btree, it's should be clear that a btree >> > split won't occur during a freelist refill... >> > >> > Cheers, >> > >> > Dave. >> > -- >> > Dave Chinner >> > david@fromorbit.com >> > > -- > Dave Chinner > david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [QUESTION] about the freelist allocator in XFS 2016-07-07 22:28 ` Dave Chinner [not found] ` <CAGeO4WNAdmeXgL4+CAQ1Yqo18XFgv3NZxWVbDTS0xDZLyb3e2w@mail.gmail.com> @ 2016-07-08 5:48 ` Kaho Ng 2016-07-10 23:22 ` Dave Chinner 1 sibling, 1 reply; 13+ messages in thread From: Kaho Ng @ 2016-07-08 5:48 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs Hmm, wouldn't xfs_alloc_ag_vextent_size() first remove the free extent record, and insert a new extent record into the freespace by-size btree if the found free extent record is longer than args->maxlen? On Fri, Jul 8, 2016 at 6:28 AM, Dave Chinner <david@fromorbit.com> wrote: > On Thu, Jul 07, 2016 at 07:01:35PM +0800, Kaho Ng wrote: >> I am trying to investigate how freelist allocator in xfs interacts >> with freespace B+Tree allocator. >> First I prepared a patch >> <https://gist.github.com/22ffca35929e67c08759b057779b7566> on >> linux-source/fs/xfs/libxfs/xfs_alloc.c to print debugging messages >> (The kernel version used is linux-3.10.0-327.22.2.el7). > ...... >> When reading the log output >> <https://gist.github.com/890076405e1c13c0a952a579e25e6afe> , I >> realised that there is no B+Tree split >> triggered by xfs_alloc_fix_freelist() when calling xfs_free_extent(). >> Isn't B+Tree split possible in by-size B+Tree even when truncating a >> longer freespace record to shorter one? But what I found in the log is >> only a few tree shrinks... And when reading the source code of >> freespace allocator I found that a B+Tree growth in this case is >> impossible at least... > > args->isfl doesn't mean what you think it means. > > args->isfl is only set when moving blocks from the freespace btree > to the AGFL, which only occurs when a previous operation allocated a > new freespace btree block and depleted the current freelist. i.e. > "AG Free List" != "AG freespace btree" - they are different > structures on disk... > > And when you consider that a freelist refill can only remove records > from the the freespace btree, it's should be clear that a btree > split won't occur during a freelist refill... > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [QUESTION] about the freelist allocator in XFS 2016-07-08 5:48 ` Kaho Ng @ 2016-07-10 23:22 ` Dave Chinner 2016-07-11 7:06 ` Kaho Ng 0 siblings, 1 reply; 13+ messages in thread From: Dave Chinner @ 2016-07-10 23:22 UTC (permalink / raw) To: Kaho Ng; +Cc: xfs [please don't top post] On Fri, Jul 08, 2016 at 01:48:43PM +0800, Kaho Ng wrote: > On Fri, Jul 8, 2016 at 6:28 AM, Dave Chinner <david@fromorbit.com> wrote: > > On Thu, Jul 07, 2016 at 07:01:35PM +0800, Kaho Ng wrote: > >> I am trying to investigate how freelist allocator in xfs interacts > >> with freespace B+Tree allocator. > >> First I prepared a patch > >> <https://gist.github.com/22ffca35929e67c08759b057779b7566> on > >> linux-source/fs/xfs/libxfs/xfs_alloc.c to print debugging messages > >> (The kernel version used is linux-3.10.0-327.22.2.el7). > > ...... > >> When reading the log output > >> <https://gist.github.com/890076405e1c13c0a952a579e25e6afe> , I > >> realised that there is no B+Tree split > >> triggered by xfs_alloc_fix_freelist() when calling xfs_free_extent(). > >> Isn't B+Tree split possible in by-size B+Tree even when truncating a > >> longer freespace record to shorter one? But what I found in the log is > >> only a few tree shrinks... And when reading the source code of > >> freespace allocator I found that a B+Tree growth in this case is > >> impossible at least... > > > > args->isfl doesn't mean what you think it means. > > > > args->isfl is only set when moving blocks from the freespace btree > > to the AGFL, which only occurs when a previous operation allocated a > > new freespace btree block and depleted the current freelist. i.e. > > "AG Free List" != "AG freespace btree" - they are different > > structures on disk... > > > > And when you consider that a freelist refill can only remove records > > from the the freespace btree, it's should be clear that a btree > > split won't occur during a freelist refill... > > Hmm, wouldn't xfs_alloc_ag_vextent_size() first remove the free extent > record, and insert a new extent record into the freespace by-size > btree if the found free extent record is longer than args->maxlen? Good, you went and looked to verify what I said(*). Indeed, when you read xfs_alloc_fixup_trees() where the two trees are modified after an extent has been selected for allocation: ... * Delete the entry from the by-size btree. ... * Add new by-size btree entry(s). ... * Fix up the by-block btree entry(s) It's pretty clear what the answer is. IOWs, yes, you're correct - we only ever modify or delete records from the by-bno freespace tree, but the by-size tree does do a delete and insert and so can split.(**) So, while it is possible for a split to occur, you still didn't see one in your test. That's because a split is rather unlikely in your scenario because once we have a large enough freespace btree records stored for a split to occur, we almost always have exact matches in the by-count tree for freelist fills and so record deletes are the usual operationon the by-count tree. And even if we are doing a by-count insert, the chances the target block for the insert is full is quite small, as are the chances the target block for the insert is different to the block the original record was deleted from (consider ordering, what record index the initial lookup returns and that freelist fills usually only require a block or two to be allocated). Cheers, Dave. (*) That was what my answer was aimed at getting you to do as I get a lot of timewasters asking me via direct email to answer their homework questions for them. Anyone who is trying to understand the basics of the freespace btrees should have realised the by-count tree needs a delete/insert operation pair if we modify the length of a freespace record. (**) The lesson here is this: trust what people say, but always verify it when you can. e.g. I now know you'll look at the code to try to understand answers you are given, so my time answering your questions is not going to be wasted. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [QUESTION] about the freelist allocator in XFS 2016-07-10 23:22 ` Dave Chinner @ 2016-07-11 7:06 ` Kaho Ng 2016-07-11 22:53 ` Dave Chinner 0 siblings, 1 reply; 13+ messages in thread From: Kaho Ng @ 2016-07-11 7:06 UTC (permalink / raw) To: xfs On Mon, Jul 11, 2016 at 7:22 AM, Dave Chinner <david@fromorbit.com> wrote: > [please don't top post] > > On Fri, Jul 08, 2016 at 01:48:43PM +0800, Kaho Ng wrote: >> On Fri, Jul 8, 2016 at 6:28 AM, Dave Chinner <david@fromorbit.com> wrote: >> > On Thu, Jul 07, 2016 at 07:01:35PM +0800, Kaho Ng wrote: >> >> I am trying to investigate how freelist allocator in xfs interacts >> >> with freespace B+Tree allocator. >> >> First I prepared a patch >> >> <https://gist.github.com/22ffca35929e67c08759b057779b7566> on >> >> linux-source/fs/xfs/libxfs/xfs_alloc.c to print debugging messages >> >> (The kernel version used is linux-3.10.0-327.22.2.el7). >> > ...... >> >> When reading the log output >> >> <https://gist.github.com/890076405e1c13c0a952a579e25e6afe> , I >> >> realised that there is no B+Tree split >> >> triggered by xfs_alloc_fix_freelist() when calling xfs_free_extent(). >> >> Isn't B+Tree split possible in by-size B+Tree even when truncating a >> >> longer freespace record to shorter one? But what I found in the log is >> >> only a few tree shrinks... And when reading the source code of >> >> freespace allocator I found that a B+Tree growth in this case is >> >> impossible at least... >> > >> > args->isfl doesn't mean what you think it means. >> > >> > args->isfl is only set when moving blocks from the freespace btree >> > to the AGFL, which only occurs when a previous operation allocated a >> > new freespace btree block and depleted the current freelist. i.e. >> > "AG Free List" != "AG freespace btree" - they are different >> > structures on disk... >> > >> > And when you consider that a freelist refill can only remove records >> > from the the freespace btree, it's should be clear that a btree >> > split won't occur during a freelist refill... >> >> Hmm, wouldn't xfs_alloc_ag_vextent_size() first remove the free extent >> record, and insert a new extent record into the freespace by-size >> btree if the found free extent record is longer than args->maxlen? > > Good, you went and looked to verify what I said(*). Indeed, when > you read xfs_alloc_fixup_trees() where the two trees are modified > after an extent has been selected for allocation: > > ... > * Delete the entry from the by-size btree. > ... > * Add new by-size btree entry(s). > ... > * Fix up the by-block btree entry(s) > > It's pretty clear what the answer is. IOWs, yes, you're correct - we > only ever modify or delete records from the by-bno freespace tree, > but the by-size tree does do a delete and insert and so can > split.(**) > > So, while it is possible for a split to occur, you still didn't see > one in your test. That's because a split is rather unlikely in your > scenario because once we have a large enough freespace btree records > stored for a split to occur, we almost always have exact matches in > the by-count tree for freelist fills and so record deletes are the > usual operationon the by-count tree. > > And even if we are doing a by-count insert, the chances the target > block for the insert is full is quite small, as are the chances the > target block for the insert is different to the block the original > record was deleted from (consider ordering, what record index the > initial lookup returns and that freelist fills usually only require > a block or two to be allocated). > > Cheers, > > Dave. > > (*) That was what my answer was aimed at getting you to do as I get > a lot of timewasters asking me via direct email to answer their > homework questions for them. Anyone who is trying to understand the > basics of the freespace btrees should have realised the by-count > tree needs a delete/insert operation pair if we modify the length of > a freespace record. > > (**) The lesson here is this: trust what people say, but always > verify it when you can. e.g. I now know you'll look at the > code to try to understand answers you are given, so my time > answering your questions is not going to be wasted. > > -- > Dave Chinner > david@fromorbit.com Thanks for the detailed explaination! Just wonders why we prefer failing the request of refilling freelist with XFS_WANT_CORRUPTED_RETURN(mp, i == 1) in some rare case, rather than returning NULLAGBLOCK and allowing the loop in xfs_alloc_ag_vextent_size() to try xfs_alloc_ag_vextent_small()... In such corner case there will always be a lot of small extents at the front of the by-count tree, and any truncation changes to the first entry in the tree will not result in tree splits and triggering assertion failure. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [QUESTION] about the freelist allocator in XFS 2016-07-11 7:06 ` Kaho Ng @ 2016-07-11 22:53 ` Dave Chinner 0 siblings, 0 replies; 13+ messages in thread From: Dave Chinner @ 2016-07-11 22:53 UTC (permalink / raw) To: Kaho Ng; +Cc: xfs On Mon, Jul 11, 2016 at 03:06:03PM +0800, Kaho Ng wrote: > Just wonders why we prefer failing the request of refilling freelist > with XFS_WANT_CORRUPTED_RETURN(mp, i == 1) in some rare case, rather > than returning NULLAGBLOCK and allowing the loop in > xfs_alloc_ag_vextent_size() to try xfs_alloc_ag_vextent_small()... Have a look at where xfs_alloc_ag_vextent_small() gets the blocks it returns to the caller if the btree cursor doesn't point to a btree record we can use. i.e. you can't refill the free list from xfs_alloc_ag_vextent_small() because it allocates blocks from ... > In > such corner case there will always be a lot of small extents at the > front of the by-count tree, and any truncation changes to the first > entry in the tree will not result in tree splits and triggering > assertion failure. If there are records we can use, then we'll allocate them from the btree. Failure to allocate from the btree indicates something is inconsistent, there's a bug in the code or we've got corruption occuring. A corruption shutdown is the only safe course of action when we find something confusing like this - if we guess wrong them we'll only make the bad state/corruption worse. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [QUESTION] about the freelist allocator in XFS 2016-07-07 11:01 [QUESTION] about the freelist allocator in XFS Kaho Ng 2016-07-07 12:13 ` Brian Foster 2016-07-07 22:28 ` Dave Chinner @ 2016-07-08 19:17 ` Kaho Ng 2016-07-09 12:26 ` Kaho Ng 2016-07-10 16:57 ` Kaho Ng 3 siblings, 1 reply; 13+ messages in thread From: Kaho Ng @ 2016-07-08 19:17 UTC (permalink / raw) To: xfs Maybe i should clarify my inquiries first... It is about whether freelist refilling will trigger any tree splits or even tree growths in by-size freespace tree. When reading the source code of XFS(xfs_alloc.c) to find information about freelist refilling in xfs_free_extent(), I found that insertion to by-block B+ Tree is not possible to happen since there is only record updates in this tree. That sounds clear to me. But insertion to by-size B+ Tree may happen in xfs_alloc_fixup_trees() after removing an record from the tree. Thus I come up with a doubt. Is tree split or tree growth in by-size B+ Tree possible in the above case? On Thu, Jul 7, 2016 at 7:01 PM, Kaho Ng <ngkaho1234@gmail.com> wrote: > I am trying to investigate how freelist allocator in xfs interacts > with freespace B+Tree allocator. > First I prepared a patch > <https://gist.github.com/22ffca35929e67c08759b057779b7566> on > linux-source/fs/xfs/libxfs/xfs_alloc.c to print debugging messages > (The kernel version used is linux-3.10.0-327.22.2.el7). > Then, I wrote a simple utility > <https://gist.github.com/992364ceca984d3f14099ec94aaacd9d> to make > TONS of > holes in a filesystem by calling fallocate() to punch holes in a file > that is almost as large as the volume size. > > I created an XFS filesystem image by the following steps: > 1. fallocate -l 80G /mnt/disk2/xfs > 2. mkfs.xfs -f -d agcount=1 /mnt/disk2/xfs > > Then I created a large file by fallocate: > fallocate -l 85823746048 /mnt/test/abc > > which left only 4 blocks available in the volume finally: > /dev/loop0 20961280 20961276 4 100% /mnt/test > > The result of xfs_bmap against /mnt/test/abc: > /mnt/test/abc: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..167624503]: 83000..167707503 0 (83000..167707503) 167624504 10000 > > After that, I used the hole-punching utility above to create holes on > the files, and captured the output of kmsg. > > When reading the log output > <https://gist.github.com/890076405e1c13c0a952a579e25e6afe> , I > realised that there is no B+Tree split > triggered by xfs_alloc_fix_freelist() when calling xfs_free_extent(). > Isn't B+Tree split possible in by-size B+Tree even when truncating a > longer freespace record to shorter one? But what I found in the log is > only a few tree shrinks... And when reading the source code of > freespace allocator I found that a B+Tree growth in this case is > impossible at least... _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [QUESTION] about the freelist allocator in XFS 2016-07-08 19:17 ` Kaho Ng @ 2016-07-09 12:26 ` Kaho Ng 0 siblings, 0 replies; 13+ messages in thread From: Kaho Ng @ 2016-07-09 12:26 UTC (permalink / raw) To: xfs if ((error = xfs_alloc_lookup_eq(cnt_cur, nfbno1, nflen1, &i))) return error; XFS_WANT_CORRUPTED_RETURN(mp, i == 0); if ((error = xfs_btree_insert(cnt_cur, &i))) return error; XFS_WANT_CORRUPTED_RETURN(mp, i == 1); The code is extracted from xfs_alloc_fixup_trees(). When the layout of the leaf in the by-size tree is like this: +------------------------------+-------------------------------+---- | blkcnt: 5, startblk: xxxx | blkcnt: 7, startblk: yyyyy | ... +------------------------------+-------------------------------+---- which is full of items with blkcnt == 7 in the remaining space, if the freelist refilling process requires 6 blocks to be allocated, wouldn't a tree split is required for the insertion to proceed in case the siblings are also full? On Sat, Jul 9, 2016 at 3:17 AM, Kaho Ng <ngkaho1234@gmail.com> wrote: > Maybe i should clarify my inquiries first... It is about whether > freelist refilling will trigger any tree splits or even tree growths > in by-size freespace tree. > > When reading the source code of XFS(xfs_alloc.c) to find information > about freelist refilling in xfs_free_extent(), I found that insertion > to by-block B+ Tree is not possible to happen since there is only > record updates in this tree. That sounds clear to me. > But insertion to by-size B+ Tree may happen in xfs_alloc_fixup_trees() > after removing an record from the tree. > > Thus I come up with a doubt. Is tree split or tree growth in by-size > B+ Tree possible in the above case? > > On Thu, Jul 7, 2016 at 7:01 PM, Kaho Ng <ngkaho1234@gmail.com> wrote: >> I am trying to investigate how freelist allocator in xfs interacts >> with freespace B+Tree allocator. >> First I prepared a patch >> <https://gist.github.com/22ffca35929e67c08759b057779b7566> on >> linux-source/fs/xfs/libxfs/xfs_alloc.c to print debugging messages >> (The kernel version used is linux-3.10.0-327.22.2.el7). >> Then, I wrote a simple utility >> <https://gist.github.com/992364ceca984d3f14099ec94aaacd9d> to make >> TONS of >> holes in a filesystem by calling fallocate() to punch holes in a file >> that is almost as large as the volume size. >> >> I created an XFS filesystem image by the following steps: >> 1. fallocate -l 80G /mnt/disk2/xfs >> 2. mkfs.xfs -f -d agcount=1 /mnt/disk2/xfs >> >> Then I created a large file by fallocate: >> fallocate -l 85823746048 /mnt/test/abc >> >> which left only 4 blocks available in the volume finally: >> /dev/loop0 20961280 20961276 4 100% /mnt/test >> >> The result of xfs_bmap against /mnt/test/abc: >> /mnt/test/abc: >> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS >> 0: [0..167624503]: 83000..167707503 0 (83000..167707503) 167624504 10000 >> >> After that, I used the hole-punching utility above to create holes on >> the files, and captured the output of kmsg. >> >> When reading the log output >> <https://gist.github.com/890076405e1c13c0a952a579e25e6afe> , I >> realised that there is no B+Tree split >> triggered by xfs_alloc_fix_freelist() when calling xfs_free_extent(). >> Isn't B+Tree split possible in by-size B+Tree even when truncating a >> longer freespace record to shorter one? But what I found in the log is >> only a few tree shrinks... And when reading the source code of >> freespace allocator I found that a B+Tree growth in this case is >> impossible at least... _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [QUESTION] about the freelist allocator in XFS 2016-07-07 11:01 [QUESTION] about the freelist allocator in XFS Kaho Ng ` (2 preceding siblings ...) 2016-07-08 19:17 ` Kaho Ng @ 2016-07-10 16:57 ` Kaho Ng 2016-07-10 23:27 ` Dave Chinner 3 siblings, 1 reply; 13+ messages in thread From: Kaho Ng @ 2016-07-10 16:57 UTC (permalink / raw) To: xfs well, a piece of comment about the corner case i mentioned is found in xfsprogs/repair/phase5.c, but i still have no idea how that is prevented by the xfs kernel module. /* * We need to leave some free records in the tree for the corner case of * setting up the AGFL. This may require allocation of blocks, and as * such can require insertion of new records into the tree (e.g. moving * a record in the by-count tree when a long extent is shortened). If we * pack the records into the leaves with no slack space, this requires a * leaf split to occur and a block to be allocated from the free list. * If we don't have any blocks on the free list (because we are setting * it up!), then we fail, and the filesystem will fail with the same * failure at runtime. Hence leave a couple of records slack space in * each block to allow immediate modification of the tree without * requiring splits to be done. * * XXX(hch): any reason we don't just look at mp->m_alloc_mxr? */ On Thu, Jul 7, 2016 at 7:01 PM, Kaho Ng <ngkaho1234@gmail.com> wrote: > I am trying to investigate how freelist allocator in xfs interacts > with freespace B+Tree allocator. > First I prepared a patch > <https://gist.github.com/22ffca35929e67c08759b057779b7566> on > linux-source/fs/xfs/libxfs/xfs_alloc.c to print debugging messages > (The kernel version used is linux-3.10.0-327.22.2.el7). > Then, I wrote a simple utility > <https://gist.github.com/992364ceca984d3f14099ec94aaacd9d> to make > TONS of > holes in a filesystem by calling fallocate() to punch holes in a file > that is almost as large as the volume size. > > I created an XFS filesystem image by the following steps: > 1. fallocate -l 80G /mnt/disk2/xfs > 2. mkfs.xfs -f -d agcount=1 /mnt/disk2/xfs > > Then I created a large file by fallocate: > fallocate -l 85823746048 /mnt/test/abc > > which left only 4 blocks available in the volume finally: > /dev/loop0 20961280 20961276 4 100% /mnt/test > > The result of xfs_bmap against /mnt/test/abc: > /mnt/test/abc: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..167624503]: 83000..167707503 0 (83000..167707503) 167624504 10000 > > After that, I used the hole-punching utility above to create holes on > the files, and captured the output of kmsg. > > When reading the log output > <https://gist.github.com/890076405e1c13c0a952a579e25e6afe> , I > realised that there is no B+Tree split > triggered by xfs_alloc_fix_freelist() when calling xfs_free_extent(). > Isn't B+Tree split possible in by-size B+Tree even when truncating a > longer freespace record to shorter one? But what I found in the log is > only a few tree shrinks... And when reading the source code of > freespace allocator I found that a B+Tree growth in this case is > impossible at least... _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [QUESTION] about the freelist allocator in XFS 2016-07-10 16:57 ` Kaho Ng @ 2016-07-10 23:27 ` Dave Chinner 0 siblings, 0 replies; 13+ messages in thread From: Dave Chinner @ 2016-07-10 23:27 UTC (permalink / raw) To: Kaho Ng; +Cc: xfs On Mon, Jul 11, 2016 at 12:57:59AM +0800, Kaho Ng wrote: > well, a piece of comment about the corner case i mentioned is found in > xfsprogs/repair/phase5.c, but i still have no idea how that is > prevented by the xfs kernel module. > > /* > * We need to leave some free records in the tree for the corner case of > * setting up the AGFL. This may require allocation of blocks, and as > * such can require insertion of new records into the tree (e.g. moving > * a record in the by-count tree when a long extent is shortened). If we > * pack the records into the leaves with no slack space, this requires a > * leaf split to occur and a block to be allocated from the free list. > * If we don't have any blocks on the free list (because we are setting > * it up!), then we fail, and the filesystem will fail with the same > * failure at runtime. Hence leave a couple of records slack space in > * each block to allow immediate modification of the tree without > * requiring splits to be done. > * > * XXX(hch): any reason we don't just look at mp->m_alloc_mxr? > */ Once mkfs fills the AGFL with the minimum number of blocks to ensure allocations always succeed, the kernel guarantees that minimum number of blocks will always be available on the AGFL and hence allocation and freeing of blocks will always succeed, regardless of whether btree splits are needed or not. i.e. ENOSPC is not reported at "filesystem has zero freespace" but at "freespace btrees are empty and each AGFL is at minimum free blocks". See XFS_ALLOC_SET_ASIDE() and XFS_ALLOC_AG_MAX_USABLE(). Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2016-07-11 22:54 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-07-07 11:01 [QUESTION] about the freelist allocator in XFS Kaho Ng
2016-07-07 12:13 ` Brian Foster
2016-07-07 22:28 ` Dave Chinner
[not found] ` <CAGeO4WNAdmeXgL4+CAQ1Yqo18XFgv3NZxWVbDTS0xDZLyb3e2w@mail.gmail.com>
2016-07-08 2:29 ` Fwd: " Kaho Ng
[not found] ` <20160708034710.GL12670@dastard>
2016-07-08 4:05 ` Kaho Ng
2016-07-08 5:48 ` Kaho Ng
2016-07-10 23:22 ` Dave Chinner
2016-07-11 7:06 ` Kaho Ng
2016-07-11 22:53 ` Dave Chinner
2016-07-08 19:17 ` Kaho Ng
2016-07-09 12:26 ` Kaho Ng
2016-07-10 16:57 ` Kaho Ng
2016-07-10 23:27 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox