From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [RFD 06/17] xfs: partial inode chunk allocation
Date: Mon, 12 Aug 2013 23:19:56 +1000
Message-ID: <1376313607-28133-7-git-send-email-david@fromorbit.com>
In-Reply-To: <1376313607-28133-1-git-send-email-david@fromorbit.com>
From: Dave Chinner <dchinner@redhat.com>
When a filesystem ages or when certain workloads dominate the storage capacity
of the filesystem, it can become difficult to find contiguous free space in the
filesystem and hence inode allocation can fail long before the filesystem is out
of space.
To avoid this problem, we need to be able to use smaller extents in the
filesystem to hold inodes than the size needed to hold a full chunk. To enable
this, we need to keep track of the region of the inode chunk that has actually
been allocated in the inode allocation record itself. The inobt record contains
a free inode count field that uses 32 bits of space, but has a maximum possible
value of 64. Hence there are many bits in the field that we can repurpose for
an "allocated regions" mask.
To simplify the implementation and checking of the field, split the 32 bit field
into an 8 bit count variable in the same location as the existing count (i.e.
the LSB of the 32 bit variable, remembering that XFS is big endian on disk), an 8
bit pad field and a 16 bit mask field that contains the allocated extent
tracking.
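As a rough illustration of why the count stays readable in place, the sketch
below decodes the four on-disk bytes both ways. The struct and helper names
are hypothetical and are not part of this patch; only the mask/pad/count
ordering is taken from the description above.

```c
#include <stdint.h>

/* Hypothetical CPU-order view of the old 32 bit ir_freecount field. */
struct inobt_split {
	uint16_t alloc_mask;	/* top 16 bits of the old field */
	uint8_t  pad;		/* next 8 bits, unused */
	uint8_t  freecount;	/* low 8 bits, the old count location */
};

/* Decode the four on-disk bytes (big endian: bytes[0] is the MSB). */
static struct inobt_split inobt_split_decode(const uint8_t bytes[4])
{
	struct inobt_split s;

	s.alloc_mask = (uint16_t)((bytes[0] << 8) | bytes[1]);
	s.pad = bytes[2];
	s.freecount = bytes[3];
	return s;
}

/* What an old kernel would read: all four bytes as one be32 count. */
static uint32_t inobt_legacy_freecount(const uint8_t bytes[4])
{
	return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
	       ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3];
}
```

With a zero mask and pad, both readings give the same count; once any mask
bit is set, the legacy reading is no longer a valid count.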
As we have 16 bits in the mask, each bit represents 4 inodes and hence that
defines the minimum allocation size we can support. In all cases, this will
limit the largest contiguous extent required for a new allocation to 2 blocks,
as the minimum filesystem block size is limited by mkfs to being twice the
inode size.
In most common configurations, a single block will contain more than 4
inodes and so this isn't a major limitation at all.
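The arithmetic behind the 2 block bound can be sketched as follows. The macro
and function names are illustrative only, and the sketch assumes the caller
passes a block size that is at least twice the inode size, as stated above.

```c
#include <stdint.h>

#define INODES_PER_CHUNK	64	/* from the text: maximum free count */
#define ALLOC_MASK_BITS		16	/* width of the proposed mask */
#define INODES_PER_MASK_BIT	(INODES_PER_CHUNK / ALLOC_MASK_BITS)

/* Blocks needed to hold the inodes covered by a single mask bit. */
static unsigned int blocks_per_mask_bit(unsigned int inode_size,
					unsigned int block_size)
{
	unsigned int bytes = INODES_PER_MASK_BIT * inode_size;

	return (bytes + block_size - 1) / block_size;	/* round up */
}
```

The worst case is a block that holds exactly two inodes, where four inodes
need two blocks. Any larger block size needs a single block per mask bit.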
Hence during extent allocation for the inode chunk, if we cannot find an aligned
and contiguous extent, we can settle for something that is as large as possible
and mask off the region that we weren't able to allocate. When freeing the
chunk, we'll also know what extent we need to free. And for untrusted inode
number lookup, we can determine if the inode number falls into the invalid part
of the chunk.
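A minimal sketch of the lookup check might look like the following. The
function name is hypothetical, and the mapping of mask bit 0 to inodes 0-3 of
the chunk is an assumption, since the bit ordering is not defined here. A set
bit marks an invalid region, matching the encoding described later in this
message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the inode at the given offset within its chunk lies in
 * a region that was actually allocated. Assumes mask bit N covers inode
 * offsets 4N..4N+3 and that a set bit means "not allocated".
 */
static bool inode_offset_is_valid(uint16_t alloc_mask, unsigned int offset)
{
	if (offset >= 64)
		return false;	/* outside any inode chunk */
	return !(alloc_mask & (1u << (offset / 4)));
}
```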
Further, to avoid needing to do multiple extent allocations for "sparse" inode
chunks, if we allocate an extent that overlaps an existing partial inode chunk,
we can simply update the mask and free count to indicate that there are multiple
valid extents in the chunk. This gives us a potential route for partial inode
chunks to be made whole via ongoing filesystem modification or a forced scan
once space has been made available.
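The mask and count update for filling in part of an existing partial chunk
could be sketched roughly as below. All names are hypothetical, and the sketch
assumes a set mask bit means "not allocated" and that each newly valid inode
starts out free. How ir_free is maintained for invalid regions is not
specified here, so it is left out.

```c
#include <stdint.h>

/* Hypothetical in-core view of the fields that change. */
struct partial_chunk {
	uint16_t alloc_mask;	/* set bit = 4 inodes not allocated */
	uint8_t  freecount;	/* free inodes in the valid regions */
};

/*
 * Mark mask bits [first_bit, first_bit + nbits) as allocated and return
 * how many inodes became valid. Bits that were already allocated are
 * skipped so the free count is never double counted.
 */
static unsigned int chunk_add_region(struct partial_chunk *c,
				     unsigned int first_bit,
				     unsigned int nbits)
{
	unsigned int added = 0;
	unsigned int b;

	for (b = first_bit; b < first_bit + nbits && b < 16; b++) {
		uint16_t bit = (uint16_t)(1u << b);

		if (c->alloc_mask & bit) {
			c->alloc_mask &= (uint16_t)~bit;
			added += 4;
		}
	}
	c->freecount = (uint8_t)(c->freecount + added);
	return added;
}
```

Once the mask reaches zero the record describes a full chunk again.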
To make this as close to transparent as possible, use a value of 0 to indicate
that there are valid inodes in this location, and a value of 1 to indicate that
it is an invalid region. This means that the filesystem will be backwards
compatible with existing kernels and userspace up until the first partial chunk
is allocated. At that point, we need to set an incompatible feature flag as
older kernels and userspace are unable to interpret the value in the "free
inodes" field correctly. This also means that if we scan the inode btrees and
determine that there are no partial inode chunks, we can remove the feature
bit...
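The scan that decides whether the feature bit is still needed reduces to
checking for any non-zero mask. The helper below is a hypothetical sketch over
an array of already decoded masks, not a walk of the real inode btrees.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if any record has a partial chunk, i.e. the incompatible
 * feature flag must stay set. A mask of zero is a fully allocated chunk
 * that old kernels and userspace read correctly.
 */
static bool partial_chunks_present(const uint16_t *alloc_masks, size_t nrecs)
{
	size_t i;

	for (i = 0; i < nrecs; i++) {
		if (alloc_masks[i] != 0)
			return true;
	}
	return false;
}
```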
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_ialloc_btree.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_ialloc_btree.h b/fs/xfs/xfs_ialloc_btree.h
index 3ac36b76..75ee794 100644
--- a/fs/xfs/xfs_ialloc_btree.h
+++ b/fs/xfs/xfs_ialloc_btree.h
@@ -48,7 +48,9 @@ static inline xfs_inofree_t xfs_inobt_maskn(int i, int n)
*/
typedef struct xfs_inobt_rec {
__be32 ir_startino; /* starting inode number */
- __be32 ir_freecount; /* count of free inodes (set bits) */
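+ __be16 ir_alloc_mask; /* allocated region mask, 1 bit per 4 inodes */
+ __u8 ir_pad; /* unused padding */
+ __u8 ir_freecount; /* count of free inodes (set bits) */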
__be64 ir_free; /* free inode mask */
} xfs_inobt_rec_t;
--
1.8.3.2
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs